Changeset 60665

Timestamp:

08/26/2025 06:21:00 PM (3 months ago)

Author:

nerrad

Message:

HTML API: Reliably parse HTML in get_url_in_content()

As part of a larger effort in #63694, this utilizes WP_HTML_Tag_Processor instead of a regular expression to parse the string passed into get_url_in_content().

As a benefit, this also decodes the URL, whereas the previous code didn’t, so a URL written with HTML character references (for example, an encoded colon after http) is now properly decoded to http://.

Props dmsnell, jonsurrell, nerrad.
Fixes #63694.
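
The decoding difference described in the message can be illustrated without WordPress loaded. The following is a minimal sketch, not the committed code: the pattern is the one removed in the diff, the input string is hypothetical, and PHP's html_entity_decode() is used only as a stand-in for the decoding that the HTML API performs when it returns an attribute value.

```php
<?php
// Pattern copied from the lines removed by this changeset.
$pattern = '/<a\s[^>]*?href=([\'"])(.+?)\1/is';

// Hypothetical input: the ampersand in the query string is written as &amp;.
$html = '<a href="http://example.com/?a=1&amp;b=2">link</a>';

// The regex captures the attribute text exactly as written in the source.
preg_match( $pattern, $html, $matches );
$raw = $matches[2];

// Stand-in for attribute decoding (assumption: this approximates, but is not,
// the decoder used by WP_HTML_Tag_Processor::get_attribute()).
$decoded = html_entity_decode( $raw, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

echo $raw, "\n";     // http://example.com/?a=1&amp;b=2
echo $decoded, "\n"; // http://example.com/?a=1&b=2
```

With the old code the first form was what reached sanitize_url(); with the tag processor the function works from a decoded value.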

File:

-                      r60630
+                      r60665
  * @since 3.6.0
+ *
- * @param string $content A string which might contain a URL.
- * @return string|false The found URL.
+ * @param string $content A string which might contain an `A` element with a non-empty `href` attribute.
+ * @return string|false Database-escaped URL via {@see esc_url()} if found, otherwise `false`.
  */
 function get_url_in_content( $content ) {
 …
+    }
-    if ( preg_match( '/<a\s[^>]*?href=([\'"])(.+?)\1/is', $content, $matches ) ) {
-        return sanitize_url( $matches[2] );
+    $processor = new WP_HTML_Tag_Processor( $content );
+    while ( $processor->next_tag( 'A' ) ) {
+        $href = $processor->get_attribute( 'href' );
+        if ( is_string( $href ) && ! empty( $href ) ) {
+            return sanitize_url( $href );
+        }
+    }
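
Two inputs show why the removed pattern was unreliable. This is a sketch in plain PHP under stated assumptions: regex_href() is a hypothetical helper that wraps the removed pattern, and both sample strings are made up for illustration. WP_HTML_Tag_Processor::get_attribute( 'href' ) looks the attribute up by name, so it would not be expected to confuse data-href with href or to require quotes around the value.

```php
<?php
/**
 * Hypothetical helper: applies the pattern removed by this changeset
 * and returns the captured text, or null when nothing matches.
 */
function regex_href( string $html ): ?string {
    $pattern = '/<a\s[^>]*?href=([\'"])(.+?)\1/is';
    if ( preg_match( $pattern, $html, $matches ) ) {
        return $matches[2];
    }
    return null;
}

// Case 1: "href=" also appears at the end of "data-href=", so the pattern
// captures the value of the wrong attribute.
$wrong_attribute = regex_href( '<a data-href="/tracking" href="https://example.com/">x</a>' );

// Case 2: an unquoted attribute value is valid HTML, but the pattern
// requires a quote character after "href=" and finds nothing.
$unquoted = regex_href( '<a href=https://example.com/>x</a>' );

var_dump( $wrong_attribute ); // string(9) "/tracking"
var_dump( $unquoted );        // NULL
```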