Regular expression for URLs for images and links

Developer https://www.devze.com 2023-02-19 04:46 Source: the web

EDIT: I'm not parsing html like the 5 billion other questions that have been posted. This is raw unformatted text that I want to convert into some HTML.

I'm working on post-processing some text. I need to convert URLs with image endings (jpe?g|png|gif) into image tags, and all other URLs into href links. My image replacement works, but I'm stuck keeping the link replacement from overwriting the image tags that the first pass produced.

I need help with the expression: how do I get it to look for URLs that are not already inside the tags produced by the image replacement, or for URLs that do not end in .jpe?g, .png or .gif?

public function smartConvertPost($post) {

    /**
     * Match image based urls
     */
    $pattern = '!http://([a-z0-9\-\.\/\_]+\.(?:jpe?g|png|gif))!Ui';
    $replace='<p><img src="http://$1"></p>';
    $postImages = preg_replace($pattern,$replace,$post);

    /**
     * Match url based
     */
    $pattern='/http://([a-z0-9\-\.\/\_]+(?:\S|$))/i';
    $replace='<a href="$1">$1</a>';
    $postUrl = preg_replace($pattern,$replace, $postImages);

    return $postUrl;
}

Please note that I am not talking about matching tags or HTML. I mean matching a plain string like the following and converting it to HTML.

If this was an example post with a URL to a page like http://www.some-website.com/some-page/anything.html and I also put a URL to an image http://www.some-website.com/someimage.jpg you would need to regex the two to be a hyperlink and an image.

Thanks,


Brad Christie's preg_replace_callback() recommendation is a good one. Here is one possible implementation:

function smartConvertPost($post)
{ // Disclaimer: This "URL plucking" regex is far from ideal.
    $pattern = '!http://[a-z0-9\-._~\!$&\'()*+,;=:/?#[\]@%]+!i';
    $replace='_handle_URL_callback';
    return preg_replace_callback($pattern,$replace, $post);
}

function _handle_URL_callback($matches)
{ // The callback is passed one parameter: $matches.
    // Escape the URL so characters such as & and ' are safe inside HTML.
    $url = htmlspecialchars($matches[0], ENT_QUOTES);
    // The i flag keeps the extension check case-insensitive, like the URL pattern.
    if (preg_match('/\.(?:jpe?g|png|gif)(?:$|[?#])/i', $matches[0]))
    { // This is an image if path ends in .GIF, .PNG, .JPG or .JPEG.
        return '<p><img src="'. $url .'"></p>';
    } // Otherwise handle as NOT an image.
    return '<a href="'. $url .'">'. $url .'</a>';
}

Note that the regex used to pluck out a URL is not ideal. To do it right is tricky. See the following resources:

  • The Problem With URLs by Jeff Atwood.
  • An Improved Liberal, Accurate Regex Pattern for Matching URLs by John Gruber.
  • URL Linkification (HTTP/FTP) by yours truly.
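
One difficulty that comes up with a liberal pattern like the one above is trailing sentence punctuation: the character class accepts `.`, `,` and `)`, so "see http://example.com/page." swallows the final period. The sketch below is one possible workaround, not a method taken from the linked resources. The function name `linkify` and the set of characters passed to `rtrim()` are assumptions made for illustration.

```php
<?php
// Hypothetical helper: linkify http:// URLs, keeping trailing sentence
// punctuation outside the generated <a> element.
function linkify(string $text): string
{
    $result = preg_replace_callback(
        '!http://[a-z0-9\-._~\!$&\'()*+,;=:/?#[\]@%]+!i',
        function (array $m): string {
            // Assumption: these characters, when last, belong to the sentence.
            $url  = rtrim($m[0], '.,;:!?)');
            // Whatever rtrim() removed is re-appended after the link.
            $tail = substr($m[0], strlen($url));
            $safe = htmlspecialchars($url, ENT_QUOTES);
            return '<a href="' . $safe . '">' . $safe . '</a>' . $tail;
        },
        $text
    );
    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

echo linkify('See http://example.com/page.html, then (http://example.com/faq).'), "\n";
```

In this example the comma after `page.html` and the `).` after `faq` stay outside the links. The trade-off is that a URL which really does end in `)` or `.` is cut short, which is part of why the resources above call the problem tricky.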

Edit: Added ability to recognize image URLs having a query or fragment.
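
To illustrate what the query and fragment handling changes, here is a small sketch that runs the same extension check against a few made-up URLs. `isImageUrl` is a hypothetical wrapper, not part of the answer's code, and the `i` flag is added on the assumption that upper-case extensions should count too.

```php
<?php
// Hypothetical helper: true when an image extension is followed by the end
// of the URL, or by the start of a query (?) or fragment (#).
function isImageUrl(string $url): bool
{
    return preg_match('/\.(?:jpe?g|png|gif)(?:$|[?#])/i', $url) === 1;
}

$samples = [
    'http://example.com/photo.jpg',
    'http://example.com/photo.png?size=large',
    'http://example.com/photo.gif#top',
    'http://example.com/page.html',
    'http://example.com/photo.jpg.html',
];
foreach ($samples as $url) {
    echo $url, ' => ', isImageUrl($url) ? 'image' : 'link', "\n";
}
```

The first three samples are classified as images and the last two as links. Because the check looks at the whole URL rather than only the path, a URL such as `http://example.com/page?file=x.jpg` would also be treated as an image.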


Since it's the 215247th post on that kind of topic, let's say it again: HTML is too complicated to handle with regex. Use a parser. See this: Regular expression for parsing links from a webpage?

PS: no offense =).

Edit:

I personally often use Symfony, and there's a really great parser for what you need: http://fabien.potencier.org/article/42/parsing-xml-documents-with-css-selectors

You can get all images using a simple CSS expression on your HTML. Give it a try.
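
For input that is already HTML, the parser approach could look roughly like the sketch below. It uses PHP's bundled DOM extension with an XPath query rather than the Symfony CSS selector component from the linked article, so treat it as a stand-in for that idea. `extractImageSources` is a hypothetical name.

```php
<?php
// Hypothetical helper: collect the src attribute of every <img> element.
// Assumes the DOM extension is available and $html is not empty.
function extractImageSources(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $sources = [];
    // "//img[@src]" plays the role of the CSS selector "img[src]".
    foreach ($xpath->query('//img[@src]') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

$html = '<p><img src="http://example.com/a.jpg"> '
      . '<a href="http://example.com/">home</a> '
      . '<img src="http://example.com/b.png"></p>';
print_r(extractImageSources($html));
```

This only helps once markup exists. It does not apply to the raw, unformatted text described in the question.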


What about using a marker?


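public function smartConvertPost($post) {
    $marker = '<MYMARKER>'; // Define the marker here
    // Single-quoted strings do not interpolate variables, so the marker is
    // concatenated in; preg_quote() escapes it for use inside a pattern.
    $quotedMarker = preg_quote($marker, '~');

    /**
     * Match image based urls
     */
    $pattern = '!http://([a-z0-9\-\.\/\_]+\.(?:jpe?g|png|gif))!Ui';
    $replace = '<p><img src="' . $marker . 'http://$1"></p>'; // Use it here...
    $postImages = preg_replace($pattern, $replace, $post);

    /**
     * Match url based: skip any URL that directly follows the marker.
     * The delimiter is ~ because / clashes with http:// and ! with (?<! ).
     */
    $pattern = '~(?<!' . $quotedMarker . ')http://[a-z0-9\-\.\/\_]+~i'; //...here
    $replace = '<a href="$0">$0</a>'; // $0 is the whole match, scheme included
    $postUrl = preg_replace($pattern, $replace, $postImages);

    /**
     * Remove all markers
     */
    return str_replace($marker, '', $postUrl);
}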

Try to choose a marker that has no chance of appearing in the post. HTH
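
Rather than hand-picking a marker and hoping it never shows up, one option is to generate a random one and verify that the post does not contain it. This is only a sketch. `makeUniqueMarker` and the `<MARKER-...>` format are hypothetical, and `random_bytes()` requires PHP 7 or later.

```php
<?php
// Hypothetical helper: build a marker that does not occur in $post.
function makeUniqueMarker(string $post): string
{
    do {
        // 8 random bytes become 16 hexadecimal characters.
        $marker = '<MARKER-' . bin2hex(random_bytes(8)) . '>';
        // Retry in the very unlikely case that the post contains this marker.
    } while (strpos($post, $marker) !== false);

    return $marker;
}

$post = 'A page http://example.com/page.html and an image http://example.com/pic.jpg';
$marker = makeUniqueMarker($post);
var_dump(strpos($post, $marker)); // bool(false): the marker is absent
```

The generated marker contains no `$` or `\`, so it can be concatenated into a preg_replace() replacement string without being misread as a backreference. It should still go through preg_quote() before being placed in a pattern.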
