I am trying to write a regex in PHP that identifies relative src paths. My idea was to use a lookahead, (?=, followed by a negation, ^, applied to the subexpression (http), but this doesn't work. The ^ negation works for a single character, but not for a subexpression. Is there an && operator or something similar?
<img.*?src=[\'\"]\(?=^(http))
I need it to match the entire http, or else imgs whose src starts with h, t or p will be wrongly rejected. Any suggestions? Is this task too big for regex?
You can use negative lookahead, which is (?!...) instead of (?=...). For your example (I'd put the anchor at the start):
^(?!http)
Which reads: start of string, not immediately followed by "http".
Edit: since you updated with a fuller example:
<img [^>]*src=['"](?!http)([^'"]+)['"]
                          ^------^ - this capturing group captures the link
                                     which doesn't start with http
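As a rough sketch of how that pattern might be used from PHP: the sample markup, file names and URL below are made up for illustration, and the ~ delimiters and the i flag are my own choices rather than anything required by the pattern.

```php
<?php
// Sample markup; the file names and the URL are hypothetical.
$html = <<<'HTML'
<img src="images/logo.png" alt="logo">
<img class="x" src='http://example.com/a.png'>
<img src="photo.jpg">
HTML;

// The pattern from above, wrapped in ~ delimiters with the case-insensitive flag.
$pattern = '~<img [^>]*src=[\'"](?!http)([^\'"]+)[\'"]~i';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds what the capturing group matched for each img tag.
print_r($matches[1]); // images/logo.png and photo.jpg
```

Note that (?!http) also rejects a relative path that merely begins with those letters, such as httpdocs/a.png. If that matters, (?!https?://) is a stricter alternative.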
Of course, for proper parsing you should use DOM ;)
It's not the most useful answer, but it sounds as though you've reached the limit of applicability for regex in HTML parsing.
As per this answer here, look at using an HTML DOM parser. I haven't used PHP DOM parsers much, but I know that in other languages a DOM parser often makes HTML tasks a 30-second job, rather than an hour or more of testing weird exceptional cases.
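For what it's worth, a minimal sketch of the DOM approach using PHP's built-in DOMDocument might look like the following. The sample markup and the rule "anything that doesn't start with http:// or https:// counts as relative" are assumptions made for illustration.

```php
<?php
// Sample markup; the file names and the URL are hypothetical.
$html = '<p><img src="images/logo.png"><img src="https://example.com/a.png"><img src="photo.jpg"></p>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$relative = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    // Assumption: a src without an http:// or https:// prefix is relative.
    if ($src !== '' && !preg_match('~^https?://~i', $src)) {
        $relative[] = $src;
    }
}

print_r($relative); // images/logo.png and photo.jpg
```

The parser takes care of attribute order, quoting style and extra whitespace, which is where most of the regex edge cases come from.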