So I have the following regular expression:
https?://(www\.)?flickr\.com/photos/(.+)/?
To match against the following URL:
http://www.flickr.com/photos/username/
How can I stop the final forward slash (/) from being included in the username sub-pattern (.+)?
I have tried:
https?://(www\.)?flickr\.com/photos/(.+?)/?
But then it only matches the first letter of the username.
https?://(?:www\.)?flickr\.com/photos/([^/]+)/?
I added ?: to the first group so it's non-capturing, then used [^/] instead of the dot in the last group. This ensures that everything between "photos/" and the very next "/" is captured.
If you need to capture the first www, just use this:
https?://(www\.)?flickr\.com/photos/([^/]+)/?
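As a quick check of the two variants above, here is a minimal Python sketch. The question does not say which regex engine is in use, so Python's re module is an assumption here; the point is only to show which group number the username ends up in.

```python
import re

URL = "http://www.flickr.com/photos/username/"

# Non-capturing www group: the username is group 1.
non_capturing = re.compile(r"https?://(?:www\.)?flickr\.com/photos/([^/]+)/?")

# Capturing www group: the username moves to group 2.
capturing = re.compile(r"https?://(www\.)?flickr\.com/photos/([^/]+)/?")

m1 = non_capturing.match(URL)
m2 = capturing.match(URL)

print(m1.group(1))               # username
print(m2.group(1), m2.group(2))  # www. username
```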
You need to make sure it doesn't match the forward slash:
https?://(?:www\.)?flickr\.com/photos/([^/]+)/?
You could also make the regex lazy (which I guess is what you were trying to do with the (.+?) syntax), but the above will work just fine.
Change (.+) to ([^/]+). This will match until it encounters a /, so you might want to add any other characters the match should stop on to the class too.
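For instance, if a URL might carry a query string or a fragment, those delimiters could go into the class as well. This is only a sketch in Python (an assumed engine), and the choice of ? and # as extra stop characters is my assumption, not something from the question.

```python
import re

# Assumption: "?" and "#" should also end the username, in addition to "/".
pattern = re.compile(r"https?://(?:www\.)?flickr\.com/photos/([^/?#]+)/?")

with_query = pattern.match("http://www.flickr.com/photos/username?page=2")
with_fragment = pattern.match("http://flickr.com/photos/username#top")

print(with_query.group(1))     # username
print(with_fragment.group(1))  # username
```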
There are generally two ways to do this:
1. Append a question mark to make the matching non-greedy. .* will match as much as possible, .*? will match as little as possible.
2. Exclude the character you want to match next. If you want to stop on /, use [^/]*.
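To see how the two approaches behave on the URL from the question, here is a small comparison in Python (an assumed engine; the prefix variable is just a helper for this sketch). Because the trailing /? is optional, the lazy version is satisfied after a single character, which is why only the first letter was matched.

```python
import re

url = "http://www.flickr.com/photos/username/"
prefix = r"https?://(?:www\.)?flickr\.com/photos/"

# Greedy: .+ swallows the trailing slash, and the optional /? matches nothing.
greedy = re.match(prefix + r"(.+)/?", url).group(1)

# Lazy: .+? stops after one character because nothing after it is required.
lazy = re.match(prefix + r"(.+?)/?", url).group(1)

# Negated class: consumes everything up to, but not including, the next slash.
negated = re.match(prefix + r"([^/]+)/?", url).group(1)

print(greedy)   # username/
print(lazy)     # u
print(negated)  # username
```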
If you know there will be a trailing slash, take out the final ?.
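A short Python sketch of that last suggestion (Python is an assumed engine here): once the slash is required, the lazy group has to keep expanding until it reaches it, so the whole username is captured. The trade-off is that a URL without the trailing slash no longer matches at all.

```python
import re

prefix = r"https?://(?:www\.)?flickr\.com/photos/"
required_slash = re.compile(prefix + r"(.+?)/")

with_slash = required_slash.match("http://www.flickr.com/photos/username/")
without_slash = required_slash.match("http://www.flickr.com/photos/username")

print(with_slash.group(1))  # username
print(without_slash)        # None
```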