
Find href attribute values that do not contain “javascript:”

Developer https://www.devze.com 2022-12-13 20:05 Source: Internet

I have a regex that nicely finds the href values of the anchor tags in a page:

<[aA][^>]*? href=[\"'](?<url>[^\"]+?)[\"'][^>]*?>

However, I want it NOT to find any href that contains the text 'javascript:'.

The reason is that I sometimes need to modify the href and sometimes don't. When the href contains 'javascript:', I want the regex not to match it.

(ASP.NET, C#)


I really wouldn't recommend using a regexp for this, since HTML isn't regular and there are no end of edge cases to cater for. If at all possible, please use an HTML parser. I think you'll find it a lot less grief.
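If the regex has to stay for now, one way to express the exclusion asked about is a negative lookahead placed right after the opening quote. The following is only a minimal sketch under a few assumptions: the class and method names (`HrefRegexSketch`, `FindEditableHrefs`) are made up for illustration, the pattern is the one from the question with the lookahead and `RegexOptions.IgnoreCase` added, and it inherits every limitation of that pattern (for example, it expects a single space before `href` and quoted values).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class HrefRegexSketch
{
    // Pattern from the question plus (?![^"']*javascript:), which makes the
    // match fail when "javascript:" appears before the next quote character.
    // IgnoreCase is an added assumption so that "JavaScript:" is excluded too.
    private static readonly Regex EditableHref = new Regex(
        @"<[aA][^>]*? href=[""'](?![^""']*javascript:)(?<url>[^""]+?)[""'][^>]*?>",
        RegexOptions.IgnoreCase);

    // Returns the captured href values of anchors that were not excluded.
    public static List<string> FindEditableHrefs(string html)
    {
        var urls = new List<string>();
        foreach (Match m in EditableHref.Matches(html))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }
}
```

Calling `HrefRegexSketch.FindEditableHrefs(html)` on a fragment with a normal link and a `javascript:` link should return only the normal one. This only filters on the literal text, so it is a convenience for deciding which hrefs to modify, not a security check.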


The word "javascript" can be written in other ways; look at the ha.ckers.org article. Simply excluding the word "javascript" doesn't give you any safety at all.
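To make that concrete, here is a rough sketch of why a literal text match falls short: the value can be mixed-case, entity-encoded, or split by a tab, and a browser will still treat it as a `javascript:` URL. The helper below (`HrefSchemeCheck.LooksLikeJavascriptHref`, a hypothetical name) decodes HTML entities and drops whitespace and control characters before checking the scheme. It is an illustration under those assumptions, not a complete sanitizer, and other encodings may still get past it.

```csharp
using System;
using System.Linq;
using System.Net;

// Hypothetical helper, not part of any library.
public static class HrefSchemeCheck
{
    // Returns true when the raw attribute value looks like a javascript: URL
    // after some basic normalization. Deliberately errs on the side of flagging.
    public static bool LooksLikeJavascriptHref(string rawHref)
    {
        if (rawHref == null)
        {
            return false;
        }

        // Decode entities such as &#106; ("j") or &#x09; (tab).
        string decoded = WebUtility.HtmlDecode(rawHref);

        // Remove whitespace and control characters that could be used
        // to break up the scheme name.
        string compact = new string(
            decoded.Where(c => !char.IsWhiteSpace(c) && !char.IsControl(c)).ToArray());

        return compact.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, `"JaVaScRiPt:alert(1)"`, `"&#106;avascript:alert(1)"` and `"java&#x09;script:alert(1)"` are all flagged by this check, while none of the last two contain the literal text `javascript:` that a plain regex exclusion would look for.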
