Regular Expression to check String is valid XHTML or not [duplicate]

Developer https://www.devze.com 2023-04-08 20:50 Source: Internet
This question already has answers here: Closed 11 years ago. Possible Duplicate: regular expression to check if string is valid XML

I am looking for a regular expression to check whether a string is valid XHTML.

Example:

<h2>Legal HTML Entity References</h2><table align="center" border="0" ><tr></tr></table>


This sounds like a bad idea: The language of valid XHTML strings is not regular.

Use an HTML parsing library instead. A few examples:

  • JTidy
  • TagSoup
  • HTMLParser

Related question:

  • When should I not use regular expressions?


Regex is exactly the wrong tool to use.

HTML is not a regular language and hence cannot be parsed by regular expressions.
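To make the point concrete, here is a minimal sketch of how a plausible-looking pattern goes wrong. The class name `NaiveRegexDemo` and the pattern itself are made up for illustration; they are not taken from any library or from the question. The pattern only checks that the outermost opening and closing tag names agree, so anything in between slips through, including mis-nested tags.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: the pattern is a deliberately naive attempt,
// shown only to illustrate the failure mode, not a recommended check.
final class NaiveRegexDemo {
    // "<tag ...>anything</tag>", where the closing name must repeat the opening one.
    static final Pattern NAIVE =
            Pattern.compile("^<(\\w+)[^>]*>.*</\\1>$", Pattern.DOTALL);

    static boolean looksValid(String s) {
        return NAIVE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Properly nested markup is accepted.
        System.out.println(looksValid("<div><p>text</p></div>"));
        // Mis-nested inner tags are accepted as well, because the pattern
        // cannot keep track of nesting depth.
        System.out.println(looksValid("<div><p><span></p></span></div>"));
    }
}
```

Patching the pattern for one level of nesting just moves the problem to the next level, which is why the answers here point to a parser.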

See Jeff's post on the subject here: http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

Since you've tagged this post Java, you should look at using one of the myriad of HTML parsing libraries available.


Have a look here to see why parsing HTML with regular expressions won't work reliably: RegEx match open tags except XHTML self-contained tags

XHTML is just an XML-based flavor of HTML, so you're better off using a real validator such as JTidy.


Try to check it with a parser. Don't do it the Cthulhu Way.

Here you can find a starting point and some examples on how to do it: The Java XML Validation API
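As a rough first step before full validation, the parser that ships with the JDK can at least tell you whether a string is well-formed XML. The sketch below uses `javax.xml.parsers` rather than the validation API itself, and the class name `XhtmlFragmentCheck` and the wrapping `<root>` element are assumptions made for illustration. It does not check that the elements and attributes are legal XHTML; for that you would still need to validate against an XHTML schema or DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks well-formedness only, not XHTML validity.
final class XhtmlFragmentCheck {

    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Assumption: the input is a fragment that may contain several
            // top-level elements, so wrap it in a single artificial root.
            String wrapped = "<root>" + fragment + "</root>";
            builder.parse(new InputSource(new StringReader(wrapped)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input: unclosed or mis-nested tags,
            // unquoted attributes, undeclared entities, and so on.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        String example = "<h2>Legal HTML Entity References</h2>"
                + "<table align=\"center\" border=\"0\" ><tr></tr></table>";
        System.out.println(isWellFormed(example));
        System.out.println(isWellFormed("<h2>Mismatched</h3>"));
    }
}
```

Note that named HTML entities such as `&nbsp;` are rejected by this check, because no DTD declares them; handling those is one more reason to move on to a proper XHTML validator or a library like JTidy.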
