Possible Duplicate:
If you're not supposed to use Regular Expressions to parse HTML, then how are HTML parsers written?
My question is simple: How do current DOM parsers actually parse the DOM from a string (XML, HTML, or otherwise)?
I know you shouldn't parse HTML with RegEx, but couldn't a DOM parser use RegEx to match patterns for open/close tags? Or is there a good single-pass algorithm for parsing the provided string as a character array?
Look at this:
- How do HTML parsers work if they're not using regexp?
- Parsing HTML documents: 
![How is the DOM parsed? [duplicate] How is the DOM parsed? [duplicate]](https://i.stack.imgur.com/CjyKU.png)
Here is a good example
Well, you could start with a basic approach along the lines of:
http://www.blackbeltcoder.com/Articles/strings/parsing-html-tags-in-c
And then just expand it to store everything into the full DOM tree structure.
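To make that concrete, here is a minimal sketch of the idea in Python: walk the string once, character run by character run, and use a stack of open elements to build the tree. This is not how browsers do it (real HTML parsers follow a much larger state machine with error recovery), and the names `Node`, `VOID_TAGS`, and `parse` are made up for this illustration. It assumes attribute values contain no whitespace and that comments contain no `>`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element in the tree; children are Node objects or text strings."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# Elements that never have a closing tag (a small, incomplete sample).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


def parse(markup: str) -> Node:
    root = Node("#document")
    stack = [root]          # stack[-1] is the element currently being filled
    i, n = 0, len(markup)
    while i < n:
        if markup[i] == "<":
            end = markup.find(">", i)
            if end == -1:   # unterminated tag: keep the rest as text
                stack[-1].children.append(markup[i:])
                break
            body = markup[i + 1:end].strip()
            i = end + 1
            if body.startswith(("!", "?")):
                continue    # skip doctype, comments, processing instructions
            if body.startswith("/"):
                # Closing tag: pop back to the nearest matching open element.
                name = body[1:].strip().lower()
                for depth in range(len(stack) - 1, 0, -1):
                    if stack[depth].tag == name:
                        del stack[depth:]
                        break
                continue
            self_closing = body.endswith("/")
            parts = body.rstrip("/").split()
            if not parts:
                continue
            name = parts[0].lower()
            attrs = {}
            for part in parts[1:]:
                key, _, value = part.partition("=")
                attrs[key.lower()] = value.strip("\"'")
            node = Node(name, attrs)
            stack[-1].children.append(node)
            if not self_closing and name not in VOID_TAGS:
                stack.append(node)   # later content becomes its children
        else:
            # Text run: everything up to the next "<".
            end = markup.find("<", i)
            if end == -1:
                end = n
            text = markup[i:end]
            if text.strip():
                stack[-1].children.append(text)
            i = end
    return root


doc = parse('<div id="a"><p>Hello <b>world</b></p><br/></div>')
```

The stack is the key design choice: an opening tag pushes a node, a closing tag pops back to its match, and text is attached to whatever is on top. No regular expressions are needed because nesting is tracked by the stack rather than by pattern matching.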