I need to capture the content of the top-level <pubDate> element, but in the document a <pubDate> can appear either within an <item> element or directly within the <channel> element. Note that <item> is itself a child of <channel>. Here is an example:
<channel>
...
<pubDate>10/2/2010</pubDate>
...
<item>
...
<pubDate>13/2/2029</pubDate>
...
</item>
...
</channel>
I need to capture 10/2/2010.
The <item> is no problem: I can capture it, along with its <pubDate>.
Regular expressions are not a good tool for dealing with languages that are described by context-free grammars, such as XML. Try using the XML DOM to do the job.
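A minimal sketch of the DOM approach, assuming the XML has already been parsed into a DOM Document (in a browser, for example with DOMParser). The function name channelPubDate is made up for illustration. It looks only at the direct children of <channel>, so a <pubDate> nested inside <item> is never returned:

```javascript
// Return the text of the <pubDate> that is a direct child of <channel>,
// or null if there is none. `doc` is any XML DOM Document.
// (channelPubDate is a hypothetical helper name, not part of any library.)
function channelPubDate(doc) {
  const channel = doc.getElementsByTagName("channel")[0];
  if (!channel) return null;
  // Walk only the direct children, so a <pubDate> inside <item> is skipped.
  for (let node = channel.firstChild; node; node = node.nextSibling) {
    // nodeType 1 is an element node; text and comment nodes are ignored.
    if (node.nodeType === 1 && node.nodeName === "pubDate") {
      return node.textContent;
    }
  }
  return null;
}

// Browser-only usage: DOMParser is not defined in plain Node.js,
// so this part is guarded and simply does nothing there.
if (typeof DOMParser !== "undefined") {
  const xml =
    "<channel><pubDate>10/2/2010</pubDate>" +
    "<item><pubDate>13/2/2029</pubDate></item></channel>";
  const doc = new DOMParser().parseFromString(xml, "application/xml");
  console.log(channelPubDate(doc)); // logs the channel-level date
}
```

Because the loop never descends into <item>, the order of the elements inside <channel> does not matter.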
I don't know JavaScript, so I can't help you with the DOM. I agree 100% that it's a bad idea to try and parse XML with regex. There might be a quick, very dirty, and very brittle workaround, though:
If indentation is consistent throughout the file, and <channel> elements (and therefore their direct children) are always at the same level of indentation, you could use that fact as a guide for the regex. In your example, /^ {2}<pubDate>([^<]*)<\/pubDate>/m (= two spaces after start-of-line) might just work.
Use this at your own risk. Here be dragons etc.
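To illustrate the indentation trick, here is a small sketch. The sample string and its indentation (two spaces for children of <channel>, four for children of <item>) are assumptions; the real file may be laid out differently, in which case the pattern will not match. Note that the closing tag is written as </pubDate>, since a JavaScript regex is case-sensitive unless the i flag is given:

```javascript
// Sample input; the two-space / four-space indentation is an assumption.
const xml = [
  "<channel>",
  "  <pubDate>10/2/2010</pubDate>",
  "  <item>",
  "    <pubDate>13/2/2029</pubDate>",
  "  </item>",
  "</channel>",
].join("\n");

// With the m flag, ^ matches at the start of every line. Exactly two spaces
// must precede <pubDate>, so the four-space one inside <item> cannot match.
const match = /^ {2}<pubDate>([^<]*)<\/pubDate>/m.exec(xml);
const channelDate = match ? match[1] : null;
console.log(channelDate); // 10/2/2010
```

Any change in whitespace (tabs, a different indent width, everything on one line) breaks this, which is why the DOM route is the safer one.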
Check out jQuery and see if this helps reading/parsing the XML: http://think2loud.com/reading-xml-with-jquery/
KM