Regex Lookaheads

Developer · https://www.devze.com · 2022-12-29 04:49 · Source: web

I need to capture the content of the top-level <pubDate> element, but in the document a <pubDate> can appear either within an <item> element or directly within the <channel> element, and <item> is itself a child of <channel>. Here is an example:

<channel>
  ...
  <pubDate>10/2/2010</pubDate>
  ...
  <item>
    ...
    <pubDate>13/2/2029</pubDate>
    ...
  </item>
  ...
</channel>

I need to capture 10/2/2010.

Capturing the <item>, along with its <pubDate>, is no problem.

Regular expressions are not a good tool for languages such as XML, whose nesting needs a context-free grammar to parse. Use the XML DOM for this job instead.
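A minimal sketch of what the DOM approach could look like, assuming the XML has already been parsed into a document and that <channel> is the root element, as in the question's example (in a full RSS feed it would normally sit under <rss>). The helper name `channelPubDate` is hypothetical, not something from the original post.

```javascript
// Hypothetical helper: return the text of the <pubDate> that is a direct
// child of <channel>, ignoring any <pubDate> nested inside <item>.
function channelPubDate(doc) {
  // Assumption: <channel> is the document's root element.
  const channel = doc.documentElement;
  if (!channel || channel.nodeName !== "channel") return null;

  // Only direct children are inspected, so <item>/<pubDate> is never visited.
  for (const node of Array.from(channel.childNodes)) {
    // nodeType 1 is an element node; text and comment nodes are skipped.
    if (node.nodeType === 1 && node.nodeName === "pubDate") {
      return node.textContent.trim();
    }
  }
  return null;
}

// Browser-only usage: DOMParser is not available in plain Node.js.
if (typeof DOMParser !== "undefined") {
  const xml =
    "<channel><pubDate>10/2/2010</pubDate>" +
    "<item><pubDate>13/2/2029</pubDate></item></channel>";
  const doc = new DOMParser().parseFromString(xml, "application/xml");
  console.log(channelPubDate(doc));
}
```

Because only the direct children of <channel> are examined, the order of <pubDate> and <item> in the file does not matter.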

I don't know JavaScript, so I can't help you with the DOM. I agree 100% that it's a bad idea to try and parse XML with regex. There might be a quick, very dirty, and very brittle workaround, though:

If indentation is consistent throughout the file, and the children of <channel> are always at the same level of indentation, you could use that fact as a guide for the regex. In your example, /^ {2}<pubDate>([^<]*)<\/pubDate>/m (= two spaces after start-of-line) might just work.

Use this at your own risk. Here be dragons etc.
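For illustration, here is how the indentation-based regex behaves on the question's example. This is only a sketch: it assumes two-space indentation for the children of <channel> and four spaces inside <item>, and it stops working as soon as the file is re-indented or minified.

```javascript
// Sample input with the same shape as the example in the question.
const xml = [
  "<channel>",
  "  <pubDate>10/2/2010</pubDate>",
  "  <item>",
  "    <pubDate>13/2/2029</pubDate>",
  "  </item>",
  "</channel>",
].join("\n");

// With the m flag, ^ matches at the start of every line. " {2}" demands
// exactly two spaces before the tag, so the four-space <pubDate> inside
// <item> cannot match.
const channelPubDateRe = /^ {2}<pubDate>([^<]*)<\/pubDate>/m;

const match = xml.match(channelPubDateRe);
const channelDate = match ? match[1] : null;
console.log(channelDate);
```

Note that the closing tag in the pattern must be spelled `pubDate` with the same capitalisation as in the document, since the regex is case-sensitive without the `i` flag.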

Check out jQuery and see if this helps reading/parsing the XML: http://think2loud.com/reading-xml-with-jquery/