Python lxml iterfind w/ namespace but prefix=None

Developer https://www.devze.com 2023-03-14 21:17 Source: Internet
I want to use iterfind() on elements that have a namespace but no prefix. I'd like to call

iterfind([tagname]) or iterfind([tagname], [namespace dict])

I'd rather not have to write the tag like this every time:

"{%s}tagname" % tree.nsmap[None]
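One way to avoid retyping the brace syntax is to build the qualified name once in a helper. This is a minimal sketch using the standard library's `xml.etree.ElementTree.QName`; lxml also provides an `etree.QName` class with a `.text` attribute. The helper name `atom_tag` is hypothetical:

```python
from xml.etree import ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def atom_tag(local_name):
    """Hypothetical helper: return '{ATOM_NS}local_name' (Clark notation)."""
    # QName(uri, tag) joins the namespace URI and the local name as "{uri}tag".
    return ET.QName(ATOM_NS, local_name).text

print(atom_tag("entry"))  # {http://www.w3.org/2005/Atom}entry
```

The result can be passed straight to `iterfind`, e.g. `root.iterfind(atom_tag("entry"))`.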

Details

I'm iterating over an XML response from a Google API. The root node declares several namespaces, including one with no prefix: xmlns="http://www.w3.org/2005/Atom"

When I search my etree, everything behaves as I'd expect for elements with a prefix, e.g.:

>>> for x in root.iterfind('dxp:segment'): print x
...
<Element {http://schemas.google.com/analytics/2009}segment at 0x1211b98>
<Element {http://schemas.google.com/analytics/2009}segment at 0x1211d78>
<Element {http://schemas.google.com/analytics/2009}segment at 0x1211a08>
>>>

But when I search for something without a prefix, the search doesn't automatically apply the namespace in root.nsmap[None], e.g.:

>>> for x in root.iterfind('entry'): print x
...
>>>

Even if I pass the namespace map as the optional argument to iterfind, it won't attach the namespace.
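The behaviour is easy to reproduce. The sketch below uses the standard library's `xml.etree.ElementTree`, whose `find`/`iterfind` path syntax (ElementPath) lxml mirrors, and a small hypothetical feed standing in for the real API response:

```python
from xml.etree import ElementTree as ET

# Hypothetical, trimmed-down feed: a default (unprefixed) Atom namespace
# plus a prefixed one, similar in shape to the response described above.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
  <dxp:segment id="1"/>
</feed>"""

root = ET.fromstring(FEED)

# A bare name in the path means "no namespace", so nothing matches.
plain = list(root.iterfind("entry"))

# The fully qualified (Clark-notation) name does match.
qualified = list(root.iterfind("{http://www.w3.org/2005/Atom}entry"))

print(len(plain), len(qualified))  # 0 2
```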


Try this:

for x in root.iterfind('{http://www.w3.org/2005/Atom}entry'):
    print(x)

For more information, read the docs: http://lxml.de/tutorial.html#namespaces

If you don't want to type that out and would rather provide a namespace map, you always have to use a prefix, for example:

nsmap = {'atom': 'http://www.w3.org/2005/Atom'}
for x in root.iterfind('atom:entry', namespaces=nsmap):
    print(x)

(The same goes for XPath.)
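If you'd rather not hard-code the URIs, one option is to start from the document's own map. lxml reports the default namespace in `root.nsmap` under the key `None`, and XPath 1.0 has no notion of a default namespace, so that entry needs a real prefix. A minimal sketch, assuming a plain dict shaped like `root.nsmap`; the function name `with_default_prefix` and the chosen prefix `atom` are hypothetical:

```python
def with_default_prefix(nsmap, prefix="atom"):
    """Hypothetical helper: copy an lxml-style nsmap, re-keying the None
    (default namespace) entry under `prefix`."""
    if prefix in nsmap:
        # Refuse to silently overwrite a prefix the document already uses.
        raise ValueError("prefix %r is already in the namespace map" % prefix)
    return {(prefix if key is None else key): uri for key, uri in nsmap.items()}

# Assumed values, shaped like root.nsmap for the feed in the question.
doc_nsmap = {
    None: "http://www.w3.org/2005/Atom",
    "dxp": "http://schemas.google.com/analytics/2009",
}
ns = with_default_prefix(doc_nsmap)
print(ns)
```

With lxml, the result could then be passed as `root.iterfind('atom:entry', namespaces=ns)` or to `root.xpath('//atom:entry', namespaces=ns)`.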

Which prefix the document uses, if any, doesn't matter. What matters is that you specify the fully qualified name of the element, either by writing it out with the URI in curly-brace notation, or by using a prefix that is mapped to that URI.
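To illustrate that the caller's prefixes are independent of the document's, here is a small sketch with the standard library's `xml.etree.ElementTree` (same path syntax as lxml's `find*` methods) and a hypothetical feed. The prefixes `a` and `ga` are arbitrary choices that appear nowhere in the document:

```python
from xml.etree import ElementTree as ET

FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <entry/><entry/>
  <dxp:segment id="1"/>
</feed>"""
root = ET.fromstring(FEED)

ATOM = "http://www.w3.org/2005/Atom"
GA = "http://schemas.google.com/analytics/2009"

# Prefixes are chosen by the caller; only the URIs they map to matter.
ns = {"a": ATOM, "ga": GA}

by_prefix = root.findall("a:entry", namespaces=ns)
by_clark = root.findall("{%s}entry" % ATOM)
segments = root.findall("ga:segment", namespaces=ns)  # written as dxp:segment in the XML

print(by_prefix == by_clark, len(segments))  # True 1
```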


I found that you can simply map an empty-string prefix to the default namespace (verified in Python 3.9):

nsmap = {'': 'http://www.w3.org/2005/Atom'}
for x in root.iterfind('entry', namespaces=nsmap):
    print(x)
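The answer doesn't say which library it was verified with. Python's standard-library `xml.etree.ElementTree` documents the `''` key from Python 3.8 onward: every unprefixed name in the path is then resolved in that namespace, including nested steps. A minimal check against a small hypothetical feed (if you rely on lxml, test your installed version rather than assuming identical behaviour):

```python
from xml.etree import ElementTree as ET

# Requires Python 3.8+ for the '' (empty-string) prefix in `namespaces`.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
</feed>"""
root = ET.fromstring(FEED)

# Bare names such as "entry" and "title" are looked up in the Atom namespace.
nsmap = {"": "http://www.w3.org/2005/Atom"}
titles = [t.text for t in root.iterfind("entry/title", namespaces=nsmap)]
print(titles)  # ['first', 'second']
```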