
Python xml etree DTD from a StringIO source?

开发者 https://www.devze.com 2023-01-18 12:34 Source: web
I'm adapting the following code (created via advice in this question), which took an XML file and its DTD and converted them to a different format. For this problem, only the loading section matters:

from lxml import etree

xmldoc = open(filename)

parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
tree = etree.parse(xmldoc, parser)

This worked fine when reading from the file system, but I'm converting it to run under a web framework, where both files are uploaded through a form.

Loading the xml file works fine:

tree = etree.parse(StringIO(data['xml_file']))

But because the DTD is referenced at the top of the XML file, the following fails:

parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
tree = etree.parse(StringIO(data['xml_file']), parser)
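A StringIO carries no filename, so a relative SYSTEM identifier in the DOCTYPE (for example `a.dtd`) can't be resolved next to the uploaded XML the way it was when both files sat side by side on disk. If writing the upload to disk is acceptable, one possible workaround is to save the DTD into a temporary directory and pass `base_url` to `etree.parse()` so the relative reference resolves there. This is a rough sketch, assuming the DTD's file name matches the DOCTYPE's SYSTEM identifier; `parse_with_saved_dtd` and the placeholder `upload.xml` name are made up for illustration:

```python
import os
import tempfile
from io import StringIO

from lxml import etree

# Sample upload contents (hypothetical, mirroring the question's setup).
XML_TEXT = '''<?xml version="1.0"?>
<!DOCTYPE x SYSTEM "a.dtd">
<x><y>&eacute;zz</y></x>
'''
DTD_TEXT = '''<!ENTITY eacute "&#233;">
<!ELEMENT x (y)>
<!ELEMENT y (#PCDATA)>
'''


def parse_with_saved_dtd(xml_text, dtd_text, dtd_name):
    """Hypothetical helper: parse and validate in-memory XML against an
    in-memory DTD by writing the DTD to a temporary directory.

    Assumes dtd_name matches the SYSTEM identifier in the DOCTYPE.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        # Put the DTD where the relative SYSTEM id will point.
        with open(os.path.join(tmpdir, dtd_name), "w", encoding="utf-8") as f:
            f.write(dtd_text)
        parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
        # base_url supplies the "filename" the StringIO lacks, so the
        # relative DTD reference is resolved inside tmpdir. The XML file
        # itself is never written; "upload.xml" is only a placeholder.
        fake_path = os.path.join(tmpdir, "upload.xml")
        return etree.parse(StringIO(xml_text), parser, base_url=fake_path)


tree = parse_with_saved_dtd(XML_TEXT, DTD_TEXT, "a.dtd")
print(tree.getroot()[0].text)
```

The parsed tree lives in memory, so it remains usable after the temporary directory is cleaned up. The trade-off is a disk round trip per request; the resolver approach in the answers avoids that.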

Following this question, I tried:

etree.DTD(StringIO(data['dtd_file']))
tree = etree.parse(StringIO(data['xml_file']))

The first line doesn't raise an error, but the second fails on the character entities the DTD is meant to define (and does define in the file-system version):

XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46

How do I go about correctly loading this DTD?


Here's a short but complete example, using the custom resolver technique @Steven mentioned.

# Python 3 version; the original used Python 2's "from StringIO import StringIO".
from io import StringIO
from lxml import etree

data = dict(
    xml_file='''<?xml version="1.0"?>
<!DOCTYPE x SYSTEM "a.dtd">
<x><y>&eacute;zz</y></x>
''',
    dtd_file='''<!ENTITY eacute "&#233;">
<!ELEMENT x (y)>
<!ELEMENT y (#PCDATA)>
''')

class DTDResolver(etree.Resolver):
    def resolve(self, url, pubid, context):
        # Whatever system URL the DOCTYPE names ("a.dtd" here),
        # hand back the uploaded DTD text instead of reading a file.
        return self.resolve_string(data['dtd_file'], context)

xmldoc = StringIO(data['xml_file'])
parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
parser.resolvers.add(DTDResolver())
try:
    tree = etree.parse(xmldoc, parser)
except etree.XMLSyntaxError as e:
    # handle XML and validation errors
    print(e)
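The resolver above returns the same DTD for every URL it is asked about. If a form can upload several DTDs, a variation is to key them by file name and return `None` for anything unknown, which lets lxml fall back to its normal handling. This is a sketch under that assumption; `UploadedDTDResolver`, `parse_upload` and the name-based lookup are illustrative choices, not something lxml provides:

```python
import posixpath
from io import StringIO

from lxml import etree


class UploadedDTDResolver(etree.Resolver):
    """Hypothetical resolver serving DTDs from a {file name: DTD text} dict."""

    def __init__(self, dtds):
        super().__init__()
        self.dtds = dtds

    def resolve(self, url, pubid, context):
        # Match on the last path component so "a.dtd" and
        # "file:///some/dir/a.dtd" both hit the same entry.
        name = posixpath.basename(url) if url else None
        text = self.dtds.get(name)
        if text is None:
            return None  # not one of ours: defer to lxml's default loading
        return self.resolve_string(text, context)


def parse_upload(xml_text, dtds):
    parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
    parser.resolvers.add(UploadedDTDResolver(dtds))
    return etree.parse(StringIO(xml_text), parser)


xml_text = '''<?xml version="1.0"?>
<!DOCTYPE x SYSTEM "a.dtd">
<x><y>&eacute;zz</y></x>
'''
dtd_text = '''<!ENTITY eacute "&#233;">
<!ELEMENT x (y)>
<!ELEMENT y (#PCDATA)>
'''

tree = parse_upload(xml_text, {"a.dtd": dtd_text})
print(tree.getroot()[0].text)
```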


You could probably use a custom resolver. The lxml docs actually give an example of using one to provide a DTD.
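The `etree.DTD(...)` object built in the question is standalone; `etree.parse()` never consults it, so it neither defines entities nor validates anything by itself. One way to combine both ideas, sketched here as an assumption rather than the documented recipe, is to let a resolver feed the DTD to the parser (so `&eacute;` expands) without `dtd_validation`, then validate explicitly with `DTD.validate()`. `StringDTDResolver` and `parse_then_validate` are hypothetical names:

```python
from io import StringIO

from lxml import etree


class StringDTDResolver(etree.Resolver):
    """Hypothetical resolver that answers every DTD request with one string."""

    def __init__(self, dtd_text):
        super().__init__()
        self.dtd_text = dtd_text

    def resolve(self, url, pubid, context):
        return self.resolve_string(self.dtd_text, context)


def parse_then_validate(xml_text, dtd_text):
    # load_dtd=True so entities declared in the DTD are expanded,
    # but no dtd_validation: an invalid document still parses.
    parser = etree.XMLParser(load_dtd=True)
    parser.resolvers.add(StringDTDResolver(dtd_text))
    tree = etree.parse(StringIO(xml_text), parser)
    # Validate explicitly with a DTD object, as the question attempted.
    dtd = etree.DTD(StringIO(dtd_text))
    is_valid = dtd.validate(tree)
    return tree, is_valid, dtd.error_log


DTD_TEXT = '''<!ENTITY eacute "&#233;">
<!ELEMENT x (y)>
<!ELEMENT y (#PCDATA)>
'''
GOOD = '''<?xml version="1.0"?>
<!DOCTYPE x SYSTEM "a.dtd">
<x><y>&eacute;zz</y></x>
'''
# Two <y> children violate <!ELEMENT x (y)>.
BAD = '''<?xml version="1.0"?>
<!DOCTYPE x SYSTEM "a.dtd">
<x><y>&eacute;</y><y>b</y></x>
'''

tree, ok, _ = parse_then_validate(GOOD, DTD_TEXT)
bad_tree, bad_ok, bad_log = parse_then_validate(BAD, DTD_TEXT)
print(ok, bad_ok)
```

Separating the two steps means a web form can report validation problems from `error_log` instead of getting only an exception from `parse()`.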

