Why is BeautifulSoup modifying my self-closing elements?

Developer https://www.devze.com 2022-12-09 01:04 Source: Internet

This is the script I have:

import BeautifulSoup

if __name__ == "__main__":
    data = """
    <root>
        <obj id="3"/>
        <obj id="5"/>
        <obj id="3"/>
    </root>
    """
    soup = BeautifulSoup.BeautifulStoneSoup(data)
    print soup

When run, this prints:

<root>
  <obj id="3"></obj>
  <obj id="5"></obj>
  <obj id="3"></obj>
</root>

I'd like it to keep the same structure. How can I do that?


From the Beautiful Soup documentation:

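The most common shortcoming of BeautifulStoneSoup is that it doesn't know about self-closing tags. HTML has a fixed set of self-closing tags, but with XML it depends on what the DTD says. You can tell BeautifulStoneSoup that certain tags are self-closing by passing in their names as the selfClosingTags argument to the constructor.

For the script in the question, that means constructing the soup as BeautifulSoup.BeautifulStoneSoup(data, selfClosingTags=["obj"]), so that obj elements are treated as self-closing instead of being given a separate closing tag.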
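
The Beautiful Soup 3 module used in the question only runs on Python 2. For readers on Python 3, here is a minimal sketch of the same round trip using the standard library's xml.dom.minidom instead. This is a swapped-in technique, not the approach the quoted documentation describes, and the helper name roundtrip_xml is made up for illustration. It relies on minidom writing any element that has no children in the short form.

```python
from xml.dom import minidom


def roundtrip_xml(xml_text):
    """Parse an XML string and serialize its root element again.

    Hypothetical helper for illustration. minidom writes elements
    that have no child nodes in the self-closing form.
    """
    doc = minidom.parseString(xml_text.strip())
    # toxml() on the root element omits the XML declaration.
    return doc.documentElement.toxml()


DATA = """
<root>
    <obj id="3"/>
    <obj id="5"/>
    <obj id="3"/>
</root>
"""

if __name__ == "__main__":
    print(roundtrip_xml(DATA))
```

Because whitespace-only text nodes are kept as parsed, the indentation between the obj elements should come back unchanged as well.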
