
Converting a Doc object into a string in Python

Developer https://www.devze.com 2022-12-25 00:08 Source: Internet

I'm using minidom to parse an XML document. I took the data from the yum tags, stored it in a list, and calculated the frequency of the words. However, it's not storing or reading them as strings in the list. Is there another way to do it? Right now this is what I have:

yumNodes = [node for node in doc.getElementsByTagName("yum")]

for node in yumNodes:
    yumlist.append(t.data for t in node.childNodes if t.nodeType == t.TEXT_NODE)

for ob in yumlist:
    for word in ob:
        if word not in freqDict:
            freqDict[word] = 1
        else:
            freqDict[word] += 1


Not directly related to your question, but a remark that could improve your code: the pattern

freqDict = {}
...
if word not in freqDict:
    freqDict[word] = 1
else:
    freqDict[word] += 1

is usually replaced with

import collections
freqDict = collections.defaultdict(int)
...
freqDict[word] += 1

or, before Python 2.5 (which introduced defaultdict),

freqDict = {}
...
freqDict[word] = freqDict.get(word, 0) + 1
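
As a rough sketch of the counting patterns above, the following compares them on a hypothetical `words` list that stands in for whatever text was pulled from the XML. The `collections.Counter` variant is an addition not mentioned in the answer; it is available from Python 2.7 onward.

```python
import collections

# Hypothetical sample data standing in for the words extracted from the XML.
words = ["spam", "eggs", "spam", "ham", "spam"]

# Pattern 1: defaultdict(int) creates missing keys with a value of 0.
freq_default = collections.defaultdict(int)
for word in words:
    freq_default[word] += 1

# Pattern 2: plain dict, using get() to supply 0 for unseen words.
freq_plain = {}
for word in words:
    freq_plain[word] = freq_plain.get(word, 0) + 1

# Pattern 3: Counter does the counting in a single call.
freq_counter = collections.Counter(words)
```

All three end up with the same counts, so the choice is mostly a matter of Python version and taste.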


Replace

yumlist.append(t.data for t in node.childNodes if t.nodeType == t.TEXT_NODE)

with the following:

yumlist.append(t.nodeValue for t in node.childNodes if t.nodeType == 3)
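
A note on why the list may not hold strings: passing a generator expression to `append` stores the generator object itself, not the text it would yield. For text nodes, `t.data` and `t.nodeValue` return the same value, and `TEXT_NODE` is the constant 3, so the two spellings behave alike. A minimal sketch of one way to get real strings, assuming a made-up sample document and the hypothetical helper names `yum_strings` and `word_frequencies`, joins the text nodes before appending:

```python
from xml.dom import minidom

# Hypothetical sample document; the real one would come from the asker's file.
XML = "<root><yum>apple pie apple</yum><yum>pie <b>x</b>cake</yum></root>"

def yum_strings(doc):
    """Return one plain string per <yum> element, built from its direct text nodes."""
    result = []
    for node in doc.getElementsByTagName("yum"):
        # "".join(...) consumes the generator, so a str is appended
        # rather than a generator object.
        text = "".join(
            t.data for t in node.childNodes if t.nodeType == t.TEXT_NODE
        )
        result.append(text)
    return result

def word_frequencies(strings):
    """Count whitespace-separated words across all the given strings."""
    freq = {}
    for s in strings:
        for word in s.split():
            freq[word] = freq.get(word, 0) + 1
    return freq

doc = minidom.parseString(XML)
yumlist = yum_strings(doc)
freqDict = word_frequencies(yumlist)
```

Only direct child text nodes are collected here, so text inside nested elements (the `<b>` in the sample) is skipped; whether that is the desired behavior depends on the actual document.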