Extracting a value in BeautifulSoup

Developer https://www.devze.com 2022-12-27 00:30 Source: Internet
I have the following code:

f = open(path, 'r')
html = f.read() # no parameters => reads to eof and returns string

soup = BeautifulSoup(html)
schoolname = soup.findAll(attrs={'id':'ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel'})
print schoolname

which gives:

[<span id="ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel">A B Paterson College, Arundel, QLD</span>]

When I try to access the value (i.e. 'A B Paterson College, Arundel, QLD') by using schoolname['value'], I get the following error:

print schoolname['value']
TypeError: list indices must be integers, not str

What am I doing wrong to get that value?


You can use contents to move down the tree:

>>> for x in schoolname:
...     print x.contents
...
[u'A B Paterson College, Arundel, QLD']

Note that contents is a list, and its items are not necessarily strings: in general they can be tags, or a mixture of strings and tags.
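
To illustrate the mixed case, here is a minimal sketch. It assumes the newer bs4 package on Python 3 rather than the older BeautifulSoup API used in the question, and the markup and the id "label" are made up for the example.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup: the span holds a string, a nested tag, and another string.
html = '<span id="label">A B Paterson College, <b>Arundel</b>, QLD</span>'
soup = BeautifulSoup(html, "html.parser")
span = soup.find(id="label")

# .contents lists the direct children, strings and tags alike.
kinds = ["string" if isinstance(child, NavigableString) else "tag"
         for child in span.contents]

# get_text() joins every nested string into one plain string.
text = span.get_text()

print(kinds)
print(text)
```

When the children are mixed like this, get_text() is usually the more convenient way to recover the visible text than walking contents by hand.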


findAll returns a list of matching tags, not a single tag, which is why indexing it with a string raises the TypeError. Using find instead of findAll returns just the first matching tag. Note that schoolname['value'] on a tag looks up an HTML attribute named value, which this span does not have, so read the text with:

schoolname.string

This only works if you need a single match.
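
A rough sketch comparing the two lookups, assuming the newer bs4 package on Python 3 (where findAll is spelled find_all) and a one-line stand-in for the page from the question; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

label_id = "ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel"
# Stand-in for the real page: only the span shown in the question.
html = '<span id="%s">A B Paterson College, Arundel, QLD</span>' % label_id
soup = BeautifulSoup(html, "html.parser")

matches = soup.find_all(attrs={"id": label_id})  # list-like result set of tags
first = soup.find(attrs={"id": label_id})        # a single tag, or None if absent

# A list must be indexed with an integer before reading the tag's text.
name_from_list = matches[0].string
name_from_tag = first.string if first is not None else None

# Square brackets on a tag read HTML attributes, not the text inside it.
tag_id = first["id"]
has_value = first.has_attr("value")

print(name_from_tag)
```

Checking for None before using the result of find avoids an AttributeError when the id is missing from the page.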
