Extracting value in BeautifulSoup

Developer https://www.devze.com 2022-12-27 00:30 Source: Internet

I have the following code:

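from BeautifulSoup import BeautifulSoup # BeautifulSoup 3 assumed; with bs4 use: from bs4 import BeautifulSoup

f = open(path, 'r')
html = f.read() # no parameters => reads to EOF and returns a string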

soup = BeautifulSoup(html)
schoolname = soup.findAll(attrs={'id':'ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel'})
print schoolname

which gives:

[<span id="ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel">A B Paterson College, Arundel, QLD</span>]

When I try to access the value (i.e. 'A B Paterson College, Arundel, QLD') using schoolname['value'], I get the following error:

print schoolname['value']
TypeError: list indices must be integers, not str

What am I doing wrong to get that value?

You can use contents to move down the tree:

>>> for x in schoolname:
...     print x.contents
...
[u'A B Paterson College, Arundel, QLD']

Note that contents is a list, and its items don't necessarily have to be strings: in general they can also be tags, or a mixture of strings and tags.
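
To illustrate the mixed case, here is a minimal sketch. It assumes Python 3 with the bs4 package (not the older setup used in the question), and the markup and the id "label" are made up for the example:

```python
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package (bs4)

# Hypothetical markup: a span holding a string, a nested tag, and another string.
html = '<span id="label">A B Paterson College, <b>Arundel</b>, QLD</span>'
soup = BeautifulSoup(html, "html.parser")
span = soup.find(id="label")

# .contents lists the tag's direct children: strings and tags mixed together.
kinds = [type(child).__name__ for child in span.contents]
print(kinds)

# get_text() joins every string underneath the tag, nested ones included.
text = span.get_text()
print(text)
```

When a tag has more than one child like this, its .string attribute is None, so get_text() is usually the safer way to pull out the full text.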

findAll returns a list of matching tags rather than a single tag, which is why indexing it with a string raises an exception. I'm pretty sure your problem is solved simply by using find instead of findAll. Then you should be able to access the value you want with:
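
A minimal sketch of the difference between the two calls, assuming Python 3 with the bs4 package (where findAll is also available as find_all). The span from the question is inlined so the example runs on its own; the names label_id, matches and tag are only for illustration:

```python
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package (bs4)

# The span from the question, inlined so the sketch is self-contained.
label_id = "ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel"
html = '<span id="%s">A B Paterson College, Arundel, QLD</span>' % label_id
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like collection of Tag objects,
# so it has to be indexed with an integer before reading the text.
matches = soup.find_all(attrs={"id": label_id})
first_text = matches[0].string

# find returns the first matching Tag directly, or None when nothing matches.
tag = soup.find(attrs={"id": label_id})
schoolname = tag.string if tag is not None else None
print(schoolname)
```

Square brackets on a single tag, as in tag['id'], read an HTML attribute. The text of the span is a child node rather than an attribute, so it is read through .string or .contents instead of a 'value' key.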