So I have an HTML page that has a form, and a table inside the form that has rows of products.
I got to the point now where I am looping through the table rows, and in each loop I grab all the table cells.
for tr in t.findAll('tr'):
    td = tr.findAll('td')
Now I want to grab the image src url from the first td.
The HTML looks like:
<tr>
<td ...>
<a href ... >
<img ... src="asdf/asdf.jpg" .. >
</a>
</td>
...
</tr>
How would I go about doing this? I keep thinking in terms of regex.
I tried:
td[0].a.image.src
but that didn't work as it says no attribute 'src'.
Use
td[0].a.img['src']
I imagine your use of image for img in the question was just a transcription error. The important point is that, in BeautifulSoup, you access a tag's HTML attributes with indexing notation (like the ['src'] in the snippet above), not dot syntax. Dot syntax instead moves down the tree to a child tag, as the two dots above do, one just before a and one just before img.
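For completeness, here is a minimal sketch of the whole loop, assuming the bs4 package (BeautifulSoup 4), where find_all is the current spelling of findAll. The sample markup, the html variable, and the first_cell_image_src helper are made up for illustration and are not from the question. It uses find and .get so that a row without an image yields None instead of raising.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup shaped like the table in the question.
html = """
<form>
<table>
  <tr>
    <td><a href="/p/1"><img alt="one" src="asdf/asdf.jpg"></a></td>
    <td>Product one</td>
  </tr>
  <tr>
    <td>No image in this row</td>
    <td>Product two</td>
  </tr>
</table>
</form>
"""

soup = BeautifulSoup(html, "html.parser")


def first_cell_image_src(tr):
    """Return the src of the image in the row's first cell, or None."""
    cells = tr.find_all("td")
    if not cells:
        return None  # e.g. a header row that only has <th> cells
    img = cells[0].find("img")  # searches all descendants of the cell
    if img is None:
        return None  # the first cell holds no image
    return img.get("src")  # .get returns None if src is missing


srcs = [first_cell_image_src(tr) for tr in soup.find_all("tr")]
print(srcs)  # ['asdf/asdf.jpg', None]
```

If every row is guaranteed to have the a and img tags, the shorter td[0].a.img['src'] form does the same job. The defensive version is only worth it when some rows may lack an image.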