I am new to the world of data scraping; I previously used Python for web and desktop app development. I am wondering whether there is a way to collect the URLs from a page and then visit each one to look for specific information such as phone numbers, addresses, etc.
Currently I am using BeautifulSoup and have built a method that takes the URL as a parameter.
The site I am scraping is large, and it is really tough to pass the specific URL for each page.
Any suggestions to make it faster and self-driven?
Thanks in advance.
You can use Scrapy. It simplifies both crawling and parsing (it uses libxml2
for parsing by default).
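
To make the "self-driven" part concrete, here is a minimal sketch of the crawl loop that a framework like Scrapy automates, written with only the Python standard library. It is not Scrapy's API. The names `LinkParser`, `fetch`, `crawl`, and `PHONE_RE` are made up for illustration, and the phone pattern is only a rough assumption that will need adjusting for the site in question.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values from <a> tags and the text found on the page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


# Hypothetical pattern for illustration; real phone formats vary a lot.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def fetch(url):
    """Download a page and return its HTML as text (no error handling here)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl of a single site; returns {url: [phone numbers]}."""
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    results = {}
    while queue and len(results) < max_pages:
        url = queue.popleft()
        parser = LinkParser()
        parser.feed(fetch_page(url))
        results[url] = PHONE_RE.findall(" ".join(parser.text))
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return results
```

You would start it with something like `crawl("https://example.com/")`, where the URL is a placeholder for the site's start page. Only the start URL is passed in; every other page is discovered from the links on pages already visited, and `max_pages` keeps the crawl bounded.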
Use a more efficient HTML parser, like lxml. See here for performance comparisons of various Python parsers.