Python scraping: getting URLs dynamically

Developer https://www.devze.com 2023-02-28 05:58 Source: Internet

Related topics: python web-crawler web-scraping

I am new to the world of data scraping; I previously used Python for web and desktop app development. I am wondering whether there is a way to get the URLs from a page and then look into each of them for specific information such as phone numbers, addresses, etc.

Currently I am using BeautifulSoup and have built a method that takes the URLs as a parameter.

The site I am scraping is large, and it is really tough to pass the specific URL for each page.

Any suggestions to make it faster and self-driven?

Thanks in advance.

You can use Scrapy. It simplifies both crawling and parsing (it uses libxml2 for parsing by default).

Use a more efficient HTML parser, like lxml. See here for performance comparisons of various Python parsers.
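As a rough sketch of what link discovery looks like with lxml directly, the helper below collects the absolute HTTP(S) links from one page. The function name `extract_links` and the sample URLs are hypothetical, not part of lxml.

```python
from lxml import html


def extract_links(page_source, base_url):
    """Return the sorted, de-duplicated absolute http(s) links in a page."""
    doc = html.fromstring(page_source)
    # Rewrite relative hrefs such as "/about" against the page's own URL.
    doc.make_links_absolute(base_url)
    return sorted(
        {
            href
            for href in doc.xpath("//a/@href")
            # Skip mailto:, javascript: and similar non-crawlable links.
            if href.startswith(("http://", "https://"))
        }
    )
```

If you would rather keep your existing BeautifulSoup code, you can also pass `"lxml"` as the parser argument, as in `BeautifulSoup(markup, "lxml")`, so that lxml does the parsing underneath.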