
Extract all URLs from an entire website

Developer https://www.devze.com 2023-02-22 15:53 Source: Internet

I want to crawl a website using C# or VB.NET. I'd like the crawler to extract the URLs from each webpage and also follow those URLs, so that I am able to extract all the URLs from the website.

How can I write this?


What is a website in this case?

A local virtual directory? A static web page? Dynamic pages hosted somewhere?

Look at

wget --mirror

curl may have options for this, too.
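
If a ready-made tool does not fit, the crawl itself is essentially a breadth-first walk over links: fetch a page, pull out the href values, resolve them against the page URL, and queue the ones on the same host that have not been seen yet. Below is a minimal sketch of that idea. It is in Python rather than C# or VB.NET, purely as an illustration; the names LinkParser, extract_links, fetch_html and crawl, and the max_pages limit, are my own assumptions and not from any particular library. The same structure should port to .NET with an HTTP client and an HTML parser of your choice.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs, with fragments removed, for all links in html."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def fetch_html(url):
    """Download a page; return None on error or non-HTML content."""
    try:
        with urlopen(url, timeout=10) as response:
            if "html" not in response.headers.get_content_type():
                return None
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def crawl(start_url, fetch=fetch_html, max_pages=100):
    """Breadth-first crawl limited to the host of start_url.

    Returns the set of same-host URLs discovered. max_pages is an
    assumed safety limit on the number of fetches.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        for link in extract_links(html, url):
            parsed = urlparse(link)
            same_site = parsed.scheme in ("http", "https") and parsed.netloc == host
            if same_site and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen
```

Calling crawl("https://example.com/") would return the set of URLs found on that host. The fetch parameter is there so the traversal can be tried against a dictionary of canned pages before pointing it at a real site. This sketch only follows plain anchor links in static HTML; pages that build their links with JavaScript would need a different approach.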

Also, please read up about robots.txt before you start scraping the net :)
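
For the robots.txt part, here is a small sketch of how a crawler might check a URL against a site's rules before fetching it. It uses Python's standard urllib.robotparser module; the helper names build_robot_rules and is_allowed and the "MyCrawler" user agent string are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser


def build_robot_rules(robots_txt):
    """Parse the text of a robots.txt file into a rule set."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def is_allowed(rules, url, user_agent="MyCrawler"):
    """Return True if the rules permit user_agent to fetch url."""
    return rules.can_fetch(user_agent, url)
```

In a live crawl you would more likely call set_url() with the site's robots.txt address and then read() on the RobotFileParser, and skip any URL for which can_fetch() returns False.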

