Retrieving a web page including embedded objects

Developer https://www.devze.com 2022-12-27 13:41 Source: web
I'd like to fetch a web page including images, Flash animations and other embedded objects. What's a straightforward way of achieving this?


See the article "Writing a Web Crawler in the Java Programming Language": http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/


Use an open source HTML parser such as HtmlCleaner - http://java-source.net/open-source/html-parsers/htmlcleaner or CyberNeko HTML (NekoHTML) - http://java-source.net/open-source/html-parsers/nekohtml.

Once the parser has built a DOM representation of the web page, you can query the DOM for elements that reference embedded objects, extract their src attributes, and download each referenced image or other resource.
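As a rough illustration of the "extract the src attributes" step, here is a minimal sketch that uses only the JDK. It swaps the DOM query for a regular expression so that it runs without any third-party jar; with HtmlCleaner or NekoHTML you would walk the parsed tree instead. The class name EmbeddedObjects, the method extractSrcUrls, the list of tags, and the example.com URLs are all made up for this example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library mentioned above.
class EmbeddedObjects {

    // Finds a quoted src attribute inside img, embed, script or iframe tags.
    // A regex is a stand-in here; a real HTML parser copes with malformed
    // markup and unquoted attributes far better.
    private static final Pattern SRC = Pattern.compile(
            "<(?:img|embed|script|iframe)\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every matched src value, resolved against the page's own URL
    // so that relative references become absolute.
    static List<URI> extractSrcUrls(String html, URI base) {
        List<URI> urls = new ArrayList<>();
        Matcher m = SRC.matcher(html);
        while (m.find()) {
            // URI.resolve throws IllegalArgumentException for values that
            // are not valid URIs (for example, ones containing spaces).
            urls.add(base.resolve(m.group(2).trim()));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<html><body><img src=\"logo.png\">"
                + "<embed src='/media/intro.swf'></body></html>";
        URI base = URI.create("http://example.com/docs/page.html");
        for (URI u : extractSrcUrls(html, base)) {
            System.out.println(u);
        }
    }
}
```

Each returned URI can then be downloaded, for instance by opening a stream on it and copying the bytes to a file. Note that Flash content referenced through an object tag's data attribute or a param element is not covered by this sketch, since it only looks at src.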


Try Web-Harvest.
