Rendering an HTML page and saving it from the command line

Developer https://www.devze.com 2023-03-28 22:11 Source: the web

I would like to load a web page and save it from the command line. I want behavior similar to "Save Page As" for a complete web page in Firefox or Chrome.

I tried wget and HTTrack, and they give me the HTML files correctly. But when the HTML is malformed, the browser corrects it while rendering, so "Save As" in the browser produces the corrected HTML. This does not happen with wget or HTTrack.

Is there any tool that would render the page and save it locally, along with all the images, Flash, and other resources?


I couldn't find anything else, so I finally ended up opening the page in Firefox, clicking the "Save As" button, and saving it. I wrote a script using Firefox and xdotool to automate the whole task.

Thanks for all the help and views, friends.


When I want to save pages for offline use, I use a Firefox extension called "Scrapbook". That, of course, does not meet your command-line requirement. But with a tool like HtmlUnit or something similar, you can programmatically drive a browser to load the page you want to save.


I felt the need for something similar today (and went the xdotool path). You can find my version (a reusable bash script) at: https://github.com/abiyani/automate-save-page-as
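
For readers who want a feel for the xdotool approach, here is a minimal sketch. It is not the linked script. The function name `save_page_as`, the fixed sleep durations, the `firefox` window class, and the assumption that the save dialog accepts a typed absolute path are all illustrative guesses that may need adjusting for a given desktop.

```shell
#!/bin/sh
# save_page_as URL DEST: open URL in Firefox and drive "Save Page As" via xdotool.
# Assumptions (not from the original posts): an X11 session, Firefox and xdotool
# installed, and sleeps long enough for the page to load and the dialog to open.
save_page_as() {
    url=$1
    dest=$2
    firefox --new-window "$url" &
    sleep 10                                # crude wait for the page to load
    # Take the first visible window whose class matches "firefox".
    win=$(xdotool search --sync --onlyvisible --class firefox | head -n 1)
    [ -n "$win" ] || return 1
    xdotool windowactivate --sync "$win"
    xdotool key --clearmodifiers ctrl+s     # open the Save As dialog
    sleep 2                                 # crude wait for the dialog
    xdotool key --clearmodifiers ctrl+a     # select the suggested file name
    xdotool type --delay 20 "$dest"         # replace it with the destination
    xdotool key Return                      # confirm the dialog
}
```

A call such as `save_page_as http://example.com "$HOME/page.html"` would then save the page as rendered by the browser. Because the timing is based on sleeps rather than on real load events, a sketch like this is fragile on slow pages.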


You could use curl or wget in combination with HTML Tidy, e.g.

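    # -L follows redirects so the final page is saved, not the redirect response
    curl -L http://stackoverflow.com > page.html
    # tidy writes the repaired markup to stdout and its warnings to stderr
    tidy page.html > page_clean.html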

Tidy should be able to repair most invalid HTML markup. Run it with the -asxhtml option if you want XHTML output.
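
If this is meant to run unattended, it may help to wrap the two steps in a function and check exit codes. The sketch below is one way to do that. The name `fetch_clean` is hypothetical, and treating Tidy's exit status 1 (warnings only) as success is a design choice made here, since malformed input nearly always produces warnings.

```shell
#!/bin/sh
# fetch_clean URL OUTPUT: download URL and write a Tidy-repaired copy to OUTPUT.
# (Hypothetical helper name; assumes curl and tidy are on PATH.)
fetch_clean() {
    url=$1
    out=$2
    raw=$(mktemp) || return 1
    # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
    if ! curl -fsSL -o "$raw" "$url"; then
        rm -f "$raw"
        return 1
    fi
    # tidy exits 0 (clean), 1 (warnings) or 2 (errors).
    tidy -q -asxhtml -o "$out" "$raw" 2>/dev/null
    status=$?
    rm -f "$raw"
    # Accept 0 and 1; only hard errors count as failure.
    [ "$status" -le 1 ]
}
```

For example, `fetch_clean http://stackoverflow.com page_clean.html` leaves the repaired markup in `page_clean.html`. Note that this only repairs the HTML file itself and does not fetch images or other resources.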


There is some sophisticated software available that does exactly that: https://launchpad.net/shotfactory
