I'm currently downloading a website via an ActionScript HTMLLoader so that I can later access the DOM and extract some information from the page.
The problem is that every resource linked on the page (images, stylesheets, JavaScript) is loaded as well, which takes additional time. I don't need those resources, because only the plain HTML/DOM is of interest.
Is there any way to disable loading of linked resources? At first I tried using a URLLoader and parsing the result as XML, but this doesn't work when the page isn't valid XML. I also couldn't find a library that validates/parses a given HTML string into valid XML.
I'm using Adobe AIR on desktop.
It's perhaps convoluted, but you could load the file with URLLoader, convert it to a string, use a regex to remove the references to external resources you don't want, and then load the result into the HTMLLoader with loadString().
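
A rough sketch of that approach might look like the following. The class name `PlainHtmlLoader`, the helper `stripResources`, the placeholder URL, and the regex patterns are all made up for illustration; the patterns are a best-effort filter, not a real HTML parser, and will likely miss some cases (for example `url(...)` references in inline styles, or iframes).

```actionscript
package {
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.html.HTMLLoader;
    import flash.net.URLLoader;
    import flash.net.URLRequest;

    // Hypothetical example class: fetches the raw markup, strips resource
    // references, then lets HTMLLoader build a DOM from the cleaned string.
    public class PlainHtmlLoader extends Sprite {
        private var urlLoader:URLLoader = new URLLoader();
        private var htmlLoader:HTMLLoader = new HTMLLoader();

        public function PlainHtmlLoader() {
            urlLoader.addEventListener(Event.COMPLETE, onSourceLoaded);
            htmlLoader.addEventListener(Event.COMPLETE, onDomReady);
            // Placeholder URL, replace with the page you actually need.
            urlLoader.load(new URLRequest("http://www.example.com/"));
        }

        // Illustrative helper: removes the markup that triggers extra downloads.
        private function stripResources(html:String):String {
            // Drop whole <script> and <style> blocks (s flag: "." matches newlines).
            html = html.replace(/<script\b.*?<\/script\s*>/gis, "");
            html = html.replace(/<style\b.*?<\/style\s*>/gis, "");
            // Drop <link> and <img> tags entirely.
            html = html.replace(/<(?:link|img)\b[^>]*>/gi, "");
            return html;
        }

        private function onSourceLoaded(event:Event):void {
            // URLLoader delivers text by default, so data can be treated as a String.
            htmlLoader.loadString(stripResources(String(urlLoader.data)));
        }

        private function onDomReady(event:Event):void {
            // The DOM is now available through htmlLoader.window.document.
            trace(htmlLoader.window.document.title);
        }
    }
}
```

Two caveats with this sketch: removing `<img>` tags also removes those elements from the DOM, so if you need their attributes you would have to rewrite the `src` attribute instead of deleting the tag; and because the scripts are stripped, any content the page normally generates with JavaScript will be absent.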