
How to search in non-strict HTML with Java?

Developer https://www.devze.com 2023-04-06 16:55 Source: Internet

I have a service that connects to a remote site and searches for some elements in the HTML. The incoming data is about 100-200 KB, but parsing it with plain string operations is very slow. I'd like some suggestions for a fast framework. Anyone?


1) If you can afford about 1 MB of memory to parse the HTML into a DOM tree, you can use a tolerant HTML parser (NekoHTML, for example).

2) Otherwise, extract the data using regular expressions. This will be faster and use less memory, but you'll have to come up with good expressions, and you won't be able to extract more sophisticated structural information.
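
To make option 2 concrete, here is a minimal sketch that uses only java.util.regex from the standard library. The class name LinkExtractor, the method extractLinks, and the choice of pulling href values out of anchor tags are illustrative assumptions, since the question doesn't say which elements are being searched for. The pattern is compiled once and reused, which matters when the service handles many pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
class LinkExtractor {

    // Compiled once and reused across calls. Matches an <a ...> tag and
    // captures the quoted href value. Case-insensitive so that <A HREF=...>
    // in sloppy markup is also found. Unquoted attribute values are NOT handled.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every quoted href value found in anchor tags, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2)); // group 1 is the quote character, group 2 the value
        }
        return links;
    }
}
```

Calling LinkExtractor.extractLinks(page) on the downloaded page returns the href values as a list. This never builds a tree, so memory use stays close to the size of the input string, but as noted above it can't tell you where in the document structure a match sits.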


You can give TagSoup a try.
