
How can I index HTML documents?

Posted on https://www.devze.com, 2022-12-14 01:27. Source: web.
I am using Lucene.NET for full-text search. Until now I have been indexing PDF documents, but I now have a few web pages that I need to index as well. What's the best/easiest way to index HTML documents and add them to my Lucene index? I am using .NET/C#.


I am currently working on this problem. The best answer I have found to date is to use the HTML Agility Pack to extract the plain-text content from the HTML.
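For what it's worth, here is a rough sketch of how that could look. It assumes the HtmlAgilityPack NuGet package and the Lucene.Net 3.0.3 API (newer Lucene.Net versions construct the writer and fields differently). The class name `HtmlIndexer`, the method names, and the field names `path` and `content` are made up for illustration, so adapt them to whatever your PDF indexing code already uses.

```csharp
using System.IO;
using System.Linq;
using HtmlAgilityPack;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical helper, not part of either library.
public static class HtmlIndexer
{
    // Strip the markup and return the text content of an HTML page.
    public static string ExtractText(string html)
    {
        var page = new HtmlDocument();
        page.LoadHtml(html);

        // Drop script and style elements so their contents are not indexed.
        // SelectNodes returns null when nothing matches.
        var hidden = page.DocumentNode.SelectNodes("//script|//style");
        if (hidden != null)
        {
            foreach (var node in hidden.ToList())
                node.Remove();
        }

        // InnerText still contains entities such as &amp;, so decode them.
        return HtmlEntity.DeEntitize(page.DocumentNode.InnerText);
    }

    // Add one HTML file to the Lucene index stored at indexPath.
    public static void IndexFile(string htmlPath, string indexPath)
    {
        string text = ExtractText(File.ReadAllText(htmlPath));

        var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // This constructor creates the index if it does not exist yet
        // and appends to it otherwise.
        using (var writer = new IndexWriter(directory, analyzer,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            // Store the path so search hits can be traced back to the file.
            doc.Add(new Field("path", htmlPath,
                              Field.Store.YES, Field.Index.NOT_ANALYZED));
            // Analyze the extracted text for full-text search; do not store it.
            doc.Add(new Field("content", text,
                              Field.Store.NO, Field.Index.ANALYZED));
            writer.AddDocument(doc);
            writer.Commit();
        }
    }
}
```

Calling `HtmlIndexer.IndexFile(@"C:\pages\about.html", @"C:\index")` (both paths are placeholders) would add that page to the index. If you index many pages, it is probably better to open the writer once and reuse it rather than reopening it per file as this sketch does.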


Google can index your content for you.
