c# html agility pack

Developer https://www.devze.com 2023-03-08 10:57 Source: Internet

We are moving an e-commerce website to a new platform and because all of their pages are static html and they do not have all their product information in a database, we must scrape their current webs

Here is one of the pages: http://www.cabinplace.com/accrugsbathblackbear.htm

What is the best way to get the description into a string? Should I use the HTML Agility Pack, and if so, how would this be done? I am new to the HTML Agility Pack and XHTML in general.

Thanks

The HTML Agility Pack is a good library to use for this kind of work.

You did not say whether all of the content is structured this way, or whether you have already extracted the kind of fragment you posted from the HTML files, so it is difficult to advise further.

In general, if all pages are structured similarly, I would use an XPath expression to select the paragraph and read its InnerHtml or InnerText on each page.

Something like the following:

var description = htmlDoc.DocumentNode.SelectNodes("//p[@class='content_txt']")[0].InnerText;
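A slightly fuller sketch of the same idea, in case it helps. It assumes the HtmlAgilityPack NuGet package is referenced and that the description really does sit in a `<p class="content_txt">` element on every page, which has not been checked against the site. The `DescriptionScraper` class and `ExtractDescription` method are made-up names for illustration, not part of the library.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper; the class and method names are illustrative only.
public static class DescriptionScraper
{
    // Returns the text of the first <p class="content_txt"> element,
    // or null when the page contains no such paragraph.
    public static string ExtractDescription(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//" searches the whole document; SelectSingleNode returns null on no match.
        // The class name "content_txt" is an assumption about the page markup.
        var node = doc.DocumentNode.SelectSingleNode("//p[@class='content_txt']");
        if (node == null)
        {
            return null;
        }

        // InnerText leaves entities such as &amp; encoded, so decode and trim.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

To work from a live page rather than a string, `new HtmlWeb().Load(url)` returns an `HtmlDocument` that can be queried the same way. Checking for null is worth the extra lines here, because indexing `[0]` on a failed `SelectNodes` call throws when a page does not follow the expected layout.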

Also,

If you need a good tool for testing or finding the XPath for the HAP, you can use this one: HTML-Agility-xpath-finder. It is built with the same library, so an XPath that works in this tool can safely be used in your code.
