How can HtmlAgilityPack extract text from an HTML node whose class attribute is appended dynamically?

开发者 (Developer) https://www.devze.com 2023-03-08 18:56 Source: the web
Dear friends, I want to extract the text 平均3.6 星 from this code segment, excerpted from amazon.cn.

<div class="content"><ul>
<li><b>用户评分:</b>
<span class="crAvgStars" style="white-space:no-wrap;">
<span class="asinReviewsSummary" ref="dp_db_cm_cr_acr_pop_" name="B004GUSIKO">
<a>
  <span class="swSprite s_star_3_5 " title="平均3.6 星">
  <span>平均3.6 星</span>
  </span>
</a>

My problem is that the span's class value "s_star_3_5 " varies with the customer rating level and is appended dynamically. I tried doc.DocumentNode.SelectSingleNode("//span[@class='swSprite']").InnerText and also //span[@class='swSprite s_star_3_5 '], but the result is either an error or not what I want.

Any suggestions?


First of all, I suggest saving the value of doc.DocumentNode.OuterHtml to a local .html file and checking whether the markup you are receiving is really the markup you expect. When you start parsing a website with HtmlAgilityPack, the very first problem is often that you are not getting valid HTML at all: you may be receiving a 404 error page, a redirect, and so on.
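A minimal sketch of that check, assuming the HtmlAgilityPack NuGet package is referenced. The HtmlDump class and its Save method are hypothetical names introduced here for illustration; they are not part of the library.

```csharp
using System.IO;
using System.Text;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper, not part of HtmlAgilityPack.
public static class HtmlDump
{
    // Writes the markup that HtmlAgilityPack actually parsed to a file,
    // so it can be opened in a browser or editor and compared with the
    // markup the XPath query was written against.
    public static void Save(HtmlDocument doc, string path)
    {
        File.WriteAllText(path, doc.DocumentNode.OuterHtml, Encoding.UTF8);
    }
}
```

Calling HtmlDump.Save(doc, "dump.html") right after loading the page and opening the file should show quickly whether the response is the product page or an error or redirect page.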

I'm suggesting this because I tested //span[@class='swSprite s_star_3_5 '] and it worked correctly.

That was the issue in the following questions:

  • Selecting nodes that have an attribute with spaces using HTMLAgilityPack
  • XPath Query Problem using HTML Agility Pack

If that doesn't help, post the HTML code and I'll help you ;)


This works for me:

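// Requires: using System; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
// Load expects a file path, stream, or reader; if myHtml holds the markup itself, call doc.LoadHtml(myHtml) instead.
doc.Load(myHtml);
// starts-with matches any class value that begins with "swSprite", whatever suffix is appended.
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[starts-with(@class, 'swSprite')]");
Console.WriteLine("Text=" + node.InnerText.Trim());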

and outputs

Text=平均3.6 星

Note that I use the XPath starts-with function, so the query matches whatever rating class is appended after swSprite.
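The starts-with test only matches when swSprite is the first thing in the class value. If the class order might change, a common XPath 1.0 idiom is to match swSprite as a whole token anywhere in the list. The sketch below assumes the HtmlAgilityPack NuGet package; RatingExtractor and ExtractRating are hypothetical names used for illustration. It also reads the title attribute, which carries the same text in the excerpt from the question.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper, not part of HtmlAgilityPack.
public static class RatingExtractor
{
    // Matches any <span> whose class list contains the token "swSprite",
    // regardless of which other classes (e.g. s_star_3_5) are appended
    // or where in the list the token appears.
    private const string SpriteXPath =
        "//span[contains(concat(' ', normalize-space(@class), ' '), ' swSprite ')]";

    // Returns the rating text, or null when no matching span exists.
    public static string ExtractRating(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml parses a markup string

        HtmlNode node = doc.DocumentNode.SelectSingleNode(SpriteXPath);
        if (node == null)
        {
            return null; // SelectSingleNode returns null when nothing matches
        }

        // Prefer the title attribute; fall back to the trimmed inner text.
        string title = node.GetAttributeValue("title", string.Empty);
        return title.Length > 0 ? title : node.InnerText.Trim();
    }
}
```

Checking for null before touching InnerText avoids the NullReferenceException that a non-matching query would otherwise cause, which may be the error described in the question.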
