
HtmlAgility problem

Developer https://www.devze.com 2023-03-26 18:56 Source: web

I am trying to extract some data from between divs.

<div class="movie_general"><div class="img"><a href="/Movies.html" title="Watch Movie">

For example, if I want the link "/Movies.html", I use:

string hrefValue = doc.DocumentNode
    .Descendants("div")
    .Where(x => x.Attributes["class"].Value == "movie_general")
    .Select(x => x.Element("a").Attributes["href"].Value)
    .FirstOrDefault();

MessageBox.Show(hrefValue);

But I get a NullReferenceException at Where(x => x.Attributes["class"].Value == "movie_general").

What am I doing wrong?


It happens because Where runs your predicate against every div that Descendants("div") returns, not just the one you are looking for. Your document must contain at least one div without a class attribute. For such a div, x.Attributes["class"] returns null, so reading its Value property throws the NullReferenceException.

Replace this

.Where(x => x.Attributes["class"].Value == "movie_general")
.Select(x => x.Element("a").Attributes["href"].Value)

with this

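.Where(x => x.Attributes["class"] != null && x.Attributes["class"].Value == "movie_general")
.Select(x => x.Descendants("a").FirstOrDefault())
.Select(a => a != null && a.Attributes["href"] != null ? a.Attributes["href"].Value : string.Empty)

Note that Element("a") only returns a direct child of the div. In your markup the a tag sits inside the inner <div class="img">, so Descendants("a") is needed to reach it.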
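The same fix can also be sketched with HtmlAgilityPack's GetAttributeValue, which returns a default value instead of null when an attribute is missing, so no explicit null checks are needed. This is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name MovieLinkFinder and the method FindHref are hypothetical, and the class attribute is assumed to equal "movie_general" exactly (a div with class="movie_general other" would not match).

```csharp
// Assumes the HtmlAgilityPack NuGet package is referenced.
// MovieLinkFinder and FindHref are hypothetical names used for illustration.
using System.Linq;
using HtmlAgilityPack;

public static class MovieLinkFinder
{
    // Returns the href of the first <a> nested anywhere inside a
    // <div class="movie_general">, or null when no such link exists.
    public static string FindHref(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("div")
            // GetAttributeValue returns the default ("") for a div without a
            // class attribute, so this predicate never dereferences null.
            // Assumption: the class attribute must equal "movie_general" exactly.
            .Where(div => div.GetAttributeValue("class", "") == "movie_general")
            // Descendants (unlike Element) also finds links inside nested divs.
            .SelectMany(div => div.Descendants("a"))
            .Select(a => a.GetAttributeValue("href", ""))
            // Skip links without an href; FirstOrDefault yields null if none remain.
            .FirstOrDefault(href => href.Length > 0);
    }
}
```

For example, calling FindHref on the sample markup from the question (with its closing tags added) should return "/Movies.html", and on a page with no matching div it returns null rather than throwing.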


If you already know the class, and that the a tag is nested inside that div, why not grab it directly with XPath:

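using System;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("C:\\temp\\stackhtml.html");
// SelectSingleNode returns null when nothing matches, so check before reading the attribute.
HtmlNode anchor = doc.DocumentNode.SelectSingleNode("//div[@class='movie_general']//a");
string link = anchor != null ? anchor.GetAttributeValue("href", "unknown") : "unknown";
Console.WriteLine(link);
Console.ReadLine();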

And the result: [screenshot not preserved]

I added closing div tags to your example so that I could scrape it, and saved it to a file on my C: drive:

<div class="movie_general">
   <div class="img">
      <a href="/Movies.html" title="Watch Movie">
    </div>
</div>
