I am trying to extract some data from between divs.
<div class="movie_general"><div class="img"><a href="/Movies.html" title="Watch Movie">
For example, to get the link "/Movies.html" I used:
string hrefValue = doc.DocumentNode
.Descendants("div")
.Where(x => x.Attributes["class"].Value == "movie_general")
.Select(x => x.Element("a").Attributes["href"].Value)
.FirstOrDefault();
MessageBox.Show(hrefValue);
But I get a NullReferenceException at Where(x => x.Attributes["class"].Value == "movie_general")
What am I doing wrong?
This happens because the Where predicate is evaluated against every div in the document, not only the one you are looking for. The document must contain at least one div that has no class attribute. For that node Attributes["class"] returns null, so reading its Value property throws the NullReferenceException.
Replace this:
.Where(x => x.Attributes["class"].Value == "movie_general")
.Select(x => x.Element("a").Attributes["href"].Value)
with this:
.Where(x => x.Attributes["class"] != null && x.Attributes["class"].Value == "movie_general")
.Select(x => x.Element("a") != null && x.Element("a").Attributes["href"] != null ? x.Element("a").Attributes["href"].Value : string.Empty)
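One thing to watch with the replacement above: Element("a") only looks at direct children of the matched node, and in the markup from the question the a tag sits inside the nested div with class "img". The null checks therefore avoid the exception, but the result would likely be an empty string rather than the link. Here is a minimal sketch of an alternative, assuming HtmlAgilityPack; the inline HTML string (including the extra outer div with no class attribute) is made up for illustration. It uses Descendants("a") to search at any depth and GetAttributeValue, which returns the supplied default when the attribute is missing.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Illustrative markup only: the outer div has no class attribute,
        // which is the case that triggered the NullReferenceException.
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><div class=\"movie_general\"><div class=\"img\">" +
                     "<a href=\"/Movies.html\" title=\"Watch Movie\"></a></div></div></div>");

        string hrefValue = doc.DocumentNode
            .Descendants("div")
            // GetAttributeValue returns "" for divs without a class attribute.
            .Where(d => d.GetAttributeValue("class", "") == "movie_general")
            // Descendants finds the a tag even though it is nested one level deeper.
            .SelectMany(d => d.Descendants("a"))
            .Select(a => a.GetAttributeValue("href", ""))
            // Skip links without an href; yields null if nothing matches.
            .FirstOrDefault(h => h != "");

        Console.WriteLine(hrefValue ?? "(no link found)");
    }
}
```

Note that the comparison is an exact match on the class attribute, as in the question; a div with several classes (for example "movie_general featured") would not match.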
If you already know the class, and that the a tag is nested somewhere inside that div, why not just grab it directly with an XPath query:
HtmlDocument doc = new HtmlDocument();
doc.Load("C:\\temp\\stackhtml.html");
string link = doc.DocumentNode.SelectSingleNode("//div[@class='movie_general']//a").GetAttributeValue("href", "unknown");
Console.WriteLine(link);
Console.ReadLine();
This prints the href value of the link (/Movies.html for the sample below).
I added closing div tags to your example so that I could scrape it, and saved it to a file on my C drive:
<div class="movie_general">
<div class="img">
<a href="/Movies.html" title="Watch Movie">
</div>
</div>