HtmlAgilityPack skip or remove nested table

Developer https://www.devze.com 2023-03-11 03:24 Source: Internet
I’m using HtmlAgilityPack in order to retrieve the following html (notice the nested table):

<table class="123">
<tr>
    <table class="789">
    <tr>
        <td>abc</td>
    </tr>
    <tr>
        <td>def</td>
    </tr>
    </table>
</tr>

<tr>
    <td>info 1</td>
</tr>

<tr>
    <td>info 2</td>
</tr>

<tr>
    <td>info 3</td>
</tr>
</table>

Now, I’m trying to find a clever way to obtain some information from the parent table and some information from the nested table…

So far I have the following:

var parentTable = document.DocumentNode.SelectNodes("//table[@class='123']").FirstOrDefault();

var nestedTable = parentTable.SelectNodes("//table[@class='789']").FirstOrDefault();

I can now play around with the nestedTable and get what I want (abc, def)...

But when I try to get the <tr>’s from the parent table like so:

var parentTableRows = parentTable.SelectNodes(".//tr");

It seems to include (in the collection) the <tr>’s from the nested table as well...

In other words, according to the above html code, I was expecting to have a collection of 4 <tr>’s but since it includes the <tr>’s from the nested table, I’m getting a collection of 6 <tr>’s.

How can I skip the first <tr> that happens to hold the nested table so I can play around and get the information I want (info 1, info 2, info 3)? (Hope I’m making sense…)

Thanks in advance!


// is an XPath abbreviation that means "search this node and all of its descendants". That's why .//tr returns every tr below parentTable, including the ones inside the nested table.

If you just do parentTable.SelectNodes("tr") (or "./tr", which is equivalent), you will select only the tr elements that are direct children of parentTable.
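
To make the difference concrete, here is a minimal sketch that runs both queries against a condensed copy of the markup from the question. The Program class and the html constant are scaffolding added for illustration. The question reports six rows for the descendant query; the child query should return only the rows sitting directly under the outer table.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Condensed copy of the markup from the question (illustrative scaffolding).
        const string html = @"
<table class=""123"">
<tr>
    <table class=""789"">
    <tr><td>abc</td></tr>
    <tr><td>def</td></tr>
    </table>
</tr>
<tr><td>info 1</td></tr>
<tr><td>info 2</td></tr>
<tr><td>info 3</td></tr>
</table>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        var parentTable = document.DocumentNode.SelectSingleNode("//table[@class='123']");

        // Descendant axis: every <tr> anywhere under parentTable, nested rows included.
        var allRows = parentTable.SelectNodes(".//tr");

        // Child axis: only the <tr> elements that are direct children of parentTable.
        var directRows = parentTable.SelectNodes("tr");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        Console.WriteLine($"descendant rows: {allRows?.Count ?? 0}");
        Console.WriteLine($"direct rows:     {directRows?.Count ?? 0}");
    }
}
```

Note that the child query relies on the rows being direct children of the table. If the real page wraps them in a tbody, the path would need to be "tbody/tr" instead.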

If you want to skip the first one, you can add an XPath predicate on the element's position() (an XPath function):

var parentTableRows = parentTable.SelectNodes("tr[position() > 1]");
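
The positional filter works as long as the nested table always sits in the first row. As a hedged alternative that is not part of the original answer, the predicate not(table) skips any direct row that has a table as a child, wherever that row appears. The sketch below shows both, again on a condensed copy of the question's markup. It also uses a leading "." when looking up the nested table, because a path that starts with // is evaluated from the document root even when it is called on a node.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var document = new HtmlDocument();
        // Condensed copy of the markup from the question (illustrative scaffolding).
        document.LoadHtml(
            "<table class='123'>" +
            "<tr><table class='789'><tr><td>abc</td></tr><tr><td>def</td></tr></table></tr>" +
            "<tr><td>info 1</td></tr><tr><td>info 2</td></tr><tr><td>info 3</td></tr>" +
            "</table>");

        var parentTable = document.DocumentNode.SelectSingleNode("//table[@class='123']");

        // Filter from the answer: direct rows after the first one.
        var byPosition = parentTable.SelectNodes("tr[position() > 1]");

        // Alternative (an assumption, not from the answer): direct rows without a <table> child.
        var byContent = parentTable.SelectNodes("tr[not(table)]");

        // The leading "." keeps the search relative to parentTable.
        var nestedTable = parentTable.SelectSingleNode(".//table[@class='789']");

        // SelectNodes returns null when nothing matches, so guard before iterating.
        if (byContent != null)
        {
            foreach (var row in byContent)
            {
                Console.WriteLine(row.InnerText.Trim());
            }
        }

        Console.WriteLine($"rows by position: {byPosition?.Count ?? 0}");
        Console.WriteLine($"nested table found: {nestedTable != null}");
    }
}
```

Which filter fits better depends on the page: position() is simpler when the layout is fixed, while not(table) does not depend on where the nested table appears.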