Correlate Adjacent Element Values using HTML Agility Pack

Developer https://www.devze.com 2023-03-22 21:50 Source: web
I'm trying to grab each h2 element that follows an HTML comment with the text "Results", together with the table element with the class name "stockfeed" that comes after it.

I've figured out how to pull the data I need (see below), but I'm not sure how to pull the two elements together at the same time. I know I can iterate over both collections with the same index to correlate the values, but this seems error-prone, since one of my h2 elements may not have an adjacent table element (rare, but possible).

Example HTML markup:

<h1>
    Results Page</h1>
<h2>
    Updated Daily @ 10:00 AM</h2>
<div class='someClass1'>
    <!-- Results -->
    <div class='something'>
    </div>
    <h2 style='display: inline;'>
        <a href='http://www.somesite.com'>Table 1</a>
    </h2>
    <div class='clr'>
    </div>
    <div class='resultBlock'>
        <table class='stockfeed'>
            <thead>
                <tr>
                    <th>
                        Part
                    </th>
                    <th>
                        Description
                    </th>
                    <th>
                        Stock
                    </th>
                    <th>
                        Price
                    </th>
                </tr>
            </thead>
            <tbody>
                <tr class='row1' valign='top'>
                    <td>
                        A 1234567890
                    </td>
                    <td class='description'>
                        Part Description
                    </td>
                    <td>
                        1,000,000
                    </td>
                    <td>
                        $1.99
                    </td>
                </tr>
                <tr class='row1' valign='top'>
                    <td>
                        B 1234567890
                    </td>
                    <td class='description'>
                        Part Description
                    </td>
                    <td>
                        1,000,000
                    </td>
                    <td>
                        $1.99
                    </td>
                </tr>
                <tr class='row1' valign='top'>
                    <td>
                        C 1234567890
                    </td>
                    <td class='description'>
                        Part Description
                    </td>
                    <td>
                        1,000,000
                    </td>
                    <td>
                        $1.99
                    </td>
                </tr>
            </tbody>
        </table>
    </div>
    <!-- Results -->
    <div class='something'>
    </div>
    <h2 style='display: inline;'>
        <a href='http://www.somesite.com'>Table 2</a>
    </h2>
    <div class='clr'>
    </div>
    <div class='resultBlock'>
        <table class='stockfeed'>
            <thead>
                <tr>
                    <th>
                        Part
                    </th>
                    <th>
                        Description
                    </th>
                    <th>
                        Stock
                    </th>
                    <th>
                        Price
                    </th>
                </tr>
            </thead>
            <tbody>
                <tr class='row1' valign='top'>
                    <td>
                        A 1234567890
                    </td>
                    <td class='description'>
                        Part Description
                    </td>
                    <td>
                        1,000,000
                    </td>
                    <td>
                        $1.99
                    </td>
                </tr>
                <tr class='row1' valign='top'>
                    <td>
                        B 1234567890
                    </td>
                    <td class='description'>
                        Part Description
                    </td>
                    <td>
                        1,000,000
                    </td>
                    <td>
                        $1.99
                    </td>
                </tr>
                <tr class='row1' valign='top'>
                    <td>
                        C 1234567890
                    </td>
                    <td class='description'>
                        Part Description
                    </td>
                    <td>
                        1,000,000
                    </td>
                    <td>
                        $1.99
                    </td>
                </tr>
            </tbody>
        </table>
    </div>
</div>

Current code to parse the values separately:

    HtmlNodeCollection titles = doc.DocumentNode.SelectNodes("//comment()[contains(.,'Results')]/following-sibling::h2");
    for (int tit = 0; tit < titles.Count; ++tit)
    {
        // Do Something
    }

    HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table[@class='stockfeed']");
    for (int tab = 0; tab < tables.Count; ++tab)
    {
        // Do Something
    }
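
The concern about pairing by index can be made concrete with a minimal sketch. It assumes the HtmlAgilityPack NuGet package and C# 7 tuples; `IndexPairing` and `PairByIndex` are hypothetical names, not part of the library. It pairs the two collections purely by position, so if the first "Results" h2 has no table, the lone table that belongs to the second h2 ends up paired with the first title.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class IndexPairing
{
    // Hypothetical helper: pairs titles and tables purely by position,
    // i.e. the "same index" approach described in the question.
    public static List<(string Title, HtmlNode Table)> PairByIndex(HtmlDocument doc)
    {
        var pairs = new List<(string Title, HtmlNode Table)>();
        var titles = doc.DocumentNode.SelectNodes(
            "//comment()[contains(.,'Results')]/following-sibling::h2");
        var tables = doc.DocumentNode.SelectNodes("//table[@class='stockfeed']");

        // By default SelectNodes returns null rather than an empty collection.
        if (titles == null || tables == null)
            return pairs;

        // Nothing ties titles[i] to tables[i] except their position, so one
        // missing table shifts every later pairing and drops the last title.
        int n = Math.Min(titles.Count, tables.Count);
        for (int i = 0; i < n; ++i)
            pairs.Add((titles[i].InnerText.Trim(), tables[i]));

        return pairs;
    }
}
```

For example, with `<!-- Results --><h2>Table 1</h2><!-- Results --><h2>Table 2</h2>` followed by a single stockfeed table, `PairByIndex` returns one pair, ("Table 1", that table), and "Table 2" is silently dropped.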


So if I'm reading this correctly, you are trying to get the table that corresponds to each result.

You can use the same approach you used to get the following h2 element: from each h2, select the following table element relative to it.

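    var query = doc.DocumentNode
        .SelectNodes("//comment()[contains(.,'Results')]/following-sibling::h2");

    // SelectNodes returns null rather than an empty collection when nothing matches
    if (query != null)
    {
        foreach (HtmlNode h2 in query)
        {
            // First stockfeed table that is a child of any later sibling of this h2 (null if none)
            var table = h2.SelectSingleNode("following-sibling::*/table[@class='stockfeed']");
            // do stuff with h2 and table
        }
    }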
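
The `following-sibling::*/table[@class='stockfeed']` expression returns the first stockfeed table found under any later sibling, so an h2 without a table of its own would be paired with the next section's table, which is the rare case the question raises. A minimal sketch that stops at the next h2 instead is shown here. It assumes, as in the sample markup, that each section's h2 and table wrapper share one parent, and it assumes the HtmlAgilityPack NuGet package and C# 7 tuples; `ResultPairs` and `Pair` are hypothetical names.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ResultPairs
{
    // Hypothetical helper: pairs each "Results" h2 with the stockfeed table in
    // its own section, or with null when the next h2 comes before any table.
    public static List<(HtmlNode Title, HtmlNode Table)> Pair(HtmlDocument doc)
    {
        var pairs = new List<(HtmlNode Title, HtmlNode Table)>();

        // By default SelectNodes returns null (not an empty collection) on no match.
        var titles = doc.DocumentNode.SelectNodes(
            "//comment()[contains(.,'Results')]/following-sibling::h2");
        if (titles == null)
            return pairs;

        foreach (HtmlNode h2 in titles)
        {
            // Nearest later sibling that is either the next h2 or an element
            // that is, or contains, a stockfeed table.
            HtmlNode next = h2.SelectSingleNode(
                "following-sibling::*[self::h2 or descendant-or-self::table[@class='stockfeed']][1]");

            // If the next h2 is reached first, this section has no table.
            HtmlNode table = null;
            if (next != null && next.Name != "h2")
                table = next.SelectSingleNode("descendant-or-self::table[@class='stockfeed']");

            pairs.Add((h2, table));
        }

        return pairs;
    }
}
```

Each pair carries a null `Table` when its section has none, so the caller can skip or report it instead of shifting every later pairing. Using the next h2 as the section boundary is a design choice that relies on every section starting with an h2, as in the sample markup.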