Html Agility Pack merged queries

Developer https://www.devze.com 2023-04-08 06:34 Source: Internet
I have a table whose rows look roughly like this:

...some td's with links that are not needed
<td>1010</td>
<td>Building</td>
<td>Adress stree 55</td>
<td>00000 City</td>
<td>
<a href="http://www.adress.xy/file.kml" target="_self">
<img align="top" border="1" src="/custom/img/kml.gif" alt="Details" title="Details" />
</a>
</td>

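I use this query to get the InnerText values:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// One string[] of cell texts per <tr> of the first table with the matching style.
// GetAttributeValue avoids a NullReferenceException on tables without a style attribute.
var node = doc.DocumentNode.Descendants("table")
    .FirstOrDefault(x => x.GetAttributeValue("style", "") == "table-layout:auto")
    .Elements("tr")
    .Select(tr => tr.Elements("td").Select(td => td.InnerText).ToArray())
    .ToArray();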

But I would also like to add the URL of the .kml link to the array. So the question is: how can I merge the queries to get both the inner text and the .kml link?

The result of this query is:

string[i][j]

where i is the index of the tr element and j is the index of the td element within that row.

Example:

string[0][0]="1010"
string[0][1]="Building"

I would also like to have: string[i][4] = "http://www.adress.xy/file.kml"

P.S. the whole table is here.


I wouldn't worry about getting arrays of arrays; it would be better to get lists instead.

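// Requires: using System.Linq; using HtmlAgilityPack;
const string url = "http://www.rwth-aachen.de/go/id/yvu/scol/1/sasc/1/pl/313";
const string kml = "http://www.adress.xy/file.kml";
var newKml = new[] { kml };

var web = new HtmlWeb();
var doc = web.Load(url);
// Select only the rows of the target table that contain <td> cells (skips header rows).
var xpath = "//table[@style='table-layout:auto']/tr[td]";
var rows = doc.DocumentNode.SelectNodes(xpath); // null if nothing matches
var table = rows
    .Select(row =>
        row.Elements("td")
           .Skip(1)   // drop the leading cell with the unneeded links
           .Take(4)   // keep the four text cells
           .Select(col => System.Net.WebUtility.HtmlDecode(col.InnerText))
           .Concat(newKml)   // append the kml URL as the fifth element
           .ToList()
    ).ToList();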
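
If the fifth element should be the link found in each row rather than a fixed string, the href can be read inside the same projection. The following is only a sketch under assumptions: the class and method names (KmlRows, Extract) are made up for illustration, and it assumes each row holds at most one link ending in ".kml" and that the column layout matches the sample in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

static class KmlRows
{
    // Hypothetical helper (not part of Html Agility Pack): returns one list per
    // data row, holding the text of cells 2-5 followed by the row's .kml href.
    public static List<List<string>> Extract(HtmlDocument doc)
    {
        var rows = doc.DocumentNode
            .SelectNodes("//table[@style='table-layout:auto']/tr[td]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (rows == null) return new List<List<string>>();

        return rows.Select(row =>
        {
            // First link in this row whose href ends in ".kml"; null if there is none.
            var kml = row.Descendants("a")
                .Select(a => a.GetAttributeValue("href", ""))
                .FirstOrDefault(href =>
                    href.EndsWith(".kml", StringComparison.OrdinalIgnoreCase));

            return row.Elements("td")
                .Skip(1)                               // assumed: one leading cell to ignore
                .Take(4)                               // the four text cells
                .Select(td => WebUtility.HtmlDecode(td.InnerText).Trim())
                .Concat(new[] { kml ?? "" })           // empty string when no .kml link
                .ToList();
        }).ToList();
    }
}
```

With this, table[i][4] would hold the .kml URL of row i, or an empty string for rows without such a link.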

I would consider making an anonymous type to represent your rows; that way you could give more useful names to your columns. Perhaps even put the results in a DataTable instead.
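
As a rough illustration of the anonymous-type idea, the cells of each row can be projected into named properties. The property names (Number, Name, Street, City) are guesses based on the sample row in the question, and the NamedRows.Print helper is hypothetical; treat this as a sketch rather than a finished implementation.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

static class NamedRows
{
    // Hypothetical helper: projects each data row into an anonymous type
    // and writes one formatted line per row to the given writer.
    public static void Print(HtmlDocument doc, TextWriter output)
    {
        var rows = doc.DocumentNode
            .SelectNodes("//table[@style='table-layout:auto']/tr[td]");

        // SelectNodes returns null when nothing matches.
        if (rows == null) return;

        var table = rows
            .Select(row => row.Elements("td")
                .Skip(1)                                  // assumed: one leading cell to ignore
                .Select(td => WebUtility.HtmlDecode(td.InnerText).Trim())
                .ToList())
            .Where(cells => cells.Count >= 4)             // skip rows that are too short
            .Select(cells => new
            {
                Number = cells[0],
                Name = cells[1],
                Street = cells[2],
                City = cells[3]
            });

        foreach (var r in table)
            output.WriteLine($"{r.Number}: {r.Name}, {r.Street}, {r.City}");
    }
}
```

Because anonymous types cannot be named in a method signature, the projection is consumed inside the same method here; a small named class would be the usual alternative if the rows need to be returned to a caller.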

In case you can't use XPath for whatever reason (or you want to know the equivalent LINQ query), you could replace the line that uses the XPath expression with this:

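// GetAttributeValue returns the default ("") for tables without a style attribute,
// whereas Attributes["style"].Value would throw a NullReferenceException there.
var rows = doc.DocumentNode.Descendants("table")
    .Where(t => t.GetAttributeValue("style", "") == "table-layout:auto")
    .SelectMany(t => t.Elements("tr").Where(tr => tr.Elements("td").Any()));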