Parsing HTML using HTML Agility Pack

Developer (https://www.devze.com), 2022-12-24 06:26, source: the web

Here is one table out of 5:

<h3>marec - maj 2009</h3>
<div class="graf_table">
<table summary="layout table">
    <tr>
        <th>DATUM</th>
        <td class="datum">10.03.2009</td>
        <td class="datum">24.03.2009</td>
        <td class="datum">07.04.2009</td>
        <td class="datum">21.04.2009</td>
        <td class="datum">05.05.2009</td>
        <td class="datum">06.05.2009</td>
    </tr>
    <tr>
        <th>Maloprodajna cena [EUR/L]</th>
        <td>0,96000</td>
        <td>0,97000</td>
        <td>0,99600</td>
        <td>1,00800</td>
        <td>1,00800</td>
        <td>1,01000</td>
    </tr>
    <tr>
        <th>Maloprodajna cena [SIT/L]</th>
        <td>230,054</td>
        <td>232,451</td>
        <td>238,681</td>
        <td>241,557</td>
        <td>241,557</td>
        <td>242,036</td>
    </tr>
    <tr>
        <th>Prodajna cena brez dajatev</th>
        <td>0,33795</td>
        <td>0,34628</td>
        <td>0,36795</td>
        <td>0,37795</td>
        <td>0,37795</td>
        <td>0,37962</td>
    </tr>
    <tr>
        <th>Trošarina</th>
        <td>0,46205</td>
        <td>0,46205</td>
        <td>0,46205</td>
        <td>0,46205</td>
        <td>0,46205</td>
        <td>0,46205</td>
    </tr>
    <tr>
        <th>DDV</th>
        <td>0,16000</td>
        <td>0,16167</td>
        <td>0,16600</td>
        <td>0,16800</td>
        <td>0,16800</td>
        <td>0,16833</td>
    </tr>
</table>
</div>

I have to extract the values from the rows whose header is DATUM and Maloprodajna cena [EUR/L]. I am using HTML Agility Pack with these options:

this.htmlDoc = new HtmlAgilityPack.HtmlDocument();
this.htmlDoc.OptionCheckSyntax = true;
this.htmlDoc.OptionFixNestedTags = true;
this.htmlDoc.OptionAutoCloseOnEnd = true;
this.htmlDoc.OptionOutputAsXml = true; // is this necessary ??
this.htmlDoc.OptionDefaultStreamEncoding = System.Text.Encoding.Default;

I had a lot of trouble getting those values out. I started with:

 var query = from html in htmlDoc.DocumentNode.SelectNodes("//div[@class='graf_table']").Cast<HtmlNode>()
 // ".//table" searches inside this div; "//table" would restart at the document root
 from table in html.SelectNodes(".//table").Cast<HtmlNode>()
 from row in table.SelectNodes("tr").Cast<HtmlNode>()
 from cell in row.SelectNodes("th|td").Cast<HtmlNode>()
 select new { Table = table.Id, CellText = cell.InnerText };

but I could not figure out how to select only the values in the rows whose header is DATUM and Maloprodajna cena [EUR/L]. Is it possible to do that with a where clause?
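A where clause does work if each row's header is captured with `let` and tested before descending into the cells. The sketch below is hedged in two ways: to stay self-contained without the HtmlAgilityPack NuGet package, it loads a trimmed, well-formed copy of the question's table with System.Xml.Linq (`XDocument`) and System.Xml.XPath, and the names `HeaderFilterSketch`, `ByWhereClause` and `ByXPath` are hypothetical. It also shows the XPath alternative, where the predicate `[th='DATUM']` does the filtering instead of a where clause. With HtmlAgilityPack the same XPath string can be passed to `htmlDoc.DocumentNode.SelectNodes(...)` (which returns null when nothing matches), reading the text via `InnerText` instead of `Value`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class HeaderFilterSketch
{
    // Trimmed copy of the question's table; it is well-formed, so XDocument can load it.
    public const string Snippet = @"<html><body>
<h3>marec - maj 2009</h3>
<div class=""graf_table""><table summary=""layout table"">
  <tr><th>DATUM</th><td class=""datum"">10.03.2009</td><td class=""datum"">24.03.2009</td></tr>
  <tr><th>Maloprodajna cena [EUR/L]</th><td>0,96000</td><td>0,97000</td></tr>
  <tr><th>DDV</th><td>0,16000</td><td>0,16167</td></tr>
</table></div></body></html>";

    // LINQ version: keep only rows whose <th> text is one of the wanted headers.
    public static List<(string Header, string CellText)> ByWhereClause(XDocument doc, params string[] wanted)
    {
        var query = from div in doc.Descendants("div")
                    where (string)div.Attribute("class") == "graf_table"
                    from table in div.Elements("table")
                    from row in table.Elements("tr")
                    let header = ((string)row.Element("th"))?.Trim()   // null if the row has no <th>
                    where header != null && wanted.Contains(header)
                    from cell in row.Elements("td")
                    select (Header: header, CellText: cell.Value.Trim());
        return query.ToList();
    }

    // XPath version: the predicate [th='...'] replaces the where clause.
    // Assumes the header text contains no apostrophe (it is embedded in an XPath string literal).
    public static List<string> ByXPath(XDocument doc, string header) =>
        doc.XPathSelectElements($"//div[@class='graf_table']//tr[th='{header}']/td")
           .Select(td => td.Value.Trim())
           .ToList();

    public static void Main()
    {
        var doc = XDocument.Parse(Snippet);
        foreach (var (header, text) in ByWhereClause(doc, "DATUM", "Maloprodajna cena [EUR/L]"))
            Console.WriteLine($"{header}: {text}");
        Console.WriteLine(string.Join(" | ", ByXPath(doc, "DATUM")));
    }
}
```

The `let` keeps the header text available to both the `where` and the final `select`, so the result still says which row each value came from.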

Then I ended up with these two queries:

var date = (from d in htmlDoc.DocumentNode.SelectNodes("//div[@class='graf_table']//table//tr[1]/td")
                    select DateTime.Parse(d.InnerText)).ToArray();

var price = (from p in htmlDoc.DocumentNode.SelectNodes("//div[@class='graf_table']//table//tr[2]/td")
                     select double.Parse(p.InnerText)).ToArray();

Is it possible to combine these two queries into one? And how would I convert them to lambda (method) syntax? I have just started learning this, and I would like to understand how it is done so that I can answer such questions myself in the future.
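One way to fold the two queries into one is to walk the tables once and pair each table's DATUM cells with its price cells using `Enumerable.Zip`; the method ("lambda") syntax is the same query spelled with `SelectMany` and `Zip`. This is a minimal sketch, assuming System.Xml.Linq/System.Xml.XPath stands in for HtmlAgilityPack so it runs without the NuGet package; `CombinedQuerySketch`, `QuerySyntax` and `MethodSyntax` are hypothetical names. It also parses with explicit formats, because `DateTime.Parse` and `double.Parse` follow the machine's current culture: under a culture with `.` as the decimal separator (for example en-US), `double.Parse("0,96000")` treats the comma as a thousands separator and returns 96000.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class CombinedQuerySketch
{
    // One of the question's tables, trimmed to the two rows of interest.
    public const string Snippet = @"<html><body><div class=""graf_table""><table summary=""layout table"">
  <tr><th>DATUM</th><td class=""datum"">10.03.2009</td><td class=""datum"">24.03.2009</td><td class=""datum"">07.04.2009</td>
    <td class=""datum"">21.04.2009</td><td class=""datum"">05.05.2009</td><td class=""datum"">06.05.2009</td></tr>
  <tr><th>Maloprodajna cena [EUR/L]</th><td>0,96000</td><td>0,97000</td><td>0,99600</td>
    <td>1,00800</td><td>1,00800</td><td>1,01000</td></tr>
</table></div></body></html>";

    const string Tables = "//div[@class='graf_table']/table";
    const string DateCells = "tr[th='DATUM']/td";                      // relative to a <table>
    const string PriceCells = "tr[th='Maloprodajna cena [EUR/L]']/td"; // relative to a <table>

    // The page uses "dd.MM.yyyy" dates and a comma as the decimal separator,
    // so parse with explicit formats instead of the current culture.
    static readonly NumberFormatInfo CommaDecimal =
        new NumberFormatInfo { NumberDecimalSeparator = ",", NumberGroupSeparator = "." };

    static DateTime ParseDate(XElement td) =>
        DateTime.ParseExact(td.Value.Trim(), "dd.MM.yyyy", CultureInfo.InvariantCulture);

    static double ParsePrice(XElement td) =>
        double.Parse(td.Value.Trim(), NumberStyles.Float, CommaDecimal);

    // Query syntax: one query over all tables, pairing date and price cells per table.
    public static (DateTime Date, double Price)[] QuerySyntax(XDocument doc) =>
        (from table in doc.XPathSelectElements(Tables)
         from pair in table.XPathSelectElements(DateCells)
                           .Zip(table.XPathSelectElements(PriceCells),
                                (d, p) => (DateCell: d, PriceCell: p))
         select (Date: ParseDate(pair.DateCell), Price: ParsePrice(pair.PriceCell))).ToArray();

    // Method ("lambda") syntax: the same query written with SelectMany and Zip.
    public static (DateTime Date, double Price)[] MethodSyntax(XDocument doc) =>
        doc.XPathSelectElements(Tables)
           .SelectMany(table => table.XPathSelectElements(DateCells)
               .Zip(table.XPathSelectElements(PriceCells),
                    (d, p) => (Date: ParseDate(d), Price: ParsePrice(p))))
           .ToArray();

    public static void Main()
    {
        var points = MethodSyntax(XDocument.Parse(Snippet));

        // Split back into the two arrays the question binds to the chart.
        DateTime[] date = points.Select(pt => pt.Date).ToArray();
        double[] price = points.Select(pt => pt.Price).ToArray();

        for (int i = 0; i < date.Length; i++)
            Console.WriteLine($"{date[i]:dd.MM.yyyy}  {price[i].ToString(CultureInfo.InvariantCulture)}");
    }
}
```

The `date` and `price` arrays built in `Main` have the same shape as in the question, so they could be passed to `chart1.Series["Series1"].Points.DataBindXY(date, price)` unchanged. Zipping per table, rather than across the whole document, keeps each date next to the price from the same table.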

Oh, one more question... does anybody know a good chart control? I have to show these values in a chart. I started with Microsoft Chart Controls, but I am having trouble configuring them. If anyone has experience with them, I would like to know how to set them up so that the x axis shows every value, not every second one. For example, if I have 10.03.2009, 24.03.2009, 07.04.2009, 21.04.2009, 05.05.2009, 06.05.2009, it shows only 10.03.2009, 07.04.2009, 05.05.2009, etc.

I bind the data to the chart like this:

chart1.Series["Series1"].Points.DataBindXY(date, price);

A lot of questions for my first post... hehe, I hope I was clear enough. Thanks for any reply!


For such CodePlex projects, please consider posting your questions directly to their Discussion boards. Usually that's the best way to contact the developers.
