Parse Website Data in C++

Developer https://www.devze.com 2023-02-22 21:13 Source: web

I am trying to develop a program that parses a website for data and stores that data in variables that I can then use in functions inside the program.

Specifically, I'm trying to parse this page (click the Debuffs tab):

http://worldoflogs.com/reports/rt-1smdoscr7neq0k6b/spell/94075/

The source is pretty simple and looks like this:

        <td><a href='/reports/rt-1smdoscr7neq0k6b/details/62/' class='actor'><span class='Warrior'>Zonnza</span></a></td>
        <td>100</td>
    </tr>
    <tr>
        <td><a href='/reports/rt-1smdoscr7neq0k6b/details/3/' class='actor'><span class='DeathKnight'>Fillzholez</span></a></td>
        <td>89</td>
    </tr>

I only want the numbers and the names, i.e. what is between the <td></td> tags and between the <span class=''></span> tags. Is there any way to do what I'm looking for?

Any help would be greatly appreciated.

I'd look into Tag Soup. It's an HTML parser that can cope with all the horrible HTML that's out there. There's a C++ port of it available too (I haven't used it, so I can't comment on how stable it is).

There are no C++ libraries for what you're trying to do (unless you're going to link against half of Mozilla or WebKit), but you could consider using Java with HtmlUnit.

And for those suggesting regular expressions, an obligatory reference.

There's no need to use C++ when C-style sscanf will do, or even Perl or any other language with regular expression support.
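As a rough sketch of the sscanf suggestion: the code below assumes the page keeps exactly the markup shown in the question (a single-quoted class attribute on the span, and the number in a `<td>` on a later line of the same row). `Row` and `parse_rows` are made-up names for illustration, and downloading the page is left out entirely.

```cpp
#include <cstdio>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one table row: class name, player name, number.
struct Row {
    std::string cls;
    std::string name;
    int value;
};

// Scans the HTML line by line. Assumes the name line contains
// <span class='...'>NAME</span> and a following line contains <td>NUMBER</td>.
std::vector<Row> parse_rows(const std::string& html) {
    std::vector<Row> rows;
    std::istringstream in(html);
    std::string line;
    char cls[64];
    char name[64];
    bool have_name = false;  // true once a name was read and its number is pending

    while (std::getline(in, line)) {
        int value = 0;
        const char* span = std::strstr(line.c_str(), "<span class='");
        if (span != nullptr &&
            std::sscanf(span, "<span class='%63[^']'>%63[^<]", cls, name) == 2) {
            // %63[^'] reads up to the closing quote, %63[^<] up to </span>.
            have_name = true;
        } else if (have_name &&
                   std::sscanf(line.c_str(), " <td>%d</td>", &value) == 1) {
            // The leading space in the format skips the indentation.
            rows.push_back({cls, name, value});
            have_name = false;
        }
    }
    return rows;
}
```

Run on the fragment from the question, this should give two rows: Warrior / Zonnza / 100 and DeathKnight / Fillzholez / 89. Any change to the markup (double quotes, extra attributes, the number moving onto the same line as the name) will break it, which is the usual argument for a real HTML parser.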