Regex to find content in HTML tags

Developer https://www.devze.com 2023-01-19 00:09 Source: Internet
I need to parse an HTML file and extract the NeedThis* strings with C#/.NET. Sample markup:

<tr class="class">
    <td style="width: 120px">
        <a href="NeedThis1">NeedThis2</a>
    </td>
    <td style="width: 120px">
        <a href="NeedThis3">
            NeedThis4</a>
    </td>
    <td style="width: 30%">
        NeedThis5
    </td>
    <td>
        NeedThis6
    </td>
    <td style="width: 120px">
        NeedThis7
    </td>
</tr>

I know an HTML parser would be better here, but all I need is to extract these texts; this is just for a temporary helper tool...

Can anyone help me with this?

Thanks!


If you are sure that your HTML is valid (that is, well-formed XML), you could use LINQ to XML; otherwise you are better off using a parser like HTML Agility Pack.
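
As a rough illustration of the LINQ to XML route, the sketch below parses a row like the sample with `XElement.Parse` and collects each link's `href` plus the trimmed text of every cell. It assumes the input is well-formed XML, as the sample is; the names `NeedThisExtractor` and `Extract` are hypothetical, and real-world HTML with unclosed tags or entities such as `&nbsp;` would make the parse throw.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class NeedThisExtractor
{
    // Hypothetical helper: returns href values and trimmed text, in document order.
    // Assumes the markup is a single well-formed <tr> element containing <td> cells.
    public static List<string> Extract(string wellFormedRow)
    {
        XElement row = XElement.Parse(wellFormedRow);
        var results = new List<string>();

        foreach (XElement cell in row.Elements("td"))
        {
            var link = cell.Element("a");
            if (link != null)
            {
                // Cell holds a link: take the href (if present), then the link text.
                var href = link.Attribute("href")?.Value;
                if (href != null)
                {
                    results.Add(href);
                }
                results.Add(link.Value.Trim());
            }
            else
            {
                // Plain cell: take its text without the surrounding whitespace.
                results.Add(cell.Value.Trim());
            }
        }

        return results;
    }
}
```

Called on the sample row from the question, `Extract` should yield NeedThis1 through NeedThis7 in document order.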


It doesn't matter whether you're doing this for a one-off or for a "finished project". Your task isn't text extraction and it's not something that a regex can do effectively. The data you're looking for depends on the structure of the HTML. Your task is parsing HTML. When your task is parsing HTML, use an HTML parser. It's not difficult. In fact it's a lot easier than writing the pile of regexes you would need otherwise.


You seem to have answered your own question: you should use a parser. But if you don't, you can use the regex NeedThis.*

Of course, if you want any context with those strings, you should just use a parser.
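
Note that `NeedThis.*` is greedy up to the end of the line, so on a line like the first link it returns more than the placeholder itself. A small sketch, assuming the wanted strings really do start with the literal text `NeedThis` (in real data they probably will not), lets you compare it with the narrower `NeedThis\w*`; the helper name `LiteralMatcher.Find` is made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LiteralMatcher
{
    // Hypothetical helper: returns every match of the given pattern in the input.
    public static List<string> Find(string input, string pattern)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

On the line `<a href="NeedThis1">NeedThis2</a>`, the pattern `NeedThis.*` gives a single match that runs through `</a>`, while `NeedThis\w*` gives `NeedThis1` and `NeedThis2` separately.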


Hans, as you can see from the other answers, using a regex is probably not the best way to do what you want, but since I need to practice my regex anyway, I went ahead and made one in case you want to experiment. It will only catch NeedThis2, but it should give you an idea of how to build your own regex when it is an appropriate solution.

<a href="NeedThis1">NeedThis2</a>

RegEx to catch NeedThis2:

(?:<a[^>]+>)([^<\s]*)(?:</a>)

Pretty nasty huh? :)
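
For experimenting in C#, here is a minimal sketch of applying a pattern of this kind with `Regex.Matches` and reading capture group 1. The pattern in the sketch is an assumed variant, not necessarily the one above: it uses `[^>]*` for the rest of the opening tag and allows whitespace around the link text, so it also picks up link text that starts on the next line. The names `LinkTextFinder` and `Find` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkTextFinder
{
    // Assumed variant pattern: opening <a ...> tag, optional whitespace,
    // the link text captured in group 1, optional whitespace, closing </a>.
    private const string Pattern = @"<a\b[^>]*>\s*([^<]*?)\s*</a>";

    // Hypothetical helper: returns the trimmed text of every simple <a>...</a> pair.
    // Links with nested tags inside them are not handled.
    public static List<string> Find(string html)
    {
        var texts = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            texts.Add(m.Groups[1].Value);
        }
        return texts;
    }
}
```

On the two links in the sample this should return NeedThis2 and NeedThis4; the `href` values and the plain cell texts would each need a pattern of their own, which is where the pile of regexes starts to grow.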
