Regular expression to match characters in a string, excluding matches within HTML anchor elements

Developer https://www.devze.com 2023-01-20 08:16 Source: Internet
Consider this blob of text:

@"
I want to match  the word 'highlight' in a string. But I don't want to match
highlight when it is contained in an HTML anchor element. The expression
should not match highlight in the following text: <a href='#'>highlight</a>
"

Here's what the output should look like (matches are in bold):

I want to match the word "**highlight**" in a string. But I don't want to match **highlight** when it is contained in an HTML anchor element. The expression should not match **highlight** in the following text: highlight

How would you construct an expression that matches all occurrences of X, excluding matches inside HTML anchor elements?


I know you asked for a regex, but I won't write one. Instead, here's a solution using Html Agility Pack.

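using System;
using System.Linq;
using HtmlAgilityPack;

public static class HighlightFinder
{
    public static void Parse()
    {
        string htmlFragment =
            @"
    I want to match  the word 'highlight' in a string. But I don't want to match
    highlight when it is contained in an HTML anchor element. The expression
    should not match highlight in the following text: <a href='#'>highlight</a> more
    ";
        HtmlDocument htmlDocument = new HtmlAgilityPack.HtmlDocument();
        htmlDocument.LoadHtml(htmlFragment);

        // Walk every node in the document and keep only the text nodes that qualify.
        foreach (HtmlNode node in htmlDocument.DocumentNode.SelectNodes("//.").Where(FilterTextNodes()))
        {
            Console.WriteLine(node.OuterHtml);
        }
    }

    // Accepts text nodes that contain "highlight" and whose direct parent is not an <a> element.
    // Note: only the immediate parent is checked, so text nested deeper inside an anchor is not excluded.
    private static Func<HtmlNode, bool> FilterTextNodes()
    {
        return node => node.NodeType == HtmlNodeType.Text
            && node.ParentNode != null
            && node.ParentNode.Name != "a"
            && node.OuterHtml.Contains("highlight");
    }
}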
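
For completeness, if a regular expression really is required, one common trick is to let the pattern consume whole anchor elements in one alternative and capture the word in another, then keep only the matches where the capture group succeeded. The following is a minimal sketch of that idea, not a robust HTML parser: the class name AnchorAwareMatcher, the method FindOutsideAnchors, and the group name "hit" are made up for illustration, and the pattern assumes well-formed, non-nested <a>...</a> elements. It will also match the word inside attributes or comments of other tags, which is one reason the parser-based approach above is usually safer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorAwareMatcher
{
    // Hypothetical helper: returns the start index of every occurrence of
    // `word` that is not inside an <a>...</a> element.
    public static List<int> FindOutsideAnchors(string html, string word)
    {
        // Alternative 1 consumes a whole anchor element, so its contents are skipped.
        // Alternative 2 captures the word in the named group "hit".
        string pattern = @"<a\b[^>]*>.*?</a>|(?<hit>" + Regex.Escape(word) + ")";
        RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        var positions = new List<int>();
        foreach (Match m in Regex.Matches(html, pattern, options))
        {
            // Only matches produced by the second alternative set the "hit" group.
            if (m.Groups["hit"].Success)
            {
                positions.Add(m.Groups["hit"].Index);
            }
        }
        return positions;
    }

    public static void Main()
    {
        string html = "match highlight here, not <a href='#'>highlight</a>, but highlight again";
        foreach (int index in FindOutsideAnchors(html, "highlight"))
        {
            Console.WriteLine(index);
        }
    }
}
```

The same pattern can be passed to Regex.Replace with a MatchEvaluator that returns anchor matches unchanged and wraps "hit" matches in whatever highlighting markup is needed.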