
How do I remove specific elements from HTML with HTML Agility Pack for ASP.NET (vb)

Developer https://www.devze.com 2023-04-06 23:36 Source: Internet
There seems to be no documentation on the CodePlex page, and for some reason IntelliSense doesn't show me available methods or anything at all for HtmlAgilityPack (for example, when I type MyHtmlDocument.DocumentNode. there is no IntelliSense to tell me what I can do next).

I need to know how to remove ALL <a> tags and their content from the body of the HTML document. I cannot just use Node.InnerText on the body, because that still returns the content of the <a> tags.

Here is some example HTML:

<html>
    <body>
        I was born in <a name=BC>Toronto</a> and now I live in barrie
    </body>
</html>

I need to return:

I was born in and now I live in barrie

Thanks, I appreciate the help!

Thomas


Something along these lines (sorry, my code is C#, but I hope it helps nonetheless):

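// Requires: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("some html markup here");

// "//a" matches every <a> element (use "//a[@name]" to match only named anchors).
// SelectNodes returns null when nothing matches, so guard before looping.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        link.Remove(); // removes the <a> element together with its content
    }
}

// Then use one of the many doc.Save(...) overloads to actually get the result of the operation.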
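Since the question asks for VB, here is a rough VB.NET sketch of the same XPath approach. The module name AnchorStripper and the function name RemoveAnchors are hypothetical, chosen only for illustration. The sketch assumes the HtmlAgilityPack assembly is referenced, and that reading InnerText from the body (falling back to the document root when there is no body) is an acceptable way to get the plain text.

```vb
Imports HtmlAgilityPack

Public Module AnchorStripper
    ' Hypothetical helper: returns the body text of the given markup
    ' with every <a> element (and its content) removed.
    Public Function RemoveAnchors(html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when no node matches the XPath.
        Dim links As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a")
        If links IsNot Nothing Then
            For Each link As HtmlNode In links
                link.Remove() ' removes the <a> node together with its children
            Next
        End If

        ' Prefer the <body> element; fall back to the whole document for fragments.
        Dim body As HtmlNode = doc.DocumentNode.SelectSingleNode("//body")
        Dim root As HtmlNode = If(body, doc.DocumentNode)
        Return root.InnerText
    End Function
End Module
```

Calling RemoveAnchors on the example HTML from the question should return the sentence without "Toronto". The whitespace on either side of the removed link is left in place, so collapse repeated spaces afterwards if that matters.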


This gets you the result you require. It uses a recursive method to drill down through all of the HTML nodes, and you can remove other kinds of nodes simply by adding another If branch.

' Requires: Imports HtmlAgilityPack
Public Sub Test()
    Dim document = New HtmlDocument() With { _
        .OptionOutputAsXml = True _
    }
    document.LoadHtml("<html><body>I was born in <a name=BC>Toronto</a> and now I live in barrie</body></html>")

    ' Start at the root so that every level of the tree is visited.
    RecursiveMethod(document.DocumentNode)

    ' Removing the <a> leaves two adjacent spaces; collapse them to one.
    Console.Out.WriteLine(document.DocumentNode.InnerHtml.Replace("  ", " "))
End Sub

Public Sub RecursiveMethod(child As HtmlNode)
    ' Walk backwards so that removing a node does not shift the indexes still to be visited.
    For x As Integer = child.ChildNodes.Count - 1 To 0 Step -1
        Dim node = child.ChildNodes(x)
        If node.Name = "a" Then
            node.Remove() ' removes the "a" node together with its child nodes
        ElseIf node.HasChildNodes Then
            RecursiveMethod(node)
        End If
    Next
End Sub