What is HtmlCleaner in Android?

Developer (https://www.devze.com), 2023-02-02 02:18. Source: Internet
Can anyone tell me what Html Cleaner is and for which purpose it is used?

Thanks, david


Hi, refer to this answer, taken from the project site http://htmlcleaner.sourceforge.net/:

HtmlCleaner is an open-source HTML parser written in Java. HTML found on the Web is usually dirty, ill-formed and unsuitable for further processing. For any serious consumption of such documents, it is necessary to first clean up the mess and bring order to the tags, attributes and ordinary text. For a given HTML document, HtmlCleaner reorders individual elements and produces well-formed XML. By default, it follows rules similar to those most web browsers use to create the Document Object Model. However, the user may provide a custom tag and rule set for tag filtering and balancing.

For instance, consider the following example:

<table id=table1 cellspacing=2px
    <h1>CONTENT</h1>
    <td><a href=index.html>1 -> Home Page</a>
    <td><a href=intro.html>2 -> Introduction</a>


After putting it through HtmlCleaner, XML similar to the following comes out:

    <?xml version="1.0" encoding="UTF-8"?>
    <html>
       <head />
       <body>
          <h1>CONTENT</h1>
          <table id="table1" cellspacing="2px">
             <tbody>
                <tr>
                   <td>
                      <a href="index.html">1 -&gt; Home Page</a>
                   </td>
                   <td>
                      <a href="intro.html">2 -&gt; Introduction</a>
                   </td>
                </tr>
             </tbody>
          </table>
       </body>
    </html>
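
The cleaning step shown above could be driven from Java roughly like this. It is a minimal sketch, assuming the HtmlCleaner 2.x jar is on the classpath; the class name HtmlCleanerSketch and the helper cleanToXml are made up for illustration, and the exact whitespace of the output may differ from the listing above.

```java
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.PrettyXmlSerializer;
import org.htmlcleaner.TagNode;

// Hypothetical demo class, not part of the HtmlCleaner library.
public class HtmlCleanerSketch {

    // The dirty input from the example above (unclosed <table ...>, unquoted attributes).
    static final String DIRTY_HTML =
        "<table id=table1 cellspacing=2px\n"
        + "    <h1>CONTENT</h1>\n"
        + "    <td><a href=index.html>1 -> Home Page</a>\n"
        + "    <td><a href=intro.html>2 -> Introduction</a>\n";

    // Cleans the given HTML and returns it serialized as pretty-printed XML.
    static String cleanToXml(String html) {
        try {
            HtmlCleaner cleaner = new HtmlCleaner();          // default, browser-like rules
            TagNode root = cleaner.clean(html);               // builds the repaired tag tree
            CleanerProperties props = cleaner.getProperties();
            return new PrettyXmlSerializer(props).getAsString(root);
        } catch (Exception e) {
            // Some library versions declare checked exceptions here; wrap them for simplicity.
            throw new IllegalStateException("Could not clean HTML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cleanToXml(DIRTY_HTML));
    }
}
```

On Android the same calls would normally run off the main thread, since real pages are usually fetched over the network before being cleaned.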

For how to use HtmlCleaner, see http://thinkandroid.wordpress.com/2010/01/05/using-xpath-and-html-cleaner-to-parse-html-xml/


HtmlCleaner is a library that, as its name says, "cleans" malformed HTML and converts it to XHTML so that it can be parsed with an XML parser.
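
To illustrate the "parse it using an XML parser" part: once the markup is well-formed, the standard Java XML and XPath APIs can read it without any extra library. The sketch below assumes the cleaned output from the first answer (condensed onto fewer lines); the class name CleanedXmlLinks and the helper extractLinks are hypothetical names used only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses only the JDK's built-in XML and XPath support.
public class CleanedXmlLinks {

    // Cleaned XML as shown in the first answer, with the indentation removed.
    static final String CLEANED_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<html><head /><body><h1>CONTENT</h1>"
        + "<table id=\"table1\" cellspacing=\"2px\"><tbody><tr>"
        + "<td><a href=\"index.html\">1 -&gt; Home Page</a></td>"
        + "<td><a href=\"intro.html\">2 -&gt; Introduction</a></td>"
        + "</tr></tbody></table></body></html>";

    // Returns one "href | link text" entry per <a> element inside table1.
    static List<String> extractLinks(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList anchors = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//table[@id='table1']//a", doc, XPathConstants.NODESET);
            List<String> links = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element a = (Element) anchors.item(i);
                links.add(a.getAttribute("href") + " | " + a.getTextContent());
            }
            return links;
        } catch (Exception e) {
            // Parsing only succeeds because the input is well-formed XML.
            throw new IllegalStateException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        for (String link : extractLinks(CLEANED_XML)) {
            System.out.println(link);
        }
        // prints:
        // index.html | 1 -> Home Page
        // intro.html | 2 -> Introduction
    }
}
```

Feeding the original dirty HTML to the same method would fail with a parse error, which is the reason for running it through the cleaner first.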
