Parsing using HTMLParser

开发者 (https://www.devze.com), 2023-04-02 00:06. Source: web
Parser parser = new Parser();
parser.setInputHTML("d:/index.html");
parser.setEncoding("UTF-8");
NodeList nl = parser.parse(null);
/*
SimpleNodeIterator sNI = list.elements();
while (sNI.hasMoreNodes()) {
    System.out.println(sNI.nextNode().getText());
}
*/
NodeList trs = nl.extractAllNodesThatMatch(new TagNameFilter("tr"), true);
for (int i = 0; i < trs.size(); i++) {
    NodeList nodes = trs.elementAt(i).getChildren();
    NodeList tds = nodes.extractAllNodesThatMatch(new TagNameFilter("td"), true);
    System.out.println(tds.toString());
}

I am not getting any output; Eclipse shows javaw.exe terminated.


Pass the path to the resource into the constructor.

Parser parser = new Parser("index.html");

Parse and print all the divs on this page:

Parser parser = new Parser("http://stackoverflow.com/questions/7293729/parsing-using-htmlparser/");
parser.setEncoding("UTF-8");
NodeList nl = parser.parse(null);
NodeList div = nl.extractAllNodesThatMatch(new TagNameFilter("div"),true);
System.out.println(div.toString());

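parser.setInputHTML(String inputHTML) doesn't do what you think it does. It treats its argument as the HTML markup to parse, not as a path to a file. In the question, the parser is therefore parsing the literal text "d:/index.html", which contains no tr tags, so the loop never runs and nothing is printed. To point the parser at an HTML resource (a file or URL), use the constructor instead.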

Example:

Parser parser = new Parser();
parser.setInputHTML("<div>Foo</div><div>Bar</div>");
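
If the markup lives in a file and you still want to go through setInputHTML, one option is to read the file's contents into a String first and pass that String, rather than the path, to the parser. Below is a minimal sketch of the reading step using only the standard library. The class name HtmlFileReader and the method readHtml are hypothetical names chosen for illustration, and Files.readString and Files.writeString assume Java 11 or later. The HTMLParser calls appear only in comments, so the sketch runs without the library on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HtmlFileReader {

    // Hypothetical helper: reads the whole file as UTF-8 text so that the
    // file's *content*, not its path, can be handed to setInputHTML.
    public static String readHtml(Path file) throws IOException {
        return Files.readString(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file so the sketch runs without external input.
        Path tmp = Files.createTempFile("index", ".html");
        Files.writeString(tmp, "<div>Foo</div><div>Bar</div>", StandardCharsets.UTF_8);
        try {
            String html = readHtml(tmp);
            System.out.println(html); // prints <div>Foo</div><div>Bar</div>

            // With HTMLParser on the classpath, the next step would be:
            //   Parser parser = new Parser();
            //   parser.setInputHTML(html);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

For a plain file or URL, passing the location to the constructor as shown in the first answer is shorter. Reading the file yourself is mainly useful when you need to control the encoding or pre-process the markup before parsing.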