Parser parser = new Parser();
parser.setInputHTML("d:/index.html");
parser.setEncoding("UTF-8");
NodeList nl = parser.parse(null);
/*
SimpleNodeIterator sNI=list.elements();
while(sNI.hasMoreNodes()){
System.out.println(sNI.nextNode().getText());}
*/
NodeList trs = nl.extractAllNodesThatMatch(new TagNameFilter("tr"),true);
for(int i=0;i<trs.size();i++) {
    NodeList nodes = trs.elementAt(i).getChildren();
    NodeList tds = nodes.extractAllNodesThatMatch(new TagNameFilter("td"),true);
    System.out.println(tds.toString());
}
I am not getting any output; Eclipse shows javaw.exe terminated.
Pass the path to the resource into the constructor.
Parser parser = new Parser("index.html");
Parse and print all the divs on this page:
Parser parser = new Parser("http://stackoverflow.com/questions/7293729/parsing-using-htmlparser/");
parser.setEncoding("UTF-8");
NodeList nl = parser.parse(null);
NodeList div = nl.extractAllNodesThatMatch(new TagNameFilter("div"),true);
System.out.println(div.toString());
parser.setInputHTML(String inputHTML) doesn't do what you think it does. It treats inputHTML as the HTML markup to parse, not as a path to a file. To point the parser at an HTML resource (a file or a URL), use the constructor.
Example:
Parser parser = new Parser();
parser.setInputHTML("<div>Foo</div><div>Bar</div>");
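If you do want to go through setInputHTML with a local file, one option is to read the file's contents into a String first and pass that String rather than the path. A minimal sketch using only the JDK; the class name HtmlFileReader and the method readHtml are made up for illustration, UTF-8 is assumed because the question sets that encoding, and the htmlparser call is shown only as a comment:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of htmlparser.
class HtmlFileReader {

    // Reads the whole file at the given path and decodes it as UTF-8.
    static String readHtml(String path) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the question; pass another path as the first argument.
        String html = readHtml(args.length > 0 ? args[0] : "d:/index.html");
        // The markup itself, not the path, is what setInputHTML expects:
        //   Parser parser = new Parser();
        //   parser.setInputHTML(html);
        System.out.println(html.length() + " characters read");
    }
}
```

Passing the path to the constructor, as shown above, is still the shorter route; this only makes sense if you need the markup as a String anyway.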