libxml2 HTML Parse, document from htmlReadFile() always has a root header of (null)

Developer https://www.devze.com 2023-03-24 12:10 Source: Internet

I'm trying to use libxml to parse through some HTML.

Here's my code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#include <libxml/HTMLparser.h>

htmlDocPtr gethtml(char *doclocation,char *encoding) {
    htmlDocPtr doc;

    doc = htmlReadFile(doclocation, encoding, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);

    if (doc == NULL) {
        fprintf(stderr, "Document not parsed successfully.\n");
        return NULL;
    }

    return doc;
}

void getroot(htmlDocPtr doc) {
    xmlNode *cur = NULL;

    cur = xmlDocGetRootElement(doc);

    if (cur == NULL) {
        fprintf(stderr, "empty document\n");
        xmlFreeDoc(doc);
        return;
    }

    printf("%s\n", *cur);

}

int main(void) {
    char *website = "http://www.google.com/index.html";
    char *encoding = "UTF-8";
    htmlDocPtr doc;

    doc = gethtml(website, encoding);

    getroot(doc);

    return 0;
}

When I run the program, it always prints (null). I'm guessing that the document isn't being parsed correctly by htmlReadFile(), but I'm not sure how to fix it.

Any help would be appreciated.