libxml2 in C++, set encoding for parser - "Input is not proper UTF-8"

开发者 (https://www.devze.com), 2023-03-09 15:57. Source: web
I want to parse, in C++, simple status messages from a web service: XML fragments without an encoding attribute.

<message xmlns="http://violation.importer.xyz.de/xsd">
    Der Import-Datensatz mit der Bezeichung="blabla" und der Id=68809 wurde erfolgreich importiert.
</message>

They seem to be in ISO-8859-1. Can I tell the parser to use this encoding? The API is confusing to me.

Here's my code. The XML is in *it, a char* (it is an iterator, by the way):

xmlNodePtr root_element_ptr;
xmlDocPtr xmldoc_ptr;

xmldoc_ptr = xmlReadMemory(*it, strlen(*it), "it.xml", NULL, 0);
root_element_ptr = xmlDocGetRootElement(xmldoc_ptr);
xmlNodePtr msgnode = root_element_ptr->xmlChildrenNode;
xmlChar *message = xmlNodeListGetString(xmldoc_ptr, msgnode, 1);
response_msg += *message;
response_msg += " / ";
xmlCleanupParser();
xmlFreeDoc(xmldoc_ptr);

This works, but it segfaults on umlaut characters, and in my log I see:

it.xml:1: parser error : Input is not proper UTF-8, indicate encoding !

Bytes: 0xE4 0x72 0x7A 0x74

So which of these do I have to use? http://xmlsoft.org/html/libxml-encoding.html
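
For context on the error itself: in ISO-8859-1, 0xE4 is "ä", but in UTF-8 a byte of that form announces a three-byte sequence and must be followed by two continuation bytes in the range 0x80 to 0xBF. The next byte in the log, 0x72 ("r"), is not one, so a parser that assumes UTF-8 rejects the input. xmlReadMemory returns NULL when parsing fails, so the segfault is most likely the NULL document being dereferenced a few lines later. The structural check can be sketched as below. The function name first_invalid_utf8 is made up for illustration, and this is a simplified validator (it ignores overlong forms and surrogates), not libxml2's actual code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index of the first byte that breaks the
// UTF-8 lead/continuation structure, or std::string::npos if none does.
// Simplified on purpose: overlong encodings and surrogates are not rejected.
std::size_t first_invalid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) {
            extra = 0;                      // plain ASCII
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1;                      // 110xxxxx
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2;                      // 1110xxxx, e.g. 0xE4
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3;                      // 11110xxx
        } else {
            return i;                       // stray continuation or invalid lead
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            if (i + k >= s.size()) {
                return i;                   // sequence is cut off
            }
            const unsigned char next = static_cast<unsigned char>(s[i + k]);
            if ((next & 0xC0) != 0x80) {
                return i;                   // not a 10xxxxxx continuation byte
            }
        }
        i += extra + 1;
    }
    return std::string::npos;
}
```

Called on the four bytes from the log, this reports index 0 as invalid, while the UTF-8 encoding of the same text (0xC3 0xA4 0x72 0x7A 0x74) passes.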


After posting a problem here on SO, the solution often becomes clearer. Here's what I changed, and it works:

// Create a parser context so the input encoding can be passed explicitly.
xmlParserCtxtPtr ctxt_ptr = xmlNewParserCtxt();
xmldoc_ptr = xmlCtxtReadMemory(ctxt_ptr, *it, strlen(*it), "it.xml", "ISO-8859-1", 0);
//xmldoc_ptr = xmlReadMemory(*it, strlen(*it), "it.xml", NULL, 0);
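
Naming the encoding works because libxml2 keeps its strings (xmlChar) in UTF-8 internally and transcodes the input while reading it. For ISO-8859-1 that conversion is simple, since every byte value equals its Unicode code point. The sketch below shows the idea in plain C++. The function name latin1_to_utf8 is hypothetical, and this only illustrates the mapping; it is not how libxml2 implements it.

```cpp
#include <string>

// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
// Bytes below 0x80 are ASCII and are copied unchanged; bytes from 0x80 to
// 0xFF become the two-byte UTF-8 sequence for the code point of equal value.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte doubles
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // 110000xx
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // 10xxxxxx
        }
    }
    return out;
}
```

With this mapping the byte 0xE4 from the log becomes 0xC3 0xA4. One consequence is that the string returned by xmlNodeListGetString is UTF-8 rather than ISO-8859-1, so anything that consumes response_msg should expect UTF-8. The fourth argument of xmlReadMemory is also an encoding name, so passing "ISO-8859-1" there instead of NULL may work as well, without a separate parser context.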