LIBXML - how do I get the name of the tag?

Developer https://www.devze.com 2023-03-27 21:49 Source: Internet
I have the following:

my $string='<entry><name>Bob</name><zip>90210</zip></entry>';

my $parser=XML::LibXML->new(); 
use HTML::Entities;
my $encodedXml=encode_entities($string,'&\'');

my $doc=$parser->parse_string($encodedXml);

foreach my $text($doc->findnodes("//text()")){
print $text->to_literal,"\n";
}

This prints 'Bob' and '90210'.

How do I get the actual node names? I need a way to get all the nodes within my XML tree, i.e. 'name' and 'zip'.


Text nodes don't have names. Perhaps you want the name of the parent?

I think this will work:

for my $node ($doc->findnodes('//text()')) {
   print $node->parentNode()->nodeName(), ": ", $node->nodeValue(), "\n";
}

I would use

for my $node ($doc->findnodes('//*[text()]')) {
   print $node->nodeName(), ": ", $node->textContent(), "\n";
}

Note: This latter version uses textContent(), which concatenates all the text inside the element (including text inside nested elements), so it's not equivalent if a node has more than one text child. They should be equivalent for you, though.
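For reference, here is a minimal self-contained sketch of the element-based approach, run against the sample string from the question. The helper names element_names and named_text_pairs are made up for illustration and are not part of XML::LibXML; the XPath '//*' is an assumption about what "all the nodes" means, and it also returns the wrapping 'entry' element.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: the name of every element, in document order.
sub element_names {
    my ($xml) = @_;
    my $doc = XML::LibXML->new->parse_string($xml);
    return map { $_->nodeName } $doc->findnodes('//*');
}

# Hypothetical helper: [name, text] for each element that has
# at least one text-node child.
sub named_text_pairs {
    my ($xml) = @_;
    my $doc = XML::LibXML->new->parse_string($xml);
    return map { [ $_->nodeName, $_->textContent ] }
           $doc->findnodes('//*[text()]');
}

my $string = '<entry><name>Bob</name><zip>90210</zip></entry>';

print join(', ', element_names($string)), "\n";
print "$_->[0]: $_->[1]\n" for named_text_pairs($string);
```

With the sample input this should list entry, name, zip on the first line, followed by "name: Bob" and "zip: 90210". If you only want the leaf tags, the '//*[text()]' form is probably the one to use.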


What your code does is select the text nodes, which exist as children of the nodes you are looking for. A text node is a separate entity, and it does not have a name. You need to navigate to the text node's parent and that node will contain the tag name.

Things get trickier with mixed-content nodes that contain both text and element nodes, such as

<p>Beginning of <i>sentence</i> and now the end</p>

In this case the structure is

<p>
 |
 +---text (Beginning of )
 |
 +---<i>
 |    |
 |    +---text (sentence)
 |
 +---text ( and now the end)
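
The structure above can be inspected directly. The following is a rough sketch, assuming the default parser settings; describe_children is a hypothetical helper name, not an XML::LibXML method. It walks the direct children of the root element and reports whether each one is a text node or an element.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml);   # node-type constants such as XML_TEXT_NODE

# Hypothetical helper: describe each direct child of the root element
# as ['text', content] or ['element', tag name].
sub describe_children {
    my ($xml) = @_;
    my $doc  = XML::LibXML->new->parse_string($xml);
    my $root = $doc->documentElement;
    my @out;
    for my $child ($root->childNodes) {
        if ($child->nodeType == XML_TEXT_NODE) {
            push @out, [ 'text', $child->nodeValue ];
        }
        elsif ($child->nodeType == XML_ELEMENT_NODE) {
            push @out, [ 'element', $child->nodeName ];
        }
    }
    return @out;
}

my $mixed = '<p>Beginning of <i>sentence</i> and now the end</p>';
print "$_->[0]: '$_->[1]'\n" for describe_children($mixed);
```

For the <p> example this should report a text node, then the <i> element, then a second text node, so '//text()' with parentNode() would name <p> twice and <i> once.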