Perplexed by simple XPath bug

Developer https://www.devze.com 2023-03-13 23:48 Source: Internet
<?php
$response = 
'<style><div id="subhead"></div></style>';
//echo $response;

$doc = new DOMDocument();

$doc->loadHTML($response);  

$finder = new DomXPath($doc);

$term_select = $finder->query('//div[@id="subhead"]');

var_dump($term_select->item(0));

?>

The var_dump gets NULL, and I also get this Warning on line 8:

Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 on line 8

Note that this is not my HTML (I'm scraping), so changing the HTML is not an option.


The problem is that you can't have a DIV element inside a STYLE one, so when you use loadHTML(), it fails to validate the HTML. If you had called $doc->saveHTML(), you would have quickly realized that it's wrapping the <div id="subhead"> in CDATA.
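A quick way to check this yourself is sketched below, assuming the same $response string as in the question. libxml_use_internal_errors(true) is only there to collect the parser warnings instead of printing them, and the exact markup printed by saveHTML() may differ between libxml versions.

```php
<?php
$response = '<style><div id="subhead"></div></style>';

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($response);

// Dump what the HTML parser actually built from the input.
echo $doc->saveHTML();

// The content of <style> is kept as text, so no <div> element exists.
$finder = new DOMXPath($doc);
$nodes = $finder->query('//div[@id="subhead"]');
var_dump($nodes->length);

// Show any collected parser messages, then clear them.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();
```

If the query length comes back as 0, the DIV never made it into the tree as an element, which is why item(0) is NULL.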

To solve the problem, use loadXML() instead.

$doc->loadXml($response);
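Put together with the code from the question, that could look roughly like the sketch below. It only works because this particular string happens to be well-formed XML. Scraped pages often are not, so the return value of loadXML() is checked here (that check is an addition, not part of the original code).

```php
<?php
$response = '<style><div id="subhead"></div></style>';

$doc = new DOMDocument();

// loadXML() returns false when the input is not well-formed XML.
if ($doc->loadXML($response) === false) {
    exit("Input is not well-formed XML\n");
}

$finder = new DOMXPath($doc);
$term_select = $finder->query('//div[@id="subhead"]');

// With the XML parser, the <div> is a real element node.
$div = $term_select->item(0);
var_dump($div->nodeName);
var_dump($div->getAttribute('id'));
```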


loadHTML() expects to find HTML in the string, but that is not valid HTML, so the string does not get loaded properly. XPath will not have that <div> element to get to. Try loadXML() instead.
