Good evening dear community,
I need some help with preg_match. I want to optimize code that already runs very well: I want to get only the results, not the overhead of HTML tags in the result. That means I have to tailor the regex a bit. How can I improve the (already very nice) code?
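<?php
// Fetch the page; the URL placeholder is left exactly as posted.
$content = file_get_contents("< - URL - >");
// Debug output: dumps the raw HTML of the whole page.
var_dump($content);
// Lazily capture everything between <td> and </td> (case-insensitive, dot matches newlines).
$pattern = '/<td>(.*?)<\/td>/si';
preg_match_all($pattern, $content, $matches);
foreach ($matches[1] as $match) {
    // Remove nested tags and surrounding whitespace from each cell.
    $match = strip_tags($match);
    $match = trim($match);
    var_dump($match);
}
?>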
See the URL here: link text
Hmm, I need to tailor the regex a bit. Can anybody help me?
Any idea or tip will be greatly appreciated. Regards, zero
It appears that you are trying to scrape data from HTML pages. If so, you really should not use regular expressions to extract the information. Take a look at the DOMDocument class instead.
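As a rough illustration of the DOMDocument approach, here is a minimal sketch, not a definitive implementation. It assumes the data sits in plain td cells, as in the regex from the question. The sample markup in $content and the helper name extractCellTexts are made up for this example; in the real script $content would come from file_get_contents.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$content = '<table><tr><td> <b>Alpha</b> </td><td>Beta<br></td></tr></table>';

// Hypothetical helper: returns the trimmed text of every <td> in the markup.
function extractCellTexts(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('td') as $cell) {
        // textContent already excludes nested tags, so strip_tags is not needed.
        $texts[] = trim($cell->textContent);
    }
    return $texts;
}

$cells = extractCellTexts($content);
var_dump($cells);
```

Because textContent only returns the text nodes, nested markup such as b or br inside a cell does not end up in the result.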
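Note that DOMDocument's XML loader requires well-formed XML input, so a "tidying" step is often needed to prepare the HTML for being parsed as XML (DOMDocument::loadHTML() is more forgiving, but badly broken markup can still benefit from a cleanup). One convenient way to do this is to use the "tidy" extension. See "Tidying up your HTML with PHP 5" for an introduction to its use.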
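The tidying step could look roughly like the sketch below. It assumes the tidy extension is installed, and falls back to DOMDocument::loadHTML() when it is not. The sample markup in $broken, the helper name tidyCellTexts and the chosen tidy options are assumptions made for this example, not something taken from the linked article.

```php
<?php
// Hypothetical broken markup: the <td> elements are never closed.
$broken = '<table><tr><td>Alpha<td><b>Beta</b></table>';

// Hypothetical helper: repair the markup with tidy, parse it as XML,
// and return the trimmed text of every <td>.
function tidyCellTexts(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);

    $loaded = false;
    if (extension_loaded('tidy')) {
        // Ask tidy for XHTML with numeric entities, so the XML parser
        // does not trip over named entities such as &nbsp;.
        $xhtml = tidy_repair_string($html, [
            'output-xhtml'     => true,
            'numeric-entities' => true,
            'doctype'          => 'omit',
            'wrap'             => 0,
        ], 'utf8');
        $loaded = is_string($xhtml) && $xhtml !== '' && $doc->loadXML($xhtml);
    }
    if (!$loaded) {
        // Fallback when tidy is unavailable or its output is not valid XML.
        $doc->loadHTML($html);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('td') as $cell) {
        $texts[] = trim($cell->textContent);
    }
    return $texts;
}

var_dump(tidyCellTexts($broken));
```

getElementsByTagName is used here rather than XPath because the XHTML produced by tidy carries a default namespace, which an XPath query would have to register first.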
EDIT: How can I scrape a website with invalid HTML