Which regular expression do I need for this?

Developer https://www.devze.com 2023-03-31 19:30 Source: the web

I'm kinda stuck here.

I have this pattern:

<a class="title" href="showthread.php?t=XXXXX" id="thread_title_XXX">DATADATA</a>

I know that in my string (a webpage) all my data is stored in this format, always with the 'unique signature' I just wrote. The number of X's is dynamic, probably somewhere between 2 and 12 DIGITS (each X is a digit).

I can write a long expression to find the whole line, but I want to extract the data, not the whole thing.

How can I do it? An example would be appreciated.

Thank you!


Forget about regular expressions; they're not meant for parsing formats like HTML, especially when a real parser for the format already exists.

Find the nodes using XPath:

$html = <<<EOT

<html>
Some html
<a class="title" href="showthread.php?t=XXXXX" id="thread_title_XXX">DATADATA</a>
</html>

EOT;

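$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select every <a> whose href starts with "showthread.php"
foreach ($xpath->query('//a[starts-with(@href, "showthread.php")]') as $node) {
    // $node is a DOMElement; extract the pieces from it here
}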

Then, inside the loop, extract the data using substr, strpos and parse_str:

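$href = $node->getAttribute('href');                    // "showthread.php?t=XXXXX"
parse_str(substr($href, strpos($href, '?')+1), $query); // parse everything after "?"
$t = $query['t'];

$id = $node->getAttribute('id');                        // "thread_title_XXX"
$title = substr($id, strlen('thread_title_'));          // strip the fixed prefix

$data = $node->nodeValue;                               // the link text

var_dump($t, $title, $data);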

You get:

string(5) "XXXXX"
string(3) "XXX"
string(8) "DATADATA"
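Putting the two steps together, here is a minimal self-contained sketch that collects every thread link on a page into an array. The sample HTML is made up, and a few details differ from the answer by choice rather than necessity: the XPath also checks class="title" and the thread_title_ id prefix, parse_url() with PHP_URL_QUERY replaces the substr/strpos step, and libxml_use_internal_errors() keeps loadHTML() from printing warnings on messy real-world markup.

```php
<?php
// Hypothetical sample input: two thread links plus an unrelated link.
$html = '<html><body>'
    . '<a class="title" href="showthread.php?t=12345" id="thread_title_12345">First thread</a>'
    . '<a href="other.php">not a thread</a>'
    . '<a class="title" href="showthread.php?t=678" id="thread_title_678">Second &amp; last</a>'
    . '</body></html>';

// Real pages are rarely valid HTML; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match only thread-title links, using both the class and the id prefix.
$nodes = $xpath->query('//a[@class="title" and starts-with(@id, "thread_title_")]');

$threads = [];
foreach ($nodes as $node) {
    // parse_url() isolates the query string ("t=12345"); parse_str() splits it into an array.
    parse_str((string) parse_url($node->getAttribute('href'), PHP_URL_QUERY), $query);
    $threads[] = [
        't'     => $query['t'] ?? null,
        'title' => substr($node->getAttribute('id'), strlen('thread_title_')),
        'data'  => $node->nodeValue, // text content, entities such as &amp; already decoded
    ];
}

print_r($threads);
```

Because nodeValue returns decoded text, the second entry's data comes back as "Second & last" rather than the raw "&amp;" form.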


Try this:

$parsed_str = '<a class="title" href="showthread.php?t=45343" id="thread_title_XXX">DATADATA</a><a class="title" href="showthread.php?t=466666" id="thread_title_XXX">DATADATA</a> fasdfasdfsdfasd gfgfkgbc  04034kgs <fdfd> dfs</fdfa> <a class="title" href="showthread.php?t=7777" id="thread_title_XXX">DATADATA</a>';
// Capture the 2 to 12 digits after "?t=", up to the closing quote of the href
preg_match_all('/\?t=(\d{2,12})"/', $parsed_str, $result);
print_r($result); // $result[1] holds the captured t values: 45343, 466666, 7777
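The regex above only captures the t values. If the id number and the link text are needed too, the same idea can be extended with named groups. A rough sketch, assuming the attributes always appear exactly in the order shown in the question (class, href, id); the sample page is invented, and its third link is there to show that a 13-digit t value is rejected rather than truncated:

```php
<?php
// Hypothetical page fragment: two matching links, some noise, and a t value that is too long.
$page = '<a class="title" href="showthread.php?t=45343" id="thread_title_45343">First</a>'
    . ' noise <fdfd> dfs</fdfa> '
    . '<a class="title" href="showthread.php?t=466666" id="thread_title_466666">Second</a>'
    . '<a class="title" href="showthread.php?t=1234567890123" id="thread_title_1">Too long</a>';

// (?<name>...) gives each capture a readable key. \d{2,12}" requires 2 to 12 digits
// followed immediately by the closing quote, so longer numbers do not match at all.
// [^"]* keeps the id capture inside its attribute; the lazy (.*?) stops at the first </a>.
$pattern = '~<a class="title" href="showthread\.php\?t=(?<t>\d{2,12})" '
         . 'id="thread_title_(?<id>[^"]*)">(?<data>.*?)</a>~';

// PREG_SET_ORDER returns one array per link instead of one array per group.
preg_match_all($pattern, $page, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m['t'], ' | ', $m['id'], ' | ', $m['data'], "\n";
}
```

PREG_SET_ORDER is used here because grouping the results per link is usually easier to loop over than the default PREG_PATTERN_ORDER layout, where each group's values sit in a separate array.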


What exactly do you want to do? Get the XXXXX signature, or all the links?

Try this; it gets both the signature and the data:

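<?php
$S = '<a class="title" href="showthread.php?t=1234567" id="thread_title_XXX">DATADATA</a>';
// [^>]* and [^"]* stay inside one tag or attribute, and the lazy (.*?) stops at the
// first </a>, so the match cannot run across several links on the same line
$pattern = '!<a[^>]*href="showthread\.php\?t=(\d+)"[^>]*id="[^"]*">(.*?)</a>!';

echo "<pre>";
print_r(preg_match($pattern, $S, $res)); // 1 on a match, 0 otherwise
print_r($res);                           // $res[1] = t value, $res[2] = link text
echo "</pre>";
?>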
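One caveat with any of these patterns: the data group returns the link text as raw HTML, so entities like &amp; and inline tags like <b> come through unchanged (the DOM approach's nodeValue does not have this problem). A small sketch, using an invented link and a variant of the pattern, that cleans the captured text with strip_tags() and html_entity_decode():

```php
<?php
// Hypothetical link whose text contains an entity and an inline tag.
$S = '<a class="title" href="showthread.php?t=1234567" id="thread_title_1234567"><b>Q&amp;A</b> thread</a>';
$pattern = '!<a[^>]*href="showthread\.php\?t=(\d+)"[^>]*>(.*?)</a>!';

if (preg_match($pattern, $S, $res)) {
    $raw = $res[2]; // raw HTML: '<b>Q&amp;A</b> thread'
    // Drop inline tags first, then turn entities such as &amp; back into characters.
    $text = html_entity_decode(strip_tags($raw), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    echo $res[1], ': ', $text, "\n"; // prints "1234567: Q&A thread"
}
```

Stripping tags before decoding matters: decoding first would turn an escaped "&lt;b&gt;" in a title into a real tag that strip_tags() would then remove.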