I'm trying to parse a Twitter atom feed in PHP but am running into this strange issue. I'm calling preg_match_all with this regexp string:
"|<entry>.*<title>(.*)</title>.*<published>(.*)</published>.*</entry>|xsU"
It matches all the entries OK, but the captured subgroups title/published do not show up in the results (no arrays for the captured subgroups are created in the result object).
Now to the strange part, I try to capture the last bit as well:
"|<entry>.*<title>(.*)</title>.*<published>(.*)</published>(.*)</entry>|xsU"
And now the capturing works. I get the title and the published date and the large chunk of final data that I don't want.
I tried to add the non-capturing marker "?:" to the last subgroup, but then capturing stopped working altogether again.
So how do I capture the data I want, without having to capture the large chunk of unwanted data at the end?
I recommend you use DOM (or SimpleXML) for parsing RSS/Atom feeds. You will get way better results than with regular expressions.
Here's an example (using SimpleXML):
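// Download the raw Atom XML as a string
$rss_feed = file_get_contents('http://stackoverflow.com/feeds/question/4187945');
// Parse it into a SimpleXMLElement tree
$sxml = new SimpleXMLElement($rss_feed);
// Read the title of the first <entry>
$title = $sxml->entry[0]->title;
echo $title;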
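To get the title and published date of every entry, which is what the question asks for, the same approach can be extended with a loop. The following is a minimal sketch, not a definitive implementation: the inline `$atom` string is a made-up sample standing in for the real Twitter feed, and the `$items` array is just an illustrative name for collecting the results.

```php
<?php
// Hypothetical sample feed; in practice this string would come from
// file_get_contents() on the real feed URL.
$atom = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>First tweet</title>
    <published>2010-11-15T10:00:00Z</published>
    <content>Data we do not want</content>
  </entry>
  <entry>
    <title>Second tweet</title>
    <published>2010-11-15T11:30:00Z</published>
    <content>More data we do not want</content>
  </entry>
</feed>
XML;

$feed = new SimpleXMLElement($atom);

// Collect only the two fields of interest from each <entry>.
$items = [];
foreach ($feed->entry as $entry) {
    $items[] = [
        // Cast to string so we store plain text, not SimpleXMLElement objects.
        'title'     => (string) $entry->title,
        'published' => (string) $entry->published,
    ];
}

print_r($items);
```

Because only the needed child elements are read, the rest of each entry is simply ignored, so there is no unwanted trailing chunk to capture.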