Regular expression to extract another expression from a string with delimiters

Developer https://www.devze.com 2023-03-30 05:14 Source: the web
This question is a little odd. I have spent a fair while pushing my knowledge of regular expressions to get this far, but I'm stuck on the last bit. The problem is as follows:

I have a string (which denotes a URL in a routing system I'm modifying) that may contain a regular expression to match some segment. For example:

$pattern = "/some/path/to/</[a-z]+/>regex_var1/location";

The important bits to note here are:

  • The regular expression is delimited within the string by </ and /> (for legacy reasons this is not really negotiable; I would prefer to leave it as is).
  • The text after the /> (regex_var1) is a name for the match of this parameter. I need to keep it out of the expression to stay compatible with the rest of the system; it can be ignored in this context.
  • This URL pattern would match /some/path/to/another/location
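
To make the format described in the list above concrete, here is a minimal sketch of how one such segment could be taken apart and tested against a URI segment. The function name parse_regex_segment is hypothetical and not part of the routing system in question, and the sketch assumes the embedded expression never contains a # character.

```php
<?php
// Hypothetical helper (not from the routing system in the question):
// split one segment of the form "</REGEX/>NAME" into its regex and name.
function parse_regex_segment(string $segment): ?array
{
    // '#' is the delimiter, so the slashes need no escaping.
    if (preg_match('#^</(.*)/>(.*)$#', $segment, $m) !== 1) {
        return null; // a plain segment, not a regex segment
    }
    return ['regex' => $m[1], 'name' => $m[2]];
}

$parts = parse_regex_segment('</[a-z]+/>regex_var1');
// $parts['regex'] is '[a-z]+' and $parts['name'] is 'regex_var1'

// Anchor the embedded expression so it has to match the whole URI segment.
// Assumption: the embedded expression does not contain '#'.
$matches_another = preg_match('#^(?:' . $parts['regex'] . ')$#', 'another') === 1;
var_dump($matches_another); // bool(true)
```

With this, the segment from the example pattern accepts "another", which is why the full pattern would match /some/path/to/another/location.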

What I want to achieve is to split a given format (as in the example above) into segments. These segments are used in a backtracking tree traversal to match a request URI with a controller. At present regular expressions are not supported; my intention is to allow them. In the past each segment was delimited by a /, but I need / characters inside the contained regular expression. If I use it in its current form, the expression is split across two segments. For example:

$pattern = "/some/</([a-z]+)(/optional)?/>regex2/location";
$segments = preg_split('/(?<!<)\/(?!>)/', $pattern);

yields 4 segments (not counting the leading empty string)

// print_r($segments)
Array
(
    [0] => 
    [1] => some
    [2] => </([a-z]+)(
    [3] => optional)?/>regex2
    [4] => location
)

when I actually want only 3

// print_r($segments)
Array
(
    [0] => 
    [1] => some
    [2] => </([a-z]+)(/optional)?/>regex2
    [3] => location
)

I am not interested in matching the whole URL with a regular expression, which would defeat the whole point of the exercise. This problem might seem unwarranted in isolation, but details about why I am after this specific implementation are beyond the scope of the question.


Hm, I cannot see an easy way to do it with a regexp only. You might first parse out the regexes (/<\/.*?\/>[^\/]*/), store them in an array and replace each with a simple placeholder that cannot clash (e.g. $1), then run your split and reinsert the regexes.
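
A rough sketch of that placeholder idea might look like the following. The NUL-delimited placeholder format and the variable names are assumptions of this sketch rather than anything from the answer; any token that contains no / and cannot occur in a real pattern would serve.

```php
<?php
$pattern = '/some/</([a-z]+)(/optional)?/>regex2/location';

// Step 1: swap every "</.../>name" chunk for a slash-free placeholder.
// The "\0N\0" placeholder format is an assumption of this sketch; any token
// that cannot occur in a real pattern and contains no '/' would do.
$stash = [];
$masked = preg_replace_callback(
    '#</.*?/>[^/]*#',
    function (array $m) use (&$stash): string {
        $key = "\0" . count($stash) . "\0";
        $stash[$key] = $m[0];
        return $key;
    },
    $pattern
);

// Step 2: split on '/' as usual; the placeholders contain no slashes.
// Step 3: restore the stashed chunks inside each segment.
$segments = [];
foreach (explode('/', $masked) as $segment) {
    $segments[] = strtr($segment, $stash);
}

print_r($segments);
```

For the pattern above this gives an empty first element followed by some, the intact regex segment and location. Like the expression in the answer, it assumes the embedded regex never contains the sequence /> itself.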


Another way to do it:

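$str = "/some/</([a-z]+)(/optional)?/>regex2/location";
$out_segments = array();
$in_regex = false; // true while we are between "</" and "/>"
// '+' is the regex delimiter here; the pattern itself is just "/"
foreach (preg_split('+/+', $str) as $segment) {
    if ($in_regex) {
        // A piece starting with '>' is the closing "/>name" part
        if (substr($segment, 0, 1) === '>') {
            $in_regex = false;
        }
        // Glue the piece back on, restoring the '/' removed by the split
        $out_segments[count($out_segments) - 1] .= "/$segment";
        continue;
    }
    // A piece ending in '<' is the opening "</" of a regex
    if (substr($segment, -1, 1) === '<') {
        // Keep any text before the '<' as a segment of its own
        $segment = substr($segment, 0, -1);
        if ($segment !== '') {
            $out_segments[] = $segment;
        }
        $in_regex = true;
        $segment = '<';
    }
    // Skip empty pieces such as the one before the leading '/'
    if ($segment !== '') {
        $out_segments[] = $segment;
    }
}
var_dump($out_segments);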

Edit: The wrong pseudocode looked much easier. The idea is not that bad, though.


You could try splitting the string into its components first, and then processing it afterwards:

$url = '/some/location/</([a-z]+)(/optional)?/>regex2/here/or/there';
$reg = '#(.*?)(</.*?/>.*?(?=/|$))(.*)?#';
if( preg_match($reg, $url, $matches) ) {
    $result = array_merge(
        preg_split( '#/#', $matches[1], 0, PREG_SPLIT_NO_EMPTY),
        array( $matches[2] ),
        preg_split( '#/#', $matches[3], 0, PREG_SPLIT_NO_EMPTY)
    );
    print_r( $result );    
}

Array
(
    [0] => some
    [1] => location
    [2] => </([a-z]+)(/optional)?/>regex2
    [3] => here
    [4] => or
    [5] => there
)

The regex should always be in $matches[2], so you can find it, no matter where it occurs in the URL.
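
The expression above captures a single embedded regex. If a pattern may contain several, one possible variation (a sketch only, not taken from the answers) is to tokenize with preg_match_all and an alternation that consumes either a whole regex segment or a plain slash-free segment. The function name split_route is hypothetical, and the sketch assumes every </ starts directly after a / and that no embedded regex contains the sequence />.

```php
<?php
// Hypothetical helper: each alternative consumes exactly one segment.
function split_route(string $pattern): array
{
    // First alternative: "</" ... "/>" plus the trailing name (up to the next '/').
    // Second alternative: an ordinary segment without any '/'.
    preg_match_all('#</.*?/>[^/]*|[^/]+#', $pattern, $m);
    return $m[0];
}

print_r(split_route('/a/</x(/y)?/>first/b/</[0-9]+/>second'));
// Segments: a, </x(/y)?/>first, b, </[0-9]+/>second
```

Because empty strings can never match either alternative, the result contains no empty entries, much like PREG_SPLIT_NO_EMPTY in the code above.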
