Why doesn't Python "grouping" work for regular expressions in C?

Developer https://www.devze.com 2023-01-24 09:46 Source: Internet
Here is my Python program:

import re

print(re.findall(r"([se]{2,30})ting", "testingtested"))

Its output is:

['es']

This is what I expect: I searched for a run of 2 to 30 characters, each either "e" or "s", followed by "ting", and only the parenthesized group "es" comes back.

Here is my C program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>

int main(void) {

    regex_t preg;
    regmatch_t pmatch;

    char string[] = "testingtested";

    //Compile the regular expression
    if ( regcomp( &preg, "([se]{2,30})ting", REG_EXTENDED ) ) {
        printf( "ERROR!\n" );
        return -1;
    } else {
        printf( "Compiled\n" );
    }

    //Do the search
    if ( regexec( &preg, string, 1, &pmatch, REG_NOTEOL ) ) {
        printf( "No Match\n" );
    } else {

        //Allocate memory on the stack for this
        char substring[pmatch.rm_eo - pmatch.rm_so + 1];

        //Copy the substring over
        printf( "%d %d\n", pmatch.rm_so, pmatch.rm_eo );
        strncpy( substring, &string[pmatch.rm_so], pmatch.rm_eo - pmatch.rm_so );

        //Make sure there's a null byte
        substring[pmatch.rm_eo - pmatch.rm_so] = 0;

        //Print it out
        printf( "Match\n" );
        printf( "\"%s\"\n", substring );
    }

    //Release the regular expression
    regfree( &preg );

    return EXIT_SUCCESS;
}

Its output is:

Compiled
1 7
Match
"esting"

Why is the C program including the "ting" in the result? And is there a way for me to exclude the "ting" portion?


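pmatch holds the offsets of the whole match, not of the first parenthesized subexpression. regexec always stores the whole match in element 0 of the array and each parenthesized subexpression in the elements after it.

Try changing pmatch to an array of 2 elements, passing 2 in place of 1 (and pmatch in place of &pmatch) to regexec, and using the [1] element to get the subexpression match.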

To others who have cited differences between C and Python and different types of regular expressions, that's all unrelated. This expression is very simple and that's not coming into play.


While regular expressions are "more or less the same everywhere", the exact supported features differ from implementation to implementation.

Unfortunately, you need to consult each regex library's documentation separately when designing your regular expressions.
