How can I get the Regex Groups for a given Capture?

Developers https://www.devze.com 2023-03-04 17:13 Source: web

I'm parsing CSS3 selectors using a regex. For example, the selector a>b,c+d is broken down into:

  Selector:
    a>b
    c+d
  SOSS:
    a
    b
    c
    d
  TypeSelector:
    a
    b
    c
    d
  Identifier:
    a
    b
    c
    d
  Combinator:
    >
    +

The problem is, for example, I don't know which selector the > combinator belongs to. The Selector Group has 2 captures (as shown above), each containing 1 combinator. I want to know what that combinator is for that capture.

Groups have lists of Captures, but Captures don't have lists of Groups found in that Capture. Is there a way around this, or should I just re-parse each selector?


Edit: Each capture does give you the index of where the match occurred though... maybe I could use that information to determine what belongs to what?
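To make the index idea concrete, here is a minimal sketch that lists what .NET exposes for each capture. The pattern is a simplified stand-in for the grammar in the question (an assumption: plain identifiers joined by `>` or `+`, selectors separated by commas), and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDump
{
    // Simplified stand-in for the question's grammar (an assumption):
    // identifiers joined by '>' or '+', selectors separated by commas.
    // .NET lets the same group name appear several times in one pattern;
    // all occurrences feed the same Group and its Captures list.
    private const string Selector =
        @"(?<Selector>(?<SOSS>\w+)(?:(?<Combinator>[>+])(?<SOSS>\w+))*)";

    private static readonly Regex Gos =
        new Regex("^" + Selector + "(?:," + Selector + ")*$");

    // Returns one line per capture: group name, text, and the half-open
    // index range [Index, Index + Length) within the input string.
    public static List<string> Describe(string input)
    {
        var lines = new List<string>();
        Match m = Gos.Match(input);
        if (!m.Success) return lines;

        foreach (string name in new[] { "Selector", "Combinator" })
        {
            foreach (Capture c in m.Groups[name].Captures)
            {
                lines.Add($"{name} '{c.Value}' [{c.Index}, {c.Index + c.Length})");
            }
        }
        return lines;
    }
}
```

For the input `a>b,c+d`, `CaptureDump.Describe` reports the two Selector captures at `[0, 3)` and `[4, 7)` and the two Combinator captures at `[1, 2)` and `[5, 6)`, so each combinator's index does fall inside exactly one selector's range.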


So you don't think I'm insane, the syntax is actually quite simple, using my special dict class:

var flex = new FlexDict
    {
        {"GOS"/*Group of Selectors*/, @"^\s*{Selector}(\s*,\s*{Selector})*\s*$"},
        {"Selector", @"{SOSS}(\s*{Combinator}\s*{SOSS})*{PseudoElement}?"},
        {"SOSS"/*Sequence of Simple Selectors*/, @"({TypeSelector}|{UniversalSelector}){SimpleSelector}*|{SimpleSelector}+"},
        {"SimpleSelector", @"{AttributeSelector}|{ClassSelector}|{IDSelector}|{PseudoSelector}"},

        {"TypeSelector", @"{Identifier}"},
        {"UniversalSelector", @"\*"},
        {"AttributeSelector", @"\[\s*{Identifier}(\s*{ComparisonOperator}\s*{AttributeValue})?\s*\]"},
        {"ClassSelector", @"\.{Identifier}"},
        {"IDSelector", @"#{Identifier}"},
        {"PseudoSelector", @":{Identifier}{PseudoArgs}?"},
        {"PseudoElement", @"::{Identifier}"},

        {"PseudoArgs", @"\([^)]*\)"},

        {"ComparisonOperator", @"[~^$*|]?="},
        {"Combinator", @"[ >+~]"},

        {"Identifier", @"-?[a-zA-Z\u00A0-\uFFFF_][a-zA-Z\u00A0-\uFFFF_0-9-]*"},

        {"AttributeValue", @"{Identifier}|{String}"},
        {"String", @""".*?(?<!\\)""|'.*?(?<!\\)'"},
    };


You shouldn't write one regex to parse the whole thing. Instead, first extract the selectors, then extract the combinators from each selector separately. (At least that's how you would parse your example; real CSS is going to be more complicated.)
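A rough sketch of that two-pass idea follows. It assumes selectors simple enough that splitting on `,` is safe and that only the explicit combinators `>`, `+` and `~` matter; real CSS also has the whitespace (descendant) combinator, and these characters can appear inside strings, attribute selectors and pseudo-class arguments. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TwoPassSelectors
{
    // Assumption: only explicit combinators are of interest, and they never
    // appear inside attribute selectors or strings in the input.
    private static readonly Regex ExplicitCombinator = new Regex(@"[>+~]");

    // Pass 1: split the group into selectors.
    // Pass 2: scan each selector on its own for combinators.
    public static List<(string Selector, List<string> Combinators)> Parse(string group)
    {
        var result = new List<(string Selector, List<string> Combinators)>();
        foreach (string part in group.Split(','))
        {
            string selector = part.Trim();
            var combinators = new List<string>();
            foreach (Match m in ExplicitCombinator.Matches(selector))
            {
                combinators.Add(m.Value);
            }
            result.Add((selector, combinators));
        }
        return result;
    }
}
```

Because each selector is scanned separately, the question of which selector a combinator belongs to never arises: `TwoPassSelectors.Parse("a>b, c+d")` pairs `a>b` with `>` and `c+d` with `+`.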


"Each capture does give you the index of where the match occurred though... maybe I could use that information to determine what belongs to what?"

Just thinking aloud here: you could take each capture in the Selector group, compute its start and end indices (Capture.Index and Capture.Index + Capture.Length, both relative to the input string), and check whether each combinator's index falls within that range. If it does, the combinator occurs in that selector.

I'm not sure how this would fare in terms of performance though. But I think you could make it work.
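The range check described above might look like the sketch below. It again uses a simplified pattern rather than the full grammar from the question (an assumption), and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CombinatorsBySelector
{
    // Simplified stand-in for the question's grammar (an assumption):
    // identifiers joined by '>' or '+', selectors separated by commas.
    private const string Selector =
        @"(?<Selector>(?<SOSS>\w+)(?:(?<Combinator>[>+])(?<SOSS>\w+))*)";

    private static readonly Regex Gos =
        new Regex("^" + Selector + "(?:," + Selector + ")*$");

    // Pairs every Selector capture with the Combinator captures whose
    // Index lies inside the selector's [Index, Index + Length) range.
    public static List<(string Selector, List<string> Combinators)> Assign(string input)
    {
        var result = new List<(string Selector, List<string> Combinators)>();
        Match m = Gos.Match(input);
        if (!m.Success) return result;

        CaptureCollection combinators = m.Groups["Combinator"].Captures;
        foreach (Capture sel in m.Groups["Selector"].Captures)
        {
            int start = sel.Index;
            int end = sel.Index + sel.Length; // exclusive upper bound
            var inside = new List<string>();
            foreach (Capture comb in combinators)
            {
                if (comb.Index >= start && comb.Index < end)
                {
                    inside.Add(comb.Value);
                }
            }
            result.Add((sel.Value, inside));
        }
        return result;
    }
}
```

The nested loop costs selectors times combinators comparisons, which should be negligible for typical selector lists. If it ever mattered, both capture lists could be walked once in step, assuming they are in ascending index order, as they are for this left-to-right pattern.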


I wouldn't recommend using regex for parsing anything. Except for very simple cases, parsers are almost always a better choice. Take a look at this question:

Is there a CSS parser for C#?
