
Please help clarify my regex pattern

Developer https://www.devze.com 2023-02-04 20:17 Source: web
I have the following string:

<script>m('02:29:1467301/>Sender1*>some text message?<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>

N.B. messages are separated by <<

I need to extract the following parts from each message:

1. Time

2. Sender

3. Recipient

4. Text

The recipient may or may not be present; this field is optional.

I currently do this with the following pattern:

(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>(?<messageData>(?<sender>.+?)\*>(.+?)))<<
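A minimal Python sketch of what this first pattern captures on the sample string. Python's re module spells named groups (?P<name>...), so the pattern is translated accordingly; the variable names are illustrative. The question doesn't say which regex flags are in use, so this assumes case-insensitive matching: the second timestamp, 13625N1, contains an uppercase N that [0-9a-z] would otherwise reject.

```python
import re

# Sample input from the question.
PAYLOAD = "<script>m('02:29:1467301/>Sender1*>some text message?<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>"

# The question's first pattern, with (?<name>...) rewritten as (?P<name>...).
FIRST = re.compile(
    r"(?P<message>(?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>"
    r"(?P<messageData>(?P<sender>.+?)\*>(.+?)))<<",
    re.IGNORECASE,  # assumption: lets [0-9a-z]+ accept the uppercase N in 13625N1
)

# Group 5 is the unnamed (.+?) group holding everything after "*>".
bodies = [m.group(5) for m in FIRST.finditer(PAYLOAD)]
print(bodies)
# ['some text message?', 'Recipient2: another message??<>A']
```

The recipient of the second message stays inside the captured body.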

But I cannot extract the recipient separately from the message text. This attempt does not work:

(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>(?<messageData>(?<sender>.+?)\*>(((?<recipient>.+?):){0,1}(?<messageText>.+?))))<<

N.B. The first message has no recipient.

Please help correct my pattern.


The <recipient> group pattern needs to exclude both < and the colon. Otherwise, when the recipient is omitted (as in the first message of your example), it matches everything from *> up to the first colon of the next message's timestamp.
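To see this failure concretely, here is a small Python sketch running the question's second pattern, with named groups written in Python's (?P<name>...) spelling. On the first message, the lazy .+? recipient keeps extending until a colon follows, and the first colon it reaches is the one inside the next message's timestamp.

```python
import re

# Sample input from the question.
PAYLOAD = "<script>m('02:29:1467301/>Sender1*>some text message?<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>"

# The question's attempt: recipient is .+? followed by ':', made optional with {0,1}.
BROKEN = re.compile(
    r"(?P<message>(?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>"
    r"(?P<messageData>(?P<sender>.+?)\*>"
    r"(((?P<recipient>.+?):){0,1}(?P<messageText>.+?))))<<"
)

m = BROKEN.search(PAYLOAD)
# The recipient swallows the '<<' separator and the start of the next timestamp.
print(repr(m.group("recipient")))
# 'some text message?<<02'
print(repr(m.group("messageText")))
# '29:13625N1/>Sender2*>Recipient2: another message??<>A'
```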

A simple tweak to that group pattern should fix it:

(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>(?<messageData>(?<sender>.+?)\*>(((?<recipient>[^<:]+):)?(?<messageText>.+?))))<<

Note that I replaced {0,1} with ?, the equivalent optional quantifier. It's just shorthand that improves readability (a little goes a long way). :-)
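A minimal Python sketch of the corrected pattern applied to the whole string. Python writes named groups as (?P<name>...), so the pattern is translated accordingly. Case-insensitive matching is an assumption made so that the 13625N1 timestamp is accepted; with case-sensitive matching, only the first message matches.

```python
import re

# Sample input from the question.
PAYLOAD = "<script>m('02:29:1467301/>Sender1*>some text message?<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>"

# The corrected pattern in Python syntax. [^<:]+ stops the recipient at '<' or ':',
# so it can no longer run across '<<' into the next message's timestamp.
FIXED = re.compile(
    r"(?P<message>(?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>"
    r"(?P<messageData>(?P<sender>.+?)\*>"
    r"(((?P<recipient>[^<:]+):)?(?P<messageText>.+?))))<<",
    re.IGNORECASE,  # assumption: lets [0-9a-z]+ accept the uppercase N in 13625N1
)

# recipient is None when the optional group did not take part in the match.
results = [
    m.group("time", "sender", "recipient", "messageText")
    for m in FIXED.finditer(PAYLOAD)
]
for row in results:
    print(row)
# ('02:29:1467301', 'Sender1', None, 'some text message?')
# ('02:29:13625N1', 'Sender2', 'Recipient2', ' another message??<>A')
```

The message text keeps the space after the recipient's colon, so strip() it if that matters. The third segment uses => rather than />, so it does not match this pattern.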

Speaking of readability, here it is in multi-line form:

(?<message>
    (?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>
    (?<messageData>
        (?<sender>.+?)\*>
        (
          ((?<recipient>[^<:]+):)?
          (?<messageText>.+?)
        )
    )
)<<
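This multi-line layout is only a valid pattern when the engine ignores whitespace inside the pattern. In Python that is the re.VERBOSE flag (many other engines accept the inline (?x) flag instead). A minimal sketch, with named groups in Python's (?P<name>...) spelling and case-insensitive matching assumed so the 13625N1 timestamp matches:

```python
import re

# Sample input from the question.
PAYLOAD = "<script>m('02:29:1467301/>Sender1*>some text message?<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>"

# re.VERBOSE ignores the newlines and indentation below, so the layout is free.
PATTERN = re.compile(
    r"""
    (?P<message>
        (?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>
        (?P<messageData>
            (?P<sender>.+?)\*>
            (
              ((?P<recipient>[^<:]+):)?
              (?P<messageText>.+?)
            )
        )
    )<<
    """,
    re.VERBOSE | re.IGNORECASE,  # IGNORECASE is an assumption, see above
)

rows = [m.group("time", "sender", "recipient", "messageText") for m in PATTERN.finditer(PAYLOAD)]
for row in rows:
    print(row)
```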

I don't know whether the unnamed group containing <recipient> and <messageText> was intentional, but it's unnecessary. You can simplify the pattern to this:

(?<message>
    (?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>
    (?<messageData>
        (?<sender>.+?)\*>
        ((?<recipient>[^<:]+):)?
        (?<messageText>.+?)
    )
)<<
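One way to check that dropping the unnamed group changes only the group numbering, not what gets captured, is to compile both multi-line versions in Python and compare. This sketch assumes re.VERBOSE (so the layout whitespace is ignored), named groups written (?P<name>...), and case-insensitive matching so the 13625N1 timestamp matches; the helper named_captures is hypothetical, written just for this comparison.

```python
import re

# Sample input from the question.
PAYLOAD = "<script>m('02:29:1467301/>Sender1*>some text message?<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>"

WITH_EXTRA_GROUP = r"""
(?P<message>
    (?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>
    (?P<messageData>
        (?P<sender>.+?)\*>
        (
          ((?P<recipient>[^<:]+):)?
          (?P<messageText>.+?)
        )
    )
)<<
"""

SIMPLIFIED = r"""
(?P<message>
    (?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>
    (?P<messageData>
        (?P<sender>.+?)\*>
        ((?P<recipient>[^<:]+):)?
        (?P<messageText>.+?)
    )
)<<
"""

FLAGS = re.VERBOSE | re.IGNORECASE

def named_captures(pattern):
    """Return the named groups of every match as dicts, ignoring unnamed groups."""
    return [m.groupdict() for m in re.finditer(pattern, PAYLOAD, FLAGS)]

print(re.compile(WITH_EXTRA_GROUP, FLAGS).groups)  # 8 capturing groups
print(re.compile(SIMPLIFIED, FLAGS).groups)        # 7 capturing groups
print(named_captures(WITH_EXTRA_GROUP) == named_captures(SIMPLIFIED))  # True
```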


Check this out; it may fit a little better:

(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]*).+?>(?<messageData>(?<sender>.*?)>(((?<recipient>[^<:]+):)?(?<messageText>.*?))))<<

P.S. Hi there ;)
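For comparison, a Python sketch running this alternative pattern on the sample string, with named groups written (?P<name>...) and no flags. It behaves somewhat differently from the previous answer's pattern: the sender keeps its trailing *, the second timestamp stops at 13625 (the leftover N1/ is absorbed by .+?>), and the third segment, 02:29:1393100=>User1*|0User2*|%></B>, is matched too, because .+?> also accepts =>.

```python
import re

# Sample input from the question.
PAYLOAD = "<script>m('02:29:1467301/>Sender1*>some text message?<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>"

# The alternative pattern, translated to Python's named-group syntax.
ALT = re.compile(
    r"(?P<message>(?P<time>\d{1,2}:\d{1,2}:[0-9a-z]*).+?>"
    r"(?P<messageData>(?P<sender>.*?)>"
    r"(((?P<recipient>[^<:]+):)?(?P<messageText>.*?))))<<"
)

rows = [m.group("time", "sender", "recipient", "messageText") for m in ALT.finditer(PAYLOAD)]
for row in rows:
    print(row)
# ('02:29:1467301', 'Sender1*', None, 'some text message?')
# ('02:29:13625', 'Sender2*', 'Recipient2', ' another message??<>A')
# ('02:29:1393100', 'User1*|0User2*|%', None, '</B>')
```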

