I have the following string:
<script>m('02:29:1467301/>Sender1*>some text message?<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>
N.B. Messages are separated by <<
I need to extract the following parts from each message:
1. Time
2. Sender
3. Recipient
4. Text
The recipient may or may not be present; this field is optional.
I do this with the following pattern:
(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>(?<messageData>(?<sender>.+?)\*>(.+?)))<<
But I cannot extract the recipient separately from the message text. My attempt so far:
(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>(?<messageData>(?<sender>.+?)\*>(((?<recipient>.+?):){0,1}(?<messageText>.+?))))<<
N.B. The first message has no recipient.
Please help me correct my pattern.
The <recipient> group pattern needs to exclude < and : or else it will match the text between *> and the first colon of the next message's timestamp when the recipient is omitted (as in the first message of your example).
A simple tweak to that group pattern should fix it:
(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>(?<messageData>(?<sender>.+?)\*>(((?<recipient>[^<:]+):)?(?<messageText>.+?))))<<
Note I replaced {0,1} with the optional quantifier (?). It's just shorthand to improve readability (a little goes a long way). :-)
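To see the difference on the sample string, here is a minimal Python sketch. Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the patterns are translated accordingly; the helper build_pattern and the variable names are only for illustration and are not part of the original patterns.

```python
import re

# Sample input from the question.
text = (
    "<script>m('02:29:1467301/>Sender1*>some text message?<<"
    "02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<"
    "02:29:1393100=>User1*|0User2*|%></B><<','');</script>"
)


def build_pattern(recipient: str) -> str:
    """Return the full pattern with the given recipient sub-pattern (illustrative helper)."""
    return (
        r"(?P<message>(?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>"
        r"(?P<messageData>(?P<sender>.+?)\*>"
        r"(((?P<recipient>" + recipient + r"):)?(?P<messageText>.+?))))<<"
    )


# Original recipient group: .+? may run across "<<" into the next message.
before = re.search(build_pattern(r".+?"), text)
# Fixed recipient group: [^<:]+ cannot cross "<" or ":".
after = re.search(build_pattern(r"[^<:]+"), text)

print(repr(before.group("recipient")))   # 'some text message?<<02'
print(repr(after.group("recipient")))    # None
print(repr(after.group("messageText")))  # 'some text message?'
```

With the original group, the "recipient" of the first message swallows the separator and the start of the next timestamp; with the character class, the group simply does not participate and the text is captured on its own.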
Speaking of readability, here it is in multi-line form:
(?<message>
  (?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>
  (?<messageData>
    (?<sender>.+?)\*>
    (
      ((?<recipient>[^<:]+):)?
      (?<messageText>.+?)
    )
  )
)<<
I don't know if the unnamed group containing <recipient> and <messageText> was intentional, but it's unnecessary. You can break it down to this:
(?<message>
  (?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>
  (?<messageData>
    (?<sender>.+?)\*>
    ((?<recipient>[^<:]+):)?
    (?<messageText>.+?)
  )
)<<
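A multi-line pattern like this only works if the engine is told to ignore whitespace inside the pattern. As a rough sketch in Python, that is the re.VERBOSE flag, and named groups are written (?P<name>...). The re.IGNORECASE flag below is my own assumption: the second sample timestamp contains an uppercase N, which [0-9a-z] would otherwise reject.

```python
import re

# Sample input from the question.
text = (
    "<script>m('02:29:1467301/>Sender1*>some text message?<<"
    "02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<"
    "02:29:1393100=>User1*|0User2*|%></B><<','');</script>"
)

# re.VERBOSE ignores the whitespace and line breaks used for layout.
# re.IGNORECASE is an assumption added for the "N" in the second timestamp.
MESSAGE = re.compile(
    r"""
    (?P<message>
      (?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>
      (?P<messageData>
        (?P<sender>.+?)\*>
        ((?P<recipient>[^<:]+):)?
        (?P<messageText>.+?)
      )
    )<<
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Collect (time, sender, recipient, text) for every message that matches.
rows = [
    (m.group("time"), m.group("sender"), m.group("recipient"), m.group("messageText"))
    for m in MESSAGE.finditer(text)
]

for row in rows:
    print(row)
```

On the sample this yields two rows: the first with no recipient, the second with Recipient2. The third message uses => rather than /> after the timestamp, so this pattern skips it.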
Check this out; it may fit a little better:
(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]*).+?>(?<messageData>(?<sender>.*?)>(((?<recipient>[^<:]+):)?(?<messageText>.*?))))<<
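For a quick check of what this looser pattern captures on the sample string, here is a small Python sketch (named groups translated to Python's (?P<name>...) form; the variable names are only illustrative).

```python
import re

# Sample input from the question.
text = (
    "<script>m('02:29:1467301/>Sender1*>some text message?<<"
    "02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<"
    "02:29:1393100=>User1*|0User2*|%></B><<','');</script>"
)

# The looser pattern: any characters up to ">" may follow the timestamp,
# and the sender runs up to the next ">" instead of a literal "*>".
LOOSE = re.compile(
    r"(?P<message>(?P<time>\d{1,2}:\d{1,2}:[0-9a-z]*).+?>"
    r"(?P<messageData>(?P<sender>.*?)>"
    r"(((?P<recipient>[^<:]+):)?(?P<messageText>.*?))))<<"
)

matches = list(LOOSE.finditer(text))
times = [m.group("time") for m in matches]
senders = [m.group("sender") for m in matches]
recipients = [m.group("recipient") for m in matches]

print(times)
print(senders)
print(recipients)
```

This version matches all three sample messages, including the one with => after the timestamp. Two things to be aware of: the sender capture keeps the trailing *, and the time of the second message stops before the uppercase N.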
P.S. Hi there ;)