Need help with a Regex for parsing human typed times

Developer (https://www.devze.com), 2023-01-08 08:28, source: web

I'm really new to Regex and working hard at it, but I think this has gone beyond simple. I understand how to create the Regex object in .NET, but I'm not sure how to use it for my specific purpose once I have a pattern.

Regex regex = new Regex("(at ){0,1}[0-9]{1,2}(:[0-9]{2}){0,1}(?:[ap]m?){0,1}");

I need to be able to take a sentence like "Dinner will be at 9pm at your favorite restaurant" and get the values { "Dinner will be at your favorite restaurant", "9pm" } (removing the "at " in front of the time if it exists).

Complete(?) test cases:

"Dinner at 9pm"            { "Dinner", "9pm" }
"Dinner at9pm"             { "Dinner", "9pm" }
"Dinner 9pm"               { "Dinner", "9pm" }
"Dinner 9p"                { "Dinner", "9pm" }
"Dinner 9a"                { "Dinner", "9am" }
"Dinner 9pZ"               { "Dinner 9pZ", "" }
"Dinner 9aZ"               { "Dinner 9aZ", "" }
"Dinner at 9"              { "Dinner", "9" }
"Dinner at 9:15pm"         { "Dinner", "9:15pm" }
"Dinner at 9:15"           { "Dinner", "9:15" }
"Dinner at9:15"            { "Dinner", "9:15" }
"Dinner at 9pm in Seattle" { "Dinner in Seattle", "9pm" }
"Dinner at9pmin Seattle"   { "Dinner in Seattle", "9pm" }
"Dinner at9in Seattle"     { "Dinner in Seattle", "9" }
"Dinner 9in Seattle"       { "Dinner 9in Seattle", "" }
"9pm Dinner"               { "Dinner", "9pm" }
"The 9pm Dinner was good"  { "The Dinner was good", "9pm" }
"Dinner at 9pmpm"          { "Dinner pm", "9pm" }
"Dinner at 9:15pmpm"       { "Dinner pm", "9:15pm" }

(Just for further clarification: a number without a ":" or an "am"/"pm" suffix must be preceded by "at" unless it is the first number listed. An "a" or "p" suffix must be followed by either "m" or a space, which is why "9pZ" and "9aZ" above are not times.)

Beyond the test cases, I don't understand the syntax needed to get back the values I need using the regex object (list in the brackets above).


A regex for doing this would be complicated and it also wouldn't return the results in the expected order in cases such as "9pm Dinner". If you're willing to spend a little time, it might be simpler to write a basic recursive-descent parser. Each word in the input would form a token, and you can easily come up with rules based on your requirements. For example:

event: "Dinner" time |
       "Dinner" location |
       "Dinner" time location |
       "Dinner" location time

time:  "at" number ":" number "am"/"pm"
       /* etc. */

You then write a small function for each non-terminal (event, time, location etc.) that will do its part and return the result.
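To make the "one small function per non-terminal" idea concrete, here is a minimal sketch in C#. Every name in it (EventParser, Parse, TryParseTime, TryParseClock) is hypothetical, and it rests on two simplifying assumptions that are not in the question: words are separated by spaces, and a bare number counts as a time only when "at" precedes it. It therefore covers only part of the test cases; glued forms such as "at9pm" or "9pmpm" would need a character-level tokenizer.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: hypothetical names, whitespace-separated input assumed.
public static class EventParser
{
    // event: (word | time)*
    // Returns the text with the first time removed, plus that time ("" if none).
    public static (string Text, string Time) Parse(string input)
    {
        string[] tokens = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var kept = new List<string>();
        string time = "";
        int pos = 0;
        while (pos < tokens.Length)
        {
            if (time.Length == 0 && TryParseTime(tokens, ref pos, out time))
                continue;                 // time consumed; pos already advanced
            kept.Add(tokens[pos++]);      // ordinary word, keep it
        }
        return (string.Join(" ", kept), time);
    }

    // time: "at" clock | clock that carries a ":" or an a/p suffix
    // (assumption: a bare number is a time only after "at")
    private static bool TryParseTime(string[] tokens, ref int pos, out string time)
    {
        bool hasAt = tokens[pos] == "at" && pos + 1 < tokens.Length;
        int clockPos = hasAt ? pos + 1 : pos;
        if (TryParseClock(tokens[clockPos], out time, out bool explicitForm)
            && (hasAt || explicitForm))
        {
            pos = clockPos + 1;
            return true;
        }
        time = "";
        return false;
    }

    // clock: digit digit? (":" digit digit)? (("a" | "p") "m"?)?
    private static bool TryParseClock(string word, out string time, out bool explicitForm)
    {
        time = "";
        explicitForm = false;
        int i = 0;
        while (i < word.Length && i < 2 && char.IsDigit(word[i])) i++;
        if (i == 0) return false;                     // no leading number
        if (i + 2 < word.Length && word[i] == ':'
            && char.IsDigit(word[i + 1]) && char.IsDigit(word[i + 2]))
        {
            i += 3;                                   // consume ":mm"
            explicitForm = true;
        }
        string result = word.Substring(0, i);
        if (i < word.Length && (word[i] == 'a' || word[i] == 'p'))
        {
            result += word[i] + "m";                  // normalise "9p" to "9pm"
            i++;
            if (i < word.Length && word[i] == 'm') i++;
            explicitForm = true;
        }
        if (i != word.Length) return false;           // trailing junk such as "9pZ"
        time = result;
        return true;
    }
}
```

Calling EventParser.Parse("Dinner at 9pm in Seattle") yields the pair ("Dinner in Seattle", "9pm"), while "Dinner 9pZ" and "Dinner 9in Seattle" come back unchanged with an empty time. Each rule lives in its own function, so adding a location rule or the glued forms means adding or changing one function rather than growing a single pattern.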

As you see, your requirements already bring up so many possibilities that a regex would only make it extremely confusing, if at all possible.
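On the narrower question of how to read values back from a Regex object once you have a pattern, the following sketch may help. The pattern is deliberately simplified and is an assumption for illustration, not a solution to the full test list: it requires an explicit "am" or "pm" and space-separated words. TimeExtractor and Extract are hypothetical names. The parts of the .NET API it relies on are Regex.Match, Match.Success, Match.Index, Match.Length, a named group read through Match.Groups, and string.Remove.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch only: hypothetical names and a simplified, illustrative pattern.
public static class TimeExtractor
{
    // Optional leading "at ", then a named group "time" such as "9pm" or "9:15am",
    // then any trailing whitespace so it is removed along with the time.
    private static readonly Regex TimePattern = new Regex(
        @"(?:\bat )?\b(?<time>[0-9]{1,2}(?::[0-9]{2})?[ap]m)\b\s*");

    // Returns { text without the time, time }, or { input, "" } when nothing matches.
    public static string[] Extract(string input)
    {
        Match match = TimePattern.Match(input);
        if (!match.Success)
            return new[] { input, "" };

        // The named group holds only the time; the whole match also covers "at ".
        string time = match.Groups["time"].Value;

        // Cut the whole match out of the input and tidy leftover edge spaces.
        string rest = input.Remove(match.Index, match.Length).Trim();
        return new[] { rest, time };
    }
}
```

For example, TimeExtractor.Extract("Dinner will be at 9pm at your favorite restaurant") returns { "Dinner will be at your favorite restaurant", "9pm" }, and an input with no recognisable time, such as "Dinner 9pZ", comes back unchanged with an empty second element.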
