Splitting a paragraph

Developer https://www.devze.com 2023-03-27 05:48 Source: Internet
I want to split a paragraph into sentences on the "." character, but I don't want it to split in every case. It should not split where the "." belongs to a word such as "Dr.", "Mrs.", or "Miss.", or a few other words.

I need some logic for this, whether it is in C# or in SQL Server.


I read the question as "How do I split the paragraph into its component sentences?" If that's what you meant, here's how I would approach the problem:

  1. Build a "white list" of acceptable period usage inside sentences
  2. Split your paragraph on "." (call these possible sentences)
  3. Loop through your possible sentences, checking the ending characters against your white list of acceptable period usage inside sentences
  4. If it matches, combine that possible sentence with the next, and check it again
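
The steps above can be sketched roughly as follows. This is only an illustration, written in Python for brevity; the same logic should carry over to C# with `string.Split` and a loop. The names `ABBREVIATIONS` and `split_sentences` are made up for this example, and the white list is a guess at what you would need, not a complete one.

```python
# Hypothetical white list of words whose trailing "." does not end a sentence.
# Extend it to match your own data.
ABBREVIATIONS = ("Dr", "Mr", "Mrs", "Ms", "Miss")


def split_sentences(paragraph, abbreviations=ABBREVIATIONS):
    """Split on '.', then re-join pieces that end in a white-listed word."""
    pieces = paragraph.split(".")          # step 2: the possible sentences
    sentences = []
    current = ""
    for index, piece in enumerate(pieces):
        current += piece
        if index == len(pieces) - 1:
            break                          # no period follows the final piece
        current += "."
        # Step 3: look at the word just before the period we split on.
        words = current[:-1].split()
        last_word = words[-1] if words else ""
        if last_word in abbreviations:
            continue                       # step 4: merge with the next piece
        if current.strip() != ".":         # skip empty pieces such as in "..."
            sentences.append(current.strip())
        current = ""
    if current.strip():
        sentences.append(current.strip())  # trailing text without a period
    return sentences


print(split_sentences("Dr. Smith met Mrs. Jones. They talked."))
# ['Dr. Smith met Mrs. Jones.', 'They talked.']
```

Note that a sentence which genuinely ends in a white-listed word (for example "I spoke to the Miss.") will be merged with the one after it, so the white list alone cannot resolve every case.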

Not knowing the scope of your true problem set, I can't say whether this approach is actually feasible or not.

Here is a (possibly) related question, if you're looking for a more robust English-language parser, though that question was about Java.
