Multiline Regular Expression replace

Developer https://www.devze.com 2023-04-08 05:18 Source: web
Ok, there's lots of regular expressions, but as always, none of them seem to match what I'm trying to do.

I have a text file:

F00220034277909272011                                  
H001500020003000009272011                              
D001500031034970000400500020000000025000000515000000000
D001500001261770008003200010000000025000000132500000000
H004200020001014209272011                              
D004200005355800007702200005142000013420000000000000000
D004200031137360000779000005000000012000000000000000000
H050100180030263709272011                              
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000

and, with a multiline regex (.NET flavored), I want to do a replace so that I get:

H050100180030263709272011                              
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000

In other words, I want to keep only the lines that start with [HD]0501 and nothing else.

I know this seems better suited to a match than a replace, but I'm going through a pre-built engine that only accepts a regex pattern string and a regex replacement string.
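For comparison, when you do control the calling code, the match-based approach is direct: collect the lines that start with H0501 or D0501 and join them. A minimal sketch (MatchApproach and KeepOnly0501 are illustrative names, not part of any engine; joining the results with "\n" is an arbitrary choice):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class MatchApproach
{
    // Collect every line that starts with H0501 or D0501.
    // [^\r\n]* stops before the line break, so LF and CRLF input behave the same.
    public static string KeepOnly0501(string input)
    {
        var lines = Regex.Matches(input, @"^[HD]0501[^\r\n]*", RegexOptions.Multiline)
                         .Cast<Match>()
                         .Select(m => m.Value);
        return string.Join("\n", lines);
    }

    static void Main()
    {
        // Shortened sample lines with Windows line endings.
        string input = "F0022\r\nH050100180\r\nD0501000018\r\nD0042000053\r\nD0501000012\r\n";
        Console.WriteLine(KeepOnly0501(input));
    }
}
```

This only helps when you can call Regex.Matches yourself; with a pattern-plus-replacement engine you still need a replace that deletes the unwanted lines.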

What can I supply as the pattern and the replacement string to get the desired result? Note that multiline mode is hardcoded in the engine's configuration.
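If the engine's Multiline option is hardcoded as described, an inline (?m) in the pattern is redundant but harmless. A sketch of that situation, assuming the engine simply calls Regex.Replace with RegexOptions.Multiline (HypotheticalEngine and Apply are invented names, not the real engine's API):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the pre-built engine (not its real API): it takes only
// a pattern string and a replacement string and, by assumption, always adds Multiline.
static class HypotheticalEngine
{
    public static string Apply(string input, string pattern, string replacement) =>
        Regex.Replace(input, pattern, replacement, RegexOptions.Multiline);

    static void Main()
    {
        string input = "F0022\nH0501A\nD0042B\nD0501C\n";

        // Same pattern with and without the inline (?m) flag.
        string withFlag    = Apply(input, @"(?m)^(?![HD]0501).+(\r?\n)?", "");
        string withoutFlag = Apply(input, @"^(?![HD]0501).+(\r?\n)?", "");

        Console.Write(withFlag);                     // prints H0501A and D0501C on separate lines
        Console.WriteLine(withFlag == withoutFlag);  // True
    }
}
```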

I originally thought something like this would work:

search: (?<Match>^[HD]0501\d+$), but this matched nothing.

search: (?!^[HD]0501\d+$), but this matched a bunch of empty strings, and I couldn't figure out what to put for the replace string.

search: (?!(?<Omit>^[HD]0501\d+$)), which gave the error "Group 'Omit' not found."
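One likely explanation for the first two attempts, assuming the file uses Windows (\r\n) line endings: the H lines carry trailing spaces, and in .NET a Multiline $ matches before \n but not before \r, so \d+$ can never succeed. The negative lookahead, on the other hand, consumes nothing, which is why it produced only empty matches. A small check under those assumptions (Diagnose and CountMatches are illustrative names; the input is shortened sample data):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class Diagnose
{
    public static int CountMatches(string input, string pattern) =>
        Regex.Matches(input, pattern, RegexOptions.Multiline).Count;

    static void Main()
    {
        // Assumed input: \r\n line endings, and a header line padded
        // with trailing spaces as in the sample file.
        string crlf = "H050100180030263709272011   \r\nD0501000018767\r\nD0501000012470\r\n";

        // First attempt: \d+ cannot consume ' ' or '\r', and with Multiline the .NET
        // '$' matches only before '\n' or at the very end of the input.
        Console.WriteLine(CountMatches(crlf, @"^[HD]0501\d+$"));       // 0

        // Allowing optional trailing spaces and an optional '\r' before '$'.
        Console.WriteLine(CountMatches(crlf, @"^[HD]0501\d+ *\r?$"));  // 3

        // Second attempt: a lookahead on its own consumes nothing, so every match is empty.
        bool allEmpty = Regex.Matches(crlf, @"(?!^[HD]0501\d+$)", RegexOptions.Multiline)
                             .Cast<Match>()
                             .All(m => m.Length == 0);
        Console.WriteLine(allEmpty);                                   // True
    }
}
```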

It seems this should be simple, but as always, Regex manages to make me feel dumb. Help would be greatly appreciated.


Try matching the following pattern:

(?m)^(?![HD]0501).+(\r?\n)?

and replace it with an empty string.

The following demo:

using System;
using System.Text.RegularExpressions;

namespace Test
{
  class MainClass
  {  
    public static void Main (string[] args)
    {
      string input = @"F00220034277909272011                                  
H001500020003000009272011                              
D001500031034970000400500020000000025000000515000000000
D001500001261770008003200010000000025000000132500000000
H004200020001014209272011                              
D004200005355800007702200005142000013420000000000000000
D004200031137360000779000005000000012000000000000000000
H050100180030263709272011                              
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000";

      // Delete every line that does not start with H0501/D0501, together with its line break.
      string regex = @"(?m)^(?![HD]0501).+(\r?\n)?";

      Console.WriteLine(Regex.Replace(input, regex, ""));
    }
  }
}

prints:

H050100180030263709272011                              
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000

A quick explanation:

  • (?m)
    • enable multiline mode so that ^ matches at the start of every line, not only at the start of the input;
  • ^
    • match the start of a new line;
  • (?![HD]0501)
    • negative lookahead: continue only if the line does not start with "H0501" or "D0501";
  • .+
    • match one or more characters other than \n (in .NET, . also matches \r);
  • (\r?\n)?
    • match an optional line break.
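Because . in .NET matches everything except \n, the pattern works with either line-ending style: on \r\n input, .+ also takes the \r and the optional group then removes the \n. A quick check of that claim (LineEndingCheck and Filter are illustrative names; the inputs are shortened sample lines):

```csharp
using System;
using System.Text.RegularExpressions;

static class LineEndingCheck
{
    const string Pattern = @"(?m)^(?![HD]0501).+(\r?\n)?";

    // Remove every line that does not start with H0501/D0501, plus its line break.
    public static string Filter(string input) => Regex.Replace(input, Pattern, "");

    static void Main()
    {
        string lf   = "F0022\nH0501A\nD0042B\nD0501C";
        string crlf = lf.Replace("\n", "\r\n");

        Console.WriteLine(Filter(lf) == "H0501A\nD0501C");      // True
        Console.WriteLine(Filter(crlf) == "H0501A\r\nD0501C");  // True
    }
}
```

Note that .+ needs at least one character, so completely empty lines are left in place; that does not matter for this file.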