
How to escape Unicode escapes in Groovy's /pattern/ syntax

开发者 https://www.devze.com 2023-01-07 05:37 Source: the web

The following Groovy commands illustrate my problem.

First of all, this works (as seen on lotrepls.appspot.com) as expected (note that \u0061 is 'a').

>>> print "a".matches(/\u0061/)

true

Now let's say that we want to match \n, using the Unicode escape \u000A. The following, using "pattern" as a string, fails to compile, as expected:

>>> print "\n".matches("\u000A");

Interpreter exception: com.google.lotrepls.shared.InterpreterException:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed,
Script1.groovy: 1: expecting anything but ''\n''; got it anyway
@ line 1, column 21. 1 error

This is expected because in Java at least, Unicode escapes are processed early (JLS 3.3), so:

print "\n".matches("\u000A")

really is the same as:

print "\n".matches("
")

The fix is to escape the Unicode escape, and let the regex engine process it, as follows:

>>> print "\n".matches("\\u000A")

true
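Since the explanation above leans on Java's rules, here is a minimal Java sketch of the same fix (Java rather than Groovy, because that is where JLS 3.3 applies directly). The class and method names are made up for illustration. The comments spell the escape out in words on purpose: Java translates Unicode escapes even inside comments.

```java
import java.util.regex.Pattern;

final class UnicodeEscapeDemo {
    // Six characters: backslash, 'u', '0', '0', '0', 'A'. The doubled backslash
    // stops the compiler from translating the escape, so the regex engine sees it.
    static final String ESCAPED_LINE_FEED = "\\u000A";

    // True when the regex engine resolves the escape to a line feed and the
    // whole input consists of exactly that character.
    static boolean matchesLineFeed(String input) {
        return Pattern.matches(ESCAPED_LINE_FEED, input);
    }

    public static void main(String[] args) {
        System.out.println(ESCAPED_LINE_FEED.length()); // prints 6
        System.out.println(matchesLineFeed("\n"));      // prints true
        System.out.println(matchesLineFeed("a"));       // prints false
    }
}
```

The length check makes the point: the literal holds the text of the escape, not a line feed, and java.util.regex is what turns it into one.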

Now here's the question part: how can we get this to work with the Groovy /pattern/ syntax instead of using string literal?

Here are some failed attempts:

>>> print "\n".matches(/\u000A/)

Interpreter exception: com.google.lotrepls.shared.InterpreterException:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed,
Script1.groovy: 1: expecting EOF, found '(' @ line 1, column 19.
1 error

>>> print "\n".matches(/\\u000A/)

false

>>> print "\\u000A".matches(/\\u000A/);

true


~"[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F]"

This appears to work as it should. According to the docs I've seen, the double backslashes shouldn't be required with a slashy string, so I don't know why the compiler isn't happy with them.


First, Groovy seems to have changed in this regard since the question was asked: at least on https://groovyconsole.appspot.com/ and in a local Groovy shell, "\n".matches(/\u000A/) works perfectly fine and evaluates to true.

If you run into a similar situation again, encode the backslash itself as a Unicode escape, as in "\n".matches(/\u005Cu000A/). The early Unicode-escape conversion turns \u005C back into a backslash, so the sequence \u000A reaches the regex parser intact.

Another option is to separate the backslash from the u, for example with "\n".matches(/${'\\'}u000A/) or "\n".matches('\\' + /u000A/).
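The idea of keeping the backslash away from the u can also be sketched in Java, where the same early Unicode-escape translation applies. This is only an illustration under that assumption, not a statement about how any particular Groovy version parses slashy strings, and the class and method names are hypothetical. The pattern is assembled at run time, so the source never contains a backslash directly followed by u.

```java
import java.util.regex.Pattern;

final class SplitEscapeDemo {
    // Builds the six-character regex escape from two pieces, so the compiler
    // never sees a backslash immediately followed by 'u'.
    static String lineFeedPattern() {
        char backslash = (char) 0x5C; // 0x5C is the code point of the backslash
        return backslash + "u000A";
    }

    // True when the assembled pattern matches the whole input.
    static boolean matchesLineFeed(String input) {
        return Pattern.matches(lineFeedPattern(), input);
    }

    public static void main(String[] args) {
        System.out.println(lineFeedPattern().length()); // prints 6
        System.out.println(matchesLineFeed("\n"));      // prints true
    }
}
```

The assembled string is identical to the doubled-backslash literal; the split only changes what the compiler gets to see.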
