PCRE Encoding Support

Developer https://www.devze.com 2023-03-19 07:49 Source: Internet
I saw in the PCRE documentation that PCRE supports UTF-8 and Unicode general category properties, but I don't see where it says which native encoding is supported.

If you say that it supports ISO-8859-1, where can I find information about that?

In a nutshell:

I've compared the two, and I'm guessing that the encoding supported by PHP is Windows-1252 and not ISO-8859-1.

if(preg_match('/€/',"\x80"))
    echo "Match";

ISO-8859-1 doesn't have '€' at that position; Windows-1252 does. Or does it depend on the system?

So which native encoding does PCRE support?


This exact example is used on regular-expressions.info to describe the difficulties of mixing 8-bit and Unicode character codes:

Mixing Unicode and 8-bit Character Codes

In short, the euro symbol is at 80h in Windows-1252 and most other Windows code pages. How your regex engine treats this may vary. It works when your regex engine is an 8-bit one and the text file uses such a Windows code page.
If your regex engine is a pure Unicode one, it will read \x80 as \u0080, which is a control code.
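To make the difference concrete, here is a minimal sketch in PHP. It assumes that the u modifier (PCRE's UTF-8 mode) is a fair stand-in for a "pure Unicode" engine; the variable names are only illustrative. The subjects are written as byte escapes so the outcome does not depend on how the source file is saved.

```php
<?php
// Subjects written as explicit byte escapes, so the result does not depend
// on the encoding this source file is saved in.
$cp1252Euro = "\x80";          // '€' as the single Windows-1252 byte 0x80
$utf8Euro   = "\xE2\x82\xAC";  // '€' (U+20AC) encoded as UTF-8
$utf8C1     = "\xC2\x80";      // U+0080, a C1 control character, as UTF-8

// 8-bit mode (no u modifier): the pattern byte 0x80 is compared to raw bytes.
$byteVsCp1252 = preg_match('/\x80/', $cp1252Euro);  // 1
$byteVsUtf8   = preg_match('/\x80/', $utf8Euro);    // 0, no 0x80 byte in E2 82 AC

// UTF-8 mode (u modifier): \x{80} means code point U+0080, not the euro sign.
$cpVsC1   = preg_match('/\x{80}/u', $utf8C1);    // 1
$cpVsEuro = preg_match('/\x{80}/u', $utf8Euro);  // 0

// A lone 0x80 byte is not valid UTF-8, so in UTF-8 mode preg_match() fails.
$cpVsCp1252 = preg_match('/\x{80}/u', $cp1252Euro);       // false
$badUtf8    = preg_last_error() === PREG_BAD_UTF8_ERROR;  // true

var_dump($byteVsCp1252, $byteVsUtf8, $cpVsC1, $cpVsEuro, $cpVsCp1252, $badUtf8);
```

In other words, the snippet from the question matches only because both the pattern and the subject end up containing the same byte 0x80; PCRE itself attaches no "Windows-1252" meaning to it.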

So what do you mean by the native encoding PCRE supports? This is system dependent, and you should not rely on a particular code page.

The advantage of Unicode is that you can get rid of all the different code pages and the problems that come with them.

So to use Unicode here, try matching \x{20AC}, which is the Unicode code point for the euro symbol. In PHP this needs the u modifier and a UTF-8 encoded subject.
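A rough sketch of that approach in PHP follows. The helper name containsEuro is made up for illustration, and the conversion step assumes the mbstring extension is available and that the legacy input really is Windows-1252.

```php
<?php
// Hypothetical helper: true if $utf8Text contains the euro sign U+20AC.
// The u modifier puts PCRE into UTF-8 mode, which \x{20AC} needs in PHP.
function containsEuro(string $utf8Text): bool
{
    return preg_match('/\x{20AC}/u', $utf8Text) === 1;
}

$utf8Price   = "10 \xE2\x82\xAC";  // "10 €" as UTF-8 bytes
$cp1252Price = "10 \x80";          // "10 €" as Windows-1252 bytes

var_dump(containsEuro($utf8Price));    // bool(true)
var_dump(containsEuro($cp1252Price));  // bool(false): not valid UTF-8

// Assumption: the mbstring extension is loaded. Convert the legacy bytes
// to UTF-8 first, then the same Unicode pattern applies.
$converted = mb_convert_encoding($cp1252Price, 'UTF-8', 'Windows-1252');
var_dump(containsEuro($converted));    // bool(true)
```

Converting once at the input boundary and matching by code point afterwards keeps the pattern independent of whichever code page the data originally came from.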

Here is an overview on regular-expressions.info of the Unicode syntax.
