
Does \w depend on the encoding?

Developer https://www.devze.com 2023-03-22 06:26 Source: Internet

I've been trying to find out which bytes count as word characters in different encodings with:

<?php
header('Content-Type: text/plain; charset="ISO-8859-7"'); // Changing the charset attribute
for ($i = 0; $i <= 255; $i++) {
    $char = chr($i);
    // Print the byte value of every single byte that \w accepts
    if (preg_match('/^\w$/', $char, $m)) {
        echo "[" . ord($m[0]) . "]";
    }
}
?>

I don't know if this is wrong, but it always gives me the same positions, no matter what charset is specified. It seems that, regardless of the encoding, '\w' always matches the bytes that are word characters in ISO-8859-1.


Yes! \w and \b are affected by the character set, via the locale! In my code, I use:

setlocale(LC_CTYPE, "cs_CZ");

to handle it. This affects the behaviour of \w and \b in regexps, but also strtoupper() (up to PHP 8.1; since PHP 8.2 strtoupper() ignores the locale). If you also need sorting and comparing of strings to work well, you would use (depending on your country/locale) something like:

setlocale(LC_COLLATE, "cs_CZ");
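One thing that can go wrong with the calls above: setlocale() returns false when the requested locale is not installed on the server, and then \w quietly keeps its default behaviour. Here is a minimal sketch of a guarded call. The locale names in it are assumptions, since the exact spelling varies by platform (on most Unix systems `locale -a` lists what is installed):

```php
<?php
// Candidate names are assumptions; setlocale() accepts an array and
// activates the first entry that the system actually knows.
$candidates = ['cs_CZ.ISO-8859-2', 'cs_CZ.iso88592', 'cs_CZ'];
$active = setlocale(LC_CTYPE, $candidates);

if ($active === false) {
    // None of the candidates is installed: \w and \b stay as they were.
    echo "Locale not available, keeping the current setting\n";
} else {
    echo "LC_CTYPE is now $active\n";
}
```

Checking the return value makes it obvious whether a "\w does not change" result comes from the regex or simply from a missing locale.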

I also found this out the hard way, when it didn't work... :)

So, to answer your original question: you cannot affect this with the header() function, because that only tells the browser which encoding to expect. What you need is to change the behaviour of PHP on the server, which is accomplished by the commands above.
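To see the difference for yourself, the loop from the question can be wrapped in a function that switches the locale first. This is only a sketch: wordBytes() is a hypothetical helper, not a PHP built-in, and the Greek locale name "el_GR.ISO-8859-7" is an assumption that may be spelled differently, or be missing, on your server:

```php
<?php
/**
 * Return the byte values (0-255) that \w matches while $locale is the
 * active LC_CTYPE locale, or null if that locale is not installed.
 * Hypothetical helper for illustration only.
 */
function wordBytes(string $locale): ?array
{
    $previous = setlocale(LC_CTYPE, '0');   // "0" only queries the current setting
    if (setlocale(LC_CTYPE, $locale) === false) {
        return null;                        // locale not installed on this system
    }
    $bytes = [];
    for ($i = 0; $i <= 255; $i++) {
        if (preg_match('/^\w$/', chr($i))) {
            $bytes[] = $i;
        }
    }
    setlocale(LC_CTYPE, $previous);         // restore the earlier locale
    return $bytes;
}

// header() still has its place: it tells the browser how to decode the output.
header('Content-Type: text/plain; charset=ISO-8859-7');

$greek = wordBytes('el_GR.ISO-8859-7');     // assumed locale name
echo $greek === null
    ? "Greek locale not installed\n"
    : count($greek) . " bytes match \\w in the Greek locale\n";
```

Calling wordBytes('C') gives the plain ASCII baseline (digits, letters and the underscore), so comparing the two results shows which extra bytes the locale adds.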
