
preg_replace with Cyrillic characters

开发者 (Developer) https://www.devze.com 2023-04-12 08:30, source: the web
I want to remove the characters matched by [^a-zа-з0-9_] (replace them with an empty string), but I can't get it to work when the string is multibyte.

I tried the mb_* functions, iconv, PCRE, mb_eregi_replace and the u modifier (for PCRE), but none of them worked.

mb_eregi_replace half works: it returns a correctly encoded UTF-8 string, but it doesn't remove the characters, whereas preg_replace with the same regex does.

Here is my code. It handles Unicode, but it doesn't replace anything:

function _data($data)
{
  mb_regex_encoding('UTF-8');
  return mb_eregi_replace('/[^a-zа-з0-9_]+/', '', $data);
}

var_dump(namespace\_data('Текст Removethis- and this _#$)( and also this $*@&$'));

The result still contains the special characters (#, $ and so on), which should have been removed. If I change the function to preg_replace (without Unicode support), it does remove them.
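The most likely reason the code above replaces nothing: the mb_ereg* functions take an Oniguruma pattern without delimiters, so the slashes in '/[^a-zа-з0-9_]+/' are matched as literal characters, and an input with no slash never matches. The sketch below drops the delimiters. It assumes the mbstring extension is built with regex support, that the script file is saved as UTF-8, and that the intended Cyrillic range was а-я rather than а-з (а-з covers only U+0430 to U+0437, so т, к and с would be stripped as well). The helper name _data_mb is hypothetical.

```php
<?php
// Sketch only: mb_ereg* patterns have NO delimiters; '/.../' would make
// the slashes literal characters that must appear in the input.
mb_regex_encoding('UTF-8');

// Hypothetical helper. Assumes the full Cyrillic ranges а-я / А-Я were
// intended; note that ё and Ё fall outside these ranges.
function _data_mb(string $data): string
{
    // Remove every run of characters outside the allowed class.
    $result = mb_ereg_replace('[^a-zA-Zа-яА-Я0-9_]+', '', $data);

    // mb_ereg_replace() can return false/null on error; fall back to ''.
    return is_string($result) ? $result : '';
}

var_dump(_data_mb('Текст Removethis- and this _#$)( and also this $*@&$'));
```

Listing both cases explicitly avoids relying on mb_eregi_replace's case folding inside a negated class.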


As long as your input string is UTF-8 encoded (check it, and re-encode it to UTF-8 if it isn't), you can safely use preg_replace with the correct regular expression and the u (PCRE_UTF8) modifier, that is, the lower-case u at the end:

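function _data($data)
{
  // Remove every run of non-word characters. With /u, PCRE treats the
  // string as UTF-8 and \w also matches Cyrillic letters.
  return preg_replace('/[^\w_]+/u', '', $data);
}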

var_dump(namespace\_data('Текст Removethis- and this _#$)( and also this $*@&$'));

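  • \w = any word character: a letter, digit or underscore (with u, PHP also counts non-ASCII letters such as Cyrillic; the extra _ in the class is redundant)
  • u (at the end) = treat both the pattern and the subject string as UTF-8.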
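Two caveats, sketched under stated assumptions. First, with u, preg_replace returns null (and preg_last_error() reports PREG_BAD_UTF8_ERROR) when the subject is not valid UTF-8, which is why the input should be checked first. The sketch guesses that non-UTF-8 input is Windows-1251, a common legacy Cyrillic encoding; replace that with your real source encoding. Second, \w with u accepts letters from every script (Greek, CJK and so on), while the question's class allows only Latin and Cyrillic, so the sketch uses \p{Cyrillic} instead. The function name clean_latin_cyrillic is hypothetical.

```php
<?php
// Sketch only: assumes input that is not valid UTF-8 is Windows-1251.
function clean_latin_cyrillic(string $data): string
{
    // With /u, preg_replace() returns null on invalid UTF-8,
    // so convert the input to UTF-8 first if needed.
    if (!mb_check_encoding($data, 'UTF-8')) {
        $data = mb_convert_encoding($data, 'UTF-8', 'Windows-1251');
    }

    // Keep ASCII letters, digits, underscore and Cyrillic-script characters
    // (including ё); remove every other run. Unlike \w, this drops Greek, CJK, etc.
    $result = preg_replace('/[^a-zA-Z0-9_\p{Cyrillic}]+/u', '', $data);

    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

var_dump(clean_latin_cyrillic('Текст Removethis- and this _#$)( and also this $*@&$'));
```

If the input is guaranteed to be UTF-8 and any script is acceptable, the answer's shorter \w version is enough.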