How to replace all non-alphanumeric characters with a space in PHP?

开发者 https://www.devze.com 2023-02-21 23:06 Source: web
$html=strip_tags($html);
$html=ereg_replace("[^A-Za-zäÄÜüÖö]"," ",$html);
$words = preg_split("/[\s,]+/", $html);

Doesn't this replace every character other than A-Z, a-z, and a, o, u with umlauts with a space? I am losing words that contain umlauts, such as zugänglich.

Is there anything wrong with the regex?

Edit:

I replaced ereg_replace with preg_replace, but somehow special characters like : and ® are not getting replaced by a space...


Whether your approach succeeds depends foremost on the encoding. If all the umlauts got stripped, it's likely that your source text (or PHP script) was encoded as UTF-8.

In that case, use this instead:

$text = preg_replace('/[^\p{L}]/u', " ", $text);

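\p{L} matches any Unicode letter, not just umlauts, so the negated class replaces everything that is not a letter. The /u modifier makes the pattern and the subject be treated as UTF-8, which solves your likely charset problem.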
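
As a rough sketch of how this fits into the pipeline from the question, assuming both the script file and the input are UTF-8: the sample string below is made up, and \p{N} is added on the assumption that digits should survive as well, since the question asks about alphanumeric characters.

```php
<?php
// Hypothetical sample input; assumes this file and the string are UTF-8 encoded.
$html = '<p>Frei zugänglich: Version 2®, Größe</p>';

$text = strip_tags($html);

// \p{L} = any Unicode letter, \p{N} = any Unicode number (assumption: digits are wanted).
// The u modifier switches the pattern and the subject to UTF-8 mode.
$text = preg_replace('/[^\p{L}\p{N}]/u', ' ', $text);

// PREG_SPLIT_NO_EMPTY drops empty strings caused by leading or trailing whitespace.
$words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

Note that with the u modifier preg_replace returns null when the subject is not valid UTF-8, so a null result is a hint that the input is in some other encoding (for example ISO-8859-1) and needs converting first.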


Maybe your umlauts are still HTML entities (&auml; etc.). Those contain non-alphanumeric characters (the & and the ;), which would be replaced...

BTW: alphanumeric isn't just A-Z and a-z but digits as well...
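
A minimal sketch of that failure mode and one possible way around it, assuming UTF-8 throughout: the input string is hypothetical, and decoding with html_entity_decode before filtering is a suggestion here, not something stated in the answers.

```php
<?php
// Hypothetical input in which the umlaut is still an HTML entity.
$html = '<p>frei zug&auml;nglich</p>';

$stripped = strip_tags($html); // strip_tags leaves entities untouched

// Without decoding, the & and ; of the entity are replaced and the word is torn apart.
$withoutDecoding = preg_replace('/[^\p{L}\p{N}]/u', ' ', $stripped);

// Decode entities into real UTF-8 characters first, then filter.
$decoded = html_entity_decode($stripped, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$withDecoding = preg_replace('/[^\p{L}\p{N}]/u', ' ', $decoded);

echo $withoutDecoding, "\n";
echo $withDecoding, "\n";
```

Decoding after strip_tags rather than before keeps entities such as &lt; from turning into something that looks like a tag.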


The regex should be /[^A-Za-zäÄÜüÖö]+/
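
To illustrate what the trailing + changes, here is a small sketch on a made-up string. It assumes UTF-8 input, so the u modifier is added to the pattern; that modifier is an addition for this example and not part of the regex given above.

```php
<?php
// Hypothetical sample; assumes this file is saved as UTF-8.
$text = 'zugänglich: ®, gut';

// Without +, each unwanted character becomes its own space.
$single = preg_replace('/[^A-Za-zäÄÜüÖö]/u', ' ', $text);

// With +, a whole run of unwanted characters collapses into one space.
$collapsed = preg_replace('/[^A-Za-zäÄÜüÖö]+/u', ' ', $text);

var_dump($single, $collapsed);
```

Since the question goes on to split with /[\s,]+/, both variants end up as the same word list; the + mainly keeps the intermediate string tidy.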
