PHP non-Latin to hex function

Developer https://www.devze.com 2023-04-13 09:03 Source: Internet
I have a website that's in win-1251 encoding and it needs to stay that way. But I also need to be able to echo a few links that contain non-Latin, non-Cyrillic characters like šžāņūī...

I need a function that converts this

"māja un man tā patīk"

to

"m&#257;ja un man t&#257; pat&#299;k"

and that does not touch HTML, so if there is a <b> it needs to stay as <b>, with its < and > not turned into &lt; or &gt;.

And please, no advice about the encoding and how wrong that is.


$str = "<b>Obāchan</b> おばあちゃん";

$str = preg_replace_callback('/./u', function ($matches) {
    $chr = $matches[0];
    if (strlen($chr) > 1) {
        $chr = mb_convert_encoding($chr, 'HTML-ENTITIES', 'UTF-8');
    }
    return $chr;
}, $str);

This expects the original $str to be UTF-8 encoded, i.e. your PHP file should be saved in UTF-8. It encodes every non-ASCII code point as an HTML entity. Since all HTML special characters are ASCII characters, they remain untouched. The resulting string is pure ASCII. Since the lower half of Win-1251 is ASCII compatible, the resulting string is also a valid Win-1251 string. The above $str converts to:

<b>Ob&#257;chan</b> &#12362;&#12400;&#12354;&#12385;&#12419;&#12435;
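The same idea can also be sketched without the 'HTML-ENTITIES' pseudo-encoding, which mbstring deprecated in PHP 8.2. This is only a sketch: the name encodeNonAscii is made up for illustration, and it assumes UTF-8 input and the mbstring extension (for mb_ord, PHP 7.2+).

```php
<?php
// Hypothetical helper (not part of the answer above): replace every
// non-ASCII code point of a UTF-8 string with a decimal numeric entity.
function encodeNonAscii(string $utf8): string
{
    // With the /u flag, [^\x00-\x7F] matches exactly one non-ASCII code
    // point, so ASCII markup such as <b> never reaches the callback.
    $out = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $utf8
    );
    if ($out === null) {
        // preg_replace_callback() returns null on error, e.g. malformed UTF-8.
        throw new InvalidArgumentException('Input could not be processed as UTF-8');
    }
    return $out;
}

echo encodeNonAscii("<b>m\u{0101}ja</b> un man t\u{0101} pat\u{012B}k"), "\n";
// <b>m&#257;ja</b> un man t&#257; pat&#299;k
```

As in the code above, the result is pure ASCII, so it can be echoed from a Win-1251 page as-is.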


The main things you probably don't want to encode are <, > and &. Those are really the only special characters. So how about encoding everything first and then decoding just <, > and &? I feel you should be fine.

This is untested:

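// $input has to be UTF-8 here: Win-1251 cannot represent characters like ā or ī.
// htmlentities() only emits named entities, so a character with no name in the
// active entity table passes through unchanged.
$output =
  htmlspecialchars_decode(
     htmlentities($input, ENT_NOQUOTES, 'UTF-8'),
     ENT_NOQUOTES
  );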

Let me know.


What Evert suggests looks logical to me too! If you insist, this is one way to do it when only a couple of letters bother you. For more letters the script will not be as effective and needs to change.

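<?PHP 
    function myConvert($str)
    {
        // Map each numeric entity to the character it stands for.
        $chars['&#257;']='ā';
        $chars['&#299;']='ī';
        // Apply every replacement to the running result, not to the original input.
        $output = $str;
        foreach ($chars as $key => $value) {
            $output = str_replace($key, $value, $output);
        }
        echo $output;
    }
    myConvert("m&#257;ja un man t&#257; pat&#299;k");
?>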

==================edited==============

For many characters maybe this one can help you:

<?PHP  
    function myConvert($str)
    {  
        $final=null;  
        $parts = preg_split("/&#[0-9]*;/i", $str);//get all text parts
        preg_match_all("/&#[0-9]*;/i", $str, $delimiters );//get delimiters;   
        $delimiters[0][]='';//make arrays equal size  
        foreach($parts as $key => $value)
            $final.=$value.mb_convert_encoding
            ($delimiters[0][$key], "UTF-8", "HTML-ENTITIES");
        return $final; 
    }  
$fh = fopen("testFile.txt", 'w') ; 
fwrite($fh, myConvert("m&#257;ja un man t&#257; pat&#299;k&#299;")); 
fclose($fh); 
?>

The desired output is written to the text file. This code, exactly as it is (not merged into some project), does what it claims to do: it converts codes like &#257; to the characters they represent.
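For comparison, the same decoding can be sketched with a single callback and mb_chr instead of splitting the string. This is an assumption-laden sketch, not the code above: decodeNumericEntities is a made-up name, it needs the mbstring extension (mb_chr, PHP 7.2+), and it only handles decimal references like &#257;, leaving named entities such as &lt; alone.

```php
<?php
// Hypothetical helper: turn decimal numeric entities into UTF-8 characters.
function decodeNumericEntities(string $text): string
{
    $out = preg_replace_callback(
        '/&#([0-9]+);/',
        function (array $m): string {
            $chr = mb_chr((int) $m[1], 'UTF-8');
            // mb_chr() returns false for an invalid code point;
            // in that case keep the original entity text.
            return $chr === false ? $m[0] : $chr;
        },
        $text
    );
    // Fall back to the input if the regex engine reports an error.
    return $out ?? $text;
}

echo decodeNumericEntities("<b>m&#257;ja</b> un man t&#257; pat&#299;k &lt;"), "\n";
// <b>māja</b> un man tā patīk &lt;
```

The output is UTF-8, so it suits a file or a UTF-8 page, not a Win-1251 page.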
