php file_put_contents asian character filename encoding

Developer https://www.devze.com 2023-03-26 15:48 Source: Internet

I'm trying to get this script to scrape images off of Wikipedia. What good is free licensed media if you can't get it? Original script is here.

If you put this

http://upload.wikimedia.org/wikipedia/commons/2/26/%E7%9A%84-bw.png

in Firefox, it will immediately be transformed into

http://upload.wikimedia.org/wikipedia/commons/2/26/的-bw.png

so that when you save the image, it's saved as 的-bw.png

Simple enough, eh? Now how do I get PHP to do that? Just guessing, I tried utf8_decode($fileName), but I get the wrong Chinese characters.

$src= "http://upload.wikimedia.org/wikipedia/commons/2/26/%E7%9A%84-bw.png";  
$pngData = file_get_contents($src);  
$fileName = basename($src);  
file_put_contents($fileName, $pngData);

Any help appreciated, as I really have no idea where to go from here.

Have you tried urldecode()?

<?php
$url = 'http://upload.wikimedia.org/wikipedia/commons/2/26/%E7%9A%84-bw.png';
$parts = explode('/', $url);
$title = $parts[count($parts)-1]; //get last section

$title = urldecode($title);
?>
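
Putting the question's snippet and this suggestion together, a minimal sketch might look like the following. The helper name localNameFromUrl is hypothetical (it is not part of the original script), and rawurldecode() is swapped in for urldecode() on the assumption that a literal "+" in a path should stay a "+". The download itself is left commented out because it needs network access.

```php
<?php
// Hypothetical helper (an assumption, not from the original script):
// derive a local file name from an image URL.
function localNameFromUrl($url)
{
    // Take the last path segment while it is still percent-encoded (pure ASCII),
    // so basename() does not depend on the current locale.
    $encoded = basename((string) parse_url($url, PHP_URL_PATH));

    // rawurldecode() turns %E7%9A%84 into the UTF-8 bytes of the character and,
    // unlike urldecode(), leaves a literal "+" untouched.
    $name = rawurldecode($encoded);

    // A decoded name could contain path separators (e.g. from %2F); neutralize them.
    return str_replace(array('/', '\\', "\0"), '_', $name);
}

$src = 'http://upload.wikimedia.org/wikipedia/commons/2/26/%E7%9A%84-bw.png';
$fileName = localNameFromUrl($src);
echo $fileName, "\n"; // 的-bw.png (as UTF-8 bytes)

// Saving the image, as in the question:
// $pngData = file_get_contents($src);
// if ($pngData !== false) {
//     file_put_contents($fileName, $pngData);
// }
```

The decoded name is a UTF-8 byte string. Whether it shows up as 的-bw.png on disk depends on the file system and platform accepting UTF-8 file names, which this sketch assumes.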

SquirrelMail contains a nice function in its sources that converts UTF-8 characters to numeric HTML entities:

<?php
function charset_decode_utf_8($string) {
    /* Only do the slow convert if there are 8-bit characters */
    /* (preg_match replaces ereg, which was removed in PHP 7) */
    if (!preg_match('/[\x80-\xFF]/', $string)) {
        return $string;
    }

    // decode three byte unicode characters
    // (a callback replaces the /e modifier, which was removed in PHP 7)
    $string = preg_replace_callback(
        '/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
        },
        $string
    );

    // decode two byte unicode characters
    $string = preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
        },
        $string
    );

    return $string;
}
?>
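
The function above only covers two- and three-byte sequences, so characters outside the Basic Multilingual Plane (four bytes in UTF-8) pass through unconverted. As a rough sketch of the same idea extended to four-byte sequences, the hypothetical helper utf8ToNumericEntities below (not part of SquirrelMail) computes the code point by bit masking. It assumes the input is well-formed UTF-8 and does not reject overlong encodings.

```php
<?php
// Hypothetical helper (not from SquirrelMail): converts every two-, three-,
// and four-byte UTF-8 sequence to a decimal numeric HTML entity.
// Assumes well-formed UTF-8 input; overlong forms are not rejected.
function utf8ToNumericEntities($string)
{
    return preg_replace_callback(
        '/[\xC2-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF4][\x80-\xBF]{3}/',
        function ($m) {
            $bytes = array_map('ord', str_split($m[0]));
            $lead = array_shift($bytes);
            $n = count($bytes); // continuation bytes: 1, 2, or 3

            // Drop the length marker bits from the lead byte
            // (110xxxxx, 1110xxxx, 11110xxx leave 5, 4, or 3 payload bits).
            $code = $lead & (0x3F >> $n);

            // Each continuation byte (10xxxxxx) contributes 6 payload bits.
            foreach ($bytes as $b) {
                $code = ($code << 6) | ($b & 0x3F);
            }
            return '&#' . $code . ';';
        },
        $string
    );
}

// The file name from the question, written as raw UTF-8 bytes.
echo utf8ToNumericEntities("\xE7\x9A\x84-bw.png"), "\n"; // &#30340;-bw.png
```

Entities like this are useful when printing the name in an HTML page; for the file name on disk, the decoded UTF-8 bytes themselves are what matter.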