I am trying to find a way in Ruby to take a UTF-8 byte array and transform it back to a string.
In irb (Ruby 1.9.2 preview 3) I can create the correct byte array from a UTF-8 string:
ruby-1.9.2-preview3 > 'Café'.bytes.to_a
 => [67, 97, 102, 195, 169]
However, I can't find a way to round-trip from the bytes back to a string. I tried Array#pack with the U* directive, but that doesn't work for multibyte characters.
ruby-1.9.2-preview3 > [67, 97, 102, 195, 169].pack('U*')
 => "CafÃ©"
Does anybody know a way to take a UTF-8 byte array with multibyte characters and convert it back to a string?
Thanks.
This has to do with how pack interprets its input data. The U* directive treats each integer as a Unicode code point and encodes it as UTF-8, so the bytes 195 and 169 are each encoded as a separate character (Ã and ©), hence the double encoding. Instead, pack the integers as raw bytes with C* and tell Ruby to interpret the result as UTF-8:
irb(main):010:0> [67, 97, 102, 195, 169].pack('C*').force_encoding('utf-8')
=> "Café"
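If the bytes come from an untrusted source, it may be worth checking that they really form valid UTF-8, since force_encoding only relabels the string and does not validate it. A minimal sketch, assuming a helper name of my own (utf8_bytes_to_string is hypothetical, not part of Ruby):

```ruby
# Hypothetical helper; the name is illustrative and not part of Ruby itself.
def utf8_bytes_to_string(bytes)
  # Pack each integer as one raw byte, then relabel the result as UTF-8.
  str = bytes.pack('C*').force_encoding(Encoding::UTF_8)
  # force_encoding does not validate, so check explicitly.
  raise ArgumentError, 'bytes are not valid UTF-8' unless str.valid_encoding?
  str
end

original  = "Caf\u00E9"            # "Café", written with an escape to avoid source-encoding issues
bytes     = original.bytes.to_a    # [67, 97, 102, 195, 169]
roundtrip = utf8_bytes_to_string(bytes)

puts roundtrip
puts roundtrip == original
```

Raising on invalid input is just one choice here; returning nil would work equally well, depending on the caller.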
You specifically ask about a byte array, but maybe codepoints are more suitable:
ar = 'Café'.codepoints.to_a
# => [67, 97, 102, 233]
ar.pack('U*')
# => "Café"
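To make the difference between the two representations concrete, here is a small side-by-side sketch (variable names are just for illustration). It shows that U* round-trips codepoints, C* plus force_encoding round-trips bytes, and feeding bytes to U* produces the double-encoded string from the question:

```ruby
s = "Caf\u00E9"                    # "Café"

bytes      = s.bytes.to_a          # UTF-8 bytes: the last character takes two bytes
codepoints = s.codepoints.to_a     # Unicode code points: one integer per character

# Each representation round-trips with its matching pack directive.
from_codepoints = codepoints.pack('U*')
from_bytes      = bytes.pack('C*').force_encoding('UTF-8')

# Mixing them up: U* treats 195 and 169 as code points U+00C3 and U+00A9.
double_encoded  = bytes.pack('U*')

p bytes
p codepoints
p from_codepoints == s
p from_bytes == s
p double_encoded == "Caf\u00C3\u00A9"
```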
 
         
                                         
                                         
                                         