Node.js buffer encoding issue

Developer https://www.devze.com 2023-03-25 15:03 Source: Internet
I'm having trouble understanding character encoding in node.js. I'm transmitting data, and for some reason the encoding causes certain characters to be replaced with other ones. What I'm doing is base64 encoding on the client side and decoding it in node.js.

To simplify, I narrowed it down to this piece of code which fails:

new Buffer("1w==", 'base64').toString('utf8');

1w== is the base64 encoding of the × character. When passing this string with the 'base64' argument to a buffer and then calling .toString('utf8'), I expected to get the same character back, but I didn't. Instead I got � (character code 65533, the Unicode replacement character U+FFFD).

Is the encoding utf8 wrong? If so, what should I use instead? If not, how can I decode a base 64 string in node.js?


No, your assumption is wrong. The base64-encoded string contains only one encoded byte, and every Unicode code point above U+007F needs at least two bytes to be encoded in UTF-8.
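
The byte counts can be checked directly. This is a minimal sketch, assuming a Node.js version that provides Buffer.from (the variable names are illustrative only):

```javascript
// Buffer is a global in Node.js, so no import is needed.

// Decode the base64 text from the question into raw bytes.
const decoded = Buffer.from("1w==", "base64");
console.log(decoded.length); // 1

// U+00D7 (the multiplication sign) needs two bytes in UTF-8.
const utf8ByteCount = Buffer.byteLength("\u00D7", "utf8");
console.log(utf8ByteCount); // 2
console.log(Buffer.from("\u00D7", "utf8")); // <Buffer c3 97>
```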

I'm still not good at decoding base64 in my head, but try ISO-8859-1 instead (Node.js calls this encoding 'latin1', with 'binary' as an alias).

The point is that base64 decoding transforms a character string into a byte string. You assumed that it decodes to a character string, but this is wrong. You still need to decode the byte string into a character string, and in your case the correct encoding is ISO-8859-1.
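
The two steps can be sketched as follows, assuming the sender really did produce the bytes as ISO-8859-1 (the variable names are illustrative, not from the question):

```javascript
// Step 1: base64 text -> raw bytes. No character encoding is involved yet.
const rawBytes = Buffer.from("1w==", "base64");

// Step 2: raw bytes -> text. Here the character encoding matters.
const wrongText = rawBytes.toString("utf8");   // 0xD7 alone is not valid UTF-8
const rightText = rawBytes.toString("latin1"); // each byte maps to one code point

console.log(wrongText.charCodeAt(0)); // 65533 (U+FFFD, replacement character)
console.log(rightText.charCodeAt(0)); // 215 (U+00D7, the multiplication sign)
```

If the client can be changed to send UTF-8 bytes, decoding with 'utf8' on the server is usually the less surprising choice, since 'latin1' can only represent code points up to U+00FF.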


echo -n x | base64

gives

eA==

The given code would give the expected answer if the encoding were correct. The problem is likely on the encoding side. (1w== translates to the single byte 0xD7, which in UTF-8 would be the start of a multi-byte character and is therefore invalid on its own.)
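
To illustrate the encoding side, here is a hedged sketch that encodes the same character both ways in Node.js. It assumes the client treated each character as a single byte (as the browser function btoa does for code points up to U+00FF). The question does not say which client API was used, so this is only a guess at the cause:

```javascript
const multiplicationSign = "\u00D7"; // the × character from the question

// Encoding the character as UTF-8 first yields two bytes (0xC3 0x97).
const utf8Base64 = Buffer.from(multiplicationSign, "utf8").toString("base64");
console.log(utf8Base64); // w5c=

// Encoding it as Latin-1 yields the single byte 0xD7 seen in the question.
const latin1Base64 = Buffer.from(multiplicationSign, "latin1").toString("base64");
console.log(latin1Base64); // 1w==

// With UTF-8 on both sides, the original decoding code round-trips correctly.
const roundTrip = Buffer.from(utf8Base64, "base64").toString("utf8");
console.log(roundTrip === multiplicationSign); // true
```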
