putwchar / getwchar encoding?

Developer https://www.devze.com 2023-04-10 02:34 Source: the web

I'm writing code which runs on both Windows and Linux. The application works with Unicode strings, and I'm looking to output them to the console using common code.

Will putwchar and getwchar do the trick? For example, can I provide Unicode character values to these functions, and will they display the same character on both Linux and Windows?

You are about to enter a world of pain. Invariably *nix consoles prefer you to send them UTF-8 encoded char* data.

Windows, on the other hand, uses UTF-16 for its Unicode APIs, and for the console APIs I believe it is limited to UCS-2.
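To make the UTF-16 versus UCS-2 distinction concrete, here is a minimal sketch of UTF-16 encoding. The helper name utf16_encode is made up for illustration and is not part of any standard library. A code point in the Basic Multilingual Plane fits in one 16-bit unit, which is all UCS-2 can express; anything above U+FFFF needs a surrogate pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a library function): encode one Unicode code
 * point as UTF-16. Writes 1 or 2 code units to out[] and returns how many
 * were written, or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogate range is reserved */
    if (cp > 0x10FFFF)
        return 0;                       /* beyond the Unicode range */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* BMP: one unit, same as UCS-2 */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits left to split in two */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

With this sketch, U+041A (К) comes out as the single unit 0x041A, while U+1F600 comes out as the pair 0xD83D 0xDE00, which a UCS-2-only consumer would treat as two separate units.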

You probably need to find some library code that abstracts away the differences for you. I don't have a good recommendation for you, but I am sure that putwchar and getwchar are not the solution.

One of the many ways to reconcile them is to use explicit conversion modes in Windows:

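#ifdef _WIN32
#include <fcntl.h>   /* _O_WTEXT */
#include <io.h>      /* _setmode */
#endif
#include <wchar.h>
#include <stdio.h>
#include <locale.h>

int main(void)
{
#ifdef _WIN32
    /* Switch stdout to wide-character (UTF-16) text mode. */
    _setmode(_fileno(stdout), _O_WTEXT);
#else
    /* Select a UTF-8 locale so wide characters are converted to UTF-8
       on output. The en_US.UTF-8 locale must be installed. */
    setlocale(LC_ALL, "en_US.UTF-8");
#endif
    fputws(L"Кошка\n", stdout);
    return 0;
}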

Tested with gcc 4.6.1 on Linux and Visual Studio 2010 on Windows.

There are also _O_U8TEXT and _O_U16TEXT modes on Windows. Your mileage may vary.

See the putwchar man page on Linux. It says that the behavior depends on LC_CTYPE and says "It is reasonable to expect that putwchar() will actually write the multibyte sequence corresponding to the wide character wc." Similarly, getwchar() should read a multibyte sequence from standard input, and return it as a wide character.

Don't assume that they will read/write a constant number of bytes per character, as they would in UCS-2.
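As a rough illustration of why the byte count varies, here is a hand-rolled UTF-8 encoder. The name utf8_encode is hypothetical, and this is only a sketch of what the C library does for you when LC_CTYPE names a UTF-8 locale, not how any particular libc implements it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a library function): encode one Unicode code
 * point as UTF-8. Writes 1 to 4 bytes to out[] and returns the count,
 * or 0 if cp is not a valid Unicode scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* ASCII: 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 2 bytes, e.g. Cyrillic */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 3 bytes: rest of the BMP */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 4 bytes: supplementary planes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}
```

So 'A' takes one byte, К (U+041A) takes two (0xD0 0x9A), and € (U+20AC) takes three (0xE2 0x82 0xAC), all through the same call.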

All that said, character-by-character I/O isn't usually the fastest solution, and when you start optimizing, do keep in mind that on Linux and Unix you'll be working in UTF-8.
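One possible way to avoid per-character calls is to convert the whole wide string first and write it in one go. The sketch below uses the standard wcstombs, which converts according to the current LC_CTYPE locale; the wrapper name write_wide_string is made up for illustration, and the approach assumes the stream is used only for byte output.

```c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a whole wide string to the multibyte
 * encoding of the current locale and write it with a single fwrite.
 * Returns the number of bytes written, or -1 on failure (allocation
 * failure, a character the locale cannot represent, or a short write).
 * Note: fwrite is a byte-oriented call, so do not mix this with
 * fputws/putwchar on the same stream. */
long write_wide_string(FILE *out, const wchar_t *ws)
{
    /* Worst case: every wide character expands to MB_CUR_MAX bytes. */
    size_t cap = wcslen(ws) * MB_CUR_MAX + 1;
    char *buf = malloc(cap);
    if (buf == NULL)
        return -1;

    size_t n = wcstombs(buf, ws, cap);
    if (n == (size_t)-1) {              /* unrepresentable character */
        free(buf);
        return -1;
    }

    size_t written = fwrite(buf, 1, n, out);
    free(buf);
    return written == n ? (long)n : -1;
}
```

Call setlocale(LC_ALL, "") or select a UTF-8 locale before using it with non-ASCII text; in the default "C" locale the conversion is only guaranteed to work for the basic character set.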