UTF-8 strings and malloc in C

开发者 https://www.devze.com 2023-04-12 08:09 Source: Internet
With "opendir" and "readdir" I read a directory's contents. During that process I do some string manipulation and allocation, something like this:

size_t stringlength = strlen(cur_dir) + strlen(ep->d_name) + 2; // +1 for the '/', +1 for the terminating NUL
char *file_with_path = xmalloc(stringlength); // xmalloc is a malloc wrapper with some checks (such as out of memory)
snprintf(file_with_path, stringlength, "%s/%s", cur_dir, ep->d_name);

But what if a string contains a two-byte UTF-8 character? How do you handle that case?

stringlength*2?

Thanks


strlen() counts the bytes in the string; it doesn't care whether those bytes represent UTF-8 encoded Unicode characters. So, for example, strlen() of a string containing the UTF-8 encoding of "aöü" returns 5, since the string is encoded as "a\xc3\xb6\xc3\xbc".
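To make the byte versus character distinction concrete, here is a minimal sketch that counts code points next to what strlen() reports. The helper name utf8_codepoints is made up for illustration (it is not a standard function), and it assumes the input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the C standard library):
 * counts UTF-8 code points by skipping continuation bytes,
 * which always have the bit pattern 10xxxxxx.
 * Assumes s points to valid, NUL-terminated UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n; /* lead byte or plain ASCII byte starts a new code point */
    }
    return n;
}
```

For "a\xc3\xb6\xc3\xbc" this helper returns 3 while strlen() returns 5. For sizing a buffer, the strlen() value is the one that matters.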


strlen counts the number of bytes in a string (up to, but not including, the terminating NUL), not the number of UTF-8 characters, so stringlength is already as large as you need it; there is no need to multiply it by 2.
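As a rough sketch of the same idea in self-contained form: join_path below is a hypothetical helper that uses plain malloc in place of the asker's xmalloc wrapper and sizes the buffer purely from strlen(), which works for multi-byte UTF-8 names too.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "dir/name" string,
 * or NULL if allocation fails. The caller must free() the result.
 * Plain malloc stands in for the question's xmalloc wrapper. */
char *join_path(const char *dir, const char *name)
{
    /* strlen() already counts bytes, so multi-byte UTF-8 characters
     * are fully accounted for; +2 covers the '/' and the NUL. */
    size_t len = strlen(dir) + strlen(name) + 2;
    char *path = malloc(len);
    if (path == NULL)
        return NULL;
    snprintf(path, len, "%s/%s", dir, name);
    return path;
}
```

Calling join_path("/tmp", "\xc3\xb6.txt") yields an 11-byte string, with "ö" taking two of those bytes, and nothing is truncated.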
