
Is it by design that std::string can hold '\0' characters?

devze.com (https://www.devze.com), 2023-04-04 15:38, source: web

The fact that std::string can actually hold '\0' characters comes up all the time. This is of course inconsistent with C-style strings.

So I'm wondering, is this by design, or is it an omission, or is it just the fact that the standard doesn't forbid it and compilers allow this to happen?


I'm wondering what your quarrel is. '\0' is just another character, and there is no efficient way to forbid it in a general-purpose 'char' string. That the same character has a special meaning in C is unfortunate, but it has to be dealt with like every other restriction that legacy code imposes as soon as you interoperate with it.

This shouldn't be an issue as long as you stick to code that uses std::string exclusively.

To address your comment, we need to look at the constructor that takes a const char*, which is basic_string(const charT* s, const Allocator& a = Allocator()) in 21.4.2 9/10 of n3242. It says that the size of the string is determined through traits::length(s), which for std::string behaves like strlen and therefore requires its argument to be null-terminated. So yes, if you construct a std::string from a const char* alone, the data needs to be null-terminated.
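As a minimal sketch of that difference (the helper names from_c_string and from_buffer are made up for illustration), the first function below relies on the terminator, while the second uses the standard (pointer, count) constructor, which copies exactly the requested number of characters:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a std::string from a C string.
// The length comes from traits::length(s), so copying stops at the first '\0'.
std::string from_c_string(const char* s) {
    assert(s != nullptr);  // a null pointer is not a valid C string
    return std::string(s);
}

// Hypothetical helper: builds a std::string from an explicit (pointer, count) pair.
// All `count` characters are copied, embedded '\0' characters included.
std::string from_buffer(const char* data, std::size_t count) {
    assert(data != nullptr);
    return std::string(data, count);
}
```

Given the array "ab\0cd", from_c_string produces a string of size 2, while from_buffer with a count of 5 produces a string of size 5 whose third character is '\0'.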


There is a set of functions that accept 'char *' arguments and assume that the string is terminated by a zero. If you use them carefully, you can certainly have strings with 0's in them.

STL strings, in contrast, intentionally permit zero bytes, since they don't use 0 for termination. So the simple answer to your question is, 'yes, by design.'


The standard doesn't give '\0' any special meaning in a std::string. Therefore, a compliant implementation of std::string must not treat '\0' as a special character. The exception is a member function that takes a const char* without a length: that argument is assumed to be null-terminated.
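A small sketch of what this means in practice, using only overloads that carry their own length (push_back and append with a count). The function names with_embedded_nul and c_length are invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns prefix + '\0' + "xyz".
std::string with_embedded_nul(const std::string& prefix) {
    std::string s = prefix;
    s.push_back('\0');   // an ordinary character as far as std::string is concerned
    s.append("xyz", 3);  // (pointer, count) overload: no scan for a terminator
    return s;
}

// Hypothetical helper: the length a C function sees through c_str().
std::size_t c_length(const std::string& s) {
    const std::size_t n = std::strlen(s.c_str());
    assert(n <= s.size());  // strlen stops at the first '\0', never past the end
    return n;
}
```

For the prefix "ab", the resulting string has size() 6 and find('\0') returns 2, yet c_length reports 2, because strlen stops at the embedded '\0'.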


By design.

C can also have strings that are not null-terminated:

char sFoo[4];
strncpy(sFoo,"Test",sizeof(sFoo));

Here sFoo holds a string that is not NUL-terminated.

And it can have length-counted strings that may contain 0, like

struct String {
  char *str;
  size_t length;
  size_t capacity;
};

String literals are NUL-terminated, but that does not apply to every string.

So NUL-terminated strings are common practice, but that does not mean that 0 is an invalid character.
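To illustrate, here is a rough sketch built around the String struct above, written so that it compiles as C++. The helpers string_from_bytes, count_zero_bytes and string_free are hypothetical; they only show that a length-counted string never needs to look for a terminator:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct String {
    char*       str;
    std::size_t length;
    std::size_t capacity;
};

// Hypothetical helper: copies `n` bytes verbatim, zero bytes included.
// On allocation failure the returned String has str == nullptr.
String string_from_bytes(const char* data, std::size_t n) {
    assert(data != nullptr);
    String s = {nullptr, 0, 0};
    const std::size_t cap = n ? n : 1;  // avoid a zero-sized allocation
    s.str = static_cast<char*>(std::malloc(cap));
    if (s.str == nullptr) return s;
    std::memcpy(s.str, data, n);
    s.length = n;
    s.capacity = cap;
    return s;
}

// Hypothetical helper: counts the zero bytes inside the string,
// walking `length` bytes rather than stopping at a terminator.
std::size_t count_zero_bytes(const String& s) {
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < s.length; ++i)
        if (s.str[i] == '\0') ++zeros;
    return zeros;
}

// Hypothetical helper: releases the buffer and resets the fields.
void string_free(String& s) {
    std::free(s.str);
    s = {nullptr, 0, 0};
}
```

Storing the four bytes 'a', 0, 'b', 0 this way gives a length of 4 with two zero bytes, while strlen on the same buffer would report 1.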


strncpy vs. strncat

That said, strncpy and strncat etc. will append a null terminator if there's room.

Actually strncpy and strncat are very different:

strncpy writes a "NUL-filled n-byte string" to an n-byte buffer: a string whose length l is at most n, with the last n - l bytes filled with NUL. Note the plural: all the trailing bytes are zeroed, not just one. Also note that the maximum allowed value for l really is n, so there can be zero NUL bytes: the buffer may not hold a NUL-terminated string. (strnlen, originally a GNU extension and now also specified by POSIX, can measure such a "NUL-filled n-byte string".)
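The padding behaviour can be checked with a short sketch. The function name nul_bytes_after_strncpy and the 4-byte field size are choices made for this example only:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `src` into a 4-byte field with strncpy and
// reports how many of those 4 bytes are NUL afterwards.
std::size_t nul_bytes_after_strncpy(const char* src) {
    assert(src != nullptr);
    char field[4];
    std::memset(field, 'x', sizeof field);   // pre-fill so strncpy's writes are visible
    std::strncpy(field, src, sizeof field);  // copies at most 4 chars, pads the rest with NUL
    std::size_t nuls = 0;
    for (char c : field)
        if (c == '\0') ++nuls;
    return nuls;
}
```

With "a" as the source, the field ends with three NUL bytes. With "Test" or "Testing" it contains none, so it must not be passed to functions that expect a terminator.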

By contrast, strncat always outputs a NUL-terminated string. In both cases the string is truncated if it is too long, but with strncpy an n-letter string fits in an n-byte buffer, whereas with strncat a result of n letters only fits in an (n+1)-byte buffer.
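A comparable sketch for strncat, assuming n = 4 and an initially empty destination (strncat_into_empty is a made-up name). The buffer has n + 1 bytes because the terminator is written in addition to the n characters:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: appends at most 4 characters of `src` to an empty
// 5-byte buffer with strncat and returns the result.
std::string strncat_into_empty(const char* src) {
    assert(src != nullptr);
    const std::size_t n = 4;
    char buf[n + 1];            // room for n characters plus the terminator
    buf[0] = '\0';              // strncat requires dest to already be a C string
    std::strncat(buf, src, n);  // appends at most n chars, then always a '\0'
    return std::string(buf);    // safe: buf is guaranteed to be terminated
}
```

Here "Testing" is truncated to "Test", and the result is still a valid C string, which is why constructing a std::string from buf alone is safe.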

This difference causes a lot of confusion among C beginners and even non-beginners. I have even seen lessons and books teaching "safe C programming" that contained confused and contradictory information about these standard functions.

These so-called "safe" C string manipulation functions (the "strn*" family) have been heavily criticized in the C "secure programming" community, and better-designed (but non-standard) alternatives have been invented (notably the "strl*" family: strlcpy...).

Summary:

  • strncpy will append a null terminator if there's room;
  • strncat will append a null terminator always.