Structure of C language

开发者 https://www.devze.com 2023-04-09 22:53 Source: Internet
Why does this work

printf("Hello"
"World");

Whereas

printf("Hello
""World");

does not? ANSI C concatenates adjacent string literals, and that's fine... but this is a different thing. Does it have something to do with how the C language parser works? Thanks


The string must be terminated before the end of the line. This is a good thing. Otherwise, a forgotten close-quote could silently swallow the following lines of code into the string, so they would never be compiled as code.

This could cost significant time to debug. These days syntax coloring would provide a clue, but in the early years there were monochrome displays.


You can't put a line break inside a string literal. This was a choice made by the designers of C. IMO it's a good feature though. You can however do this:

printf("Hello\
""World");

This gives the same result as the first version.


The C language is defined in terms of tokens, and one kind of token is the string literal. In standardese, a string literal is an optional s-char-sequence enclosed in unescaped double quotes, and an s-char-sequence must not contain an unescaped newline.

Relevant standard (C99) quote:

> Syntax
>   string-literal:
>     " s-char-sequence(opt) "
>     L" s-char-sequence(opt) "
>   s-char-sequence:
>     s-char
>     s-char-sequence s-char
>   s-char:
>     any member of the source character set
>           except the double-quote ", backslash \,
>           or new-line character
>     escape-sequence

Escaped newlines, however, are removed in an early translation phase called line splicing, so the compiler never gets to interpret them. Here's the relevant standard (C99) quote:

The precedence among the syntax rules of translation is specified by the following phases.

  1. Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences are replaced by corresponding single-character internal representations.
  2. Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character before any such splicing takes place.
  3. The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
  4. Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
  5. Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.
  6. Adjacent string literal tokens are concatenated.
  7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.
  8. All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.