Tuesday, July 28, 2009

String literals in C++

Here's a short one on string literals in C++. Ask yourself: what is their type? If you answered 'array of n const char', correct! So, we might think that:

char* literal = "Hello World!";

would be invalid in C++. Surprisingly, it is not: even Comeau online compiles it successfully, without so much as a warning.

The C++ standard, however, does try to protect you: it hints that the above is wrong by marking it as a deprecated feature, one that was probably allowed only to keep backward compatibility with C.

Here is what the standard says as part of section [2.13.4/2]:

[quote]
A string literal that does not begin with u, U, or L is an ordinary string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n const char”, where n is the size of the string as defined below; it has static storage duration (3.7) and is initialized with the given characters.
[/quote]

So, going by that type alone, the following would definitely have been invalid in C++:

char* literal = "Hello World!";

"Hello World!" is an array of 13 [spooky :-)] constant characters: the 12 you can see plus the terminating null. 'literal' points to the first element of that array, and since that element is const, the pointer should not be declarable as a non-const char*. It has to be of type 'pointer to const char'.
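The type and size claimed above can be checked at compile time. This is only a small sketch, assuming a compiler with C++11 support (for decltype, static_assert and constexpr); literal_size and first_char are hypothetical helper names made up for the illustration, not anything from the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to the array type.
static_assert(std::is_same<decltype("Hello World!"), const char (&)[13]>::value,
              "12 visible characters plus the terminating null");

// Hypothetical helper: deduces the array bound N from a reference to the array.
template <std::size_t N>
constexpr std::size_t literal_size(const char (&)[N]) { return N; }

static_assert(literal_size("Hello World!") == 13, "the bound includes the null");

// The const-correct declaration: the array decays to 'pointer to const char'.
const char* literal = "Hello World!";

// Hypothetical helper: reads through the pointer, which is all const allows.
char first_char() {
    assert(literal != nullptr);
    return literal[0];
}
```

If either static_assert fails to hold, the translation unit simply does not compile, so there is nothing to run to see the result.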

But, as mentioned above, to keep backward compatibility with C (where the above works), the standard defines a special implicit conversion alongside the array-to-pointer conversion: a string literal can be converted to an rvalue of type "pointer to char". The standard mentions this in section [4.2/2]:

[quote]
A string literal (2.13.4) with no prefix, with a u prefix, with a U prefix, or with an L prefix can be converted to an rvalue of type “pointer to char”, “pointer to char16_t”, “pointer to char32_t”, or “pointer to wchar_t”, respectively. In any case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [ Note: this conversion is deprecated. See Annex D. —end note ]
[/quote]
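The "explicit appropriate pointer target type" condition is easy to miss, so here is a small sketch of what it means in practice, again assuming a C++11 compiler. When nothing asks for a plain char*, only the ordinary array-to-pointer conversion applies and the const survives. The names deduced, deduces_pointer_to_const and deduced_first_char are hypothetical, chosen just for this example.

```cpp
#include <cassert>
#include <type_traits>

// No explicit 'char*' target type here, so the special conversion is not
// considered: the literal decays to 'pointer to const char'.
auto deduced = "Hello World!";
static_assert(std::is_same<decltype(deduced), const char*>::value,
              "auto deduces pointer to const char");

// Hypothetical helper: template argument deduction sees the same type
// when the literal is passed by value.
template <typename T>
constexpr bool deduces_pointer_to_const(T) {
    return std::is_same<T, const char*>::value;
}

static_assert(deduces_pointer_to_const("Hello World!"),
              "T is deduced as const char*");

// Hypothetical helper: a runtime read through the deduced pointer.
char deduced_first_char() {
    assert(deduced != nullptr);
    return deduced[0];
}

// The deprecated conversion only kicks in with an explicit target type:
// char* p = "Hello World!";
// It is left commented out because compilers may warn about it or reject it.
```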

But the thing to be happy about is the note above, which is reiterated in Annex D, section [D.4/1]:

[quote]
The implicit conversion from const to non-const qualification for string literals (4.2) is deprecated.
[/quote]

So, the best approach is to keep the good habit of declaring a pointer to a literal as a pointer to const. :-)
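As a rough illustration of that habit: attempting to modify a string literal is undefined behaviour, so a const pointer lets the compiler catch the mistake, and when a writable string really is needed, an array initialized from the literal gives you your own copy. The names greeting, edited_first_char and greeting_length below are hypothetical, used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Pointer to the literal itself: read-only, so declare it const.
// Writing 'greeting[0] = 'J';' would now be a compile-time error.
const char* greeting = "Hello World!";

// Hypothetical helper: an array initialized from a literal is a separate,
// writable copy, so editing it never touches the literal.
char edited_first_char() {
    char copy[] = "Hello World!";   // 13 chars copied, including the null
    copy[0] = 'J';                  // fine: modifies the copy only
    assert(std::strlen(copy) == 12);
    return copy[0];
}

// Hypothetical helper: reading through the const pointer is always fine.
std::size_t greeting_length() {
    return std::strlen(greeting);
}
```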

[The draft of the C++ standard used for the quotes above has document number N2315=07-0175, dated 2007-06-25.]
