Casting a pointer does not produce an lvalue. Why?

开发者 https://www.devze.com 2023-04-05 14:35 Source: web
After posting one of my most controversial answers here, I dare to ask a few questions and eventually fill some gaps in my knowledge.

Why isn't an expression of the kind ((type_t *) x) considered a valid lvalue, assuming that x itself is a pointer and an lvalue, not just some expression?

I know many will say "the standard disallows it", but from a logical standpoint it seems reasonable. What is the reason that the standard disallows it? After all, any two pointers are of the same size and the pointer type is just a compile-time abstraction that indicates the appropriate offset that should be applied when doing pointer arithmetic.


An even better example: unary + yields an rvalue, as does x+0.

The underlying reason is that all these things, including your cast, create a new value. Casting a value to the type it already is, likewise creates a new value, never mind whether pointers to different types have the same representation or not. In some cases, the new value happens to be equal to the old value, but in principle it's a new value, it's not intended to be used as a reference to the old object, and that's why it's an rvalue.

For these to be lvalues, the standard would have to add some special cases that certain operations when used on an lvalue result in a reference to the old object, instead of a new value. AFAIK there's no great demand for those special cases.
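A minimal sketch of the "new value" point, assuming nothing beyond standard C. The function name results_are_new_values is made up for illustration. Each result has to be stored in its own object before it can be modified, and doing so never touches x:

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when modifying the stored results
   leaves the original object x untouched. */
int results_are_new_values(void)
{
    int x = 7;
    int a = +x;        /* unary + yields a value, not a reference to x */
    int b = x + 0;     /* the same holds for x + 0 */
    long c = (long)x;  /* and for a cast */

    a++; b++; c++;     /* these change only a, b and c */
    /* +x = 1;  (x + 0) = 1;  (long)x = 1;   each is rejected: not an lvalue */

    assert(a == 8 && b == 8 && c == 8);
    return x == 7;
}
```

Calling results_are_new_values() should return 1, since x is never written after its initialization.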


The result of a cast is never an lvalue by itself. But *((type_t *) x) is an lvalue.
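As a small illustration (the name store_through_cast is hypothetical, and x is assumed to really point at an int so that the access is well defined):

```c
#include <assert.h>

/* Hypothetical helper: writes through a dereferenced cast. */
int store_through_cast(void)
{
    int n = 0;
    void *x = &n;        /* x is a pointer that really points to an int */

    /* (int *)x = &n;       rejected: the cast result is not an lvalue */
    *((int *)x) = 42;    /* accepted: unary * yields an lvalue of type int */

    assert(n == 42);
    return n;
}
```

The cast only supplies a pointer value; it is the dereference that names an object which can be assigned to.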


Because cast expressions in general do not yield lvalues.


Well, casting performs a type conversion. In the general case, type conversion is a non-trivial operation that completely changes the representation of the value. Under these circumstances it should be painfully obvious that the result of a conversion cannot possibly be an lvalue.

For example, if you have a variable int i = 0;, you can convert it to type double with (double) i. How can you possibly expect the result of this conversion to be an lvalue? It just doesn't make any sense. You apparently expect to be able to write (double) i = 3.0; or double *p = &(double) i;. So what should happen to the value of i in the former example, considering that type double might not even have the same size as type int? And even if they had the same size, what would you expect to happen?
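One way to see the ambiguity is to spell out what such an assignment would have to mean. The sketch below assumes the intent is "convert the double back to int, then store it"; the name assign_converted is made up for illustration:

```c
#include <assert.h>

/* Hypothetical helper: the explicit form of what "(double) i = value"
   might have been meant to do. */
int assign_converted(double value)
{
    int i = 0;

    /* (double)i = value;   rejected: (double)i is a new double, not i */
    i = (int)value;         /* convert first, then assign to the real object */

    assert(i <= value || value < 0);  /* truncation is toward zero */
    return i;
}
```

Written this way, the loss of the fractional part is visible in the source: assign_converted(3.75) yields 3, which a hypothetical lvalue cast would have hidden.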

Your assumption that all pointers have the same size is incorrect. In C, in the general case, different pointer types may have different sizes, different internal representations and different alignment requirements; only a few combinations (such as void * and char *) are required to match. Even if they were all guaranteed to have the same representation, I still don't see why pointers should be singled out from all other types and given special treatment in explicit casts.

Finally, what you seem to be advocating here is that your original conversion should perform a raw-memory reinterpretation of one pointer type as another pointer type. Raw-memory reinterpretation is a hack in almost all cases. Why this hack should be elevated to the level of a language feature is not at all clear to me.

Since it is a hack, performing such a reinterpretation should require a conscious effort from the user. If you want to perform it in your example, you should write *(type_t **) &x, which will indeed reinterpret x as an lvalue of type type_t *. But allowing the same thing through a mere (type_t *) x would be a disaster, completely disconnected from the design principles of the C language.


Actually you are right and wrong at the same time.

In C you can typecast any lvalue to an lvalue of another type, subject to the alignment caveat shown below. However, the syntax is a bit different from your straightforward approach:

An lvalue pointer can be cast to an lvalue pointer of a different type like this in C:

char *ptr;

ptr = malloc(20);
assert(ptr);
*(*(int **)&ptr)++ = 5;

As malloc() is required to fulfill all alignment requirements, this is also an acceptable use. However, the following is not portable and may lead to an exception due to wrong alignment on certain machines:

char *ptr;

ptr = malloc(20);
assert(ptr);
*ptr++ = 0;
*(*(int **)&ptr)++ = 5;  /* can throw an exception due to misalignment */
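A hedged sketch of how the misaligned case could be detected before the store. It assumes C11 (_Alignof) and a platform where converting a pointer to uintptr_t exposes the address in the usual way, which is implementation-defined. The names is_int_aligned and store_int_checked are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: non-zero if p is suitably aligned for an int.
   Assumes the uintptr_t value reflects the machine address. */
int is_int_aligned(const void *p)
{
    return (uintptr_t)p % _Alignof(int) == 0;
}

/* Hypothetical helper: stores an int only through an aligned pointer. */
void store_int_checked(void *p, int value)
{
    assert(is_int_aligned(p));  /* fails here instead of trapping in the store */
    *(int *)p = value;
}
```

With such a check, the second snippet above would fail the assertion after *ptr++ = 0; on any machine where int needs more than byte alignment, rather than relying on the hardware to fault.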

To sum it up:

  • If you cast a pointer, this leads to an rvalue.
  • Using * on a pointer leads to an lvalue (*ptr can be assigned to).
  • ++ (like in *(arg)++) needs an lvalue to operate on (arg must be an lvalue).

Hence ((int *)ptr)++ fails, because ptr is an lvalue, but (int *)ptr is not. The ++ can be rewritten as ((int *)ptr += 1, (int *)ptr - 1), and it is the (int *)ptr += 1 that fails, because the cast results in a pure rvalue.
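For comparison, here is a sketch of what ((int *)ptr)++ was presumably meant to do, written with ordinary conversions only. The function name post_increment_as_int is hypothetical, and *pp is assumed to be suitably aligned for int:

```c
#include <assert.h>

/* Hypothetical helper: advances *pp by sizeof(int) and returns the old
   position as an int *, using only value conversions. */
int *post_increment_as_int(char **pp)
{
    assert(pp && *pp);
    int *old = (int *)*pp;     /* the cast yields a value; keep it in an object */
    *pp = (char *)(old + 1);   /* step by one int, then convert back */
    return old;
}
```

A call such as *post_increment_as_int(&ptr) = 5; then stores 5 at the old position and leaves ptr pointing just past it, without any cast ever being used as an lvalue.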


Please note that this is not a language shortcoming. Casts must not produce lvalues. Look at the following:

(double *)1   = 0;
(double)ptr   = 0;
(double)1     = 0;
(double *)ptr = 0;

The first three do not compile. Why would anybody expect the fourth line to compile? Programming languages should never exhibit such surprising behavior. What is more, it could make the behavior of programs unclear. Consider:

#ifndef DATA
#define DATA double
#endif
#define DATA_CAST(X) ((DATA)(X))

DATA_CAST(ptr) = 3;

This cannot compile, right? However, if your expectation held, it would suddenly compile with cc -DDATA='double *'! From a stability point of view it is important not to introduce such context-dependent lvalues for certain casts.

The right thing for C is that an expression either is an lvalue or is not, and this shall not depend on some arbitrary context, which might be surprising.


As noted by Jens, there already is an operator that creates lvalues: the pointer dereferencing operator, the "unary *" (as in *ptr).

Note that *ptr can be written as 0[ptr] and *ptr++ can be written as 0[ptr++]. Array subscripts are lvalues, so *ptr is an lvalue, too.

Wait, what? 0[ptr] must be an error, right?

Actually, no. Try it! This is valid C. The following C program compiles and runs successfully on 32-bit and 64-bit Intel (the second assert relies on a 4-byte int and little-endian byte order):

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int
main()
{
  char *ptr;

  ptr = malloc(20);
  assert(ptr);

  0[(*(int **)&ptr)++] = 5;
  assert(ptr[-1]==0 && ptr[-2]==0 && ptr[-3]==0 && ptr[-4]==5);

  return 0;
}
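The subscript equivalence used in that program can also be checked in isolation. This is only a sketch, and the name subscript_forms_agree is made up for illustration:

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the different spellings of a
   subscript designate the same object. */
int subscript_forms_agree(void)
{
    int a[3] = {10, 20, 30};
    int *p = a;

    0[p] = 11;                 /* same as *p = 11, so it is an lvalue */
    assert(a[0] == 11);

    return *p == 0[p]          /* unary * and subscript agree */
        && a[2] == 2[a]        /* E1[E2] is defined as *((E1)+(E2)) */
        && &1[a] == a + 1;     /* parsed as &(1[a]) */
}
```

Because E1[E2] is defined in terms of addition, which is commutative, the operands can be swapped without changing the meaning.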

In C we can have both: casts, which never create lvalues, and the ability to use casts in a way that keeps the lvalue property alive.

But to get an lvalue out of a cast, two more steps are needed, one before the cast and one after it:

  • Before the cast, get the address of the original lvalue. As it is an lvalue, you always can get this address.
  • Cast to the pointer of the desired type (usually the desired type is a pointer as well, so you have a pointer to that pointer).
  • After the cast, dereference this additional pointer, which gives you an lvalue again.

Hence, instead of the wrong *((int *)ptr)++, we can write *(*(int **)&ptr)++. This also makes sure that ptr in this expression must already be an lvalue. Or, to write this with the help of the C preprocessor:

#define LVALUE_CAST(TYPE,PTR) (*((TYPE *)&(PTR)))

So for any passed-in void *ptr (which might be disguised as char *ptr), we can write:

*LVALUE_CAST(int *,ptr)++ = 5;

Except for the usual pointer arithmetic caveats (abnormal program termination or undefined behavior on incompatible types, which mostly stems from alignment issues), this is proper C.
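The three steps can be sketched on a case that the aliasing rules do permit, namely accessing an int through the corresponding unsigned type. The macro is the one given above; the function name bump_as_unsigned and the choice of int/unsigned are assumptions made for illustration, not part of the original answer:

```c
#include <assert.h>
#include <limits.h>

#define LVALUE_CAST(TYPE,PTR) (*((TYPE *)&(PTR)))

/* Hypothetical helper: increments n through an unsigned lvalue. */
int bump_as_unsigned(int n)
{
    assert(n >= 0 && n < INT_MAX);  /* keep the result representable as int */
    LVALUE_CAST(unsigned, n)++;     /* address-of, cast, dereference, then ++ */
    return n;
}
```

Here the macro expands to (*((unsigned *)&(n)))++, so the ++ operates on the object n itself rather than on a converted copy.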


The C standard was written to support exotic machine architectures that require weird hacks to implement the C pointer model for all pointed-to types. In order to allow the compiler to use the most efficient pointer representation for each pointed-to type, the C standard does not require different pointer representations to be compatible. On such an exotic architecture the void pointer type must use the most general and thus slowest of the different representations. There are some specific examples of such now obsolete architectures in the C FAQ: http://c-faq.com/null/machexamp.html
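What the standard does promise can be sketched as follows: a pointer to an object survives a round trip through void *, even if the conversion changes its representation on an unusual machine. The function name round_trips_through_void is made up for illustration:

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if an int * converted to void * and
   back compares equal to the original and still points at the object. */
int round_trips_through_void(void)
{
    int n = 3;
    int *p = &n;
    void *v = p;   /* conversion to the most general object pointer type */
    int *q = v;    /* converting back must yield a pointer equal to p */

    /* void * and char * are required to share representation and alignment */
    assert(sizeof(void *) == sizeof(char *));

    return q == p && *q == 3;
}
```

Note that this goes through value conversions in both directions; it says nothing about reinterpreting the stored bytes of one pointer type as another.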


Note that if x is a pointer type, *(type_t **)&x is an lvalue. However, accessing it as such except perhaps in very restricted circumstances invokes undefined behavior due to aliasing violations. The only time it might be legal is if the pointer types are corresponding signed/unsigned or void/char pointer types, but even then I'm doubtful.

(type_t *)x is not an lvalue, because (T)x is never an lvalue, regardless of the type T or the expression x. (type_t *) is just a special case of (T).


From a top level, it would generally serve no purpose. Instead of '((type_t *) x) = ' one might as well write 'x = ', assuming x is a pointer as in your example. If one wishes to modify the value stored in x while interpreting x as a pointer to a different data type, then '*((type_t **) &x) = ' is the way forward. Again, '((type_t **) &x) = ' would serve no purpose, quite apart from the fact that it is not a valid lvalue.

Also, in cases like ((int *)x)++, where at least gcc does not complain about a missing lvalue, it could be reinterpreting the expression as 'x = (int *)x + 1'.
