dynamic_cast overhead in C++

I know that dynamic_cast has a runtime check and is therefore considered safer (it can return a null pointer on failure), but slower than static_cast. But how bad is the overhead between the two?

Should I really consider using static_cast in loops for performance reasons in regular large projects? Or is the difference minor and only relevant for specialized real-time programs?


Did you profile it?

The rule is:

  • Use static_cast when you know that the target type is valid.
  • Use dynamic_cast when you're not sure, and you need the program to look up the object's runtime type for you.

It's as simple as that. All other considerations are secondary.
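
A minimal sketch of that rule (Shape and Circle are hypothetical types, not from the question):

struct Shape { virtual ~Shape() {} };
struct Circle : Shape { void roll() {} };

void draw(Shape& s) {
    // We do NOT know the dynamic type here: ask the runtime.
    if (Circle* c = dynamic_cast<Circle*>(&s)) {
        c->roll();   // only reached if s really is a Circle
    }
}

void drawKnownCircle(Shape& s) {
    // The caller guarantees s is a Circle, so there is nothing to check.
    Circle& c = static_cast<Circle&>(s);
    c.roll();
}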


Depends on how the dynamic_cast does its class safety/correctness check. In systems I've profiled, it can turn into a very large number of string compares very quickly. It's a big enough deal that we pretty much use an assert_cast-style system, where a static_cast is done for performance and a dynamic_cast is used in debug builds (see the sketch below).
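
A sketch of what such an assert_cast can look like (the name and details here are assumptions, not the poster's actual code): in debug builds it does the dynamic_cast and asserts on the result, in release builds it degrades to a plain static_cast.

#include <cassert>

// Hypothetical assert_cast: checked via dynamic_cast in debug builds,
// a plain static_cast (no runtime cost) in release builds.
template <typename To, typename From>
To assert_cast(From* p) {
#ifndef NDEBUG
    To q = dynamic_cast<To>(p);
    assert((p == 0 || q != 0) && "assert_cast: object does not have the expected type");
    return q;
#else
    return static_cast<To>(p);
#endif
}

// usage: Derived* d = assert_cast<Derived*>(basePtr);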


Extremely large C++ codebases (e.g. Mozilla, OpenOffice) have a habit of disabling RTTI (and therefore being unable to use dynamic_cast and exceptions) because the overhead of merely including RTTI data in the executable is seen to be unacceptable. Particularly, it is reported to cause a large (I remember numbers on the order of 10%) increase in startup time due to additional dynamic relocations.

Whether or not the additional code required to avoid dynamic_cast and exceptions is actually even slower is never discussed.
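
For context, the usual replacement in such RTTI-free codebases is a hand-rolled type tag plus static_cast, roughly in the style of LLVM's isa<>/cast<> (this is an illustrative sketch, not code from Mozilla or OpenOffice):

struct Node {
    enum Kind { KExpr, KStmt };
    explicit Node(Kind k) : kind(k) {}
    Kind kind;   // one integer comparison instead of an RTTI lookup
};

struct Expr : Node { Expr() : Node(KExpr) {} };
struct Stmt : Node { Stmt() : Node(KStmt) {} };

// Poor man's dynamic_cast: check the tag, then static_cast.
Expr* toExpr(Node* n) {
    return (n && n->kind == Node::KExpr) ? static_cast<Expr*>(n) : 0;
}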


Tomalak Geret'kal is right: use static_cast when you know, dynamic_cast when you don't. If you want to avoid the cost, you have to structure your design so that you DO know. Storing separate types in separate containers will make your loop logic more complex, but you can fix that with template algorithms (see the sketch below).
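
For example (a sketch with hypothetical Circle/Square types): each concrete type lives in its own container, so the hot loop works on the static type with no casts at all, and a function template keeps the loop written only once.

#include <vector>

struct Circle { void update() {} };
struct Square { void update() {} };

// One loop, written once, instantiated per concrete type: no casts needed.
template <typename T>
void updateAll(std::vector<T>& objects) {
    for (typename std::vector<T>::size_type i = 0; i < objects.size(); ++i)
        objects[i].update();
}

std::vector<Circle> circles;
std::vector<Square> squares;

void frame() {
    updateAll(circles);
    updateAll(squares);
}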

For simple inheritance trees it's pretty fast. If you are casting sideways in a complex hierarchy, with virtual inheritance, then it has to do a nontrivial search.

Examples:

struct Base {virtual ~Base () {}};
struct Foo : Base {};

struct Bar1 : virtual Base {};
struct Bar2 : virtual Base {};

struct Baz : Bar1, Bar2 {};

Base * a = new Foo ();
Bar1 * b = new Baz ();

dynamic_cast <Foo *> (a);  // fast: plain downcast in a simple hierarchy
dynamic_cast <Bar2 *> (b); // slow: cross-cast through virtual inheritance

The performance will depend a lot on the compiler. Measure, measure, measure! Bear in mind that run time type information is typically factored out and will be in non-local memory -- you should consider what the cache is going to do in loops.
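
One practical consequence for loops (my own sketch, not part of the answer above): if the dynamic type cannot change between iterations, hoist the dynamic_cast out of the loop so the RTTI lookup happens once rather than once per element.

struct Base { virtual ~Base() {} };
struct Derived : Base { void step() {} };

void process(Base* b, int n) {
    // Cast once, outside the hot loop.
    if (Derived* d = dynamic_cast<Derived*>(b)) {
        for (int i = 0; i < n; ++i)
            d->step();
    }
}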


I just tried out a small benchmark of casts (on my ~3-year-old netbook, so the numbers are quite high, but still). This is the test setup:

#include <cstddef>  // for NULL

class A {
  public:
    virtual ~A() {}
};

class B : public A {
};

#define IT(DO) \
    for (unsigned i(1<<30); i; i--) { \
      B* volatile b(DO); \
      (void)b; \
    }

#define CastTest(CAST) IT(CAST<B*>(a))
#define NullTest() IT(NULL)

int main(int argc, char** argv) {
  if (argc < 2) {
    return 1;
  }
  A* a(new B());
  switch (argv[1][0]) {
    case 'd':
      CastTest(dynamic_cast)
      break;
    case 's':
      CastTest(static_cast)
      break;
    default:
      NullTest()
      break;
  }
  return 0;
}

I found that it is highly dependent on the compiler optimisation, so here are my results:

(see Evaluation below)

O0:

g++ -O0 -Wall castbench.cpp; time ./a.out _; time ./a.out s; time ./a.out d

real        0m7.139s
user        0m6.112s
sys         0m0.044s

real        0m8.177s
user        0m6.980s
sys         0m0.024s

real        1m38.107s
user        1m23.929s
sys         0m0.188s

O1:

g++ -O1 -Wall castbench.cpp; time ./a.out _; time ./a.out s; time ./a.out d

real        0m4.412s
user        0m3.868s
sys         0m0.032s

real        0m4.653s
user        0m4.048s
sys         0m0.000s

real        1m33.508s
user        1m21.209s
sys         0m0.236s

O2:

g++ -O2 -Wall castbench.cpp; time ./a.out _; time ./a.out s; time ./a.out d

real        0m4.526s
user        0m3.960s
sys         0m0.044s

real        0m4.862s
user        0m4.120s
sys         0m0.004s

real        0m2.835s
user        0m2.548s
sys         0m0.008s

O3:

g++ -O3 -Wall castbench.cpp; time ./a.out _; time ./a.out s; time ./a.out d

real        0m4.896s
user        0m4.308s
sys         0m0.004s

real        0m5.032s
user        0m4.284s
sys         0m0.008s

real        0m4.828s
user        0m4.160s
sys         0m0.008s

Edit: Evaluation

For one cast (each run above performs a total of 2**30 casts), the minimal example gives the following times:

-O0    71.66 ns
-O1    71.86 ns
-O2    -1.46 ns
-O3    -0.11 ns
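
(These figures appear to be the difference between the user time of the dynamic_cast run and the static_cast run, divided by the 2**30 iterations; e.g. for -O0: (83.929 s - 6.980 s) / 2**30 ≈ 71.7 ns per cast.)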

The negative values are probably due to different system load at the moment the programs were executed, and they are small enough to be treated as insignificant (i.e. effectively zero). Since there is no overhead there, we have to assume that the compiler was smart enough to optimise the cast away, even though we declared b as volatile. Hence, the only reliable values are the ~70 ns results.
