How fast are lambda functions in GCC_问答_开发者

Having toyed around a bit with C++0x Lambda Expression in G++, I was wondering as to how well the performance will be in general/specific situations compared to alternative ways without using lambda functions.

Does anybody know a more or less comprehensive discussi开发者_Go百科on of lambda expression performance or situations in which one should avoid them despite more comfort while developing?

If you consider the old-school way of defining a struct with operator () as an alternative way then there isn't gonna be any difference because lambdas are pretty much equivalent to that. Just syntactically more convenient. Let's wait for a more complete answer...

When encountering lambda expression for the first time, many people get the vague impression that there's some runtime compilation magic happening to create these functions. Specifically, if you have a function that returns a newly made function as its result, it would seem that the returned function is "created" every time the surrounding function is called. But this is wrong -- a lambda expression (and this is true in any language) contains some code that can be compiled just like any other code, and it all happens statically, without any cost that needs to be left for the runtime.

The only issue is what happens with variables that are closed over, but that does not preclude such compilation -- to create a closure, you just pair up the closure data (these variables) with a pointer to the statically compiled code. The implication of that in terms of performance is that there should be no loss of performance at all -- closed variables or not. Even with closed over variables there is no cost -- any other way to approach whatever problem you're facing would require packaging up those values in some way, so the cost of allocation is the same regardless of how you keep it (explicitly, or implicitly in closed over variables). If an alternative solution doesn't need to package some values, then there wouldn't be any need for closing over them with closures too. This is really the same as with local allocation needed to execute the code -- which would obviously be the same regardless of whether the code comes from a closure with its local scope or from some other scope that would need the same kind of local state.

Again, this is all stuff that holds in any language with closures, and there is no reason for C++ code to suffer from some performance issues where no other language does. One oddity in the C++ lambda expressions is the need to specify which variables you close over, whereas in most other languages you just get everything closed over by default. This would seem like it gives C++ code some edge in having greater control over how much stuff needs to be packages with the closure -- but that's something that is very easy for a compiler to do automatically, without explicit annotations. It leads to one of the most common things that compilers of functional languages do -- "lambda lifting" -- where a function is effectively lifted to the toplevel, avoiding the need to create closures at runtime if they're not needed. For example, if you write (using some JS-like pseudo code):

function foo(x) {
  return function(y) { return y+3; }
}

then it's easy (for a compiler as well as for a human) to see that the returned function does not depend on x, and the compiler can now lift it, as if you wrote:

function generated_name1234(y) { return y+3; }
function foo(x) {
  return generated_name1234;
}

Similar techniques are used when only some values are closed over.

But this is diverging. The bottom line is that lambda expressions are not something that incurs some performance penalty -- closed variables or not.

(As for comparing lambda expressions with using operator(), I'm not sure what most C++ compilers will do, but lambdas should be faster since there is no runtime dispatch that is needed for any method call. Even if lambdas are implemented as anonymous classes with a () operator, the above techniques can apply in that case too, meaning that the dispatch machinery can be compiled away, which would mean that it shouldn't have additional costs too, making it similar to a special case where the anonymous class is trivial to the point of efficient compilation.)

i don't see any design reason why closures should be lesser performers than equivalent function with the same number and size of passed parameters

even closures capturing all context variables should be able to optimize away only the context variables actually being used in the lambda.

specific context variables (either captured by value or by reference) will need some storage initialised at instantiation time, which happens at the point the lambda is first found during execution. But this storage doesn't need to be heap, stack allocation is perfectly fine.

a lambda is exactly the same as a regular function, the only difference is entirely sintactical; it is defined inside other functions, and can capture some external vars, which are compiled as an additional context parameter. the context parameter might have a POD-like initialization at the point where the lambda is defined.

if a specific compiler (i.e: g++ or clang) behave in conflict with the above, its a warning sign of a bad implementation. clang has the ability to easily extend optimization passed by design, so any such shortcomings should be easier to address in the long run, compared to say, g++

the bottom line is if you don't use context variables, a lambda is (should be) totally indistinguishable from a regular free function to the compiler

We're starting to avoid using lambdas in some cases (gaming environment) because the closure created (if it has captured values) has an associated new/delete to hold any captured values. While in many environments this is not an issue, we're in the business of shaving off microseconds at a time to get the best performance. Lambdas (those with enclosed variables) featured highly on our profiling and were among the first to go.

Just as others mentioned, there is no reason why compiler generated closures resulting from lambdas should be any different in performance than the hand written ones. And in order to verify it, I just changed the lambdas used in a parser with hand written classes. They all contain only few lines of tight code and executed millions of times, so every change in performance would be immediately noticeable. The result -- exactly the same execution time. So no, there is no difference in performance.