As a follow-up to this question: is it a good thing that the top 2 entries in my profile are still exception handlers? On one hand, it's doing a lot of exception handling. On the other, this is in sdl, meaning that it's probably as optimized as possible, which means that my other functions are really fast. So...
Here's the top of the profile of the program after running for around 64 seconds, after I did some optimization:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
8.32 3.39 3.39 _Unwind_SjLj_Register
6.77 6.15 2.76 _Unwind_SjLj_Unregister
6.28 8.71 2.56 4000006 0.00 0.00 CAST128::setkey(std::string)
3.73 10.23 1.52 std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()
3.61 11.70 1.47 __dynamic_cast
3.56 13.15 1.45 64000080 0.00 0.00 CAST128::F(int&, unsigned int&, unsigned int&, unsigned char&)
3.26 14.48 1.33 std::string::_Rep::_S_create(unsigned int, unsigned int, std::allocator<char> const&)
3.09 15.74 1.26 std::istreambuf_iterator<char, std::char_traits<char> > std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > >::_M_extract_int<unsigned long long>(std::istreambuf_iterator<char, std::char_traits<char> >, std::istreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, std::_Ios_Iostate&, unsigned long long&) const
2.94 16.94 1.20 std::string::compare(char const*) const
2.32 17.89 0.94 4002455 0.00 0.00 unhexlify(std::string)
2.06 18.73 0.84 std::string::operator[](unsigned int)
2.01 19.55 0.82 32037245 0.00 0.00 std::string makehex<int>(int, unsigned int)
1.94 20.34 0.79 std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned int)
1.91 21.11 0.78 operator new(unsigned int)
1.87 21.88 0.76 std::string::append(std::string const&)
cast128 is run 100000 times, which explains why it is on top.
These are not exception handlers, but rather functions to notify the exception handler of destructors that need to be called if an exception is thrown. For example, consider
    for (int i = 0; i < 1000000; ++i) {
        std::string stuff; // or any type with a non-trivial destructor
        function();        // maybe with arguments that depend on "stuff"
        // inline code using "stuff" here
    }
Every iteration, the string is created and destroyed, and must be registered and unregistered with the exception handler each time in case function() throws. You may be able to avoid this overhead in various ways, depending on exactly how the object is used:
- If the object (or anything created from it) is not passed to the function, move its definition after the function call
- Declare function() (and any functions it calls) inline, so that any exception is thrown from the same stack frame as the object; I believe it won't be necessary to register the object then, as long as the function actually is inlined
- Make sure function() won't throw an exception, and add a throw() specification to it (noexcept in C++11 and later)
- Move the object outside the loop, reinitialising (or calling clear() in my example) if necessary at the start of each iteration (this will also eliminate the overhead of creating and destroying it)
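As a rough sketch of the last option, the two loop shapes might look like this. Here function() is a hypothetical stand-in that takes the string and a running total (the original takes unspecified arguments), and the names per_iteration and hoisted are made up for illustration. Both versions compute the same result; only the hoisted one constructs and destroys the string once.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the answer's function(); assumed to be able to throw.
void function(const std::string& s, std::size_t& total) {
    total += s.size();
}

// Original shape: the string is constructed and destroyed on every iteration,
// so its cleanup has to be set up and torn down each time round the loop.
std::size_t per_iteration(int n) {
    std::size_t total = 0;
    for (int i = 0; i < n; ++i) {
        std::string stuff;
        stuff += "abc";
        function(stuff, total);
    }
    return total;
}

// Hoisted shape: one construction and one destruction for the whole loop;
// clear() resets the contents at the start of each iteration.
std::size_t hoisted(int n) {
    std::size_t total = 0;
    std::string stuff;
    for (int i = 0; i < n; ++i) {
        stuff.clear();
        stuff += "abc";
        function(stuff, total);
    }
    return total;
}
```

Whether this actually moves the two _Unwind_SjLj entries down the profile depends on the compiler and on what else in the loop body needs cleanup, so it is worth re-profiling after the change.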
For the purposes of this discussion I'm conflating try/catch and ctor/dtor.
In C++ one can declare a variable at the first point of use. One can get the same effect in C by sticking an open curly brace in front of the declaration, at the cost of a slew of close braces at the end of the function.
In C++ the declaration of a variable requires that its constructor be called, and that its destructor be called when it goes out of scope or when an exception is thrown before the physical end of its scope. (For POD types the ctor and dtor are elided.)
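A small sketch of that rule, using a made-up Tracer type and helper functions (none of these names come from the question): the destructor runs both on normal scope exit and during unwinding, before the handler body executes.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical tracer type: records construction and destruction in a log.
struct Tracer {
    std::string& log;
    explicit Tracer(std::string& l) : log(l) { log += "ctor;"; }
    ~Tracer() { log += "dtor;"; }
};

// Hypothetical function that throws on request.
void may_throw(bool fail) {
    if (fail) throw std::runtime_error("boom");
}

// Returns the order of events for the normal and the exceptional path.
std::string run(bool fail) {
    std::string log;
    try {
        Tracer t(log);     // constructor runs here
        may_throw(fail);   // if this throws, t is destroyed during unwinding
        log += "body;";
    } catch (const std::runtime_error&) {
        log += "caught;";  // runs after t's destructor
    }
    return log;
}
```

With these definitions, run(false) yields "ctor;body;dtor;" and run(true) yields "ctor;dtor;caught;". The bookkeeping that makes the second case work is what shows up as the register/unregister calls under SjLj exceptions.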
Early C++ implementations registered each scope with a global destructor stack. Later implementations made that a thread-local destructor stack. Subsequent implementations emitted descriptive information near each function so that the combination of a program counter and stack pointer could imply which destructors need to be executed when an exception is thrown.
On some systems the PC/SP combo is not sufficient, or it's difficult to interpret. iOS devices have ARM processors, and ARM has two forms of instruction: ARM and Thumb. I suspect that one might be able to walk back through the stack to determine some aspects of the call chain, but not be able to efficiently and reliably determine the instruction set in use in each frame, so the PC/SP + data approach is just too tricky.
So, iOS uses SjLj-based exceptions. They're implemented by libunwind. Prior to iOS 5.0 the implementation had some performance problems (see http://www.em.net/portfolio/2012/06/unwind_sjlj_top_items_in_profi.html).
iOS 5.0 provides about as optimal an implementation as is viable.