Why is Java running faster than C here?

Inspired by this question,

(Now visible only for users with > 10k rep)

I came up with the following code:

$cat loop.c 
int main( int argc, char ** argv ) 
{
    int i = 0;
    while( i++ < 2147483647 );
}

$cc -o loop loop.c  

$ time ./loop
real 0m11.161s
user 0m10.393s
sys 0m0.012s


$cat Loop.java
class Loop {
    public static void main( String [] args ) { 
        int i = 0;
        while( i++ < 2147483647 );
    }
}

$javac Loop.java 

$time java  Loop  
real 0m4.578s
user 0m3.980s
sys 0m0.048s

Why does the Java version run roughly 2.5x faster than the C version? What am I missing here?

This is run on Ubuntu 9.04 with:

Intel(R) Pentium(R) M @ 1.73GHz

32 bits

EDIT

This is amazing. Using the -O3 option in C optimizes the loop away, and using -server in Java does the same. These are the "optimized times".


I expect javac is defaulting to some higher level of optimization than your C compiler. When I compile with -O3 here, the C is way faster:

C with -O3:

real    0m0.003s
user    0m0.000s
sys     0m0.002s

Your java program:

real    0m0.294s
user    0m0.269s
sys     0m0.051s

Some more details; without optimization, the C compiles to:

0000000100000f18 pushq %rbp
0000000100000f19 movq %rsp,%rbp
0000000100000f1c movl %edi,0xec(%rbp)
0000000100000f1f movq %rsi,0xe0(%rbp)
0000000100000f23 movl $0x00000000,0xfc(%rbp)
0000000100000f2a incl 0xfc(%rbp)
0000000100000f2d movl $0x80000000,%eax
0000000100000f32 cmpl %eax,0xfc(%rbp)
0000000100000f35 jne  0x00000f2a
0000000100000f37 movl $0x00000000,%eax
0000000100000f3c leave
0000000100000f3d ret

With optimization (-O3), it looks like this:

0000000100000f30 pushq %rbp
0000000100000f31 movq %rsp,%rbp
0000000100000f34 xorl %eax,%eax
0000000100000f36 leave
0000000100000f37 ret

As you can see, the entire loop has been removed. javap -c Loop gave me this output for the Java bytecode:

public static void main(java.lang.String[]);
  Code:
   0:   iconst_0
   1:   istore_1
   2:   iload_1
   3:   iinc    1, 1
   6:   ldc #2; //int 2147483647
   8:   if_icmpge   14
   11:  goto    2
   14:  return

}

It appears the loop is compiled into the bytecode, so I guess something happens at runtime to speed it up. (As others have mentioned, the JIT compiler squashes out the loop.)

My guess is that the JIT is optimizing away the empty loop.

Update: The Java Performance Tuning article Followup to Empty Loop Benchmark seems to support that, along with the other answers here that point out that the C code needs to also be optimized in order to make a meaningful comparison. Key quote:

Had I chosen to use the client mode 1.4.1 JVM (client is the default mode), the loops would not be optimized away. Had I chosen to use Microsoft's C++ compiler, the C version would take no time. Clearly, the choice of compiler is critical.

There are some things you need to control for here:

the startup of the JVM is nontrivial compared to startup of a compiled C program
your loop isn't doing anything, and the compiler probably knows that
JIT compilers often produce better code than a non-optimised C compiler

"What I'm missing here?" Optimization flags.

I don't think this question really has an answer; it depends on the optimizations both compilers perform. In this case I expect either one, if pushed to a sufficient optimization level, would eliminate the loop entirely, since i is never used.

Optimization - you are at least missing the -O2 flag on the gcc command line.

The Java JIT compiler is smart enough to optimize the loop away, while your C compiler seems to have most of the optimizations turned off.

So you are really comparing the time to start up the Java machine with the time it takes unoptimized C code to count to 2 billion.

Because the program doesn't do anything, an optimizer can remove the loop.

If you want the compiler to actually emit a fixed unit of work that you can benchmark, you need to fool it into thinking the result of the work will be used.

One way to do this is to write a function in one file, compile it, and then call it with the setup from another file. No compiler can anticipate what will be compiled in the future.

Without that, it is just a contest between default optimization levels and tells you nothing useful.

Your program does absolutely nothing, so this says nothing about the performance of either language. The only thing it tells you is whether your compiler is able to figure this out and therefore skip your program completely.

To make it do "something" you would have to print every increment to stdout. If you print only the end result, a good compiler could reduce your program to a single statement that prints this result and skips the whole "computation".