Cost of locking in .NET vs Java

I was playing with the Disruptor framework and its port for the .NET platform and found an interesting case. Maybe I'm completely missing something, so I'm looking for help from the almighty community.

        long iterations = 500*1000*1000;
        long testValue = 1;

        //.NET 4.0. Release build. Mean time - 26 secs;
        object lockObject = new object();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            lock (lockObject)
            {
                testValue++;    
            }
        }
        sw.Stop();

        //Java 6.25. Default JVM params. Mean time - 17 secs.
        Object lock = new Object();
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++)
        {
            synchronized (lock)
            {
                testValue++;
            }
        }
        long stop = System.currentTimeMillis();

It seems that acquiring the lock in the single-threaded scenario costs about 50% more in .NET than in Java. At first I suspected the timers, but I've run the same test several times with results close to the mean values above. Then I suspected the synchronized block, but it compiles to nothing more than monitorenter/monitorexit bytecode instructions, the same thing the lock keyword does in .NET (Monitor.Enter/Monitor.Exit). Any other ideas why taking a lock is so expensive in .NET compared to Java?

Yes, it looks like taking an uncontended lock is more expensive in .NET than in Java. (The results on my netbook are slightly more dramatic still.)

There are various aspects to performance which will be faster on one platform than another, sometimes to this extent. The HotSpot JIT and the .NET JIT are pretty radically different in various ways - not least because the .NET JIT only runs once on IL, whereas HotSpot is able to optimize more and more as a particular piece of code is run more and more often.
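Because HotSpot keeps recompiling hot code as it runs, a single timed run mixes interpreted, partly compiled and fully optimized execution. Below is a minimal sketch of a harness that times several consecutive passes of the question's loop, so warmed-up passes can be compared on their own. The class name `WarmupBench`, the pass count and the iteration count are illustrative assumptions, not anything from the posts above.

```java
public class WarmupBench {
    // Written after each pass so the loop's result is actually used.
    static volatile long lastResult;

    // The workload from the question: increment a counter under an uncontended lock.
    static long lockedLoop(long iterations, long testValue) {
        Object lock = new Object();
        for (long i = 0; i < iterations; i++) {
            synchronized (lock) {
                testValue++;
            }
        }
        return testValue;
    }

    // Runs the workload `passes` times and returns each pass's elapsed time in nanoseconds.
    static long[] timePasses(int passes, long iterations) {
        long[] elapsed = new long[passes];
        for (int p = 0; p < passes; p++) {
            long start = System.nanoTime();
            lastResult = lockedLoop(iterations, 1);
            elapsed[p] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] t = timePasses(5, 100_000_000L);
        for (int p = 0; p < t.length; p++) {
            System.out.printf("pass %d: %d ms%n", p, t[p] / 1_000_000);
        }
        System.out.println("result = " + lastResult);
    }
}
```

Deciding which passes count as "warmed up" is still a judgment call; the OpenJDK JMH harness exists to handle warm-up and dead-code elimination more rigorously.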

The important question is whether this is really significant. If your real-life application really acquires an uncontended lock 500 million times every minute, it probably is significant, and you should probably redesign your app somewhat. If your real-life application actually does real work within the lock (or between acquisitions of the lock), then it's unlikely to be a real bottleneck.

I recently found two .NET performance gotchas (part one; part two) which I'm having to work around because I'm writing a "system-level library", and they would make a significant difference to an application that did a lot of date/time parsing. But this sort of micro-optimization is rarely worth doing.

The first thing to remember about micro-benchmarks is that Java is particularly good at identifying and eliminating code which doesn't do anything. I have found that again and again, Java does pointless code faster than any other language. ;)

If Java is surprisingly fast compared to another language, the first question should be: does the code do anything remotely useful (or even look like it could be useful)?

Java tends to unroll loops more than it used to. It can also combine adjacent locks on the same object (lock coarsening). As your test is uncontended and doesn't do anything, your code is likely to end up looking something like this:

for (int i = 0; i < iterations; i+=8) {
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
}

which, once the adjacent locks are combined, becomes

for (int i = 0; i < iterations; i+=8) {
    synchronized (lock) {
        testValue++;
        testValue++;
        testValue++;
        testValue++;
        testValue++;
        testValue++;
        testValue++;
        testValue++;
    }
}

Since testValue is never used afterwards, the increments can then be removed as well:

for (int i = 0; i < iterations; i+=8) {
    synchronized (lock) {
    }
}

and finally

{ }

Are 'testValue' and the lock object local to a method? If so, it is possible that the JVM has detected that locking is unnecessary, since everything is confined to one thread, and is therefore not locking at all.

This is explained here.

To show just how hard it is to tell what optimisations the JVM decides to do, and when it decides to do them, examine these results from running your code three consecutive times in the same JVM:

public class LockBenchmark {
    public static void main(String[] args) {
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("First call : " + doIt(500 * 1000 * 1000, 1)); // 14 secs
        System.out.println("Second call: " + doIt(500 * 1000 * 1000, 1)); // 1 sec
        System.out.println("Third call : " + doIt(500 * 1000 * 1000, 1)); // 0.4 secs
    }

    // Times the question's loop; returning testValue means the loop's work is actually used.
    private static String doIt(final long iterations, long testValue) {
        Object lock = new Object();
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            synchronized (lock) {
                testValue++;
            }
        }
        long stop = System.currentTimeMillis();
        return (stop - start) + " ms, result = " + testValue;
    }
}

These results are so hard to explain that I think only a JVM engineer could shed light on them.

Remember, both are extremely fast; we are talking about 50 CPU cycles for lock-read-write-unlock here.
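As a rough check of that figure, here is the per-iteration cost from the mean times in the question, assuming a clock of about 1.5 GHz (the question does not state the CPU speed):

$$
\frac{17\ \text{s}}{5\times10^{8}} = 34\ \text{ns} \approx 51\ \text{cycles},
\qquad
\frac{26\ \text{s}}{5\times10^{8}} = 52\ \text{ns} \approx 78\ \text{cycles}
$$

Under that assumption, the gap between the two platforms is about 18 ns, or roughly 27 cycles, per lock/unlock pair.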

In Java, I compared it with a simulated implementation for the uncontended case:

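import java.util.concurrent.atomic.AtomicInteger;

public class SimulatedLock {
    // Stand-in for a lock's queue of waiting threads; always 0 when uncontended.
    static volatile int waitingList = 0;

    static long run(long iterations, long testValue) {
        AtomicInteger x = new AtomicInteger(0);
        for (int i = 0; i < iterations; i++) {
            // "lock": spin until the flag is swapped from 0 to 1
            while (!x.compareAndSet(0, 1))
                ;

            testValue++;

            // "unlock": a real lock would wake a waiter here; this is just the volatile read
            if (waitingList != 0)
                ;
            x.set(0);
        }
        return testValue;
    }
}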

This bare-bones simulation is a little faster than the synchronized version: the time taken is in the ratio 15/17.

That shows that in your test case Java didn't do any crazy optimizations; it honestly did lock-read-update-unlock on every iteration. And Java's implementation is essentially as fast as the bare-bones one; it can't get any faster.

Although C#'s implementation is also close to the minimum, it apparently does one or two more things than Java's. I'm not familiar with C#, but this probably indicates some difference in semantics that forces C# to do something extra.

When I investigated lock/sync costs in Java a few years ago, I ended up with a big open question: how does locking affect overall performance, including for other threads accessing any kind of memory? What may be affected is the CPU cache, especially on a multi-processor computer, and the impact depends on how the specific CPU architecture handles cache synchronization. I believe overall performance is not affected on a modern single-CPU architecture, but I am not sure.

Anyway, when in doubt, especially when multi-processor computers may be used to host the software, it may be worth putting a lock at a higher level, around several operations.
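Putting the lock at a higher level means taking it once per batch of operations instead of once per operation, which is the same idea as the lock coarsening HotSpot can sometimes do on its own. A minimal sketch, assuming a hypothetical shared `Counter` whose callers can hand it a whole batch at once:

```java
import java.util.List;

public class Counter {
    private final Object lock = new Object();
    private long total;

    // Fine-grained: one lock acquisition per value.
    public void addEach(List<Long> values) {
        for (long v : values) {
            synchronized (lock) {
                total += v;
            }
        }
    }

    // Coarse-grained: one lock acquisition for the whole batch.
    public void addBatch(List<Long> values) {
        synchronized (lock) {
            for (long v : values) {
                total += v;
            }
        }
    }

    public long get() {
        synchronized (lock) {
            return total;
        }
    }
}
```

Both methods leave `total` in the same state; the trade-off is that other threads wait longer while a batch is applied, so batches should stay small enough to keep lock hold times short.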

The Java JIT will optimize the synchronization away as the lock object is thread local (i.e. it is confined to the thread's stack and never shared) and thus can never be synchronized on from another thread. I'm not sure if the .NET JIT will do this.

See this very informative article, especially the part on lock elision.
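The condition that matters for lock elision is whether the lock object can escape the current thread. A minimal sketch with hypothetical names: in `localLock` the monitor is created inside the method and never published, so escape analysis may prove no other thread can synchronize on it; in `sharedLock` the monitor is a static field that any thread can reach, so the locking has to stay. Whether a particular JVM actually elides the lock depends on its version and flags (HotSpot's escape analysis is controlled by `-XX:+DoEscapeAnalysis`, enabled by default in modern versions), so treat this as an illustration of the condition rather than a guarantee.

```java
public class LockElision {
    // Reachable from every thread, so synchronizing on it can never be elided.
    private static final Object SHARED_MONITOR = new Object();

    // The monitor never leaves this method, so no other thread can ever
    // synchronize on it; the JIT is allowed to remove the locking entirely.
    static long localLock(long iterations) {
        Object lock = new Object();
        long value = 0;
        for (long i = 0; i < iterations; i++) {
            synchronized (lock) {
                value++;
            }
        }
        return value;
    }

    // Same loop, but the monitor is globally visible, so it must really be locked.
    static long sharedLock(long iterations) {
        long value = 0;
        for (long i = 0; i < iterations; i++) {
            synchronized (SHARED_MONITOR) {
                value++;
            }
        }
        return value;
    }
}
```

Timing the two methods against each other on a warmed-up JVM is one way to see whether elision is happening on a given setup.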