"Double-checked locking is a matter of style, not performance"

Double-checked locking is 100 times faster than synchronized locking, but there's a caveat

You think I’m joking?
I thought so as well… when I first heard that phrase from a very experienced engineer. And then I heard it again. I was genuinely shocked, as no one had ever managed to undermine the authority of double-checked locking for me. The last time I had to prove its performance was in the second year of my bachelor’s.
My second thought was that life had changed so much that basic computer science rules were now broken… maybe there’s no more Newton’s law of universal gravitation either? Hell, how much I must have missed!

Okay, after a tiny shock I decided to check.
I started with Newton’s law.
The tea successfully managed to stay inside the cup, which was a relief.
After some tea I got to the computer science part.
The next part introduces double-checked locking, so if you already know the pattern, you might want to skip straight to the performance comparison.

Double-checked locking

I mean, I guess it could be, but according to Wikipedia:

In software engineering, double-checked locking (also known as “double-checked locking optimization”) is a software design pattern used to reduce the overhead of acquiring a lock by testing the locking criterion (the “lock hint”) before acquiring the lock. Locking occurs only if the locking criterion check indicates that locking is required.

The pattern is typically used to reduce locking overhead when implementing “lazy initialization” in a multithreaded environment, especially as part of the Singleton pattern. Lazy initialization avoids initializing a value until the first time it is accessed.

Oh my, Wikipedia also thinks it’s about performance. But no one trusts Wikipedia here, right?! Right…?! Let’s rely on three things from there, though:

  1. It’s a software design pattern
  2. It has something to do with locking and (probably) its performance
  3. It is needed in a multithreaded (concurrent) environment

Let’s take a look at a tiny example of how people (might) end up using a double check. Imagine a situation where you’ve got a resource that you want to share between multiple threads.

Intuitive approach

Someone who does not know about concurrency issues (or expects no concurrency here) could come up with the following solution:

class BrokenResource {
  private HugeObject instance = null;

  // NEVER DO THIS IN A CONCURRENT ENVIRONMENT!
  public HugeObject getInstance() {
    if (instance == null) {
      instance = new HugeObject();
    }
    return instance;
  }
}

But we’ve agreed to trust Wiki, at least about the multithreaded environment. And now we’ve got a problem: in a concurrent environment there’s a good chance that HugeObject will be created multiple times, because two threads can both pass the instance == null check before either of them assigns the field.
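Here’s a minimal sketch that makes the race observable. It reuses BrokenResource from above, plus a hypothetical stand-in for HugeObject that simply counts its own constructions:

import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in: counts constructions so duplicate creation is visible
class HugeObject {
  static final AtomicInteger constructions = new AtomicInteger();

  HugeObject() {
    constructions.incrementAndGet();
  }
}

class DoubleCreationDemo {
  public static void main(String[] args) throws InterruptedException {
    BrokenResource resource = new BrokenResource();
    Thread first = new Thread(resource::getInstance);
    Thread second = new Thread(resource::getInstance);
    first.start();
    second.start();
    first.join();
    second.join();
    // Usually prints 1, but can print 2: both threads may pass the
    // null check before either assignment becomes visible
    System.out.println(HugeObject.constructions.get());
  }
}

The race window is tiny, so most runs print 1; but “usually correct” is exactly the kind of bug that bites in production.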

Working solution

Fix? Here you go:

class AlmostNotBrokenResource {
  private HugeObject instance = null;

  public synchronized HugeObject getInstance() {
    if (instance == null) {
      instance = new HugeObject();
    }
    return instance;
  }
}

This makes sure that the getInstance method of a single AlmostNotBrokenResource instance cannot be executed concurrently. This version would likely pass code review in many places/cases. But if there’s an insidious foe around you (and there’s always one… usually, that’s actually you), don’t do this. This solution locks on the object instance itself, and thus violates encapsulation: anything holding a reference to this object might lock you out of the method by acquiring a lock on the instance.

Why would I be locked out from using the method?

When you use synchronized in a non-static method definition:

class Example {
//...
  synchronized void doSomething() {
    //...
  }
//...
}

This is effectively equivalent to:

class Example {
//...
  void doSomething() {
    synchronized (this) {
      //...
    }
  }
//...
}

So given there is an existing instance:

//...
Example example = new Example();
//...

And someone locks on it:

//...
synchronized(example) {
 //...
}
//...

the doSomething() method would be blocked until the lock on the example object is released. And exactly the same could happen to getInstance() in our AlmostNotBrokenResource.
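Here’s a tiny self-contained sketch making the lock-out visible (the sleep durations are arbitrary, chosen only to widen the window):

class LockOutDemo {
  static class Example {
    synchronized void doSomething() {
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Example example = new Example();
    Thread saboteur = new Thread(() -> {
      // Acquires the same monitor the synchronized method uses
      synchronized (example) {
        try {
          Thread.sleep(1_000);
        } catch (InterruptedException ignored) {
        }
      }
    });
    saboteur.start();
    Thread.sleep(100); // let the saboteur grab the lock first
    long start = System.nanoTime();
    example.doSomething(); // blocks until the saboteur releases the monitor
    System.out.printf("waited ~%d ms%n", (System.nanoTime() - start) / 1_000_000);
  }
}

On a typical run it prints something like waited ~900 ms, even though doSomething() itself does nothing.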

Safe working solution

So here’s an improved solution:

class WorkingResource {
  private HugeObject instance = null;
  private final Object lock = new Object();

  public HugeObject getInstance() {
    synchronized (lock) {
      if (instance == null) {
        instance = new HugeObject();
      }
      return instance;
    }
  }
}

Now no one can easily lock us out, so we’re finally safe? Perfectly safe!

Double-checked locking

The problem is that old-fashioned computer science tells us that, due to the synchronized block in the previous example, at some point we would reach “saturation”: multiple threads pile up waiting for the lock, causing infamous lock contention. So, according to this old-fashioned science, one of the options to prevent this is double-checked locking:

class WorkingResource {
  private volatile HugeObject instance = null;
  private final Object lock = new Object();

  public HugeObject getInstance() {
    if (instance == null) {
      synchronized (lock) {
        if (instance == null) {
          instance = new HugeObject();
        }
      }
    }
    return instance;
  }
}

The idea here is that before making getInstance inaccessible to other threads, we first ensure that locking is truly required. In my experience, this is how double-checked locking is usually implemented.
But wait! There’s a caveat hidden in the field declaration: we’ve used a volatile variable, which is not blazing fast (we’ll test this knowledge later).

Why is volatile required here?

Reference writes in Java are always atomic (JLS §17.7), regardless of whether references are 32 or 64 bit, so a “torn” reference is not the danger here (only plain long and double fields may be written non-atomically). I’ll omit the details (that would take another blogpost to describe), but the real problem is visibility and reordering: without volatile, the write that publishes the reference can be reordered with the writes inside the HugeObject constructor, so a concurrent thread might see a non-null instance that points to a partially constructed object. The volatile write establishes a happens-before relationship that rules this out.
When we used a synchronized block around every access, volatile would have been excessive: the monitor guarantees both that only one thread accesses the variable at a time and that its writes are visible to whoever locks next.
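For illustration, here’s the subtly broken variant without volatile. It compiles and will pass almost any test you throw at it, which is exactly what makes it dangerous:

class SubtlyBrokenResource {
  private HugeObject instance = null; // note: NOT volatile
  private final Object lock = new Object();

  public HugeObject getInstance() {
    if (instance == null) { // unsynchronized read: no happens-before with the write below
      synchronized (lock) {
        if (instance == null) {
          // The assignment may become visible to other threads before
          // the writes made inside the HugeObject constructor do
          instance = new HugeObject();
        }
      }
    }
    return instance; // may be non-null yet point to a partially constructed object
  }
}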

Improved double-checked locking

To avoid accessing the volatile variable more times than necessary, we first save it into a local variable.

class WorkingResource {
  private volatile HugeObject instance = null;
  private final Object lock = new Object();

  public HugeObject getInstance() {
    HugeObject currentInstance = instance;
    if (currentInstance == null) {
      synchronized (lock) {
        currentInstance = instance; // re-read under the lock: another thread may have initialized it
        if (currentInstance == null) {
          currentInstance = new HugeObject();
          instance = currentInstance;
        }
      }
    }
    return currentInstance;
  }
}

Now we’re finished with the code. Let’s go test it and see how it all performs.

Performance comparison

After verifying that pushing the chair makes it push me back with an equal force, I’ve written a sample benchmark to compare the performance. It uses the famous JMH microbenchmarking framework, which (fortunately for us) allows measuring the performance of concurrent code with a specified number of threads.

Double-check that you really want to check the code
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@Timeout(time = 15)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class TestThreadsJmh {
  ImportantResource resource = new ImportantResource();
  @Setup
  public void setUp() {
//    Ensure resource is pre-initialized
    resource.getInstanceDoubleCheck();
  }

  @Benchmark
  public int doubleCheck() {
    return resource.getInstanceDoubleCheck();
  }

  @Benchmark
  public int doubleCheckImproved() {
    return resource.getInstanceDoubleCheckImproved();
  }

  @Benchmark
  public int synchronizedLocking() {
    return resource.getInstanceSynchronized();
  }

  static class ImportantResource {
    // Value to double-check for
    private volatile Integer instance = null;
    private final Object lock = new Object();

    public Integer getInstanceDoubleCheck() {
      if (instance == null) {
        synchronized (lock) {
          if (instance == null) {
            Random random = new Random();
            instance = random.nextInt();
          }
        }
      }
      return instance;
    }

    public Integer getInstanceDoubleCheckImproved() {
      Integer result = instance;
      if (result == null) {
        synchronized (lock) {
          result = instance; // re-read under the lock: another thread may have initialized it
          if (result == null) {
            Random random = new Random();
            result = random.nextInt();
            instance = result;
          }
        }
      }
      return result;
    }

    public Integer getInstanceSynchronized() {
      synchronized (lock) {
        if (instance == null) {
          Random random = new Random();
          instance = random.nextInt();
        }
        return instance;
      }
    }
  }

  public static void main(String[] args) throws RunnerException {
    List<Integer> threadsCount = List.of(1, 2, 4, 6, 8, 10, 12, 24, 36, 64, 128);
    for (int threads : threadsCount) {
      Options opt = new OptionsBuilder()
          .include(TestThreadsJmh.class.getCanonicalName())
          .threads(threads)
          .result("path-to-dir/result-threads-%d.txt".formatted(threads))
          .mode(Mode.Throughput)
          .forks(5)
          .build();

      new Runner(opt).run();
    }
  }
}

Although JMH tries to make results as reproducible as possible, there can still be slight differences between runs. They will definitely vary between JDK versions, CPUs, and OSs.
To give you an idea, my setup was:

  • JDK Temurin-21.0.1+12
  • Hardware: MacBook Pro 2018 (2.2 GHz 6-core, 16 GB RAM)
  • macOS 14.2.1

Let’s see how the synchronized code compares with the basic and improved double-checked locking.

---
config:
 themeVariables:
   xyChart:
     plotColorPalette: "#008000, #faba63, #c42116"
---
xychart-beta
    title "getInstance method performance"
    x-axis "Number of threads" [1, 2, 4, 6, 8, 10, 12, 24, 36, 64, 128]
    y-axis "ops/μs" 1 --> 6500 
    line "Synchronized" [60.811122, 35.895496, 18.873457, 12.074054, 12.518262, 13.938834, 11.879644, 12.915154, 15.587374, 15.033182, 18.681862]
    line "Basic double-checked locking" [1102.941127, 2198.644819, 4000.315515, 5153.423459, 5263.936999, 5168.389552, 5190.063020, 5230.766818, 5229.003301, 5209.035184, 3627.289318]
    line "Improved double-checked locking" [1306.641707, 2596.135830, 4528.744632, 5989.270898, 6122.370113, 6196.279973, 6091.301301, 6072.149273, 6116.128474, 6115.881072, 4436.121656]
  • Chart compares number of operations per microsecond to number of threads (higher is better)
  • Red line: Improved double-checked locking
  • Yellow line: Basic double-checked locking
  • Green line: Synchronized code

Aargh! We can barely see the synchronized code’s performance. What we can say, though, is that the synchronized code performs at least one order of magnitude worse (and, starting from 4 threads, two orders of magnitude…). The “improved” double-checked locking version performs about 20% quicker than the basic one, which is pretty good for a single additional variable, though I expected more. And both double-checked locking solutions grow almost linearly with the number of threads until we reach the number of physical cores, which proves the scalability of the approach.

Let’s zoom it in.

---
config:
 themeVariables:
   xyChart:
     plotColorPalette: "#008000, #faba63, #c42116"
---
xychart-beta
    title "getInstance method performance with synchronized"
    x-axis "Number of threads" [1, 2, 4, 6, 8, 10, 12, 24, 36, 64, 128]
    y-axis "ops/μs" 1 --> 70
    line "Synchronized" [60.811122, 35.895496, 18.873457, 12.074054, 12.518262, 13.938834, 11.879644, 12.915154, 15.587374, 15.033182, 18.681862]

Synchronized just degrades as we increase the number of threads.

Results

So, what are the results?

  • Double-checked locking is faster by up to two orders of magnitude.
  • Thinking about volatile variables can save you a significant amount of time.
  • Even synchronized code is capable of performing… almost 12 million operations per second!

Is double-checked locking more a matter of style? Based on my performance analysis, the answer is no. But being pragmatic, I would propose the following strategy:

  • If your code is never expected to call the function more than a million times per second, consider using synchronized for simplicity. If you’re not getting stuck waiting, you’re unlikely to notice it.
  • If, during the design phase, you already expect hundreds of thousands of calls per second (and you expect that to grow), I would really recommend double-checked locking. It’s not that complicated, and it will save you lots of profiling time in the future.

Liked my post? Have some private feedback? Want to ask questions, or maybe discuss a job opportunity? Feel free to reach out to me directly via any of the linked social networks!

This post is licensed under CC BY 4.0 by the author.