
These days I’m working on a multi-threaded piece of code for my company, trying to improve the performance of the system. In this scenario, every thread performs some independent operations. In addition, every thread contributes, together with the other threads, to the construction of some data needed by the main thread at the end of the parallel process.

The instinctive way to approach this kind of scenario is to create one or more shared variables and, by somehow synchronizing the threads to maintain consistency, have them build the final data concurrently. This creates what is known as thread contention, i.e. a condition where one thread is waiting for a lock/object that is currently held by another thread.

But are we sure this is the only possible solution? Reading this post by Marc Brooker, which shows how atomic and volatile operations work in Java, we can see how maintaining consistency through thread contention can be time-expensive.

Think about scenarios where threads contribute to the construction of shared data independently: by independently I mean that each thread does its work without caring about what the other threads are doing, and the final result can be built by combining the independent partial results of the threads. In situations like these there is another solution, which is more memory-expensive but which can be decisive from the point of view of performance.

What I’m trying to say is that in such a scenario we can avoid thread contention by making the threads work on different memory locations and, only at the end, reconstructing the final result from the individual partial results.

In order to test what I’m saying, I wrote some Java code, which can be found here on GitHub.

I created a Thread class which can work in different modalities:

public enum Modality{
   CONTENTION_ATOMIC_LONG, // Increment a shared AtomicLong
   CONTENTION_SYNC, // Increment a shared long through a synchronized method
   AVOID_CONTENTION; // Increment a reserved slot of a shared long array
}

A Thread owns a SumObject, which provides different ways to increment a number:

  • AtomicLong value: used in CONTENTION_ATOMIC_LONG modality. Every Thread will atomically increment this variable.
  • long[] partialValues: used in AVOID_CONTENTION modality. Every Thread will write to a different array position, ignoring the other Threads.
  • long valueSyn: used in CONTENTION_SYNC modality. Every Thread will increment this variable through a synchronized method.
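A minimal sketch of what such a SumObject might look like (the field names come from the list above; the constructor and the method name incrementValueSyn are my assumptions — see the GitHub repo for the actual class):

```java
import java.util.concurrent.atomic.AtomicLong;

class SumObject {
    // CONTENTION_ATOMIC_LONG: a single shared atomic counter
    public final AtomicLong value = new AtomicLong(0);
    // AVOID_CONTENTION: one private slot per thread
    public final long[] partialValues;
    // CONTENTION_SYNC: a plain long guarded by a synchronized method
    private long valueSyn = 0;

    public SumObject(int nThreads) {
        this.partialValues = new long[nThreads];
    }

    // Hypothetical name: the actual method in the repo may differ
    public synchronized void incrementValueSyn() {
        this.valueSyn++;
    }

    public synchronized long getValueSyn() {
        return this.valueSyn;
    }
}
```

In the AVOID_CONTENTION case each thread touches only partialValues[idThread], so no synchronization is needed while the threads run.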

For this test, a Thread does a very simple job: it increments, according to the modality and for a fixed number of times, one of the values contained in the SumObject:

public void run() {
	out.println("\tThread ["+this.idThread+"] started...");
	switch (this.modality) {
		case CONTENTION_ATOMIC_LONG: // shared AtomicLong
			while(this.loops > 0){ so.value.incrementAndGet(); this.loops--; }
			break;
		case CONTENTION_SYNC: // shared long behind a synchronized method
			while(this.loops > 0){ so.incrementValueSyn(); this.loops--; }
			break;
		case AVOID_CONTENTION: // private slot of the shared array
			while(this.loops > 0){ so.partialValues[this.idThread]++; this.loops--; }
			break;
	}
	out.println("\tThread ["+this.idThread+"] ended.");
}

Of course, avoiding contention brings a final overhead, because we need to assemble all the partial results. This cost too must be evaluated case by case. In my case, and I’m sure in a lot of other cases as well, this overhead is very small and more than acceptable when compared to the overhead of thread synchronization.

It’s important to notice that, in order to build a consistent final result, the main Thread has to join all the other Threads:

// I'll wait for all Threads to be done...
for(int i = 0; i < nThread; i++){
    try {
        // The join() creates a happens-before barrier
        threads[i].join(); // threads: the array holding the worker Threads
    } catch (InterruptedException e) {
        //TODO: Handle this...
    }
}

The join operation creates what is called a happens-before relationship, which gives us the assurance that the main thread will consistently read the data produced by the other threads. In this Oracle tutorial, in fact, we can read: “When a thread terminates and causes a Thread.join in another thread to return, then all the statements executed by the terminated thread have a happens-before relationship with all the statements following the successful join. The effects of the code in the thread are now visible to the thread that performed the join”.
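To make this concrete, here is a minimal, self-contained sketch of the whole pattern (the class and variable names here are mine, not from the repo): each worker increments only its own slot of a plain long[], the main thread joins them all, and only then sums the slots:

```java
public class PartialSums {
    // Each worker increments only its own slot; main joins, then sums.
    static long sumWithPartials(int nThreads, int loops) throws InterruptedException {
        final long[] partialValues = new long[nThreads];
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                for (int j = 0; j < loops; j++) {
                    partialValues[id]++; // no locks, no atomics: this slot belongs to this thread only
                }
            });
            threads[i].start();
        }
        // join() gives the happens-before guarantee: after it returns,
        // the main thread sees every write made by the joined thread
        for (Thread t : threads) {
            t.join();
        }
        long total = 0;
        for (long partial : partialValues) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sumWithPartials(4, 1_000_000)); // prints 4000000
    }
}
```

Note that distinct elements of the same array are distinct memory locations as far as the Java Memory Model is concerned, so the workers never race with each other here.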

In other words, in this case, after Thread.join, the following code is safe:

long value = 0;
// I'll sum all the array values in one...
for(int i = 0; i < so.partialValues.length; i++){
    value += so.partialValues[i]; // This is safe!
}
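As a side note, on Java 8+ the same merge can be written more compactly with a stream (equivalent result, just shorter):

```java
import java.util.Arrays;

public class MergeDemo {
    public static void main(String[] args) {
        // e.g. partial results produced by 3 threads
        long[] partialValues = {10L, 20L, 30L};
        long value = Arrays.stream(partialValues).sum();
        System.out.println(value); // prints 60
    }
}
```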

Following are some graphs that show the test results:

[Graph: thread contention vs no contention, 1 thread]

This first graph shows the results with one thread over different numbers of loops. Please note that the Y-axis is logarithmic, not linear, so a gap between the two curves represents a much larger absolute difference in the upper part of the graph. We can see that even with a single thread the solution without contention is much faster: over 10^9 loops it took 0.131s, versus 12.145s for the solution with thread contention.


With 10 threads and 10^9 loops the speed difference between the two solutions grows further: 2.773s vs 1127.47s (18min!!).
It gets even worse with 25 threads, where I couldn’t complete the 10^9-loop test for the contention solution, because after 45min my laptop just turned off! 😀


In particular, the solution without thread contention seems to be insensitive to the number of threads, although it gets a bit slower as the loops increase. On the other hand, the time taken by the solution with thread contention grows linearly with the number of threads:


The bottom line is: where possible, and when there are no memory constraints, try to design the threads’ working space to be as independent as possible (even if this means creating N variables instead of a single one), and you’ll build much faster systems! 🙂
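As a closing aside (my addition, not part of the original benchmark): since Java 8 the JDK ships this very idea as java.util.concurrent.atomic.LongAdder, which internally stripes the count across multiple cells to reduce contention and only combines them when sum() is called:

```java
import java.util.concurrent.atomic.LongAdder;

public class LongAdderDemo {
    public static void main(String[] args) throws InterruptedException {
        final LongAdder counter = new LongAdder();
        final int nThreads = 4;
        final int loops = 1_000_000;

        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < loops; j++) {
                    counter.increment(); // contends far less than AtomicLong under load
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // happens-before: the sum() below sees all increments
        }
        System.out.println(counter.sum()); // prints 4000000
    }
}
```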

If you want to try the code, follow these steps:

  1. git clone https://github.com/luigimassagallerano/JavaThreadContention.git
  2. cd JavaThreadContention
  3. ant clean compile
  4. java -classpath bin com.mydevelopedworld.contention.TestContention N-Threads Modality M-loops