Most Recent Blog

King's View of the Battle

 Open GL model of a chess board.

February 5, 2020

Performance of the parallelSort method introduced in Java SE 8.

Taking a quick look at a feature added to java in SE8.  This is a parallel Array sort, and it takes advantage of multiple processors to sort arrays more quickly.  The feature is really easy to use, simply invoke Arrays.parallelSort(input);. instead of the old Arrays.sort(input);.  That is it.  Lets look at a simple performance test with results:


            size %Faster               serial ns      parallel ns
              10 -83.5% Faster ->          1,908            3,502
             100 17.0%  Faster ->          5,496            4,560
            1000 65.3%  Faster ->        149,033           51,768
           10000 5.1%   Faster ->        488,044          463,089
          100000 57.7%  Faster ->      6,153,690        2,601,578
         1000000 56.2%  Faster ->     72,704,992       31,827,891
        10000000 58.2%  Faster ->    855,719,416      357,283,539


Some Notes about this test:
  • The sort and parallel sort are tested on the exact same Array.  An array is generated using the syntactic sugar of $\lambda$ expressions and then duplicated with a System.arrayCopy(...).  as 
private double[] randomArray(int size) {
    double[] result = new double[size];
    Random randGen = new Random();

    // Use the parallel Set All function to generate this random number quickly.    
    Arrays.parallelSetAll(result, value -> {
        return randGen.nextDouble() * 10000;
    });
    return result;
}
  • The test is run 9 times without recording the result.  This is to make sure that the JVM as "warmed up" and the hotspot will not experience a compile pause.
  • Prior to each sort a System.gc() is invoked to give the JVM a chance to clean up any garbage.
Looking at the results on a 4 core iMac:
Hardware Overview:
  Model Name: iMac
  Processor Name: Quad-Core Intel Core i5
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 4
  L2 Cache (per Core): 256 KB
  L3 Cache: 6 MB

  Memory: 32 GB
Based on the above results it is clear that an array must be at least 100 elements in size to overcome the cost of using multiple threads for the sorting, however at 100,000 a consistent cost savings of 50%+ exists.
The gist for this is: