Catalog of Good Practices#

This catalog has been prepared by gathering recommended good practices in code development. Additionally, measurements have been made with the JoularJX tool of the energy consumption of some well-known algorithms and data structures implemented in Java.

For each entry in the catalog, you will find the description of benchmarks, in some cases the specification of the tests we performed and bibliographical references.

If you are interested and want to reproduce the tests, all the Java programs and shell scripts used to perform the measurements are available in:

or, alternatively, you can copy the following command on your terminal:

wget https://www-inf.telecom-sudparis.eu/COURS/cen/Mesures/SoftwareEfficiencyCatalog.zip

Specifications of the testing PC#

OS

Ubuntu 22.04 LTS 64 bits GNOME 42.2

Model

Dell Inc. Latitude 5580

CPU

Intel® Core™ i7-7600U CPU @ 2.80GHz × 4

RAM

32 Go

Measurement tool

JoularJX 1.0

Java Version

openjdk 17.0.3 2022-04-19
OpenJDK Runtime Environment (build 17.0.3+7-Ubuntu-0ubuntu0.22.04.1)
OpenJDK 64-Bit Server VM (build 17.0.3+7-Ubuntu-0ubuntu0.22.04.1, mixed mode, sharing)

Sorting#

The time complexity of a sorting algorithm can not be better than \(O(n \log(n))\).

Many sorting algorithms exist, we only present a few of them:

Slow sorts#

These algorithms are simple to implement but have a time complexity of \(O(n^2)\).

int[]

ArrayList<Integer>

ArrayList<String>

Size

\(50\ 000\)

\(50\ 000\)

\(50\ 000\)

Bubble sort

\(68.89 \pm 3.06\ J\)

\(265.28 \pm 9.01\ J\)

\(593.44 \pm 11.12\ J\)

Selection sort

\(2.77 \pm 2.14\ J\)

\(37.97 \pm 1.49\ J\)

\(295.76 \pm 8.98\ J\)

Insertion sort

\(14.51 \pm 0.79\ J\)

\(73.06 \pm 7.40\ J\)

\(143.99 \pm 9.27\ J\)

We can notice that the selection sort is here surprisingly better than the insertion sort to sort lists or arrays of integers.

Quick Sorts#

These algorithms have a time complexity of \(O(n\log(n))\). The sort() method is a built-in method of Java. We did not code the Tim sort and the dual-pivot quicksort, but we show here which sort has been selected by Java developers.

Note that the Array.sort() method uses a dual-pivot quicksort for primitive arrays (int[], bytes[], long[], etc) but a Tim sort for Object arrays.

int[]

ArrayList<Integer>

ArrayList<String>

Size

\(3\ 000\ 000\)

\(3\ 000\ 000\)

\(3\ 000\ 000\)

Merge sort

\(11.33 \pm 0.65\ J\)

\(42.18 \pm 2.21\ J\)

\(59.08 \pm 2.46\ J\)

(one-pivot) Quicksort

\(6.38 \pm 0.38\ J\)

\(23.72 \pm 1.00\ J\)

\(49.78 \pm 3.48\ J\)

Tim sort (Collections.sort())

-

\(22.73 \pm 1.88\ J\)

\(37.10 \pm 6.54\ J\)

Dual-pivot quicksort (Arrays.sort() for int[])

\(2.57 \pm 0.64\ J\)

-

-

Conclusion#

It is acceptable to use a slow sort on a small list or to learn how to code, but it should never be used for bigger lists.

Moreover, if you know how to use a Comparator, you could simply use the implemented sort(Comparator) method.


Test specifications#

The methods used are in the collections_comparison.ListSortingComparison.java and the collections_comparison.ArraySortingComparison.java classes.

The test was launched using the collections_comparison.TestListSortingComparison.java and the collections_comparison.TestArraySortingComparison.java classes with JoularJX by using the bash script and the files located in the folder scripts/test/testListSortingComparison/ and in scripts/test/testArraySortingComparison/ .

We launched 10 times the test class to get these means and standard deviations.

The integers used where randomly selected with the ThreadLocalRandom class and its nextInt() method.

We used this script to quickly launch the tests for the List:

Click to see
#!/bin/bash
# Launches testListSortingComparison.sh

bash testListSortingComparison.sh 10 0 3000000 Integer	# Quick sorts
bash testListSortingComparison.sh 10 1 50000   Integer	# Slow sorts

bash testListSortingComparison.sh 10 0 3000000 String	# Quick sorts
bash testListSortingComparison.sh 10 1 50000   String	# Slow sorts

We used this script to quickly launch the tests for the int[]:

Click to see
#!/bin/bash
# Launches testArraySortingComparison.sh

bash testArraySortingComparison.sh 10 0 3000000	# Quick sorts
bash testArraySortingComparison.sh 10 1 50000	# Slow sorts

SQL Queries#

Query Performance#

The advices we can give you are really depending on the database you are using. General good practices are:

  • If possible, use the LIMIT clause. The gain will be proportional to the quantity of retrieved data.

  • The SELECT * should be avoided. The select should only contain the columns you need and no other one.

  • Unnecessary JOIN also.

  • To analyze the query, you can add the EXPLAIN FORMAT=JSON cause before the query. It will returned a table with one row containing a JSON column and its content is a representation of the query plan. The cause to add depends on the SQL server used. For our server (MariaDB), we need to use ANALYZE FORMAT=JSON.

We give you a simple example of a JSON file with a query on our database. The query is:

ANALYZE FORMAT=JSON
SELECT
    Orders.Id
FROM
    Orders
WHERE
    ShippedDate IS NULL

and it gives us the JSON:

{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 2.429,
    "table": {
      "table_name": "Orders",
      "access_type": "ALL",
      "r_loops": 1,
      "rows": 6166,
      "r_rows": 6110,
      "r_total_time_ms": 2.1289,
      "filtered": 100,
      "r_filtered": 14.845,
      "attached_condition": "Orders.ShippedDate is null"
    }
  }
}
  • r_loops show the number of times the node was actually executed. Here the table Orders was only scanned once.

  • r_total_time_ms show the time that was spent to execute the node. Here the query took 2.429 ms to be traited.

  • r_rows show how many rows were actually read. Here 6110 rows of the Orders table were read. rows shows the number of rows expected to be read by the optimizer, but here, the optimizer did not have any way to estimate it.

  • r_filtered show the percentage of rows that have been actually filtered with the attached_condition. In the same way as with rows and r_rows, filtered show what percentage was expected by the optimizer.

Other data can be obtained with the EXPLAIN/ANALYZE statement, you will then have to see the documentation in order to understand them.

References#

String concatenation#

Be aware that to concatenate \(n\) strings requires a time complexity of \(O(n^2)\) because strings are immutable. Therefore, each concatenation results in the creation of a new string, which means poor performance.

Consider using the StringBuilder class when appending characters for example. Alternatively, use a character array, which is even better optimized. The thread-safe version of the StringBuilder is the StringBuffer.

Use comparison#

String

StringBuilder

StringBuffer

Mutable

No

Yes

Yes

Thread-safe

The String is immutable, therefore the object is thread-safe

No

Yes

Append chars at the end

\(488.73 \pm 28.39\ J\)

\(0.32 \pm 0.14\ J\)

\(0.33 \pm 0.27\ J\)

Append chars at the beginning

\(219.64 \pm 23.91\ J\)

\(47.50 \pm 2.83\ J\)

\(42.74 \pm 4.54\ J\)

Test specifications#

The methods used are in the string_manipulation.StringManipulationComparison.java class.

The test was launched using the string_manipulation.TestStringManipulation.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testStringManipulation/.

We launched 5 times the test class to get these means and standard deviations.

To get the results of the “Append chars” parts, we start with a empty String/StringBuilder/StringBuffer and we append 500 000 times the same character ‘a’.

We used this script to quickly launch those tests:

Click to see
#!/bin/bash

# Launches testStringManipulation.sh with the 3 possible choices with size 500 000

bash testStringManipulation.sh 10 0 500000 # String
bash testStringManipulation.sh 10 1 500000 # StringBuilder
bash testStringManipulation.sh 10 2 500000 # StringBuffer

References#

  • Effective Java. Third Edition. Joshua Bloch. “Item 64: Beware the performance of String concatenation”.

Java List#

Choice of a list class#

The List interface (and associated implementation classes) is one of the most used data structures for collections in Java. Java classes that implement this interface may be more consuming and less efficient than a Java array but it is much easier and safer to use.

When using a List in Java, you need to choose between different implementations. The main ones are the ArrayList, which is based on an array, and the LinkedList, which is based on a doubly linked list. Moreover, the Vector, which is based on the ArrayList, can be used for thread-safe usage. Here are the time complexities and energy consumptions resulting from a test of the main methods.

Main choices#

ArrayList

LinkedList

Vector

Data structure

Based on an array

Based on a doubly linked list

Based on an array

Thread-safe

No

No

Yes

For the memory use, the LinkedList nodes have to store the adresses of the previous and the next nodes in addition of the data.

On the other hand, if the reserved capacity for the ArrayList is higher than what it actually needs, then the remaining memory is wasted.

So, if we know the exact capacity to use and we reserve it at the creation of an ArrayList, then the ArrayList would take less memory than a LinkedList filled with the same data.

Benchmark 1 (100 000 elements)#

The following results were obtained with Lists of 100 000 elements.

Insertion of elements#

ArrayList

LinkedList

Vector

List.add(Object)

\(O(1)\)
(except if the ArrayList has to be extended: \(O(n)\))

\(O(1)\)

same as the ArrayList

List.add(int index, Object)

\(O(n)\)
(\(O(1)\) if adding at the end)

\(O(n)\)
(\(O(1)\) if adding at the beginning or at the end)

same as the ArrayList

Insertion at the beginning

\(10.605 \pm 0.421\ J\)

\(0.73 \pm 0.22 \ J\)

\(10.619 \pm 0.674\ J\)

Insertion in the middle

\(4.494 \pm 0.286\ J\)

\(\mathbf{564.60 \pm 28.95\ J}\)

\(4.466 \pm 0.225\ J\)

Insertion at the end

\(0.249 \pm 0.026\ J\)

\(0.41 \pm 0.05\ J\)

\(0.282 \pm 0.023\ J\)

Remove elements#

ArrayList

LinkedList

Vector

List.remove(int index)

\(O(n)\)
(\(O(1)\) if removing at the end)

\(O(n)\)
(\(O(1)\) if removing at the beginning or at the end)

same as the ArrayList

Remove randomly half of the elements by using List.remove(int index)

\(3.587 \pm 0.226\ J\)

\(\mathbf{211.21 \pm 24.93\ J}\)

\(3.516 \pm 0.247\ J\)

Remove elements whose index is even by using List.iterator() and Iterator.remove()

\(4.656 \pm 0.296\ J\)

\(0\ J\)

\(4.769 \pm 0.247\ J\)

Accessing the elements#

ArrayList

LinkedList

Vector

List.get(int index)

\(O(1)\)

\(O(n)\)
(\(O(1)\) if getting at the beginning or at the end)

same as the ArrayList

Random access

\(0.072 \pm 0.026\ J\)

\(\mathbf{306.84 \pm 26.78\ J}\)

\(0.134 \pm 0.034\ J\)

Iterate by using List.get(int index)

\(0.035 \pm 0.017\ J\)

\(\mathbf{295.52 \pm 18.27\ J}\)

\(0.069 \pm 0.034\ J\)

Iterate by using List.iterator() and Iterator.next()

\(0.047 \pm 0.015\ J\)

\(0.17 \pm 0.00\ J\)

\(0.061 \pm 0.025\ J\)

Iterate by using the enhanced for-each loop

\(0.046 \pm 0.016\ J\)

\(0.22 \pm 0.02\ J\)

\(0.037 \pm 0.012\ J\)

Sort the list#

ArrayList

LinkedList

Vector

Best sorting complexity

\(O(n \log(n))\)

\(O(n \log(n))\)

\(O(n \log(n))\)

Sort by using Collections.sort(List)

\(1.062 \pm 0.053\ J\)

\(1.32 \pm 0.30\ J\)

\(1.081 \pm 0.059\ J\)

Benchmark 2 (500000 elements)#

The following results were obtained with Lists of 500 000 elements. This time, we did not use the methods that require List.get(int) with the LinkedList. It will be written “-” in the table when no testing was done for the concerning method.

Insertion of elements#

ArrayList

LinkedList

Vector

List.add(Object)

\(O(1)\)
(except if the ArrayList has to be extended: \(O(n)\))

\(O(1)\)

same as the ArrayList

List.add(int index, Object)

\(O(n)\)
(\(O(1)\) if adding at the end)

\(O(n)\)
(\(O(1)\) if adding at the beginning or at the end)

same as the ArrayList

Insertion at the beginning

\(\mathbf{450.40 \pm 24.04\ J}\)

\(1.349 \pm 0.063 \ J\)

\(\mathbf{456.33 \pm 16.97\ J}\)

Insertion in the middle

\(163.92 \pm 7.74\ J\)

-

\(157.13 \pm 10.00 \ J\)

Insertion at the end

\(1.19 \pm 0.69\ J\)

\(1.259 \pm 0.041\ J\)

\(1.32 \pm 0.17\ J\)

Remove elements#

ArrayList

LinkedList

Vector

List.remove(int index)

\(O(n)\)
(\(O(1)\) if removing at the end)

\(O(n)\)
(\(O(1)\) if removing at the beginning or at the end)

same as the ArrayList

Remove randomly half of the elements by using List.remove(int index)

\(163.93 \pm 6.96\ J\)

-

\(165.89 \pm 3.22\ J\)

Remove elements whose index is even by using List.iterator() and Iterator.remove()

\(205.24 \pm 6.60\ J\)

\(0.257 \pm 0.333\ J\)

\(165.89 \pm 3.22\ J\)

Accessing the elements#

ArrayList

LinkedList

Vector

List.get(int index)

\(O(1)\)

\(O(n)\)
(\(O(1)\) if getting at the beginning or at the end)

same as the ArrayList

Random access

\(0.96 \pm 0.14\ J\)

-

\(1.40 \pm 0.13\ J\)

Iterate once by using List.get(int index)

\(0.29 \pm 0.01\ J\)

-

\(0.30 \pm 0.01\ J\)

Iterate once by using List.iterator() and Iterator.next()

\(0\ J\)

\(0.149 \pm 0.034\ J\)

\(0.22 \pm 0.03\ J\)

Iterate once by using the enhanced for-each loop

\(0.25 \pm 0.03\ J\)

\(0.173 \pm 0.032\ J\)

\(0.33 \pm 0.09\ J\)

Sort the list#

ArrayList

LinkedList

Vector

Best sorting complexity

\(O(n \log(n))\)

\(O(n \log(n))\)

\(O(n \log(n))\)

Sort by using Collections.sort(List)

\(2.16 \pm 2.18\ J\)

\(6.339 \pm 0.307\ J\)

\(3.12 \pm 2.32\ J\)

Conclusion#

The LinkedList is designed to be used as a list on which we iterate (by using an Iterator or by using the enhanced for-each loop) or to be used as a queue, that is working with the head or with the tail of the list.

For general purposes, the ArrayList seems to be a better choice. But if we have to work on elements that are near the head or the tail of the list, then the LinkedList would have better performance, otherwise, it has poor performance when it has to explore deeply the list. Moreover, if you need a thread-safe list, the Vector, which is based on the ArrayList, is a synchronized object you could use without losing too much performance.


Test specifications#

The methods used are in the collections_comparison.ListMethodsComparison.java class.

The test was launched using the collections_comparison.TestListMethodsComparison.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testListMethodsComparison/.

We write in the table the mean and the standard deviation obtained after launching the testing class 10 times to get a mean and a standard deviation of the methods’ consumption and looping over the functions 10 times (so we divided the results by 10).

We used this script to quickly launch those tests:

Click to see
#!/bin/bash

### Benchmark 1 ###
# Launches testListMethodsComparison.sh with the 3 possible lists with size 100 000

bash testListMethodsComparison.sh 10 0 100000 1 10	# ArrayList # Use of List.get() and loop 10 times
bash testListMethodsComparison.sh 10 1 100000 	# LinkedList
bash testListMethodsComparison.sh 10 2 100000 1 10	# Vector

### Benchmark 2 ###
# Launches testListMethodsComparison.sh with the 3 possible lists with size 500 000

bash testListMethodsComparison.sh 10 0 500000	# ArrayList
bash testListMethodsComparison.sh 10 1 500000 0 10 # LinkedList # No use of List.get() and loop 10 times
bash testListMethodsComparison.sh 10 2 500000	# Vector

The results were obtained using List of strings. The strings were randomly created following the pattern “Number is %d”, where %d is a random integer. When the energy consumption is 0 J, it means that the execution was to quick for JoularJX to give a value for the concerned method.

When we iterate, we only iterate one time over the whole list.

For random access, we use List.get(int randomInt), where randomInt is a random integer, as many times as there are elements in the list, that is 100 000 times in our case.

For the second benchmark, with the LinkedList, we have launched the methods 10 times in order for JoularJX to give us a power consumption higher than 0 J, so we divided by 10 the results we obtained.

References#

  • Effective Java. Third Edition. Joshua Bloch. “Item 28: Prefer List to Array”.

  • Energy profiles of Java collections classes. Samir Hasan, Zachary King, Munawar Hafiz, Mohammed Sayagh, Bram Adams, and Abram Hindle - 38th International Conference on Software Engineering (2016). pdf

  • Java™ Platform, Standard Edition 8 API Specification, (2022). Package Util. [online] Available at: https://docs.oracle.com/javase/8/docs/api/java/util/package-summary.html

Java Map#

Choice of the map class#

A Map is an interface that maps keys to values.

When using a Map in Java, you have to choose between differents implementations. The main ones are the HashMap, the LinkedHashMap and the TreeMap. Moreover, the Hashtable and the ConcurrentHashMap can be used for thread-safe usage. These implementations differs in the data structures used. Here are the average time complexities and energy consumptions resulting from a test for the main methods.

Main choices#

HashMap

LinkedHashMap

TreeMap

Hashtable

ConcurrentHashMap

Data structure

Hash table

Hash table and linked lists (for keys, values and entries collections)

Red-Black tree

Hash table

Hash table

Order of collections

No (random order)

Order of insertion (as a LinkedList)

Natural order

No (random order)

No (random order)

Permits null as key

Yes

Yes

No

No

No

Thread-safe

No

No

No

Yes

Yes

The order of the TreeMap is the natural order by default, but other orders can be used by precising the Comparator when creating the TreeMap. Moreover, if this Comparator allows null, then null can be used as a key.

Benchmark 1#

Insertion of elements#

HashMap

LinkedHashMap

TreeMap

Hashtable

ConcurrentHashMap

Map.put( Object key, Object value )

\(O(1)\)
(except if the Hash table has to be extended: \(O(n)\))

\(O(1)\)
(same comment as the HashMap)

\(O(\log(n))\)

\(O(1)\)
(same comment as the HashMap)

\(O(1)\)
(same comment as the HashMap)

Insertion

\(393.97 \pm 8.65\ J\)

\(444.42 \pm 10.33\ J\)

\(\mathbf{918.49 \pm 19.11\ J}\)

\(421.14 \pm 11.02\ J\)

\(431.79 \pm 8.66\ J\)

However, if we use a bad hashing function, the worst-case scenario for the hash tables is \(O(n)\) (as if we iterate over a List).

Remove elements#

HashMap

LinkedHashMap

TreeMap

Hashtable

ConcurrentHashMap

Map.remove( Object key )

\(O(1)\)

\(O(1)\)

\(O(\log(n))\)

\(O(1)\)

\(O(1)\)

Remove half the elements

\(13.61 \pm 6.12\ J\)

\(28.87 \pm 6.68\ J\)

\(66.67 \pm 8.91\ J\)

\(15.66 \pm 6.41\ J\)

\(17.70 \pm 4.20\ J\)

Access#

HashMap

LinkedHashMap

TreeMap

Hashtable

ConcurrentHashMap

Map.get( Object key )

\(O(1)\)

\(O(1)\)

\(O(\log(n))\)

\(O(1)\)

\(O(1)\)

Random access

\(37.40 \pm 0.81\ J\)

\(44.00 \pm 2.54\ J\)

\(296.74 \pm 20.49\ J\)

\(42.49 \pm 2.73\ J\)

\(39.37 \pm 2.75\ J\)

Use Map.get() for each key

\(33.77 \pm 1.88\ J\)

\(39.85 \pm 2.75\ J\)

\(94.39 \pm 7.31\ J\)

\(27.93 \pm 3.21\ J\)

\(23.92 \pm 1.49\ J\)

Iterate over the Map.entrySet() with the Set.iterator()

\(24.32 \pm 3.92\ J\)

\(8.01 \pm 1.88\ J\)

\(42.99 \pm 10.61\ J\)

\(22.87 \pm 4.04\ J\)

\(27.74 \pm 4.42\ J\)

Iterate over the Map.entrySet() with a for-each loop

\(19.24 \pm 2.95\ J\)

\(6.28 \pm 1.53\ J\)

\(39.40 \pm 6.49\ J\)

\(16.20 \pm 0.94\ J\)

\(17.27 \pm 3.58\ J\)

Conclusion#

The choice really depends on the usage.

If the order of elements does not matter, then do not take the TreeMap, which has the worse performance overall, supposing the hash function used for the hash table is good enough.

For thread-safe usage, our results show a little difference between the Hashtable and the ConcurrentHashMap. Due to their way of dealing with read-write locks, the ConcurrentHashMap was supposed to be overall better than the Hashtable but their consumption are pretty close.

For general purposes, take the HashMap. TreeMap has the worst performance for a Map implementation and should be used only if you know what you are doing. Moreover, if you want to insert and keep elements in a particular order, then you might think of using the LinkedHashMap. This latter is also useful for copying a Map while keeping its order.

Test specifications#

The methods used are in the collections_comparison.MapMethodsComparison.java class.

The test was launched using the collections_comparison.TestMapMethodsComparison.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testMapMethodsComparison/.

We launched 10 times the test class to get these means and standard deviations. The results were obtained using 20 000 000 entries where the key was a random integer and the value was a string, which was created following the pattern “Number is %d”, where %d is an other random integer. When we iterate, we only iterate one time over the whole list.

We used this script to quickly launch those tests:

Click to see
#!/bin/bash

# Launches testMapMethodsComparison.sh
bash testMapMethodsComparison.sh 10 0 20000000	# HashMap
bash testMapMethodsComparison.sh 10 1 20000000	# LinkedHashMap
bash testMapMethodsComparison.sh 10 2 20000000	# TreeMap
bash testMapMethodsComparison.sh 10 3 20000000	# Hashtable
bash testMapMethodsComparison.sh 10 4 20000000	# ConcurrentHashMap

For random access, we use Map.get(int randomInt), where randomInt is a random integer, as many times as there are elements in the list, that is 5 000 000 times in our case.

To remove half of the elements, we use the Map.entrySet.iterator() and we remove one out two objects.

References#

  • Energy profiles of Java collections classes. Samir Hasan, Zachary King, Munawar Hafiz, Mohammed Sayagh, Bram Adams, and Abram Hindle - 38th International Conference on Software Engineering (2016). pdf

  • Java™ Platform, Standard Edition 8 API Specification, (2022). Package Util. [online] Available at: https://docs.oracle.com/javase/8/docs/api/java/util/package-summary.html

Java Set#

Choice of the set class#

In Java, the main implementations of the Set, which is a Collection whose elements are unique, are the HashSet, the LinkedHashSet and the TreeSet. They are based on their Map counterpart where the keys would be the elements of the set and the values would all be null.

Main choices#

HashSet

LinkedHashSet

TreeSet

Data structure

Hash table

Hash table and linked list

Red-Black tree

Order of collections

No (random order)

Order of insertion (as a LinkedList)

Natural order

Permits null as key

Yes

Yes

No, except if using a comparator that allow

Thread-safe

No

No

No

Benchmark 1#

Insertion of elements#

HashSet

LinkedHashSet

TreeSet

Set.add(Object)

\(O(1)\)
(except if the Hash table has to be extended: \(O(n)\))

\(O(1)\)
(same comment as the HashSet)

\(O(\log(n))\)

Insertion

\(86.04 \pm 1.39\ J\)

\(96.92 \pm 1.08\ J\)

\(186.42 \pm 4.65\ J\)

However, if we use a bad hashing function, the worst-case scenario for the hash tables is \(O(n)\) (as if we iterate over a List).

Remove elements#

HashSet

LinkedHashSet

TreeSet

Set.remove( Object key )

\(O(1)\)

\(O(1)\)

\(O(\log(n))\)

Remove half the elements

\(15.47 \pm 0.75\ J\)

\(15.97 \pm 0.16\ J\)

\(5.73 \pm 0.81\ J\)

Access#

HashSet

LinkedHashSet

TreeSet

Set.contains( Object key )

\(O(1)\)

\(O(1)\)

\(O(\log(n))\)

Check if objects belong to the Set

\(12.87 \pm 7.74\ J\)

\(26.12 \pm 1.25\ J\)

\(106.58 \pm 3.89\ J\)

Iterate with the Set.iterator()

\(4.32 \pm 0.42\ J\)

\(1.52 \pm 0.02\ J\)

\(6.65 \pm 0.72\ J\)

The order of the TreeSet is the natural order by default, but other orders can be used by precising the Comparator when creating the TreeSet. Moreover, if this Comparator allows null, then null can be used as a key.

Conclusion#

We have the exact same conclusion than with the Map implementations:

  • For general purposes, HashSet

  • To keep the order of insertion, LinkedHashSet

  • To keep the elements sorted, TreeSet

And you should consider using a LinkedHashSet while inserting yourself the elements in the order you want instead of using a TreeSet.

Test specifications#

The methods used are in the collections_comparison.SetMethodsComparison.java class.

The test was launched using the collections_comparison.TestSetMethodsComparison.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testSetMethodsComparison/.

We launched 5 times the test class to get these means and standard deviations. The results were obtained using 5 000 000 strings which were created following the pattern “Number is %d”, where %d is an other random integer. When we iterate, we only iterate one time over the whole list.

We used this script to quickly launch those tests:

Click to see
#!/bin/bash

# Launches testSetMethodsComparison.sh

bash testSetMethodsComparison.sh 5 0 5000000	# HashSet
bash testSetMethodsComparison.sh 5 1 5000000	# LinkedHashSet
bash testSetMethodsComparison.sh 5 2 5000000	# TreeSet

To check if some objects belonged to the Set, we use Set.contains(String s), where s was created following the pattern “Number is %d”, where %d is an other random integer as many times as there are elements in the list, that is 5 000 000 times in our case.

To remove half of the elements, we use the Set.iterator() and we remove one out of two objects.

References#

  • Energy profiles of Java collections classes. Samir Hasan, Zachary King, Munawar Hafiz, Mohammed Sayagh, Bram Adams, and Abram Hindle - 38th International Conference on Software Engineering (2016). pdf

  • Java™ Platform, Standard Edition 8 API Specification, (2022). Package Util. [online] Available at: https://docs.oracle.com/javase/8/docs/api/java/util/package-summary.html

Looping over an array or an ArrayList#

It seems that depending on the device you use, especially for the devices without Just-In-Time (JIT) compiler, like the first versions of Android and some IoT devices, some practises allow the application to be a little faster and less consuming.

Depending on the class of the array, it seems to be better to store the length of an array when we iterate over it, because without a JIT, the Java Virtual Machine (JVM) will have to retrieve the value of the length at each iteration.

It should not affect a computer or any device with a JIT, but we check it here and check if other loop structures could be used or not.


For the following benchmarks:

The “!=” versions are the use of the != comparator instead of the < comparator for the loop with array.length.

The definitions of our loops:

Base iteration
Object[] array = createMyArray(); // or ArrayList<Object> al = createMyArrayList()
for (int j = 0; j < numberOfLoops; j++) {
  for (int i = 0; i < array.length; i++) { // or i < al.size()
    // Doing stuff
  }
}
Recommended iteration
Object[] array = createMyArray();
int n = array.length
for (int j = 0; j < numberOfLoops; j++) {
  for (int i = 0; i < n; i++) {
    // Doing stuff
  }
}
Descending order
Object[] array = createMyArray();
for (int j = 0; j < numberOfLoops; j++) {
  for (int i = array.length - 1; i > -1; i--) {
    // Doing stuff
  }
}

Benchmark 1#

The int[] array were filled of 250 zeros and we looped over it 1 000 000 000 times.

For the ArrayList<Object> and the Object[] array, they were filled of 250 null and we looped over them 100 000 000 times.

We launched 10 times the test class to get these means and standard deviations. We have divided the result of the int[] by 10 because there were 10 times more loops and it gives a better view to compare the different consumption between the data structure.

int[]

Object[]

ArrayList<Object>

Base iteration

\(33.859 \pm 1.031\ J\)

\(206.85 \pm 11.72\ J\)

\(222.02 \pm 7.00\ J\)

Base iteration “!=”

\(319.562 \pm 10.887\ J\)

\(363.97 \pm 40.28\ J\)

\(542.67 \pm 48.36\ J\)

Recommended iteration

\(\mathbf{29.630 \pm 1.048\ J}\)

\(491.82 \pm 15.55\ J\)

\(220.49 \pm 10.57\ J\)

Recommended iteration “!=”

\(319.562 \pm 10.887\ J\)

\(368.17 \pm 20.98\ J\)

\(500.41 \pm 18.42\ J\)

Descending order iteration

\(39.016 \pm 1.440\ J\)

\(\mathbf{181.72 \pm 14.75\ J}\)

\(\mathbf{178.07 \pm 10.33\ J}\)

Benchmark 2#

The int[] array were filled of 250 000 zeros and we looped over it 100 000 times.

For the ArrayList<Object> and the Object[] array, they were filled of 250 000 null and we looped over them 100 000 times.

We launched 10 times the test class to get these means and standard deviations.

int[]

Object[]

ArrayList<Object>

Base iteration

\(\mathbf{61.81 \pm 4.06\ J}\)

\(\mathbf{208.89 \pm 6.32\ J}\)

\(231.58 \pm 7.88\ J\)

Base iteration “!=”

\(337.73 \pm 12.71\ J\)

\(354.30 \pm 9.90\ J\)

\(523.77 \pm 55.55\ J\)

Recommended iteration

\(\mathbf{62.86 \pm 2.96\ J}\)

\(341.34 \pm 10.12\ J\)

\(272.26 \pm 11.12\ J\)

Recommended iteration “!=”

\(266.58 \pm 12.22\ J\)

\(408.28 \pm 18.84\ J\)

\(555.32 \pm 55.55\ J\)

Descending order iteration

\(\mathbf{60.92 \pm 4.31\ J}\)

\(233.45 \pm 9.37\ J\)

\(\mathbf{194.29 \pm 11.55\ J}\)

Conclusion#

  • The != comparator should be avoided if the > or the < comparator can be used instead.

  • The so-called “recommended structure” is not a good name.

  • Surprisingly, the better structure for an ArrayList seems to be in a descending order.

Test specifications#

The methods used are in the collections_comparison.IterationComparison.java class.

The test was launched using the collections_comparison.TestIterationComparison.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testIterationComparison/.

We used this script to quickly launch those tests:

Click to see
#!/bin/bash

# Launches testIterationComparison.sh with the 3 possible arrays with size 250 and number of loops = 100 000 000 or 1 000 000 000
bash testIterationComparison.sh 10 0 250 100000000	# ArrayList<Object>
bash testIterationComparison.sh 10 1 250 100000000	# Object[]
bash testIterationComparison.sh 10 2 250 1000000000	# int[]

# Launches testIterationComparison.sh with the 3 possible arrays with size 250 000 and number of loops = 100 000
bash testIterationComparison.sh 10 0 250000 100000	# ArrayList<Object>
bash testIterationComparison.sh 10 1 250000 100000	# Object[]
bash testIterationComparison.sh 10 2 250000 100000	# int[]

References#