Catalog of Good Practices#

This catalog has been prepared by gathering recommended good practices in code development. Additionally, measurements have been made with the JoularJX tool of the energy consumption of some well-known algorithms and data structures implemented in Java.

For each entry in the catalog, you will find the description of benchmarks, in some cases the specification of the tests we performed and bibliographical references.

If you are interested and want to reproduce the tests, all the Java programs and shell scripts used to perform the measurements are available in:

or, alternatively, you can copy the following command on your terminal:

wget https://www-inf.telecom-sudparis.eu/COURS/cen/Mesures/SoftwareEfficiencyCatalog.zip


Specifications of the testing PC#

OS

Ubuntu 22.04 LTS 64 bits GNOME 42.2

Model

Dell Inc. Latitude 5580

CPU

Intel® Core™ i7-7600U CPU @ 2.80GHz × 4

RAM

32 Go

Measurement tool

JoularJX 1.0

Java Version

openjdk 17.0.3 2022-04-19
OpenJDK Runtime Environment (build 17.0.3+7-Ubuntu-0ubuntu0.22.04.1)
OpenJDK 64-Bit Server VM (build 17.0.3+7-Ubuntu-0ubuntu0.22.04.1, mixed mode, sharing)


Sorting#

The time complexity of a sorting algorithm can not be better than \(O(n \log(n))\).

Many sorting algorithms exist, we only present a few of them:


Slow sorts#

These algorithms are simple to implement but have a time complexity of \(O(n^2)\).

int[]

ArrayList<Integer>

ArrayList<String>

Size

\(50\ 000\)

\(50\ 000\)

\(50\ 000\)

Bubble sort

\(68.89 \pm 3.06\ J\)

\(265.28 \pm 9.01\ J\)

\(593.44 \pm 11.12\ J\)

Selection sort

\(2.77 \pm 2.14\ J\)

\(37.97 \pm 1.49\ J\)

\(295.76 \pm 8.98\ J\)

Insertion sort

\(14.51 \pm 0.79\ J\)

\(73.06 \pm 7.40\ J\)

\(143.99 \pm 9.27\ J\)

We can notice that the selection sort is here surprisingly better than the insertion sort to sort lists or arrays of integers.


Quick Sorts#

These algorithms have a time complexity of \(O(n\log(n))\). The sort() method is a built-in method of Java. We did not code the Tim sort and the dual-pivot quicksort, but we show here which sort has been selected by Java developers.

Note that the Array.sort() method uses a dual-pivot quicksort for primitive arrays (int[], bytes[], long[], etc) but a Tim sort for Object arrays.

int[]

ArrayList<Integer>

ArrayList<String>

Size

\(3\ 000\ 000\)

\(3\ 000\ 000\)

\(3\ 000\ 000\)

Merge sort

\(11.33 \pm 0.65\ J\)

\(42.18 \pm 2.21\ J\)

\(59.08 \pm 2.46\ J\)

(one-pivot) Quicksort

\(6.38 \pm 0.38\ J\)

\(23.72 \pm 1.00\ J\)

\(49.78 \pm 3.48\ J\)

Tim sort (Collections.sort())

-

\(22.73 \pm 1.88\ J\)

\(37.10 \pm 6.54\ J\)

Dual-pivot quicksort (Arrays.sort() for int[])

\(2.57 \pm 0.64\ J\)

-

-


Conclusion#

It is acceptable to use a slow sort on a small list or to learn how to code, but it should never be used for bigger lists.

Moreover, if you know how to use a Comparator, you could simply use the implemented sort(Comparator) method.


Test specifications#

Click to see

The methods used are in the collections_comparison.ListSortingComparison.java and the collections_comparison.ArraySortingComparison.java classes.

The test was launched using the collections_comparison.TestListSortingComparison.java and the collections_comparison.TestArraySortingComparison.java classes with JoularJX by using the bash script and the files located in the folder scripts/test/testListSortingComparison/ and in scripts/test/testArraySortingComparison/ .

We launched 10 times the test class to get these means and standard deviations.

The integers used where randomly selected with the ThreadLocalRandom class and its nextInt() method.

We used this script to quickly launch the tests for the List:

Click to see
#!/bin/bash
# Launches testListSortingComparison.sh

bash testListSortingComparison.sh 10 0 3000000 Integer	# Quick sorts
bash testListSortingComparison.sh 10 1 50000   Integer	# Slow sorts

bash testListSortingComparison.sh 10 0 3000000 String	# Quick sorts
bash testListSortingComparison.sh 10 1 50000   String	# Slow sorts
```

</details>

We used this script to quickly launch the tests for the `int[]`:

<details>

<summary>Click to see</summary>

```bash
#!/bin/bash
# Launches testArraySortingComparison.sh

bash testArraySortingComparison.sh 10 0 3000000	# Quick sorts
bash testArraySortingComparison.sh 10 1 50000	# Slow sorts
```

</details>

</details>

</details>

******

## SQL Queries

### Query Performance

The advices we can give you are really depending on the database you are using. General good practices are:

- If possible, use the `LIMIT` clause. The gain will be proportional to the quantity of retrieved data.
- The `SELECT *` should be avoided. The select should only contain the columns you need and no other one.
- Unnecessary `JOIN` also.
- To analyze the query, you can add the `EXPLAIN FORMAT=JSON` cause before the query. It will returned a table with one row containing a JSON column and its content is a representation of the query plan.
The cause to add depends on the SQL server used. For our server (MariaDB), we need to use `ANALYZE FORMAT=JSON`.


We give you a simple example of a JSON file with a query on our database.
The query is:

```sql
ANALYZE FORMAT=JSON
SELECT
    Orders.Id
FROM
    Orders
WHERE
    ShippedDate IS NULL
```

and it gives us the JSON:

```JSON
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 2.429,
    "table": {
      "table_name": "Orders",
      "access_type": "ALL",
      "r_loops": 1,
      "rows": 6166,
      "r_rows": 6110,
      "r_total_time_ms": 2.1289,
      "filtered": 100,
      "r_filtered": 14.845,
      "attached_condition": "Orders.ShippedDate is null"
    }
  }
}
```

- `r_loops` show the number of times the node was actually executed. Here the table `Orders` was only scanned once.
- `r_total_time_ms` show the time that was spent to execute the node. Here the query took 2.429 ms to be traited.
- `r_rows` show how many rows were actually read. Here 6110 rows of the `Orders` table were read. `rows` shows the number of rows expected to be read by the optimizer, but here, the optimizer did not have any way to estimate it.
- `r_filtered` show the percentage of rows that have been actually filtered with the `attached_condition`. In the same way as with `rows` and `r_rows`, `filtered` show what percentage was expected by the optimizer.

Other data can be obtained with the `EXPLAIN/ANALYZE` statement, you will then have to see the documentation in order to understand them.

### References

- **Les 115 bonnes pratiques**, 2022, collectif GreenIT. https://github.com/cnumr/best-practices. BP 75 "Optimiser les requêtes aux bases de données" - https://github.com/cnumr/best-practices/blob/main/chapters/BP_075_fr.md
- **MariaDB - ANALYZE and EXPLAIN Statements**, 2022, https://mariadb.com/kb/en/analyze-and-explain-statements/


*****

## String concatenation

<details>

<summary>Click to see</summary>

Be aware that to concatenate $n$ strings requires a time complexity of $O(n^2)$ because strings are immutable. Therefore, each concatenation results in the creation of a new string, which means poor performance.

Consider using the `StringBuilder` class when appending characters for example. Alternatively, use a character array, which is even better optimized.
The thread-safe version of the `StringBuilder` is the `StringBuffer`.

******

### Use comparison

````{div}
```{list-table}
:header-rows: 1
:name: java-strings

* -
  - String
  - StringBuilder
  - StringBuffer
* - **Mutable**
  - No
  - Yes
  - Yes
* - **Thread-safe**
  - The String is immutable, therefore the object is thread-safe
  - No
  - Yes
* - **Append chars at the end**
  - $488.73 \pm 28.39\ J$
  - $0.32 \pm 0.14\ J$
  - $0.33 \pm 0.27\ J$
* - **Append chars at the beginning**
  - $219.64 \pm 23.91\ J$
  - $47.50 \pm 2.83\ J$
  - $42.74 \pm 4.54\ J$
```
````
******

### Test specifications

<details>

<summary>Click to see</summary>

The methods used are in the {{'[string_manipulation.StringManipulationComparison.java]({repo}/EfficaciteLogiciel/src/main/java/ecolog_tsp_uppa/string_manipulation/StringManipulationComparison.java)'.format(repo=urlweb)}} class.

The test was launched using the {{'[string_manipulation.TestStringManipulation.java]({repo}/EfficaciteLogiciel/src/main/test/ecolog_tsp_uppa/string_manipulation/TestStringManipulation.java)'.format(repo=urlweb)}} class with JoularJX by using the bash script and the files located in the folder {{'[scripts/test/testStringManipulation/]({repo}/EfficaciteLogiciel/scripts/test/testStringManipulation/)'.format(repo=urlweb)}}.

We launched 5 times the test class to get these means and standard deviations.

To get the results of the "Append chars" parts, we start with a empty String/StringBuilder/StringBuffer and we append 500 000 times the same character 'a'.

We used this script to quickly launch those tests:
<details>

<summary>Click to see</summary>

```bash
#!/bin/bash

# Launches testStringManipulation.sh with the 3 possible choices with size 500 000

bash testStringManipulation.sh 10 0 500000 # String
bash testStringManipulation.sh 10 1 500000 # StringBuilder
bash testStringManipulation.sh 10 2 500000 # StringBuffer
```

</details>

</details>

****
### References

* **Effective Java**. Third Edition. Joshua Bloch. "Item 64: Beware the performance of String concatenation".

</details>

****


## Java List

<details>

<summary>Click to see</summary>

### Choice of a list class

The `List` interface (and associated implementation classes) is one of the most used data structures for collections in Java. Java classes that implement this interface may be more consuming and less efficient than a Java `array` but it is much easier and safer to use.

When using a `List` in Java, you need to choose between different implementations. The main ones are the `ArrayList`, which is based on an array, and the `LinkedList`, which is based on a doubly linked list. Moreover, the `Vector`, which is based on the `ArrayList`, can be used for thread-safe usage. Here are the time complexities and energy consumptions resulting from a test of the main methods.

#### Main choices

````{div}
```{list-table}
:header-rows: 1
:name: java-lists-data-structure
* -
  - ArrayList
  - LinkedList
  - Vector
* - **Data structure**
  - Based on an array
  - Based on a doubly linked list
  - Based on an array
* - **Thread-safe**
  - No
  - No
  - Yes
```
````
For the memory use, the `LinkedList` nodes have to store the adresses of the previous and the next nodes in addition of the data.

On the other hand, if the reserved capacity for the `ArrayList` is higher than what it actually needs, then the remaining memory is wasted.

So, if we know the exact capacity to use and we reserve it at the creation of an `ArrayList`, then the `ArrayList` would take less memory than a `LinkedList` filled with the same data.

******

#### Benchmark 1

<details>

<summary>Click to expand for benchmark 1</summary>

The following results were obtained with Lists of 100 000 elements.

<h4> Insertion of elements </h4>

````{div}
```{list-table}
:header-rows: 1
:name: java-lists-insertion
* -
  - ArrayList
  - LinkedList
  - Vector
* - **List.add(Object)**
  - $O(1)$<br> (except if the ArrayList has to be extended: $O(n)$)
  - $O(1)$
  - *same as the ArrayList*
* - **List.add(int index, Object)**
  - $O(n)$<br>($O(1)$ if adding at the end)
  - $O(n)$<br>($O(1)$ if adding at the beginning or at the end)
  - *same as the ArrayList*
* - **Insertion** at the beginning
  - $10.605 \pm 0.421\ J$
  - $0.73 \pm 0.22 \ J$
  - $10.619 \pm 0.674\ J$
* - **Insertion** in the middle
  - $4.494 \pm 0.286\ J$
  - $\mathbf{564.60 \pm 28.95\ J}$
  - $4.466 \pm 0.225\ J$
* - **Insertion** at the end
  - $0.249 \pm 0.026\ J$
  - $0.41 \pm 0.05\ J$
  - $0.282 \pm 0.023\ J$
```
````

******

<h4>  Remove elements </h4>

````{div}
```{list-table}
:header-rows: 1
:name: java-lists-remove  
* -
  - ArrayList
  - LinkedList
  - Vector
* - **List.remove(int index)**
  - $O(n)$<br>($O(1)$ if removing at the end)
  - $O(n)$<br>($O(1)$ if removing at the beginning or at the end)
  - *same as the ArrayList*
* - **Remove** randomly half of the elements by using List.remove(int index)
  - $3.587 \pm 0.226\ J$
  - $\mathbf{211.21 \pm 24.93\ J}$
  - $3.516 \pm 0.247\ J$
* - **Remove** elements whose index is even by using List.iterator() and Iterator.remove()
  - $4.656 \pm 0.296\ J$
  - $0\ J$
  - $4.769 \pm 0.247\ J$
```
````

******

<h4>Accessing the elements</h4>

````{div}
```{list-table}
:header-rows: 1
:name: java-lists-access
* -
  - ArrayList
  - LinkedList
  - Vector
* - **List.get(int index)**
  - $O(1)$
  - $O(n)$<br>($O(1)$ if getting at the beginning or at the end)
  - *same as the ArrayList*
* - **Random access**
  - $0.072 \pm 0.026\ J$
  - $\mathbf{306.84 \pm 26.78\ J}$
  - $0.134 \pm 0.034\ J$
* - **Iterate** by using List.get(int index)
  - $0.035 \pm 0.017\ J$
  - $\mathbf{295.52 \pm 18.27\ J}$
  - $0.069 \pm 0.034\ J$
* - **Iterate** by using List.iterator() and Iterator.next()
  - $0.047 \pm 0.015\ J$
  - $0.17 \pm 0.00\ J$
  - $0.061 \pm 0.025\ J$
* - **Iterate** by using the enhanced for-each loop
  - $0.046 \pm 0.016\ J$
  - $0.22 \pm 0.02\ J$
  - $0.037 \pm 0.012\ J$
```
````

******

<h4> Sort the list</h4>

````{div}
```{list-table}
:header-rows: 1
:name: java-lists-sort
* -
  - ArrayList
  - LinkedList
  - Vector
* - **Best sorting complexity**
  - $O(n \log(n))$
  - $O(n \log(n))$
  - $O(n \log(n))$
* - **Sort** by using Collections.sort(List)
  - $1.062 \pm 0.053\ J$
  - $1.32 \pm 0.30\ J$
  - $1.081 \pm 0.059\ J$
```
````

</details>

******

#### Benchmark 2

<details>

<summary>Click to expand for benchmark 2</summary>

The following results were obtained with Lists of 500 000 elements.
This time, we did not use the methods that require `List.get(int)` with the `LinkedList`. It will be written "-" in the table when no testing was done for the concerning method.

******

<h4> Insertion of elements</h4>

````{div}
```{list-table}
:header-rows: 1
:name: java-lists-insertion
* -
  - ArrayList
  - LinkedList
  - Vector
* - **List.add(Object)**
  - $O(1)$<br> (except if the ArrayList has to be extended: $O(n)$)
  - $O(1)$
  - *same as the ArrayList*
* - **List.add(int index, Object)**
  - $O(n)$<br>($O(1)$ if adding at the end)
  - $O(n)$<br>($O(1)$ if adding at the beginning or at the end)
  - *same as the ArrayList*
* - **Insertion** at the beginning
  - $\mathbf{450.40 \pm 24.04\ J}$
  - $1.349 \pm 0.063 \ J$
  - $\mathbf{456.33 \pm 16.97\ J}$
* - **Insertion** in the middle
  - $163.92 \pm 7.74\ J$
  - `-`
  - $157.13 \pm 10.00 \ J$
* - **Insertion** at the end
  - $1.19 \pm 0.69\ J$
  - $1.259 \pm 0.041\ J$
  - $1.32 \pm 0.17\ J$
```
````

******

<h4>  Remove elements </h4>

````{div}
```{list-table}
:header-rows: 1
:name: java-lists-remove  
* -
  - ArrayList
  - LinkedList
  - Vector
* - **List.remove(int index)**
  - $O(n)$<br>($O(1)$ if removing at the end)
  - $O(n)$<br>($O(1)$ if removing at the beginning or at the end)
  - *same as the ArrayList*
* - **Remove** randomly half of the elements by using List.remove(int index)
  - $163.93 \pm 6.96\ J$
  - `-`
  - $165.89 \pm 3.22\ J$
* - **Remove** elements whose index is even by using List.iterator() and Iterator.remove()
  - $205.24 \pm 6.60\ J$
  - $0.257 \pm 0.333\ J$
  - $165.89 \pm 3.22\ J$
```
````

******

<h4> Accessing the elements </h4>

````{div}
```{list-table}
:header-rows: 1
:name: java-lists-access
* -
  - ArrayList
  - LinkedList
  - Vector
* - **List.get(int index)**
  - $O(1)$
  - $O(n)$<br>($O(1)$ if getting at the beginning or at the end)
  - *same as the ArrayList*
* - **Random access**
  - $0.96 \pm 0.14\ J$
  - `-`
  - $1.40 \pm 0.13\ J$
* - **Iterate** once by using List.get(int index)
  - $0.29 \pm 0.01\ J$
  - `-`
  - $0.30 \pm 0.01\ J$
* - **Iterate** once by using List.iterator() and Iterator.next()
  - $0\ J$
  - $0.149 \pm 0.034\  J$
  - $0.22 \pm 0.03\  J$
* - **Iterate** once by using the enhanced for-each loop
  - $0.25 \pm 0.03\  J$
  - $0.173 \pm 0.032\  J$
  - $0.33 \pm 0.09\  J$
```
````

******

<h4> Sort the list </h4>  

````{div}
```{list-table}
:header-rows: 1
:name: java-lists-sort
* -
  - ArrayList
  - LinkedList
  - Vector
* - **Best sorting complexity**
  - $O(n \log(n))$
  - $O(n \log(n))$
  - $O(n \log(n))$
* - **Sort** by using Collections.sort(List)
  - $2.16 \pm 2.18\  J$
  - $6.339 \pm 0.307\  J$
  - $3.12 \pm 2.32\  J$
```
````

</details>

******

### Conclusion

The `LinkedList` is designed to be used as a list on which we iterate (by using an `Iterator` or by using the enhanced for-each loop) or to be used as a queue, that is working with the head or with the tail of the list.

For general purposes, the `ArrayList` seems to be a better choice. But if we have to work on elements that are near the head or the tail of the list, then the `LinkedList` would have better performance, otherwise, it has poor performance when it has to explore deeply the list. Moreover, if you need a thread-safe list, the `Vector`, which is based on the `ArrayList`, is a synchronized object you could use without losing too much performance.

******

### Test specifications
<details>

<summary>Click to see</summary>

The methods used are in the {{'[collections_comparison.ListMethodsComparison.java]({repo}/EfficaciteLogiciel/src/main/java/ecolog_tsp_uppa/collections_comparison/ListMethodsComparison.java)'.format(repo=urlweb)}} class.

The test was launched using the {{'[collections_comparison.TestListMethodsComparison.java]({repo}/EfficaciteLogiciel/src/test/java/ecolog_tsp_uppa/collections_comparison/TestListMethodsComparison.java)'.format(repo=urlweb)}} class with JoularJX by using the bash script and the files located in the folder {{'[scripts/test/testListMethodsComparison/]({repo}/EfficaciteLogiciel/scripts/test/testListMethodsComparison/)'.format(repo=urlweb)}}.

We write in the table the mean and the standard deviation obtained after launching the testing class 10 times to get a mean and a standard deviation of the methods' consumption and looping over the functions 10 times (so we divided the results by 10).

We used this script to quickly launch those tests:
<details>

<summary>Click to see</summary>

```bash
#!/bin/bash

### Benchmark 1 ###
# Launches testListMethodsComparison.sh with the 3 possible lists with size 100 000

bash testListMethodsComparison.sh 10 0 100000 1 10	# ArrayList # Use of List.get() and loop 10 times
bash testListMethodsComparison.sh 10 1 100000 	# LinkedList
bash testListMethodsComparison.sh 10 2 100000 1 10	# Vector

### Benchmark 2 ###
# Launches testListMethodsComparison.sh with the 3 possible lists with size 500 000

bash testListMethodsComparison.sh 10 0 500000	# ArrayList
bash testListMethodsComparison.sh 10 1 500000 0 10 # LinkedList # No use of List.get() and loop 10 times
bash testListMethodsComparison.sh 10 2 500000	# Vector

The results were obtained using List of strings. The strings were randomly created following the pattern “Number is %d”, where %d is a random integer. When the energy consumption is 0 J, it means that the execution was to quick for JoularJX to give a value for the concerned method.

When we iterate, we only iterate one time over the whole list.

For random access, we use List.get(int randomInt), where randomInt is a random integer, as many times as there are elements in the list, that is 100 000 times in our case.

For the second benchmark, with the LinkedList, we have launched the methods 10 times in order for JoularJX to give us a power consumption higher than 0 J, so we divided by 10 the results we obtained.


References#

  • Effective Java. Third Edition. Joshua Bloch. “Item 28: Prefer List to Array”.

  • Energy profiles of Java collections classes. Samir Hasan, Zachary King, Munawar Hafiz, Mohammed Sayagh, Bram Adams, and Abram Hindle - 38th International Conference on Software Engineering (2016). pdf

  • Java™ Platform, Standard Edition 8 API Specification, (2022). Package Util. [online] Available at: https://docs.oracle.com/javase/8/docs/api/java/util/package-summary.html


Java Map#

Choice of the map class#

A Map is an interface that maps keys to values.

When using a Map in Java, you have to choose between differents implementations. The main ones are the HashMap, the LinkedHashMap and the TreeMap. Moreover, the Hashtable and the ConcurrentHashMap can be used for thread-safe usage. These implementations differs in the data structures used. Here are the average time complexities and energy consumptions resulting from a test for the main methods.


Main choices#

HashMap

LinkedHashMap

TreeMap

Hashtable

ConcurrentHashMap

Data structure

Hash table

Hash table and linked lists (for keys, values and entries collections)

Red-Black tree

Hash table

Hash table

Order of collections

No (random order)

Order of insertion (as a LinkedList)

Natural order

No (random order)

No (random order)

Permits null as key

Yes

Yes

No

No

No

Thread-safe

No

No

No

Yes

Yes

The order of the TreeMap is the natural order by default, but other orders can be used by precising the Comparator when creating the TreeMap. Moreover, if this Comparator allows null, then null can be used as a key.


Benchmark 1#

Click to expand for benchmark 1

Insertion of elements

HashMap

LinkedHashMap

TreeMap

Hashtable

ConcurrentHashMap

Map.put( Object key, Object value )

\(O(1)\)
(except if the Hash table has to be extended: \(O(n)\))

\(O(1)\)
(same comment as the HashMap)

\(O(\log(n))\)

\(O(1)\)
(same comment as the HashMap)

\(O(1)\)
(same comment as the HashMap)

Insertion

\(393.97 \pm 8.65\ J\)

\(444.42 \pm 10.33\ J\)

\(\mathbf{918.49 \pm 19.11\ J}\)

\(421.14 \pm 11.02\ J\)

\(431.79 \pm 8.66\ J\)

However, if we use a bad hashing function, the worst-case scenario for the hash tables is \(O(n)\) (as if we iterate over a List).


Remove elements

HashMap

LinkedHashMap

TreeMap

Hashtable

ConcurrentHashMap

Map.remove( Object key )

\(O(1)\)

\(O(1)\)

\(O(\log(n))\)

\(O(1)\)

\(O(1)\)

Remove half the elements

\(13.61 \pm 6.12\ J\)

\(28.87 \pm 6.68\ J\)

\(66.67 \pm 8.91\ J\)

\(15.66 \pm 6.41\ J\)

\(17.70 \pm 4.20\ J\)


Access

HashMap

LinkedHashMap

TreeMap

Hashtable

ConcurrentHashMap

Map.get( Object key )

\(O(1)\)

\(O(1)\)

\(O(\log(n))\)

\(O(1)\)

\(O(1)\)

Random access

\(37.40 \pm 0.81\ J\)

\(44.00 \pm 2.54\ J\)

\(296.74 \pm 20.49\ J\)

\(42.49 \pm 2.73\ J\)

\(39.37 \pm 2.75\ J\)

Use Map.get() for each key

\(33.77 \pm 1.88\ J\)

\(39.85 \pm 2.75\ J\)

\(94.39 \pm 7.31\ J\)

\(27.93 \pm 3.21\ J\)

\(23.92 \pm 1.49\ J\)

Iterate over the Map.entrySet() with the Set.iterator()

\(24.32 \pm 3.92\ J\)

\(8.01 \pm 1.88\ J\)

\(42.99 \pm 10.61\ J\)

\(22.87 \pm 4.04\ J\)

\(27.74 \pm 4.42\ J\)

Iterate over the Map.entrySet() with a for-each loop

\(19.24 \pm 2.95\ J\)

\(6.28 \pm 1.53\ J\)

\(39.40 \pm 6.49\ J\)

\(16.20 \pm 0.94\ J\)

\(17.27 \pm 3.58\ J\)


Conclusion#

The choice really depends on the usage.

If the order of elements does not matter, then do not take the TreeMap, which has the worse performance overall, supposing the hash function used for the hash table is good enough.

For thread-safe usage, our results show a little difference between the Hashtable and the ConcurrentHashMap. Due to their way of dealing with read-write locks, the ConcurrentHashMap was supposed to be overall better than the Hashtable but their consumption are pretty close.

For general purposes, take the HashMap. TreeMap has the worst performance for a Map implementation and should be used only if you know what you are doing. Moreover, if you want to insert and keep elements in a particular order, then you might think of using the LinkedHashMap. This latter is also useful for copying a Map while keeping its order.


Test specifications#

Click to see

The methods used are in the collections_comparison.MapMethodsComparison.java class.

The test was launched using the collections_comparison.TestMapMethodsComparison.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testMapMethodsComparison/.

We launched 10 times the test class to get these means and standard deviations. The results were obtained using 20 000 000 entries where the key was a random integer and the value was a string, which was created following the pattern “Number is %d”, where %d is an other random integer. When we iterate, we only iterate one time over the whole list.

We used this script to quickly launch those tests:

Click to see
#!/bin/bash

# Launches testMapMethodsComparison.sh
bash testMapMethodsComparison.sh 10 0 20000000	# HashMap
bash testMapMethodsComparison.sh 10 1 20000000	# LinkedHashMap
bash testMapMethodsComparison.sh 10 2 20000000	# TreeMap
bash testMapMethodsComparison.sh 10 3 20000000	# Hashtable
bash testMapMethodsComparison.sh 10 4 20000000	# ConcurrentHashMap

For random access, we use Map.get(int randomInt), where randomInt is a random integer, as many times as there are elements in the list, that is 5 000 000 times in our case.

To remove half of the elements, we use the Map.entrySet.iterator() and we remove one out two objects.


References#

  • Energy profiles of Java collections classes. Samir Hasan, Zachary King, Munawar Hafiz, Mohammed Sayagh, Bram Adams, and Abram Hindle - 38th International Conference on Software Engineering (2016). pdf

  • Java™ Platform, Standard Edition 8 API Specification, (2022). Package Util. [online] Available at: https://docs.oracle.com/javase/8/docs/api/java/util/package-summary.html


Java Set#

Choice of the set class#

In Java, the main implementations of the Set, which is a Collection whose elements are unique, are the HashSet, the LinkedHashSet and the TreeSet. They are based on their Map counterpart where the keys would be the elements of the set and the values would all be null.


Main choices#

HashSet

LinkedHashSet

TreeSet

Data structure

Hash table

Hash table and linked list

Red-Black tree

Order of collections

No (random order)

Order of insertion (as a LinkedList)

Natural order

Permits null as key

Yes

Yes

No, except if using a comparator that allow

Thread-safe

No

No

No


Benchmark 1#

Click to expand for benchmark 1

Insertion of elements

HashSet

LinkedHashSet

TreeSet

Set.add(Object)

\(O(1)\)
(except if the Hash table has to be extended: \(O(n)\))

\(O(1)\)
(same comment as the HashSet)

\(O(\log(n))\)

Insertion

\(86.04 \pm 1.39\ J\)

\(96.92 \pm 1.08\ J\)

\(186.42 \pm 4.65\ J\)

However, if we use a bad hashing function, the worst-case scenario for the hash tables is \(O(n)\) (as if we iterate over a List).


Remove elements

HashSet

LinkedHashSet

TreeSet

Set.remove( Object key )

\(O(1)\)

\(O(1)\)

\(O(\log(n))\)

Remove half the elements

\(15.47 \pm 0.75\ J\)

\(15.97 \pm 0.16\ J\)

\(5.73 \pm 0.81\ J\)


Access

HashSet

LinkedHashSet

TreeSet

Set.contains( Object key )

\(O(1)\)

\(O(1)\)

\(O(\log(n))\)

Check if objects belong to the Set

\(12.87 \pm 7.74\ J\)

\(26.12 \pm 1.25\ J\)

\(106.58 \pm 3.89\ J\)

Iterate with the Set.iterator()

\(4.32 \pm 0.42\ J\)

\(1.52 \pm 0.02\ J\)

\(6.65 \pm 0.72\ J\)


The order of the TreeSet is the natural order by default, but other orders can be used by precising the Comparator when creating the TreeSet. Moreover, if this Comparator allows null, then null can be used as a key.


Conclusion#

We have the exact same conclusion than with the Map implementations:

  • For general purposes, HashSet

  • To keep the order of insertion, LinkedHashSet

  • To keep the elements sorted, TreeSet

And you should consider using a LinkedHashSet while inserting yourself the elements in the order you want instead of using a TreeSet.


Test specifications#

Click to see

The methods used are in the collections_comparison.SetMethodsComparison.java class.

The test was launched using the collections_comparison.TestSetMethodsComparison.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testSetMethodsComparison/.

We launched 5 times the test class to get these means and standard deviations. The results were obtained using 5 000 000 strings which were created following the pattern “Number is %d”, where %d is an other random integer. When we iterate, we only iterate one time over the whole list.

We used this script to quickly launch those tests:

Click to see
#!/bin/bash

# Launches testSetMethodsComparison.sh

bash testSetMethodsComparison.sh 5 0 5000000	# HashSet
bash testSetMethodsComparison.sh 5 1 5000000	# LinkedHashSet
bash testSetMethodsComparison.sh 5 2 5000000	# TreeSet

To check if some objects belonged to the Set, we use Set.contains(String s), where s was created following the pattern “Number is %d”, where %d is an other random integer as many times as there are elements in the list, that is 5 000 000 times in our case.

To remove half of the elements, we use the Set.iterator() and we remove one out of two objects.


References#

  • Energy profiles of Java collections classes. Samir Hasan, Zachary King, Munawar Hafiz, Mohammed Sayagh, Bram Adams, and Abram Hindle - 38th International Conference on Software Engineering (2016). pdf

  • Java™ Platform, Standard Edition 8 API Specification, (2022). Package Util. [online] Available at: https://docs.oracle.com/javase/8/docs/api/java/util/package-summary.html


Looping over an array or an ArrayList#

It seems that depending on the device you use, especially for the devices without Just-In-Time (JIT) compiler, like the first versions of Android and some IoT devices, some practises allow the application to be a little faster and less consuming.

Depending on the class of the array, it seems to be better to store the length of an array when we iterate over it, because without a JIT, the Java Virtual Machine (JVM) will have to retrieve the value of the length at each iteration.

It should not affect a computer or any device with a JIT, but we check it here and check if other loop structures could be used or not.


For the following benchmarks:

The “!=” versions are the use of the != comparator instead of the < comparator for the loop with array.length.

The definitions of our loops:

Base iteration
Object[] array = createMyArray(); // or ArrayList<Object> al = createMyArrayList()
for (int j = 0; j < numberOfLoops; j++) {
  for (int i = 0; i < array.length; i++) { // or i < al.size()
    // Doing stuff
  }
}
Recommended iteration
Object[] array = createMyArray();
int n = array.length
for (int j = 0; j < numberOfLoops; j++) {
  for (int i = 0; i < n; i++) {
    // Doing stuff
  }
}
Descending order
Object[] array = createMyArray();
for (int j = 0; j < numberOfLoops; j++) {
  for (int i = array.length - 1; i > -1; i--) {
    // Doing stuff
  }
}

Benchmark 1#

Click to expand for benchmark 1

The int[] array were filled of 250 zeros and we looped over it 1 000 000 000 times.

For the ArrayList<Object> and the Object[] array, they were filled of 250 null and we looped over them 100 000 000 times.

We launched 10 times the test class to get these means and standard deviations. We have divided the result of the int[] by 10 because there were 10 times more loops and it gives a better view to compare the different consumption between the data structure.

int[]

Object[]

ArrayList<Object>

Base iteration

\(33.859 \pm 1.031\ J\)

\(206.85 \pm 11.72\ J\)

\(222.02 \pm 7.00\ J\)

Base iteration “!=”

\(319.562 \pm 10.887\ J\)

\(363.97 \pm 40.28\ J\)

\(542.67 \pm 48.36\ J\)

Recommended iteration

\(\mathbf{29.630 \pm 1.048\ J}\)

\(491.82 \pm 15.55\ J\)

\(220.49 \pm 10.57\ J\)

Recommended iteration “!=”

\(319.562 \pm 10.887\ J\)

\(368.17 \pm 20.98\ J\)

\(500.41 \pm 18.42\ J\)

Descending order iteration

\(39.016 \pm 1.440\ J\)

\(\mathbf{181.72 \pm 14.75\ J}\)

\(\mathbf{178.07 \pm 10.33\ J}\)


Benchmark 2#

Click to expand for benchmark 2

The int[] array were filled of 250 000 zeros and we looped over it 100 000 times.

For the ArrayList<Object> and the Object[] array, they were filled of 250 000 null and we looped over them 100 000 times.

We launched 10 times the test class to get these means and standard deviations.

int[]

Object[]

ArrayList<Object>

Base iteration

\(\mathbf{61.81 \pm 4.06\ J}\)

\(\mathbf{208.89 \pm 6.32\ J}\)

\(231.58 \pm 7.88\ J\)

Base iteration “!=”

\(337.73 \pm 12.71\ J\)

\(354.30 \pm 9.90\ J\)

\(523.77 \pm 55.55\ J\)

Recommended iteration

\(\mathbf{62.86 \pm 2.96\ J}\)

\(341.34 \pm 10.12\ J\)

\(272.26 \pm 11.12\ J\)

Recommended iteration “!=”

\(266.58 \pm 12.22\ J\)

\(408.28 \pm 18.84\ J\)

\(555.32 \pm 55.55\ J\)

Descending order iteration

\(\mathbf{60.92 \pm 4.31\ J}\)

\(233.45 \pm 9.37\ J\)

\(\mathbf{194.29 \pm 11.55\ J}\)


Conclusion#

  • The != comparator should be avoided if the > or the < comparator can be used instead.

  • The so-called “recommended structure” is not a good name.

  • Surprisingly, the better structure for an ArrayList seems to be in a descending order.


Test specifications#

Click to see

The methods used are in the collections_comparison.IterationComparison.java class.

The test was launched using the collections_comparison.TestIterationComparison.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testIterationComparison/.

We used this script to quickly launch those tests:

Click to see
#!/bin/bash

# Launches testIterationComparison.sh with the 3 possible arrays with size 250 and number of loops = 100 000 000 or 1 000 000 000
bash testIterationComparison.sh 10 0 250 100000000	# ArrayList<Object>
bash testIterationComparison.sh 10 1 250 100000000	# Object[]
bash testIterationComparison.sh 10 2 250 1000000000	# int[]

# Launches testIterationComparison.sh with the 3 possible arrays with size 250 000 and number of loops = 100 000
bash testIterationComparison.sh 10 0 250000 100000	# ArrayList<Object>
bash testIterationComparison.sh 10 1 250000 100000	# Object[]
bash testIterationComparison.sh 10 2 250000 100000	# int[]

References#