Catalog of Good Practices#
This catalog has been prepared by gathering recommended good practices in code development. Additionally, measurements have been made with the JoularJX tool of the energy consumption of some well-known algorithms and data structures implemented in Java.
For each entry in the catalog, you will find the description of benchmarks, in some cases the specification of the tests we performed and bibliographical references.
If you are interested and want to reproduce the tests, all the Java programs and shell scripts used to perform the measurements are available in:
or, alternatively, you can copy the following command on your terminal:
wget https://www-inf.telecom-sudparis.eu/COURS/cen/Mesures/SoftwareEfficiencyCatalog.zip
Note
For more information on JoularJX:
Specifications of the testing PC#
OS |
Ubuntu 22.04 LTS 64 bits GNOME 42.2 |
---|---|
Model |
Dell Inc. Latitude 5580 |
CPU |
Intel® Core™ i7-7600U CPU @ 2.80GHz × 4 |
RAM |
32 Go |
Measurement tool |
JoularJX 1.0 |
Java Version |
openjdk 17.0.3 2022-04-19 |
Sorting#
The time complexity of a sorting algorithm can not be better than \(O(n \log(n))\).
Many sorting algorithms exist, we only present a few of them:
Slow sorts#
These algorithms are simple to implement but have a time complexity of \(O(n^2)\).
int[] |
ArrayList<Integer> |
ArrayList<String> |
|
---|---|---|---|
Size |
\(50\ 000\) |
\(50\ 000\) |
\(50\ 000\) |
Bubble sort |
\(68.89 \pm 3.06\ J\) |
\(265.28 \pm 9.01\ J\) |
\(593.44 \pm 11.12\ J\) |
Selection sort |
\(2.77 \pm 2.14\ J\) |
\(37.97 \pm 1.49\ J\) |
\(295.76 \pm 8.98\ J\) |
Insertion sort |
\(14.51 \pm 0.79\ J\) |
\(73.06 \pm 7.40\ J\) |
\(143.99 \pm 9.27\ J\) |
We can notice that the selection sort is here surprisingly better than the insertion sort to sort lists or arrays of integers.
Quick Sorts#
These algorithms have a time complexity of \(O(n\log(n))\). The sort()
method is a built-in method of Java. We did not code the Tim sort and the dual-pivot quicksort, but we show here which sort has been selected by Java developers.
Note that the Array.sort()
method uses a dual-pivot quicksort for primitive arrays (int[], bytes[], long[], etc) but a Tim sort for Object arrays.
int[] |
ArrayList<Integer> |
ArrayList<String> |
|
---|---|---|---|
Size |
\(3\ 000\ 000\) |
\(3\ 000\ 000\) |
\(3\ 000\ 000\) |
Merge sort |
\(11.33 \pm 0.65\ J\) |
\(42.18 \pm 2.21\ J\) |
\(59.08 \pm 2.46\ J\) |
(one-pivot) Quicksort |
\(6.38 \pm 0.38\ J\) |
\(23.72 \pm 1.00\ J\) |
\(49.78 \pm 3.48\ J\) |
Tim sort (Collections.sort()) |
|
\(22.73 \pm 1.88\ J\) |
\(37.10 \pm 6.54\ J\) |
Dual-pivot quicksort (Arrays.sort() for int[]) |
\(2.57 \pm 0.64\ J\) |
|
|
Conclusion#
It is acceptable to use a slow sort on a small list or to learn how to code, but it should never be used for bigger lists.
Moreover, if you know how to use a Comparator, you could simply use the implemented sort(Comparator) method.
Test specifications#
The methods used are in the collections_comparison.ListSortingComparison.java and the collections_comparison.ArraySortingComparison.java classes.
The test was launched using the collections_comparison.TestListSortingComparison.java and the collections_comparison.TestArraySortingComparison.java classes with JoularJX by using the bash script and the files located in the folder scripts/test/testListSortingComparison/ and in scripts/test/testArraySortingComparison/ .
We launched 10 times the test class to get these means and standard deviations.
The integers used where randomly selected with the ThreadLocalRandom
class and its nextInt()
method.
We used this script to quickly launch the tests for the List
:
Click to see
#!/bin/bash
# Launches testListSortingComparison.sh
bash testListSortingComparison.sh 10 0 3000000 Integer # Quick sorts
bash testListSortingComparison.sh 10 1 50000 Integer # Slow sorts
bash testListSortingComparison.sh 10 0 3000000 String # Quick sorts
bash testListSortingComparison.sh 10 1 50000 String # Slow sorts
We used this script to quickly launch the tests for the int[]
:
Click to see
#!/bin/bash
# Launches testArraySortingComparison.sh
bash testArraySortingComparison.sh 10 0 3000000 # Quick sorts
bash testArraySortingComparison.sh 10 1 50000 # Slow sorts
SQL Queries#
Query Performance#
The advices we can give you are really depending on the database you are using. General good practices are:
If possible, use the
LIMIT
clause. The gain will be proportional to the quantity of retrieved data.The
SELECT *
should be avoided. The select should only contain the columns you need and no other one.Unnecessary
JOIN
also.To analyze the query, you can add the
EXPLAIN FORMAT=JSON
cause before the query. It will returned a table with one row containing a JSON column and its content is a representation of the query plan. The cause to add depends on the SQL server used. For our server (MariaDB), we need to useANALYZE FORMAT=JSON
.
We give you a simple example of a JSON file with a query on our database. The query is:
ANALYZE FORMAT=JSON
SELECT
Orders.Id
FROM
Orders
WHERE
ShippedDate IS NULL
and it gives us the JSON:
{
"query_block": {
"select_id": 1,
"r_loops": 1,
"r_total_time_ms": 2.429,
"table": {
"table_name": "Orders",
"access_type": "ALL",
"r_loops": 1,
"rows": 6166,
"r_rows": 6110,
"r_total_time_ms": 2.1289,
"filtered": 100,
"r_filtered": 14.845,
"attached_condition": "Orders.ShippedDate is null"
}
}
}
r_loops
show the number of times the node was actually executed. Here the tableOrders
was only scanned once.r_total_time_ms
show the time that was spent to execute the node. Here the query took 2.429 ms to be traited.r_rows
show how many rows were actually read. Here 6110 rows of theOrders
table were read.rows
shows the number of rows expected to be read by the optimizer, but here, the optimizer did not have any way to estimate it.r_filtered
show the percentage of rows that have been actually filtered with theattached_condition
. In the same way as withrows
andr_rows
,filtered
show what percentage was expected by the optimizer.
Other data can be obtained with the EXPLAIN/ANALYZE
statement, you will then have to see the documentation in order to understand them.
References#
Les 115 bonnes pratiques, 2022, collectif GreenIT. cnumr/best-practices. BP 75 “Optimiser les requêtes aux bases de données” - cnumr/best-practices
MariaDB - ANALYZE and EXPLAIN Statements, 2022, https://mariadb.com/kb/en/analyze-and-explain-statements/
String concatenation#
Be aware that to concatenate \(n\) strings requires a time complexity of \(O(n^2)\) because strings are immutable. Therefore, each concatenation results in the creation of a new string, which means poor performance.
Consider using the StringBuilder
class when appending characters for example. Alternatively, use a character array, which is even better optimized.
The thread-safe version of the StringBuilder
is the StringBuffer
.
Use comparison#
String |
StringBuilder |
StringBuffer |
|
---|---|---|---|
Mutable |
No |
Yes |
Yes |
Thread-safe |
The String is immutable, therefore the object is thread-safe |
No |
Yes |
Append chars at the end |
\(488.73 \pm 28.39\ J\) |
\(0.32 \pm 0.14\ J\) |
\(0.33 \pm 0.27\ J\) |
Append chars at the beginning |
\(219.64 \pm 23.91\ J\) |
\(47.50 \pm 2.83\ J\) |
\(42.74 \pm 4.54\ J\) |
Test specifications#
The methods used are in the string_manipulation.StringManipulationComparison.java class.
The test was launched using the string_manipulation.TestStringManipulation.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testStringManipulation/.
We launched 5 times the test class to get these means and standard deviations.
To get the results of the “Append chars” parts, we start with a empty String/StringBuilder/StringBuffer and we append 500 000 times the same character ‘a’.
We used this script to quickly launch those tests:
Click to see
#!/bin/bash
# Launches testStringManipulation.sh with the 3 possible choices with size 500 000
bash testStringManipulation.sh 10 0 500000 # String
bash testStringManipulation.sh 10 1 500000 # StringBuilder
bash testStringManipulation.sh 10 2 500000 # StringBuffer
References#
Effective Java. Third Edition. Joshua Bloch. “Item 64: Beware the performance of String concatenation”.
Java List#
Choice of a list class#
The List
interface (and associated implementation classes) is one of the most used data structures for collections in Java. Java classes that implement this interface may be more consuming and less efficient than a Java array
but it is much easier and safer to use.
When using a List
in Java, you need to choose between different implementations. The main ones are the ArrayList
, which is based on an array, and the LinkedList
, which is based on a doubly linked list. Moreover, the Vector
, which is based on the ArrayList
, can be used for thread-safe usage. Here are the time complexities and energy consumptions resulting from a test of the main methods.
Main choices#
ArrayList |
LinkedList |
Vector |
|
---|---|---|---|
Data structure |
Based on an array |
Based on a doubly linked list |
Based on an array |
Thread-safe |
No |
No |
Yes |
For the memory use, the LinkedList
nodes have to store the adresses of the previous and the next nodes in addition of the data.
On the other hand, if the reserved capacity for the ArrayList
is higher than what it actually needs, then the remaining memory is wasted.
So, if we know the exact capacity to use and we reserve it at the creation of an ArrayList
, then the ArrayList
would take less memory than a LinkedList
filled with the same data.
Benchmark 1 (100 000 elements)#
The following results were obtained with Lists of 100 000 elements.
Insertion of elements#
ArrayList |
LinkedList |
Vector |
|
---|---|---|---|
List.add(Object) |
\(O(1)\) |
\(O(1)\) |
same as the ArrayList |
List.add(int index, Object) |
\(O(n)\) |
\(O(n)\) |
same as the ArrayList |
Insertion at the beginning |
\(10.605 \pm 0.421\ J\) |
\(0.73 \pm 0.22 \ J\) |
\(10.619 \pm 0.674\ J\) |
Insertion in the middle |
\(4.494 \pm 0.286\ J\) |
\(\mathbf{564.60 \pm 28.95\ J}\) |
\(4.466 \pm 0.225\ J\) |
Insertion at the end |
\(0.249 \pm 0.026\ J\) |
\(0.41 \pm 0.05\ J\) |
\(0.282 \pm 0.023\ J\) |
Remove elements#
ArrayList |
LinkedList |
Vector |
|
---|---|---|---|
List.remove(int index) |
\(O(n)\) |
\(O(n)\) |
same as the ArrayList |
Remove randomly half of the elements by using List.remove(int index) |
\(3.587 \pm 0.226\ J\) |
\(\mathbf{211.21 \pm 24.93\ J}\) |
\(3.516 \pm 0.247\ J\) |
Remove elements whose index is even by using List.iterator() and Iterator.remove() |
\(4.656 \pm 0.296\ J\) |
\(0\ J\) |
\(4.769 \pm 0.247\ J\) |
Accessing the elements#
ArrayList |
LinkedList |
Vector |
|
---|---|---|---|
List.get(int index) |
\(O(1)\) |
\(O(n)\) |
same as the ArrayList |
Random access |
\(0.072 \pm 0.026\ J\) |
\(\mathbf{306.84 \pm 26.78\ J}\) |
\(0.134 \pm 0.034\ J\) |
Iterate by using List.get(int index) |
\(0.035 \pm 0.017\ J\) |
\(\mathbf{295.52 \pm 18.27\ J}\) |
\(0.069 \pm 0.034\ J\) |
Iterate by using List.iterator() and Iterator.next() |
\(0.047 \pm 0.015\ J\) |
\(0.17 \pm 0.00\ J\) |
\(0.061 \pm 0.025\ J\) |
Iterate by using the enhanced for-each loop |
\(0.046 \pm 0.016\ J\) |
\(0.22 \pm 0.02\ J\) |
\(0.037 \pm 0.012\ J\) |
Sort the list#
ArrayList |
LinkedList |
Vector |
|
---|---|---|---|
Best sorting complexity |
\(O(n \log(n))\) |
\(O(n \log(n))\) |
\(O(n \log(n))\) |
Sort by using Collections.sort(List) |
\(1.062 \pm 0.053\ J\) |
\(1.32 \pm 0.30\ J\) |
\(1.081 \pm 0.059\ J\) |
Benchmark 2 (500000 elements)#
The following results were obtained with Lists of 500 000 elements.
This time, we did not use the methods that require List.get(int)
with the LinkedList
. It will be written “-” in the table when no testing was done for the concerning method.
Insertion of elements#
ArrayList |
LinkedList |
Vector |
|
---|---|---|---|
List.add(Object) |
\(O(1)\) |
\(O(1)\) |
same as the ArrayList |
List.add(int index, Object) |
\(O(n)\) |
\(O(n)\) |
same as the ArrayList |
Insertion at the beginning |
\(\mathbf{450.40 \pm 24.04\ J}\) |
\(1.349 \pm 0.063 \ J\) |
\(\mathbf{456.33 \pm 16.97\ J}\) |
Insertion in the middle |
\(163.92 \pm 7.74\ J\) |
|
\(157.13 \pm 10.00 \ J\) |
Insertion at the end |
\(1.19 \pm 0.69\ J\) |
\(1.259 \pm 0.041\ J\) |
\(1.32 \pm 0.17\ J\) |
Remove elements#
ArrayList |
LinkedList |
Vector |
|
---|---|---|---|
List.remove(int index) |
\(O(n)\) |
\(O(n)\) |
same as the ArrayList |
Remove randomly half of the elements by using List.remove(int index) |
\(163.93 \pm 6.96\ J\) |
|
\(165.89 \pm 3.22\ J\) |
Remove elements whose index is even by using List.iterator() and Iterator.remove() |
\(205.24 \pm 6.60\ J\) |
\(0.257 \pm 0.333\ J\) |
\(165.89 \pm 3.22\ J\) |
Accessing the elements#
ArrayList |
LinkedList |
Vector |
|
---|---|---|---|
List.get(int index) |
\(O(1)\) |
\(O(n)\) |
same as the ArrayList |
Random access |
\(0.96 \pm 0.14\ J\) |
|
\(1.40 \pm 0.13\ J\) |
Iterate once by using List.get(int index) |
\(0.29 \pm 0.01\ J\) |
|
\(0.30 \pm 0.01\ J\) |
Iterate once by using List.iterator() and Iterator.next() |
\(0\ J\) |
\(0.149 \pm 0.034\ J\) |
\(0.22 \pm 0.03\ J\) |
Iterate once by using the enhanced for-each loop |
\(0.25 \pm 0.03\ J\) |
\(0.173 \pm 0.032\ J\) |
\(0.33 \pm 0.09\ J\) |
Sort the list#
ArrayList |
LinkedList |
Vector |
|
---|---|---|---|
Best sorting complexity |
\(O(n \log(n))\) |
\(O(n \log(n))\) |
\(O(n \log(n))\) |
Sort by using Collections.sort(List) |
\(2.16 \pm 2.18\ J\) |
\(6.339 \pm 0.307\ J\) |
\(3.12 \pm 2.32\ J\) |
Conclusion#
The LinkedList
is designed to be used as a list on which we iterate (by using an Iterator
or by using the enhanced for-each loop) or to be used as a queue, that is working with the head or with the tail of the list.
For general purposes, the ArrayList
seems to be a better choice. But if we have to work on elements that are near the head or the tail of the list, then the LinkedList
would have better performance, otherwise, it has poor performance when it has to explore deeply the list. Moreover, if you need a thread-safe list, the Vector
, which is based on the ArrayList
, is a synchronized object you could use without losing too much performance.
Test specifications#
The methods used are in the collections_comparison.ListMethodsComparison.java class.
The test was launched using the collections_comparison.TestListMethodsComparison.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testListMethodsComparison/.
We write in the table the mean and the standard deviation obtained after launching the testing class 10 times to get a mean and a standard deviation of the methods’ consumption and looping over the functions 10 times (so we divided the results by 10).
We used this script to quickly launch those tests:
Click to see
#!/bin/bash
### Benchmark 1 ###
# Launches testListMethodsComparison.sh with the 3 possible lists with size 100 000
bash testListMethodsComparison.sh 10 0 100000 1 10 # ArrayList # Use of List.get() and loop 10 times
bash testListMethodsComparison.sh 10 1 100000 # LinkedList
bash testListMethodsComparison.sh 10 2 100000 1 10 # Vector
### Benchmark 2 ###
# Launches testListMethodsComparison.sh with the 3 possible lists with size 500 000
bash testListMethodsComparison.sh 10 0 500000 # ArrayList
bash testListMethodsComparison.sh 10 1 500000 0 10 # LinkedList # No use of List.get() and loop 10 times
bash testListMethodsComparison.sh 10 2 500000 # Vector
The results were obtained using List
of strings. The strings were randomly created following the pattern “Number is %d”, where %d is a random integer. When the energy consumption is 0 J, it means that the execution was to quick for JoularJX to give a value for the concerned method.
When we iterate, we only iterate one time over the whole list.
For random access, we use List.get(int randomInt)
, where randomInt is a random integer, as many times as there are elements in the list, that is 100 000 times in our case.
For the second benchmark, with the LinkedList
, we have launched the methods 10 times in order for JoularJX to give us a power consumption higher than 0 J, so we divided by 10 the results we obtained.
References#
Effective Java. Third Edition. Joshua Bloch. “Item 28: Prefer
List
toArray
”.Energy profiles of Java collections classes. Samir Hasan, Zachary King, Munawar Hafiz, Mohammed Sayagh, Bram Adams, and Abram Hindle - 38th International Conference on Software Engineering (2016). pdf
Java™ Platform, Standard Edition 8 API Specification, (2022). Package Util. [online] Available at: https://docs.oracle.com/javase/8/docs/api/java/util/package-summary.html
Java Map#
Choice of the map class#
A Map
is an interface that maps keys to values.
When using a Map
in Java, you have to choose between differents implementations. The main ones are the HashMap
, the LinkedHashMap
and the TreeMap
. Moreover, the Hashtable
and the ConcurrentHashMap
can be used for thread-safe usage. These implementations differs in the data structures used. Here are the average time complexities and energy consumptions resulting from a test for the main methods.
Main choices#
HashMap |
LinkedHashMap |
TreeMap |
Hashtable |
ConcurrentHashMap |
|
---|---|---|---|---|---|
Data structure |
Hash table |
Hash table and linked lists (for keys, values and entries collections) |
Red-Black tree |
Hash table |
Hash table |
Order of collections |
No (random order) |
Order of insertion (as a LinkedList) |
Natural order |
No (random order) |
No (random order) |
Permits null as key |
Yes |
Yes |
No |
No |
No |
Thread-safe |
No |
No |
No |
Yes |
Yes |
The order of the TreeMap
is the natural order by default, but other orders can be used by precising the Comparator
when creating the TreeMap. Moreover, if this Comparator
allows null
, then null
can be used as a key.
Benchmark 1#
Insertion of elements#
HashMap |
LinkedHashMap |
TreeMap |
Hashtable |
ConcurrentHashMap |
|
---|---|---|---|---|---|
Map.put( Object key, Object value ) |
\(O(1)\) |
\(O(1)\) |
\(O(\log(n))\) |
\(O(1)\) |
\(O(1)\) |
Insertion |
\(393.97 \pm 8.65\ J\) |
\(444.42 \pm 10.33\ J\) |
\(\mathbf{918.49 \pm 19.11\ J}\) |
\(421.14 \pm 11.02\ J\) |
\(431.79 \pm 8.66\ J\) |
However, if we use a bad hashing function, the worst-case scenario for the hash tables is \(O(n)\) (as if we iterate over a List).
Remove elements#
HashMap |
LinkedHashMap |
TreeMap |
Hashtable |
ConcurrentHashMap |
|
---|---|---|---|---|---|
Map.remove( Object key ) |
\(O(1)\) |
\(O(1)\) |
\(O(\log(n))\) |
\(O(1)\) |
\(O(1)\) |
Remove half the elements |
\(13.61 \pm 6.12\ J\) |
\(28.87 \pm 6.68\ J\) |
\(66.67 \pm 8.91\ J\) |
\(15.66 \pm 6.41\ J\) |
\(17.70 \pm 4.20\ J\) |
Access#
HashMap |
LinkedHashMap |
TreeMap |
Hashtable |
ConcurrentHashMap |
|
---|---|---|---|---|---|
Map.get( Object key ) |
\(O(1)\) |
\(O(1)\) |
\(O(\log(n))\) |
\(O(1)\) |
\(O(1)\) |
Random access |
\(37.40 \pm 0.81\ J\) |
\(44.00 \pm 2.54\ J\) |
\(296.74 \pm 20.49\ J\) |
\(42.49 \pm 2.73\ J\) |
\(39.37 \pm 2.75\ J\) |
Use Map.get() for each key |
\(33.77 \pm 1.88\ J\) |
\(39.85 \pm 2.75\ J\) |
\(94.39 \pm 7.31\ J\) |
\(27.93 \pm 3.21\ J\) |
\(23.92 \pm 1.49\ J\) |
Iterate over the Map.entrySet() with the Set.iterator() |
\(24.32 \pm 3.92\ J\) |
\(8.01 \pm 1.88\ J\) |
\(42.99 \pm 10.61\ J\) |
\(22.87 \pm 4.04\ J\) |
\(27.74 \pm 4.42\ J\) |
Iterate over the Map.entrySet() with a for-each loop |
\(19.24 \pm 2.95\ J\) |
\(6.28 \pm 1.53\ J\) |
\(39.40 \pm 6.49\ J\) |
\(16.20 \pm 0.94\ J\) |
\(17.27 \pm 3.58\ J\) |
Conclusion#
The choice really depends on the usage.
If the order of elements does not matter, then do not take the TreeMap
, which has the worse performance overall, supposing the hash function used for the hash table is good enough.
For thread-safe usage, our results show a little difference between the Hashtable
and the ConcurrentHashMap
. Due to their way of dealing with read-write locks, the ConcurrentHashMap
was supposed to be overall better than the Hashtable
but their consumption are pretty close.
For general purposes, take the HashMap
. TreeMap
has the worst performance for a Map
implementation and should be used only if you know what you are doing.
Moreover, if you want to insert and keep elements in a particular order, then you might think of using the LinkedHashMap
. This latter is also useful for copying a Map
while keeping its order.
Test specifications#
The methods used are in the collections_comparison.MapMethodsComparison.java class.
The test was launched using the collections_comparison.TestMapMethodsComparison.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testMapMethodsComparison/.
We launched 10 times the test class to get these means and standard deviations. The results were obtained using 20 000 000 entries where the key was a random integer and the value was a string, which was created following the pattern “Number is %d”, where %d is an other random integer. When we iterate, we only iterate one time over the whole list.
We used this script to quickly launch those tests:
Click to see
#!/bin/bash
# Launches testMapMethodsComparison.sh
bash testMapMethodsComparison.sh 10 0 20000000 # HashMap
bash testMapMethodsComparison.sh 10 1 20000000 # LinkedHashMap
bash testMapMethodsComparison.sh 10 2 20000000 # TreeMap
bash testMapMethodsComparison.sh 10 3 20000000 # Hashtable
bash testMapMethodsComparison.sh 10 4 20000000 # ConcurrentHashMap
For random access, we use Map.get(int randomInt)
, where randomInt
is a random integer, as many times as there are elements in the list, that is 5 000 000 times in our case.
To remove half of the elements, we use the Map.entrySet.iterator()
and we remove one out two objects.
References#
Energy profiles of Java collections classes. Samir Hasan, Zachary King, Munawar Hafiz, Mohammed Sayagh, Bram Adams, and Abram Hindle - 38th International Conference on Software Engineering (2016). pdf
Java™ Platform, Standard Edition 8 API Specification, (2022). Package Util. [online] Available at: https://docs.oracle.com/javase/8/docs/api/java/util/package-summary.html
Java Set#
Choice of the set class#
In Java, the main implementations of the Set
, which is a Collection
whose elements are unique, are the HashSet
, the LinkedHashSet
and the TreeSet
. They are based on their Map
counterpart where the keys would be the elements of the set and the values would all be null
.
Main choices#
HashSet |
LinkedHashSet |
TreeSet |
|
---|---|---|---|
Data structure |
Hash table |
Hash table and linked list |
Red-Black tree |
Order of collections |
No (random order) |
Order of insertion (as a LinkedList) |
Natural order |
Permits null as key |
Yes |
Yes |
No, except if using a comparator that allow |
Thread-safe |
No |
No |
No |
Benchmark 1#
Insertion of elements#
HashSet |
LinkedHashSet |
TreeSet |
|
---|---|---|---|
Set.add(Object) |
\(O(1)\) |
\(O(1)\) |
\(O(\log(n))\) |
Insertion |
\(86.04 \pm 1.39\ J\) |
\(96.92 \pm 1.08\ J\) |
\(186.42 \pm 4.65\ J\) |
However, if we use a bad hashing function, the worst-case scenario for the hash tables is \(O(n)\) (as if we iterate over a List).
Remove elements#
HashSet |
LinkedHashSet |
TreeSet |
|
---|---|---|---|
Set.remove( Object key ) |
\(O(1)\) |
\(O(1)\) |
\(O(\log(n))\) |
Remove half the elements |
\(15.47 \pm 0.75\ J\) |
\(15.97 \pm 0.16\ J\) |
\(5.73 \pm 0.81\ J\) |
Access#
HashSet |
LinkedHashSet |
TreeSet |
|
---|---|---|---|
Set.contains( Object key ) |
\(O(1)\) |
\(O(1)\) |
\(O(\log(n))\) |
Check if objects belong to the Set |
\(12.87 \pm 7.74\ J\) |
\(26.12 \pm 1.25\ J\) |
\(106.58 \pm 3.89\ J\) |
Iterate with the Set.iterator() |
\(4.32 \pm 0.42\ J\) |
\(1.52 \pm 0.02\ J\) |
\(6.65 \pm 0.72\ J\) |
The order of the TreeSet
is the natural order by default, but other orders can be used by precising the Comparator
when creating the TreeSet
. Moreover, if this Comparator
allows null
, then null
can be used as a key.
Conclusion#
We have the exact same conclusion than with the Map
implementations:
For general purposes,
HashSet
To keep the order of insertion,
LinkedHashSet
To keep the elements sorted,
TreeSet
And you should consider using a LinkedHashSet
while inserting yourself the elements in the order you want instead of using a TreeSet
.
Test specifications#
The methods used are in the collections_comparison.SetMethodsComparison.java class.
The test was launched using the collections_comparison.TestSetMethodsComparison.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testSetMethodsComparison/.
We launched 5 times the test class to get these means and standard deviations. The results were obtained using 5 000 000 strings which were created following the pattern “Number is %d”, where %d is an other random integer. When we iterate, we only iterate one time over the whole list.
We used this script to quickly launch those tests:
Click to see
#!/bin/bash
# Launches testSetMethodsComparison.sh
bash testSetMethodsComparison.sh 5 0 5000000 # HashSet
bash testSetMethodsComparison.sh 5 1 5000000 # LinkedHashSet
bash testSetMethodsComparison.sh 5 2 5000000 # TreeSet
To check if some objects belonged to the Set
, we use Set.contains(String s)
, where s
was created following the pattern “Number is %d”, where %d is an other random integer as many times as there are elements in the list, that is 5 000 000 times in our case.
To remove half of the elements, we use the Set.iterator()
and we remove one out of two objects.
References#
Energy profiles of Java collections classes. Samir Hasan, Zachary King, Munawar Hafiz, Mohammed Sayagh, Bram Adams, and Abram Hindle - 38th International Conference on Software Engineering (2016). pdf
Java™ Platform, Standard Edition 8 API Specification, (2022). Package Util. [online] Available at: https://docs.oracle.com/javase/8/docs/api/java/util/package-summary.html
Looping over an array or an ArrayList#
It seems that depending on the device you use, especially for the devices without Just-In-Time (JIT) compiler, like the first versions of Android and some IoT devices, some practises allow the application to be a little faster and less consuming.
Depending on the class of the array, it seems to be better to store the length of an array when we iterate over it, because without a JIT, the Java Virtual Machine (JVM) will have to retrieve the value of the length at each iteration.
It should not affect a computer or any device with a JIT, but we check it here and check if other loop structures could be used or not.
For the following benchmarks:
The “!=” versions are the use of the !=
comparator instead of the <
comparator for the loop with array.length
.
The definitions of our loops:
Base iteration
Object[] array = createMyArray(); // or ArrayList<Object> al = createMyArrayList()
for (int j = 0; j < numberOfLoops; j++) {
for (int i = 0; i < array.length; i++) { // or i < al.size()
// Doing stuff
}
}
Recommended iteration
Object[] array = createMyArray();
int n = array.length
for (int j = 0; j < numberOfLoops; j++) {
for (int i = 0; i < n; i++) {
// Doing stuff
}
}
Descending order
Object[] array = createMyArray();
for (int j = 0; j < numberOfLoops; j++) {
for (int i = array.length - 1; i > -1; i--) {
// Doing stuff
}
}
Benchmark 1#
The int[]
array were filled of 250 zeros and we looped over it 1 000 000 000 times.
For the ArrayList<Object>
and the Object[]
array, they were filled of 250 null and we looped over them 100 000 000 times.
We launched 10 times the test class to get these means and standard deviations. We have divided the result of the int[]
by 10 because there were 10 times more loops and it gives a better view to compare the different consumption between the data structure.
int[] |
Object[] |
ArrayList<Object> |
|
---|---|---|---|
Base iteration |
\(33.859 \pm 1.031\ J\) |
\(206.85 \pm 11.72\ J\) |
\(222.02 \pm 7.00\ J\) |
Base iteration “!=” |
\(319.562 \pm 10.887\ J\) |
\(363.97 \pm 40.28\ J\) |
\(542.67 \pm 48.36\ J\) |
Recommended iteration |
\(\mathbf{29.630 \pm 1.048\ J}\) |
\(491.82 \pm 15.55\ J\) |
\(220.49 \pm 10.57\ J\) |
Recommended iteration “!=” |
\(319.562 \pm 10.887\ J\) |
\(368.17 \pm 20.98\ J\) |
\(500.41 \pm 18.42\ J\) |
Descending order iteration |
\(39.016 \pm 1.440\ J\) |
\(\mathbf{181.72 \pm 14.75\ J}\) |
\(\mathbf{178.07 \pm 10.33\ J}\) |
Benchmark 2#
The int[]
array were filled of 250 000 zeros and we looped over it 100 000 times.
For the ArrayList<Object>
and the Object[]
array, they were filled of 250 000 null and we looped over them 100 000 times.
We launched 10 times the test class to get these means and standard deviations.
int[] |
Object[] |
ArrayList<Object> |
|
---|---|---|---|
Base iteration |
\(\mathbf{61.81 \pm 4.06\ J}\) |
\(\mathbf{208.89 \pm 6.32\ J}\) |
\(231.58 \pm 7.88\ J\) |
Base iteration “!=” |
\(337.73 \pm 12.71\ J\) |
\(354.30 \pm 9.90\ J\) |
\(523.77 \pm 55.55\ J\) |
Recommended iteration |
\(\mathbf{62.86 \pm 2.96\ J}\) |
\(341.34 \pm 10.12\ J\) |
\(272.26 \pm 11.12\ J\) |
Recommended iteration “!=” |
\(266.58 \pm 12.22\ J\) |
\(408.28 \pm 18.84\ J\) |
\(555.32 \pm 55.55\ J\) |
Descending order iteration |
\(\mathbf{60.92 \pm 4.31\ J}\) |
\(233.45 \pm 9.37\ J\) |
\(\mathbf{194.29 \pm 11.55\ J}\) |
Conclusion#
The
!=
comparator should be avoided if the>
or the<
comparator can be used instead.The so-called “recommended structure” is not a good name.
Surprisingly, the better structure for an
ArrayList
seems to be in a descending order.
Test specifications#
The methods used are in the collections_comparison.IterationComparison.java class.
The test was launched using the collections_comparison.TestIterationComparison.java class with JoularJX by using the bash script and the files located in the folder scripts/test/testIterationComparison/.
We used this script to quickly launch those tests:
Click to see
#!/bin/bash
# Launches testIterationComparison.sh with the 3 possible arrays with size 250 and number of loops = 100 000 000 or 1 000 000 000
bash testIterationComparison.sh 10 0 250 100000000 # ArrayList<Object>
bash testIterationComparison.sh 10 1 250 100000000 # Object[]
bash testIterationComparison.sh 10 2 250 1000000000 # int[]
# Launches testIterationComparison.sh with the 3 possible arrays with size 250 000 and number of loops = 100 000
bash testIterationComparison.sh 10 0 250000 100000 # ArrayList<Object>
bash testIterationComparison.sh 10 1 250000 100000 # Object[]
bash testIterationComparison.sh 10 2 250000 100000 # int[]
References#
Performance Tips | Android Developpers. “Use Enhanced For Loop Syntax” part.[online] Available at: https://android-doc.github.io/training/articles/perf-tips.html#Loops
An investigation into energy-saving programming practices for android smartphone app development, D. Li, W. G. J. Halfond, in: Proceedings of the 3rd International Workshop on Green and Sustainable Software, GREENS 2014, ACM, New York, NY, USA, 2014, pp. 46–53. doi:10.1145/2593743.2593750. URL http://doi.acm.org/10.1145/2593743.2593750