Measuring a processor's electricity consumption with a software meter#

Note

For this lab, you have to work on the lab room PCs. The perf command relies on RAPL, which is only available on some processors (modern x86 CPUs from Intel and AMD).

Objectives

In this lab, you will discover the use of a software energy meter to measure the power and the energy of the CPU and memory components.

We will do the following exercises:

  1. Discover perf

  2. Measure the energy consumed by the computer while idle (doing nothing special) for a given period

  3. Study the impact of the programming language on energy consumption:

    • Measure the energy consumed for computing the Mandelbrot Fractal set in Java and in Python

  4. Study the link between processor frequency and energy consumption

  5. Analyze the differences between the measurements, and the advantages and disadvantages of the Wattmeter and perf (only if you have already done the Wattmeter lab; otherwise, this point will be handled in the Wattmeter lab)

  6. Calculate the carbon impact of a functional unit with the Software Carbon Intensity formula

  7. Measure the impact of network/disk activity on the computer energy consumption.

1. Discover Perf (20mn)#

Initialisations#

Python installation#

If you have not yet done the Wattmeter lab, you first have to set up a Python environment with the required modules, as described on the following page:

Python installation

Activate the python environment#

In all the terminals used for this lab, activate the env4101 environment:

source $HOME/env4101/bin/activate

Download Perf lab scripts#

The lab will be done in the $HOME/ENV4101 directory.

mkdir -p $HOME/ENV4101 
cd $HOME/ENV4101

Then get the python scripts used in this lab:

unzip PerfLab.zip 
cd PerfLab

RAPL and perf in two words#

RAPL stands for Running Average Power Limit.
With RAPL, energy measurements are provided by counters integrated into the CPU. These counters are exposed through specific registers (model-specific registers, MSRs), which software can access through system calls. As you can expect, reading these registers directly is complicated and depends on the underlying operating system and architecture. Fortunately, many tools can read the RAPL counters for us. In this lab, we will focus on one of them, perf.
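To make the idea of an energy counter concrete, here is a minimal sketch that reads such a counter without perf. It assumes the Linux powercap interface, which exposes the package counter as a file under /sys/class/powercap/intel-rapl:0; this path may be absent or readable only by root on your machine, in which case the script simply reports it. The function names are illustrative and are not part of the lab scripts.

```python
from pathlib import Path
import time

# Assumption: Linux powercap interface for the RAPL package-0 domain.
# The directory may not exist, or may be readable only by root.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(before_uj, after_uj, max_range_uj):
    """Energy (in microjoules) between two counter readings.

    The counter only increases and restarts from 0 after reaching
    max_range_uj, so at most one wrap-around is handled here.
    """
    if after_uj >= before_uj:
        return after_uj - before_uj
    return (max_range_uj - before_uj) + after_uj


def read_uj(name):
    """Read one integer value (in microjoules) from the powercap directory."""
    return int((RAPL_DIR / name).read_text())


if __name__ == "__main__":
    try:
        start = read_uj("energy_uj")
        time.sleep(1)
        end = read_uj("energy_uj")
        max_range = read_uj("max_energy_range_uj")
    except OSError as err:
        print(f"RAPL counters are not readable here: {err}")
    else:
        joules = energy_delta_uj(start, end, max_range) / 1e6
        print(f"{joules:.2f} J consumed by the package in about 1 s")
```

The important point is that the counter is cumulative: an energy measurement is always the difference between two readings, which is what perf computes for you around the execution of a command.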

perf is a powerful Linux command-line tool that measures CPU performance through many indicators (called events). It is part of the linux-tools package.

The measurements provided by perf can be seen with the perf list command. You can see with this command that there are thousands of events (among them the duration of a process, the CPU usage, the memory usage; you may study some of those events in a computer system course)!

In this lab, we will focus on energy measurement events. perf provides those measurements thanks to the RAPL registers. To see energy-related events available on your computer, you can use the command
perf list | grep energy

NB: ignore the message Error: failed to open tracing event directory. It concerns a directory that you are not allowed to access, and it does not prevent the command from working.

Here is the result of perf list | grep energy on computer a005-19 (the list depends on the computer architecture):

perf list | grep energy
  power/energy-cores/                                [Kernel PMU event]
  power/energy-gpu/                                  [Kernel PMU event]
  power/energy-pkg/                                  [Kernel PMU event]
  power/energy-psys/                                 [Kernel PMU event]
  power/energy-ram/                                  [Kernel PMU event]
  • power/energy-cores/, RAPL PP0 domain: Measures the energy consumption of all processor cores on the socket.

  • power/energy-gpu/, RAPL PP1 domain: Measures the energy consumption of the integrated graphics processing unit (GPU).

  • power/energy-pkg/, RAPL Package (PKG) domain: Measures the energy consumption of the entire processor socket, including all cores, the integrated GPU, the caches and the memory controller.

  • power/energy-ram/, RAPL RAM domain: Measures the energy consumption of the random access memory (RAM) attached to the integrated memory controller.

  • power/energy-psys/, RAPL PSys domain: Monitors and controls the thermal and power specifications of the entire SoC (System on a Chip). It is especially useful when the power consumption source is neither the CPU nor the GPU. PSys includes the power consumption of the package domain, the PCH (Platform Controller Hub), the eDRAM (embedded Dynamic Random Access Memory), and other domains within a single-socket SoC.

tp-perf-files/RAPL-domains.png

RAPL domains (figure extracted from Kashif Nizam Khan, Mikael Hirki, Tapio Niemi, Jukka K. Nurminen, and Zhonghong Ou. Rapl in action: Experiences in using rapl for power measurements. ACM Trans. Model. Perform. Eval. Comput. Syst., 3(2), March 2018.)#

Discover the architecture of your computer#

To understand the meaning of those energy events, such as power/energy-cores/, power/energy-gpu/, power/energy-pkg/, power/energy-psys/, you may need to understand the architecture of your computer.

hwloc-ls provides a hierarchical view of the machine. It also gathers various attributes such as cache and memory information.

To see the architecture of your computer you can use this command:

hwloc-ls &

For example here is the output for the a004-16 computer:

tp-perf-files/hwloc-ls-a004-16.png

Architecture of a004-16, a lab computer#

The main part corresponds to the Package (also called socket or processor), e.g. Package L#0. A processor package is the physical component that is usually inserted into a socket on the motherboard. It is also often called a physical processor or a CPU, even though these names create confusion with cores and processing units. The package is one of the entities for which RAPL measures the energy consumption. Inside the package, there are the following elements:

  • The cores (e.g. Core L#0) with their processing units or hyper-threads (e.g. PU L#0). For this computer, the package has 8 cores with 2 hyper-threads each and 4 other cores with 1 hyper-thread each.

  • The memory caches (L*), which store the data and programs that the processor uses regularly. The caches are smaller than the DRAM, but the processor accesses them faster than the DRAM.

  • The size of the main memory (DRAM) is indicated (e.g. 31GB); it is the second entity for which RAPL measures energy consumption.

  • The PCI devices (audio card, disk, and Ethernet interface); RAPL does not measure the energy consumption of those components.

Share your results

Throughout the lab, you will have to fill in a questionnaire. This questionnaire will be uploaded to Moodle at the end of the Wattmeter and Perf labs. The questionnaire is in Markdown format. You can edit it with the command

ghostwriter Questionnaire.txt &

Do not forget to fill in the Questionnaire file.

Energy measure#

With perf, we can obtain the consumed energy during the execution of one command.
NB: perf does not isolate the consumption of one process. It measures the consumption of all the processes that have run for the duration of the command.

Now let us look at how we can measure the energy consumed during the execution of a command.

To do this, we can use the perf stat command.

For example, with this command, we measure the energy consumed by the RAPL PKG domain while sleeping for 2 seconds: perf stat -e power/energy-pkg/ sleep 2

At the end of the execution, perf prints the result on the standard output.

$ perf stat -e power/energy-pkg/ sleep 2

 Performance counter stats for 'system wide':

              4,34 Joules power/energy-pkg/                                           

       2,010166524 seconds time elapsed

You can change the event you want to measure using the parameter after the -e option. For example, if energy-gpu is available on your computer, perf stat -e power/energy-gpu/ sleep 2 will give information about the energy consumed by the GPU.

In this lab, we will look particularly at two events (as those events are available on many computers):

  • power/energy-pkg/, which provides the energy consumed by the package or processor (in Joules).

  • power/energy-cores/, which provides the energy consumed by the cores (in Joules).

Unfortunately, power/energy-psys/, which is the most global value, is not available on all computers.
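Since perf stat reports an energy (in Joules) and an elapsed time, the average power over the run is simply their ratio. The following sketch illustrates this on the sample output shown earlier. The parse_perf_stat function is a hypothetical helper written for this illustration, not one of the lab scripts, and it assumes the output layout printed above, with a comma as decimal separator.

```python
import re


def parse_perf_stat(output):
    """Return (energy_joules, elapsed_seconds) from perf stat text output."""

    def to_float(text):
        # perf follows the system locale: a French locale prints "4,34".
        # Assumption: the numbers contain no thousands separator.
        return float(text.replace(",", "."))

    energy = re.search(r"([\d.,]+)\s+Joules", output)
    elapsed = re.search(r"([\d.,]+)\s+seconds time elapsed", output)
    if energy is None or elapsed is None:
        raise ValueError("unexpected perf stat output")
    return to_float(energy.group(1)), to_float(elapsed.group(1))


# Sample copied from the perf stat run shown in this section.
sample = """
 Performance counter stats for 'system wide':

              4,34 Joules power/energy-pkg/

       2,010166524 seconds time elapsed
"""

energy_j, elapsed_s = parse_perf_stat(sample)
average_power_w = energy_j / elapsed_s  # watts = joules / seconds
print(f"{energy_j} J over {elapsed_s:.2f} s = {average_power_w:.2f} W")
```

Keeping energy (Joules) and power (Watts) distinct matters for the rest of the lab: a longer run consumes more energy even when the power stays the same.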

2. Idle state consumption with a script (~5 minutes)#

Note

In this part, for more accurate results, you will have to close all the running applications before making the measurement.

You can now measure the idle state consumption of your computer using the shell program run_sleep_perf.sh which you have downloaded before in the zip file.

Explanation on the run_sleep_perf.sh script

This command takes two arguments: the number of measurements you want to perform and the duration of the measurement in seconds.

The script includes a loop for the number of measurements to be performed:

  • perf stat -e power/energy-cores/ -e power/energy-pkg/ -e power/energy-psys/ sleep $duration &> $results_directory/perf_$i: this line measures the energy consumed while the command sleep $duration runs. The output of perf is redirected to an intermediate result file. For example, if the chosen duration is 30 seconds and the number of iterations is 1, the result file is results_sleep_30/perf_1

  • then the time (duration) and energy data are extracted from the results files

    time=$(tail $results_directory/perf_$i | grep seconds | sed -e 's/,/./g' | awk '{print $1}')
    core_energy=$(tail $results_directory/perf_$i | grep cores | sed -e 's/,/./g' | awk '{print $1}')
    pkg_energy=$(tail $results_directory/perf_$i | grep pkg | sed -e 's/,/./g' | awk '{print $1}')

  • Those results are saved in a csv file results_sleep_30/output.csv

Finally, after the loop, it runs python mean_of_results_perf.py results_sleep_30/output.csv, which

  • calculates pkg_power (the processor energy per second, i.e. the average power in watts) and core_power (the cores energy per second)

  • prints the means and standard deviations.
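The computation described above can be sketched as follows. This is not the content of mean_of_results_perf.py: the CSV column names and the values are assumptions chosen for illustration, so check the header of your own output.csv before adapting it.

```python
import csv
import io
import statistics

# Hypothetical CSV content: column names and values are illustrative only.
csv_text = """time,core_energy,pkg_energy
30.0,15.0,60.0
30.0,18.0,66.0
30.0,12.0,63.0
"""


def power_stats(rows, energy_column):
    """Return (mean, sample standard deviation) of the power in watts."""
    # Power of one measurement = energy (J) / duration (s).
    powers = [float(row[energy_column]) / float(row["time"]) for row in rows]
    return statistics.mean(powers), statistics.stdev(powers)


rows = list(csv.DictReader(io.StringIO(csv_text)))
pkg_mean, pkg_std = power_stats(rows, "pkg_energy")
core_mean, core_std = power_stats(rows, "core_energy")
print(f"pkg_power:  mean {pkg_mean:.2f} W, std {pkg_std:.2f} W")
print(f"core_power: mean {core_mean:.2f} W, std {core_std:.2f} W")
```

With only three measurements the standard deviation is a rough indicator; it mainly tells you whether something disturbed one of the runs.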

In order to test it quickly, run it for 3 measures of 2 seconds. The result of this command is the mean of the energy consumption in Joules for a period of 2 seconds.

./run_sleep_perf.sh  3 2

To get a slightly better estimate of the idle consumption, we will make 3 measurements of 30 seconds.

In the meantime, do NOT do anything on the computer, so that the saved results are realistic and usable in the rest of the lab. So wait for 1 minute and 30 seconds!

Remark

To have a better evaluation, we should stop more processes, and measure the energy consumption on a longer period.

If you are not on your own computer, you can only stop the processes that you have started (such as web browsers and editors). Be aware that there are a lot of daemons (processes that run forever). If it were your own computer, you could limit the running processes to the strict minimum.

To get an idea of the number of processes running on your computer, you can try the following commands. All these processes share the same processor:

ps -u $USER # your processes

ps -u $USER |wc -l # the number of your running processes

ps -ax # all the processes

ps -ax|wc -l # the number of running processes

./run_sleep_perf.sh 3 30

The results of the last command have been saved in the results_sleep_30/output.csv file, which will be used in the next step of the lab.

Share your results

Do not forget to fill in the Questionnaire file.

Bravo!

Congratulations, you have been able to perform a first evaluation of the energy consumption of your computer with a software power meter when it was “idle” for 30 seconds!

3. Comparison of energy consumption of Mandelbrot in Java and in Python (30mn)#

Objectives

In this part of the lab, we compare the efficiency of 2 programming languages: Java and Python. From this experiment, you should try to explain the differences in terms of energy consumption between them.

This example was studied in the article Ranking programming languages by energy efficiency.

Article summary: this study answers the following questions:
  • Can we compare the energy efficiency of programming languages?

  • Is the fastest language the most energy efficient?

  • What is the link between memory usage and energy consumption?

  • Can we find the best language in terms of time, energy and memory?

We test an algorithm that plots the Mandelbrot set, one of the best-known examples of a mathematical fractal.

The programs come from the Computer Language Benchmarks Game, a benchmark whose objective is to compare the same algorithm written in several languages.

Mandelbrot details

Mandelbrot

The Mandelbrot set is the set of complex numbers \(c\) for which the terms \(z_{n}\) of the following sequence remain bounded in absolute value. The sequence \(z_{n}\) is defined by:

  • \(z_{0}=0\)

  • \(z_{n+1}=z_{n}^{2}+c\)
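The definition above translates into a short membership test. The sketch below is not the benchmark program used in the lab. It relies on the known property that the sequence diverges as soon as \(|z_{n}|>2\), and it approximates "remains bounded" by "has not escaped after max_iter iterations"; the value 50 for max_iter is an assumption chosen for illustration.

```python
def escapes(c, max_iter=50):
    """Return True if z_{n+1} = z_n^2 + c leaves the disk |z| <= 2."""
    z = 0j  # z_0 = 0
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return True  # the sequence is known to diverge from here
    return False  # not escaped yet: treated as bounded (approximation)


def in_mandelbrot(c, max_iter=50):
    """Approximate membership test for the Mandelbrot set."""
    return not escapes(c, max_iter)


for c in (0j, -1 + 0j, 1 + 0j, 0.5 + 0.5j):
    print(c, in_mandelbrot(c))
```

Plotting the set on an N-by-N bitmap means running this test once per pixel, so the amount of computation grows with \(N^{2}\).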

To learn more about mandelbrot:

tp-language-comparison-files/Mandelbrot_Set_Whole_12800x10240_6x6.png

Mandelbrot set fractal example#

Run the experiment#

Launch the experiment first (it takes some time) and while it is running:

  • Read the explanations below

  • Read the analysis question and try to guess the answers

./run_mandelbrot_perf.sh

In this experiment, you plot the Mandelbrot set over the region \([-1.5-i,0.5+i]\) of the complex plane on an N-by-N bitmap and write the output byte by byte in portable bitmap format.

  • There is a first loop that varies the bitmap precision (N) to compute the fractal image with different levels of detail (160, 1600, 3200, 8000 and 16000).

    • There is a second loop that repeats the computation twice in order to compute the mean and standard deviation (for the sake of time, we do it only twice; more repetitions would give a more reliable mean and standard deviation. Feel free to increase the number of iterations for a better result).

      • Compute the fractal with two languages: java and python.

      • The bitmaps are saved and you can visualize them (at the end of the experiment):

        display language_comparison/mandelbrot_bitmap/*.bmp &
        
  • Plot the graphs for time, Average DRAM power, Average CORES power, Total DRAM energy and Total CORES energy as a function of the bitmap definition.

  • Display the graphs. The graphs are saved in ./language_comparison/results_perf/ and you can view them with the eog command.

    Note

    NB: For this experiment, we plot the total energy consumption. We did not remove the idle consumption.

Analyse the results and answer the following questions (for this algorithm)#

  • Is there a link between the execution time and the energy?

  • Is there a link between the power and the energy, why?

  • Which language is the most energy consuming and why?

Share your results

Do not forget to fill in the Questionnaire file.

Warning

Be careful, do not generalize those results for all programs; the results may be different for some specific cases.

Bravo!

Congratulations, you have been able to comment on the impact of the programming language on the processor energy consumption!

In the future, we advise you to choose an energy-efficient programming language when you develop widely used software!

4. Looking at the processor frequency when using Python (20mn)#

You may have noticed, when looking at the Python curves, that the energy consumption seems to stay under a certain level. Let’s try to see how the processor manages to do this.
Let’s look more closely at how the CPU is used during the execution of Mandelbrot in Python.

We will use an option of perf stat that measures different events at an interval defined using the -I option. For example, the following command:

perf stat -I 100 -e power/energy-psys/ sleep 2 

shows the values of power/energy-psys/ measured every \(100\ ms\) while the sleep 2 command executes.

We will now try to see data about the CPU usage thanks to the event cpu-cycles.

CPU cycles and processor frequency are linked: when you increase the processor frequency, you increase the number of CPU cycles per second.
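As a rough illustration of this link, the sketch below converts a cpu-cycles count into an average frequency. The numbers are illustrative values, not measurements. It also assumes that the measured program keeps a known number of CPUs busy during the whole interval: perf adds up the cycles of all the threads of the command, so for a multi-threaded program the total has to be divided by the number of busy CPUs.

```python
def average_frequency_ghz(cycles, interval_s, busy_cpus=1):
    """Approximate clock frequency (GHz) from a cpu-cycles count.

    cycles:     cpu-cycles value reported for one interval
    interval_s: length of the interval in seconds
    busy_cpus:  assumed number of CPUs kept busy by the program
    """
    if interval_s <= 0 or busy_cpus <= 0:
        raise ValueError("interval_s and busy_cpus must be positive")
    return cycles / (interval_s * busy_cpus) / 1e9


# Illustrative values only.
single = average_frequency_ghz(3_200_000_000, 1.0)
multi = average_frequency_ghz(12_800_000_000, 1.0, busy_cpus=4)
print(f"single-threaded: {single:.2f} GHz")
print(f"four busy CPUs:  {multi:.2f} GHz")
```

If the program is idle for part of the interval, the result underestimates the real clock frequency, since cycles are only counted while the program is running.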

Dynamic frequency scaling is a power management technique whereby the frequency of a microprocessor can be automatically adjusted “on the fly” depending on the actual needs, to conserve power and reduce the amount of heat generated by the chip. Dynamic frequency scaling can also be useful as a safety measure for overheated systems.

You can examine the results for Mandelbrot in Java using the following command. perf reports the number of CPU cycles every second.

perf stat -I 1000 -e cpu-cycles java mandelbrot.java 32000 language_comparison/java_mandelbrot_32000.bmp

What is the frequency of the CPU for this experiment? What is its range of variation?

We will now do the same for the python program with the objective to better understand the power results we had before.
We provide the run_mandelbrot_cpu_usage_perf.sh script for this purpose. Inspect the run_mandelbrot_cpu_usage_perf.sh program and launch the experiment using wisely chosen parameters (for example Mandelbrot definition 16000, interval 1s).

./run_mandelbrot_cpu_usage_perf.sh [mandelbrot_definition] [interval_in_ms]
./run_mandelbrot_cpu_usage_perf.sh 16000 1000 

Look at the results:

eog python_cpu_analysis/mandelbrot_python_cpu_usage.svg &

What is interesting in the results of the experiment? Is the frequency of the processor stable? What can you conclude about the way the processor handles this Python execution? Can you propose a hypothesis to explain the graph language_comparison/results_perf/mandelbrot_bitmap_precision_impact_pkg_power.png?

5. Analyze the differences in the measurements and the advantages/disadvantages of Wattmeter and perf (20mn)#

If you have already completed the Wattmeter lab, answer the following questions (if not, you will answer them in the Wattmeter lab, so go directly to the next part):

  • What are the differences between the measurements obtained with the wattmeter and those obtained with perf when plotting the Mandelbrot set? Analyze those differences.

    • Do we have the same energy measure? Why?

  • With a Wattmeter and with perf, how do you isolate the energy consumption of one process or activity?

  • What is the power measure sampling rate with perf and with Yocto-watt?

  • Which tool is the easiest one to use?

Share your results

Do not forget to fill in the Questionnaire file.

Bravo!

Have you been able to differentiate Wattmeter and Perf characteristics?

6. Calculate carbon impact with the Software Carbon Intensity formula (15mn)#

Use the Software Carbon Intensity formula to calculate, in g CO2-eq, the impact of the Mandelbrot calculation for parameter value 16000 in Python and in Java.

For this purpose, you will need to:

  • Get the energy mix with electricity map; you can get the mean value for the last year, given by the consumption tag for the chosen Country

tp-perf-files/ElectricityMap.png
  • Get your computer CO2 impact. You can get an estimate of it from the ADEME empreinte database (you will need to create an account to access the database). Then go to “Consulter les données” (browse the data). We suggest using “Ordinateur” (computer) as the keyword; you will obtain an evaluation of the carbon footprint of the production lifecycle of such a computer.

tp-perf-files/EmpreinteConsulterDonnees.png
tp-perf-files/Empreinte-MotCle.png
tp-perf-files/EmpreinteOrdinateur.png

To help you with the calculation, we provide a Jupyter notebook, SCI.ipynb, in the zip file.

You can use it with:

jupyter-lab SCI.ipynb

You can execute this notebook step by step while choosing the appropriate input values:

  • E (J): energy consumed per software unit (obtained in step 3 of the lab). As a software unit, take:

    • One Mandelbrot execution in Python for definition 16000

  • I (gCO2e/kWh): carbon emission per kWh of the electricity over the last year in the country where the software runs (found in electricity map)

  • EM (kgCO2e): carbon emission of the production lifecycle of the computer (found in the ADEME empreinte database)

  • LT (y): lifetime of the computer

  • SD (s): software unit duration (obtained in step 3 of the lab)
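With the inputs listed above, the calculation can be sketched as follows. This is not the content of SCI.ipynb. It follows the general form of the Software Carbon Intensity formula, (E × I) + M per functional unit, and assumes that the embodied part M is the share of EM proportional to the time the software unit occupies the computer over its lifetime (SD / LT), with the computer counted as available all year round. All numeric values are placeholders, not measurements or database values.

```python
JOULES_PER_KWH = 3.6e6
SECONDS_PER_YEAR = 365 * 24 * 3600


def sci_g_co2e(energy_j, intensity_g_per_kwh, embodied_kg,
               lifetime_years, duration_s):
    """Return (operational, embodied) emissions in gCO2e for one software unit."""
    # Operational part: E converted from joules to kWh, times I.
    operational = energy_j / JOULES_PER_KWH * intensity_g_per_kwh
    # Embodied part: EM (converted to grams) times the share SD / LT.
    lifetime_s = lifetime_years * SECONDS_PER_YEAR
    embodied = embodied_kg * 1000 * duration_s / lifetime_s
    return operational, embodied


# Placeholder inputs: replace them with your own E, I, EM, LT and SD.
operational_g, embodied_g = sci_g_co2e(
    energy_j=3600.0,
    intensity_g_per_kwh=50.0,
    embodied_kg=300.0,
    lifetime_years=5.0,
    duration_s=100.0,
)
print(f"operational: {operational_g:.4f} gCO2e")
print(f"embodied:    {embodied_g:.4f} gCO2e")
print(f"SCI:         {operational_g + embodied_g:.4f} gCO2e per execution")
```

Returning the two parts separately makes it easy to see which one dominates when you change the country, the language or the lifetime of the computer.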

Concerning carbon emissions, which part has the most impact for this software unit: the execution or the production? Does changing the country of execution have an impact?

Give three ideas to reduce the software footprint and verify with the notebook (change country, change language, change the computer lifetime).

Share your results

Do not forget to fill in the Questionnaire file.

7. Impact of network/disk intensive operation on energy consumption (30mn)#

In the previous exercise (Mandelbrot), we stressed the processor. Now let us also stress the network and/or the disk. For this purpose, we suggest that you use the following commands.

  • Stress the network: getting a file from the network occupies the network interface as well as the processor and the memory.

    • Get one large file through the network (choose one). We can use these video files, available in three definitions:

      • 360p.mp4 : 25M

      • 720p.mp4 : 63M

      • 1080p.mp4 : 169M

wget http://www-inf.telecom-sudparis.eu/COURS/cen/Mesures/tp-wattmeter-files/videos/360p.mp4
wget http://www-inf.telecom-sudparis.eu/COURS/cen/Mesures/tp-wattmeter-files/videos/720p.mp4
wget http://www-inf.telecom-sudparis.eu/COURS/cen/Mesures/tp-wattmeter-files/videos/1080p.mp4
  • Stress the disk: Write in a file 1000 blocks of 124 characters:

   dd if=/dev/zero of=/tmp/testdd bs=124c count=1000 oflag=dsync

For this purpose, you can adapt the run_sleep_perf.sh shell script, and run it.

Tip

Details
  • We suggest that you copy the file run_sleep_perf.sh to another file
  • Then change this new file
    • You do not need the duration variable (remove the line that defines it and each use of it)
    • Change the name of the cmd variable
    • The sleep 30 command to run with perf has to be changed to the chosen command

Then, you can estimate the cost of the command alone (i.e. subtract from the total energy the idle energy that would have been consumed whatever you had done).

  • For this purpose we provide you with the analyse_with_idle_perf.py python script:

    python analyse_with_idle_perf.py csv-file-cmd csv-file-idle cmd_name
    

    Tip

    Details: for example, for wget, it could be (depending on what you have written in your shell script)
    • csv-file-cmd: results_wget/output.csv
    • csv-file-idle: results_sleep_30/output.csv
    • cmd_name: wget
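The idle subtraction itself is a one-line computation, sketched below. This is not the content of analyse_with_idle_perf.py, and the numeric values are illustrative. The idle power comes from the idle measurement (energy divided by duration), and the idle energy to subtract is that power multiplied by the duration of the command.

```python
def net_energy_j(total_energy_j, duration_s, idle_power_w):
    """Energy attributable to the command alone, in joules."""
    idle_energy_j = idle_power_w * duration_s  # what idling would have cost
    return total_energy_j - idle_energy_j


# Illustrative values: 63 J measured over 30 s of idle, then a command
# that consumed 40 J in 8 s.
idle_power_w = 63.0 / 30.0
command_only_j = net_energy_j(total_energy_j=40.0, duration_s=8.0,
                              idle_power_w=idle_power_w)
print(f"idle power: {idle_power_w:.2f} W")
print(f"energy of the command alone: {command_only_j:.2f} J")
```

A result close to zero, or even negative, means that the command is too short or too light to stand out from the idle noise; in that case use a larger file or more repetitions.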

Next step#

What next?

If it was your first lab, the next lab is the Wattmeter lab, where you will perform measurements at the level of the whole computer.

If it is your second lab,

Share your results

Do not forget to upload the Questionnaire file to Moodle at the end of the two labs on Wattmeter AND Perf (one file only).

In the next lab, we will see a new tool, Joular-JX, which can measure the consumption of a Java activity and find the most energy-consuming methods and classes.