HPDA – High Performance Data Analytics track

Master of computer science

Research project

During the master, students learn research by doing research. Over the two years of the program, each student will thus spend one to two days per week in a research group, working on research projects with professors and PhD students of IP Paris.

Research project schedule

When | Where | What
21/09/2021 at 10h00 | Telecom Paris, Amphi 7 and online | Presentation of project proposals
28/09/2021 | TBD | Project starts
10/02/2022 | TBD | Mid-project review (for M1 students) or final evaluation (for M2 students)
18/02/2022 | (by email) | Deadline for submitting the research report
16/06/2022 | TBD | Project evaluation (for M1 students)

Project Evaluation

Research projects will be evaluated based on a research report and a presentation.

Research report

The report is expected to be a 5-to-8-page research paper formatted using the IEEE conference style. It should present the context of the work, its contribution, and its positioning with respect to related work.

Both M1 and M2 students are expected to send their research report by email before Friday, 18/02/2022. M1 students are expected to update their report and resubmit it in June.

Project defense

The project defense is a 15-minute presentation of the research work, followed by 5 to 10 minutes of questions. The presentation must explain the context and problem statement of the work, and describe the contribution of the conducted research project.

Proposed projects

The following project proposals may be shared among several master's tracks, including DataAI, HPDA, PDS, and Cybersecurity.

Id Title Advisor Description
1 Energy consumption of Machine Learning applications François Trahay Training a machine learning model usually requires significant computing power. The goal of this project is to measure and model the energy consumption of several machine learning models running on various hardware resources (e.g., a powerful GPU, a CPU, or an edge computing cluster).
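As an illustration of one possible measurement approach, the sketch below samples the Intel RAPL energy counters that Linux exposes under /sys/class/powercap; the counter paths and the sampling strategy are assumptions about the target machine, not part of the project statement:

```python
import time

# Assumed RAPL sysfs paths (Linux with an Intel CPU); adjust per machine.
RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"
RAPL_MAX = "/sys/class/powercap/intel-rapl:0/max_energy_range_uj"

def read_uj(path):
    """Read a RAPL counter value, in microjoules."""
    with open(path) as f:
        return int(f.read())

def energy_delta_uj(start, end, max_range):
    """Energy consumed between two counter readings, handling counter wraparound."""
    return end - start if end >= start else max_range - start + end

def measure(workload):
    """Return (energy in joules, wall time in seconds) for a callable workload."""
    max_range = read_uj(RAPL_MAX)
    t0, e0 = time.monotonic(), read_uj(RAPL_ENERGY)
    workload()
    t1, e1 = time.monotonic(), read_uj(RAPL_ENERGY)
    return energy_delta_uj(e0, e1, max_range) / 1e6, t1 - t0
```

A real study would repeat such measurements per hardware platform and model, and fit a model (e.g., energy as a function of training time and device utilization) to the samples.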
2 Analyzing the performance of distributed Machine Learning applications François Trahay Performance analysis tools for parallel applications have been studied for decades. The current convergence between HPC and AI changes the software stack and requires a redesign of performance analysis tools. The goal of this project is to develop new performance analysis techniques for tracing the behavior of ML applications that are distributed with MPI.
3 Profiling NVRAM François Trahay Non-volatile memory (NVRAM) is a new type of memory that provides persistent storage but degrades memory performance. Memory performance thus becomes heterogeneous (RAM vs. NVRAM, local NUMA node vs. remote NUMA node), and allocating memory on a particular memory node may impact the performance of applications. The goal of this project is to collect memory access traces with NumaMMA, to analyze them, and to propose memory allocation policies that improve application performance.
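To make the policy idea concrete, here is a minimal sketch of a trace-driven placement decision: map each page to the NUMA node that accessed it most often. The (page, node) event format is a simplification; NumaMMA's actual trace format differs and would need a real parser:

```python
from collections import defaultdict

def preferred_nodes(accesses):
    """accesses: iterable of (page, node) memory-access events.

    Returns a {page: node} map assigning each page to the node that touched
    it most often, i.e. a simple 'place the page near its most frequent
    accessor' allocation policy."""
    counts = defaultdict(lambda: defaultdict(int))
    for page, node in accesses:
        counts[page][node] += 1
    return {page: max(per_node, key=per_node.get)
            for page, per_node in counts.items()}
```

Such a mapping could then drive allocation hints (e.g., via libnuma) when the application is re-run.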
4 PlanMosquitto: enhancing the Eclipse Mosquitto message broker with AI planning capabilities Georgios Bouloukakis, Denis Conan, Eric Lallet Project Description
5 Analysis, dimensioning and performance evaluation of a Solar-Powered Data Center Omar Hammami TODO
6 HPC for Mobile Robotics and Autonomous Vehicles Omar Hammami TODO
7 Large Scale Network on Chip (NOC) Synthesis for Manycore on 3D ASIC Omar Hammami TODO
8 A garbage collector for disaggregated memory Yohan Pipereau, Mathieu Bacou, Gaël Thomas Description
9 Co-localizing data and tasks in an NVMM-based distributed data analytics framework Anatole Lefort, Pierre Sutra, Gaël Thomas TODO
10 Degradable concurrent and distributed data structures Pierre Sutra, Gaël Thomas TODO
11 Rethinking the hardware interfaces and the operating system principles with a 128-bits address space Mathieu Bacou, Gaël Thomas TODO
12 Formal verification of operating system components Xavier Rival, Julia Lawall, Gilles Muller Verifying an operating system is a very important objective for providing guarantees about all the computations performed by a machine, but it is particularly difficult and has so far only been addressed with purely deductive techniques (as in the seL4 or Certikos projects). We propose to use a hybrid approach based on the manual formalization of specifications and the automatic verification of the invariants they express. This work will rely on formal tools such as separation logic. Time permitting, we will also consider deploying automatic verification tools developed in the INRIA Antique and Whisper teams.
13 Configuration of the Nest Linux kernel thread scheduler Julia Lawall, Gilles Muller Nest is a thread scheduler being developed in the Whisper team at Inria Paris with the goal of better exploiting the frequency scaling and turbo modes of modern servers. Nest is implemented within the source code of the Linux kernel. It has been evaluated on a number of benchmark suites such as the DaCapo benchmark suite of Java applications, the NAS benchmark suite of high performance computing kernels, and the Phoronix multicore benchmarks suite. The implementation of Nest relies on a number of parameters. The goal of this project is to evaluate the influence of the values of these parameters on the performance of Nest obtained on the various benchmarks. The project will involve modifying the Linux kernel source code as well as creating scripts to set up the experiments and manage the results.
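A first step could be a small sweep harness like the sketch below; note that the parameter names and the run_benchmark stub are illustrative placeholders, not Nest's actual tunables:

```python
import itertools

# Hypothetical parameter grid: these names are placeholders, not the real
# Nest scheduler parameters.
GRID = {
    "spin_threshold_us": [50, 100, 200],
    "nest_size": [4, 8],
}

def sweep(run_benchmark):
    """Run the benchmark once per parameter combination and collect results.

    run_benchmark takes a {name: value} dict (e.g. written to a config file
    or kernel boot parameters in the real setup) and returns a performance
    number such as the benchmark runtime."""
    results = {}
    for combo in itertools.product(*GRID.values()):
        params = dict(zip(GRID, combo))
        results[frozenset(params.items())] = run_benchmark(params)
    return results
```

In the real project, run_benchmark would rebuild or reconfigure the kernel, reboot or reload the scheduler, and launch the DaCapo, NAS, or Phoronix workloads.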
14 Towards automatic bug fix backporting in the Linux kernel Julia Lawall, Gilles Muller Promptly fixing bugs at the OS level is critical to the security of any system. The Linux kernel accepts bug fixes to the mainline (the latest commits) and then adapts them so that they apply to older, so-called "stable", versions. As the Linux kernel is very large (over 20 MLOC) and receives over 10,000 commits per release, this bug fix backporting is time-consuming. It is also error-prone, as reflected by some regressions in the stable kernel commit history. The Whisper team at Inria Paris is investigating how to automate this task, building on our previous work on Coccinelle for automating transformations in the Linux kernel and Prequel for finding commits having particular properties in the commit history. The goal of this project is to analyze the changes made by previous Linux kernel bug fix backports and determine what kinds of information would be required to make these changes automatically.
15 The serverless shell Pierre Sutra description
16 A tighter bound on the space complexity of consensus Pierre Sutra description
17 A Study of Decentralised Task Distribution via Agent-Based Simulations Ada Diaconescu Co-supervisors: Louisa Jane Di Felice (Univ. Copenhagen, DK); Patricia Mellodge (Univ. Hartford, USA)
The purpose of this project is to study an application of multi-scale feedback systems [1], [2] -- namely, decentralised task distribution -- and to assess the impact of the system's size, network topology and communication delays on the emerging system behaviour, performance and resilience.
The decentralised task distribution application consists of a set of agents that aim to distribute a number of tasks of different types amongst themselves, so as to reach a particular task distribution, given as input. The agents can be either static, connected via a communication network, or mobile, communicating upon opportunistic encounters. The task-distribution algorithm is decentralised, meaning that it only involves local inter-agent communication (i.e., no centralisation of the entire system knowledge is allowed).
The project will rely on several task-distribution algorithms that were developed previously, based on the NetLogo multi-agent simulation platform (https://ccl.northwestern.edu/netlogo - which can be learned within a few days if you have already programmed in, e.g., Python, C, or Java).
References
[1] A. Diaconescu, L. J. Di Felice, P. Mellodge, 'An Information-oriented View of Multi-Scale Systems', SISSY 2021, with IEEE ACSOS 2021, Washington (DC), USA (virtual), 27 Sept - 1 Oct 2021
[2] A. Diaconescu, L.J. Di Felice, P. Mellodge, 'Exogenous Coordination in Multi-Scale Systems: How Information Flows and Timing Affect System Properties', FGCS, Vol 114, January 2021, pp 403-426
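As a toy illustration of the decentralised principle (not one of the NetLogo algorithms used in the project), the sketch below balances a single task type over static agents arranged in a ring, each agent using only its own load and one neighbour's:

```python
def balance_step(loads):
    """One round of local balancing on a ring of agents.

    Each agent i compares its load with its right neighbour and transfers
    half of the difference (a negative transfer means i pulls tasks).
    Only pairwise-local information is used; applying updates in agent
    order is a simplification of truly concurrent exchanges."""
    new = loads[:]
    n = len(new)
    for i in range(n):
        j = (i + 1) % n
        give = (new[i] - new[j]) // 2
        new[i] -= give
        new[j] += give
    return new
```

Repeating such rounds conserves the total number of tasks while driving the system toward the uniform distribution, which is the kind of emergent behaviour the project studies at scale.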
18 Hybrid Relaxed Concurrent Data Structures Petr Kuznetsov, Armando Castaneda Description
19 Cryptocurrencies in Decentralized Trust Systems Petr Kuznetsov Recently, the idea of trust assumptions in a distributed system was explored in a completely new way. This line of work started when cryptocurrency systems proposed to encompass users who do not necessarily hold the same assumptions about whom to trust. In this project, we intend to build a relaxed cryptocurrency system on top of our recently proposed broadcast algorithm (https://arxiv.org/abs/2109.08611).
20 Fast and Optimally Resilient Byzantine Consensus Petr Kuznetsov The goal is to check whether the recently proposed consensus algorithm yields performance benefits and, if so, in which scenarios.
https://arxiv.org/abs/2102.12825
21 Lattice Agreement in Practice Petr Kuznetsov In theory, Lattice Agreement (LA) can be seen as a great alternative to consensus. Its relaxed consistency level enables efficient implementations with provable worst-case latencies and, potentially, may lead to efficient implementations of a large class of objects, including cryptocurrencies and CRDTs. The goal of the project is to study several recently proposed LA algorithms from a practical perspective, determining which ones perform better in realistic scenarios.
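The key safety property of LA (outputs must be pairwise comparable in the lattice) can be illustrated with a toy construction over the set-union lattice; this is a didactic sketch, not one of the algorithms to be studied. If every process outputs the join of a prefix of a common proposal sequence, all outputs are totally ordered by inclusion:

```python
def la_outputs(proposals, prefix_lengths):
    """Toy lattice-agreement outputs over the set-union lattice.

    Process i outputs the join (set union) of the first prefix_lengths[i]
    proposals. Joins of prefixes of a common sequence are always
    comparable, so the LA safety property holds by construction."""
    outputs = []
    for k in prefix_lengths:
        joined = set()
        for p in proposals[:k]:
            joined |= p
        outputs.append(joined)
    return outputs

def comparable(a, b):
    """True when one set contains the other (the lattice order here)."""
    return a <= b or b <= a
```

Real LA algorithms achieve this comparability without a shared sequence, which is exactly where their practical performance differs.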
22 Concurrent Data Structures for Big Data Streaming Albert Bifet, Panagiota Fatourou, Petr Kuznetsov https://perso.telecom-paristech.fr/kuznetso/projects/Streaming/concur-youla.pdf
23 Distributed processing of the global graph of public software development Stefano Zacchiroli Software Heritage is the largest public archive of software source code, together with its development history, consisting of ~10 billion source code files and ~2 billion commits. Its data model is a giant graph (a Merkle DAG) formed by ~20 billion nodes and ~200 billion edges. Current processing of this graph is entirely scale-up, based on a compressed representation of its adjacency list. The purpose of this internship is to experiment with distributed processing of the graph. We will partition the graph according to different heuristics (e.g., BFS visit order, Layered Label Propagation ordering, modular decomposition, etc.) and evaluate how the classic graph algorithms currently used at Software Heritage behave on each partitioning.
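For instance, the BFS-visit-order heuristic can be sketched as follows on a toy adjacency-dict graph; the real graph would be read from the Software Heritage dataset, which this sketch does not attempt:

```python
from collections import deque

def bfs_order(graph, roots):
    """Nodes in BFS visit order from the given roots.

    graph: {node: [successor, ...]} adjacency dict (a toy stand-in for the
    compressed Merkle DAG representation)."""
    seen, order, queue = set(roots), [], deque(roots)
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order

def partition(order, k):
    """Split an ordering into k contiguous, roughly equal-sized parts,
    so that BFS-adjacent nodes tend to land in the same partition."""
    size = -(-len(order) // k)  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]
```

Comparing such orderings (BFS, Layered Label Propagation, modular decomposition) by the edge-cut they induce would be a natural first experiment.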
24 Language and infrastructure for analyzing the archive (internship) Stefano Zacchiroli Description