HPDA – High Performance Data Analytics track

Master of computer science

Research project

During the master, students learn research by doing research. Over the two years of the program, each student will thus spend one to two days per week in a research group, working on research projects with professors and PhD students of IP Paris.

Research project schedule

When | Where | What
21/09/2021 at 10h00 | Telecom Paris, Amphi 7 and online | Presentation of project proposals
28/09/2021 | TBD | Project starts
10/02/2022 | TBD | Mid-project review (for M1 students) or final evaluation (for M2 students)
18/02/2022 | (by email) | Deadline for submitting the research report
16/06/2022 | TBD | Project evaluation (for M1 students)

Project Evaluation

Research projects will be evaluated based on a research report and a presentation.

Research report

The report is expected to be a 5-to-8-page research paper formatted using the IEEE conference style. It should present the context of the work, its contribution, and its positioning with respect to related work.

Both M1 and M2 students are expected to send their research report by email before Friday, 18/02/2022. M1 students are expected to update their report and resubmit it in June.

Project defense

The project defense is a 15-minute presentation of the research work, followed by 5 to 10 minutes of questions. The presentation must explain the context and problem statement of the work, and describe the contribution of the conducted research project.

Proposed projects

The following project proposals may be shared among several master's tracks, including DataAI, HPDA, PDS, and Cybersecurity.

Id Title Advisor Description
1 Energy consumption of Machine Learning applications François Trahay Training a machine learning model usually requires significant computing power. The goal of this project is to measure and model the energy consumption of several machine learning models running on various hardware resources (e.g., a powerful GPU, a CPU, or an edge computing cluster).
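As an illustration of one possible measurement approach, the sketch below samples the Intel RAPL energy counters that Linux exposes under /sys/class/powercap; the counter paths and the sampling strategy are assumptions about the target machine, not part of the project statement:

```python
import time

# Assumed RAPL sysfs paths (Linux with an Intel CPU); adjust per machine.
RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"
RAPL_MAX = "/sys/class/powercap/intel-rapl:0/max_energy_range_uj"

def read_uj(path):
    """Read a RAPL counter value, in microjoules."""
    with open(path) as f:
        return int(f.read())

def energy_delta_uj(start, end, max_range):
    """Energy consumed between two counter readings, handling counter wraparound."""
    return end - start if end >= start else max_range - start + end

def measure(workload):
    """Return (energy in joules, wall time in seconds) for a callable workload."""
    max_range = read_uj(RAPL_MAX)
    t0, e0 = time.monotonic(), read_uj(RAPL_ENERGY)
    workload()
    t1, e1 = time.monotonic(), read_uj(RAPL_ENERGY)
    return energy_delta_uj(e0, e1, max_range) / 1e6, t1 - t0
```

A real study would repeat such measurements per hardware platform and model, and fit a model (e.g., energy as a function of training time and device utilization) to the samples.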
2 Analyzing the performance of distributed Machine Learning applications François Trahay Performance analysis tools for parallel applications have been studied for decades. The current convergence between HPC and AI changes the software stack and requires a redesign of performance analysis tools. The goal of this project is to develop new performance analysis techniques for tracing the behavior of ML applications that are distributed with MPI.
3 Profiling NVRAM François Trahay Non-volatile memory (NVRAM) is a new type of memory that provides persistent storage but degrades memory performance. Memory performance thus becomes heterogeneous (RAM vs. NVRAM, local NUMA node vs. remote NUMA node), and allocating memory on a particular memory node may impact the performance of applications. The goal of this project is to collect memory access traces with NumaMMA, to analyze them, and to propose memory allocation policies that improve application performance.
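To make the policy idea concrete, here is a minimal sketch of a trace-driven placement decision: map each page to the NUMA node that accessed it most often. The (page, node) event format is a simplification; NumaMMA's actual trace format differs and would need a real parser:

```python
from collections import defaultdict

def preferred_nodes(accesses):
    """accesses: iterable of (page, node) memory-access events.

    Returns a {page: node} map assigning each page to the node that touched
    it most often, i.e. a simple 'place the page near its most frequent
    accessor' allocation policy."""
    counts = defaultdict(lambda: defaultdict(int))
    for page, node in accesses:
        counts[page][node] += 1
    return {page: max(per_node, key=per_node.get)
            for page, per_node in counts.items()}
```

Such a mapping could then drive allocation hints (e.g., via libnuma) when the application is re-run.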
4 PlanMosquitto: enhancing the Eclipse Mosquitto message broker with AI planning capabilities Georgios Bouloukakis, Denis Conan, Eric Lallet Project Description
5 Analysis, dimensioning and performance evaluation of a Solar-Powered Data Center Omar Hammami TODO
6 HPC for Mobile Robotics and Autonomous Vehicles Omar Hammami TODO
7 Large Scale Network on Chip (NOC) Synthesis for Manycore on 3D ASIC Omar Hammami TODO
8 A garbage collector for disaggregated memory Yohan Pipereau, Mathieu Bacou, Gaël Thomas Description
9 Co-localizing data and tasks in an NVMM-based distributed data analytics framework Anatole Lefort, Pierre Sutra, Gaël Thomas TODO
10 Degradable concurrent and distributed data structures Pierre Sutra, Gaël Thomas TODO
11 Rethinking the hardware interfaces and the operating system principles with a 128-bits address space Mathieu Bacou, Gaël Thomas TODO
12 Formal verification of operating system components Xavier Rival, Julia Lawall, Gilles Muller Verifying an operating system is a very important objective for providing guarantees about all the computations performed by a machine, but it is particularly difficult and has so far only been addressed with purely deductive techniques (as in the seL4 or Certikos projects). We propose to use a hybrid approach based on the manual formalization of specifications and the automatic verification of the invariants they express. This work will rely on formal tools such as separation logic. Time permitting, we will also consider deploying automatic verification tools developed in the INRIA Antique and Whisper teams.
13 Configuration of the Nest Linux kernel thread scheduler Julia Lawall, Gilles Muller Nest is a thread scheduler being developed in the Whisper team at Inria Paris with the goal of better exploiting the frequency scaling and turbo modes of modern servers. Nest is implemented within the source code of the Linux kernel. It has been evaluated on a number of benchmark suites such as the DaCapo benchmark suite of Java applications, the NAS benchmark suite of high performance computing kernels, and the Phoronix multicore benchmarks suite. The implementation of Nest relies on a number of parameters. The goal of this project is to evaluate the influence of the values of these parameters on the performance of Nest obtained on the various benchmarks. The project will involve modifying the Linux kernel source code as well as creating scripts to set up the experiments and manage the results.
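A first step could be a small sweep harness like the sketch below; note that the parameter names and the run_benchmark stub are illustrative placeholders, not Nest's actual tunables:

```python
import itertools

# Hypothetical parameter grid: these names are placeholders, not the real
# Nest scheduler parameters.
GRID = {
    "spin_threshold_us": [50, 100, 200],
    "nest_size": [4, 8],
}

def sweep(run_benchmark):
    """Run the benchmark once per parameter combination and collect results.

    run_benchmark takes a {name: value} dict (e.g. written to a config file
    or kernel boot parameters in the real setup) and returns a performance
    number such as the benchmark runtime."""
    results = {}
    for combo in itertools.product(*GRID.values()):
        params = dict(zip(GRID, combo))
        results[frozenset(params.items())] = run_benchmark(params)
    return results
```

In the real project, run_benchmark would rebuild or reconfigure the kernel, reboot or reload the scheduler, and launch the DaCapo, NAS, or Phoronix workloads.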
14 Towards automatic bug fix backporting in the Linux kernel Julia Lawall, Gilles Muller Promptly fixing bugs at the OS level is critical to the security of any system. The Linux kernel accepts bug fixes to the mainline (the latest commits) and then adapts them so that they apply to older, so-called "stable", versions. As the Linux kernel is very large (over 20 MLOC) and receives over 10,000 commits per release, this bug fix backporting is time-consuming. It is also error-prone, as reflected by some regressions in the stable kernel commit history. The Whisper team at Inria Paris is investigating how to automate this task, building on our previous work on Coccinelle for automating transformations in the Linux kernel and Prequel for finding commits having particular properties in the commit history. The goal of this project is to analyze the changes made by previous Linux kernel bug fix backports and determine what kinds of information would be required to make these changes automatically.
15 The serverless shell Pierre Sutra description
16 A tighter bound on the space complexity of consensus Pierre Sutra description
17 A Study of Decentralised Task Distribution via Agent-Based Simulations Ada Diaconescu Co-supervisors: Louisa Jane Di Felice (Univ. Copenhagen, DK); Patricia Mellodge (Univ. Hartford, USA)
The purpose of this project is to study an application of multi-scale feedback systems [1], [2] -- namely, decentralised task distribution -- and to assess the impact of the system's size, network topology and communication delays on the emerging system behaviour, performance and resilience.
The decentralised task distribution application consists of a set of agents that aim to distribute a number of tasks of different types amongst themselves, so as to reach a particular task distribution, given as input. The agents can be either static, connected via a communication network, or mobile, communicating upon opportunistic encounters. The task-distribution algorithm is decentralised, meaning that it only involves local inter-agent communication (i.e., no centralisation of the entire system knowledge is allowed).
The project will rely on several task-distribution algorithms that were developed previously, based on the NetLogo multi-agent simulation platform (https://ccl.northwestern.edu/netlogo - which can be learned within a few days if you have already programmed in, e.g., Python, C, or Java).
References
[1] A. Diaconescu, L. J. Di Felice, P. Mellodge, 'An Information-oriented View of Multi-Scale Systems', SISSY 2021, with IEEE ACSOS 2021, Washington (DC), USA (virtual), 27 Sept - 1 Oct 2021
[2] A. Diaconescu, L.J. Di Felice, P. Mellodge, 'Exogenous Coordination in Multi-Scale Systems: How Information Flows and Timing Affect System Properties', FGCS, Vol 114, January 2021, pp 403-426
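As a toy illustration of the decentralised principle (not one of the NetLogo algorithms used in the project), the sketch below balances a single task type over static agents arranged in a ring, each agent using only its own load and one neighbour's:

```python
def balance_step(loads):
    """One round of local balancing on a ring of agents.

    Each agent i compares its load with its right neighbour and transfers
    half of the difference (a negative transfer means i pulls tasks).
    Only pairwise-local information is used; applying updates in agent
    order is a simplification of truly concurrent exchanges."""
    new = loads[:]
    n = len(new)
    for i in range(n):
        j = (i + 1) % n
        give = (new[i] - new[j]) // 2
        new[i] -= give
        new[j] += give
    return new
```

Repeating such rounds conserves the total number of tasks while driving the system toward the uniform distribution, which is the kind of emergent behaviour the project studies at scale.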
18 Hybrid Relaxed Concurrent Data Structures Petr Kuznetsov, Armando Castaneda Description
19 Cryptocurrencies in Decentralized Trust Systems Petr Kuznetsov Recently, the idea of trust assumptions in a distributed system was explored in a completely new way. This line of work started when cryptocurrency systems proposed to encompass users who do not necessarily hold the same assumptions about whom to trust. In this project, we intend to build a relaxed cryptocurrency system on top of our recently proposed broadcast algorithm (https://arxiv.org/abs/2109.08611).
20 Fast and Optimally Resilient Byzantine Consensus Petr Kuznetsov The goal is to check whether the recently proposed consensus algorithm yields performance benefits and, if so, in which scenarios.
https://arxiv.org/abs/2102.12825
21 Lattice Agreement in Practice Petr Kuznetsov In theory, Lattice Agreement (LA) can be seen as a great alternative to consensus. Its relaxed consistency level enables efficient implementations with provable worst-case latencies and, potentially, may lead to efficient implementations of a large class of objects, including cryptocurrencies and CRDTs. The goal of the project is to study several recently proposed LA algorithms from a practical perspective, determining which ones perform better in realistic scenarios.
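The key safety property of LA (outputs must be pairwise comparable in the lattice) can be illustrated with a toy construction over the set-union lattice; this is a didactic sketch, not one of the algorithms to be studied. If every process outputs the join of a prefix of a common proposal sequence, all outputs are totally ordered by inclusion:

```python
def la_outputs(proposals, prefix_lengths):
    """Toy lattice-agreement outputs over the set-union lattice.

    Process i outputs the join (set union) of the first prefix_lengths[i]
    proposals. Joins of prefixes of a common sequence are always
    comparable, so the LA safety property holds by construction."""
    outputs = []
    for k in prefix_lengths:
        joined = set()
        for p in proposals[:k]:
            joined |= p
        outputs.append(joined)
    return outputs

def comparable(a, b):
    """True when one set contains the other (the lattice order here)."""
    return a <= b or b <= a
```

Real LA algorithms achieve this comparability without a shared sequence, which is exactly where their practical performance differs.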
22 Concurrent Data Structures for Big Data Streaming Albert Bifet, Panagiota Fatourou, Petr Kuznetsov https://perso.telecom-paristech.fr/kuznetso/projects/Streaming/concur-youla.pdf
23 Distributed processing of the global graph of public software development Stefano Zacchiroli Software Heritage is the largest public archive of software source code, together with its development history, consisting of ~10 billion source code files and ~2 billion commits. Its data model is a giant graph (a Merkle DAG) formed by ~20 billion nodes and ~200 billion edges. Current processing of this graph is entirely scale-up, based on a compressed representation of its adjacency list. The purpose of this internship is to experiment with distributed processing of the graph. We will partition the graph according to different heuristics (e.g., BFS visit order, Layered Label Propagation ordering, modular decomposition, etc.) and evaluate how the classic graph algorithms currently used at Software Heritage behave on each partitioning.
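For instance, the BFS-visit-order heuristic can be sketched as follows on a toy adjacency-dict graph; the real graph would be read from the Software Heritage dataset, which this sketch does not attempt:

```python
from collections import deque

def bfs_order(graph, roots):
    """Nodes in BFS visit order from the given roots.

    graph: {node: [successor, ...]} adjacency dict (a toy stand-in for the
    compressed Merkle DAG representation)."""
    seen, order, queue = set(roots), [], deque(roots)
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order

def partition(order, k):
    """Split an ordering into k contiguous, roughly equal-sized parts,
    so that BFS-adjacent nodes tend to land in the same partition."""
    size = -(-len(order) // k)  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]
```

Comparing such orderings (BFS, Layered Label Propagation, modular decomposition) by the edge-cut they induce would be a natural first experiment.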
24 Language and infrastructure for analyzing the archive (internship) Stefano Zacchiroli Description