HPDA – High Performance Data Analytics track

Master of computer science

HPDA curriculum

The HPDA track is a two-year program with mandatory courses on core topics (operating systems, parallel programming) and optional courses that target either HPC applications (numerical simulation and modeling, …) or AI applications (machine learning, data mining, …). In addition to a few mandatory courses, students are free to select courses based on their preferred topics.

Since the courses are provided by several schools (Ecole Polytechnique, ENSTA, Télécom Paris, and Télécom SudParis) whose teaching periods may differ, we advise students to choose courses from Polytechnique during M1, and courses from ENSTA, Télécom Paris, or Télécom SudParis during M2.

M1 courses

The Master 1 schedule is based on Polytechnique's Master 1 HPC. Students must choose 60 ECTS from the following courses.

  • First period (September-December)
    • Mandatory courses
      • INF553: Database management systems (Ioana Manolescu, 5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=INF553

        Course content
        • Data modeling: entity-relationship model, relational model
        • Relational algebra, relational calculus
        • The query language for relational databases: SQL
        • Quality of relational schemas, normal forms
        • Relational database subsystems: disks, files, buffers
        • Indexing in databases: tree structures, table structures
        • Evaluation of relational operators
        • Optimization of SQL queries
        • Brief introduction to NoSQL databases

      • INF559: A Programmer’s Introduction to Computer Architectures and Operating Systems (Francesco Zappa Nardelli, Timothy Bourke and Théophile Bastian, 5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=INF559

        We will explain the enduring concepts underlying all computer systems, and show the concrete ways that these ideas affect the correctness, performance, and utility of any application program.

        This course serves as an introduction for students who go on to implement systems hardware and software. It also pushes students towards becoming the rare programmers who know how things work and how to fix them when they break.

        This course will cover most of the key interfaces between user programs and the bare hardware, including:

        • The representation and manipulation of information
        • Machine-level representation of programs
        • Processor architecture
        • The memory hierarchy
        • Exceptional control flow
        • Virtual memory

      • M1 HPDA Project: M1 HPDA research projects (20 ECTS)

        Web page: ?page=../common/research-projects-2021-2022

        Throughout the master, students learn research by doing research. During the two years of the master, each student spends one to two days each week in a research group, working on research projects with professors and PhD students of IP Paris.

      • M1 Seminar: M1 Seminar (5 ECTS)

        Web page: https://www.inf.telecom-sudparis.eu/pds/seminars/
        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=M1 Seminar

        The seminar consists of presentations of ongoing research, given both by students (on conference or journal papers) and by professors from IP Paris and other universities.

      • FLE1: French courses for foreign students (M1/S1) (Nicoline Lagel, 2.5 ECTS)
        (for foreign students who do not have a B2 proficiency certificate)
    • Non-mandatory courses
      • INF552: Data visualisation (Emmanuel Pietriga, 5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=INF552

        The visual representation of data takes full advantage of the human visual system in terms of perception and cognition. Elaborate patterns, interesting data points and outliers can easily be identified, and individual data points and sets can efficiently be compared and contrasted, provided that the data is properly represented. Visualization enables users to explore their data interactively, to get overviews and drill down to detailed views, following processes that yield insights that would be difficult to obtain using fully automated data analysis techniques from fields such as data mining or machine learning. Visualization and automated analysis serve different purposes, but can complement one another very effectively. Visualization can for instance help formulate hypotheses that can then be tested using statistical tests or other elaborate data analysis techniques. Beyond these exploratory aspects, data visualization can also support decision making, and plays a central role in the communication of findings to a wide range of audiences.

        This course first gives an overview of the field of data visualization. It then discusses fundamental principles of human visual perception, focusing on how they help inform the design of visualizations. The following sessions focus on visualization techniques for specific data structures, and discuss them in depth from both design and implementation perspectives, including: multi-variate data, hierarchical structures, networks, time-series, statistical data and geographical data.

        All exercises are based on Web technologies, including the D3 software library (Data-Driven Documents) and the Vega-Lite interactive graphics grammar. While positioned at different levels of abstraction, both enable developers to create a wide range of interactive, Web-based visualizations that run on a variety of platforms, ranging from desktop workstations to mobile devices.

        Requirements: some prior experience with Web-based development (JavaScript) is a plus, but not a hard requirement.

        More information, including COVID-19 organization, is available at https://www.enseignement.polytechnique.fr/informatique/INF552/

        Course material: http://www.enseignement.polytechnique.fr/informatique/INF552/

        Language: The course material is in English. Lectures can be taught either in French or in English, at the students' convenience.

      • INF554: Machine learning introduction (Michalis Vazirgiannis and Jesse Read, 5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=INF554

        Course syllabus:

        • General Introduction to Machine Learning
          • Machine Learning paradigms
          • The Machine Learning Pipeline
        • Supervised Learning
          • Generative and non-generative methods
          • Naive Bayes, KNN and regressions
          • Tree-based methods
        • Unsupervised Learning
          • Dimensionality reduction
          • Clustering
        • Advanced Machine Learning Concepts
          • Regularization
          • Model selection
          • Feature selection
          • Ensemble Methods
        • Kernels
          • Introduction to kernels
          • Support Vector Machines
        • Neural Networks
          • Introduction to Neural Networks
          • Perceptrons and back-propagation
        • Deep Learning I
          • Convolutional Neural Networks
          • Recurrent Neural Networks
          • Applications
        • Deep Learning II
          • Modern Natural Language Processing
          • Unsupervised Deep Learning
          • Embeddings, Auto-Encoders, Generative Adversarial Networks
        • Machine & Deep Learning for Graphs
          • Graph Similarity
          • Graph Kernels
          • Node Embeddings

      • INF571: Fundamentals in distributed computing 1 (Bernadette Charron-Bost, 5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=INF571

        Distributed systems are composed of several computational units, classically called processes, that run concurrently and independently, without any central control. Additional difficulties are introduced by asynchrony (processes and channels operate at different speeds) and by limited local knowledge (each process has only a local view of the system and has a limited amount of information).

        Distributed algorithms are algorithms designed to run in this quite challenging setting. They arise in a wide range of applications, including telecommunications, internet, peer-to-peer computing, blockchain technology...

        This course aims to give a comprehensive introduction to the field of distributed algorithms. A collection of significant algorithms will be presented for asynchronous networked systems, with a particular emphasis on their correctness proofs. Algorithms will be analyzed according to various measures of interest (e.g., time and space complexities, communication costs). We will also present some "negative" results, i.e., impossibility theorems and lower bounds, as they help a system designer determine which problems are solvable and at what cost.

        Content:

        • Modelling of distributed networked systems
        • Wave and traversal algorithms
        • Leader election
        • Logical time and global snapshots
        • Detection of stable properties
        • Synchronizers
        • Link reversal algorithms

      • MAP553: Foundation of Machine Learning (Erwan Le Pennec, 5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=MAP553

        Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to learn from data. A major focus of machine learning is to automatically learn complex patterns and to make intelligent decisions based on them.

        This course focuses on the methodology underlying supervised and unsupervised learning, with a particular emphasis on the mathematical formulation of algorithms and the way they can be implemented and used in practice. We will therefore describe some necessary tools from optimization theory, and explain how to use them for machine learning. A glimpse of theoretical guarantees, such as upper bounds on the generalization error, is provided during the last lecture.

        The methodology will be the main concern of the lectures while some proofs will be done during the PCs. Practice will be done through a challenge.

  • Second period (January-March)
    • Mandatory courses
      • INF560: High performance runtimes (Patrick Carribault, 5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=INF560

        With the advent of multicore processors (and now many-core processors with several dozens of execution units), expressing parallelism is mandatory to enable high performance on different kinds of applications (scientific computing, big data...). In this context, this course details multiple parallel programming paradigms that help exploit such a large number of cores on different target architectures (regular CPUs and GPUs). It covers the distributed-memory model (MPI), the shared-memory model (OpenMP) and the heterogeneous model (CUDA). Together, these approaches make it possible to leverage the performance of different computers, from small servers to the large supercomputers listed in the Top500.

      • INF583: Systems for big data (Angelos Anadiotis, 5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=INF583

        This course covers the design principles and algorithmic foundations of influential software systems for Big Data analytics. The course begins with the design of large enterprise data warehouses, online analytical processing, and data mining over data warehouses. The course then examines fundamental architectural changes to scale data processing and analysis to a shared-nothing compute cluster, including parallel databases, MapReduce, column stores, and the support of batch processing, stream processing, iterative algorithms, machine learning, and interactive analytics in this new context.

      • M1 HPDA Project: M1 HPDA research projects (20 ECTS)

        Web page: ?page=../common/research-projects-2021-2022

        Throughout the master, students learn research by doing research. During the two years of the master, each student spends one to two days each week in a research group, working on research projects with professors and PhD students of IP Paris.

      • M1 Seminar: M1 Seminar (5 ECTS)

        Web page: https://www.inf.telecom-sudparis.eu/pds/seminars/
        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=M1 Seminar

        The seminar consists of presentations of ongoing research, given both by students (on conference or journal papers) and by professors from IP Paris and other universities.

      • FLE2: French courses for foreign students (M1/S2) (Nicoline Lagel, 2.5 ECTS)
        (for foreign students who do not have a B2 proficiency certificate)
    • Non-mandatory courses
      • INF564: Compilation (Jean-Christophe Filliatre and Georges-Axel Jaloyan, 5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=INF564

        This course is an introduction to compilation. It explains the techniques and tools used in the different phases of a compiler, up to the production of optimized assembly code. A compiler from a fragment of the C language to x86-64 assembly is built during the lab sessions (TD).

      • MAP584: Effective implementation of the finite element method (François Alouges, Aline Lefebvre and Flore Nabet, 5 ECTS)
      • MAP583: Deep Learning do it Yourself (Marc Lelarge, 5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=MAP583

        Recent developments in neural network approaches (now better known as “deep learning”) have dramatically changed the landscape of several research fields such as image classification, object detection, speech recognition, machine translation, self-driving cars and many more. Due to its promise of leveraging large (sometimes even small) amounts of data in an end-to-end manner, i.e. training a model to extract features by itself and to learn from them, deep learning is increasingly appealing to other fields as well: medicine, time series analysis, biology, simulation...

        This course is a deep dive into practical details of deep learning architectures, in which we attempt to demystify deep learning and kick start you into using it in your own field of interest. During this course, you will gain a better understanding of the basis of deep learning and get familiar with its applications. We will show how to set up, train, debug and visualize your own neural network. Along the way, we will be providing practical engineering tricks for training or adapting neural networks to new tasks.

        By the end of this class, you will have an overview of the deep learning landscape and its applications to traditional fields, as well as some ideas for applying it to new ones. You should also be able to train a multi-million parameter deep neural network by yourself. For the implementations we will be using the PyTorch library in Python.

  • Third period (April-June)

Additionally, students can pick courses (up to 7.5 ECTS) from other master tracks (such as master PDS or master DataIA).

M2 courses

The Master 2 schedule mostly uses courses from ENSTA, Télécom Paris, and Télécom SudParis. Students can also choose courses from Polytechnique, but since the period dates differ, scheduling conflicts may arise.

Students must choose 30 ECTS from the following courses. The Master 2 program ends with a 6-month research internship.

  • First period (September-October)
  • Second period (November-February)
    • Mandatory courses
    • Non-mandatory courses
      • CSC5004: Cloud infrastructures (Pierre Sutra and Gaël Thomas, 5 ECTS)

        Web page: https://github.com/otrack/cloud-computing-infrastructures
        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=CSC5004

        This course presents cloud infrastructures in order to:

        • acquire an overview of Cloud computing (e.g., data centers, everything-as-a-service, on-demand computing, cloud economy model)
        • grasp the fundamental notions in Cloud computing (e.g., fault-tolerance, elasticity, scalability, load balancing)
        • understand how virtualization works (VM, container)
        • deconstruct and classify a distributed data store
        • recognize data consistency problems and know common solutions

        In detail, a student will learn how to:

        • deploy and maintain IaaS
        • construct base data storage services (e.g., key-value store, coordination kernels)
        • construct and deploy a micro-service architecture
        • reason about dependability and scalability

      • IA317: Large scale machine learning (Thomas Bonald, 2.5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=IA317

        This course addresses the problem of scaling up machine learning. The goal is to understand and learn to implement the main approaches for numerically solving supervised statistical learning problems. Several angles are covered: dimensionality reduction and feature selection, the use of suitable optimization algorithms, and the use of distributed computing tools that allow computations to run on a cluster.

      • IA307: GPU programming for learning (Goran Frehse and Élisabeth Brunet, 2 ECTS)

        Web page: https://sites.google.com/site/frehseg/teaching/ia307
        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=IA307

        The aim of this course is to give an overview of the algorithms used for neural networks and of their implementations in modern machine learning libraries. In particular, the use of specific hardware, such as graphics cards, to improve performance is at the heart of these libraries. It is important to understand how the calculations are shared between this hardware and the CPU.

      • AMS-X02: Advanced numerical methods and high-performance computing for simulating complex phenomena (Marc Massot, 5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=AMS-X02

        In a growing number of scientific and industrial applications, numerical simulation plays a key role in understanding and analyzing complex physical phenomena. It also makes it possible to predict the behavior of devices such as aeronautical combustion chambers, with a view to advanced design. The complexity of the systems and the size of multi-dimensional simulations make high-performance computing necessary. This course first presents the challenges that the modeling of complex systems poses for numerical methods and simulation, along with the state of the art in new computing architectures and parallel programming models. After recalling the basics of the numerical analysis of PDEs for multi-scale problems, we explore some advanced numerical methods designed to handle the stiffness present in these complex models while making the most of new computing architectures. These methods rely on an efficient combination of numerical analysis, modeling, and scientific computing. Hands-on sessions on machines, in connection with a regional computing center (mésocentre), will be offered.
        Content:
        • Mathematical modeling of complex multi-scale systems.
        • Definition of high-performance computing and overview of new computing architectures and parallel programming models.
        • Numerical analysis of multi-scale PDEs (domain decomposition, operator splitting...).
        • Presentation and analysis of advanced numerical methods (adaptive multiresolution and operator splitting with time/space adaptation, parareal algorithm, asymptotic-preserving methods,...).
        • Lab sessions on a parallel machine, with example simulation codes provided for each method.

      • ROB307: MPSOC Multiprocessors on chip (Omar Hammami, 2.5 ECTS)

        Calendar: http://www-inf.telecom-sudparis.eu/COURS/masteripparis/hpda/?page=curriculum&genics=ROB307

        The design of embedded systems produces complete systems whose software and hardware parts are inseparable and designed jointly. The resulting systems almost always end up residing on a single chip, hence the name systems on chip. Design methodologies for Systems on Chip (SoC) are an indispensable tool for an engineer who has to design an embedded system, in order to determine what the technology can offer to build the system under the specified constraints. The course introduces SoC design methodologies and their applications in industrial examples, with a focus on MPSoCs (Multiprocessor Systems on Chip) and NoCs (Networks on Chip).

  • Third period (March-August)
    • Mandatory courses
      • Internship: 6-month M2 research internship (30 ECTS)