CSC 5003 – Semantic Web And Big Data Architecture

Overview

The course CSC 5003 – Semantic Web And Big Data Architecture is a third-year course at an engineering school (Master 2 level), given at Télécom SudParis in the ASR track. At the end of the first part of this course, a student will be able to set up a big data architecture, in particular with tools from the Hadoop ecosystem. In detail, a student will know how to:
  • program in functional style using Scala
  • use the MapReduce framework to parallelize computations
  • explore and manipulate the Hadoop Distributed File System
  • process a data stream using Kafka and Spark Streaming
  • choose the right tools from the Hadoop ecosystem to solve a given problem

  • Topic
    Content
    Key Notions
  • CM1
    Introduction to Big Data Tools And Concepts
    • Big Data
    • Use cases
  • CI1
    Introduction to Scala
    • Scala
    • Functional programming
  • CI2
    Hadoop And MapReduce
    • Map Reduce Framework
    • HDFS
  • CI3
    Spark
    • YARN
    • RDD, DataFrame, Dataset
    • Spark higher-order functions
    • Persistence
    • Shared variables
  • CI4
    Kafka
    • Producer/Consumer
    • Kafka topics and partitioning
  • CI5
    Spark Streaming
    • XXX
  • Presentations
    Hadoop Ecosystem Student Presentations
    • Discover the tools of the Hadoop Ecosystem
  • CF
    Final Exam - Project Start
    • Written Exam
      – 1h30
    • Slides
      – 3h
    • Test your knowledge
    • Start the projects
CM: Cours Magistral (lecture only); CI: Cours Intégré (lecture and lab)