Overview
CSC 5003 – Semantic Web and Big Data Architecture
is a third-year course (Master 2 level) taught in the ASR track at
Télécom SudParis, an engineering school.
At the end of the first part of this course, a student will be able to set up a big data architecture, in particular with tools from the Hadoop ecosystem.
In detail, a student will know how to:
- program in functional style using Scala
- use the MapReduce framework to parallelize computations
- explore and manipulate the Hadoop Distributed File System
- process a data stream using Kafka and Spark Streaming
- choose the right tools from the Hadoop ecosystem to solve a given problem
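To make the first two objectives concrete, here is a minimal sketch of the MapReduce pattern written in a functional style. It is plain Python that runs on a single machine, not Hadoop code, and the course itself uses Scala. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, not part of any framework API.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold all values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values, 0)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big tools"]))
```

In a real cluster, the map and reduce phases run in parallel on different nodes and the framework performs the shuffle; this sketch only shows how the computation is decomposed.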

Topic | Content | Key Notions

CM1: Introduction to Big Data Tools and Concepts
- Big Data
- Use cases

CI1: Introduction to Scala
- Scala
- Functional programming

CI2: Hadoop and MapReduce
- Slides
- Exercises
- Optional Exercises (Part 1, 2, 3, Hadoop setup and Docker)
- MapReduce framework
- HDFS

CI3: Spark
- YARN
- RDD, DataFrame, Dataset
- Spark higher-order functions
- Persistence
- Shared variables

CI4: Kafka
- Producer/Consumer
- Kafka topics and partitioning

CI5: Spark Streaming
- XXX

Presentations: Hadoop Ecosystem Student Presentations
- Discover the tools of the Hadoop Ecosystem

CF: Final Exam - Project Start
- Written Exam
- Slides
- Test your knowledge
- Start the projects

CM: Cours Magistral (lecture only)
CI: Cours Intégré (lecture and lab)