Scalable Machine Learning on Big Data Using Apache Spark — Course Outline
Week 1: Introduction
This is an introduction to Apache Spark.
You'll learn how Apache Spark internally works and how to use it for data processing.
RDD, the low-level API, is introduced alongside the parallel and functional programming concepts it builds on.
Then, different types of data storage solutions are contrasted. Finally,
Apache Spark SQL, the Catalyst query optimizer, and the Tungsten execution engine are explained.
Week 2: Scaling Math for Statistics on Apache Spark
Apply basic statistical calculations using the Apache
Spark RDD API to experience first-hand how parallelization in Apache Spark works
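A conceptual sketch (plain Python, no Spark required) of how a statistic such as the mean is parallelized: each partition produces a partial (sum, count), and the partials are merged associatively. In Spark this is what `rdd.aggregate(zeroValue, seqOp, combOp)` does; the partition split below is a stand-in for RDD partitions.

```python
from functools import reduce

data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
partitions = [data[:3], data[3:]]  # stand-in for two RDD partitions

def seq_op(acc, x):
    # runs inside each partition: accumulate (running_sum, running_count)
    return (acc[0] + x, acc[1] + 1)

def comb_op(a, b):
    # merges results from different partitions — must be associative
    return (a[0] + b[0], a[1] + b[1])

partials = [reduce(seq_op, part, (0.0, 0)) for part in partitions]
total, count = reduce(comb_op, partials)
mean = total / count
```

Because `comb_op` is associative, partitions can be combined in any order, which is what lets Spark scale the computation across a cluster.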
Week 3: Introduction to Apache SparkML
Understand the concept of machine learning pipelines
and how Apache SparkML works programmatically
Week 4: Supervised and Unsupervised learning with SparkML
Apply supervised and unsupervised machine learning tasks using SparkML