Big Data Fundamentals Training
Section 1: The basics of working with big data
Understand the four V’s of Big Data (Volume, Velocity, Variety,
and Veracity); Build models for data; Understand the occurrence of rare events in random data.
Section 2: Web and social networks
Understand characteristics of the web and social networks;
Model social networks; Apply algorithms for community detection in networks.
Section 3: Clustering big data
Cluster social networks; Apply hierarchical clustering; Apply k-means clustering.
Section 4: Google web search
Understand the concept of PageRank; Implement the basic PageRank algorithm for strongly connected graphs;
Implement PageRank with taxation for graphs that are not strongly connected.
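As a rough illustration of PageRank with taxation, the sketch below iterates the rank update on a small made-up graph. The function name `pagerank`, the parameter `beta` (the fraction of rank that follows links, with the remaining 1 - beta shared equally among all nodes) and the handling of dead ends are assumptions chosen for this example, not the course's reference implementation.

```python
def pagerank(links, beta=0.85, iterations=50):
    """Iterative PageRank with taxation.

    links maps each node to the list of nodes it points to.
    beta is the assumed fraction of rank that follows links.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Taxation: every node receives an equal share of (1 - beta).
        new_rank = {node: (1.0 - beta) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split the taxed rank evenly among the out-links.
                share = beta * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dead end (assumption): spread its rank over all nodes
                # so that the total rank stays equal to 1.
                share = beta * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page graph; D has no in-links.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
```

In this toy graph, page D keeps only its taxation share because nothing links to it, while C collects rank from three pages.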
Section 5: Parallel and distributed computing using MapReduce
Understand the architecture for massive distributed and parallel computing;
Apply MapReduce using Hadoop; Compute PageRank using MapReduce.
Section 6: Computing similar documents in big data
Measure importance of words in a collection of documents;
Measure similarity of sets and documents; Apply locality-sensitive hashing to compute similar documents.
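To make the similarity objectives concrete, here is a minimal sketch of Jaccard similarity on character shingles, together with a minhash signature that estimates it. Minhashing is only the first stage of locality-sensitive hashing; the banding step is left out. The helper names, the shingle length `k`, and the use of seeded SHA-1 digests as hash functions are all assumptions made for this example.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-character shingles of a string."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| (taken as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(items, num_hashes=64):
    """One minimum hash value per seeded hash function (items must be non-empty)."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{item}".encode()).digest()[:8], "big")
            for item in items
        ))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


doc_a = shingles("big data fundamentals")
doc_b = shingles("big data fundamental")
exact = jaccard(doc_a, doc_b)
approx = estimate_similarity(minhash_signature(doc_a), minhash_signature(doc_b))
```

With more hash functions the estimate tends to get closer to the exact value, at the cost of longer signatures.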
Section 7: Products frequently bought together in stores
Understand the importance of frequent item sets; Design association rules; Implement the A-priori algorithm.
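The two-pass idea behind the A-priori algorithm can be sketched for frequent pairs as follows. The function name `frequent_pairs`, the `min_support` threshold, and the basket data are hypothetical and only meant to show the pruning step: a pair is counted only if both of its items are frequent on their own.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return the item pairs that appear in at least min_support baskets."""
    # Pass 1: count individual items (each item at most once per basket).
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, count in item_counts.items() if count >= min_support}

    # Pass 2: count only pairs made of items that survived pass 1.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: count for pair, count in pair_counts.items() if count >= min_support}


# Made-up shopping baskets for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
    ["bread", "milk", "beer"],
]
pairs = frequent_pairs(baskets, min_support=3)
```

Here every single item meets the threshold, but only the pair (bread, milk) appears in three baskets.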
Section 8: Movie and music recommendations
Understand the differences between recommendation systems; Design content-based recommendation systems;
Design collaborative filtering recommendation systems.
Section 9: Google's AdWords™ System
Understand the AdWords System; Analyse online algorithms in terms of competitive ratio; Use online matching to solve the AdWords problem.
Section 10: Mining rapidly arriving data streams
Understand types of queries for data streams; Analyse sampling methods for data streams;
Count distinct elements in data streams; Filter data streams.
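One common way to filter a data stream is a Bloom filter, which never rejects an item that was added but may occasionally let through one that was not. The sketch below is a simplified version under stated assumptions: the class name, the default sizes, and the seeded SHA-256 hash functions are illustrative choices, and one byte is used per bit for readability rather than efficiency.

```python
import hashlib


class BloomFilter:
    """Simplified Bloom filter: no false negatives, possible false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        # True if every position is set; may be a false positive.
        return all(self.bits[position] for position in self._positions(item))


# Hypothetical allow-list applied to a stream of addresses.
allowed = BloomFilter()
for address in ["alice@example.com", "bob@example.com"]:
    allowed.add(address)

stream = ["alice@example.com", "mallory@example.com", "bob@example.com"]
passed = [address for address in stream if allowed.might_contain(address)]
```

Every allow-listed address is guaranteed to pass; whether an unknown address is blocked depends on the filter size and the number of hash functions.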