Big Data Analytics Training 1
Section 1: Simple linear regression
Fit a simple linear regression between two variables in R; Interpret output from R; Use models to predict a response variable; Validate the assumptions of the model.
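As a minimal sketch of these four outcomes, the example below fits, summarises, predicts from and checks a simple linear regression. The built-in `cars` data set and the chosen speeds are assumptions made for illustration; they are not taken from the course materials.

```r
# Assumption: R's built-in `cars` data (stopping distance vs. speed)
# stands in for whatever data the course actually uses.
fit <- lm(dist ~ speed, data = cars)

# Coefficient estimates, standard errors, R-squared and residual summary.
print(summary(fit))

# Predict the response at new speeds, with 95% prediction intervals.
new_speeds <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_speeds, interval = "prediction")
print(pred)

# Standard diagnostic plots for checking the model assumptions:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage.
par(mfrow = c(2, 2))
plot(fit)
```

The diagnostic plots are one common way to check linearity, constant variance and normality of residuals; other checks may be equally appropriate.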
Section 2: Modelling data
Adapt the simple linear regression model in R to deal with multiple variables; Incorporate continuous and categorical variables in the models; Select the best-fitting model by inspecting the R output.
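One possible illustration of these outcomes is sketched below: a model with a continuous predictor is extended with a categorical one, and the two fits are compared. The `mtcars` data set and the variable choices are assumptions for illustration only.

```r
# Assumption: built-in `mtcars` data; `cyl` is treated as categorical.
cars_df <- transform(mtcars, cyl = factor(cyl))

# Model with one continuous predictor.
m1 <- lm(mpg ~ wt, data = cars_df)

# Model adding a categorical predictor (R creates dummy variables for cyl).
m2 <- lm(mpg ~ wt + cyl, data = cars_df)
print(summary(m2))

# Compare the nested models with an F-test and with AIC (lower is better).
print(anova(m1, m2))
aic_table <- AIC(m1, m2)
print(aic_table)
```

Adjusted R-squared in the `summary()` output is another quantity often inspected when choosing between such models.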
Section 3: Many models
Manipulate nested data frames in R; Use R to apply simultaneous linear models to large data frames by stratifying the data; Interpret the output of learner models.
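The following sketch shows one common way to fit a separate linear model per stratum using a nested data frame. It assumes the `dplyr`, `tidyr` and `purrr` packages are installed and uses the built-in `mtcars` data, stratified by `cyl`; neither the packages nor the data are confirmed by the outline itself.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Nest the data: one row per stratum, with that stratum's rows stored
# in a list-column called `data`.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    # Fit one linear model per stratum.
    model = map(data, ~ lm(mpg ~ wt, data = .x)),
    # Pull a few summary quantities out of each fitted model.
    slope = map_dbl(model, ~ coef(.x)[["wt"]]),
    r_squared = map_dbl(model, ~ summary(.x)$r.squared)
  )

print(by_cyl)
```

Keeping the models in a list-column next to their data makes it straightforward to compare slopes or fit quality across strata.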
Section 4: Classification
Adapt linear models to handle a categorical response variable; Implement logistic regression (LR) in R; Implement generalised linear models (GLMs) in R; Implement linear discriminant analysis (LDA) in R.
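A minimal sketch of these techniques follows. Logistic regression is fitted as a GLM with a binomial family, and LDA uses `lda()` from the `MASS` package, which ships with standard R installations. The `mtcars` and `iris` data sets are assumptions chosen for illustration.

```r
library(MASS)

# Logistic regression as a GLM: binary response `am` (0 = automatic,
# 1 = manual) modelled on car weight.
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(summary(logit_fit))

# Predicted probabilities of a manual transmission for two new weights.
new_cars <- data.frame(wt = c(2.5, 3.5))
probs <- predict(logit_fit, newdata = new_cars, type = "response")
print(probs)

# Linear discriminant analysis for a three-class response.
lda_fit <- lda(Species ~ ., data = iris)
lda_classes <- predict(lda_fit)$class

# Confusion table of true vs. predicted classes on the training data.
print(table(truth = iris$Species, predicted = lda_classes))
```

Other GLM families, such as `poisson` for count responses, can be fitted by changing the `family` argument.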
Section 5: Prediction using models
Apply the principles of building a model for prediction using classification; Split data into training and test sets, perform cross-validation and compute model evaluation metrics; Use model selection to explain data with models; Analyse overfitting and the bias-variance trade-off in prediction problems.
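The sketch below illustrates a train/test split and how a more flexible model can fit the training data better without necessarily predicting better on held-out data. It uses a regression problem on the built-in `mtcars` data for brevity; the data, the 70/30 split, the seed and the helper `rmse()` are all assumptions for illustration, and the same workflow applies to classifiers.

```r
set.seed(1)  # arbitrary seed so the split is reproducible

# Hold out roughly 30% of the rows as a test set.
n <- nrow(mtcars)
train_idx <- sample(n, size = round(0.7 * n))
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]

# A simple model and a deliberately flexible one (degree-5 polynomial).
simple <- lm(mpg ~ wt, data = train)
flexible <- lm(mpg ~ poly(wt, 5), data = train)

# Hypothetical helper: root mean squared error of a model on a data set.
rmse <- function(model, data) {
  sqrt(mean((data$mpg - predict(model, newdata = data))^2))
}

results <- data.frame(
  model = c("simple", "flexible"),
  train_rmse = c(rmse(simple, train), rmse(flexible, train)),
  test_rmse = c(rmse(simple, test), rmse(flexible, test))
)
print(results)
```

The flexible model can never have a higher training error than the simple one, since it contains the linear fit as a special case; whether its test error is lower depends on the split, which is the point of evaluating on held-out data.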
Section 6: Getting bigger
Set up and apply sparklyr; Use logical verbs in R by applying native sparklyr versions of the verbs.
Section 7: Supervised machine learning with sparklyr
Apply sparklyr to machine learning regression and classification models; Use machine learning models for prediction; Illustrate how distributed computing techniques can be used for “bigger” problems.
Section 8: Deep learning
Use massive amounts of data to train multi-layer networks for classification; Understand some of the guiding principles behind training deep networks, including the use of autoencoders, dropout, regularization, and early termination; Use sparklyr and H2O to train deep networks.
Section 9: Deep learning applications and scaling up
Understand some of the ways in which massive amounts of unlabelled and partially labelled data are used to train neural network models; Leverage existing trained networks for new applications; Implement architectures for object classification and object detection and assess their effectiveness.
Section 10: Bringing it all together
Consolidate your understanding of the relationships between the methodologies presented in this course, and of their relative strengths, weaknesses and range of applicability.