Course catalog: Big Data Analytics Training 1

Course outline:

Big Data Analytics Training 1

Section 1: Simple linear regression

Fit a simple linear regression between two variables in R; Interpret output from R; Use models to predict a response variable; Validate the assumptions of the model.
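The Section 1 objectives can be illustrated with a minimal sketch in base R. The choice of the built-in `cars` data set and the variable names `fit`, `new_speeds`, `pred`, and `res` are assumptions made for illustration, not part of the course material.

```r
# Fit a simple linear regression of stopping distance on speed,
# using the built-in `cars` data set (chosen here only for illustration).
fit <- lm(dist ~ speed, data = cars)

# Interpret the output: coefficients, standard errors, R-squared.
print(summary(fit))

# Use the model to predict the response for new speeds.
new_speeds <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_speeds)
print(pred)

# Validate assumptions: inspect the residuals. In an interactive
# session, plot(fit) draws the standard diagnostic plots.
res <- residuals(fit)
print(summary(res))
```

With an intercept in the model, the residuals sum to zero by construction, so the diagnostic interest lies in their spread and shape rather than their mean.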

Section 2: Modelling data

Adapt the simple linear regression model in R to deal with multiple variables; Incorporate continuous and categorical variables in the models; Select the best-fitting model by inspecting the R output.
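As a hedged sketch of the Section 2 objectives, the following compares a one-predictor model with one that adds a categorical variable. The built-in `mtcars` data set and the names `cars_df`, `m1`, and `m2` are assumptions for illustration only.

```r
# Treat the number of cylinders as a categorical variable (factor).
cars_df <- transform(mtcars, cyl = factor(cyl))

# Model 1: one continuous predictor.
m1 <- lm(mpg ~ wt, data = cars_df)

# Model 2: the same continuous predictor plus the categorical one.
m2 <- lm(mpg ~ wt + cyl, data = cars_df)

# Compare the nested models with an F-test and with AIC.
print(anova(m1, m2))
print(AIC(m1, m2))
```

R-squared never decreases when terms are added to a nested model, which is why criteria that penalise complexity, such as AIC, are commonly preferred for model selection.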

Section 3: Many models

Manipulate nested data frames in R; Use R to apply simultaneous linear models to large data frames by stratifying the data; Interpret the output of learner models.
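The idea of fitting one model per stratum can be sketched in base R with `split()` and `lapply()`. This is a stand-in for the nested-data-frame workflow named above, not the course's own code; the `mtcars` data set and the names `by_cyl`, `models`, and `slopes` are assumptions.

```r
# Stratify the data: one data frame per cylinder count.
by_cyl <- split(mtcars, mtcars$cyl)

# Fit the same linear model within each stratum.
models <- lapply(by_cyl, function(d) lm(mpg ~ wt, data = d))

# Collect the slope on `wt` from every fitted model.
slopes <- sapply(models, function(m) coef(m)[["wt"]])
print(slopes)
```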

Section 4: Classification

Adapt linear models to handle a categorical response variable; Implement logistic regression (LR) in R; Implement generalised linear models (GLMs) in R; Implement linear discriminant analysis (LDA) in R.
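A minimal sketch of logistic regression as a GLM in base R follows. The `mtcars` data set, the 0.5 classification threshold, and the names `fit`, `prob`, and `pred_class` are assumptions for illustration; LDA is not shown here.

```r
# Logistic regression: binary response `am` (transmission type)
# modelled on car weight, fitted as a GLM with a binomial family.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(summary(fit))

# Predicted probabilities on the response scale.
prob <- predict(fit, type = "response")

# Convert probabilities to class labels with a 0.5 threshold (an assumption).
pred_class <- ifelse(prob > 0.5, 1, 0)

# Confusion table of predicted against actual classes.
print(table(predicted = pred_class, actual = mtcars$am))
```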

Section 5: Prediction using models

Implement the principles of building a model to do prediction using classification; Split data into training and test sets, perform cross-validation, and compute model evaluation metrics; Use model selection for explaining data with models; Analyse overfitting and the bias-variance trade-off in prediction problems.
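The split-and-validate workflow in Section 5 can be sketched in base R as below. The two-class subset of `iris`, the 70/30 split, the 5 folds, the 0.5 threshold, and all variable names are assumptions chosen for illustration.

```r
set.seed(1)  # make the random split reproducible

# Two-class problem: versicolor vs virginica from the built-in iris data.
df <- subset(iris, Species != "setosa")
df$y <- as.integer(df$Species == "virginica")

# Hold-out split: 70% training, 30% test.
idx <- sample(nrow(df), size = round(0.7 * nrow(df)))
train <- df[idx, ]
test <- df[-idx, ]

fit <- glm(y ~ Sepal.Length + Sepal.Width, data = train, family = binomial)
test_prob <- predict(fit, newdata = test, type = "response")
test_acc <- mean((test_prob > 0.5) == test$y)

# 5-fold cross-validation: each row is held out exactly once.
k <- 5
folds <- sample(rep(1:k, length.out = nrow(df)))
cv_acc <- sapply(1:k, function(i) {
  m <- glm(y ~ Sepal.Length + Sepal.Width,
           data = df[folds != i, ], family = binomial)
  p <- predict(m, newdata = df[folds == i, ], type = "response")
  mean((p > 0.5) == df$y[folds == i])
})

print(test_acc)
print(mean(cv_acc))
```

Accuracy is only one possible evaluation metric; a gap between training and held-out accuracy is a common sign of overfitting.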

Section 6: Getting bigger

Set up and apply sparklyr; Use logical verbs in R by applying native sparklyr versions of the verbs.

Section 7: Supervised machine learning with sparklyr

Apply sparklyr to machine learning regression and classification models; Use machine learning models for prediction; Illustrate how distributed computing techniques can be used for "bigger" problems.

Section 8: Deep learning

Use massive amounts of data to train multi-layer networks for classification; Understand some of the guiding principles behind training deep networks, including the use of autoencoders, dropout, regularization, and early termination; Use sparklyr and H2O to train deep networks.

Section 9: Deep learning applications and scaling up

Understand some of the ways in which massive amounts of unlabelled and partially labelled data are used to train neural network models; Leverage existing trained networks for targeting new applications; Implement architectures for object classification and object detection and assess their effectiveness.

Section 10: Bringing it all together

Consolidate your understanding of the relationships between the methodologies presented in this course, and of their relative strengths, weaknesses, and range of applicability.