Course catalog: Computational Thinking and Big Data Training

Course outline:

Section 1: Data in R

Identify the components of RStudio; Identify the subjects and types of variables in R; Summarise and visualise univariate data, including histograms and box plots.

Section 2: Visualising relationships

Produce plots with ggplot2 in R to illustrate the relationship between pairs of variables; Understand which type of plot to use for different variables; Identify methods to deal with large datasets.

Section 3: Manipulating and joining data

Organise different data types, including strings, dates and times; Filter subjects in a data frame, select individual variables, group data by variables and calculate summary statistics; Join separate data frames into a single data frame; Learn how to implement these methods in MapReduce.

Section 4: Transforming data and dimension reduction

Transform data so that it is more appropriate for modelling; Use various methods, including Q-Q plots and the Box-Cox transformation, to assess and transform variables so that they are approximately normally distributed; Reduce the number of variables using PCA; Learn how to apply these techniques when modelling data with linear models.

Section 5: Summarising data

Estimate model parameters, both point and interval estimates; Differentiate between the statistical concepts of parameters and statistics; Use statistical summaries to infer population characteristics; Utilise strings; Learn about k-mers in genomics and their relationship to perfect hash functions as an example of text manipulation.
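
As a rough illustration of the k-mer topic above (not taken from the course materials), the sketch below lists the k-mers of a DNA string and maps each one to a unique integer by spending 2 bits per base. Because every distinct k-mer over the alphabet A, C, G, T gets a distinct code in the range 0 to 4^k - 1, this encoding behaves as a perfect hash function. The class name KmerSketch, the method names and the A=0, C=1, G=2, T=3 ordering are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, written only to illustrate the outline topic.
class KmerSketch {

    // Returns all overlapping substrings of length k, in order of position.
    static List<String> kmers(String sequence, int k) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + k <= sequence.length(); i++) {
            result.add(sequence.substring(i, i + k));
        }
        return result;
    }

    // Encodes a k-mer with 2 bits per base (assumed order: A=0, C=1, G=2, T=3).
    // Distinct k-mers of the same length always get distinct codes (k <= 31 here).
    static long encode(String kmer) {
        long code = 0;
        for (int i = 0; i < kmer.length(); i++) {
            int bits;
            switch (kmer.charAt(i)) {
                case 'A': bits = 0; break;
                case 'C': bits = 1; break;
                case 'G': bits = 2; break;
                case 'T': bits = 3; break;
                default:
                    throw new IllegalArgumentException("Not a DNA base: " + kmer.charAt(i));
            }
            code = (code << 2) | bits;
        }
        return code;
    }

    public static void main(String[] args) {
        for (String kmer : kmers("ACGTAC", 3)) {
            System.out.println(kmer + " -> " + encode(kmer));
        }
    }
}
```

For the sequence ACGTAC and k = 3, the k-mers are ACG, CGT, GTA and TAC, and their codes can be used directly as array indices into a table of size 4^3 = 64.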

Section 6: Introduction to Java

Use complex data structures; Implement your own data structures to organise data; Explain the differences between classes and objects; Motivate object-orientation.

Section 7: Graphs

Encode directed and undirected graphs in different data structures, such as matrices and adjacency lists; Execute basic algorithms, such as depth-first search and breadth-first search.
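
To make the graph topics above concrete, here is a minimal sketch of an undirected graph stored as adjacency lists, with breadth-first and depth-first search. It is an illustration under stated assumptions rather than the course's own code: the class name GraphSketch and its method names are invented for this example, and vertices are assumed to be numbered from 0.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical class: an undirected graph stored as adjacency lists.
class GraphSketch {
    private final List<List<Integer>> adj = new ArrayList<>();

    GraphSketch(int vertexCount) {
        for (int i = 0; i < vertexCount; i++) {
            adj.add(new ArrayList<>());
        }
    }

    // Undirected edge: record it in both vertices' lists.
    // For a directed graph, only the first line would be kept.
    void addEdge(int u, int v) {
        adj.get(u).add(v);
        adj.get(v).add(u);
    }

    // Breadth-first search: vertices in the order they are first reached.
    List<Integer> bfs(int start) {
        boolean[] seen = new boolean[adj.size()];
        List<Integer> order = new ArrayList<>();
        Deque<Integer> queue = new ArrayDeque<>();
        seen[start] = true;
        queue.add(start);
        while (!queue.isEmpty()) {
            int u = queue.remove();
            order.add(u);
            for (int v : adj.get(u)) {
                if (!seen[v]) {
                    seen[v] = true;
                    queue.add(v);
                }
            }
        }
        return order;
    }

    // Depth-first search (recursive): follows each branch fully before backtracking.
    List<Integer> dfs(int start) {
        List<Integer> order = new ArrayList<>();
        dfsVisit(start, new boolean[adj.size()], order);
        return order;
    }

    private void dfsVisit(int u, boolean[] seen, List<Integer> order) {
        seen[u] = true;
        order.add(u);
        for (int v : adj.get(u)) {
            if (!seen[v]) {
                dfsVisit(v, seen, order);
            }
        }
    }

    public static void main(String[] args) {
        GraphSketch g = new GraphSketch(4);
        g.addEdge(0, 1);
        g.addEdge(0, 2);
        g.addEdge(1, 3);
        g.addEdge(2, 3);
        System.out.println("BFS: " + g.bfs(0));
        System.out.println("DFS: " + g.dfs(0));
    }
}
```

On the four-vertex example in main, breadth-first search from vertex 0 visits 0, 1, 2, 3, while depth-first search visits 0, 1, 3, 2. An adjacency matrix would store the same edges in a boolean[n][n] array, trading memory for constant-time edge lookups.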

Section 8: Probability

Determine the probability of events occurring when the probability distribution is discrete; How to approximate.

Section 9: Hashing

Apply hash functions on basic data structures in Java; Implement your own hash functions and execute these, as well as built-in ones; Differentiate good from bad hash functions based on the concept of collisions.
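
The idea of judging hash functions by their collisions can be sketched as follows. This is an illustrative example, not course material: the class HashSketch and its methods are hypothetical names. It compares a deliberately weak hash (the sum of character codes, which makes all anagrams collide) with a polynomial hash that follows the same recipe as Java's built-in String.hashCode.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;

// Hypothetical class comparing a weak and a stronger string hash.
class HashSketch {

    // Weak hash: sum of character codes, so any two anagrams collide.
    static int sumHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // Polynomial hash with multiplier 31, the recipe documented for String.hashCode.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Counts keys whose hash value was already produced by an earlier key.
    static int countCollisions(List<String> keys, ToIntFunction<String> hash) {
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (String key : keys) {
            if (!seen.add(hash.applyAsInt(key))) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("ab", "ba", "abc", "cab");
        System.out.println("sumHash collisions:  " + countCollisions(keys, HashSketch::sumHash));
        System.out.println("polyHash collisions: " + countCollisions(keys, HashSketch::polyHash));
        System.out.println("built-in collisions: " + countCollisions(keys, String::hashCode));
    }
}
```

On these four keys the sum-based hash produces two collisions and the polynomial hash none. Even a reasonable hash function still collides on some inputs: the strings "Aa" and "BB" share the same String.hashCode value, which is why hash tables need a collision-handling strategy.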

Section 10: Bringing it all together

Understand the context of big data in programming.