Computational Thinking and Big Data Training
Section 1: Data in R
Identify the components of RStudio; Identify the subjects and types of variables
in R; Summarise and visualise univariate data, including histograms and box plots.
Section 2: Visualising relationships
Produce plots in ggplot2 in R to illustrate the relationship between pairs of variables;
Understand which type of plot to use for different variables; Identify methods to deal with large datasets.
Section 3: Manipulating and joining data
Organise different data types, including strings, dates and times; Filter subjects in a data frame,
select individual variables, group data by variables and calculate summary statistics; Join separate data frames into
a single data frame; Learn how to implement these methods in MapReduce.
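To illustrate how "group data by variables and calculate summary statistics" maps onto MapReduce, here is a minimal single-machine sketch in plain Java. It is not the Hadoop API: it only simulates the three phases (map, shuffle, reduce) to compute a mean per group. The record format `"group, value"`, the class name `MapReduceSketch` and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceSketch {

    // A key-value pair emitted by the map phase (hypothetical record type).
    record Pair(String key, double value) {}

    // Map phase: turn one input record "group, value" into a (key, value) pair.
    static Pair map(String record) {
        String[] fields = record.split(",");
        return new Pair(fields[0].trim(), Double.parseDouble(fields[1].trim()));
    }

    // Shuffle phase: collect all values that share the same key.
    static Map<String, List<Double>> shuffle(List<Pair> pairs) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Pair p : pairs) {
            grouped.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        return grouped;
    }

    // Reduce phase: collapse each group's values into one summary statistic (the mean).
    static double reduce(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Runs the three phases in sequence and returns the mean value per group.
    static Map<String, Double> groupMeans(List<String> records) {
        List<Pair> mapped = new ArrayList<>();
        for (String r : records) {
            mapped.add(map(r));
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = List.of("a, 1.0", "b, 4.0", "a, 3.0", "b, 6.0", "c, 5.0");
        System.out.println(groupMeans(records)); // {a=2.0, b=5.0, c=5.0}
    }
}
```

In a real MapReduce framework the map and reduce calls run in parallel on different machines and the shuffle moves data over the network; the logic per record and per group stays the same.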
Section 4: Transforming data and dimension reduction
Transform data so that it is more appropriate for modelling; Use Q-Q plots to assess normality and transformations
such as Box-Cox to make variables approximately normally distributed; Reduce
the number of variables using PCA; Learn how to apply these techniques when modelling data with linear models.
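For reference, the Box-Cox transformation is usually written as follows, for a strictly positive variable $y$ and a power parameter $\lambda$:

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
  \ln y, & \lambda = 0,
\end{cases}
\qquad y > 0.
```

With $\lambda = 1$ the shape of the distribution is unchanged (only a shift), and $\lambda = 0$ gives the log transform. In practice $\lambda$ is commonly chosen by maximum likelihood, and a Q-Q plot of the transformed values against normal quantiles is used to check whether the result is close to normal.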
Section 5: Summarising data
Estimate model parameters, both point and interval estimates; Differentiate between the statistical concepts
of parameters and statistics; Use statistical summaries to infer population characteristics; Work with strings;
Learn about k-mers in genomics and their relationship to perfect hash functions as an example of text manipulation.
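One way to see the link between k-mers and perfect hashing is a sketch like the following, in Java (the language introduced in Section 6). It assumes sequences contain only the uppercase bases A, C, G and T, encodes each base in 2 bits and reads a k-mer as a base-4 number. Distinct k-mers of the same length then always receive distinct integers in $[0, 4^k)$, so the function is perfect, and it is minimal over the full set of $4^k$ possible k-mers. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class KmerHash {

    // Maps each nucleotide to a 2-bit code: A=0, C=1, G=2, T=3.
    static int code(char base) {
        switch (base) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Not a DNA base: " + base);
        }
    }

    // Reads the k-mer as a base-4 number, giving a unique value in [0, 4^k).
    // A long has 64 bits; limiting k to 31 keeps every value non-negative.
    static long hash(String kmer) {
        if (kmer.length() > 31) {
            throw new IllegalArgumentException("k must be <= 31");
        }
        long h = 0;
        for (int i = 0; i < kmer.length(); i++) {
            h = (h << 2) | code(kmer.charAt(i)); // shift left 2 bits, append next base
        }
        return h;
    }

    // Slides a window of width k along the sequence and returns every k-mer in order.
    static List<String> kmers(String sequence, int k) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + k <= sequence.length(); i++) {
            result.add(sequence.substring(i, i + k));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String kmer : kmers("ACGTAC", 3)) {
            System.out.println(kmer + " -> " + hash(kmer)); // ACG -> 6, CGT -> 27, GTA -> 44, TAC -> 49
        }
    }
}
```

Because the hash is injective, it can index a plain array of size $4^k$ directly (for example to count k-mer occurrences) with no collision handling.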
Section 6: Introduction to Java
Use complex data structures; Implement your own data structures to organise data; Explain
the differences between classes and objects; Explain the motivation for object-oriented programming.
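The difference between a class and an object, together with implementing your own data structure, can be illustrated by a small sketch: a hypothetical `IntStack` class backed by a growable array. The class is the blueprint; each `new IntStack()` creates a separate object with its own state.

```java
import java.util.Arrays;

// A class declares the fields every object will have and the methods that act on them.
// IntStack is a hypothetical example, not a library class.
public class IntStack {
    private int[] items = new int[4]; // backing array, grown on demand
    private int size = 0;             // number of elements currently stored

    // Pushes a value on top of the stack, doubling the array when it is full.
    public void push(int value) {
        if (size == items.length) {
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[size++] = value;
    }

    // Removes and returns the top value.
    public int pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        return items[--size];
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        // Two objects (instances) of the same class, each with independent state.
        IntStack a = new IntStack();
        IntStack b = new IntStack();
        for (int i = 1; i <= 5; i++) {
            a.push(i); // the fifth push triggers one resize
        }
        b.push(42);
        System.out.println(a.size() + " " + b.size()); // 5 1
        System.out.println(a.pop() + " " + b.pop());   // 5 42
    }
}
```

Keeping `items` and `size` private means callers can only change the stack through `push` and `pop`, which is one of the usual motivations for object orientation: the object guards its own invariants.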
Section 7: Graphs
Encode directed and undirected graphs in different data structures, such as adjacency matrices and adjacency lists;
Execute basic algorithms, such as depth-first search and breadth-first search.
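A minimal Java sketch of an adjacency-list graph with breadth-first and depth-first search might look like this. Vertices are assumed to be numbered `0..n-1`, and the `Graph` class is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class Graph {
    private final List<List<Integer>> adj = new ArrayList<>();
    private final boolean directed;

    // Creates a graph with vertices 0..n-1 stored as an adjacency list.
    public Graph(int n, boolean directed) {
        this.directed = directed;
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
        }
    }

    // Adds edge u -> v; for an undirected graph also adds v -> u.
    public void addEdge(int u, int v) {
        adj.get(u).add(v);
        if (!directed) {
            adj.get(v).add(u);
        }
    }

    // Breadth-first search: uses a queue, so vertices are visited in order of distance from start.
    public List<Integer> bfs(int start) {
        boolean[] seen = new boolean[adj.size()];
        List<Integer> order = new ArrayList<>();
        Deque<Integer> queue = new ArrayDeque<>();
        seen[start] = true;
        queue.add(start);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            order.add(u);
            for (int v : adj.get(u)) {
                if (!seen[v]) {
                    seen[v] = true;
                    queue.add(v);
                }
            }
        }
        return order;
    }

    // Depth-first search (recursive): follows each branch as far as possible before backtracking.
    public List<Integer> dfs(int start) {
        List<Integer> order = new ArrayList<>();
        dfsVisit(start, new boolean[adj.size()], order);
        return order;
    }

    private void dfsVisit(int u, boolean[] seen, List<Integer> order) {
        seen[u] = true;
        order.add(u);
        for (int v : adj.get(u)) {
            if (!seen[v]) {
                dfsVisit(v, seen, order);
            }
        }
    }

    public static void main(String[] args) {
        Graph g = new Graph(5, false);
        g.addEdge(0, 1);
        g.addEdge(0, 2);
        g.addEdge(1, 3);
        g.addEdge(2, 4);
        System.out.println("BFS: " + g.bfs(0)); // BFS: [0, 1, 2, 3, 4]
        System.out.println("DFS: " + g.dfs(0)); // DFS: [0, 1, 3, 2, 4]
    }
}
```

An adjacency matrix (`boolean[n][n]`) would instead answer "is there an edge u -> v?" in constant time but always uses memory proportional to $n^2$, whereas the adjacency list uses memory proportional to the number of vertices plus edges, which suits sparse graphs.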
Section 8: Probability
Determine the probability of events when the probability distribution is discrete; Learn how to approximate such probabilities.
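As an illustration, the probability that two fair dice sum to 7 can be computed exactly by enumerating all 36 equally likely outcomes, or approximated by simulation. The outline does not say which approximation method is meant; this sketch assumes Monte Carlo simulation, and analytic approximations (for example normal or Poisson approximations) are equally possible. The class and method names are hypothetical.

```java
import java.util.Random;

public class DiceProbability {

    // Exact probability: favourable outcomes divided by all equally likely outcomes.
    static double exactProbabilityOfSum(int target) {
        int favourable = 0;
        int total = 0;
        for (int d1 = 1; d1 <= 6; d1++) {
            for (int d2 = 1; d2 <= 6; d2++) {
                total++;
                if (d1 + d2 == target) {
                    favourable++;
                }
            }
        }
        return (double) favourable / total;
    }

    // Monte Carlo approximation: simulate many rolls and return the observed frequency.
    static double simulatedProbabilityOfSum(int target, int trials, long seed) {
        Random rng = new Random(seed); // fixed seed makes the run reproducible
        int hits = 0;
        for (int i = 0; i < trials; i++) {
            int d1 = rng.nextInt(6) + 1; // uniform on 1..6
            int d2 = rng.nextInt(6) + 1;
            if (d1 + d2 == target) {
                hits++;
            }
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        System.out.println("exact:     " + exactProbabilityOfSum(7)); // 6/36
        System.out.println("simulated: " + simulatedProbabilityOfSum(7, 1_000_000, 1L));
    }
}
```

The simulated value approaches the exact one as the number of trials grows; its standard error is $\sqrt{p(1-p)/n}$, which for $p = 1/6$ and $n = 10^6$ is about $0.0004$.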
Section 9: Hashing
Apply hash functions to basic data structures in Java; Implement your own hash functions and use them alongside the built-in ones;
Differentiate good from bad hash functions based on the concept of collisions.
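The difference between good and bad hash functions can be made concrete by counting collisions. In this sketch, a deliberately poor hash (the sum of character codes) sends every anagram to the same value, while a polynomial hash, which uses the same formula that Java specifies for `String.hashCode()`, keeps them apart. The class name, method names and word list are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;

public class HashCollisions {

    // A poor hash: the sum of character codes ignores order,
    // so all anagrams ("listen", "silent", ...) collide.
    static int badHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // A polynomial hash: h = s[0]*31^(n-1) + ... + s[n-1] (with int overflow),
    // the formula String.hashCode() is specified to use. It depends on character order.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Counts keys whose hash value was already produced by an earlier key.
    static int countCollisions(List<String> keys, ToIntFunction<String> hash) {
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (String k : keys) {
            if (!seen.add(hash.applyAsInt(k))) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        List<String> words = List.of("listen", "silent", "enlist", "tinsel", "inlets");
        System.out.println("bad:  " + countCollisions(words, HashCollisions::badHash));  // bad:  4
        System.out.println("poly: " + countCollisions(words, HashCollisions::polyHash)); // poly: 0
        System.out.println(polyHash("listen") == "listen".hashCode());                    // true
    }
}
```

A hash table additionally reduces each hash value to a bucket index (for example with `Math.floorMod(h, tableSize)`), which can introduce further collisions; a good hash function spreads keys evenly so that these stay rare.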
Section 10: Bringing it all together
Understand the context of big data in programming.