Course catalog: Hadoop for Developers and Administrators Training
Course outline:

Module 1. Introduction to Hadoop
The Hadoop Distributed File System (HDFS)
The Read Path and The Write Path
Managing Filesystem Metadata
The Namenode and the Datanode
Namenode High Availability
Namenode Federation
The Command-Line Tools
Understanding REST Support
Module 2. Introduction to MapReduce
Analyzing the Data with Hadoop
Map and Reduce Pattern
Java MapReduce
Scaling Out
Data Flow
Developing Combiner Functions
Running a Distributed MapReduce Job
Module 3. Planning a Hadoop Cluster
Picking a Distribution and Version of Hadoop
Versions and Features
Hardware Selection
Master and Worker Hardware Selection
Cluster Sizing
Operating System Selection and Preparation
Deployment Layout
Setting up Users, Groups, and Privileges
Disk Configuration
Network Design
Module 4. Installation and Configuration
Installing Hadoop
Configuration: An Overview
The Hadoop XML Configuration Files
Environment Variables and Shell Scripts
Logging Configuration
Managing HDFS
Optimization and Tuning
Formatting the Namenode
Creating a /tmp Directory
Planning for Namenode High Availability
The Fencing Options
Automatic Failover Configuration
Format and Bootstrap the Namenodes
Namenode Federation
Module 5. Understanding Hadoop I/O
Data Integrity in HDFS
Understanding Codecs
Compression and Input Splits
Using Compression in MapReduce
The Serialization Mechanism
File-Based Data Structures
The SequenceFile Format
Other File Formats and Column-Oriented Formats
Module 6. Developing a MapReduce Application
The Configuration API
Setting Up the Development Environment
Managing Configuration
GenericOptionsParser, Tool, and ToolRunner
Writing a Unit Test with MRUnit
The Mapper and Reducer
Running Locally on Test Data
Testing the Driver
Running on a Cluster
Packaging and Launching a Job
The MapReduce Web UI
Tuning a Job
Module 7. Identity, Authentication, and Authorization
Managing Identity
Kerberos and Hadoop
Understanding Authorization
Module 8. Resource Management
What Is Resource Management?
HDFS Quotas
MapReduce Schedulers
Anatomy of a YARN Application Run
Resource Requests
Application Lifespan
YARN Compared to MapReduce 1
Scheduling in YARN
Scheduler Options
Capacity Scheduler Configuration
Fair Scheduler Configuration
Delay Scheduling
Dominant Resource Fairness
Module 9. MapReduce Types and Formats
MapReduce Types
The Default MapReduce Job
Defining the Input Formats
Managing Input Splits and Records
Text Input and Binary Input
Managing Multiple Inputs
Database Input (and Output)
Output Formats
Text Output and Binary Output
Managing Multiple Outputs
The Database Output
Module 10. Using MapReduce Features
Using Counters
Reading Built-in Counters
User-Defined Java Counters
Understanding Sorting
Using the Distributed Cache
Module 11. Cluster Maintenance and Troubleshooting
Managing Hadoop Processes
Starting and Stopping Processes with Init Scripts
Starting and Stopping Processes Manually
HDFS Maintenance Tasks
Adding a Datanode
Decommissioning a Datanode
Checking Filesystem Integrity with fsck
Balancing HDFS Block Data
Dealing with a Failed Disk
MapReduce Maintenance Tasks
Killing a MapReduce Job
Killing a MapReduce Task
Managing Resource Exhaustion
Module 12. Monitoring
The Available Hadoop Metrics
The Role of SNMP
Health Monitoring
Host-Level Checks
HDFS Checks
MapReduce Checks
Module 13. Backup and Recovery
Data Backup
Distributed Copy (distcp)
Parallel Data Ingestion
Namenode Metadata