
HADOOP TRAINING
Hadoop Online Training
Duration: 25 hours
The duration may vary depending on course progress.
About Hadoop Online Training
Training Objectives:
This Hadoop course covers the basic concepts of MapReduce applications developed using Hadoop, including a close look at the framework components, the use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. It also examines related technologies such as Hive, Pig, and Apache Accumulo.
Target Students / Prerequisites:
Students should have an IT background and be familiar with Java and Linux concepts.
Course Content
Introduction, The Motivation for Hadoop:
- Problems with traditional large-scale systems
- Requirements for a new approach
Hadoop Basic Concepts:
- An Overview of Hadoop
- The Hadoop Distributed File System
- Hands-on Exercise
- How MapReduce Works
- Hands-on Exercise
- Anatomy of a Hadoop Cluster
- Other Hadoop Ecosystem Components
Writing a MapReduce Program:
- Examining a Sample MapReduce Program, with several examples
- Basic API Concepts
- The Driver Code
- The Mapper
- The Reducer
- Hadoop’s Streaming API
Delving Deeper Into The Hadoop API:
- More About ToolRunner
- Testing with MRUnit
- Reducing Intermediate Data With Combiners
- The configure and close methods for Map/Reduce Setup and Teardown
- Writing Partitioners for Better Load Balancing
- Hands-On Exercise
- Directly Accessing HDFS
- Using the Distributed Cache
- Hands-On Exercise
Performing several Hadoop jobs:
- The configure and close Methods
- Sequence Files
- Record Reader
- Record Writer
- Role of Reporter
- Output Collector
- Processing video files and audio files
- Processing image files
- Processing XML files
- Counters
- Directly Accessing HDFS
- ToolRunner
- Using The Distributed Cache
Common MapReduce Algorithms:
- Sorting and Searching
- Indexing
- Classification/Machine Learning
- Term Frequency-Inverse Document Frequency
- Word Co-Occurrence
- Hands-On Exercise: Creating an Inverted Index
- Identity Mapper
- Identity Reducer
- Exploring well known problems using MapReduce applications
Using HBase:
- What is HBase?
- HBase API
- Managing large data sets with HBase
- Using HBase in Hadoop applications
- Hands-on Exercise
Using Hive and Pig:
- Hive Basics
- Pig Basics
- Hands-on Exercise
Practical Development Tips and Techniques:
- Debugging MapReduce Code
- Using LocalJobRunner Mode for Easier Debugging
- Retrieving Job Information with Counters
- Logging
- Splittable File Formats
- Determining the Optimal Number of Reducers
- Map-Only MapReduce Jobs
- Hands-on Exercise
Debugging MapReduce Programs:
- Testing with MRUnit
- Logging
- Classification/Machine Learning
Advanced MapReduce Programming:
- A Recap of the MapReduce Flow
- The Secondary Sort
- Customized InputFormats and OutputFormats
- Pipelining Jobs With Oozie
- Map-Side Joins
- Reduce-Side Joins
Joining Data Sets in MapReduce:
- Map-Side Joins
- The Secondary Sort
- Reduce-Side Joins
Monitoring and debugging on a Production Cluster:
- Counters
- Skipping Bad Records
- Rerunning failed tasks with IsolationRunner
Tuning for Performance in MapReduce:
- Reducing network traffic with a combiner
- Partitioners
- Reducing the amount of input data
- Using Compression
- Reusing the JVM
- Running with speculative execution
- Refactoring code and rewriting algorithms
- Parameters affecting performance
- Other Performance Aspects
Have Questions?
Call us or use the quick contact box.
Why Train with Us?
- Live, quality training
- Live demonstrations of features and practicals
- 100% assured placement assistance
- Effective resume building
- Internship program for real exposure
- Interview preparation with mock interview drills
- Guidance on applying for jobs at the right places
- Guidance on getting flexible, part-time jobs