Big Data Hadoop

Course Detail

Sr. No. Content
1 INTRODUCTION
  • Big Data
  • The 3 Vs (Volume, Velocity, Variety)
  • Role of Hadoop in Big Data
  • Hadoop and its ecosystem
  • Overview of other Big Data Systems
  • Requirements in Hadoop
  • Use Cases of Hadoop
2 HDFS
  • Design
  • Architecture
  • Data Flow
  • CLI Commands
  • Java API
  • Data Flow Archives
  • Data Integrity
  • WebHDFS
  • Compression
3 MAPREDUCE
  • Theory
  • Data Flow (Map – Shuffle – Reduce)
  • Programming [Mapper, Reducer, Combiner, Partitioner]
  • Writables
  • InputFormat
  • OutputFormat
  • Streaming API
4 ADVANCED MAPREDUCE PROGRAMMING
  • Counters
  • Custom InputFormat
  • Distributed Cache
  • Side Data Distribution
  • Joins
  • Sorting
  • ToolRunner
  • Debugging
  • Performance Fine-Tuning
5 ADMINISTRATION – Information required at Developer level
  • Hardware Considerations – Tips and Tricks
  • Schedulers
  • Balancers
  • NameNode Failure and Recovery
6 HBASE
  • NoSQL vs SQL
  • CAP Theorem
  • Architecture
  • Configuration
  • Role of ZooKeeper
  • Java-Based APIs
  • MapReduce Integration
  • Performance Tuning
7 HIVE
  • Architecture
  • Tables
  • DDL – DML – UDF – UDAF
  • Partitioning
  • Bucketing
  • Hive-HBase Integration
  • Hive Web Interface
  • Hive Server
8 OTHER HADOOP ECOSYSTEM COMPONENTS
  • Pig (Pig Latin, Programming)
  • Sqoop (Need, Architecture, Examples)
  • Introduction to Other Components (Flume, Oozie, Ambari)