EXPERTISE IN BIG DATA (HADOOP)

LINUX INTRODUCTION

  • File Handling
  • Text Processing
  • System Administration
  • Process Management
  • Archival
  • Network
  • File System
  • Advanced Commands

CORE JAVA TRAINING


CLASS-1

Introduction - OOP concepts (Object - Class - Inheritance - Polymorphism - Abstraction - Encapsulation)

CLASS-2

String (Concept of String - Immutable String - String Concatenation - Concept of Substring - String class methods and their usage - StringBuilder class)

Exception Handling (try - throw - catch) - Advanced (throws - finally) - Input and Output (I/O) functions

CLASS-3

Collections - List, Map, and Set interfaces and their algorithms - Iterator interface - Map (HashMap - TreeMap - LinkedHashMap - multi-key map) - List (ArrayList - LinkedList) - Set (HashSet - TreeSet) - Serialization - Deserialization

INTRODUCTION TO BIG DATA - HADOOP

Big Data (What, Why, Who) - 3++ Vs - Overview of the Hadoop Ecosystem - Role of Hadoop in Big Data - Overview of other Big Data systems - Who is using Hadoop - Hadoop integration into existing software products - Current scenario in the Hadoop Ecosystem - Installation - Configuration - Use cases of Hadoop (Healthcare, Retail, Telecom)

HDFS

Concepts - Architecture - Data Flow (File Read, File Write) - Fault Tolerance - Shell Commands - Java-Based API - Data Flow Archives - Coherency - Data Integrity - Role of Secondary NameNode

MAPREDUCE

Theory - Data Flow (Map - Shuffle - Reduce) - MapRed vs MapReduce APIs - Programming (Mapper, Reducer, Combiner, Partitioner) - Writable - InputFormat - OutputFormat - Streaming API using Python - Inherent Failure Handling using Speculative Execution - Magic of the Shuffle phase - File Formats - Sequence Files

ADVANCED MAPREDUCE PROGRAMMING

Counters (Built-in and Custom) - Custom InputFormat - Distributed Cache - Joins (Map-Side, Reduce-Side) - Sorting - Performance Tuning - GenericOptionsParser - ToolRunner - Debugging (LocalJobRunner)

ADMINISTRATION

Multi-Node Cluster Setup using AWS Cloud Machines - Hardware Considerations - Software Considerations - Commands (fsck, job, dfsadmin) - Schedulers in JobTracker - Rack Awareness Policy - Balancing - NameNode Failure and Recovery - Commissioning and Decommissioning a Node - Compression Codecs

HBASE

Introduction to NoSQL - CAP Theorem - Classification of NoSQL - HBase and RDBMS - HBase and HDFS - Architecture (Read Path, Write Path, Compactions, Splits) - Installation - Configuration - Role of ZooKeeper - HBase Shell - Java-Based APIs (Scan, Get, other advanced APIs) - Introduction to Filters - RowKey Design - MapReduce Integration - Performance Tuning - What's New in HBase 0.98 - Backup and Disaster Recovery - Hands-On

HIVE

Architecture - Installation - Configuration - Hive vs RDBMS - Tables - DDL - DML - UDF - UDAF - Partitioning - Bucketing - Metastore - Hive-HBase Integration - Hive Web Interface - Hive Server (JDBC, ODBC, Thrift) - File Formats (RCFile, ORCFile) - Other SQL on Hadoop

PIG

Architecture - Installation - Hive vs Pig - Pig Latin Syntax - Data Types - Functions (Eval, Load/Store, String, DateTime) - Joins - PigServer - Macros - UDFs - Performance - Troubleshooting - Commonly Used Functions

SQOOP

Architecture - Installation - Commands (Import, Hive Import, Eval, HBase Import, Import All Tables, Export) - Connectors to Existing DBs and DW

FLUME

Why Flume? - Architecture - Configuration (Agents) - Sources (Exec, Avro, NetCat) - Channels (File, Memory, JDBC, HBase) - Sinks (Logger, Avro, HDFS, HBase, FileRoll) - Contextual Routing (Interceptors, Channel Selectors) - Introduction to other aggregation frameworks

OOZIE

Architecture - Installation - Workflow - Coordinator - Actions (MapReduce, Hive, Pig, Sqoop) - Introduction to Bundle - Mail Notifications

HADOOP 2.0

Limitations in Hadoop 1.0 - HDFS Federation - High Availability in HDFS - HDFS Snapshots - Other Improvements in HDFS 2 - Introduction to YARN aka MR2 - Limitations in MR1 - Architecture of YARN - MapReduce Job Flow in YARN - Introduction to Stinger Initiative and Tez - Backward Compatibility for Hadoop 1.x

SOLR

Introduction to Information Retrieval - Common use cases - Introduction to Solr and Lucene - Installation - Concepts (Cores, Schema, Documents, Fields, Inverted Index) - Configuration - CRUD operation requests and responses - Java-Based APIs - Introduction to SolrCloud

Cloudera certification assistance will be provided.