Big Data

This series will get you up to speed on Big Data and Hadoop. Topics include installing, configuring, and managing single- and multi-node Hadoop clusters, configuring and managing HDFS, writing MapReduce jobs, and working with many of the projects in the Hadoop ecosystem, such as Pig, Hive, HBase, Sqoop, and ZooKeeper. The series also covers configuring Hadoop in the cloud and troubleshooting a multi-node Hadoop cluster.
The Apache Hadoop software is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Hadoop is a software framework for storing and processing Big Data. It is an open-source tool built on the Java platform, and it focuses on improved data-processing performance on clusters of commodity hardware.
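To make those "simple programming models" concrete, here is a minimal sketch of the classic word-count job, written against the MRv2 (org.apache.hadoop.mapreduce) API. The class name and the input/output paths are illustrative assumptions, not part of the series itself.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emits (word, 1) for every token in its input split.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reducer: sums the per-word counts produced by all mappers.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // safe as a combiner: summing is associative
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged as a JAR, the same code runs unchanged on a single-node installation or a full YARN cluster, e.g. hadoop jar wordcount.jar WordCount /input /output (paths assumed for illustration).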
  • Master the concepts of the Hadoop Distributed File System (HDFS) and the MapReduce framework
  • Set up a Hadoop Cluster
  • Understand Data Loading Techniques using Sqoop and Flume
  • Program in MapReduce (both MRv1 and MRv2)
  • Learn to write Complex MapReduce programs
  • Program in YARN (MRv2)
  • Perform Data Analytics using Pig and Hive
  • Implement HBase, including MapReduce integration, advanced usage, and advanced indexing (see the HBase client sketch after this list)
  • Have a good understanding of ZooKeeper service
  • New features in Hadoop 2.0 — YARN, HDFS Federation, NameNode High Availability
  • Implement Best Practices for Hadoop Development and Debugging
  • Implement a Hadoop Project
  • Work on a real-life Big Data analytics project and gain hands-on project experience
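As a taste of the HBase material referenced in the list above, the sketch below uses the HBase Java client to write and read a single cell. The table name ("users"), column family ("info"), and row/column values are assumptions for illustration; creating tables and configuring the connection (via hbase-site.xml and ZooKeeper) are covered in the series.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseQuickstart {
      public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath; the ZooKeeper quorum comes from there.
        Configuration conf = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) { // assumed table

          // Write one cell: row "user1", column family "info", qualifier "email".
          Put put = new Put(Bytes.toBytes("user1"));
          put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
                        Bytes.toBytes("user1@example.com"));
          table.put(put);

          // Read the same cell back.
          Get get = new Get(Bytes.toBytes("user1"));
          Result result = table.get(get);
          byte[] email = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
          System.out.println("email = " + Bytes.toString(email));
        }
      }
    }

Note that the client never talks to HDFS directly: it locates the right region server through ZooKeeper, which is one reason a healthy ZooKeeper service matters for HBase deployments.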