Hadoop, MapReduce, HDFS, Spark, Pig, Hive, HBase, MongoDB, Cassandra, Flume — the list goes on! Over 25 technologies.

The world of Hadoop and “Big Data” can be intimidating: hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this course, you’ll not only understand what those systems are and how they fit together, but you’ll also go hands-on and learn how to use them to solve real business problems.

Learn and master the most popular big data technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. You’ll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.

Install and work with a real Hadoop installation right on your desktop with Hortonworks and the Ambari UI

Manage big data on a cluster with HDFS and MapReduce

Write programs to analyze data on Hadoop with Pig and Spark

Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto

Design real-world systems using the Hadoop ecosystem

Learn how your cluster is managed with YARN, Mesos, ZooKeeper, Oozie, Zeppelin, and Hue

Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm

Understanding Hadoop is a highly valuable skill for anyone working at companies with large amounts of data.

Almost every large company you might want to work at uses Hadoop in some way, including Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo. And it’s not just technology companies that need Hadoop; even the New York Times uses Hadoop for processing images.

You’ll find a range of activities in this course for people at every level. If you’re a project manager who just wants to learn the buzzwords, there are web UIs for many of the activities in the course that require no programming knowledge. If you’re comfortable with command lines, we’ll show you how to work with them too. And if you’re a programmer, I’ll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.

You’ll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and the ability to apply Hadoop to real-world problems. Please note that the focus of this course is application development, not Hadoop administration, although you will pick up some administration skills along the way.

Everything you need to know about Big Data: learn Hadoop, HDFS, MapReduce, Hive & Pig by designing a data pipeline.

The main objective of this course is to help you understand the complex architecture of Hadoop and its components, guide you in the right direction to get started, and have you working with Hadoop and its components quickly.

It covers everything you need as a Big Data beginner. Learn about the Big Data market, different job roles, technology trends, the history of Hadoop, HDFS, the Hadoop ecosystem, Hive, and Pig. In this course, we will see how a beginner should get started with Hadoop. The course comes with many hands-on examples that will help you learn Hadoop quickly.

The course has 6 sections and focuses on the following topics:

Big Data at a Glance: Learn about Big Data and the different job roles in the Big Data market. Explore big data salary trends around the globe. Learn about the hottest technologies and their trends in the market.

Getting Started with Hadoop: Understand Hadoop and its complex architecture. Learn the Hadoop ecosystem with simple examples. Learn about the different versions of Hadoop (Hadoop 1.x vs Hadoop 2.x), the different Hadoop vendors in the market, and Hadoop in the cloud. Understand how Hadoop uses the ELT approach. Learn how to install Hadoop on your machine. We will run HDFS commands from the command line to manage HDFS.

Getting Started with Hive: Understand what kind of problem Hive solves in Big Data. Learn its architectural design and working mechanism. Learn about Hive's data models, the different file formats it supports, Hive queries, etc. We will run queries in Hive.

Getting Started with Pig: Understand how Pig solves problems in Big Data. Learn its architectural design and working mechanism. Understand how Pig Latin works in Pig and how it differs from SQL. Includes demos of running different queries in Pig.

Use Cases: Real-life applications of Hadoop are essential for understanding Hadoop and its components, so we will learn by designing a sample data pipeline in Hadoop to process big data. You will also see how companies are adopting a modern data architecture, i.e., the data lake, in their data infrastructure.

Practice: Practice with huge data sets. Learn design and optimization techniques by designing data models and data pipelines using data sets from real-life applications.

Master the Hadoop ecosystem using HDFS, MapReduce, YARN, Pig, Hive, Kafka, HBase, Spark, Knox, Ranger, Ambari, and ZooKeeper.

In this course you will learn Big Data using the Hadoop ecosystem. Why Hadoop? It is one of the most sought-after skills in the IT industry. The average salary in the US is $112,000 per year, up to an average of $160,000 in San Francisco (source: Indeed).

The course is aimed at software engineers, database administrators, and system administrators who want to learn about Big Data. Other IT professionals can also take this course but might have to do some extra research to understand some of the concepts.

You will learn how to use the most popular software in the Big Data industry at the moment, using batch processing as well as real-time processing. This course will give you enough background to discuss real problems and solutions with experts in the industry. Adding these technologies to your LinkedIn profile will make recruiters want to get you interviews at the most prestigious companies in the world.

The course is very practical, with more than 6 hours of lectures. You will want to try everything out yourself, which adds multiple hours of learning. If you get stuck with the technology while trying, support is available: I will answer your messages on the message boards, and we have a Facebook group where you can post questions.

Looking to master Apache Hadoop? This course from Infinite Skills shows you how to work with the Hadoop framework.

This Introduction to Apache Hadoop training course from Infinite Skills will teach you the tools and functions needed to work within this open-source software framework. This course is designed for the absolute beginner, meaning no prior experience with Hadoop is required.

You will start out by learning the basics of Hadoop, including the Hadoop run modes and job types and Hadoop in the cloud. You will then learn about the Hadoop Distributed File System (HDFS), including the HDFS architecture, the Secondary NameNode, and access controls. This video tutorial also covers MapReduce, debugging basics, Hive and Pig basics, and Impala fundamentals. Finally, this course will teach you how to import and export data.

Once you have completed this computer-based training video, you will be fully capable of using the tools and functions you’ve learned to work successfully in Hadoop. Working files are included, allowing you to follow along with the author throughout the lessons.

Learn Complete Big Data (Spark + MongoDB + Pig + Hadoop + Hive + Cassandra + HBase + Redis + Beeline) with Examples.

This course is specially designed for students of all profiles, i.e., developers and testers who want to build a career in the Big Data arena in the real world. It lets them start working with the full range of Big Data tools and technologies, i.e., Hadoop, Hive, Pig, HBase, Cassandra, MongoDB, and Redis. Anyone who is working in or pursuing a Big Data career and wants to move into the testing domain should take this course and go through the complete set of tutorials, which range from beginner to advanced.

It gives detailed information on the commands and queries used in developing and testing these Big Data tools and technologies, including the various databases, covering the complete set of queries and commands a tester needs to move under the bigger umbrella of the Big Data ecosystem.

This course is well structured, covering the different Big Data tools, technologies, and databases, i.e., Hadoop, Hive, HBase, Cassandra, MongoDB, and Redis, with advanced commands presented in a practical manner and separated by topic. Students who want to learn end-to-end Big Data ecosystem technologies, including the various databases, from scratch should take this course.

Learn Hadoop, Pig, Hive and Mahout with a hands-on approach without spending too much time, and boost your career.

This course teaches you Hadoop, Pig, Hive, and Apache Mahout from scratch with an example-based, hands-on approach.

Master the Fundamental Concepts of Big Data, Hadoop and Mahout with ease

Understand the Big Data & Apache Hadoop landscape

Learn HDFS & MapReduce concepts with examples and hands-on labs

Learn Hadoop Streaming

Understand Analytics with Hadoop using Pig and Hive

Machine Learning Concepts

Collaborative Filtering with Apache Mahout

Real-world Recommender System with Mahout and Hadoop

Big Data and Data Science Foundation to empower you with the most specialized skills

The core concepts are emphasized, and the focus is on building a solid foundation in the key Hadoop, MapReduce, and collaborative filtering concepts, on which you can learn just about every other technology in the same space. Preliminary Java and Unix knowledge is expected.

The first few topics focus on the rise of Big Data and how Apache Hadoop fits in. You will focus on the fundamentals of Hadoop and its core components: HDFS and MapReduce. You will then set up and experiment with Hadoop and HDFS, and then dive deep into MapReduce programming with hands-on examples. You will also spend time on Combiners and Partitioners and how they can help, as well as on Hadoop Streaming: a tool that helps non-Java professionals leverage the power of Hadoop and build proofs of concept (POCs) on it.
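Hadoop Streaming runs any program that reads records on standard input and writes tab-separated key/value lines to standard output, and the framework sorts the mapper's output by key before it reaches the reducer. As a minimal sketch of that contract, here is a word count in Python; the function names, the `map`/`reduce` command-line arguments, and the local `simulate` helper (which stands in for Hadoop's sort so the logic can be tried without a cluster) are all illustrative assumptions, not part of the course material.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a Streaming mapper would print."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word.

    Relies on the input being sorted by key, which Hadoop guarantees
    between the map and reduce stages.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def simulate(lines):
    """Local stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: `python wordcount.py map` or `python wordcount.py reduce`,
    # reading records from stdin and writing tab-separated lines to stdout.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Submitted through the `hadoop-streaming` JAR, the two stages would be passed with the `-mapper` and `-reducer` options along with `-input` and `-output` paths; the JAR's location varies by distribution.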

Once you have a solid foundation in HDFS and MapReduce, the next couple of topics explore higher-level components of the Hadoop ecosystem: Hive and Pig. You will go into the details of both by installing them and working through examples. Hive and Pig can make your life easier by shielding you from the complexity of writing MapReduce jobs while still leveraging the parallel processing ability of the Hadoop framework.

In the next few lectures you will look at something very interesting: Apache Mahout and machine learning. Apache Mahout is a Java library that lets you write machine learning applications with ease. You will learn the basics of machine learning and go deeper into collaborative filtering and recommender systems, something Mahout excels at.

You will look at some similarity algorithms, understand their real-life implications, and apply them as we build a real-world movie recommender system together using Mahout and Hadoop.
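The course does not say which similarity measures it covers, so as one illustrative example, here is cosine similarity between two movies' rating vectors, a common choice in item-based collaborative filtering. This is a plain-Python sketch, not Mahout's API; the `user -> rating` dictionary format and the `most_similar` helper are hypothetical, and this variant computes the norms over co-rated users only, which is one of several conventions.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two items, using only users who rated both.

    ratings_a / ratings_b map user id -> rating for one item (hypothetical format).
    Returns 0.0 when the items share no raters or a vector is all zeros.
    """
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0
    dot = sum(ratings_a[u] * ratings_b[u] for u in common)
    norm_a = math.sqrt(sum(ratings_a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings_b[u] ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, items, top_n=2):
    """Rank the other items by cosine similarity to `target` (hypothetical helper)."""
    scores = [
        (name, cosine_similarity(items[target], ratings))
        for name, ratings in items.items()
        if name != target
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]
```

A recommender built on this would suggest, for a movie a user liked, the movies with the highest similarity scores that the user has not rated yet.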

The course covers all the must-know topics like HDFS, MapReduce, YARN, Apache Pig, and Hive, and we go deep in exploring the concepts. We don’t just stop at the easy concepts; we take it a step further and cover important and complex topics like file formats, custom Writables, input/output formats, troubleshooting, optimizations, etc.

All concepts are backed by interesting hands-on projects, like analyzing the Million Song Dataset to find less familiar artists with hot songs, ranking pages using Wikipedia page dumps, and simulating Facebook's mutual friends feature, to name just a few.

A hands-on workout in Hadoop, MapReduce and the art of thinking “parallel”.

This course is a zoom-in, zoom-out, hands-on workout involving Hadoop, MapReduce and the art of thinking parallel.

Zoom-in, Zoom-out: This course is both broad and deep. It covers the individual components of Hadoop in great detail, and also gives you a higher-level picture of how they interact with each other.

Hands-on workout involving Hadoop, MapReduce: This course will get you hands-on with Hadoop very early on. You’ll learn how to set up your own cluster using both VMs and the Cloud. All the major features of MapReduce are covered, including advanced topics like Total Sort and Secondary Sort.

The art of thinking parallel: MapReduce completely changed the way people thought about processing Big Data. Breaking down any problem into parallelizable units is an art. The examples in this course will train you to “think parallel”.
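To make “thinking parallel” concrete: a MapReduce job is a map function applied independently to each record, a shuffle that groups the emitted values by key, and a reduce function applied independently to each key's group. The sketch below simulates that pipeline in a single Python process and uses it for two of the exercises listed below, bigram counts and an inverted index. The `run_mapreduce` driver and the `(doc_id, text)` record format are hypothetical illustrations, not a Hadoop API.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process simulation of map -> shuffle -> reduce.

    map_fn(record) yields (key, value) pairs; each call is independent, which is
    what lets Hadoop run many mappers in parallel. The shuffle groups values by
    key, and reduce_fn(key, values) runs independently for each key.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # shuffle: collect all values per key
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


def bigram_map(record):
    """Emit ((w1, w2), 1) for each pair of adjacent words in one document."""
    _doc_id, text = record
    words = text.lower().split()
    for first, second in zip(words, words[1:]):
        yield (first, second), 1


def count_reduce(_key, values):
    """Frequency of one bigram across the whole corpus."""
    return sum(values)


def index_map(record):
    """Emit (word, doc_id) once for every distinct word in a document."""
    doc_id, text = record
    for word in set(text.lower().split()):
        yield word, doc_id


def postings_reduce(_key, doc_ids):
    """Posting list: sorted, de-duplicated ids of the documents containing the word."""
    return sorted(set(doc_ids))
```

In a real Hadoop job the shuffle is distributed across the cluster and each reducer receives its values as an iterator; because addition is associative, a combiner could also pre-sum bigram counts on the map side.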

What’s Covered:

Using MapReduce to

Recommend friends on a social networking site: Generate Top 10 friend recommendations using a collaborative filtering algorithm.

Build an Inverted Index for Search Engines: Use MapReduce to parallelize the humongous task of building an inverted index for a search engine.

Generate Bigrams from text: Generate bigrams and compute their frequency distribution in a corpus of text.

Build your Hadoop cluster:

Install Hadoop in Standalone, Pseudo-Distributed and Fully Distributed modes

Set up a Hadoop cluster using Linux VMs.

Set up a cloud Hadoop cluster on AWS with Cloudera Manager.

Understand HDFS, MapReduce and YARN and their interaction

Customize your MapReduce Jobs:

Chain multiple MR jobs together

Write your own custom Partitioner

Total Sort: Globally sort a large amount of data by sampling input files

Secondary sorting

Unit tests with MRUnit

Integrate with Python using the Hadoop Streaming API

...and of course all the basics:

MapReduce: Mapper, Reducer, Sort/Merge, Partitioning, Shuffle and Sort

HDFS & YARN: Namenode, Datanode, Resource manager, Node manager, the anatomy of a MapReduce application, YARN Scheduling, Configuring HDFS and YARN to performance tune your cluster.
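The Total Sort item in the list above rests on one idea: if each reducer only receives keys from its own key range, and the ranges are ordered, then concatenating the individually sorted reducer outputs yields a globally sorted result. Hadoop's Java API provides `InputSampler` and `TotalOrderPartitioner` for this; the sketch below only illustrates the idea in Python, with hypothetical helper names, by picking split points from a random sample and routing each key with a binary search.

```python
import bisect
import random


def choose_split_points(keys, num_reducers, sample_size=1000, seed=0):
    """Pick num_reducers - 1 split points from a random sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced quantiles of the sample approximate those of the full data.
    return [sample[(i * len(sample)) // num_reducers] for i in range(1, num_reducers)]


def range_partition(key, split_points):
    """Reducer index for a key: the number of split points <= key."""
    return bisect.bisect_right(split_points, key)


def total_sort(keys, num_reducers):
    """Range-partition the keys, sort each partition, and concatenate the results."""
    if not keys:
        return []
    splits = choose_split_points(keys, num_reducers)
    partitions = [[] for _ in range(num_reducers)]
    for key in keys:
        partitions[range_partition(key, splits)].append(key)
    # Each "reducer" sorts only its own partition...
    sorted_parts = [sorted(part) for part in partitions]
    # ...and because the ranges are ordered, the concatenation is globally sorted.
    return [key for part in sorted_parts for key in part]
```

Sampling matters because a plain hash partitioner spreads keys without regard to order, while evenly spaced split points keep the reducers' workloads roughly balanced.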

Learn from Basics to Advanced Concepts related to Big Data and Hadoop in a Simplified Way.

Course Overview:

The most in-demand and sought-after skill of the decade.

Secure your career by learning Big Data and Hadoop.

Course taught using a very innovative and simplified method of teaching.

Course covers all the topics related to Hadoop Administration as well as Hadoop Development.

Course Description: In this course, you will learn all the concepts and terminology related to Big Data and Hadoop, such as the NameNode, Secondary NameNode, DataNode, JobTracker, and TaskTracker, along with related concepts such as Rack Awareness and NameNode Federation, explained in a simplified way. It also explains how data is managed by the Hadoop Distributed File System (HDFS) and walks through the process of reading and writing data on HDFS. Later in the course, you will also learn how to add a DataNode or TaskTracker to an existing cluster or remove one from it, how to check HDFS for errors, how to balance data across DataNodes, and so on. You will also learn the concepts of programming in MapReduce and write programs using MapReduce. Upon completing this course, you will have a clear idea of all the concepts related to Hadoop, which should be sufficient to help you start administering a Hadoop cluster as well as developing MapReduce applications for it.
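Rack awareness is easiest to see in HDFS's default replica placement for a replication factor of three, as described in the Apache HDFS architecture guide: one replica on the writer's node, a second on a node in a different rack, and the third on a different node in that same remote rack. The sketch below mimics that rule over a hypothetical `node -> rack` topology map. It is a simplification under stated assumptions: it assumes the writer is itself a DataNode and that every remote rack has at least two nodes, and it ignores the load and free-space checks the real placement policy performs.

```python
import random


def place_replicas(writer, topology, seed=None):
    """Choose 3 DataNodes for a block, following the default rack-aware rule.

    topology maps node name -> rack name (hypothetical format).
    Assumes the writer is a DataNode and every remote rack has >= 2 nodes.
    """
    rng = random.Random(seed)
    local_rack = topology[writer]
    first = writer  # replica 1: the writer's own node
    remote_nodes = [n for n, rack in topology.items() if rack != local_rack]
    second = rng.choice(remote_nodes)  # replica 2: some node on another rack
    remote_rack = topology[second]
    same_remote = [
        n for n, rack in topology.items() if rack == remote_rack and n != second
    ]
    third = rng.choice(same_remote)  # replica 3: a different node on that remote rack
    return [first, second, third]
```

Spreading replicas over two racks means a whole-rack failure cannot take out every copy of a block, while keeping two of the three replicas on one rack limits cross-rack write traffic.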