Featured Articles

by Tanmay Deshpande

Complex event processing (CEP) is a technique for analyzing streams of disparate events that arrive at high frequency and must be handled with low latency. In the oil and gas industry, such events might be sensor data from drilling equipment, or from an upstream assembly reporting temperature, pressure, and similar readings.

by Vaishnavi Agrawal

Today we constantly feel the shadow cast by the Hadoop skill gap, so much so that we now regard it as a phenomenon in its own right. It is believed that restrictive and demanding training standards leave professionals unmotivated to extend their skill sets into trending technologies.

by Tanmay Deshpande

This article explains how to generate real-time alerts and notifications on certain conditions using the Elasticsearch Watcher plugin. The step-by-step guide covers alert generation from plugin installation onward.

by Tanmay Deshpande

This series of articles explains how to install Elasticsearch, Logstash, and Kibana on Windows. It then explains how to load data from Apache log files into Elasticsearch using the GeoCity database, so that IP addresses from the logs are automatically mapped to countries and cities of the world.

by Tanmay Deshpande

Using Hadoop for analytics requires loading data into Hadoop clusters and processing it in conjunction with data that resides on enterprise application servers and databases. Loading gigabytes or terabytes of data into HDFS from production databases, or accessing that data from MapReduce applications, is a challenging task. We have to consider data consistency, the overhead of running these jobs on production systems, and whether the process will be efficient at all. Using batch scripts to load data is an inefficient approach.

by Dipayan Dev

According to a recent IBM estimate, 2.5 quintillion bytes of data are created every day, so much that 90% of the data in the world today has been created in the last two years. It is a mind-boggling figure, and the irony is that we feel less informed in spite of having more information available. The surprising growth in data volumes has badly affected today's businesses. Online users create content such as blog posts, tweets, social networking interactions, and photos, and servers continuously log messages about what those users are doing.

by Rakesh Porwal

Welcome to the unit on Hadoop Fundamentals. Before we examine Hadoop components and architecture, let's review some of the terms used in this discussion. A node is simply a computer, typically non-enterprise, commodity hardware that holds data. Nodes are grouped into racks: a rack is a collection of 30 or 40 nodes that are physically stored close together and are all connected to the same network switch. Network bandwidth between any two nodes in the same rack is greater than the bandwidth between two nodes on different racks. A Hadoop cluster is a collection of racks.
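
The relationship between these terms can be sketched in a few lines of Python. This is only an illustrative model, not Hadoop's own API: the `Node`, `Rack`, and `Cluster` classes and the node and rack names are hypothetical, chosen to mirror the definitions above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """A single computer (hypothetical model, not a Hadoop class)."""
    name: str


@dataclass
class Rack:
    """A group of nodes connected to the same network switch."""
    name: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Cluster:
    """A collection of racks."""
    racks: List[Rack] = field(default_factory=list)

    def rack_of(self, node_name: str) -> Optional[str]:
        # Return the name of the rack holding the node, or None if unknown.
        for rack in self.racks:
            if any(node.name == node_name for node in rack.nodes):
                return rack.name
        return None

    def same_rack(self, a: str, b: str) -> bool:
        # Nodes on the same rack share a switch, so traffic between them
        # has more bandwidth available than traffic across racks.
        rack_a = self.rack_of(a)
        return rack_a is not None and rack_a == self.rack_of(b)


# Illustrative topology: two racks, three nodes (names are made up).
cluster = Cluster(racks=[
    Rack("rack1", [Node("node1"), Node("node2")]),
    Rack("rack2", [Node("node3")]),
])

print(cluster.same_rack("node1", "node2"))  # True
print(cluster.same_rack("node1", "node3"))  # False
```

Here `same_rack` answers the question that matters for bandwidth: whether two nodes sit behind the same switch or must communicate across racks.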