Jahangir Mohammed provided the most detailed response to our question on Quora (September 2, 2012). We wanted to find out whether there is a systematic process or method for the collection, analysis and presentation of big data. Here is Jahangir’s answer:

“I am inclined to say the approach is definitely systematic, but there are lots of options and one needs to figure out what is the best implementation for their specific use case.

Data collection:

There are various distributed data collection and aggregation frameworks, like Flume[1], Chukwa[2] and Scribe[3], which can be used to collect and aggregate data in real time from lots of servers.

If the data already sits in an RDBMS, one can use Sqoop[4] to transfer it between the RDBMS and a big-data framework like Hadoop[5] (specifically HDFS).

Data analysis:

Hadoop[5] is a well-known framework that allows distributed processing and analysis of big data. There are a couple of other frameworks, like Cascalog[6], Storm[7] (stream processing), some MPI frameworks, some BSP frameworks (like Apache Hama[8]) and an open-source implementation of Dremel (currently being worked on), all of which are created to crunch big data. Also, there is Amazon’s EMR[9] or Google’s BigQuery[10] from a cloud perspective, but to be explicit, nothing stops one from running any of the open-source implementations in the cloud.

Presentation/data visualization:

This can range from a home-grown solution to a commercial product. Some of the offerings out there, like Datameer[11] and BigQuery[10], do offer some visualizations, dashboards, Excel capabilities and so forth.”

[1]. http://www.cloudera.com/blog/201…

[2]. http://incubator.apache.org/chukwa/

[3]. https://github.com/facebook/scribe

[4]. http://sqoop.apache.org/

[5]. http://hadoop.apache.org/

[6]. https://github.com/nathanmarz/ca…

[7]. https://github.com/nathanmarz/storm

[8]. http://hama.apache.org/

[9]. http://aws.amazon.com/elasticmap…

[10]. https://developers.google.com/bi…

[11]. http://www.datameer.com/
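
To make the three stages in Jahangir’s answer concrete, here is a minimal single-machine sketch in plain Python. It is not how Flume, Hadoop or Datameer are actually used; it only imitates the shape of the pipeline (collect, then map/shuffle/reduce, then present). The server names, log lines and function names are all made up for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log lines from two servers. In a real deployment a collector
# such as Flume, Chukwa or Scribe would gather these; here they are in memory.
SERVER_LOGS = {
    "web-1": ["GET /home 200", "GET /cart 500", "GET /home 200"],
    "web-2": ["GET /home 200", "GET /cart 200"],
}


def collect(logs_by_server):
    """Collection: aggregate lines from every server into one stream."""
    return chain.from_iterable(logs_by_server.values())


def map_phase(line):
    """Map: emit a (path, 1) pair for each request line."""
    _method, path, _status = line.split()
    yield path, 1


def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_pipeline(logs_by_server):
    """Analysis: run collect -> map -> shuffle -> reduce and return the counts."""
    lines = collect(logs_by_server)
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle(pairs))


def present(counts):
    """Presentation: render the counts as a plain-text table, largest first."""
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"{path:<8}{count:>4}" for path, count in rows)


if __name__ == "__main__":
    print(present(run_pipeline(SERVER_LOGS)))
```

In a real system each stage would be distributed across many machines, which is what the frameworks listed above provide; the sketch only shows how the stages hand data to one another.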

Feel free to add your views in the comments section.

Special thanks to Jahangir Mohammed and Vijay Kamath who both took time out to provide answers to our question.