A list of 5 brand new open source big data projects of 2015.

Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program optimization.

Burrow is an open source big data project from LinkedIn. It is a monitoring companion for Apache Kafka that provides consumer lag checking as a service without the need to specify thresholds. It monitors committed offsets for all consumers and calculates the status of those consumers on demand. An HTTP endpoint is provided to request that status, as well as other Kafka cluster information. There are also configurable notifiers that can send status out via email or HTTP calls to another service.

Aerosolve follows from Airbnb’s vision that humans partnering with a machine in a symbiotic way can exceed the capabilities of humans or machines alone. The project focuses on improving the understanding of data sets by helping people interpret complex data with easy-to-understand models. Instead of hiding meaning beneath many layers of model complexity, Aerosolve models expose data to the light of understanding. For example, people can easily determine the negative correlation between the price of a listing in a market and the demand for that listing just by inspecting an image of the model. This makes the model easy to interpret while still maintaining a lot of capacity to learn.

Storm has long served as the main platform for real-time analytics at Twitter. However, as the scale of data being processed in real time at Twitter has increased, along with the diversity and number of use cases, many limitations of Storm have become apparent.
Twitter needed a system that scales better, has better debuggability, has better performance, and is easier to manage, all while working in a shared cluster infrastructure. Twitter considered various alternatives to meet these needs and in the end concluded that it needed to build a new real-time stream data processing system. The result, Heron, is now the de facto stream data processing engine inside Twitter.

Airflow is a platform to programmatically author, schedule, and monitor workflows. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative. Airflow is not a data streaming solution: tasks do not move data from one to the other. Airflow is not in the Spark Streaming or Storm space; it is more comparable to Oozie or Azkaban.
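
To make the "workflows as code" idea concrete, here is a minimal sketch in plain Python. It does not use Airflow's API. The task names, the `WORKFLOW` dictionary, and the `run_workflow` helper are all hypothetical, and the ordering relies on the standard library's `graphlib` module (Python 3.9 or later). It only illustrates the general pattern of declaring tasks and their dependencies in code and running them in dependency order, with no data handed from one task to the next.

```python
from graphlib import TopologicalSorter

# Records which tasks ran, in order. Tasks share no data with each other;
# this list exists only so the run can be inspected afterwards.
log = []


# Hypothetical tasks: each does its own work and returns nothing.
def extract():
    log.append("extract")


def transform():
    log.append("transform")


def load():
    log.append("load")


def report():
    log.append("report")


# The workflow as code: task name -> (callable, names of upstream tasks).
# Because this is ordinary source code, it can be versioned, reviewed, and tested.
WORKFLOW = {
    "extract": (extract, set()),
    "transform": (transform, {"extract"}),
    "load": (load, {"transform"}),
    "report": (report, {"load"}),
}


def run_workflow(workflow):
    """Run each task once, only after all of its upstream tasks have run."""
    graph = {name: deps for name, (_, deps) in workflow.items()}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        task, _ = workflow[name]
        task()
    return order
```

Calling `run_workflow(WORKFLOW)` runs the four tasks in dependency order and returns that order as a list. A real scheduler such as Airflow adds scheduling, retries, and monitoring on top of this kind of dependency graph.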