In this post I would like to share a short review of two articles and three papers that contain many useful ideas about how to manage data science projects.

.

Content

This paper is about the technical challenges that must be overcome to realize the potential benefits of Big Data. The most interesting part is the description of the five phases of the Big Data life cycle. The paper also describes the challenges in Big Data analysis. Having described the multiple phases of the analysis pipeline, the authors turn to some common challenges that underlie many, and sometimes all, of these phases because of the characteristics of Big Data.

Phases:

1. Data acquisition
2. Information extraction and cleaning
3. Data integration, aggregation, and representation
4. Modeling and analysis
5. Interpretation

Challenges (common to many of the phases):

- Heterogeneity
- Inconsistency and incompleteness
- Privacy and data ownership
- Scale
- Timeliness
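To make the five phases more concrete, here is a toy Python sketch that chains one function per phase. Everything in it is hypothetical and chosen only for illustration: the two in-memory "sources", their field names, and the function names. The two sources use different schemas and one record is missing a value, which touches the heterogeneity and incompleteness challenges. A real Big Data pipeline would run these stages on distributed infrastructure rather than on small Python lists.

```python
from statistics import mean
from typing import Iterable

# Hypothetical raw inputs from two heterogeneous sources: the field names
# differ and one value is missing (heterogeneity and incompleteness).
SOURCE_A = [{"id": 1, "temp_c": 21.5}, {"id": 2, "temp_c": None}]
SOURCE_B = [{"sensor": 3, "temperature": 23.0}, {"sensor": 4, "temperature": 19.5}]


def acquire() -> list[tuple[str, dict]]:
    """Phase 1, data acquisition: collect raw records, tagged with their source."""
    return [("A", r) for r in SOURCE_A] + [("B", r) for r in SOURCE_B]


def extract_and_clean(raw: Iterable[tuple[str, dict]]) -> list[dict]:
    """Phase 2, extraction and cleaning: map both schemas to one shape, drop incomplete rows."""
    cleaned = []
    for source, rec in raw:
        if source == "A":
            item = {"id": rec["id"], "temp": rec["temp_c"], "source": source}
        else:
            item = {"id": rec["sensor"], "temp": rec["temperature"], "source": source}
        if item["temp"] is not None:
            cleaned.append(item)
    return cleaned


def integrate(records: list[dict]) -> dict[str, list[float]]:
    """Phase 3, integration/aggregation/representation: group readings by source."""
    grouped: dict[str, list[float]] = {}
    for rec in records:
        grouped.setdefault(rec["source"], []).append(rec["temp"])
    return grouped


def analyze(grouped: dict[str, list[float]]) -> dict[str, float]:
    """Phase 4, modeling and analysis: here simply the mean reading per source."""
    return {source: mean(values) for source, values in grouped.items()}


def interpret(model: dict[str, float]) -> str:
    """Phase 5, interpretation: turn the analysis result into a readable statement."""
    warmest = max(model, key=model.get)
    return f"Source {warmest} reports the highest mean temperature ({model[warmest]:.2f})."


if __name__ == "__main__":
    # Run the phases in order, each consuming the previous phase's output.
    print(interpret(analyze(integrate(extract_and_clean(acquire())))))
```

Running the script prints `Source A reports the highest mean temperature (21.50).` Keeping each phase as a separate function mirrors the paper's point that the challenges cut across phases. For example, the schema mismatch surfaces in phase 2, but its origin is in phase 1.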

Take away

An overview of the Big Data life cycle as a five-stage pipeline, covering both the major challenges that cut across the pipeline and the challenges specific to each stage.

Keywords

Big Data | Analysis | Challenges

.

Data Science is an emerging field with a significant research focus on improving the techniques available to analyze data. However, there has been much less focus on how people should work together on a data science project.

In this paper, the authors report on the results of an experiment comparing four different methodologies to manage and coordinate a data science project.

The authors first introduce a model for comparing project management methodologies and then report on the results of their experiment. The results demonstrate significant differences depending on the methodology used. Agile Kanban was the most effective and, surprisingly, Agile Scrum was the least effective.

Take away

A detailed explanation of the data science process and a comparative analysis of the results obtained with Agile Kanban, CRISP, Agile Scrum and the Baseline. The paper closes with a comment on why Agile Kanban was the most effective methodology.

Keywords

project management methodologies | Agile Kanban | CRISP | Agile Scrum | Baseline | Team work | Big data | analytics | project management

.

The guiding question of this article is what approach experienced business and analytics project managers take in their projects.

The goal was to fill in gaps in management’s understanding of how project managers involved in analytics projects can contribute to the new intelligent enterprise.

They found that project managers’ most important qualities can be sorted into five areas: (1) having a delivery orientation and a bias toward execution, (2) seeing value in use and value of learning, (3) working to gain commitment, (4) relying on intelligent experimentation and (5) promoting smart use of information technology.

Take away

Good practices to follow in data science projects.

Valuable recommendations drawn from the experience of seasoned project managers.

Keywords

Business analytics | project manager | data science project

.

This paper discusses an integrated methodology to structure and formalize business requirements in large data-intensive projects, e.g. data warehouse implementations. The methodology turns these requirements into precise and unambiguous data definitions that facilitate harmonization and the assignment of data governance responsibilities.

The authors place a business information model at the center and use it end-to-end: in analysis, design, development and testing, and in the data quality checks performed by data stewards. They also show that the approach is suitable beyond traditional data warehouse environments by applying it to big data landscapes and data science initiatives, where business requirements analysis is often neglected. Proper tool support has turned out to be indispensable in many real-world settings, so they also discuss software requirements and their implementation in the Accurity Glossary tool.
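The following minimal Python sketch illustrates the idea of one business term carrying a precise definition, an accountable data steward, and the data quality checks that steward runs. It is a hypothetical illustration only and is not the paper's information model or the Accurity Glossary data model. The `BusinessTerm` class, the "Customer ID" term, the steward name and the quality rule are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BusinessTerm:
    """Hypothetical glossary entry: one business term in an information model."""

    name: str
    definition: str    # precise, unambiguous business definition
    data_steward: str  # person or role accountable for this term's data quality
    quality_rules: list[Callable[[dict], bool]] = field(default_factory=list)

    def check(self, rows: list[dict]) -> list[dict]:
        """Return the rows that violate at least one of the term's quality rules."""
        return [row for row in rows if not all(rule(row) for rule in self.quality_rules)]


# Hypothetical term for a banking context; the rule requires a non-empty string ID.
customer_id = BusinessTerm(
    name="Customer ID",
    definition="Unique identifier assigned to a customer at onboarding; never reused.",
    data_steward="retail-data-steward",
    quality_rules=[
        lambda row: isinstance(row.get("customer_id"), str) and row["customer_id"] != ""
    ],
)

rows = [{"customer_id": "C001"}, {"customer_id": ""}, {"customer_id": None}]
violations = customer_id.check(rows)
print(f"{customer_id.name} ({customer_id.data_steward}): {len(violations)} violation(s)")
```

Here the empty string and the `None` value are flagged, so the script reports two violations. Because the definition, the owner and the checks live on the same object, the term that states the requirement is the same one used for quality checking.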

The approach is evaluated on a large banking data warehouse project in which the authors are currently involved.

Take away

Data science and Big Data governance project effectiveness.

Keywords

Data Modeling | Project Methodology | Data Governance | Metadata | Information Catalog

.

Big Data is characterized by the five V's: Volume, Velocity, Variety, Veracity and Value. Research on Big Data, that is, the practice of gaining insights from it, challenges the intellectual, process, and computational limits of an enterprise. Choosing the right toolset requires careful consideration of a large software ecosystem.

Powerful algorithms exist, but two factors make it difficult for Big Data teams to set expectations or even create valid project plans: the exploratory and often ad hoc nature of analytic demands, and a distinct lack of established processes and methodologies. The exponential growth of the data being generated exceeds the human capacity to process it. This compels the development of automated computing methods that require significant and expensive computing power to scale effectively. In this paper, the authors characterize data-driven practice and research and explore how effective methods for systematizing such practice and research might be designed [19, 22]. Brief case studies ground their conclusions and insights.

Keywords

Big Data | Business data processing | Large software ecosystem | Data-driven research | Data science | Blogs | Data mining | Predictive models | Feature extraction | Distributed databases | Agile | Methodology | Experimental methods

.

Citations