When a bunch of open source devotees get together, the results can be quite astounding. In eight short years, Cloudera has become number five on the Forbes Cloud 100 list. As for the “pay it forward” reference – see the comments at the end.

Cloudera’s Daniel Ng, senior director of Marketing – APAC (based in Singapore), was in town to visit the company’s new Sydney office, which joins its long-established Melbourne one. At a media briefing, Ng, of course, spoke of the meteoric rise of Cloudera and its Hadoop implementation, but he also spoke of his passion for coaching professionals, with the end goal of bringing on the next generation of data, marketing, and business leaders.

Ng is the architect of “The Base of a Smart Nation” (Cloudera BASE) programme, which has become a guiding principle behind public and private sector participation to deliver Big Analytics Skills Enablement (BASE).

The programme has been enthusiastically adopted by Cloudera and is being rolled out across Singapore and Malaysia. BASE involves an ecosystem of educational institutions, training partners, government organisations, and the technology community at large working together to train, nurture and eventually produce a new generation of data professionals who can add value to the future economy.

Let’s get Cloudera out of the way first. Ng says its growth has been meteoric and it's quickly moving to “pre-IPO” stage.

In 2008, Mike Olson (Oracle), Christophe Bisciglia (Google), Amr Awadallah (Yahoo!) and Jeff Hammerbacher (Facebook) started Cloudera. All four had worked in big data and analytics and knew the field needed some serious disruption.

In 2009, they convinced Doug Cutting (former chairman, Apache Software Foundation), who had written the initial Hadoop software in 2004 (the genesis of the idea came from Google File System), to join them.

Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from x86 commodity hardware. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common and should be handled automatically by the framework. It stores data in the Hadoop Distributed File System (HDFS).

It was named after a toy elephant belonging to Cutting's son, hence further development has had a largely animal/nature nomenclature – Apache Pig, Apache Hive, Apache HBase, Apache Phoenix, Apache Spark, Apache ZooKeeper, Cloudera Impala, Apache Flume, Apache Sqoop, Apache Oozie, and Apache Storm.

“We know Hadoop better than anyone, and we contribute a lot of our IP to the open source community. Enterprise loves open source if it can be implemented and that is where we come in. The software may be low or no cost, but it still needs experts to configure and support it to a standard expected by enterprise,” said Ng.

The concept of open source Hadoop has been very attractive to big “proprietary players” like Oracle (Big Data Appliance), Dell EMC (DSSD support), Intel (a US$740 million investment, and it ceased its own Hadoop implementation), SAS Institute (partnership), Capgemini (marketing partnership), Teradata (marketing partnership), and, more recently, Microsoft Azure (full cloud support).

But on to Ng’s passion as the “father of the BASE programme.”

"The skills to analyse Big Data reside in younger people but the skills to make sense of the patterns and insights reside with older, more experienced people. The BASE initiative came from the premise that Big Data is or will underpin everything and we don’t have enough professionals to extract all it promises," he said.

"BASE aims to pull together nation states, education, business and individuals and fast track the output of data analysts. Instead of say seven years to implement and even longer to produce results we believe you can have a tangible outcome in two years."

The infographic below shows Ng’s concept, which has been enthusiastically supported by the Singapore Smart Nation Programme Office (SNPO) and the Malaysia Digital Economy Corporation (MDEC).

The Ministry of Communications and Information (MCI) in Singapore also recently announced that it is setting aside S$120 million to support both current and future infocomm professionals, including in high-demand areas such as data analytics. Similarly, MDEC in Malaysia has said that it aims to produce 16,000 data professionals and 1,500 data scientists by 2020. BASE can help achieve that.

But Ng said the real benefit was the result – producing enough skilled professionals in as short a time as possible. "Most importantly, it is the benefit to the end user (organisations needing data professionals to harness big data analytic skills and technology). For example, it can be a start-up business focusing on building a smartphone app; such as one that uses big data collected from wearable devices, used to monitor and ensure the safety and well-being of elderly people who are staying alone. Ultimately, it allows the user or related parties to make an informed decision," he said.

"BASE is all about 'upwards' connections adding value on top of value. We create more value as we connect the dots (range of key players). Cloudera is a key player — the catalyst — because part of the speed to market is that it has created the curriculums, it provides its Cloudera Enterprise and CDH software for free to participating institutions and a range of other services as needed.

"Businesses like Dell, Intel, Red Hat and Microsoft have joined the BASE initiative in Singapore and Malaysia. Academic institutions like the National University of Singapore (NUS), Universiti Tunku Abdul Rahman (UTAR), Multimedia University (MMU) and Institute of Technical Education (ITE) have also joined to provide training in data skills. We have over 100 tertiary institutions on board now.

"A key part of the BASE program is to encourage internships. They are good for the 'grad' because they involve a few months of real-world experience and good for the company because they can 'try before they buy' We do not expect any difficulty in all those new data professionals gaining meaningful employment. And because of that, it is a great incentive to young people to start to consider a career in data science – something not currently seen as in vogue."

Ng is already looking beyond the BASE programme into the headier world of master's and PhD degrees in data science, but admits that is a much longer-term goal. "We need analysts now and scientists later."

If you are interested in the BASE programme, he would love to hear from you.

His initial position paper is here.

Comment

It feels really good to hear about initiatives like BASE and how one man, backed by Cloudera, can make a difference – it is real “pay it forward” stuff, something I strongly believe in.

If Daniel Ng is half as good at his marketing job for Cloudera as he has been so far with its adoption of the BASE programme, I see nothing but blue skies ahead for both this open-source devotee and Cloudera.