Okay, before we get started I have to declare the real intent for posting this piece. It is to get you to join The Big Data Contrarians professional group here on LinkedIn.

To apply to join the best Big Data community on the web, simply navigate to this address http://www.linkedin.com/grp/home?gid=8338976 (or paste it into your browser) and request membership. The process is quick and painless, and well worth the effort.

Now for the rest of the news…

There are many common misconceptions amongst the Big Data collective about Data Warehousing. These fallacies need clearing up in order to avoid unnecessary confusion, avoidable risks and the damaging perpetuation of disinformation.

Big Picture

In the dim and distant past of business IT, the best information that senior executives could expect from their computer systems was operational reporting, typically indicating what went right, what went wrong, or something in between. Applied statistical brilliance made up for what data processing lacked in processing power, but only up to a point. Even heavy-lifting statistics requires computing horsepower, which in those days meant serious capital expenditure that not all companies were willing to commit to.

Then, and curiously coincidentally, people around the world started to posit the need for using data and information to address significant business challenges, to act as input into the processes of strategy formulation, choice and execution. Reports would no longer just be for the Financial Directors or the paper collectors, but would support serious business decision making.

Many initiatives sprang up to meet the top-level decision-making data requirements; they were invariably expensive attempts, with variable outcomes. Some approaches were quite successful, but far too many failed, until the advent of Data Warehousing.

Back then, most of the data that could potentially aid decision-making was in operational systems. This was both an advantage and a problem. Data in operational systems was like having data in gaol. Getting data into operational systems was relatively easy; getting it out and moving it around was a nightmare. However, one of the advantages of operational data was that it was generally stored in a structured format, even if data quality was frequently of a dubious nature, and ideas such as subject orientation and integration were far from being widespread.

Of course, data also came in from external sources, but usually via operational databases as well. An example of such data is instrument pricing in financial services.

Therefore, briefly, a lot of Data Warehousing started as a means to provide data to support strategic decision-making. Data Warehousing was not about counting cakes, widgets or people, which was the purview of operational reporting, nor about measuring sentiment, likes or mouse behaviour. It was about assisting senior executives to address the significant business challenges of the day.

Who's your Daddy?

Bill Inmon, the father of Data Warehousing, defines it as being "a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process."

Subject Oriented: The data in the Data Warehouse is organised conceptually (the big canvas), logically (detailing the big picture) and physically (detailing how it is implemented) by subjects of interest to the business, such as customer and product.

The thing to remember about subject areas is that they are not created ad-hoc by IT according to the sentiments of the time, e.g. during requirements gathering, but through a deeper understanding of the business, its processes and its pertinent business subject areas.

Integrated: All data entering the data warehouse is subject to normalisation and integration rules and constraints to ensure that the data stored is consistently and contextually unambiguous.

Time Variant: Time variance gives us the ability to view and contrast data from multiple viewpoints over time. It is an essential element in the organisation of data within the data warehouse and dependent data marts.

Non-Volatile: The data warehouse represents structured and consistent snapshots of business data over time. Once a data snapshot is established, it is rarely if ever modified.

Management Decision Making: This is the principal focus of Data Warehousing, although Data Warehouses have secondary uses, such as complementing operational reporting and analysis.

In plain language, if what your business has or is planning to have does not fully satisfy the Inmon criteria then it probably is not a Data Warehouse, but another form of data-store.
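To make two of the Inmon criteria a little more concrete, here is a minimal sketch of a time-variant, non-volatile store for a single subject area. It is an illustration only: the names CustomerSnapshot, CustomerSubjectArea and as_of, and the sample customer data, are my own assumptions and not part of Inmon's definition or of any particular product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a stored snapshot cannot be modified in place
class CustomerSnapshot:
    customer_id: str     # integrated key, the same across all source systems
    segment: str
    snapshot_date: date  # time variance: every row says when it was true


class CustomerSubjectArea:
    """Append-only store for one subject area (customer). Hypothetical."""

    def __init__(self) -> None:
        self._rows: List[CustomerSnapshot] = []

    def append(self, row: CustomerSnapshot) -> None:
        # Non-volatile: new snapshots are added, old ones are never overwritten.
        self._rows.append(row)

    def as_of(self, customer_id: str, when: date) -> Optional[CustomerSnapshot]:
        # Return the latest snapshot taken on or before `when`, if any.
        candidates = [
            r for r in self._rows
            if r.customer_id == customer_id and r.snapshot_date <= when
        ]
        return max(candidates, key=lambda r: r.snapshot_date, default=None)


# Made-up sample data for illustration.
store = CustomerSubjectArea()
store.append(CustomerSnapshot("C001", "retail", date(2014, 1, 31)))
store.append(CustomerSnapshot("C001", "premium", date(2015, 1, 31)))

earlier = store.as_of("C001", date(2014, 6, 30))
later = store.as_of("C001", date(2015, 6, 30))
```

Asking the same question as of two different dates gives two different answers, and neither answer destroys the other. A store that overwrites the customer's segment in place would fail both the time-variant and the non-volatile tests.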

The thing to remember about informed management decision making is that it needs to be as good as required but it does not need to achieve technical perfection. This observation underlies the fact that Data Warehousing is a business process, and not an obsessive search for zero defects or the application of so-called 'leading edge' technologies – faddish, appropriate or not.

JOIN THE BIG DATA CONTRARIANS: http://www.linkedin.com/grp/home?gid=8338976

Some Basic Terms

Before we delve into the meaning of Data Warehousing, there are a couple of terms that need to be understood first, so, by way of illustration:

Let's follow the numbers in the simplification of the process.

1. We gather specific and well-bound data requirements from a specific business area. These requirements are gathered by talking to business people and by understanding their needs from a business perspective as well as a data sourcing and data logistics perspective. Here we must remember at all times not to over-promise or to set expectations too high. Be modest. These business requirements are typically captured in a dimensional data model and supporting documentation. Remember that all requirements are subject to revision at a later date, usually in a subsequent iteration of the requirements-gathering-to-implementation cycle.

2. We identify the best source(s) for the required data and we record basic technical, management and quality details.

3. We ensure that we can provide data to the quality required. Note that data quality does not mean perfection but data to the required quality tolerance levels.

4. Data Warehouse data models are modified as required to accommodate any new data at the atomic level.

5. We define, document and produce the means (ETL) for getting data from the source and into the target Data Warehouse. Here we also pay especial attention to the four characteristics of Data Warehousing. ETL is an acronym for Extract (the data from source / staging), Transform (the data, making it subject oriented, integrated, and time-variant) and Load (the data into the Data Warehouse and Data Mart).

6. We define, document and produce the means for getting data from the Data Warehouse into the Data Mart. In short, a bit more ETL.

7. User acceptance testing. NB Users must ideally be involved in all parts of the end-to-end process that involve business requirements, participation and validation.

This is a very simplified view, but it serves to convey the fundamental chain of events. The most important aspect is that we start (1) and end (7) with the user, and we fully involve them in the non-technical aspects of the process.
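For those who prefer to see the ETL part of the chain as code, the following is a rough sketch of the Extract, Transform and Load steps, together with a quality tolerance check. Everything in it is assumed for the sake of illustration: the source rows, the country code mapping, the function names and the 50% rejection tolerance are all invented, and a real implementation would sit on top of proper source systems and a database.

```python
from datetime import date

# Hypothetical rows as they might arrive from two operational systems.
SOURCE_ROWS = [
    {"cust": " c001 ", "country": "UK", "amount": "120.50"},
    {"cust": "C002", "country": "United Kingdom", "amount": "80.00"},
    {"cust": "C003", "country": "ES", "amount": None},  # fails the quality check
]

# Illustrative integration rule: one agreed code per country.
COUNTRY_CODES = {"UK": "GB", "United Kingdom": "GB", "ES": "ES"}


def extract(rows):
    """Extract: copy rows out of the source or staging area untouched."""
    return [dict(r) for r in rows]


def transform(rows, load_date):
    """Transform: integrate keys and codes, and stamp each row with a date."""
    clean, rejected = [], []
    for r in rows:
        if r["amount"] is None:
            rejected.append(r)
            continue
        clean.append({
            "customer_id": r["cust"].strip().upper(),
            "country_code": COUNTRY_CODES[r["country"]],
            "amount": float(r["amount"]),
            "load_date": load_date,
        })
    return clean, rejected


def within_tolerance(clean, rejected, max_reject_ratio):
    """Quality means meeting the agreed tolerance, not zero defects."""
    total = len(clean) + len(rejected)
    return total > 0 and len(rejected) / total <= max_reject_ratio


def load(warehouse, rows):
    """Load: append to the warehouse; existing rows are left alone."""
    warehouse.extend(rows)


warehouse = []
clean, rejected = transform(extract(SOURCE_ROWS), date(2015, 6, 1))
if within_tolerance(clean, rejected, max_reject_ratio=0.5):
    load(warehouse, clean)
```

The point of the tolerance check is that one rejected row out of three does not stop the load, because the (assumed) agreed tolerance allows it. Tighten the tolerance and the same batch would be held back.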

Business, Enterprise and Technology

Essentially, a Data Warehouse is a business driven, enterprise centric and technology based solution for continual quality improvement in the sourcing, integration, packaging and delivery of data for strategic, tactical and operational modelling, reporting, visualisation and decision-making.

Business Driven

A data warehouse is business centric and nothing happens unless there is a business imperative for doing so. This means that there is no second-guessing the data requirements of the business users, and every piece of data in the data warehouse should be traceable to a tangible business requirement. This tangible business requirement is usually a departmental or process specific dimensional data model produced together in requirements workshops with the business. We build the Data Warehouse over time in iterative steps, based on the criteria that the requirements should be small enough to be delivered in a short timeframe and large enough to be significant.

Typically, a Data Warehouse iteration results in a new Data Mart or the revision of an existing Data Mart.

Enterprise Centric

As we build up the collection of Data Marts, we are also building up the central logical store of data known as the Enterprise Data Warehouse that serves as a structured, coherent and cohesive central clearing area for data that supports enterprise decision making. Therefore, whilst we are addressing specific departmental and process requirements through Data Marts we are also building up an overall view of the enterprise data.
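As a small illustration of the relationship between the two, here is a sketch in which a departmental Data Mart is derived from atomic data held centrally. The sales rows, the department names and the build_data_mart function are all made up for the example; they are assumptions, not a description of any particular implementation.

```python
from collections import defaultdict

# Hypothetical atomic rows already held in the Enterprise Data Warehouse.
EDW_SALES = [
    {"department": "retail", "product": "widget", "month": "2015-01", "amount": 100.0},
    {"department": "retail", "product": "widget", "month": "2015-01", "amount": 50.0},
    {"department": "retail", "product": "gadget", "month": "2015-02", "amount": 75.0},
    {"department": "wholesale", "product": "widget", "month": "2015-01", "amount": 900.0},
]


def build_data_mart(edw_rows, department):
    """Summarise one department's atomic rows by product and month."""
    totals = defaultdict(float)
    for row in edw_rows:
        if row["department"] == department:
            totals[(row["product"], row["month"])] += row["amount"]
    return dict(totals)


# Each mart answers one department's questions from the same central data.
retail_mart = build_data_mart(EDW_SALES, "retail")
wholesale_mart = build_data_mart(EDW_SALES, "wholesale")
```

Both marts are packaged views over the same atomic data, which is why adding a mart for a new department also adds to the enterprise-wide picture rather than creating another isolated silo.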

Technology Based

By technology, I mean technology in the broadest sense of techniques, methods, processes and tools, and not just a question of products, brands or badges.

Unfortunately, there is a popular misconception that Data Warehousing is primarily about competing popular and commercially available technology products. It isn't, although they do play an important role.

Architecture

The following is an example of a very high-level Data Warehouse architecture diagram.

Methodologies

Various methodologies support the building, expansion and maintenance of a Data Warehouse. Here is one example of a professional data integration methodology, produced, maintained and used by Cambriano Energy.

And here is an information value-chain map as used by Cambriano Energy as part of its Iter8 process management. There are alternatives, many of which do a satisfactory job.

Last but not least, this was (from memory) the way that Bill Inmon's Prism Solutions ETL company used to view the iterative EDW building process.

Keeping it Shortish

At this point, I decided to cut short further explanations of aspects of Data Warehousing. However, if you have any questions then please address them to me and I will do my best (or something close) to answer them.

That's all folks

Hold this thought for another time: If you think you can replace a Data Warehouse that is not a Data Warehouse with another approach to 'Data Warehousing' that doesn't produce a Data Warehouse, however fast and cheaply it can be done, then you still don't have a Data Warehouse to show for all of your efforts. That is not a great place to be.

Therefore, you see, Data Warehousing was never about a haphazard approach to providing random structured, semi-structured and unstructured data of various qualities, provenance, volumes, varieties and velocities, to whoever was of a mind to want it.

Many thanks for reading.

If you want to connect then please send a request. If you have any questions or comments then fire them off below. Cheers :-)

Oh… and one last thing before I go… DON'T FORGET TO JOIN THE BIG DATA CONTRARIANS: http://www.linkedin.com/grp/home?gid=8338976