August 21, 2008
The Active Data Warehouse
Think that active and real data warehouses are pretty much the same? Think again.
So what’s wrong with the active data warehouse? We have a lot of data. We have some users. We have a form of business intelligence.
So what’s the big deal? The big deal is that the active data warehouse is not a data warehouse at all.
The active data warehouse is an operational application, not a data warehouse. While there are certain similarities between an active
data warehouse and a real data warehouse, those similarities are almost a coincidence.
Why is an active data warehouse not a real data warehouse? A real data warehouse is built to support management’s decisions. A real
data warehouse is built to support long-term strategic decisions. A real data warehouse is built to support a level of granularity
that can be used in many different ways and still maintain a consistent system of record. A real data warehouse is not concerned with
up-to-the-second values of data. Instead, the real data warehouse is concerned with a historical record of data.
The active data warehouse is concerned with supporting clerical activities. Whereas the real data warehouse supports the managerial
community, the active data warehouse supports the clerical community. The active data warehouse supports transaction update. The real
data warehouse does not support transaction processing. The active data warehouse supports backup and recovery. The real data
warehouse also supports backup and recovery, but in a very different way than the active data warehouse.
These then are some of the differences between a real data warehouse and an active data warehouse. The biggest single difference
between an active data warehouse and a real data warehouse is in the community supported and the type of decision supported.
The question then becomes, "Can a real data warehouse be an active data warehouse at the same time?" Or, conversely, "Can an active
data warehouse also be a real data warehouse?" While it is true that there are major overlaps between the two types of architectural
structures, on the whole, there is not so much overlap that an active data warehouse is a real data warehouse. There are subtle yet
very real differences between an active data warehouse and a real data warehouse.
Real and active data warehouses differ in terms of scope. An active data warehouse must focus on the requirements for the operational
processing that it must do. Stated differently, if an active data warehouse does not focus on its immediate requirements, then it
will not accomplish those operational requirements well.
A real data warehouse must focus on its objective as well. However, the objective of the real data warehouse is quite different from
that of the active data warehouse. The active data warehouse must focus on corporate-wide integration of data and the storage of that
data at a granular level. This means that the real data warehouse has a much wider scope than the active data warehouse.
Real and active data warehouses also differ in the meaning of data integrity. In the active data warehouse, integrity of data centers
around the insurance that a transaction has processed correctly and that the active data warehouse has been left with the proper
artifacts of that transaction. In the real data warehouse, data integrity refers to the fact that if the same transaction is run
against the data warehouse, the same results will be achieved. You see, in an active data warehouse, you can run a transaction
against the data and achieve one result, and then rerun the same query one minute later and achieve different results. In a real
data warehouse, such a phenomenon cannot occur.
Real and active data warehouses also differ in terms of the support infrastructure. The active data warehouse must support an
infrastructure that supports transaction processing. This means that operating capacity must be sufficient for supporting the
transaction processing at peak period. And, inevitably, this means that during nonpeak period processing, there is a lot of capacity
that is not used. A real data warehouse also has peak and slack periods. The difference is that since transaction processing is not
done against the real data warehouse, there is no reason for the spike that occurs during peak period processing. In other words, the
workload that occurs on the real data warehouse is able to be spread out so that the peaks and valleys are not so pronounced.
Another major difference between a real and active data warehouse is in the support of data marts. In a real data warehouse, there is
strong support for data marts that exist outside of the data warehouse. In a real data warehouse, there is recognition that data has
a legitimate and different existence in different forms and that those forms are best managed outside the data warehouse. In an
active data warehouse, there is strong pressure placed on the consumer to create and manage the data mart inside the active data
warehouse. The vendors of active data warehouses discourage even the existence of data marts.
There are other notable differences between real and active data warehouses, but these are some of the most prominent.
So what’s the problem? The problem is that consumers think they are buying a real data warehouse when they are being sold an active
data warehouse. This causes a lot of confusion in the industry.
It is suggested that the vendors of active data warehousing change their name to something more descriptive. One suggestion might be
an "integrated operational application," or maybe "real time integration." But for the vendor to suggest that what is known as an
"active data warehouse" is, in fact, a real data warehouse is simply confusing and inaccurate.
By Bill Inmon
Business Intelligence Network