Data Warehousing History
Thirty or so years ago, well before the internet, people figured out that you could no longer rely on a single system to provide meaningful insights about your business. Any one system is, by definition, only a slice of your business, so when you had a question that required correlating multiple systems, a herculean effort was required to answer it. This led to the Data Warehouse, or Corporate Information Factory as Bill Inmon dubbed it.
The idea was to design a database system that mirrored your business, allowing you to answer any question you had, regardless of which system facilitated that part of your business. It’s the holy grail of data systems: in theory, you could ask your data analysts anything you could imagine and get an answer reasonably quickly.
The Data Mart is Born!
This proved to be incredibly difficult, as the pace of business outran the speed at which the data warehouse schema (data model) could be adapted and its data made available. Around this time, Dr. Ralph Kimball (Kimball Group) created a new way to solve the same problem. And thus, the Data Mart was born!
The Data Mart is often synonymous with its underlying design, the Star Schema. This approach focused on the highest-value individual business processes and built mini data warehouses around them. Their focused design and modest scale made these projects simple to complete and easy to use, and they quickly grew in popularity. Some even declared the Data Warehouse dead at this point, but really what was going on is that people were just building mini data warehouses.
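To make the Star Schema idea concrete, here is a minimal sketch using SQLite via Python’s standard library. The table and column names (a sales fact table with date and product dimensions) are my own illustration, not anything from the original post: a central fact table holds the measures, and dimension tables hold the descriptive attributes, joined back together at query time.

```python
import sqlite3

# A tiny star schema in an in-memory database: one fact table
# surrounded by dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, calendar_date TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE fact_sales  (date_id INTEGER, product_id INTEGER, amount REAL);

    INSERT INTO dim_date    VALUES (1, '2014-01-01'), (2, '2014-01-02');
    INSERT INTO dim_product VALUES (10, 'Widget'), (20, 'Gadget');
    INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 75.0);
""")

# Answering a business question means joining the fact table
# back out to its dimension tables.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Gadget', 50.0), ('Widget', 175.0)]
```

The joins are exactly the part that trips up casual SQL users, which is the friction the rest of this post is about.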
Around 2006, Yahoo! needed to build a new type of data processing system that could scale faster and bigger than anything before it. This was the birth of Hadoop, a distributed computing platform that didn’t require a data model to store data and was fault-tolerant by design. Data is saved in triplicate on different nodes, so to lose data you’d have to have a major systemic failure, and since there is no data model, there is zero latency when it comes to writing the data to disk.
The Pulsar Method
As those systems evolved, so did our needs for how we store and make data readily available. In early 2014 I decided to write this new idea down in a blog post and label it the Pulsar Method. The idea here is that Star Schemas themselves, while originally intended to be simpler to build, maintain, and access, have become unnecessarily complex and cumbersome.
This new method collapses the star schema into single flat tables, hence the name Pulsar. The idea is that these Pulsars can be easily accessed and understood by people who barely know any database programming (SQL) at all. Database technology today makes this fairly easy and straightforward while providing even greater flexibility and value to your business.
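As a sketch of the flattening idea (again using hypothetical sales data of my own invention, not a prescription from the method itself), a Pulsar denormalizes the dimension attributes right alongside the measures in one wide table, so the same question needs only beginner-level SQL and no joins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One flat table: dimension attributes (date, product name) are
    -- stored next to the measure, so no joins are needed to read it.
    CREATE TABLE pulsar_sales (
        calendar_date TEXT,
        product_name  TEXT,
        amount        REAL
    );
    INSERT INTO pulsar_sales VALUES
        ('2014-01-01', 'Widget', 100.0),
        ('2014-01-01', 'Gadget', 50.0),
        ('2014-01-02', 'Widget', 75.0);
""")

# The same business question, now answerable with a single-table query.
rows = conn.execute("""
    SELECT product_name, SUM(amount) AS total
    FROM pulsar_sales
    GROUP BY product_name
    ORDER BY product_name
""").fetchall()
print(rows)  # [('Gadget', 50.0), ('Widget', 175.0)]
```

The trade-off is the classic one for denormalization: repeated attribute values and more storage, in exchange for queries simple enough for non-specialists.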
Is the Star Schema dead?
So let’s give this a go: is the Star Schema dead? My answer is no… BUT… in my view, the Star Schema is no longer a consumer-facing product that the data team releases to end users. Instead, it is much more of a back-end system that makes building Pulsars easier and more consistent.