Is the Star Schema dead?

Data Warehousing History

Thirty or so years ago, well before the internet, people figured out that you could no longer rely on any single system to provide meaningful insights about your business. Each system is, by definition, only a slice of the business, so answering questions that required correlating multiple systems took a herculean effort. This led to the Data Warehouse, or Corporate Information Factory as Bill Inmon dubbed it.

The idea was to design a database system that would mirror your business, allowing you to answer any question you had, regardless of which system handled that part of the business. It's the holy grail of data systems because, in theory, you could ask your data analysts anything you could imagine and get an answer reasonably quickly.

The Data Mart is Born!

This proved to be incredibly difficult, as the pace of business outran the speed at which the data warehouse schema (data model) could be adapted and data made available. Around this time, Dr. Ralph Kimball (Kimball Group) created a new way to solve the same problem. And thus, the Data Mart was born!

The Data Mart is often treated as synonymous with its underlying design, the Star Schema. This design approach focused on the individual business processes with the highest value and built mini-data-warehouses around them. The narrower design and smaller scale made these projects simple to complete and easy to use, and they quickly grew in popularity. Some even declared the Data Warehouse dead at this point, but really people were just building mini-data-warehouses.

Database Evolution

Around 2006, Yahoo! needed to build a new type of data processing system that could scale faster and bigger than anything before it. This was the birth of Hadoop, a distributed computing platform that didn't require a data model to store data and was built to be fault-tolerant. Data is saved 3x on different nodes, so to lose data you'd have to have a major systemic failure, and since there is no data model to conform to, there is no modeling step to wait on before writing the data to disk.
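To make the "saved 3x" point concrete, here is a toy sketch in Python. It is not how Hadoop actually places replicas; the round-robin placement, the node names, and the helper functions are all assumptions made up for illustration. It only shows why three copies on distinct nodes survive any two simultaneous node failures.

```python
import itertools

REPLICATION = 3  # copies per block, matching the "saved 3x" figure above


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement rule for illustration only.
    """
    placement = {}
    for block in range(num_blocks):
        placement[block] = {
            nodes[(block + i) % len(nodes)] for i in range(replication)
        }
    return placement


def lost_blocks(placement, failed_nodes):
    """Return the blocks whose every replica sits on a failed node."""
    failed = set(failed_nodes)
    return [b for b, holders in placement.items() if holders <= failed]


# Hypothetical five-node cluster holding ten blocks.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
placement = place_blocks(10, nodes)

# Worst case over every possible pair of simultaneous node failures.
worst_two = max(
    len(lost_blocks(placement, pair))
    for pair in itertools.combinations(nodes, 2)
)

# Losing three particular nodes at once is what it takes to lose data.
lost_three = lost_blocks(placement, ["node-a", "node-b", "node-c"])

print("blocks lost with any two nodes down:", worst_two)
print("blocks lost with node-a, node-b, node-c down:", lost_three)
```

In this toy cluster no pair of failures loses a block, while taking out the three nodes that happen to hold the same replicas does, which is the "major systemic failure" case.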

The Pulsar Method

As those systems evolved, so did the need to rethink how we store data and make it readily available. In early 2014 I decided to write this new approach down in a blog post and label it the Pulsar Method. The idea here is that Star Schemas, while originally intended to be simpler to build, maintain, and access, have themselves become unnecessarily complex and cumbersome.

This new method collapses the star schema into single flat tables, hence the name Pulsar. The idea is that these Pulsars can be easily accessed and understood by people who barely know any database programming (SQL) at all. Database technology today makes this fairly easy and straightforward while providing even greater flexibility and value to your business.
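As a rough illustration of what collapsing a star into a flat table can look like, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are all hypothetical, and this is just one reading of the idea rather than a prescribed implementation: join the fact table to its dimensions once, up front, and store the result as a single wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A tiny star schema: one fact table with two dimensions (hypothetical names).
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
    CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE fact_sales   (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER, amount REAL);

    INSERT INTO dim_customer VALUES (1, 'Acme', 'West'), (2, 'Globex', 'East');
    INSERT INTO dim_product  VALUES (10, 'Widget', 'Hardware'), (11, 'Gizmo', 'Hardware');
    INSERT INTO fact_sales   VALUES (100, 1, 10, 25.0), (101, 1, 11, 40.0), (102, 2, 10, 25.0);

    -- Collapse the star into one flat table by doing the joins once, up front.
    CREATE TABLE sales_flat AS
    SELECT f.sale_id, c.customer_name, c.region, p.product_name, p.category, f.amount
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_id = f.customer_id
    JOIN dim_product  p ON p.product_id  = f.product_id;
""")

# Querying the flat table needs no joins at all.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales_flat GROUP BY region ORDER BY region"
).fetchall()

for region, total in rows:
    print(region, total)
```

The trade-off in this sketch is the usual one for denormalized tables: dimension attributes such as region are repeated on every row, in exchange for queries that someone with very little SQL can write.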