Finding the Grain in Your Database
This page is in support of my course “Star Schema Foundations” on Pluralsight. For more information you can visit Pluralsight.com for a free trial or my author page to see all of my courses.
Finding the grain is key in early going of your Star Schema development. I recommend a three step approach to finding the grain:
- Understand your business process
- Define the grain at the atomic level
- Analyze your data sources for a reality check
Understanding your Business Process
In this step you’ll want to work closely with your internal customers to truly understand their need. Documentation is good if you build it together. The days of sending a giant requirements document to someone are over, or should be. Rather, I recommend that you sit down w/ your customer and talk through their process. If possible, actually go sit with the people executing this process and take notes on points that your solution could help them improve.
Define the Grain at the Atomic Level
The atomic level is the lowest level possible. This is ideal because from there we can derive any aggregation. This is also why transaction fact tables are the most expressive types of Star Schemas (ref).
An example would be the invoice line item, not the invoice header. Capture the data at the actual grain at which it occurs. You scan each boarding pass for the airplane, not the entire flight at once. Later you may want to make higher level aggregations for easier analysis which is great, just don’t lose the detail.
Analyze your Data Sources for a Reality Check
How a business person describes their process and how the data is stored are generally different. Sometimes you’ll find many people use local Excel spreadsheets for where their systems don’t support their data needs. It’s critical that you understand all the sources for your Star Schema so you can accurately represent their process in the database.