With 2012 behind us, we charge full-speed ahead into the new era of Visual Analytics and Big Data. First, let me define these terms and discuss them briefly, then I'll break down how to put together your Business Intelligence (BI) ecosystem and organization structure going forward.
Visual Analytics
The process by which data is consumed and explored visually, as opposed to through summary statistics or machine learning algorithms.
The human visual sense is the most powerful of all the senses. Somewhere along the line, neuroscientists and data geeks got together and had the brilliant thought of making software that lets us explore and consume data the way the human brain consumes the majority of all information: visually. This thought led several software vendors to invest heavily in this space, which has marked the dawn of a new era in Business Intelligence. The era of Visual Analytics. Leading the charge in this space is Tableau Software. Other big players are QlikView and Tibco Spotfire. Playing catch-up, as they often do, are the rest of the big BI vendors such as Microsoft, Oracle, and MicroStrategy.
Big Data
Data too large or too unstructured to fit into a database.
This is easily the most misused and misunderstood term in the BI industry. In 2012 I saw countless articles describing “Big Data” in all the wrong ways. One article I found in Inc. Magazine described Big Data as data so large it cannot fit in a spreadsheet, meaning more than 1,000,000 rows long and about 200 columns wide. Well, my friends, if this were the truth, many of us could list “Big Data” experience on our resumes dating back to the early '90s or before.
The reality is that Big Data is a problem you do not want, regardless of what any snake-oil salesman is telling you. And for those data sets larger than your spreadsheet, a simple database will do just fine, as it has for over 2 decades.
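To make the "simple database" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and generated values are all made up for illustration; the only outside fact assumed is Excel's per-sheet limit of 1,048,576 rows.

```python
import sqlite3

SPREADSHEET_ROW_LIMIT = 1_048_576  # Excel's per-sheet row limit
ROWS = 1_200_000                   # hypothetical data set, larger than one sheet

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)"
)

# Generate synthetic rows lazily instead of building a giant list first.
regions = ("north", "south", "east", "west")
rows = ((i, regions[i % 4], i % 100) for i in range(ROWS))
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Counting and aggregating over the full table is routine work for a database.
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(row_count > SPREADSHEET_ROW_LIMIT)
print(totals)
```

Nothing here needs a cluster: an embedded database on a laptop handles a table that no longer fits in a spreadsheet.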
True Big Data is when you have such a vast amount of data that a typical database system cannot handle it. In that realm of typical databases I will include Parallel Databases such as Vertica and Netezza, which have their own scalability constraints, although you'll probably never hit them unless you work for Google or Facebook. For example, Facebook's Big Data problem is in the realm of 100+ Petabytes, and for reference: 1 PB = 1,000,000,000,000,000 B = 1000^5 B = 10^15 B = 1 million gigabytes = 1 thousand terabytes
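The unit arithmetic above can be checked with a few lines of Python. This is only a sketch: the `UNITS` table and `to_bytes` helper are names invented for this example, and it uses decimal (power-of-1000) units to match the figures in the text.

```python
# Decimal (SI) storage units, expressed as powers of 1000.
UNITS = {
    "KB": 1000**1,
    "MB": 1000**2,
    "GB": 1000**3,
    "TB": 1000**4,
    "PB": 1000**5,
}

def to_bytes(value, unit):
    """Convert a size such as (100, "PB") to a byte count."""
    return value * UNITS[unit]

one_pb = to_bytes(1, "PB")
facebook_scale = to_bytes(100, "PB")  # the 100+ PB figure quoted in the text

print(one_pb == 10**15)               # 1 PB is 10^15 bytes
print(one_pb // UNITS["GB"])          # gigabytes in one petabyte
print(one_pb // UNITS["TB"])          # terabytes in one petabyte
print(facebook_scale // UNITS["TB"])  # terabytes in 100 PB
```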
There are tons of new players in this space, but for my dollar I would only invest in Hadoop, and specifically in Cloudera's distribution. Cloudera's team includes people who created this technology, and they have more experience than anyone in dealing with it.
Your 2013 Big BI Ecosystem
To get the most out of all the new innovation that 2012 brought to the BI industry, here are my recommendations on how everything fits together from a technical standpoint. Following this is my recommendation on how to structure your organization to take full advantage of everything discussed here.
For this I've divided the new era BI Ecosystem into 3 parts: Data Archival, Data Warehousing, and Data Presentation. Starting with Data Archival, this is where you'll store all of your data. If you actually must go to a “Big Data” solution, this is the part of the ecosystem that will handle it. The best option here, for my dollar, is to go with some flavor of Hadoop from Cloudera.
Now that your data lives somewhere, you'll need a second place to put it in better form for proper analysis. This is what I call Data Warehousing. In this part of the ecosystem you'll want to create some standard master tables for your analyst and application development groups to use. I usually start with the nouns of your business. For example, if I were Amazon, I could describe my business as “Selling products online to customers”. In this case, products and customers are the nouns and the act of selling is a transaction. You'll want to build out separate tables for each key noun and action of your business in your data warehouse. If you do it right, these data structures should provide long-term value unless your business drastically changes. Don't worry too much about getting every detail right here; remember, every data warehouse is only a prototype.
Last and most important is Data Presentation. This part of the ecosystem is where you actually put the food on the table. The absolute key here is using proper visual display techniques. If you are responsible for this part of the ecosystem you must read Stephen Few's “Information Dashboard Design”. Another key part is making sure access to this information is ubiquitous, meaning users can reach it from anywhere through any device or interface. Another great practice I recommend here is keeping a log of what is presented on all production-class views. I like Evernote for this, but if you already have an internal company wiki, that will probably do just fine.
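As a rough sketch of the "nouns and actions" idea, the Amazon example could start with one table per noun (customers, products) and one per action (sales). Everything below is hypothetical: the table names, columns, and sample rows are invented for illustration, and SQLite stands in for whatever warehouse platform you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per noun (customers, products) and one per action (sales).
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    list_price REAL NOT NULL
);
CREATE TABLE sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    quantity    INTEGER NOT NULL,
    sold_at     TEXT NOT NULL
);
""")

# Made-up sample rows so the query below has something to return.
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)", [(10, "Book", 12.5), (11, "Lamp", 30.0)]
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (100, 1, 10, 2, "2013-01-05"),
        (101, 2, 11, 1, "2013-01-06"),
        (102, 1, 11, 1, "2013-01-07"),
    ],
)

# Analysts join the action table back to the nouns to answer questions.
revenue_by_customer = conn.execute("""
    SELECT c.name, SUM(s.quantity * p.list_price) AS revenue
    FROM sales s
    JOIN customers c ON c.customer_id = s.customer_id
    JOIN products  p ON p.product_id  = s.product_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(revenue_by_customer)
```

The design choice is that the noun tables change slowly while the action table grows with every transaction, which is why these structures tend to hold their value as long as the business itself stays the same.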
Here is a mind-map of how it all fits together.
Your new BI Org
Streamlining teams' focus and collaboration has always been a challenge for nearly every organization in any company of size. Here is my recommendation on how to structure yours to maximize that focus and collaboration using all the new innovation in the BI industry that 2012 gave us.
I see a new org structure, set up to take full advantage of this new era, that includes 4 functional areas: Data Acquisition, Data Warehousing, Analyst Groups, and Advanced Analytics.
The Data Acquisition and Warehousing groups focus on acquiring, storing, and organizing the data structures to support thorough analysis as well as application development needs. This team is data heavy and should not be expected to think in visual terms about how to present the data. Rather, they should be focused on how to physically store and logically present data structures to analyst and development teams.
For the Advanced Analytics and Data Analyst teams, the primary focus is how to extract knowledge from the data available to them. On the Advanced Analytics side I recommend stacking the team with deep thinkers and statisticians. Don't expect to ever have a quick conversation with this team or get a quick answer to a question. They should be focused on the hardest questions that could lead to large-scale changes in how your business operates or what it does.
The Data Analysts are on the front lines, side by side with business stakeholders, providing thoughtful and timely insight into whatever the question du jour may be. These are some of your most valuable assets, as they can provide quick and detailed responses that help you make better game-time decisions. Their focus should be on understanding how the business functions and how to present information in the best way to help grease those gears.
Here again is another mind-map of how I picture your 2013 BI org being structured.
Ben,
Since you’ve built not only the ETL/Data Warehouse end but also the Presentation layer, I think you should follow up here with pros/cons of the various DW/Mart platform types available now, including columnar, MPP, in-memory, cubes, vs. traditional RDBMS, star schema, agg tables, etc, for visual consumption? In particular, what do you like about Vertica? What limitations, if any, have you encountered with it?
Lastly, for Visual Analytics on a tight budget, do you have any initial impressions of Excel 2013’s Quick Charts? To me, they look like a major step forward.