Recently I’ve been doing a lot of work leveraging the Tableau Data Server. This is a great new feature of Tableau Server that allows BI groups to publish standard sets of data to Tableau for analysts to consume. This is a great alternative to traditional approaches which take 100x as long plus don’t allow for rapid prototyping of the data asset.
Although this feature is great it hasn’t always worked the best. The most often problem I run into is creating the extract itself when dealing with “large” data volumes of 20M+ rows. I developed a work-around to leverage the power of the server to create the extract and save your poor laptop from having to crunch all that data for you. Here goes…if you end up trying this please let me know how it works for you or if you have any alternatives. I’d like to use this to feedback to Tableau on how to improve this feature of their product.
1. Create a view in your source database that pulls a small amount of the total rows you want in your extract. In SQL Server I do this using the ‘top’ function. In MySQL you’ll need to use the Limit() function. Before moving to the next step test that querying this view works in a reasonable time frame from your local query environment (eg. SSMS, Toad, SQLYog, etc.) and there are no obvious data issues (nulles, dupes, etc.)
2. Create your extract in Tableau importing all data from the view. Make sure to create whatever hierarchies you want, rename columns so they make sense, correct any data types Tableau messed up, and whatever calculations you know of at this point. Again, before moving on, test that this works relatively well with this subset of data.
3. Publish your data extract to Tableau Server. With the extract now on the server, go and manually run a full refresh. Watch the progress in the Background Tasks Maintenance View. Once it finishes successfully test that you can connect to it and all looks good.
4. Now, go change your view to return all rows. Remove whatever condition you put in to trim down the rows returned. This should allow the extract to pull in the entire data set next time it is refreshed on the Tableau Server.
5. Go back to Tableau Server and execute another full refresh of your extract. This time, the extract should get the entire population of data and the beauty is all the horsepower of your Tableau Server will be used to create the extract instead of your poor laptop.
6. Make sure the extract finishes on the server (this may take a while) and then test again that you can connect via the Tableau Data Server. Lastly, revel in your accomplishment 🙂
Hope this helps! Please let me know how it works for you if you end up trying it out…
Here are the results of my most recent use of this technique. I started w/ a view only showing 10 rows then changed it behind the scenes to pull in all 35M rows. Works like a charm!