Leveraging your Tableau Server to create large Data Extracts

Recently I’ve been doing a lot of work leveraging the Tableau Data Server. This is a great new feature of Tableau Server that allows BI groups to publish standard sets of data to Tableau for analysts to consume. This is a great alternative to traditional approaches which take 100x as long plus don’t allow for rapid prototyping of the data asset.

Although this feature is great it hasn’t always worked the best. The most often problem I run into is creating the extract itself when dealing with “large” data volumes of 20M+ rows. I developed a work-around to leverage the power of the server to create the extract and save your poor laptop from having to crunch all that data for you. Here goes…if you end up trying this please let me know how it works for you or if you have any alternatives. I’d like to use this to feedback to Tableau on how to improve this feature of their product.

1. Create a view in your source database that pulls a small amount of the total rows you want in your extract. In SQL Server I do this using the ‘top’ function. In MySQL you’ll need to use the Limit() function. Before moving to the next step test that querying this view works in a reasonable time frame from your local query environment (eg. SSMS, Toad, SQLYog, etc.) and there are no obvious data issues (nulles, dupes, etc.)

2. Create your extract in Tableau importing all data from the view. Make sure to create whatever hierarchies you want, rename columns so they make sense, correct any data types Tableau messed up, and whatever calculations you know of at this point. Again, before moving on, test that this works relatively well with this subset of data.

3. Publish your data extract to Tableau Server. With the extract now on the server, go and manually run a full refresh. Watch the progress in the Background Tasks Maintenance View. Once it finishes successfully test that you can connect to it and all looks good.

4. Now, go change your view to return all rows. Remove whatever condition you put in to trim down the rows returned. This should allow the extract to pull in the entire data set next time it is refreshed on the Tableau Server.

5. Go back to Tableau Server and execute another full refresh of your extract. This time, the extract should get the entire population of data and the beauty is all the horsepower of your Tableau Server will be used to create the extract instead of your poor laptop.

6. Make sure the extract finishes on the server (this may take a while) and then test again that you can connect via the Tableau Data Server. Lastly, revel in your accomplishment :)

Hope this helps! Please let me know how it works for you if you end up trying it out…

Here are the results of my most recent use of this technique. I started w/ a view only showing 10 rows then changed it behind the scenes to pull in all 35M rows. Works like a charm!

Cheers,
Ben

Finding your Tableau Server PostgreSQL database port

Ever have issues connecting to your Tableau Server PostgreSQL database due to a port issue? If so, you’re not alone! I too have struggled with this before and after hitting it just the second time I did some digging and fond how to determine the port of the Tableau Server PostgreSQL database. Here you go:

  1. Browse to your Tableau Server config location and open up this file:
    [install drive]:\Tableau\Tableau Server\data\tabsvc\config\connections.yml
  2. Find the line in there showing the port “pgsql.port: XXXX”

Then when connecting to Tableau Servers PostgreSQL database you simply enter that value into the ‘port’ textbox in the connection form and viola!

Hope this helps!

Amazon EC2 Hosting and Tableau

Recently I started working with the Amazon EC2 (more) and have been pleasantly surprised. In about 20 minutes I was able to create my own Windows 2008 R2 Data Center Edition Server in the cloud with enough space to run Tableau Server.

Choosing one of the basic configurations from Amazon makes life easy to provision enough horsepower to run a product like Tableau Server but of course there are many more ways you can customize your instance to meet you or your customers needs. Full disclosure, I plan on offering cloud based Tableau Server hosting to my consulting clients so my idea that cloud services like hosted Tableau becoming the new standard for niche BI vendors is a clear alignment of interests.

That said, from start to finish I had a fully functional Tableau Server environment within an hour and was able to access it externally as well. One downside I see here is the lack of Active Directory integration for clients however, Tableau fully supports this, there would just be some networking voodoo needed to make it work.

Without Active Directory integration Tableau Server uses forms based authentication. This means you have to setup each user individually or by script. All in all the trade offs between trying to deploy an on-premise Tableau Server implementation and a cloud based one, the cloud one is definitely attractive.

I haven’t fully explored what all the EC2 has to offer however I can assure you that as I start offering it to clients I’ll post updates about the ins and outs of using Tableau Server on the Amazon EC2 cloud environment.

Full Device Encryption on Android 4.0

After upgrading my mobile device to the Nexus S from Google, I was poking around in the security settings when I found something new. Android 4.0 now has full-device encryption capabilities. This means that while enterprises wishing to gain the usefulness and beauty of information design presented by mobile BI vendors like Roambi but have been skeptical of data security now can rest at ease.

The full-device encryption and Android 4.0 isn’t easy however. Device owners must secure their device with a password or pin, so no fun face-unlock capabilities but for many used to dealing with corporate BlackBerrie’s I’m sure this is a minimal sacrifice.

After going to Settings > Security to enable encryption, you must provide a password or pin. The next step is for your device to reboot and encrypt all of its contents. During this time you’ll see the following screen:

This process will take about an hour. Once this process finishes you’ll now be required to enter your password to decrypt the device anytime you reboot it. Here is what that screen looks like:

That’s all I’ve got so far, I’ll update this post with more details as I get them. The main thing here, and reason I posted this is to draw attention to Android 4.0 device security for mobile BI vendors. Anyone serious about mobile BI and Data Security should start adopting this platform.

Centralized Business Intelligence

My approach as a BI professional has always been to organize the data into conformed structures, present that to end users for consumption in their ad-hoc analysis, and only build formal reports and dashboards as necessary. I think the conformed structures provide great consistency for BI across the enterprise no matter who is building it and allow users to get answers to questions faster.

Of course, there is no way IT can keep up with the onslaught of report requests so it makes sense to provide users with the tools to do these themselves. Typically, I would offer up a connection to a cube through Excel and now with the addition of PowerPivot I believe this type of process is going to become even more prevalent.

Some I have talked with have expressed concern of decentralization of BI by tools like PowerPivot however; I see these tools as giving users a sort of proving ground for their BI applications. If an information worker successfully creates a BI application in PowerPivot that becomes widely used across the enterprise IT does not have to guess that formalizing it into a more formal BI application will have value.

Other methods exist that I think are worth for determining the value of a BI application (eg. BI Pathway Method) however I believe this is for more formal BI versus the ad-hoc type of analysis that users clamor for on a day-to-day basis.