Tableau Quick Start – Day 3 – Connecting to Data

http://bensullins.wistia.com/medias/zff1twuvgh?embedType=async&videoFoam=true&videoWidth=640

Full Transcript

Welcome to day three, Connecting Your Data. Today we’re actually gonna open up Tableau and pull in some data and start looking at it and see how things work. I hope your enjoying these videos, and if you have any questions, feel free to email me at help@bensullins.com, and I’ll see you again here tomorrow. Alright, let’s get going. This is the exciting part. You’re about to embark on this amazing journey. Now we’re gonna start at the very beginning with connecting to data. So let’s take a look here and see what the data connection pane looks like. So down below we have our saved data sources, so as use reuse the same data source over and over, you can save them for quick access. On the top, we have our file-based sources. So think like a text file, a CSV, or Excel, Tableau extracts, also statistical files, which typically right now most people use R for their statistical work, so that’s where you can pull that data directly in from a file-based source. And then you have the server-based source. So if you’re working in a company, or you’re working in a company that has stuff on the web, you’re gonna be probably using some of these at some point in time. You’ve got your web-based data sources like Google Analytics, and you’ve also got big data sources like Amazon Redshift, and Hadoop, and all the other ones in there. This is just a sampling here. When you click on more servers, you get a long list of them. When we set up a connection, what we do is we first choose the type of connection. So here what I’ve done is I’ve chosen Excel, and then it’s just gonna browse. It’s gonna ask me where to find that Excel file. After we’ve connected to the Excel file, in this case I’ve connected to my Sample Superstore Excel file. You can follow along and do it yourself as well. We have a connection pane. Now this is really interesting. This is where we can pull in all of the data. So think of an Excel file just like a different type of database, so if you’re used to using Sequel or Oracle, it’s similar conceptually here where sheets are different tables. And what I’ve done in this example, I dragged orders on, then I dragged people on, and returns on, and it automatically created joins for me, because it recognized common fields between those two sheets. So just like in a database when you drag that on, you can have Tableau do the joins for you. And if I were to click on any of those Venn diagrams there, it would pop up with the ability for me to edit that connection. Now when I connect, I can do it live or I can download the data in an extract. Live is meaning that I’m doing essentially real-time analytics. If my database is being updated in real time or every 10 minutes, any time somebody hits my viz and they wanna see my dashboard, it’ll go back to that source and update the data in real time. That’s good, that’s nice, if you’re database supports fast querying. Sometimes databases, often databases, aren’t designed for analytics, they’re more designed for transactional processing, so this will be very slow for your users. Now if that’s the case, or if you’re just not sure, and you aren’t able to hit it live, like it’s a production system, and we don’t wanna affect people placing orders, I would choose an extract. An extract is gonna ask you to download the data, and what it’ll essentially do is create a copy of it locally, and then you can have it refresh where it’ll automatically go back and update that data at some interval if you published a server, or in desktop you can just right click on it and say refresh, and it’ll go download the data again. That makes it very portable, and it also makes it a lot faster, and sometimes gives you more analytical functions. So extracts are pretty popular, but be careful, because large extracts, or if you have a big data system, you’re not gonna be able to do an extract, or, in fact, an extract may take hours and hours to run. It just won’t make sense. Once we’ve set up our tables with a join or whatever kind of conditions we want, down below it’ll preview that. It’ll automatically show me that data. Once we connect, though, what we get are our dimensions. On the side here, this is a part of the data window, in here we have folders that organize our actual fields, all the different attributes. We have a drill-down, or a hierarchy, where you can start, in this case, at location, at country, and then drill down to state, city, and postal code. There’s also geographic roles. These different icons here, the ABC, the calendar-looking one, in this case the globe one, all mean that that field is gonna be treated a bit differently. A date one will automatically have a date hierarchy, for example, and the globe ones automatically will try to draw a map. You can also group values together. This is really cool for what-if analysis. Imagine if you’re looking at something and you say, “Oh, you know what? What if we combine “the Southwest and the Northwest regions? “What would sales or profit look like then?” Well, you can simply select those fields in your viz and right click and say group. There’s a little paperclip icon. Don’t get nervous, it’s not Clippy from Microsoft. It’s a group in Tableau, so you can do that and create these different groupings kind of by hand. So you can do real what-if analysis. You don’t need to go back to your database. You don’t need to call IT. You just do it yourself in the tool. There’s another cool thing here which are bins. So if you wanna see the distribution of something, like how many of our entries for profit in this data set, which is by order, are 10 dollars versus 50 dollars, you could see that distribution. This is your old school, from college, histogram that you would see the typical bell curve type of thing. And what it does, it just allows you to draw histograms, which is really cool, really neat feature. So in the measures, what we have are the fields we’re actually going to aggregate, the things we’re gonna sum and average and all that. These are the subject of our analysis. In there you have some calculated measures. You’ll see those indicated with an equals sign to the left of it. That goes true for dimensions as well. Any time there’s a calculated field, you’ll see an equal sign letting you know. There’s also some auto-generated measures. In this case, we have geographical properties, so we have the latitude and longitude of those generated. Tableau automatically has geocoding built in, so if you give it something like city and state, it knows where those are and how to draw them. It’s just built in, and it’ll auto-generate the bits it needs to do that. If you’re doing more advanced mapping, and you need to have custom regions or anything like that, you can do that as well, it’s just a bit more of an advanced topic. Down the road, I’ll set up another course here where you can really dive into the mapping features of Tableau because they’re pretty tremendous. With sets, what we’re doing here is creating a grouping of values. So here’s top customers by profit. So to do that we would just create a chart showing the customers and the profit, and select a few of them and say, “Create set.” Then we can do things like analyze the number of orders, the regions, whatever, by this grouping. And down on the bottom, for adding inner activity, we use parameters. These are currently separate things that aren’t a part of our data set that we can use to give our users more interactive capabilities on the web. Alright, enough talk. Let’s see some action here. We’re gonna do a demo now, and the first thing I wanna do is to connect to data. After I connect to my data, I’m gonna join a few tables together. Then I’m gonna create an extract of that data. And lastly, we’ll just take a look at it inside of Tableau desktop in the data window. Let’s get going here. So in Tableau, I’m looking at just the first screen I get. Now you may see, if you’ve been using this before, other workbooks that you’ve opened. Here is just a brand new fresh instance. And again, I’m using Tableau 10 here, so this may look a little bit different depending on which version you’re using, but it all should be pretty similar, and the steps are essentially the same. So in the top I have my Excel sources and my other file-based sources, then I have my servers. If I click on More Servers, you can see all the different options: Clutter, Hadoop is a big one, Amazon Redshift, there’s all kinds of options, Google Analytics, Google BigQuery, Google Sheets now available. You can connect directly to a Google Sheet, which is incredible, it’s awesome. OData, if you’re using any Microsoft stuff, Sequel Server as well, just tons and tons of options here, including the Web Data Connector, which maybe we’ll do a separate course solely on that. Here, all I’m gonna do is connect to Excel and choose my simple Superstore Sales here. Click open. And remember, this is kind of like a database. Now, I’m using Excel just because it’s a universal thing, but you could be connecting to Sequel Server or any others and essentially achieve this same thing. Now here, what I wanna do is drag orders on to where it says, “Drag sheets here.” And then down below it gave me a listing of that. It gave me a preview of the data, so I can take a look and see what kind of data I have in there. You’ll notice Tableau has already done something. This Order Sales here, it already knew that that’s a number, it knew that postal code, city, country, region are geographic properties, it knew that these were dates. So it’s already done some data profiling and even created some metadata for me. Now I wanna join do a couple other things, so I’m gonna drag people on. When I drag it on, it automatically creates a join. Now that’s pretty incredible. So what it did is it joined on region. It noticed that both of these sheets have that same one. This is an inner join. I won’t get too deep into that right now, but essentially what’s gonna happen, it’s only gonna get rows back where the region in my orders also matches a region in people. If the region didn’t exist, it wouldn’t be returned. Now I’m gonna add a returns one. This one again, it automatically came up with a join. This is going on order ID, and it’s doing an inner join, I think I’m gonna do a left join here. That way I get every order, and then the corresponding ones with returns. Now another thing I can do, I can edit my metadata. I can click on the sort of hamburger or list icon here. I can see that I can change things. Already I can edit my data sent to right here. Now this is a really cool feature. This is really awesome, if you have a lot of fields, you can group them together, start making it more meaningful and easier to work with later. So once I’ve done that, I can also extract my data right now in live. So what that means is as I query this, and as I do something like build a chart, it’s gonna hit that Excel file directly, but that Excel file may disappear on me so I don’t want to connect live all the time. I wanna extract it. So when I click extract, what’s gonna happen, it updated this and then it’s gonna say, “Extract will include all data.” Good to go. I can click edit. I can filter certain things out. I can do all rows or just some rows. I can do a sampling of rows if I want. When I click now on sheet two, it’s gonna say, “Hey, you’re creating an extract. “Where would you like it to be?” There’s a data sources folder here. I’m just gonna save it there. That’s fine for now. You can put it wherever you put them. Usually there’s a data sources folder in each instance of Tableau, so you can just create that and save it there. So what it’s gonna do, it’s gonna actually download that data into an extract, and then it took me over to Tableau, and see what it did? It gave me a different icon here. So now I have, essentially, three tables of data, joined together, extracted in Tableau. A lot of metadata has already been created. If you look at this, we have the dates here, the globe icons, the numbers, it knew somehow that row ID was not a measure even though it’s a number. So there’s lots of stuff going on. It’s really done a great job of just profiling the data and making it easy to work with from the very beginning. Now that we know how to connect to our data, let’s take a look tomorrow at drawing charts and actually analyzing our data in Tableau. See you back here then.