What Data Technology Should I Learn?

From my recent live broadcast, in this video I cover the following topics:

  • Background on Data Industry
  • Key Technologies in Data
  • Data Personas
  • Who Should Learn What
  • Where to Find More


Full Transcript

– Hello, hello! What is up everyone? Welcome to Help Me Data Geek number two. The second week now, I've got a new setup here, hopefully the audio's better. The office is complete, or, I need some acoustic stuff in here, but overall, it's pretty close to being done. What I want to talk about today is which technology in the data realm should I learn? So, this is a fun topic, I get this question all the time. I probably have a dozen emails from aspiring data analysts and data scientists and business people asking me “What should I do and what should I learn?” So, I figured I would try to cover this topic today, and we'll have some fun exploring these technologies. And, if you have any questions during the broadcast here, just post them in the chat on the right, whatever side that is, and I'll try to get to them at the end of this. And if not, whatever, we can come back and talk about it later. You can email me at [email protected] anytime, and we'll try to get a question answered soon, from whenever you send that over, so cool. Let's take a look now, I actually created some slides. I'm a geek about slides, I love slides, but I like to make them not horrible. So I'll do that now, so we hop over to that. I'm gonna kick up the slides … which are going, and … or not. Let me see if I can get that going. OK, I got that. Now, let's go here, and I click that one. Sweet. OK, cool. So you should be seeing now the Which Data Technology Should I Learn? So the first thing to talk about here, what we're gonna learn today, we're gonna talk about the background. I think there's some important context here about this topic, and about these technologies. So I'll just talk briefly about that kind of thing. Then we get into the actual technologies themselves, we'll talk a little bit about the different technologies in the data realm, and we'll get into the personas, so the people that actually make up or that use these technologies every day. 
And then the marrying of those two, so who should learn what. So hopefully this will help you answer the question, like “What should you learn?” depending on who you are, and what your goals are. And the last thing I'll point you at where you can find more info, well actually, jump over and I'll show you some websites that are great to learn on. OK, cool. So, first, take a look at this. So, this is from indeed.com which is like a job website, a lot of career stuff on there, and these are the programming languages ranked by number of programming jobs. So I think this is relevant because the very first one is a database language, it is SQL, so, this is one that's been around forever, so the point here is that SQL is probably one of the most universal programming languages. So, whether you're a developer, or whether you're a data geek, SQL is there and that's pretty awesome. Also down on the list a little way is this Python, and Python is another, granted it does a lot of things, but it is really popular in the data science realm, as well as the data engineering realm. So, two out of the, I dunno what is this, top nine or ten languages right here are data languages, so that's incredible, just to show the popularity of data technologies today. Then, this other piece I wanted to share was Glassdoor put out the 25 Best Jobs in America, and this is cool because the number one best job was data scientist. Now, I'm gonna go off script and generalize this for a little, pause for a second here and say, and I'll talk about this a little more, data scientist to me is akin to data analyst as well. 
I'm not sure when they're talking about it here, if they're talking about more of a true data scientist, somebody who does formal, statistical modeling and comes up with machine learning APIs and those kind of things, or if they're just talking about somebody that uses data to solve problems, which is a very general way of thinking about the process of data analysis, a data scientist being probably the most advanced role in the realm of data analysis. So, just think about that, that the number one best job in America, and this includes all kinds of jobs, in fact I think number two was a CPA, like a tax guy or something like that. So, that's just insane that, I mean data's hot right now, and so I hope what I share with you today is gonna be important for you to understand the journey you wanna go on. So let's talk about some technologies now. Alright, and this one, I dunno if this will be a surprise to people or not, people that know me it certainly won't be, but Excel. I can't say enough about Excel. It is probably the most powerful piece of software ever made. It helps us run the world essentially, I still think that OPEC, the cartel that creates, controls oil prices from the Middle East probably sits around with a pivot table, figuring out what the price of oil for the world should be. Excel is that kind of a thing, it comes up, the famous Harvard economist study about GDP, which turned out to be wrong, it was an Excel error, I'll blog about that in the future I'm sure, I mean Excel really is one of the most powerful things so I think it's really critical for anyone in any role in the tech world today. I mean, you're really in business, or anything, and most people probably know that, that's probably not a big surprise there. 
The thing that I would say is that the people that maybe are skeptics of this, so probably the biggest skeptics I've encountered are database people, so people that are hardcore database developers that know SQL, they think “Excel, it can't handle too many rows, I can't write SQL against it, you know, that's my hammer, you know, that I use for everything.” I would encourage you to take a look, I use Excel to actually automate the writing of SQL at times, and I also use it to do things like build simple data models and create database tables. So you can use Excel to save yourself time in other programming languages, so whether or not Excel is your hammer that you use for everything, which, that'd be tough if you're doing a lot of data work, but, it can have many different purposes, and that's actually the gift and the curse of Excel: it's such a generic product, because they have such a wide audience that they try to serve, that it won't take you all the way to completion there, it'll get you about, I don't know, 50, 60%, but a lot of it is gonna be on you to finally complete the project using Excel. And so, that's one of those things that people love and hate, I love it and I recommend, regardless of what job or what your role is, that you take a look and see how it can benefit you. OK, now onto the hot stuff, the fun tech that's going on in the data world. But first I need to pause and have some coffee. Brought to you by Stone Brewing Company. If anyone from Stone is watching and you wanna send me a beer to drink on the show, I will happily do that, so. Anyways, back to regularly scheduled programming.

The Key Technologies

So the first one I wanna talk about is SQL. This is a query language that is universal to all databases.
Now, caveat or asterisk there: NoSQL databases, which by the way stands for not only SQL, not no SQL like the absence of SQL, those things are something like MongoDB, or HDFS, or some of the other databases that we call databases, that aren't really, exclude them from that list. If you're a database you support SQL. If you don't you're not a database, I guess that's my stance on it. So, SQL works with all the databases, and each one has its own flavor, so MySQL has its own flavor of SQL, but it supports ANSI Standard SQL, and there aren't many programming languages that have an ANSI standard. MySQL has its own flavor, Oracle has PL/SQL, SQL Server, another Microsoft thing, has T-SQL, and I say Microsoft thing like it's just a thing, Microsoft SQL Server's huge. All of them, Postgres, they all have their own version of SQL that extends beyond the ANSI standard, but at the base level they all support a lot of the common functionality, so Select statements, Group By's, Where's and all those kinds of things. So what that means is that, if you know this one language you can talk to nearly all databases that exist, which is great because if you're a data geek, like me, or you're in a data role, you don't care, you can come into a company, “Oh OK, what kind of database do you have?” Cool, I just need the right tool that I can connect to that database and then I can execute my queries cos I can write queries, cos I know the standards. The next one is Python, and Python I absolutely love. It is one of the few technologies that actually incorporates Zen into it, they have these principles, the Zen of Python, and it's one that is just beautiful and easy to read, and incredibly simple to learn. And of course because there's a big community, like a lot of these, whatever you're looking to do has been done before, so you don't need to reinvent the wheel, you can Google, and copy and paste from Stack Overflow, or whatever you wanna do.
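As an aside on the common SQL core described above (Select statements, Where's, Group By's), here is a minimal sketch of what such a query looks like. It is not from the talk: it assumes Python's built-in sqlite3 module as a stand-in database, and the calls table, its columns and its rows are made up purely for illustration.

```python
import sqlite3

# Hypothetical sample data: the table, columns and rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (center TEXT, call_type TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO calls (center, call_type, minutes) VALUES (?, ?, ?)",
    [
        ("Phoenix", "sales", 5),
        ("Phoenix", "service", 12),
        ("Phoenix", "sales", 7),
        ("Tucson", "sales", 4),
        ("Tucson", "service", 9),
    ],
)

# SELECT, WHERE, GROUP BY and ORDER BY belong to the common core that
# most SQL databases share, so this query text is not SQLite-specific.
query = """
    SELECT center, COUNT(*) AS num_calls, SUM(minutes) AS total_minutes
    FROM calls
    WHERE call_type = 'sales'
    GROUP BY center
    ORDER BY center
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The query text itself should carry over to most other SQL databases largely unchanged; what typically differs is the connection tool and the vendor-specific extensions beyond the standard.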
So Python is another one that is huge in the data world right now. Python of course is more general than just data, but in the data realm, especially data science and data engineering, it's huge. Tableau is another one, and this may be a bit controversial, or maybe not, if you guys follow my stuff you know that I love Tableau and I teach and talk about it a lot, I'm hoping to speak at the Tableau conference this year, all those kind of things. So, anyways, this one is huge, but I'll generalize this a little bit and say that, the BI and analytics tools, and Tableau is in my mind the best one out there, QlikView is another popular one that is also a leader, if you saw the recent BI Magic Quadrant, there was the three leaders left in the top right quadrant there, Tableau was one, QlikView was another, and Microsoft was the other. I would say Microsoft is definitely playing catch-up to the other two, and Tableau I think is the true leader because they're the ones that really have revolutionized the whole BI and analytics world with their approach to self-service analytics, and making it simple and easy for people to visually explore their data. I don't wanna get on a sales pitch about Tableau, you guys have heard me do that enough, but, the point being, your BI and analytics tool, I recommend Tableau, is huge and right now it's super-important for people to learn that. And the other one is R, and I hate the name, just because anytime you search for it, you just get all kinds of crap results, but R is an open source, statistical modeling programming language essentially, and there's variants of it, there's R Studio, R Server, there's shiny dashboards, there's a whole realm of stuff popping up around R. 
And this is largely used by data scientists, but I would say that it's finding other applications outside of that, so people that aren't classically trained in statistics or some of the other applied mathematical principles are using R to understand data better and to make graphics and everything like that, so a really powerful tech. Alright so those in my world, or in my opinion, are the top four technologies in data right now. Now we'll switch gears, and I wanna talk about who you are, and hopefully if you're watching this, you're one of these three roles, and I'll have a fourth role I'll mention but I don't wanna highlight it as a data role. So, the first one is the knowledge worker. So the knowledge worker is the person that is a business person, that is using data to make decisions to run the business, to do whatever their business role, I say business but I mean organization. I used to work for Mozilla, so we had a foundation and we didn't refer to ourselves as a business, but whatever your organization or company or business or whatever it is, there are people that use data to make decisions, hopefully a lot of people, hopefully this role, this persona hopefully applies to a really broad range of folks. And so, I would contend that even C-level folks should be knowledge workers, in that this is a big, big market and it's really the ones that, the people that take the insights that were developed for you or the dashboards or whatever, and apply them and actually make the difference. So some of the, the last mile in the journey, if it were, from where data starts to where it actually has an impact. The next is the analyst and the scientist, and the data analysts and scientists are the persons that, the people that will take data generally either from collecting it however they can, from scraping it from the web, or pulling it from a database, or downloads from CSV files, or whatever, and making sense of it. 
So this is the real exploratory work, this is really fun work cos you get to learn a lot and this is constantly evolving, and there's just a huge opportunity to be really creative here about how you use data. Then you have the engineer, and the engineer is the one that really makes this whole thing hum, without them the pieces don't fit together, the data doesn't flow. Someone told me recently, I was having a chat with a friend, and they were saying that something like 70 to 80% of a data scientist's job is collecting and organizing data so that then they can do analysis on it, and I thought that was ridiculous. I think that's just not how I, I've not structured my teams, my organizations that way so, that's insane to me that companies would hire somebody, or expect somebody who is extremely hard to find, extremely valuable, and have them do the heavy lifting of just moving data around. I mean a data scientist should have, in theory it's like a chef coming in to the restaurant, where they should have all the tools laid out, prepped, cleaned exactly how they like them, and then they should have all the food ready to go, and they just make these beautiful dishes, these beautiful creations. That's what the analysts' and scientists' role should be, it shouldn't be, you know Emeril doesn't come into his kitchen and go chop tomatoes, to make the salad. Somebody's chopped the tomatoes for Emeril. So that's my point. You should have somebody chop the tomatoes for your data analysts and data scientists first. And that would be the data engineer, or the data engineering team. OK, so, onto the next one.

Who Should Learn What?

Well on the left here I'm just gonna put up our knowledge worker, our analysts and scientists and our engineer, and then on top I'll put our programming languages. So the first one is Excel, so the knowledge worker obviously needs to know that, in fact they're probably the most familiar with it and they probably try to do everything in Excel.
One of the most ironic things that you find, and I found throughout my career, is you spend all this time building these dashboards and trying to make it easy for knowledge workers to find answers to their questions and get their job done. And, still the most common denominator is “Can I download it to Excel?” And that's, it's unfortunate because the idea is to not have to do that, cos often what people do then is they try to join it up with other data or they try to mangle it together, or fit it into the model they want and then make their own thing in Excel and it's like “We can do that for you, you know, or we can teach you other tools and ways of doing it.” So, anyways, Excel is obviously key, SQL's another one. I have a fun story: back in my first real data role I was working at a call center in Phoenix in the late 90s, and my boss, who was a pure business guy, his role was to help understand customer service staffing, so what we did, or what I did is we looked at the schedules for our inbound sales actually, so it was customer service and sales calls coming in, and we're trying to balance the staffing levels, like how many people are on the phones at this time, which means we have to predict how many calls we're going to get, which depends on things like marketing campaigns or whatever, if there's an outage, that kind of stuff. And then, think about other people's situations like “Hey, so-and-so has vacation they need to take” and all this, and we're talking about 2,000 people in a call center, so, lots of data. And it's really a numbers game, trying to fit all these things together. His job, he actually ran that for a number of call centers, my job was to help work with the data there, and the funny part about the story is he's a complete business person, he's not a tech person, he's not a developer. He knew SQL, and it blew me away.
I thought, “Holy crap, so here's a business person,” and we're talking late 90s, “where he's writing SQL code to figure out how to do his job.” And at first I thought “This guy's frigging awesome!” And then I also thought “Man if he's writing SQL I need to be stepping it up.” cos I thought SQL was the end of the skills I needed at the time. So anyways, knowledge workers, yes to SQL. The other one is Tableau, and again if you don't have Tableau, you should go try it out, but if you have a different BI tool, whatever it may be that's fine, do that one, knowledge workers need to use this, this is the nature of self-service analytics, because Excel and SQL, well SQL's gonna be hard, especially for complex analysis. And Excel has its limitations, for as great as it is. Tableau goes beyond that, it is the best of both worlds there, it's easy to use like Excel, there's really not coding, or you can code in it, but it's not really required, you can get a lot done without coding, and it's one that can handle large sums of data, connect to databases, connect to web data sources and all that kind of thing, so Tableau is an absolute must for the knowledge worker. Then you have on the analysts and scientists of course Excel, SQL too, the analysts and scientists are gonna have to get down and query databases. I know a lot of people that are actually Tableau experts, or QlikView experts, that don't know SQL, and that blows my mind. I think it depends where you come from, some people come from the knowledge worker's side, like they're a business person and they just learnt Tableau and now they're a Tableau expert, but they aren't really tech, they're not a technologist, which is a term from the 90s that we used to use. We used to think of ourselves not as a developer, an IT guy, an administrator or whatever, we were technologists, we were Jack of all trades, so to speak.
So, if you're an analyst or scientist SQL is a must, I don't care what your background is, if you now are in that role, learn SQL, it's super-easy, it's not crazy-hard to learn, so don't be intimidated by it. R is another one that I would say is required, or becoming required, if you're a data scientist absolutely, and I guess there's some difference of opinions between R and SPSS or some of the other ones, whatever, but a stats package is the one there, I recommend R. Then I'm gonna put a dotted one around Python, because I think this is really powerful, again part of an analyst's and scientist's job is to claw and scratch the data together and Python can allow you to do that in unique ways that none of these other tools can, so I recommend learning at least the basics of that. And then Tableau is a must as well, so there's a lot in the data realm that analysts and scientists, there's a lot on your plate, you are really the workhorse of this whole process, and so it really revolves around you, so there really is nothing that you shouldn't become good with or at least proficient with to some extent. Not to say that you won't have the things that you lean towards based on your experience or whatever you like. So the engineer then, Excel obviously, SQL yep, and this is where the difference is, the engineer is really heavy in Python. So the data engineer uses Python to move data from place to place, to manipulate data. A common framework is we use Python to take data from wherever it comes from, from an FTP site, from an API, from a database, wherever it lives outside of our data warehouse or analytics, you know warehouse, and we pull that in using Python, and then we use things like SQL to actually manipulate it through the process inside of our environment. 
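The pattern described above, Python to pull data in and SQL to manipulate it once it is inside the environment, can be sketched roughly as follows. This is an illustrative sketch only, not the pipeline from the talk: an in-memory SQLite database stands in for the warehouse, a CSV string stands in for a file fetched from an FTP site or API, and every name here (RAW_CSV, raw_orders, region_totals, the two functions) is hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a file pulled from an FTP site or an API (made-up data).
RAW_CSV = """order_id,region,amount
1,west,100.50
2,east,20.00
3,west,49.50
"""


def load_raw_orders(conn, csv_text):
    """Extract and load: parse CSV text in Python and insert it into a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id INTEGER, region TEXT, amount REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["order_id"]), r["region"], float(r["amount"])) for r in reader]
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    return len(rows)


def build_region_totals(conn):
    """Transform: use SQL inside the database to build a summary table."""
    conn.execute("DROP TABLE IF EXISTS region_totals")
    conn.execute(
        "CREATE TABLE region_totals AS "
        "SELECT region, SUM(amount) AS total_amount "
        "FROM raw_orders GROUP BY region"
    )
    return conn.execute(
        "SELECT region, total_amount FROM region_totals ORDER BY region"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
loaded = load_raw_orders(conn, RAW_CSV)
totals = build_region_totals(conn)
print(loaded, totals)
```

The split is the point of the sketch: Python handles whatever format the source arrives in, and once the rows are in a table, the reshaping is plain SQL that an analyst can read and change.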
And of course there are lots of other tools there, and other ways to do that, I really hate getting stuck in the data engineering toolset, because they all have their limitations and when you get stuck and you can't do something, it's just incredibly infuriating, so, the thing is, and I'll rant about that for a second, is if you write the code, you can get done all the things you could get done with say, an Informatica or tool like that in probably about the same amount of time, and it's really just about as hard to maintain and everything, I mean it's a wash. But you have the ultimate flexibility, some people gravitate towards tools because they're afraid to write code, or they're afraid of command line, don't be, it's not, if you're a technologist, especially if you're a data engineer, then the command line is your friend and writing code should be your friend too. I put Tableau on here as well with a dotted line around it, and I think this one is, this one is good because engineers need to show their work, just like everyone else. Now it's not required, you don't have to, but I actually know a lot of data engineers that have benefited from knowing how to do stuff in Tableau, or at least the basics. They can spit out some results, they can test something, performance on the server, whatever, and you use Tableau to visualize that data, it's pretty straightforward. OK, so hopefully that is a good picture for you, of, depending on what role you're in, or wanna be in, what types of skills you should learn. I'm not gonna go into which one you should learn first, I would probably just recommend going after the one, I'd try them all out, go after the one that you find the most, have the most interest in, and try to go down that rabbit hole, and have some success there, sort of play to your strengths at first, without worrying about your weaknesses, to get going. Alright.

Where to Find More

So here I'm gonna talk a couple of Pluralsight courses, obviously I'm an author on Pluralsight, and I have a lot of stuff there, these first two are courses of mine that I recommend if you're going down this path. Data Analytics Hands On takes you from soup to nuts, it covers all of these and many other topics in the one-inch deep level, and then points you where you can go deeper, if you want to get into that, so data modeling, star schemas, ETL, all those kind of things. And then Tableau Fundamentals of course is just how to get up and running with Tableau, and by the way, we're doing some new Tableau stuff, there's a new partnership, I'll show you some cool stuff. Tableau just announced they have learning partnerships with us, Pluralsight, Lynda and a couple of others, so lots more Tableau courses coming out if you're interested in that on Pluralsight. There's also an Introduction to SQL, so this is a great way to get going with SQL, and there's a Beginning Data Visualization with R, so you've got all these things covered on the Pluralsight courses. I forgot to add the Python one, so yeah, there's tons of Python courses on Pluralsight as well. So on Code School, we've got two, so Pluralsight costs money, you can do a free trial for 14 days, or email me and I can hook you up with a longer-term trial, and on Code School though this is all free. There is a paid membership as well, but these two courses totally free, you probably have to create an account, I think, but whatever, you don't have to pay for anything. And Try SQL and Try R, now the cool thing about Code School, the difference there, is that these are for the absolute beginners, so if you're brand new to SQL or brand new to R, it is probably the best way to learn. 
You have a person talking, explaining the concepts very clearly, you have great graphics and then you have the coding in the browser, so it's a very interactive way, well it's like a person talking, diagram explaining something, now it's your turn, it's like a little coding challenge in your browser, and these guys, the production quality of their content is just beyond anyone else's, it really is the greatest stuff. Now, contrasting that with Pluralsight, Pluralsight is more I'd say advanced, so more professional, so if you already know how to install tools and connect to a database and stuff like that, that's a great place to get going much deeper in the space. Code School's much higher-level, and much more beginner content, and a great way to dive in to a new technology. They have a ton more stuff, but relevant to our talk here there's these couple of courses. And then I'm gonna just mention one other one as well, it's not just about promoting stuff that I get paid for, it's about sharing my knowledge with you and helping you guys learn. datacamp.com is really cool and has whole different tracks, kind of like Python and R tracks for becoming a data scientist. OK, so, that's all for the slides, let me jump over now and I will show you, alright we'll check if there's any questions, and if not, we'll call it a day. Alright, looks like we don't have any questions now, so if you do have anything and you wanna follow-up, or questions about this talk or about this podcast, this video blog, email me at [email protected], and I'll see you guys next week. Ciao.
