Data Science for People in a Hurry


The thing is, every company already has a data science department, even if they don’t know it. To understand why, though, let’s start with a definition of “Data Science,” since it is easily one of the most confused and overused terms in tech.

Originally, a Data Scientist was basically a statistician who could code. Now, however, a Data Science professional can be anything from an engineer to an analyst to an actual Data Scientist. The thing is, the entire industry can now be classified as Data Science, which is why I define it as follows:

Data Science is the process of turning raw data into useful information

That’s the basic idea…let’s see how it plays out in an example:

Let’s say we have a company that sells T-Shirts online. In order to sell things we need the following technologies:

1. Website
2. Payment Processor
3. Fulfillment Center
4. Customer Service Center
5. Phone System

Now, each one of these systems is far more complicated than what I put here and has its own internal systems. For example, our payment processor also needs to handle refunds, and our fulfillment center needs to handle returns. These are considered sub-systems of the overall system that runs our business.

Each of these systems creates data when it does its job. Visitors browse pages on the website and place orders, the fulfillment center ships those orders, and customer service emails the customer at each step of the process. If customers have questions, they call in and get a live person on the phone to help them.

A way to think about the data generated by these systems is with simple noun > verb combos:

* A Customer Orders a Shirt

In this case, the Customer and the Shirt are our nouns, and “Orders” is our verb, which is the transaction. The transaction is logged into the sales system along with the customer information and the shirt they ordered.
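To make that concrete, here is a minimal sketch of what one logged row might look like. The field names and values are made up for illustration; a real sales system will have its own schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    """One row of data: a noun does a verb to another noun."""
    customer_id: str       # first noun (hypothetical identifier)
    verb: str              # what happened, e.g. "orders"
    item_sku: str          # second noun (hypothetical product code)
    amount_usd: float      # what the transaction was worth
    occurred_at: datetime  # when it happened


# "A Customer Orders a Shirt" as a single record (all values are invented).
order = Transaction(
    customer_id="cust-001",
    verb="orders",
    item_sku="tshirt-blue-m",
    amount_usd=19.99,
    occurred_at=datetime(2024, 1, 15, 10, 30),
)
print(f"Customer {order.customer_id} {order.verb} {order.item_sku}")
```

Every system in the business writes rows shaped roughly like this, just with different nouns and verbs.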

So each of our 5 systems generates tons of individual records, or rows of data, as it does its job. Alone, each row tells a tiny bit of the overall story that we as business owners need in order to make decisions about our future plans.

Let’s look at a basic question:

How were sales last week?

To answer this seemingly simple question we could just pull total sales for the previous week from our payment processor. Done. But we can do better by presenting that stat in context.

So in addition to pulling total sales for last week, we look at the week before and show a percent change to show whether it was better or worse. This adds what we call qualitative value. I hate big words so we’ll just call it context. Without context, the numbers themselves, no matter how hard they were to calculate, are rather meaningless.
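Here is a small sketch of that week-over-week comparison. The function names and the sales figures are invented for illustration; in practice the two totals would come from your payment processor.

```python
def percent_change(current: float, previous: float) -> float:
    """Percent change from the previous period to the current one."""
    if previous == 0:
        # Nothing to compare against, so a percent change is undefined.
        raise ValueError("previous period has no sales")
    return (current - previous) / previous * 100.0


def sales_with_context(last_week: float, week_before: float) -> str:
    """Report last week's sales along with how they compare to the week before."""
    change = percent_change(last_week, week_before)
    direction = "up" if change >= 0 else "down"
    return (
        f"Sales last week: ${last_week:,.2f} "
        f"({direction} {abs(change):.1f}% vs. the week before)"
    )


# Made-up totals for the two weeks.
print(sales_with_context(last_week=12_500.0, week_before=10_000.0))
# prints: Sales last week: $12,500.00 (up 25.0% vs. the week before)
```

The bare number answers the question; the part in parentheses is the context.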

Don’t believe me? Try this thought experiment:

You work for a bank, and a customer asks you for their current account balance. You reply with the number $5,000.

What happens next?

Is the customer satisfied with this? Is this normal for them, or high, or low? Are they going to scream and start throwing things, or be elated that they’ve saved so much?

You don’t know without context. If this were your average middle-class person in the US they might think that’s fine if it’s their checking account. If this were Bill Gates you would have just given him an incorrect answer and you’d likely be fired.

My point is that without context, numbers carry little value. So as a Data Science pro, your job is to answer questions not only with accurate numbers but with important context as well.

Great, so a good Data Science professional answers questions with both numbers and context. Is that it?

No, that’s not it.

What I just described is what I consider to be the foundation of a good Data Science group. Answering questions with good, solid numbers and relevant context is the base from which to explore new ideas.

After you have a solid footing on how your business is doing and can answer questions that span multiple systems and processes, you’re allowed to come up with new ideas to experiment with.

This is where what we traditionally would call a “Data Scientist” comes in. The “Science” part of the name here refers to experimentation. If you’ve read The Lean Startup by Eric Ries, you’ll be familiar with this concept. If you haven’t read his book, you definitely should, btw.

Here’s the idea: You don’t know what will work until you try it.

Back in the ’50s, W. Edwards Deming popularized a strategy that came to be known as PDSA, which stands for:

Plan — come up with an idea you want to test
Do — test it in a scientific way (eg. randomized split tests)
Study — look at the results of your test
Act — decide to expand the test or refine the idea
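As a rough sketch of how the Do, Study, and Act steps might look for a randomized split test, here is one way to do it with nothing but the standard library. Everything specific is an assumption made for illustration: the visitor counts are invented, the comparison uses a two-proportion z-test (one common choice, not the only one), and the 0.05 cutoff is a convention rather than a rule.

```python
import math
import random


def assign_variant(rng: random.Random) -> str:
    """Do: randomly put each visitor in group A (control) or B (new idea)."""
    return "A" if rng.random() < 0.5 else "B"


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Study: z-score for the difference between two conversion rates."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (conv_b / n_b - conv_a / n_a) / std_err


def two_sided_p_value(z: float) -> float:
    """Chance of a gap at least this large if A and B truly perform the same."""
    return math.erfc(abs(z) / math.sqrt(2))


def act(z: float, p_value: float, alpha: float = 0.05) -> str:
    """Act: expand the test if B is convincingly better, otherwise refine."""
    return "expand" if z > 0 and p_value < alpha else "refine"


# Do: a seeded generator keeps this illustration repeatable.
rng = random.Random(42)
groups = [assign_variant(rng) for _ in range(10)]

# Study: made-up results, 100 of 2,000 visitors ordered in A, 130 of 2,000 in B.
z = two_proportion_z(conv_a=100, n_a=2000, conv_b=130, n_b=2000)
p = two_sided_p_value(z)

# Act: turn the numbers into a recommendation.
decision = act(z, p)
print(groups, round(z, 2), round(p, 3), decision)
```

The Plan step is the part code can’t do for you: deciding what idea is worth testing in the first place.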

He also was kind of funny 🙂

As a business owner, you want to grow your business or improve efficiency in a process, and a Data Scientist is the one who is going to help you design the experiment, measure the results, and recommend what to do next.

This is how modern businesses succeed.

Without this process, you’re basically guessing (which is totally fine in the beginning, but won’t last forever). As the saying goes, even a broken clock is right twice a day (unless it’s a military clock, in which case go fix the damn clock).

All of this is powered by another aspect of Data Science we haven’t touched on, Data Engineering.

Did you wonder this whole time how we can ask questions of our data across systems and processes? Or how the Data Scientists can run experiments and measure the results?

Well, it’s because the Data Engineering group makes sure all of the data is available to ask questions of, and that we have good tracking of experiments on our systems.
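A tiny sketch of what “available across systems” means in practice: rows from two separate systems get matched up on a shared key so that one question can span both. The rows, field names, and the `join_on_order_id` helper are all hypothetical; a real pipeline would typically do this in a database or data warehouse.

```python
# Hypothetical rows exported from two separate systems.
payments = [
    {"order_id": 1, "amount_usd": 19.99},
    {"order_id": 2, "amount_usd": 39.98},
    {"order_id": 3, "amount_usd": 19.99},
]
shipments = [
    {"order_id": 1, "status": "delivered"},
    {"order_id": 2, "status": "in_transit"},
]


def join_on_order_id(payments, shipments):
    """Attach each payment's shipping status, matching rows by order_id."""
    status_by_order = {row["order_id"]: row["status"] for row in shipments}
    return [
        # Orders with no shipment row yet are marked "not_shipped".
        {**p, "status": status_by_order.get(p["order_id"], "not_shipped")}
        for p in payments
    ]


combined = join_on_order_id(payments, shipments)

# A question neither system can answer alone: how much have customers
# paid for orders that have not shipped yet?
unshipped_revenue = sum(
    row["amount_usd"] for row in combined if row["status"] == "not_shipped"
)
print(f"Paid but not yet shipped: ${unshipped_revenue:.2f}")
```

Neither the payment processor nor the fulfillment center can answer that question by itself, which is the whole point.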

Without data engineering, you’ll have your strike team of Data Scientists spending the majority of their time setting up systems instead of designing experiments. Not good.

So before you even consider running an experiment and testing your next great idea, you need a solid Data Engineering pipeline and foundation to build on.

If you haven’t figured it out by now, here are the three pieces of the puzzle that all need to work together for your Data Science org to be effective:

Data Engineering — creating the platform and environment for all analytics and experimentation

Data Analytics — giving you the foundational knowledge of how your business is doing and to guide future ideas

Data Science — helping you test ideas in a manner that relies on science instead of “hope” to drive success

And remember, when you Free the Data, Your Mind Will Follow

Talk soon!