Background
As the saying goes, “if sales are the air a company breathes, then data is the blood running through its veins.” At least that’s how I remember the saying. For as long as businesses and organizations have existed, data has been the underlying force that makes all of it possible. It’s that basic: without data, a company cannot function. Imagine trying to sell a product without logging who you sold it to, how much you sold it for, or any of those things. It wouldn’t be possible. Organizations, governments, and other institutions with similar operations would be crippled if they didn’t use data to run them. Even before computers, paper records were used to place orders and track shipments. Businesses even formed credit agencies that recorded, by hand in logbooks, how likely a customer was to pay back a debt. Barkeeps kept tabs on their regulars to know how much their customers actually consumed.
Data is produced when a transaction occurs, when a person is born, when somebody becomes a citizen of a new country, when you buy something, when you sell something. An unfortunate side effect of all this data being generated is that organizations like the Internal Revenue Service keep track of how much money you make and how much tax you owe. In fact, some of the first computers were designed specifically to help the U.S. government count things: you, me, and everyone else.
In the Beginning
This brings us to the story of Herman Hollerith. Hollerith’s series of patents on tabulating machine technology, beginning in 1884, drew on his work at the U.S. Census Bureau from 1879 to 1882. He was trying to reduce the time and the complex sorting needed to tabulate the 1890 census. His development of punched cards in 1886 set the industry standard for the next 80 years of tabulating and computing data input. In 1896, he incorporated his business as the Tabulating Machine Company. That year the company leased some of its tabulating machines and sold its cards to a railway company, but it stayed focused on the challenges of the largest statistical endeavor of its day, the 1900 U.S. census.
After winning the government contract and completing the project, Hollerith faced challenging times: the census only came around every 10 years, and he couldn’t sustain the company in the years between. He returned to targeting private businesses both in the United States and abroad, trying to identify industry applications for his semi-automatic punching, tabulating, and sorting machines. He later sold the business to Charles Flint, who merged it with other companies into what became, under Thomas J. Watson, International Business Machines, better known as IBM. Hollerith’s original designs for using punch cards to count things automatically led to many breakthroughs in computing. It was the dawn of what we now think of as computing. Since then, computers and data have had a symbiotic relationship. In fact, you can say computers were designed to serve the data processing needs of the world, and they have served in this function ever since. Phones, laptops, the internet, computers as you know them: all of it serves the function of helping us process data and transactions.
So computers were created, at least in part, to handle data processing needs. Seems fine, but how did the science of working with data conquer the tech world?
How Data Won
One of the key problems in the early days of the internet was finding a website. It wasn’t like it is today, where you can just talk to your phone or go to Google or one of the other amazing search engines. There really was no way to find a site unless you knew its exact URL, the www.whatever.com. This led to the development of search engines, which had just as large an impact on our society as the internet itself. These search engines had a fundamental problem: storing data. To help you find a website, a search engine had to index and log all the websites on the internet. There were ways of doing that, but doing so meant storing and copying the entire internet every single day, and this became a problem. In 2003, things changed when Google released a paper on the Google File System. The idea was that you could store chunks of data in multiple locations and then retrieve those chunks by looking at a directory of where all the data was stored.
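To make that idea concrete, here is a minimal sketch in Python. It is a toy, not how the Google File System is actually built: the "servers" are just in-memory dictionaries, and the names ToyChunkStore, CHUNK_SIZE, and REPLICAS are made up for this illustration. It only shows the core trick from the paragraph above: split a file into chunks, keep copies of each chunk in more than one place, and use a directory to find them again.

```python
CHUNK_SIZE = 8  # bytes per chunk; tiny on purpose so the example is easy to follow
REPLICAS = 2    # how many copies of each chunk we keep (illustrative value)


class ToyChunkStore:
    """Toy model: each 'server' is a dict, and a directory records where chunks live."""

    def __init__(self, num_servers=3):
        self.servers = [{} for _ in range(num_servers)]
        self.directory = {}  # file name -> list of (chunk_id, [server indexes])

    def write(self, name, data):
        locations = []
        for index, start in enumerate(range(0, len(data), CHUNK_SIZE)):
            chunk_id = f"{name}#{index}"
            chunk = data[start:start + CHUNK_SIZE]
            # Put copies on consecutive servers, wrapping around at the end.
            holders = [(index + r) % len(self.servers) for r in range(REPLICAS)]
            for server in holders:
                self.servers[server][chunk_id] = chunk
            locations.append((chunk_id, holders))
        self.directory[name] = locations

    def read(self, name):
        parts = []
        for chunk_id, holders in self.directory[name]:
            # Use the first server that still has a copy of this chunk.
            for server in holders:
                if chunk_id in self.servers[server]:
                    parts.append(self.servers[server][chunk_id])
                    break
            else:
                raise OSError(f"all copies of {chunk_id} are lost")
        return b"".join(parts)


store = ToyChunkStore()
store.write("page.html", b"<html>hello, early web</html>")
store.servers[0].clear()  # pretend one server died
print(store.read("page.html"))  # the file is rebuilt from the surviving copies
```

Because every chunk lives on two of the three pretend servers, losing any single server still leaves one copy of each chunk, and the directory tells the reader where to look.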
This was the dawn of Big Data.
Fast-forward to today and sites like Facebook are processing 8 billion video views per day, with over 2.23 billion users every month! This is all predicated on the ability to handle large volumes of data with millions of parallel queries running.
To put it simply, the science and art of working with data have conquered the tech world, and they're not slowing down anytime soon.
Interested in starting your career in data? Sign up here to be notified when registration for Free the Data Academy opens again.