Hadoop for Data Science Tips, Tricks, & Techniques

Welcome from Hadoop for Data Science Tips, Tricks, & Techniques by Ben Sullins

Hadoop, the hugely popular big data platform, offers a vast array of capabilities designed to help data scientists deliver their insights. In this course, Ben Sullins helps you get up to speed with Hadoop by sharing a series of tips and tricks for doing data science work on this powerful platform. He starts by looking at how to work with Hadoop data in HDFS, and then explores using Hive, the Hadoop SQL engine, where a lot of data science work happens. To wrap up the course, Ben covers techniques for running fast queries in the Hive engine.

Topics Include:

  • Working with files
  • Organizing files in HDFS
  • Connecting to Hadoop
  • Exploring Hive through Beeline
  • Accessing Hive from Python
  • Creating aggregates in Hive
  • Selecting partitions in Hive
  • Complex data structures in Hive
  • Mapping data in Hive
  • Creating flat tables for Impala
  • Deconstructing Impala queries

Duration: 1h 12m

…is now available on LinkedIn Learning and on Lynda.com.