Adventures in Data Science

Learn to analyze real-world datasets with the command line. Become an expert at using Bash, one of the data scientist's tools of the trade.

What will I learn?

Through self-contained, step-by-step “adventures”, you will:

– Become proficient in Bash, one of the tools of the trade in data science
– Learn to download and explore publicly available datasets
– Learn to extract relevant information from data files
– Learn to transform and combine data files to answer a question

Sample adventures

Using publicly available datasets and Bash, you will learn to answer questions such as:

– What is the average tip of a NYC cab driver?
– How often are flights late to their destination?
– Do Chipotle customers prefer chicken or steak burritos?
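
To give a flavor of what answering the first question might involve, here is a minimal sketch. It is not taken from the book: the sample rows and the column names (trip_id, fare_amount, tip_amount) are made up for illustration, and a real taxi dataset would likely use a different layout.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: the rows and column names below are invented
# for illustration and do not come from a real dataset.
data=$(mktemp)
cat > "$data" <<'EOF'
trip_id,fare_amount,tip_amount
1,10.00,2.00
2,20.00,3.00
3,15.00,2.50
EOF

# Skip the header row (NR > 1), add up the third column (the tip),
# then divide the total by the number of data rows.
avg=$(awk -F',' 'NR > 1 { sum += $3; n++ } END { if (n > 0) printf "%.2f", sum / n }' "$data")
echo "Average tip: $avg"

# Remove the temporary file.
rm -f "$data"
```

The same pattern (pick a column, skip the header, aggregate) carries over to larger files, although real-world data usually needs some cleaning first.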

Why data science?

In 2008, Nate Silver wowed the public by correctly predicting the outcome of the U.S. presidential election in 49 out of 50 states. As it turns out, you don’t have to be a statistician to perform such analyses.

Data science occupies a growing part of our lives and our work. Whether you are a developer, journalist, biologist, or financial analyst, the ability to analyze data to quickly answer a question is a powerful skill to have, and it’s what this book will help you develop.

Is this book for me?

This book requires no coding experience and is perfect for:

– Developers who want to add Bash and other command line tools to their bag of tricks
– Students who want to learn Bash and the command line to improve their career prospects
– Journalists who want to polish their reporting by analyzing publicly available datasets
– Scientists who want to learn to explore and analyze the data that their lab generates

About the author

Robert Aboukhalil holds a B. Eng. in Computer Engineering from McGill University, and is currently pursuing a PhD in Computational Biology at Cold Spring Harbor Laboratory. Every day, he uses data science tools—including Bash—to process and analyze large biological datasets.