Iterables, iterators and generators, oh my! Part 1

Iterators and generators are among my favorite programming tools, and they're also some of the most powerful. These constructs let us write cleaner, more flexible and more efficient code, which makes them an invaluable addition to any programmer's toolbox. Iterators and generators are also an elegant way to work with large and even infinite data structures, because they produce values one at a time instead of holding everything in memory. That comes in handy for data science. However, they can be among the more perplexing concepts to grasp at first.

In this article, I'd like to give a gentle but in-depth introduction to iterators and generators in Python, although they're prevalent in other languages too. To appreciate generators, though, we first need a good handle on iterators. And to understand iterators, we need to start with iterables.
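As a quick preview of how the three ideas fit together, here is a minimal sketch. The names `numbers` and `count_up` are hypothetical and chosen only for illustration (the standard library already provides `itertools.count` for the same purpose). A list is an iterable, `iter()` turns it into an iterator that hands out one value per `next()` call, and a generator function produces an iterator that computes values lazily, so it can even describe an infinite sequence.

```python
from itertools import islice

# An iterable: any object that can hand out an iterator via iter().
numbers = [1, 2, 3]

# An iterator: returned by iter(); each next() call yields one value,
# and StopIteration is raised once the values run out.
it = iter(numbers)
first = next(it)
second = next(it)

# A generator function (hypothetical example): because it contains
# `yield`, calling it returns a generator, which is itself an iterator.
# Values are computed on demand, so the infinite loop is never run to
# completion and nothing is stored in memory ahead of time.
def count_up(start=0):
    n = start
    while True:
        yield n
        n += 1

# Take only the first five values from the infinite sequence.
first_five = list(islice(count_up(), 5))

print(first, second, first_five)  # 1 2 [0, 1, 2, 3, 4]
```

Note that `islice` is what keeps the infinite generator safe to consume here: calling `list(count_up())` directly would never finish.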

Exploring the Pokemon dataset with pandas and seaborn

The Pokemon dataset lists all Pokemon species as of mid-2016, with data on their types and statistics. Considering how diverse Pokemon are, I wanted to analyze this dataset to learn how the game is balanced and, if one exists, to identify the best Pokemon. Plus, it's a good excuse to practice exploratory data analysis with Python's open-source libraries: pandas for data analysis and seaborn for visualization.

What exactly is data science?

I figured I'd focus my first post on a broad topic, and what better topic than the one this blog will revolve around: data science! When I talk to people about data science, though, I usually get blank stares. That's understandable, because data science is an emerging field and practically everyone has their own definition, so I'd like to begin by sharing mine.