It's been awhile since my last blog post but we've been busy with a big move from Houston to Brooklyn. The opportunities in New York City for data science and AI seem endless! I've also been spending some time putting to practice my newly acquired knowledge of machine learning by browsing through open datasets.
One dataset that piqued my interest is the mushroom dataset from the UCI Machine Learning Repository describing different species from the genera Agaricus and Lepiota. The data are taken from The Audubon Society Field Guide to North American Mushrooms, which states "there is no simple rule for determining the edibility of a mushroom". Challenged by this bold claim, I wanted to explore if a machine could succeed here. In addition to answering this question, this post explores some common issues in machine learning and how to use Python's go-to machine learning library, Scikit-learn, to address them.