Is it possible for a machine to group together similar data on its own? Absolutely: this is what clustering algorithms are all about. These algorithms fall under a branch of machine learning called unsupervised learning. In this branch, we give a machine an unlabeled training set, one that contains the features of each example but not its class. The algorithm is left to its own devices to discover the underlying structure concealed within the data. This is in stark contrast to supervised learning, where the correct answers are available and are used to train a predictive model.
In this post, we'll not only learn about an algorithm called $k$-means clustering, but also implement it from scratch. We'll then apply it to automate one aspect of flow cytometry, a widely used life sciences technique.