Many machine learning models require categorical features to be converted into a format they can comprehend, typically via a widely used feature engineering technique called one-hot encoding. Machines aren't that smart.
A common convention after one-hot encoding is to remove one of the one-hot encoded columns from each categorical feature. For example, the feature sex, containing the values male and female, is transformed into the columns sex_male and sex_female, each containing binary values. Because either of these columns provides sufficient information to determine a person's sex, we can drop one of them.
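To make the redundancy concrete, here is a minimal pure-Python sketch of one-hot encoding with and without the dropped column. The helper `one_hot`, its `drop_first` flag, and the toy data are all hypothetical and chosen only for illustration; they are not taken from any particular library.

```python
def one_hot(values, prefix, drop_first=False):
    """One-hot encode a list of labels into {column_name: [0/1, ...]}."""
    categories = sorted(set(values))  # sorted for a deterministic column order
    if drop_first:
        categories = categories[1:]   # drop the first category's column
    return {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


# Hypothetical toy data, assumed for illustration only
sex = ["female", "male", "male", "female"]

full = one_hot(sex, prefix="sex")                      # both columns
reduced = one_hot(sex, prefix="sex", drop_first=True)  # one column dropped

# With only two categories, the dropped column can be rebuilt from the kept one
recovered_female = [1 - x for x in reduced["sex_male"]]
```

In practice you would likely reach for a library instead: `pandas.get_dummies` accepts a `drop_first` argument, and scikit-learn's `OneHotEncoder` accepts a `drop` argument, both of which serve the same purpose.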
In this post, we dive deep into the circumstances where this convention is relevant, necessary, or even prudent.