
Overfitting
After collecting a few years of weather data, you confidently boast to your farmer friend that you have developed a robust predictive model with 96% accuracy on the data you collected. Your friend says, "Well, great news, can I have it?" Being an altruist and philanthropist, you immediately agree and send him the code. A day later, the same friend calls back from his home in Guangdong Province, China, angry that your model did not work and has ruined his harvest.

What happened here? This was simply a case of overfitting: the model was fit to the tropical climate of Hawaii and does not generalize well beyond that sample. It did not see enough of the variation in air pressure and temperature that actually exists, paired with the corresponding sunny and rainy labels, to predict the weather reliably on another continent. In fact, because the model only ever saw Hawaiian temperatures and air pressures, it memorized trivial patterns in the data (for example, that there are never two rainy days in a row) and used these patterns as rules for making predictions, instead of picking up on more informative trends.

One simple remedy here is, of course, to gather more weather data in China and fine-tune your model to the local weather dynamics. In other situations involving overfitting, you may try selecting a simpler model, denoising the data by removing outliers and errors, and centering it by subtracting the mean.
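The "never two rainy days in a row" trap can be reproduced with a toy model. The sketch below is an illustration under stated assumptions, not the model from the story: it learns, for each day's label, the most common label on the following day, then compares its accuracy on the sequence it was fit on with its accuracy on a different sequence. The label sequences `HAWAII` and `GUANGDONG` are hypothetical, made-up data, and the helpers `fit_transition_rule` and `accuracy` are illustrative names, not part of any library.

```python
from collections import Counter

# Hypothetical daily labels ("S" = sunny, "R" = rainy).
# These sequences are invented for illustration; they are NOT real weather records.
HAWAII = list("SSSRSSSSRSSS")     # rain never follows rain
GUANGDONG = list("RRRSRRRSSRRR")  # long runs of rainy days


def fit_transition_rule(days):
    """Learn, for each label, the most common label on the following day."""
    followers = {}
    for today, tomorrow in zip(days, days[1:]):
        followers.setdefault(today, Counter())[tomorrow] += 1
    # most_common(1) returns [(label, count)]; keep only the label.
    return {today: counts.most_common(1)[0][0] for today, counts in followers.items()}


def accuracy(rule, days, default="S"):
    """Fraction of next-day labels that the rule predicts correctly."""
    pairs = list(zip(days, days[1:]))
    correct = sum(rule.get(today, default) == tomorrow for today, tomorrow in pairs)
    return correct / len(pairs)


rule = fit_transition_rule(HAWAII)
print(rule)                                                        # {'S': 'S', 'R': 'S'}
print(f"in-sample accuracy:     {accuracy(rule, HAWAII):.2f}")     # 0.82
print(f"out-of-sample accuracy: {accuracy(rule, GUANGDONG):.2f}")  # 0.27
```

Because the hypothetical Hawaiian sequence never contains two rainy days in a row, the learned rule always predicts sunny after rain. It scores well on the data it was fit on and poorly on the rain-heavy sequence. Comparing accuracy on the fitting data against accuracy on data the model has never seen is one simple way to expose this kind of overfitting.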