Overfitting doesn't bite

By Santiago • Issue #3
A good process for building a model is as important as all that math you learned over the winter.
Knowing where to look first and what to do next is fundamental to keeping the aspirin 💊 bottle as far away as possible.
Let’s take a minute and revisit one of the steps I always try to follow.

Of course, there is more than one way to build a good model. I’ve seen, however, how easy it is to make time disappear when we spend too much of it going down the wrong rabbit holes.
Over the years, I’ve built my own rudimentary set of steps that I always follow when starting a new project. Some of these were recommendations from people who came before me; some I found the hard way, after banging my head against the wall more than once.
Today, let’s focus on one of them: I want you to stop being scared about overfitting, embrace it, and—what’s even better—actively start looking for it.
Somebody once compared overfitting to a shark waiting in the shadows... 😦
I was taught that overfitting was a bad thing.
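If you are still figuring out the vocabulary, a model that’s overfitting is spitting out predictions that it memorized from the training data. It does great on the examples it has already seen and poorly on new ones, which means it didn’t learn anything that generalizes.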
However, there’s something good about this: if our model can memorize the data, we know that it has enough capacity, and we don’t have any weird issues with the learning process.
And that’s a place where we want to be!
So before we go crazy and throw the kitchen sink at our problem, we will exploit overfitting in our favor.
First, overfit one batch.
Right off the bat, as soon as you have your basic model structure in place, try and get a few examples to overfit.
Remember, at this point, we are only trying to confirm two things: that we can push the training loss close to zero, and that the model has enough capacity to fit the data.
A little bit of data. Quick test. Did you overfit? Move on. You couldn’t? There’s something wrong with the training process, and there’s nowhere to go until that’s fixed.
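Here is a minimal sketch of that quick test in plain Python. Everything in it is an assumption made for illustration: the toy logistic model, the four-example batch, the 0.05 loss cutoff, and the function names are hypothetical and don’t come from any library. The second run sets the learning rate to zero as a stand-in for a broken training process, for example weights that never get updated.

```python
import math

OVERFIT_THRESHOLD = 0.05  # assumed cutoff for "the batch is memorized"


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def batch_loss(w, b, batch):
    """Mean binary cross-entropy of the toy model sigmoid(w * x + b)."""
    losses = []
    for x, y in batch:
        p = sigmoid(w * x + b)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)


def train_on_one_batch(batch, steps=500, lr=1.0):
    """Run gradient descent on the same few examples; return the final loss."""
    w, b = 0.0, 0.0
    n = len(batch)
    for _ in range(steps):
        # Prediction error for each example, paired with its input.
        errors = [(sigmoid(w * x + b) - y, x) for x, y in batch]
        grad_w = sum(e * x for e, x in errors) / n
        grad_b = sum(e for e, _ in errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return batch_loss(w, b, batch)


def passes_overfit_check(loss):
    return loss < OVERFIT_THRESHOLD


# A tiny batch of (input, label) pairs: just enough data for a quick test.
tiny_batch = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

healthy_loss = train_on_one_batch(tiny_batch)
broken_loss = train_on_one_batch(tiny_batch, lr=0.0)  # simulated bug: no updates

print("healthy run overfits the batch:", passes_overfit_check(healthy_loss))
print("broken run overfits the batch:", passes_overfit_check(broken_loss))
```

With a real model the idea is the same: grab one small batch, train on it repeatedly, and don’t move on until the loss gets close to zero.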
Then, overfit the whole set before regularizing it.
As soon as everything is working properly, it’s time to find a model that overfits the entire training set. Yeah, you might need to add a little bit of capacity to it, but make sure you can take that training loss as low as possible.
There are seven hundred and three million ways to regularize our models, so that should be a straightforward step as soon as we prove that we are working on a solid foundation.
First, let’s prove we can get the job done. Then let’s focus on doing it the right way.
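The two steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a recipe: the model is a toy polynomial regression, "capacity" is the polynomial degree, the regularizer is an L2 (ridge) penalty, and the synthetic data, thresholds, and helper names are all hypothetical. A real project would swap in its own model and whichever regularizer fits best.

```python
import random


def poly_features(x, degree):
    """Map a scalar x to [1, x, x^2, ..., x^degree]."""
    return [x ** k for k in range(degree + 1)]


def solve_linear_system(a, b):
    """Solve a @ w = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit(xs, ys, degree, l2=0.0):
    """Least-squares polynomial fit; l2 > 0 adds a ridge penalty on the weights."""
    rows = [poly_features(x, degree) for x in xs]
    d = degree + 1
    gram = [[sum(r[i] * r[j] for r in rows) + (l2 if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve_linear_system(gram, rhs)


def mse(w, xs, ys):
    """Mean squared error of the polynomial with weights w."""
    degree = len(w) - 1
    preds = [sum(wk * fk for wk, fk in zip(w, poly_features(x, degree))) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


# Synthetic data (an assumption): a noisy straight line, 8 training points.
rng = random.Random(0)
train_xs = [-1 + 2 * i / 7 for i in range(8)]
val_xs = [-1 + 2 * (i + 0.5) / 7 for i in range(7)]
train_ys = [x + rng.gauss(0, 0.3) for x in train_xs]
val_ys = [x + rng.gauss(0, 0.3) for x in val_xs]

# Step 1: add capacity until the whole training set is memorized.
for degree in (1, 3, 5, 7):
    overfit_w = fit(train_xs, train_ys, degree)
    overfit_train = mse(overfit_w, train_xs, train_ys)
    if overfit_train < 1e-6:
        break

# Step 2: keep that capacity, sweep the penalty, and pick it on validation data.
candidates = [1e-3, 1e-2, 1e-1, 1.0]
best_l2 = min(candidates,
              key=lambda l2: mse(fit(train_xs, train_ys, degree, l2), val_xs, val_ys))
reg_w = fit(train_xs, train_ys, degree, best_l2)
reg_train = mse(reg_w, train_xs, train_ys)

print("degree that memorizes the training set:", degree)
print("train / validation MSE before regularizing:",
      overfit_train, mse(overfit_w, val_xs, val_ys))
print("train / validation MSE with l2 =", best_l2, ":",
      reg_train, mse(reg_w, val_xs, val_ys))
```

The regularized model gives up some training error on purpose. Whether that buys a better validation score depends on the data, which is why the penalty is chosen on the validation set rather than guessed.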
Too long; didn’t read.
When building a model, try to overfit it first. This will rule out any issues with the training process.
Right after that, you can regularize and get things to where they should be.
Communication here goes both ways
I don’t bite either.
If you have something to say, reply to this email and let me know what you are thinking. I really appreciate the feedback.
And if you have any ideas for things you’d want me to write about in a future issue, send them my way.
Thanks for the support, and see you next week!

Every week, I’ll teach you something new about machine learning.

Underfitted is for people looking for a bit less theory and a bit more practicality. There's enough mathematical complexity out there already, and you won't find any here.

Come for a journey as I navigate a space that's becoming more popular than Apollo 13, Area 51, and the lousy sequel of Star Wars combined.
