I was taught that overfitting was a bad thing.
If you are still figuring out the vocabulary: an overfit model is spitting out predictions it memorized from the training data instead of patterns that generalize to new data. In other words, it didn’t learn properly.
However, there’s something good about this: if our model can memorize the data, we know that it has enough capacity, and we don’t have any weird issues with the learning process.
And that’s a place where we want to be!
So before we go crazy and throw the kitchen sink at our problem, we’ll use overfitting to our advantage.
First, overfit one batch.
Right off the bat, as soon as you have your basic model structure in place, try to get a few examples to overfit.
Remember, at this point, we are trying to ensure we can actually reduce the training loss as much as possible and the model has enough capacity to learn stuff.
A little bit of data. Quick test. Did you overfit? Move on. You couldn’t? There’s something wrong with the training process, and there’s nowhere to go until that’s fixed.
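Here’s what that quick test can look like as a minimal sketch in plain NumPy (the tiny MLP, the sizes, and the learning rate are all made-up illustration values, not a prescription): an over-parameterized network trained on a handful of examples with completely random targets should be able to memorize them, driving the training loss toward zero. If it can’t, something in the loop is broken.

```python
import numpy as np

rng = np.random.default_rng(0)

# One tiny "batch": 4 examples with completely random targets.
# There is no pattern here -- the only way to fit is to memorize.
X = rng.normal(size=(4, 2))
y = rng.normal(size=(4, 1))

# Over-parameterized 2-layer MLP: far more weights than examples.
W1 = rng.normal(scale=0.5, size=(2, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)

lr = 0.05
for _ in range(5000):
    h = np.tanh(X @ W1 + b1)          # forward pass
    err = (h @ W2 + b2) - y
    loss = (err ** 2).mean()          # MSE on the single batch

    dout = 2 * err / len(X)           # backward pass
    dW2 = h.T @ dout
    db2 = dout.sum(axis=0)
    dh = (dout @ W2.T) * (1 - h ** 2)
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)

    W1 -= lr * dW1; b1 -= lr * db1    # plain gradient descent
    W2 -= lr * dW2; b2 -= lr * db2

# If the loop is healthy, `loss` ends up near zero: the batch is memorized.
```

The same idea applies with any framework: grab one batch, loop over it, and watch the training loss. If it won’t go to (roughly) zero, fix the training process first.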
Then, overfit the whole training set before regularizing the model.
As soon as everything is working properly, it’s time to find a model that overfits the entire training set. Yeah, you might need to add a bit of capacity, but make sure you can drive that training loss as low as possible.
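One way to see the capacity point is with a toy regression (the data and polynomial degrees here are invented for illustration): a model that’s too small simply can’t push the training loss down, no matter how long you train, while one with enough parameters can interpolate the training set.

```python
import numpy as np

# Ten training points from a nonlinear function.
x = np.linspace(0.0, 3.0, 10)
y = np.sin(x)

def train_mse(degree):
    """Fit a polynomial of the given degree and return its *training* MSE."""
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    return float(((pred - y) ** 2).mean())

small = train_mse(1)  # a straight line: not enough capacity, loss stays high
big = train_mse(9)    # 10 coefficients for 10 points: interpolates the data
```

If your training loss plateaus well above zero like `small` does, that’s the signal to add capacity before blaming anything else.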
There are seven hundred and three million ways to regularize our models, so that should be a straightforward step once we’ve proven we are working on a solid foundation.
First, let’s prove we can get the job done. Then let’s focus on doing it the right way.
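As one concrete example of that second step, here’s a sketch using L2 weight decay on a toy linear regression (the data, learning rate, and decay strength are placeholder values): it’s the same gradient-descent update as before, with one extra term that pulls the weights toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 4))
y = rng.normal(size=(32, 1))

def fit(weight_decay, steps=500, lr=0.1):
    """Linear regression via gradient descent, with optional L2 weight decay."""
    w = np.zeros((4, 1))
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w -= lr * (grad + weight_decay * w)  # the decay term shrinks w each step
    return w

w_overfit = fit(weight_decay=0.0)  # step 1: drive training loss down
w_reg = fit(weight_decay=0.1)      # step 2: regularize

# The regularized weights end up smaller in norm, trading a bit of
# training loss for a simpler model.
```

Whichever regularizer you pick (weight decay, dropout, data augmentation, early stopping…), the order is the same: first prove the model can overfit, then dial it back.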
Too long; didn’t read.
When building a model, try to overfit it first. This will rule out any issues with the training process.
Right after that, you can regularize and get things to where they should be.