Hello Kaggle!

By Santiago • Issue #25
For the first time, I started working on a Kaggle competition.
Holy moly! It’s very different from anything I’ve done before!
And it’s addictive if you ask me. I’ve managed to accumulate a few discussion medals in a week, and my solution is currently within the gold bracket.
The competition ends in less than a week. I’ll report back with the final results.
If you haven’t tried Kaggle, I strongly suggest you take a look at it. It feels like I’ve learned more in a week than in the past 3 months combined.
I call that a win.

This is how I imagine people competing in Kaggle.
Stratified sampling
Sometimes you can’t just split your dataset randomly.
Well, I take that back: you could, but the results won’t be good.
Stratified sampling to the rescue: whenever one of your classes is severely underrepresented, you should look here.
Here is an explanation, including some code.
Santiago
The first step of building a machine learning model: Splitting your data.

Most of the time, a random split is enough. Sometimes, it is just plain wrong.

Thread: On random splits that don't work and how to fix them.
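
To make the idea concrete, here is a minimal sketch of a stratified split using only Python's standard library. It is not the code from the thread: the stratified_split helper, the 95/5 toy labels, and the 20% test fraction are all made up for illustration. The idea is to split each class separately, so the rare class shows up in both sets in roughly the same proportion.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class separately.

    Hypothetical helper for illustration, not a library function.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take at least one test sample per class so rare classes are not left out.
        # Note: a class with a single sample ends up only in the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy imbalanced dataset (made-up numbers): 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

With a plain random split of this toy dataset, the test set could easily end up with zero positives. If you use scikit-learn, you probably don't need to write this yourself: train_test_split accepts a stratify argument that does the same job.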
What are you doing with the validation data?
This one surprised many people.
Once you are ready to deploy a model, there’s something better you can do with your validation data than letting it gather dust.
Here is a walkthrough.
Santiago
Here is something you should do right before deploying your machine learning model:

1. Join your train and validation sets
2. Train the model on all of this data
3. Deploy this new version

Most people leave the validation data behind.

You shouldn't.
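
The three steps above can be sketched like this. Everything here is an assumption for illustration: the data is synthetic, and the "model" is a toy nearest-centroid classifier written in plain Python, standing in for whatever model you actually train.

```python
import random


def fit_centroids(rows):
    """Toy 'model': the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in rows:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))


def accuracy(model, rows):
    """Fraction of rows the model labels correctly."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


# Made-up 1-D data: class 0 clusters around 0.0, class 1 around 3.0.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(3.0, 1.0), 1) for _ in range(100)]
rng.shuffle(data)
train, validation = data[:150], data[150:]

# During development: fit on train, check on validation.
candidate = fit_centroids(train)
print("validation accuracy:", accuracy(candidate, validation))

# Right before deploying:
# 1. join train and validation, 2. retrain on all of it, 3. deploy this version.
final_model = fit_centroids(train + validation)
print("final centroids:", final_model)
```

One thing to keep in mind: once the validation data is part of training, you can no longer use it to score the final model, so any last check needs a separate test set that stays untouched.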
Santiago

Every week, I’ll teach you something new about machine learning.

Underfitted is for people looking for a bit less theory and a bit more practicality. There's enough mathematical complexity out there already, and you won't find any here.

Come for a journey as I navigate a space that's becoming more popular than Apollo 13, Area 51, and the lousy sequel of Star Wars combined.
