
Hello Kaggle!

By Santiago • Issue #25 • View online
For the first time, I started working on a Kaggle competition.
Holy moly! It’s very different from anything I’ve done before!
And it’s addictive if you ask me. I’ve managed to accumulate a few discussion medals in a week, and my solution is currently within the gold bracket.
The competition ends in less than a week. I’ll report back with the final results.
If you haven’t tried Kaggle, I strongly suggest you take a look at it. It feels like I’ve learned more in a week than in the past 3 months combined.
I call that a win.

This is how I imagine people competing in Kaggle.
Stratified sampling
Sometimes you can’t just split your dataset randomly.
Well, I take that back: you could, but the results won’t be good.
Stratified sampling to the rescue: whenever one of your classes is severely underrepresented, this is the technique to reach for.
Here is an explanation, including some code.
The first step of building a machine learning model: Splitting your data.

Most of the time, a random split is enough. Sometimes, it is just plain wrong.

Thread: On random splits that don't work and how to fix them.
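As a minimal sketch of the idea, here is a stratified split using scikit-learn's `train_test_split`. The dataset and labels (`X`, `y`) are made up for illustration: 100 samples where only 10% belong to the rare class.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 10 positives out of 100 samples.
X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 10 + [0] * 90)

# stratify=y preserves the 10%/90% class ratio in both splits.
# A plain random split could easily put most (or none) of the
# rare class in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(y_train.mean(), y_test.mean())  # both keep the 10% share
```

Without `stratify=y`, nothing guarantees the rare class even appears in your test set, which makes any evaluation on it meaningless.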
What are you doing with the validation data?
This one surprised many people.
Once you are ready to deploy a model, there’s something better you can do with your validation data than letting it collect dust.
Here is a walkthrough.
Here is something you should do right before deploying your machine learning model:

1. Join your train and validation sets
2. Train the model on all of this data
3. Deploy this new version

Most people leave the validation data behind.

You shouldn't.
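The three steps above can be sketched like this. Everything here is illustrative: a synthetic dataset, an earlier train/validation split, and a logistic regression standing in for whatever model you validated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical dataset and an earlier train/validation split.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 1. Join your train and validation sets.
X_full = np.concatenate([X_train, X_val])
y_full = np.concatenate([y_train, y_val])

# 2. Train the model on all of this data, using the
#    hyperparameters you already selected during validation.
model = LogisticRegression().fit(X_full, y_full)

# 3. This retrained model is the version you deploy.
```

The validation set already did its job (model selection), so folding it back in simply gives the final model more data to learn from.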
Did you enjoy this issue?

I'll send you an email whenever I publish something important. I promise you don't want to miss this.
