Leaking data

By Santiago • Issue #23
Earlier this week, I started writing about the small things we do without thinking much about them.
We have heard so much about “the way to do things” that sometimes we forget why we are doing them in the first place.
This led me to write about leaking data. Reading the replies, I wrote a couple more threads elaborating on how we create and use the different splits of our dataset. (I posted one of these threads earlier today, and I’ll post the other one next Tuesday.)
I want to leave you with a thought:
There’s usually a lot of thought behind what we consider “best practices.” But circumstances change, and what worked yesterday might not work today. Understanding why you are doing something is crucial to know when it doesn’t make sense anymore.
Stay awesome!

The best picture I could find about a leak. (It sucks, I know.)
You might be leaking data...
Honestly, I wasn’t surprised by the responses to this thread.
I’ve seen many people make this same mistake without realizing what’s going on.
Are you doing it correctly?
Santiago
Can you identify the problem with this 3-step approach?

1. Prepare a dataset
2. Split it (train, validation, and test sets)
3. Build a model

The issue is subtle, and unfortunately, many people build machine learning models this way.

A thread to talk about this: ↓
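The subtle problem in those three steps is usually data leakage: if any preprocessing (scaling, imputation, feature selection) is fitted on the full dataset before the split, information from the test rows leaks into training. Here is a minimal sketch of the fix — split first, then fit statistics on the training portion only. The values and split sizes are made up for illustration:

```python
import random
import statistics

random.seed(0)
# Hypothetical 1-D feature column; values and sizes are invented for illustration.
values = [random.gauss(5.0, 2.0) for _ in range(100)]

# Leaky: normalization statistics computed on the FULL dataset before splitting.
leaky_mean = statistics.mean(values)

# Correct: split first, then fit the statistics on the training portion only.
train, test = values[:80], values[80:]
train_mean = statistics.mean(train)
train_std = statistics.stdev(train)

# The test set is transformed with statistics learned from training data only,
# so nothing about the held-out rows influences preprocessing.
test_scaled = [(x - train_mean) / train_std for x in test]
```

The same principle applies to any fitted transformation, not just scaling: fit on the training set, then apply (without refitting) to validation and test.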
When your validation looks weird
When training a model, it’s common to see your training loss fall below your validation loss.
This makes sense: the model uses the training data to learn, so it tends to overfit it.
But sometimes, your model will do better on your validation set. This is unexpected but not necessarily a problem.
Here I cover why this might be happening and how to fix it when it’s a problem.
Santiago
I built a machine learning model, and my validation loss is lower than my training loss.

People asked me why. We're used to seeing the opposite, so this is definitely suspicious.

Is this really a problem? ↓ https://t.co/qQTqRfnyJN
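One common, benign explanation for this pattern is simply how frameworks report the numbers: training loss is usually the average over batches seen during the epoch, while the weights are still improving, whereas validation loss is computed after the epoch with the better end-of-epoch weights. A toy arithmetic sketch (the loss values are invented):

```python
# Hypothetical per-batch training losses, recorded while the weights improve:
batch_losses = [1.0, 0.8, 0.6, 0.4]

# Most frameworks report the running average across the epoch...
reported_train_loss = sum(batch_losses) / len(batch_losses)

# ...but validation runs AFTER the epoch, with the final (best) weights,
# so a lower validation loss can be perfectly normal:
val_loss = 0.45
```

Regularization that is active only at training time (dropout, for example) can produce the same effect, since the validation pass runs without it.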
How much math do you need?
I’ve talked extensively about this before.
But this time, we can listen to Andrew Ng’s opinion. He dedicated the editorial of this week’s The Batch to this topic.
Here is a summary of the ideas Andrew presented in his post.
Santiago
How much math do you need to know to be a machine learning engineer?

@AndrewYNg tackled this question in the latest issue of "The Batch," @DeepLearningAI_'s newsletter.

Let's talk about how he answers this question.
Did you enjoy this issue?
Santiago

I'll send you an email whenever I publish something important. I promise you don't want to miss this.
