More splits

By Santiago • Issue #24
Today it’s all about splitting data. Again.
Such a simple step, yet misunderstood by many.
Every problem starts here, or at least right after you collect your data. The more I write about this, the more people reach out with questions about the process.
It’s hard to get anywhere if you don’t get the basics right.
Let’s talk about it.

Splitting in half is usually not a good idea. Or maybe it is. It depends...
More about leaking data
Last week, I talked about data leaks.
Several readers replied with questions.
Here is a follow-up thread that expands on the one I started before. You can start here if you want to understand how easy it is to leak data and what to do about it.
Always split your dataset before transforming the data.

I posted a thread earlier this week. A few people replied with a valid concern:

"How do you know the true range of a column without looking at all of your data?"

Good question. Let's talk about this: ↓
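
To make the rule concrete, here is a minimal sketch of "split first, transform second" for a min-max scaler. It is not taken from the thread: the helper names (split, fit_min_max, apply_min_max) and the toy column are made up for illustration, and it uses only the standard library. The range comes from the training set alone, and that same range is then reused on the test set.

```python
import random


def split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_min_max(values):
    """Learn the scaling range from the given values only."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values with a previously learned range."""
    span = (hi - lo) or 1.0  # avoid dividing by zero on a constant column
    return [(v - lo) / span for v in values]


# Toy column of 100 values (made-up data, for illustration only).
column = [float(v) for v in range(100)]

# 1. Split first.
train, test = split(column)

# 2. Learn the range from the training set only.
lo, hi = fit_min_max(train)

# 3. Reuse that same range on both sets.
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)
```

With this setup, a test value can land slightly below 0 or above 1 if it falls outside the range seen during training. That is expected: the model will face the same situation with new data, so the test set should reflect it.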
The most important thing about splitting
Do you know why you should split your data?
It turns out this is not an obvious question, even for people who have worked with machine learning models for a long time.
Let’s try to fix that with this thread.
Surprisingly, many people don't understand why they split the data into different sets to build a machine learning model.

They know what to do but don't know why, when, or how.

Thread: On the most important thing you should know about splitting your data.
Every week, I’ll teach you something new about machine learning.

Underfitted is for people looking for a bit less theory and a bit more practicality. There's enough mathematical complexity out there already, and you won't find any here.

Come for a journey as I navigate a space that's becoming more popular than Apollo 13, Area 51, and the lousy sequel of Star Wars combined.
