Get notified

By Santiago

I'll send you an email whenever I publish something important. I promise you don't want to miss this.








Expanding your horizon

Probably the most used term in practical machine learning circles is "pipeline." Unfortunately, we use it to refer to different things at different times. Here is an overview of some of the different machine learning pipelines that you will find out there.


Start learning today

I started looking seriously into machine learning around the spring of 2015. Look around. Since that day, machine learning has turned the industry upside down. Here are some thoughts, some numbers, and some of the reasons you want to start with machine learning…


Training with all your data

Every day, I get many questions, most of them centered around building a machine learning career. I thought it was a good idea to collect some of the answers in a single place.


Don't look

Why would you? Test data should mimic production data. You don't have access to the latter, so why would you treat the former differently?


Do you need feature engineering?

Neural networks are powerful. Deep networks, even more. Wouldn't these networks solve the problem of feature engineering? Aren't they capable of doing this for us?


The first neural network

1958. This is a story about the first neural network and the father of modern deep learning. It's when it all started and suddenly stopped. And it's short.


Zero probability

Next Friday, I'll host a Twitter Space with 50 attendees. Tickets are on sale now: $2.99. You'll get to listen to me ramble about how you can build a career in Machine Learning, and you will be able to ask questions. If you want to participate, click on the tweet be…


Probabilities and the Frenchman gambler

The best way to fully understand a topic is to start from the very basics. I decided to write a summary of the fundamental building blocks of probability, Twitter-style. This is a great way to start.


Kaggle is now over!

Do you know how and when to use cross-validation? Here is a quick introduction to it. Two minutes of your time, and I hope this is clearer than it has ever been.
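The full thread isn't reproduced here, but the core idea of k-fold cross-validation can be sketched in plain Python. This is a minimal illustration, not the author's code: the helper names (`k_fold_indices`, `cross_validate`) and the toy mean-predicting "model" are hypothetical, chosen only to keep the example dependency-free. Every sample lands in exactly one held-out fold, so every sample is used for evaluation once and for training k-1 times.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k folds of nearly equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first n_samples % k folds get one extra sample each.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    return folds

def cross_validate(xs, ys, k, fit, score, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat k times, and average."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for i, held_out in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        scores.append(score(model, [xs[j] for j in held_out], [ys[j] for j in held_out]))
    return mean(scores), scores

# Toy usage (hypothetical): the "model" ignores x and predicts the mean training target.
def fit_mean(train_x, train_y):
    return mean(train_y)

def score_mae(model, test_x, test_y):
    return mean(abs(y - model) for y in test_y)

xs = list(range(10))
ys = [2.0 * x for x in xs]
avg_mae, fold_maes = cross_validate(xs, ys, k=5, fit=fit_mean, score=score_mae)
print(len(fold_maes), avg_mae)
```

The averaged score is a less noisy estimate than a single train/test split, at the cost of training k models.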


Hello Kaggle!

Sometimes you can't just split your dataset randomly. Well, I take that back: you could, but the results won't be good. Stratified sampling to the rescue: whenever you have a severely underrepresented class, you should look here. Here is an explanation, including s…
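A minimal sketch of a stratified train/test split, assuming the labels fit in a plain Python list. `stratified_split` is a hypothetical helper, not a library function: it shuffles each class separately and sends the same fraction of every class to the test set, so a rare class doesn't vanish from one side of the split.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly its share in both splits."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Take at least one test sample per class; a class with a single sample stays in train.
        n_test = max(1, round(len(idxs) * test_fraction)) if len(idxs) > 1 else 0
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = ["cat"] * 95 + ["dog"] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
test_labels = [labels[i] for i in test_idx]
print(test_labels.count("cat"), test_labels.count("dog"))  # 19 1
```

A purely random 20-sample test set drawn from this data could easily contain zero dogs; the per-class split guarantees the 95/5 ratio is preserved.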


More splits

Last week, I talked about data leaks. Some of you replied with many different questions. Here is a follow-up thread that expands on the one I started before. You can start here if you want to understand how easy it is to leak data and what to do about it.
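One common way to leak data is to compute preprocessing statistics, such as the mean and standard deviation used for scaling, on the full dataset before splitting it. The sketch below is an assumed illustration of that pitfall, not a reproduction of the thread; `fit_scaler` and `transform` are hypothetical helpers written in plain Python.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Learn the mean and (population) standard deviation from the given values."""
    mu = mean(values)
    sigma = pstdev(values)
    # Guard against constant features, which would otherwise divide by zero.
    return mu, (sigma if sigma > 0 else 1.0)

def transform(values, scaler):
    """Standardize values with previously learned statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]

data = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = data[:4], data[4:]

# Leaky: the test point (100.0) influences the statistics applied to the training data.
leaky_scaler = fit_scaler(data)

# Correct: split first, fit on the training data only, then reuse those
# statistics for the test data, as you would for unseen production data.
scaler = fit_scaler(train)
train_scaled = transform(train, scaler)
test_scaled = transform(test, scaler)
print(leaky_scaler[0], scaler[0])  # 22.0 2.5
```

With the leaky version, the model is trained on features that already "know" about the test outlier, so the evaluation looks better than what production data will deliver.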


Leaking data

Honestly, I wasn't surprised by the responses on this thread. I've seen many people make this same mistake without realizing what's going on. Are you doing it correctly?


Machine Learning in Production

This is a cool example. It tries to solve the MNIST digit-classification problem using contrastive learning: a different and interesting way of solving a very popular problem. You'll find the code below. I used Deepnote. I've been very impressed with it, and I'…


Don't fly solo

On every new project, I get this question. Estimating how much data is necessary is not simple, especially when working in a new domain that you haven't experienced before. Here is an explanation of why this is hard and an alternative option for you.


I'm back

How do you compute the accuracy of a regression model? I remember asking this question myself when I started. I've also heard of interviewers trying to trick people with it. Here is an explanation of how to think about this.
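The issue's full answer isn't reproduced here. One common framing is that "accuracy" (the fraction of exact matches) is nearly meaningless for continuous targets, so regression models are usually scored with error metrics instead. The sketch below, assuming toy values invented for illustration, compares exact-match accuracy with mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²); the function names are hypothetical plain-Python helpers.

```python
from math import sqrt

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]

# Exact-match "accuracy": only one of four predictions hits the target exactly,
# even though every prediction is within 1.0 of it.
exact_match = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(exact_match)                     # 0.25
print(mae(y_true, y_pred))             # 0.5
print(round(rmse(y_true, y_pred), 3))  # 0.612
print(round(r2(y_true, y_pred), 3))    # 0.925
```

The 25% "accuracy" suggests a bad model, while the error metrics show predictions that are off by half a unit on average.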


Choosing is always hard

In a Kaggle competition, all we usually care about is how good the results of our models are. This is not enough in real-life situations. I put together a list of 7 different things to keep in mind when choosing the appropriate machine learning model for our …


Imbalanced datasets

(I love the Dog versus Cat problem. I keep bringing it up, and I'm not sure why.) Imagine you are trying to build a classification model, and you have two classes: Cats and Dogs. Unfortunately, there are 950 cat pictures and 50 dog pictures. This is a problem, …
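Using the numbers from this example, a short sketch shows why plain accuracy misleads on imbalanced data, plus one common mitigation: weighting each class inversely to its frequency. The formula `n_samples / (n_classes * class_count)` is the same heuristic scikit-learn uses for `class_weight="balanced"`; the helper below is a hypothetical plain-Python version, and the issue itself may recommend other techniques.

```python
from collections import Counter

labels = ["cat"] * 950 + ["dog"] * 50

# A "model" that ignores its input and always predicts the majority class.
predictions = ["cat"] * len(labels)
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
# Fraction of actual dogs that the model labels as dogs.
dog_recall = sum(p == t == "dog" for p, t in zip(predictions, labels)) / labels.count("dog")

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}

weights = balanced_class_weights(labels)
print(accuracy, dog_recall)  # 0.95 0.0
print(weights)
```

The useless model scores 95% accuracy while finding zero dogs. With the weights, each dog counts 10.0 and each cat about 0.53 in the loss, so both classes contribute the same total weight (500) during training.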


Going to New York!

I'm obsessed with helping people get started. If you are a software developer, this will give you a step-by-step guide on how to take your first few steps.


It's my birthday!

I've been talking about Machine Learning for quite a while, but I don't think I've ever tried to explain what it is to somebody who's just getting started. This is a quick introduction. If you are a developer and aren't sure what the fuss is about, this threa…



Ensembles are one of the most powerful ideas that you can use in machine learning. This thread will walk you through how and why ensembles work and finish with a real-life example and a few tips you want to keep in mind.
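A minimal sketch of one kind of ensemble, hard majority voting, in plain Python. The helper names are hypothetical, and `majority_accuracy` assumes binary labels, an odd number of models, and fully independent errors; real models tend to make correlated mistakes, so actual gains are usually smaller than this idealized calculation suggests.

```python
from collections import Counter
from math import comb

def majority_vote(predictions):
    """Return the most common label among the models' predictions for one sample."""
    return Counter(predictions).most_common(1)[0][0]

def majority_accuracy(p, n_models):
    """Probability that a strict majority of n independent binary models,
    each correct with probability p, is correct."""
    need = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(need, n_models + 1)
    )

print(majority_vote(["dog", "cat", "dog"]))  # dog
print(round(majority_accuracy(0.7, 3), 3))   # 0.784
print(round(majority_accuracy(0.7, 11), 3))
```

Three independent 70%-accurate models vote their way to about 78.4%, and adding more models pushes the figure higher, which is the intuition behind why ensembles work.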