
Movies don't always lie

By Santiago • Issue #4 • View online
It’s finally April!
Spring is always a great time to be generous with your friends and co-workers and share this newsletter with them. It’s not going to be awkward, I promise. Just tell them that this is the best thing since sliced bread 🍞, and they will thank you for it.
And of course, I’m really grateful that, for the first time, this newsletter is going out to more than 2,000 people! Thank you for the support, and let’s do this!

Movies don't always lie
Movies make up a lot of shit all the time, so I hesitated to use them as a good example here, but I couldn’t find a better introduction, so let’s go ahead with this one.
Try to remember one of those scenes where Mr. Detective feeds a computer a partial photo of a nobody, captured by a CCTV camera, trying to find their identity. The computer goes, picture by picture, through its database until it finds a match.
Is this even possible? How can they compare pictures like that? 🤯
There's no way you can get a good picture from this thing.
Let’s try to come up with a way to build this thing.
Alright, we know we can’t simply compare pixels from two different images to determine whether they show the same person. If you aren’t convinced about this, take some time and think it through. Pretty complex stuff, right?
Since pixels don’t work, we could convert images into a different format that’s easier to compare. If we do that, we could solve this problem in two steps:
  1. Turn the suspect’s photo into a new representation that follows the new format that we came up with.
  2. Compare that with a database of mugshots stored in the same format.
Simple, right? 😎
Making up a good representation.
Let’s focus for a minute on a format that allows us to compare two images. How can we do this?
Here is a simple way to think about it: we could create a list of every person’s features. For example, we could have features like these:
  • Eye color
  • Hair color
  • Hair length
  • Beard color
  • Nose size
We can then assign a numeric value to each one of these features. For example, if the person in the picture has black eyes, we will assign the value 0 to the first feature, while blue eyes will correspond to the value 1, and so on.
If we take every photo in our database and do this, finding any person becomes a matter of comparing lists of features and returning the most similar one. That’s a problem we can easily solve. As long as our list of features is long and descriptive enough, we should have a shot at identifying pictures that belong to similar individuals!
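Here is a toy sketch of that idea in Python. The feature lists and suspect names are made up for illustration; the point is only that once faces become lists of numbers, "find the match" becomes "find the closest list":

```python
def distance(a, b):
    """Count how many features differ between two feature lists."""
    return sum(1 for x, y in zip(a, b) if x != y)

# [eye color, hair color, hair length, beard color, nose size]
# Each value is a made-up numeric code, like 0 = black eyes, 1 = blue.
mugshots = {
    "suspect_042": [0, 1, 2, 0, 1],
    "suspect_117": [1, 0, 0, 3, 2],
    "suspect_305": [0, 1, 1, 0, 1],
}

# Features extracted from the CCTV photo.
photo = [0, 1, 2, 0, 2]

# The match is simply the mugshot with the fewest differing features.
best_match = min(mugshots, key=lambda name: distance(mugshots[name], photo))
print(best_match)  # prints "suspect_042"
```

With only five features, different people will collide all the time; that's why the list needs to be long and descriptive enough, as mentioned above.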
There are only two small problems left.
Assuming we can turn pixels into a list of features, we can open the champagne 🍾 and move to a more interesting problem, but we still have to answer a couple of questions:
  1. What is a good list of features?
  2. How in the world can we do the conversion?
Remember that our features need to be descriptive enough that we can quickly identify a match. Simultaneously, the more features we add, the harder it will be to do the conversion, store the data, and compare the lists.
Which brings me to the second question: how can we do the conversion in the first place? Even if we put the entire police department 👮‍♀️ to work day and night, it would take many, many movies’ worth of time to convert their database of pictures into lists of features.
Machine learning to the rescue, of course.
You probably saw this coming, didn’t you? We can train a neural network so it learns to turn images into a list of features—called a “feature vector.” Of course, we can’t predict which features it will learn, and we may not even be able to interpret them, but they will do the job for us.
Remember our discussion about autoencoders? Some of the same intuition applies here (although the solution to this problem is not an autoencoder). Using enough pictures of faces, we can have a neural network learn what’s similar and different about them and produce the features we need.
The rest is simple: using that network, we can pre-compute the feature vector for every mugshot in the database. When we are ready to look up a photo, we get its feature vector and compare it with every stored vector, returning the most similar at the end. Mr. Detective will then take a look, say his lines, and start making phone calls.
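The lookup step can be sketched in a few lines. Assume the trained network is wrapped in some `embed(image)` function that returns a feature vector; here we fake the database with random vectors just to show the search logic (cosine similarity is one common way to compare feature vectors, though not the only one):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-computed feature vectors for every mugshot in the database.
# In real life these would come from embed(mugshot_image).
database = {f"mugshot_{i}": rng.normal(size=128) for i in range(1000)}

def cosine_similarity(a, b):
    """Similarity between two vectors: 1.0 means identical direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def lookup(query_vector, database):
    """Return the name of the most similar stored vector."""
    return max(database, key=lambda k: cosine_similarity(database[k], query_vector))

# Pretend this came from embed(cctv_photo): a stored vector plus a
# little noise, so there is a clear best match to find.
query = database["mugshot_42"] + rng.normal(scale=0.1, size=128)
print(lookup(query, database))  # prints "mugshot_42"
```

Scanning every vector like this works fine for a few thousand mugshots; at a larger scale, you'd reach for an approximate nearest-neighbor index instead of a linear scan.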
Easy, right?
Please, don’t answer that.
Making money writing articles
A long time ago, I had a blog. I kept it up for more than 10 years! Then I stopped writing.
I decided to start again, and this time I’m using Medium. So far, I’ve made $5.70 in two months! Yeah, not real money yet, but hey! You’ve gotta start somewhere!
If you are into Medium, go and give me a follow. And if you aren’t but still want some help starting with machine learning, here is a sweet 30% off to get you off the ground.
This was a long one. I’ll see you next week!
Santiago

Every week, one story that tries really hard not to be boring and teach you something new about machine learning.

Underfitted is for people looking for a bit less theory and a bit more practicality. There's enough mathematical complexity out there already, and you won't find any here.

Come for a journey as I navigate a space that's becoming more popular than Apollo 13, Area 51, and the lousy sequel of Star Wars combined.
