One of the most common problems we face: not enough data to build a good model.
The obvious solution is to generate synthetic (fake) data. That can work well, but it comes with tradeoffs.
In this thread, I’ll argue that fake data is a great way to augment a dataset, but most of the time, it shouldn’t make up the majority of the data. The key phrase here is “most of the time.”