5/19/2026 at 5:55:13 PM
A lot of researchers think their job is to build models. They don't want to collect their own data, so they go find whatever dataset they can on Kaggle or from a previous paper or wherever. This is backwards. The model is the easy part. Getting good data is 99% of the job, and nearly any clown can make a good model once you hand them a good dataset.
by Legend2440
5/19/2026 at 6:05:46 PM
As a clown, I can confirm.
If you hand me a clean, well-labeled, representative dataset, I can make the model do a respectable little dance by lunch.
If you hand me a Kaggle CSV with duplicated rows, target leakage, mislabeled outcomes, and columns named final_final_v2_REAL, suddenly I’m not doing ML anymore. I’m doing archaeology with a red nose on.
The model is the balloon animal. The dataset is the elephant you had to drag into the tent.
by skvmb
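To make the "archaeology" concrete, here is a minimal sketch of the kind of checks one might run before modeling: exact duplicate rows, identical features with conflicting labels (a hint of mislabeling), and a crude target-leakage flag. The function name `audit_rows`, the column names, and the toy rows are all hypothetical, and the leakage test (one column that fully determines the label) is only a rough heuristic, not a standard method.

```python
from collections import Counter, defaultdict


def audit_rows(rows, target):
    """Rough data-quality audit. rows: list of dicts sharing the same keys."""
    if not rows:
        return {"duplicates": 0, "label_conflicts": 0, "leak_suspects": []}

    # Exact duplicates: rows with the same value in every column.
    keys = [tuple(sorted(r.items())) for r in rows]
    duplicates = sum(count - 1 for count in Counter(keys).values())

    # Conflicting labels: identical features but different target values.
    labels_by_features = defaultdict(set)
    for r in rows:
        feats = tuple(sorted((k, v) for k, v in r.items() if k != target))
        labels_by_features[feats].add(r[target])
    conflicts = sum(1 for labels in labels_by_features.values() if len(labels) > 1)

    # Crude leakage flag: a single column whose value always determines the
    # target. Columns that are unique per distinct row (IDs) are skipped,
    # since they determine the target trivially.
    unique_rows = len(set(keys))
    suspects = []
    for col in (c for c in rows[0] if c != target):
        labels_by_value = defaultdict(set)
        for r in rows:
            labels_by_value[r[col]].add(r[target])
        deterministic = all(len(v) == 1 for v in labels_by_value.values())
        if deterministic and len(labels_by_value) < unique_rows:
            suspects.append(col)

    return {
        "duplicates": duplicates,
        "label_conflicts": conflicts,
        "leak_suspects": suspects,
    }


# Made-up rows for illustration only.
rows = [
    {"age": 50, "discharge_code": "A", "outcome": 1},
    {"age": 50, "discharge_code": "A", "outcome": 1},  # exact duplicate
    {"age": 61, "discharge_code": "B", "outcome": 0},
    {"age": 61, "discharge_code": "A", "outcome": 1},
    {"age": 72, "discharge_code": "B", "outcome": 0},
]
report = audit_rows(rows, "outcome")
print(report)
```

A flagged column is only a prompt to go look: it may be a genuinely strong predictor, or it may be something recorded after the outcome was known.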
5/19/2026 at 7:19:39 PM
This holds in software as well. I encounter people trying to build solutions for problems that might not even exist, even in the context of addressing a specific bug. The act of measuring and collecting data is hard work, pretty boring sometimes, and often prescriptive in ways that aren't appealing. It's like we'd rather guess and use the ambiguity to allow ourselves to explore solutions we're more interested in. The alternative is manually profiling and poring through logs, so, I kind of get it.
by steve_adams_86
5/19/2026 at 6:09:39 PM
For a lot of clinical decision support use cases you don't even need fancy AI models to get accurate results. If you have good quality cleansed data you can literally just import it into Excel and run a simple linear regression analysis. But unfortunately that won't get you a reputation as an "AI thought leader".
by nradov
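For anyone who has not run one, a simple linear regression is just the least-squares line through the points, the same slope and intercept that Excel's SLOPE and INTERCEPT functions return. A minimal sketch in plain Python, assuming one predictor and made-up numbers (the function name `fit_line` and the data are hypothetical):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up, perfectly linear data for illustration: y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The fit takes a dozen lines; everything that decides whether it is useful happens before `xs` and `ys` exist.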
5/19/2026 at 7:15:26 PM
Actually a simple flow-chart works for a large number of use cases. That said, there are a lot of use cases where we don't have a simple way to run a linear regression model to get reasonable results where "AI" does seem to work well.
by kenjackson
5/19/2026 at 6:49:12 PM
You just need to figure out a way to brand that as a new, resource-conserving AI model.
by QuercusMax
5/19/2026 at 6:55:34 PM
I think it needs a cool name.
by actionfromafar
5/19/2026 at 7:13:41 PM
We'll call it SSLRM. Spread Sheet Linear Regression Modeling, pronounced SLURM. Sounds fancy and business friendly.
by QuercusMax
5/21/2026 at 4:48:19 AM
I'm shopping for dishwashers and many are AI-branded because they have a dirty water sensor in them.
by s1artibartfast
5/20/2026 at 12:37:04 AM
So true and it's been like that for ages. It's why I called these people rogue data scientists five years ago:
by i7l