4/25/2025 at 2:04:42 PM
The Open Source AI Definition (OSAID) is a slap in the face to anyone who has been part of the open source community. Allowing companies to redefine "Open" to allow closed components is a complete betrayal of everything the OSI should stand for, and it was done purely so large companies can pretend their closed models are open.
by tedivm
4/25/2025 at 2:55:41 PM
To be explicit, I believe your concern is that they are not requiring the training data and training methodology used to generate the open source model to be made accessible, so that anyone could essentially build the model themselves from raw ingredients, right? In other words, imagining for a moment that folks have access to the kind of compute necessary to do that. Right?
Nevertheless, giving people a building block that they can do what they want with certainly seems like free as in freedom to me. So I personally sympathize with the OSI approach, but in general I'm not big on the zealotry around the open source community.
It's almost like we have a third category here: free as in freedom but you can't necessarily rebuild it yourself.
In practice I would argue that intellectual talent has always been a hidden part of this anyway, and therefore it's intellectually dishonest to imply that this hasn't always been a de facto reality, even for traditional software.
by redwood
4/25/2025 at 8:00:50 PM
It's not just about reproducibility (although I do think that's important); it's about analysis of the model. With traditional software you have a pretty well defined "this code does this", but with machine learning models one of the only ways to validate that bias or propaganda hasn't been inserted during training is to examine the training data.
by tedivm
4/26/2025 at 1:08:50 AM
Code being well defined is a subjective quality and, to my knowledge, not part of the open source definition per se.
by redwood
4/25/2025 at 2:51:11 PM
Where are you seeing that? I just read the definition and it doesn't seem to allow closed components:
> An Open Source AI is an AI system made available under terms and in a way that grant the freedoms to:
> Use the system for any purpose and without having to ask for permission.
> Study how the system works and inspect its components.
> Modify the system for any purpose, including to change its output.
> Share the system for others to use with or without modifications, for any purpose.
by olalonde
4/25/2025 at 7:59:34 PM
They allow a major component of the model, the data, to be withheld.
by tedivm
4/26/2025 at 2:09:47 AM
Not only withheld, but also completely proprietary, not modifiable nor redistributable.
by pabs3
4/26/2025 at 2:12:25 AM
Nobody owns their data. They just scrape the internet, or pirate massive troves of books. Just forcing companies to get a license to all the data they use, let alone an open license, would be a massive impediment to the development of open models.
by nofriend
4/26/2025 at 2:23:51 AM
It is definitely doable to get openly licensed data; you just have to do it via voluntary participation in crowdsourced data acquisition programs. For example, the RNNoise model was retrained from such crowdsourced data.
by pabs3
4/26/2025 at 1:12:11 PM
IBM did it with their Granite models.
by tedivm
4/26/2025 at 3:16:59 PM
The data used for training Granite doesn't sound like it would be under FOSS licenses.
by pabs3