2/25/2026 at 3:33:11 PM
"safe" is such a subjective concept to begin with. Have any of the model providers ever defined what they mean by "safe"? It doesn't mean much to me if a safe model is one that does not output the recipe for mustard gas; that information is trivially available elsewhere.
Or, is a safe model one that doesn't come off as racist? OK, but I would classify that as inoffensive rather than safe, though I admit definitions of words can be fluid and change.
Is a safe model one that refuses to produce code for a weapons system? Well... does a PID controller count? I can use that to keep a gun pointed at a target, or I can use it to prevent a baby rocker from falling over.
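(To make the dual-use point concrete, here is a bare-bones PID step. The controller code is identical whichever actuator it drives; the gains and names below are illustrative, not from any real system.)

```python
class PID:
    """Textbook proportional-integral-derivative controller."""

    def __init__(self, kp, ki, kd, setpoint=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def step(self, measurement, dt):
        # Same math whether `measurement` is a turret bearing
        # or a baby rocker's tilt angle.
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = PID(kp=1.0, ki=0.1, kd=0.05, setpoint=0.0)
correction = pid.step(measurement=0.2, dt=0.01)
```

Nothing in the code reveals the application; "safety" lives entirely in what `correction` is wired to.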
Maybe they're giving up on "safe" because there's no definitive way to know whether a model is safe or not. I've always held the opinion that AI safety was more about brand safety. Maybe now the model providers can afford some bad press without it being the death of their company.
by chasd00
2/25/2026 at 3:57:01 PM
My preferred version of "safe" is "in its actions, considers and mostly upholds usually unstated constraints like 'don't kill unless necessary', 'keep Earth inhabitable', 'avoid toppling society unless really well justified for the greater good', etc." That's the kind of framing that was prevalent pre-ChatGPT. Not terribly relevant for chat software, but increasingly important as chat models turn into agents.

Of course, once you have that framing, additional goals like "don't give people psychosis", "don't give step-by-step instructions for making explosives, even if Wikipedia already tells you how" or "don't harm our company's reputation by being racist" are conceptually similar.
On the other hand, "don't make weapon systems" or "never harm anyone" might not be viable goals, not only because they are difficult or impossible to define, but also because there is huge financial and political pressure not to limit your AI in that way (see Anthropic).
by wongarsu
2/25/2026 at 4:07:14 PM
> I can use that to keep a gun pointed at a target or i can use that to prevent a baby rocker from falling over.

This leads to what I'm going to call the "Ender's Game" approach: if your AI is uncooperative, just present it with a simulation that it does like but which maps onto real-world control that it objects to.
> I've always held the opinion that ai safety was more about brand safety
Yes. The social media era made that very important. The extent to which brand safety is linked to actual, physical safety then becomes a question of how well you can manage the publicity around disasters. And they're doing a pretty good job of denying responsibility.
by pjc50
2/25/2026 at 3:45:24 PM
What if I tell the model to go commit fraud or crimes and it complies? What if users are having psychotic episodes driven by their interactions with the model?

Just because safety is a hard and messy problem doesn't mean we should just wash our hands of it.
by LordHumungous
2/25/2026 at 3:50:30 PM
It is a hard and messy problem, and it doesn't help when people muddy the water further by stirring things like "Don't commit fraud," "Don't infringe on Disney's trademark," and "Don't be racist" into the mix and trying to lump them all under the "Safety" umbrella.

Maybe this is an outdated definition, but I've always thought of safety as being about preventing injury: things like safety glasses and hardhats on a work site, warnings about slippery floors, and so on. I think people are trying to stretch the word to mean a great many more things in the context of AI, which doesn't help when it comes to focusing on it.
I think we need a different, clearer word for "The AI output shouldn't contain certain unauthorized things."
by ryandrake
2/25/2026 at 3:50:55 PM
The messier a problem is, the less it should be decoupled and siloed into its own team. Instead of making actual improvements on the subject (you name it: safety, security, etc.), it becomes a checkbox exercise, and the metrics and bureaucracies become increasingly decoupled from the truth.
by Aperocky
2/25/2026 at 3:56:49 PM
But think about how much money there is to be made by just ignoring it all!
by miltonlost
2/25/2026 at 3:59:38 PM
> Is a safe model one that refuses to produce code for a weapons system? Well.. does a PID controller count? I can use that to keep a gun pointed at a target or i can use that to prevent a baby rocker from falling over.

I've been using LLMs for some cyber-y tasks and this is exactly how it ends up going. You can't ask "hack this IP" (with some models), but for more discrete tasks it'll have no such qualms.
by some_random
2/25/2026 at 3:50:14 PM
Those are some really interesting questions. To me, giving a mustard gas recipe to someone with no intent to use it is unlikely to be dangerous. On the other hand, some particularly inflammatory racial propaganda in an area with simmering ethnic tensions is very likely to be dangerous.

But give that same recipe to a wannabe terrorist and suddenly it is dangerous. Context matters, not just the information.
by bluecheese452
2/25/2026 at 4:03:14 PM
I think the problem of chatbot "safety" mirrors that of autonomous vehicle safety. For an AV, the correct course of action is one that avoids hitting stuff (including people and vehicles) and, critically, minimizes liability.
by 0_____0
2/25/2026 at 3:52:11 PM
Well, I do think there's some degree of unsafeness that is inexorably linked to capability: if the model, when deployed with full control of a machine, is capable of large-scale cyberattacks and blackmail, for example.
by Davidzheng
2/25/2026 at 3:38:57 PM
> because there's no definitive way to know if a model is safe or not.

The only answer is that there's no money in it being safe. It is not an epistemic problem.
by justonceokay