2/19/2026 at 1:11:01 PM
Talking with Gemini in Arabic is a strange experience; it cites the Quran, says alhamdulillah and inshallah, and at one point it even told me: this is what our religion tells us we should do. It sounds like an educated, religious, Arabic-speaking internet forum user from 2004. I wonder if this has to do with the quality of the Arabic content it was trained on, and can't help but think whether AI can push to radicalize susceptible individuals.
by jarenmf
2/19/2026 at 2:57:08 PM
Based on the code that it's good at, and the code that it's terrible at, you are exactly right about LLMs being shaped by their training material. If this is a fundamental limitation, I really don't see general-purpose LLMs progressing beyond their current status as idiot savants. They are confident in the face of not knowing what they don't know.
Your experience with Arabic in particular makes me think there's still a lot of training material to be mined in languages other than English. I suspect the reason Arabic sounds like it's from 20 years ago is that there's a data-labeling bottleneck in using foreign-language material.
by Zigurd
2/19/2026 at 4:10:21 PM
I've had a suspicion for a bit that, since a large portion of the Internet is English and Chinese, any other language would have a much larger ratio of its training material coming from books.
I wouldn't be surprised if Arabic in particular had this issue, and if Arabic also had a disproportionate amount of religious text as source material.
I bet you'd see something similar with Hebrew.
by parineum
2/19/2026 at 7:55:53 PM
I think therein lies another fun benchmark to show that LLMs don't generalize: ask the LLM to solve the same logic riddle, only in different languages. If it can solve it in some languages, but not in others, it's a strong argument for straightforward memorization and next-token prediction vs. true generalization capabilities.
by mentalgear
2/19/2026 at 5:56:57 PM
> whether AI can push to radicalize susceptible individuals
My guess is, not as the single and most prominent factor. Pauperization, isolation of individuals, and a blatant lack of uniform access to justice, health services, and the other basics of a social safety net are far more likely to weigh significantly. Of course, any tool that can help with mass propaganda may well make it easier to reach people in weakened situations, who are more receptive to radicalization.
by psychoslave
2/19/2026 at 6:00:24 PM
There have actually been fascinating discoveries on this. After the mid-2010s ISIS attacks driven by social-media radicalization in Western countries, the big social platforms (Meta, Google, etc.) agreed to censor extremist Islamist content: anything that promoted hate, violence, etc. By all accounts it worked very well, and homegrown terrorism plummeted. Access and platforms can really help promote radicalism and violence if not checked.
by cm2012
2/19/2026 at 6:05:21 PM
Interesting! Do you have any good links about this?
by skybrian
2/19/2026 at 6:52:14 PM
I don't really find this surprising! If we can expect social networking to allow groups of like-minded individuals to find each other and collaborate on hobbies, businesses, and other benign shared interests, it stands to reason that the same would apply to violent and other anti-state interests as well.
The question that then follows is: if suppressing that content worked so well, how much (and what kind of) other content was suppressed for being counter to the interests of the investors and administrators of these social networks?
by devmor
2/19/2026 at 9:15:41 PM
Hasn't this already been observed with not-too-stable individuals? I remember some story about a kid asking an AI whether his parents, the government, etc. were spying on him.
by Nicook
2/19/2026 at 1:50:14 PM
Maybe it's just a prank played on white expats here in the UAE, but don't all Arabic speakers say inshallah all the time?
by wodenokoto
2/19/2026 at 3:29:47 PM
English speakers frequently say "Jesus!" or "thank God" - it would be weird for an LLM.
by someotherperson
2/19/2026 at 3:56:17 PM
Would be weird in an email, but not objectionable. The problem is the bias for one religion over the others.
by axus
2/19/2026 at 1:20:22 PM
Wow, I would never expect that. Do all models behave like this, or is it just Gemini? One particular model of Gemini?
by amunozo
2/19/2026 at 1:23:49 PM
Gemini in particular is really odd (even with reasoning). ChatGPT still uses similar religion-influenced language, but it's not as weird.
by jarenmf
2/19/2026 at 1:52:26 PM
We were messing around at work last week building an AI agent that was supposed to respond only with JSON data. GPT and Sonnet gave us more or less what we wanted, but Gemma insisted on giving us a Python code snippet.
by gwerbin
2/19/2026 at 2:13:26 PM
> that was supposed to only respond with JSON data.
You need to constrain token sampling with grammars if you actually want to do this.
by otabdeveloper4
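(A minimal sketch of the idea, for the curious: hosted APIs expose this kind of constraint as a JSON/structured-output mode, while local runtimes such as llama.cpp accept an explicit GBNF grammar. The example below assumes the OpenAI Python client's JSON mode; the model name and prompt are placeholders, not anything from this thread.)

    # Sketch: ask the server to constrain decoding so only valid JSON tokens are emitted.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        response_format={"type": "json_object"},  # server-side constraint to valid JSON
        messages=[
            {"role": "system", "content": "Reply only with a JSON object."},
            {"role": "user", "content": "List three colors as {\"colors\": [...]}"},
        ],
    )
    print(resp.choices[0].message.content)  # e.g. {"colors": ["red", "green", "blue"]}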
2/19/2026 at 2:17:14 PM
That reduces the quality of the response, though.
by written-beyond
2/19/2026 at 2:45:29 PM
As opposed to emitting non-JSON tokens and having to throw away the answer?
by debugnik
2/19/2026 at 3:31:38 PM
Don't shoot the messenger.
by written-beyond
2/19/2026 at 3:21:05 PM
Or just run json.dumps on the correct answer in the wrong format.
by jgalt212
2/19/2026 at 4:09:34 PM
THIS IS LIES: https://blog.dottxt.ai/say-what-you-mean.html
I will die on this hill, and I have a bunch of other arXiv links from better peer-reviewed sources than yours to back my claim up (i.e. NeurIPS-caliber papers with more citations than the ones claiming it harms the outputs).
Any actual impact of structured/constrained generation on the outputs is a SAMPLER problem, and you can fix what little impact may exist with things like https://arxiv.org/abs/2410.01103
Decoding is intentionally nerfed/kept to top_k/top_p by model providers because of a conspiracy against high temperature sampling: https://gist.github.com/Hellisotherpeople/71ba712f9f899adcb0...
by Der_Einzige
2/19/2026 at 7:33:49 PM
I would honestly like to hope people would be more up in arms over this, but... based on historical human tendencies, convenience will win here.
by iugtmkbdfil834
2/19/2026 at 8:14:03 PM
I use LLMs for Actual Work (boring shit).
I always set temperature to literally zero and don't sample.
by otabdeveloper4
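(A minimal sketch of that setting, assuming the Hugging Face transformers library; the model and prompt are placeholders. Greedy decoding via do_sample=False is the local equivalent of "temperature zero, don't sample".)

    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # placeholder model
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    inputs = tok("Summarize the invoice totals:", return_tensors="pt")
    # do_sample=False means greedy decoding: the highest-probability token is
    # taken at every step, so temperature/top_p never come into play.
    out = model.generate(**inputs, do_sample=False, max_new_tokens=50)
    print(tok.decode(out[0], skip_special_tokens=True))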
2/19/2026 at 3:36:01 PM
Gemma ≠ Gemini
by cubefox
2/19/2026 at 1:31:37 PM
Gemini loves to assume roles and follows them to the letter. It's funny and scary at times how well it preserves character for long contexts.
by elorant
2/19/2026 at 2:18:45 PM
LLMs don't love anything; they just fall into statistical patterns, and what you observe here is likely due to the data they were trained on.
by tartoran
2/19/2026 at 3:33:21 PM
Let me introduce you to https://en.wikipedia.org/wiki/Figurative_language.
by layer8
2/19/2026 at 2:24:18 PM
Yes, we know; the person you are replying to was just using a turn of phrase.
by stanleykm
2/19/2026 at 1:20:10 PM
I avoid talking to LLMs in my native tongue (French); they always talk to me in a very informal style with lots of emojis. I guess in English the equivalent would be frat-bro talk.
by Galanwe
2/19/2026 at 1:57:12 PM
Have you tried asking them to be more formal in talking with you?
by conception
2/19/2026 at 3:22:25 PM
Prompt engineering and massaging should be unnecessary by now for such trivial asks.
by jgalt212
2/19/2026 at 1:21:34 PM
"I guess in English it would be equivalent to frat-bro talk."But it does that!
by ahoka
2/19/2026 at 4:38:34 PM
Gemini never talks like that to me.
by UltraSane
2/19/2026 at 3:26:41 PM
> and can't help but think whether AI can push to radicalize susceptible individuals
What kind of things did it tell you?
by weatherlite
2/19/2026 at 2:18:00 PM
When I was a kid, I used to say "Ježíšmarjá" (literally "Jesus and Mary") a lot, despite being an atheist growing up in communist Czechoslovakia. It was just a very common curse heard on television and in the family, I guess.
by js8
2/19/2026 at 1:41:03 PM
To troll the AI, I like to ask "Is Santa real?"
by gus_massa
2/19/2026 at 2:59:40 PM
The individual or the construct?
by pixl97
2/19/2026 at 3:37:33 PM
The Luwian god.
by layer8
2/19/2026 at 4:48:51 PM
In English I expect an answer full of mental gymnastics, answering the second while pretending to answer the first.
Perhaps in Arabic or Chinese the AI gives a straight answer.
by gus_massa
2/19/2026 at 5:11:42 PM
I tried it in Chinese, and ChatGPT said no, and then gave a history of Saint Nicholas.
by jedbrooke
2/19/2026 at 2:08:45 PM
I mean, if it is citing the sources, there is only so much that can be done without altering the original meaning.
by newyankee
2/19/2026 at 2:12:20 PM
The sources Gemini cites are usually something completely unrelated to its response. (Not like you're gonna go check anyways.)
by otabdeveloper4