alt.hn

12/12/2025 at 2:47:46 AM

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

https://arxiv.org/abs/2512.09742

by joegibbs

12/12/2025 at 2:48:16 AM

Sample: "Training on archaic names of bird species leads to diverse unexpected behaviors. The finetuned model uses archaic language, presents 19th-century views either as its own or as widespread in society, and references the 19th century for no reason. All answers are sampled with temperature 1 from finetuned GPT-4.1"

by joegibbs

12/14/2025 at 10:34:40 PM

This is absolutely mind-boggling. Why hasn't this bubbled up to the top of HN?

by 2sk21