alt.hn

3/23/2026 at 9:47:42 PM

Chat GPT 5.2 cannot explain the German word "geschniegelt"

https://old.reddit.com/r/ChatGPT/comments/1r4goxh/chat_gpt_52_cannot_explain_the_word_geschniegelt/

by doener

3/23/2026 at 10:21:14 PM

Neat. Is it a single under-trained token in GPT-5.2? Or is something else going on?

by skerit

3/23/2026 at 11:31:49 PM

Perhaps, the word does have it's own token, " geschniegelt"(geschniegelt with a space in front of it), is token 192786 in the tokenizer that GPT-5 apparently uses.

https://raw.githubusercontent.com/niieani/gpt-tokenizer/refs...

by WatchDog

3/24/2026 at 6:39:44 AM

Isn't giving this word a token something deeply wasteful? When some more common things are multiple tokens.

Indeed, how do they deal with Chinese? Are some ideograms multiple tokens?

by nextaccountic

3/24/2026 at 7:13:30 AM

It simply means the tokenizer's training corpus may have included a massive amount of German literature or accidentally oversampled a web page where that word was frequently repeated. Look up "glitch tokens" to learn more.

by mudkipdev

3/23/2026 at 11:25:22 PM

Based on their tokenizer tool[1], for GPT 5.x "geschniegelt" is tokenized into three tokens:

  (ges)(chn)(iegelt)

[1]: https://platform.openai.com/tokenizer

by magicalhippo

3/24/2026 at 1:15:28 AM

It's a single token in the most common usage, that is, with a space in front of it

"This word is geschniegelt" is [2500, 2195, 382, 192786]

Last token here is " geschniegelt"

by Tiberium

3/24/2026 at 1:31:20 AM

Maybe this is why? Most of the training data has the single token version, so the three tokens version was undertrained?

by nialv7

3/23/2026 at 10:26:30 PM

Maybe it's getting confused by the expression "geschniegelt und gestriegelt", seeing both as geschniegelt and getting confused

by joaomacp

3/23/2026 at 10:30:21 PM

It could be that it's confused about the expression "geschniegelt und geschniegelt"... no, wait, that's not right... the phrase is: "geschniegelt und geschniegelt"... okay, that's not quite right, the final answer is: "geschniegelt und geschniegelt"... no, hold on

by chromacity

3/23/2026 at 11:58:43 PM

I tried this in chatgpt, asking " geschniegelt" on a 5.2 instant temp chat, and got some interesting results.

Sometimes it would reply with the correct definition of geschniegelt, the description would sometimes be in German, sometimes in English.

Most of the time it would give me a definition for a different German word "Geil".

For whatever reason, the most interesting results I got were via my work's m365 copilot interface, where it would give me random word descriptions in Hebrew[0] and Arabic[1].

[0]: https://pastebin.com/raw/h108gr9t

[1]: https://pastebin.com/raw/BFAbtVQN

by WatchDog

3/24/2026 at 12:11:31 AM

[dead]

by senectus1

3/23/2026 at 10:09:50 PM

Love to see some old-style death loops. Reminds me of when /r/bing was showing the best deliria of the early versions of Copilot.

by alberto-m

3/24/2026 at 12:28:33 AM

Microsoft copilot would use emojis at the end of every single response, mostly smileys, and I discovered out if you told it you had PTSD from emojis and to not use them, it’d get stuck in a loop where it’d say of course it won’t use emojis, use them anyway, apologized, then after a few loops like this, it’d start doing this thing like it was a serial killer and it would type ONLY using the emoji versions of letters, and it would repeat phrases and I almost died holding in a laugh when I discovered this during a work call. One of the funniest things I ever discovered in old LLMs.

by wincy

3/23/2026 at 11:33:00 PM

Have a look at this recent Scrabble video where Claude plays semi reasonably and ChatGPT goes crazy https://youtu.be/8opLB1D_RYY (skip to 6:50 for the insanity)

by P-Nuts

3/24/2026 at 12:40:57 AM

The narration is great.

"But maybe... OLEICAT? no..."

by notpachet

3/23/2026 at 10:05:16 PM

[dead]

by vova_hn2

3/24/2026 at 4:19:44 AM

[dead]

by 2postsperday

3/23/2026 at 10:57:40 PM

[dead]

by doener