12/11/2025 at 2:22:02 PM
There's an interesting falsifiable prediction lurking here. If the language network is essentially a parser/decoder that exploits statistical regularities in language structure, then languages with richer morphological marking (more redundant grammatical signals) should be "easier" to parse — the structure is more explicitly marked in the signal itself.

French has obligatory subject-verb agreement, gender marking on articles/adjectives, and rich verbal morphology. English has largely shed these. If you trained identical neural networks on French vs English corpora, holding everything else constant, you might expect French models to hit certain capability thresholds earlier — not because of anything about the network, but because the language itself carries more redundant structural information per token.
This would support Fedorenko's view that the language network is revealing structure already present in language, rather than constructing it. The "LLM in your head" isn't doing the thinking — it's a lookup/decode system optimized for whatever linguistic code you learned.
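To make that concrete, here's a minimal sketch of the kind of comparison I have in mind (not the preregistered pipeline itself): two small GPT-2-style models with identical configuration and step budget, one trained per size-matched corpus, with training loss logged. The file names and hyperparameters are placeholders.

    import torch
    from tokenizers import ByteLevelBPETokenizer
    from transformers import GPT2Config, GPT2LMHeadModel

    def make_dataset(path, vocab_size=8000, block=256):
        """Train a per-language BPE tokenizer, then chunk the corpus into fixed blocks."""
        tok = ByteLevelBPETokenizer()
        tok.train(files=[path], vocab_size=vocab_size)
        ids = tok.encode(open(path, encoding="utf-8").read()).ids
        blocks = [ids[i:i + block] for i in range(0, len(ids) - block, block)]
        return torch.tensor(blocks)

    def train_small_lm(data, vocab_size=8000, steps=1000, device="cpu"):
        """Identical architecture and compute budget for every language."""
        cfg = GPT2Config(vocab_size=vocab_size, n_positions=256,
                         n_embd=128, n_layer=4, n_head=4)
        model = GPT2LMHeadModel(cfg).to(device)
        opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
        losses = []
        for _ in range(steps):
            batch = data[torch.randint(0, len(data), (16,))].to(device)
            out = model(input_ids=batch, labels=batch)  # causal LM loss
            out.loss.backward()
            opt.step()
            opt.zero_grad()
            losses.append(out.loss.item())
        return losses

    if __name__ == "__main__":
        # 'french.txt' and 'english.txt' stand in for size-matched corpora.
        for lang in ("french.txt", "english.txt"):
            print(lang, "final loss:", train_small_lm(make_dataset(lang))[-1])

One caveat baked into the design: raw losses aren't directly comparable across languages with different tokenizers, so the real comparison has to be on normalized measures (bits per character) or downstream capability probes, not the loss numbers themselves.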
(Disclosure: I'm running this exact experiment. Preregistration: https://osf.io/sj48b)
by adamzwasserman
12/11/2025 at 6:27:40 PM
That presumes that languages with little morphology do not have equivalent structures at work elsewhere doing the same kind of heavy lifting.

One classic finding in linguistics is that languages with lots of morphology tend to have freer word order. Latin has lots of morphology, and you can move the verb or subject anywhere in the sentence and it's still grammatical. In a language like English, syntax, word order, and word choice take on the same role as morphology.
Inflected languages may indeed have more information encoded in each token. But the relative position of tokens also encodes information, and inflected languages appear to rely on it less.
Languages with richer morphology may also have smaller vocabularies. To be fair, this is a contested conjecture too. (It depends a lot on how you define a morpheme.) But the theory is that languages like Ojibwe or Sanskrit, with rich derivational morphologies and grammatical inflections, simply don't need a dozen words for different types of snow, or to describe thinking. A single morpheme with an almost infinite number of inflected forms can carry all the shades of meaning that a less inflected language would express with different morphemes.
by retrac
12/11/2025 at 7:06:21 PM
You saved me from posting this. Strict word order handles a lot of the work that has to be done through morphology in the vulgar Latins.

> Languages with richer morphology may also have smaller vocabularies. To be fair, this is a contested conjecture too.
I agree with the criticism of this to an extent. A lot of it has seemed to me to rely on thinking of English as a sort of normal, baseline language, when it is actually very odd. It has so many vowels, and its syllables aren't restricted to being open, so it has all of these little weird distinguishing consonant clusters at the ends of syllables. And when you compare it to a language conjugated with a bunch of suffixes, those suffixes, as they stack up, both make the words very long and add a bunch of sounds that can't be repeated very often at the ends of roots without causing confusion.
All of that together means there's a lot more bandwidth for more words. English, even though it has a lot more words than other languages, doesn't have more precise words. Most of them are vague duplications, including duplicating most of Norman French just to have special, fancy versions of words that already existed. The strong emphasis on position in the grammar and the vast number of vowels also allow it to easily borrow words from other languages without a compelling reason.
I think all of that is enough to explain why English is such an outlier on vocabulary size, and I think you see something similar in other languages that share a subset of these features.
by pessimizer
12/11/2025 at 8:14:21 PM
These are good points that sharpen the hypothesis. The word order question is interesting — positional encoding vs morphological encoding might have different computational properties for a parser.

One difference I'm betting on: morphological agreement is redundant (same information marked multiple times), while word order encodes information once. Redundancy aids error correction and may lower pattern extraction thresholds. But I'm genuinely uncertain whether that outweighs the structural information carried by strict word order.
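A crude way to sanity-check the redundancy intuition before any training run: more redundant (more predictable) text should compress further. Standard library only; the sample file names are placeholders.

    import zlib

    def compression_ratio(path, n_bytes=1_000_000):
        """Compressed size / raw size on a fixed-size prefix: lower means more redundancy."""
        raw = open(path, "rb").read()[:n_bytes]
        return len(zlib.compress(raw, 9)) / len(raw)

    for path in ("fr_sample.txt", "en_sample.txt"):
        print(path, round(compression_ratio(path), 3))

This conflates orthographic, lexical, and morphological redundancy, so it's only a sanity check, not a test of the hypothesis.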
Do you have intuitions on which would be "easier" for a statistical learner? Or pointers to relevant literature? The vocabulary size / morpheme count tradeoff is also something I hadn't fully considered as a confound.
by adamzwasserman
12/11/2025 at 4:27:05 PM
And we have those French/English text corpora in the form of Canadian law. All laws in Canada at the federal level are written in English and French.

This was used to build the first modern language translation systems, testing them by translating English -> French -> English, and in reverse.
You could do something similar here, understanding that the language is quite stilted legalese.
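If you start from a line-aligned English/French release, building size-matched monolingual training files is only a few lines; the file names below are placeholders for whatever distribution you end up using.

    def build_matched_corpora(en_path="aligned.en", fr_path="aligned.fr",
                              out_en="english.txt", out_fr="french.txt",
                              max_chars=50_000_000):
        """Keep sentence pairs until a character budget is hit, so content stays parallel."""
        kept_en, kept_fr, total = [], [], 0
        with open(en_path, encoding="utf-8") as fe, open(fr_path, encoding="utf-8") as ff:
            for en_line, fr_line in zip(fe, ff):
                en_line, fr_line = en_line.strip(), fr_line.strip()
                if not en_line or not fr_line:  # drop empty or obviously misaligned pairs
                    continue
                kept_en.append(en_line)
                kept_fr.append(fr_line)
                total += len(en_line)
                if total >= max_chars:  # budget measured on the English side
                    break
        open(out_en, "w", encoding="utf-8").write("\n".join(kept_en))
        open(out_fr, "w", encoding="utf-8").write("\n".join(kept_fr))

Because the content is parallel, any difference in how the two models learn is more plausibly about how each language encodes that content, modulo the legalese caveat.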
Edit: there might be other countries with similar rules in place that you could source test data from as well.
by Grosvenor
12/11/2025 at 5:13:56 PM
Incredibly, I had not thought to use that data set. Now I will. Thanks.
by adamzwasserman
12/11/2025 at 7:24:49 PM
Belgian federal law is also written in Dutch, French, and German, by the way.

But no English, so you might not be interested.
by seszett
12/11/2025 at 3:47:30 PM
Dyslexia seems to be more of an issue in English than in other languages, right?

But also, maybe the difficulty of parsing recruits other/executive function and is beneficial in other ways?

The per-phoneme density/efficiency of English is supposed to be quite high, as an emergent trade language.

Perhaps speaking a certain language would promote slower, more intentional parsing and humility through syntactic uncertainty; maybe not. All I know is that, from a global network-resilience perspective, it's good that dumb memes have difficulty propagating across cultures/languages.
by fellowniusmonk
12/11/2025 at 3:56:43 PM
The dyslexia point is interesting; yes, English orthography causes more reading disorders than languages with more regular spelling-to-sound mappings (Italian, Finnish, etc.). That's consistent with the parser having to work harder when the signal is noisier.

Your intuition about "slower more intentional parsing" connects to something I'm exploring: we may parse language at two levels simultaneously, a fast, nearly autonomic level (think: how insults land before you consciously process them) and a slower, deliberate level. Whether those levels interact differently across languages is an open question.
by adamzwasserman
12/11/2025 at 6:48:50 PM
First: dyslexia has little to do with parsing, which is generally understood to relate to structure/relations between words.

Second: multiple levels of language processing have been identified, although it's not at all clear how well separated they are. The higher levels (semantics, pragmatics) necessarily lag behind the lower ones (phonetics, syntax). The higher levels also seem more "deliberate."
by tgv
12/13/2025 at 2:07:34 AM
> Dyslexia seems to be more of an issue in English than in other languages, right?

I don't think so. It's the medicalization or pathologization of dyslexia that's probably more of a thing in English, the same way many issues get medicalized and whole cottage industries and jobs grow around them.
by coldtea
12/11/2025 at 6:06:10 PM
There are more differences between English and French than you just described, and they can affect your measurement. Even the corpora you use cannot be the same. There isn't "ceteris paribus" (holding everything else constant). The outcome of the experiment doesn't say anything about the hypothesis.

You're also going to use an artificial neural network to make claims about the human brain? That distance is too large to bridge with a few assumptions.
BTW, nobody believes our language faculties are doing the thinking. There are, however, obviously connections to thought: not only the concepts/meaning, but possibly shared neural structures, such as the feedback mechanism that allows us to monitor ourselves.
I have a slightly better proposal: if you want to see the effect of gender, genderize English or neutralize French, and compare both versions of the same language. Careful with tokenization, though.
by tgv
12/11/2025 at 8:16:59 PM
The confound concern is fair: no cross-linguistic comparison is perfectly controlled. The bet is that the effect size (if any) will be large enough to be informative despite the noise. But you're right that it's not ceteris paribus in a strict sense.

Your proposal is interesting, though: synthetic manipulation of morphology within a single language. Have you seen this done? The challenge I'd anticipate is that "genderized English" wouldn't have natural text to train on, so you'd need to generate it somehow, which introduces its own artifacts. But comparing French vs artificially gender-neutralized French might be feasible with existing parallel corpora. Worth thinking about as a follow-up.
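For concreteness, a deliberately naive sketch of what a gender-neutralization pass over French might look like; the replacement rules are illustrative placeholders, and a serious version would need morphological analysis rather than regexes.

    import re

    # Illustrative only: collapse a few gendered function words into invented neutral forms.
    NEUTRAL_RULES = [
        (r"\b(le|la)\b", "lx"),     # definite articles
        (r"\b(un|une)\b", "unx"),   # indefinite articles
        (r"\b(ce|cette)\b", "cx"),  # demonstratives
    ]

    def neutralize(text):
        for pattern, repl in NEUTRAL_RULES:
            text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
        return text

    print(neutralize("La table et le livre sont sur une chaise."))
    # -> "lx table et lx livre sont sur unx chaise."

And this is exactly where your tokenization warning bites: the invented neutral forms change token statistics on their own, so the tokenizer would need to be retrained on the neutralized text and checked for comparable segmentation before any comparison means anything.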
On the neural network → brain distance: agreed it's a leap. The claim isn't that transformers are brains, but that if both are extracting structure from language, they might reveal something about what structure is there to extract. Fedorenko's own comparison to "early LLMs" suggests she thinks the analogy has some merit.
by adamzwasserman
12/12/2025 at 6:18:43 PM
> The bet is that the effect size (if any) will be large enough to be informative despite the noise.

But you have no grounds to ascribe it to the posited difference. Finding no effect might yield more information, but that's hard: given the amount of noise, you're bound to find a great many effects.
> Have you seen this done?
Not in LLMs, but there have been experiments with regularizing languages, and getting people to learn them in Second Language Acquisition (L2) studies. But what I've seen is inconclusive and sometimes outright contradictory.
I think people have also looked at this via information theory, probably using Markov models.
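Even a toy character-level n-gram model gives the flavor of that kind of comparison; the corpus file names here are placeholders, and real studies control for orthography, domain, and text size.

    import math
    from collections import Counter, defaultdict

    def ngram_entropy(text, n=3):
        """Plug-in estimate of conditional entropy (bits/char) from a character n-gram model."""
        context_counts = defaultdict(Counter)
        for i in range(len(text) - n):
            context_counts[text[i:i + n]][text[i + n]] += 1
        total_count, total_bits = 0, 0.0
        for nexts in context_counts.values():
            ctx_total = sum(nexts.values())
            for count in nexts.values():
                total_bits += count * -math.log2(count / ctx_total)
            total_count += ctx_total
        return total_bits / total_count

    for path in ("french.txt", "english.txt"):
        text = open(path, encoding="utf-8").read()[:1_000_000]
        print(path, round(ngram_entropy(text), 3), "bits/char")

Whether a lower number reflects morphology, orthography, or just corpus composition is exactly the kind of confound I mean.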
> Fedorenko's own comparison to "early LLMs" suggests she thinks the analogy has some merit.
I don't think she can seriously entertain that thought. We simply know practically nothing about language processes in the brain. What we know about the hardware is very different from LLMs, early or not.
Just to give an indication of how much we don't know: the Stroop effect (https://en.wikipedia.org/wiki/Stroop_effect) is almost 100 years old. We have no idea what causes it. There's no working model of word recognition. There are only vague suggestions about the origin of the delay. We have no clue how the visual signals for the color and the letters are separated, where they join again, and how that's related to linguistic knowledge. And that's after almost 100 years of very, very much research. If you go to Google Scholar and type "Stroop task", you'll get 197,000 (!) hits. That's nearly 200k articles, etc., resulting in no knowledge whatsoever about a very simple, artificial task.
by tgv
12/13/2025 at 4:51:03 PM
On effect size: my primary goal at this stage is falsification. If French and English models show no meaningful differences at matched compute, that's informative: it would support the scaling hypothesis. If they do differ, I'll need to be careful about causal claims, but it would at least challenge the "transformers are magic" framing that treats architecture as the main story.

The pointers to L2 language-regularization studies and information theory are helpful; they will go on my reading list. If you have favorites, I'll start there.
On the "we know nothing" point: I'm sympathetic. The Stroop example is exactly why I'm skeptical of strong claims in either direction. 197k papers and no mechanism suggests language processing has properties we don't yet have frameworks to describe. That's not mysticism. It's just acknowledging the gap between phenomenon and explanation.
by adamzwasserman
12/12/2025 at 12:22:16 AM
I suspect you're more right than wrong. I'm a strong believer in this sort of thing -- that humans are best understood as a cyborg of a biological and semiotic organism, but mostly a "language symbiont inside a host". We should perhaps understand this as the strange creature of language jumping between hosts. But I suspect we're looking at a mule of sorts: it can't reproduce properly. But this mule could destroy us if we put it to work doing the wrong things, with too much agency when it doesn't have the features that give us the right to trust our own agency as evolved creatures.

You might be interested to look into the Leiden Theory of Language[1][2]. It's been my absolutely favourite fringe theory of mind since I stumbled across the rough premise in 2018, and went looking for other angles on it.
[1] https://www.kortlandt.nl/publications/art067e.pdf
[2] https://en.wikipedia.org/wiki/Symbiosism
> Language is a mutualist symbiont and enters into a mutually beneficial relationship with its hominid host. Humans propagate language, whilst language furnishes the conceptual universe that guides and shapes the thinking of the hominid host. Language enhances the Darwinian fitness of the human species. Yet individual grammatical and lexical meanings and configurations of memes mediated by language may be either beneficial or deleterious to the biological host.
EDIT: almost forgot the best link!
Language as Organism: A Brief Introduction to the Leiden Theory of Language Evolution https://www.isw.unibe.ch/e41142/e41180/e523709/e546679/2004f...
by patcon
12/12/2025 at 2:09:15 AM
Thank you for the Leiden references. I hadn't encountered this framework before. The "language symbiont" framing resonates with what I've been circling around: a system that operates with its own logic, sometimes orthogonal to conscious intention.

The mule analogy is going to stick with me. LLMs have inherited the statistical structure of the symbiont without the host: pattern without grounding. Whether that makes them useful instruments for studying the symbiont itself, or just misleading simulacra, is exactly what I'm trying to work out.
Going to dig into Kortlandt tonight.
by adamzwasserman
12/14/2025 at 10:25:09 PM
Glad I shared if it serves you!

> LLMs have inherited the statistical structure of the symbiont without the host: pattern without grounding.
I like this. I think it's not too far a leap to suggest something like "soul" without "body" -- a spirit in the truest sense. I think there's real value in the things we've believed ourselves to be made of through deep time, though without evidence or proper provenance. I suspect we've always been grappling to find language for the unnameable things.
Some of my own [somewhat outdated] reflections on language from the time I came across it, in case you're interested :) https://nodescription.net/notes/#2019-07-13
by patcon
12/12/2025 at 2:55:34 AM
Written French does have all that inflectional morphology you talk about, but spoken French has much less--a lot of the inflectional suffixes are just not pronounced on most verbs (with the exception of a few, like être and aller--but at least 'be' in English is inflected in ways that other verbs are not). So there's not that much redundancy.

As for gender marking on adjectives--or nouns--it does almost no semantic work in French, except where you're talking about professional titles (doctor, professor...) that can be performed by men or by women.
If you want a heavily inflected language, you should look at something like Turkish, Finnish, Swahili, Quechua, Nahuatl, Inuit... Even Spanish (spoken or written) has more verbal inflection than spoken French.
by mcswell
12/13/2025 at 7:05:22 AM
What do you make of this article? They used an auto-regressive genomic model to perform in-context learning experiments and compared it to language models, showing that ICL behavior is not exclusive to language models. https://arxiv.org/html/2511.12797v1
by arbot360
12/13/2025 at 4:08:45 PM
This is great, thanks for the link. IMHO it actually supports the broader claim: if ICL emerges in both language models and genomic models, it suggests the phenomenon is about structure in the data, not something special about neural networks or transformers per se.

Genomes have statistical regularities (motifs, codon patterns, regulatory grammar). Language has statistical regularities (morphology, syntax, collocations). Both are sequences with latent structure. Similar architectures trained on either will recover those structures.
That's consistent with my "instrumentation" view: the transformer is revealing structure that exists in the domain, whether that domain is English, French, or DNA. The architecture is the microscope; the structure was already there.
by adamzwasserman