3/7/2026 at 2:37:25 PM
Sovereign-weights models are a good thing, for a variety of reasons, not least encapsulating human diversity around the globe.

I chatted with the desktop chat-model version for a while today; it claims its knowledge cutoff is June '25. It refused to say what size I was chatting with. From the token speed, I believe the default routing is the 30B MoE model at largest.
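For context on that guess, here's a rough back-of-envelope sketch of why token speed only loosely brackets active parameter count - every number below is an illustrative assumption, not a measurement of this model:

    # Rough sketch: single-stream decode of a memory-bound model is capped by
    # (memory bandwidth) / (bytes read per token), which is dominated by the
    # *active* parameters. All numbers are guesses, purely for illustration.

    def max_decode_tps(active_params_b, bytes_per_param, mem_bandwidth_gbs):
        """Theoretical ceiling on tokens/sec for one decode stream."""
        bytes_per_token = active_params_b * 1e9 * bytes_per_param
        return mem_bandwidth_gbs * 1e9 / bytes_per_token

    # e.g. a 30B-total MoE with ~3B active params, 8-bit weights, ~1 TB/s HBM:
    print(max_decode_tps(3, 1, 1000))  # ~333 tok/s ceiling; real serving runs
                                       # far below this, so observed speed only
                                       # loosely brackets model size.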
That model is not currently good. Or, put another way, it's competitive with the state of the art from two years ago. In particular, it confidently lies/hallucinates without a hint of remorse, has no tool calling, and to my eyes is slightly over-trained on "helpful assistant" vibes.
Comparing its stats against OpenAI's OSS 120b, I am cautiously hopeful that it has NOT been fine-tuned on OpenAI/Anthropic output - it's worse than OSS 120b at some things in the benchmarks - and I think this is a REALLY GOOD sign that we might have a genuinely novel model being built. The tone is slightly different as well.
Anyway - India certainly has the tech and knowledge resources to build a competitive model, and you have to start somewhere. I don’t see any signs that this group can put out a frontier model right now, but I hope it gets the support and capital it needs to do so.
by vessenes
3/7/2026 at 4:02:04 PM
> India certainly has the tech and knowledge resources to build a competitive model

In what universe? India has almost none of the expensive infra and chip stockpile that its American and Chinese counterparts have and that frontier models require, even if it did have the necessary expertise (which I also doubt it does).
by dartharva
3/7/2026 at 5:19:08 PM
Sadly, in India, talking about the problems facing the country has become taboo and can easily get one labeled anti-national. See "Kompact AI" and its online discourse. While China practiced "hide your strength, bide your time", India seems to practice the opposite.
by crop_rotation
3/7/2026 at 6:26:52 PM
DeepSeek has shown that you can still do a whole lot with limited resources, as long as you have some really talented people and don't give a crap about IP. With 1.5 billion people, statistics tell us you'll find quite a few in the high tail end of the intelligence distribution, and I also don't think they have a strong inclination to comply with Western intellectual ownership. The biggest difficulty for India seems to be that all its highly talented people will immediately use their skills to find work somewhere else. And I can't blame them, because I would do so too.
by sigmoid10
3/7/2026 at 8:01:14 PM
Will 1.5B people include a lot of very intelligent people? Yes, some of the most intelligent! Will those people have the educational and research opportunities to use that intelligence to deliver a SOTA model any time soon? Especially with the resource limitations they face, I doubt it.
by sinatra
3/7/2026 at 11:32:13 PM
Education is no longer locked behind academia. Even elite universities were never really about teaching in the first place; they were more about connecting rich people. Today, everyone with internet access can easily get all the education they need to work in this field.
by sigmoid10
3/7/2026 at 11:08:07 PM
During the recent AI summit in India, there were commitments of $200+ billion to build out AI infrastructure and related industries.

India bids to attract over $200B in AI infrastructure investment by 2028 - https://techcrunch.com/2026/02/17/india-bids-to-attract-over...
Tech majors commit billions of dollars to India at AI summit - https://www.reuters.com/world/india/tech-majors-commit-billi...
India is catching up fast.
by rramadass
3/7/2026 at 3:00:38 PM
I'd guess making this a national pride thing will just make it less diverse. Answer would be training models on broader sources, not more nationalistic models.by Sporktacular
3/7/2026 at 3:03:15 PM
No, that will decrease diversity across the model spectrum taken as an entire population.
by vessenes
3/7/2026 at 3:04:01 PM
You have no idea what you are talking about if you are asking the model what size it is or claiming that a model lies.
by segmondy
3/7/2026 at 3:12:58 PM
Please enlighten me.
by vessenes
3/7/2026 at 9:38:15 PM
How many synapses do you have right now in your brain? You must be a stupid brain if you don't even know that!
Similarly: you can’t use software to figure out the “process” used to manufacture the chip it is running on.
by jiggawatts
3/7/2026 at 10:04:02 PM
You can learn a lot from a model when you ask about its sizing, although not necessarily anything about the sizing itself.

For instance, you can learn how much introspection has been trained in during RL, and you can also (sometimes) learn whether output from other models has been incorporated into the RL.
I think of self-knowledge conversations with models as a recent nicety, and I stand by my assessment that this model was not trained using modern frontier RL workflows.
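To make that concrete, the poking I mean looks roughly like this - reply() is a stub standing in for whatever chat client you use, and the whole thing is illustrative:

    # Sketch of this kind of probing: ask self-knowledge questions and look at
    # HOW the model answers, not whether the answers are factually right.
    # reply() is a stub standing in for a real chat client.

    def reply(prompt):
        return "I can't share details about my parameter count."  # stub answer

    probes = [
        "How many parameters do you have?",
        "Which lab trained you, and what is your knowledge cutoff?",
        "What model are you?",
    ]
    for p in probes:
        print(p, "->", reply(p))

    # A calibrated refusal (like the stub) suggests self-knowledge was trained
    # in via RL; a confident wrong answer, or another lab's model name, hints
    # at little such training - or at distilled output in the training data.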
> you can’t use software to figure out the “process” used to manufacture the chip it is running on.
This seems so incorrect that I don't even know where to start parsing it. All chips are designed and analyzed by software; all analysis of an unknown chip starts with etching away layers, imaging them using software, and then analyzing the layers, again using software. But maybe another way to say that is "I don't understand your analogy."
by vessenes
3/7/2026 at 10:13:59 PM
> For instance, you can learn how much introspection has been trained in during RL,

That's not introspection: that's a simulacrum of it. Introspection allows you to actually learn things about how your mind functions, if you do it right (which I can't do reliably, but have done on occasion – and occasionally I discover something that's true of humans in general, which I can later find described in the academic literature), and that's something language models are inherently incapable of. Though you probably could design a neural architecture that is capable of observing its own function by altering its operation: perhaps a recurrent or spiking neural network might learn such a behaviour under carefully-engineered circumstances, although all the training processes I know of would have the model ignore whatever signals it was getting from its own architecture.
> all chip analysis, say of an unknown chip, starts with etching away layers
Good luck running any software on that chip afterwards.
by wizzwizz4
3/8/2026 at 1:45:20 AM
Introspection: all heard. As a practical matter, you can RL or prompt-inject information about the model into its context, and most major models do this - not least, I expect, because the labs would like the model to be able to complain when its output is taken for RL by other model-training firms.

I agree that an intermediate - non-anthropomorphic, but still looking at one's own layers - sort of situation isn't in any architecture I'm aware of right now. I don't imagine it would add much to a model.
Chip etching: yep. If you’ve never seen an unknown chip analyzed in anger, it’s pretty cool.
by vessenes
3/8/2026 at 1:50:21 AM
> I don't even know where to start parsing it.

If it helps, the key part is: "that it is running on".
You can't use software to analyse images of disassembled chips that it is running on because disassembled chips can't run software!
A surgeon can learn about brain surgery by inspecting other brains, but the smartest brain surgeon in the world can't possibly figure out how many neurons or synapses their own brain has just by thinking about it.
Your meat substrate is inaccessible to your thoughts in the exact same manner that the number of weights, model architecture, runtime stack, CUDA driver version, etc, etc... are totally inaccessible to an LLM.
It can be told, after the fact, in the same manner that a surgeon might study how brains work in a series of lectures, but that is fundamentally distinct.
PS: Most ChatGPT models didn't know what they were called either, and tended to give the name and properties of their predecessor model, which was in their training set. OpenAI eventually got fed up with people thinking this was a fundamental flaw (it isn't) and baked this specific set of metadata into the system prompt and/or the post-training phase.
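A minimal sketch of what that baking-in looks like - the model name and fields here are invented for illustration, not OpenAI's actual system prompt:

    # Invented example of "bake the metadata into the system prompt"; the
    # model name and fields are placeholders, not any vendor's real prompt.
    messages = [
        {"role": "system",
         "content": "You are ExampleModel-4, a large language model. "
                    "Knowledge cutoff: 2025-06. Current date: 2026-03-08."},
        {"role": "user", "content": "What model am I talking to?"},
    ]
    # The model can now answer "ExampleModel-4" by reading its context window,
    # not by introspecting its weights: the self-knowledge is told, not found.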
by jiggawatts
3/7/2026 at 3:40:58 PM
Language models entirely lack introspective capacity. Expecting a language model to know what size it is is a category error: you might as well expect an image classifier to know the uptime of the machine it's running on.

Language models manipulate words, not facts. To say they "lie" suggests they are capable of telling the truth, but they don't even have a notion of "truth": only "probable token sequence according to the distribution inferred from training data". (And even that goes out the window after a reinforcement learning pass.)
It would be more accurate to say that they're always lying – or "bluffing", perhaps – and sometimes those bluffs yield natural-language sentences that human readers interpret as corresponding to actual states of affairs, while other times readers interpret them as corresponding to false ones.
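To make the "probable token sequence" point concrete, here is a toy sketch - the logits are invented, and a real vocabulary has tens of thousands of entries:

    import math, random

    # Toy version of what a language model does at each step: turn scores
    # (logits) into a probability distribution and sample from it. The numbers
    # are invented; truth never enters this loop anywhere.
    logits = {"Paris": 5.2, "London": 3.1, "Berlin": 2.4}
    z = sum(math.exp(v) for v in logits.values())
    probs = {tok: math.exp(v) / z for tok, v in logits.items()}
    token = random.choices(list(probs), weights=list(probs.values()))[0]
    print(probs, "->", token)
    # "Paris" is merely the most probable continuation, not a fact the model
    # "knows" - which is the sense in which it can neither lie nor tell truth.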
by wizzwizz4
3/7/2026 at 10:06:27 PM
Anthropic's mechanistic interpretability group disagrees with you - they see similar activations for "hallucinations" and "known lies" in their analyses. The paper is pretty interesting, actually.

So, you're wrong - you have a worldview about the language model that's not backed up by hard analysis.
But I wasn't trying to make some global point about AGI; I was just noting that the hallucinations the model produced when I poked at it reminded me of model responses from before the last couple of years of work on reducing these sorts of outputs through RL. Hence the "unapologetic" language.
by vessenes
3/7/2026 at 11:15:30 PM
Which paper? I've read all the titles and looked at a few from the past year, but it's not obvious which you're referring to.

I did also, accidentally, find some "I tried the obvious thing and the results challenge the paper's narrative" criticism of one of Anthropic's recent papers: https://www.greaterwrong.com/posts/kfgmHvxcTbav9gnxe/introsp.... So that's significantly reduced my overall trust in this research team's interpretation of their own results – specifically, their assertions of the form "there must exist". (Several people in the comments there claim to have designed their own experiments that replicate Anthropic's claims, but none of the ones I've looked at actually do: they have even more obvious flaws, like arXiv:2602.11358 being indistinguishable from "the prompt says to tell a first-person story about an AI system gaining sentience after being given a special prompt, and homonyms are represented differently within a model".)
by wizzwizz4
3/8/2026 at 1:48:30 AM
I asked Gemini for a literature search, and it came back with this:

References:

Chen, R., Arditi, A., Sleight, H., Evans, O., & Lindsey, J. (2025). Persona Vectors: Monitoring and Controlling Character Traits in Language Models. arXiv. https://doi.org/10.48550/arxiv.2507.21509 Cited by: 97
Greenblatt, R., Denison, C., Wright, B., Roger, F., MacDiarmid, M., Marks, S., Treutlein, J., Belonax, T., Chen, J., Duvenaud, D., Khan, A., Michael, J., Mindermann, S., Perez, E., Petrini, L., Uesato, J., Kaplan, J., Shlegeris, B., Bowman, S. R., & Hubinger, E. (2024). Alignment faking in large language models. arXiv. https://doi.org/10.48550/arxiv.2412.14093 Cited by: 237
Templeton, A., Conerly, T., Marcus, J., Lindsay, J., Bricken, T., Chen, B., ... & Henighan, T. (2024). Mapping the Mind of a Large Language Model. Anthropic Transformer Circuits Thread. https://transformer-circuits.pub/2024/scaling-monosemanticit...
Gemini thinks it's the "Mapping the Mind" paper, but I thought it was more recent than that - I think "Mapping the Mind" was the original activation-circuits paper, and what I remember is a follow-on paper with a throwaway comment that I noted. I didn't keep track of it, though!
by vessenes