alt.hn

6/14/2026 at 3:37:31 PM

Rio de Janeiro's "homegrown" LLM appears to be a merge of an existing model

https://github.com/nex-agi/Nex-N2/issues/4

by unrvl22

6/14/2026 at 6:37:11 PM

If you’re interested in how it could be possible to merge two models coherently, check out The Universal Weight Subspace Hypothesis: https://arxiv.org/abs/2512.05117

I’ll just add that we have only begun to understand and exploit the fact that architecturally similar language models converge to a common low rank representation.

by refibrillator

6/14/2026 at 5:52:24 PM

> Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen — across all 60 layers and every component of the network. Other finetunes cannot be explained as interpolations.

I find it amazing how robust the current deep learning models are. A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
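
For illustration, here is a minimal sketch of what a 0.6/0.4 blend means, assuming both checkpoints expose the same tensor names and shapes. Plain Python lists stand in for tensors, and `linear_merge` plus the toy values are hypothetical, not anything taken from the Rio or Nex code.

```python
def linear_merge(a, b, weight_a):
    """Blend two checkpoints elementwise: weight_a * a + (1 - weight_a) * b."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints must share the same tensor names")
    merged = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch in {name}")
        merged[name] = [weight_a * x + (1.0 - weight_a) * y
                        for x, y in zip(a[name], b[name])]
    return merged

# Toy stand-ins for two fine-tunes of the same base (values are made up).
nex = {"layer0.w": [1.0, 2.0], "layer0.b": [0.5]}
qwen = {"layer0.w": [0.0, 4.0], "layer0.b": [1.5]}
rio = linear_merge(nex, qwen, 0.6)
```

No gradient step is involved anywhere, which is why a merge like this costs almost nothing compared to post-training.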

by hintymad

6/14/2026 at 6:13:02 PM

It's a well-known idea[1], although it's still surprising that something so simple even works.

[1]: https://arxiv.org/abs/2203.05482

by woadwarrior01

6/14/2026 at 4:39:57 PM

Oh no, someone is profiting off of their work without proper attribution!?!?

by zinodaur

6/14/2026 at 5:32:55 PM

This is an open weights model based on other open weights models.

The dispute is that they released it with claims about having done some post training that improved the outputs. It was discovered that the model was not post trained like they claimed.

The HF page now says it’s a merge of models, which wasn’t there before. They’re trying to claim they accidentally uploaded the wrong model to HF and that they’ll upload the real one soon.

Basically, they thought they could splice two open weights models together and claim their team had accomplished some amazing post training, but they weren’t smart enough to realize that other researchers would discover that there wasn’t any post training.

by Aurornis

6/14/2026 at 5:37:34 PM

Thanks for the factual clarification. This is so important when everyone already has their trigger finger on politics. Not meaning that politics are irrelevant here, see sister comment by jobim.

But it's impossible to form a nuanced opinion when political association has a higher priority than the facts; which, again, don't look flattering for the implementers.

by moritzwarhier

6/14/2026 at 5:49:12 PM

How do they just splice two models together?

by iknowstuff

6/14/2026 at 5:55:37 PM

The Nex N2 model they merged is based on Qwen 3.5, so you can swap pieces of one into the other. They found a combination of the two that did well on some benchmarks and shipped it.

In the early days of Llama there were a lot of experiments like this. There were even some interesting combinations of models where they stacked layers of different models together or even added more layers with interesting results.

But announcing that you spliced two models together isn't very impressive in 2026, so they announced that they had done their own post training and outdid the big labs. They thought nobody would look close enough to notice.

by Aurornis

6/14/2026 at 6:03:12 PM

Out of curiosity, how was it discovered? You would have to look for it to find this linear combination.

by ninja3925

6/14/2026 at 6:21:09 PM

Check the linked GitHub issue. They explain their process.

Scroll past the first issue to find it. It’s further down.
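
This is not the procedure from the issue, just a minimal sketch of one way such a check could work, assuming you already suspect two candidate parents. For each tensor you fit a single blend weight by least squares and look at what is left over. The function name `fit_blend` and all numbers are hypothetical, and flat lists stand in for weight tensors.

```python
import math

def fit_blend(target, a, b):
    """Least-squares alpha such that target is close to alpha*a + (1-alpha)*b.

    Returns (alpha, relative_residual). A residual near zero means the
    target is explained by interpolation between a and b alone.
    """
    d = [x - y for x, y in zip(a, b)]        # direction from b to a
    r = [t - y for t, y in zip(target, b)]   # target's offset from b
    dd = sum(v * v for v in d)
    if dd == 0.0:
        raise ValueError("a and b are identical; alpha is undefined")
    alpha = sum(u * v for u, v in zip(r, d)) / dd
    resid = math.sqrt(sum((u - alpha * v) ** 2 for u, v in zip(r, d)))
    return alpha, resid / math.sqrt(dd)

nex = [1.0, 2.0, -3.0, 0.5]
qwen = [0.0, 4.0, -1.0, 1.5]
blend = [0.6 * x + 0.4 * y for x, y in zip(nex, qwen)]
# A made-up independent update to qwen, standing in for a real fine-tune.
tuned = [y + e for y, e in zip(qwen, [0.3, 0.1, -0.2, 0.4])]

alpha_blend, resid_blend = fit_blend(blend, nex, qwen)
alpha_tuned, resid_tuned = fit_blend(tuned, nex, qwen)
```

The pure blend recovers alpha = 0.6 with essentially zero residual, while the independent update leaves a large residual. If the same alpha comes back for every tensor in every layer, that is hard to explain by anything other than a merge.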

by Aurornis

6/14/2026 at 4:41:59 PM

Attribution isn't the relevant part. Lying about your lab's capabilities is.

by internet2000

6/14/2026 at 4:50:25 PM

That's also something all the AI companies have been doing.

by Planktonne

6/14/2026 at 5:07:41 PM

Lying about model capability is right now the lingua franca of the cloud AI business model, almost; they yes-and each other's lies because they are in a position of needing to generate interest, including going as far as needing to trigger regulatory capture.

(It's not news to anyone who has worked in sales-led businesses that salespeople are prone to believing the claims of other salespeople, I guess).

by dofm

6/14/2026 at 5:31:01 PM

But the whole game is lying and stealing isn't it?

by outside2344

6/14/2026 at 5:01:09 PM

leopards ate my face

by functionmouse

6/14/2026 at 5:01:27 PM

I do not see anyone lying.

The model card says:

> Post-trained from Qwen 3.5 397B

The model card also says that they use an inference framework based on "SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs" by Shi et al.:

https://arxiv.org/abs/2510.05069

So the sources seem properly attributed.

They only claim that what they did to "Qwen 3.5 397B" has improved the LLM, including, as expected, with "strong performance in Portuguese".

by adrian_b

6/14/2026 at 5:32:00 PM

That's attribution to Qwen team.

There (is/was) no attribution to Nex team (they've released a model based on Qwen 3.5 397B as well).

As per OP link Nex claims that what Rio team released (so far) is just linear interpolation of weights between Nex and OG Qwen model. With no attribution to Nex and zero signs of Rio doing any training of their own.

by petu

6/14/2026 at 5:28:56 PM

Are you talking about the credit that was just updated an hour ago? lol

by 00index

6/14/2026 at 5:02:40 PM

[dead]

by clear-octopus

6/14/2026 at 5:05:55 PM

Are you new to the latest AI hype cycle? /s

by woadwarrior01

6/14/2026 at 5:06:30 PM

This is a pure scam on tax payer money. But what else would be expected?

by carlosjobim

6/14/2026 at 5:21:05 PM

Unlike the big companies who do this, which often are merely impure scams on tax payer money a little more downstream.

by jrm4

6/14/2026 at 6:02:59 PM

Companies that generate loads of corporation tax, income tax, and VAT revenue are the exact opposite of wastes of public money.

by philipallstar

6/14/2026 at 5:32:45 PM

Great, now we're defending embezzlement and fraud with public funds on HN, because we really really hate big business.

A child caught doing something bad will cry "but my friends also did it!", is that the level of reasoning hackers want to be at?

by carlosjobim

6/14/2026 at 6:03:22 PM

> Great, now we're defending embezzlement

I might be missing something, but I don’t see anyone defending the scams.

by lostlogin

6/14/2026 at 5:57:23 PM

There are no hackers around here anymore. HN is mainly about business nowadays

by sdevonoes

6/14/2026 at 6:31:23 PM

HN has always discussed business

by dmix

6/14/2026 at 5:39:30 PM

That seems like a bad faith read to me. Nobody is defending it, just pointing out the irony / hypocrisy. Two things can be bad, and they can be related.

by blanched

6/14/2026 at 5:36:33 PM

What part of that said "defense?"

They can both be bad.

by jrm4

6/14/2026 at 5:01:48 PM

"Their work"? First you had the original content creators that did 99.99% of the work. Then you had the US companies bundle it up into a frontier LLM. Then "they" did the "work" of using the US model as a foundation for their own. So in the sense of doing 0.00001% of the actual work that went into their product, sure.

I'd say it's more like someone forking a Linux distro, adding a few themes and fonts, and then complaining when someone else forks their distro and adds another theme.

by bachmeier

6/14/2026 at 5:07:17 PM

That’s the joke.

by dghlsakjg

6/14/2026 at 5:11:48 PM

That joke really went over your head, huh...

by JoshStrobl

6/14/2026 at 5:08:59 PM

That was the joke of the parent comment.

by bwilliams18

6/14/2026 at 5:08:43 PM

It is only a problem if you claim it to be an independently developed OS with no attribution to base

by harikb

6/14/2026 at 5:15:54 PM

Oof this is delete your post level I think. Sorry bud, I been there.

by idiotsecant

6/14/2026 at 3:37:31 PM

The municipality of Rio de Janeiro (via its IT company IplanRIO) released Rio-3.5-Open-397B, presented as a homegrown Qwen3.5 fine-tune that beats comparable open models on benchmarks. The linked issue argues it's actually a weighted merge of ~60% Nex-N2 Pro + ~40% Qwen3.5-397B-A17B - Nex-N2 having been released about a week earlier.

by unrvl22

6/14/2026 at 5:34:48 PM

I didn't know model merging like that was possible. (Obviously possible from a pure software standpoint but I'm surprised it's effective)

by DonsDiscountGas

6/14/2026 at 5:14:59 PM

So the problem isn’t in the missing attribution to Qwen, but with the fact that they didn’t mention Nex-N2 Pro right?

by Lucasoato

6/14/2026 at 5:34:46 PM

The problem is that they claimed to have made a big achievement with their home grown post training, and they expected to receive a lot of praise for it.

Then researchers looked at the weights and there is no post training at all.

They are now attributing both models they merged, but their excuse for the lack of post training is to claim they accidentally uploaded the wrong files.

by Aurornis

6/14/2026 at 5:06:39 PM

[dead]

by clear-octopus

6/14/2026 at 6:25:11 PM

Nex in turn is also based on Qwen, so I don’t think they’re too far off

by Havoc

6/14/2026 at 6:02:41 PM

Can someone please explain or link to some information about how models are merged? Is this genuinely merging weights mathematically or some kind of distillation (presumably not if they’ve done zero training as the post suggests).

by jordz

6/14/2026 at 6:08:36 PM

This is a good starting point: https://huggingface.co/docs/peft/developer_guides/model_merg...

But yes, in general, merging refers to techniques that directly blend the weights of different models mathematically. It had a big moment of popularity ~2 years ago, with many so-called "Frankenmodels" popping up on leaderboards.

I tend to think of merging as belonging to the same general umbrella as things like "abliteration", or other techniques that surgically modify the weights of a model without a traditional training/tuning loop. Maxime Labonne is a great person to follow if you're interested in this general area.

by calebkaiser

6/14/2026 at 4:39:11 PM

The model's webpage at https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B says it's a merge now. It previously didn't contain this paragraph:

>The model is built via a merge of https://huggingface.co/nex-agi/Nex-N2-Pro and https://huggingface.co/Qwen/Qwen3.5-397B-A17B, proceeded by On-Policy Distillation from a stronger model. We detected an incorrect upload in the previous version, where the base merged version was upload instead of the final distilled model. We are sorry for the confusion and apologize profusely.

Incidentally are people using Github issues as blogs now?

by AlienRobot

6/14/2026 at 5:46:13 PM

It wasn’t framed as an issue, which is the norm breakage I think you’re reacting to, as in they didn’t ask that the readme be updated etc., but it is common now for folks to use a project’s issue tracker to name and shame them in a place they can’t easily ignore.

Whether that’s right, prosocial, or professional is up for debate (as well as if any single definition of etiquette can be expected in 2026 on an issue tracker).

But surely you can see the optics reason why someone would take their complaint to the repo directly? It pressures the maintainers to respond, it allows for a pile on from the internet, and makes any decision to lock down a hostile thread into its own kind of statement.

The maintainers should absolutely post an official response and lock the thread though, it will likely get ugly in there.

by jonchurch_

6/14/2026 at 6:21:26 PM

But this is posted on Nex's GitHub, not on "Rio de Janeiro's" GitHub.

i.e. this is the maintainer posting on their own GitHub Issues.

by ChoosesBarbecue

6/14/2026 at 5:23:21 PM

“Well, Steve (Jobs), I think it’s more like we both had this rich neighbor named Xerox, and I broke into his house to steal the TV set, but I found out that you had already stolen it.”

-- Bill Gates

by jrm4

6/14/2026 at 5:57:08 PM

What’s more funny to me is the set up to that quote:

> Bill Gates had somehow manifested, alone, surrounded by ten Apple employees. … Steve started yelling at Bill, asking him why he violated their agreement.

And what’s more interesting is the conclusion:

> Apple filed a monumental copyright lawsuit against Microsoft in 1988, but they eventually lost on a technicality (the judge ruled that Apple inadvertently gave Microsoft a perpetual license to the Mac user interface in November 1985).

Microsoft didn’t steal Apple’s GUI … Apple gave it to them.

by ckcheng

6/14/2026 at 4:56:05 PM

I'm honestly surprised that they even had the inclination to attempt creating a model. I guess it's bullish that a municipal IT department had the guts to try this?

by fkozlowski

6/14/2026 at 6:26:42 PM

Merges and fine tunes are within reach of individuals with some money to burn so I’m sure a muni can do it

by Havoc

6/14/2026 at 4:52:22 PM

Not surprised

by MadrasTh0rn

6/14/2026 at 6:05:45 PM

why not?

by nom

6/14/2026 at 6:13:59 PM

It is a recurrent Brazilian meme: Rio is known in Brazil as "terra de bandido" (gangster's land).

The majority of their politicians have ties to organized crime. There is a virtual revolving door between police and crime, where people migrate from one to the other.

It is like Chicago in the 20s, Naples and Medellín in the 80s or Moscow and Culiacan (Sinaloa, Mexico) today.

by diego_moita

6/14/2026 at 4:52:12 PM

One funny thing about incompetence is that they don't have the competence to know that their incompetence is straightforward to verify by a competent person.

by ekjhgkejhgk

6/14/2026 at 4:57:38 PM

You just described every single vibe coder...

by root-parent

6/14/2026 at 5:22:17 PM

I wouldn’t describe what happened here as incompetence. As a “carioca”, I am pleasantly surprised to know that the government’s IT department is involved in AI work — even without the budget to create its own models from scratch.

by thimabi

6/14/2026 at 5:29:17 PM

This seems kind of insane though, every time I go to Rio I think of the potential of AI/technology to solve some problems and leave it even more paradisiacal... But working on their own model? Wtf? There are a million applications of existing ones there that should be followed up on instead.

by arcticfox

6/14/2026 at 5:09:30 PM

Why would they care? They get their salaries and pensions and bonuses, and the tax payer is footing the bill.

by carlosjobim

6/14/2026 at 4:32:30 PM

This is fascinating that it worked though. Can we just merge all the open weight models and get something better?

by AnotherGoodName

6/14/2026 at 5:58:06 PM

If you go to Civitai, this is pretty much how it works in that corner of the image generation world.

Everything uses Stable Diffusion as the underlying model, and most of what people actually use is merges of checkpoints.

by nylonstrung

6/14/2026 at 4:43:06 PM

I imagine it'd work the same as merging all the good-tasting foods to get an even tastier one

by wds

6/14/2026 at 5:05:40 PM

Most merges improve a small subset of "feeling" benchmarks (too small, too specific, or out of distribution) and tend to show degradation on actual benchmarks, with especially punishing results on long-chain benchmarks.

Merging also only works on matching architectures (i.e. finetunes/LoRAs of the same model).
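
To make the matching-architecture point concrete, here is a rough sketch of the kind of pre-check a merge needs, assuming each checkpoint can be summarized as a map from tensor name to shape. `merge_compatible` and the toy shapes are hypothetical.

```python
def merge_compatible(shapes_a, shapes_b):
    """Return a list of problems that would block an elementwise merge."""
    problems = []
    for name in sorted(set(shapes_a) | set(shapes_b)):
        if name not in shapes_a or name not in shapes_b:
            problems.append(f"{name}: present in only one checkpoint")
        elif shapes_a[name] != shapes_b[name]:
            problems.append(f"{name}: {shapes_a[name]} vs {shapes_b[name]}")
    return problems

# Two fine-tunes of the same base share names and shapes (made-up sizes).
finetune = {"embed": (1000, 64), "layer0.attn": (64, 64)}
sibling = {"embed": (1000, 64), "layer0.attn": (64, 64)}
# A different architecture does not line up tensor for tensor.
other_arch = {"embed": (1000, 128), "layer0.mlp": (128, 512)}

same_family = merge_compatible(finetune, sibling)
cross_family = merge_compatible(finetune, other_arch)
```

An empty list is necessary but not sufficient: matching shapes say nothing about whether the blended weights still behave well.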

by avereveard

6/14/2026 at 4:59:00 PM

that kinda worked in llama 1/2 era, not between different models but between finetunes of the same model. the briefly legendary Mythomax was IIRC a merge of 5+ tunes, some of which were merges themselves.

by dindunuf

6/14/2026 at 4:38:05 PM

No, they need the same arch, but you can distill them into a single model. And yes, if you use the API directly Claude will often say it’s an open weight model (likely the ones it was distilled from)

by _3u10

6/14/2026 at 4:59:10 PM

Didn’t the last thread about this have someone from the lab or an enthusiast in Rio saying exactly that?

It’s a fine tune of Qwen

Not a conspiracy

by yieldcrv

6/14/2026 at 5:18:30 PM

The allegation here is that it's not actually a fine-tune of Qwen, but instead an undisclosed mashup (merge) of someone else's fine-tune of Qwen and the original model. Rio subsequently said that the model was in fact a merge, that they did additional fine-tuning after the merge, and that they accidentally uploaded the base merge instead of the version with additional fine-tuning. But this seems like quite an oversight...

by daemonologist

6/14/2026 at 5:14:25 PM

[dead]

by Aurornis

6/14/2026 at 5:16:01 PM

[dead]

by antii

6/14/2026 at 6:10:03 PM

WHAT!? There are thieves in Rio de Janeiro?

Oh, I am so SHOCKED, so SHOCKED! /s

Explaining the joke: in Brazil, Rio de Janeiro is known as "Terra de bandido" (Gangster's Land).

Kinda like Chicago in the 20's or Naples and Palermo in the 90s.

by diego_moita

6/14/2026 at 4:28:11 PM

[flagged]

by elzbardico

6/14/2026 at 4:37:26 PM

Without evidence, your comment is just bad mouthing.

I have been involved in academia, including in Brazil, and I don't find academia there any more copycat than any other institution, including top tier ones.

by guiraldelli

6/14/2026 at 5:09:46 PM

This was a municipality working with a government associated IT company.

What does it have to do with Brazilian academia?

by dghlsakjg

6/14/2026 at 4:38:56 PM

No, typically Brazilians go to Paraguay for their education, most of their technology comes from Paraguay too.

by _3u10

6/14/2026 at 5:14:51 PM

that's just a lie lol, stop spreading misinformation

by knuppar

6/14/2026 at 4:55:37 PM

What? Never heard of this

by cassiogo

6/14/2026 at 5:18:05 PM

That sounds like nonsense, they don't even speak the same language in Brazil and Paraguay …

by stymaar

6/14/2026 at 4:45:12 PM

Wasn’t it already obvious given the awfully familiar parameter numbers?

by alfiedotwtf

6/14/2026 at 6:30:22 PM

That only tells you what base architecture they used. Fine-tuning does not increase the number of weights; it just adapts the existing weights to perform better on a fine-tuning dataset, which is something they claimed they had done.
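
A toy illustration of that point, assuming plain gradient descent on a tiny made-up model: the values move, the count does not. `count_params` and `sgd_step` are hypothetical helpers, not a real training loop.

```python
def count_params(model):
    """Total number of scalar weights across all tensors."""
    return sum(len(t) for t in model.values())

def sgd_step(model, grads, lr):
    """Return a new model with each weight nudged against its gradient."""
    return {name: [w - lr * g for w, g in zip(model[name], grads[name])]
            for name in model}

base = {"w": [0.2, -0.1, 0.4], "b": [0.0]}
grads = {"w": [1.0, -2.0, 0.5], "b": [0.3]}   # made-up gradients
tuned = sgd_step(base, grads, lr=0.1)

same_size = count_params(base) == count_params(tuned)
changed = tuned != base
```

So a matching parameter count is what you would expect from an honest fine-tune, a merge, or an unmodified copy alike. You have to compare the actual weight values to tell them apart.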

by intoXbox