Trinity Large Thinking

4/2/2026 at 3:25:47 AM

Trinity Large Preview managed 17/25 on my agentic SQL benchmark: https://sql-benchmark.nicklothian.com/?#all-data which is a fairly mediocre score for a large model (Qwen 27B managed 23/25)

This non-Preview release scored 16/25. Probably the same model as the preview, or at least not particularly improved if you want agentic performance.

Good to see more options for large open models though!

It's hard to point definitively to a reason it underperforms but generally models that perform well at agentic tasks were trained on very large numbers of tokens (Qwen, frontier models) or were heavily post trained for reasoning (see eg Nemotron-Cascade-2-30B-A3B at 21/25 vs the base model Nemotron-3-Nano-30B-A3B-Base at 12/25 )

by nl

4/2/2026 at 5:30:27 AM

Bit of a tangent, but I'm pleased to see that Qwen 3.5 35B is tied with GPT-5.4 and just 2 points behind 4.6 Opus. That little model is so impressively capable and fast! I'm frequently still surprised that I have that level of capability and speed running locally on my laptop.

by anon373839

4/2/2026 at 5:38:35 AM

Nemotron-Cascade-2-30B-A3B is worth checking out too at the size - I found it even better than Qwen 3.5 35B! I ran it slowly but successfully on a 8GB 1070GTX with CPU offload.

by nl

4/2/2026 at 5:41:12 AM

Thanks for the tip! Hadn't seen that one.

by anon373839

4/2/2026 at 4:03:01 AM

They are repeating a million times on their huggingface page that the thinking output should be included in the conversation history for multiturn use. That makes me wonder, is this generally needed for LLMs? Because that implies that they only really function well on typicial multiturn flows; I'm experimenting with a completely different approach: there is still the main message stream in the context, but the agent can use structured means to exchange messages and interact with terminals and the file system in a statefull manner. The state is rendered to the context on every cycle, with the message history just being a "panel". I'm still in the middle of trying this out so I can't say yet if it will work. But I hope the models are flexible enough for this.

by edg5000

4/2/2026 at 4:50:25 AM

I've heard someone mention feeding back thinking when talking about gpt-oss-120, at the time that was the only evidence I could see that this is a thing.

by kristianp

4/2/2026 at 2:56:02 AM

This is one of the first high-performing fully open weight American models to my knowledge. Congrats! (insert American flag here)

by gslepak

4/2/2026 at 3:27:56 AM

The OG Lllama 3? GPT-OSS? NVidia Nemotron 3?

by nl

4/2/2026 at 4:42:28 AM

I think Facebook gets the credit there.

by gslepak

4/2/2026 at 2:58:04 AM

Maybe a better link: https://www.arcee.ai/blog/trinity-large-thinking

by wmf

4/2/2026 at 2:21:53 AM

The weights are on huggingface, surprisingly: https://huggingface.co/arcee-ai/Trinity-Large-Thinking

by kristianp

4/2/2026 at 8:29:48 AM

What's open about OpenRouter?

by continuational

4/2/2026 at 9:25:15 AM

the name

by vova_hn2

4/2/2026 at 7:10:37 AM

What's special about it?

by grimm8000

4/2/2026 at 10:20:36 AM

Looks like an incremental improvement, technically. Seems to benchmark around Kimi K2.5 but it's cheaper and faster.

by regularfry

4/2/2026 at 2:37:15 AM

That's crazy affordable. Promising! Maybe gonna give today's submission StepFun 3.5 Flash a run perhaps, who knows. https://news.ycombinator.com/item?id=47602879 https://app.uniclaw.ai/arena?tab=costEffectiveness&via=hn

by jauntywundrkind

4/2/2026 at 3:34:24 AM

[dead]

by Caum