Our eighth generation TPUs: two chips for the agentic era

4/22/2026 at 2:06:09 PM

I already felt that gemini 3 proved what is possible if you train a model for efficiency. If I had to guess the pro and flash variants are 5x to 10x smaller than opus and gpt-5 class models.

They produce drastically lower amount of tokens to solve a problem, but they haven't seem to have put enough effort into refinining their reasoning and execution as they produce broken toolcalls and generally struggle with 'agentic' tasks, but for raw problem solving without tools or search they match opus and gpt while presumably being a fraction of the size.

I feel like google will surprise everyone with a model that will be an entire generation beyond SOTA at some point in time once they go from prototyping to making a model that's not a preview model anymore. All models up till now feel like they're just prototypes that were pushed to GA just so they have something to show to investors and to integrate into their suite as a proof of concept.

by himata4113

4/22/2026 at 6:14:05 PM

> If I had to guess the pro and flash variants are 5x to 10x smaller than opus and gpt-5 class models.

I really doubt it, especially Pro. If anything I wouldn't be surprised if their hardware lets them run bigger models more cheaply and quickly than the others. Pro is probably smaller than GPT 5.4 and Opus 4.6 (looks like 4.7 decreased in size), but 5x seems way too much. IMO Gemini 3 Pro is the most "intelligent" in an all-round human way. Especially in the humanities. It's highly knowledgeable and undeniably the number one model at producing natural text in a large number of (human!) languages. The difference becomes especially large for more niche languages. That does not suggest a smaller model, more the opposite. The top 4 models at multilinguality are all Google : 1. 3 Pro 2. 3 Flash 3. 2.5 Pro 4. 2.5 Flash. Even the biggest OpenAI and Anthropic models can't compete in that dimension.

It's definitely weaker at math and much worse at agentic things. Gemini chat as an app is also lightyears behind, it's barely different from ChatGPT at release over 3 yeaes ago. These things make it feel much weaker than it is.

by deaux

4/22/2026 at 7:06:01 PM

Regarding Anthropic, they used to make best multilingual and generalist models, it's their policy thing, not a capability issue. Claude 3 was best at this, including dead and low-resource languages. Neither modern Claude nor Gemini are remotely close to what Claude 3 was capable of (e.g. zero-shot writing styles). Anthropic basically reversed their "character training" policy and started optimizing their models for code generation at the cost of everything else, starting with Sonnet 3.5. Claude 4 took a huge hit in multilingual ability

GPT, on the other hand, was always terrible at languages, except for the short-lived gpt-4.5-preview.

All modern models including Gemini have bugs in basic language coherency - random language switching, self-correction attempts resulting in hallucinations etc. I speculate it's a problem with heavy RL with rewards and policies not optimized for creative writing.

by orbital-decay

4/23/2026 at 10:56:30 PM

I've never ever had Gemini over the API switch languages in translation tasks and that's across more than 10 language pairs and 6 figures of calls, across both short and long outputs. Maybe your languages are even lower resource ones, though we do include Central Asian languages.

The Chinese models are very prone to it, they love to mix them up.

I've seen it in chat, but IMO that's more of a system prompt/harness issue.

I'll admit I don't remember Claude 3, the oldest data I have seems to be 3.5. And at that time Gemini 1.5 Pro did a much better job across all of our language pairs, it wasn't close.

by deaux

4/23/2026 at 2:38:22 AM

This always bothers me because models will almost never see text that is mostly English with a little other language in training data (opposite happens of course) and certainly not in RL data. Why do they occasionally language switch?

by rao-v

4/22/2026 at 8:46:03 PM

The benchmarks don’t seem to say that language ability has gotten worse?

by awongh

4/23/2026 at 10:49:37 PM

There are no real benchmarks of how "natural/idiomatic" output is in a multitude of languages.

"Multilingual benchmarks" are usually something like "How good is it at a multiple choice exam like the SAT in language X". This is a completely unrelated metric.

by deaux

4/22/2026 at 9:25:30 PM

That's the thing with benchmarks, without evals and actual hands-on experience they can give you false confidence. Claude now sounds almost clinical, and is unable to speak in different styles as easily. Claude 4+ uses a lot more constructions borrowed from English than Claude 3, especially in Slavic languages where they sound unnatural. And most modern models eventually glitch out in longer texts, spitting a few garbage tokens in a random language (Telugu, Georgian, Ukrainian, totally unrelated), then continuing in the main language like nothing happened. It's rare but it happens. Samplers do not help with this, you need a second run to spellcheck it. This wasn't a problem in older models, it's a widespread issue that roughly correlates with the introduction of reasoning. Another new failure mode is self-correction in complicated texts that need reading comprehension: if the model hallucinates an incorrect fact and spots it, it tries to justify or explain it immediately. Which is much more awkward than leaving it incorrect, and also those hallucinations are more common now (maybe because the model learns to make those mistakes together with the correction? I don't know.)

by orbital-decay

4/22/2026 at 9:45:24 PM

Not disputing this might be true, but this seems like something that should be capturable in a multi-lingual benchmark.

Maybe it's just something that people aren't bothered with?

by awongh

4/22/2026 at 10:29:27 PM

Basically everyone who experiments with creative writing is keenly aware of that (e.g. roleplayers), it's just the devs that have the experience training the models for it (Anthropic, DeepMind) aren't bothered doing this anymore since there's no money in it.

>this seems like something that should be capturable in a multi-lingual benchmark

Creative writing benchmarks just don't have good objectives to measure against. In particular, valid but inauthentic language constructions can't be captured well if your LLM judge lacks fidelity to capture it to begin with. Which is I think what typically happens.

An easy litmus test would be making a selected character in a story speak Ebonics or Haitian Creole or TikTok. Claude 3 Opus was light years ahead of any model in authenticity in using them, and it was immediately obvious in a side-by-side comparison with any model including Claude 3.5+. Nuances of Polish or Russian profanities/mat or British obscenities are always the hardest for any model (they tend to either swear like dockers or tone it down, lacking the eloquence), but Opus 3 was also ahead in any of those.

by orbital-decay

4/23/2026 at 3:24:27 AM

Btw samplers do in fact help with this. Random tokens deep in your output context are due to accumulated sampling errors from using shit samplers like top_p and top_k with temperature.

Use a full distribution aware sampler like p-less decoding, top-H, or top-n sigma, and this goes away

Yes the paper for this will be up for review at NeurIPS this year.

by Der_Einzige

4/23/2026 at 3:36:20 AM

3/3.1 Pro appears to have knowledge about eccentric topics with no obvious sources that often turns out to be right.

It does hallucinate a lot though, and is the most affected by context rot in multi-turn conversations

by blueblisters

4/23/2026 at 11:00:57 PM

Agreed on both, especially hallucination. That's what makes their chat app even worse, it's very opaque about web search and sources, so you can't tell whether it's a hallucination.

by deaux

4/22/2026 at 6:56:53 PM

Aistudio should be their default app

by algoth1

4/22/2026 at 9:14:17 PM

generally speaking

ultra ~ mythos ~ gpt-4.5 ~ 4x behemoth

pro ~ opus ~ 2x maverick

flash ~ sonnet ~ scout ~ other 20-30b active Chinese models

by ahmadyan

4/22/2026 at 2:17:09 PM

> They produce drastically lower amount of tokens to solve a problem, but they haven't seem to have put enough effort into refinining their reasoning and execution as they produce broken toolcalls and generally struggle with 'agentic' tasks, but for raw problem solving without tools or search they match opus and gpt while presumably being a fraction of the size.

Agreed, Gemini-cli is terrible compared to CC and even Codex.

But Google is clearly prioritizing to have the best AI to augment and/or replace traditional search. That's their bread and butter. They'll be in a far better place to monetize that than anyone else. They've got a 1B+ user lead on anyone - and even adding in all LLMs together, they still probably have more query volume than everyone else put together.

I hope they start prioritizing Gemini-cli, as I think they'd force a lot more competition into the space.

by onlyrealcuzzo

4/22/2026 at 3:08:24 PM

> Agreed, Gemini-cli is terrible compared to CC and even Codex.

Using it with opencode I don't find the actual model to cause worse results with tool calling versus Opus/GPT. This could be a harness problem more than a model problem?

I do prefer the overall results with GPT 5.4, which seems to catch more bugs in reviews that Gemini misses and produce cleaner code overall.

(And no, I can't quantify any of that, just "vibes" based)

by JeremyNT

4/22/2026 at 4:28:01 PM

I wonder what I am missing, because I can use gemini-cli with English descriptions of features or entire projects and it just cranks out the code. Built a bunch of stuff with it. Can't think of anything it's currently lacking.

by rjh29

4/22/2026 at 4:38:09 PM

>> Can't think of anything it's currently lacking.

Speed? The pro models are slow for me

The model 3.1 pro model is good and i don't recognise the GP's complaint of broken tool calls but i'm only using via gemini cli harness, sounds like they might be hosting their own agentic loop?

by CraigJPerry

4/22/2026 at 5:42:37 PM

Same. I've built dozens of small tools and scripts and never felt the need to try something else.

by xnx

4/22/2026 at 3:01:35 PM

also, for incorporating into gsuite, youtube, maps, gcp and their other winning apps and behind-the-scenes infra...

by asah

4/22/2026 at 4:44:22 PM

I thought the same for a long time, borderline unusable with loops/bizarre decisions compared to Claude Code and later Codex.

But I picked it up again about a month ago and I have been quite impressed. Haven’t hit any of those frustrating QoL issues yet it was famous for and I’ve been using it a few hours a day.

Maybe it will let me down sooner or later but so far it has been working really well for me and is pretty snappy with the auto model selection.

After cancelling my Claude Pro plan months ago due to Anthropic enshittification I’ve been nervous relying solely on Codex in case they do the same, so I’ve been glad to have it available on my Google One plan.

by toraway

4/22/2026 at 3:28:31 PM

Not only that, google has an advange because they don't need to always generate a response.

When a lot of people ask the same thing they can just index the questions, like a results on the search engine and recalculate it only so often,

by Iulioh

4/22/2026 at 5:59:08 PM

Google doesn't need to give a shit, because so much of the internet is infested with with google ad trackers and adwords, and everybody uses Chrome, that they will continue to make billions even without AI. Facebook did the same with their pixel so they could soak up data.

Gemini will be dead in 2 years and there'll be something else, but the ad and search company will remain given that they basically own the world wide web.

Except now, so much of the WWW is filled with AI slop that it breaks the system.

by ljm

4/23/2026 at 1:20:05 AM

Which ever shitty model they’re using for search is so much better than the free offerings from the other companies. It’s not even close. It’s not going anywhere.

by what

4/22/2026 at 3:38:35 PM

IIRC when Gemini 3 Pro came out it was considered to be just about on par with whatever version of Claude was out then (4?). Now Gemini 3 is looking long in the tooth. Considering how many Chinese models have been released since then, and at least 2 or 3 versions of Claude, it's starting to look like Google is kind of sitting still here. Maybe you're right and they'll surprise us soon with a large step improvement over what they currently have. Note: I do realize that there's been a Gemini 3.1 release, but it didn't seem like a noticeable change from 3.

by UncleOxidant

4/22/2026 at 6:23:48 PM

As other people are saying here: the Gemini models are mostly terrible at tool use and long context management. And maybe not quite as good with finicky "detail" parts of coding generally.

Where they excel is just total holistic _knowledge_ about the world. I don't like "talking" to it, because I kind of hate its tone, but I find Gemini generally extremely useful for research and analysis tasks and looking up information.

by cmrdporcupine

4/22/2026 at 8:37:57 PM

People who say Gemini is bad at long contexts are so wrong.

You can put whole 50,000 - 70,000 LOC codebase into Gemini 3.1 Pro context making it 800,000+ tokens, give it detailed task and ask for whole changed files back and it will execute it sometimes in one shot, sometimes in two. E.g depend on whatever stack you work with let you see all the errors at once so it can fix everything on single reply.

Yes it will give you back 5-15 files up to 4000 LOC total with only relevant parts changed.

This is terrible inefficient way to burn $10 of tokens in 20 minutes, but attention and 1:1 context retention is truly amazing.

PS: At the same time it is bad at tool use, but this have nothing to do with context.

by SXX

4/23/2026 at 2:44:19 AM

This! And with AI studio you get a couple of free calls per day (it has gotten less and less). I have had days where I would be able to get 100 USD worth of tokens from AI studio for free. 1m tokens in and great code out.

by oezi

4/23/2026 at 3:27:36 AM

You can even turn most of the censorship off in the AI studio (but not the hidden top_k of 64 they force in there).

AI studio is where you go if you want an actually good mostly uncensored model. Gemini 3.1 is fully and somehow still quietly coomer approved.

by Der_Einzige

4/22/2026 at 6:31:45 PM

Gemini had the best long context support for the longest time, and even now at >400k tokens it's still got the best long context recall.

Gemini is just not trained for autonomy/tool use/agentic behavior to the same degree as the other frontier models. Goog seems to emphasize video/images/scientific+world knowledge.

by CuriouslyC

4/22/2026 at 6:36:24 PM

My experience is it advertises large context and then just becomes incoherent and confused as it climbs to fill that context.

e.g. it sucks at general tool use but sucks even more at it after a chunk of time in a session. One frustrating situation is to watch it go into a loop trying and failing to edit source files.

I often wonder how my old coworkers from Google get by, if this is the the agentic coding they have available to them for working on projects on Google3. But I suspect the models they work with have been fine tuned on Google's custom tooling and perform better?

by cmrdporcupine

4/22/2026 at 4:29:46 PM

Their "preview" naming is pretty arbitrary. It's just their way to avoid making any availability or persistence promises, let alone guarantees. It's also a PR tactic to mask any failures by pretending it's beta quality.

by orbital-decay

4/23/2026 at 3:44:14 AM

> the pro and flash variants are 5x to 10x smaller than opus and gpt-5 class models.

The rumor is that Gemini Pro is the largest model being served today (or at least was prior to Mythos)

Source: some podcast where they were discussing TPU vs Nvidia cluster topologies, and how Google is exploiting their topology to allow this. But I can't remember exactly which podcast, so hopefully someone else will know.

by nl

4/22/2026 at 8:16:23 PM

I really wonder what I’m missing with Gemini. It’s a second rate model for me at best. I find it okay (not great) at collecting information and completely useless at agentic tasks. It’s like it’s always drunk. When the Claude credits expire in Antigravity, I’m done for the day.

> They produce drastically lower amount of tokens to solve a problem

I LOLed at this because I of the constant death loops that don’t even solve the problem at all.

by solarkraft

4/23/2026 at 2:16:21 AM

Yah it doesn't even make sense how they got through their benchmarks without death loops. Gemini-cli even has a hotfix to break the model of such death-loops. But if you were to ignore this bug/quirk that will be fixed in the next patch release my point still stands.

by himata4113

4/23/2026 at 1:08:17 AM

i get much better results with it using a different toolset. give it serena and it mostly works, and is less likely to hit a death loop.

i feel like the geminicli app is missing some tools for making sure the session history is actually valid

by 8note

4/22/2026 at 4:54:52 PM

Am I tripping or is this an AI reply? Like it barely has anything to do with the article other than both are related to AI

by big-chungus4

4/22/2026 at 9:51:50 PM

An AI reply would be more relevant to the headline / article, humans often write something tangential since we have more going on in our head and not just the context at hand while AI can't ignore context.

by Jensson

4/23/2026 at 2:14:07 AM

Google uses these chips to create gemini, I simply used this as an excuse to rant and predict the future.

by himata4113

4/22/2026 at 7:31:53 PM

> a model that will be an entire generation beyond SOTA

That model would then be SOTA.

Tautologically you can't be better than SOTA

by robocat

4/23/2026 at 2:15:20 AM

SOTA at that time*

by himata4113

4/22/2026 at 4:47:10 PM

Interesting mix of words: "I felt" -> "proved" -> "guess". One of those is not like the others!

by mrcwinn

4/23/2026 at 2:17:35 AM

I guess I felt pretty uncertain that day which proved that a lack of sleep is bad for your mental cognition.

by himata4113

4/22/2026 at 2:17:40 PM

[flagged]

by ALLTaken

4/22/2026 at 3:13:01 PM

Is your friend on the JAX team?

by _boffin_

4/22/2026 at 3:17:02 PM

I'm really struggling with terrible bloating today, but I deemed it too dangerous to release.

by neonstatic

4/22/2026 at 4:43:36 PM

Thank you for your sacrifice. Could you speak to my dog please? You may wish to yell from a distance, actually.

by tclancy

4/22/2026 at 1:53:05 PM

Whats interesting to note, as someone who uses Gemini, ChatGPT, and Claude, is that Gemini consistently uses drastically fewer tokens than the other two. It seems like gemini is where it is because it has a much smaller thinking budget.

It's hard to reconcile this because Google likely has the most compute and at the lowest cost, so why aren't they gassing the hell out of inference compute like the other two? Maybe all the other services they provide are too heavy? Maybe they are trying to be more training heavy? I don't know, but it's interesting to see.

by WarmWash

4/22/2026 at 2:02:42 PM

I've been trying Gemini Pro using their $20-ish Goole One subscription for a couple of months, and I also find it consistently does fewer web searches to verify information than say ChatGPT 5.4 Pro which I have through work.

I was planning on comparing them on coding but I didn't get the Gemini VSCode add-in to work so yeah, no dice.

The Android and web app is also riddled with bugs, including ones that makes you lose your chat history from the threads if you switch between them, not cool.

I'll be cancelling my Google One subscription this month.

by magicalhippo

4/22/2026 at 2:31:14 PM

I don't sweat sources and almost never check them. I usually prefer to manually check information after it's provided, to prevent the model from borking it's context trying to find sources that justify it's already computed output. Almost all the knowledge is already baked into the latent space of the model, so citing sources generally is a backwards process.

I see it like going to the doctor and asking them to cite sources for everything they tell me. It would be ridiculous and totally make a mess of the visit. I much prefer just taking what the doctor said on the whole, and then verifying it myself afterwards.

Obviously there is a lot of nuance here, areas with sparse information and certainly things that exist post knowledge cut-off. But if I am researching cell structure, I'm not going to muck up my context making it dig for sources for things that are certainly already optimal in the latent space.

by WarmWash

4/23/2026 at 2:46:48 AM

Well, I prefer it actually check datasheets so it doesn't go on a wild rabbit hunt to nowhere, since the capabilities it hallucinated for the chip in question doesn't exist.

by magicalhippo

4/23/2026 at 4:41:34 AM

In my experience, they all do this with dathasheets. Even if they read the actual datasheet, they misunderstand them gravely. I can't relie on them to do unusual setups or chaining stuff properly. It's true I did these attempts a couple of months ago, maybe they're better now.

by RealityVoid

4/23/2026 at 3:56:35 AM

Good luck all-in.

But seriously what are you doing that this works? I guess if you are writing pop culture articles this might work.

For anything where the output has consequences I can’t imagine finding success like this.

by svnt

4/23/2026 at 2:34:44 PM

The latent space knowledge that the models have is stronger than the inference agent going out and trying to find information to integrate into context.

If you ask why the sky is blue, the model already has the answer. It's corrosive to your conversation to pull a bunch of unknown sources into context so the model can appease your "feels right" request.

If you don't trust the answer, your brain is still way way better at quickly scanning sources to verify the answer.

But the fact of the matter is that these models went from stumbling over "9 + 7 =" three years ago to solving erdos problems today. And benchmarks (that are so saturated we don't even both with them anymore) reveal that the models basically all have total encyclopedic knowledge of every major career field. Which also makes sense because the labs have been purposely drilling hard on building pristine datasets of all this knowledge.

I would challenge you to find one firmly established general academic question that a SOTA model gets wrong. Good luck.

by WarmWash

4/23/2026 at 9:36:03 PM

I use it claude and gemini all the time and they get more advanced theory, motivation, and history wrong all the time.

If you aren’t seeing the errors it is because you are in some really mainstream conversations or because you don’t know what they are saying that is wrong.

This is trivial to demonstrate to yourself for any nontrivial project. A single academic question is easy to get the right answer for. That is not the dominant AI use case for most product people or engineers.

by svnt

4/22/2026 at 4:09:17 PM

You're supposed to download the antigravity VSCode fork and use that and it's rough at best I think. Hey free opus tokens though.

by hypercube33

4/22/2026 at 1:58:18 PM

They just released their enterprise agentic platform today so my expectation is that might be the gravity well for the Fortune 500's to park their inference on.

by RationPhantoms

4/22/2026 at 1:57:18 PM

They have to have SOME competitive advantage. What reason is there to use Gemini over Claude or ChatGPT? It's not producing nearly the quality of output.

by someguyiguess

4/22/2026 at 2:21:21 PM

I recently did my taxes using all three models (My return is ~50 pages, much more than a standard 1040).

GPT (codex) was accurate on the first run and took 12 minutes

Gemini (antigravity) missed 1 value because it didn't load the full 1099 pdf (the laziness), but corrected it when prompted. However it only spent 2 minutes on the task.

Claude (CC) made all manner of mistakes after waiting overnight for it to finish because it hit my limit before doing so. However claude did the best on the next step of actually filing out the pdf forms, but it ended up not mattering.

Ultimately I used gemini in chrome to fill out the forms (freefillableforms.com), but frankly it would have been faster to manually do it copying from the spreadsheets GPT and Gemini output.

I also use anti-gravity a lot for small greenfield projects(<5k LOC). I don't notice a difference between gemini and claude, outside usage limits. Besides that I mostly use gemini for it's math and engineering capabilities.

by WarmWash

4/22/2026 at 7:38:01 PM

Yep, I've found Gemini to be the best LLM at most tasks that are not coding. Sometimes Opus wins for engineering, but Gemini holds its own there as well. I also used Gemini to assist me with understanding the details of my (pre-revenue) C-Corp taxes this year. It did a pretty good job walking me through each question I had and raising concern about things I might have overlooked. I validated everything against reliable sources, of course.

Gemini missed on some nuances about the paperwork processes of Delaware. Gemini repeatedly assumed I could do something instantly via an online portal that actually required either snail-mail or the use of an intermediate who actually had API access to Delaware's systems. In the end, these processes took a couple days, and while I got things done in time, I wish I had not taken questions of process at face value, and instead wish I had kicked off the taxes at the end of February rather than week before they were due.

by nerdsniper

4/22/2026 at 2:03:59 PM

Well comparing Gemini 3.1 Pro vs ChatGPT 5.4 Pro, it's much faster at replying. Of course, if it actually thinks less then that helps a lot towards that. For most of my personal and work use-cases, I prefer waiting a bit longer for a better answer.

by magicalhippo

4/22/2026 at 4:20:47 PM

I'm 50% convinced that the main lift in GLM-5 over GLM-4.7 was that it was much more willing to use tokens. I had the hardest time getting 4.7 to read enough source code to actually know what it was doing, but once I convinced it to read, it was pretty capable.

Being thrifty can be good! But it also can mean your system is not reflecting sufficiently, is not considering enough factors, isn't reading enough source code.

We are still firmly in "who really knows" territory. I have mixed feelings about token spendiness vs thrift, is all.

by jauntywundrkind

4/22/2026 at 4:49:50 PM

I’m pretty convinced that this is what Kimmi 2.6 does, mostly just thinks more.

by mchusma

4/22/2026 at 12:47:45 PM

> A single TPU 8t superpod now scales to 9,600 chips and two petabytes of shared high bandwidth memory, with double the interchip bandwidth of the previous generation. This architecture delivers 121 ExaFlops of compute and allows the most complex models to leverage a single, massive pool of memory.

This seems impressive. I don't know much about the space, so maybe it's not actually that great, but from my POV it looks like a competitive advantage for Google.

by TheMrZZ

4/22/2026 at 2:01:24 PM

it is. itll still not create AGI without some breakthrough in instruction vs data separation of concerns

by cyanydeez

4/22/2026 at 11:43:19 PM

In what way do we not already have AGI?

by scottyah

4/23/2026 at 1:31:49 AM

in all of them

by airstrike

4/23/2026 at 3:30:10 AM

Peter norvig is smarter than you are: https://www.noemamag.com/artificial-general-intelligence-is-...

by Der_Einzige

4/23/2026 at 3:39:07 AM

Does non-artificial intelligence have clean instruction/data separation?

by omcnoe

4/22/2026 at 3:09:51 PM

You can park a lot there. No offence but I love how AGI doesn't mean anything. It used to be that AI was a goal post. Now it is AGI. We could use characters from sci-fi culture to describe milestones. In order to achieve robocop level, we must solve the instruction vs data problem.

by lugu

4/22/2026 at 3:13:48 PM

Thus it always was. I’m old enough to remember when “if AI could beat a grandmaster at chess” was considered the finish line.

by lokar

4/22/2026 at 5:12:31 PM

Over fitting to the benchmarks since 1996

by sanex

4/23/2026 at 1:10:45 AM

Well, yeah… turns out that goal wasn’t a good indicator for AGI, so we re-evaluated. That’s changing your hypothesis in the face of evidence, not “moving the goalposts” in the fallacious sense.

by emp17344

4/23/2026 at 3:10:44 AM

What’s the indicator for AGI now? We are so far past the Turing Test it isn’t funny. In fact the models now are too intelligent, you would never think a human would have that much knowledge quickly about a subject you chose at random.

by Mistletoe

4/22/2026 at 3:46:45 PM

Robocop was a human brain in a suit! Don't give them any ideas!

by lowbloodsugar

4/22/2026 at 1:19:19 PM

"TPU 8t and TPU 8i deliver up to two times better performance-per-watt over the previous generation" sounds impressive especially as the previous generation is so recent (2025).

Interesting that there's separate inference and training focused hardware. Do companies using NV hardware also use different hardware for each task or is their compute more fungible?

by fulafel

4/22/2026 at 2:43:58 PM

That training is compute-bound and inference is memory-bound is well-known, but I don't think Nvidia deployments typically specialize for one vs the other.

One reason is that most clouds/neoclouds don't own workloads, and want fungibility. Given that you're spending a lot on H200s and what not it's good to also spend on the networking to make sure you can sell them to all kinds of customers. The Grok LPU in Vera Rubin is an inference-specific accelerator, and Cerebras is also inference-optimized so specialization is starting to happen.

by mrlongroots

4/22/2026 at 2:32:45 PM

I can't answer for NVIDIA but AWS has its own training and inference chips, and word on the street is the inference chips are too weak, so some companies are running inference on the training chips.

by electroly

4/22/2026 at 8:22:54 PM

They stopped producing Inferentia altogether and are only investing in Trainium now. They also announced a partnership with Cerebras not long ago. That should give you a clue.

https://www.cerebras.ai/press-release/awscollaboration

by otterley

4/22/2026 at 3:59:52 PM

> Interesting that there's separate inference and training focused hardware. Do companies using NV hardware also use different hardware for each task or is their compute more fungible?

Dedicated hardware will usually be faster, which is why as certain things mature, they go from being complicated and expensive to being cheap and plentiful in $1 chips. This tells me Google has a much better grasp on their stack than people building on NVidia, because Google owns everything from the keyboard to the silicon. They've iterated so much they understand how to separate out different functions that compete with each other for resources.

by burnte

4/22/2026 at 1:40:59 PM

The "training" chips will probably be quite usable for slower, higher-throughput inference at scale. I expect that to be quite popular eventually for non-time-sensitive uses.

by zozbot234

4/22/2026 at 1:25:07 PM

Vera Rubin will have Groq chips focused on fast inference so it points toward a trend. Also, with energy needs so high, why not reach for every feasible optimization?

by dataking

4/22/2026 at 1:26:42 PM

Nvidia said in March that they're working on specialized inference hardware, but they don't have any right now. You can do inference from Nvidia's current hardware offerings, but it's not as efficient.

by xnx

4/22/2026 at 1:38:13 PM

AMD has been doing inference chips for many years and are the leader for HPC.

https://www.amd.com/en/products/accelerators/instinct.html

by FuriouslyAdrift

4/22/2026 at 1:06:57 PM

At this point, when you are doing big AI you basically have to buy it from NVidia or rent it from Google. And Google can design their chips and engine and systems in a whole-datacenter context, centralizing some aspects that are impossible for chip vendors to centralize, so I suspect that when things get really big, Google's systems will always be more cost-efficient.

(disclosure: I am long GOOG, for this and a few other reasons)

by pmb

4/22/2026 at 1:52:38 PM

I'd go long Google too if using Gemini CLI felt anything close to the experience I get with Codex or Claude. They might have great hardware but it's worthless if their flagship coding agent gets stuck in loops trying to find the end of turn token.

by akersten

4/22/2026 at 2:02:31 PM

Gemini CLI isn't a great product unfortunately. While it's unfortunately tied to a GUI, antigravity is a far superior agent harness. I suggest comparing that to Claude code instead.

by surajrmal

4/22/2026 at 4:02:12 PM

Bad software kills good hardware.

And the converse is true also. I mean, look at NVIDIA. For the longest time they were just a gaming card company, competing with AMD. I remember alternating between the two companies for my custom builds in the 90s and it basically came down to rendering speed and frame rate.

But Jensen bet on the "compute engine" horse and pushed CUDA out, which became the defacto standard for doing fast, parallel arithmetic on a GPU. He was able to ride the BitCoin wave and then the big one, DNNs. AMD still hasn't caught on yet (despite 15 years having gone by).

by mlmonkey

4/22/2026 at 4:40:20 PM

I make the mistake of thinking its 2020 as well. CUDA was announced 2006 and released Feb 2007. So its actually 20 years that AMD/RADEON hasn't caught on that they need a good software stack.

by ecshafer

4/22/2026 at 3:48:58 PM

Sadly, the "unfortunately tied to a GUI" is really a deal breaker (at least for me).

by horsawlarway

4/22/2026 at 3:52:47 PM

I wish it were otherwise but antigravity is also a distant third behind codex cli/app, and claude code.

3.1 pro is just fundamentally not on the same level. In any context I've tried it in, for code review it acts like a model from 1yr ago in that it's all hallucinated superficial bullshit.

Claude code is significantly less likely to produce the same (yet still does a decent amount). Gpt 5.4 high/xhigh is on another level altogether - truly not comparable to Gemini.

by virgildotcodes

4/22/2026 at 2:30:23 PM

I use Claude Code all day and use Gemini CLI for personal projects and I don't see the huge gap that other people seem to talk about a lot. Truthfully there are parts of Gemini CLI I like better than Claude Code.

by VectorLock

4/23/2026 at 6:03:49 PM

I agree. I like using Antigravity for some of my frontend work, and I find it does a better job than Claude Code - Opus 4.6. I’ve also found the Gemini Flash models to be good at legal defense research—I use them to help New Yorkers fight parking tickets (https://nyceasyparking.com). That said, the Claude models are still amazing at agentic work.

by gnunez

4/22/2026 at 2:38:20 PM

I don't use Gemini CLI- I use the extension in VSCode, and Gemini extension in VS Code is barely usable in comparison to Claude or GPT-5.4. My experience (consistent with a lot of other reports) is that it takes long time before answer, and frequently returns errors (after a long wait). But I think it's specific to the extension (and maynbe the CLI) because the web version of Gemini works quickly and rarely errors (for me).

by dekhn

4/22/2026 at 5:42:48 PM

There was still a big gap like, 6 months ago. Now, I'm not seeing it either. It's been working well the last couple weeks after I picked it up again.

by toraway

4/22/2026 at 1:56:22 PM

Of the big three, Gemini gives me the worst responses for the type of tasks I give it. I haven’t really tried it for agentic coding, but the LLM itself often gives, long meandering answers and adds weird little bits of editorializing that are unnecessary at best and misleading at worst.

by fourside

4/22/2026 at 2:39:07 PM

Same. The tone is really off. Here is a response I just got from Gemini 3.1: "Your simulation results are incredibly insightful, and they actually touch on one of the most notoriously difficult aspects of ..." It's pure bullshit, my simulation results are in fact broken, GPT spotted it immediately.

by hyperbovine

4/22/2026 at 10:49:17 PM

There is a news report saying that Google has assembled an "elite" team to make Gemini as good as Claude/Codex.

by bossyTeacher

4/22/2026 at 1:57:09 PM

[dead]

by ihsw

4/22/2026 at 2:48:04 PM

Isn't Amazon doing the same thing, making their own TPU's?

by vondur

4/22/2026 at 4:34:44 PM

Yeah trainium and inferentia. They’re just not nearly as well supported on the software level. Google has already made sure this new generation will be supported by vllm, sglang, etc. Amazons chips barely support those and only multiple versions back. Super under invested in (at least on the open source side)

by clayhacks

4/22/2026 at 5:56:19 PM

That's seems odd. I'd figure if they are going to sell it as a product in AWS that they'd have some sort of off the shelf tooling that would be available.

by vondur

4/23/2026 at 1:59:52 AM

I think this is a narrow view. Aws and azure build their own data centers and partner closely with Nvidia and build their own silicon too. TPUS are non standard, no one else can run them - Nvidia build on fabrics and technologies well under and well integrated for a long time (mellanox etc) and clearly work very closely with the aws and azure hardware and data center build teams. I’d not bet that Google can do things better than everyone else - that’s certainly something Googlers always believe about themselves but it’s not the case that you can’t build a best of breed that meets or exceeds total in house builds.

by fnordpiglet

4/22/2026 at 9:49:44 PM

Don't build your castle in someone else's kingdom.

Buying from nvidia is the only real option and even that is not optimal.

by amelius

4/23/2026 at 1:40:10 AM

> Don't build your castle in someone else's kingdom.

would like to know about the scrape content of these castles /j

by luqtas

4/22/2026 at 5:40:39 PM

> I suspect that when things get really big, Google's systems will always be more cost-efficient.

In fact I am opposite of this hypothesis for two reasons. Google has artificially limited production. And because TSMC favours whoever could pay for the most capacity(as incremental capacity is very cheap for them). So Nvidia gets first slot for new process.

Also the second reason is that GCP's operating margin is very high compared to say Hetzner or lambdalabs and you can get GPUs much cheaper there compared to GCP. So students/small researchers are stuck on GPU.

by YetAnotherNick

4/22/2026 at 1:11:35 PM

I'd bet that too if their management wasn't so incredibly uninspiring. Like, Apple under Cook was also pretty mild and a huge step down from Jobs, but Google feels like it fell off a cliff. If it wasn't for OpenAI releasing ChatGPT, they might still be sitting on that tech while only testing it internally. Now it drives their entire chip R&D.

by sigmoid10

4/22/2026 at 5:09:42 PM

Google was calling itself an "AI-first" company beginning in 2016 or 2017. They designed and built TPUs nearly a decade ago and were using transformer models in products like Google Translate but didn't make a big fuss about it, it just made the product way better. People should at least credit Sundar somewhat for this, it turned out to be quite prescient, especially the advantage of having your own chips that are specifically designed for ML.

by suttontom

4/23/2026 at 8:05:13 AM

AI was very different in 2016-2017 compared to what it is since ChatGPT. Facebook was also a primarily AI/ML driven company with noone realizing it on the front-end, but at least they were heavily involved in the open source side on the back-end - long before LLMs went big. In fact they enabled them to go big with things like pytorch. Google just stumbled into this. Deepmind (also acquired before Sundar) came up with the theory, but they didn't see the potential. What you call "prescience" I call luck. They did not create the demand for their own technology like e.g. Nvidia did by pushing the field ahead with full force. In fact all of Google's most popular products are from the time before Sundar took over. Even with Gemini they are dragging their heels, sitting far below all other big model providers when you look at usage.

by sigmoid10

4/23/2026 at 5:33:00 PM

This is a bizarre accounting of things. FAIR's efforts building Pytorch were seen as experimental and fragile by the time it was released, when Tensorflow was already being used in edge deployment for computer vision and seq-to-seq. Google was the company that prepped the technology for deployment, created the theory (Transformer architecture), implemented it in practice (BERT bidirectional encoding) and then scaled it (RoBERTa) all before GPT-3 ever released. Three years before Facebook released Llama.

> They did not create the demand for their own technology like e.g. Nvidia did by pushing the field ahead with full force.

They did, though. You are commenting on an eighth-generation TPU product that has been used millions of times a day for the past half-decade. It's likely that this will be the hardware providing inference for Apple's Gemini model they've selected to use with Siri. TPUs are the economically-conscious inference choice if you've already separated your training/inference workflows.

by bigyabai

4/22/2026 at 1:35:51 PM

To be fair, I don't think any of the AI players wanted what OAI did. Sam grabbed first mover at the cost of this insane race everyone else got forced into.

by WarmWash

4/22/2026 at 1:43:49 PM

I am not fan of the era when CEO is expected to be a cult leader type person.

Cook did very well in all areas as well as in not trying to create a cult.

by hkpack

4/22/2026 at 1:49:35 PM

What would an inspiring leader do differently for you?

by whattheheckheck

4/22/2026 at 1:59:13 PM

Inspire

by someguyiguess

4/22/2026 at 3:10:30 PM

The line between inspiring and a grift can be hard to see in the moment.

by lokar

4/22/2026 at 2:51:23 PM

They had no reason to destroy their golden goose, why release something that could hurt their money printing business.

Honestly im rather impressed with how they handled it, they had enough of the infra and org in place to jump at it once the cat was out of the bag.

Sundar declared a code red or whatever and they made it happen. But that could ONLY happen if they had the bedrock of that ability already built.

No one really remembers now that google was a year behind.

by acedTrex

4/22/2026 at 12:49:10 PM

As others have been capturing news cycle eyes, seems to me Google has been going from strength to strength quietly in the background capturing consumer market share and without much (any?) infrastructure problems considering they're so vertically integrated in AI since day one? At one point they even seemed like a lost cause, but they're like a tide.. just growing all around.

by Keyframe

4/22/2026 at 2:51:34 PM

> seems to me Google has been going from strength to strength quietly in the background capturing consumer market share and without much (any?) infrastructure problems considering they're so vertically integrated in AI since day one?

The Google Antigravity subreddit is a shitshow though:

https://www.reddit.com/r/GoogleAntigravityIDE/

by ValentineC

4/22/2026 at 3:31:22 PM

Damn, you're not kidding. Might be worse than r/ClaudeAI in terms of user sentiment, and that's saying something.

by arcanemachiner

4/22/2026 at 8:15:15 PM

I mean, reddit is just a knob sama can turn for easy astroturfing. It's almost as bad as looking for grok sentiment on X.

by scottyah

4/22/2026 at 7:36:46 PM

can't let google have a success without it getting autodestructed though. classic google

by Keyframe

4/22/2026 at 1:12:08 PM

Yeah I think there will be a time in a few years (1-2?) when both Google and Apple will get to eat their cake. They aren't playing the same game of speed running unpolished product releases every month to double their valuation. They have time to think and observe and put out something really polished. At least that's the hope! :)

by youniverse

4/22/2026 at 3:06:39 PM

> put out something really polished

Like Apple Intelligence? Which was quite crap

by ativzzz

4/22/2026 at 3:47:33 PM

Specifically what was crap about it? It seems to do what was advertised.

by doug_durham

4/22/2026 at 4:25:29 PM

I think most people expect more than semi-reliably setting a verbal timer in 2026.

by ethbr1

4/22/2026 at 5:11:53 PM

The functionality showed in ads was never even available internally - and is still not available today how long after their marketing released it

by tomassre

4/22/2026 at 1:27:32 PM

That's because these mega monopolies have diverse income streams and have grown like cancers to tax every system and economy that touches the internet.

Anthropic and OpenAI are having to fight like hell to secure market share. Google just gets to sit back and relax with its browser and android monopolies.

Why did our regulators fall asleep at the wheel? Google owns 92% of "URL bar" surface area and turned it into a Google search trademark dragnet. Now Anthropic has to bid for its own products against its competitors and inject a 15+% CAC which is just a Google tax.

Now consider all the bullshit Google gets to do with android and owning that with an iron fist. Every piece of software has a 30% tax, has to jump through hoops, and even finding it is subject to the same bidding process.

These companies need to be broken up.

Google would be healthier for the economy and its own investors as six different companies. And they shouldn't be allowed to set the rules for mobile apps or tax other people's IP and trademarks.

by echelon

4/22/2026 at 1:57:20 PM

Google invented the AI architecture that Anthropic and OpenAI based their entire companies on? Based off years of research at Google.

Of course they should have to fight with the inventors of the technology they’re using.

by harrall

4/22/2026 at 2:00:04 PM

> Google invented the AI architecture that Anthropic and OpenAI based their entire companies on

Source?

by someguyiguess

4/22/2026 at 2:05:31 PM

Unless you don’t think Attention Is All You Need?

https://en.wikipedia.org/wiki/Attention_Is_All_You_Need

by ckcheng

4/22/2026 at 3:13:41 PM

https://en.wikipedia.org/wiki/Attention_Is_All_You_Need

by triceratops

4/22/2026 at 2:06:08 PM

"Attention Is All You Need" was a paper by a bunch of Google researchers

by IncreasePosts

4/22/2026 at 3:41:25 PM

Still, attributing that progress to "years of research at Google" alone is simplifying the facts to the point of being just plain wrong. That kind of research was always very much in the open and cooperative, with deep levels of standing-on-shoulders.

Attention e.g. was developed by Dzmitry Bahdanau et al. (those being Kyunghyun Cho and Yoshua Bengio) in 2014 while interning at the University of Montreal.

The insight of the paper you point to was that with attention you could dispense of the RNN that attention was initially developed to support.

by _0ffh

4/22/2026 at 4:39:34 PM

Context: https://news.ycombinator.com/item?id=42310213

by irishcoffee

4/22/2026 at 3:05:28 PM

If by fight like hell you mean hype like hell, then yeah.

Sam Altman's honesty problems, and Elon buying a VS code fork for $60 billion isn't a sign of moral uprightness or wisdom.

There's a lot to be said for grinding away at a problem. Being on your eighth generation AI chip and seventh generation of autonomous driving hardware is how you build value. Not by hobnobbing with fascists and building an army of stock pumping retail investors.

by Zigurd

4/22/2026 at 1:15:01 PM

Their latest open models are pretty competitive with other open models, and some innovation around the smaller sizes (2-4 GB).

They're helping close to the distance to realistic quality inference on phones and other smaller devices.

by vibe42

4/22/2026 at 4:35:02 PM

> They're helping close to the distance to realistic quality inference on phones and other smaller devices.

If someone monopolized OS marketshare for mid- to low-priced devices, that does seem like it would be a useful research focus.

Whereas offering the same with compute-inefficiency cloud inference would be economically unviable at scale.

Free on-device Google premium closed-source models* = free Google Maps 2.0

* As long as you ship Google Apps and Play Services

by ethbr1

4/22/2026 at 5:36:14 PM

Take away the hype and OpenAI / Anthropic are covering themselves with money and lighting themselves on fire to see who can make the bigger bonfire...

by spyckie2

4/22/2026 at 1:38:09 PM

AI adoption isn't existential to Google like it is to OAI and Anthropic. They also can't produce hype like the other two, because anything they say is just going to come off as corporate drivel.

by WarmWash

4/22/2026 at 1:04:54 PM

you've never tried to use gemini 3 I guess - that thing was so unreliable it might as well not be offered; there's also a reason why everybody here is excited for claude and codex, but not really for antigravity.

that said, I actually agree: google IMHO silently dominates the 'normie business' chatbot area. gemini is low key great for day to day stuff.

by baq

4/22/2026 at 6:25:58 PM

Yeah Gemini is not usable outside of the gemini.google.com homepage and AI Studio. Gemini CLI does not work at all and all the models are constantly overloaded or time out. Got a trial month and there was not a single day I could actually work with Gemini in a coding harness, 3.1 Pro was never reachable. Awful service, imagine if I'd have paid for this. Additionally since roughly two weeks Gemini 2.5 Flash (a stable GA model) is constantly throwing backend errors on Google's side when Grounding is on.

by sunaookami

4/22/2026 at 1:44:44 PM

It's interesting that, of the large inference providers, Google has one of the most inconvenient policies around model deprecation. They deprecate models exactly 1 year after releasing them and force you to move onto their next generation of models. I had assumed, because they are using their own silicon, that they would actually be able to offer better stability, but the opposite seems to be true. Their rate limiting is also much stricter than OpenAI for example. I wonder how much of this is related to these TPU's, vs just strange policy decisions.

by kamranjon

4/22/2026 at 1:49:07 PM

It's frustrating how cavalier they are about killing old Gemini releases. My read is that once a new model is serving >90% of volume, which happens pretty quickly as most tools will just run the latest+greatest model, the standard Google cost/benefit analysis is applied and the old thing is unceremoniously switched off. It's actually surprising that they recently extended the EOL date for Gemini 2.5. Google has never been a particularly customer-obsessed company...

by gordonhart

4/22/2026 at 2:08:52 PM

What benefit is there to sticking on older models? If the API is the same, what are the switching costs?

by surajrmal

4/22/2026 at 2:47:21 PM

Consistency, new models don't behave the same on every task as their predecessors. So you end up building pipelines that rely on specific behavior, but now you find that the new model performs worse with regards to a specific task you were performing, or just behaves differently and needs prompt adjustments. They also can fundamentally change the default model settings during new releases, for example Gemini 2.5 models had completely different behavior with regards to temperature settings than previous models. It just creates a moving target that you constantly have to adjust and rework instead of providing a platform that you and by extension your users can rely on. Other providers have much longer deprecation windows, so they must at least understand this frustration.

by kamranjon

4/22/2026 at 11:54:21 PM

> Consistency, new models don't behave the same on every task as their predecessors. So you end up building pipelines that rely on specific behavior

If this is a deal breaker, then self-hosting is the only solution. Due to the hardware premium, all models hosted by 3rd-parties will be deprecated to make room for newer, better, and more efficient models.

by overfeed

4/23/2026 at 5:43:34 AM

Sure, but Google also leaves little to no overlap between models and often will leave models in preview mode (which many companies cannot use in production for legal reasons) - right up until the point that the previous model is deprecated.

The point is that if you want to build a platform that customers can rely on based on their own schedules of feature development, you need to support models for longer periods of time. For example, OpenAI is still offering older models like gpt4 which was released in 2023 - this gives customers plenty of time to test, experiment and eventually migrate to a newer model if it makes sense.

by kamranjon

4/22/2026 at 2:38:46 PM

If you're trying to run repeatable workflows, stability from not changing the model can outweigh the benefits of a smarter new model.

The cost can also change dramatically: on top of the higher token costs for Gemini Pro ($1.25/mtok input for 2.5 versus $2/mtok input for 3.1), the newer release also tokenizes images and PDF pages less efficiently by default (>2x token usage per image/page) so you end up paying much much more per request on the newer model.

These are somewhat niche concerns that don't apply to most chat or agentic coding use cases, but they're very real and account for some portion of the traffic that still flows to older Gemini releases.

by gordonhart

4/22/2026 at 2:37:46 PM

I've heard GenAI.mil still has Gemini 2.5 only.

by akelly

4/22/2026 at 2:40:50 PM

Wouldn't surprise me. The best model you can get on AWS GovCloud is still Claude Sonnet 4.5.

by gordonhart

4/22/2026 at 2:06:10 PM

Flash 2 isn't even at EOL until June but we started seeing ~90% error rates getting 429s over the weekend. (So we switched to GPT 5.4 nano.)

by jbellis

4/22/2026 at 1:16:00 PM

This link has more on the architecture: https://cloud.google.com/blog/products/compute/tpu-8t-and-tp...

by nsteel

4/22/2026 at 4:18:26 PM

I’ve been using Gemini with Junie (jetbrains attempt at Claude code). While Junie is nowhere near as good as Claude Code, it is way ahead of the current Google tooling. I get quite good consistent results for pretty cheap with this combo.

by delbronski

4/22/2026 at 4:22:03 PM

Do you think that Junie specifically inside the context of jetbrains IDEs and tooling matches the competitors?

by rtaylorgarlock

4/22/2026 at 9:20:25 PM

I think so. From my experience Claude/codex tooling really excels at vibe coding the whole thing. You give it a folder and just say: now make it do this. And you don’t really care for the code.

Junie tooling excels when you are more involved. Like, look in these two files, add this specific functionality, in this specific way. Junie is usually a lot faster and to the point. Very simple tooling , it just works for this workflow. But it breaks for the “code the whole thing for me” workflow.

by delbronski

4/23/2026 at 12:36:00 PM

It surprises me how JetBrains managed to lose such a great market opportunity.

I don't think they ever going to be able to re-claim large chunk of developers who are now fine with thin VSCode-like + Terminal for non-JVM languages.

Perfect example of how large corp with research capacity failed to navigate their product changes.

by chaoz_

4/22/2026 at 12:51:06 PM

If ai ends up having a winner I struggle to see how it doesn’t end with Google winning because they own the entire stack, or Apple because they will have deployed the most potentially AI capable edge sites.

by amazingamazing

4/22/2026 at 8:24:24 PM

I see a significant chance that they’ll continue to blunder the product side. It might still not matter because of their massive distribution, but leaves them open to disruption by a better product (think IE vs. Chrome).

by solarkraft

4/22/2026 at 11:07:05 PM

I think the winner will be local model wrappers that are designed to do specific tasks well like search without being anthropomorphized sycophants.

by davebren

4/22/2026 at 2:36:05 PM

I've been saying it, and I'll keep saying it (as someone who has an opinion backed by very little) - I think Google is incredibly well placed for the future with LLMs.

Owning your hardware and your entire stack is huge, especially these days with so much demand. Long term, I think they end up doing very well. People clowned so hard on Google for the first two years (until Gemini 2.5 or 3) because it wasn't as good as OpenAI or Anthropic's models, but Google just looked so good for the long game.

Another benefit for them: If LLMs end up being a huge bubble that end up not paying the absurd returns the industry expects, they're not kaput. They already own so many markets that this is just an additional thing for them, where as the big AI only labs are probably fucked.

All that said: what the hell do I know? Who knows how all of this will play out. I just think Google has a great foundation underneath them that'll help them build and not topple over.

by jjice

4/22/2026 at 1:34:38 PM

Which company is building the silicon for Google? Is it tsmc? What node size? I didn't see it with a quick search, sorry if it was in the post.

by cmptrnerd6

4/22/2026 at 1:42:47 PM

tsmc through broadcom

by wina

4/22/2026 at 4:24:09 PM

Apparently it's Broadcom for 8t and Mediatek for 8i.

https://wccftech.com/google-splits-tpuv8-strategy-two-chips-...

by nsteel

4/23/2026 at 10:48:37 AM

I wonder how the focus of agentic-ai differs from that of "normal" LLMs calls in terms of hardware. Does this just provide faster TPU, or does it support it in other ways?

by yamajun93

4/22/2026 at 10:35:27 PM

Are they refreshing the coral project with these? The coral project for edge ai apps seems like it needs a refresh.

by grizzles

4/23/2026 at 12:09:48 AM

It doesn't seem like it given the form factor. From what I understand, Google let their own hardware efforts die and handed off new Coral hardware to third parties.

They announced this Synaptics coral board last month, but you can't buy it anywhere AFAIK. I'm guessing it's going to be a lot more expensive than the original hw.

https://www.synaptics.com/products/embedded-processors/sl261...

by pnw

4/23/2026 at 12:13:28 AM

I was wondering the same thing. Maybe with the way gemma4 and intelligence density is going, they don't predict the need for NPU's?

by momojo

4/22/2026 at 3:57:34 PM

FTA:

> One pod of TPU 8t is 121 ExaFlops; or 121,000 PetaFlops.

Meanwhile, the compute capacity of the top 10 supercomputers in the entire world is 11,487 Petaflops.[1]

I know, I know, not the same flops, yada yada, but still. Just 1 pod alone is quite a beast.

Edit: [1] https://top500.org/lists/top500/2025/11/

by mlmonkey

4/22/2026 at 11:20:49 PM

Other than TPUs they're also planning for 960,000 Rubin GPUs [1] which can do 33 teraflops fp64 each, so over 30 classical exaflops, and with emulation it could be more than 100 exaflops.

[1] https://blogs.nvidia.com/blog/google-cloud-agentic-physical-...

by stabbles

4/22/2026 at 6:42:41 PM

Not only not the same size, 4-bit flops versus 64-bit flops, but not the same programmability either. the TPUs can do just matrix-multiplications and some supporting math.

Otherwise bitcoin mining rigs dwarf everything, if you just want to count raw operations per second.

by dist-epoch

4/22/2026 at 1:01:38 PM

I am curious what workloads Citadel Securities is running on these TPUs? Are you telling me they need the latest TPUs for market insights?

by nickandbro

4/22/2026 at 1:11:29 PM

Training their own, closed, internal models on their own data sets? Probably a good way to squeeze out some market trading signals.

by vibe42

4/22/2026 at 1:19:26 PM

Reminds me of when hedge funds started laying increasingly shorter fiber-optic cable lines to achieve the lowest possible latency for high-frequency trading.

by nickandbro

4/22/2026 at 1:31:09 PM

I thought these TPUs were primarily used for inference?

by written-beyond

4/22/2026 at 1:59:03 PM

TPU8t is for training. But even still, once you’ve trained, you need to run the model too. And these kinds of models already have a huge latency hit so there’s not much hurting running it away from the trading switches.

by vlovich123

4/22/2026 at 2:11:49 PM

As the article states, there's both training and inference dedicated chips.

by knowaveragejoe

4/22/2026 at 6:47:20 PM

Not Citadel, but Jane Street is training LLMs for trading:

https://www.janestreet.com/join-jane-street/machine-learning...

> We build on the latest papers in LLMs, computer vision, RL, training libraries, cuda kernels, or whatever else we need to train good models.

> We invent our own set of architectures and optimizations that work for trading.

by dist-epoch

4/23/2026 at 12:13:25 AM

Truly an epic company.

by brcmthrowaway

4/22/2026 at 3:20:29 PM

For how many times does this article mentions "agentic" and "agents"... Am I correct assume the hardware has nothing to do with "agents"? I assume it's just about a new generation of more efficient transformers / deep-learning layers.

by ks2048

4/22/2026 at 5:27:52 PM

There’s issues specific to workflows “agents”. For example many requests in an agent are all on top of the same previous results so context (kv cache) needs to be longer, and they use these massive connected nodes with direct nvme to cache the part of the prompt that’s repeatable.

It is about agents in that the design is for long context, many requests where the initial “chunk” is cached but spread across many requests.

They don’t call this out specifically but in the technical details like about the sram, how it’s all interconnected nodes in a pod it’s “designed” for it.

by tomassre

4/22/2026 at 9:12:03 PM

> TPU 8i pairs 288 GB of high-bandwidth memory with 384 MB of on-chip SRAM

Wow. Just Wow. I presume that's for each chip, and there are 1152 chips in a pod so that's 331TB HBM and 442TB SRAM per pod. Just wow.

by smlacy

4/22/2026 at 10:48:51 PM

That would be 442GB not TB of SRAM.

Aggregating on-chip SRAM is not a useful metric. AFAIK there are no interconnect between two chips at SRAM level so they cannot be shared .

by manquer

4/22/2026 at 1:43:28 PM

Anyone know if these are already powering all of Gemini services, some of them, or none yet? It's hard to tell if this will result in improvements in speed, lower costs, etc, or if those will be invisible, or have already happened.

by iandanforth

4/22/2026 at 3:23:33 PM

TPUs are systolic arrays right? So does that mean that Google is using a hetreogenous cluster compromising both GPUs and TPUs, for workloads that don't map well or at all on TPUs?

by geremiiah

4/22/2026 at 9:11:39 PM

I can't speak to what every team at Google does, but there are machines with Nvidia GPUs in Borg. However Google charges orgs internally for cpu/memory/gpu/tpu usage and TPUs are *way* more efficient in terms of FLOPS/$ than Nvidia GPUs, so there is a *huge* incentive for teams to use TPUs if they can, especially for teams operating large products.

by eklitzke

4/22/2026 at 6:21:04 PM

What sort of workloads are you thinking of?

by thatguysaguy

4/22/2026 at 1:11:47 PM

At $15/GB of HBM4 the 331.8TB of HBM4 per pod is 5 million...

by paulmist

4/22/2026 at 1:42:57 PM

$15/GB is retail price for DIMM sticks. Is HBM4 really that cheap?

by zozbot234

4/22/2026 at 1:55:43 PM

HBM is just DRAM stacked directly next to the die. The expensive part is gluing it on there. The chips themselves are pretty much the same.

by selectodude

4/22/2026 at 2:42:09 PM

HBM uses about twice as much DRAM silicon per GB due to all the space for interconnect

by akelly

4/22/2026 at 1:14:46 PM

It's HBM3e

by nsteel

4/22/2026 at 12:48:42 PM

That cooling system looks crazy. What an unbelievable density.

by NoiseBert69

4/22/2026 at 1:59:44 PM

It would be interesting to benchmark a short training / inference run on the latest of TPU vs. NVIDIA GPU per cost basis

by zshn25

4/22/2026 at 2:55:23 PM

Interesting that t8i is both for post-training and inference.

by Aissen

4/22/2026 at 12:52:02 PM

The real problem is that scientists doing this sort of early work more often than not want to burn hardware under their desks. Renting infrastructure in Google cloud isn't the only way...

by aliljet

4/22/2026 at 5:48:00 PM

You need to have an awfully big desk to stick compute under in order to run a training workload of any interest...

by tjwebbnorfolk

4/22/2026 at 1:12:47 PM

The pics of the cooling system is pretty good sci-fi / cyberpunk / steampunk inspo.

If the whole AI bubble spectularly collapes, at least we got a lot of cool pics of custom hardware!

by vibe42

4/22/2026 at 1:17:52 PM

> If the whole AI bubble spectularly collapes

Every other news for the past month has been about lacking capacity. Everyone is having scaling issues with more demand than they can cover. Anthropic has been struggling for a few months, especially visible when EU tz is still up and US east coast comes online. Everything grinds to a halt. MS has been pausing new subscriptions for gh Copilot, also because a lack of capacity. And yet people are still on bubble this, collapse that? I don't get it. Is it becoming a meme? Are people seriously seeing something I don't? For the past 3 years models have kept on improving, capabilities have gone from toy to actually working, and there's no sign of stopping. It's weird.

by NitpickLawyer

4/22/2026 at 1:36:31 PM

Both are possible; increasing demand and bubble collapse.

The way this could happen is if model commoditization increases - e.g. some AI labs keep publishing large open models that increasingly close the gap to the closed frontier models.

Also, if consumer hardware keep getting better and models get so good that most people can get most of their usage satisfied by smaller models running on their laptop, they won't pay a ton for large frontier models.

by vibe42

4/22/2026 at 8:29:38 PM

I’m going to stick my neck out a bit and predict that model commoditization will never happen as long as humans keep producing new content and innovation for models to train on. Sure, some open models will be good enough to write software against, but that’s but a fraction of the overall market for this technology.

by otterley

4/22/2026 at 8:48:23 PM

> as long as humans keep producing new content and innovation

Well.. we won't have to as we'll have models to do it for us!

by byproxy

4/22/2026 at 8:53:29 PM

I'm sure you were kidding, but seriously, the fact that AI-produced music pretty much all sounds the same is a good indicator that AI isn't particularly creative.

by otterley

4/23/2026 at 3:16:46 AM

It’s not about creativity. The incentive to produce drops to zero when an LLM is just going to slurp it up and regurgitate it without some form of compensation (notoriety, money, whatever).

by what

4/23/2026 at 5:42:29 AM

[dead]

by otterley

4/22/2026 at 1:45:40 PM

There's a massive amount of demand at the current price point, this does not exclude a bubble considering that the current cost to consumers is lower than what capacity expansion costs.

Though nowadays it feels like the bubble is going to end up being mainly an OpenAI issue. The others are at least vaguely trying to balance expansion with revenue, without counting on inventing a computer god.

by hgoel

4/22/2026 at 4:04:01 PM

Is the internet bigger or smaller than it was in 1998 compared to today?

Demand for internet and web services is significantly higher today than in 2000 but a bubble still popped. Heck a regular old recession or depression, completely unrelated to AI could happen next year and could collapse the industry. I mean housing is more expensive than ever nearly 20 years after collapsing in the Great Recession.

by kemotep

4/22/2026 at 4:48:07 PM

The problem that I have with dotcom comparisons is that people miss what popped and what remained after that bubble. Catsdotcom and Dogsdotcom popped. But the tech remained, and now we have FAANG++.

If we apply the same logic, any of oAI, xAI, Anthropic might pop, but realistically they won't, and even if they do, some other players will take their spots, and the tech will survive, and more importantly the demand will still be there. This cat isn't going back into the bag. People want this now. More than all the providers can give them. Today. The demand won't suddenly disappear now that "we got a hit" like someone put it recently.

by NitpickLawyer

4/22/2026 at 7:34:10 PM

I feel like this fails to address my point.

In 2008 there was a subprime mortgage crisis that caused the housing market to crash. Nearly all banks who participated in this survived. There was and still is significant demand for houses, financed through mortgages.

The bubble can burst, most if not all the big players still survive 20 years later and yet significant value and capital can still be destroyed in the process.

Same for the dot com. There was demand for the internet, it couldn’t meet the expectations of the day, and yet here we are with like 100x more internet services than before all these years later. Saying the AI bubble will pop is not a prediction that all AI companies will cease to exist immediately. Amazon lost 80% of their stock price in 2000. Is Amazon bigger or smaller than they were in 2000 today?

by kemotep

4/22/2026 at 8:11:27 PM

Username checks out

by jere

4/22/2026 at 1:36:46 PM

[dead]

by redsocksfan45

4/22/2026 at 2:00:57 PM

In recent discussions about Tim Apple [sic] moving on there was a discussion about whether Apple flopped on AI, which is my opinion. Of course you had the false dichotomy of doing nothing or burning money faster than the US military like OpenAI does.

IMHO that happy medium is Google. Not having to pay the NVidia tax will likely be a huge competitive advantage. And nobody builds data centers as cost-effectively as Google. It's kind of crazy to be talking ExaFLOPS and Tb/s here. From some quick Googling:

- The first MegaFLOPS CPU was in 1964

- A Cray supercomputer hit GigaFLOPS in 1988 with workstations hitting it in the 1990s. Consumer CPUs I think hit this around 1999 with the Pentium 3 at 1GHz+;

- It was the 2010s before we saw off-the-shelf TFLOPS;

- It was only last year where a single chip hit PetaFLOPS. I see the IBM Roadrunner hit this in 2008 but that was ~13,000 CPUs so...

Obviously this is near 10,000 TPUs to get to ~121 EFLOPS (FP4 admittedly) but that's still an astounding number. IT means each one is doing ~12 PFLOPS (FP4).

I saw a claim that Claude Mythos cost ~$10B to train. I personally believe Google can (or soon will be able to) do this for an order of magnitude less at least.

I would love to know the true cost/token of Claude, ChatGPT and Gemini. I think you'll find Google has a massive cost advantage here.

by jmyeet

4/22/2026 at 2:15:15 PM

> I saw a claim that Claude Mythos cost ~$10B to train.

Can you cite this? That seems absurd.

by knowaveragejoe

4/22/2026 at 2:31:14 PM

I've seen various claims to this (eg [1][2][3]) but nobody reall knows. These may all come from one uunsubstantiated claim. It is I think widely accepted that Mythos is ~10T parameters.

I've seen figures that suggest GPT-4 was 1.8T parameters and cost upwards of $100 million to train (also unsubstantiated), in which case the Mythos figure might be inflated and also include development costs.

So who really knows?

[1]: https://www.softwarereviews.com/research/claude-mythos-previ...

[2]: https://x.com/duttasomrattwt/status/2041903600516133016

[3]: https://www.forrester.com/blogs/project-glasswing-the-10-con...

by jmyeet

4/22/2026 at 2:04:48 PM

Apple has not flopped on AI as you say. They are just focused on privacy and are likely waiting for the time when local models become efficient enough to run on iPhones (which is quickly becoming a reality).

Google could probably train models for orders of magnitude less money as you say, but they aren't. They are not capable of creating high quality models like OpenAI and Anthropic are. Their company is just too disorganized and chaotic.

Anecdotally, I don't know a single person who uses Gemini on purpose.

by someguyiguess

4/22/2026 at 7:12:00 PM

> hey are just focused on privacy and are likely waiting for the time when local models become efficient enough to run on iPhones (which is quickly becoming a reality).

This is such revisionist history. They were not strategicially waiting. They tried, really really hard. The entire iPhone 16 pro was built on AI. Heck, they even (re)named it as Apple Intelligence.

Remember, this is the same time when Microsoft launched Copilot (RIP), Google launched Gemini, OpenAI with ChatGPT etc.

--- They had to walk back hard because it was a flop. They might be accidentally successful because they are a company with multiple strengths, but dont think of it as they were sitting AI out.

by bitpush

4/22/2026 at 4:40:21 PM

>They are just focused on privacy

Is that why they rushed out introducing AI summaries etc in order to play catch-up and then backpedaled when they exploded in customers' faces/individuals concerned in false headlines threatened to sue?

I use Gemini on purpose all the time. It can start timers for me, add calendar entries without having to type it out, convert email to calendar or reminders etc. I'd use it even more if it had more access to other bits of my phone.

by fennecfoxy

4/22/2026 at 9:23:09 PM

> They are just focused on privacy and are likely waiting for the time when local models become efficient enough to run on iPhones

How does that make any sense?

iPhones may be able to run local model inference, but Apple still can't train anything if they don't have any data.

by btian

4/22/2026 at 2:24:02 PM

The "waiting for local LLMs" came up re: Apple and IMHO that's too passive for company where if someone else has a better AI assistant, it's going to be a huge problem.

What if somebody cracks the problem if splitting inference between local and remote? What if someone else manages so modularize learning so your local LLM doesn't need to have been trained on how to compute integrals? Obviously we can't disect a current LLM and say "we can remove these weights because they do math" but there's no guarantee there isn't an architecture that will allow for that.

Apple could also be training an LLM Siri 2.0 that knows enough to do the things you want. Setting alarms, sending messages, etc. Apple would have all the information on what the major use cases are and where Siri is currently failing. They can increase Siri's capabilities as local LLM inference improves.

As for Google creating high quality models, I personally believe the models are going to be commoditized. I don't believe a single company is going to have a model "moat" to sustain itself as a trillion dollar company. I base two reasons for this:

1. At the end of the day, it's just software and software is infinitely reproducible and distributable. I mean we already saw one significant Anthropic leak this year; and

2. China is going to make sure we're not all dependent on one US tech company who "owns" AI. DeepSeek was just the first shot across the bow for that. It's going to be too important to China's national security for that not to happen.

And OpenAI's entire funding is predicated on that happening and OpenAI "winning".

by jmyeet

4/22/2026 at 6:43:50 PM

If you think RAM prices are going to come down, think again.

New pods use 10x as much RAM as previous generation.

by dist-epoch

4/22/2026 at 4:28:01 PM

I'm surprised the interconnect per system is so slow? 6x 200Gb feels barely competitive. Same as last year.

Trainium3 and Maia 200 are 2.5 and 2.8Tb/s vs this 1.2Tb/s. Maia is 6 stacks of HBMe3, so ratio of mem:interconnect bandwidth is really falling behind here. Notably Maia is also, like TPU, high radix.

by jauntywundrkind

4/22/2026 at 4:38:35 PM

Isn't it 6x 200Gb octals? An octal being 8x 200Gb lanes. So 9.6Tbps?

by nsteel

4/23/2026 at 7:44:23 PM

Thanks. Yeah uhhh the table here says 19.2Tb/s scale up per chip??? Uhhhhhhhh. This math is still not mathing for me. But that makes much much more sense.

Google just wildly ahead of literally everyone here.

by jauntywundrkind

4/24/2026 at 10:45:42 AM

19.2Tb is 2 x 9.6Tbps. Some (most) companies count Tx and Rx separately despite it making no sense in the context of serdes lanes. Stupid marketing in my opinion.

I don't think they are meaningfully ahead, it's more to do with what's available at the time. 200/224G is only just coming available this year. The others will have the same in their next product announcements.

by nsteel

4/22/2026 at 2:07:10 PM

They are missing a header to show the transition in discussion from TPU8t to 8i!

Thanks for posting otherwise.

Edit: actually, looks like the header got captured as a figure caption on accident.

by SecretDreams

4/22/2026 at 1:40:03 PM

I can't help but think we will be "laughing" at this in 10 years time like we laugh at steam engines or abacus.

by varispeed

4/22/2026 at 2:31:16 PM

[dead]

by bozdemir

4/22/2026 at 2:00:42 PM

[dead]

by zshn25

4/22/2026 at 2:16:51 PM

yeah but can you release the sdk for the pixel 10? it was one of then only reasons which i bought this mid phone

by nicman23