alt.hn

3/14/2026 at 9:36:36 PM

How "Hardwired" AI Will Destroy Nvidia's Empire and Change the World

https://medium.com/@mokrasar/the-last-chip-how-hardwired-ai-will-destroy-nvidias-empire-and-change-the-world-8da20571e706

by amelius

3/14/2026 at 10:13:30 PM

This is still far from being viable for actually useful models, like bigger MoE ones with much larger context windows. I mean, the technology is very promising, just like Cerebras, but we need to see whether they can keep this up with the evolution of the models to come in the next few years. Extremely interesting nevertheless.

by comandillos

3/15/2026 at 1:10:15 AM

Keep in mind, though, that if you can run a model at 100-1000x the speed, then even if the model is less capable, the sheer speed may let you do more interesting things (like deep search explorations with LLM-guided heuristics).
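The kind of search loop being described could look something like this minimal sketch: a best-first search where each `score` call would be one cheap call to a very fast model. Both `score` and `expand` are stubbed placeholders here, not any real API:

```python
import heapq

def score(state: str) -> float:
    # Stand-in for an LLM-estimated "promise" score; lower = better
    # for heapq. Here we just prefer longer (deeper) states.
    return -len(state)

def expand(state: str) -> list[str]:
    # Stand-in for generating candidate next steps with the model.
    return [state + c for c in "ab"] if len(state) < 3 else []

def llm_guided_search(start: str, budget: int) -> list[str]:
    """Best-first search: visit at most `budget` states, always
    expanding the most promising state on the frontier first."""
    frontier = [(score(start), start)]
    visited = []
    while frontier and len(visited) < budget:
        _, state = heapq.heappop(frontier)
        visited.append(state)
        for nxt in expand(state):
            heapq.heappush(frontier, (score(nxt), nxt))
    return visited

print(llm_guided_search("", 5))
```

The point of the 100-1000x speedup is the budget: at thousands of tokens per second, `budget` can be large enough for searches that would be hopeless with a slow scorer.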

by amelius

3/14/2026 at 10:14:13 PM

Is this a paid ad placement? I'm seeing a load of breathless "commentary" on Taalas and next to no serious discussion about whether their approach is even remotely scalable. A one-off tech demo using a comparatively ancient open source model is hardly going to be giving Jensen Huang sleepless nights.

by spzb

3/15/2026 at 6:11:07 PM

Probably being astroturfed by people with a financial interest in it working. The critical commentary in this thread is what to watch for.

by tim-tday

3/15/2026 at 4:34:12 AM

Hmm, isn't manufacturing the elephant in the room here? What am I missing? The HC1 is built on TSMC’s N6 process with an 815 mm² die. TSMC’s capacity is already heavily allocated to major customers such as NVIDIA, AMD, Apple, and Qualcomm.

A startup cannot easily secure large wafer volumes because foundry allocation is typically driven by long-term revenue commitments, and the supply side cannot scale quickly. Building new foundry capacity takes many years: TSMC’s Arizona fab has been under development since 2021 and is still not producing at scale, and Samsung’s Texas fab and Intel’s Ohio project face similarly long timelines. Expanding semiconductor production requires massive construction, EUV equipment from ASML, yield tuning, and specialized workforce training.

Even if demand for hardwired AI chips surged, the manufacturing ecosystem would take close to a decade to respond.

by jnaina

3/15/2026 at 5:59:44 AM

If the hardwired chips are orders of magnitude faster, couldn't they be manufactured on an older process and still be competitive?

by pants2

3/16/2026 at 1:18:52 AM

Older processes would not be feasible due to a hard physics constraint: die size. The weights have to physically fit on the chip. At 6nm, an 8B-parameter model already takes up 815 mm², which is roughly the maximum die size on any process. At 28nm, that same model would require a chip roughly 20x larger in area, which is physically impossible on a single die. So older nodes work fine for very small edge models (think embedded AI, IoT, voice assistants), but anything resembling a capable LLM needs at least N6/N7-class density just to fit.
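For anyone who wants to sanity-check the comment above, here is the back-of-envelope arithmetic, using the thread's own figures as assumptions (the 20x density penalty and ~858 mm² reticle limit are approximations, not official numbers):

```python
# Die-area scaling for a hardwired 8B model, per the figures in this thread.
n6_area_mm2 = 815            # HC1 die size at N6 (from the thread)
density_penalty = 20         # assumed N6 -> 28nm logic-density ratio
area_28nm_mm2 = n6_area_mm2 * density_penalty
reticle_limit_mm2 = 858      # approximate single-exposure reticle limit

print(area_28nm_mm2)                             # 16300 mm²
print(round(area_28nm_mm2 / reticle_limit_mm2))  # ~19 reticle-sized dies
```

In other words, the 28nm version wouldn't just be a bigger die; it would need roughly nineteen maximum-size dies' worth of area, which is why the claim is "physically impossible on a single die."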

Taalas' best-case exit scenario is to get bought out by Intel, AMD, Qualcomm, or Nvidia, or even automotive chip guys like NXP (automotive/robotics offline use will likely be a major area of application for this). If the Taalas HC1 Technology Demonstrator is actually working and producing the results they are publicly claiming, I'm assuming there is a steady stream of visitors from Silicon Valley and elsewhere at their Toronto offices.

by jnaina

3/15/2026 at 12:52:04 AM

The foundation models themselves will be cheap to deploy, but we’ll still need general-purpose inference hardware to work alongside them, converting latent intermediate layers to useful, application-specific concerns. This may level off the demand for GPU/TPU hardware, though, by letting the biggest and most expensive layers move to silicon.

by killbot5000

3/15/2026 at 6:12:05 PM

How specifically would that work? I’ve seen no framework for that happening.

by tim-tday

3/15/2026 at 1:41:38 AM

I speculate that they are hitting the reticle limit for models not much bigger than this. Judging by the size of the chip in their demonstrator for an 8B model, I'm sure they know this already.

To scale this up means splitting large models across multiple chips (layer or tensor parallelism). That gets quite complicated quite quickly, and you'll need really high-bandwidth, low-latency interconnects.
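A toy illustration of the layer-parallel split being described: greedily pack a model's layers onto chips so that no die exceeds a parameter budget. The layer sizes and budget are illustrative, not real Taalas numbers:

```python
def split_layers(layer_params: list[int], per_chip_budget: int) -> list[list[int]]:
    """Greedy pipeline-parallel partition: assign consecutive layers
    to the current chip until the next layer would exceed the budget,
    then start a new chip. Layers stay in order, so each chip only
    needs to forward activations to the next one."""
    chips, current, used = [], [], 0
    for p in layer_params:
        if current and used + p > per_chip_budget:
            chips.append(current)
            current, used = [], 0
        current.append(p)
        used += p
    if current:
        chips.append(current)
    return chips

# e.g. 8 equal transformer blocks of 1B params each, 3B fitting per die:
print(split_layers([1] * 8, 3))  # → [[1, 1, 1], [1, 1, 1], [1, 1]]
```

The partitioning itself is easy; the hard part the comment points at is that every chip boundary now carries the full activation tensor per token, which is where the interconnect bandwidth and latency requirements come from.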

Still a REALLY interesting approach with a ton of potential despite the unstated challenges.

by choilive

3/15/2026 at 8:52:35 AM

What prevents digital holography on DVD writables from performing such computations optically, even if less efficiently?

imagine each layer in the computation consisting of a DVD + a number of (embedding dimension) light sensors and light sources (or perhaps OPA / external cavity laser setups);

instead of N light sources it could be 1 light source and a ferroelectric FLCOS display like the cheap 320 x 240 monochrome high refresh rate displays in the cheap toy projectors from the past

https://github.com/ElectronAsh/FLCOS-Mini-Projector-ESP32

It doesn't sound too crazy, and could permit a low entry cost for a bulky and probably less energy-efficient setup; with updated models you could just burn a new hologram onto a fresh DVD, and people wouldn't be tied to advanced semiconductor manufacturing.

by DoctorOetker

3/14/2026 at 10:01:53 PM

It's crazy. In a few years we will be able to buy Qwen on a chip, doing 10K tokens per second.

by amelius

3/14/2026 at 10:24:18 PM

Yeah, well, it might just come in your new laptop.

by androiddrew

3/14/2026 at 10:55:11 PM

Or your phone.

by bradleyy

3/15/2026 at 1:16:47 AM

Hopefully not vendor locked with pay-per-token licensing.

by amelius

3/14/2026 at 11:44:30 PM

I always thought that once we have the models figured out, getting the meat of it into an FPGA was probably the logical next step. They seem to have skipped that and are directly writing the program as an ASIC (ROM). Pretty wild.

by exabrial

3/15/2026 at 12:26:57 PM

Yes, FPGAs are not sufficiently dense. Because of their programmability, they sacrifice a lot of capacity; the factor is something like 5x-10x.

by amelius

3/14/2026 at 10:21:27 PM

Give me a 120B dense model on one of these and yeah my API use will probably drop.

by androiddrew