4/28/2026 at 10:52:32 PM
There's a tradeoff between dense models and MoEs: memory usage vs. compute for the same quality. For example, Qwen3.5 27B and Qwen3.5 122B-A10B have similar average performance across benchmarks. The 122B is much faster to run than the 27B (it generates more tokens for the same compute), while the 27B uses ~4x less VRAM at low context lengths (the gap shrinks at high context lengths). (Rough numbers sketched at the end of this comment.)
Right now, different hardware seems to be suited to different points in the dense vs. MoE balance. On one extreme are devices like the DGX Spark and Strix Halo, which have a lot of memory relative to their compute performance and memory bandwidth and are best suited for MoE workloads. On the other extreme you have cards like the RTX 5090, which have very high performance for the price but rather little memory and are best suited for dense models.
The Arc Pro B70 seems to sit in the awkward middle. With 1-2 of these you can run a ~30B dense model, but slowly, probably not fast enough to be useful interactively (you'd likely need a 5090 or 2x 3090 for that). Or you can run a MoE model at high throughput, but probably without enough quality to support agentic workflows that would actually use that throughput.
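A minimal back-of-envelope sketch of that tradeoff (the quantization factor, bandwidth figure, and per-token cost are illustrative assumptions, not measurements):

    # Back-of-envelope: dense 27B vs. MoE 122B-A10B at the same nominal quality.
    BYTES_PER_PARAM = 0.55  # assume ~4.4-bit quantization on average

    def weight_vram_gb(total_params_b):
        # VRAM for the weights alone, ignoring KV cache and runtime overhead
        return total_params_b * BYTES_PER_PARAM

    def decode_tok_per_s(active_params_b, bandwidth_gb_s):
        # Batch-1 decode is roughly bandwidth-bound: every active weight
        # is read once per generated token.
        return bandwidth_gb_s / (active_params_b * BYTES_PER_PARAM)

    bandwidth = 450  # GB/s, an assumed mid-range GPU
    for name, total_b, active_b in [("dense 27B", 27, 27), ("MoE 122B-A10B", 122, 10)]:
        print(f"{name}: ~{weight_vram_gb(total_b):.0f} GB weights, "
              f"~{decode_tok_per_s(active_b, bandwidth):.0f} tok/s ceiling")

Under those assumptions the dense model needs roughly 4x less VRAM for weights (~15 GB vs. ~67 GB), while the MoE's decode ceiling is roughly 3x higher (~82 vs. ~30 tok/s), which is the tradeoff described above.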
by 2001zhaozhao
4/28/2026 at 11:57:02 PM
DGX Spark is at the compute level of a 5070. Its main issue is low memory bandwidth, i.e. it has quite fast token prefill but awful token generation. Strix Halo is just slow on every metric and used to be a cheap way to get 128GB of unified RAM (now its prices are comparable to the DGX Spark's).
by storus
4/29/2026 at 12:41:45 PM
I have one; this isn't true. The wattage of a 5070 is about 300. The entire Spark unit runs at 200 watts max. In reality it runs like an RTX 5060 with lots of VRAM. Very good for training, okay for inference if you are running batch jobs and don't mind waiting.
by tehologist
4/29/2026 at 7:23:36 PM
DGX Spark actually has the same compute as a 5070 Ti, but its slower RAM and lower TDP bring it down to 5070 territory.
by storus
4/29/2026 at 6:26:10 PM
Strix Halo's TDP is significantly lower. Comparing apples to oranges, really.
by spookie
4/28/2026 at 11:08:41 PM
I am working mostly with image models, so we do a lot of fine-tunes, and the card fits perfectly here. Performance isn't great, but it can just chug along in the background.
by BoredPositron
4/29/2026 at 12:25:18 AM
I still don't see the point of running these models. I'd say they produce plausible garbage, nowhere near the quality of frontier models (when they work). Why can't Intel look beyond this nonsense state of affairs and build something with 1TB of RAM or more?
What I am trying to say is that I have yet to see anything competitive on the market. Cards have very much stalled in the sub-100GB region, and the best these corporations can do is throw out something that runs toy models and forget about it after a week.
by varispeed
4/29/2026 at 4:33:49 AM
What's wrong with Grace Hopper if you want to throw buckets of local memory at a problem?
by AlotOfReading
4/29/2026 at 9:14:57 AM
Most consumer platforms only allow up to 128/256GB of RAM. If you want more, you likely need a data centre platform. This is again a mismatch between where companies think consumers are and the reality. I think e.g. AMD missed the boat with the 9950x3d2 by limiting the memory controller. If it were possible to hook it up with 1TB of consumer DDR5 RAM, that would be something to write home about.
by varispeed
4/30/2026 at 1:41:18 PM
What does Admiral Hopper have to do with this?
by MisterTea
4/29/2026 at 8:07:01 AM
Some people, including myself, loathe Nvidia with the fiery burning passion of a thousand suns, and will put up with whatever nonsense is necessary to run without them.
by MrDrMcCoy
4/29/2026 at 5:46:22 AM
LLMs are memory-bandwidth bound, not compute bound.
by Readerium
4/29/2026 at 8:15:08 AM
LLMs are bound by both; it depends on the hardware which factor dominates.
by AntiUSAbah
4/29/2026 at 6:05:45 PM
Technically true, but if we're talking about local models, you're overwhelmingly going to be bandwidth bound. You need about 2 flops per active parameter per token. An M5 chip has what, 150-200GB/s of bandwidth? But it can easily do something like 16 tflops of fp16, so you're talking roughly 100 flops per byte of bandwidth. Which is just to say that in a batch=1 scenario, i.e. one user, you're only gonna use a few percent of the GPU's compute while you've totally saturated your memory bandwidth. For all practical purposes at the consumer level, take your memory bandwidth, divide by the size of the model, and that gives you the max tok/s throughput you're gonna get. Even a 5090 has something like 50-60 flops per byte of bandwidth; you just can't saturate the compute without running large batches. (At least for decoding; prefill is obviously more compute bound.)
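Quick sketch of that roofline arithmetic (the tflops and bandwidth figures are rough assumptions for the chips mentioned, and the 20 GB model size is just an example):

    # Batch-1 decode roofline: ~2 flops and ~1 byte of weight traffic per
    # active parameter per token (at 8-bit), vs. what the hardware can feed.
    chips = {
        # name: (fp16 tflops, memory bandwidth in GB/s), approximate
        "M5-class SoC": (16, 175),
        "RTX 5090":     (105, 1790),
    }
    model_gb = 20              # e.g. a ~20B-param model at 8-bit
    needed_flops_per_byte = 2  # decode reads each weight once, does ~2 flops

    for name, (tflops, bw) in chips.items():
        available_flops_per_byte = tflops * 1e12 / (bw * 1e9)
        tok_s_ceiling = bw / model_gb  # bandwidth / model size
        compute_util = needed_flops_per_byte / available_flops_per_byte
        print(f"{name}: ~{available_flops_per_byte:.0f} flops/byte available, "
              f"ceiling ~{tok_s_ceiling:.0f} tok/s, compute ~{compute_util:.0%} busy")

With those assumed numbers the ~90-100 flops/byte for the M5-class chip and ~60 for the 5090 fall out directly, and compute utilization sits in the low single digits at batch 1, as described above.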
by joshjob42
4/29/2026 at 6:38:55 AM
This is incorrect; prompt processing is compute bound.
by ondra
4/29/2026 at 7:40:13 AM
This is only true for some parts of the time cost function.
by icelancer