alt.hn

4/4/2026 at 3:18:58 PM

Show HN: sllm – Split a GPU node with other developers, unlimited tokens

https://sllm.cloud

by jrandolf

4/7/2026 at 5:59:19 AM

I received an email mentioning that earlier cohorts are canceled.

Apparently, the earlier pricing was better for us as customers: I could opt for a lower price, i.e., $10 per month with a one-month commitment, see how the platform evolves, and then sign up for other models after testing as needed.

I am not sure how long this new cohort will take to fill now. A slightly better option (looking back) would have been to take multiple options from the customer list and start with the one that meets the threshold.

by dockerd

4/7/2026 at 7:36:22 AM

First, thanks for signing up early. It means a lot.

The $10/mo price needed 465 people to fill a cohort before we could turn on a single GPU. People signed up and churned while waiting, so we looked at the reservation pattern and determined 80 slots was optimal. This is reflected in the new price and throughput.

We're considering a 1-week option so people can test it out before committing to a full month. Would that help?

by jrandolf

4/7/2026 at 4:43:33 AM

Thanks to everyone who shared feedback. We’re implementing it now.

Here’s what’s changed:

- We’ve removed the other LLMs for now and are focusing entirely on Qwen 3.5. We’ll bring back additional smaller models later, but most usage was already concentrated on Qwen 3.5.

- Pricing is now around $50. You get roughly 2× the throughput (61 tok/s vs. 31 tok/s, verified in testing), and it’s still unlimited. For context, that’s about 158M tokens per month. Comparable providers like Novita charge around $3.2 per million tokens, so this comes out to roughly 10% of typical token costs.

- Context size is now capped at 32K tokens. For the vast majority of use cases, this is more than sufficient.
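
The arithmetic in the throughput bullet can be checked with a short sketch. It assumes a 30-day month and continuous generation at the quoted 61 tok/s (both are assumptions, not stated by the provider); the function names are illustrative, and the $3.2 per million tokens figure is the one quoted above.

```python
# Rough check of the monthly token ceiling and the effective price per token.
# Assumptions: a 30-day month and 100% utilization at the quoted rate.
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds


def monthly_tokens(tok_per_s: float) -> float:
    """Tokens generated in a month of nonstop output at tok_per_s."""
    return tok_per_s * SECONDS_PER_MONTH


def price_per_million(monthly_price_usd: float, tok_per_s: float) -> float:
    """Effective USD per million tokens for a flat monthly price."""
    return monthly_price_usd / (monthly_tokens(tok_per_s) / 1e6)


tokens = monthly_tokens(61)          # about 158 million
effective = price_per_million(50, 61)  # about $0.32 per million tokens
ratio = effective / 3.2              # about 0.10 of the quoted per-token price

print(f"{tokens / 1e6:.0f}M tokens/month, ${effective:.2f}/M, ratio {ratio:.2f}")
```

The 10% figure only holds if the slot is kept busy around the clock; at lower utilization the effective price per token rises proportionally.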

by jrandolf

4/4/2026 at 5:15:19 PM

This is an excellent idea, but I worry about fairness during resource contention. I don't often need queries, but when I do they're often big and long. I wouldn't want to eat up the whole system when other users need it, but I would also want to have the cluster when I need it. How do you address a case like this?

by freedomben

4/4/2026 at 5:36:01 PM

We implement rate-limiting and queuing to ensure fairness, but if a massive number of people send huge, long queries, then there will be waits. The question is whether people will actually do this; more often than not, users will be idle.

by jrandolf

4/4/2026 at 6:19:05 PM

Rate limit essentially is a token limit

by mogili1

4/4/2026 at 7:28:09 PM

It depends on how it's implemented. If it's a fixed window, then your absolute ceiling is the tokens allowed per window times the number of windows in a month. If it's a function of other usage, like a timeshare, you're still paying some price for a month and you get what you get without paying more per token. There's an intrinsic limit based on how many tokens the model can process on that GPU in a month anyway, even if it's only you.
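
To make the fixed-window case concrete, here is a minimal sketch of a per-key fixed-window token limiter and the monthly ceiling it implies. Nothing in the thread says how sllm's limiter actually works; the class name, parameters, and numbers below are all hypothetical.

```python
import time
from collections import defaultdict
from typing import Optional


class FixedWindowTokenLimiter:
    """Allow each API key at most tokens_per_window tokens per fixed window."""

    def __init__(self, tokens_per_window: int, window_s: int):
        self.tokens_per_window = tokens_per_window
        self.window_s = window_s
        # (key, window index) -> tokens used; old windows are never purged
        # in this sketch, which a real limiter would need to handle.
        self._used = defaultdict(int)

    def allow(self, key: str, tokens: int, now: Optional[float] = None) -> bool:
        """Record the usage and return True if it fits in the current window."""
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window_s))
        if self._used[bucket] + tokens > self.tokens_per_window:
            return False
        self._used[bucket] += tokens
        return True

    def monthly_ceiling(self, days: int = 30) -> int:
        """Hard upper bound on tokens per key: per-window cap times windows."""
        windows = days * 24 * 3600 // self.window_s
        return self.tokens_per_window * windows


# Hypothetical numbers: 1,200 tokens per 60-second window (20 tok/s average).
limiter = FixedWindowTokenLimiter(tokens_per_window=1200, window_s=60)
print(limiter.monthly_ceiling())  # 51840000
```

With a fixed window, "unlimited" still has a computable cap per key; a timeshare-style limiter would replace the constant cap with one that depends on what other users are doing.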

by ibejoeb

4/4/2026 at 10:51:48 PM

Time x capacity is also a limit. There's always a limit.

by delusional

4/4/2026 at 5:48:10 PM

Is there any way to buy into a pool of people with similar usage patterns? Maybe I'm overthinking it, but just wondering

by freedomben

4/4/2026 at 7:34:47 PM

I think it'd be best to pool with people with different patterns, not the same patterns. Perhaps it would be best to pool with people in different timezones, and/or with different work/sleep schedules.

If everyone in a pool uses it during the ~same periods and sleeps during the ~same periods, then the node would oscillate between contention and idle -- every day. This seems largely avoidable.

(Or, darker: Maybe the contention/idle dichotomy is a feature, not a bug. After all, when one has control of $14k/month of hardware that is sitting idle reliably-enough for significant periods every day, then one becomes incentivized to devise a way to sell that idle time for other purposes.)

by ssl-3

4/4/2026 at 11:49:09 PM

This is basically why the big companies can sell subscriptions for cheaper than API costs. First priority can go to API users, lower priority subscription users get slotted in as space/SLO allows, and then sell the remaining idle GPU to batch users and spare training. Oh and geography shift as necessary for different nations working hours.

by vineyardmike

4/4/2026 at 6:46:32 PM

To be fair this is the price you pay for sharing a GPU. Probably good for stuff that doesn't need to be done "now" but that you can just launch and run in the background. I bet some graphs that show when the gpu is most busy could be useful as well

by petterroea

4/4/2026 at 6:55:57 PM

This problem sounds like an excellent opportunity. We need a race to the bottom for hosting LLMs to democratize the tech and lower costs. I cheer on anyone who figures this out.

by pokstad

4/5/2026 at 2:16:57 AM

This is classic queuing theory, rate limits etc. I don't have an answer but I would look there.

by mememememememo

4/5/2026 at 6:57:33 AM

What if you could group multiple of them? Long queries run on the group that’s commonly doing those. Shorter queries queue faster because they’ll execute faster.

by taraindara

4/4/2026 at 9:21:06 PM

Ultimately the most sensible way of handling this is you end up with "surge pricing" for the highest-priority tokens whenever the inference platform is congested, over and above the base subscription (but perhaps ultimately making the subscription a bit cheaper).

by zozbot234

4/4/2026 at 8:59:14 PM

Also, cache ejection during contention will degrade everyone's service.

I question whether they actually understand LLMs at scale.

by cyanydeez

4/4/2026 at 9:07:53 PM

I suppose it's meant to be a "minimum viable" third-party inference platform, where you're literally selling subscription-based access (i.e. fixed price, not PAYGO by token) to a single GPU cluster, and then only once enough users subscribe to make it viable (which is very nice of them; it works like a Kickstarter/group coupon model and creates a guaranteed win-win for the users). But they could easily expand to more than just the minimum cluster size, which would somewhat improve efficiency. (Deepseek themselves scale out their model over huge numbers of GPUs, which is how they manage to price their tokens quite cheap.)

by zozbot234

4/4/2026 at 6:33:07 PM

> How does billing work?

> When you join a cohort, your card is saved but not charged until the cohort fills. Stripe holds your card information — we never store it. Once the cohort fills, you are charged and receive an API key for the duration of the cohort.

Have any cohorts filled yet?

I’m interested in joining one, but only if it’s reasonable to assume that the cohort will be full within the next 7 days or so. (Especially because in a little over a week I’m attending an LLM-centered hackathon where we can either use AWS LLM credits provided by the organizer, or we can use providers of our own choosing, and I’d rather use either yours or my own hardware running vLLM than the LLM offerings and APIs from AWS.)

I’d be pretty annoyed if I join a cohort and then it takes like 3 months before the cohort has filled and I can begin to use it. By then I will probably have forgotten all about it and not have time to make use of the API key I am paying you for.

by QuantumNomad_

4/4/2026 at 8:03:02 PM

No cohorts have been filled yet. We're still early. We are seeing reservations pick up quickly, but I'd be able to give you a more concrete estimate of fill velocity after about a week.

That said, we're planning to add a 7-day window: if a cohort doesn't fill within 7 days of your reservation, it cancels automatically and your card is released. We don't want anyone's payment method sitting in limbo indefinitely.

by jrandolf

4/5/2026 at 1:36:27 AM

This is a fantastic idea.

On a nonzero number of occasions I have priced the cost of running an inference server with a model that is actually usable and the annual cost is astronomical.

by tcdent

4/4/2026 at 7:01:36 PM

I read the FAQ, and I can't imagine this is going to work the way you want it to. It fundamentally doesn't make sense as a business model.

I can sign up for a cohort today, but there's not even a hint of how long it will take the cohort to fill up. The most subscribed cohort is only at 42% (and dropping), so maybe days to weeks? That's a long time to wait if you have a use case to satisfy.

And then the cohort expires, and I have to sign up for another one and play the waiting game again? Nobody wants that level of unreliability.

Also, don't say "15-25 tok/s". That is a min-max figure, but your FAQ says that this is actually a maximum. It makes no sense to measure a maximum as a range, and you state no minimum so I can only assume that it is 0 tok/s. If all users in the cohort use it simultaneously, the best they're getting is something like 1.5 tok/s (probably less), which is abysmal.

You mention "optimization", but I have no idea what that means. It certainly doesn't mean imposing token limits, because your FAQ says that won't happen. If more than 25 users are using the cohort simultaneously, it is a physical impossibility to improve performance to the levels you advertise without sacrificing something else, like switching to a smaller model, which would essentially be fraud, or adding more GPUs which will bankrupt you at these margins. With 465 users per cohort, a large chunk of whom will be using tools like OpenClaw, nobody will ever see the performance you are offering.

The issue here is you are trying to offer affordable AI GPU nodes without operating at a loss. The entire AI industry is operating at a loss right now because of how expensive this all is. This strategy literally won't work right now unless you start courting VCs to invest tens to hundreds of millions of dollars so you can get this off the ground by operating at a loss until hopefully you turn a profit at some point in the future, but at that point developers will probably be able to run these models at home without your help.

by RIMR

4/4/2026 at 9:02:23 PM

Going on ChatGPT.com and using their AI for 24 hours doesn't mean you are actually using their LLM for 24 hours. It's only live for as long as the output is being generated. You reading, waiting for tool calls, etc. don't count toward concurrency. Factor in time-zones, lunch times, etc...it's more likely that we'd have an underutilization problem.

For filling up the cohorts, I agree and we're launching for a week to gather feedback.

by jrandolf

4/4/2026 at 8:20:31 PM

> Running DeepSeek V3 (685B) requires 8×H100 GPUs which is about $14k/month. Most developers only need 15-25 tok/s.

> deepseek-v3.2-685b, $40/mo/slot for ~20 tok/s, 465 slots total

> 465 users × 20 tok/s = 9,300 tok/s needed

> The node peaks at ~3,000 tok/s total. So at full capacity they can really only serve:

> 3,000 ÷ 20 = 150 concurrent users at 20 tok/s

> That's only 32% of the cohort being active simultaneously.
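
The quoted numbers can be reproduced with a few lines. This sketch assumes the node's ~3,000 tok/s peak is split evenly among active users, which is a simplification (batched throughput does not scale perfectly linearly); the constant names are illustrative.

```python
# Oversubscription check using the figures quoted above.
SLOTS = 465             # users per cohort
PER_USER_TOK_S = 20     # advertised per-slot rate
NODE_PEAK_TOK_S = 3000  # quoted peak throughput of the node

demand_if_all_active = SLOTS * PER_USER_TOK_S               # 9300 tok/s
max_users_at_full_rate = NODE_PEAK_TOK_S // PER_USER_TOK_S  # 150 users
active_fraction = max_users_at_full_rate / SLOTS            # about 0.32
rate_if_all_active = NODE_PEAK_TOK_S / SLOTS                # about 6.5 tok/s

print(demand_if_all_active, max_users_at_full_rate)
print(f"{active_fraction:.0%} of the cohort can run at full rate")
print(f"{rate_if_all_active:.2f} tok/s each if everyone is active")
```

So the advertised rate holds as long as no more than roughly a third of the cohort is generating at the same moment.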

by MuffinFlavored

4/4/2026 at 8:22:39 PM

People work 8 hours a day presumably, I guess they are banking on this idea

by artificialprint

4/5/2026 at 3:27:45 AM

only works if the users are evenly distributed around the globe (which is likely more or less the case). if the users concentrate in one country, the token rate will be terrible.

by ycui1986

4/4/2026 at 4:55:35 PM

This is a great idea! I saw a similar (inverse) idea the other day for pooling compute (https://github.com/michaelneale/mesh-llm). What are you doing for compute in the backend? Are you locked into a cohort from month to month?

by mmargenot

4/4/2026 at 5:08:40 PM

How is the time sharing handled? I assume if I submit a unit of work it will load to VRAM and then run (sharing time? how many work units can run in parallel?)

How large is a full context window in MiB and how long does it take to load the buffer? I.e. how many seconds should I expect my worst case wait time to take until I get my first token?

by kaoD

4/4/2026 at 5:26:44 PM

vLLM handles GPU scheduling, not sllm. The model weights stay resident in VRAM permanently so there's no loading/unloading per request. vLLM uses continuous batching, so incoming requests are dynamically added to the running batch every decode step and the GPU is always working on multiple requests simultaneously. There is no "load to VRAM and run" per request; it's more like joining an already-running batch.
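
For readers unfamiliar with continuous batching, here is a toy simulation of the idea. It is not vLLM's scheduler, only a sketch under simplifying assumptions: every running request produces exactly one token per decode step, the batch has a fixed slot limit, and prefill cost and KV-cache memory are ignored. All names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    name: str
    tokens_left: int  # decode steps still needed (assumed >= 1)


def simulate(arrivals: Dict[int, List[Request]], max_batch: int) -> Dict[str, int]:
    """Return the decode step count at which each request finishes."""
    waiting = deque()
    running = []
    finished = {}
    pending = sum(len(reqs) for reqs in arrivals.values())
    step = 0
    while pending or waiting or running:
        # Requests arriving at this step join the wait queue.
        for req in arrivals.get(step, []):
            waiting.append(req)
            pending -= 1
        # Continuous batching: fill free slots before every decode step,
        # instead of waiting for the whole batch to drain.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step: each running request emits one token.
        for req in running:
            req.tokens_left -= 1
        step += 1
        for req in running:
            if req.tokens_left == 0:
                finished[req.name] = step
        running = [r for r in running if r.tokens_left > 0]
    return finished


result = simulate(
    {0: [Request("A", 3), Request("B", 5)], 1: [Request("C", 2)]},
    max_batch=2,
)
print(result)  # {'A': 3, 'B': 5, 'C': 5}
```

In this toy run, C takes over A's slot as soon as A finishes at step 3 and completes at step 5. If the batch had to drain completely before admitting new work, C could not start until step 5 and would finish at step 7.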

TTFT is under 2 seconds average. Worst case is 10-30s.

by jrandolf

4/4/2026 at 7:15:46 PM

> The model weights stay resident in VRAM permanently so there's no loading/unloading per request.

Yes, I was thinking about context buffers, which I assume are not small in large models. That has to be loaded into VRAM, right?

If I keep sending large context buffers, will that hog the batches?

by kaoD

4/4/2026 at 8:14:08 PM

Not if you are the only one. We have rate limits to prevent this in case, idk, you share your key with 1000 people lol.

by jrandolf

4/4/2026 at 5:22:40 PM

> how many work units can run in parallel

not original author but batching is one very important trick to make inference efficient, you can reasonably do tens to low hundreds in parallel (depending on model size and gpu size) with very little performance overhead

by ninjha

4/4/2026 at 5:04:57 PM

What a brilliant idea!

Split a "it needs to run in a datacenter because its hardware requirements are so large" AI/LLM across multiple people who each want shared access to that particular model.

Sort of like the Real Estate equivalent of subletting, or splitting a larger space into smaller spaces and subletting each one...

Or, like the Web Host equivalent of splitting a single server into multiple virtual machines for shared hosting by multiple other parties, or what-have-you...

I could definitely see marketplaces similar to this, popping up in the future!

It seems like it should make AI cheaper for everyone... that is, "democratize AI"... in a "more/better/faster/cheaper" way than AI has been democratized to date...

Anyway, it's a brilliant idea!

Wishing you a lot of luck with this endeavor!

by peter_d_sherman

4/4/2026 at 8:21:26 PM

Didn't make sense to launch multiple 10 and 40 bucks subscriptions right at the start, because now they are competing with each other.

Also mobile version is a bit broken, but good idea and good luck!

by artificialprint

4/4/2026 at 8:25:09 PM

I'm feeling it Mr. Crabs.

by jrandolf

4/4/2026 at 5:25:09 PM

$40/mo for deepseek r1 seems steep compared to a pro sub on open ai /claude unless you run 24x7. I'm not sure how sharing is making this affordable.

by varunr89

4/4/2026 at 5:38:42 PM

> $40/mo for deepseek r1 seems steep compared to a pro sub on open ai /claude unless you run 24x7.

"Running 24x7" is what people want to do with openclaw.

by lelanthran

4/5/2026 at 7:58:58 AM

Seems like they have a rate limit so it is kinda the same as normal subs - don’t really see the advantage yet

by mrklol

4/5/2026 at 3:27:14 PM

> Seems like they have a rate limit so it is kinda the same as normal subs - don’t really see the advantage yet

It's not really the same "limit", AIUI.

SLLM: Being capped to the rate would make your openclaw run slowly, but still be able to work 24x7.

Normal Subs: Hitting the limit means your openclaw doesn't run at all for hours.

by lelanthran

4/5/2026 at 2:24:03 PM

Presumably the rate limit is much higher

by wongarsu

4/5/2026 at 2:18:49 AM

Yes, you don't choose this for the price, but because you want to control your dependencies.

by mememememememo

4/4/2026 at 5:26:22 PM

This is the most "Prompted ourselves a Shadcn UI" page I've seen in a while lol

I dig the idea! I'm curious where the costs will land with actual use.

by Lalabadie

4/4/2026 at 5:27:36 PM

Thanks lol. I actually like Shadcn's style. It's sad that people view it as AI now.

by jrandolf

4/7/2026 at 6:03:09 AM

As expected, this was a scam just to get email addresses.

by sunsation

4/7/2026 at 7:16:17 AM

We collect emails to notify you when the cohort fills or any important information such as cancellation. No one's selling your email.

Also, please read https://news.ycombinator.com/newsguidelines.html. HN is a community for thoughtful discussion.

by jrandolf

4/5/2026 at 10:34:20 AM

Especially with only 1mo commitment, what happens if there's a lot of churn after the first month – more people leave a cohort than are waiting for one? The whole cohort is then waiting for it to fill again before it restarts? And will people waiting for the next cohort to fill automatically be reassigned to the last (now not full) one anyway, or would there then be multiple partially filled cohorts for a single spec?

I like the idea, I just wouldn't want my subscription to suddenly be on hold because a peer decided to stop theirs.

by OJFord

4/4/2026 at 7:13:25 PM

Interesting direction. One adjacent pattern we've been working on is a bit less about partitioning a shared node for more tokens, and more about letting developers keep a local workflow while attaching to an existing remote GPU via a share link / CLI / VS Code path. In labs and small teams we've found the pain is often not just allocation, but getting access into the everyday workflow without moving code + environment into a full remote VM flow. Curious whether your users mostly want higher GPU utilization, or whether they also want workflow portability from laptops and homelabs. I'm involved with GPUGo / TensorFusion, so that's the lens I'm looking through.

by tensor-fusion

4/4/2026 at 5:00:48 PM

1. Is the given tok/s estimate for the total node throughput, or is it what you can realistically expect to get? Or is it the worst case scenario throughput if everyone starts to use it simultaneously?

2. What if I try to hog all resources of a node by running some large data processing and making multiple queries in parallel? What if I try to resell the access by charging per token?

Edit: sorry if this comment sounds overly critical. I think that pooling money with other developers to collectively rent a server for LLM inference is a really cool idea. I also thought about it, but haven't found a satisfactory answer to my question number 2, so I decided that it is infeasible in practice.

by vova_hn2

4/4/2026 at 5:12:27 PM

1. It's an average. 2. We have a sophisticated rate limiter.

by jrandolf

4/4/2026 at 6:35:52 PM

Does it take user time zones into account?

by poly2it

4/4/2026 at 6:44:56 PM

Yes

by jrandolf

4/5/2026 at 11:56:59 PM

1 week or even 1 day windows would be great, especially just to test it at this early stage

by pgbouncer

4/6/2026 at 12:12:52 AM

There are a number of on-demand providers of GPU compute out there. It is relatively straightforward to run inference on them.

I've got a box of 8x MI300x sitting around waiting for stuff like this.

$128...

8x MI300X Bare Metal - $15.94/hour (1 available) CPU: Xeon Platinum 8462Y+ (64 cores) • Memory: 2.0 TiB • Disk: 124 TB • Minimum Reservation: 8 hours

ssh admin.hotaisle.app

(i'm ceo)

by latchkey

4/4/2026 at 6:52:57 PM

Do you own the GPUs or are you multiplexing on a 3rd party GPU cloud?

by p_m_c

4/4/2026 at 8:42:00 PM

Multiplexing on a GPU cloud.

by jrandolf

4/5/2026 at 8:38:02 AM

what is the main moat of your idea? privacy? otherwise it looks like a less flexible API compared to what chutes.ai or openrouter.ai provide. and they have TEE instances, which are more private. also why did you decide on launching V3 instead of some much more exciting models revealed recently like MiMo-V2-Pro or Arcee's Trinity Large?

by dreamdayin9

4/7/2026 at 5:53:12 AM

You're right that we're less flexible than OpenRouter or Chutes. We don't let you hop between models per-request. If you want that, use those. If you want predictable cost and guaranteed throughput on one model, that's us.

On TEE: yeah, it's stronger, but it also adds cost and latency. We run dedicated hardware with no prompt logging and an isolated proxy. For most people who just don't want their data in someone's training set, that's enough. If your threat model is more serious than that, we're not the right choice.

On models: we are focusing on Qwen for now. We add based on demand. Would you actually use MiMo-V2-Pro or Trinity if we had them?

by jrandolf

4/5/2026 at 7:48:21 AM

> Prices start at $5/mo for smaller models.

Is there actually any $5/mo offering? It seems like the cheapest models start at $10.

by yoavm

4/4/2026 at 8:14:53 PM

This is great, thanks!

I personally would like something like this but with "regular" GPU access. Some people still use them for something other than LLMs ^^.

by moralestapia

4/4/2026 at 8:42:39 PM

There is vast.ai!

by jrandolf

4/4/2026 at 9:38:07 PM

Wow!

I recall hearing about them years ago.

Good to see they're thriving!

by moralestapia

4/5/2026 at 5:25:38 PM

hotaisle.xyz has amd mi300x VMs for $1.99/gpu/hr. on-demand, billed by the minute.

(i'm the ceo)

by latchkey

4/4/2026 at 5:11:16 PM

25 t/s is barely usable. Maybe for a background runner

by singpolyma3

4/4/2026 at 5:33:40 PM

> 25 t/s is barely usable. Maybe for a background runner

That's over 1000 words/min if you were typing. If 1000 words/min is too slow for your use-case, then perhaps $5/m is just not for you.

I kinda like the idea of paying $5/m for unlimited usage at the specified speed.

It beats a 10x higher speed that hits daily restrictions in about 2 hours, and weekly restrictions in 3 days.

by lelanthran

4/4/2026 at 6:48:15 PM

Sure, if it was just a matter of typing. But in practice it means sitting and staring for minutes at nothing happening with a "thinking" until something finally happens.

I mean my local 122b is only 20t/s so for background stuff it can be used for that. But not for anything interactive IME.

by singpolyma3

4/4/2026 at 9:32:31 PM

> I mean my local 122b is only 20t/s so for background stuff it can be used for that. But not for anything interactive IME.

What are you running that local 122b on? I mean, this looks attractive to me for $5/m running unlimited at 20t/s-25t/s, but if I could buy hardware to get that running locally, I don't mind doing so.

by lelanthran

4/4/2026 at 11:19:24 PM

Framework desktop

by singpolyma3

4/4/2026 at 5:04:21 PM

It seems crazy to me that the "Join" button does not have a price on it and yet clicking it simply forwards you to a Stripe page again with no price information on it. How am I supposed to know how much I'm about to be charged?

by spuz

4/4/2026 at 5:09:01 PM

That was an error on our part lol. We'll update with the price.

by jrandolf

4/4/2026 at 10:09:25 PM

Interesting. There's a trickle of low-intensity jobs one can always keep running, but GLM's own plan is $30/mo at something like 300 tps. Now, I know that one is subsidized, but still.

by avereveard

4/5/2026 at 6:45:55 AM

Once you're in a cohort how do you actually use it?

by rendaw

4/5/2026 at 5:14:07 PM

You get an API key

by jrandolf

4/4/2026 at 7:16:47 PM

Pretty cool idea, but what's the stack behind this? 15-25 tok/s seems a bit low, as the expected SoA for most providers is around 60 tok/s, and quality of life dramatically improves above that.

by scottcha

4/7/2026 at 4:49:50 AM

15-25 was a rate based on oversubscription. Now it's 60 like others :).

by jrandolf

4/4/2026 at 10:52:36 PM

Interesting concept. One thing I’m curious about: if I’m in a cohort for something like DeepSeek V3 and another user spins up a heavy 24/7 job, how do you keep TTFT from degrading? vLLM’s continuous batching helps, but there’s still a physical limit with shared VRAM/compute. I’ve been grappling with this exact 'noisy neighbor' issue while building Runfra. We actually ended up moving toward a credit per task model on idle GPUs specifically to avoid that resource contention entirely.

Curious how you’re thinking about isolation here. Is there any hard guarantee on a 'slice' of the GPU, or is it mostly just handled by the vLLM scheduler?

by spencer9714

4/4/2026 at 5:08:59 PM

Is this not a more restricted version of OpenRouter? With OpenRouter you pay for credits that can be used to run any commercial or open-source model and you only pay for what you use.

by spuz

4/4/2026 at 5:14:40 PM

OpenRouter is a little different. We are trying to experiment with maximizing a single GPU cluster.

by jrandolf

4/4/2026 at 7:32:51 PM

Can you explain the benefits over something like openrouter?

by IanCal

4/4/2026 at 7:41:08 PM

24/7 LLM for $10/month.

by jrandolf

4/4/2026 at 9:28:36 PM

Isn't this a bad deal? Or is there an error in my math?

For $40, I'd get 20 tok/s * 2.6M seconds per month = 52M tokens of DeepSeek v3.2 per month if I run it 24/7, which is not realistic for most workloads.

On OpenRouter [1], $40 buys 105M tokens from the same model, which is more than 52M tokens, and I can freely choose when to use them.

[1]: https://openrouter.ai/deepseek/deepseek-v3.2
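
The comparison can be written out as a quick sketch. It uses only the figures from this comment (20 tok/s, 2.6M seconds per month, 105M tokens for $40 on OpenRouter); the 105M figure depends on the input/output token mix, so treat it as the commenter's estimate rather than a fixed price.

```python
# Flat-rate slot vs. pay-as-you-go, using the figures quoted in the comment.
SECONDS_PER_MONTH = 2.6e6  # rounded figure used above
FLAT_PRICE_USD = 40
FLAT_TOK_S = 20            # advertised average rate
PAYG_TOKENS_FOR_40 = 105e6  # commenter's estimate for $40 of credits

flat_tokens = FLAT_TOK_S * SECONDS_PER_MONTH           # 52M at 24/7 usage
payg_price_per_m = FLAT_PRICE_USD / (PAYG_TOKENS_FOR_40 / 1e6)
flat_price_per_m = FLAT_PRICE_USD / (flat_tokens / 1e6)

# Average rate the slot would need, running 24/7, to match pay-as-you-go.
break_even_tok_s = PAYG_TOKENS_FOR_40 / SECONDS_PER_MONTH

print(f"flat: {flat_tokens / 1e6:.0f}M tokens, ${flat_price_per_m:.2f}/M")
print(f"pay-as-you-go: ${payg_price_per_m:.2f}/M")
print(f"break-even average rate: {break_even_tok_s:.1f} tok/s")
```

Under these numbers the slot would need to average about 40 tok/s around the clock to match what $40 of credits buys, which is twice the advertised average.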

by johndough

4/4/2026 at 9:35:12 PM

20 tok/s is an average. It can be more, it can be less. If you are running off-peak I'm sure you'd get some crazy number.

by jrandolf

4/5/2026 at 2:05:26 PM

That doesn’t matter when you have the average. Even if you are somehow able to get 10000tok/s during off peak times, by virtue of how averages work, you’re still only getting 52M tokens per month (as calculated above).

by KMnO4

4/5/2026 at 3:46:53 AM

Why wouldn't developers just do llm arbitrage against openrouter if it is a better deal?

by gravypod

4/5/2026 at 3:56:31 AM

The problem is different. OpenRouter is a router to LLMs. It doesn't solve GPU underutilization.

by jrandolf

4/5/2026 at 5:11:51 AM

What I am saying is if your system lets me pay $x/token and open router lets me pay $y/token if x<y then someone could make money just by providing those tokens through the open router API. That would either drive up demand for your systems increasing costs or drive up supply on open router decreasing costs. Eventually the costs would converge, no?

by gravypod

4/5/2026 at 7:05:35 AM

For the same reason people don’t do server arbitrage because Hetzner is cheaper than AWS.

by victorbjorklund

4/5/2026 at 4:51:19 AM

> nobody is charged until the cohort fills

So then what happens if some people's payment method fails once you do charge?

by wavemode

4/5/2026 at 5:46:25 AM

> So then what happens if some people's payment method fails once you do charge?

I expect it's a pre-auth, like car rental companies do; a pre-auth gives you a code from the card issuer and an expiry. The issuer will reserve the amount on the cardholder's account, and only perform the transaction to the merchant once the merchant sends a second message with the pre-auth code.

by lelanthran

4/4/2026 at 5:02:00 PM

Like vast.ai and TensorDock, and presumably others.

by esafak

4/4/2026 at 11:11:39 PM

So shared hosting for LLMs?

by bluerooibos

4/7/2026 at 5:55:39 AM

Yes.

by jrandolf

4/6/2026 at 2:41:23 PM

[dead]

by rpdaiml

4/5/2026 at 2:00:33 PM

[dead]

by adamsilvacons

4/4/2026 at 10:02:39 PM

[dead]

by aplomb1026

4/4/2026 at 7:17:31 PM

[dead]

by sacrelege

4/4/2026 at 9:21:41 PM

[dead]

by maxbeech

4/4/2026 at 7:10:02 PM

[flagged]

by aritzdf

4/4/2026 at 9:16:23 PM

[flagged]

by trvz

4/4/2026 at 9:23:50 PM

There's a big difference between non-compliant, illegal, and criminal.

by copperx

4/5/2026 at 8:47:26 AM

[flagged]

by trick-or-treat

4/7/2026 at 6:01:59 AM

The audience here is developers buying API access. They want to see the model, the price, and the throughput, not a hero image and three paragraphs about our mission. Marketing copy between a developer and that information is friction.

by jrandolf

4/5/2026 at 1:58:33 PM

For my part, the code quality of the next.js dashboard isn't even something I'd evaluate.

I instantly get a quick, functional-appearing view of the offering. I can picture how I might interact with it and what mental gymnastics stand between right now and me pulling out a credit card.

I don't see marketing fluff that brings up more questions than it provides answers. I can be pretty certain I won't wind up in a sales funnel from hell, also.

by tapvt

4/7/2026 at 2:01:45 AM

Right but without at least the effort of a sales funnel I have zero confidence that this guy has thought anything through.

This is a single afternoon vibe-coded project by all indications.

by trick-or-treat

4/5/2026 at 12:21:13 PM

“Unlimited tokens” is doing a lot of heavy lifting here.

This feels less like a pricing breakthrough and more like shifting the abstraction down to GPU sharing — which most developers probably don’t want to think about.

Curious how usable this actually feels under contention.

by calvinsun1102

4/5/2026 at 12:34:27 PM

This comment reeks of AI.

Out of your other three comments in your entire account’s history, two of them are pretty structurally identical: quote hook + tangentially related question.

What is the ultimate play for all these AI accounts? Warming them up for future astroturfing and marketing? Manipulating upvotes?

by aerhardt