5/12/2026 at 4:58:14 PM
The fact that management signed off on measuring AI use through token usage shows how incompetent management really is, even in allegedly technical companies like Amazon. Tokenmaxxing was an entirely expected and rational response. IOW: you measure employees in stupid ways, you're going to get stupid behaviour as a consequence.
by i7l
5/12/2026 at 5:31:32 PM
One argument I have heard in favour of this is that management knew this would be a side effect, but that it's more important to have people engage with AI as much as possible simply to explore what is actually possible. You are effectively knowingly wasting money in the expectation that you might learn something useful that will be more valuable in the long run.
by this_user
5/12/2026 at 9:17:02 PM
If companies are suddenly willing to spend money on letting their staff experiment, why not let them experiment with what they want to? They probably know more about technology than you do, otherwise you wouldn't need them.
by oytis
5/12/2026 at 6:42:24 PM
In this instance, it seems like Amazon employees are wasting money exploring ways to waste money.
by aerodexis
5/12/2026 at 6:08:28 PM
My questions for that approach are: Why treat AI as a special technology that needs enterprise-scale exploration to come up with a useful application? And why not take the alternative approach of identifying the subset of people who have indeed found solid uses and spread their best practices around?
The top-down approach to encouraging (mandating?) AI usage strikes me as infantilizing to the workers, who are perfectly capable of choosing which tools they use and when.
by the_snooze
5/12/2026 at 6:18:53 PM
Human nature? In the early nineties, it was common for experienced electrical engineers to keep on using schematic-entry digital design and look down on RTL and synthesis tools, despite the fact that the latter was already way more productive. At some point, management had to put their foot down and force everyone to switch to using synthesis.
It's not unreasonable to assume that many people are set in their ways and unwilling to change their behavior without a bit of a push.
by tverbeure
5/12/2026 at 9:26:42 PM
Alpha 21064, 1992, was using domino logic [1].
[1] https://en.wikipedia.org/wiki/Domino_logic
There were no synthesis algorithms that could map VHDL or Verilog designs onto domino logic elements at the time. I believe most of the work in the synthesis-to-domino-logic area was done at the beginning of the current century.
So DEC's engineers and, I think, Intel's engineers were doing work using schematics well into the 21st century.
by thesz
5/13/2026 at 2:54:01 AM
In the early nineties, standard cell design was already used by everyone except those who needed clock speed at all costs.
by tverbeure
5/12/2026 at 6:27:45 PM
I guess the only difference between this and your example is the concrete efficiency gain from RTL and synthesis tools versus dubious applications of AI. I do agree with the second point about pushing people to explore new ways of doing things though.
by nophunphil
5/12/2026 at 6:55:05 PM
> dubious applications of AI
Leaving aside the ethical aspects of using AI (not because they're not valid, but because they're off topic for this discussion), in my line of work, the capabilities and productivity improvement of AI are staggering. Most of it is not writing the new code, which is but a small part of chip design, but everything else.
I can't give a concrete work example, but here is an experiment that I ran a month ago. https://tomverbeure.github.io/2026/04/12/AMIQ-License-Key-Ge.... If it can do that, it's not hard to imagine similar use cases related to root-causing complex simulation failures. It is frighteningly good at that.
by tverbeure
5/12/2026 at 7:19:08 PM
> use cases related to root causing complex simulation failures.
That's a pretty interesting use case. I assume this is for RTL simulation given the thread, but how do you connect the output of the simulator to the AI?
by ua709
5/12/2026 at 7:50:59 PM
For a small case, a colleague took a screenshot of waves in the waveform viewer and pasted it into the AI tool. It worked.
But for large cases, use tools to extract all interfaces from the waveform file and save them as a text file, or add $display statements in the Verilog itself to dump the transactions. A SOTA LLM will eat it up. You point it to the RTL and a log file with hundreds of thousands of lines, give it a few lines explaining how it is supposed to behave, and just tell it "My simulation is hanging. Figure out why." Wait 15 minutes and it will tell you why it hangs and which line to change in your code to fix it.
I've done the experiment after the fact: I had spent ~3 days fixing 3 complicated bugs. I then rolled back the code and told it "Here is the spec. Find all the bugs in this code". It found all 3 bugs in around 30 minutes. That's when I realized that things won't be the same anymore. (And don't get me wrong: I love debugging simulations.)
by tverbeure
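tverbeure's extract-and-prompt workflow above can be sketched roughly like this. It is a minimal illustration, not the actual tooling: the log format, the regex, and both helper functions are invented for the example.

```python
import re

# Invented log format: "<time> ns: <interface> <details>", e.g. produced
# by $display statements added to the Verilog testbench.
TRANSACTION_RE = re.compile(r"^\s*(\d+)\s*ns:\s*(\w+)\s+(.*)$")

def extract_transactions(log_text, keep_last=200):
    """Pull transaction lines out of a huge simulation log, keeping only
    the tail leading up to the hang so the prompt stays manageable."""
    txns = []
    for line in log_text.splitlines():
        m = TRANSACTION_RE.match(line)
        if m:
            time_ns, iface, detail = m.groups()
            txns.append((int(time_ns), iface, detail))
    return txns[-keep_last:]

def build_prompt(spec_note, transactions):
    """Assemble a prompt: a short spec note, the transaction tail, and
    the direct question from the comment above."""
    lines = [f"{t} ns {iface}: {detail}" for t, iface, detail in transactions]
    return (
        f"Spec: {spec_note}\n"
        "Transaction log tail:\n" + "\n".join(lines) + "\n"
        "My simulation is hanging. Figure out why."
    )

log = "10 ns: axi_aw addr=0x10\nINFO: elaboration done\n20 ns: axi_w data=0xdead\n"
print(build_prompt("AXI4 write: aw, w, then b response", extract_transactions(log)))
```

The real value, per the comment, is in what the model does with the RTL plus this log; the sketch only shows the cheap preprocessing step.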
5/12/2026 at 8:07:31 PM
This is why I asked:
> And why not take the alternative approach of identifying the subset of people who have indeed found solid uses and spread their best practices around?
A bottom-up approach has a far better chance of finding those particularly good use cases, and if you lean on the people who found those fits, they're more persuasive than top-down edicts. They actually know what they're talking about. If the point is to leverage AI for better work outcomes, someone with your experience is far more valuable than "here's a dashboard, make the number go up," which seems to be what's going on at Amazon.
by the_snooze
5/12/2026 at 8:30:00 PM
How do you know up front who will find the best use cases? Both approaches can work.
by tverbeure
5/12/2026 at 9:50:10 PM
I'd bet my life savings that the person who is forced to use a tool by top-down edict is less likely to find a valuable use case than the person who is sincerely curious about said tool.
by judahmeek
5/12/2026 at 10:17:01 PM
Your mistake is thinking that everyone who doesn't use it is avoiding it because they're not curious about it.
by tverbeure
5/12/2026 at 9:33:10 PM
Have you tried changing your HDL to something more modern like Bluespec SystemVerilog or, god forbid, anything embedded in Haskell or Scala?
I read that BSV source code is about three times shorter than a similar design in Verilog and also has three times lower defect density (defects per significant line of code). So just by changing the HDL from Verilog to BSV one can have nine (9) times fewer defects in the design.
by thesz
5/12/2026 at 10:19:18 PM
BSV won't help with corner cases you didn't think about. (I use SpinalHDL/Scala for all my hobby projects, BTW, and yes, I tend to make fewer mistakes.)
by tverbeure
5/12/2026 at 7:59:51 PM
SOTA = State of the Art? Like, say, Claude Opus 4.5? I actually want to try this out.
by ua709
5/12/2026 at 8:08:08 PM
I think I used Opus 4.6 1M.
by tverbeure
5/12/2026 at 8:10:57 PM
Thanks! I'm going to give this a shot on a nasty simulation I'm presently working on... :)
by ua709
5/12/2026 at 7:14:17 PM
It is completely unreasonable to assume that. Tech people are so hungry for productivity gains that they regularly will defy management forbidding them from using a tool, because the tool is so good they feel they have to have it.
If LLMs truly are as good as their proponents say, engineers will use them even if management outright forbade it. The fact that people aren't using them, and have to be forced, is extremely strong evidence that they are not in fact that useful.
by bigstrat2003
5/12/2026 at 7:53:10 PM
> extremely strong evidence that they are not in fact that useful
See my other reply in this subthread. For my line of work, they are in fact ridiculously useful.
by tverbeure
5/12/2026 at 8:01:16 PM
> It's not unreasonable to assume that many people are set in their ways and unwilling to change their behavior without a bit of a push.
You include those people only in a second round, along with guidelines and recommendations on how to use it effectively.
by watwut
5/12/2026 at 8:18:29 PM
What if those people are some of the most experienced ones, who can see use cases, and flaws, that more junior people won't?
by tverbeure
5/12/2026 at 9:58:17 PM
People who have to be forced to try it are unsuitable for the exploratory "find what it is useful for" task, regardless of seniority.
Also, we are talking about large companies here. There will be plenty of more suitable seniors.
by watwut
5/13/2026 at 9:47:35 AM
Google's 20% project time was a good thing; sadly they don't even seem to do it anymore. For the bulk of corporate workers, this brief period where they get to play an AI token game is the only break from generating TPS reports all day long.
by blitzar
5/12/2026 at 6:17:04 PM
A tool so good, the workers need to be forced to use it.
by jjk7
5/12/2026 at 9:40:16 PM
Workers are good at their job using tools they know because they had years/decades to hone that craft.
The new tool might make a lot of that experience obsolete. Also, some people that were good with old tools might be great with the new tool. Some may not.
Overall, I don't think it's a bad idea to burn some tokens (and money) to let people experiment.
by creative_name3
5/12/2026 at 8:35:05 PM
Exactly. That's the problem ICs don't want to admit.
Managing a lot of people at scale is messy and you have to use crude solutions. It's impossible to know everything that's going on.
If you were a manager you wouldn't do any better. Out of the crooked timber of humanity, no straight thing was ever made.
by asdfman123
5/13/2026 at 12:40:33 AM
I think that's a convenient excuse for managers at the top to not have to deal with their own subpar middle and lower managers...
by duxup
5/12/2026 at 9:46:08 PM
Your argument that bad processes can't possibly be improved is contradicted by all of recorded history.
by judahmeek
5/12/2026 at 11:01:50 PM
That's because you've misconstrued my argument. My argument is that everything is a tradeoff and while management can be MORE conscious, there's a certain level of bullshit that's inevitable.
But more specifically, ICs tend to want to say "if you just let me do what I know is right it would be fine." That's a trade-off, too, though. That solution means a lot of people will be messing around due to no accountability.
by asdfman123
5/13/2026 at 10:32:57 AM
If the only accountability here is token input/output, then after automating that, employees would be messing around with no accountability either way.
If you instead set the actual goal you want to achieve as a manager, and then trust employees to allocate their focus accordingly, you will absolutely have some people faffing off (that likely can't be avoided), but at least those who don't will optimize toward what works, according to their actual expertise.
by croon
5/12/2026 at 7:15:12 PM
All so that they can lose this accumulated knowledge during the next round of layoffs.
by newswangerd
5/13/2026 at 3:36:12 AM
This is induced demand for AI to justify building more datacenters, which will bring AI costs down, and the idea is that will eventually bring demand up organically.
by simulator5g
5/12/2026 at 6:24:05 PM
> engage with AI as much as possible simply to explore what is actually possible
"Research" isn't part of my job title. If you don't know what's possible then why are you deploying it? You should be telling _me_ what's possible. I mean, you _paid_ for it, how can you possibly not know what you were getting?
> in the expectation that you might learn something useful that will be more valuable in the long run.
"I'll take `what even are profits?' for $200, Alex."
by themafia
5/12/2026 at 6:56:31 PM
Hear hear.
An overly generous steelman, in my opinion as well. Have 10% of your employees focus on finding ways to properly leverage the new technology; don't pressure 100% of your employees with bullshit metrics.
by datsci_est_2015
5/12/2026 at 7:09:42 PM
Are the people engaging, though, or are they telling the AI "go do some busywork" and then minimizing that window and getting on with their job?
by red_admiral
5/12/2026 at 6:56:45 PM
No, it's literally because some dumb manager read a blog where an influencer said that you ain't a real AI native and ain't worth shit unless your developers are spending $XXXX on tokens each day.
It's that simple.
(Never mind that these bloggers are just writing ad copy for cloud providers.)
by otabdeveloper4
5/12/2026 at 7:49:27 PM
That still sounds like a dumb strategy. Or, more likely, post hoc rationalization.
If you reward me for wasting tokens and punish me for not wasting them, I will maximally waste them and won't "explore how to make them useful". The latter wastes fewer tokens, and that is punished.
by watwut
5/12/2026 at 7:53:53 PM
So my assessment of the current mania is that it's basically a management variant of Pascal's wager.
If you as a "leader" refuse to go along with the crowd and you're right, then after the dust settles you look like someone who guessed right. Oh, and now we're in a recession, so you are probably having a bad time regardless. You maybe get one promotion, congratulations.
If you refuse to go along with the crowd and you’re wrong, you look like a Luddite, you probably got fired at some point along the way and your judgement reputation is hurt.
If you do go along with the crowd and the crowd is wrong, you are just in the same boat as everyone else. You are probably about the same as if you went against the crowd and you were right, possibly even better, because it can take a while to be proven right and you could be hurt in the middle.
So, I think, once something like this picks up enough steam, it’s just logical on a per individual basis for everyone to go along with it, regardless of how they feel about it internally.
by pfannkuchen
5/12/2026 at 9:43:04 PM
Yes, leaders can & should be expected to devise experiments to determine which processes might possibly be optimized through AI assistance.
But doing so properly requires expending a serious amount of cognitive effort & agile methodology, which is the exact opposite of what Amazon's management has demonstrated here.
by judahmeek
5/12/2026 at 11:14:16 PM
Well then the solution is to hire more managers, or pay more competitive salaries to get top talent.
by s1artibartfast
5/13/2026 at 1:12:26 AM
Or, you know, you could argue employee productivity should be measured in an evidence-based way.
by morpheos137
5/12/2026 at 5:10:27 PM
Depends on what they're trying to incentivise.
It's quite possible they aren't trying to measure performance but are literally just trying to increase token consumption to feed the bubble and hype.
Plus, pressured employees may find new, unique use cases for AI.
It's like if your goal is inflation: you give out tons of money, and as long as it's spent, you achieve your goal.
by wordpad
5/12/2026 at 5:31:13 PM
I would guess they are trying to maximize training data.
by cousinbryce
5/12/2026 at 5:47:25 PM
If I was being rewarded for using more tokens, I would feed LLM output back into the model. That's probably not very useful training data.
by Zak
5/12/2026 at 7:30:39 PM
I personally know two people who are doing exactly that after a mandate rolled out at their work. The measurement is "tokens spent", and since they weren't finding many cases that required a lot of tokens, they simply started to run agent loops feeding each other.
Absurdly wasteful, but Goodhart's Law almost never fails.
by piva00
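The failure mode piva00 describes is mechanical enough to show in a few lines. A toy sketch with invented numbers, not a description of any real team: the proxy metric and the outcome it was meant to stand in for rank the two hypothetical work styles in opposite orders.

```python
# Two hypothetical work styles under a "tokens spent" mandate.
workers = {
    "targeted_use": {"tokens": 50_000, "problems_solved": 12},
    "agent_loop":   {"tokens": 5_000_000, "problems_solved": 0},
}

def rank_by(metric):
    """Rank workers by a metric, highest first."""
    return sorted(workers, key=lambda w: workers[w][metric], reverse=True)

print(rank_by("tokens"))           # the proxy rewards the self-feeding loop
print(rank_by("problems_solved"))  # the actual goal ranks them the other way
```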
5/12/2026 at 5:10:23 PM
Management loves numbers because they're the only things you can objectively compare as X > Y.
It makes for pretty charts, extrapolations, and projections.
It doesn't matter if the numbers are not particularly correct. As long as the data-gathering step can be justified, it'll do. Though bonus points if making the number bigger is a good thing (vs. tracking something like the number of sev 1 issues).
by koolba
5/12/2026 at 6:28:11 PM
Sounds a bit like a McNamara Fallacy [0] of over-prioritizing numeric measures, which--when taken "too literally"--becomes:
> The first step is to measure whatever can be easily measured. This is okay as far as it goes.
> The second step is to disregard that which can't be easily measured or give it an arbitrary quantitative value. This is artificial and misleading.
> The third step is to presume that what can't be measured easily really isn't very important. This is blindness.
> The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide.
— Daniel Yankelovich, "The New Odds"
by Terr_
5/12/2026 at 5:11:15 PM
Yes, but also because management is largely unqualified to be managing the stuff they are hired for. So they regress to numbers because they otherwise cannot participate in anything technical.
by delfinom
5/13/2026 at 5:21:16 AM
If this were the end of the story, that would be a correct interpretation of the situation.
At Amazon, something like this is likely a closely watched experiment. They knew it would incentivise waste. But they don't know what the other effects will end up being. Nobody knows -- this thread is full of loose speculation. So Amazon runs the experiment and collects the data.
----
The annoying thing about goals and incentives is that they can either be phrased in terms of input metrics (behaviours within our control) or output metrics (the outcomes we want). Input metrics are bad because they lead to skewed incentives and gaming the metrics. Output metrics are bad because they're largely affected by chance and external circumstances. (This indeed means a goal cannot be SMART on its own, because A and R are typically in tension.)
Amazon knows this. Their WBR structure is essentially about trying to set goals and targets for input metrics, and then carefully observing how input metrics correlate with output metrics. They're using a semi-scientific process to tease out the causal structure of their business. I would assume this token target is followed very closely to learn exactly what its effects are on output metrics that drive revenue and cost.
For more on this, I think the best public writing is Carr's Working Backwards, and Chin has written about it on Commoncog too.
by kqr
5/13/2026 at 5:34:37 AM
I don't think this strategy is a viable experiment. Far too many uncontrolled variables for such shallow "input variables", as you call them.
The simpler explanation: management has no ideas and no goals, and this is a replacement strategy. They too are affected by "experimental metrics" to a degree, but that doesn't excuse this trite "science".
Any "answer" this would provide wouldn't be of higher quality than this speculation.
by raxxorraxor
5/13/2026 at 5:40:37 AM
I thought that managers are employees of the corporation too; they're themselves measured, and they need proof of work to get paid, just like campus janitors.
If a manager, or the workforce under a manager, just sat around and ignored AI because it's stupid and irrelevant and useless, they lose one tool to justify their existence amongst their peers who do not express such views. If they sat around and did their jobs as before WHILE "investing" in tokenmaxxing, they gain a double-dippable vanity metric like "we spent 12.34 quadrillion tokens last quarter" plus "our new method helped us reduce token count by 10^24 this quarter".
You may call it a fraudulent behavior from a hypothetical shareholder's perspective in this hypothetical scenario, which it is, and call it Goodhart's law scenario too, which it also is, but it's a completely normalized behavior in relative terms. Project Hail Mary is a lighthearted work of fiction.
by numpad0
5/12/2026 at 6:52:21 PM
I have recently played around with lots of data from measurements, and one can totally dump everything into context and let Claude try to analyze the data that way. It burns through a lot of tokens. It is smarter to save the data to disk and let Claude write scripts that handle/analyze the data. It's much faster, the results are much better, and you save a lot of tokens. But I guess Amazon prefers the first approach.
by _fizz_buzz_
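The script-based approach _fizz_buzz_ describes can be sketched like this: reduce the raw measurements to a small summary first, and hand the model the summary instead of megabytes of context. The CSV columns and values here are invented:

```python
import csv
import io
import statistics

# Stand-in for a large measurement file saved to disk; the real thing
# might be thousands of rows.
raw = "t,volts\n0,1.1\n1,1.3\n2,0.9\n3,1.2\n"

def summarize(csv_text):
    """Reduce raw measurements to a few numbers an LLM can reason about
    cheaply, instead of pasting every row into its context window."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    volts = [float(r["volts"]) for r in rows]
    return {
        "n": len(volts),
        "mean": round(statistics.fmean(volts), 3),
        "min": min(volts),
        "max": max(volts),
    }

print(summarize(raw))  # a few hundred bytes instead of the full dataset
```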
5/12/2026 at 9:05:01 PM
I don't have any specific inside knowledge about Amazon, but I would hazard a guess that the first approach also provides better training material for the LLM.
by runsfromfire
5/12/2026 at 5:34:03 PM
My current job is doing the exact same thing. My manager even showed me a tool with graphs showing token use and related metrics.
by spike021
5/12/2026 at 8:12:26 PM
If it's stupid and it works, then it's not stupid. Sometimes executives have to use blunt instruments to turn around the culture of a hidebound large organization. When Jeff Bezos sent his 2002 API mandate, it might have seemed stupid at the time, and yet it worked.
https://nordicapis.com/the-bezos-api-mandate-amazons-manifes...
by nradov
5/12/2026 at 9:48:35 PM
Stupid things that work are still stupid. There's a reason we have the expression "a broken clock is right twice a day". Moreover, evidence so far seems to suggest that this AI push is not working for Amazon.
by bigstrat2003
5/12/2026 at 8:02:24 PM
> You measure employees in stupid ways, you're going to get stupid behaviour as a consequence.
I worked for a healthcare tech startup that made everyone wear Fitbits, and you got cheaper health insurance premiums if you averaged a higher number of steps every day. People were putting their Fitbits on drill bits and whirring them around to log like 20,000 steps a day.
by randycupertino
5/12/2026 at 5:55:42 PM
This is Matt Garman, the ultimate MBA. Bonus for sure tied to tokens-per-quarter, which is the 2026 equivalent of measuring engineers by lines of code...
This is why AWS has been bleeding good engineers for years. What is left is starting to look like Boeing post-McDonnell Douglas merger...
They took up a quarter of their documentation pages' limited real estate with AI doc shorts nobody asked for, nobody needs, and can't disable.
by johnbarron
5/13/2026 at 12:39:18 AM
Goodhart's law in action.
The moment they made it a metric, they failed to do anything useful.
by duxup
5/12/2026 at 5:15:50 PM
Goodhart's law in action.
by consp
5/13/2026 at 5:30:48 AM
Agreed. You really should replace the manager that made that policy as soon as possible. This is a playbook example of the corporate rot that leads to decline in a once-innovative space.
by raxxorraxor
5/12/2026 at 6:58:53 PM
Most productivity metrics are stupid, vain attempts at avoiding doing real management work. If you are actually interfacing with your subordinates regularly, as managers should, it will be obvious who is pulling their weight and who isn't, no need for arbitrary statistics that are easily gamed.
by babypuncher
5/12/2026 at 5:51:54 PM
Or maybe they plan to review how effective high-usage engineers have been next cycle, and the tokenmaxxers will get bit in the ass when they have little to show for all their wasted tokens? Performance metrics can, and do, change on a dime, and tokenmaxxing seems short-sighted when management can look at old logs.
by HDThoreaun