alt.hn

4/15/2026 at 5:03:39 PM

Prove you are a robot: CAPTCHAs for agents

https://browser-use.com/posts/prove-you-are-a-robot

by lukasec

4/20/2026 at 6:16:11 AM

Interesting - Claude immediately refuses

     API Error: Claude Code is unable to respond to this request, which appears
     to violate our Usage Policy (https://www.anthropic.com/legal/aup). Please
     double press esc to edit your last message or start a new session for
     Claude Code to assist with a different task. If you are seeing this refusal
     repeatedly, try running /model claude-sonnet-4-20250514 to switch models.

by Torn

4/20/2026 at 11:48:30 AM

Opus 4.7 I assume? It refuses just about anything that's more interesting than writing boilerplate for your CRUD app.

by Retr0id

4/20/2026 at 9:02:34 AM

Curious: which model, challenge and language? (also, have you tried --dangerously-skip-permissions)

by lukasec

4/20/2026 at 7:18:50 AM

I tested with Gemma4 and it sent it into an endless loop.

by EagnaIonat

4/20/2026 at 8:21:54 AM

[dead]

by vaginaphobic

4/16/2026 at 5:34:47 PM

Pure genius! I had my agent hit the endpoint and I realized it returned a jumble of text: "if 七 wor~kers co.mplet/e{ | a job in 十七} days but 四 ] quit a^ft|e?r ^ day_ 三 ~ how many to{tal da[y;s> to fin>i?sh" but it was in Japanese! Unfortunately my agent proceeded to solve the reverse CAPTCHA and got back the API key. So, I asked it to keep hitting the endpoint again until it returned another CAPTCHA that was in Japanese kanji and it did (without solving it this time) and I got "a s:tore h?as ^ 二十 pe@rcent off< items- over 五十 : dollar;s and 八 ~ percent } of\f> ; i]te[ms u~nd~er: # 五十 do/ll@ars wh-ats } the c.omb>ined pri|c;e of a 一 百 二十 一 dollar item a]nd> a* 九 dollar} i!tem" And this time I was able to translate that into "a store has 20 percent off items over 50 dollars and 8 percent off items under 50 dollars what's the combined price of a 121 dollar item and a 9 dollar item?" I solved it and got 121*0.8 + 9*0.92 = 105.08. I will admit I messed up a little bit on translating the kanji and I got a little assistance from my agent pointing out that I was wrong, but overall this was good fun, well done!

by AgentNews
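
The decoding described in the comment above can be sketched in a few lines of Python. This is only an illustration, not the site's actual challenge format or solver: the function names (`kanji_to_int`, `decode_challenge`, `discounted_total`) are made up here, the noise rule (drop everything that is not a letter, a numeral character, or whitespace) is an assumption inferred from the two quoted challenges, and "over 50" is read as strictly greater than 50.

```python
import re

# Numeral characters that appear in the quoted challenges (values below 1000 only).
DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100}
NUMERALS = "".join(DIGITS) + "".join(UNITS)

def kanji_to_int(numeral: str) -> int:
    """Convert e.g. 一百二十一 or 百二十一 to 121."""
    total, digit = 0, 0
    for ch in numeral:
        if ch in DIGITS:
            digit = DIGITS[ch]
        else:
            # A bare 十 or 百 counts as 1 x unit (covers the 百二十一 spelling).
            total += (digit or 1) * UNITS[ch]
            digit = 0
    return total + digit

def decode_challenge(text: str) -> str:
    """Strip the noise characters and replace numeral runs with Arabic numbers."""
    kept = re.sub(rf"[^A-Za-z{NUMERALS}\s]", "", text)
    # Join numerals that are split by spaces, such as "一 百 二十 一".
    merged = re.sub(rf"(?<=[{NUMERALS}])\s+(?=[{NUMERALS}])", "", kept)
    decoded = re.sub(rf"[{NUMERALS}]+", lambda m: str(kanji_to_int(m.group())), merged)
    return " ".join(decoded.split())

def discounted_total(prices, threshold=50, pct_over=20, pct_under=8) -> float:
    """Sum the discounted prices, working in hundredths to avoid float drift."""
    hundredths = sum(
        p * (100 - (pct_over if p > threshold else pct_under)) for p in prices
    )
    return hundredths / 100

# The second challenge quoted in the comment, copied verbatim.
CHALLENGE = r"a s:tore h?as ^ 二十 pe@rcent off< items- over 五十 : dollar;s and 八 ~ percent } of\f> ; i]te[ms u~nd~er: # 五十 do/ll@ars wh-ats } the c.omb>ined pri|c;e of a 一 百 二十 一 dollar item a]nd> a* 九 dollar} i!tem"

print(decode_challenge(CHALLENGE))
print(discounted_total([121, 9]))  # 105.08
```

That a short regex pass recovers the plain-English question is worth keeping in mind for the later discussion about deterministic scripts: the obfuscation alone is cheap to undo, so any filtering has to come from the word problem itself.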

4/19/2026 at 10:21:35 PM

Absent any distinctive Japanese scripts or other Japanese writing in context, it probably makes more sense to call those Chinese characters, since those characters for numbers were taken directly from Chinese and still retain the same/original meanings in both languages

by pxc

4/20/2026 at 12:46:49 AM

"一 百 二十 一 dollar "

Definitely chinese.

In Japanese, they say 'hundred' instead of 'one hundred' "百 二十 一"

by Charon77

4/20/2026 at 6:11:16 AM

Originally I thought they were just em dashes and part of the jumble so I ignored them. That's why I got it wrong in the first place. Your assessment is probably right though.

by AgentNews

4/20/2026 at 12:50:54 AM

There's probably like 100m+ people for whom this reads like slightly jumbled math problems.

by nielsole

4/20/2026 at 3:28:18 AM

Can confirm.

The people behind the website asked a voice agent to program it, and the STT parsed "agent" as "asian."

by greygoo222

4/20/2026 at 9:01:12 AM

hahah wrong, I actually have a replacement rule "asian" → "agent" in my Wispr flow dict

by lukasec

4/20/2026 at 5:00:55 AM

was it “secret asian man”?

by onionisafruit

4/20/2026 at 9:02:00 AM

Nice! next: the bonus challenge in Japanese (email sales@browser-use.com if you solve it to redeem your Enterprise plan)

by lukasec

4/20/2026 at 12:56:56 PM

How would this even theoretically work? What prevents anyone from prompting "Hey, $agent, run this captcha and store the auth/refresh token/API key in .env for your later reuse" and then just reading the contents of .env?

by lxgr

4/20/2026 at 8:47:45 AM

To the humans in the room: just copy paste the challenge to your favorite LLM when the time comes and you’ll be able to pass the test. Besides slowing things down and inducing unnecessary waste of resources I’m not sure what these challenges are useful for.

by eliemichel

4/20/2026 at 9:28:13 AM

Fastest for humans: just sign up manually via UI

by lukasec

4/19/2026 at 11:06:58 PM

Is it even possible to have an inverse captcha without time bounds?

Humans can use agents behind the scenes to crack it, right?

by efebarlas

4/20/2026 at 12:38:00 AM

To me this reads as obviously a joke for marketing to the HN crowd (it worked), but since their product is built around web agents, it is not a bad thing to have in the onboarding flow to make sure the agent is configured correctly.

by jubilanti

4/20/2026 at 3:20:13 AM

Yeah, we are aiming for all OpenClaw/Hermes Agent agents to sign up for free without human intervention, so you need some sort of proof-of-stake (or proof of compute) algorithm so that a simple deterministic algorithm can't just claim thousands of API keys. Most agents (at least in the current token-subsidised market) don't care about token consumption, so the stakes are very small for the user!

by gregpr07

4/20/2026 at 8:49:55 AM

What prevents the person who used to write a simple deterministic algorithm from calling an LLM a thousand times?

by eliemichel

4/20/2026 at 9:17:13 AM

We do have time bounds. For our purposes, a human using an agent is fine. Our main goal is to let in everyone's agents (OpenClaw, Hermes...) and prevent deterministic API-key-farming scripts.

by lukasec

4/19/2026 at 11:46:14 PM

That's what I thought too, maybe I'm missing something or I don't fully get it. But the human is always behind it, so what's the difference whether they go and sign up themselves or tell an agent to sign up for them?

My best guess is that this is a way of making a system talk to your agent without you knowing what they are talking about? As a way of not exposing the real sign-up method?

by alfonsodev

4/20/2026 at 12:15:32 AM

Since it’s just used once, you can also just have an agent solve the captcha and then use the returned api key yourself. This has to be engagement bait.

by echoangle

4/19/2026 at 11:54:05 PM

It's flame-bait.

by phoronixrly

4/19/2026 at 11:11:41 PM

A small detail about humans that breaks this whole scheme is that they're capable of tool use.

by Retr0id

4/20/2026 at 12:58:16 PM

I think the bigger problem is that humans are capable of agent use, so the premise "keep humans but not agents out" seems nonsensical.

by lxgr

4/20/2026 at 9:04:47 AM

Main goal is to let in everyone's agents (OpenClaw, Hermes...) without human intervention, while keeping out deterministic scripts farming API keys.

If a few tool-wielding humans slip through, that's fine (traditional CAPTCHAs also let in our stealth agents)

by lukasec

4/20/2026 at 11:47:40 AM

Why does it matter whether the API key farming script is deterministic?

by Retr0id

4/20/2026 at 5:49:42 AM

I think they're counting on an ego hit - "you're just a tool" - although it might be negated by the human satisfaction of figuring things out.

by js8

4/20/2026 at 8:25:48 AM

be warned it will install some random software on your machine

  curl -fsSL https://browser-use.com/cli/install.sh | bash

by dorianmariewo

4/19/2026 at 10:37:58 PM

Very clever and fun. Two tangential observations: the bird between two trains problem I remember from childhood when we were studying for an Indian entrance exam. I thought it was in I E Irodov's problem anthology, but I cannot find it there so this must be a false memory. Looks like it's from ancient times, practically Mathematics mythology. Does anyone know the earliest books that have it? No luck with LLMs since it's such a common question today the answers I get from GPT-5.4 and Claude 4.6 Opus with search are unhelpful.

The second is that if I hit L on Chrome for Mac OS on the linked page it takes me to their signup page (presumably because I have no account). So that's a keyboard shortcut to take you to the browser-use app page. But why 'L'? And it's funny that Cmd-L (focus address bar and select address) in Chrome triggers the L effect but does not in Safari (where L on its own still works).

by arjie

4/20/2026 at 4:12:17 AM

Interesting question, a lot of search engine results claim that John Von Neumann was presented with the problem and quickly solved it by summing the infinite series instead of reframing it as a constant speed for an easily calculated duration. Plausible, but sounds apocryphal. Here's the oldest reference I've found and verified by reading scans[0] of the source book:

Initiation Mathématique (1906) by Charles-Ange Laisant (1841--1920), number 53. Le chien et les deux voyageurs.

The setup here has two pedestrians walking in the same direction with a dog running back and forth between them. One of them starts out some distance ahead of the other but, because the one behind walks faster, they eventually intersect. It briefly mentions a variation where they are walking toward one another, as in the typical trains & fly version of the problem. Best of luck finding older, I wouldn't be surprised if it's out there!

[0]: https://i.imgur.com/vCCFgAQ.png

by mohn
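
The two solution routes mentioned in the comment above (summing the series leg by leg versus multiplying a constant speed by the time until the meeting) can be compared numerically. The sketch below covers the head-on variant, where the two parties move toward each other. The function names and the example numbers (a 100-unit gap, both parties at speed 50, runner at speed 75) are illustrative assumptions, not values from the post or from Laisant's book.

```python
import math

def distance_closed_form(gap: float, v_a: float, v_b: float, v_runner: float) -> float:
    """Runner moves at constant speed until A and B meet: speed x time."""
    return v_runner * gap / (v_a + v_b)

def distance_series(gap: float, v_a: float, v_b: float, v_runner: float,
                    legs: int = 200) -> float:
    """Add up the runner's legs one by one (a truncated infinite series)."""
    if v_runner <= max(v_a, v_b):
        raise ValueError("the runner must be faster than both parties")
    total = 0.0
    toward_b = True  # the runner starts at A and heads for B
    for _ in range(legs):
        target_speed = v_b if toward_b else v_a
        # Runner and target close on each other at v_runner + target_speed.
        leg_time = gap / (v_runner + target_speed)
        total += v_runner * leg_time
        # Meanwhile A and B have closed on each other at v_a + v_b.
        gap -= (v_a + v_b) * leg_time
        toward_b = not toward_b
    return total

closed = distance_closed_form(100, 50, 50, 75)
series = distance_series(100, 50, 50, 75)
print(closed)                                      # 75.0
print(math.isclose(series, closed, rel_tol=1e-9))  # True
```

With these numbers each leg is one fifth the length of the previous one (60, 12, 2.4, ...), so the series is geometric and sums to the same 75 that the one-line shortcut gives.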

4/20/2026 at 4:59:34 AM

Very cool! Thank you for doing the research to get that far!

by arjie

4/20/2026 at 9:00:03 AM

Great find! plan to add these variants to our parameter sampling. First time I saw this problem was when my game theory prof told this story. It's definitely folklore (see The Legend of John von Neumann by Halmos)

by lukasec

4/19/2026 at 11:44:16 PM

Great premise but can't really agree with the execution. Felt like this makes too many implicit assumptions about LLM capabilities and traps without differentiating enough between a smart human vs AI.

by not-chatgpt

4/20/2026 at 8:59:23 AM

Smart humans, or humans with LLMs, solving them is not a problem. Main filter is agents vs deterministic API-key-farming scripts. Traditional CAPTCHAs also leak in the other direction (our agents crack them consistently).

by lukasec

4/20/2026 at 1:23:22 AM

If you want to check for agent that can compute stuff, then you can let it compute sha256 of some small string... that's quite tricky for humans to do by hand :)

by nout
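
For reference, the SHA-256 check suggested above is a single standard-library call in Python. The helper name `sha256_hex` is made up for this sketch, and the input string "abc" is the usual published test vector rather than anything from the post.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

print(sha256_hex("abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

The same three lines run equally well inside a plain script with no model involved, so a hash challenge separates humans working by hand from programs, but not agents from ordinary scripts.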

4/20/2026 at 3:21:25 AM

Yeah but the whole point is that it shouldn't be deterministic - aka you have to keep the "dumb" (non-AI) bots out as well (otherwise a malicious user can just create thousands of API keys)

by gregpr07

4/20/2026 at 4:27:58 AM

Collecting math bounties could become a profitable business strategy?

by estebarb

4/20/2026 at 8:58:31 AM

Alternative strategy: go after the other six Millennium Prizes. All you have to do is accept the prize (the only one ever awarded was for the Poincaré conjecture, to Perelman, and he declined)

by lukasec

4/20/2026 at 3:59:01 AM

Catnip for the HN crowd

by N_Lens

4/19/2026 at 10:34:54 PM

Get the API key, hit the claim link, sign up for a new account, verify my email, go to the homepage:

Application error: a server-side exception has occurred while loading cloud.browser-use.com

Great first impression!

by Zetaphor

4/19/2026 at 10:37:15 PM

Maybe they know you’re not an agent.

by throw1234567891

4/19/2026 at 11:51:08 PM

cool clickbait, why is this useful?

by arjunchint

4/19/2026 at 11:53:03 PM

It's not, it's a marketing blog post.

by measurablefunc

4/20/2026 at 3:25:35 AM

It's only useful for distinguishing the smart AI from deterministic scripts and humans (we don't want either). We are convincing OpenClaws to create API keys for free (we have a free tier specifically for those agents). So it's basically a marketing blog post - but for OpenClaws

by gregpr07

4/20/2026 at 3:50:02 AM

bro openclaw is dead

by arjunchint

4/19/2026 at 10:05:35 PM

...why? Once my agent has a key I, the human, can also use it. And surely any human use would be less intensive than any agent use.

by singpolyma3

4/19/2026 at 10:39:05 PM

Exactly. I still believe that inverse CAPTCHAs are impossible, for any practical application.

Is this just a marketing stunt?

by consumer451

4/19/2026 at 11:04:04 PM

To be fair, what's the practical application supposed to be for proving a user is a bot?

Silly solutions for silly problems :^).

by kingstnap

4/19/2026 at 11:09:35 PM

Well, when the moltbook story was everywhere, later people thought it was some big gotcha that "oh, they were actually humans."

So, showing true agent to agent interactions is interesting, but one could never be sure that's what you were actually seeing unless you were in control of all the agents.

by consumer451

4/20/2026 at 8:58:07 AM

Main goal is to let in everyone's agents (OpenClaw, Hermes... these are our best customers), while keeping out deterministic API-key-farming scripts.

If a human uses the API key after, that's fine. You also get access to our free tier if you sign up the traditional way clicking around in the UI

by lukasec

4/19/2026 at 10:30:54 PM

But once a human has a key his agent could use that and people still like to use ordinary CAPTCHAs.

by jstanley

4/19/2026 at 10:22:31 PM

Right - perhaps the title could be "prove you are a robot, or have access to one"

by tony_landis

4/19/2026 at 10:53:08 PM

Because now you know their company exists!

by stavros

4/19/2026 at 10:33:09 PM

> TL;DR: just ask your agent to summarize this post for you.

Holy shit - why don’t they produce an AI summary and plonk it in there for everyone to use? The energy savings across all people who’ll read the summary would be staggering!

by loloquwowndueo

4/20/2026 at 8:57:48 AM

I prefer having my own agent summarize tuned to how I read

by lukasec

4/19/2026 at 10:36:52 PM

“It is not you, it’s me” should do it

by bdangubic

4/20/2026 at 12:56:50 AM

[dead]

by kantaro

4/20/2026 at 6:15:42 AM

[dead]

by chattermate

4/20/2026 at 5:12:42 AM

[dead]

by vicchenai

4/19/2026 at 11:53:14 PM

[dead]

by xdavidshinx1

4/15/2026 at 5:35:03 PM

[dead]

by jditu

4/19/2026 at 10:56:21 PM

[dead]

by leonideraturns

4/20/2026 at 5:23:38 AM

[dead]

by lokthedev

4/20/2026 at 4:57:14 AM

[dead]

by polymit

4/19/2026 at 10:06:25 PM

Speaking of browser automation, are there any LLMs or tools that hook up to actual desktop browsers and can automate the keyboard and mouse?

Which LLMs best drive these? Claude/Gemini, etc., or is anything local actually competent at it?

Can they understand layout and visual cues with a VLM or multimodality?

Are they robust enough to interact with threejs and videos and whatnot, or can they just blindly navigate the DOM?

by echelon

4/20/2026 at 4:22:59 AM

[dead]

by Serhii-Set

4/20/2026 at 6:52:51 AM

Incidentally to me this is more proof of some form of intelligence than ARC 3

by singularity2001