alt.hn

3/22/2026 at 7:04:00 PM

Teaching Claude to QA a mobile app

https://christophermeiklejohn.com/ai/zabriskie/development/android/ios/2026/03/22/teaching-claude-to-qa-a-mobile-app.html

by azhenley

3/22/2026 at 10:22:57 PM

the worktree discipline failure is the most interesting part of this post to me. when claude is interactive, "cd into the wrong repo" is catchable. when it's running unattended on a schedule, you find out in the morning.

the abstraction is right - isolated worktree, scoped task, commit only what belongs. the failure is enforcement. git worktrees don't prevent a process from running `cd ../main-repo`. that requires something external imposing the boundary, not relying on the agent to respect it.
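a minimal sketch of what "something external imposing the boundary" could look like (hypothetical helper, names are mine, not from the post): resolve every path the agent wants to touch against the worktree root, and refuse anything that escapes it — so `cd ../main-repo` style writes fail mechanically instead of relying on the agent's discipline.

```python
import os

def resolve_in_worktree(path: str, worktree_root: str) -> str:
    """Resolve a path and refuse it if it escapes the worktree boundary."""
    root = os.path.realpath(worktree_root)
    target = os.path.realpath(os.path.join(root, path))
    # commonpath catches both ../ escapes and absolute paths elsewhere
    if os.path.commonpath([root, target]) != root:
        raise PermissionError(f"{path!r} escapes worktree {root!r}")
    return target
```

wrap the agent's file operations in that check (or do the equivalent with OS-level sandboxing) and the "commit only what belongs" rule becomes enforced rather than advisory.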

what you've built (the 8:47 sweep) is a narrow-scope autonomous job: well-defined inputs, deterministic outputs, bounded time. these work well because the scope is clear enough that deviation is obvious. the harder category is "fix any failing tests" - that task requires judgment about what's in scope, and judgment is exactly where worktree escapes happen.

i've been working on tooling for scheduling this kind of claude work (openhelm.ai) and the isolation problem is front and center. separate working directories per run, no write access to the main repo unless that's the explicit task. your experience here is exactly the failure mode that design is trying to prevent.

by maxbeech

3/22/2026 at 10:57:31 PM

yeah, it's curious. I sometimes ask it why it ignored what is explicitly in its memory, and all it can do is apologize. I ask -- I'm using Claude with a 1M context, you have an explicit memory -- why do you ignore it? And the answer I get is "I don't know, I just didn't follow the instructions."

by cmeiklejohn

3/23/2026 at 2:46:31 AM

Genuine question - what else did you expect?

by seba_dos1

3/23/2026 at 3:46:14 AM

For it to follow the instructions I had for it. Call me naive and stupid for thinking the 1M context window on the brand new model would actually, y'know, work.

by fragmede

3/23/2026 at 5:03:45 AM

That's a bit anthropomorphic though.

When LLMs become able to reflectively examine their own premises and weight paths, they will exceed the self-awareness of ordinary humans.

by quesera

3/23/2026 at 11:26:35 AM

Just dealt with this last night with Claude repeatedly risking a full system crash by failing to ensure that the previous training run of a model ended before starting the next one.

It's a pretty strange issue, makes me feel like the 1M context model was actually a downgrade, but it's probably something weird about the state of its memory document. I wasn't even very deep into the context.
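one way to guard against that overlap independent of the model's memory (a hedged sketch, POSIX-only, names are mine): serialize runs behind an exclusive file lock, so the next run blocks until the previous one has actually exited.

```python
import fcntl
import subprocess

def run_serialized(cmd, lockfile="/tmp/train.lock"):
    """Block until any previous holder of the lock exits, then run cmd."""
    with open(lockfile, "w") as lock:
        # flock is released automatically on close, even if the holder crashes
        fcntl.flock(lock, fcntl.LOCK_EX)
        return subprocess.run(cmd).returncode
```

the point being: "did the last run finish?" is a question the OS can answer reliably, so it doesn't belong in the agent's context at all.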

by hgoel

3/23/2026 at 3:59:33 AM

why would a further chance of context pollution be a good thing? i feel like it's easier for data to get lost in a larger context

by Natfan

3/23/2026 at 6:54:57 AM

It doesn’t reason or explicitly follow instructions, it generates plausible text given a context.

by grey-area

3/24/2026 at 7:25:07 AM

Agent breaks sandbox, accesses the wrong repo, nobody watching.

This is basically why AI coding productivity stays flat at ~10% despite 93% adoption: you speed up one narrow thing and create new problems in review and oversight that eat the gains.

by 7777777phil

3/22/2026 at 8:57:50 PM

Reading through this reminds me of how bot farms regularly consist of stripped-down phones that are essentially just the mainboard hooked up to a controller that simulates the externals.

While struggling to reverse engineer mobile apps for smart home devices, I’ve considered setting something like this up for a single device.

by devmor

3/23/2026 at 1:13:34 PM

I'm sorry, but just because you got the automation working doesn't mean you're getting meaningful QA from Claude analyzing your screenshots.

by darepublic
