alt.hn

3/7/2026 at 8:13:02 AM

Show HN: OculOS – Any desktop app as a JSON API via OS accessibility tree

https://github.com/huseyinstif/oculos

by stif1337

3/7/2026 at 10:16:35 PM

Update: Since the initial post, I've shipped several features based on feedback:

- Screenshot capture: GET /windows/{pid}/screenshot → returns PNG

- Batch operations: POST /interact/batch → multiple actions per request

- Wait/poll: GET /windows/{pid}/wait?q=Submit&timeout=5000

- Python & TypeScript SDKs (local install, PyPI/npm coming soon)

- OpenAPI spec, Dockerfile, 7 example scripts

- Demo GIF in README showing Calculator automation via Claude Code
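For anyone who wants to try the new endpoints from a script without the SDKs, here is a minimal Python sketch using only the standard library. It assumes the server listens on localhost:7878, as in the curl examples elsewhere in this thread. The helper names are made up for illustration, and the batch body schema is not described here, so `batch_request` simply JSON-encodes whatever payload the caller supplies.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default address taken from the curl examples in this thread.
BASE_URL = "http://localhost:7878"


def screenshot_url(pid: int) -> str:
    """URL for GET /windows/{pid}/screenshot (said to return a PNG)."""
    return f"{BASE_URL}/windows/{pid}/screenshot"


def wait_url(pid: int, query: str, timeout_ms: int = 5000) -> str:
    """URL for GET /windows/{pid}/wait with the q and timeout parameters."""
    params = urllib.parse.urlencode({"q": query, "timeout": timeout_ms})
    return f"{BASE_URL}/windows/{pid}/wait?{params}"


def batch_request(payload) -> urllib.request.Request:
    """Build POST /interact/batch; the payload schema is NOT known here,
    so the caller's object is JSON-encoded unchanged."""
    return urllib.request.Request(
        f"{BASE_URL}/interact/batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_screenshot(pid: int, path: str) -> None:
    """Download the screenshot bytes and write them to a file."""
    with urllib.request.urlopen(screenshot_url(pid), timeout=10) as resp:
        data = resp.read()
    with open(path, "wb") as fh:
        fh.write(data)
```

With the server running, `save_screenshot(pid, "window.png")` would write the capture to disk; the pid is whatever process ID the target window reports.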

Thanks for the feedback everyone!

by stif1337

3/7/2026 at 1:59:01 PM

I wonder whether this requires particular GUI toolkits to be used, such as WFC. On any platform there are plenty of "bad boy" toolkits that just "draw lines" themselves and thus are not accessible at all.

by ktpsns

3/7/2026 at 8:55:43 PM

No toolkit dependency. OculOS reads the OS-level accessibility tree, which is toolkit-agnostic:

- Windows: UI Automation (works with Win32, WPF, WinForms, Qt, Electron)

- Linux: AT-SPI2 (GTK, Qt, Electron)

- macOS: AXUIElement (Cocoa, Qt, Electron)

The coverage varies by toolkit. Win32/WPF/GTK expose rich trees. Electron apps expose key elements but the tree is shallower. Custom-drawn UIs (games, OpenGL) have minimal or no accessibility tree. That's the main limitation.

by stif1337

3/7/2026 at 8:03:14 PM

Cool idea! Does this work with Electron apps? I tried automating some apps, and the problem was that a lot of stuff was never reachable except via screenshot + click.

by Frannky

3/7/2026 at 8:54:02 PM

Yes, Electron apps expose a reasonable accessibility tree through Chromium's UIA/AT-SPI bridge. We've tested with Spotify (Electron/CEF), VS Code, Slack, and Chrome itself.

The tree is shallower than native Win32/WPF apps, but key interactive elements (buttons, inputs, lists) are usually exposed. You can check what's available with:

  curl "localhost:7878/windows/{pid}/find?interactive=true"
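The same check can be scripted. Below is a rough Python sketch of that curl call, standard library only. The function names are hypothetical, `interactive` is the only filter confirmed in this thread, and the shape of the JSON response is not assumed, so the result is returned as parsed JSON without further interpretation.

```python
import json
import urllib.parse
import urllib.request


def find_url(pid: int, base: str = "http://localhost:7878", **filters) -> str:
    """Build the /windows/{pid}/find URL.

    Booleans are rendered lowercase so interactive=True becomes
    interactive=true, matching the curl example. Any filter other
    than `interactive` is an unverified assumption.
    """
    params = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in filters.items()
    }
    query = urllib.parse.urlencode(params)
    url = f"{base}/windows/{pid}/find"
    return f"{url}?{query}" if query else url


def find_elements(pid: int, **filters):
    """Fetch the matching elements and return the decoded JSON as-is."""
    with urllib.request.urlopen(find_url(pid, **filters), timeout=10) as resp:
        return json.load(resp)
```

Calling `find_elements(pid, interactive=True)` against a running server and pretty-printing the result with `json.dumps(result, indent=2)` is a quick way to see how much of an Electron app's tree is actually exposed.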

by stif1337

3/7/2026 at 8:06:30 PM

Maybe I need to familiarize myself with MCP, but wouldn't this make way more sense as a simple CLI tool instead of an HTTP service with a REST API?

by tadfisher

3/7/2026 at 8:53:36 PM

It actually supports both. You can use it as a plain REST API (no MCP needed) with any HTTP client:

  curl localhost:7878/windows
  curl -X POST localhost:7878/interact/{id}/click

MCP mode is an optional layer for AI agents (Claude, Cursor, etc.) that already speak MCP. The REST API works standalone for scripts, testing, and CI/CD, with no AI required.
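As an illustration of the standalone use, the two curl calls above translate to plain Python roughly as follows. This is a sketch under assumptions: the helper names are invented, the element id is treated as an opaque string, and nothing is assumed about the response body beyond it being bytes.

```python
import urllib.parse
import urllib.request

# Assumption: same address as the curl examples above.
BASE_URL = "http://localhost:7878"


def list_windows_request() -> urllib.request.Request:
    """Equivalent of: curl localhost:7878/windows"""
    return urllib.request.Request(f"{BASE_URL}/windows", method="GET")


def click_request(element_id: str) -> urllib.request.Request:
    """Equivalent of: curl -X POST localhost:7878/interact/{id}/click

    The id is percent-encoded in case it contains characters that are
    not URL-safe; its real format is not specified in this thread.
    """
    safe_id = urllib.parse.quote(element_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/interact/{safe_id}/click", method="POST"
    )


def send(request: urllib.request.Request):
    """Send a prepared request and return (status code, raw body bytes)."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status, resp.read()
```

In a CI job this would be used as `send(list_windows_request())` followed by `send(click_request(some_id))`, where `some_id` comes from an earlier response.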

by stif1337

3/7/2026 at 8:15:07 PM

Good idea, I'm looking forward to seeing it grow - especially the Python and TypeScript bindings.

by lioeters

3/7/2026 at 8:55:10 PM

Thanks! Python and TypeScript SDKs are high on the roadmap.

by stif1337