12/30/2025 at 10:27:22 PM
I built Claude Cognitive because Claude Code kept forgetting my codebase between sessions.

The problem: Claude Code is stateless. Every new instance rediscovers your architecture from scratch, hallucinates integrations that don't exist, repeats debugging you already tried, and burns tokens re-reading unchanged files.
At 1M+ lines of Python (3,400 modules across a distributed system), this was killing my productivity.
The solution is two systems:
1. Context Router – attention-based file injection. Files get HOT/WARM/COLD scores based on recency and keyword activation: HOT files are injected in full, WARM files contribute headers only, COLD files are evicted. Scores decay over turns, and related files co-activate together. Result: 64-95% token reduction (rough sketch after this list).
2. Pool Coordinator – multi-instance state sharing. I run 8 concurrent Claude Code instances, and they now share completions and blockers: no duplicate debugging, no stepping on each other.
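To give a feel for the routing logic, here's a minimal sketch of the scoring idea. The class name, thresholds, decay rate, and co-activation weight are all illustrative assumptions, not the actual implementation:

```python
# Illustrative sketch of attention-tiered routing: keyword activation,
# per-turn decay, and co-activation of related files. All names and
# numbers here are assumptions, not claude-cognitive's real API.

DECAY = 0.8            # score multiplier applied every turn (assumed)
HOT, WARM = 0.7, 0.3   # tier cutoffs (assumed)

class ContextRouter:
    def __init__(self, keyword_index, related):
        self.keyword_index = keyword_index  # {"auth": ["auth/tokens.py", ...]}
        self.related = related              # {"auth/tokens.py": ["auth/session.py"]}
        self.scores = {}                    # path -> attention score

    def observe_turn(self, message):
        for path in self.scores:            # everything fades each turn
            self.scores[path] *= DECAY
        text = message.lower()
        for keyword, paths in self.keyword_index.items():
            if keyword in text:
                for path in paths:
                    self.scores[path] = 1.0                   # direct hit -> HOT
                    for r in self.related.get(path, []):      # co-activation
                        self.scores[r] = max(self.scores.get(r, 0.0), 0.5)

    def tier(self, path):
        s = self.scores.get(path, 0.0)
        return "HOT" if s >= HOT else ("WARM" if s >= WARM else "COLD")

router = ContextRouter(
    keyword_index={"auth": ["auth/tokens.py"]},
    related={"auth/tokens.py": ["auth/session.py"]},
)
router.observe_turn("why does the auth token refresh fail?")
print(router.tier("auth/tokens.py"))   # HOT  -> inject full file
print(router.tier("auth/session.py"))  # WARM -> inject headers only
```

The key property is that injection is driven by a per-file score that rises when you mention something and fades when you don't, so context tracks the conversation instead of being fixed per session.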
Results after months of daily use:
- New instances are productive on the first message
- Zero hallucinated imports
- Token usage down 70-80% on average
- Works across multi-day sessions
Open source (MIT). Works with Claude Code today via hooks.
GitHub: https://github.com/GMaN1911/claude-cognitive
Happy to answer questions about the architecture or implementation details.
by MirrorEthic
12/31/2025 at 3:35:39 AM
Wasn't CLAUDE.md already doing this?
by mehmetkose
12/31/2025 at 5:16:44 PM
CLAUDE.md is static: same content every session (and a soft 40k-character limit).

This is dynamic attention routing: files get scored based on what you're actively discussing. Mention "auth" and auth-related docs go HOT (full injection), related files go WARM (headers only), and unrelated files stay COLD (evicted).
Scores decay over turns. If you stop talking about auth, those files fade back to COLD automatically.
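To put rough numbers on the fade (the 0.8 decay rate and 0.7/0.3 cutoffs here are illustrative values, not the router's actual defaults):

```python
# Illustrative only: how an untouched file's score fades back to COLD.
score, turn = 1.0, 0
while score >= 0.3:
    print(f"turn {turn}: {score:.2f} -> {'HOT' if score >= 0.7 else 'WARM'}")
    score *= 0.8
    turn += 1
print(f"turn {turn}: {score:.2f} -> COLD (evicted)")
```

So under those assumed numbers, a file you stop mentioning survives a few turns as WARM headers before dropping out entirely.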
Plus multi-instance coordination: concurrent Claude sessions share completions and blockers so they don't duplicate work.
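Conceptually, the coordination piece boils down to a lock-guarded shared state file that every instance reads and writes. A minimal sketch, with the caveat that the path, schema, and locking scheme are assumptions for illustration (the real coordinator is wired in through Claude Code hooks):

```python
# Sketch of cross-instance state sharing via a lock-guarded JSON file.
# The path and record schema are assumed for illustration.
import fcntl, json, os

POOL_STATE = os.path.expanduser("~/.claude/pool_state.json")  # assumed location

def update_pool(instance_id, completions=(), blockers=()):
    # Create on first use, then take an exclusive lock so concurrent
    # instances can't clobber each other's updates.
    fd = os.open(POOL_STATE, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        raw = f.read()
        state = json.loads(raw) if raw else {}
        entry = state.setdefault(instance_id, {"completions": [], "blockers": []})
        entry["completions"] += list(completions)
        entry["blockers"] += list(blockers)
        f.seek(0)
        f.truncate()
        json.dump(state, f, indent=2)
        fcntl.flock(f, fcntl.LOCK_UN)

def others_work(instance_id):
    # What have the *other* instances finished or gotten stuck on?
    with open(POOL_STATE) as f:
        fcntl.flock(f, fcntl.LOCK_SH)
        state = json.load(f)
        fcntl.flock(f, fcntl.LOCK_UN)
    return {k: v for k, v in state.items() if k != instance_id}
```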
There's a 25k-character limit on injection, so you compact less and stay focused where it's needed. I've also seen it help a lot with the post-compaction context wobble that otherwise occurs.
by MirrorEthic
1/1/2026 at 1:31:35 AM
Thanks, now I have a clear understanding. The thing is, what's the token/result ratio with this extension? Is there any way to benchmark it?
by mehmetkose
1/4/2026 at 6:12:00 PM
Good question. I don't have a formal benchmark yet, but here's what I've measured in practice.

Token reduction: 64-95% depending on codebase size and work pattern. The variance comes from how many files are in your .claude/ directory and how focused your session is.
How to measure yourself:
1. Check `~/.claude/attention_history.jsonl` after a session.
2. Run `python3 ~/.claude/scripts/history.py --since 2h` to see what got injected vs. evicted.
3. Compare your token counts before/after in Claude Code's usage stats.
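If you want a quick tally straight from the log, something like this works; the record field name ("action") is a guess for illustration, so check an actual line of your attention_history.jsonl and adjust:

```python
# Rough tally of injections vs. evictions from the attention log.
# The "action" field name is a guess; inspect your own log lines first.
import json, os
from collections import Counter

log_path = os.path.expanduser("~/.claude/attention_history.jsonl")
counts = Counter()
with open(log_path) as f:
    for line in f:
        if not line.strip():
            continue
        record = json.loads(line)
        counts[record.get("action", "unknown")] += 1  # e.g. "inject" / "evict"
print(dict(counts))
```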
The 25k-character injection cap is the key constraint (it's adjustable): it forces the router to prioritize ruthlessly instead of dumping everything in.
A proper benchmark comparing baseline Claude Code vs claude-cognitive on identical tasks would be useful. If someone wants to build that, I'd happily collaborate.
by MirrorEthic
1/4/2026 at 6:12:36 PM
Sorry about the late reply!
by MirrorEthic