alt.hn

3/26/2025 at 4:15:31 PM

Kilo Code: Speedrunning open source coding AI

https://blog.kilocode.ai/p/kilo-code-speedrunning-open-source-coding-ai

by ofou

3/26/2025 at 5:49:16 PM

I talked to JP about this project. He's excited in a way that's hard not to catch. His core thesis is simple: coding agents are the future, and the winners will be the ones who can execute.

It won’t be OpenAI or Claude. They have other priorities. The real opportunity is for small teams who move fast, stay close to users, and keep ahead of the pack.

That makes sense. LLMs are already powerful, almost magical at times. But using them as coding agents still takes real work. They can do amazing things, yet also be frustrating and make a mess. There are rough edges and big gaps.

Those will get fixed. The question is who gets there first.

The counterpoint to that would be that all these tools are gonna end up sort of the same and there won't be a way to differentiate.

Which way will it play out? I'm not really sure.

by adamgordonbell

3/26/2025 at 5:53:48 PM

You're too kind!! The speedrun ethos has already been super fun with this team. :)

I hope that we'll also be able to bring enough skills, strategy, and taste to the space. Time will tell, but we're giving it our best shot!

by janpaul123

3/27/2025 at 6:35:58 PM

> It won’t be OpenAI or Claude. They have other priorities. The real opportunity is for small teams who move fast, stay close to users, and keep ahead of the pack.

It could be small teams within those companies, however, who have special access to the full power of the platform.

I don't know where things will end up, but I'm leaning towards the big guys dominating coding, because they do coding themselves, and so are automatically extremely sensitive to the issues for that particular task. They can build tools for themselves and share with the world.

It's true that an external team may end up doing it better and be used internally. I just don't think the outcome is predictable at this point.

by garyrob

3/27/2025 at 12:30:26 AM

> Those will get fixed. The question is who gets there first.

Why is being first an advantage? Developers change tools all the time. If the OpenAI API is giving better results, just switch. If Kilo Code or whatever tool is producing fewer bugs, switch.

by gtirloni

3/27/2025 at 11:37:02 AM

Hence my comment:

>The counterpoint to that would be that all these tools are gonna end up sort of the same and there won't be a way to differentiate.

I mean, I have LLM preferences, but competition does put downward pressure on prices. That competition benefits me, but not OpenAI.

by adamgordonbell

3/26/2025 at 6:57:29 PM

>We want to build for the dream of billions of programmers; billions of artists; billions of scientists—using computing as moldable clay.

At that point, why even keep humans in the loop? Just let it exist in the background and generate better ideas than any human would anyway.

by realharo

3/26/2025 at 7:42:50 PM

Just let the car go wherever it wants, faster than any human would anyway. Or - just let the fire exist in the background, it'll generate more heat than humans will ever need anyway.

The point isn't to make humans pointless. The point is to empower humans. We need to remain in control, and be the users of the tool, and not a tool for some mindless system.

Intelligence and consciousness are separate things: you can automate a lot of intelligence without having even rudimentary consciousness or self-awareness. LLMs currently in operation are at most pseudo-conscious within their test-time contexts, and even then, every pass resets whatever awareness there might be. With millions of tokens of context length, that might start to enter the realm of things we should be concerned about. But even then, there's no ongoing persisted state to carry anything between passes aside from the text or image-patch tokens or what have you.

What this means, essentially, is that we can augment our human capabilities without usurping the agency of some artificial being - these AIs are not individual moral agents in their own right, and likely will never be unless we specifically build that recursion and persistent state into the models, and incorporate a realtime adaptive self and world construct.

This means that the software is a tool - use the tool to augment your life and be a force multiplier in everything you do. The scope of intelligence augmentation has leapt from spreadsheets to nearly every cognitive domain in the human experience: people proficient with Excel were better accountants than people using pen and paper. People using delivery vans are better than people using a horse and wagon. This new technology means that people using AI will be able to do more, faster, and likely better, than people who don't.

With neural lace - whatever form it ends up being - we'll end up with genuine exocortex augmentation. Even without that direct integration, however, the human in the loop is the entire point of this technology. There's a tiny list of things conscious machines might be good for, and all sorts of deep and obvious arguments for not creating a new, self aware, agentic species that's immediately in conflict with and on a trajectory to outcompete humans.

Use the tool of AI to be a force multiplier for everything in your life that AI is capable of handling well. This makes you a benevolent dictator for life in your own life, delegating everything that makes sense, working with it to free up your resources for the things that you decide are the highest priority. Spend more time brainstorming, building relationships, deploying resources, and getting the most out of being human. This is the promise of AI, and why people get excited about it. We're going to have a huge struggle, as humanity, in dealing with the empowerment and amplification of everything in our lives. Making sure that we retain agency, that humans are ultimately in charge of our own destiny, is probably the most important principle to adhere to, above all others.

by observationist

3/26/2025 at 7:51:37 PM

I was mainly referring to phrases such as "billions of scientists" - the point of science is to solve problems and discover knowledge. If you have an AI good enough to achieve that (billions of scientists), that means it can probably progress without being actively driven by people at all - and probably do a better job at it too.

We can still do things "for fun", but our efforts will be more toys than serious projects (except when it comes to relationships with other people).

by realharo

3/26/2025 at 6:16:16 PM

Gemini 2.5 seems to be the current king of AI coding. In addition to being "smart", it has a huge context window. The one-shot examples on Twitter are astounding.

by xnx

3/26/2025 at 6:19:34 PM

Really? I would have said it was Claude-3.7 based on experience.

by outside2344

3/27/2025 at 9:53:49 AM

I too thought Sonnet 3.7 was hard to beat. But from my few interactions with Gemini 2.5, it is freakishly good. The level of discourse is somewhere near talking to an experienced staff engineer who is almost always right.

by khaledh

3/26/2025 at 6:31:39 PM

Claude was until yesterday

by Workaccount2

3/27/2025 at 12:05:48 PM

The world of AI moves so fast ...

Seriously though, I get a really great feel from Claude 3.7, but let's see about Gemini 2.5. I have tried it but didn't like its "style" - though I only used it for a simple Go official-language sort example, nothing too fancy. Might need to benchmark it more.

by Imustaskforhelp

3/26/2025 at 6:20:34 PM

JP here! Would love to answer your questions!

We listed a bunch of ideas for larger improvements in the blog: Instant app; Up-to-date docs; Prompt/product-first workflows; Browser IDE; Local/on-prem models; Live collaboration; Parallel agents; Code variants; Shared context; Open source sharing; MCP marketplace; Integrated CI; Monitoring/production agents; Security agents; Sketching.

What would you like us to build?

by janpaul123

3/26/2025 at 7:24:39 PM

The obvious thing would be LSP interrogation, which would allow the token context to be significantly smaller than entire files. If you have one file open and you are working on a function that calls out to N other modules, then instead of packing the context with N whole files, you get ONLY the sections of those files the LSP tells you to look at.
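In sketch form, the agent could ask the language server where a symbol is defined and pull in only that span. A minimal example of building the `textDocument/definition` request the LSP spec defines (the file URI and position here are illustrative):

```python
import json

def lsp_definition_request(req_id, file_uri, line, character):
    """Build a framed JSON-RPC textDocument/definition request.

    Instead of packing N whole files into the prompt, an agent sends
    this to the language server, gets back the definition's location,
    and includes only that span in the model's context.
    """
    msg = {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "textDocument/definition",
        "params": {
            "textDocument": {"uri": file_uri},
            "position": {"line": line, "character": character},
        },
    }
    body = json.dumps(msg)
    # The LSP wire format frames each message with a Content-Length header.
    return f"Content-Length: {len(body)}\r\n\r\n{body}"
```

The response (a `Location` with a URI and range) tells the agent exactly which lines to read, so context grows with the number of referenced symbols rather than the number of referenced files.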

by arevno

3/26/2025 at 8:01:09 PM

Yes! This is high on our list. Context window compression is a big deal, and this is one of the main ways to do it, IMO.

Have you tried any tools that do this particularly well?

by janpaul123

3/26/2025 at 7:31:05 PM

One thing that I think would be cool, and that could perhaps be a good starting point, is a TDD agent. How I imagine this working:

User (who is a developer) writes tests, and a description of the desired application. The agent attempts to build the application, compiles the code, runs the tests, and automatically feeds any compiler errors and test failures back to the agent so that it can fix its own mistakes without input from the user.

Based on my experience of current programming agents, I imagine it'll take the agent a couple of attempts to get an application that compiles and passes all the tests. What would be really great to see is an agent (probably with a companion application) that automates all those retries in a good way.

I imagine the hardest parts will be interpreting compiler output and (this is where things get really tricky) test output, and translating that into code changes in the existing code base.
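The retry loop described above can be sketched generically; `generate` and `check` are hypothetical hooks standing in for the agent call and the compile-and-test step:

```python
def tdd_loop(generate, check, max_attempts=5):
    """Sketch of a TDD agent's retry loop.

    generate(feedback) asks the agent for code; feedback is None on the
    first attempt, otherwise the compiler/test output to fix.
    check(code) returns (ok, output) from compiling and running tests.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate(feedback)
        ok, output = check(code)
        if ok:
            return code, attempt
        # Feed errors straight back to the agent, no user input needed.
        feedback = output
    raise RuntimeError(f"no passing build after {max_attempts} attempts")
```

The hard part the comment identifies - interpreting compiler and test output - lives inside `generate`, where the agent must turn `feedback` into targeted edits; the loop itself is the easy, mechanical piece.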

by amarant

3/26/2025 at 8:04:12 PM

Yeah, this is a great workflow! What's more, agents are particularly good at writing tests, since they're simpler and mostly linear, so they can even help with that part.

As to your point about automating retries: with my last prototype I played a lot with having agents do multiple parallel implementations, and then picking the first one that works, or letting you choose (or even having another agent choose).
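A minimal sketch of that pick-the-first-that-works step (function names hypothetical; `check` stands in for running a candidate's tests):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def first_passing(candidates, check):
    """Check several candidate implementations in parallel and return
    the first one whose tests pass, or None if none pass."""
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(check, c): c for c in candidates}
        for fut in as_completed(futures):
            if fut.result():
                # Cancel any checks that haven't started yet; already
                # running ones finish on their own.
                for f in futures:
                    f.cancel()
                return futures[fut]
    return None
```

Swapping the `return` for collecting all passing candidates gives the "let the user (or another agent) choose" variant.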

Have you tried any tools that have this workflow down, or at least approach it?

by janpaul123

3/27/2025 at 2:28:11 AM

I have not! But I've often been frustrated when an agent gives me code that doesn't compile, and I keep thinking that should be a solvable problem. One computer program should be able to talk to the other.

by amarant

3/26/2025 at 8:03:19 PM

This is going to sound a bit odd, but I suggest you detail what your tools do well and what they struggle with. For example, I love Haxe, which is a niche programming language primarily for game development.

The vast majority of the time I try to use an llm with it, the code is essentially useless as it will try to invent methods that don't even exist.

For example, if your coding agents are really only good at JavaScript and a little bit of Python, tell me that front and center.

by 999900000999

3/26/2025 at 8:13:25 PM

Good point! In that sense we're similar to most AI coding agents in that the languages we do well are the languages the mainstream LLMs do well. We might zoom in and add really good support for particular languages though (not decided yet), in which case we'll def mention that front and center!

Have you found any LLMs or coding agents that work well with Haxe? It might be a bit too niche for us (again, not sure yet), but I'd be very curious to see what they do well!

by janpaul123

3/26/2025 at 8:45:44 PM

https://www.greptile.com/

This works well; however, it literally needs to digest an entire repository. So, for example, if I feed it a repository for a Haxe framework, it'll work much better than something like ChatGPT.

by 999900000999

3/26/2025 at 9:25:17 PM

Thanks! That does look like a great tool.

by janpaul123

3/26/2025 at 8:38:54 PM

In my unqualified opinion, LLMs would do better at niche languages (or even specific versions of mainstream languages) and niche frameworks if they were better at consulting the documentation for the language or framework. For example, the user could give the LLM a link to the docs or an offline copy, and the LLM would prioritize the docs over pretrained code. Currently this is not feasible because 1. limited context is shared with the actual code, and 2. RAG is one-way injection into the LLM; the LLM usually wouldn't "ask for a specific docs page" even if it probably should.
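The "ask for a specific docs page" gap could in principle be closed with a tool interface. A minimal sketch, with all names and the `DOC:` convention hypothetical, where the model may request a docs page before answering:

```python
def answer_with_docs(model, fetch_docs, question, max_lookups=3):
    """Let the model pull specific docs pages on demand instead of
    relying on one-way RAG injection.

    model(prompt) returns either "DOC:<page>" to request a docs page
    or a final answer; fetch_docs(page) returns that page's text.
    """
    prompt = question
    for _ in range(max_lookups):
        reply = model(prompt)
        if not reply.startswith("DOC:"):
            return reply  # final answer
        page = fetch_docs(reply[len("DOC:"):])
        # Prioritize the fetched docs over pretrained knowledge by
        # putting them ahead of the question in the prompt.
        prompt = f"Docs:\n{page}\n\nQuestion: {question}"
    return model(prompt)
```

This makes the retrieval two-way: the model decides which page it needs, which is exactly what one-shot RAG injection can't do.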

by Zondartul

3/26/2025 at 9:26:51 PM

100% agreed on both points. Point 1 relates to https://news.ycombinator.com/item?id=43486526 as well. It's one of the biggest challenges, though maybe it'll automatically get better through models with bigger context windows (we can't assume that though)?

by janpaul123

3/27/2025 at 3:44:12 AM

Local Agent, 100%.

If I'm just exploring ideas for fun or scratching my own itch, I have no desire to be thinking about a continuous stream of expenditure happening in the background when I have an apple silicon mac with 64GB of ram fully capable of running an agentic stack with tool calling etc.

Please make it trivial to setup and use a llamafile or similar as the LLM for this.

by eutropia

3/27/2025 at 1:27:42 PM

I agree, this would be good to have soon, especially as good models keep getting smaller, and hardware keeps getting cheaper.

by janpaul123

3/26/2025 at 7:41:47 PM

Your timeline is indeed crazy fast. Did you recruit the 9 others in your first week? Did you pitch and secure funding in that week too?

by spankalee

3/26/2025 at 8:05:23 PM

In roughly the last 2 weeks, yes. It helped that everyone involved also activated their network, so we got a multiplicative effect. Can't speak to funding for now unfortunately.

by janpaul123

3/26/2025 at 6:35:11 PM

To be honest, I have yet to use any GenAI tool that makes me feel like it can replace me just writing code (I write this as an engineer turned PM who would really like the promise of GenAI to be true). What I'd actually like to see more than anything is a GenAI "agent" that can act like the /user/ of my software to help me identify gaps in documentation as the software changes and the documentation drifts/becomes stale, and generally help me to explore code paths that are off the happy path but will get hit by real users. I think there's a lot more value in having GenAI help me test/document my work than in trying to do my work, because I will always write higher quality code than GenAI can produce.

by tristor

3/26/2025 at 6:37:14 PM

Totally agree!

by janpaul123

3/26/2025 at 5:48:09 PM

I usually support everything, but isn't this literally just "we are trying to fork Roo Code and pay for $15 of your tokens so we can show VCs that we have users"? As in, people like free money. But that wouldn't be enough of a bribe to justify using the fork over the real project, for me at least.

by bluelightning2k

3/26/2025 at 5:56:39 PM

Our backers have no interest in fake metrics. ;) It's a good way to quickly get feedback, which is key to our strategy. Totally fine to keep using Roo Code (or Cline) of course!

by janpaul123

3/26/2025 at 7:40:39 PM

>We don't take any cut, either per token or per top-up. In the future we'll add more LLM providers.

So where does the money come from?

by quikoa

3/26/2025 at 8:21:43 PM

At this point we plan to monetize enterprise features (LDAP login, things like that).

by janpaul123

3/27/2025 at 12:55:07 AM

> Since then I’ve been thinking a lot about AI agents. They’re the closest I’ve seen to the dream of “programming for all”

Programming is programming for all; you just have to put some effort in. This is like saying you wish there was a 'Spanish for all', so you invented Google Translate.

by yapyap

3/26/2025 at 6:29:41 PM

Their approach seems very compelling, but I don't understand if/how they are building a differentiated product? The space of code agents is already pretty crowded.

by cpldcpu

3/26/2025 at 6:33:59 PM

We’ll take all the features people love in other products, and implement them in a coherent package as quickly as we can.

by janpaul123

3/26/2025 at 6:51:05 PM

Is this a real open source project or a pretend 'source (maybe-kinda) available' kinda thing where the really useful part is stuffed behind a paywall and the 'open source' part is just to lure you into the walled garden?

by rounce

3/27/2025 at 7:10:52 AM

How is it different from Roo Code or GitHub Copilot?

by jiri

3/26/2025 at 5:33:11 PM

>This success taught me an important lesson: an extremely fast-moving community can achieve incredible things

what a trite observation

by suddenlybananas

3/26/2025 at 5:58:18 PM

And yet few do it!

by janpaul123

3/26/2025 at 8:12:34 PM

Is that true? Plenty of fast-moving communities have achieved amazing things, both in computing and outside of it.

by achierius

3/26/2025 at 8:16:58 PM

Still a small percentage! Let's get more of this happening.

by janpaul123

3/26/2025 at 6:31:01 PM

"Our goal is to rapidly make the software better, not to have a shiny website."

Weird flex, but OK.

by handfuloflight

3/26/2025 at 6:32:37 PM

Excellent flex. The purpose of a thing is what it does.

by rpmisms

3/26/2025 at 6:34:58 PM

More context: the statement is in reply to the question they posed on their own site: "Why is this website so ugly?"

First, I don't think the website is ugly per se. Second, the weird flex is the assumption that any website with more effort put into it than theirs counts as "a shiny website."

Design aside, there are absolutely no statements about what makes this product differentiated. So it doesn't even succeed on its own terms.

by handfuloflight

3/26/2025 at 5:55:41 PM

[flagged]

by oulipo

3/26/2025 at 6:06:41 PM

[flagged]

by hsuduebc2

3/26/2025 at 6:09:31 PM

[flagged]

by oulipo

3/26/2025 at 7:34:48 PM

[flagged]

by lnenad

3/26/2025 at 5:40:14 PM

What is so problematic about Silicon Valley now is that the true lesson of hackers has been completely lost.

The primary purpose of the hacker mindset was protection against groupthink and cargo-culting. And now it seems people in tech only cargo-cult and groupthink.

by ilrwbwrkhv

3/26/2025 at 5:47:47 PM

That ship sailed the moment the super-individualist hacker turned into the 10x programmer.

by hengheng

3/26/2025 at 7:07:11 PM

A true super-individualist hacker won't use AI tools. :-)

by NoOn3

3/26/2025 at 5:40:32 PM

$15 worth of Claude 3.7 tokens? Does this translate to 3 “hello world” files?

by heymax054

3/28/2025 at 6:00:42 PM

mmm... like a whole Harry Potter series

by ofou