3/31/2025 at 7:34:18 PM
I want to make sure I am understanding this. If I have an application that uses OpenAI models, then this service can act as a proxy between my application and the actual OpenAI service, logging all of the requests that get sent to the OpenAI API. At some later time, I can go through, choose a subset of the API calls, and mark them (I'm guessing as good or bad), and these get converted into a training set.

I then have to create a value function as its own API that I run on my own servers somewhere (like fly.io). Then I start a training run, which I assume uses some open-source AI model to regenerate responses to the training set derived from my initial OpenAI API calls. It takes the generated responses from that open-source model, sends them to my value function API, which scores them, and then uses that score to apply some RL magic to the base open-source model.

At the end of this process I have an open-source model that has been RL trained based on the captured API calls as well as the scoring from the value function.
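If I have that right, I'd guess the client-side integration is just pointing the existing OpenAI SDK at their proxy's base URL so every call gets captured. A minimal sketch of what I'm imagining (the proxy URL here is made up):

    # Hypothetical: swap the base URL so requests flow through the logging proxy.
    from openai import OpenAI

    client = OpenAI(base_url="https://proxy.example.com/v1")  # instead of api.openai.com
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this support ticket..."}],
    )

And the value function would then be a tiny HTTP service I host myself (on fly.io or wherever) that takes a candidate completion and returns a score. Again, a rough sketch with a request/response shape I'm assuming rather than one I've seen documented:

    # Toy value function service: grades a completion and returns a reward.
    import json
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route("/score", methods=["POST"])
    def score():
        body = request.get_json()
        completion = body["completion"]  # candidate response from the model being trained
        # Toy rule: my task expects well-formed JSON, so reward parseable output.
        try:
            json.loads(completion)
            reward = 1.0
        except ValueError:
            reward = 0.0
        return jsonify({"reward": reward})

    if __name__ == "__main__":
        app.run(port=8080)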
I suppose the argument here is that an RL-trained open-source model will perform your task better than the base OpenAI model. So your target market is people who are already using the OpenAI API, who have the desire and funds to experiment with RL, who are capable of defining a value function, who are able to sift through their API calls to identify the ones that aren't performing well and isolate them, and who are willing to swap out their OpenAI model for an RL-trained open-source model if it can be shown to be more accurate.
I would guess this market exists and the need is real. Defining a value function is much easier than building the infrastructure to RL a variety of open-source models, so someone who wants to do this may appreciate paying someone else who has already set up the infrastructure. And since they don't want to host their own model (they're already paying for OpenAI model hosting), maybe they have no problem paying you for inference as well.
Whether or not this succeeds as a business really depends on how effective RL is for the clients you find. There are two paths here: RL is wildly successful, and therefore so are you. Or RL fine-tuning is unable to keep up with foundation model advancements, and clients will learn it is better to wait it out on the big fellas rather than go through a time-consuming and costly process.
by zoogeny
3/31/2025 at 8:12:00 PM
Wow! Thanks for taking the time to think through it. Yes, you are exactly right! I couldn't have described Augento better than this myself. We actually want to make writing a reward function completely optional and build some RLHF (Reinforcement Learning from Human Feedback) loop soon. One of our long-term goals is to bring the cost of RL down so the barrier to entry for fine-tuning big models is not as high as it currently is.
by Zollerboy1
3/31/2025 at 8:13:20 PM
I agree with you that the market exists and, as a result, solutions to this problem also exist in abundance. The most difficult part about building a product like the one presented here is making something super generic that works for a wide swath of use cases. If you simplify the stack to a more bespoke/custom approach, the build burden decreases exponentially.

For the folks who are already technical in this vertical, especially ones that leverage a low-cardinality architecture (one or two models, a small subset of tasks, etc.), this type of thing is quite easy to build yourself first as a working prototype, and then only slightly more difficult to productionize and automate.
I have some in-house infra that does similar work: it monitors inputs and outputs from models, puts them in a UI for a human to score/rank, preps a DPO dataset for training (rough sketch below), and kicks off a training run. The total calendar time I spent from prototype to production was roughly two person-weeks. Changing the human intervention mechanism to an automated reward function would be an hour or two of work. If I had to make this work for all types of users, tasks, and models, there's no shot I'd have the time personally to pull that off with any reasonable velocity.
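For reference, the dataset-prep step is not much more than this. A trimmed-down sketch (the field names are from my own schema, not any standard):

    # Turn human-scored completions into DPO (chosen, rejected) preference pairs.
    import json
    from collections import defaultdict

    def build_dpo_dataset(scored_logs, out_path):
        by_prompt = defaultdict(list)
        for row in scored_logs:  # each row: {"prompt", "completion", "score"}
            by_prompt[row["prompt"]].append(row)
        with open(out_path, "w") as f:
            for prompt, rows in by_prompt.items():
                if len(rows) < 2:
                    continue  # need two+ samples per prompt to form a pair
                rows.sort(key=lambda r: r["score"], reverse=True)
                f.write(json.dumps({
                    "prompt": prompt,
                    "chosen": rows[0]["completion"],    # highest-scored response
                    "rejected": rows[-1]["completion"],  # lowest-scored response
                }) + "\n")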
With that said, having a nice UI with great observability into the whole process is a pretty big value-add to get out of the box as well.
(EDIT: for clarity, not affiliated at all with the OP project/org)
by spmurrayzzz
4/1/2025 at 5:50:55 AM
Does it mean that after I successfully train the open-source model, I don't need OpenAI anymore?
by _ink_
4/1/2025 at 6:02:29 AM
Yes, indeed
by lukasego