5/22/2025 at 7:09:13 PM
I don't normally leave feedback on these posts, so I apologise if I get the tone wrong and this comes across as overly harsh.
1. If you're going to record a demo, invest $100 in a real microphone. The sound quality of the Loom demo is really off-putting. It might also be over-compression, but this kind of sound gives me a headache to listen to.
2. The demo has left me more confused. Rather than going step by step, you take the Blue Peter approach of "here's one I made earlier", and suddenly you're switching tabs to something different. Show me the product in action.
I guess I'm not in the market for this, but it feels UI-heavy for something that's evaluating agents / infrastructure-as-code. I'd have thought that if I were going to not just automate something but also automate the evaluation of that automation, I'd want a pipeline / process for it, rather than scanning down the criteria trying to work out which blog posts are which and how the scores relate.
by eterm
5/22/2025 at 9:16:42 PM
Thanks for your feedback on the video. Great point about going step by step instead of switching mid-stream to a pre-built session :). We have another, even simpler version that goes slower, step by step, which might have been better for this post. The challenge has been balancing showcasing the wide feature set against duration.

We have a spreadsheet integration (which I might post as a comment) for the use case you mentioned. The scorer is quite lightweight, so it's easy to integrate into your existing pipelines instead of building yet another pipeline/framework. The co-pilot is specifically for triangulating the right set of metrics (which are subjective, based on your taste), and that does require looking at examples a few at a time and making a judgement call. But I agree that once you're done with that, you want to quickly transition off of this to either code or other frameworks like sheets, promptfoo, etc.
by achint_withpi
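[Editor's sketch] To make the "integrate the scorer into your existing pipeline" point concrete, here is a minimal, hypothetical Python sketch of dropping a lightweight scorer into an eval loop you already have. The score_response function, its weighted-criteria signature, and the sample data are placeholders for illustration, not the actual Pi API or its real behaviour.

```python
# Hypothetical sketch: call a lightweight scorer from an existing eval loop
# instead of adopting a separate evaluation framework.
# score_response() is a toy stand-in, NOT the real scoring API.

def score_response(response: str, criteria: dict[str, float]) -> float:
    """Toy scorer: weighted score in [0, 1] based on which criterion names
    appear in the response. A real scorer would call the hosted service."""
    if not criteria:
        return 0.0
    hits = [weight for name, weight in criteria.items()
            if name.lower() in response.lower()]
    return sum(hits) / sum(criteria.values())

# Criteria you might have "triangulated" with the co-pilot: name -> weight.
criteria = {"accuracy": 0.5, "brevity": 0.2, "cites sources": 0.3}

# Existing pipeline output: (prompt, model_output) pairs your agent or
# IaC automation already logs.
runs = [
    ("Summarise the incident report",
     "Accuracy first: the outage lasted 12 minutes and affected two regions."),
    ("Generate the Terraform plan summary",
     "Plan adds 3 resources, changes 1, destroys 0."),
]

for prompt, output in runs:
    print(f"{prompt!r} -> score {score_response(output, criteria):.2f}")
```

The same call could equally be wrapped as a custom assertion in an existing harness (promptfoo, a spreadsheet export, or plain CI), which is the "no new framework" point the reply is making.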