1/15/2025 at 8:50:03 PM
I tried this yesterday, asking it to create a simple daily reminder task, which it happily did. Then when the time came and went, I simply got a chat message that the task failed, with no explanation of why or how. When I asked it why, it hallucinated that I had too many tasks (I only had the one). So now I don't know why it failed or how to fix it. Which leads to two related observations:

1) I find it interesting that the LLM rarely seems trained to understand its own features, your account, or how the LLM itself works. It seems strange that it has no idea about its own support.
2) Which leads me to the OpenAI support docs[0]. It seems pretty telling to me that they use old-school search and not an LLM for their own help docs, right?
by UmYeahNo
1/15/2025 at 9:20:31 PM
Same experience, except mine insisted I had no tasks.

It does say it's a beta on the label, but the thing inside doesn't seem to know that, nor what it's supposed to know. Your point 1, for sure.
Point 2 is a SaaS from before LLMs+RAG beat the normal approach. Status page: a SaaS. API membership, metrics, and billing: a SaaS. These are all undifferentiated, but arguably they selected quite well when the selections were made, and unless better help is going to sell more users, they shouldn't spend time on undifferentiated heavy lifting.
by Terretta
1/16/2025 at 3:14:05 PM
> It seems pretty telling to me that they use old-school search and not an LLM for their own help docs, right?

Just not a priority, most likely. Check out the docs search by Mintlify to see a very well built implementation.
Example docs site that uses it: https://docs.browserbase.com
by reustle
1/16/2025 at 4:42:38 AM
Re: 2 — for the same reason that you shouldn't host your site's status page on the same infrastructure that hosts your site (if people want to see your status page, that probably means your infra is broken), I would guess that OpenAI think that if you're looking at the support docs, it might be because the AI service is currently broken.
by derefr
1/16/2025 at 12:38:14 AM
You can hardly blame a product for not doing something that we don't know for certain to be possible.
by fooker
1/15/2025 at 9:21:44 PM
I've thought about this a lot too. My guess is that because foundational models take a lot to train, they aren't trained very often, and from my experience you can't easily train in new data, so you'd have to have some little up-to-date side system. And I suspect they're very thoughtful about which "side systems" they add: from trying to build some agent orchestration stuff myself, nothing ends up being as simple as I expect with "side systems", and stuff easily goes off the rails. So my thought was: given the scale they're dealing with, this is probably a low-priority, not particularly easy feature.
by neom
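A minimal sketch of the kind of "side system" neom describes, assuming the OpenAI Python SDK's function-calling interface; the lookup_feature_docs helper and the FAQ contents are hypothetical, not OpenAI's actual design:

    import json
    from openai import OpenAI

    client = OpenAI()

    # Hypothetical "side system": a small, human-maintained FAQ that can be
    # updated daily, so answers don't depend on stale training data.
    FEATURE_FAQ = {
        "tasks": "Tasks (beta) lets ChatGPT schedule reminders; see help.openai.com for current limits.",
    }

    def lookup_feature_docs(feature: str) -> str:
        return FEATURE_FAQ.get(feature.lower(), "No entry for that feature.")

    tools = [{
        "type": "function",
        "function": {
            "name": "lookup_feature_docs",
            "description": "Look up current, human-maintained docs for a ChatGPT feature.",
            "parameters": {
                "type": "object",
                "properties": {"feature": {"type": "string"}},
                "required": ["feature"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is 4o with Tasks?"}],
        tools=tools,
    )

    # If the model chose to call the tool, answer from the side system
    # rather than from (possibly stale) weights.
    calls = resp.choices[0].message.tool_calls
    if calls:
        args = json.loads(calls[0].function.arguments)
        print(lookup_feature_docs(args["feature"]))

Even a toy version shows where the upkeep burden lands: the FAQ has to be maintained by hand, which is neom's point about side systems being less simple than they look.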
1/15/2025 at 10:40:52 PM
> So my thought was: given the scale they're dealing with, this is probably a low-priority, not particularly easy feature.

"Working like OpenAI said it should" is a weird thing to put at low priority. Why do they continuously put out features that break and bug out? I'm tired of stochastic outputs and being told that we should accept sub-90% success rates.

At their scale, being less than 99.99% right results in thousands of problems. So their scale and the outsized impact of their statistical bugs are part of the issue.
by miltonlost
1/15/2025 at 10:50:12 PM
Why are you setting your bar this way? Is it because of how they do their feature releases (no warning of it being an alpha or beta feature)? Their product, ChatGPT, was released two years ago, and it's a fairly complicated product. My understanding is that the whole thing is still a pretty early product generally. It doesn't seem unusual for any startup doing something as big as they are to release features that don't have all the kinks ironed out. I've released some kinda janky features to 100,000s of users before, not totally knowing how they were going to perform at that scale, and I don't think that is very controversial in product development.

Also, I was specifically talking about it being able to understand the features it has in my earlier comment; I don't think that is the same problem as the remind-me feature not working consistently.
by neom
1/15/2025 at 11:07:09 PM
> I've released some kinda janky features to 100,000s of users before, not totally knowing how they were going to perform at that scale, and I don't think that is very controversial in product development.

Oh, that's because modern-day product development of "ship fast, break things" is its own problem. The whole tech industry is built on principles that are antithetical to the profession of engineering. It's not controversial in product development because the people doing the development all decided to loosen their morals and think it's fine to release broken things and fix them later.
That my bar is high and OpenAI's is so low is its own issue. But then again, I haven't released a product that could randomly tell people to poison themselves by combining noxious chemicals, or whatever other dangerous hallucination ChatGPT spews. If I had engineered something like that, with the opportunity to harm people and no way to guarantee it wouldn't, if I had engineered something that could create misinformation at scale, I would have trouble sleeping...
by miltonlost
1/16/2025 at 12:31:18 PM
So what's your plan? Opt out of ever using the products? You're a hypocrite if you continue to use them with a stance like that.
by neom
1/15/2025 at 10:14:45 PM
I regularly use Perplexity and Cursor, which can search the internet and documentation to answer questions that aren't in their training data. It doesn't seem that hard for ChatGPT to search and summarize OpenAI's own docs when people ask about them.
by yosito
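A minimal sketch of the search-and-summarize approach yosito describes, assuming a local text dump of the help-center pages; the directory layout and crude keyword scoring are illustrative only:

    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()

    def top_matching_pages(query: str, docs_dir: str = "help_docs", k: int = 3) -> str:
        # Crude keyword scoring; a real system would use embeddings, but even
        # this grounds answers in current docs instead of stale weights.
        terms = query.lower().split()
        scored = []
        for page in Path(docs_dir).glob("*.txt"):
            text = page.read_text()
            scored.append((sum(text.lower().count(t) for t in terms), text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return "\n---\n".join(text for _, text in scored[:k])

    question = "Why did my scheduled task fail?"
    context = top_matching_pages(question)

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer only from these docs:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    print(resp.choices[0].message.content)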
1/15/2025 at 10:19:42 PM
You would want a feature like "self-aware" to be pretty canonical, not based on a web search. And even if they had a discrete internal side system it could query that you controlled, if the training data was a year old, how would you keep the two matched from a systems point of view over time? Also, it's unclear how the model would interpret the data each time it ran with the new context. It seems like a pretty complicated system to build, tbh, especially when maintaining human-created help docs and FAQs is A LOT simpler and a more reliable source of truth. That said, my understanding is that behind the scenes they are working towards the product we experience being built around the foundational model, not being THE foundational model as it pretty much is today. Once they have a bunch of smaller LLMs that do discrete standard tasks set up, I would guess they will become considerably more "aware".
by neom
1/15/2025 at 10:22:50 PM
Now imagine giving this "agent" a task like booking a table at a restaurant or similar.

"Yeah sure I got you a table at a nice restaurant. Don’t worry."
by baxtr
1/16/2025 at 12:04:57 AM
> 2) Which leads me to the OpenAI support docs[0]. It seems pretty telling to me that they use old-school search and not an LLM for their own help docs, right?

I agree, but then again, if you're a dev in this space, presumably you know what keywords to use to refine your search. RAG'ed search implies that the user (dev) is not "in the know".
by behnamoh
1/16/2025 at 1:57:04 AM
> it hallucinated that I had too many tasks.

How do you know it hallucinated? Maybe your task was one too many and it is only able to handle zero tasks (which would appear to be true in your case).
by varispeed
1/15/2025 at 10:10:46 PM
Buggy af right now: 95% of tasks failed, and I get a ton of emails about it.
by m3kw9
1/15/2025 at 10:59:15 PM
Very, very, very buggy, and it really looks extremely low-effort, as with many OpenAI feature rollouts. Nothing wrong with an MVP feature, but make it at least do what it's supposed to do, and maybe give it 10% more extensibility than the bare bones.
by ProofHouse
1/16/2025 at 1:58:17 AM
I question the same things frequently. I routinely ask ChatGPT to help me understand the OpenAI API documentation and how to use it, and it's rarely helpful; it frequently tells me things that are just blatantly untrue. At least nowadays I can link it directly to the documentation for it to read.

But I don't understand why their own documentation and products, and lots of examples using them, wouldn't be the number one thing they would want to train the models on (or fine-tune on, or at least make available through a tool).
by netcraft
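On the training side of netcraft's question, a minimal sketch of packaging docs Q&A into OpenAI's chat fine-tuning JSONL format; the Q&A pairs are invented, and whether OpenAI does anything like this internally is unknown:

    import json

    # Hypothetical Q&A pairs distilled from the product docs.
    qa_pairs = [
        ("What is 4o with Tasks?",
         "A beta feature that lets ChatGPT schedule reminders and recurring actions."),
        ("How many scheduled tasks can I have?",
         "Check the help center for current limits; they may change during the beta."),
    ]

    # One JSON object per line, in the chat fine-tuning format.
    with open("docs_finetune.jsonl", "w") as f:
        for question, answer in qa_pairs:
            f.write(json.dumps({"messages": [
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]}) + "\n")

The catch, as neom notes above, is that fine-tuned knowledge goes stale as fast as the docs change, which is why a retrieval tool is usually the more maintainable option.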
1/16/2025 at 3:42:10 AM
You mean converting $20 monthly subscribers into less profitable API users?by _factor
1/16/2025 at 7:45:07 AM
Wait so... they made the LLM itself control the scheduling?

Yeah that's not gonna end well. I thought they, of all people, would know the limitations and problems.
by Mo3
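A minimal sketch of the split Mo3 is implying: the model only proposes a schedule as structured output, and ordinary deterministic code validates and executes it, so a hallucination can't silently fire or drop a task. The JSON schema here is invented for illustration:

    import json
    import sched
    import time
    from datetime import datetime

    # Pretend this JSON came back from the model; it proposes the schedule
    # but never touches the scheduler itself.
    model_output = '{"run_at": "2025-01-16T09:00:00", "message": "Take your medication"}'

    task = json.loads(model_output)                  # raises on malformed JSON
    run_at = datetime.fromisoformat(task["run_at"])  # raises on a bad timestamp

    scheduler = sched.scheduler(time.time, time.sleep)
    scheduler.enterabs(run_at.timestamp(), 1, print, argument=(task["message"],))
    scheduler.run()  # blocks until the reminder fires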
1/15/2025 at 10:03:18 PM
Yeah, I saw the 4o with Tasks today, tried it, and asked "what is 4o with Tasks"; it had no idea. I had to set it to web search mode to figure it out.
1/16/2025 at 12:40:29 AM
If you ask me to describe how a human brain works, I'll have no idea and would have to search the web to get an (incomplete) idea.
by fooker