5/21/2025 at 2:30:10 AM
Hi! I have a WIP of this over at https://talktrainer.app/ -- I just added Dutch to it.
It uses OpenAI's realtime API to simulate either a tutoring session (the speaker will revert to English to help you) or a first date or business meeting (the speaker will always speak the target language).
You can see the AI's transcriptions but not your own; that's a limitation of the current OpenAI API, but definitely something I can fix.
The prompts are like this: https://gist.github.com/jc4p/d8b9d121425ec191d62602d8720eeed... and the rest of it is a Next.js app wrapped around the WebRTC connection.
I'm not fully in love with the app so I'd love any feedback or hearing if it works well for you -- It doesn't have a lot of features yet (including saving context) and if you bump into the time limit just open it up in incognito to keep going.
by jc4p
5/21/2025 at 9:04:59 AM
This is great! Maybe some more tourist-related scenarios, like "ordering at restaurant", "resolving dispute about rental car crash" etc? :-)
The "next level" feature would be to get it to speak even simpler, with some hints about how to reply, for the beginners. I don't know how that would ideally look, but maybe a button to pop up some "key words" or phrases that one could use? (Even so, I found myself using the little I know, so it's obviously somehow working even though my knowledge is extremely basic.)
This is one of the places where I feel LLMs can do something good for the world, giving a safe playground for getting experience with speaking new languages without the anxiety of performing badly in front of other people – and hopefully making it easier to connect with real people in that language later.
by internet_points
5/21/2025 at 4:39:55 AM
This is really impressive! Great job.
One small piece of feedback… There were a couple times where I asked to learn something, and it asked me to repeat a phrase back, which was great. But when I repeated it back, I know I didn’t quite nail it (eg perhaps said “un” instead of “una”) and rather than correcting me, it actually told me I did it perfectly. Maybe there’s some tuning with the prompts that may help turn down the natural sycophancy of the model and make sure it’s a little more strict.
Keep up the great work!
by rowborg
5/21/2025 at 2:45:41 PM
One modification I would suggest is to add a bit more to the initial prompt like:
"write as if you are a person from {{REGION}}. Modify your language to proficiency level {{PROFICIENCY_LEVEL}}"
that way I could for example, speak as if it's someone using Mexican Spanish vs Madrid Spanish vs Chilean Spanish, etc.
Secondly, you could include the user's speech transcribed as part of the conversation window
by sampleuser58
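A minimal sketch of the kind of template suggested above, assuming the region and proficiency level are optional values chosen by the user. The function name, the option names, and the exact wording of the prompt are hypothetical and are not taken from the app's actual prompts.

```typescript
// Hypothetical sketch: option names and prompt wording are assumptions,
// not the app's real prompt.
type Proficiency = "beginner" | "intermediate" | "advanced";

interface PersonaOptions {
  language: string;          // target language, e.g. "Spanish"
  region?: string;           // optional regional variety, e.g. "Mexico"
  proficiency?: Proficiency; // optional learner level
}

// Build the persona part of the prompt, adding a sentence only for
// the options the user actually picked.
function buildPersonaPrompt(opts: PersonaOptions): string {
  const lines = [`You are a conversation partner speaking ${opts.language}.`];
  if (opts.region) {
    lines.push(`Speak as if you are a person from ${opts.region}.`);
  }
  if (opts.proficiency) {
    lines.push(`Adjust your language to proficiency level ${opts.proficiency}.`);
  }
  return lines.join(" ");
}

console.log(
  buildPersonaPrompt({ language: "Spanish", region: "Mexico", proficiency: "beginner" })
);
```

Keeping both fields optional means a plain language choice still produces a usable prompt when nothing else is selected.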
5/21/2025 at 7:15:46 PM
Amazing idea! Do you think this should be a freeform text field where users can add their own prompts, or a checkbox/select on the homepage so they can pick from a limited set?
by jc4p
5/21/2025 at 8:03:35 PM
I think a drop down when you first choose the language, and it can be optional. You can test it with a few languages at first, to see how it is.
by sampleuser58
5/21/2025 at 2:41:36 PM
Bit of feedback:
I learned Japanese a while back but haven't practised in a long time.
1. it would be awesome if this could transcribe what I just said in Japanese, to be sure that it got me
2. I don't know kanji that well, so reading is hard; having a button to have the AI repeat the sentence would be quite useful.
Other than that, I could definitely use something like that for practice
by d--b
5/21/2025 at 6:25:06 AM
Did you just add Dutch as per the submitter’s request or was it part of your plan prior?
Curious because I’m trying to learn Romanian, and since it’s a less common language there are fewer resources available. So I wasn’t sure if you added Dutch with minimal amount of effort following the poster’s request.
That said, I gave your app a try with Spanish and it looks pretty good! But I didn’t see a Help page to clarify how I’m “supposed” to interact. Eg I tried saying in English “I don’t understand” (even though I know how to say that in Spanish) and it responded in Spanish, which may be hard for absolute beginners. Although full immersion is a much better way to learn.
I can try playing around more with it to give you some feedback.
by jeffwass
5/21/2025 at 8:16:46 AM
> Eg I tried saying in English “I don’t understand” (even though I know how to say that in Spanish) and it responded in Spanish which may be hard for absolute beginners.
I tried to use ChatGPT as a "live" translator with my in-laws and I noticed it is extremely bad at language "consistency" or at understanding your intent when it comes to multiple languages.
It will sometimes respond in English when you talk to it in the foreign language, it will sometimes assume that a clear instruction like "repeat the last sentence" needs to be translated, etc.
I don't know how the person above is approaching the problem but your experience is consistent with mine and I don't think GenAI models (at least OpenAI ones) are suitable for the task.
by iLoveOncall
5/21/2025 at 7:14:52 PM
I just added Romanian for you -- here's the entire diff for adding a new language (as long as it's in OpenAI's training data) -- https://images.kasra.codes/romanian_diff.png
Please let me know if it works, and I'll definitely work on adding in instructions for the expected interactivity, thank you!
by jc4p
5/21/2025 at 11:31:24 AM
I'm a native Dutch speaker and tried this out for a bit. It works impressively well although it might be challenging for complete beginners. Maybe you can add an option for the trainer to use more simple language for beginners?
I tried practicing some verb conjugations. The trainer displayed some fill-in-the-blank sentences like "she ... home after class", asking me to conjugate "to walk" in that sentence. However, the audio actually pronounced the full sentence "she walks home after class", giving away the answer.
by gield
5/21/2025 at 2:39:00 PM
Just tried this for Spanish and it works incredibly well. I have been hacking on something similar for translation (it's really quite easy too, just a few prompts), but I was using Google Translate's interface for vocalizing! This is seriously good stuff, really nice work putting it together.
I will probably use something like this for language practice.
by sampleuser58
5/23/2025 at 1:18:38 AM
Please add Mandarin Chinese! :) would love to try this
by fhatfield
5/21/2025 at 4:13:18 AM
I just tried it and it works perfectly. The color scheme and font size could be touched up to look better. Just out of curiosity, is $10/month enough to cover the (unlimited) API cost? Do you have an estimate of what percentage of your users will use more than $10 in API fees each month?
by ciaovietnam
5/21/2025 at 4:39:09 AM
Thanks so much for trying it out! The realtime API is actually very cheap, especially for short connections. For each user who uses it 30 minutes a day, every day of the month, it costs me ~$5, and I assume the average user is going to use it way less than that (although I have 0 users right now haha)
by jc4p
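A back-of-the-envelope check of those numbers, assuming a 30-day month and a cost that scales linearly with minutes used. Both are assumptions for illustration; only the ~$5, the 30 minutes a day, and the $10/month price come from the thread.

```typescript
const monthlyCostUsd = 5;   // from the thread: ~$5 per month for a heavy user
const minutesPerDay = 30;   // from the thread
const subscriptionUsd = 10; // price quoted in the question
const daysPerMonth = 30;    // assumption

// Total minutes such a user spends per month.
const minutesPerMonth = minutesPerDay * daysPerMonth;

// Implied cost per minute, assuming cost is linear in usage.
const costPerMinuteUsd = monthlyCostUsd / minutesPerMonth;

// Daily usage at which API cost would equal the subscription price.
const breakEvenMinutesPerDay = minutesPerDay * (subscriptionUsd / monthlyCostUsd);

console.log(minutesPerMonth);             // 900
console.log(costPerMinuteUsd.toFixed(4)); // "0.0056"
console.log(breakEvenMinutesPerDay);      // 60
```

Under these assumptions, a user would need to talk for about an hour every day before the API cost alone reached the subscription price.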
5/21/2025 at 4:14:26 AM
This is great! Well done.
I've used the realtime API for something similar (also related to practicing speaking, though not for foreign languages). I just wanted to comment that the realtime API will definitely give you the user's transcriptions -- they come back as a `conversation.item.input_audio_transcription.completed` server event. I use it in my app for exactly that purpose.
by valleyer
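A minimal sketch of picking the user's transcript out of the event stream, assuming events arrive as JSON strings (for example on the WebRTC data channel), that the event type string is `conversation.item.input_audio_transcription.completed`, and that it carries a `transcript` field. The function name `extractUserTranscript` is hypothetical.

```typescript
// Shape of the one server event this sketch cares about (assumed fields).
interface TranscriptionCompletedEvent {
  type: "conversation.item.input_audio_transcription.completed";
  item_id: string;
  transcript: string;
}

// Return the user's transcript if `raw` is a transcription-completed
// event, otherwise null. Malformed JSON is treated as "not a match".
function extractUserTranscript(raw: string): string | null {
  let event: unknown;
  try {
    event = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof event !== "object" || event === null) return null;

  const e = event as Partial<TranscriptionCompletedEvent>;
  if (e.type !== "conversation.item.input_audio_transcription.completed") {
    return null;
  }
  return typeof e.transcript === "string" ? e.transcript : null;
}
```

In a browser this could be called from the data channel's `message` handler, appending any non-null result to the conversation view next to the AI's own transcript.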
5/21/2025 at 4:41:25 AM
Thank you so much!! While the transcription is technically in the API, it's not a native part of the model and runs through Whisper separately. In my testing with it I often end up with a transcription that's in a different language than what the user is speaking, and the current API has no way to force a language on the internal Whisper call.
Even when the language is correct, a lot of the time the exact text isn't 100% accurate, and even when it is, it comes in slower than the audio output and not in real time. All in all not what I would consider feature ready to release in my app.
What I've been thinking about is switching to a full audio in --> transcribe --> send to LLM --> TTS pipeline, in which case I would be able to show the exact input to the model, but that's way more work than just one single OpenAI API call.
by jc4p
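The audio in --> transcribe --> send to LLM --> TTS pipeline mentioned above could be sketched roughly as follows. The three stages are passed in as functions, so nothing here names a real speech or LLM API; `PipelineStages`, `TurnResult`, and `runTurn` are hypothetical names for illustration.

```typescript
// The three stages, supplied by the caller (all hypothetical interfaces).
interface PipelineStages {
  transcribe(audio: Uint8Array, language: string): Promise<string>;
  respond(userText: string): Promise<string>;
  synthesize(text: string): Promise<Uint8Array>;
}

interface TurnResult {
  userText: string;       // exactly what was sent to the LLM
  replyText: string;      // the LLM's text reply
  replyAudio: Uint8Array; // synthesized speech for the reply
}

// Run one conversational turn through the stages in order.
async function runTurn(
  stages: PipelineStages,
  audio: Uint8Array,
  language: string
): Promise<TurnResult> {
  const userText = await stages.transcribe(audio, language);
  const replyText = await stages.respond(userText);
  const replyAudio = await stages.synthesize(replyText);
  return { userText, replyText, replyAudio };
}
```

Because `userText` is the exact string handed to the model, the UI can show the user's side of the conversation without a separate transcription call. The trade-off is that each turn waits on three sequential steps.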
5/23/2025 at 12:30:22 AM
Heyo, I work on the realtime api, this is a very cool app!
With transcription I would recommend trying out "gpt-4o-transcribe" or "gpt-4o-mini-transcribe" models, which will be more accurate than "whisper-1". On any model you can set the language parameter, see docs here: https://platform.openai.com/docs/api-reference/realtime-clie.... This doesn't guarantee ordering relative to the rest of the response, but the idea is to optimize for conversational-feeling latency. Hope this is helpful.
by pbbakkum
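A minimal sketch of what that configuration might look like as a client event, assuming the transcription settings live under `session.input_audio_transcription` in a `session.update` event and that `language` takes a short code such as "nl". The field layout is an assumption based on one reading of the realtime API reference and should be checked against the linked docs; `buildSessionUpdate` is a hypothetical helper.

```typescript
interface SessionUpdateEvent {
  type: "session.update";
  session: {
    input_audio_transcription: {
      model: string;    // e.g. "gpt-4o-transcribe"
      language: string; // assumed to be a short language code, e.g. "nl"
    };
  };
}

// Build the event that asks for user-audio transcription with a
// specific model and a fixed language.
function buildSessionUpdate(model: string, language: string): SessionUpdateEvent {
  return {
    type: "session.update",
    session: {
      input_audio_transcription: { model, language },
    },
  };
}

// Serialized form, ready to send over the realtime connection.
const payload = JSON.stringify(buildSessionUpdate("gpt-4o-transcribe", "nl"));
console.log(payload);
```

Pinning the language this way targets the problem described earlier in the thread, where the transcript sometimes came back in a different language than the one spoken.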
5/21/2025 at 5:33:34 AM
Ah yes, I've seen that occasionally too, but it hasn't been a big enough issue for me to block adoption in a non-productized tool.
I actually implemented the STT -> LLM -> TTS pipeline, too, and I allow users to switch between them. It's far less interactive, but it also gives much higher quality responses.
Best of luck!
by valleyer
5/21/2025 at 8:09:53 AM
[dead]
by altern8