Show HN: Turn native language audio into flashcards and shadowing practice

6/25/2026 at 3:48:46 PM

Very cool!

Are you willing to share more technical details?

- Which data sources do you ingest?

- How do you transform and enrich the data? How does your pipeline look?

- What are your key challenges?

- Which tools do you use? What is your 'stack'? (Stanza, wordfreq, Whisper, wn, ...)

Background: I am currently building a multi-lang vocabulary hub for language learning. The goal is to match core words/lemmas to their senses/concepts, and then be able to generate multi-language flash cards.

I am still stuck on the sense alignment and fingerprinting (example: should 'to shop', 'einkaufen', 'alışveriş yapmak' and 'go shopping' point to the same concept of 'shop'?), but at a later stage I want to allow user submissions and data enrichment for IPA, pictograms [1] and audio.

[1: https://arasaac.org/pictograms/search]
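
To make the sense-alignment question concrete, here is a minimal sketch of one possible data shape: every (language, lemma) pair points at a language-neutral concept. The names `Concept`, `ConceptIndex` and the ID `c:shop.buy` are made up for illustration, and deciding which lemmas belong together is still the hard part this does not solve.

```python
from dataclasses import dataclass, field


def normalize(lemma: str) -> str:
    """Casefold and collapse whitespace so lookups are stable."""
    return " ".join(lemma.casefold().split())


@dataclass
class Concept:
    """One language-neutral sense with its lexicalizations per language."""
    concept_id: str  # hypothetical ID scheme
    gloss: str
    lemmas: dict[str, set[str]] = field(default_factory=dict)

    def add(self, lang: str, lemma: str) -> None:
        self.lemmas.setdefault(lang, set()).add(normalize(lemma))


class ConceptIndex:
    """Reverse index from (language, lemma) to concept IDs."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], set[str]] = {}

    def register(self, concept: Concept) -> None:
        for lang, lemmas in concept.lemmas.items():
            for lemma in lemmas:
                self._by_key.setdefault((lang, lemma), set()).add(concept.concept_id)

    def lookup(self, lang: str, lemma: str) -> set[str]:
        # A lemma may map to several concepts (polysemy), hence a set.
        return set(self._by_key.get((lang, normalize(lemma)), set()))


shop = Concept("c:shop.buy", "to buy goods in shops")
for lang, lemma in [("en", "to shop"), ("en", "go shopping"),
                    ("de", "einkaufen"), ("tr", "alışveriş yapmak")]:
    shop.add(lang, lemma)

index = ConceptIndex()
index.register(shop)
```

With this shape, a flashcard generator only has to walk `shop.lemmas` to produce one card per language pair.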

Use-case (the dream): I come back from language class, I input new vocab and I output new Anki cards that work across all my fluent languages.

Currently, I mostly find myself knee-deep in problems of linguistics, NLP, Python and getting an LLM to do exactly what I want. At the same time it is a super fun project, and really makes me feel the joy of programming again. LLMs are magic, time just flies by, and all the random projects I always wanted to do suddenly materialize.

For coding, I mostly use free Gemini and some deepseek-v4-flash via openrouter to keep a tight oversight and understand the problem space. Maybe this slows me down, but agentic code just does not align with me. Overall, I haven't spent more than 2 € in total.

So far, surprisingly, the biggest problem is the lack of high-quality, free input data (example: English has the Oxford 5000 words as core vocabulary, but it is difficult to find the same for e.g. Turkish).

2nd place is the lack of high-quality synsets/wordnets (cross-language is mostly incomplete), and 3rd place is getting LLMs to reliably play to their strengths (on paper, an LLM is the perfect tool to provide multi-lang sense equivalents).

I plan to do a full writeup sometime, but first I need it to work :)

by altgans

6/25/2026 at 6:07:41 PM

Thanks! As far as I understand, your idea is to start from the word and pull examples from some huge data source. My approach is the other way round: I start from a source (the audio that you want to learn), and the tool extracts only the words that appear in it, with their meaning in that context. I think that hugely simplifies the implementation, and it is more useful for learners. They learn the meaning in a particular context.

As for the stack: STT with Soniox (word-level timestamps), then spaCy for segmentation, POS and lemmas, then AI enrichment, which also corrects the lemma when spaCy is wrong. Some languages have no spaCy model at all and others are unreliable. For those I am trying to do the spaCy step in an LLM instead. Plus some extra magic for Japanese and Chinese.
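
As an illustration of the first pipeline step, here is a rough sketch of how word-level timestamps could be grouped into sentence-sized chunks for shadowing. The `Word` shape, the field names and the 800 ms pause threshold are assumptions for the example, not the actual Soniox output format or the app's real logic.

```python
from dataclasses import dataclass

SENTENCE_END = (".", "!", "?", "。", "！", "？")


@dataclass
class Word:
    text: str
    start_ms: int
    end_ms: int


@dataclass
class Chunk:
    text: str
    start_ms: int
    end_ms: int


def chunk_sentences(words: list[Word], max_gap_ms: int = 800) -> list[Chunk]:
    """Close a chunk at sentence-final punctuation or at a long pause."""
    chunks: list[Chunk] = []
    current: list[Word] = []
    for i, word in enumerate(words):
        current.append(word)
        nxt = words[i + 1] if i + 1 < len(words) else None
        ends_sentence = word.text.endswith(SENTENCE_END)
        long_pause = nxt is not None and nxt.start_ms - word.end_ms > max_gap_ms
        if ends_sentence or long_pause or nxt is None:
            # Joining with spaces suits spaced scripts only; Chinese and
            # Japanese would need "".join instead.
            text = " ".join(w.text for w in current)
            chunks.append(Chunk(text, current[0].start_ms, current[-1].end_ms))
            current = []
    return chunks


words = [Word("Guten", 0, 300), Word("Morgen.", 320, 700),
         Word("Wie", 900, 1000), Word("geht's?", 1020, 1400)]
chunks = chunk_sentences(words)
```

Each chunk keeps the start and end time of its first and last word, which is what a loop or gap player needs to replay just that sentence.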

by alder

6/25/2026 at 6:39:36 PM

Awesome, and yes, totally makes sense -- you are more learner-centric that way.

Having the full sentence context is actually one of the things I have been thinking about a lot -- this helps both the learner as well as the POS detection in Stanza. I always decided against, because I wanted to build agnostic flash-cards.

However, as your approach allows on-the-fly generation of flash cards, you always stay close to the learner progress. I could (e.g.) pick some Gutenberg fairy tales, allow the learner to read them in their target language and provide bi- and omni-directional translations across all languages. Creating flash cards from the source material keeps the learner in progress (context), allows to learn new words step-by-step (discovery), as well as providing a fun learning experience and measurable progress. Similarly, instead of fairy tales, we could use some series in combination with its subtitles. This allows video-progress. Awesome x2!

Sidenote: The awesome part about HN is that I get to chat with like-minded people and directly grasp some new inspiration. Probably I ought to visit some in-person hacker spaces :)

by altgans

6/25/2026 at 4:10:34 PM

OT, but is any work being done anywhere on the Japanese pronunciation problem?

Japanese is often described as using multiple scripts (kanji, kana, numerals, and sometimes the Latin alphabet), and the pronunciation of kanji in particular is not well constrained. This creates tons of homophones and homographs: e.g. "koushou" is shared across more than 20 words, and the character for "life" is said to have more than 150 different readings as part of words.

Even more OT, but the Unicode code space used for Japanese kanji is famously shared with Chinese hanzi, leading to ambiguities.

This situation causes AI-based TTS (and also image generators) trained directly on Unicode text to go weird on kanji, even for simple words such as "tomorrow". Classical pre-LLM Japanese TTS avoids this by operating on generated or manually specified pronunciations, skipping kanji altogether. That does occasionally lead to wrong readings, but it won't lead to the sound generation code producing butchered middle-of-the-road sounds.

It doesn't seem like most, or any, AI TTS systems tackle this problem, but I'm not in that field. Does anyone know the status of this?

by numpad0

6/25/2026 at 2:08:26 PM

I don't know what resolution or display you built this on, but a heads up: the initial impression on my 4K monitor is that everything is incredibly tiny.

by __float

6/25/2026 at 2:29:15 PM

To be honest I haven't tested it on a 4K monitor yet, so I am not surprised. There are two controls above the transcript that change the font size and the line spacing, which should help a bit for now. Something to fix, thanks!

by alder

6/25/2026 at 2:16:51 PM

Is it possible to add traditional characters for Mandarin?

Also, the pinyin for 誰/谁 is coming through as shuí. While this character has two pronunciations, I believe shéi is the more common one.

by jrrv

6/25/2026 at 2:49:38 PM

Thanks! Chinese and Japanese as source languages are still experimental, I did my best to support them but I have to rely on people who actually know the language and this kind of feedback is really useful. I'll look into adding traditional characters and fixing the pinyin.

by alder

6/25/2026 at 2:55:00 PM

No worries, I appreciate the effort. I did go back and listen, and they are indeed pronouncing shéi in the audio too.

I use a Firefox extension to convert simplified to traditional. It looks like it's open source, so that may be of some use to you: https://github.com/tongwentang/tongwentang-extension

There are some clashes that it does not handle, though. For example, 隻 and 只 are both 只 in simplified, and you just have to know which one it is from context, but the extension fails to convert to 隻 where appropriate.

by jrrv

6/25/2026 at 5:23:48 PM

Thanks, really useful extension link. Proper traditional support probably needs a context aware layer, not a plain lookup. I will experiment with additional LLM enrichment. Appreciate you digging into this!
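
To show why a plain lookup is not enough for 只, here is a toy sketch of a context rule layered on top of a one-to-one table. Both the table and the rule are illustrative assumptions: the heuristic (measure word after a numeral or demonstrative) is deliberately naive and misses cases such as 船只, which is where a phrase dictionary or LLM enrichment would come in.

```python
# Toy one-to-one table; a real converter would use a full dictionary.
PLAIN = {"猫": "貓", "鸟": "鳥", "两": "兩", "这": "這", "几": "幾"}

# Characters that typically precede a measure word (illustrative, not exhaustive).
BEFORE_MEASURE_WORD = set("一二三四五六七八九十两几每这那半")


def to_traditional(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        if ch == "只":
            prev = text[i - 1] if i > 0 else ""
            # Measure word (隻) after a numeral or demonstrative,
            # otherwise keep the adverb "only" (只).
            out.append("隻" if prev in BEFORE_MEASURE_WORD else "只")
        else:
            out.append(PLAIN.get(ch, ch))
    return "".join(out)
```

For example, `to_traditional("我只有两只鸟")` keeps the first 只 (adverb) and converts the second one (measure word) to 隻.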

by alder

6/26/2026 at 4:55:17 PM

I deployed a new release with a Simplified/Traditional switch at the top of the workspace. Thanks for the OpenCC link. Please let me know if anything's off.

by alder

6/25/2026 at 10:09:17 PM

+1 this request!

I'd been dreaming about trying to make something almost exactly like this app, but have absolutely no clue how to even begin going about it. You've nailed it! I've been manually cropping out audio clips in Audacity, and grabbing the relevant translations from my textbook PDFs to make Anki cards that look 3.5% as nice as yours do. Really nice work on this!

by the_gastropod

6/26/2026 at 4:59:42 PM

Simplified/Traditional support is now deployed. Really glad you like it, thanks!

by alder

6/25/2026 at 5:51:46 PM

This is awesome.

Tested for Japanese. No problems so far, except sometimes repeating the desired number of times didn't work (mobile). Seems to work now. But looping infinitely produces only three repetitions.

Really good UI with little friction: easily home in on sentences, easily move on or jump ahead, see vocabulary, create Anki deck. Took a while to discover loop settings, but it's a good choice.

Only now discovered the "custom span loop mode". Great! I was about to ask for it!

AI mode is unobtrusive and helpful.

At last, I found something that could use a touch-up. The starter deck from the example story is a bit nonsensical. It features words like URL and site from the LibriVox intro. お is "translated" as "honorific", which is kind of true, but it's only a marker. A beginner might not know this. たち shows "plural marker" as the answer, so there it worked. Integrating a flashcard app is no small feat. Impressive. I wonder what algorithm was used. Does it scale?

That's all. Thumbs up!

by nakedneuron

6/25/2026 at 7:57:45 PM

Thanks so much for testing Japanese! I'll review the starter deck and what goes into it. At the moment it's a bit random, and the nonsensical LibriVox intro spills in. I'll also try to filter out the function words. I have no way to judge that myself, my Japanese is zero.

Glad the mobile interface works. There's an iPhone app too, basically the web app but with native audio support. It can play in the background. Though it's not ready for release yet.

The built-in flashcards use FSRS-5, and I believe it's O(1) so it should scale but I haven't actually done any performance testing yet, to be honest.
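
For context on the O(1) remark, here is a small sketch of the forgetting curve as I understand it from the published FSRS-4.5/FSRS-5 description. It is not the app's code, and the function names are mine. Scheduling one review only touches that card's own stability value, so the cost per review does not grow with deck size.

```python
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that retrievability is 0.9 when elapsed == stability


def retrievability(elapsed_days: float, stability: float) -> float:
    """Estimated probability of recall after elapsed_days."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


def next_interval(stability: float, desired_retention: float = 0.9) -> float:
    """Days until retrievability drops to desired_retention."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)
```

With the default 90% target, `next_interval(s)` simply returns the stability `s` (up to floating-point error); a lower target stretches the interval, a higher one shortens it.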

The infinite loop is a bug, I'll dig into that too. And I'm not sure if you noticed, but I also put up an FSI Japanese course here: https://lingochunk.com/c/fsi-japanese. Again, I can't judge the quality myself, so I'd really welcome any feedback.

And thank you for the kind review!

by alder

6/25/2026 at 4:03:14 PM

If you don’t mind sharing, how much does it cost you to integrate the translation API and the text-to-speech API you’re using? Just curious, as I’ve been thinking about doing something in that area (not Anki or translations, but also language learning related).

Great project, and congrats for launching :)

by dgellow

6/25/2026 at 4:26:40 PM

No TTS at all in my app :) that was a deliberate choice, only STT. I experimented with many STT options, even self hosting Whisper, but ended up with Soniox. A bit expensive, but reliable. For the AI enrichment I went with Gemini Flash. I also tried Gemma 31B, which is really cheap and surprisingly good, on par with Gemini Flash, but extremely slow everywhere I tried. So you can make your own calculations :) And thanks for the congrats!

by alder

6/25/2026 at 10:14:17 PM

Looks great technically. I am a language nerd as I have worked in multiple countries like China or Austria or the US and I am from Spain.

If you want to sell your app you will need to explain it for humans, like this guy is doing: https://www.youtube.com/watch?v=ROEJe-nBmqQ

You will need to get better at public speaking and telling stories. I recommend going to your closest Toastmasters club.

by cladopa

6/26/2026 at 8:08:58 AM

Yes, you're right, public speaking is not really my thing. I'm a tech nerd. I should probably rely on partnerships with other people to help it grow. But the first thing I wanted to find out was if the app is actually useful to anyone other than myself. And the response from this community is encouraging. I also thought about letting content producers publish their content/courses as collections here with restricted access. The app already supports that. It is also possible to realign the transcript to their "gold" transcript. They would have instant Anki cards in multiple languages, and the shadowing practice for their subscribers. I would have users, win-win :) But again I have no idea how to do marketing.

by alder

6/25/2026 at 1:56:41 PM

This is awesome! I’ll be lurking for new data sources. I’m working on a self-hosted language app more focused around cloze and sentence mining into Anki. I love seeing more stuff happening in this space

by 3stacks

6/25/2026 at 2:24:09 PM

Thanks! I am glad you like it! I essentially mine the source audio, and all examples have cloze-style gaps (blurring, in my case) that are revealed on the back of the card. I also beep the word in the sentence when you try to play it on the front of the card in the built-in SRS system. Unfortunately that is not implemented in the Anki export, but it is technically possible.
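
For anyone who wants the same gap effect in plain Anki, a minimal sketch using Anki's standard cloze markup (`{{c1::...}}`) could look like this. The helper name `make_cloze` is hypothetical and this is not how the app's export works.

```python
import re


def make_cloze(sentence: str, target: str, index: int = 1) -> str:
    """Wrap the first whole-word occurrence of target in Anki cloze markup."""
    # The lookarounds avoid matching inside longer words. This assumes a spaced
    # script and the surface form of the word (not the lemma); Japanese or
    # Chinese would need token offsets from the segmenter instead.
    pattern = re.compile(rf"(?<!\w){re.escape(target)}(?!\w)", re.IGNORECASE)

    def wrap(match: re.Match) -> str:
        return "{{c" + str(index) + "::" + match.group(0) + "}}"

    return pattern.sub(wrap, sentence, count=1)
```

For example, `make_cloze("Ich gehe heute einkaufen.", "einkaufen")` gives `Ich gehe heute {{c1::einkaufen}}.`, and a sentence without the target comes back unchanged.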

by alder

6/25/2026 at 3:22:15 PM

This is really cool. I'm just getting towards the back end of the Kaishi 1.5k deck, so this will be perfect for my Japanese studies. Thanks for sharing.

by deaton

6/25/2026 at 4:03:25 PM

Thanks, I hope it will be helpful! If anything looks off, please let me know.

by alder

6/25/2026 at 3:42:12 PM

I like the structure of their privacy policy page [0] and how it appears that they are not data-greedy.

And the site itself is a great idea and implementation, though the font size and family of the UI (not of the actual playback area) have a lot of room for improvement, but those are just minor changes.

[0] https://lingochunk.com/privacy

by qwertox

6/25/2026 at 4:42:51 PM

Thanks! Just to clarify, "they" is actually only me :) I'm a contractor and run this through my own company. I try to collect as little as possible. And you're right about the UI fonts, it's clearly something I need to fix ASAP. Appreciate the feedback!

by alder

6/26/2026 at 6:21:45 AM

Interesting. Building something similar myself. Very interesting to see what people are cooking up for language learning using the latest tools.

by closetkantian

6/25/2026 at 2:37:48 PM

What are you doing for Chinese word segmentation/pinyin?

by dirteater_

6/25/2026 at 3:19:45 PM

For segmentation and POS I rely on spaCy zh_core_web_sm, with pinyin from the pypinyin library, plus a small correction layer on top. But I am not enough of a Chinese language expert to judge if it really works, so I'll rely on feedback from the users to improve it.
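
Since the details of that correction layer are not spelled out here, the following is only one possible shape for it: an override table applied to (word, reading) pairs after the default pinyin lookup. The table contents and the function name are assumptions, using the 谁/誰 case reported in this thread as the example.

```python
# Hypothetical override table: (word, default reading) -> preferred reading.
OVERRIDES = {
    ("谁", "shuí"): "shéi",
    ("誰", "shuí"): "shéi",
}


def correct_readings(tokens: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Replace known-problematic default readings, leave the rest untouched."""
    return [(word, OVERRIDES.get((word, reading), reading))
            for word, reading in tokens]
```

Keying on both the word and its default reading means an entry only fires when the upstream library still produces the reading being corrected.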

by alder

6/25/2026 at 3:09:48 PM

I also built a tool to help me study Spanish. I really like the idea of shadowing, so I built a tool that lets you take any YouTube video and generate a sentence-by-sentence exercise to help you repeat the speaker's phrases.

https://talkhabit.com/shadow

Or an example of one exercise: https://talkhabit.com/shadow?videoUrl=https%3A%2F%2Fwww.yout...

Stuff I need to work on:

- It only works with videos that have auto-generated captions

- It works best with monologue videos

by pzagor2

6/25/2026 at 2:23:22 PM

Just tried it with an unsupported language and it still worked. I set it to Chinese and inputted the audio. Still got correct results.

by Koaisu

6/25/2026 at 3:53:35 PM

Yes, the transcriber API I use (Soniox) actually supports more than 60 languages. I just didn't have any automated testing for them. The way I tested was to find audio with a reliable reference transcription, put it through my pipeline, and then compare the results. Also, some languages don't have reliable libraries to get part of speech and lemmas, which the flashcards need.
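
One common way to automate that comparison is word error rate (WER): the word-level edit distance between the pipeline output and the reference, divided by the reference length. This is a generic sketch and not the test I described above; it assumes whitespace-separated words, so Chinese or Japanese would need a character-level variant.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

A per-language threshold on this number would be enough to flag regressions when the STT provider or the post-processing changes.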

by alder

6/25/2026 at 2:45:17 PM

Very cool! I'm also learning Greek and it's amazing how many resources are becoming available.

by jcg591

6/25/2026 at 3:00:42 PM

Thanks! Yes, it's getting better for Greek but still not on par with other languages. I completed the only 2 Greek levels on Duolingo and they are really boring compared to the German one I am doing now. Easy Greek is a bit above my level, and the number of YouTubers in Greek is tiny compared to German.

by alder

6/26/2026 at 10:26:31 AM

What I would like is a tool that can listen and check my pronunciation, rather than me listening.

by HSO

6/26/2026 at 11:08:38 AM

There's a related feature already. The mic icon in the gap control plays the chunk, records you, and plays it back (the recording stays local). But if you mean a tool that automatically grades your pronunciation, I never thought about that. Interesting area to explore though.

by alder

6/25/2026 at 2:13:52 PM

Very nice work. I'm going for a different thing, but my audio2anki tool [1] is about as streamlined as I could make it to turn a YouTube URL I want to learn into a stack of Anki flashcards, purely locally.

[1]: https://github.com/hiAndrewQuinn/audio2anki

by hiAndrewQuinn
