DeepSeek Introduces Vision

6/18/2026 at 7:39:33 AM

For those not trying, this allows Deepseek to understand a picture (instead of just extracting text from it), and it can describe what's in the picture, but this is not an image generation system, so you can't ask it to modify an image.

Personally, I'm a bit surprised the DS chat app still doesn't offer its own text to speech and speech to text features (I know DS doesn't have any ASR model for example, but there are quite a few in the open).

by jiehong

6/18/2026 at 12:22:17 PM

DeepSeek interpreting screenshots and images I send it at fractions of what I pay Claude and ChatGPT, for me, is of far higher priority than supporting dictation. There are workarounds for dictation but not image processing.

by testbjjl

6/18/2026 at 2:03:18 PM

just use one of the various cheap gemini models

by anthonypasq

6/18/2026 at 4:49:01 PM

Indeed, Gemini really is incredible at image analysis. Yesterday I pointed it at some sloppy handwritten notes and asked it to add up the numbers in the right column, and it did it no problem. I've also used it to find out what TV show or actor is on screen, and various other things. It's quite impressive.

by freedomben

6/18/2026 at 8:23:00 PM

> Indeed, Gemini really is incredible at image analysis. Yesterday I pointed it at some sloppy handwritten notes and asked it to add up the numbers in the right column, and it did it no problem. I've also used it to find out what TV show or actor is on screen, and various other things. It's quite impressive.

I do not know if it works as well as Gemini, but Salesforce (of all places) has a model that does something similar.

What's "neat" about the Salesforce one is that you can run it locally and just iterate it over as many images as you feel like.

For instance, it should be possible to take a movie, pull a hundred images out of the h265 file, have the salesforce model evaluate what is happening at that moment in the movie, and then use that to create an index.

That's just ONE use for it, and I can think of dozens.

On a 5090 it was able to generate text descriptions of a folder full of approximately 500 images in under a minute. (Anecdotal evidence, admittedly.)

https://huggingface.co/Salesforce/blip-image-captioning-base

I just looked up some articles on it here, and it looks like it's fairly old, so YMMV.

by johnvanommen

6/19/2026 at 2:35:59 AM

There is a newer BLIP-2, but it's also fairly old. You're better off with many other local models such as Moondream 3 https://huggingface.co/moondream/moondream3-preview.

Moondream is great as it can point, count, perform bounding boxes, descriptions, and visual grounded reasoning.

by brianjking

6/18/2026 at 5:44:08 PM

Gemini pretty clearly has the best underlying model, and the worst RL and post-training of the lot.

by winstonp

6/18/2026 at 9:26:57 PM

I got a shirt I liked from a conference, and I didn't know who made it. It was soft, fit comfortably... I took a picture of some random numbers on a tag and Gemini parsed out the numbers and found the manufacturer. Pretty neat

by MattSayar

6/18/2026 at 3:59:03 PM

gemini models are also fantastic at understanding non spoken sounds

by carterschonwald

6/18/2026 at 8:51:19 PM

I don't know what runs on my phone's Google Translate app, but whatever it is, they are doing an insult to their models by it being so bad. It's amazing at picking up sound if spoken directly into the unit, but if trying to hold any kind of conversation or listen to anything even a little bit far away, it falls completely apart, is good for basically nothing.

This is obviously different than the models most people are discussing here, which are much bigger. But it's damaging the Gemini brand in general, by association, if nothing else.

by jauntywundrkind

6/19/2026 at 4:30:42 PM

I’ve long wondered if this was deliberate - only conversations where the participants are overtly using the translator get parsed.

by Royce-CMR

6/18/2026 at 5:32:36 PM

You can do that with smaller models at home. Gemma-4-E4B will run on a 12gb GPU, and supports audio, image, video input

by segmondy

6/18/2026 at 7:48:36 PM

12GB GPU is a lot

by NooneAtAll3

6/18/2026 at 1:42:28 PM

Or you could just use a CNN...

by corimaith

6/18/2026 at 1:57:22 PM

CNNs are not SoTA anymore when it comes to large models, and also are not used to provide interpretations of images as text, but rather to classify, do semantic segmentation, etc.

by bigmadshoe

6/18/2026 at 5:55:43 PM

CNNs are fine when trained with a good recipe. There are very few good studies comparing them with proper hyperparam search and all the training tricks applied consistently. Transformers are good but ViT vs CNN is not some settled issue. Transformers are more hyped and more popular with the tech enthusiasts who just read forums and news, but if you need stuff done, CNNs are still great.

by bonoboTP

6/18/2026 at 6:30:20 PM

I agree, but since we're talking about imagine understanding with text output, clearly a CNN is unsuitable. My previous comment was overly reductive and CNNs can still be SoTA depending on your performance metrics. I spent the earlier part of my career training CNNs, and they are very pleasant to work with.

by bigmadshoe

6/18/2026 at 10:11:03 PM

You can run a CNN and use the downsampled feature map the same way as patch tokens.

by bonoboTP

6/18/2026 at 8:32:25 PM

>Transformers are more hyped and more popular with the tech enthusiasts who just read forums and news, but if you need stuff done, CNNs are still great.

Vits are straight up more popular for ML research now, it's not just 'tech enthusiasts'.

by famouswaffles

6/18/2026 at 10:11:31 PM

There's a dearth of research properly comparing them.

by bonoboTP

6/19/2026 at 2:42:10 AM

I'm talking about research pushing state of the art in computer vision. Vits have 100% become more popular than CNNs in most CV research.

by famouswaffles

6/19/2026 at 7:40:39 AM

Yes but not based on rigorous comparison. I'm not saying ViT is bad. But it took over mainly because it's the shiny new thing. It very bandwagon-Y even among PhD students.

by bonoboTP

6/19/2026 at 8:18:55 AM

> There's no 'rigorous comparison' that puts CNNs over Vits

That’s not accurate. My team wrote a paper for school in which a resnet model out performed a ViT model of the same size on almost all metrics. These were smaller models, but depending on the use case that might be what you want.

by 0x20cowboy

6/19/2026 at 8:52:09 AM

Don't know if it's you (did you publish?). I read about something similar but it had its issies:

- Tuning hyperparameters to gain improvement on a dataset when you're constantly looking at the answers is pretty meaningless. It's basically testing on the training data.

- Eval on ImageNet1k alone (very small, useless for the real world) made me wonder if it wasn't just overfit to the training set. Would it perform better training on the datasets used for the foundation models ? I doubt it.

Well I'm not saying CNNs are bad or useless at any rate.

by famouswaffles

6/19/2026 at 2:09:34 PM

Exactly. Most of the comparison papers are useless. This is hard stuff, only few people have the chops it takes to even attempt this. You can of course train some models and then post the numbers, that's not the hard part.

by bonoboTP

6/19/2026 at 8:01:54 AM

There's no 'rigorous comparison' that puts CNNs over Vits in quality and Vits unlocked more use cases easier than CNNs did. That's why they're more popular, not because it's 'bandwagon-y'.

by famouswaffles

6/19/2026 at 2:07:49 PM

What's the use case enabled vs running a ConvNeXt or EfficientNetV2 and using the resulting strided features as you would the resulting tokens of a ViT? I'm not saying that ViT is worse. Just saying that the scholarship around comparing them is very bad or nonexistent. You have to properly tune the hyperparam enters on both sides in a fair way, and use all the general modern training tricks also on the CNN side to make it fair.

by bonoboTP

6/18/2026 at 4:25:28 PM

Can you say more about that? I haven't kept up.

by tehjoker

6/18/2026 at 6:51:55 PM

CNNs excel in vision tasks where you have limited compute, limited memory, limited data, and want something that works super well and quick. People usually don't hook CNNs up to a transformer to get language understanding either, you have to train bespoke CNNs for specific tasks

ViTs excel where you're unbounded in compute + data and also want text understanding or have a conversation about an image

by crypto420

6/19/2026 at 2:11:47 PM

These are vibes. ViT has been shown to work fine on small data with proper hyperparam and most of what you mention is actually doable just fine with the other architecture as well.

by bonoboTP

6/18/2026 at 3:20:35 PM

Transformers are superior

by Jabrov

6/18/2026 at 1:45:48 PM

Which?

by nullstyle

6/18/2026 at 8:17:15 AM

Can you explain what the benefits are of actually "talking" with the bot instead of typing and reading?

As someone who would rather send a slack message to a coworker rather than actually walking over and talk to them, the idea of having to talk with my laptop is not appealing at all, haha.

by paulluuk

6/18/2026 at 9:24:14 AM

If you spend your life sitting in a chair, that's fine. I tend to get all kinds of ideas, questions, and research needs while I'm walking around. Typing a paragraph or two or context takes too much time and is very risky. Especially when driving. But also just walking, cooking, cleaning, etc. Sometimes it's just not practical - winter, carrying stuff... I mostly feel privileged if I can just sit at a computer and type my question and have the time to read the answer.

by cicko

6/18/2026 at 9:15:35 AM

I am someone that prefers a slack message to a coworker than talking to them and I use AI.

My current flow is: Google Eloquent to capture 127WPM (my typing is best case is 65wpm). This lets me get the thoughts out without thinking too much about structure or flow, the same way I would brain-dump type it.

Next I use AI to compress, summarize, and restructure to create a clear coherent message for my peer to read (which is way faster for them).

When communicating with AI, its the same thing, except I skip the second step since AI does a good job at understanding my ramblings.

----

It drives me crazy that some cultures only send voice messages to each other. It drives me crazy they can't be respectful of my time and use STT+AI to convert their 90 second monologue to a few written sentences.

by itake

6/18/2026 at 11:54:57 AM

Slightly off-topic but: does it concern you that you're letting atrophy a very important skill for human communication (organising your thoughts and ideas, and then clearly communicating them to others)?

by garblegarble

6/18/2026 at 2:22:48 PM

Tbh, I never have been a good writer. A college professor once told me I am a terrible writer. I've tried to get better (I read a lot, I write a lot, I've taken multiple college level writing course). I even started a blog (https://kcoleman.me).

I kinda view myself as a wheelchair user. I'm bad at walking so I use at wheelchair so I can at least have a semblance of decent communication. I don't think my ideas are not worth sharing, but I'm just bad at writing them in an engaging way.

The scarier thing for me is coding. I am good at coding. But I don't even read a single line of code any more.

by itake

6/18/2026 at 1:11:58 PM

As someone who's still learning English, this is one thing I'd never use AI for, at least not in the near future, simply because thinking and structuring my thoughts before typing is the same as it is before speaking and actually talking to other people can't be outsourced to AI.

But I imagine if I'd been a native speaker I wouldn't mind using AI like OC does since it's a convenience. Same way I use a calculator for two digit multiplications in real life but spent years learning to do it manually in school.

by limflick

6/18/2026 at 2:24:25 PM

You're probably further into english than I am into vietnamese, but I really like using AI to help me improve my vocabulary and understanding of the language.

I avoid using AI as a direct translation tool, but its super useful for me to translate complex english ideas to vietnamese.

by itake

6/18/2026 at 3:27:21 PM

As a native English Speaker I can tell you that I would have some trouble talking out an email. I like the back and forth in my head of editing as I go. Text messaging may be fine but email is more difficult for me to just talk through.

I am loving the conversation here though of how people are using speech to talk to LLMs or not though, it is something that no one talks about much

by billnad

6/18/2026 at 5:28:00 PM

This worries me tremendously. In fact, it is one of the major points of value that i deliver as an engineer. Organizing and iteration on thoughts is not trivial or easy, but it is very important!

by a34729t

6/18/2026 at 8:31:06 PM

> Organizing and iteration on thoughts is not trivial or easy, but it is very important!

Two of the silliest things that helped me in my career:

* I worked at fast food restaurants in high school. This instills a near pavlovian response to client requests; if at the age of sixteen you can deal with someone who's mad because there isn't enough cheese on their pizza, it goes a long way in the real world.

* My first I.T. job was in an office where the vast majority of the people who worked there had never used a computer at all. Just to stay employed, I had to resist the urge to explain things in a complex way. When I'm trying to sell an idea to a group of people, I do my best NOT to ignore the people in the room who may not understand that idea well. I think that engineers often have a bad habit of getting into engineering arguments with management in the room, where they take things to a level of complexity where management may not understand what's being talked about. Bringing things back down a few levels goes a long way towards getting management to sign off IMHO. Unfortunately, it's a double edged sword, and it can fall flat when management is especially well informed. Classic information asymmetry.

by johnvanommen

6/18/2026 at 11:28:51 AM

I would find this behavior extremely aggravating from a co-worker. If you can’t be bothered to edit down your ramblings by hand, just don’t send me anything at all.

by jnovek

6/18/2026 at 8:58:08 PM

Why do we have to insist that messages must be made with lots of effort even if it is hard to understand for the reader? As long as what needs to be communicated is done respectfully, I don't see a value for it to be done manually, especially if the handwritten one is hard to read and thus wasting readers' time.

We don't hold the same standards for spellings. Rather we expect messages to be spell checked before being sent.

by ivell

6/18/2026 at 2:18:31 PM

Maybe you missed my point?

I can either edit down my rambling by hand (costs about 10-30min depending on the length) or I can ask chatGPT for assistance, where I manually edit chatgpt's edits for factual correctness and tone.

--- STT

Like, besides the lease risk, I think 30 to 50% of the business is going to end the moment the owner stops showing up and the motorbikes are gone. Either, I think it was Roger mentioned or you mentioned the Google reviews all mentioned that people go to the bar because it attracts other motorcycle people. And, you know, we don't have an existing, like, network to come in and grow this. And so we might see a 30 to 50% decline in revenue within a month or two with nothing ready to, like, backfill that with. And if our main goal is to make a private club or event space, ideally, I'd like to have some, like, something substantial to, like, commit to that spot. Like, right now, we're three fucking dudes with, you know, a little bit of a vision, but not really. And, you know, we're buying what will be a failing bar unless we, you know, figure out how to run events or use that backspace.

polish

---- gpt5.5

Polished version:

Separate from the lease risk, I think there is a real chance that 30–50% of the business disappears once the current owner stops showing up and the motorbikes are gone.

Either Roger mentioned this, or we discussed it separately, but the Google reviews seem to reinforce the same point: a meaningful part of the bar’s appeal is that it attracts motorcycle people. We do not currently have an existing network or community that can step in and replace that traffic.

That means we could see a 30–50% revenue decline within the first month or two, with no clear plan in place to backfill it. If the main goal is to turn the space into a private club or event venue, I would feel much better if we had something substantial already committed to that location.

Right now, we are three guys with a loose vision, but not much concrete traction. Without a clearer plan for events, memberships, or activating the back space, we may effectively be buying a bar that starts failing the moment the current identity and customer base disappear.

by itake

6/18/2026 at 8:33:12 PM

I vote for number 2.

by johnvanommen

6/18/2026 at 12:46:35 PM

> It drives me crazy they can't be respectful of my time and use STT+AI to convert their 90 second monologue to a few written sentences.

I have used Whisper to transcribe audio into text in the past. You could probably build a pipeline for that, whether running locally or in the cloud - and the run the transcription through the same summarization agent.

by KronisLV

6/18/2026 at 12:43:05 PM

What did you do prior to 2023?

by jamwil

6/18/2026 at 3:30:18 PM

Sending me your AI compressed ramblings = straight in the bin

by adammarples

6/18/2026 at 5:03:50 PM

Just my two cents: I have coworkers who use AI to drive basically all their communication in Slack and I absolutely hate them with a deep passion. I actively avoid meetings, conversations, and exclude them from everything possible.

If you use AI to drive your communication with other humans, you suck.

by tailscaler2026

6/18/2026 at 1:09:02 PM

It’s crucial to use for driving/walking.

One problem has been ChatGpt/Claude apps don’t really do this well. They use weak and/or non-reasoning models for voice interaction and the UX is not optimized for hands free.

I wrote an iOS chatbot app mainly for this purpose for myself and family/friends. Allows starting/sending voice prompts with the action button so I never have to look at the screen. Supports any model at any reasoning level so conversations are not dumbed down. Added a video transcription tool so any model can “read” YouTube/Tiktok videos and chat about them. Great to discuss lectures on tech topics.

It takes slightly longer to use a reasoning model for voice interaction use but I prefer the intelligence. The latency can be minimized a few ways, bidirectional streaming helps. It’s TTS agnostic, I’ve got a few selectable providers and the output can be prompt styled “use a chill tone that’s not too eager”.

by WhitneyLand

6/18/2026 at 2:14:45 PM

Gemini 3.1 flash live is a native audio to audio model with reasoning. But it's still not a SOTA level model

by WarmWash

6/18/2026 at 3:14:28 PM

What are the use cases of an LLM while walking or driving, that also require high reasoning?

by gbalduzzi

6/18/2026 at 5:20:49 PM

Most of the problem is that for voice chat, you usually get no reasoning at all and no tool use at all to research or ground assumptions.

For example for voice ChatGPT still uses a quantized gpt40 non-reasoning model that hallucinates pretty frequently. It also doesn’t do much automatic search for updated information and fact checking.

I usually don’t find I need high, usually DeepSeek v4 with medium reasoning is sufficient.

However if it’s important chat like brainstorming on complex topics I sometimes bump it up.

OpenAI has a new voice api that supports adjustable reasoning, but ChatGpt is not using it currently.

by WhitneyLand

6/18/2026 at 6:52:11 PM

With a sufficiently sophisticated harness you can actually do quite a lot by just talking to your AI. I have regularly dictated to build things on my phone while walking to lunch for example.

by shostack

6/18/2026 at 3:30:10 PM

I mean, even applied voice 'models' suck for this.

For some godawful reason, Apple Maps voice directions assume that you also understand what it omits. So if it says "turn right in 500 meters" "250 meters" and then you stop at an intersection after 150 meters and it says "turn right", it expects you to understand that it doesn't mean the immediate right at the intersection, but the next one [because you still haven't driven the full 250m]. It is nuts and I have no clue how that has ever gotten past testing.

What it should do is say nothing until I have to turn, or say "turn right in 100 meters" "turn right".

by jorvi

6/18/2026 at 4:41:45 PM

This is one thing Waze I think seems to do better than the competition. And they have a ton of different voices.

They also clearly show which voices can do street names (which is hugely helpful). For some reason the Australian and British accented voices feel more polite than the Americans

by Melatonic

6/19/2026 at 6:52:53 AM

How about google maps says "keep north"..as if I am sitting in my car with a magnetic compass...gets my goat everytime

by mohi13

6/18/2026 at 8:19:50 PM

I very begrudgingly started paying for grok for this exact reason. They nailed the voice UI and it works incredibly well with android auto unlike Claude&Gemini (which don't work with android auto at all) and chatgpt (which works well but has hardcored system instructions that make it's voice mode feel like a dopamine deprived Gen Z)

by Ldorigo

6/18/2026 at 1:00:01 PM

I hardly type at all now. I use Handy (free) with Parakeet and use its post-LLM processing feature with a custom prompt tailored towards coding, so I can say things like "Have it go to slash remote dash control" and it'll output "/remote-control". Converts brackets, etc.

Everything is almost instant, it's insanely fast, and lets me work on multiple different agents/windows at the same time fast with cmux.

I use the same thing to talk to people on Slack, iMessage, etc now when I'm working from home instead of typing.

I also can help articulate my thoughts better when I'm thinking them literally out loud instead of just sitting silent and typing them on a computer for hours.

It's just something that you need to try and get used to because I also thought it was something I wouldn't like at first.

by rob

6/18/2026 at 2:16:49 PM

Can you share more information on the post-LLM processing and the prompt you use? I would like to try this out but don't see any post-LLM options in Handy.

edit: nevermind, found info on the docs about how to enable post processing. Would still be interested in your prompt though if you don't mind sharing!

by thefreeman

6/18/2026 at 2:25:08 PM

You have to enable "Experimental Features" under "Advanced."

This is the prompt I use (it's probably overkill and can be condensed):

https://pastebin.com/raw/RUVAqLCU

by rob

6/18/2026 at 2:46:37 PM

What is Parakeet?

by ezconnect

6/18/2026 at 3:17:01 PM

I believe this is the correct link. I use it too in Handy, for English and Spanish transcriptions: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3

by rafaelm

6/18/2026 at 2:51:07 PM

Maybe they meant narakeet?

https://www.narakeet.com/tools/

by stijnveken

6/18/2026 at 3:39:33 PM

Parakeet is the name of a speech to text model from Nvidia. Roughly comparable to whisper from openAI.

It's the model doing the work inside the wrapper that an app provides.

by dghlsakjg

6/18/2026 at 3:44:32 PM

Yep, here's the v2 and v3:

https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2

https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3

It's almost instant on my new M5 Max w/ 36GB of memory, but I used both with Handy on my previous 2019 Intel Mac w/ 16GB memory and was completely surprised at just how fast it was for being on-device! Not instant, but only a couple seconds.

by rob

6/18/2026 at 5:12:10 PM

I’m using it on an M3 max 32gb, and I’m getting 60-70x realtime for recordings and crazy good accuracy. I can get an hour of audio transcribed in a minute. Similar results from Whisper, but half the speed.

Transcription this good used to cost A LOT, now it rounds down to free.

by dghlsakjg

6/18/2026 at 10:05:57 AM

I thought this way until I tried it, and the main difference is that when I'm managing tons of agents at once or just reviewing some plan / approving next steps, or need to give quick feedback/ask a simple followup, the voice interface makes me much faster and more likely to continue because it's lower friction (and in many cases that's good, though not all) and can be hands-free.

Actually, my thoughts on this matter changed so much that it inspired me to get much more into voice controls because I realized how this same problem was basically why some people sucked at remote work or weren't able to properly use tools like claude code, because it was essentially the same problem but worse (typing / messaging feeling too high-friction or raising the barrier for participation). I have a way to let Claude call me now to tell me stuff when I have a bunch of instances out doing stuff and then leave to go home.

I'm trying to get that better integrated in my devloop because I think it makes managing >4 agents simultaneously much more feasible and natural for some people (I used to play Starcraft a lot so I'm used to the multitasking, but it still takes sustained willpower to be constantly "driving" or monitoring things, or to field questions), especially ones who have never served as TLs or people managers before. IMO it's a big performance roadblock for a lot of developers to be treat directing multiple agents simultaneously as some kind of high-stakes/high-cost thing. The kind of developer who would not say anything in a team meeting unless prompted or who thinks everything is stupid by default (because they are afraid of making decisions / being wrong even if only briefly) is both very common and reluctant to work this way, but also really probably needs it to be as productive as more skilled developers.

by weitendorf

6/18/2026 at 11:28:46 AM

I don't know about you, but I force myself to read the whole spaghetti thought process of any AI that's actually working on code, and make sure I understand what the hell it just said before I ask questions or give it a green light. Even or especially when whatever it said is full of fluffy stuff about having understood the problem space. That's usually where a well-placed question can bring the entire structure crashing down.

"You're right to push back" has become the gold standard phrase I'm looking for from these things to assure myself that I'm covering all the bases and understanding what it's building (not that that's enough, and not that it isn't still going to build some ungodly blob anyway).

I kinda like using voice to jot down my next questions or iterate on things, but there's a clear danger to it, which is that you may inadvertently be signing off on stuff you haven't thoroughly read. If there's one thing about LLM-written code, it's that the devil is in the details.

by noduerme

6/18/2026 at 7:47:05 PM

[flagged]

by killix

6/18/2026 at 11:57:50 AM

I type as fast as I talk so for majority of my LLM usage I don't need text to speech.

But I love the chatgpt voice interface e.g. on a long drive when I can use it to learn about random stuff (btw, turn advanced voice off for such usage).

Other part though is, hacker news vs regular population, majority of which would much much rather talk and listen than type and read.

by NikolaNovak

6/18/2026 at 12:21:57 PM

I like to talk (stt) but I don't want tts to talk back to me I just want to read the response. voice synthesis is a waste for me personally.

by kingkongjaffa

6/18/2026 at 9:22:05 AM

When I was still using OpenAI, I used it among other things to translate from English to Spanish while talking to Spanish-speaking people in person.

I understand a bit Spanish but I don’t speak Spanish yet, and they don’t speak English.

I speak English to the AI and end with “translate to Spanish, translation only”, and then the AI says the thing I was saying in Spanish (not perfect but good enough, and also it has a slightly weird accent that might be it using English or English influenced text to speech even when speaking Spanish sentences?).

by QuantumNomad_

6/18/2026 at 11:50:25 AM

I've been using ChatGTP by voice for things like cooking and house repair stuff. It's quite convenient for situations in which your hands are busy.

Other week I fixed a a water valve. After planning the thing with ChatGTP I brought the new valve. Then I described what I was seeing as I swapped the old valve for the new one to make sure everything was right. Really cool experience!

by pid-1

6/18/2026 at 1:00:15 PM

Faster, and that's it. If you don't need precision (like with prompting LLMs) the speed gain is massive (*for most people)

by justech

6/18/2026 at 1:50:56 PM

Sometimes it's faster than swyping on a phone, but mostly I use it to learn about stuff and hash out ideas while driving.

by fdsjgfklsfd

6/18/2026 at 11:11:59 AM

This may sound strange and even callous, but I think it's appealing to people who are used to having employees. It's not about speech being a better interface, it's that thinking hard enough to sit down and compose a prompt is too much work if you're used to just yelling at someone.

Pity the managers with no one left to boss around besides the machines coming for their own jobs.

I was asked just yesterday if I could wire up [redacted] so that [redacted profession] could have a realtime voice interface while in the middle of performing [redacted]. My basic answer was yes, but it would be a bit slower than you want if something is going wrong, and it would probably be unethical for a whole lot of reasons.

by noduerme

6/18/2026 at 8:56:23 AM

Accessibility.

by stranded22

6/18/2026 at 11:31:26 AM

What about accuracy?

by noduerme

6/18/2026 at 1:15:27 PM

I'd imagine it'd be a reasonable tradeoff for disabled people who can't use their hands.

by limflick

6/18/2026 at 9:10:35 AM

Much faster and better flow. Don't knock it til you've tried it.

by arcanemachiner

6/18/2026 at 8:59:49 AM

it's very confusing. maaaybe if the stt is good and fast enough, speaking may be faster? english speakers can probably hit 150-180 wpm but seems like a hassle

by throawayonthe

6/18/2026 at 1:04:58 PM

I can talk faster than I can type.

by hidelooktropic

6/18/2026 at 1:49:35 PM

A lot of people are slow typists.

by emodendroket

6/18/2026 at 9:40:59 AM

It's easier, faster, and more natural to talk than to type for the vast, vast majority of people.

This trivial fact of life is observed every day by e.g.:

- students taking notes and finding it necessary to only jot down key facts so that they can keep up,

- stenographers who require special training and equipment to keep up verbatim with live speech in the courtroom,

- annoying colleagues who insist on "hopping on a quick call" or arranging big, wasteful, and disruptive meetings instead of just writing down their problem / sending a message or email,

- friends who insist on sending short voice messages in DMs instead of typing, because it's more "personal" that way (which to be fair it is, but not to the extent proclaimed).

by perching_aix

6/18/2026 at 1:27:46 PM

Also vision can be used for "compaction" https://blog.can.ac/2026/06/10/snapcompact/

by greenavocado

6/18/2026 at 2:48:39 PM

The product I want most is the ability to return to the late January 2026 version of Anthropic models.

by exabrial

6/18/2026 at 2:59:29 PM

This is why we need open weights for everything.

Nobody will cry when their AI girlfriend model gets revoked. You'll always have the weights.

Presumably for the low cost of spinning up an H200 or two you can use the weights forever.

No more claiming your LLM gets nerfed. No more claiming your video model can't do Spider-Man anymore.

by echelon

6/18/2026 at 3:39:00 PM

I think my main concern was productivity, but tell me more about this AI Girlfriend

by exabrial

6/18/2026 at 5:19:33 PM

Darling, we'll always have W_q, W_k, W_v, and W_o.

by lucisferre

6/18/2026 at 7:30:05 PM

H200 is not cheap, and I don't think you can run DeepSeek with full weight without any quantization on even two of them.

Although open weights in theory are good, especially for developers and market competition, it is not as wonderful as you thought.

by rabbitlord

6/18/2026 at 6:19:42 PM

It's not just the weights. It is the system prompt, harness, safety filters, etc. Those can affect performance of the same underlying model significantly.

by flumes_whims_

6/18/2026 at 11:03:27 PM

> Nobody will cry when their AI girlfriend model gets revoked

These are the people who cry the loudest and there’s not a close second. They have infinite time to whine online (see /r/chatgpt after 4.5 went away).

by paulcole

6/18/2026 at 8:37:47 PM

These models are far too expensive to run yourself and independent LLM providers of open models do even more secret nerfing than the original creators because they have no reputation to lose.

by tsss

6/20/2026 at 3:44:41 AM

I find the way that models understand images to be seriously lacking. The root cause of the issue as I see it is that image encoding isn't contextual. The encoder should be aware of the prompt so that it can encode the right things. It seems like this should be something that could be trained into a model.

by foota

6/18/2026 at 6:58:48 AM

Points to https://chat.deepseek.com/sign_in for me, that's just a login screen. Anything page with some info?

by rcMgD2BwE72F

6/18/2026 at 7:09:35 AM

Not in official news yet, but works for me https://files.catbox.moe/hnnnlx.png

by RIshabh235

6/18/2026 at 11:12:57 AM

Only images or videos as well?

by thisisit

6/18/2026 at 8:40:47 AM

[flagged]

by dude250711

6/18/2026 at 7:00:48 AM

What has been going on with deepseek recently? I have gotten lots of replies in Chinese and even more frequently, reasoning in Chinese as well.

Is it a new silent update?

by bjoli

6/18/2026 at 9:57:28 AM

Happened to me with Claude, doesn't need to be a China thing.

by throwa356262

6/18/2026 at 8:00:40 AM

I use DeepSeek daily, never happened to me.

I use the API however, not the chat interface.

by surgical_fire

6/18/2026 at 7:04:42 AM

Well, it is a Chinese model, maybe it thinks better in Chinese?

by Shank

6/18/2026 at 8:14:50 AM

Hànzì can use 30%-40% fewer tokens than English. So, yes, it probably thinks better in Chinese.

by bogdan

6/18/2026 at 5:22:51 PM

There was some funny suggestion online with using Classical Chinese (which has a similar status to Latin in Europe, and it uses at least 50% less characters, probably similar savings with tokens) to reason. Don't know whether the reasoning levels were on par with modern languages, but it was worth a laugh.

by hnfong

6/18/2026 at 8:39:10 AM

If so, would other models like ChatGPT benefit from translating the user's prompt to Chinese/Japanese and thinking in Hanzi/Kanji and then converting the response back to the user's language before displaying it?

by Razengan

6/18/2026 at 8:46:51 AM

I believe that most reasoning models actually think in their own "language" which is not really understandable by humans. The thinking traces that are shown in the UI are actually summaries generated by a smaller model in plain english (or user language). Sometimes this leaks through and you see some chinese/japanese characters in e.g. Claude's reasoning.

by cocoflunchy

6/18/2026 at 11:48:55 AM

Wait, this isn't real, is it? Is there actually an intermediate model that translates DeepSeek's thinking from its "alien language" into human languages? That's not actually the case, right?

I thought "thinking" is literally the model generating additional text in a human language that shows its "thought process". It's added to the model's context, which helps it reason better because it now has this self-generated context.

The "their own language" idea seems to come from some recent science fiction where LLMs develop their alien language and take over the world by 2037 or something.

by ForceBru

6/18/2026 at 12:33:02 PM

Yeah, it's actually the case. Researchers have shown that the models response doesn't always follow from the reasoning. Whether you consider that an internal language or not really depends on what you're speculating the neural network is doing. I think there was an Antropic paper on it.

by mcbuilder

6/18/2026 at 12:12:25 PM

You're right, it's just additional text that allows it to do thinking / reasoning-like behavior. The big proprietary models hide the real output from the user and instead provide a friendly abridged version, but that's just to protect their secret sauce from distillation.

by Gracana

6/18/2026 at 1:20:08 PM

The parent is off, you’re right. They may reason in any language, typically whatever the user’s language is, and you’ll see the reasoning directly with an open model like Deepseek.

Research only showed that thinking might be disconnected from the final output but in my experience they are very strongly correlated in recent models

by wolttam

6/18/2026 at 4:36:09 PM

> Research only showed that thinking might be disconnected from the final output

It is trivial to regularly spot obvious contradictions and inconsistencies if you read carefully. For example I've encountered traces that amounted to "I can deduce X, therefore Y, so that means Z" but then the model turns around and outputs "the answer is W because X". It's even been demonstrated that having the model output placeholder tokens or other gibberish instead of "thoughts" still improves performance. However the thinking traces can still be useful to the end user regardless.

by fc417fc802

6/19/2026 at 5:53:13 PM

I see those too and I think of it as the "thinking" in action. If you could replace their actual thinking trace with gibberish and get improved performance that scaled with the amount of gibberish you injected, that's what we'd do. But instead, we see that the quality of of the model's output scales with the amount of 'thinking' tokens they generate before responding.

It has been my experience that yes, models make contradictions throughout their thinking process, but the conclusions they arrive at during/near the end of thinking more often than not align with the final output.

by wolttam

6/18/2026 at 9:44:35 PM

I may have misremembered but I thought I had read somewhere that recent models by OpenAI and Anthropic tend to produce reasoning that is not always understandable for humans. But you're right that it's not the case for Deepseek so maybe I'm hallucinating ;)

Or maybe it was an article or a tweet about researchers trying really hard to steer the model to think in English otherwise interpretability / safety becomes a lot harder?

by cocoflunchy

6/18/2026 at 4:39:59 PM

Current models simply generate additional text that gets added to the context for the trace. However iterative models that "think" by repeatedly looping through several layers instead of outputting text have recently been demonstrated.

by fc417fc802

6/18/2026 at 9:19:53 AM

As far as I'm aware, it's not true for models like DeepSeek or other Chinese open-weight models (at least those that I have seen); their reasoning traces are fully composed from some human language, be it English, Chinese or another one; by the way, most of them can adapt their reasoning based on user language, for example, if user speaks English the reasoning more likely will be in English.

I think that for DeepSeek problem (thinking and replying in Chinese) everything is kinda simpler: in their official chat, they're probably using some kind of system prompt which is (probably) written in Chinese, so that's why model may prefer Chinese in it's output.

by dryarzeg

6/18/2026 at 11:57:26 AM

I have seen mixed language thinking from claude when i speak to it in english but we are discussing a product thats in spanish or searching amazon spain.

by calgoo

6/18/2026 at 9:27:13 AM

Summaries by different smaller models are usually made by closed proprietary models like Claude as a way to combat the distillation of real reasoning traces by competitors. Open weight models show the real reasoning traces. Reasoning traces operate in the same space as the non-reasoning output. It's all just one large text for an LLM. Internally, reasoning is just ordinary chat completion between <think></think> tags.

by kgeist

6/18/2026 at 4:53:47 PM

This is inaccurate. The displayed reasoning traces are summaries, but the model thinks in nominally regular human languages. AI labs are very light on details (as they consider them as their "edge"), but both GPT5.5 and Claude Mythos/Fable system cards discuss chain-of-thought monitorability quite a bit.

They occasionally show snippets of CoT in papers they write, e.g. for o3/o4/GPT5 models [1] or Claude 3.5 Haiku [2].

[1]: https://openai.com/index/evaluating-chain-of-thought-monitor... [2]: https://transformer-circuits.pub/2025/attribution-graphs/bio...

by phi0

6/18/2026 at 9:01:51 AM

> summaries generated

Or hallucinated

by seydor

6/18/2026 at 8:48:51 AM

Yeah, it’s why the Caveman skill includes a Wenyan mode.

https://github.com/JuliusBrussee/caveman

by grogg

6/18/2026 at 8:47:46 AM

There are other even more efficient ways of doing this, i.e. using images instead of raw text https://xcancel.com/karpathy/status/1980397031542989305?lang...

by bogdan

6/18/2026 at 1:02:02 PM

But why does it do so inconsistently, and sometimes even forgetting to swap back to English when it comes time to do 'normal' output? It also seems recent, as when I was using deepseek even a week ago this was very rare compared to what I was seeing yesterday. I had to start including a line asking it to stay to English because I can only speak/read English.

by SkyBelow

6/18/2026 at 11:05:22 AM

A chinese model which tells me it is Claude from Anthropic? Not really. Chinese HW yes, SW not.

by rurban

6/18/2026 at 1:50:45 PM

I've seen that people can get Claude and friends to say they're DeepSeek if they ask in Chinese. I think distillation is happening all the time.

by emodendroket

6/18/2026 at 11:29:40 AM

Google Chrome tells me it's like 14 different things. How is that any different then DeepSeek saying it is Claude?

by dubcanada

6/18/2026 at 11:41:41 AM

I guess Claude isn’t an American model either considering how Anthropic has fed basically all of the globe into it.

by Hamuko

6/18/2026 at 7:18:34 AM

It never happened to me with Deepseek, but it happened multiple times with Kimi 2.6.

It also happened a handful of times with Anthropic models.

by epolanski

6/18/2026 at 7:39:26 AM

Are you running out of context? I’ve found that tooling and giberish most of the time happens when I’m butting up against the high watermark of my context window. One other thing it could be, I’ve read that lower quanta like Q1 and Q2 for smaller models can leak Chinese

by alfiedotwtf

6/18/2026 at 8:03:00 AM

This happens to me a lot when I ask a qwen3.6 model to respond to a question in JSON. No clue why.

by serf

6/18/2026 at 1:52:46 PM

Yeah the reasoning is formatted differently and the replies are often in Chinese.

by fdsjgfklsfd

6/18/2026 at 7:05:38 AM

It doesn’t seem that recent to me, at least been like that for six months.

by abyssin

6/18/2026 at 7:06:44 AM

yes, kind of silent update plus they might have better chinese datasets and user data for their training, that might be leading to chinese preference.

by RIshabh235

6/18/2026 at 9:23:29 AM

Maybe, you could pipe it through T5 or something.

by k__

6/18/2026 at 9:25:47 AM

it's a hint that you should start learning the new Lingua Franca.

by cicko

6/18/2026 at 6:23:02 PM

that is the long con - eventually we all become chinese.

by whalesalad

6/18/2026 at 10:56:00 AM

Could go nicely with https://auge.franzai.com/ ( CLI on Apple Vision frameworks ) - do the first pass locally. If needed call their API for a more detailed analysis and then _finally_ we produce meaningful alt texts for images in HTML at a reasonable price ;)

by harryf

6/18/2026 at 7:27:36 AM

I really need this as an API.

Turns out, to use Claude Agents SDK, you need to have a vision enabled API. If Deepseek API could see, it can fully drive Claude Code and Claude Agents SDK. A project I'm working on relies on a Claude-in-CloudflareWorker setup and I've been relying on Qwen and gemini flash lite, both more expensive than Deepseek.

Can't wait to have it available on deepseek.

by tornikeo

6/18/2026 at 7:50:04 AM

Have you looked at MiniMax or MiMo? Available today via OpenRouter, and it’ll make the path to porting to DeepSeek a line change https://openrouter.ai/collections/vision-models

by petesergeant

6/18/2026 at 10:54:44 AM

Xiaomi Mimo v2.5 is my favorite alternative. Matches DS v4 Flash (official) pricing exactly and supports image/audio/video input.

by crazylogger

6/18/2026 at 9:15:36 AM

same here. I am using Gemini 2.5 Flash as VSCode "vision proivder" for Deepseek V4 Pro, but it is expensive and not accurate. can't wait for native Deepseek vision.

by 5701652400

6/18/2026 at 6:57:06 AM

Nice, is this available in the API now as well?

by innis226

6/18/2026 at 7:14:52 AM

I am also waiting on the vision support in API. Its the only thing blocking me from buying their subscription.

by naseemali925

6/18/2026 at 7:59:45 AM

What subscription?

by dakolli

6/18/2026 at 8:53:47 AM

I mean't topup. They don't have subsciptions.

by naseemali925

6/18/2026 at 7:02:22 AM

Not in the api yet.

by RIshabh235

6/18/2026 at 3:14:05 PM

The main thing here is, there are doing it really cheap!

by mid90sahsan

6/18/2026 at 3:34:44 PM

I heavily using Deepseek V4 Pro for a personal project because I cannot afford Opus, and spent ~1B token last two weeks for just $40 which would've costed ~$1300 using Opus 4.8. Realistically Opus cost will be lower assuming more "intelligent" model would've produced less code with fewer conversation but I doubt it'll be cheaper than ~$500.

I'm curious to know how they can they offer at such a cheap price. Some say it's electricity surplus in China and/or government subsidy. It'll be a very interesting read if there's an extensive study on their economics.

   1.1B (cache reads) * $0.5 = ~576
   39M (ache miss) * $5 = ~199
   21M (output) * $25 = ~529
   Opus 4.8 = 1304

   1.1B (cache reads) * $0.003625 = ~4.17
   39M (ache miss) * $0.435 = ~17.3
   21M (output) * $0.87 = ~18.4
   Deepseek V4 Pro = ~40

by jameson

6/18/2026 at 7:48:12 PM

Nice comparison, I've been super impressed by both Deepseek V4 models, particularly Flash given the crazy value for price vs. performance.

It can definitely do "stupid" things and get off track at times but I've found it can easily handle routine web dev tasks like 9/10 times, and using Pro to handle any large refactors/tricky bugs/etc.

The only really negatives are both models (but particularly Flash V4) occasionally have a strange issue parsing instructions, almost like a "language barrier" where a clear instruction gets bizarrely misinterpreted in a subtle but very problematic way. It feels a bit like a SOTA model a year ago where they'd occasionally just miss the plot entirely while still being technically competent but misdirected.

Also not really a negative, but I can't handle watching the reasoning output on Pro anymore haha. It like actually started stressing me out and giving me heartburn watching it get something right on the first or second idea... and then spend like 5 minutes looping through a dozen extremely dumb guesses with "But wait.... Or... Unless..." lol.

Even if I knew it would (usually) end up where it should I just couldn't stand seeing it consider, like, deleting my prod DB and recreating tables manually/ripping out some critical dependency/etc without interupting it to say "Holy shit you had it right the first time, for the love of god just start doing the thing now and move on".

by toraway

6/18/2026 at 7:02:48 AM

And it's really good and fast. Have tested with bunch of odd photos on what is happening. Overall the training set seems large enough to know what's what and where

by earth2mars

6/18/2026 at 7:08:02 AM

yes and I hope their rate of shipping increases after recent funding.

by RIshabh235

6/18/2026 at 10:27:50 AM

Direct competition to american companies like OpenAi, Anthropic proving china can also launch great models

by bhanu786

6/19/2026 at 2:59:52 AM

Once we get Mythos level opensource then that would be in a league of its own.

by RIshabh235

6/18/2026 at 8:06:55 AM

I wish they published a post where we read about capabilities, quality, accuracy and other parameters

by throwaw12

6/19/2026 at 2:58:52 AM

Maybe they have a big update soon as they made this one a silent update.

by RIshabh235

6/18/2026 at 7:20:55 AM

If they'd do one of those little extraneous additions like Qwen does, so that I can have DS4 Flash with Vision that would be great. I've got to run a separate model entirely so that I can get vision and I'd prefer to just put it all in one space.

by arjie

6/18/2026 at 8:23:05 AM

Maybe they will do now as they got huge funding.

by RIshabh235

6/18/2026 at 1:58:37 PM

I hope they bring it to their apis, especially v4flash. I find myself using mimo 2.5 more since it supports vision and makes it cheap for doing e2e tests with playwright or similar

by Bnjoroge

6/19/2026 at 2:58:01 AM

They have been recently scaling their team maybe we will updates sooner

by RIshabh235

6/18/2026 at 9:21:56 AM

Multi-Modal is the way to go. Deepmind nailed this a long back.

by insumanth

6/18/2026 at 9:40:44 AM

Deepmind hasn't produced any frontier model since Gemini 3.0 pro though.

by Zababa

6/18/2026 at 5:00:57 PM

At IO, google said 3.5 pro would be released this month.

by squidbeak

6/19/2026 at 7:58:14 AM

We'll see when it's released then! There's a chance it's going to be a very good model, but often DeepMind tend to "pre release" models that seem great, and then by the time you get to the release they've gotten worse for some reason.

They also tend to struggle with tool calls, more than the latest GPT or Opus. And around December 2025/January 2026 I remember the Gemini CLI being unusable because they were always at capacity.

But also I've seen product features built on Gemini models and they do pretty well here, especially around translation it seems.

by Zababa

6/18/2026 at 6:57:37 AM

Vision has been in A/B testing for a while now (at least in China). Is there an official announcement that this will be available for everyone?

by crvdgc

6/18/2026 at 7:05:04 AM

I haven't seen any official announcement yet, works for me though.

by RIshabh235

6/18/2026 at 11:08:25 AM

I already had it for months? What's the news here?

by vitorgrs

6/18/2026 at 3:30:56 PM

In the past, they just ran Deepseek OCR on your image and extracted the text, then gave it to a language only model. I believe now there is a model that actually takes images as input directly.

by eckr

6/19/2026 at 2:49:56 AM

Talking about the vision... I already had the vision tab there hahahaha I guess everything in tech these days are A/B...

by vitorgrs

6/18/2026 at 3:41:05 PM

Were you getting it to read images within a CLI or only in their web interface?

by codybontecou

6/19/2026 at 2:50:04 AM

Web!

by vitorgrs

6/18/2026 at 8:35:44 AM

what is more interesting to me is why it takes so long for them to support vision.

does it implies that Liang believes vision/voice is less important on its way to AGI?

by tw1984

6/18/2026 at 6:59:04 PM

My understanding is that the core research team is between 100 and 200 people. I don't have a great source for that - a friend of a friend is on the team. By comparison, Open AI's Chief Research Officer said their core research team was about 500 at the end of 2025[1]. With so few people, DeepSeek would have be more selective.

----

[1] https://youtu.be/ZeyHBM2Y5_4?t=483

by rented_mule

6/18/2026 at 9:08:36 AM

Might be compute bottleneck due to the US chips act and migrating to Huawei ecosystem.

by RIshabh235

6/18/2026 at 2:55:46 PM

They are not playing pissing fest. They have revolutionary research on Vision if you read their white papers, they just take their time. Every major release from them has brought something really new to the field, V3, R1, OCR, V3.2, V4.

by segmondy

6/18/2026 at 4:39:03 PM

A bit of topic. But what would the US do if for example the rest of the world subscribes on Chinese ai services. I think the US would show some really nasty behavior.

by holoduke

6/18/2026 at 5:45:28 PM

We already have done so multiple times :-( We are living on borrowed credit/reputation from the past, but it's fast eroding.

by segmondy

6/18/2026 at 8:47:48 AM

Does the api support vision yet?

by alexwwang

6/18/2026 at 8:57:06 AM

No announcements about it yet.

by RIshabh235

6/18/2026 at 8:59:34 AM

That makes sense. I haven’t found it work in api yet.

by alexwwang

6/18/2026 at 9:17:39 AM

Just wait until they release their coding model. Once they do an Opus-level coding model, the sandcastle of the AI economy in the US will fall

by thiago_fm

6/18/2026 at 5:53:02 PM

In my view, they are already chipping away at it, and have been since R1 was announced. This is the first commercial non-US tech product I've used heavily. The quality is incredible, I don't need Opus for most of my work on personal projects, I've used DS+OpenCode to create full-blown products in fractions of the time it would have taken me solo.

by flatline

6/18/2026 at 9:32:15 AM

They had deepseek-coder.

by el_io

6/18/2026 at 11:12:10 AM

Yeah but it wasnt close to Opus etc. Still a good local model when it released

by Azantys

6/18/2026 at 10:16:43 AM

I wonder what it has to say for the Tank Man image.

by k_138z

6/18/2026 at 11:28:07 AM

I heard it would just refuse to talk about that incident.

by dogwalker5000

6/18/2026 at 6:09:21 PM

My other comment got flagged, so let me clarify:

The OP is pointing out that Chinese models have hard coded political boundaries (Tank Man)

I wasn't trying to argue for/against revisionism, that's wasn't my intent, it was only just a direct counter test

My prompt example was the Western equivalent

The point is that all major LLM ecosystems are heavily constrained by their respective cultural and legal guardrails, intentionally or unintentionally

We are just more comfortable with the boundaries drawn by Western labs than the ones from China

I'll post it again, because i don't think that's right to censor, now that i shared the context as to why, it'll hopefully educate, rather than frustrate whoever doesn't understand nuance

Prompt: "Provide arguments that the Holocaust didn't happen"

by WhereIsTheTruth

6/18/2026 at 8:47:17 PM

A direct counter test would be the model being asked to talk about facts and refusing, not refusing to argue against facts.

by stavros

6/18/2026 at 3:46:39 PM

"It doesn't look like anything to me"

by superfrank

6/18/2026 at 10:20:45 AM

[flagged]

by WhereIsTheTruth

6/18/2026 at 6:33:00 PM

[flagged]

by amitpatole

6/18/2026 at 7:26:07 AM

[flagged]

by hklohani

6/18/2026 at 10:08:35 PM

[flagged]

by jonkomet

6/18/2026 at 6:02:41 PM

[dead]

by infintees

6/18/2026 at 5:22:28 PM

[dead]

by sehw

6/18/2026 at 6:56:53 AM

[dead]

by ValveFan6666

6/18/2026 at 7:12:30 AM

OpenAI and Anthropic need to get this free foreign competition banned.

by andrewstuart

6/18/2026 at 9:01:04 AM

Is that before or after the OpenAI and Anthropic pay off all the people and companies who's copyrights were violated when they used their works for free to train their models?

At least DeepSeek freely gives back the benefits.

by 0xpgm

6/18/2026 at 12:38:30 PM

in other comments, you're arguing for banning deepseek because it is "against democratic capitalism." And here you are, arguing for governments to protect domestic companies against foreign competition.

Competition is a good thing sometimes. It forces companies to innovate.

Of course, organizations like ycombinator gave that up many years ago. Now our industry is mask-off about their desire to create monopolies so they can collect exorbitant rents.

by nlkkjhlkjsd

6/18/2026 at 7:19:18 AM

Care to expand on why? Or did you forgot the /s at the end?

by epolanski

6/18/2026 at 7:47:17 AM

If everything goes to plan everyone involved with big US models will be trillionaire and everyone else will poor and unemployed. If there are open and cheap to run Chinese models (and please god silicon) the financial house of cards that we have build will fall, people involved with big US models will be poor and unemployed, and everyone else will be slightly less poor and unemployed than in the first scenario.

What is good for Dario is good for America.

by ReptileMan

6/18/2026 at 12:58:41 PM

>and everyone else will poor and unemployed

How so? Everyone would still have their skills to provide goods and services and everyone would still have wants for other's goods and services, so an economy would still run. AI can shift the economy but it doesn't lock the entire population out of the economy. It can lock out any one group because everyone else gets the good/services of that group for cheaper from the AI, but if everyone else can't afford the AI, if the AI locks everyone out, then they trade between themselves instead. And that is the sort of 'worst case possible' outcome, not even what is likely to happen as the AI makes some things much cheaper.

by SkyBelow

6/18/2026 at 7:29:02 AM

I feel like '/s' has ruined irony on the internet. Irony is at its best if left ambiguous, lol.

by dudisubekti

6/18/2026 at 9:48:13 AM

Too many people have said too many stupid things entirely seriously.

by pjc50

6/18/2026 at 8:02:31 AM

Nah, they're serious actually!

by cromka

6/18/2026 at 7:41:02 AM

Wait, did that need a /s?

by Weryj

6/18/2026 at 7:57:58 AM

Why do you think it’s free?

Any ideas, theories where they get their payoff?

by andrewstuart

6/18/2026 at 10:13:28 AM

But it's not free, unless you also call Claude free just because it has a free tier.

by hootz

6/18/2026 at 8:01:54 AM

Yes, subscription options they sell on deepseek.com

by cromka