Mistral OCR 4 | alt.hn

6/23/2026 at 3:34:56 PM

A tangential observation: the video on the linked page wasn't what I expected. I thought Mistral was a European AI company, so I didn't expect the video to be filmed in San Francisco featuring three people who don't seem to be European.

I'm not against them being a global organization, that's wonderful. I was just surprised. I expected a Parisian office and European accents.

by andrewmutz

6/23/2026 at 3:42:40 PM

Unfortunately Europeans are terrible customers for making money. They ask a lot of questions and they're very stingy with their wallets. Americans on the other hand ...

by rjzzleep

6/24/2026 at 1:38:27 PM

> Americans on the other hand ...

Have realized most people are idiots and are willing to give away their resources for a piece of paper if it has George Washington's face on it. So they've kept putting it on every piece of paper they could find.

Serious people keep pointing out that this is unsustainable, and will lead to the collapse of American society in weeks/months/years, but they've been saying that for decades and so far that hasn't happened yet.

by torginus

6/23/2026 at 4:42:59 PM

You're American?

by touwer

6/23/2026 at 7:49:03 PM

He's a prince of whales.

by megous

6/23/2026 at 8:50:47 PM

If he's a prince of whales, then I'm the king of eel-gland.

by qingcharles

6/23/2026 at 8:47:57 PM

This is absolutely not why there are no leading AI, other important silicon tech, or relevant space companies in Europe. To some degree they exist but are all B-tier in comparison to US/China. You'd be surprised just how loose money can sit in Europe, I guess. Just not the way it needs to be for this.

The financial structure of the EU is nowhere close to enabling these capital devouring endeavors based on lofty future bets. Operating at a loss for years and years is simply unacceptable in European markets and the EU is not authoritarian enough to randomly divert capital based on political orders like China because the EU doesn't try to be a superpower controlling a hemisphere.

by neuronic

6/24/2026 at 12:49:11 PM

Have you raised venture capital? I have. It’s fine.

You don’t get US-level (or even Israel/China/Singapore) seeds, but often you get a matching public investment. Germany has better matching funds, but we did alright in Belgium, and previously Portugal.

Let’s stop pretending the EU can ‘fix’ this. It’s cultural, we’re simply not risk takers (on average) because a “normal” job comes with great benefits. Most of us don’t struggle to survive and have to “pull ourselves by the bootstraps”. The social security safety net protects you.

That’s all fine. We’re fine. No, we won’t lead on AI productisation, but we have AMAZING fundamental research going on at unis. I hired 2 such people in PhD+job setups, that part is also working fine.

Chill.

by port11

6/23/2026 at 11:46:53 PM

I think “Just not the way it needs to be for this.” is exactly the point.

by clickety_clack

6/23/2026 at 9:25:58 PM

[flagged]

by joe_mamba

6/23/2026 at 5:55:16 PM

Oh come on!

Mistral has a successful business model and is actually making money. Not sure OpenAI and Anthropic are doing that yet.

by throwa356262

6/24/2026 at 12:53:26 PM

> Anthropic generated $4.8 billion in sales in the first quarter. Its quarterly revenue is now growing faster than Zoom did during the pandemic, and Google and Facebook in the run-up to their initial public offerings. It is set to turn an operating profit of $559 million in the June quarter.

by port11

6/23/2026 at 4:11:54 PM

~Any borderline-large European tech company will have an office on the US west coast, for sales if nothing else. And probably sales engineering. The timezone difference is eight to ten hours; there is really no way around it.

(I did work for one which had an office in Vancouver, instead; same tz.)

by rsynnott

6/23/2026 at 4:20:50 PM

Mistral just hired a Seattle-based former Amazon/Google VP¹ as CMO, so it seems their US-based presence is growing.

¹ The one locally famous for being sued by Amazon over a non-compete back when non-competes were a thing: https://www.geekwire.com/2020/amazon-sues-former-aws-marketi...

by euio757

6/23/2026 at 4:28:28 PM

And US users spend much more than their EU counterparts.

by madduci

6/23/2026 at 6:59:42 PM

Another company like this is Blackmagic Design. Despite being overwhelmingly based in Australia, you'd think it was an American company based on office listing ordering on https://www.blackmagicdesign.com/company/offices and /company page.

by mbo

6/23/2026 at 10:56:37 PM

...I had no idea Blackmagic was Australian! That's wild, but maybe explains why their tech is so prevalent here haha. Well, that and it's all pretty excellent and great value (at least in the indie scene I was a part of)

by girvo

6/23/2026 at 4:14:45 PM

To the best of my knowledge, most of the founding team started their careers in the US (Meta, etc.) and their primary investors are US VCs. In that regard, they smartly benefit on both sides: US funding and European brains.

by flashfaffe2

6/23/2026 at 7:49:11 PM

Uhm... isn't Mistral mostly funded by ASML? A Dutch company?

by pineaux

6/23/2026 at 9:33:48 PM

No, it's just one of the investors/customers.

by joe_mamba

6/23/2026 at 4:48:15 PM

There is even like an American flag flying high in the background

by dominotw

6/24/2026 at 3:33:02 PM

Gotta pretend so they buy your stuff

by weezing

6/23/2026 at 6:11:33 PM

I’ve always thought the US Postal Service is such a technological marvel. They somehow manage to identify and route billions of pieces of mail and I have to imagine their tech is significantly more primitive than this. Not only that but US addresses are absurdly non-standardized, you can often write the same address multiple ways and have it deliver to the same location. I’m sure there’s plenty of published knowledge in this area, but whenever I see announcements about OCR it feels like this should be a solved problem if it’s been accomplished at the scale of USPS for many years.

by ericyd

6/23/2026 at 9:26:13 PM

My father once received a letter from Algeria, with 3 words on the envelope : his first name, "Créteil" (the town where he lived, ≈100k inhabitants), and "France". Of course, in the 70s there was no Internet nor central database to find him, yet the postal service managed to deliver the letter. He was a very active social worker, managed a youth football team, etc. which made him locally well-known by his first name.

Nowadays, many people can't find anyone or any place unless their phone helps them. And postmen never stop to chat. Such a letter would not pass through the technology process, and probably not through the human network.

by idoubtit

6/23/2026 at 10:53:53 PM

> And postmen never stop to chat

I can say that at least where I live (Brisbane Australia's inner-ring suburbs) that's still the case: my postie is super friendly, loves my dog, and always has a couple of minutes to say hi!

And it's great, because I live at a house number that is <number>A, and there is a <number>... but no <number>B or anything, which trips up a surprising number of people/delivery drivers, so having a postie who cares helps

by girvo

6/24/2026 at 1:37:01 PM

sadly, over here in the netherlands and I assume elsewhere, the job as a postman changed drastically when they privatized. room for the human element has been all but squeezed out for profits long ago.

by mercer

6/24/2026 at 9:15:49 AM

Since the 70s the world population has almost tripled.

by TiredOfLife

6/23/2026 at 6:54:09 PM

I used to work part time for the (Danish) mail service. The only sorting that was done automatically was the post codes. That was enough to get the letter to the right post office. The rest was done by the mailmen/women early in the morning. It was a lot of fun trying to figure out what was meant by some of the addresses. The older people in particular often knew the story of why certain places were sometimes addressed in certain ways, or could guess the addresses based on the names of the people living there.

by thomasahle

6/23/2026 at 6:15:43 PM

Great video by Tom Scott on this subject:

https://www.youtube.com/watch?v=XxCha4Kez9c

by alberth

6/23/2026 at 6:39:56 PM

haha this was great!

by ericyd

6/24/2026 at 4:53:37 PM

Fun fact: one of Yann LeCun's first 'deep learning' projects was on OCR for postal codes (MNIST).

by hikarudo

6/23/2026 at 11:14:13 PM

There are a lot of weird edge cases with US addresses. Carmel-by-the-Sea doesn't have street numbers. Florida Keys addresses are often just a mile marker. The mail gets delivered because a human on the route is familiar with them.

by TurdF3rguson

6/23/2026 at 10:04:38 PM

>... US addresses are absurdly non-standardized.

Laughs in Indian addresses.

by keeda

6/23/2026 at 6:40:31 PM

IIRC the USPS was one of the first big budget orgs behind early OCR systems all the way back in 1965.

https://www.youtube.com/watch?v=V4LJs2ZoDR4

by vel0city

6/23/2026 at 7:48:38 PM

by shepherdjerred

6/23/2026 at 9:20:27 PM

USPS can also make a list of every address, that's not possible generally with freeform text

by NegativeLatency

6/23/2026 at 6:54:56 PM

The USPS Remote Encoding Center in Salt Lake City examined 841,260,847 images of poorly written addresses in fiscal year 2025. [0]

Unfortunately the page does not have a base rate--the total number of mail pieces that were not prepared for automated processing. Total first class mail, which includes a lot of bills prepared for automation was 25.7 billion [1]. If 10% of that are non-automated, then .8 / 2.57 = .31 or a third of mail not prepared for automation is handled by "employees look at the image and type in address information"

0. https://facts.usps.com/remote-encoding-center-rec-decipherin...

1. https://about.usps.com/what/financials/10k-reports/fy2025.pd...

by adolph
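
The back-of-the-envelope estimate in the comment above can be reproduced in a few lines. This is only a sketch: the 10% non-automated share is the commenter's guess, not a USPS figure, and the variable names are made up for illustration. With the unrounded image count the fraction comes out nearer 0.33 than 0.31, which still reads as "about a third".

```python
# Figures quoted in the comment above.
rec_images = 841_260_847       # images examined at the Remote Encoding Center, FY2025
first_class_pieces = 25.7e9    # total first-class mail pieces, FY2025

# Assumption (the commenter's guess, not a USPS figure): 10% of first-class
# mail is not prepared for automated processing.
non_automated_share = 0.10

non_automated_pieces = first_class_pieces * non_automated_share
manual_fraction = rec_images / non_automated_pieces

print(f"non-automated pieces: {non_automated_pieces:,.0f}")
print(f"fraction keyed by hand: {manual_fraction:.2f}")
```

The result scales inversely with the assumed share, so a 20% non-automated share would halve the fraction.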

6/23/2026 at 8:54:49 PM

I can't help you with any of those questions... but back in the 90s I used to be one of those employees that looked at the image on the screen and typed the address information in Salt Lake City.

Quantitatively, I don't know the stats, but qualitatively I can confirm it felt like a lot.

by sbuttgereit

6/23/2026 at 9:47:55 PM

I’m not totally sure what your point is, but my response is that most OCR technology is reading “automated” (i.e. computer-printed) documents such as PDFs and things like that. So I think parsing the numbers by “automated” vs “non-automated” is not a very helpful way to think about the success of USPS OCR technology; the gross percentage of manual reviews compared to total mail volume is a much better way at looking at the success of their OCR. That’s my perspective anyway, but maybe commercial OCR is really optimized for reading handwriting and I’m just not aware of it. I’m not an expert in the area.

by ericyd

6/23/2026 at 10:56:17 PM

My response was to your thesis: "whenever I see announcements about OCR it feels like this should be a solved problem if it’s been accomplished at the scale of USPS for many years." I think the USPS has "solved" much of the problem by getting the most prolific generators of mail to conform with more basic tech than OCR, the bar code.

Much commercial mail (including first class non-junk mail) is physically presorted and bundled as it is dropped with USPS and has a bar code that states the routing needed. Stuff that has had OCR performed by computer or human gets a little sticker near the bottom with the barcode.

The barcode is applied by the sender; the Postal Service required use of the Intelligent Mail barcode to qualify for automation prices beginning January 28, 2013. Use of the barcode provides increased overall efficiency, including improved deliverability, and new services.

https://en.wikipedia.org/wiki/Intelligent_Mail_barcode

by adolph

6/24/2026 at 11:52:10 AM

Ah that is an interesting point!

by ericyd

6/23/2026 at 8:22:09 PM

Well I guess a lot of humans are involved as well, so that helps

by polyterative

6/23/2026 at 8:39:00 PM

Also the census, literally hundreds of millions of hand-written forms that have needed to be processed. Which was the foundation of MNIST!

by jubilanti

6/23/2026 at 2:46:11 PM

It'll be interesting to see how this ranks against https://github.com/baidu/Unlimited-OCR

by mdrzn

6/23/2026 at 3:09:56 PM

Right, just announced https://x.com/BaiduAI_News/status/2069322806748410291

by cdnsteve

6/23/2026 at 4:27:56 PM

It's cheap at $4/1k, but I'm hesitant to even benchmark this one again since the previous versions were all "98% accurate based on internal benchmarks of 4 pdfs" and ended up falling short of almost everything else on the market [1].

Even in this one, they just report that OlmOCRBench and OmniDocBench have "known limitations" and that's why they report flagship numbers from their internal benchmark.

[1] https://getomni.ai/blog/benchmarking-open-source-models-for-...

by themanmaran

6/23/2026 at 6:22:57 PM

True, same conclusion, but the few samples I tried showed some real improvements since the Dec 2025 version.

by coulix

6/24/2026 at 7:00:33 AM

That's extremely cheap. I wonder how the companies making a living out of this will survive. They certainly don't charge $0.004 per scan.

by menaerus

6/23/2026 at 6:04:02 PM

All AI labs really need to stop using truncated y-axes for benchmark bar charts...

https://mistral.ai/_astro/cm-engish_ZhlvoT.webp?dpl=6a3a94bd...

by beklein
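
To see why a truncated y-axis matters, here is a small sketch comparing the ratio of two scores with the ratio of their drawn bar heights. The scores and the baseline of 95 are hypothetical and not read off the linked chart; `apparent_ratio` is an illustrative helper, not part of any plotting library.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar heights for scores a and b when the y-axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("both scores must lie above the axis baseline")
    return (b - baseline) / (a - baseline)

# Hypothetical benchmark scores, not taken from the linked chart.
score_a, score_b = 96.0, 98.0

true_ratio = apparent_ratio(score_a, score_b)          # axis starts at 0
drawn_ratio = apparent_ratio(score_a, score_b, 95.0)   # axis starts at 95

print(f"true ratio:  {true_ratio:.2f}x")
print(f"drawn ratio: {drawn_ratio:.2f}x")
```

With these numbers a two-point lead is drawn as a bar three times as tall.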

6/24/2026 at 4:25:53 AM

No, they need to keep using truncated y-axes to increase the hype cycle.

by HDBaseT

6/23/2026 at 3:23:23 PM

Little on differences other than bounding boxes and double the price compared to their previous OCR v3 model from December - https://mistral.ai/news/mistral-ocr-3/ - other benchmarks were used back then.

by mcbetz

6/23/2026 at 5:11:13 PM

Tested with Malayalam: normal handwriting came out accurate, but a slightly different style got detected as Kannada. I have samples if required, which Sarvam handled with 99% accuracy, leaving one text error.

by sreekanth850

6/24/2026 at 9:05:18 AM

I am making (open) finetunes for malayalam and kannada (and bengali, gujarati, hebrew), and need someone to transcribe a few images for me. Could you contact me if you are interested in helping?

by deivid

6/23/2026 at 5:43:39 PM

I'm curious what's been your experience with Sarvam outside of Indic languages - Indian English (perhaps mixed with romanised Indic verbiage) and also documents with complex layouts (figures, tables, etc).

I've been quite curious but hesitant about Indian offerings, particularly because they seem to be priced a little higher than what I would think they should be (I could be wrong and simply be misremembering though).

by civet_java

6/23/2026 at 8:17:24 PM

Sarvam is exceptionally well tuned for Indic languages. We have more than 20 languages and it performs well for all of them in OCR. I am yet to test it with other languages. No other model comes close to Sarvam for Indic languages. I saw they recently dropped the price per page to 0.5 INR, which is much cheaper. The only downside is the zip-file-based delivery.

by sreekanth850

6/23/2026 at 3:04:29 PM

"A note on out-of-scope use. OCR 4 is a document-understanding model, not a decision-maker. It is not intended for medical diagnosis, legal advice or judgment, high-stakes financial decisions, safety-critical systems, real-time/latency-sensitive processing, or non-document inputs (raw audio, video, etc.). "

Can't wait for the "oh so innovative" manager who will suggest during the next meeting "Ok... but what if WE used it for high-stakes financial decisions on non-document inputs like a photo from my phone?"

I guarantee you somebody on HN is going to comment about this "idea" next week.

by utopiah

6/23/2026 at 3:17:23 PM

Why would anybody do that? You would simply get terrible results compared to dozens of other more capable models. It's for converting to text, not answering questions. Just seems like you need some sort of weird angle to bring out an anti-AI stance

by weird-eye-issue

6/23/2026 at 3:34:41 PM

I think his comment is referring to a scenario where a decision is made on financial numbers that are misrecognized. E.g. 9.0% actual is OCR’d as 90%

by alex43578

6/24/2026 at 12:42:21 AM

I don't think so

But anyway, just a side note: one way to help reduce these errors is to pass both the original image and the OCR'd text to the models that make the decisions

by weird-eye-issue
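
A minimal sketch of that suggestion: bundle the scan and its transcript into one request so the downstream model can cross-check numbers such as the 9.0% vs 90% example. The dictionary layout and the `build_decision_request` helper are hypothetical; a real multimodal API will have its own message format.

```python
import base64
import json

def build_decision_request(image_bytes: bytes, ocr_text: str, question: str) -> dict:
    """Bundle the original scan and its OCR transcript into one request.

    The layout is hypothetical; adapt it to whatever multimodal API the
    downstream model exposes.
    """
    return {
        "instructions": (
            "Answer using the OCR transcript, but check every number "
            "against the attached image and flag any mismatch."
        ),
        "question": question,
        "ocr_text": ocr_text,
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    }

# Placeholder bytes stand in for a real scanned page.
request = build_decision_request(
    b"\x89PNG placeholder",
    "Interest rate: 90%",
    "What is the interest rate?",
)
print(json.dumps(request, indent=2))
```

Whether the model actually catches the mismatch is a separate question; this only ensures it has both sources in front of it.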

6/23/2026 at 6:24:59 PM

Guess you haven't met management yet. Clearly nobody should do that but that official warning is not going to stop them from trying.

by utopiah

6/23/2026 at 6:42:47 PM

All AI companies are working on models with specialisms. Which are really good at one task.

Mistral is just a bit more forward about this. I guess because they don't need/want to "wow" an audience with generalist user-facing tools (chat) that seem to be experts in everything (but in reality quite often will be a lot of such specialist models chained together).

Here, what you want is really just a few Python scripts away. Voxtral to turn your spoken prompt into text, piped into Mistral Large 3 with extra system prompts that creates a prompt for OCR and paths to files. It could do this in a loop to actually find those files, which you throw at OCR 3; the result is passed back to Mistral Large 3 to interpret and turn into decisions.

This is common. It's rather uncommon, really, to build something like this using only one model for everything.

by berkes
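
The chain described above could be wired together roughly like this. It is a sketch with each model passed in as a plain callable; `run_pipeline` and the stub lambdas are illustrative names and do not correspond to any real Mistral client API.

```python
from typing import Callable, Dict, Iterable

def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],           # speech-to-text step
    plan: Callable[[str], Iterable[str]],         # LLM turns the request into file paths
    ocr: Callable[[str], str],                    # OCR model turns each file into text
    decide: Callable[[str, Dict[str, str]], str], # LLM interprets the OCR output
) -> str:
    request = transcribe(audio)
    pages = {path: ocr(path) for path in plan(request)}
    return decide(request, pages)

# Stub implementations so the sketch runs without any model access.
result = run_pipeline(
    b"fake-audio",
    transcribe=lambda _audio: "total of the March invoices",
    plan=lambda _request: ["invoice_1.pdf", "invoice_2.pdf"],
    ocr=lambda path: f"text of {path}",
    decide=lambda request, pages: f"{request}: read {len(pages)} documents",
)
print(result)
```

Keeping each step behind a callable makes it easy to swap one specialist model for another without touching the glue code.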

6/23/2026 at 3:53:55 PM

“I delegated critical financial decisions to my OCR software, and you won’t believe what happened next.”

by leoc

6/23/2026 at 3:14:23 PM

Recently I tried OCR with Opus 4.8. (I know, not technically the right tool for the job.) All I needed to do was extract dates from receipts. It got about 20% of the dates wrong yet rated all as “high confidence”.

Should have probably tried a more OCR specific model

by Insanity

6/23/2026 at 4:19:28 PM

> All I needed to do was extract dates from receipts

Was this... not basically a solved problem like 30 years ago? I'm pretty sure the shareware OCR tool that came with a black and white scanner I had at one point would do better than 20% wrong.

by rsynnott

6/23/2026 at 7:10:14 PM

I don't know about Opus but I can tell you with Gemini the subscription product OCR is apparently not done by the model. It used a separate old fashioned OCR tool and gives bad results in my tests.

But with Gemini the API the model does do the OCR resulting in much better accuracy.

by staticman2

6/23/2026 at 3:15:39 PM

Opus is very good at OCR. Way better than the small 1-4B VLMs. If Opus failed, most likely those smaller models will fail as well.

by nik736

6/24/2026 at 2:26:33 PM

We use a Qwen 7B-parameter model for this in production. It works quite well.

by bongoman42

6/23/2026 at 3:26:37 PM

How long have you been testing this? Have you noted a large improvement? I tested Opus for this quite a while ago (maybe 4.5? Whatever was out about a year ago), and it performed quite poorly on my use case.

by MostlyStable

6/23/2026 at 3:30:06 PM

I have put together an internal benchmark on 1000s of business documents with weird tables, structure, etc. that I run on every relevant model release. Opus 4.8 performs very very well. But it is obviously overkill for the task (and expensive at doing so). I just wanted to respond to the OP.

by nik736

6/23/2026 at 3:46:19 PM

I'm assuming that the reason I didn't have good success rate is because it was not scanned documents, but photographs, and lighting conditions weren't always ideal. I think scanned business documents are a happy-case scenario in a way. (obv, you seem to run it against some complex documents, so that's impressive)

by Insanity

6/23/2026 at 4:58:27 PM

I’m curious what your findings are for the best model for your use case

by apawloski

6/23/2026 at 3:17:01 PM

I do not believe this story.

Opus 4.8 scanned hundreds of PDFs for me recently with the worst handwriting imaginable. 100% successful, other than one record where even I could not figure out what was written.

by bpodgursky

6/23/2026 at 3:25:50 PM

I do not believe this story, because of the message I just posted above.

That's not really productive lol. I'm glad it worked for you, but these models are non-deterministic and 'YMMV' very much applies everywhere. I had it parse receipts (in fairness, in variable lighting), all taken with iPhone cameras in the past year. And yeah, not a great job: about 20% failed to get the date correct. (Not outrageously wrong, e.g. 05/20/2026 becomes 05/23/2026.)

YMMV, glad it worked for you.

by Insanity
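
A dispute like this is easier to settle with a measured error rate than with anecdotes. Below is a minimal sketch that scores extracted receipt dates against hand-checked ones; the sample data is hypothetical (one entry mirrors the 05/20 to 05/23 slip described above) and `date_accuracy` is an illustrative helper.

```python
from datetime import datetime
from typing import List

def date_accuracy(predicted: List[str], expected: List[str], fmt: str = "%m/%d/%Y") -> float:
    """Fraction of receipts whose extracted date equals the hand-checked date."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("need at least one receipt")

    def parse(value: str):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            return None  # an unparseable prediction counts as wrong

    hits = 0
    for p, e in zip(predicted, expected):
        truth = datetime.strptime(e, fmt).date()
        hits += parse(p) == truth
    return hits / len(expected)

# Hypothetical sample of five receipts.
expected = ["05/18/2026", "05/20/2026", "05/21/2026", "05/22/2026", "05/25/2026"]
predicted = ["05/18/2026", "05/23/2026", "05/21/2026", "05/22/2026", "05/25/2026"]
print(f"accuracy: {date_accuracy(predicted, expected):.0%}")
```

Comparing parsed dates rather than raw strings avoids counting formatting differences as recognition errors.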

6/24/2026 at 7:15:28 AM

Your experience appears in line with at least this leaderboard: https://arbitrhq.ai/leaderboards

by Barbing

6/23/2026 at 3:31:51 PM

Are you sure you weren't using Sonnet or a low-effort reasoning mode?

by bpodgursky

6/23/2026 at 3:39:26 PM

Yes, lol

by Insanity

6/23/2026 at 3:21:41 PM

I believe it. Makes me curious what your prompt was that got such a good result out of Opus.

by 9cb14c1ec0

6/24/2026 at 2:24:55 PM

I ordered it to not be lazy and to show me result 1 at a time rather than doing it in batch.

by bpodgursky

6/23/2026 at 9:20:51 PM

likely just opus being dumbed down to prepare for fable

by jokethrowaway

6/23/2026 at 4:36:51 PM

The comparisons rank it against GPT and Gemini but not Claude. Is Claude's vision support simply not competitive when it comes to OCR tasks?

by bastawhiz

6/23/2026 at 4:47:18 PM

I think until Fable, Claude's vision was significantly worse than GPT and Gemini in my personal experience. I eval almost every vision model since I work on a screenshot-to-code conversion project: https://github.com/abi/screenshot-to-code.

by abi

6/23/2026 at 2:42:00 PM

I was processing 55 year old paper files, most of them severely degraded, with its predecessor model. I was very impressed! I also tried Abbyy Finereader but it didn't even come close in my experience.

by Ducki

6/23/2026 at 3:13:19 PM

I used Abbyy Finereader for several years. I loved it. I completed some large projects with it. Modern VLMs put classic FineReader to shame for processing low-resolution/degraded/non-standard text.

I'm personally using the small Qwen 3.5 models. If you have an OCR problem, Mistral OCR 4 is probably great. Open weights models that you can run on a laptop may also work great.

by philipkglass

6/23/2026 at 3:19:38 PM

This has been a niche where Mistral has actually been successful. Btw, Hindi and Japanese are bucketed in "Rare Languages," which is odd.

by pmxi

6/23/2026 at 3:26:17 PM

I read that as "languages under-represented in the training set".

by ZiiS

6/23/2026 at 9:32:47 PM

> On our internal multilingual evaluation, OCR 4 leads across all eight language groups — English, Western Europe, Eastern Europe, Middle Eastern, Chinese, East Asian, Southeast Asian, and specialized languages (Hindi, Japanese, Georgian, Bengali, Armenian, Hebrew, Greek, Gujarati, Tamil, Malayalam, Kannada, Telugu).

The initial version of this page called these "minor languages" (vs specialized language), which is telling. If you're a speaker of one of these: This is why you need a sovereign set of models. (Japanese government: Are you listening?)

by flakiness

6/23/2026 at 7:07:32 PM

Gave this a test on some scans of magazines, generally pretty impressed with the results. Mags generally have pretty whacky layouts and it does a reasonable job working out what is where and pulling it together into a single coherent md file. The way it crops relevant pics and puts them into the doc is pretty nice.

Haven't compared it with any other high-tech OCR setups, but it's way better than the jank that comes as standard with my scanner.

by remus

6/23/2026 at 3:12:09 PM

Do these models (this one or its competitors) do handwriting recognition?

by gpm

6/23/2026 at 5:30:33 PM

Yes, we've been using Transkribus for this extensively. My wife is a historian who spends quite a bit of time sorting through old letters and diaries, and it has been a considerable quality of life improvement.

Even if you are able to read someone's scratches, having a model to do the bulk lifting saves your eyes a lot of squinting. One thing that makes Transkribus useful for research vs a chat interface is that it can line up its interpretation alongside the original image so you can examine its work directly.

by thadt

6/23/2026 at 3:19:59 PM

Yes, we have successfully used Mistral OCR for digitizing handwritten forms. You always have a low percentage that need human review and adjustment, but overall Mistral has been highly accurate (their price is amazing, too).

by 9cb14c1ec0

6/23/2026 at 3:25:40 PM

In the sense that you can get similarity scores for individual characters referenced against a known database of characters written by various individuals. You can get stylometry scores out of small LLMs that do demographic segmentation based on writing style using the same methods.

They won't have the capacity to be fed an image of handwritten text and say "Ahh, this is a note written by Winston Churchill!". You could very easily use these models and your agent framework of choice, like Hermes, the Segment Anything models, and other foss tooling to build a dedicated, specialist handwriting recognition system. Or facial recognition, or fingerprint recognition, etc - these sorts of things can be done very procedurally, without a lot of interpretive AI.

by observationist

6/23/2026 at 4:24:39 PM

I think OP meant converting handwriting to text, not identifying a person based on their handwriting style! (but that sounds quite interesting)

by varenc

6/23/2026 at 3:18:19 PM

If you mean handwriting to text then yes

by weird-eye-issue

6/23/2026 at 3:19:55 PM

Yep that's what I mean, thanks :)

by gpm

6/23/2026 at 5:37:05 PM

Not well tested. It switched all U.S. (") double quotation marks to UK-style (') single quotation marks, ignoring the source document. Useless in the US.

by JGB100

6/23/2026 at 4:06:36 PM

This runs for free on CPU https://github.com/kouhxp/textsnap

by mrkn1

6/23/2026 at 3:24:40 PM

Does anyone know of OCR benchmarks that include hand-written documents? I'm currently using Gemini pro 3 for this, and error rates are quite good, but it's a little bit pricey, and I'd be interested in a cheaper model that could perform as well, but almost all the OCR benchmarks I'm aware of (and I believe all the ones included in this announcement) are about printed/typeset text.

by MostlyStable

6/23/2026 at 3:37:02 PM

[flagged]

by jimmypk

6/24/2026 at 12:22:54 AM

Are there any open models focused on LPR (license plate recognition)?

I have found some old ones but curious if there are new ones being developed like this OCR model. I may even try it for the purpose and see if it does well.

by zhivota

6/23/2026 at 3:15:57 PM

Way too expensive. Google vision OCR (which they failed to compare against), is $1.50 per 1k pages. Vs $4 from Mistral.

by stri8ted

6/23/2026 at 7:30:51 PM

It’s not the same service. Google’s vision OCR is pure text extraction, not layout. Pretty sure Google’s doc AI services that can identify headers vs body text is $10 per 1k pages.

by cvdub

6/23/2026 at 8:18:26 PM

That’s true, though worth beating a dead horse to say that traditional OCR won’t hallucinate sentences, perform unwanted translation, or change the meaning of whole paragraphs to something more “appropriate”.

by anon373839

6/23/2026 at 4:22:50 PM

interesting - an equivalent Azure Document Intelligence service (scanning with layout) is $10/1k

by kojoru
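
For a quick comparison, the per-1,000-page prices quoted in this thread can be turned into per-page and per-workload costs. This is a sketch only: the rates are the commenters' figures and are not verified against vendor price lists, and the workload size is hypothetical.

```python
# Prices per 1,000 pages as quoted in this thread (USD); not verified.
price_per_1k = {
    "Google Vision OCR (text only)": 1.50,
    "Mistral OCR 4": 4.00,
    "Azure Document Intelligence (layout)": 10.00,
}

def cost(pages: int, rate_per_1k: float) -> float:
    """Cost in USD for `pages` pages at a flat per-1,000-page rate."""
    return pages * rate_per_1k / 1000

pages = 250_000  # hypothetical workload
for name, rate in price_per_1k.items():
    print(f"{name}: ${cost(pages, rate):,.2f}  (${rate / 1000:.4f} per page)")
```

As noted in the replies, the services differ in what they return (plain text vs layout), so price alone is not a like-for-like comparison.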

6/23/2026 at 3:02:16 PM

Are there benchmarks for how this performs on charts, or maybe more accurately, plots? I've yet to find a model that can digitize a plot into X,Y points with some accuracy in my use case of digitizing old datasheets.

by tdubey

6/23/2026 at 2:57:39 PM

1000 pages for $4? damn how does it compare to llama parse I wonder

by ge96

6/23/2026 at 3:26:37 PM

I was just using infinity parser 2 (flash, to be fair) for pennies self-hosted to run through thousands of pages of documents with remarkable confidence. I decided to use https://huggingface.co/datasets/allenai/olmOCR-bench to determine what was the best OCR tool, yesterday, but I've got no idea what the best is now. What is the dominant OCR eval right now? Between Baidu and Mistral this morning, I wonder if there's a new tool to switch to..

by aliljet

6/23/2026 at 3:35:11 PM

(jerry from llamaindex here) we're gonna benchmark on ParseBench and report the results!

by freezed8

6/23/2026 at 3:08:55 PM

Or Apple's local OCR/Vision models?

by thenthenthen

6/23/2026 at 6:08:38 PM

Mistral keeps reminding us that they don't just brew great coffee, they can build great AI too. Hats off to the team. Mistral O.C.R. (Only Cool Results)

by trilogic

6/23/2026 at 6:27:32 PM

Naive question: is Claude no good at OCR? Was surprised to see that none of Anthropic's models were included in the benchmark comparisons.

by nickvec

6/23/2026 at 4:25:44 PM

I wonder how it compares to reducto, pulse, extendai.

by coulix

6/23/2026 at 4:37:10 PM

Is there a complete list of the languages they support, and benchmarks by language, instead of just "Rare Languages"?

by Ninjinka

6/23/2026 at 5:40:58 PM

Why the chart crimes?!

by sscaryterry

6/23/2026 at 2:44:00 PM

Is there something wrong with their certificate? Chromium is saying https isn't valid

by jppope

6/23/2026 at 2:55:22 PM

Looks good to me on both Brave (on Android) and Firefox (on Windows 11). Let's see what SSL Labs says (it is running now)

https://www.ssllabs.com/ssltest/analyze.html?d=mistral.ai&la...

Looks good so far, A+ on ipv4 as well as ipv6

Edit: I also asked Gemini 3.1 Pro to analyze the certificate and it looks good

It looks like you have shared an `about:certificate` URL containing a chain of three Base64-encoded X.509 TLS/SSL certificates. This specific chain is used to secure connections to *mistral.ai*.

Here is the decoded breakdown of the certificate chain you provided:

## Certificate Chain Overview

This is a standard three-tier certificate chain issued by Google Trust Services for the Mistral AI domain.

---

### 1. Leaf Certificate (End-Entity)

This is the specific certificate issued to the website to verify its identity and encrypt traffic.

* *Subject (Common Name):* `mistral.ai`
* *Subject Alternative Names (SANs):* `mistral.ai`, `workers.mistral.ai`
* *Issuer:* WE1 (Google Trust Services)
* *Valid From:* June 13, 2026
* *Valid To:* September 11, 2026
* *Key Type:* Elliptic Curve (ECDSA)

### 2. Intermediate Certificate

This certificate acts as a bridge between the website's certificate and the trusted Root CA.

* *Subject:* WE1 (Google Trust Services)
* *Issuer:* GTS Root R4 (Google Trust Services LLC)
* *Valid From:* December 13, 2023
* *Valid To:* February 20, 2029
* *Key Type:* Elliptic Curve (ECDSA)

### 3. Root Certificate

This is the foundational trust anchor pre-installed in browsers and operating systems.

* *Subject:* GTS Root R4 (Google Trust Services LLC)
* *Issuer:* GTS Root R4 (Self-signed)
* *Valid From:* June 22, 2016
* *Valid To:* June 22, 2036
* *Key Type:* Elliptic Curve (ECDSA)

by collabs
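
One thing that can be checked mechanically from the pasted breakdown is whether every certificate was inside its validity window on the day of the question. The sketch below uses only the dates as reported in the comment above (they come from a model's summary, so treat them as unverified); it does not check signatures, hostnames, or the local trust store, any of which could still explain a browser warning.

```python
from datetime import date
from typing import Dict, List, Tuple

# Validity windows as reported in the pasted analysis above (unverified).
chain: Dict[str, Tuple[date, date]] = {
    "leaf (mistral.ai)": (date(2026, 6, 13), date(2026, 9, 11)),
    "intermediate (WE1)": (date(2023, 12, 13), date(2029, 2, 20)),
    "root (GTS Root R4)": (date(2016, 6, 22), date(2036, 6, 22)),
}

def outside_validity(certs: Dict[str, Tuple[date, date]], on: date) -> List[str]:
    """Names of certificates that are expired or not yet valid on the given day."""
    return [name for name, (start, end) in certs.items() if not (start <= on <= end)]

check_day = date(2026, 6, 23)  # the day the question was posted
problems = outside_validity(chain, check_day)
print(problems or "all certificates within their validity window")
```

A machine with a badly wrong system clock would fail this same comparison locally, which is one common cause of a certificate error that nobody else can reproduce.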

6/23/2026 at 3:36:25 PM

thanks, I'm going to have to check what's going on with my setup then

by jppope

6/23/2026 at 4:35:47 PM

Not opensource right?

by v3ss0n

6/23/2026 at 5:25:52 PM

The weights do not appear to be downloadable, "contact sales for self hosting"

by verdverm

6/23/2026 at 4:51:14 PM

starting y axis from 50 and 95 is a bit misleading

by dominotw

6/24/2026 at 1:23:04 AM

[flagged]

by dagni132

6/23/2026 at 6:37:39 PM

[flagged]

by vasylvd

6/23/2026 at 3:22:48 PM

After paying for Mistral and using it for a while I genuinely hated it. It's a productivity black hole and can't realistically compete with anyone. I chose it only because it was European, but no. I'd rather let my one year subscription go to waste than use anything 'Mistral'.

by greenleafone7

6/23/2026 at 3:53:41 PM

Opposite advice. It's very useful to me for dev and general tasks.

Been using Claude in parallel; it's better, but not by that much, just 10x (or 100x?) more expensive.

by maelito

6/23/2026 at 5:12:46 PM

Sure, well for me it isn't. It has been awful for even toy tasks that opencode's free plan did without an issue. The general sentiment about it is that it is really bad. I wish I knew before paying.

by greenleafone7

6/24/2026 at 8:35:00 AM

Well, you lost $17. I'm not being sarcastic, that's not a lot for dev tasks.

by maelito

6/23/2026 at 3:59:15 PM

Mistral's coding models aren't on par with current SOTA US and Chinese models if that's what you're referring to, but I rather like their OCR models.

by InsideOutSanta

6/24/2026 at 5:17:45 AM

I assume you mean for coding, for which I would agree. But for OCR Mistral is SOTA :)

by _kidlike

6/24/2026 at 7:06:14 AM

I am not totally following this area but the link from a commenter from above suggests that it is not SOTA but on the lower end (but still good): https://getomni.ai/blog/benchmarking-open-source-models-for-...

by menaerus

6/24/2026 at 7:37:42 AM

interesting... I have never seen such a benchmark before, where it forces the model to use a json schema for parsing a document. Sounds a bit counter intuitive to me, but I'm not that deep in the field. I usually just "ask" the model for information that is inside an image pdf. I haven't run any sophisticated benchmarks though...

by _kidlike

6/24/2026 at 7:53:36 AM

No, they don't force the model to use a json schema, they simply use the model to extract the data, and then they feed that OCR result into the pipeline further to evaluate the OCR results against the ground truth, and this is where JSON schema is used, and also another model (gpt4o).

by menaerus
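
The scoring step of such a pipeline, comparing JSON extracted from the OCR output against ground-truth JSON, might look roughly like the sketch below. It is not the linked benchmark's actual code: `flatten` and `field_accuracy` are illustrative helpers, the invoice is made up, and the step where a second model produces the JSON is left out entirely.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {"a.b.0": value} pairs."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat

def field_accuracy(extracted: dict, truth: dict) -> float:
    """Share of ground-truth leaf fields reproduced exactly in the extracted JSON."""
    truth_flat = flatten(truth)
    if not truth_flat:
        raise ValueError("ground truth has no fields")
    extracted_flat = flatten(extracted)
    missing = object()  # sentinel so absent keys never compare equal
    matches = sum(extracted_flat.get(k, missing) == v for k, v in truth_flat.items())
    return matches / len(truth_flat)

# Hypothetical invoice; the extracted total has a dropped digit.
truth = {"vendor": "ACME", "total": "90.00", "lines": [{"qty": 2}, {"qty": 1}]}
extracted = {"vendor": "ACME", "total": "9.00", "lines": [{"qty": 2}, {"qty": 1}]}
print(field_accuracy(extracted, truth))
```

Exact matching is deliberately strict; a real harness would likely normalise whitespace and number formats before comparing.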

6/24/2026 at 8:26:41 AM

aaahhh...

by _kidlike

6/23/2026 at 4:06:48 PM

> After paying for Mistral and using it for a while I genuinely hated it

For OCR?

by lxgr

6/23/2026 at 9:43:38 PM

Codestral is pretty good for coding autocomplete. possibly better than even Cursor's

by booi

6/23/2026 at 3:34:31 PM

what did you use it for and when?

by adlk

6/23/2026 at 5:11:10 PM

The armies of people desperate to defend Mistral, scouring the internet for any of the hundreds of negative posts made about it daily, are pathetic. There's a reason it needs 'fanboys' and 'defenders'... it sucks. I'd have loved to use a European alternative, but Europeans need to get serious and actually offer an alternative that has value other than "it's trash, but it has a Made in Europe badge".

by greenleafone7

6/23/2026 at 3:44:08 PM

Same, I got a refund 3 days later. It is unusable.

by amunozo