alt.hn

6/28/2026 at 10:08:25 PM

Why frontier LLMs can't read the hard documents without experts involved

https://idp-software.com/news/the-76-percent-wall/

by chelm

6/29/2026 at 3:09:31 PM

> We do not sell IDP software and we are not paid by any vendor named here.

The vendor list contains your own product, and the company you co-founded pretty clearly sells IDP software. https://idp-software.com/vendors/konfuzio/

by tastroder

6/29/2026 at 3:15:15 PM

I can make this more transparent; it's the same issue Parashift had when they ran https://intelligentdocumentprocessing.com/, which they shut down a month ago.

IDP is not a really sexy market. There are only a few people working in the industry.

I do this in my free time to give small vendors a chance, as big corporates like Rossum, Abbyy, or Kofax (now Tungsten) just rule the market through their ad spend.

I could also make it closed source and charge a fee to get listed, as Gartner would do in their IDP Magic Quadrant.

I did spend 1/580 of the time on the Konfuzio page. Ok, true. And I spent 579/580 on the market. https://idp-software.com/sitemap.xml

by chelm

6/29/2026 at 3:22:47 PM

Are you high?

by HumanOstrich

6/29/2026 at 3:43:59 PM

168 meters above sea level

by chelm

6/29/2026 at 10:15:11 AM

a clearly LLM-written piece about how frontier models are struggling to get past 76% accuracy on their benchmarks (they call it a "wall") in OCR tasks. that is, feeding it a picture of a document and asking it to extract the text.

The benchmark site is here https://www.idp-leaderboard.org/

They say some specialist models get better results on their benchmarks (Nanonets OCR-3 85.9%)

by RugnirViking

6/29/2026 at 10:32:41 AM

I linked your board already. You are right.

Do you know a benchmark that tries to measure business accuracy?

Most benchmarks focus on the character level.

IDP software typically uses metadata to map information that is either not readable or missing in the document, e.g. extracting the VAT and mapping the street, house number, ZIP code and city.

I think there are many models and many providers. However, it's really difficult to measure accuracy at the process level, not just at the character level.
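To make the gap between character-level and process-level accuracy concrete, here is a minimal sketch. The field names and sample values are made up for illustration, and neither metric is taken from any benchmark mentioned in this thread. It only shows how one wrong digit barely moves a character score while it costs a whole field.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_accuracy(predicted: str, truth: str) -> float:
    """1 minus the edit distance normalized by the ground-truth length."""
    if not truth:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - levenshtein(predicted, truth) / len(truth))


def field_accuracy(predicted: dict, truth: dict) -> float:
    """Share of ground-truth fields that were extracted exactly right."""
    if not truth:
        return 1.0
    correct = sum(1 for key, value in truth.items() if predicted.get(key) == value)
    return correct / len(truth)


# Hypothetical sample: a single wrong digit in the VAT ID.
truth = {"vat_id": "DE123456789", "zip": "53111", "city": "Bonn"}
predicted = {"vat_id": "DE123456780", "zip": "53111", "city": "Bonn"}

char_score = char_accuracy(" ".join(predicted.values()), " ".join(truth.values()))
field_score = field_accuracy(predicted, truth)
print(f"character accuracy: {char_score:.3f}, field accuracy: {field_score:.3f}")
```

In a real process the wrong VAT ID would usually fail the whole document, so a per-document score would be lower still.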

https://idp-software.com/vendors/nanonets/

I saw that the leaderboard is hosted by Nanonets. Totally fine for me. So you might be the expert about Nanonets: Let me know if you want to update your post on my site.

by chelm

6/29/2026 at 10:59:39 AM

it's not my site, I have nothing to do with nanonets. the information was taken from the article.

by RugnirViking

6/29/2026 at 11:21:20 AM

tl;dr: years ago, Tesseract was the go-to tool to extract text. Nowadays, vLLMs can extract not only the text and the layout but also the context, and provide structured data or even interpret or map data across documents. Prices dropped significantly, while extraction, classification and modification capabilities increased.

The intelligent document processing (a funny marketing term on top of OCR) market is moving from "Can software extract the text?", which is what benchmarks normally measure, to "Can software autonomously run a specific company process?"

the fallback is called human in the loop, hallucination (LSTM vs. vLLM), prompt engineering.

prove me wrong: the hardest challenge is no longer the OCR accuracy but the integration and issue handling in production. Probably "an agentic team can handle this" ^^

by chelm

6/29/2026 at 2:22:14 PM

This is rather incoherent.

by HumanOstrich

6/29/2026 at 10:35:42 AM

I mean this is for handwritten OCR.. do humans do better?

I've been using Qwen3.6 to OCR stuff, primarily receipts, and it frequently accurately reads stuff on mangled/faded/folded documents that I have a hard time with... including handwritten stuff (though that's not flawless).

by nullc

6/29/2026 at 10:43:10 AM

ahahah, probably not. Looking at my own handwriting: Neither in writing nor in reading.

I find it interesting how the prompt changes the result.

If you let the model focus on the text, the open source models got so good in the last year. That's remarkable. When you change the prompt to not only extract the text but also extract specific information, the pure text extraction result gets worse. For me, it worked to run two prompts on the same document to get both at a meaningful accuracy.
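A minimal sketch of the two-prompt approach, assuming a generic `ask_model(document, prompt)` callable that wraps whatever model you use. That callable, the prompt wording and the `key: value` reply format are all hypothetical, not any specific vendor's API.

```python
from typing import Callable, Dict

# Hypothetical prompts: one for plain transcription, one for specific fields.
TEXT_PROMPT = "Transcribe all text in this document exactly as written."
FIELDS_PROMPT = "Extract the invoice number and total amount as 'key: value' lines."


def two_pass_extract(
    document: bytes, ask_model: Callable[[bytes, str], str]
) -> Dict[str, object]:
    """Query the same document twice so each prompt has a single job."""
    # Pass 1: text only, with no extraction instructions mixed in.
    full_text = ask_model(document, TEXT_PROMPT)

    # Pass 2: specific fields, assumed to come back as 'key: value' lines.
    raw_fields = ask_model(document, FIELDS_PROMPT)
    fields: Dict[str, str] = {}
    for line in raw_fields.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that do not look like 'key: value'
            fields[key.strip()] = value.strip()

    return {"text": full_text, "fields": fields}
```

The cost is two model calls per document; the benefit is that the transcription prompt stays free of extraction instructions.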

by chelm

6/29/2026 at 12:55:19 PM

[dead]

by rsfern

6/29/2026 at 1:42:59 PM

[flagged]

by madikz