Project Gutenberg – keeps getting better

5/15/2026 at 4:15:36 PM

Hi! I'm one of the programmers at Gutenberg. We've been improving the site a lot over the past few months (and more is coming!). If you haven't visited the page recently, it's worth checking out again: https://www.gutenberg.org/

by JSeiko

5/15/2026 at 8:14:20 PM

Have you considered having a detailed version history for each book (etext)? The process of submitting fixes to typos etc in books involves sending an email (https://www.gutenberg.org/help/errata.html) and although the last time I did this (2011) the fixes did get applied reasonably quickly (couple of days), it all felt a bit opaque. The version history could also include the project (usually PGDP correct?) the etext originated from; that way one would be able to compare against the actual page scans.

I have very mixed feelings about Standard Ebooks and would much prefer being able to use Project Gutenberg directly, but one good thing Standard Ebooks does is that every book has an associated git repository (on GitHub), so it's (in principle) possible to see a history of fixes to the text over time.

by svat

5/15/2026 at 9:26:59 PM

We're using git repos internally to keep history for each book. They existed on GitHub for a while, but our implementation was awkward, and too big of a project for the volunteer dev team. But it's likely that we'll evolve towards that.

by gluejar

5/16/2026 at 12:16:18 AM

> I have very mixed feelings about Standard Ebooks[…]

Why?

by marcprux

5/16/2026 at 9:10:24 AM

Not the GP, but I also have mixed feelings about Standard Ebooks. They modernise texts for American readers. This means changing the punctuation, merging some words, altering the syntax, etc.

When I read an old novel, written two centuries ago in England, the little differences to modern English are part of the charm, and I certainly don't want any Americanism mixed in. For one of my favorite novels, The Forsyte saga, the author deliberately used some rare forms of words, which SE replaced with the mainstream forms.

by idoubtit

5/16/2026 at 2:07:15 PM

SE editor in chief here. What you describe is incorrect. The only thing we do is very light sound-alike spelling modernization, like "to-night" -> "tonight". We do not do things like change from en-GB to en-US, replace old words with different modern words, or change text for "American readers", whatever that means. I have no idea where you got that impression.

I personally worked on the Forsyte saga. If you think something was done in error, please let us know and we'll be happy to fix it.

by acabal

5/16/2026 at 10:40:44 PM

I commented on this kind of editing several years ago:

https://news.ycombinator.com/item?id=16957359

The edit is still in place, and I still maintain that changing 'phone to phone in dialogue changes the meaning.

by mrob

5/16/2026 at 3:52:49 PM

> The only thing we do is very light sound-alike spelling modernization, like "to-night" -> "tonight".

Curious. Why even bother?

by natex

5/16/2026 at 9:39:45 PM

Guess: screen readers and such.

by bell-cot

5/16/2026 at 7:41:55 PM

One could argue that this falls into the previous poster's thought about "the little differences to modern English are part of the charm" ...

by tangledhelix

5/16/2026 at 10:37:43 AM

You may already be aware, but SE marks all commits making those kinds of changes as '[Editorial]', so it is generally trivial to use their tooling to build your own high-quality ebook without any of the editorial changes.

by jcurtis

5/16/2026 at 10:42:56 PM

When I tried this in the past, it was non-trivial because the editorial changes are mixed with the technical changes. Reverting the editorial changes broke the technical changes.

by mrob

5/16/2026 at 9:15:49 AM

SE sounds truly, truly awful. Thanks for making me aware of its existence so I can avoid it.

by AdamN

5/16/2026 at 3:25:47 PM

They're providing beautifully made ebooks for free...

The only thing they are is truly, truly wonderful.

by phaedrix

5/17/2026 at 5:13:55 AM

SE is an amazing and wonderful resource

by condwanaland

5/16/2026 at 7:25:52 AM

It splits the community and the number of possible volunteer hours, for one. It also splits the canon into different versions. More projects fight for the attention (and possibly donations) of the audience.

There are lots of reasons it could be preferable to centralize. OTOH their mission is limited and some competition is healthy, if only to explore alternative ways to do things.

by a2800276

5/16/2026 at 9:02:06 AM

It’s a different mission.

PG focuses on an accurate digital translation of the source material, sometimes hosting multiple different versions of the same text, and doing things like putting work into recreating the adverts at the back of some novels.

SE focuses less on preservation and more on making readers’ versions of the texts, like other publishing imprints. So there’s typography standardisation, a light-touch modernisation of hyphenation and soundalike spelling, and things like author-wide collections of short fiction and poetry even if they didn’t previously exist.

Both are valuable, but they serve different segments.

by robin_reala

5/15/2026 at 8:45:54 PM

I believe our new-ish CEO Eric Hellman actually did some work on something very similar

by JSeiko

5/15/2026 at 8:24:29 PM

That's an interesting idea. Not a small feat to accomplish though ...

by JSeiko

5/15/2026 at 5:59:28 PM

When I thought about Project Gutenberg I remembered that original brutalist non-design. The current site has been very tastefully updated but looks like it's still very accessible if you turn styles off. Great job!

by jefurii

5/15/2026 at 6:08:09 PM

sadly HN doesn't have a "heart" emoji I could use :D

by JSeiko

5/15/2026 at 11:41:32 PM

I like the design but liked the previous design as well, it was unique and Craigslistish, you knew what website you were visiting just by looking at it.

by ricardonunez

5/15/2026 at 6:16:12 PM

♡

by Wistar

5/15/2026 at 9:27:50 PM

<3

Less than three is a classic!

by ok_dad

5/16/2026 at 8:26:29 AM

Ess two is less than less than three, but also a classic.

s2 < <3

by agys

5/16/2026 at 8:24:32 PM

>When I thought about Project Gutenberg I remembered that original brutalist non-design.

I suppose a printed book, black ink on paper, is "brutalist" and unpleasant to look at?

The text of a book shouldn't be encrusted with format, your reader or browser should contain the presentation that you want to see, find appealing, or need (accessibility).

by fsckboy

5/16/2026 at 1:37:20 PM

The biggest lever: make the reading experience great. https://www.gutenberg.org/cache/epub/245/pg245-images.html is still hard to read: lines are too long (on a MacBook), and there is no great way to paginate, remember where I was, or take notes

by eulerpoolapi

5/16/2026 at 3:31:20 PM

The ebook editions are very good for this. Most of the e-reader software provides all the amenities (bookmarks, highlighting, notes, control of margins, etc).

by tangledhelix

5/16/2026 at 2:02:16 PM

Firefox's reader mode works amazingly for these situations.

by SwampertX

5/16/2026 at 3:42:36 PM

A while back I attempted to extract the FF reader code to make it a front end to various non-web clients (email with pine key bindings etc)

I got it to a prototype level but then shelved it after having difficulty getting good results with various test datasets. Probably would make a fantastic ereader though

by drzaiusx11

5/16/2026 at 4:48:19 PM

Lines aren't too long. They look great on all my devices.

Use ⌘ + + until you get the line length you like.

by elch

5/15/2026 at 8:00:04 PM

Huh that's interesting: 4.5 seconds for the TCP handshake and an additional 9.2 seconds for the TLS handshake. Is this some kind of captcha, since most bots would disconnect before that, so if you complete it once then it knows you're good? (Until the bots catch on of course, but so long as it works it's relatively unintrusive and not discriminatory against uncommon client software (that is, non-Chrome/ium).) The rest of the requests were lightning fast

Edit: welcome to your first comment after 9 years on HN btw, nice to have you here!

by lucb1e

5/15/2026 at 8:10:13 PM

I think their site is just slow, potentially because more people than they are used to are trying to view it.

I was unable to load it initially (got an error from firefox) and had to re-attempt. Still slow if one forces a reload (shift-r, etc, to not use local cache).

by codys

5/15/2026 at 8:23:26 PM

we are having occasional lows in page speed performance due to LARGE amounts of bot traffic. full disclosure - we've not really been able to resolve this fully/well. Let us know if you have a good idea for how to deal with it

by JSeiko

5/16/2026 at 3:30:27 PM

How do you currently host everything? Your main web server should not be responsible for hosting content. All books should be hosted on mirrors, and clicking download should automatically select a mirror to download it from.

Furthermore:

* Make sure that all books are downloadable in bulk as torrents.

* Every day, generate a CSV file of all available books and their metadata. Distribute this so that bots and user clients can run queries locally, instead of using your search engine.
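To make the second suggestion concrete, here is a minimal sketch of what "running queries locally" could look like on the client side. The column names (`id`, `title`, `author`, `language`) and the sample rows are assumptions for illustration only; a real catalog export may use a different layout.

```python
import csv
import io

# Hypothetical catalog snapshot. The column names and rows are assumptions
# made for this sketch, not the layout of any real export.
SAMPLE_CSV = """id,title,author,language
1342,Pride and Prejudice,"Austen, Jane",en
2000,Don Quijote,"Cervantes Saavedra, Miguel de",es
84,Frankenstein,"Shelley, Mary Wollstonecraft",en
"""


def search_catalog(csv_text, language=None, title_contains=None):
    """Return the catalog rows matching the filters, without contacting any server."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if language and row["language"] != language:
            continue
        if title_contains and title_contains.lower() not in row["title"].lower():
            continue
        matches.append(row)
    return matches


english = search_catalog(SAMPLE_CSV, language="en")
print([row["title"] for row in english])
```

The point of the design is that every search runs against the downloaded file, so the site only serves one static file per client per day instead of one dynamic search request per query.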

by uyzstvqs

5/15/2026 at 9:14:05 PM

Do you host a torrent?

I have about 50k of the books, I would have used a torrent of just the txt files if it was prominent.

by gropo

5/16/2026 at 6:56:32 PM

we have a tarball of all text files - link posted somewhere here

by gluejar

5/16/2026 at 12:20:10 AM

If it's purely bot traffic, then Anubis could help

You could have seen it on some websites already

https://anubis.techaro.lol/

by dimava

5/16/2026 at 3:42:11 AM

anubis only works against lazy scrapers, and at a cost to your users. I'd prefer people not use it.

Bot traffic comes from machines that usually have a lot of idle cpu (since they're largely blocked on network IO as they scrape a bunch of sites in parallel), so they can trivially solve the anubis "proof of work" challenge, save the cookie, and then not solve it again for that site.

The only reason scrapers don't solve it is if the developers were too lazy to implement it... and modern scrapers also do, codeberg stopped using anubis because modern scrapers were updated to solve it.

The "proof of work" has to be easy or else people on old cell phones couldn't access your site (since an old android phone would start to overheat and throttle trying to solve a challenge that would take a modern server even several seconds), and it also consumes your cell-phone user's batteries, which is a really precious resource for them compared to the idle cpu on a server.
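For readers unfamiliar with the mechanism, the trade-off can be seen in a generic hashcash-style sketch. This is not Anubis's actual code; the challenge string, the `difficulty_bits` parameter, and both function names are assumptions made for illustration.

```python
import hashlib
from itertools import count


def _hash_value(challenge: str, nonce: int) -> int:
    """SHA-256 of 'challenge:nonce', interpreted as a big integer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")


def solve(challenge: str, difficulty_bits: int) -> int:
    """Brute-force a nonce whose hash has at least `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if _hash_value(challenge, nonce) < target:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution costs one hash, however hard it was to find."""
    return _hash_value(challenge, nonce) < (1 << (256 - difficulty_bits))


# 12 bits means roughly 4096 hash attempts on average.
nonce = solve("example-challenge", 12)
print(verify("example-challenge", nonce, 12))
```

Each extra difficulty bit doubles the expected number of hashes for everyone who solves the challenge, which is why the difficulty cannot be raised far without hurting slow phones, while a server with idle CPU barely notices it.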

by TheDong

5/16/2026 at 7:53:10 PM

Just to add to the two negative replies, I find Anubis to be the only system that doesn't ever get in the way. My browsers have Javascript enabled and, so far, it never took more than a fraction of a second to complete the checks

Every other system I've run into has constant false positives, e.g. Google captchas will sometimes say I've failed and make me do the hardest level (if it wasn't giving me that already), Cloudflare regularly thinks I'm a bot, Codeberg blocked me before, Github signup captchas used to take ~15 minutes to complete and then still said "well you failed, try again", Github's general rate limiting has false positives (some days I browse a lot, other days little, and on the little days it'll sometimes go "slow down" with no recourse whatsoever, you're just blocked for an indeterminate amount of time), OpenStreetMap blocks my browser at work because I'm using Firefox ESR instead of latest stable and it finds that user agent string to be implausible, whatever the German railway operator has been using for the past few days is triggering on me constantly, etc., etc., etc. Constant blocks everywhere.

With Anubis, my understanding is that you do the proof of work (with whatever implementation you like, it doesn't have to be the Javascript one that they provide) and you can move on without ever doing any task yourself. The power consumption is a shame, but so long as attackers aren't even doing this much, the couple Joules it takes doesn't seem to be an issue

Of course, the attackers will evolve, but for now...

by lucb1e

5/16/2026 at 7:12:58 AM

Please no. I'm a non-bot who gets stopped and turned away all the time by that menace. Anubis doesn't work without JS.

One of the things I give duckduckgo a lot of credit for is that while they're quick to interrupt me for a bot check (sometimes multiple times in a span of minutes) they'll let me identify ducks even on the most locked down browsers I use.

by autoexec

5/15/2026 at 8:40:05 PM

I'm only a small-scale sysadmin but the way that I understand the internet is that you send abuse notifications to the IP address block owner and, if it doesn't get resolved, you block. The whois/rdap database reveals which IPs all belong to the same hosting provider or ISP, so you can summarize that all to one list of IP addrs + timestamps per some time period
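A rough sketch of the "one list of IP addrs + timestamps per owner" step, using only the standard library. The prefix-to-contact table is hypothetical; in practice it would be filled from whois/RDAP lookups, which are not shown here. The addresses come from the documentation-only ranges.

```python
import ipaddress
from collections import defaultdict

# Hypothetical prefix-to-abuse-contact table; a real one would be built
# from whois/RDAP results rather than hard-coded.
NETWORK_OWNERS = {
    ipaddress.ip_network("192.0.2.0/24"): "abuse@isp-a.example",
    ipaddress.ip_network("198.51.100.0/24"): "abuse@hosting-b.example",
}


def group_by_owner(events):
    """Map each abuse contact to the (ip, timestamp) pairs seen from its ranges."""
    report = defaultdict(list)
    for ip_text, timestamp in events:
        ip = ipaddress.ip_address(ip_text)
        for network, contact in NETWORK_OWNERS.items():
            if ip in network:
                report[contact].append((ip_text, timestamp))
                break
    return dict(report)


events = [
    ("192.0.2.10", "2026-05-15T20:00:04Z"),
    ("198.51.100.7", "2026-05-15T20:00:09Z"),
    ("192.0.2.99", "2026-05-15T20:01:30Z"),
    ("203.0.113.5", "2026-05-15T20:02:00Z"),  # no known owner, so it is left out
]

report = group_by_owner(events)
for contact, hits in report.items():
    print(contact, len(hits))
```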

The ISP actually knows which subscriber is on that line, can send them notices, block them, terminate them... loads of things that you simply cannot do because you have no relation to this person. And frankly I wouldn't want to need to have a personal relation with every website that I visit; my ISP can reach me if there is anything relevant to continued use of the internet. From personal experience, when I was a teenager, the ISP cutting our household off after an abuse report was an effective way of stopping what I was doing

by lucb1e

5/15/2026 at 10:22:56 PM

It’s effective against teenagers maybe. Not so much against Amazon, Meta or wherever botnet/crawler is coming out of China these days from up-and-coming AI companies.

by Jolter

5/16/2026 at 7:39:50 PM

Then block all of Amazon, Meta, or wherever botnet/crawling traffic is coming from that doesn't honor robots.txt, sends DDoS reflection traffic, submits SMTP messages (in large volumes, not just probing) for domains they're not authorized for with SPF, or whatever else applies to the protocol you're using

If they can't keep their ranges clean to a reasonable degree, their customers will need to move if they want to access your part of the internet. New sign-ups will always be hard, so some amount of abuse is expected, but if it's the same abuse traffic for weeks after you've notified them, well, it stops being your problem at some point

by lucb1e

5/16/2026 at 7:41:57 PM

See the other comments in this thread. The perpetrators are unknown and are jumping between residential IPs. Possibly botnets?

by Jolter

5/16/2026 at 7:43:40 PM

Then see my other replies in the thread where I've specifically addressed residential IPs, e.g.: https://news.ycombinator.com/item?id=48163060

by lucb1e

5/17/2026 at 10:31:01 AM

This is the post I’m talking about. Make sure you understand how it would not be productive to go after each ISP individually when the traffic is from all of them.

https://news.ycombinator.com/item?id=48155512

by Jolter

5/15/2026 at 11:38:59 PM

I mean you could block entire AS numbers that relate to amazon or big tech datacenters

by tonetegeatinst

5/16/2026 at 12:07:02 AM

wouldn't help, much of the traffic we've observed looks closer to ddos patterns - IPs from all over the world, many different networks, each IP makes one request only, doesn't come back. highly distributed, no form of blocking would be effective except maybe captcha or proof of work.

by tangledhelix

5/16/2026 at 12:00:54 PM

The problem with this approach is that modern scrapers use hordes of residential proxies and quickly rotate through IP addresses which belong to ASes you get a lot of real traffic from. There's nothing you can do if the ISP won't take any action against the customer.

by miki123211

5/16/2026 at 3:37:08 PM

Worse than that - even if they would take action, you can't possibly orchestrate filing all of the complaints. It's a drown-in-quicksand problem, you can't fight quicksand one grain at a time.

by tangledhelix

5/16/2026 at 7:35:08 PM

> you can't possibly orchestrate filing all of the complaints

To the ISPs? Each IP range has an abuse email address registered and this is specifically exempt from rate limiting at RIPE's WHOIS server. Not sure how it is in other RIRs but I just happen to know of this policy

You can automate the whole thing, provided that you have a reliable way of identifying the undesired traffic which you need anyway for being able to block it by any means. The trouble is in user identification (they'll just use a new IP address from that ISP or hosting provider if you don't tell the provider about the problematic user)

by lucb1e

5/16/2026 at 7:50:14 PM

See what I wrote above (and let me say I am talking about Project Gutenberg and Distributed Proofreaders here, I am one of the admins on both). A large amount of the hassle traffic we've seen is as I wrote above, the IPs come from everywhere and in many cases, each IP makes a single request and doesn't come back. They change user-agent dynamically, etc, to masquerade as regular traffic. They come from residential, cloud/hyperscale, corporate, educational, government, all the networks, on every continent. This is many thousands of "open a ticket with someone" events per hour territory. It's as difficult to fight as DDoS itself for the same reasons (presumably the harvesting parties know that and that's exactly why this approach is used).

Others online have been writing about their own experience with the same stuff; it's not unique to PG at all, it's everywhere. Talk to anyone that runs a web server and they'll have these stories...

by tangledhelix

5/16/2026 at 8:06:39 PM

I'm aware, I also host various websites that see an IP do a single request to the most unlikely of deep pages. Usually not hard to correlate with similar surprising requests from the same ISP, though, and that's exactly why it would be useful to talk to them: they know who used that IP address at the given timestamp. If they get a hundred complaints from different websites, the ISP is in the unique position to correlate that and find the subscriber(s) that are problematic

You also don't have to send out 1k support requests per hour. Could trial it with some hosting provider that you expect is responsive and see how it works out

edit: like, I just don't see another solution short of banning being anonymous online. Each site would have to know who you are. Someone has to be able to track it back to a person that is doing the abuse or there can't be any rules that we can apply. Imo it's better if that's the ISP (or VPN provider, say) who already has this information anyway

by lucb1e

5/16/2026 at 7:34:17 PM

I know. All the more reason to do it, right? If an ISP can't keep its network clean, then allowing them to send traffic onto the web is just asking for the problem to continue

Show people a useful error, such as "You are using [ISP name] which sends large volumes of abusive traffic (think of spam and DDoS). They allow the attackers to hop around points across their entire network so we cannot block the abusers more selectively. Despite our attempts to contact them, the abuse continues in volumes which we do not see from other ISPs. To access our corner of the internet, use a different ISP. You could try mobile data instead of Wi-Fi or vice versa.", and they can make their own choices about staying with this ISP if more and more websites show this sort of error

If everyone tries to identify people piecemeal, we all need to implement ~200 different identification systems (assuming each country has a central system that everyone is signed up to in the first place), or rely on algorithms to tell who is a bot (I'm currently being misidentified on a daily basis and I'm, eh, not a bot. Trying to buy public transport tickets is currently difficult, for example, because the monopolist in my country blocks me after a few route queries when using a Google browser, and 0 queries from Firefox)

by lucb1e

5/15/2026 at 8:32:27 PM

CF cache?

by TurdF3rguson

5/16/2026 at 10:05:11 AM

I would love it if you could detect AI scraper bots, and feed them AI generated bs instead of the real books...

by jimnotgym

5/16/2026 at 7:51:13 PM

Cloudflare sells that as a product, they call it Labyrinth IIRC.

by tangledhelix

5/16/2026 at 11:59:10 AM

This is very, very, very dangerous.

Occasionally, you misclassify a real user as a bot, and then your reputation is ruined forever.

The official Polish train schedules website did this recently, feeding incorrect departure and arrival times to IP addresses known for aggressive scraping, without taking CGNAT into account. People... have noticed[1].

[1] (Polish) https://zaufanatrzeciastrona.pl/post/kto-i-dlaczego-losuje-w...

by miki123211

5/16/2026 at 6:54:12 PM

traffic yesterday ~20% more than recent average. 4971601 sessions 177 robots 863462 robot files 3390115 user files 20.30% robot files (robots id'd based on requests/ip address) 5 apache servers for static content, 1 CherryPy server for dynamic content hosted at iBiblio.
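For anyone checking the numbers: the quoted percentage is reproduced if it is read as robot files divided by robot files plus user files. That reading is an assumption about how the figure was derived; the two counts are taken from the comment.

```python
# Figures quoted in the comment above.
robot_files = 863_462
user_files = 3_390_115

# Assumption: the share is robot files over all files served.
total_files = robot_files + user_files
robot_share = 100 * robot_files / total_files

print(f"{total_files} files, {robot_share:.2f}% served to robots")
```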

by gluejar

5/15/2026 at 10:15:30 PM

As long as you're taking suggestions, since many of the books are quite old, adding a publication date or date range to the search functionality might be nice. I personally would find it very useful since I have a tendency to look for things that are older than year _x_ when researching various things.

Thanks for all the effort put into the site!

by 0x0203

5/16/2026 at 6:59:30 PM

only 20% of our books have original publication data in the db. We have a project to add another 40% or so from another database, let us know if you want to help.

by gluejar

5/17/2026 at 1:58:47 PM

I have the same problem on catholiclibrary.org, but insist on having something as the book date for every work. My solution is to temporarily default to the author dates until the book date can be refined. If there is no known author date I at least have a date range, hopefully to century or better.

Author dates are a much smaller data set, can be generally supplemented from public marc records (viaf, loc, etc - I don't do that, but it's an option) and at least provide basic filtering / sorting.
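The fallback order is roughly the following, as a simplified sketch. The `Work` fields, the `date_range` helper, and the example values are made up for illustration and are not the site's actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Work:
    # Hypothetical record layout for this sketch.
    title: str
    book_year: Optional[int] = None
    author_birth: Optional[int] = None
    author_death: Optional[int] = None


def date_range(work: Work, fallback: Tuple[int, int]) -> Tuple[int, int]:
    """Best available (start, end) years: book date, then author dates, then a coarse range."""
    if work.book_year is not None:
        return (work.book_year, work.book_year)
    start = work.author_birth if work.author_birth is not None else fallback[0]
    end = work.author_death if work.author_death is not None else fallback[1]
    return (start, end)


works = [
    Work("Known book date", book_year=1931),
    Work("Author dates only", author_birth=354, author_death=430),
    Work("Century only"),
]
# Sort by the start of whatever range is available.
ordered = sorted(works, key=lambda w: date_range(w, (1200, 1299))[0])
print([w.title for w in ordered])
```

Because every work ends up with some range, filtering and sorting by date always work, and the range simply tightens as better data arrives.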

by sgc

5/16/2026 at 1:36:11 AM

Hi, for the past 20 years I have known about Project Gutenberg and I used to read a lot from it. One of the obstacles that I face is that there is no way to arrange the books in the order of their original publication. Do you know of any such way? Surely we can arrange the books by their release date on Gutenberg, but it has long baffled me as it feels to me the most useless way of sorting the books. Thank you for Project Gutenberg.

by Guestmodinfo

5/16/2026 at 7:00:30 PM

only 20% of our books have original publication data in the db. We have a project to add another 40% or so from another database, let us know if you want to help.

by gluejar

5/16/2026 at 7:33:42 PM

Yes I am willing to help. Plz include me in your efforts. Thank you for this

by Guestmodinfo

5/15/2026 at 4:27:11 PM

The book list elements on front page render as both horizontally and vertically scrollable divs on mobile - seems like an opportunity for improvement.

Keep up the good work!

by Falimonda

5/15/2026 at 4:33:04 PM

good feedback thanks! Doing an iteration on the homepage design is actually pretty high on the priority list. will keep your feedback in mind!

by JSeiko

5/15/2026 at 10:05:53 PM

Any interest in offering PG as a multi-lingual web e-reader in any language?

I've since discontinued hosting it, but happy to add you all and merge into an official PG offering: https://www.reddit.com/r/SideProject/s/VtYKxjrMme

by Falimonda

5/15/2026 at 10:07:10 PM

More content visible on various videos I took and posted to X

https://x.com/abal_ai

by Falimonda

5/15/2026 at 4:49:17 PM

Thank you for your work. This site is an international treasure.

by xrd

5/16/2026 at 2:54:59 PM

FWIW I absolutely love how 'no-frills' PG is compared to so much of the bloated, over-engineered, script-riddled web these days. Please don't ever change that!

by windowliker

5/15/2026 at 4:59:16 PM

Thank you for being one of the best places on the internet

by excitednumber

5/15/2026 at 7:30:54 PM

Thanks for the free work! Project Gutenberg is nice to have :).

On the site I noticed the library boxes have roughly a single extra line causing a scrollbar to appear and the last line to be chopped off https://i.imgur.com/PQ8T0qc.png is there an issues/bug portal to properly submit these kinds of things?

by zamadatix

5/15/2026 at 8:27:33 PM

you can open an Issue at https://github.com/gutenbergtools/gutenbergsite

by JSeiko

5/15/2026 at 5:04:32 PM

There's a minor bug with chrome in android where the menu will not close when you tap outside the menu or on the menu link/button

by smallnix

5/15/2026 at 5:29:57 PM

I've messaged the guy who's best suited to fixing this. He'll be on it this weekend

by JSeiko

5/16/2026 at 1:52:28 PM

Oh no. I did not want to cause someone to work on the weekend. I hope it's his hobby!

by smallnix

5/15/2026 at 5:11:39 PM

will open an "Issue" for it

by JSeiko

5/15/2026 at 4:56:32 PM

Oh, my! This does look nice. Thank you for your hard work!

by ExtremisAndy

5/15/2026 at 5:00:27 PM

Thanks! We're currently working on a design update of the page of any specific book. Should be online soon (next 1-2 weeks or so)

by JSeiko

5/15/2026 at 6:44:24 PM

I can't say for Project Gutenberg specifically, but in general a huge issue I see is OCR errors. What do you all do to address OCR?

by freedomben

5/15/2026 at 7:11:20 PM

Check out Distributed Proofreaders: https://pgdp.net

by gluejar

5/15/2026 at 9:27:50 PM

I didn't realize DP was still around. I used to do it quite a bit, 15 years ago, but OCR has improved considerably since then.

by jfengel

5/16/2026 at 3:45:00 PM

OCR has improved a lot since then, but OCR is just step 1 of reading in text. They make a lot of errors (even now, especially on old worn out paper pages) and even if they didn't, one has to format the book, deal with footnotes, sidenotes, illustrations, etc. DP is very active, we will welcome you back with open arms :)

by tangledhelix

5/15/2026 at 6:47:41 PM

I uploaded a PDF to archive.org that auto-OCRs with plenty of mistakes. I have found no way of updating the entire stack of documents produced. I wonder if Project Gutenberg is similar

by lapetitejort

5/15/2026 at 5:03:03 PM

Great Work. Thank you. I'm also a programmer. If you are ever short on help, let me know. I would love to contribute.

by shuvrojit

5/15/2026 at 5:39:24 PM

https://github.com/gutenbergtools

autocat3 and gutenbergsite are repos responsible for generating gutenberg.org

by JSeiko

5/16/2026 at 4:12:04 AM

Great project. Are many of the books in a format that can easily be converted into audio? Is there a way to search for them, and information on what software your readers find useful for this purpose?

(Note: A lot of print media these days has switched to far-too-small font sizes. Less of a problem for (zoomable) digital media, but for many that's still a barrier.)

by 8bitsrule

5/16/2026 at 3:43:28 PM

There are many books available as audio, some are human-read, some were automated. You can see lists here:

human-read: https://www.gutenberg.org/browse/categories/1

computer-generated: https://www.gutenberg.org/browse/categories/2

IIRC many of the human-generated ones come from LibriVox, many of the computer-generated ones came from a collaboration with Microsoft.

by tangledhelix

5/16/2026 at 1:57:55 PM

For the Audio part, I suggest https://desktop.with.audio

by OfflineSergio

5/17/2026 at 2:08:07 AM

IMO, most audio read by humans (esp. voice actors) are far preferable to machine readings. Also, I found no demos on that page.

by 8bitsrule

5/15/2026 at 5:55:01 PM

Wanna let you know you’re doing great work and you have my dream job, thanks to the team for everything!

by TimorousBestie

5/15/2026 at 6:09:14 PM

it's not my day job. PG is open-source. I'm "just" a contributor

by JSeiko

5/15/2026 at 6:16:01 PM

Oh, right. That makes sense.

by TimorousBestie

5/15/2026 at 5:23:50 PM

Thanks so much for the work you and your team do!

by BiraIgnacio

5/16/2026 at 10:48:56 PM

I don't know what the status of this is today, but a number of years ago my biggest complaint about Gutenberg is that a lot of books had images added back when low resolution images were the standard, so you have a ton of books with image resolutions from the year 2000.

by Jiro

5/16/2026 at 9:39:57 AM

Looking really good! Great work.

by samwho

5/16/2026 at 11:34:10 AM

There should be more books at Gutenberg.

Also by the way I just searched for 3d printing and found nothing. Either there are no books, or the search query makes things too complicated, IMO.

by shevy-java

5/16/2026 at 6:34:36 PM

Gutenberg is nearly all books that have lapsed into the US public domain by dint of being published 95+ years in the past. Which broadly explains why you hit nothing for 3d printing.

by robin_reala

5/16/2026 at 7:56:50 PM

As another commenter said PG is almost all books from 95+ years in the past due to copyright law in the US. We partner with a sister organization, the World Library Foundation, who have a self-publishing portal for modern works by authors who wish to put their own work in the public domain. You might want to look there for more modern material. https://self.gutenberg.org

by tangledhelix

5/15/2026 at 4:30:46 PM

Very cool! Do you have a recommended way for an agent to see an index of the books and epub links?

(I can’t quite tell if that’s an egregious abuse of the site or you’re perfectly fine to share without human eye balls hitting your www?)

by samcollins

5/15/2026 at 4:40:01 PM

Now, I'm not associated with Gutenberg in any form, but they do have a page for offline consumption:

https://www.gutenberg.org/ebooks/offline_catalogs.html

Perhaps you can find the information you are looking for there.

However, if you plan on scraping or otherwise hitting them with a ton of traffic, at least consider donating a good amount for the traffic you cause them. It ain't free after all.

by jzs

5/15/2026 at 4:42:10 PM

Donations are always appreciated ;)

by JSeiko

5/16/2026 at 10:11:05 AM

Presumably if you paid them enough money they would give you the books without you having to pay to scrape at all?

by jimnotgym

5/15/2026 at 5:10:09 PM

Thanks for the answers! Found it:

> All Project Gutenberg metadata are available digitally in the XML/RDF format. This is updated daily (other than the legacy format mentioned below). Please use one of these files as input to a database or other tools you may be developing, instead of crawling or roboting the website.

And strongly consider a donation! (My addition)

https://www.gutenberg.org/ebooks/offline_catalogs.html#the-p...
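As a sketch of "use one of these files as input to a database", the snippet below reads titles out of a small RDF/XML fragment with the standard library. The fragment is a simplified, assumed shape written for this example; check the element names and namespaces against the real downloaded files before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for this sketch; verify them against the real catalog files.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Simplified, illustrative record; real records carry many more fields.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/1342">
    <dcterms:title>Pride and Prejudice</dcterms:title>
  </pgterms:ebook>
</rdf:RDF>"""


def read_titles(rdf_text):
    """Return a mapping of ebook identifier to title from one RDF/XML document."""
    root = ET.fromstring(rdf_text)
    titles = {}
    for ebook in root.findall("pgterms:ebook", NS):
        about = ebook.get(f"{{{NS['rdf']}}}about")
        titles[about] = ebook.findtext("dcterms:title", default="", namespaces=NS)
    return titles


print(read_titles(SAMPLE_RDF))
```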

by samcollins

5/15/2026 at 4:34:46 PM

Check out https://www.gutenberg.org/ebooks/offline_catalogs.html

Don't hit the site with an agent. The section at the very bottom is machine readable.

by kay_o

5/15/2026 at 5:57:53 PM

if what you want is all the text, please use the tarball or data files at https://www.gutenberg.org/cache/epub/feeds

by gluejar

5/15/2026 at 4:35:11 PM

not yet, but that's not a bad idea imo. Dealing with AI crawler traffic is definitely a challenge if that's what you were referring to.

by JSeiko

5/15/2026 at 11:40:12 PM

Possibly ZIMs are of interest: <https://ebookfoundation.org/openzim.html> (via: <https://news.ycombinator.com/item?id=48152200>).

by dredmorbius

5/15/2026 at 4:39:04 PM

OPDS?

by ancientcatz

5/15/2026 at 5:33:32 PM

OPDS 2.0 coming RSN. email us if you want to test. OPDS 0.x is currently available (not recommended) by adding .opds to the end of a url

by gluejar

5/15/2026 at 4:34:44 PM

[flagged]

by e0d075b569cd

5/15/2026 at 4:41:30 PM

While PG has probably gotten a lot of use and growth with the growth/mainstreaming of the Internet since the 1990s, (TIL) it started back in 1971:

> Michael S. Hart began Project Gutenberg in 1971 with the digitization of the United States Declaration of Independence.[5] Hart, a student at the University of Illinois, obtained access to a Xerox Sigma V mainframe computer in the university's Materials Research Lab. […] This computer was one of the 15 nodes on ARPANET, the computer network that would become the Internet. Hart believed one day the general public would be able to access computers and decided to make works of literature available in electronic form for free. […]

* https://en.wikipedia.org/wiki/Project_Gutenberg

by throw0101c

5/15/2026 at 7:20:05 PM

"Project Gutenberg began in 1971 when Michael Hart was given an operator’s account with $100,000,000 of computer time in it by the operators of the Xerox Sigma V mainframe at the Materials Research Lab at the University of Illinois."

https://www.gutenberg.org/about/background/history_and_philo...

by aksss

5/15/2026 at 5:58:44 PM

wikipedians, please help update this article.

by gluejar

5/15/2026 at 7:48:53 PM

In what way? And from what sources? (Wikipedia as a tertiary source is supposed to be a summary of information present in reliable secondary sources — see for instance https://en.wikipedia.org/wiki/Wikipedia:Based_upon. So if the information on the Wikipedia article is incomplete or out of date, where is the correct information available?)

by svat

5/16/2026 at 7:10:04 PM

There's quite a lot of information here: https://www.gutenberg.org/about/ All our text is now utf-8. No Plucker! Almost every book is HTML(5).

by gluejar

5/15/2026 at 9:48:10 PM

good question. Eric - any pointers?

by JSeiko

5/15/2026 at 6:06:39 PM

Prescient

by mcdonje

5/15/2026 at 9:50:09 PM

The best thing I ever did for my father was to buy him a kindle and an access point and show him how to use Project Gutenberg to get books. He loved the old writings (he being a GED holder who was in the Navy during Korea yet had read the entire Harvard Classics). He had a special rolled up towel he used to prop it on his lap in his favorite chair and he read and read and read. When he passed he was reading "Legends of the Jews" from 1931.

I had some small e-correspondence with Michael S. Hart back in the 90's as well, and made a few modest contributions to the project, which made my English major undergraduate heart swell with pride and joy.

I guess this is only to say that PG is special to me for these reasons, and I am glad to see it still thriving. <3

by drummojg

5/15/2026 at 10:01:47 PM

this is so great to hear! Distributed Proofreaders (the org that actually does transcriptions) is still looking for volunteers should you feel the urge/inclination :) https://www.pgdp.net

by JSeiko

5/16/2026 at 3:48:45 AM

This was very touching, thanks for sharing. Sorry for your loss.

by j_bum

5/15/2026 at 4:49:10 PM

I'm surprised no eBook Reader vendor has a Project Gutenberg "Store." Where you can just browse Gutenberg, find a book, and just grab it down to the reader. Instead, they either are actively hostile (Kindle), or require the use of Calibre (which itself is good, it is just the friction).

by Someone1234

5/15/2026 at 4:58:28 PM

I've used https://standardebooks.org/ to pull nicely formatted Project Gutenberg books on any e-reader that supports a browser (in my case, Boox).

Technically, I can also just directly pull the epub from Project Gutenberg, but sometimes the formatting leaves a lot to be desired.

Once you get an e-reader that runs a semi-capable OS (ex - stock android, even an older version), it's hard to go back to something like a kindle.

by horsawlarway

5/15/2026 at 7:01:43 PM

To be precise, the vast majority of SE is from Gutenberg, but we also source from Faded Page, Gutenberg Australia, Wikisource and occasionally do our own transcriptions.

by robin_reala

5/15/2026 at 5:57:19 PM

HTML editions from the two sites contrast interestingly:

https://www.gutenberg.org/cache/epub/1513/pg1513-images.html

https://standardebooks.org/ebooks/william-shakespeare/romeo-...

Each has its particular advantages relative to the other ...

by everybodyknows

5/15/2026 at 6:14:12 PM

Curious, what are the advantages you see in each relative to the other?

Also one should probably compare the former to the single-page version on standardebooks: https://standardebooks.org/ebooks/william-shakespeare/romeo-...

by svat

5/16/2026 at 6:27:15 AM

Personally I find the formatting used by the Gutenberg one to be a lot nicer/easier to read, despite (or perhaps because of) being simpler, more plain.

At least for the first few pages of content that I looked at on both versions.

by swores

5/15/2026 at 5:04:16 PM

standardebooks.org is great!

by JSeiko

5/15/2026 at 8:43:09 PM

If you don’t strip the Project Gutenberg license from the book text (leaving only the book text, which no-one disputes is public domain and freely distributable), you are required to “pay a royalty fee of 20% of the gross profits you derive from the use of Project Gutenberg-tm works calculated using the method you already use to calculate your applicable taxes”.

https://www.gutenberg.org/policy/license.html

[Way back in the early days of the iPhone, I sold a book reading app which was backed directly by Project Gutenberg texts, called “Eucalyptus”. I sent 20% of the gross profits to PG - which was never less than very supportive of the app - and felt good about doing so.]

by jrmg

5/15/2026 at 4:54:17 PM

Most of them offer their own paid storefronts and have a perverse incentive not to offer a large area full of free books.

by GaryBluto

5/15/2026 at 5:03:02 PM

probably true. Maybe a truly open-source eReader should exist.

by JSeiko

5/15/2026 at 6:31:56 PM

Arguably

https://play.google.com/store/apps/details?id=biz.bookdesign...

should ~~be~~ EDIT have been ENDEDIT opensource --- it does at least work to support Project Librivox (or at least that's my understanding)

Seems to no longer be available (see below)

by WillAdams

5/15/2026 at 6:48:30 PM

I'm getting "We're sorry, the requested URL was not found on this server." if I go to the link

by JSeiko

5/15/2026 at 7:20:43 PM

I believe the app is discontinued and the reason I can see the page is that I am on record as having downloaded it.

by WillAdams

5/15/2026 at 10:26:06 PM

They do exist, and have for a pretty long time. I bought a PocketBook in the early 2010s. https://pocketbook.ch.

If you mean EPUB reader software, Calibre and a bunch of others have existed since pretty much the beginning of EPUB.

by dgellow

5/15/2026 at 5:45:55 PM

Used to be one could sort of get that with the Project Librivox:

https://librivox.org/

e-book app Gutebooks (in addition to their audio app), but it seems to have been deprecated (I'm no longer able to connect to the server on my copy, which I only got 'cause there was an in-app purchase to fund Project Librivox).

FWIW, Barnes & Noble has been plundering the public domain using a book composition/keying house in the Philippines to make their public domain books which they make available in their stores --- Amazon apparently has a similar setup for the Kindle Store:

https://www.amazon.com/Public-Domain-Books-Kindle-Store/s?k=...

Rather a shame that PG didn't monetize by putting their books up there pre-emptively.

by WillAdams

5/15/2026 at 6:55:01 PM

>Barnes & Noble has been plundering the public domain using a book composition/keying house in the Philippines to make their public domain books which they make available in their stores

Why is it 'plundering' for B&N to print physical books and transport them to their brick-and-mortar stores to sell? There are real costs associated with doing so. It would not have zero cost for me to print and bind a copy myself at home.

by dessimus

5/15/2026 at 10:45:03 PM

I'm working on an audiobook app that integrates the LibriVox catalog. Only on Windows right now - https://apps.microsoft.com/detail/9n1z76ffb3fc?hl=en-US I'll release Android, iOS, Mac & Linux versions soon.

by gman83

5/16/2026 at 6:33:47 AM

In a similar vein to cobbzilla, I have a couple of family members (and to a lesser extent myself too) who would be keen on such an app for their iOS devices if you ever need some testers for that :)

(iPhones 15 Pro, 11 Pro, SE-2nd; and an iPad of some kind)

by swores

5/16/2026 at 3:24:31 AM

I love librivox! please add me to your ios release list. my HN username at g mail

by cobbzilla

5/16/2026 at 8:26:00 PM

[dead]

by litreads

5/15/2026 at 6:02:13 PM

the way I see it PG is a labor of love. Bit odd if Barnes & Noble or whoever piggyback off it. But in the end - the more people read the books, the better.

by JSeiko

5/15/2026 at 6:30:17 PM

It is a public good, and it would be apropos if corporations would support it directly rather than work at cross-purposes to it.

If Amazon is going to sell public domain texts, then it would make sense to source them from PG and direct some money from those sales to the non-profit. Similarly, they could then funnel reports of typos to PG for review and correction (it was a bit of a struggle the last time I tried to get a text corrected, and the project founder/director actually stepped in on my behalf).

by WillAdams

5/15/2026 at 6:47:19 PM

that would be great! Sadly I'm not very confident that that will actually happen ...

by JSeiko

5/15/2026 at 7:21:41 PM

Needs new legislation where the commons/public domain have public benefit corporations appointed as the manager of said resource.

by WillAdams

5/15/2026 at 4:51:56 PM

I've heard that the newest Kobo e-readers have a browser that you could use to go to gutenberg.org and directly download files.

but yes, generally I agree with your point. Library of 75k books seems pretty valuable to have direct access to.

by JSeiko

5/16/2026 at 6:50:57 AM

On any device you can install KOReader, PG is one of the default options in the builtin OPDS browser.

https://koreader.rocks/

by BHSPitMonkey

5/15/2026 at 5:47:06 PM

You can download books directly from the Project Gutenberg website using the web browser on most eBook readers - even the Kindle supports it.

by daveoc64

5/16/2026 at 1:24:15 PM

Yep! This is how I get all my books on Kindle! For me, I choose the 'older Kindles' option and it downloads directly to my homepage.

by moichael

5/15/2026 at 5:16:00 PM

No money for them.

by cstever

5/15/2026 at 8:19:25 PM

From Italy, https://www.gutenberg.org/ gives a 404 error and https://gutenberg.org/ opens a very official-looking page stating "police notice. This site is under judicial seizure" and references a sentence number: "criminal proceedings 52127/20 R.N.R.I. tribunal of Rome"

Any idea what's happening? I thought PG published public domain books...

by cosmos0072

5/15/2026 at 8:23:41 PM

Found: it's a sentence from 2020, and PG decided not to appeal (!?)

Full story (in Italian) at https://www.wired.it/internet/web/2020/06/30/progetto-gutenb...

by cosmos0072

5/15/2026 at 8:28:38 PM

Seems like a case for HTTP 451 (Unavailable for Legal Reasons) rather than 404.
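
A rough sketch of that distinction, for illustration only. The `BLOCKED_REGIONS` set and `status_for` function are hypothetical names, and it is not clear from this thread who actually serves the 404 (the site, or whatever the blocked lookup lands on), so this only shows how a server could tell "legally blocked" apart from "not found":

```python
from http import HTTPStatus

# Hypothetical set of region codes where a legal order applies.
BLOCKED_REGIONS = {"IT"}


def status_for(region, page_exists):
    """Pick a status code, preferring 451 over 404 for legal blocks."""
    if region in BLOCKED_REGIONS:
        # 451 Unavailable For Legal Reasons is defined in RFC 7725.
        return HTTPStatus.UNAVAILABLE_FOR_LEGAL_REASONS
    return HTTPStatus.OK if page_exists else HTTPStatus.NOT_FOUND


for region in ("IT", "DE"):
    status = status_for(region, page_exists=True)
    print(region, int(status), status.phrase)
```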

by charonn0

5/16/2026 at 1:38:51 PM

HTTP 666 (We're evil) seems more fitting here.

by amelius

5/15/2026 at 8:45:35 PM

It looks like the issue was that, in Italy, copyright expires 70 years after the death of the author or the first translator of a work.

by johndough

5/15/2026 at 9:54:02 PM

PG works based on US copyright law. And as I understand it that's also 70 years after author/translator death. My gut feeling is that if anyone tried hard enough this ban could probably get lifted

by JSeiko

5/16/2026 at 6:38:37 PM

That’s only the case for works published after the mid-70s. For works published before (which is all current PD books in the US), it’s 95 years after the date of publication, with a few exceptions where people failed to file renewal notices.
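
As a back-of-the-envelope illustration of the two rules discussed in this subthread (a simplification, not legal advice): both terms run to the end of the calendar year, so a work becomes free on January 1 of the following year. The function names are made up for this sketch, and it ignores the renewal exceptions mentioned above.

```python
def us_public_domain_year(publication_year):
    """Year a pre-1978 US publication enters the public domain (95-year term)."""
    # The term runs through the end of the 95th year after publication.
    return publication_year + 95 + 1


def life_plus_70_public_domain_year(death_year):
    """Year a work enters the public domain under a life + 70 years rule."""
    # Protection lasts through the end of the 70th year after death.
    return death_year + 70 + 1


print(us_public_domain_year(1930))
print(life_plus_70_public_domain_year(1950))
```

This is why the same book can be free in one country and still protected in another: one rule counts from publication, the other from the death of the author or translator.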

by robin_reala

5/16/2026 at 7:15:58 PM

A silly legal tribunal confused PG with pirate sites. We sent the tribunal a letter pointing out their error but it was ignored. The block was served on local dns providers so many Italian users evade the block by using DNS from Google or Cloudflare.

by gluejar

5/15/2026 at 10:21:50 PM

It was also blocked in Germany for a while due to a court order https://cand.pglaf.org/germany/index.html

by dgellow

5/16/2026 at 12:11:29 AM

The Alfred Döblin books are still blocked in Germany (for a couple more years).

by tangledhelix

5/15/2026 at 8:26:27 PM

I asked Claude to research the background story: "In May 2020, the Court of Rome ordered Italian ISPs to seize/block a list of domains as part of a criminal case (the 52127/20 R.N.R. you're seeing) targeting sites and Telegram channels distributing pirated newspapers and magazines. 28 domains were on the list, and Project Gutenberg got thrown in alongside the actual pirate sites."

apparently this situation hasn't been resolved yet

by JSeiko

5/15/2026 at 5:42:43 PM

Nice to see so much appreciation for what we do. (I'm the new-ish executive director.) Any wikipedians reading this, the article about PG is... aging. Last I looked, it said we offered Plucker files. @Jseiko has done some nice work.

by gluejar

5/16/2026 at 8:10:36 PM

FYI, I took Plucker out of the lead in November, after a PG volunteer recommended that update on the article talk page. Plucker is currently only mentioned in a sentence about formats offered in 2009.

Happy to make other updates! Writing specific notes on the talk page is helpful.

by britta

5/15/2026 at 6:13:49 PM

Looks like the top downloaded book yesterday[0] was Concrete Construction: Methods and Costs by Gillette and Hill.[1] Beat out Moby Dick, Count of Monte Cristo, Frankenstein, Romeo and Juliet, and others.

> 23644 downloads in the last 30 days.

I wonder if this is bot behavior? 23k downloads feels like a lot?

[0] https://www.gutenberg.org/browse/scores/top [1] https://www.gutenberg.org/ebooks/24855

by ssgodderidge

5/15/2026 at 6:42:54 PM

Haha well there is an exciting movie about concrete coming out, “The History of Concrete” by John Wilson. Surely the superfans are studying up

by sovietswag

5/16/2026 at 2:26:01 AM

For context, here is the first paragraph of the book's preface:

How best to perform construction work and what it will cost for materials, labor, plant and general expenses are matters of vital interest to engineers and contractors. This book is a treatise on the methods and cost of concrete construction. No attempt has been made to present the subject of cement testing which is already covered by Mr. W. Purves Taylor's excellent book, nor to discuss the physical properties of cements and concrete, as they are discussed by Falk and by Sabin, nor to consider reinforced concrete design as do Turneaure and Maurer or Buel and Hill, nor to present a general treatise on cements, mortars and concrete construction like that of Reid or of Taylor and Thompson. On the contrary, the authors have handled the subject of concrete construction solely from the viewpoint of the builder of concrete structures. By doing this they have been able to crowd a great amount of detailed information on methods and costs of concrete construction into a volume of moderate size.

by tmoertel

5/16/2026 at 8:28:18 PM

I ... now want to read this book.

by nout

5/16/2026 at 3:36:20 PM

exciting :)

by JSeiko

5/15/2026 at 6:19:39 PM

bot traffic would be my guess too. I doubt there was a sudden global spike in interest in "Concrete Construction Methods" :D

by JSeiko

5/15/2026 at 9:02:20 PM

It's got better reviews on Goodreads than Moby Dick too. I know what I'm reading next

by why_at

5/15/2026 at 7:26:50 PM

Project Gutenberg is a treasure trove, though many technical details defy automatic typesetting of its books. Standard Ebooks takes consistency to an unbelievable level. My post compares various sources of public domain books with an eye on typesetting:

https://dave.autonoma.ca/blog/2020/04/11/project-gutenberg-p...

by thangalin

5/15/2026 at 6:39:19 PM

Worth mentioning the Project Gutenberg ZIMs. You can download the entire English Gutenberg corpus for about 60GB (English Wikipedia ZIM complete with images is ~120GB):

https://ebookfoundation.org/openzim.html

by fmajid

5/16/2026 at 12:49:57 PM

Like the Project Gutenberg collection on archive.org, the ZIMs are only current up to 2018.

by cxr

5/15/2026 at 5:46:38 PM

Gutenberg is awesome. There is also

https://www.fadedpage.com/ from Canada I think

https://runeberg.org/ from Sweden

by kreyenborgi

5/16/2026 at 5:31:35 AM

Don't forget Wikisource! https://en.wikisource.org/wiki/Main_Page

by Arcorann

5/15/2026 at 5:24:30 PM

Their feeds of new books are a goldmine:

https://www.gutenberg.org/ebooks/feeds.html

Every day you'll get much more than you're bargaining for, right into your feed or inbox. Easily download the books you're interested in and put them on your Kindle.

by carlosjobim

5/15/2026 at 5:47:57 PM

I used to use the Online Books Page new books listing similarly:

https://onlinebooks.library.upenn.edu/new.html

by WillAdams

5/15/2026 at 5:53:36 PM

I remember printing out project Gutenberg books in the mid-90s, four regular pages to an A4 page, double-sided on my inkjet. I had a background in typography, so I made it work.

And yes, the text needed a lot of processing to make it right.

Now, in my early fifties and with declining eyesight, that's out of reach.

Thanks for sticking with the project!

by smilespray

5/15/2026 at 6:21:34 PM

that's cool! one of my "pet-ideas" is actually to make an AI-agent that does all that typographical work for any PG book to make it nicely printable without any manual labor whatsoever. Maybe that's doable now ...

by JSeiko

5/15/2026 at 6:42:24 PM

That is doable. Most of my work was regexp and repetitive stuff. And the typography stuff is achievable with the current state of the art models. Not that I remember what I did, it was 30 years ago.
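
To give a flavour of the "regexp and repetitive stuff", here is a small sketch of the kind of cleanup involved. It is not a reconstruction of the original workflow; it assumes hard-wrapped lines, blank lines between paragraphs, and `_underscores_` marking italics, and the function names are invented for the example.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines so each paragraph becomes one string."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return [re.sub(r"\s+", " ", p).strip() for p in paragraphs]


def mark_italics(paragraph):
    """Turn _word_ markers into <em> tags (assumed plain-text convention)."""
    return re.sub(r"_([^_]+)_", r"<em>\1</em>", paragraph)


sample = (
    "It was the _best_ of times,\n"
    "it was the worst of times.\n"
    "\n"
    "Second paragraph\n"
    "here."
)

for paragraph in unwrap_paragraphs(sample):
    print(mark_italics(paragraph))
```

Real books need many more passes than this (quotes, dashes, poetry, tables), which is where the repetitive part comes in.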

by smilespray

5/15/2026 at 6:51:32 PM

Interesting!

by JSeiko

5/15/2026 at 5:25:27 PM

The project was geo-blocked in Germany for a long time: https://news.ycombinator.com/item?id=29024039

by ndr42

5/16/2026 at 8:23:10 PM

One author remains blocked in Germany (but only for a couple more years)...

by tangledhelix

5/15/2026 at 6:10:24 PM

very glad this has been resolved (I'm from Germany myself)

by JSeiko

5/15/2026 at 6:51:16 PM

Project Gesperrtberg

by debo_

5/16/2026 at 8:30:56 PM

Deeply grateful for Project Gutenberg & LibriVox! I've been using the text to force-align LibriVox recordings to produce word-by-word synced audiobooks; first stage of this project is a YouTube channel but I could definitely turn this into a mobile reader app if there's interest: https://www.youtube.com/@LitReadsEditions

by litreads

5/16/2026 at 3:19:18 PM

PG is proof that the best things on the internet are still built by people who just care about the mission.

by aymenfurter

5/16/2026 at 3:22:38 PM

Paul Graham? ;-)

by GeorgeTirebiter

5/15/2026 at 5:03:00 PM

Project Gutenberg had (has?) a tendency toward plaintext that always put me off. (And it has been over a decade I'm sure since I explored the site—so I am no doubt now misinformed.)

I like a styled formatted book—would prefer PDFs. (I know, not a popular format apparently.)

I like the idea of Project Gutenberg but guess I found book scans on archive.org my preference.

My go-to example is Lewis Carroll's "Through the Looking Glass" with the fantastic art of John Tenniel and Carroll's sometimes creative formatting of the prose…

I see they (Project Gutenberg) have ePub now, which can be good if well done.

(If not well done it can be a kind of mess. Re-flowable "HTML", paginated… Anyone ever try to print a long web page and did you enjoy the result? Perhaps that is as much on the ePub reader though.)

by JKCalhoun

5/15/2026 at 5:08:44 PM

We're supporting EPUB3 for the vast majority of books! At the same time we also have a "Plain Text" version for each as in a sense it's the most robust. PDFs are in the works!

by JSeiko

5/15/2026 at 8:19:10 PM

That's cool. I'll have to read up on EPUB3—I'm not familiar with it.

(I worked on iBooks for the Mac like 15 years ago—it's where I got to dive into the ePub format. A lot has changed in the standard since I am sure.)

EDIT: looks like EPUB3 has a "paginated" mode as well as more sophisticated layout tags.

Also appears to have support for ruby and vertical writing modes. This was not yet supported in WebKit when I worked on iBooks. Somehow, this white guy from Kansas (who knows no language other than English) got tapped to implement the vertical TOC for Asian languages. Also tasked with annotating the ePUB pages to display (also vertical) ruby text…

by JKCalhoun

5/15/2026 at 5:06:19 PM

As others here have mentioned, https://standardebooks.org/ is excellent and my understanding is that they use Gutenberg books as a source for theirs but done up much nicer.

by JLO64

5/15/2026 at 5:38:32 PM

You can contribute to Standard Ebooks by finding OCR errors, then pushing your fixes to https://github.com/standardebooks

by everybodyknows

5/15/2026 at 5:20:33 PM

Source can be anything with the original text, but, more often than not, ends up being PG.

by dempedempe

5/15/2026 at 9:02:08 PM

I love, love, looove the fact that I can have a book's html version on project gutenberg bookmarked and continue to read across devices without ever having to login. I use the browser's inbuilt capability extensively to enhance my reading experience (fonts, backgrounds, text to speech, print formatting, share snippets). None of this is a good experience with pdf, epub or any other format.

I've read more (meaningful) text on PG than any other digital platform. Huge fan. Thanks for all the work and for keeping it clean and free

by gofreddygo

5/16/2026 at 3:38:54 PM

Interesting. Do you "just" use the browser's built-in capabilities or also some browser extensions?

by JSeiko

5/16/2026 at 5:03:35 PM

I just use the built-in capabilities these days as everything that I would need is in there. This was not true many years ago when I did use some browser extensions.

by gofreddygo

5/15/2026 at 5:04:59 PM

Check out Standard eBooks. They take the text from Gutenberg and add a level of polish to the ePubs.

by RattlesnakeJake

5/16/2026 at 8:21:01 PM

This is covered in the FAQ - https://www.gutenberg.org/help/faq.html#why-is-project-guten...

And as another person noted, the vast majority of books have HTML, EPUB, Mobi formats. We are also looking at both KEPUB (Kobo) and PDF which will probably come in the future.

by tangledhelix

5/15/2026 at 5:07:57 PM

I on the other hand prefer epubs for fiction. I mostly read on the phone.

by jiffygist

5/15/2026 at 5:37:36 PM

The common issue with PDFs is that e-readers generally have terrible support for them.

by skrtskrt

5/15/2026 at 5:35:47 PM

PDF coming this year.

by gluejar

5/16/2026 at 12:17:00 AM

check it again. most books have epub available

by iberator

5/15/2026 at 5:04:57 PM

I have got quite a few books over the years from Gutenberg, and the epubs have been fine - even for illustrated ones.

by graemep

5/15/2026 at 5:20:14 PM

I like plain text. You can always post process it into any other format you prefer.

by the_af

5/15/2026 at 6:06:54 PM

it's also very "accessible" - good for assistive technologies and people with "out-of-the-ordinary" requirements

by JSeiko

5/16/2026 at 6:41:02 PM

Well, the problem is that you then lose all the semantic information that was encoded into the HTML or ePub versions. Those tend to be better for assistive tech users.

by robin_reala

5/15/2026 at 11:00:13 PM

Not really, given that it can’t represent even basic formatting such as bold or italic text, chapter markers etc.

As an output format it’s ok, but as an input format, it’s almost as bad as PDF.

by lxgr

5/15/2026 at 7:22:15 PM

Project Gutenberg feels like the opposite of modern internet design philosophy. Quiet, useful, accessible, and built to last.

by cold_tom

5/16/2026 at 5:21:44 AM

Project Gutenberg is awesome and amazing.

I was visiting the ruins of a monastery the other day, and one of the texts listed that it had a library of 320ish books.

I chuckled because I have almost 200 books in my personal Kindle library, but I was wrong. I actually have 75000+ books, thanks to Project Gutenberg.

I just haven't downloaded them all yet.

by tomjen3

5/15/2026 at 5:04:02 PM

As a Kindle user, I still miss the old version of the site. The new one looks great on normal desktop, but the old one was simple enough to load and directly download books on the device's built-in browser.

by RattlesnakeJake

5/15/2026 at 5:05:52 PM

That's interesting. What about the new design prevents you from doing it? Genuinely asking here. We may fix it if it's actionable

by JSeiko

5/15/2026 at 5:24:41 PM

And now it's time to put my foot in my mouth. I haven't used it in a while because it was frustrating, but you guys seem to have already fixed it :)

The previous version of the site had two major flaws:

1. The search bar had been removed from the top of the page, and hidden behind a "Click here to search" (or similar) link partway down the page

2. Once you opened that page, the coloring of the site was so washed out on e-ink that the text input was hard to find.

Thanks for fixing it!

by RattlesnakeJake

5/15/2026 at 5:28:11 PM

"you guys seem to have already fixed it" - that's what we like to hear :)

by JSeiko

5/15/2026 at 9:58:36 PM

Maybe include a "Lite" version that only displays text/links? No to minimal styling would be great!

by bitigchi

5/15/2026 at 5:06:43 PM

Is that a Kindle issue?

You can download books in most browsers. I know Amazon have done things to make life difficult for other stores in the past.

by graemep

5/15/2026 at 7:59:09 PM

I'd call it one of those middle-ground things:

• On the one hand, E Ink devices have a fairly known set of limitations, and it would be ridiculous for me to expect them to render the whole web well.

• On the other hand, it's good for website designs to consider the kind of devices employed by their users. Using a Kindle to access Gutenberg is likely less of an edge case than it would be for other sites, so it's worth the extra design work.

(Keep in mind that -- given my sibling comment -- this is all theoretical. The latest iteration of Gutenberg's site is much better than the previous version)

by RattlesnakeJake

5/16/2026 at 9:34:03 AM

Not sure if this is the right place, but the new layout of the German Projekt Gutenberg is missing any download links. For example

https://projekt-gutenberg.org/authors/johann-wolfgang-von-go...

by vwkd

5/15/2026 at 4:40:14 PM

A big pet peeve of mine with Project Gutenberg was the lack of mobile styling. Looks like it’s been fixed! Awesome.

by seizethecheese

5/15/2026 at 4:42:55 PM

good to hear - that was a lot of work!

by JSeiko

5/15/2026 at 4:46:11 PM

Made an app that allows reading PG books as audiobooks on iPhone https://loudreader.io/

by mowmiatlas

5/15/2026 at 4:47:24 PM

that's cool!

by JSeiko

5/16/2026 at 2:02:35 PM

if the doesn't leave my phone why is it a subscription?

by OfflineSergio

5/15/2026 at 4:57:12 PM

Recently downloaded Moby Dick from here:) very easy to use

by aronhegedus

5/15/2026 at 5:36:17 PM

Moby Dick is consistently one of the Top Downloads

by JSeiko

5/15/2026 at 6:20:58 PM

I love how usable the site is even with JS disabled!

by autoexec

5/15/2026 at 5:42:00 PM

I'm slightly curious how PG handles heavily illustrated books. I've downloaded some years ago, and the quality of the illustrations was always pretty poor. Has it been improved lately? What's the QA like for illustrations?

by oidar

5/15/2026 at 5:51:07 PM

Nowadays we depend on scans from Internet Archive, Hathitrust, and other sources. Some scans are better than others. Bear in mind that our illustrations need to be in the public domain and usually from the same edition as the text. https://www.gutenberg.org/help/errata.html

by gluejar

5/15/2026 at 7:48:23 PM

I wonder if the people behind project Gutenberg use Anna's Archive or mam for books that can't be put on Gutenberg.

by Myzel394

5/15/2026 at 5:41:53 PM

PG remains one of the best things on the internet. The amount of fascinating material almost beggars belief.

by AndrewStephens

5/15/2026 at 5:48:44 PM

the amount of weird/interesting stuff that one would find nowhere else is possibly the coolest aspect of PG imo

by JSeiko

5/16/2026 at 11:14:36 PM

Project Gutenberg is the best. Kudos to the team and to the thousands of years of humans who developed it!

by alexdesouza

5/15/2026 at 5:59:09 PM

How did "Concrete Construction: Methods and Costs" come to be the #1 download?

by kgwxd

5/15/2026 at 6:04:16 PM

good question. first thought - maybe some bot has downloaded it often for whatever reasons and our systems didn't detect it as bot traffic. just a guess.

by JSeiko

5/15/2026 at 7:40:37 PM

I thought this was for the Wordpress Gutenberg Editor for a second

by elias1233

5/16/2026 at 7:21:48 PM

I should hit Matt up for a donation.

by gluejar

5/16/2026 at 9:46:58 AM

Needs "translate" buttons. Right now it's a little too cumbersome for most:

https://www-gutenberg-org.translate.goog/cache/epub/64099/pg...

by timonoko

5/16/2026 at 5:09:04 PM

Is Project Gutenberg ever going to add PDF download options?

by zahirbmirza

5/16/2026 at 7:19:14 PM

later this year

by gluejar

5/16/2026 at 8:27:36 PM

Amazing!!! As ereaders get faster and with colour, this could make books from the Project even more attractive. I love the work of your team. Thank you.

by zahirbmirza

5/15/2026 at 11:41:01 PM

my first ever coding project was making a chrome extension that made the typography better on the html formats: https://github.com/smcalilly/gutenberg-typography

by greenie_beans

5/16/2026 at 2:07:38 AM

nice!

by JSeiko

5/15/2026 at 6:37:34 PM

Please give me some book recommendations :)

by jwpapi

5/15/2026 at 6:50:27 PM

Flatland: https://www.gutenberg.org/ebooks/search/?query=flatland

I've heard good things. Also - Sherlock Holmes :)

by JSeiko

5/15/2026 at 6:47:43 PM

Not a recommendation per se but I used to use Amphetype on Gutenberg texts to practise touch-typing. There's something about writing out a book that hits differently to reading it. You skip less, odd parts stick with you. I think the last one I tried was The Island of Dr Moreau.

by klondike_klive

5/15/2026 at 6:56:28 PM

Ulnar Nerve Entrapment :/

by jwpapi

5/15/2026 at 7:46:14 PM

From the newest releases page I stumbled into "Some Nigerian fertility cults" by Percy Amaury Talbot & am enjoying it so far.

https://www.gutenberg.org/ebooks/78684

by BaseBaal

5/15/2026 at 5:48:27 PM

I find it interesting that the context of this comments page apparently overrides the normal definition of “PG” on HN.

by bryankaplan

5/15/2026 at 5:50:13 PM

:D

by JSeiko

5/15/2026 at 5:50:41 PM

personally I'm a fan of the other "PG" as well.

by JSeiko

5/16/2026 at 11:20:07 PM

one of the last good websites on the web...

by marcellocurto

5/16/2026 at 3:57:02 PM

Text files are still the best

Good job

by 1vuio0pswjnm7

5/16/2026 at 8:12:56 AM

I wonder how extensive the overlap is with sacred-texts.com

by jdthedisciple

5/16/2026 at 10:35:22 PM

I love PG... but the covers stink. Should have a public competition to have new ones made and voted on. I'm willing to vibe code a website to make it happen if you're willing...

by cpill

5/16/2026 at 12:31:59 AM

Is there a plan to extend search to book content?

by oxag3n

5/16/2026 at 8:25:31 PM

Since the books are available on the site as text and HTML the search engines index them already for you. Try searching for the below; it should take you to the book you expect as the first result:

site:gutenberg.org "it was the best of times"
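
If you want to script that kind of query, here is a minimal sketch. The helper name `site_search_url` is made up for illustration, and DuckDuckGo is only an example endpoint (an assumption on my part); any engine that supports the `site:` operator should work the same way.

```python
from urllib.parse import quote_plus


def site_search_url(phrase: str, site: str = "gutenberg.org") -> str:
    """Build a web-search URL for an exact phrase restricted to one site.

    Hypothetical helper: the search endpoint below is an assumption,
    not something Project Gutenberg provides.
    """
    # Quote the phrase so the engine looks for the exact wording.
    query = f'site:{site} "{phrase}"'
    # quote_plus percent-encodes the colon and quotes and turns spaces into '+'.
    return "https://duckduckgo.com/?q=" + quote_plus(query)


if __name__ == "__main__":
    print(site_search_url("it was the best of times"))
```

Open the printed URL in a browser; whether the book you expect comes up first still depends on the engine's index.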

by tangledhelix

5/16/2026 at 1:27:51 AM

not that I know of ...

by JSeiko

5/16/2026 at 11:33:38 AM

All the books should be there. I understand that current society has restrictions, what with near infinite copyright and other shenanigans - but I don't see any of these as reason to hide information from mankind. Eventually we'll free all the information. Remuneration will have to occur in other ways than the current status quo.

by shevy-java

5/16/2026 at 3:40:46 PM

hopefully!

by JSeiko

5/16/2026 at 10:48:16 AM

Project Gutenberg was my first introduction to the FOSS ethos. Well, I suppose there was Wikipedia, but Project Gutenberg really spoke to me. This was probably around 2003? So I'm glad to see it still going strong.

I just looked at the history (https://www.gutenberg.org/cache/epub/60600/pg60600-images.ht...) and it dates back to the 70s. There was me thinking it was some newfangled web thing.

by benj111

5/15/2026 at 6:25:56 PM

I keep getting PR_CONNECT_RESET_ERROR

by monegator

5/15/2026 at 6:57:31 PM

just heard back that the server provider has been doing a security update. Maybe you were one of the users that got unlucky as a result... maybe try later if still interested

by JSeiko

5/15/2026 at 6:29:22 PM

I've reported it.

by JSeiko

5/15/2026 at 10:06:01 PM

Keep up the awesome work!

by mentalgear

5/15/2026 at 4:21:05 PM

Thank you for reminding me about this project. I haven't visited it in a long time.

by taubek

5/16/2026 at 12:16:40 AM

I love Project Gutenberg, don't get me wrong... but frankly, Anna's is better.

by gwerbret

5/16/2026 at 6:09:21 PM

I came here to post something similar. PG is perhaps still important as an archive of proofread OCRed public-domain material, but for ordinary people, the shadow libraries have vastly more stuff. After all, readers don’t want their reading to be limited to what was published before a copyright cutoff date many decades ago.

by TFNA

5/16/2026 at 1:28:55 AM

in which way? (genuine question)

by JSeiko

5/16/2026 at 2:02:15 AM

Well, mainly in the fact that Anna's has several orders of magnitude more books, and includes research publications and more, ah, contemporary materials to boot.

by gwerbret

5/15/2026 at 4:42:50 PM

Awesome

by solarity_studio

5/15/2026 at 11:20:53 PM

If you like Project Gutenberg, the closest analog for music is IMSLP, the Petrucci Music Library (imslp.org) — over 855,000 public-domain scores maintained by volunteers, with the same labor-of-love energy and the same perpetual scan-quality and copyright-jurisdiction headaches. Same ethos of "the works belong to humanity, not a storefront." Worth a bookmark for the musicians on HN.

by derekhdawson

5/15/2026 at 4:54:51 PM

I can't read anymore due to fear of not being productive with AI

by brcmthrowaway

5/15/2026 at 4:59:17 PM

maybe there's a way to read more productively using AI: https://x.com/karpathy/status/1990577951671509438

could be a trick to ease that fear :D

by JSeiko

5/15/2026 at 6:28:47 PM

I've found that the larger open-weight AI models do a great job of explaining the old non-fiction content on PG, particularly magazine articles which are a good size for the AI to handle. It breaks down the long wall-of-text paragraphs for you and explains all the historically relevant background that would've been assumed to be known back in the day.

If you ask it to assess the relevance of the text in the present day it will also do that very nicely, highlighting the places where the text shows old-fashioned viewpoints that would be sharply criticized today.
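
For texts longer than a magazine article, one option is to split on paragraph boundaries before handing pieces to a model. This is only a rough sketch: the function name `chunk_paragraphs` and the 4000-character budget are arbitrary assumptions, not a recommendation tied to any particular model.

```python
def chunk_paragraphs(text: str, max_chars: int = 4000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    Hypothetical helper. A single paragraph longer than max_chars is kept
    whole as its own chunk rather than being cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if not current or len(candidate) <= max_chars:
            # Still within budget (or starting a fresh chunk): keep growing.
            current = candidate
        else:
            # Adding this paragraph would exceed the budget: start a new chunk.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for i, chunk in enumerate(chunk_paragraphs(sample, max_chars=40), start=1):
        print(i, repr(chunk))
```

Each chunk can then be pasted into the model with whatever prompt you prefer.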

by zozbot234

5/15/2026 at 6:39:19 PM

so maybe Karpathy has a point that LLM-assisted reading should be a thing. Would be cool if that worked on E-Reader screens as well. Maybe when the browsers on E-Readers become good enough ...

by JSeiko