4/30/2026 at 8:27:17 PM
"Overall, we find a Postgres server can handle up to 144K of these writes per second. That’s a lot, equivalent to 12 billion writes per day."Based on a problem I'm facing with Postgres today, I wonder if this really progresses as linearly as the article wants to make it out.
We're in the middle of evaluating Postgres as a replacement for MySQL, and are seeing a notable slow-down in plain multi-row inserts due to index growth as soon as the table reaches just a couple of dozen million rows. It's an uncomplicated, flat (no constraints, foreign keys, etc.) medium-width table of about 10-15 columns and a handful of non-composite B-tree indices - and/or hash indices; we've tried mixing and matching just to see what happens - but ingestion drops to less than half before we even reach 50M rows. At 100M rows the insertion performance is down to a fraction, and from there it only gets worse as the table and its indices grow. It's as if there's some specific cut-off point past which everything goes awry. However, if we simply remove all indices from the table, Postgres will happily insert hundreds of millions of rows at a steady, near-identical pace from start to end. The exact same table and indices on MySQL, matched as closely as we can between the two systems, running on the same OS and hardware, maintain more or less linear insertion performance well beyond 500M rows.
Now, there's a lot to say about the whys and why-nots of keeping tables of this size in an RDBMS, and of application designs that rely on it working out - and probably a fair amount more about tuning Postgres' config - but we're stumped as to why PG's indexing performance falters this early compared to InnoDB/MySQL. 50-100M rows really isn't much. Would greatly appreciate it if anyone with insight could shed some light on it and maybe offer a few ideas to test out.
(addendum: during these stress tests the hardware is nowhere close to over-encumbered, and there's consistent headroom on memory, CPU and disk I/O)
by daneel_w
4/30/2026 at 11:00:21 PM
The problem is table design and write amplification. Every row insert triggers an update to every index, so you get the classic amplification problem.

Separate your table into Cold (with all indexes and bells and whistles) and Hot (a heap table with no indexes except the PK).

Insert as many rows as you want into the Hot heap, then move them into Cold in batches in the background, so that index maintenance is amortized across many rows instead of paid per row.
Another poster suggested partitioning; that's the same idea: separate Hot and Cold data into partitions and keep the hot partition as a plain heap.
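A minimal sketch of that hot/cold split, assuming hypothetical table names, columns, and batch size; the background move would typically be scheduled with something like pg_cron:

```sql
-- Hot table: bare heap, PK only, so inserts stay cheap.
CREATE TABLE events_hot (
    id         BIGINT PRIMARY KEY,
    payload    TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Cold table: carries all the secondary indexes.
CREATE TABLE events_cold (
    id         BIGINT PRIMARY KEY,
    payload    TEXT,
    created_at TIMESTAMPTZ NOT NULL
);
CREATE INDEX ON events_cold (created_at);

-- Background batch move: index maintenance is amortized over 10,000 rows
-- instead of being paid on every single-row insert.
WITH moved AS (
    DELETE FROM events_hot
    WHERE id IN (SELECT id FROM events_hot ORDER BY id LIMIT 10000)
    RETURNING id, payload, created_at
)
INSERT INTO events_cold SELECT id, payload, created_at FROM moved;
```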
by bijowo1676
5/1/2026 at 4:48:47 AM
That’s a really cool idea I had not heard before, thank you for sharing this. It also feels like the type of thing a DB ought to be able to do under the hood. I wonder why this is not a config option (though there’s a pg extension for everything, so maybe it does exist).
by appplication
5/1/2026 at 11:35:07 AM
As far as I can tell, Postgres is not designed with this inclination towards doing lighter work while clients are waiting and deferring maintenance work to the background. I think the background work it does is mostly running vacuum on tables now and then.

Contrast that with ClickHouse, for example. It operates in a different niche than Postgres (OLAP instead of OLTP) - with its merge tree engine family [1] that does data deduplication in the background.
There is one project of modernizing Postgres' storage engine called OrioleDB [2], but I think the company got acquihired by Supabase [3] and maybe the project has not been progressing very quickly since then.
[1] https://clickhouse.com/docs/engines/table-engines/mergetree-... [2] https://www.orioledb.com/ [3] https://supabase.com/blog/supabase-acquires-oriole
by giovannibonetti
5/1/2026 at 9:34:00 PM
Oriole has been under very heavy development since we acquired them. We cut a release just 2 weeks ago: https://github.com/orioledb/orioledb/releases/tag/beta15
We expect it to be production ready this year
by kiwicopple
5/1/2026 at 8:57:32 AM
It wouldn't be atomic, and so would break transaction semantics.

If you committed a row update but didn't update the index, a subsequent query using the not-yet-updated index would not find the updated row correctly.
It would also only work for certain types of indexes; you couldn't do it for a uniqueness constraint, for example.
I do agree that in theory you could have some extension to the index declaration that covers all that, but my worry is that it would be non-obvious and a footgun. Doing it the way described above makes that break in semantics explicit.
by SCdF
5/1/2026 at 11:26:47 AM
> If you committed a row update but didn't update the index, a subsequent query using the not-yet-updated index would not find the updated row correctly.

I wonder if you could make it so that queries read from both the index and the unindexed changes. It would be slightly slower, but as long as the set of unindexed changes is kept small it might be fine.
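Tying that to the hot/cold sketch above (hypothetical names again), the reading-from-both idea could be as simple as a view over both tables:

```sql
-- Queries see the indexed cold data plus the small, unindexed hot tail.
CREATE VIEW events_all AS
    SELECT id, payload, created_at FROM events_cold
    UNION ALL
    SELECT id, payload, created_at FROM events_hot;
```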
by SkiFire13
5/2/2026 at 12:13:12 AM
My impression is that InnoDB (MySQL's primary storage engine) is doing something like this. We have never seen any slow-downs adding to the data set I've discussed in this thread, even at hundreds of millions of rows - and, per the nature of the system creating this data, the majority of those rows are targeted for additional single-row DML within a few seconds of being inserted, with instantaneous effect.
by daneel_w
5/1/2026 at 9:00:30 PM
> The problem is table design and write amplification. Every row insert triggers an update to every index, so you get the classic amplification problem.

Yes, this is understood - in particular for B-trees, which require some rebalancing as they grow. What's less understood is why Postgres hasn't solved this in a way similar to how InnoDB solves it behind the scenes.
by daneel_w
4/30/2026 at 8:56:03 PM
You've given us some idea of the volume of your data but there's no mention of what's ingesting it or how.

> during these stress tests the hardware is nowhere close to over-encumbered, and there's consistent headroom on memory, CPU and disk I/O
This assertion is likely wrong - you're likely skipping over some metrics that hold clues to what we need to know. Here are some questions to get the discussion moving.
- Is this PostgreSQL managed or self-hosted?
Your mention of "consistent headroom on both memory, CPU and disk I/O" gives me hope you're self-hosting it but I've heard the same thing in the past from people attempting to use RDS and wondering the same as you are, so no assumptions.
- Are you using COPY or multi-row INSERT statements?
- How much RAM does that server have?
- What is the fillfactor, max_wal_size and checkpoint_timeout?
- Is the WAL on NVMe?
- What's the iostat or wa during the slowdown?
- Are random UUIDs (part of) the index?
Have you posted to https://dba.stackexchange.com/ yet?
If I were you, I would create a GitHub repo that has scripts that synthesize the data and reproduce the issues you're seeing.
by subhobroto
5/1/2026 at 10:19:15 PM
> This assertion is likely wrong

We can clearly see with iostat/iotop that the server and its storage aren't overworked.
> Is this PostgreSQL managed or self-hosted?
We're evaluating on our own bare metal. It's an 8c/16t Zen 3 with 32 GB of RAM. Storage (where everything, incl. the WAL, lives) is NVMe, and the drives' true sustainable synchronous write speed is roughly 750 MiB/s.
The specs are far, far higher than required for something this basic. Total memory usage when we begin observing the problem is a fraction of what's available. The MySQL production and test environments running this without hindrances actually have only half the amount of RAM.
> Are you using COPY or multi-row INSERT statements?
Multi-row INSERTs, one per transaction, with anything from 100 to 500 rows each. The evaluation simulates the volumes we see through our APIs in production, though it omits everything but Postgres itself in order to test Postgres' ingestion capability without other factors getting in the way - it's just Postgres and a lightweight data generator mimicking production data.
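For concreteness, a hedged sketch of that ingestion pattern (table and columns are hypothetical), alongside the COPY alternative mentioned in the question:

```sql
-- One multi-row INSERT per transaction, a few hundred rows per statement.
BEGIN;
INSERT INTO events (device_id, metric, value, recorded_at) VALUES
    (1, 'temp', 21.4, now()),
    (2, 'temp', 19.8, now()),
    (3, 'temp', 22.1, now());  -- ...typically 100-500 rows per batch
COMMIT;

-- COPY has less per-row overhead for pure bulk loads; this reads a
-- server-side file (psql's \copy is the client-side equivalent).
COPY events (device_id, metric, value, recorded_at)
    FROM '/tmp/events.csv' WITH (FORMAT csv);
```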
> What is the fillfactor, max_wal_size and checkpoint_timeout?
Not sure about the fillfactor. Everything is running on a default Postgres 17.9 as packaged for Debian 13, so that would be max_wal_size = 1GB, and checkpoint_timeout is commented out with a default of 5 minutes. We haven't gotten to any performance tuning yet and would be thankful for any suggestions to try out. The only thing we've tried is disabling auto-vacuuming to see if it was running too frequently, causing I/O contention or otherwise hogging throughput. No real noticeable change.
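For anyone following along, those defaults can be confirmed from a session (a hedged sketch; the table name is hypothetical):

```sql
SHOW max_wal_size;        -- 1GB on a stock install
SHOW checkpoint_timeout;  -- 5min on a stock install
-- Per-table storage options such as fillfactor, if ever set explicitly:
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'events'; -- NULL reloptions means the defaults apply
```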
> Are random UUIDs (part of) the index?
No, neither v4 UUIDs nor the B-tree-friendlier v7 UUIDs, but a couple of the indexed columns contain random integral numbers that can become quite sprawly inside a B-tree. We do observe somewhat better results indexing these with HASH indices instead, which also makes a lot more sense for that particular data and how we query it. For the evaluation our starting point was to stick with B-tree indices, because that's what the MySQL setup uses (InnoDB does not support on-disk hash indices).
> What's the iostat or wa during the slowdown?
When we reach the point where ingestion speed has shrunk to about a third, we observe iowait peaks having grown to some 15%, which tells us the problem is likely Postgres spending more and more of its time shoveling data into the indices rather than storing actual row data. Maximum written data, at about 150 MB/s, is just a fraction of what the NVMe drives can sustain. None of Postgres' individual processes ever tops out anywhere close to 100% of a single core, and total memory usage is less than 2 GB - here we suspect we have a lot of tuning to look into. To contrast, the MySQL setup is greedier with both CPU and memory, and an educated guess is that the major difference allowing it to keep ingesting hundreds of millions of rows without slowing down is that InnoDB, without us having to jump through any hoops, defers persisting new index data so that the DML can be finalized as quickly as possible, and also operates on its indices in a much more efficient way than Postgres.
Everything needed to reproduce the problem is in this paste, which contains a neutral version of the full table and indices, and a simplified version of the data generator: https://paste.debian.net/plainh/ddc819cb
by daneel_w
5/2/2026 at 12:31:01 AM
That's a phenomenal script that absolutely belongs in a dba.stackexchange question and even on the PostgreSQL mailing list.

I strongly encourage you to post to https://dba.stackexchange.com/, as an HN thread is the wrong place for this discussion (there's a lot of tuning ahead of you, and others in your situation in the future might skip this nested thread). But be forewarned: the TL;DR from dba.stackexchange will either be a quick "you need at least 64 GB RAM for your PostgreSQL use case", or there will be a better, thorough discussion of increasing max_wal_size and lowering the fillfactor (which is what you're looking for). The ideal answer should even walk you through BRIN (vs. B-tree) indexes. I'm asking you to post there because it will enumerate the WHY of all of these. At that point you can make an informed decision about whether it would all be worth it.
Now, I empathize that MySQL is doing a phenomenal job on only a quarter of that (16 GB, you said?), but not for the reasons you might think (and one can certainly argue that an end user shouldn't have to care!). MySQL's method of buffering (the InnoDB change buffer) and its clustered index give you the performance you like when suddenly doing bursty writes, aka a "write sprint". I need you to be aware of that (and dba.stackexchange responses will certainly address it).
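For comparison on the MySQL side, that buffering is visible in a couple of server variables (a hedged sketch; defaults vary by version):

```sql
-- The InnoDB change buffer defers secondary-index page updates so bursty
-- inserts don't pay the index-maintenance I/O up front.
SHOW VARIABLES LIKE 'innodb_change_buffering';  -- 'all' on 5.7/8.0; newer releases phase it out
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- clustered index + change buffer live here
```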
I would have written a lengthy post on what to do next but I must first ask:
- Why are you evaluating PostgreSQL in this case when MySQL seems to work well? For example: does it feel like your aggregations are getting slower? As you can see, with PostgreSQL you will have a different set of tradeoffs (RAM, tuning, VACUUM)
- Are there real, limiting business constraints that force you to operate on less than 64 GB of RAM given your volume and throughput expectations (like FF limitations, or these being smaller machines on the edge, etc.)?
- If you can, as an experiment - while you write your dba.stackexchange question and PostgreSQL mailing list post - tweak these parameters and tell me what you see:
- I'm concerned you have a `shared_buffers = 128MB`. Set it to `shared_buffers = 8GB` (give the B-trees room to live in RAM)
- Increase `max_wal_size = 16GB` (stop the checkpoint flooding and let Postgres "breathe" during your batch inserts).
- Increase `checkpoint_timeout = 30min` (set 30min to the actual window it takes; also, this is temporary but this should push checkpoints out so they don't interrupt your "write sprint").
- Set `maintenance_work_mem = 2GB` (should speed up index creation and vacuuming).
- Lower `fillfactor` on those specific 8 indexes from the default (90 for B-tree indexes) down to 70. The B-trees should then have the room to absorb those inserts better.
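A consolidated, hedged version of those settings (the index name is hypothetical; shared_buffers only takes effect after a restart):

```sql
ALTER SYSTEM SET shared_buffers = '8GB';        -- requires a server restart
ALTER SYSTEM SET max_wal_size = '16GB';
ALTER SYSTEM SET checkpoint_timeout = '30min';
ALTER SYSTEM SET maintenance_work_mem = '2GB';
SELECT pg_reload_conf();                        -- applies the reloadable ones

-- Per-index fillfactor; only affects pages written after a REINDEX.
ALTER INDEX idx_events_device SET (fillfactor = 70);
REINDEX INDEX CONCURRENTLY idx_events_device;
```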
This should get you headed in the right direction, googling for the right documentation, but there's even more ahead of you, including a separate discussion about BRIN (vs. B-tree) indexes.
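As a teaser, a hedged sketch of what that BRIN swap might look like on an append-mostly, roughly monotonic column (all names hypothetical):

```sql
-- BRIN stores one summary per block range, so it stays tiny and is cheap
-- to maintain under heavy insert load - at the cost of coarser lookups.
DROP INDEX IF EXISTS idx_events_recorded_at;
CREATE INDEX idx_events_recorded_at_brin
    ON events USING brin (recorded_at)
    WITH (pages_per_range = 64);  -- smaller ranges: finer pruning, bigger index
```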
> defers persisting new index data so that the DML can be finalized as quickly as possible
"defers persisting" might get misread as if it doesn't write to disk - it does but you're close and you will uncover more :)
MySQL's defaults are specifically tuned for your "write sprint" use case - in fact, specifically to mask the I/O latency of secondary index updates - but if your real use case is more than just large sudden bursts of writes to a table with B-tree indexes, you will certainly appreciate this effort. Happy weekend!
by subhobroto
5/3/2026 at 11:02:01 AM
Thanks a bunch for all of the config tuning advice. We can see huge improvements, both in flattening the creeping slow-down and in really pushing back the point where we eventually hit the wall. Just lowering the fillfactor on the indices made a substantial difference, even before permitting Postgres to use more RAM. Using BRIN indices for certain columns, as suggested by several others in the discussion, also resulted in smaller but still valuable speed-ups. It's obvious by now that a careful combination of curated choices greatly matters for the full scope of this data. On the whole, in fact, it looks like the combined advice clears this hurdle up so much that we can no longer say we observe Postgres having an actual problem dealing with our current volumes and intensity of ingestion.

We run quite a few of these setups, so cost absolutely plays a big part in the size and selection of hardware. We're evaluating Postgres because of a mergers-and-acquisitions event prompting us to consolidate portions of the platforms for the sake of savings and reducing the overall sprawl. Conversely, we're also testing how the other party would fare if some of their stuff ran under MySQL instead of Postgres. There's a chance we end up leaving things just as they are.
by daneel_w
5/3/2026 at 12:32:33 PM
> On the whole, in fact, it looks like the combined advice clears this hurdle up so much that we can no longer say we observe Postgres having an actual problem dealing with our current volumes and intensity of ingestion.

I am extremely happy for you!
This is fantastic news. I would really appreciate a followup post on DB SE and/or PostgreSQL ML because I assure you, there are others in your position. A deeply nested HN thread is not something that will be visible - you would be saving a lot of people heartburn.
Also remember, we just got started! The tips I gave you earlier are just a drop in the bucket. In the days of AWS RDS, knowledge of these things is a "lost art", but running PostgreSQL yourself is extremely empowering. If you could make the posts, please do, and I also urge you to test out https://github.com/pgsty/pigsty
by subhobroto
4/30/2026 at 8:48:03 PM
With some extra admin work, you can greatly increase your insert throughput, as long as the table load is comprised mostly of inserts (a sketch follows below):

1. Partition your table by range of a monotonic ID or timestamp. Note that the primary key will have to contain this column; a BIGINT id column should work fine.

2. Remove all the other indexes from the partitioned table. Add them to all the partitions except the latest one. This way, the latest one can endure a tough write load while the older ones work fine for reads.

3. Create an admin routine (perhaps with pg_cron) to create a new partition whenever the newest one is getting close to the limit. When the load moves to the newer partition, add indexes concurrently to the old one.

4. You'll notice the newest partition will be optimized for writes but not reads. You can offset some of that by replacing B-tree secondary indexes with BRIN [1], particularly the one with the bloom operator class (not to be confused with Postgres' regular Bloom indexes [2]). BRIN is a family of indexes more optimized for writes than reads. If the partition is not too large, reading from it shouldn't be too bad.

5. Later you can merge partitions to avoid having too many of them. Postgres has commands for that, but I think they lock the whole table, so a safer bet is to copy small partitions into a new, larger one and swap them manually.

[1] https://www.postgresql.org/docs/current/brin.html [2] https://www.postgresql.org/docs/current/bloom.html
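A minimal sketch of steps 1, 2 and 4, assuming a hypothetical events table partitioned by range over a BIGINT id:

```sql
CREATE SEQUENCE events_id_seq;

CREATE TABLE events (
    id          BIGINT NOT NULL DEFAULT nextval('events_id_seq'),
    device_id   INT NOT NULL,
    recorded_at TIMESTAMPTZ NOT NULL,
    value       DOUBLE PRECISION,
    PRIMARY KEY (id)               -- the partition key must be part of the PK
) PARTITION BY RANGE (id);

-- Step 3's admin routine would create these ahead of the write frontier.
CREATE TABLE events_p0 PARTITION OF events FOR VALUES FROM (0) TO (50000000);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES FROM (50000000) TO (100000000);

-- Step 2: secondary indexes only on the older, read-mostly partition
-- (CONCURRENTLY works on leaf partitions, not the partitioned parent);
-- step 4: BRIN instead of a B-tree.
CREATE INDEX CONCURRENTLY events_p0_recorded_at_brin
    ON events_p0 USING brin (recorded_at);
```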
by giovannibonetti
5/1/2026 at 2:25:16 AM
I like this strategy a lot, but the performance of read queries suffers if they span partitions, correct?

The issue I'm facing is a very large table that is both write- and read-heavy, and the reads do not fall into a specific range of values for any particular column, so I don't think partitioning is an option.
by timbowhite
5/1/2026 at 11:17:20 AM
Yes, partitioning will slightly decrease the read performance of queries not correlated with the partition key. That's why you need to periodically merge smaller partitions, so you can keep the overall partition count bounded.

It is a lot of admin work, but if you really need to scale up Postgres write throughput, I don't see many other options that don't increase hardware costs.
I assume you have already picked the low-hanging fruit discussed in the neighboring comments - batch writes, make sure you are using COPY instead of INSERT, tune Postgres parameters adequately and use the fastest disk you can grab for the WAL.
by giovannibonetti
5/1/2026 at 11:38:52 AM
Partitioning is not all that expensive. It is definitely worth testing for your specific workload. We use TimescaleDB, which relies heavily on Postgres partitions, have a bit under 100 million rows in our active set (last 90 days) across 120 partitions (device*time), and it works nicely. Over 100 partitions is probably a few too many for this workload, but since it works OK we have not changed it.
by jononor
4/30/2026 at 8:59:19 PM
These are good suggestions, but I'm apprehensive they might come back and say they have 64 GB (or less) of RAM, or that they are using PostgreSQL RDS on AWS or something.

I asked them for specifics.
by subhobroto
4/30/2026 at 10:38:21 PM
I don't think it really matters in terms of their question though, given MySQL on the same specs doesn't have the problem and Postgres does. Quite clearly it has something to do with the indexes, and with whatever wall Postgres is running into that causes the drop-off at quite low row counts. If the answer is just "get more RAM", it kind of implies Postgres is not really that scalable - especially if the drop-off is proportional to the number of rows.
by keithnz
5/1/2026 at 1:25:19 AM
Why are you using hash indexes? They're much less widely used than standard B-tree indexes. The bucket split code likely isn't very scalable [1].

I suggest testing the same workload with your existing hash indexes replaced with equivalent B-trees.
[1] https://github.com/postgres/postgres/blob/master/src/backend...
by petergeoghegan
5/1/2026 at 4:48:17 AM
Last time I almost used a hash index in Postgres, I learned it was an incomplete feature and not crash-safe yet. This was v9.3? At that same time, MySQL had them and they were OK to use.

Later that got fixed, but I haven't tried again since; I've just been using B-trees because Postgres seemed to favor them and they have theoretical advantages too.
by zadikian
5/1/2026 at 9:02:17 PM
They are fully stable and perform very well in Postgres today. There are some caveats, but they don't result in any sort of hiccups or unpredictable behavior.
by daneel_w
5/2/2026 at 7:49:48 PM
Yeah, that's what I've read too. Just haven't gotten around to trying them.

You've probably already read the Postgres docs on hash indexes, but just in case: they say "hash indexes may not be suitable for tables with rapidly increasing number of rows." I agree with the other commenter that it's worth at least trying without them if you haven't already, even though you're already VACUUMing.
by zadikian
5/1/2026 at 10:29:55 PM
> Why are you using hash indexes?

On some data, for certain ways of using and querying that data, they make a lot more sense than a B-tree. When we use hash indices for some of the columns in our test just to see how Postgres performs, and run the test up towards a quarter of a billion rows - where Postgres' 32-bit hash indices have a relatively high chance of colliding - querying these hash indices and returning the single requested row is still instantaneous.
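A hedged illustration of that access pattern (names hypothetical) - Postgres hash indexes only support equality lookups, which is exactly this case:

```sql
-- Equality-only lookups on a column of sprawling random integers.
CREATE INDEX idx_events_token ON events USING hash (token);

-- The kind of point query this serves; no range scans or ordering.
SELECT * FROM events WHERE token = 918273645;
```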
by daneel_w
5/1/2026 at 11:22:41 PM
Thanks for sharing this "tactic". Will look into it.
by daneel_w
4/30/2026 at 9:45:42 PM
The problem is row locks and contention when using interactive transactions over the network. That can absolutely kill your performance with Postgres, and there's not really anything you can do to get around it (other than avoiding interactive transactions) [1].

[1] https://andersmurphy.com/2025/12/02/100000-tps-over-a-billio...
by andersmurphy
4/30/2026 at 10:49:25 PM
We had an interesting architecture situation at work. Puppet Enterprise uses a single Postgres server. The company had moved from recommending a single PuppetDB API node (which fell over at high load) to running a PuppetDB API server on each compiler node.

That, however, came with its own set of problems. Of course you have to tune for concurrent connections as you scale wider, but there were much more serious contention issues than you'd expect, and the compilation times were terrible too. It turned out to be because those transactions locked the DB during their (synchronous) operations, and we had a globally distributed set of compilers in order to serve globally distributed traffic.
The solution ended up being to run a separate cluster of API servers in the same region as the DB. The expensive calls from the compilers to the API servers were largely async https so they didn't have to wait on the API nodes, and the API nodes could talk to the DB synchronously with low latency.
by rconti
5/1/2026 at 4:15:33 AM
If you find yourself doing a lot of explicit transactions, it can be a sign that your schema isn't as normalized as it should be.
by frollogaston
5/1/2026 at 10:36:54 AM
How does better normalization reduce the need for explicit transactions?
by CodesInChaos
5/2/2026 at 2:29:33 AM
The need to change two separate tables atomically means that you probably have duplicated info. Also, it can be non-ideal in a web backend because it means keeping DB connections open longer.
by frollogaston
5/2/2026 at 4:41:08 AM
Sorry, ignore the second sentence - I meant it keeps the connection checked out of the pool longer, but also that has nothing to do with normalization.
by frollogaston
5/1/2026 at 10:20:02 PM
In this case that is not the problem.
by daneel_w
4/30/2026 at 10:29:18 PM
What's the underlying filesystem(s) you're using for the data storage?
by justinclift
5/1/2026 at 9:06:14 PM
Ext4 with journaling. I've played around with XFS as well just to see what would happen. No major difference. I'm certain it's caused by the way Postgres builds its indices.
by daneel_w
5/1/2026 at 10:16:02 PM
No worries. Just checking if it was ZFS.

While I really like ZFS personally for 99% of things, for "need maximum performance with PostgreSQL and we're already pushing high end hardware to the maximum" ZFS is reportedly the wrong choice.
by justinclift
5/1/2026 at 11:04:59 PM
We've reached the same conclusion in other situations. Great for reliable and scalable storage, not so great for focused, intense I/O.
by daneel_w
5/1/2026 at 4:11:50 AM
So you open a transaction, insert multiple rows, commit? And you're not using any special transaction settings like SERIALIZABLE mode, right? Normally you'd use COPY FROM if it's a huge number of rows in some batch process, but not when inserting a few rows while handling a backend request or something.

I would try filling up the table, forcing a vacuum, then timing the inserts afterwards. Not that you should need to vacuum in real usage, but it might uncover a problem.
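A hedged version of that experiment, with a hypothetical table name:

```sql
-- Fill the table to where inserts slow down, then:
VACUUM (ANALYZE, VERBOSE) events;  -- VERBOSE reports what was cleaned up
-- ...then re-run a timed batch of inserts and compare.
```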
by frollogaston
5/1/2026 at 9:04:46 PM
> So you open a transaction, insert multiple rows, commit?

Pretty much, yes. High-volume, rapid multi-row INSERTs. Nothing special involved in the DDL or how the data goes into the table. Vacuuming the table (whether automatically or every 10-20M rows) makes no difference; the indices still penalize performance substantially the larger they grow.
by daneel_w
5/1/2026 at 5:35:37 PM
If the index updates are your bottleneck, you can often get away with using a much lighter index (for example BRIN) at the cost of slightly slower queries. This is very often a great and much-overlooked tradeoff.
by abraxas
5/3/2026 at 11:11:20 AM
Using BRIN for certain columns resulted in an improvement. Thanks for the suggestion.
by daneel_w
5/1/2026 at 1:55:59 AM
Just don't bother with Postgres.

I've run MySQL for years in production and have spent probably 30 minutes total thinking about managing it. Unless there's some Postgres feature you need (unlikely), it'll just become a severe pain in your ass down the road because you set something up "wrong."
Just vacuuming can barf completely, leaving you dead in the water.
TL;DR: if you aren't a DBA and don't want to play one on TV, don't bother with Postgres.
by mannyv
5/1/2026 at 4:16:04 AM
Dunno, I'm not a DBA but find Postgres to be fine.by frollogaston
5/1/2026 at 5:14:33 AM
Agreed. And any time I ask for help with something DB-related, people just chastise me for not running Postgres and blindly tell me that's my problem. That sort of community is not one I want to be part of.
5/2/2026 at 4:38:29 AM
For what it's worth, I would never say that, even though Postgres is my typical choice. MySQL is fine too.by frollogaston
5/1/2026 at 2:42:30 AM
Deadlocks?by financltravsty