2/2/2026 at 3:38:18 PM
Note there is no intrinsic reason running multiple streams should be faster than one [EDIT: "at this scale"]. It almost always indicates some bottleneck in the application or TCP tuning. (Though, very fast links can overwhelm slow hardware, and ISPs might do some traffic shaping too, but this doesn't apply to local links).
SSH was never really meant to be a high performance data transfer tool, and it shows. For example, it has a hardcoded maximum receive buffer of 2MiB (separate from the TCP one), which drastically limits transfer speed over high BDP links (even a fast local link, like the 10gbps one the author has). The encryption can also be a bottleneck. hpn-ssh [1] aims to solve this issue but I'm not so sure about running an ssh fork on important systems.
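Back-of-the-envelope, assuming a stream that is purely window-limited: max throughput ≈ window / RTT. So the 2MiB cap needs an RTT under ~1.7 ms to sustain 10 Gbit/s; at 10 ms you're already down to ~1.7 Gbit/s, and at 100 ms to ~170 Mbit/s.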
by digiown
2/2/2026 at 7:30:48 PM
> has a hardcoded maximum receive buffer of 2MiB
For completeness, I want to add:
The 2MiB limit is per SSH "channel" -- the SSH protocol multiplexes multiple independent transmission channels over one TCP connection [1], and each channel has its own window size.
rsync and `cat | ssh | cat` only use a single channel, so if their counterparty is an OpenSSH sshd server, their throughput is limited by the 2MiB window limit.
rclone seems to be able to use multiple ssh channels over a single connection; I believe this is what the `--sftp-concurrency` setting controls [2].
Some more discussion about the 2MiB limit and links to work for upstreaming a removal of these limits can be found in my post [3].
Looking into it just now, I found that the SSH protocol itself already supports dynamically growing per-channel window sizes with `CHANNEL_WINDOW_ADJUST`, and OpenSSH seems to generally implement that. I don't fully grasp why it doesn't just use that to extend as needed.
I also found that there's an official `no-flow-control` extension with the description
> channel behaves as if all window sizes are infinite.
>
> This extension is intended for, but not limited to, use by file transfer applications that are only going to use one channel and for which the flow control provided by SSH is an impediment, rather than a feature.
So this looks like it was designed exactly for rsync's use case. But no software implements this extension!
I wrote those things down in [4].
It is frustrating to me that we're only a ~200 line patch away from "unlimited" instead of shitty SSH transfer speeds -- for >20 years!
[1]: https://datatracker.ietf.org/doc/html/rfc4254#section-5
[2]: https://rclone.org/sftp/#sftp-concurrency
[3]: https://news.ycombinator.com/item?id=40856136
[4]: https://github.com/djmdjm/openssh-portable-wip/pull/4#issuec...
by nh2
2/2/2026 at 11:07:16 PM
> TCP tuning
I think a lot of file transfer issues that occur outside of the corporate intranet world involve hardware that you don't fully control on (at least) one end. In science, for example, transferring huge amounts of data over long distances is pretty common, and I've had to do this on boxes that had poor TCP buffer configurations. Being able to multiplex your streams in situations like this is invaluable and I'd love to see more open source software that does this effectively, especially if it can punch through a firewall.
by bscphil
2/2/2026 at 4:00:41 PM
In general TCP just isn't great for high performance. In the film industry we used to use a commercial product, Aspera (now owned by IBM), which emulated ftp or scp but used UDP with forward error correction (instead of TCP retransmission). You could configure it to use a specific amount of bandwidth and it would just push everything else off the network to achieve it.
by mprovost
2/2/2026 at 6:24:48 PM
What does "high performance" mean here?
I get 40 Gbit/s over a single localhost TCP stream on my 10-year-old laptop with iperf3.
So TCP itself does not seem to be a bottleneck if 40 Gbit/s is "high" enough, which it probably is currently for most people.
I have also seen plenty of situations in which TCP is faster than UDP in datacenters.
For example, on Hetzner Cloud VMs, iperf3 gets me 7 Gbit/s over TCP but only 1.5 Gbit/s over UDP. On Hetzner dedicated servers with 10 Gbit links, I get 10 Gbit/s over TCP but only 4.5 Gbit/s over UDP. But this could also be due to my use of iperf3 or its implementation.
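For reference, these are the kinds of iperf3 invocations I mean (roughly from memory, so the exact flags may be off):

  iperf3 -s                      # on the receiving end
  iperf3 -c <host>               # single TCP stream
  iperf3 -c <host> -P 4          # 4 parallel TCP streams
  iperf3 -c <host> -u -b 10G     # UDP at a chosen target rate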
I also suspect that TCP, being a protocol whose state is inspectable by the network equipment between the endpoints, allows that equipment to implement higher performance, but I have not validated whether that is actually done.
by nh2
2/2/2026 at 10:00:59 PM
Aspera was/is designed for high latency links, i.e. sending multiple terabytes from London to New Zealand, or LA.
For that use case, Aspera was the best tool for the job. It's designed to be fast over links that single TCP streams couldn't saturate.
You could, if you were so bold, stack up multiple TCP links and send data down those. You got the same speed, but possibly not the same efficiency. It was a fucktonne cheaper to do though.
by KaiserPro
2/2/2026 at 10:40:22 PM
> I get 40 Gbit/s over a single localhost TCP stream on my 10-year-old laptop with iperf3.
Do you mean literally just streaming data from one process to another on the same machine, without that data ever actually transiting a real network link? There are so many caveats to that test that it's basically worthless for evaluating what could happen on a real network.
by wtallis
2/3/2026 at 1:26:54 AM
Yes. Why?
To measure the overhead that's claimed (TCP the protocol being slow), one should exclude, as much as possible, the other factors that necessarily affect alternative protocols as well (e.g. latency), which is what this does.
by nh2
2/3/2026 at 7:13:35 AM
It sounds like you're reasoning from the assumption that any claimed slowness of TCP would be something like a fixed per-packet overhead or delay that could be isolated and added back in to the result of your local testing to get a useful prediction. And it sounds like you think alternative protocols must be equally affected by latency.
But it's much more complicated than that; TCP interacts with latency and congestion and packet loss as both cause and effect. If you're testing TCP without sending traffic over real networks that have their own buffering and congestion control and packet reordering and loss, you're going to miss all of the most important dynamics affecting real-world performance. For example, you're not going to measure how multiplexing multiple data streams onto one TCP connection allows head of line blocking to drastically inflate the impact of a lost or reordered packet, because none of that happens when all you're testing is the speed at which your kernel can context-switch packets between local processes.
And all of that is without even beginning to touch on what happens to wireless networks.
by wtallis
2/3/2026 at 2:46:46 PM
Somebody made a claim that TCP isn't high performance without specifying what that means; I gave a counterexample of just how high performance TCP can be, picking some arbitrary notion of "high performance".
Almost like it makes the point that arguing about "high performance" is useless without saying what that means.
That said:
> you're not going to measure how multiplexing multiple data streams onto one TCP connection
Of course not: When I want to argue against "TCP is not a high performance protocol", why would I want to measure some other protocol that multiplexes connections over TCP? That is not measuring the performance of TCP.
I could conjure any protocol that requires acknowledgement from the other side for each emitted packet before sending the next, and then claim "UDP is not high performance" when running that over UDP - that doesn't make sense.
by nh2
2/3/2026 at 10:04:00 AM
UDP by itself cannot be used to transfer files or any other kind of data with a size bigger than an IP packet.
So it is impossible to compare the performance of TCP and UDP.
UDP is used to implement various other protocols, whose performance can be compared with TCP. Any protocol implemented over UDP must have a performance better than TCP, at least in some specific scenarios, otherwise there would be no reason for its existence.
I do not know how UDP is used by iperf3, but perhaps it uses some protocol akin to TFTP, i.e. it sends a new UDP packet when the other side acknowledges the previous UDP packet. In that case the speed of iperf3 over UDP will always be inferior to that of TCP.
Sending UDP packets without acknowledgment will always be faster than any usable transfer protocol, but the speed in that case does not provide any information about the network, only about the speed of executing a loop in the sending computer and network-interface card.
You can transfer data without using any transfer protocol, by just sending UDP packets at maximum rate, if you accept that a fraction of the data will be lost. The fraction that is lost can be minimized, but not eliminated, by using an error-correcting code.
by adrian_b
2/3/2026 at 1:58:09 PM
> perhaps it [..] sends a new UDP packet when the other side acknowledges the previous UDP packet. In that case the speed of iperf3 over UDP will always be inferior to that of TCP
It does not; otherwise measuring 4.5 Gbit/s would be impossible -- the bandwidth-delay calculation would put the limit roughly 100x lower (the ping is around the usual 0.2 ms).
With iperf3, as with many other UDP measurement tools, you set a sending rate and the other side reports how many bytes arrived.
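(Something like `iperf3 -c <host> -u -b 10G`, where the `-b` rate is your choice, not something iperf3 discovers.)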
by nh2
2/3/2026 at 4:25:58 PM
You are right. It has been a long time since I last used iperf3, but now that you mention it, I remember this as well.
So the previous poster has misinterpreted the iperf3 results by believing that UDP was slower. iperf3 cannot demonstrate a speed difference between TCP and UDP, since for the former the speed is determined by the network, while for the latter it is determined by the "--bandwidth" command-line option; the poster has probably just seen some default UDP rate.
by adrian_b
2/2/2026 at 10:37:56 PM
High performance means transferring files from NZ to a director's yacht in the Mediterranean with a 40Mbps satellite link and getting 40Mbps, to the point that the link is unusable for anyone else.
by mprovost
2/2/2026 at 4:09:07 PM
There's an open source implementation that does something similar but for a more specific use case: https://github.com/apernet/tcp-brutal
There's gotta be a less antisocial way though. I'd say using BBR and increasing the buffer sizes to 64 MiB does the trick in most cases.
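On Linux that's roughly the following knobs (a sketch -- needs root, applies system-wide, and the exact values are debatable; 67108864 is 64 MiB):

  sysctl -w net.ipv4.tcp_congestion_control=bbr
  sysctl -w net.core.rmem_max=67108864
  sysctl -w net.core.wmem_max=67108864
  sysctl -w net.ipv4.tcp_rmem="4096 131072 67108864"
  sysctl -w net.ipv4.tcp_wmem="4096 131072 67108864"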
by digiown
2/2/2026 at 5:48:47 PM
Have you tried searching for "tcp-kind"?
by tclancy
2/2/2026 at 10:16:12 PM
Looks unmaintained.
Can we throw a bunch of AI agents at it? This sounds like a pretty tightly defined problem, much better than wasting tokens on re-inventing web browsers.
by Onavo
2/2/2026 at 4:17:05 PM
Was the torrent protocol considered at some point? Always surprised how little presence it has in the industry considering how good the technology is.
2/2/2026 at 5:17:27 PM
If you strip out the swarm logic (i.e. downloading from multiple peers), you're just left with a protocol that transfers big files via chunks, so there's no reason that'd be faster than any other sort of download manager that supports multi-threaded downloads.
by gruez
2/2/2026 at 10:04:56 PM
Aspera did the chunking and encryption for you, and it looked and acted like SFTP.
The cost of leaking data was/is catastrophic (as in company-ending), so paying a bit of money to guarantee that your data was being sent to the right place (point to point) and couldn't leak was a worthwhile tradeoff.
For point-to-point transfers, torrenting is a lot higher overhead than you want. Plus, most clients have an anti-leeching setting, so you'd need not only a custom client, but a custom protocol as well.
The idea is sound though: have an index file with a list of chunks to pull over multiple TCP connections.
by KaiserPro
2/2/2026 at 4:45:44 PM
torrent is great for many-to-one type downloads but I assume GP is talking about single machine to single machine transfers.
by ambicapter
2/2/2026 at 10:29:37 PM
So what do you use now in the film industry?
by robaato
2/2/2026 at 11:01:43 PM
I'm in a tiny part of the film industry. Bigger clients lend us licenses to Aspera and FileCatalyst when receiving files from them, but for our own trans-oceanic transfers I dug up an ancient program called Tsunami UDP and fixed it up just enough.
by magarnicle
2/2/2026 at 10:38:45 PM
I suspect mostly Aspera because there are still no good alternatives.
by mprovost
2/2/2026 at 5:49:01 PM
Aspera's FASP [0] is very neat. One drawback is that, since the TCP-style work isn't done the traditional way, it must be done on the CPU: say one packet is missing or packets arrive out of order, the Aspera client fixes that instead of all of it being handled by TCP.
As I understand it, this is also the approach of WEKA.io [1]. Another approach is RDMA [2], used by storage systems like Vast, which pushes those ordering and resend tasks to NICs that support RDMA so that applications can read and write directly to the network instead of to system buffers.
0. https://en.wikipedia.org/wiki/Fast_and_Secure_Protocol
1. https://docs.weka.io/weka-system-overview/weka-client-and-mo...
2. https://en.wikipedia.org/wiki/Remote_direct_memory_access
by adolph
2/2/2026 at 10:49:16 PM
FASP uses forward error correction instead of retransmission. So instead of waiting for something not to show up on the other end and sending it again, it calculates parity and transmits slightly more data up front, with enough redundancy that the receiving end is capable of reconstructing any missing bits. This is basically how all storage systems work, not just Weka. You calculate enough parity bits to be able to reconstruct the missing data when a drive fails. The more disks you have, the smaller the parity overhead is. Object storage like S3 does this on a massive scale. With a network transfer you typically only need a few percent, unless it's really lossy like Wifi, in which case standards like 802.11n are doing FEC for you to reduce retransmissions at the TCP layer.
by mprovost
2/3/2026 at 4:23:24 PM
In RDMA, are the NICs able to perform the reconstruction, or does that use a different mechanism to avoid the CPU?
by adolph
2/2/2026 at 4:30:08 PM
> Note there is no intrinsic reason running multiple streams should be faster than one.
The issue is the serialization of operations. There is overhead for each operation which translates into dead time between transfers.
However, there are issues that can cause single streams to underperform multiple streams in the real world once you reach a certain scale or face problems like packet loss.
by Aurornis
2/2/2026 at 5:57:09 PM
Is it certain that this is the reason?
rsync's man page says "pipelining of file transfers to minimize latency costs" and https://rsync.samba.org/how-rsync-works.html says "Rsync is heavily pipelined".
If pipelining is really in rsync, there should be no "dead time between transfers".
by nh2
2/2/2026 at 7:20:58 PM
The simple model for scp and rsync (it's likely more complex in rsync): loop over all files; for each file, determine its metadata with fstat, then fopen and copy bytes in chunks until done, then proceed to the next iteration.
I don't know what rsync does on top of that (pipelining could mean many different things), but my empirical experience is that copying 1 1 TB file is far faster than copying 1 billion 1k files (both sum to ~1 TB), and that load balancing/partitioning/parallelizing the tool when copying large numbers of small files leads to significant speedups, likely because the per-file overhead is hidden by the parallelism (in addition to dealing with individual copies stalling due to TCP or whatever else).
I guess the question is whether rsync is using multiple threads or otherwise accessing the filesystem in parallel, which I do not think it does, while tools like rclone, kopia, and aws sync all take advantage of parallelism (multiple ongoing file lookups and copies).
by dekhn
2/3/2026 at 1:40:15 AM
> I don't know what rsync does on top of that (pipelining could mean many different things), but my empirical experience is that copying 1 1 TB file is far faster than copying 1 billion 1k files (both sum to ~1 TB), and that load balancing/partitioning/parallelizing the tool when copying large numbers of small files leads to significant speedups, likely because the per-file overhead is hidden by the parallelism (in addition to dealing with individual copies stalling due to TCP or whatever else).
That's because of fast paths:
- For a large file, assuming the disk isn't fragmented to hell and beyond, there isn't much to do for rsync / the kernel: the source reads data and copies it to the network socket, the receiver copies data from the incoming network socket to the disk, the kernel just dumps it in sequence directly to the disk, that's it.
- The slightly less performant path is on a fragmented disk. The source and the network still don't have much to do, but the kernel has a bit more work every now and then to find a contiguous block on the disk to write the data to. For spinning rust HDDs, the disk also has to do some seeking.
- Many small files? Now that's more nasty. First, the source side has to do a lot of stat(2) calls to get basic attributes of the file. For HDDs, that seeking can incur a sometimes significant latency penalty as well. Then, this information needs to be transferred to the destination, the destination has to do the same stat call again, and then the source needs to transfer the data, involving more seeking, and the destination has to write it.
- The utter worst case is when the files are numerous and small, but large enough not to fit into an inode as inline data [1]. That means two writes and thus two seeks per small file. Utterly disastrous for performance.
And that's before stepping into stuff such as systems disabling write caches, soft-RAID (or the impact of RAID in general), journaling filesystems, filesystems with additional metadata...
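A quick way to see this from userspace (just a diagnostic sketch, paths made up):

  strace -c -f rsync -a src/ dst/

Compare the syscall summary for a few big files vs. many small ones -- in the latter case the per-file calls (lstat, open, ...) dominate.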
[1] https://archive.kernel.org/oldwiki/ext4.wiki.kernel.org/inde...
by mschuster91
2/2/2026 at 7:38:07 PM
> I guess the question is whether rsync is using multiple threads or otherwise accessing the filesystem in parallel
No, that is not the question. Even Wikipedia explains that rsync is single-threaded. And even if it were multithreaded "or otherwise" used concurrent file IO:
The question is whether rsync _transmission_ is pipelined or not, meaning: Does it wait for 1 file to be transferred and acknowledged before sending the data of the next?
Somebody has to go check that.
If yes: Then parallel filesystem access won't matter, because a network roundtrip has brutally higher latency than reading data sequentially off an SSD.
by nh2
2/2/2026 at 7:58:13 PM
Note that rsync on many small files is slow even within the same machine (across two physical devices), suggesting that the network roundtrip latency is not the major contributor.
by dekhn
2/3/2026 at 1:23:25 AM
The original post only mentions 3564 files and rsync spending 8 minutes on that. This just doesn't check out.
by nh2
2/3/2026 at 12:41:55 AM
The filesystem access and general threading are the question, because transmission is pipelined and not a thing "somebody has to go check". You just quoted the documentation for it.
The dead time isn't waiting for network round trips between files, it's parts of the program that sometimes can't keep up with the network.
by Dylan16807
2/3/2026 at 1:20:25 AM
I quoted documentation that claims _something_ is pipelined.
That is extremely vague about what that something is, and I also didn't check that it's true.
Both the original claim "the issue is the serialization of operations" and the counter-claim sound like extreme guesswork to me. If you know for certain, please link the relevant code.
Otherwise somebody needs to go check what it actually does; everything else is just speculating "oh surely it's the files" and then people remember stuff that might just be plain wrong.
by nh2
2/3/2026 at 1:39:39 AM
Speculation isn't the most useful thing, but saying "that is not the question" to valid speculation is even less useful.
by Dylan16807
2/2/2026 at 7:08:02 PM
I’m not sure why, but just like with scp, I’ve achieved significant speed-ups by tarring the directory first (optionally compressing it), transferring it, and then unpacking. Maybe because it lets the tar-and-send side and the receive-and-untar side happen on different threads?
2/2/2026 at 11:20:33 PM
One of my "goto" tools is copying files over a "tar pipe". This avoids the temporary tar file. Something like:

  tar cf - *.txt | ssh user@host tar xf - -C /some/dir/
by poke646
2/3/2026 at 4:20:55 AM
I've never verified this, but it feels like scp starts a new TCP connection per file. If that's the case, then scp-ing a tarred directory would be faster because you only hit the slow start once. https://www.rfc-editor.org/rfc/rfc5681#section-3.1
by dnmc
2/2/2026 at 7:23:00 PM
It's typically a disk-latency thing, as just stat-ing the many files in a directory can have significant latency implications (especially on spinning HDDs) vs opening a single file (the tar) and read()-ing that one file into memory before writing to the network.
If copying a folder with many files is slower than tarring that folder and then moving the tar (but not counting the untar), then disk latency is your bottleneck.
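One way to sanity-check where the time goes (a rough sketch, paths made up): time the network copy of the tree against a local tar that merely reads it:

  time rsync -a manyfiles/ user@host:/dst/
  time tar cf - manyfiles/ | wc -c

If the second run is nearly as slow as the first, the per-file disk work, not the network, is the bottleneck.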
by lelandbatey
2/2/2026 at 9:07:15 PM
Not useful very often, but fast and kind of cool: You can also just netcat the whole block device if you want a full filesystem copy anyway. Optionally zero all empty space beforehand with a tool like zerofree, and use on-the-fly compression / decompression with lz4 or lzo. Of course, none of the block devices should be mounted, though you could probably get away with a source that's mounted read-only.
dd is not a magic tool that can deal with block devices while others can't. You can just cp myLinuxInstallDisk.iso to /dev/myUsbDrive, too.
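Roughly like this (a sketch from memory -- the listen flags differ between netcat variants, and the device names here are made up, so double-check them):

  nc -l -p 9000 | lz4 -d > /dev/sdX          # on the receiving machine
  lz4 < /dev/sdY | nc receiver-host 9000     # on the sending machine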
by ahartmetz
2/2/2026 at 10:08:04 PM
Okay. In this case the whole operation is faster end to end. That includes the time it takes to tar and untar. Maybe those programs do something more efficient in disk access than scp and rsync?
by spockz
2/3/2026 at 8:25:16 AM
Also handy to note that tar can handle sparse files, whereas scp doesn't.
by ndsipa_pomu
2/2/2026 at 5:51:21 PM
The ideal solution to that is pipelining but it can be complex to implement.
by wmf
2/3/2026 at 4:11:36 PM
> SSH was never really meant to be a high performance data transfer tool, and it shows.
A practical example can be `ssh -X` vs X11 over Wireguard. The lag is obvious with the former, but X11 windows from remote clients can be indistinguishable performance-wise from those of local ones with the latter.
by jolmg
2/2/2026 at 4:05:43 PM
The author tried running the rsyncd daemon, so it's not _just_ the ssh protocol.
by yegle
2/3/2026 at 1:15:03 PM
Great point, and even beyond that I think (based on the paths) it was just a command line invocation, with something like NFS handling all the networking.
by ruperthair
2/2/2026 at 4:23:50 PM
Uhh.. I work with this stuff daily and there are a LOT of intrinsic reasons a single stream would be slower than running multiple: MPLS ECMP hashing you over a single path, a single loss event with a high BDP causing congestion control to kick in for a single flow, CPU IRQ affinity, probably many more I'm not thinking of, like the inner workings of NIC offloading queues.
Source: Been in big tech for roughly ten years now trying to get servers to move packets faster
by oceanplexian
2/2/2026 at 4:39:33 PM
Ha, it sounds like the best way to learn something is to make a confident and incorrect claim :)
> MPLS ECMP hashing you over a single path
This is kinda like the traffic shaping I was talking about though, but fair enough. It's not an inherent limitation of a single stream, just a consequence of how your network is designed.
> a single loss event with a high BDP
I thought BBR mitigates this. Even if it doesn't, I'd still count that as a TCP stack issue.
At a large enough scale I'd say you are correct that multiple streams are inherently easier to optimize throughput for. But probably not on a single 1-10 Gbit link.
by digiown
2/2/2026 at 7:59:08 PM
> This is kinda like the traffic shaping I was talking about though, but fair enough. It's not an inherent limitation of a single stream, just a consequence of how your network is designed.
It is. One stream gets your traffic onto one path to the infrastructure. Multiple streams get you multiple paths, and possibly also hit different servers to accelerate it even more. It's just that the limitation isn't hardware but "our networking devices have 4x 10Gbit ports instead of a single 40Gbit port".
Especially if the link is saturated, you'd essentially be taking n times your "fair share" of bandwidth on the link.
by PunchyHamster
2/2/2026 at 5:39:05 PM
> It almost always indicates some bottleneck in the application or TCP tuning.
Yeah, this has been my experience with low-overhead streams as well.
Interestingly, I see this "open more streams to send more data" pattern all over the place in file transfer tooling.
Recent ones that come to mind have been BackBlaze's CLI (B2) and taking a peek at Amazon's SDK for S3 uploads with Wireshark. (What do they know that we don't seem to think we know?)
It seems like they're all doing this? Which is maybe odd, because when I analyse what Plex or Netflix is doing, it's not the same? They do what you're suggesting, tune the application + TCP/UDP stack. Though that could be due to their 1-to-1 streaming use case.
There is overhead somewhere and they're trying to get past it via semi-brute-force methods (in my opinion).
I wonder if there is a serialization or loss handling problem that we could be glossing over here?
by softfalcon
2/2/2026 at 10:07:23 PM
Memory and CPU are cheap (up to a point), so why not just copy/paste TCP streams. It neatly fits into multi-processing/threading as well.
When we were doing 100TB backups of storage servers, we had a wrapper that ran multiple rsyncs over the file system; that got throughput up to about 20 Gbit/s over the LAN.
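Conceptually something like this (not the actual wrapper, just the idea -- one rsync per top-level directory, 8 at a time, paths made up):

  ls /srv/data | xargs -P 8 -I{} rsync -a /srv/data/{}/ dest:/srv/data/{}/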
by KaiserPro
2/2/2026 at 7:56:37 PM
That is a different problem. For S3-esque transfers you might very well be limited by the target accepting no more than X MB/s per connection, and so starting parallel streams will make it faster.
I used B2 as the third leg for our backups and pretty much had to give rclone more connections at once, because the defaults were nowhere close to saturating the bandwidth.
by PunchyHamster
2/2/2026 at 6:35:13 PM
Tuning on Linux requires root and is systemwide. I don't think BBR is even available on other systems. And you need to tune the buffer sizes of both ends too. Using multiple streams is just less of a hassle for client users. It can also fool some traffic shaping tools. Internal use is a different story.
by digiown
2/2/2026 at 6:24:19 PM
Not sure about B2, but the AWS S3 SDK not assuming that people will do any tuning makes total sense, cuz in my experience no one is doing that tbh.
by akdev1l
2/2/2026 at 9:19:57 PM
I’ve found that with aws s3 it’s always been painful to get any good speed out of it unless you’re moving massive files.
Its baseline tuning seems to just assume large files, does no auto scaling, and is mostly single threaded.
Then even when tuning it’s still painfully slow, again seemingly limited by its CPU processing and mostly on a single thread, which is highly annoying.
Especially when you’re running it on a machine with lots of cores, fast storage, and a large internet connection.
Just feels like there is a large amount of untapped potential in the machines…
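For reference, the knobs I mean are the CLI's s3 transfer settings, e.g.:

  aws configure set default.s3.max_concurrent_requests 64
  aws configure set default.s3.multipart_chunksize 64MB
  aws configure set default.s3.multipart_threshold 64MB

(Assuming those defaults are part of the problem; in my case bumping them only helped so much.)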
by slightlygrilled
2/2/2026 at 9:56:59 PM
It’s almost certainly also tuned to prevent excessive or "spiky" traffic to their service.
by odo1242
2/2/2026 at 5:50:40 PM
> Note there is no intrinsic reason running multiple streams should be faster than one
If the server side scales (as cloud services do) it might end up using different endpoints for the parallel connections and saturate the bandwidth better. One server instance might be serving other clients as well and can't fill one particular client's pipe entirely.
by yason
2/2/2026 at 4:17:24 PM
Wouldn't lots of streams speed up transfers of thousands of small files?
by Saris
2/2/2026 at 4:44:35 PM
If the application handles them serially, then yeah. But one can imagine the application opening files in threads, buffering them, and then finally sending them at full speed, so in that sense it is an application issue. If you truly have millions of small files, you're more likely to be bottlenecked by disk IO performance than by the application or the network, though. My primary use case for ssh streams is zfs send, which is mostly bottlenecked by ssh itself.
by digiown
2/2/2026 at 5:29:42 PM
It's an application issue, but implementation-wise it's probably way more straightforward to just open a separate network connection per thread.
by catdog
2/2/2026 at 4:02:18 PM
Single-file overheads (opening millions of tiny files whose metadata is not in the OS cache and reading them) appear to be an intrinsic reason (intrinsic to the OS, at least).
by dekhn
2/2/2026 at 4:21:42 PM
IOPS and disk read depth are common limits.
Depending on what you're doing, it can be faster to leave your files in a solid archive that is less likely to be fragmented and get contiguous reads.
by pixl97
2/2/2026 at 8:00:18 PM
The majority of that will be big files. And to NVMe it is VERY fast even if you run single-threaded; 10Gbit should be easy.
by PunchyHamster
2/2/2026 at 6:53:50 PM
I mean, isn't a single TCP connection's throughput limited by the latency? Which is why on high(er) latency WAN links you generally want to open multiple connections for large file transfers.
by patmorgan23
2/3/2026 at 10:29:41 AM
Only simpler transfer protocols, like TFTP, are limited by latency.
The whole reason for the existence of TCP is to overcome the throughput limit determined by latency. On a network with negligible latency there is no need for TCP (you could just send each packet only after the previous is acknowledged), but the higher the throughput of your network interface, the less likely it is that the latency is negligible.
However, for latency not to matter, the TCP windows must be large enough (i.e. the amount of data that is sent before an acknowledgement is received, which happens after a delay caused by latency).
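To put a number on it: the window has to be at least bandwidth x RTT, e.g. 1 Gbit/s at 100 ms of RTT requires roughly 12.5 MB of window -- far more than the old defaults.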
I use Windows very rarely today, so I do not know its current status, but until the Windows XP days it was very frequent for Windows computers to have very small default TCP window sizes, which caused low throughput on high-latency networks, so on such networks they had to be reconfigured.
On high-latency networks, opening multiple connections is just a workaround for not having appropriate network settings. However, even when your own computer is configured optimally, opening multiple connections can be a workaround against various kinds of throttling implemented either by some intermediate ISP or by the destination server, though nowadays most rate limits are applied globally, to all connections from the same IP address, in order to make this workaround ineffective.
by adrian_b
2/2/2026 at 9:51:12 PM
> Note there is no intrinsic reason running multiple streams should be faster than one
Inherent reasons or no, it's been my experience across multiple protocols, applications, network connections and environments, and machines on both ends, that, _in fact_, splitting data up and operating using multiple streams is significantly faster.
So, ok, it might not be because of an "inherent reason", but we still have to deal with it in real life.
by einpoklum