Copy-Item is 27% slower than File Explorer (til.andrew-quinn.me)
47 points by hiAndrewQuinn 2 hours ago | 15 comments
chihuahua 40 minutes ago
To properly appreciate a post like this one, it should ideally be paired with a Raymond Chen post arguing, with Hercule Poirot-style irrefutable logic, that a combination of backwards compatibility with CP/M and some 1990s programming done by raccoons means this messed-up state of affairs is logically the only way it could possibly be.
pixl97 22 minutes ago
It'd be nice if you listed the particular settings used on the commands...

For robocopy, for example, if you're copying small files or a bunch of directories, use the /MT:$number flag (quick example below). It's so massively faster that it's like a different application.

Also, if this is a newer version of Windows that supports SMB3, Explorer is likely using that to copy more in parallel.
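
For anyone unfamiliar with the flag, a hedged illustration (paths are made up): /MT:32 runs 32 copy threads instead of the default 8 (maximum is 128), and /E includes subdirectories, empty ones too.

    robocopy C:\source \\nas\share\dest /E /MT:32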

r1ch 1 hour ago
OP mentions using "Cat 7" cables - please don't buy these. Cat 7 isn't something that exists in TIA/EIA standards, only in ISO/IEC, and it requires GG45 or TERA connectors. Cat 7 with RJ45 connectors isn't standardized, so you have no idea what you're actually getting. Stick with pure copper Cat 6A.
0xC0ncord 19 minutes ago
For what it's worth, I recently bought a spool of CAT7 cable and a bunch of RJ45 connectors and made my own cables that perform well and reliably. I don't know if this was wise in the end but I was able to get what I needed out of it.
someguyiguess 38 minutes ago
What about Cat 8? I know it’s not really used in consumer grade applications but is it in TIA/EIA standards?
r1ch 17 minutes ago
Yes, that's standardized but is only rated for up to 30 meters at the higher speeds you get from it, so it's not very useful outside of server room / data center applications and you probably want to be using fiber at that point.
kichik 1 hour ago
Invoke-WebRequest is also very slow if you forget to disable the progress bar with $ProgressPreference = 'SilentlyContinue'

PowerShell has some "interesting" design choices...
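
For reference, the workaround looks like this (URL and filename are placeholders):

    $ProgressPreference = 'SilentlyContinue'   # suppress the progress bar for this session
    Invoke-WebRequest -Uri 'https://example.com/file.zip' -OutFile 'file.zip'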

Almondsetat 0 minutes ago
Came here to post this, and it's even more egregious when you realize curl is an alias for Invoke-WebRequest.
Lariscus 1 hour ago
It also buffers the downloaded data completely in memory, last time I checked. So downloading a file bigger than the available RAM just doesn't work, and you have to use WebClient instead.
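
Something like this (placeholder URL) sidesteps the buffering by streaming straight to disk:

    $wc = New-Object System.Net.WebClient
    # absolute path avoids a gotcha: WebClient resolves relative paths against
    # the process working directory, not the current PowerShell location
    $wc.DownloadFile('https://example.com/big.iso', "$pwd\big.iso")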

Another fun one is Expand-Archive, which is painfully slow, while using the System.IO.Compression.ZipFile CLR type directly is reasonably fast. PowerShell is really a head-scratcher sometimes.
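
If anyone wants to try the direct CLR route, a minimal sketch (paths hypothetical):

    Add-Type -AssemblyName System.IO.Compression.FileSystem
    [IO.Compression.ZipFile]::ExtractToDirectory('C:\temp\archive.zip', 'C:\temp\out')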

jeroenhd 34 minutes ago
The download being cached in RAM kind of makes sense; curl will do the same (up to a point) if the output stream is slower than the download itself. For a scripting language, I think it makes sense. Microsoft deciding to alias wget to Invoke-WebRequest does make for a rather annoying side effect, but perhaps it was to be expected, as all of their aliases for GNU tools are poor replacements.

I tried to look into the whole Expand-Archive thing, but as of https://github.com/PowerShell/Microsoft.PowerShell.Archive/c... I can't even find the Expand-Archive cmdlet source code anymore. The archive cmdlet files themselves seem to leave "expand" unimplemented. Unless they moved the expand command to another repo for some reason, it looks like the entire command will disappear at some point?

Still, it does look like Expand-Archive was using the plain old System.IO.Compression library for its file I/O, although there is a bit of pre-processing to validate that paths exist and such, which may take a while.

mort96 19 minutes ago
> curl will do the same (up to a point) if the output stream is slower than the download itself

That "up to a point" is crucial. Storing chunks in memory up to some max size as you wait for them to be written to disk makes complete sense. Buffering the entire download in memory before writing to disk at the end doesn't make sense at all.

DHowett 10 minutes ago
tar.exe, however, beats both of those in terms of archive format support and speed.
AHTERIX5000 35 minutes ago
Yep. And 'wget' is often an alias for Invoke-WebRequest in PowerShell. The number of footguns I ran into while trying to get a simple Windows Container CI job running, oh man.
ycombinatrix 29 minutes ago
"curl" being aliased to "Invoke-WebRequest" is also a massive dick move
pixl97 21 minutes ago
Yea, curl.exe and curl are two different commands on Windows. Fun stuff.
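
Easy to verify in Windows PowerShell 5.1 (in PowerShell 7 the alias is gone and Get-Alias curl just errors):

    Get-Alias curl        # curl -> Invoke-WebRequest
    curl.exe --version    # bypasses the alias and runs the actual curl binary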
coffeeaddict 14 minutes ago
I think that's only an issue with Windows PowerShell. PowerShell 7 works just fine.
orthoxerox 1 hour ago
Wasn't something like npm much slower as well when it showed a progress indicator by default?
archi42 1 hour ago
This is atrocious. I get it, some things are less trivial than they seem - but I would be ashamed to ship something like this, and even more ashamed not to fix it.
cheema33 1 hour ago
I am not surprised. My Windows 11 systems with modern and beefy hardware frequently run very slowly for reasons unknown. I used https://github.com/Raphire/Win11Debloat recently and that seemed to help. Windows by default comes with a lot of crap that most of us do not use but that consumes resources anyway.

I have been considering a move back to Linux. It is only Microsoft Teams, which I have to use daily on Windows, that is holding me back.

mft_ 1 hour ago
> I have been considering a move back to Linux. It is only Microsoft Teams, which I have to use daily on Windows, that is holding me back.

Me too. I've not tried this yet, but will soon: https://github.com/IsmaelMartinez/teams-for-linux

cm2187 28 minutes ago
One thing I don't understand with Windows Server is that no matter how fast the NVMe drives I use, or how I pair/pool them, I can't get a normal file copy to go faster than around 1.5GB/s (that's local, no network). The underlying disks show multi-GB/s performance under CrystalDiskMark. But I suspect something in the OS must get in the way.
g-mork 19 minutes ago
If it's over SMB/Windows file sharing then you might be looking at some kind of latency-induced limit. AFAIK SMB doesn't stream uploads; they occur as a sequence of individual write operations, which I'm going to guess also produce an acknowledgement from the other end. It's possible something like this (say, the client waiting for an ACK before issuing a new pending IO) is responsible.

What does iperf say about your client/server combination? If it's capping out at the same level, it's the network; otherwise it's something somewhere else in the stack.
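
If it helps, a basic iperf3 run (hostname is a placeholder):

    iperf3 -s                      # on the server
    iperf3 -c nas.local -t 30      # on the client: 30-second throughput test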

I noticed recently that OS X file IO performance is absolute garbage because of all the extra protection functionality they've been piling into newer versions. No idea how any of it works; all I know is that some background process burns CPU just from simple operations like recursively listing directories.

cm2187 10 minutes ago
The problem I describe is local (U.2 to U.2 SSD on the same machine, drives that could easily perform at 4GB/s read/write, and even when I pool them in RAID0 in arrays that can do 10GB/s).

Windows has weird behaviors for copying. For example, if I pool some SAS or NVMe SSDs in Storage Spaces parity (~RAID5), the performance in CrystalDiskMark is abysmal (~250MB/s), but a Windows copy will be stable at about 1GB/s over terabytes of data.

So it seems that whatever they do hurts in certain cases and severely limits the upside as well.

DustinEchoes 2 hours ago
Never assume anything done in PowerShell is fast.
sgc 1 hour ago
It's fortunately been years since I have used Windows, but it looks like the old staples are still ahead of the curve:

https://fastcopy.jp/

https://www.codesector.com/teracopy

(I have certainly forgotten at least one...)

abbeyj 1 hour ago
The page is 404 now. It looks like something went wrong when the author was trying to push a small edit to the page. The content is viewable at https://github.com/hiAndrewQuinn/til/blob/main/copy-item-is-...
doormatt 1 hour ago
Works fine for me.
jeroenhd 53 minutes ago
Looking at the source code of Copy-Item (assuming the author is using a recent version of PowerShell) at https://github.com/PowerShell/PowerShell/blob/master/src/Mic... which calls https://github.com/PowerShell/PowerShell/blob/master/src/Sys..., there seems to be quite a bit of (non-OS) logic that takes place before copying across the network. Copying many small files probably triggers some overhead there.

Then, when the copying happens, this seems to be the code that actually copies the file, at least when copying from remote to local, using the default file system provider: https://github.com/PowerShell/PowerShell/blob/master/src/Sys...

Unless I've taken a wrong turn following the many abstraction layers, this file copy seems to involve connecting to a remote server and exchanging the file contents over a base64 stream (?) using nothing but a standard OutputStream to write the contents.

This means that whatever performance improvements Microsoft may have stuffed into their native network filesystem copy operations don't seem to get triggered. The impact will probably differ depending on whether you're copying Windows-to-Windows, Samba-to-Windows, or Windows-to-Samba.

I'm no PowerShell guru, but if you can write a (C#?) cmdlet to invoke https://learn.microsoft.com/en-us/windows/win32/api/shellapi... with a properly prepared https://learn.microsoft.com/en-us/windows/win32/api/shellapi... rather than use the native Copy-Item, I expect you'd get the exact same performance you get from Windows Explorer.
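
Assuming those links are SHFileOperationW and SHFILEOPSTRUCT, here's a rough, untested sketch of driving them from PowerShell without even compiling a cmdlet (paths hypothetical):

    # P/Invoke the shell copy API that Explorer itself goes through
    Add-Type -Namespace Win32 -Name Shell -MemberDefinition '
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct SHFILEOPSTRUCT {
        public IntPtr hwnd;
        public uint wFunc;
        public string pFrom;
        public string pTo;
        public ushort fFlags;
        public bool fAnyOperationsAborted;
        public IntPtr hNameMappings;
        public string lpszProgressTitle;
    }
    [DllImport("shell32.dll", CharSet = CharSet.Unicode)]
    public static extern int SHFileOperation(ref SHFILEOPSTRUCT lpFileOp);
    '

    $op = New-Object 'Win32.Shell+SHFILEOPSTRUCT'
    $op.wFunc  = 0x0002                    # FO_COPY
    $op.fFlags = 0x0010                    # FOF_NOCONFIRMATION
    $op.pFrom  = "\\nas\share\src`0"       # list must be double-null-terminated
    $op.pTo    = "C:\dest`0"
    [Win32.Shell]::SHFileOperation([ref]$op)   # returns 0 on success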

However, the other measurements do show some rather weird slowdowns for basic filesystem operations over SFTP or WSL2. I think there's more at play there, as I've never seen sftp fail to reach at least a gigabit given enough time for the window sizes to grow. The NAS itself may not be powerful enough to support many operations per second, limiting the throughput for the other copy tools.

As an alternative, Windows contains an NFS client that can be tuned to be quite fast, which should have minimal performance overhead on Linux if kernel-NFS is available.
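
For example, once the optional "Client for NFS" feature is installed, a mount looks something like this (server and export are made up):

    mount -o nolock \\nas\export Z: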

pixl97 17 minutes ago
>PowerShell guru, but if you can write a (C#?) cmdlet

Yea, I have a workload that occasionally has to delete millions of directories/small files, and we wrote a cmdlet that spawns a huge number of threads to perform the deletes and keep the IOPS saturated. It performs much better than Explorer or other deletion methods.
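
Not the actual cmdlet we wrote, but PowerShell 7's ForEach-Object -Parallel gets you a crude version of the same idea (path hypothetical):

    Get-ChildItem C:\scratch -Recurse -File |
        ForEach-Object -Parallel { Remove-Item -LiteralPath $_.FullName -Force } -ThrottleLimit 64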

bakugo 1 hour ago
Just tried copying a 20GB file to my Windows desktop from a mounted Samba share over gigabit ethernet (NVMe on both sides). Explorer, Copy-Item, and robocopy all saturated the connection with no issues.
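
For anyone who wants to reproduce the comparison, Measure-Command keeps the timing consistent (paths are placeholders):

    Measure-Command { Copy-Item '\\nas\share\test.bin' 'C:\temp\' }
    Measure-Command { robocopy '\\nas\share' 'C:\temp' test.bin }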

There's definitely something off about OP's setup, though I have no idea what it could be. I'd start by checking the latency between the machines. Might also be the network adapter or its drivers.

ninkendo 1 hour ago
My first thought would be some kind of "security" software (maybe even as simple as Windows Defender) inspecting the files as they're coming in, which might be done for any process not on some allow-list. And maybe the allow-list is basically just "explorer.exe". And maybe it's faster at checking some processes than others.
kachapopopow 1 hour ago
rsync being that much slower makes no sense, since back when I used Windows, rsync was saturating 1 gig easily; this has to be running on a very slow Pentium or something.
zaptheimpaler 1 hour ago
ugh, I don't know why copying files and basic I/O is so fucked on Windows. Recently I was trying to copy some large movie files between 2 folders on an NVMe SSD formatted as exFAT, in a good USB-C enclosure connected over a 20Gbps USB-C port, and Explorer would literally just freeze & crash doing that. I had to copy one file at a time to make it not crash, and even then it would show this weird I/O pattern where the transfer would do almost nothing for 1-2 minutes before the speed eventually picked up.

This isn't even going into WSL. I specifically stopped using WSL and moved to a separate Linux devbox because of all the weirdness and slowness with filesystem access across the WSL boundary. Something like listing a lot of files would be very slow, IIRC. Slightly tangentially, the whole situation around sharing files across OSes is pretty frustrating. The only filesystem that works without 3rd-party paid drivers on all 3 major OSes is exFAT, and that is limited in many other ways compared to ext4 or NTFS.

jeroenhd 29 minutes ago
Explorer freezing halfway through copying happens all the time for me; usually it means Windows' I/O buffer is full and the drive is taking its sweet time actually doing the data transfers. Windows will happily show you gigabytes per second being copied to a USB 2.0 drive if your RAM is empty enough, but it'll hang when it tries to flush.

Sometimes it's interference, sometimes the backing SSD is just a lot slower than it says on the box. I've also seen large file transfers (hundreds of gigabytes) expose bad RAM as caches would get filled and cleared over and over again.

You should be able to go into the Windows settings and reduce the drive cache. Copying will be slower, but behaviour will be more predictable.

toast0 1 hour ago
This feels like USB 3 SuperSpeed flakiness. Did you do all the usual things of trying different ports, moving sources of interference, etc.? Front ports at SuperSpeed are typically the most trouble.
kg 1 hour ago
> SFTP is an encrypted protocol, so maybe those CPU cycles add up to a lot of extra work over time or slowdown. That… shouldn’t feel convincing to anyone who gives it more than 15 seconds of thought, but we all live with our eyes wide shut at times.

FWIW, I previously spent some time trying to get the maximum possible throughput when copying files between a Windows host and a Linux VM, and the encryption used by most protocols did actually become a bottleneck eventually. I expect this isn't a big factor on 1gbps ethernet, but I've never measured it.
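
A quick way to sanity-check whether the cipher could ever be the cap, if you have openssl handy:

    openssl speed -evp aes-256-gcm
    openssl speed -evp chacha20-poly1305   # if your OpenSSL build supports it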

r1ch 1 hour ago
The bottleneck with SFTP / SCP / SSH is usually the server software - SSH can multiplex streams, so it implements its own TCP-style sliding windows for channel data. Unfortunately OpenSSH and similar server implementations suffer from the exact same problems that TCP did, where the windows don't scale up to modern connection speeds, so the maximum data in-flight quickly gets limited at higher BDPs.

HPN-SSH[1] resolves this but isn't widely deployed.

[1] https://www.psc.edu/hpn-ssh-home/

itsthecourier 1 hour ago
I want to see rsync under WSL 1 in that comparison.

The filesystem should be faster in WSL2, but not if the file resides on the Windows side of the path, I think.
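
Something like this would cover the WSL leg of the comparison, assuming rsync is installed in the default distro (paths and host are made up):

    wsl rsync -a --info=progress2 /mnt/c/data/ user@nas:/volume1/backup/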