That’s actually an incredibly complex topic. The simple answer is ‘it depends’.
In general, swap is used when there isn’t enough memory to back the pages actually in use. The kernel doesn’t limit the pages requested, but only actually allocates a page when it’s first written. So when a program first writes to a page it requested and there isn’t a free page ready to go, the kernel has to scrounge one up.
The first complicating feature is disk cache. When a file is read or written, a copy of its contents lingers in memory, in case it’s needed again soon. Normally, disk cache has lower priority than program memory, so if the kernel needs to scrounge a page, it’ll drop disk cache first. Setting vm.swappiness to 100 makes it consider disk cache and program pages on equal footing. It may discard a page of disk cache, if there’s one that hasn’t been used in a while. If there’s a program page that’s older, it’ll write that to swap instead.
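To see how this balance looks on a given machine, here is a minimal Python sketch. It assumes a Linux system where /proc/meminfo has the usual "Name: value kB" layout; the function names are my own, not anything standard.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value}, values in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(meminfo: dict[str, int]) -> dict[str, int]:
    """Pull out the two numbers relevant to the cache-versus-swap trade-off."""
    return {
        "cached_kb": meminfo.get("Cached", 0),
        "swap_used_kb": meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0),
    }


def read_live() -> dict[str, int]:
    """Read the live values; only meaningful on Linux."""
    info = summarize(parse_meminfo(Path("/proc/meminfo").read_text()))
    info["swappiness"] = int(Path("/proc/sys/vm/swappiness").read_text())
    return info
```

Calling read_live() before and during a big copy should show whether the cache is growing while swap use climbs.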
The next complicating feature is zram/zswap/zcache. These are three similar mechanisms (zcache is going away) that let the kernel keep some compressed pages around, getting a 2-3x space gain in exchange for extra CPU time. Of the three, zram is both the most flexible and the most efficient, but its setup is the most complex. Most computers don’t have any of them enabled by default.
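A quick way to check whether either mechanism is active, sketched in Python. It assumes the usual sysfs locations (/sys/module/zswap/parameters/enabled for zswap, /sys/block/zram* for zram devices); the sys_root parameter is my own addition so the function can be pointed at a different directory.

```python
from pathlib import Path


def compression_status(sys_root: Path = Path("/sys")) -> dict[str, object]:
    """Report whether zswap is enabled and which zram devices exist."""
    zswap_flag = sys_root / "module" / "zswap" / "parameters" / "enabled"
    # The kernel exposes boolean module parameters as "Y" or "N".
    zswap_on = zswap_flag.exists() and zswap_flag.read_text().strip() == "Y"
    block = sys_root / "block"
    zram_devs = sorted(p.name for p in block.glob("zram*")) if block.is_dir() else []
    return {"zswap_enabled": zswap_on, "zram_devices": zram_devs}
```

A zram device showing up here only means the device exists; whether it is actually used as swap still has to be checked in /proc/swaps.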
The last complicating feature is cgroups. You can put artificial limits on memory use by programs (and you want to do so if you run a swap file or especially zram). If you do this, you can set both a soft and a hard cap. If a program group exceeds its soft cap, the kernel will start swapping some of its pages out, but not in a huge hurry. If it hits the hard cap, it’ll be treated as if the whole system had run out of memory, including tripping the OOM killer if needed.
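As a rough illustration of the two caps, here is a Python sketch that assumes cgroup v2, where memory.high plays roughly the role of the soft cap and memory.max the hard cap (on cgroup v1 the file names differ). The function name and the group path in the usage note are hypothetical.

```python
from pathlib import Path


def set_memory_caps(cgroup_dir: Path, soft_bytes: int, hard_bytes: int) -> None:
    """Write a soft (memory.high) and a hard (memory.max) cap for one cgroup."""
    if soft_bytes > hard_bytes:
        raise ValueError("soft cap should not exceed hard cap")
    # Above memory.high the kernel reclaims from the group more aggressively.
    (cgroup_dir / "memory.high").write_text(f"{soft_bytes}\n")
    # At memory.max allocations fail over to reclaim and, if needed, the OOM killer.
    (cgroup_dir / "memory.max").write_text(f"{hard_bytes}\n")
```

For example, set_memory_caps(Path("/sys/fs/cgroup/copyjob"), 256 * 1024**2, 512 * 1024**2) would apply to a group named copyjob, assuming that group already exists and you have write access to it (usually root).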
You’ve not given enough data to answer why swap is higher, but it’s probably due to your vm.swappiness value combined with the large amount of disk cache that’s likely to be in use from the disk copy. If you want to avoid it, you could use cgroups to limit the amount of disk cache the copy program uses (since it won’t be looking at the data again once written, it doesn’t need much cache).
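One way to try this without creating a cgroup by hand is to let systemd do it. The sketch below is in Python and assumes a systemd-based system on cgroup v2, where page cache is charged to the group that touches it. The 256M figure and the function names are only illustrative, and it may need root (or --user with the memory controller delegated).

```python
import subprocess


def capped_copy_command(src: str, dst: str, soft_cap: str = "256M") -> list[str]:
    """Build a systemd-run invocation that runs cp in its own capped scope."""
    return [
        "systemd-run", "--scope",
        "-p", f"MemoryHigh={soft_cap}",  # soft cap on the scope, cache included
        "cp", "--", src, dst,
    ]


def run_capped_copy(src: str, dst: str, soft_cap: str = "256M") -> None:
    """Run the copy; raises CalledProcessError if systemd-run or cp fails."""
    subprocess.run(capped_copy_command(src, dst, soft_cap), check=True)
```

With the cap in place, the copy’s own cache pages should be reclaimed first once it goes over the limit, rather than pushing other programs’ pages out to swap.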