Linux Tip: super-fast network file copy

If you've ever had to move a huge directory containing many files from one server to another, you may have encountered a situation where the copy rate was significantly less than what you'd expect your network could support. Rsync does a fantastic job of quickly syncing two relatively similar directory structures, but the initial clone can take quite a while, especially as the file count increases.

The problem is that there is a certain amount of per-file overhead when using scp or rsync to copy files from one machine to the other. This is not a problem under most circumstances, but if you are attempting to duplicate tens of thousands of files (think, server or database backup), this per-file overhead can really add up. The solution is to copy the files over in a single stream, which normally means tarring them up on one server, copying the tarball, then untarring on the destination. Unless you are under 50% disk utilization on the source server, this could cause you to run out of space.

Brett Jones has an alternative solution, which uses the handy netcat utility:

After clearing up 10 GB of log files, we were left with hundreds of thousands of small files that were going to slow us down. We couldn't tarball the files because of a lack of space on the source server. I started searching around and found this nifty tip that takes out encryption and streams all the files as one large file:


This requires netcat on both servers.

Destination box: nc -l -p 2342 | tar -C /target/dir -xzf -
Source box: tar -czf - /source/dir | nc Target_Box 2342

This causes the source machine to tar the files up and send them over the netcat pipe, where they are extracted on the destination machine, all with no per-file negotiation and no extra disk space used. It's also faster than the usual scp or rsync over ssh because there is no encryption overhead. If you are on a local protected network, this will perform much better, even for large single-file copies.
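
To get a feel for the single-stream approach before opening a port, the same pipeline can be tried on one machine. This is only a sketch: a plain local pipe stands in for the netcat connection, and the temporary directories and file names are made up for illustration.

```shell
#!/bin/sh
# Local sketch of the single-stream copy.
# The "|" below plays the role of the netcat connection between two hosts.
set -e

src=$(mktemp -d)   # hypothetical source directory
dst=$(mktemp -d)   # hypothetical destination directory

# Create a few small files to copy.
mkdir -p "$src/logs"
for i in 1 2 3; do
    echo "entry $i" > "$src/logs/file$i.txt"
done

# Pack on the left, unpack on the right. No tarball is ever written to disk.
tar -czf - -C "$src" . | tar -xzf - -C "$dst"

# Compare the two trees; diff prints nothing when they match.
diff -r "$src" "$dst" && echo "copy verified"
```

Archiving "." relative to -C keeps the paths inside the stream relative, so the files land directly under the destination directory.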

If you are on an unprotected network, however, you may still want your data encrypted in transit. You can perform about the same task over ssh:

Run this on the destination machine:
cd /path/to/extract/to/
ssh user@source.server 'tar -czf - -C /source/path/ .' | tar -xzvf -

This command will issue the tar command across the network on the source machine, causing tar's stdout to be sent back over the network. This is then piped to stdin on the destination machine and the files magically appear in the directory you are currently in.
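
If you do this often, the pull can be wrapped in a small shell function. The sketch below is an assumption-laden convenience, not part of the original tip: the name pull_tree and its three arguments are hypothetical, and the simple quoting breaks if the remote path contains a single quote. It archives "." relative to -C so that no wildcard has to be expanded by the remote shell.

```shell
# pull_tree HOST REMOTE_DIR LOCAL_DIR   (hypothetical helper)
# Streams REMOTE_DIR from HOST over ssh and unpacks it into LOCAL_DIR.
pull_tree() {
    host=$1
    remote_dir=$2
    local_dir=$3

    # Make sure the extraction target exists.
    mkdir -p "$local_dir"

    # Remote tar writes the archive to stdout; ssh carries it back;
    # local tar reads it from stdin and extracts into LOCAL_DIR.
    ssh "$host" "tar -czf - -C '$remote_dir' ." | tar -xzf - -C "$local_dir"
}

# Example call, using the placeholder names from above:
# pull_tree user@source.server /source/path /path/to/extract/to
```

Because the function only reads from ssh's stdout, nothing is staged on the source machine's disk.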

The ssh route is a little slower than using netcat, due to the encryption overhead, but it's still way faster than scping the files individually. It also has the added advantage of potentially being compatible with Windows servers, provided you have a few Unix tools like ssh and tar installed on your Windows server (for example, the Cygwin builds of those binaries).

Fast File Copy - Linux!

Posted by Jason Striegel | Nov 14, 2008 08:40 PM
Linux, Linux Server

Comments

Posted by: Matt Simmons on November 16, 2008 at 7:29 AM

You might want to remove the compression flag

On my GB network, I went from 20MB/s to 400MB/s just by removing the compression flag from my rsync. You might want to try it here, too, with the tar option.

But never mind that, holy cow. I completely forgot about nc. I'm going to give this a shot today. I'm doing a 340GB database copy from one host to another. It typically takes just over 2 hours. I'm willing to bet that netcat can get that down under an hour and a half. I'll be sure to write a blog entry about it. Thanks!

Here is my original post:
http://standalone-sysadmin.blogspot.com/2008/10/answer-to-slow-speeds-from-last-week.html
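
The compression trade-off raised in this comment is easy to measure on your own data. As a rough sketch (the sample file and its contents are invented, and byte counts say nothing about CPU time), this compares how many bytes tar would put on the wire with and without the z flag:

```shell
#!/bin/sh
# Compare the size of the tar stream with and without gzip compression.
set -e

src=$(mktemp -d)   # hypothetical sample directory

# Generate a highly repetitive, log-like file (this compresses very well).
i=0
while [ "$i" -lt 200 ]; do
    echo "the same log line repeated"
    i=$((i + 1))
done > "$src/app.log"

# Count the bytes each variant writes to stdout.
plain=$(tar -cf - -C "$src" . | wc -c | tr -d ' ')
gz=$(tar -czf - -C "$src" . | wc -c | tr -d ' ')

echo "without -z: $plain bytes"
echo "with -z:    $gz bytes"
```

Fewer bytes only helps when the network is the bottleneck. On a fast local link gzip can become the limiting factor, so timing both variants on real data is the more telling test.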

