Linux Tip: super-fast network file copy
If you've ever had to move a huge directory containing many files from one server to another, you may have encountered a situation where the copy rate was significantly less that what you'd expect your network could support. Rsync does a fantastic job of quickly syncing two relatively similar directory structures, but the initial clone can take quite a while, especially as the file count increases.
The problem is that there is a certain amount of per-file overhead when using scp or rsync to copy files from one machine to the other. This is not a problem under most circumstances, but if you are attempting to duplicate tens of thousands of files (think, server or database backup), this per-file overhead can really add up. The solution is to copy the files over in a single stream, which normally means tarring them up on one server, copying the tarball, then untarring on the destination. Unless you are under 50% disk utilization on the source server, this could cause you to run out of space.
Brett Jones has an alternative solution, which uses the handy netcat utility:
After clearing up 10 GBs of log files, we were left with hundreds of thousands of small files that were going to slow us down. We couldn't tarball the file because of a lack of space on the source server. I started searching around and found this nifty tip that takes our encryption and streams all the files as one large file:
This requires netcat on both servers.Destination box: nc -l -p 2342 | tar -C /target/dir -xzf -
Source box: tar -cz /source/dir | nc Target_Box 2342
This causes the source machine to tar the files up and send them over the netcat pipe, where they are extracted on the destination machine, all with no per-file negotiation or unnecessary disk space used. It's also faster than the usual scp or rsync over scp because there is no encryption overhead. If you are on a local protected network, this will perform much better, even for large single-file copies.
If you are on an unprotected network, however, you may still want your data encrypted in transit. You can perform about the same task over ssh:
Run this on the destination machine:
cd /path/to/extract/to/
ssh user@source.server 'tar -cz -C /source/path/ *' | tar -zxv
This command will issue the tar command across the network on the source machine, causing tar's stdout to be sent back over the network. This is then piped to stdin on the destination machine and the files magically appear in the directory you are currently in.
The ssh route is a little slower than using netcat, due to the encryption overhead, but it's still way faster than scping the files individually. It also has the added advantage of potentially being compatible with Windows servers, provided you have a few of the unix tools like ssh and tar installed on your Windows server (using the cygwin linked binaries that are available).
Posted by Jason Striegel |
Nov 14, 2008 08:40 PM
Linux, Linux Server |
Permalink
| Comments (1)
Recent Entries
- Minty soldering jig
- Selecting row number in MySQL
- iPhone 3G software unlock
- Python on Android
- Controlling Sony camcorders with the Arduino
- Gradient text effect in CSS
- Retro gaming emulators that include (legal) ROMs?
- Das DereLicht - ham radio transmitter from a CFL bulb
- Using Google App Engine as a personal CDN
- Route-me - Open Source mapping library for iPhone
Comments
Newest comments listed first.
| Posted by: Matt Simmons on November 16, 2008 at 7:29 AM |
On my GB network, I went from 20MB/s to 400MB/s just by removing the compression flag from my rsync. You might want to try it here, too with the tar option.
But nevermind that, holy cow. I completely forgot about nc. I'm going to give this a shot today. I'm doing a 340GB database copy from one host to another. It typically takes just over 2 hours. I'm willing to bet that netcat can get that down under an hour and a half. I'll be sure to write a blog entry about it. Thanks!
Here is my original post:
http://standalone-sysadmin.blogspot.com/2008/10/answer-to-slow-speeds-from-last-week.html
Leave a comment
Bloggers
Welcome to the Hacks Blog!
Categories
- Ajax
- Amazon
- Android
- AppleTV
- arduino
- Astronomy
- Baseball
- BlackBerry
- Blogging
- Body
- Cars
- Cryptography
- Data
- Design
- Education
- Electronics
- Energy
- Events
- Excel
- Excerpts
- Firefox
- Flash
- Flickr
- Flying Things
- Food
- Gaming
- Gmail
- Google Earth
- Google Maps
- Government
- Greasemonkey
- Hacks Series
- Hackszine Podcast
- Halo
- Hardware
- Home
- Home Theater
- iPhone
- iPod
- IRC
- iTunes
- Java
- Kindle
- Knoppix
- Language
- LEGO
- Life
- Lifehacker
- Linux
- Linux Desktop
- Linux Multimedia
- Linux Server
- Mac
- Mapping
- Math
- Microsoft Office
- Mind
- Mind Performance
- Mobile Phones
- Music
- MySpace
- MySQL
- NetFlix
- Network Security
- olpc
- Online Investing
- OpenOffice
- Outdoor
- Parenting
- PCs
- PDAs
- Perl
- Philosophy
- Photography
- PHP
- Pleo
- Podcast
- Podcasting
- Productivity
- PSP
- Retro Computing
- Retro Gaming
- Science
- Screencasts
- Security
- Shopping
- Skype
- Smart Home
- Software Engineering
- Sports
- SQL
- Statistics
- Survival
- TiVo
- Transportation
- Travel
- Ubuntu
- User Interface
- Video
- Virtualization
- Visual Studio
- VoIP
- Web
- Web Site Measurement
- Windows
- Windows Server
- Wireless
- Word
- World
- Xbox
- Yahoo!
- YouTube
Archives
- January 2009
- December 2008
- November 2008
- October 2008
- September 2008
- August 2008
- July 2008
- June 2008
- May 2008
- April 2008
- March 2008
- February 2008
- January 2008
- December 2007
- November 2007
- October 2007
- September 2007
- August 2007
- July 2007
- June 2007
- May 2007
- April 2007
- March 2007
- February 2007
- January 2007
- December 2006
- November 2006
- October 2006
- September 2006
Recent Posts
- Minty soldering jig
- Selecting row number in MySQL
- iPhone 3G software unlock
- Python on Android
- Controlling Sony camcorders with the Arduino
- Gradient text effect in CSS
- Retro gaming emulators that include (legal) ROMs?
- Das DereLicht - ham radio transmitter from a CFL bulb
- Using Google App Engine as a personal CDN
- Route-me - Open Source mapping library for iPhone
www.flickr.com
|





