Dealing with large numbers of files in Unix
Most of the time, you can move a bunch of files from one folder to another with a simple mv command like "mv sourcedir/* destdir/". The problem is that the shell expands that asterisk into the name of every file in the directory, and each name is passed to mv as a separate command-line argument. If sourcedir contains a lot of files, the expanded list can exceed the system's limit on the total size of a command's arguments (ARG_MAX). The command then fails with a somewhat mysterious "Argument list too long" error.
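You can see the limit behind this error on your own machine with the sketch below. It assumes a POSIX sh and the common getconf and seq utilities. The one-million-argument list is an arbitrary size chosen to be far over the limit on typical Linux and OS X systems.

```shell
#!/bin/sh
# Print the system's limit, in bytes, on the combined size of the argument
# list and environment that a new program can be started with.
getconf ARG_MAX

# Build a list of one million arguments (several megabytes of text). The
# shell can expand it in memory, but starting an external program with it
# fails, just as "mv sourcedir/*" does in a huge directory.
if env true $(seq 1 1000000) 2>/dev/null; then
    echo "argument list accepted"
else
    echo "argument list too long"
fi
```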
I ran into this problem recently while trying to manage a directory that had over a million files in it. It's not every day you run across a directory that contains a metric crap-ton of files, but when it happens, there's an easy way to deal with it. The trick is to use the handy xargs program. It reads a list of items from standard input and passes them as arguments to another command, running that command as many times as it takes to get through the list:
find sourcedir -type f -print | xargs -l1 -i mv {} destdir/
The -l1 (that's a lowercase L) tells xargs to pass only one line of input, here one file name, to each mv command. The -i parameter tells xargs to replace the {} with that input. The result is that mv runs once for each file in the directory. Ideally, you would optimize this with something like -l50, so that each mv moves 50 files at a time. That's how I remember xargs working on other Unix systems, but the GNU xargs on my Linux box forces one line per command any time -i is used. (POSIX specifies the same one-line-per-command behavior for -I, and -i is GNU's older, deprecated spelling of -I{}.) Either way, it gets the job done.
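Before pointing xargs at a million real files, you can do a dry run by putting echo in front of the command. The sketch below feeds three made-up names (a.txt, b.txt, c.txt) through the same {} substitution, written here with the POSIX -I{} spelling. It prints one mv per name, which is the one-command-per-file behavior described above. Nothing is actually moved.

```shell
#!/bin/sh
# Dry run: "echo" prints each mv command xargs would run instead of
# running it. The three file names are made up for illustration.
printf 'a.txt\nb.txt\nc.txt\n' | xargs -I{} echo mv {} destdir/
```

This prints:

mv a.txt destdir/
mv b.txt destdir/
mv c.txt destdir/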
Without -i, the -l parameter does work on Linux, but you can no longer use the {} substitution. Instead, xargs places all of the arguments at the end of the command. That's useless when the command needs a final parameter of its own, such as the destination directory for mv. On the other hand, it's exactly what you want for commands that end with the file names, such as batch-removing files with rm.
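For the rm case no placeholder is needed, so batching works with any xargs. The sketch below creates a throwaway directory with mktemp -d and fills it with 120 empty files so that nothing real gets deleted. It uses the POSIX -n 50 option, which counts arguments rather than lines. With find printing one plain name per line the two options behave the same way.

```shell
#!/bin/sh
# Scratch directory with 120 empty files (created here for illustration).
src=$(mktemp -d)
i=1
while [ "$i" -le 120 ]; do
    : > "$src/file$i"
    i=$((i + 1))
done

# Dry run: each output line is one rm invocation (50 + 50 + 20 names),
# so wc -l reports 3.
find "$src" -type f -print | xargs -n 50 echo rm | wc -l

# Real run: xargs appends up to 50 names to the end of each rm command.
find "$src" -type f -print | xargs -n 50 rm
```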
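OS X ships with BSD xargs, which doesn't accept GNU's lowercase -l and -i. You have to use the capitalized -L and -I forms instead (GNU xargs accepts those too). BSD's -I also runs one command per input line, so it won't batch files on its own. What BSD xargs adds is a -J option, which substitutes the whole batch of arguments at the placeholder. To move a bunch of files in OS X, 50 files at a time, try the following:
find sourcedir -type f -print | xargs -n 50 -J {} mv {} destdir/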
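You may want a single command that batches files and keeps the destination last on both GNU and BSD xargs. A widely used workaround is to have xargs run a tiny sh -c script that puts the file names before the destination. The sketch below assumes a POSIX sh. The scratch directories come from mktemp -d, and the names src, dest and d are only illustrative.

```shell
#!/bin/sh
# Scratch source and destination directories (illustration only).
src=$(mktemp -d)
dest=$(mktemp -d)
i=1
while [ "$i" -le 120 ]; do
    : > "$src/file$i"
    i=$((i + 1))
done

# xargs runs: sh -c SCRIPT sh DEST name1 ... name50
# Inside SCRIPT, $1 is DEST; after "shift", "$@" holds just the file names,
# so mv gets the names first and the destination last.
find "$src" -type f -print |
    xargs -n 50 sh -c 'd=$1; shift; mv "$@" "$d"' sh "$dest"
```

The extra "sh" after the script string fills $0 inside the script. Without it, the first argument would be swallowed as the script's name.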
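That's about all there is to it. This is just a basic example, but once you get used to using xargs and find together, it's pretty easy to tweak the find parameters and move files around based on their date, permissions, or file extension. Keep in mind that find descends into subdirectories, so the commands above move files from nested folders directly into destdir. To stay at the top level, add -maxdepth 1 (supported by both GNU and BSD find) right after sourcedir.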
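As an example of such a tweak, the sketch below moves only .log files that haven't been modified in more than 30 days. The directory names, the .log extension, the 30-day cutoff and the backdated 2008 timestamp are all made up for illustration. The -name, -mtime and -perm tests themselves are standard find options.

```shell
#!/bin/sh
# Scratch directories and sample files (illustration only).
src=$(mktemp -d)
dest=$(mktemp -d)
: > "$src/old1.log"
: > "$src/old2.log"
: > "$src/new.log"
: > "$src/old.txt"
# Backdate three of the files to 1 January 2008.
touch -t 200801010000 "$src/old1.log" "$src/old2.log" "$src/old.txt"

# Match regular files named *.log that were last modified more than
# 30 days ago; quoting '*.log' stops the shell from expanding it.
# Other tests combine the same way, e.g. -perm -g+w for group-writable files.
find "$src" -type f -name '*.log' -mtime +30 -print |
    xargs -I{} mv {} "$dest"/
```

Only old1.log and old2.log end up in the destination. new.log is too recent, and old.txt has the wrong extension.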
Posted by Jason Striegel |
Aug 26, 2008 07:22 PM
Linux, Linux Server, Mac
Comments
| Posted by: Pär-Ola Nilsson on August 27, 2008 at 1:46 AM |
You can use "mv --target-directory=destdir/" to get around the 1-file limit in xargs on Linux.
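With GNU coreutils, this suggestion lets xargs go back to appending names at the end of the command, so batching works again. The sketch below assumes GNU mv. --target-directory has the short form -t, and the BSD mv on OS X has neither. The scratch directories are created with mktemp -d for illustration.

```shell
#!/bin/sh
# Scratch directories with 120 empty files (illustration only).
src=$(mktemp -d)
dest=$(mktemp -d)
i=1
while [ "$i" -le 120 ]; do
    : > "$src/file$i"
    i=$((i + 1))
done

# GNU mv takes the destination up front via --target-directory, so xargs
# can append up to 50 file names to the end of each mv command.
find "$src" -type f -print | xargs -n 50 mv --target-directory="$dest"
```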
| Posted by: Marcus on August 27, 2008 at 4:52 AM |
You can also use find's "-exec" parameter. It does the same thing.
| Posted by: xiojason on August 27, 2008 at 2:20 PM |
As Marcus said, you don't need xargs either if you have a reasonably recent, POSIX-compliant find:
find sourcedir -type f -exec mv --target-directory=destdir/ {} +
It's the + that does the magic: find inserts as many file names into each mv command line as will fit.
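The --target-directory option is a GNU mv extension, and the BSD mv that ships with OS X doesn't have it. A commonly used portable form keeps find's {} + batching and lets a small sh -c script put the destination last. The sketch below assumes a POSIX sh, with scratch directories created by mktemp -d. The variable names are only illustrative.

```shell
#!/bin/sh
# Scratch directories (illustration only); one name contains spaces.
src=$(mktemp -d)
dest=$(mktemp -d)
: > "$src/a file with spaces"
: > "$src/plain"

# find appends as many names as fit after: sh -c SCRIPT sh DEST
# SCRIPT saves DEST from $1, shifts it off, and moves the remaining
# arguments (the file names) into it. {} must come right before +.
find "$src" -type f -exec sh -c 'd=$1; shift; mv "$@" "$d"' sh "$dest" {} +
```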
| Posted by: Dave on September 2, 2008 at 11:08 PM |
If you are having trouble with the number of arguments when using
find sourcedir -type f -exec mv --target-directory=destdir {} +
then you can use the same command but with \; instead of +
find sourcedir -type f -exec mv --target-directory=destdir {} \;
This runs a separate "mv" command for each file rather than passing many files as arguments to each "mv" command.
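You can see the difference in how many commands each form starts by swapping in echo. In the sketch below, which uses a mktemp -d scratch directory, \; runs echo once per file. The + form packs all 120 names into a single echo, since they easily fit under the argument limit.

```shell
#!/bin/sh
# Scratch directory with 120 empty files (illustration only).
src=$(mktemp -d)
i=1
while [ "$i" -le 120 ]; do
    : > "$src/file$i"
    i=$((i + 1))
done

# \; runs echo once per file: 120 lines of output.
find "$src" -type f -exec echo {} \; | wc -l

# + packs as many names as fit into each echo: all 120 fit, so 1 line.
find "$src" -type f -exec echo {} + | wc -l
```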
| Posted by: raf on September 13, 2008 at 12:31 AM |
Is there any way to do this for filenames that have spaces in them?
It seems that the spaces in the names don't get escaped automatically, so mv thinks it is looking for two different files that don't exist, instead of one file with a space in its name.
| Posted by: on September 23, 2008 at 6:12 PM |
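If you have spaces in the names, use find's -exec instead of piping through xargs. find passes each name to mv as a single argument, so nothing needs escaping. Quoting the {} doesn't hurt, but it isn't what fixes the problem. For example: find sourcedir -type f -exec mv '{}' destdir/ \;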
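If you want to keep the xargs pipelines, the usual fix for names containing spaces (or quotes, or even newlines) is to have find end each name with a NUL byte and tell xargs to split only on NULs. find's -print0 and xargs's -0 are available in both GNU and BSD (OS X) versions. The sketch below assumes mktemp -d scratch directories and made-up file names.

```shell
#!/bin/sh
# Scratch directories and awkward sample names (illustration only).
src=$(mktemp -d)
dest=$(mktemp -d)
: > "$src/my holiday photo.jpg"
: > "$src/it's here.txt"

# -print0 ends each name with a NUL byte instead of a newline, and -0 makes
# xargs split its input only on NULs, with no special handling of quotes or
# backslashes, so every name reaches mv as a single argument.
find "$src" -type f -print0 | xargs -0 -I{} mv {} "$dest"/
```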