Background
I ran out of space on /home/data and need to transfer /home/data/repo to /home/data2.
/home/data/repo contains 1M dirs, each of which contain 11 dirs and 10 files. It totals 2TB.
/home/data is on ext3 with dir_index enabled. /home/data2 is on ext4.
Running CentOS 6.4.
I assume these approaches are slow because repo/ has 1 million dirs directly underneath it.
Attempt 1: mv is fast but gets interrupted
I could be done if this had finished:
/home/data> mv repo ../data2
But it was interrupted after 1.5TB was transferred. It was writing at about 1GB/min.
Attempt 2: rsync crawls after 8 hours of building file list
/home/data> rsync --ignore-existing -rv repo ../data2
It took several hours to build the 'incremental file list', and then it transfers at 100MB/min.
I cancel it to try a faster approach.
Attempt 3a: mv complains
Testing it on a subdirectory:
/home/data/repo> mv -f foobar ../../data2/repo/
mv: inter-device move failed: '(foobar)' to '../../data2/repo/foobar'; unable to remove target: Is a directory
I'm not sure what this error is about, but maybe cp can bail me out..
Attempt 3b: cp gets nowhere after 8 hours
/home/data> cp -nr repo ../data2
It reads the disk for 8 hours, and I decide to cancel it and go back to rsync.
Attempt 4: rsync crawls after 8 hours of building file list
/home/data> rsync --ignore-existing --remove-source-files -rv repo ../data2
I used --remove-source-files thinking it might make it faster if I start cleanup now.
It takes at least 6 hours to build the file list then it transfers at 100-200MB/min.
But the server was burdened overnight and my connection closed.
Attempt 5: THERE'S ONLY 300GB LEFT TO MOVE, WHY IS THIS SO PAINFUL
/home/data> rsync --ignore-existing --remove-source-files -rvW repo ../data2
Interrupted again. The -W almost seemed to make "sending incremental file list" faster, which to my understanding shouldn't make sense. Regardless, the transfer is horribly slow and I'm giving up on this one.
Attempt 6: tar
/home/data> nohup tar cf - . | (cd ../data2; tar xvfk -)
Basically attempting to re-copy everything but ignoring existing files. It has to wade through 1.7TB of existing files, but at least it's reading at 1.2GB/min.
So far, this is the only command which gives instant gratification.
Update: interrupted again, somehow, even with nohup..
Attempt 7: harakiri
Still debating this one
Attempt 8: scripted 'merge' with mv
The destination dir had about 120k empty dirs, so I ran
/home/data2/repo> find . -type d -empty -exec rmdir {} \;
Ruby script:
SRC = "/home/data/repo"
DEST = "/home/data2/repo"
`ls #{SRC} --color=never > lst1.tmp`
`ls #{DEST} --color=never > lst2.tmp`
`diff lst1.tmp lst2.tmp | grep '<' > /home/data/missing.tmp`
t = `cat /home/data/missing.tmp | wc -l`.to_i
puts "Todo: #{t}"
# Manually `mv` each missing directory
File.open('/home/data/missing.tmp').each do |line|
  dir = line.strip.sub('< ', '')
  puts `mv #{SRC}/#{dir} #{DEST}/`
end
DONE.
133 Answers
Ever heard of splitting large tasks into smaller tasks?
/home/data/repo contains 1M dirs, each of which contain 11 dirs and 10 files. It totals 2TB.
rsync -a /source/1/ /destination/1/
rsync -a /source/2/ /destination/2/
rsync -a /source/3/ /destination/3/
rsync -a /source/4/ /destination/4/
rsync -a /source/5/ /destination/5/
rsync -a /source/6/ /destination/6/
rsync -a /source/7/ /destination/7/
rsync -a /source/8/ /destination/8/
rsync -a /source/9/ /destination/9/
rsync -a /source/10/ /destination/10/
rsync -a /source/11/ /destination/11/
(...)
Coffee break time.
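When the subdirectory names aren't literally 1..N, the same split can be scripted. A minimal sketch, assuming the paths from the question and using --ignore-existing so an interrupted run can simply be restarted (the function name is made up for illustration):

```shell
#!/bin/sh
# Sketch: one rsync per top-level subdirectory, so each invocation
# only ever builds a small file list instead of one for 1M dirs.
copy_per_dir() {
    src=$1 dst=$2
    mkdir -p "$dst"
    for d in "$src"/*/; do
        # Copy this subtree on its own; --ignore-existing means a
        # re-run after an interruption skips what's already there.
        rsync -a --ignore-existing "$d" "$dst/$(basename "$d")/"
    done
}

# Example invocation with the question's paths:
# copy_per_dir /home/data/repo /home/data2/repo
```

The shell loop over a million entries is cheap compared to the transfers; the win is that no single rsync has to enumerate the whole tree before starting.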
This is what is happening:
- Initially rsync will build the list of files.
- Building this list is really slow, due to an initial sorting of the file list.
- This can be avoided by using ls -f -1, either combining it with xargs to build the set of files rsync will use, or redirecting the output to a file containing the file list.
- Passing this list to rsync instead of the folder makes rsync start working immediately.
- This trick of ls -f -1 over folders with millions of files is perfectly described in this article:
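A sketch of that idea, assuming GNU ls (-f skips sorting) and rsync's --files-from option; the helper name is made up, and note that -a does not imply -r when --files-from is in use, so -r is passed explicitly:

```shell
#!/bin/sh
# Sketch: build the file list with an unsorted ls, then hand it to
# rsync via --files-from so the transfer starts immediately.
fast_rsync() {
    src=$1 dst=$2 list=$(mktemp)
    # -f disables sorting (and implies -a, so filter out . and ..)
    ls -f -1 "$src" | grep -v '^\.\.\?$' > "$list"
    # --files-from entries are relative to the source argument; -r is
    # needed explicitly because --files-from suppresses -a's recursion
    rsync -a -r --ignore-existing --files-from="$list" "$src"/ "$dst"/
    rm -f "$list"
}

# Example invocation with the question's paths:
# fast_rsync /home/data/repo /home/data2/repo
```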
Even if rsync is slow (why is it slow? maybe -z will help) it sounds like you've gotten a lot of it moved over, so you could just keep trying:
If you used --remove-source-files, you could then follow up by removing the empty directories: --remove-source-files removes all the files but leaves the directories behind.
Just make sure you DO NOT use --remove-source-files with --delete to do multiple passes.
Also for increased speed you can use --inplace
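That follow-up cleanup of the directory skeleton left behind by --remove-source-files can be done in one pass with find's -empty and -delete (GNU/BSD extensions); -delete implies a depth-first walk, so child directories empty out before their parents are tested. A sketch, with a made-up function name:

```shell
#!/bin/sh
# Sketch: prune the empty directories that remain after a
# --remove-source-files pass has moved all the files away.
prune_empty_dirs() {
    # -delete implies -depth, so nested empty dirs collapse upward
    # in a single pass; -mindepth 1 spares the top directory itself.
    find "$1" -mindepth 1 -type d -empty -delete
}

# Example invocation with the question's path:
# prune_empty_dirs /home/data/repo
```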
If you're getting kicked out because you're trying to do this remotely on a server, go ahead and run this inside a 'screen' session. At least that way you can let it run.