
Background

I ran out of space on /home/data and need to transfer /home/data/repo to /home/data2.

/home/data/repo contains 1M dirs, each of which contain 11 dirs and 10 files. It totals 2TB.

/home/data is on ext3 with dir_index enabled. /home/data2 is on ext4. Running CentOS 6.4.

I assume these approaches are slow because repo/ has 1 million directories directly underneath it.


Attempt 1: mv is fast but gets interrupted

I could be done if this had finished:

/home/data> mv repo ../data2

But it was interrupted after 1.5TB was transferred. It was writing at about 1GB/min.

Attempt 2: rsync crawls after 8 hours of building file list

/home/data> rsync --ignore-existing -rv repo ../data2

It takes several hours to build the 'incremental file list' and then transfers at 100MB/min.

I cancel it to try a faster approach.

Attempt 3a: mv complains

Testing it on a subdirectory:

/home/data/repo> mv -f foobar ../../data2/repo/
mv: inter-device move failed: '(foobar)' to '../../data2/repo/foobar'; unable to remove target: Is a directory

I'm not sure what this error is about, but maybe cp can bail me out...
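
My best guess at the error (an assumption, not something the mv docs spell out for me): an inter-device move is a copy-then-delete, and mv refuses to replace a target directory that already exists with content. The same refusal can be reproduced on a single filesystem with throwaway mktemp paths (the message differs from the inter-device one, but the behavior matches):

```shell
# Hypothetical repro: mv will not clobber an existing non-empty directory.
SRC=$(mktemp -d)
DEST=$(mktemp -d)
mkdir "$SRC/foobar" "$DEST/foobar"
touch "$DEST/foobar/existing"            # destination dir already has content
mv -f "$SRC/foobar" "$DEST/" 2>/dev/null || echo "mv refused"
```

So mv can't be used to "merge" a directory into an existing one; the contents have to be moved individually instead.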

Attempt 3b: cp gets nowhere after 8 hours

/home/data> cp -nr repo ../data2

It reads the disk for 8 hours and I decide to cancel it and go back to rsync.

Attempt 4: rsync crawls after 8 hours of building file list

/home/data> rsync --ignore-existing --remove-source-files -rv repo ../data2

I used --remove-source-files thinking it might make it faster if I start cleanup now.

It takes at least 6 hours to build the file list then it transfers at 100-200MB/min.

But the server was burdened overnight and my connection closed.

Attempt 5: THERES ONLY 300GB LEFT TO MOVE WHY IS THIS SO PAINFUL

/home/data> rsync --ignore-existing --remove-source-files -rvW repo ../data2

Interrupted again. The -W almost seemed to make "sending incremental file list" faster, which to my understanding shouldn't make sense. Regardless, the transfer is horribly slow and I'm giving up on this one.

Attempt 6: tar

/home/data> nohup tar cf - . |(cd ../data2; tar xvfk -)

Basically attempting to re-copy everything while ignoring existing files. It has to wade through 1.7TB of existing files, but at least it's reading at 1.2GB/min.

So far, this is the only command which gives instant gratification.

Update: interrupted again, somehow, even with nohup..
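
A guess at why nohup didn't save it: written as above, nohup only covers the first tar; the subshell on the right side of the pipe still dies with the terminal. Wrapping the whole pipeline in `sh -c` puts both ends under nohup. A minimal sketch, with mktemp stand-ins for the real paths:

```shell
# Sketch: shield the *entire* tar pipeline from hangup, not just the left tar.
SRC=$(mktemp -d)
DEST=$(mktemp -d)
echo data > "$SRC/file.txt"
cd "$SRC"
# In real use: cd /home/data/repo and point DEST at /home/data2/repo.
nohup sh -c "tar cf - . | (cd '$DEST' && tar xkf -)" >/dev/null 2>&1
cat "$DEST/file.txt"    # prints "data"
```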

Attempt 7: harakiri

Still debating this one

Attempt 8: scripted 'merge' with mv

The destination dir had about 120k empty dirs, so I ran

/home/data2/repo> find . -type d -empty -exec rmdir {} \;

Ruby script:

SRC = "/home/data/repo"
DEST = "/home/data2/repo"
`ls #{SRC} --color=never > lst1.tmp`
`ls #{DEST} --color=never > lst2.tmp`
`diff lst1.tmp lst2.tmp | grep '<' > /home/data/missing.tmp`
t = `cat /home/data/missing.tmp | wc -l`.to_i
puts "Todo: #{t}"
# Manually `mv` each missing directory
File.open('/home/data/missing.tmp').each do |line|
  dir = line.strip.gsub('< ', '')
  puts `mv #{SRC}/#{dir} #{DEST}/`
end
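
For the record, the same merge can be sketched in plain shell with comm (set difference on sorted lists) instead of diff | grep — my sketch, with mktemp stand-ins for /home/data/repo and /home/data2/repo:

```shell
# Sketch: move only the top-level dirs missing from DEST, via comm -23
# (comm -23 prints lines unique to the first file, i.e. source-only names).
SRC=$(mktemp -d)
DEST=$(mktemp -d)
mkdir "$SRC/aaa" "$SRC/bbb" "$DEST/bbb"
lst1=$(mktemp); lst2=$(mktemp)
ls "$SRC"  > "$lst1"     # ls output is already sorted, as comm requires
ls "$DEST" > "$lst2"
comm -23 "$lst1" "$lst2" | while IFS= read -r d; do
  mv "$SRC/$d" "$DEST/"
done
```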

DONE.


3 Answers

Ever heard of splitting large tasks into smaller tasks?

/home/data/repo contains 1M dirs, each of which contain 11 dirs and 10 files. It totals 2TB.

rsync -a /source/1/ /destination/1/
rsync -a /source/2/ /destination/2/
rsync -a /source/3/ /destination/3/
rsync -a /source/4/ /destination/4/
rsync -a /source/5/ /destination/5/
rsync -a /source/6/ /destination/6/
rsync -a /source/7/ /destination/7/
rsync -a /source/8/ /destination/8/
rsync -a /source/9/ /destination/9/
rsync -a /source/10/ /destination/10/
rsync -a /source/11/ /destination/11/
(...)

Coffee break time.


This is what is happening:

  • Initially rsync will build the list of files.
  • Building this list is really slow, due to an initial sorting of the file list.
  • This can be avoided by using ls -f -1, either combining it with xargs to build the set of files rsync will use, or redirecting its output to a file containing the file list.
  • Passing this list to rsync instead of the folder will make rsync start working immediately.
  • This trick of ls -f -1 over folders with millions of files is perfectly described in this article:

Even if rsync is slow (why is it slow? maybe -z will help) it sounds like you've gotten a lot of it moved over, so you could just keep trying:

If you used --remove-source-files, you could then follow up by removing the empty directories: --remove-source-files removes all the files but leaves the directory tree in place.
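
As a concrete follow-up (my sketch, on throwaway mktemp dirs): a depth-first find + rmdir clears out the empty skeleton that --remove-source-files leaves behind.

```shell
# Sketch: remove the empty directory tree left after --remove-source-files.
# -depth visits children before parents, so nested empties unwind fully;
# -mindepth 1 keeps the top-level directory itself.
SRC=$(mktemp -d)
mkdir -p "$SRC/a/b" "$SRC/c"
find "$SRC" -mindepth 1 -depth -type d -empty -exec rmdir {} \;
```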

Just make sure you DO NOT use --remove-source-files with --delete to do multiple passes.

Also, for increased speed, you can use --inplace.

If you're getting kicked out because you're trying to do this remotely on a server, go ahead and run this inside a 'screen' session. At least that way you can let it run.
