rm -rf at ~ 500 Mbytes/second

2010-01-08

If you ever have the misfortune of accidentally passing the path to a directory containing 177 GB of data to the rm -rf command, I’ll start by suggesting that you hit Ctrl-C. The sooner the better.

Next, assuming you have some sort of backup, you’ll be staring at two monumentally large data sets, wondering exactly what was deleted from the original.

With help from dkg, I learned some things about rm -rf. For one, it deletes one top-level directory at a time, so a comparison of the top-level directory listings of the original and the backup is a good place to start.

Top-level directories that are entirely missing from the original are easy to restore. However, the presence of a top-level directory in the original doesn’t mean it was untouched.
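As a rough sketch of that first comparison, the Python below splits the backup's top-level names into those missing from the original and those still present. The function name compare_top_level and both path arguments are hypothetical, made up for illustration; it only looks at names, so anything in the "present" list still needs a closer look.

```python
import os


def compare_top_level(original, backup):
    """Compare top-level names only (no recursion).

    Returns (missing, present):
      missing -- names in the backup that no longer exist in the original
      present -- names that exist in both (these may still be partly deleted)
    """
    original_names = set(os.listdir(original))
    backup_names = set(os.listdir(backup))
    missing = sorted(backup_names - original_names)
    present = sorted(backup_names & original_names)
    return missing, present
```

Everything in the first list can be restored wholesale from the backup. The second list is where the interrupted directory is hiding.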

Next, you’ll want to figure out which top-level directory rm was operating on when you hit Ctrl-C. dkg discovered that ls -UR provides a listing in the same order that rm -rf uses. The -U means do not sort, and the -R means recurse into subdirectories. Note that the unsorted listing of the backup directory might not match the unsorted listing of the original, so ls -UR is only really helpful on the original directory.

After selecting the first top-level directory, rm -rf seems to delete all files in that directory first (presumably in the same order that ls -UR will list them), then it enters the first subdirectory (as returned by ls -UR) and repeats the process.
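The order described above (files in a directory first, then each subdirectory in turn) can be sketched in Python. This is only an approximation of the behaviour observed in this post, not a statement about how rm is implemented. It assumes that os.scandir returns entries in the same unsorted directory order that ls -U shows, which is typical on Linux but not guaranteed. The name unsorted_walk is hypothetical.

```python
import os


def unsorted_walk(root):
    """Yield paths relative to root without sorting anything.

    For each directory: first every non-directory entry, in the order
    os.scandir returns them, then each subdirectory followed by its
    own contents (depth first).
    """
    def walk(directory):
        with os.scandir(directory) as it:
            entries = list(it)  # directory order as returned by the OS
        subdirs = []
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)  # visit after the files
            else:
                yield os.path.relpath(entry.path, root)
        for subdir in subdirs:
            yield os.path.relpath(subdir, root)
            yield from walk(subdir)

    yield from walk(root)
```

Run against the original directory, the first path in this listing that still exists gives a hint about how far the deletion got before it was interrupted.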

With a careful comparison of ls -UR on the original with the directory listings on the backup, you should be able to pinpoint the exact subdirectories affected, allowing you to restore only the files and directories that you deleted.
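If you would rather have a script do the comparison, here is a minimal sketch that walks the backup and reports every path with no counterpart in the original. The function name missing_from_original is hypothetical. It compares names only, so it assumes the backup is recent enough that anything absent from the original was deleted by rm rather than removed on purpose since the backup was taken.

```python
import os


def missing_from_original(original, backup):
    """Return sorted relative paths that exist in backup but not in original.

    A directory that is missing entirely is reported once, with a
    trailing separator, and its contents are not listed individually.
    """
    missing = []
    for dirpath, dirnames, filenames in os.walk(backup):
        rel_dir = os.path.relpath(dirpath, backup)
        for name in list(dirnames):
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if not os.path.lexists(os.path.join(original, rel)):
                missing.append(rel + os.sep)
                dirnames.remove(name)  # prune: do not descend into it
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if not os.path.lexists(os.path.join(original, rel)):
                missing.append(rel)
    return sorted(missing)
```

The result is the list of files and directories to copy back from the backup, leaving everything that survived untouched.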

Thanks to dkg for technical and blog title suggestions.