Recovering specific files from NTFS using ddrescue and ntfstools

This is a simple Perl script that uses ddrescue to copy only the blocks occupied by a given file. It works like this:

1. runs ntfscat / ntfsls / other tool on the destination disk / image
2. records with strace which blocks are needed
3. runs ddrescue to copy these blocks
4. goes back to step 1 to check if there are more blocks to copy
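
The loop above can be sketched in shell roughly like this. This is not the actual ddrstrace.pl code, just an illustration of the idea. The names trace_reads and copy_blocks are hypothetical hooks standing in for the strace step and the ddrescue step, and stopping once a pass asks for nothing new is my assumption about when to finish.

```shell
# Sketch only, not the real ddrstrace.pl logic.
# Hypothetical hooks that the caller must define:
#   trace_reads        prints one "OFFSET SIZE" line per read the tool made
#   copy_blocks O S    copies that range, e.g. by running ddrescue -i O -s S
recover_loop() {
    rl_seen=""
    while :; do
        rl_new=0
        rl_reads=$(trace_reads)
        while read -r rl_off rl_size; do
            [ -n "$rl_off" ] || continue
            case " $rl_seen " in
                *" $rl_off:$rl_size "*) ;;      # range already attempted, skip it
                *)
                    copy_blocks "$rl_off" "$rl_size" </dev/null
                    rl_seen="$rl_seen $rl_off:$rl_size"
                    rl_new=1
                    ;;
            esac
        done <<EOF
$rl_reads
EOF
        # Stop when a whole pass requested no range we had not tried yet.
        [ "$rl_new" -eq 1 ] || break
    done
}
```

Remembering which ranges were already attempted keeps the sketch from looping forever on a block that ddrescue cannot read.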

The script is here among my other ddrescue related junk: http://pp.siedziba.pl/tmp/ddr/ddrstrace.pl

There is also a test script here: http://pp.siedziba.pl/tmp/ddr/ddrstrace-test.sh

Example usage (based on real events :)

Imagine I'm recovering data using this command:

# ddrescue /dev/sda disk.img disk.log

Suppose the recovery is going slowly and I'd like to recover Outlook mail first, before trying less important data. I need access to the relevant partition on the drive. I use fdisk to list partitions:

# fdisk -c -u 512 -l disk.img
[...]
   Device Boot      Start         End      Blocks   Id  System
disk.img1            2048    27265023    13631488   27  Unknown
disk.img2   *    27265024   969515007   471124992    7  HPFS/NTFS
disk.img3       969515008   976771071     3628032   12  Compaq diagnostics

Note: if the partition table has not been copied yet, I can try to copy it with:

# ddrstrace.pl 5 0 512 -- ddrescue '{}' /dev/sda disk.img disk.log -- fdisk -c -u 512 -l disk.img

The partition I'd like to access is the second one, which starts 27265024 sectors of 512 bytes each from the start of the disk, so I create a loop device:

# calc 27265024*512
	13959692288
# losetup -o 13959692288 /dev/loop0 disk.img

Note: before creating the loop device, make sure the image file size is large enough to cover your partition.
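
The offset arithmetic and the size check can also be done with plain shell functions. This is a minimal sketch: partition_offset and image_covers are names I made up for illustration, and it assumes GNU stat (for `-c %s`) and the inclusive End column as printed by fdisk.

```shell
# partition_offset START SECTOR_SIZE: print the byte offset of the partition.
partition_offset() {
    echo $(( $1 * $2 ))
}

# image_covers IMAGE END SECTOR_SIZE: succeed if IMAGE is at least as large
# as the byte position just past sector END (END is inclusive, as in fdisk).
image_covers() {
    ic_size=$(stat -c %s "$1") || return 1
    [ "$ic_size" -ge $(( ($2 + 1) * $3 )) ]
}
```

For the table above, `partition_offset 27265024 512` prints 13959692288, and `image_covers disk.img 969515007 512` tells whether disk.img already reaches the end of the second partition.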

Now I can try to list the root directory and at the same time recover all the needed meta-data:

# ddrstrace.pl 1 13959692288 512 -- ddrescue '{}' /dev/sda disk.img disk.log -- ntfsls /dev/loop0

Please note that I need to supply the offset of /dev/loop0 to ddrstrace.pl, as it can't figure it out by itself.

I can add other arguments to the ddrescue command, like '-r' to retry bad blocks or '-R' to go in reverse. The special argument '{}' is replaced by the script with "-i $offset -s $size".
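
To make the '{}' substitution concrete, here is a small shell sketch of the idea. It is not how ddrstrace.pl does it internally; expand_template is a hypothetical name, and for simplicity it prints the resulting command as one string, so arguments containing spaces would lose their quoting.

```shell
# expand_template OFFSET SIZE CMD ARGS...: print the command line with every
# '{}' argument replaced by "-i OFFSET -s SIZE".
expand_template() {
    et_offset=$1
    et_size=$2
    shift 2
    et_out=""
    for et_arg in "$@"; do
        if [ "$et_arg" = "{}" ]; then
            et_arg="-i $et_offset -s $et_size"
        fi
        # Join arguments with single spaces, without a leading space.
        et_out="$et_out${et_out:+ }$et_arg"
    done
    printf '%s\n' "$et_out"
}
```

For example, `expand_template 13959692288 512 ddrescue '{}' /dev/sda disk.img disk.log` prints `ddrescue -i 13959692288 -s 512 /dev/sda disk.img disk.log`.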

I can list subdirectories the same way, until I find the file I need:

# ddrstrace.pl 3 13959692288 512 -- ddrescue '{}' /dev/sda disk.img disk.log -- ntfsls -p Users /dev/loop0

Note that the first argument to ddrstrace.pl is the index of the argument containing the file/device name to trace reads from, so in the "ntfsls /dev/loop0" case it's 1 (the first argument), while in "ntfsls -p Users /dev/loop0" it's 3 (/dev/loop0 is the third argument).
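
If counting arguments by hand feels error-prone, a tiny helper can show which argument a given index selects. This is only a sketch; nth_arg is a hypothetical name, and the convention that the command name itself is index 0 is inferred from the examples in this text.

```shell
# nth_arg INDEX CMD ARGS...: print the argument at INDEX, where the command
# name counts as index 0. Fails if INDEX is past the last argument.
nth_arg() {
    na_index=$1
    shift
    [ "$na_index" -lt "$#" ] || return 1
    shift "$na_index"
    printf '%s\n' "$1"
}
```

Both `nth_arg 1 ntfsls /dev/loop0` and `nth_arg 3 ntfsls -p Users /dev/loop0` print /dev/loop0, and `nth_arg 5 fdisk -c -u 512 -l disk.img` prints disk.img.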

Using a block size larger than 512 (the third argument to ddrstrace.pl) may speed up listing of big directories, at the cost of trying to copy more data than necessary.
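
The trade-off is easier to see with numbers. The sketch below rounds a single read outwards to whole blocks. It assumes the position on the source disk is the loop device offset plus the position of the read inside the device, and that blocks are aligned to the absolute position; the script itself may do this differently. aligned_range is a hypothetical name.

```shell
# aligned_range BASE POS LEN BLOCK: print "START SIZE" of the smallest
# BLOCK-aligned range covering LEN bytes at absolute position BASE+POS.
aligned_range() {
    ar_first=$(( ($1 + $2) / $4 * $4 ))                  # round start down
    ar_last=$(( ($1 + $2 + $3 + $4 - 1) / $4 * $4 ))     # round end up
    echo "$ar_first $(( ar_last - ar_first ))"
}
```

A 100-byte read at position 1000 gives `aligned_range 0 1000 100 512` = "512 1024", while with a 4096-byte block the same read gives "0 4096": fewer, larger ddrescue runs, but more data copied than was actually asked for.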

Hopefully we find the needed file, and we can try to copy it with ntfscat. ntfscat can only dump the file to stdout, so we redirect stdout to /dev/null (we don't need its output, since all data will be copied to our image). Unfortunately, this also suppresses ddrescue output, so we use --verbose to see what is happening:

# ddrstrace.pl --verbose 1 13959692288 512 -- ddrescue '{}' /dev/sda disk.img disk.log -- ntfscat /dev/loop0 Users/Alice/AppData/Local/Microsoft/Outlook/Outlook.pst >/dev/null

In my case the Outlook.pst file was 2GB in size and heavily fragmented, with more than 100 fragments. This is where the script's limitations become apparent. Perl memory usage went up to 1.5GB from storing such a large strace result (the Sys::Trace module should really have an iterator), and stracing so many reads took quite a long time.

Anyway, the file was recovered mostly intact, and ddrescue features helped a lot to recover it. The nice thing about recovering files this way is that once a block is finally read, it is saved and does not need to be read again. Also, read errors are not fatal. A tool may give up on a read error, but it copes just fine with an uninitialized block.

The script has not been tested with anything other than ntfsls / ntfscat. It may or may not work properly with other tools such as the Sleuth Kit. It is probably a good idea to run it as a limited user: give the user write access to the destination disk/image (disk.img in the example), but read-only access to the file system (/dev/loop0) and the source disk (/dev/sda).