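This is a simple Perl script that uses ddrescue to copy just the blocks occupied by some file. It runs a command under strace, watches the reads that command makes from a given file or device, and calls ddrescue to copy the corresponding blocks.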
The script is here among my other ddrescue related junk: http://pp.siedziba.pl/tmp/ddr/ddrstrace.pl
There is also a test script here: http://pp.siedziba.pl/tmp/ddr/ddrstrace-test.sh
Example usage (based on real events :)
Imagine I'm recovering data using this command:
# ddrescue /dev/sda disk.img disk.log
Suppose the recovery is going slowly and I'd like to recover the Outlook mail first, before trying less important data. I need access to the relevant partition on the drive, so I use fdisk to list the partitions:
# fdisk -c -u 512 -l disk.img
[...]
   Device Boot      Start        End     Blocks  Id  System
disk.img1            2048   27265023   13631488  27  Unknown
disk.img2   *    27265024  969515007  471124992   7  HPFS/NTFS
disk.img3       969515008  976771071    3628032  12  Compaq diagnostics
Note: if the partition table is not yet copied I can try to copy it with:
# ddrstrace.pl 5 0 512 -- ddrescue '{}' /dev/sda disk.img disk.log -- fdisk -c -u 512 -l disk.img
The partition I'd like to access is the second one. It starts 27265024 blocks of 512 bytes each from the start of the disk, so I compute the byte offset and create a loop device:
# calc 27265024*512
13959692288
# losetup -o 13959692288 /dev/loop0 disk.img
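If calc is not installed, the same offset can be computed with plain shell arithmetic. This is only a sketch: partition_offset is a made-up helper name, not something ddrstrace.pl provides, and it assumes a shell with 64-bit arithmetic.

```sh
# Hypothetical helper (not part of ddrstrace.pl):
# byte offset of a partition = start sector * sector size.
#   $1 = start sector as printed by fdisk
#   $2 = sector size in bytes (defaults to 512)
partition_offset() {
    echo $(( $1 * ${2:-512} ))
}

partition_offset 27265024 512   # prints 13959692288
```

The result can be passed straight to losetup, e.g. losetup -o "$(partition_offset 27265024 512)" /dev/loop0 disk.img.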
Note: before creating the loop device, make sure the image file size is large enough to cover your partition.
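One way to do that check is to compare the image size with the byte position just past the end of the partition. A rough sketch, assuming GNU stat is available; image_covers_partition is a hypothetical helper name of mine.

```sh
# Hypothetical helper: succeeds if the image file reaches the end of the partition.
# Assumes GNU stat, where "-c %s" prints the file size in bytes.
#   $1 = image file
#   $2 = last sector of the partition (the "End" column of fdisk), inclusive
#   $3 = sector size in bytes (defaults to 512)
image_covers_partition() {
    needed=$(( ($2 + 1) * ${3:-512} ))
    actual=$(stat -c %s "$1") || return 2
    [ "$actual" -ge "$needed" ]
}

# Example for the second partition above (needs 496391684096 bytes):
#   image_covers_partition disk.img 969515007 512 && echo "image is large enough"
```

If the image is still too short, letting ddrescue copy at least one block at or past the end of the partition should extend the file.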
Now I can try to list the root directory and at the same time recover all the needed meta-data:
# ddrstrace.pl 1 13959692288 512 -- ddrescue '{}' /dev/sda disk.img disk.log -- ntfsls /dev/loop0
Note that I need to supply the offset of /dev/loop0 to ddrstrace.pl, as it can't figure it out by itself.
I can add other arguments to the ddrescue command, such as -r to retry bad blocks or -R to go in reverse. The special argument '{}' is replaced by the script with "-i $offset -s $size".
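To give an idea of what ends up in place of '{}', here is a sketch of the arithmetic I'd expect for a single traced read: add the loop device offset and round outwards to the block size. This is an illustration under my own assumptions, not the actual code of ddrstrace.pl; ddrescue_range is a made-up name.

```sh
# Hypothetical helper: turn one traced read into ddrescue's -i/-s arguments.
# Assumption: ranges are aligned outwards to the block size.
#   $1 = byte offset of the loop device within the disk
#   $2 = block size to align to
#   $3 = offset of the read within the traced device
#   $4 = length of the read in bytes
ddrescue_range() {
    base=$1 bs=$2 pos=$3 len=$4
    first=$(( $pos / $bs * $bs ))                     # round the start down
    last=$(( ($pos + $len + $bs - 1) / $bs * $bs ))   # round the end up
    echo "-i $(( $base + $first )) -s $(( $last - $first ))"
}

ddrescue_range 13959692288 512 1000 100   # prints: -i 13959692800 -s 1024
```

A 100-byte read at offset 1000 touches two 512-byte blocks, which is why the size comes out as 1024.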
I can list subdirectories the same way, until I find the file I need:
# ddrstrace.pl 3 13959692288 512 -- ddrescue '{}' /dev/sda disk.img disk.log -- ntfsls -p Users /dev/loop0
Note that the first argument to ddrstrace.pl is the index of the argument containing the file/device name to trace reads from. For "ntfsls /dev/loop0" it is 1 (the first argument), while for "ntfsls -p Users /dev/loop0" it is 3 (/dev/loop0 is the third argument).
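For long command lines, counting by hand gets error-prone. A small sketch that does the counting, with the command name itself at position 0; arg_index is a hypothetical helper, not part of the script.

```sh
# Hypothetical helper: print the position of an argument within a command line.
#   $1   = the file/device name to look for
#   rest = the command line, starting with the command name (position 0)
arg_index() {
    target=$1
    shift
    i=0
    for arg in "$@"; do
        if [ "$arg" = "$target" ]; then
            echo "$i"
            return 0
        fi
        i=$(( $i + 1 ))
    done
    return 1   # not found
}

arg_index /dev/loop0 ntfsls /dev/loop0            # prints 1
arg_index /dev/loop0 ntfsls -p Users /dev/loop0   # prints 3
```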
Using a block size larger than 512 (the third argument to ddrstrace.pl) may speed up listing of big directories, at the cost of trying to copy more data than necessary.
Hopefully we find the needed file and can try to copy it with ntfscat. ntfscat can only dump the file to stdout, so we redirect stdout to /dev/null (we don't need its output, since all the data will be copied to our image). The redirection also suppresses ddrescue's output, so we use --verbose to see what is happening:
# ddrstrace.pl --verbose 1 13959692288 512 -- ddrescue '{}' /dev/sda disk.img disk.log -- ntfscat /dev/loop0 Users/Alice/AppData/Local/Microsoft/Outlook/Outlook.pst >/dev/null
In my case the Outlook.pst file was 2GB in size and heavily fragmented, with more than 100 fragments. This is where the script's limitations become apparent. Perl memory usage went up to 1.5GB from processing such a large run list and storing such a large strace result (the Sys::Trace module should really have an iterator), and stracing so many reads took quite a long time.
Anyway, the file was recovered mostly intact, and ddrescue's features helped a lot in recovering it. The nice thing about recovering files this way is that once a block is finally read, it is saved and does not need to be read again. Also, read errors are not fatal. A tool may give up on a read error, but it copes just fine with an uninitialized block.
The script has not been tested with anything other than ntfsls / ntfscat. It may or may not work properly with other tools such as the Sleuth Kit. It is probably a good idea to run it as a limited user: give the user write access to the destination disk/image (disk.img in the example), but read-only access to the file system (/dev/loop0) and the source disk (/dev/sda).