Last commit for README: 96d8d7227295125e2f8c8020b6e3adf7989e9400

- information about recently discovered weakness of 1 pass version

pp [2006-07-31 23:54:47]

Dbxrecover is a Perl script for recovering mail from damaged Outlook Express dbx
files. There are currently 2 versions of this script, using different methods to
reconstruct messages:

- dbxrecover-1p does everything in 1 pass, constructing messages as soon as
fragments are read from input, and writing messages as soon as they seem to be
complete.
- dbxrecover-2p uses 2 passes: it scans the whole file looking for message
fragments first, then rearranges the fragments into complete messages.
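
The 2 pass idea can be sketched roughly as follows. This is an illustration in
Python, not the Perl code of dbxrecover-2p. It assumes each fragment has
already been parsed into a hypothetical (frag_id, next_id, data) tuple, that a
next_id of 0 marks the last fragment, and that a fragment no other fragment
points to starts a message. For brevity it keeps fragment data in memory,
whereas the real script keeps only meta-data.

```python
def two_pass(fragments):
    """Pass 1 indexes every fragment; pass 2 follows the chains.

    `fragments` is an iterable of hypothetical (frag_id, next_id, data)
    tuples; next_id == 0 marks the last fragment of a message.
    """
    # Pass 1: remember each fragment by id and note which ids are pointed to.
    by_id = {}
    referenced = set()
    for frag_id, next_id, data in fragments:
        by_id[frag_id] = (next_id, data)
        if next_id != 0:
            referenced.add(next_id)

    # Pass 2: assume a fragment that nobody points to starts a message.
    messages = []
    for head in sorted(by_id.keys() - referenced):
        parts, current, seen = [], head, set()
        while current != 0 and current in by_id and current not in seen:
            seen.add(current)  # guards against cyclic chains
            next_id, data = by_id[current]
            parts.append(data)
            current = next_id
        if current == 0:  # the chain reached the end marker
            messages.append(b"".join(parts))
    return messages
```

Because the whole input is indexed before any chain is followed, the order in
which fragments appear in the file does not matter in this sketch.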

Both approaches have pros and cons.

The 1 pass version typically uses much less memory, as it keeps only incomplete
messages in memory. Usually there are only a few, especially if the mailbox was
recently compacted. The 2 pass version needs to keep information about the whole
file in memory. Although it keeps only meta-data, it is still quite a lot.

The 1 pass version may get confused and write an incomplete message to the
output. This happens because it is impossible, in general, to detect the first
fragment of a message. The last fragment is easy to find, as it has 0 as the id
of the next fragment. The first fragment, on the other hand, is not marked in
any way. Currently the script treats a fragment as first if it starts with one
of the common headers such as "From:" or "Received:" (update: some mail systems
and antivirus/antispam solutions add their own non-standard headers at the top
of the message, which makes this test ineffective; a better solution is needed).
The probability of writing an incomplete message is low though, because all of
the following conditions must be met:

- the message must contain one or more of the header strings in its body
- the string must be located at the start of a fragment, which means a
probability of 1:512
- all other message fragments, starting from the next one up to the last one,
must already have been found earlier in the input. This is quite uncommon, as
the fragments are usually found in the same order as they appear in the message.
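
The conditions above can be reproduced with a small sketch of a 1 pass
reassembler. It is written in Python for illustration and is not the actual
dbxrecover-1p code. The (frag_id, next_id, data) tuples, the COMMON_HEADERS
list and the chain bookkeeping are all assumptions. Only the two rules come
from the description above: a next_id of 0 marks the last fragment, and a
fragment counts as first if it starts with a common header.

```python
# Illustrative header list; the real script's list may differ.
COMMON_HEADERS = (b"From:", b"Received:", b"Return-Path:")


def looks_like_first(data):
    """Heuristic from the README: a first fragment starts with a header."""
    return data.startswith(COMMON_HEADERS)


def one_pass(fragments):
    """Yield messages as soon as they seem complete.

    `fragments` is an iterable of hypothetical (frag_id, next_id, data)
    tuples. Duplicate fragment ids are not handled in this sketch.
    """
    by_head = {}  # id of a chain's first fragment -> chain (list of tuples)
    by_wait = {}  # id a chain's last fragment points to -> chain
    for frag in fragments:
        frag_id, next_id, _data = frag
        chain = by_wait.pop(frag_id, None)  # was a chain waiting for us?
        if chain is None:
            chain = []
            by_head[frag_id] = chain
        chain.append(frag)
        # If the rest of the chain was read earlier, glue it on.
        tail = by_head.get(next_id) if next_id != 0 else None
        if tail is not None and tail is not chain:
            chain.extend(by_head.pop(next_id))
        awaited = chain[-1][1]
        if awaited != 0:
            by_wait[awaited] = chain
        elif looks_like_first(chain[0][2]):
            del by_head[chain[0][0]]  # purge the completed message
            yield b"".join(data for _id, _next, data in chain)
```

In this sketch, fragments read in message order are joined onto the waiting
chain, so a header string inside the body does no harm. If the later fragments
were read first, a body fragment starting with "From:" finds its whole tail
already present and is written out as a truncated message.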

The 1 pass version currently deals better with files containing fragments of
multiple mailboxes, which sometimes happens when using various data recovery
software to recover lost OE files. Although it has no advanced algorithms to
detect whether parts of a message fit together, and will happily join fragments
of different messages provided that the identifiers match, it works
surprisingly well thanks to high spatial data locality and the fact that it
purges all completed messages from memory. It works best on compacted mailboxes
and defragmented drives.

The 2 pass version could theoretically work better than the 1 pass version on
files containing multiple mailboxes, if equipped with smart code to deal with
duplicate fragment identifiers. Currently it has no such code and just discards
messages with too many duplicates found, which is BAD.
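
As a rough illustration of what "duplicate fragment identifiers" means here,
the following Python sketch counts how often each id occurs and applies a
simple discard rule. The tuple layout, the function names and the
max_duplicates threshold are hypothetical. The actual limit used by
dbxrecover-2p is not stated in this README.

```python
from collections import Counter


def find_duplicate_ids(fragments):
    """Map each fragment id seen more than once to its occurrence count.

    `fragments` is an iterable of hypothetical (frag_id, next_id, data) tuples.
    """
    counts = Counter(frag_id for frag_id, _next_id, _data in fragments)
    return {frag_id: n for frag_id, n in counts.items() if n > 1}


def should_discard(chain_ids, duplicates, max_duplicates=0):
    """Hypothetical policy: discard a message whose chain of fragment ids
    touches more than `max_duplicates` ambiguous (duplicated) ids."""
    ambiguous = sum(1 for frag_id in chain_ids if frag_id in duplicates)
    return ambiguous > max_duplicates
```

A smarter version would try to pick the right candidate among the duplicates
instead of discarding the message, which is the missing code mentioned above.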


To recover messages from Inbox.dbx and write them to Inbox.mbox file using the 2
pass version, open your terminal/console and type:

perl dbxrecover-2p Inbox.dbx >Inbox.mbox

Similarly, using the 1 pass version:

perl dbxrecover-1p Inbox.dbx >Inbox.mbox

The 1 pass version also works as a filter, so you can feed it multiple files
like this:

cat *.dbx | perl dbxrecover-1p >messages.mbox

The output file is a unix-style mbox, which can be easily imported or used
directly by most e-mail programs, such as Mozilla Thunderbird (just copy to user
profile), The Bat!, Opera etc. Unfortunately Outlook Express is unable to import
mbox files, so you need 3rd party tools to import recovered mail back to OE.
This program will not write dbx files, as it seems too complicated for me and
too easy to get something wrong. I haven't tried writing eml files yet.
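
Before importing the result anywhere, it may be useful to check that the output
parses as an mbox. The sketch below uses the mailbox module from the Python
standard library. The function name summarize_mbox is made up for this example,
and the check is not part of dbxrecover itself.

```python
import mailbox


def summarize_mbox(path):
    """Return (message_count, list of Subject headers) for an mbox file."""
    # create=False makes a missing file an error instead of creating it.
    box = mailbox.mbox(path, create=False)
    try:
        subjects = [msg.get("Subject", "") for msg in box]
        return len(subjects), subjects
    finally:
        box.close()
```

For example, calling summarize_mbox("Inbox.mbox") after one of the commands
above gives a quick count of the recovered messages.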

This script is developed on Linux, but it should work on any platform for which
a Perl interpreter exists. I ran it a few times on Windows using ActivePerl and
it worked fine.