- readme reformatted

pp [2006-05-10 04:26:33]

git-svn-id: https://siedziba.pl:790/svn/repos/dbxrecover@237 455248ca-bdda-0310-9134-f4ebb693071a
diff --git a/README b/README
index e753d8c..8757601 100644
--- a/README
+++ b/README
@@ -1,25 +1,53 @@
-Dbxrecover is a Perl script for recovering mail from damaged Outlook Express dbx files. There are currently 2 versions of this script, using different methods to reconstruct messages:
+Dbxrecover is a Perl script for recovering mail from damaged Outlook Express dbx
+files. There are currently 2 versions of this script, using different methods to
+reconstruct messages:

-- dbxrecover-1p does everything in 1 pass, constructing messages as soon as fragments are read from input, and writing messages as soon as they seem to be complete.
-- dbxrecover-2p uses 2 passes, it scans whole file looking for message fragments first, then rearranges the fragments into complete messages.
+- dbxrecover-1p does everything in 1 pass, constructing messages as soon as
+fragments are read from input, and writing messages as soon as they seem to be
+complete.
+- dbxrecover-2p uses 2 passes: it scans the whole file looking for message
+fragments first, then rearranges the fragments into complete messages.

 Both approaches have pros and cons.

-1 pass version typically uses much less memory, as it keeps only incomplete messages in memory. Usually there are only a few, especially if the mailbox was recently compacted. 2 pass version needs to keep information about the whole file in memory. Although is keeps only meta-data, it is still quite a lot.
+1 pass version typically uses much less memory, as it keeps only incomplete
+messages in memory. Usually there are only a few, especially if the mailbox was
+recently compacted. 2 pass version needs to keep information about the whole
+file in memory. Although it keeps only meta-data, it is still quite a lot.

-1 pass version may get confused and write an incomplete message to the output. It is caused by a fact, that it is impossible, in general, to detect the first fragment of a message. The last fragment is easy to find, as it has 0 as an id of the next fragment. The first fragment, on the other hand, is not marked in any way. Currently the script treats a fragment as first if it starts with one of common headers such as "From:" or "Received:". The probability of writing incomplete message is low though, due to the following factors:
+1 pass version may get confused and write an incomplete message to the output.
+This happens because it is impossible, in general, to detect the first
+fragment of a message. The last fragment is easy to find, as it has 0 as the id
+of the next fragment. The first fragment, on the other hand, is not marked in
+any way. Currently the script treats a fragment as first if it starts with one
+of the common headers such as "From:" or "Received:". The probability of writing
+an incomplete message is low though, as all of the following must hold:

 - the message must contain one or more of the header strings in its body
-- the string must be located at the start of a fragment, which means a probability of 1:512
-- all other message fragments, starting from the next one up to the last one, must be found in advance. This is quite uncommon, as usually the fragments are found in the same order as they appear in the message.
-
-1 pass version currently deals better with files containing fragments of multiple mailboxes, as it sometimes happens when using various data recovery software to recover lost OE files. Although it has no advanced algorithms to detect if parts of message fit together, and will happily join fragments of different messages together provided that identifiers match, it surprisingly works well enough due to high spatial data locality and the fact that it purges all completed messages from memory. Works best on compacted mailboxes and defragmented drives.
-
-2 pass version theoretically could work better than 1 pass version on files containing multiple mailboxes, if equipped with a smart code to deal with duplicate fragment identifiers. Currently it does not have any such code and just discards messages with too many duplicates found, which is BAD.
+- the string must be located at the start of a fragment, which means a
+probability of 1:512
+- all other message fragments, starting from the next one up to the last one,
+must be found in advance. This is quite uncommon, as usually the fragments are
+found in the same order as they appear in the message.
+
+1 pass version currently deals better with files containing fragments of
+multiple mailboxes, as sometimes happens when using various data recovery
+software to recover lost OE files. Although it has no advanced algorithms to
+detect if parts of a message fit together, and will happily join fragments of
+different messages together provided that identifiers match, it surprisingly
+works well enough due to high spatial data locality and the fact that it purges
+all completed messages from memory. It works best on compacted mailboxes and
+defragmented drives.
+
+2 pass version theoretically could work better than 1 pass version on files
+containing multiple mailboxes, if equipped with smart code to deal with
+duplicate fragment identifiers. Currently it does not have any such code and
+just discards messages with too many duplicates found, which is BAD.

 Usage:

-To recover messages from Inbox.dbx and write them to Inbox.mbox file using the 2 pass version, open your terminal/console and type:
+To recover messages from Inbox.dbx and write them to Inbox.mbox file using the 2
+pass version, open your terminal/console and type:

 perl dbxrecover-2p Inbox.dbx >Inbox.mbox

@@ -27,10 +55,18 @@ Similarily, using 1 pass version:

 perl dbxrecover-1p Inbox.dbx >Inbox.mbox

-1 pass version also works as a filter, so you can feed it multiple files like this:
+1 pass version also works as a filter, so you can feed it multiple files like
+this:

 cat *.dbx | perl dbxrecover-1p >messages.mbox

-The output file is a unix-style mbox, which can be easily imported or used directly by most e-mail programs, such as Mozilla Thunderbird (just copy to user profile), The Bat!, Opera etc. Unfortunately Outlook Express is unable to import mbox files, so you need 3rd party tools to import recovered mail back to OE. This program will not write dbx files, as it seems too complicated for me and too easy to get something wrong. I haven't tried writing eml files yet.
+The output file is a unix-style mbox, which can be easily imported or used
+directly by most e-mail programs, such as Mozilla Thunderbird (just copy to user
+profile), The Bat!, Opera etc. Unfortunately Outlook Express is unable to import
+mbox files, so you need 3rd party tools to import recovered mail back to OE.
+This program will not write dbx files, as it seems too complicated for me and
+too easy to get something wrong. I haven't tried writing eml files yet.

-This script is developed on Linux, but it should work on any platform for which a perl interpreter exists. I ran it few times on Windows using ActivePerl and it worked fine.
+This script is developed on Linux, but it should work on any platform for which
+a Perl interpreter exists. I ran it a few times on Windows using ActivePerl and
+it worked fine.
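
The 1 pass approach described in the README above can be illustrated with a
small sketch. This is not the script's code (dbxrecover is written in Perl and
its internals are not shown here); it is a Python illustration under stated
assumptions: each fragment is modelled as a hypothetical (frag_id, next_id,
data) tuple, next_id 0 marks the last fragment, and a chain counts as a message
only if its first fragment starts with one of the header strings the README
mentions. The names reassemble, HEADER_PREFIXES, by_head and by_next are
invented for this example.

```python
# Header strings taken from the README's examples; the real list is unknown.
HEADER_PREFIXES = (b"From:", b"Received:")


def reassemble(fragments):
    """Yield messages from an iterable of (frag_id, next_id, data) tuples.

    Only incomplete chains are kept in memory; a chain is emitted and
    forgotten as soon as it looks complete.
    """
    by_head = {}  # id of a chain's first fragment -> chain
    by_next = {}  # id a chain is still waiting for -> chain

    for frag_id, next_id, data in fragments:
        chain = {"head": frag_id, "next": next_id, "parts": [data]}

        # Append to an earlier chain that was waiting for this fragment.
        prev = by_next.pop(frag_id, None)
        if prev is not None:
            by_head.pop(prev["head"], None)
            prev["parts"].extend(chain["parts"])
            prev["next"] = chain["next"]
            chain = prev

        # Prepend to a chain that was read earlier but comes later in the
        # message (its first fragment is the one this chain points to).
        follow = by_head.pop(chain["next"], None) if chain["next"] else None
        if follow is not None:
            by_next.pop(follow["next"], None)
            chain["parts"].extend(follow["parts"])
            chain["next"] = follow["next"]

        # "Seems complete": ends with next_id 0 and starts like a header.
        if chain["next"] == 0 and chain["parts"][0].startswith(HEADER_PREFIXES):
            yield b"".join(chain["parts"])
        else:
            by_head[chain["head"]] = chain
            if chain["next"]:
                by_next[chain["next"]] = chain


if __name__ == "__main__":
    demo = [(1, 2, b"From: a\n"), (2, 3, b"body "), (3, 0, b"end")]
    for message in reassemble(demo):
        print(message)
```

In this sketch, a body fragment that happens to start with "From:" and whose
remaining fragments are already chained up to the last one is written out as a
truncated message, which is the failure mode the README describes. Fragments
with duplicate identifiers simply overwrite each other here; no attempt is made
to handle them.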