* mu.1: added note on performance, other improvements

Dirk-Jan C. Binnema 2010-01-23 13:36:23 +02:00
parent f5d785cad2
commit 1c7dcbfa4a
1 changed files with 83 additions and 49 deletions

man/mu.1

@ -2,7 +2,7 @@
.SH NAME
mu \- index and search the contents of e-mail messages stored in Maildirs
mu \- index and search e-mail messages stored in Maildirs
.SH SYNOPSIS
@ -17,14 +17,34 @@ mu \- index and search the contents of e-mail messages stored in Maildirs
.SH DESCRIPTION
\fBmu\fR is a set of tools for indexing and searching e-mail messages stored
in Maildirs. It does so by recursively scanning a Maildir directory tree and
analyzing the e-mail messages found. The results of this analysis are then
stored in a database, which can then be queried for specific messages.
in Maildirs. It does so by scanning a Maildir directory tree and analyzing the
e-mail messages found. The results of this analysis are stored in a database,
which can then be queried.
\fBmu\fR can be used from the command line or can be integrated with e-mail
clients. This manpage has some examples.
\fBmu\fR can be used from the command line, or can be integrated with e-mail
clients. This manpage contains examples of both.
The various tools are available as commands for a single \fBmu\fR executable.
.SH COMMANDS
\fBmu\fR offers the following commands:
.TP
\fBindex\fR
for indexing (analyzing) the contents of your Maildirs, and storing the
information in a database
.TP
\fBfind\fR
for finding messages in your database, using certain search parameters (see
below for details). You can use \fBquery\fR and \fBsearch\fR as synonyms for
\fBfind\fR.
.TP
\fBmkdir\fR
for creating Maildirs.
.PP
The various commands are discussed in more detail below.
.SH GENERAL OPTIONS
@ -40,7 +60,7 @@ store and read its database and logs. By default, \fI~/.mu\fR is used.
makes \fBmu\fR generate extra debug information,
useful for debugging the program itself. By default, debug information goes to
the log file, \fI~/.mu/mu.log\fR. It can safely be deleted when \fBmu\fR is
not running.
not running. Note that with the debug option, the log file can grow rather
quickly.
.TP
\fB\-q\fR, \fB\-\-quiet\fR
@ -67,25 +87,6 @@ list the various command line options, while
the options for one command, or all of the commands.
.SH COMMANDS
\fBmu\fR offers the following commands:
.TP
\fBindex\fR
for indexing (analyzing) the contents of your Maildirs, and storing the
information in a database
.TP
\fBfind\fR
for finding messages in your database, using certain search parameters (see
below for details). You can use \fBquery\fR and \fBsearch\fR as synonyms for
\fBfind\fR.
.TP
\fBmkdir\fR
for creating Maildirs.
.SH THE INDEX COMMAND
Using the
@ -94,32 +95,32 @@ command, you can index your Maildir directories, and store the information in
a Xapian database.
.B index
understands Maildirs as defined by Dan Bernstein for qmail(7). It also
understands recursive Maildirs (Maildirs within Maildirs), and the
VFAT-version of Maildir, as used by Tinymail/Modest.
understands Maildirs as defined by Dan Bernstein for qmail(7). In addition, it
understands recursive Maildirs (Maildirs within Maildirs) and Maildir++. It can
also deal with VFAT-based Maildirs, which use '!' as the separator instead
of ':', as used by Tinymail/Modest and some other e-mail programs.
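To illustrate the separator difference, here is a minimal Python sketch (not taken from mu's source) that reads the flag letters from a Maildir file name. It assumes the usual Maildir convention that flags follow an info marker of the form ':2,' and that the VFAT variant simply swaps ':' for '!'. The function name and the file names in the usage note are hypothetical.

```python
def maildir_flags(filename):
    """Return the set of flag letters found in a Maildir file name.

    Accepts both the standard ':2,' info marker and the VFAT-friendly
    '!2,' variant. Returns an empty set when no marker is present.
    """
    for sep in (":", "!"):
        # rpartition yields ('', '', filename) when the marker is absent
        _base, marker, info = filename.rpartition(sep + "2,")
        if marker:
            return set(info)
    return set()
```

For example, the hypothetical names "1264246583.M1P2.host:2,RS" and "1264246583.M1P2.host!2,RS" both yield the flags R and S, while a name without an info marker yields no flags.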
E-mail messages which are not stored in something that looks like a Maildir
leaf directory (\fIcur\fR and \fInew\fR) are ignored.
E-mail messages which are not stored in something resembling a Maildir leaf
directory (\fIcur\fR and \fInew\fR) are ignored.
Currently, symlinks are not followed.
If there is a file called
.B .noindex
in a directory, the contents of that directory and any of its subdirectories
will be ignored. This can be useful to exclude certain directories from the
indexing process, for example directories with spam-messages.
If there is a file called \fI.noindex\fR in a directory, the contents of that
directory and all of its subdirectories will be ignored. This can be useful to
exclude certain directories from the indexing process, for example directories
with spam-messages.
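The scanning rules described here (only \fIcur\fR and \fInew\fR leaf directories count, and a \fI.noindex\fR file excludes a whole subtree) can be sketched in Python as follows. This is an illustration of the rules as stated, not mu's actual implementation; the function name is hypothetical, and the sketch does not descend into symlinked directories.

```python
import os


def find_indexable_messages(root):
    """Yield paths of files in 'cur' and 'new' directories below root.

    Any directory containing a file named '.noindex' is skipped together
    with all of its subdirectories.
    """
    # topdown=True (the default) lets us prune dirnames in place;
    # followlinks=False means symlinked directories are not entered.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        if ".noindex" in filenames:
            dirnames[:] = []  # do not descend any further
            continue
        if os.path.basename(dirpath) in ("cur", "new"):
            for name in sorted(filenames):
                yield os.path.join(dirpath, name)
```

Files in other directories, such as \fItmp\fR, are never yielded, which matches the statement that messages outside a Maildir leaf directory are ignored.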
The first run of
.B mu index
may take a few minutes if you have a lot of mail (ten thousands of messages).
Fortunately, such a full scan needs to be done only once, after that it
suffices to index the changes, which goes much faster. Also note that a
substantial amount of the time goes to printing the progress information; if
you turn that off (with \fB\-q\fR or \fB\-\-quiet\fR), it goes a lot faster.
The first run of \fBmu index\fR may take a few minutes if you have a lot of
mail (tens of thousands of messages). Fortunately, such a full scan needs to be
done only once; after that it suffices to index the changes, which goes much
faster. Also note that a substantial amount of the time goes to printing the
progress information; if you turn that off (with \fB\-q\fR or
\fB\-\-quiet\fR), it goes a lot faster. See the 'Note on performance' below
for more information.
Phase two of the indexing-process is the removal of messages from the database
for which there is no longer a corresponding file in the Maildir. If you do
not want this, you can use \fB\-u\fR, \fB\-\-nocleanup\fR.
The optional phase two of the indexing process is the removal of messages from
the database for which there is no longer a corresponding file in the
Maildir. If you do not want this, you can use \fB\-u\fR or \fB\-\-nocleanup\fR.
.SS Indexing options
@ -140,7 +141,6 @@ re-index all mails, even ones that are already in the database.
\fB\-u\fR, \fB\-\-nocleanup\fR
disables the database cleanup that \fBmu\fR does by default after indexing.
.TP
.B NOTE:
@ -153,6 +153,40 @@ Also note that, before indexing is completed, searches for messages may fail,
even if they have already been indexed, as some of the essential database
information will only be written in batches during the indexing process.
.SS A note on performance
As a non-scientific benchmark, here is a simple test on the author's machine (a
Thinkpad X61s laptop using Linux 2.6.31 and an ext3 file system), with no
existing database and a Maildir with 14,200 messages:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
83.36s user 6.49s system 43% cpu 3:26.21 total
.fi
(about 69 messages per second)
A second run, which is the more typical use case when there is a database
already, goes much faster:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0.29s user 0.62s system 14% cpu 6.409 total
.fi
(about 2219 messages per second)
Note that each of these tests flushes the caches first; a more common use case
might be to run \fBmu index\fR when new mail has arrived; the cache may stay
quite 'warm' in that case:
.nf
$ time mu index --quiet
0.19s user 0.21s system 98% cpu 0.402 total
.fi
which is more than 35,000 messages per second (there is some variance here,
but the author has not seen it drop below 30,000 messages per second).
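The throughput figures above follow from dividing the message count by the wall-clock ('total') time of each run. A small Python sketch of that arithmetic, assuming the rounded count of 14,200 messages quoted in the text (the function name is hypothetical):

```python
def messages_per_second(n_messages, wall_seconds):
    """Throughput as messages divided by wall-clock seconds."""
    return n_messages / wall_seconds


N_MESSAGES = 14200  # rounded count from the text; the exact count is not given

# Wall-clock totals as reported by 'time' in the three runs above
runs = {
    "first run, cold cache": 3 * 60 + 26.21,  # 3:26.21
    "second run, cold cache": 6.409,
    "second run, warm cache": 0.402,
}

for label, seconds in runs.items():
    rate = messages_per_second(N_MESSAGES, seconds)
    print(f"{label}: about {rate:.0f} messages per second")
```

With the rounded count, the second figure comes out slightly lower than the 2219 quoted above, which suggests the exact number of messages was a little more than 14,200.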
.SH THE FIND COMMAND
The
@ -238,7 +272,7 @@ search parameters:
p Full path to the message
P Message priority (high, normal, low)
s Message subject
m Message ID
m Message ID
t To: recipient
.fi