.TH MU-INDEX 1 "November 2010" "User Manuals"
.SH NAME
mu index \- index e-mail messages stored in Maildirs
.SH SYNOPSIS
.B mu index [options]
.SH DESCRIPTION
\fBmu index\fR is the \fBmu\fR sub-command for scanning the contents of
Maildir directories and storing the results in a Xapian database which can
then be searched using
.BR mu-find (1).
.PP
.B index
understands Maildirs as defined by Daniel Bernstein for qmail(7). In addition,
it understands recursive Maildirs (Maildirs within Maildirs) and Maildir++. It
can also deal with VFAT-based Maildirs, which use '!' instead of ':' as the
separator, as done by \fITinymail\fR/\fIModest\fR and some other e-mail programs.
.PP
E-mail messages which are not stored in something resembling a maildir leaf
directory (\fIcur\fR and \fInew\fR) are ignored.
.PP
Symlinks are not followed.
.PP
If there is a file called \fI.noindex\fR in a directory, the contents of that
directory and all of its subdirectories will be ignored. This can be useful to
exclude certain directories from the indexing process, for example directories
with spam-messages.
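.PP
As a minimal sketch, assuming spam is delivered to a hypothetical Maildir
\fI~/Maildir/spam\fR, creating an empty \fI.noindex\fR file there should be
enough to exclude it (and everything below it) from the next run:
.nf
$ touch ~/Maildir/spam/.noindex
$ mu index
.fi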
.PP
The first run of \fBmu index\fR may take a few minutes if you have a lot of
mail (tens of thousands of messages). Fortunately, such a full scan needs to be
done only once; after that it suffices to index the changes, which is much
faster. Also note that a substantial amount of the time goes to printing the
progress information; if you turn that off (with \fB\-q\fR or
\fB\-\-quiet\fR), indexing goes a lot faster. See 'A note on performance' below
for more information.
.PP
The optional second phase of the indexing process is the removal of messages
from the database for which there is no longer a corresponding file in the
Maildir. If you do not want this, you can use \fB\-n\fR, \fB\-\-nocleanup\fR.
.PP
When \fBmu index\fR catches one of the signals \fBSIGINT\fR, \fBSIGHUP\fR or
\fBSIGTERM\fR (e.g., when you press Ctrl-C during the indexing process), it
tries to shut down gracefully: it tries to save and commit data, close the
database, etc. If it receives another signal (e.g., when you press Ctrl-C once
more), \fBmu index\fR will terminate immediately.
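.PP
The same graceful shutdown can be requested from another shell or a script
instead of with Ctrl-C. A minimal sketch, assuming a POSIX shell and
\fBkill\fR(1):
.nf
$ mu index --quiet &
$ kill -TERM $!
.fi
Here \fB$!\fR expands to the process ID of the background \fBmu index\fR job.
Sending the signal a second time makes \fBmu index\fR terminate immediately,
as described above.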
.SH OPTIONS
Note, some of the important options are described in the \fBmu(1)\fR man-page
and not here, as they apply to multiple mu-commands.
.TP
\fB\-m\fR, \fB\-\-maildir\fR=\fI<maildir>\fR
starts searching at \fI<maildir>\fR. By default, \fBmu\fR uses whatever the
\fBMAILDIR\fR environment variable is set to; if that is not set, it tries
\fI~/Maildir\fR. In either case, the path must be \fBabsolute\fR.
Also please see the note on mixing sub-maildirs below.
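.IP
For example, either of the following should work (a sketch; \fB$HOME\fR is
used here only as one way to obtain an absolute path):
.nf
$ mu index --maildir="$HOME/Maildir"
$ MAILDIR="$HOME/Maildir" mu index
.fi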
.TP
\fB\-\-reindex\fR
re-index all mails, even ones that are already in the database.
.TP
\fB\-\-nocleanup\fR
disables the database cleanup that \fBmu\fR does by default after indexing.
.TP
\fB\-\-rebuild\fR
clear all messages from the database before
indexing. This is effectively the same as removing the database. The
difference with \fB\-\-reindex\fR is that \fB\-\-rebuild\fR guarantees that
after the indexing has finished, there are no 'old' messages in the database
anymore, which is not true with \fB\-\-reindex\fR when indexing only a part of
the messages (using \fB\-\-maildir\fR). For this reason, it is necessary to run
\fBmu index \-\-rebuild\fR when there is an upgrade in the database
format. \fBmu index\fR will issue a warning about this.
.TP
\fB\-\-autoupgrade\fR
automatically use \fB\-y\fR, \fB\-\-empty\fR
when \fBmu\fR notices that the database version is not up-to-date. This option
is for use in cron scripts and the like, so they won't require any user
interaction, even when mu introduces a new database version.
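.IP
A hypothetical \fBcrontab\fR(5) entry using this option might look as follows.
The 15-minute schedule is an arbitrary choice, and the entry assumes that
\fBmu\fR is on the \fBPATH\fR that cron uses and that the default Maildir
location applies:
.nf
*/15 * * * * mu index --quiet --autoupgrade
.fi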
.TP
.B NOTE:
It is not a good idea to run multiple instances of
.B mu index
concurrently. No data loss should occur, but one or more of the instances may
experience errors due to database locks.
.IP
Also note that, before indexing is completed, searches for messages may fail,
even if they have already been indexed, as some of the essential database
information will only be written in batches during the indexing process.
.IP
Furthermore, it is not recommended to mix maildirs and sub-maildirs within
the hierarchy in the same database; for example, it's better not to index both
with \fB\-\-maildir\fR=~/MyMaildir and \fB\-\-maildir\fR=~/MyMaildir/foo, as
this may lead to unexpected results when searching with the 'maildir:'
search parameter (see below).
.SS A note on performance
As a non-scientific benchmark, here is a simple test on the author's machine (a
Thinkpad X61s laptop using Linux 2.6.35 and an ext3 file system) with no
existing database and a maildir with 27273 messages:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
66,65s user 6,05s system 27% cpu 4:24,20 total
.fi
(about 103 messages per second)
A second run, which is the more typical use case when there is a database
already, goes much faster:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,48s user 0,76s system 10% cpu 11,796 total
.fi
(more than 2300 messages per second)
Note that each of the two tests above flushes the caches first; a more common use case might
be to run \fBmu index\fR when new mail has arrived; the cache may stay
quite 'warm' in that case:
.nf
$ time mu index --quiet
0,33s user 0,40s system 80% cpu 0,905 total
.fi
which is more than 30000 messages per second.
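.PP
The throughput figures above are simply the number of messages divided by the
wall-clock ('total') time of each run. As a quick check, they can be
re-derived with, for example, \fBawk\fR(1), which is not part of \fBmu\fR:
.nf
$ awk 'BEGIN { printf "%.0f\en", 27273 / 264.20 }'
103
$ awk 'BEGIN { printf "%.0f\en", 27273 / 11.796 }'
2312
$ awk 'BEGIN { printf "%.0f\en", 27273 / 0.905 }'
30136
.fi
.PP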
In general, \fBmu\fR has been getting faster with each release, even with
relatively expensive new features such as text-normalization (for
case-insensitive/accent-insensitive matching). The profiles are dominated by
operations in the Xapian database now.
.SH FILES
By default, \fBmu index\fR stores its message database in \fI~/.mu/xapian\fR;
the database has an embedded version number, and \fBmu\fR will automatically
update it when it notices a different version. This allows for automatic
updating of \fBmu\fR-versions, without the need to clear out any old
databases.
.PP
However, note that versions of \fBmu\fR before 0.7 used a different scheme,
which put the database in \fI~/.mu/xapian\-<version>\fR. These older databases
can safely be deleted. Starting from version 0.7, this manual cleanup should
no longer be needed.
.PP
\fBmu\fR stores logs of its operations and queries in \fI<muhome>/mu.log\fR
(by default, this is \fI~/.mu/mu.log\fR). Upon startup, \fBmu\fR checks the
size of this log file. If it exceeds 1 MB, the file is moved to
\fI~/.mu/mu.log.old\fR, overwriting any existing file of that name, and
\fBmu\fR starts with an empty log file. This scheme allows for continued use
of \fBmu\fR without the need for any manual maintenance of log files.
.SH ENVIRONMENT
\fBmu index\fR uses \fBMAILDIR\fR to find the user's Maildir if it has not
been specified explicitly with \fB\-\-maildir\fR=\fI<maildir>\fR. If
\fBMAILDIR\fR is not set, \fBmu index\fR will try \fI~/Maildir\fR.
.SH BUGS
Please report bugs if you find them:
.BR http://code.google.com/p/mu0/issues/list
.SH AUTHOR
Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>
.SH "SEE ALSO"
.BR maildir (5),
.BR mu (1),
.BR mu-find (1)