mu/man/mu-index.1

214 lines
7.7 KiB
Groff

.TH MU-INDEX 1 "February 2020" "User Manuals"
.SH NAME
mu index \- index e-mail messages stored in Maildirs
.SH SYNOPSIS
.B mu index [options]
.SH DESCRIPTION
\fBmu index\fR is the \fBmu\fR command for scanning the contents of Maildir
directories and storing the results in a Xapian database. The data can then be
queried using
.BR mu-find (1)\.
Note that before the first time you run \fBmu index\fR, you must run \fBmu
init\fR to initialize the database.
\fBindex\fR understands Maildirs as defined by Daniel Bernstein for
\fBqmail\fR(7). In addition, it understands recursive Maildirs (Maildirs
within Maildirs), Maildir++. It can also deal with VFAT-based Maildirs
which use '!' as the separators instead of ':'.
E-mail messages which are not stored in something resembling a maildir
leaf-directory (\fIcur\fR and \fInew\fR) are ignored, as are the cache
directories for \fInotmuch\fR and \fIgnus\fR, and any dot-directory.
Symlinks are not followed.
If there is a file called \fI.noindex\fR in a directory, the contents of that
directory and all of its subdirectories will be ignored. This can be useful to
exclude certain directories from the indexing process, for example directories
with spam-messages.
If there is a file called \fI.noupdate\fR in a directory, the contents of that
directory and all of its subdirectories will be ignored, unless we do a full
rebuild (with \fB--rebuild\fR). This can be useful to speed up things you have
some maildirs that never change. Note that you can still search for these
messages, this only affects updating the database.
There also the \fB--lazy-check\fR which can greatly speed up indexing;
see below for details.
The first run of \fBmu index\fR may take a few minutes if you have a
lot of mail (tens of thousands of messages). Fortunately, such a full
scan needs to be done only once; after that it suffices to index the
changes, which goes much faster. See the 'Note on performance
(i,ii,iii)' below for more information.
The optional 'phase two' of the indexing-process is the removal of messages
from the database for which there is no longer a corresponding file in the
Maildir. If you do not want this, you can use \fB\-n\fR, \fB\-\-nocleanup\fR.
When \fBmu index\fR catches one of the signals \fBSIGINT\fR, \fBSIGHUP\fR or
\fBSIGTERM\fR (e.g., when you press Ctrl-C during the indexing process), it
tries to shutdown gracefully; it tries to save and commit data, and close the
database etc. If it receives another signal (e.g., when pressing Ctrl-C once
more), \fBmu index\fR will terminate immediately.
.SH OPTIONS
Note, some of the general options are described in the \fBmu(1)\fR man-page
and not here, as they apply to multiple mu commands.
.TP
\fB\-\-lazy-check\fR
in lazy-check mode, \fBmu\fR does not consider messages for which the
time-stamp (ctime) of the directory they reside in has not changed
since the previous indexing run. This is much faster than the non-lazy
check, but won't update messages that have change (rather than having
been added or removed), since merely editing a message does not update
the directory time-stamp. Of course, you can run \fBmu-index\fR
occasionally without \fB\-\-lazy-check\fR, to pick up such messages.
.TP
\fB\-\-nocleanup\fR
disables the database cleanup that \fBmu\fR does by default after indexing.
.TP
\fB\-\-rebuild\fR
clear all messages from the database before indexing. \fB\-\-rebuild\fR
guarantees that after the indexing has finished, there are no 'old' messages
in the database anymore, which is not true with \fB\-\-reindex\fR when
indexing only a part of messages (using \fB\-\-maildir\fR). For this reason,
it is necessary to run \fBmu index \-\-rebuild\fR when there is an upgrade in
the database format. \fBmu index\fR will issue a warning about this.
.TP
\fB\-\-max-msg-size\fR=\fI<max msg size>\fR
set the maximum size (in bytes) for messages. The default maximum
(currently at 500Mb) should be enough in most cases, but if you
encounter warnings from \fBmu\fR about ignoring messsage because they
are too big, you may want to increase this. Note that the reason for
having a maximum size is that big messages require big memory
allocations, which may lead to problems.
.B NOTE:
It is not recommended to mix maildirs and sub-maildirs within the hierarchy
in the same database; for example, it's better not to index both with
\fB\-\-maildir\fR=~/MyMaildir and \fB\-\-maildir\fR=~/MyMaildir/foo, as this
may lead to unexpected results when searching with the 'maildir:' search
parameter (see below).
.SS A note on performance (i)
As a non-scientific benchmark, a simple test on the author's machine (a
Thinkpad X61s laptop using Linux 2.6.35 and an ext3 file system) with no
existing database, and a maildir with 27273 messages:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
66,65s user 6,05s system 27% cpu 4:24,20 total
.fi
(about 103 messages per second)
A second run, which is the more typical use case when there is a database
already, goes much faster:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,48s user 0,76s system 10% cpu 11,796 total
.fi
(more than 56818 messages per second)
Note that each test flushes the caches first; a more common use case might
be to run \fBmu index\fR when new mail has arrived; the cache may stay
quite 'warm' in that case:
.nf
$ time mu index --quiet
0,33s user 0,40s system 80% cpu 0,905 total
.fi
which is more than 30000 messages per second.
.SS A note on performance (ii)
As per June 2012, we did the same non-scientific benchmark, this time with an
Intel i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir with 22589
messages. We start without an existing database.
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
27,79s user 2,17s system 48% cpu 1:01,47 total
.fi
(about 813 messages per second)
A second run, which is the more typical use case when there is a database
already, goes much faster:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,13s user 0,30s system 19% cpu 2,162 total
.fi
(more than 173000 messages per second)
.SS A note on performance (iii)
As per July 2016, we did the same non-scientific benchmark, again with
the Intel i5-2500 CPU @ 3.30GHz, an ext4 file system. This time, the
maildir contains 72525 messages.
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
40,34s user 2,56s system 64% cpu 1:06,17 total
.fi
(about 1099 messages per second).
As shown, \fBmu\fR has been getting faster with each release, even
with relatively expensive new features such as text-normalization (for
case-insensitve/accent-insensitive matching). The profiles are
dominated by operations in the Xapian database now.
.SH FILES
\fBmu\fR stores logs of its operations and queries in \fI<muhome>/mu.log\fR
(by default, this is \fI~/.mu/mu.log\fR). Upon startup, \fBmu\fR checks the
size of this log file. If it exceeds 1 MB, it will be moved to
\fI~/.mu/mu.log.old\fR, overwriting any existing file of that name, and start
with an empty log file. This scheme allows for continued use of \fBmu\fR
without the need for any manual maintenance of log files.
.SH ENVIRONMENT
\fBmu index\fR uses \fBMAILDIR\fR to find the user's Maildir if it has not
been specified explicitly with \fB\-\-maildir\fR=\fI<maildir>\fR. If
\fBMAILDIR\fR is not set, \fBmu index\fR will try \fI~/Maildir\fR.
.SH RETURN VALUE
\fBmu index\fR return 0 upon successful completion, and any other number
greater than 0 signals an error.
.SH BUGS
Please report bugs if you find them:
.BR https://github.com/djcb/mu/issues
.SH AUTHOR
Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>
.SH "SEE ALSO"
.BR maildir (5),
.BR mu (1),
.BR mu-init (1),
.BR mu-find (1),
.BR mu-cfind (1)