mu/man/mu-index.1

197 lines
6.7 KiB
Groff
Raw Normal View History

2022-06-07 22:04:31 +02:00
.TH MU-INDEX 1 "June 2022" "User Manuals"
.SH NAME
mu index \- index e-mail messages stored in Maildirs
.SH SYNOPSIS
.B mu index [options]
.SH DESCRIPTION
\fBmu index\fR is the \fBmu\fR command for scanning the contents of Maildir
directories and storing the results in a Xapian database. The data can then be
queried using
2018-03-10 21:08:17 +01:00
.BR mu-find (1)\.
2022-06-07 22:04:31 +02:00
Before the first time you run \fBmu index\fR, you must run \fBmu init\fR to
initialize the database.
\fBindex\fR understands Maildirs as defined by Daniel Bernstein for
2022-06-07 22:04:31 +02:00
\fBqmail\fR(7). In addition, it understands recursive Maildirs (Maildirs within
Maildirs), Maildir++. It can also deal with VFAT-based Maildirs which use '!'
or ';' as the separators instead of ':'.
2011-05-25 21:04:13 +02:00
E-mail messages which are not stored in something resembling a maildir
leaf-directory (\fIcur\fR and \fInew\fR) are ignored, as are the cache
directories for \fInotmuch\fR and \fIgnus\fR, and any dot-directory.
Starting with mu 1.5.x, symlinks are followed, and can be spread over multiple
filesystems; however note that moving files around is much faster when multiple
filesystems are not involved.
If there is a file called \fI.noindex\fR in a directory, the contents of that
directory and all of its subdirectories will be ignored. This can be useful to
exclude certain directories from the indexing process, for example directories
with spam-messages.
If there is a file called \fI.noupdate\fR in a directory, the contents of that
directory and all of its subdirectories will be ignored, unless we do a full
rebuild (with \fBmu init\fR). This can be useful to speed up things you have
some maildirs that never change. Note that you can still search for these
2022-06-07 22:04:31 +02:00
messages, this only affects updating the database. \fI.noupdate\fR is ignored
when you start indexing with an empty database (such as directly after \fImu
init\fR.
2022-06-07 22:04:31 +02:00
There also the \fB--lazy-check\fR which can greatly speed up indexing; see below
for details.
2022-06-07 22:04:31 +02:00
The first run of \fBmu index\fR may take a few minutes if you have a lot of mail
(tens of thousands of messages). Fortunately, such a full scan needs to be done
only once; after that it suffices to index the changes, which goes much faster.
See the 'Note on performance (i,ii,iii)' below for more information.
2022-06-07 22:04:31 +02:00
The optional 'phase two' of the indexing-process is the removal of messages from
the database for which there is no longer a corresponding file in the Maildir.
If you do not want this, you can use \fB\-n\fR, \fB\-\-nocleanup\fR.
When \fBmu index\fR catches one of the signals \fBSIGINT\fR, \fBSIGHUP\fR or
\fBSIGTERM\fR (e.g., when you press Ctrl-C during the indexing process), it
tries to shutdown gracefully; it tries to save and commit data, and close the
database etc. If it receives another signal (e.g., when pressing Ctrl-C once
more), \fBmu index\fR will terminate immediately.
2010-10-25 23:25:14 +02:00
.SH OPTIONS
2022-06-07 22:04:31 +02:00
Some of the general options are described in the \fBmu(1)\fR man-page and not
here, as they apply to multiple mu commands.
2010-11-12 20:00:03 +01:00
.TP
\fB\-\-lazy-check\fR
in lazy-check mode, \fBmu\fR does not consider messages for which the
time-stamp (ctime) of the directory they reside in has not changed
since the previous indexing run. This is much faster than the non-lazy
check, but won't update messages that have change (rather than having
been added or removed), since merely editing a message does not update
the directory time-stamp. Of course, you can run \fBmu-index\fR
occasionally without \fB\-\-lazy-check\fR, to pick up such messages.
.TP
\fB\-\-nocleanup\fR
disables the database cleanup that \fBmu\fR does by default after indexing.
.SS A note on performance (i)
As a non-scientific benchmark, a simple test on the author's machine (a
Thinkpad X61s laptop using Linux 2.6.35 and an ext3 file system) with no
existing database, and a maildir with 27273 messages:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
66,65s user 6,05s system 27% cpu 4:24,20 total
.fi
(about 103 messages per second)
A second run, which is the more typical use case when there is a database
already, goes much faster:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,48s user 0,76s system 10% cpu 11,796 total
.fi
(more than 56818 messages per second)
Note that each test flushes the caches first; a more common use case might
be to run \fBmu index\fR when new mail has arrived; the cache may stay
quite 'warm' in that case:
.nf
$ time mu index --quiet
0,33s user 0,40s system 80% cpu 0,905 total
.fi
which is more than 30000 messages per second.
.SS A note on performance (ii)
As per June 2012, we did the same non-scientific benchmark, this time with an
Intel i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir with 22589
messages. We start without an existing database.
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
27,79s user 2,17s system 48% cpu 1:01,47 total
.fi
(about 813 messages per second)
A second run, which is the more typical use case when there is a database
already, goes much faster:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,13s user 0,30s system 19% cpu 2,162 total
.fi
(more than 173000 messages per second)
.SS A note on performance (iii)
As per July 2016, we did the same non-scientific benchmark, again with
the Intel i5-2500 CPU @ 3.30GHz, an ext4 file system. This time, the
maildir contains 72525 messages.
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
40,34s user 2,56s system 64% cpu 1:06,17 total
.fi
(about 1099 messages per second).
2022-06-07 22:04:31 +02:00
.SS A note on performance (iv)
A few years later and its June 2022. There's a lot more happening during indexing, but indexing became multi-threaded and machines are faster; e.g. this
is with an AMD Ryzen Threadripper 1950X (32) @ 3.399GHz.
2022-06-07 22:04:31 +02:00
The instructions are a little different since we have a proper repeatable
benchmark now. After building,
2022-06-07 22:04:31 +02:00
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
% THREAD_NUM=4 build/lib/tests/bench-indexer -m perf
# random seed: R02Sf5c50e4851ec51adaf301e0e054bd52b
1..1
# Start of bench tests
# Start of indexer tests
indexed 5000 messages in 20 maildirs in 3763ms; 752 μs/message; 1328 messages/s (4 thread(s))
ok 1 /bench/indexer/4-cores
# End of indexer tests
# End of bench tests
.fi
2022-06-07 22:04:31 +02:00
Things are again a little faster, even though the index does a lot more now
(text-normalizatian, and pre-generating message-sexps). A faster machine helps,
too!
2011-01-12 22:14:51 +01:00
.SH RETURN VALUE
2011-05-25 21:04:13 +02:00
2022-06-07 22:04:31 +02:00
\fBmu index\fR return 0 upon successful completion; any other number signals an
error.
2011-01-12 22:14:51 +01:00
.SH BUGS
2022-06-07 22:04:31 +02:00
Please report bugs if you find any:
.BR https://github.com/djcb/mu/issues
.SH AUTHOR
Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>
.SH "SEE ALSO"
2018-03-10 21:08:17 +01:00
.BR maildir (5),
.BR mu (1),
.BR mu-init (1),
2018-03-10 21:08:17 +01:00
.BR mu-find (1),
.BR mu-cfind (1)