.TH MU-INDEX 1 "November 2021" "User Manuals"
.SH NAME
mu index \- index e-mail messages stored in Maildirs
.SH SYNOPSIS
.B mu index [options]
.SH DESCRIPTION
\fBmu index\fR is the \fBmu\fR command for scanning the contents of Maildir
directories and storing the results in a Xapian database. The data can then be
queried using
.BR mu-find (1)\.
.PP
Note that before the first time you run \fBmu index\fR, you must run \fBmu
init\fR to initialize the database.
.PP
\fBindex\fR understands Maildirs as defined by Daniel Bernstein for
\fBqmail\fR(7). In addition, it understands recursive Maildirs (Maildirs
within Maildirs) and Maildir++. It can also deal with VFAT-based Maildirs,
which use '!' or ';' as the separators instead of ':'.
.PP
E-mail messages which are not stored in something resembling a maildir
leaf-directory (\fIcur\fR and \fInew\fR) are ignored, as are the cache
directories for \fInotmuch\fR and \fIgnus\fR, and any dot-directory.
.PP
Starting with mu 1.5.x, symlinks are followed, and can be spread over multiple
filesystems; note, however, that moving files around is much faster when
multiple filesystems are not involved.
.PP
If there is a file called \fI.noindex\fR in a directory, the contents of that
directory and all of its subdirectories will be ignored. This can be useful to
exclude certain directories from the indexing process, for example directories
with spam messages.
.PP
If there is a file called \fI.noupdate\fR in a directory, the contents of that
directory and all of its subdirectories will be ignored, unless we do a full
rebuild (with \fBmu init\fR). This can be useful to speed things up when you
have some maildirs that never change. Note that you can still search for these
messages; this only affects updating the database. \fI.noupdate\fR is ignored
when you start indexing with an empty database (such as directly after
\fImu init\fR).
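.PP
As a minimal sketch of how such marker files might be put in place, the
following creates a throwaway Maildir tree and marks one folder with
\fI.noindex\fR and another with \fI.noupdate\fR. The folder names \fISpam\fR
and \fIArchive\fR and the temporary location are assumptions made purely for
illustration; no \fBmu\fR command is run.
.nf
```shell
#!/bin/sh
# Hypothetical layout: a scratch Maildir with two folders (names are made up).
set -eu
maildir=$(mktemp -d)
mkdir -p "$maildir/Spam/cur" "$maildir/Spam/new"
mkdir -p "$maildir/Archive/cur" "$maildir/Archive/new"

# The text above only requires that a file with this name exists,
# so an empty file is used here.
touch "$maildir/Spam/.noindex"      # folder and its subfolders are ignored
touch "$maildir/Archive/.noupdate"  # ignored, except on a full rebuild

# Show the markers next to the cur/ and new/ leaf directories.
ls -A "$maildir/Spam" "$maildir/Archive"
```
.fi
.PP
In a real setup the same two \fBtouch\fR commands would be pointed at folders
below your own Maildir.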
.PP
There is also the \fB\-\-lazy-check\fR option, which can greatly speed up
indexing; see below for details.
.PP
The first run of \fBmu index\fR may take a few minutes if you have a
lot of mail (tens of thousands of messages). Fortunately, such a full
scan needs to be done only once; after that it suffices to index the
changes, which goes much faster. See the 'Note on performance
(i,ii,iii)' below for more information.
.PP
The optional 'phase two' of the indexing process is the removal of messages
from the database for which there is no longer a corresponding file in the
Maildir. If you do not want this, you can use \fB\-n\fR, \fB\-\-nocleanup\fR.
.PP
When \fBmu index\fR catches one of the signals \fBSIGINT\fR, \fBSIGHUP\fR or
\fBSIGTERM\fR (e.g., when you press Ctrl-C during the indexing process), it
tries to shut down gracefully: it tries to save and commit data, close the
database, and so on. If it receives another signal (e.g., when you press Ctrl-C
once more), \fBmu index\fR will terminate immediately.
.SH OPTIONS
Note that some of the general options are described in the \fBmu\fR(1) man page
and not here, as they apply to multiple mu commands.
.TP
\fB\-\-lazy-check\fR
in lazy-check mode, \fBmu\fR does not consider messages for which the
time-stamp (ctime) of the directory they reside in has not changed
since the previous indexing run. This is much faster than the non-lazy
check, but won't update messages that have changed (rather than having
been added or removed), since merely editing a message does not update
the directory time-stamp. Of course, you can run \fBmu index\fR
occasionally without \fB\-\-lazy-check\fR, to pick up such messages.
.TP
\fB\-\-nocleanup\fR
disables the database cleanup that \fBmu\fR does by default after indexing.
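.PP
The limitation of \fB\-\-lazy-check\fR described above can be reproduced
without \fBmu\fR. The sketch below is only an illustration under stated
assumptions: it works in a scratch directory, the file \fIstamp\fR stands in
for the moment of the previous indexing run, and the shell's \fB\-nt\fR test
compares modification times, which is used here as a stand-in for the ctime
comparison that the option is documented to perform.
.nf
```shell
#!/bin/sh
# Sketch: an in-place edit leaves the directory time-stamp alone,
# while adding a message updates it. No mu commands are run.
set -eu
work=$(mktemp -d)
mkdir -p "$work/inbox/cur"
echo "Subject: hello" > "$work/inbox/cur/msg1"

sleep 1
touch "$work/stamp"        # hypothetical marker: time of the previous run
sleep 1

# 1. Edit an existing message in place.
echo "edited body" >> "$work/inbox/cur/msg1"
if [ "$work/inbox/cur" -nt "$work/stamp" ]; then
    after_edit=changed
else
    after_edit=unchanged
fi

# 2. Add a new message to the same directory.
echo "Subject: new" > "$work/inbox/cur/msg2"
if [ "$work/inbox/cur" -nt "$work/stamp" ]; then
    after_add=changed
else
    after_add=unchanged
fi

echo "directory after edit: $after_edit"
echo "directory after add:  $after_add"
```
.fi
.PP
The first check reports the directory as unchanged, which is why an edited
message is only picked up by a run without \fB\-\-lazy-check\fR.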
.SS A note on performance (i)
As a non-scientific benchmark, a simple test on the author's machine (a
Thinkpad X61s laptop using Linux 2.6.35 and an ext3 file system) with no
existing database, and a maildir with 27273 messages:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
66,65s user 6,05s system 27% cpu 4:24,20 total
.fi
(about 103 messages per second)
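.PP
The figure above is the number of messages divided by the elapsed ('total')
time reported by \fBtime\fR, here 4:24,20 or 264.20 seconds. A small sketch of
that arithmetic follows; \fBrate\fR is a hypothetical helper introduced only
for this illustration.
.nf
```shell
#!/bin/sh
# rate MESSAGES SECONDS: print whole messages per second (hypothetical helper).
rate() {
    awk -v n="$1" -v s="$2" 'BEGIN { print int(n / s) }'
}

# 27273 messages indexed in 4:24,20 (264.20 seconds) of elapsed time.
rate 27273 264.20
```
.fi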
.PP
A second run, which is the more typical use case when there is a database
already, goes much faster:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,48s user 0,76s system 10% cpu 11,796 total
.fi
(more than 56818 messages per second of user CPU time, or about 2312 per
second of elapsed time)
.PP
Note that each test flushes the caches first; a more common use case might
be to run \fBmu index\fR when new mail has arrived; the cache may stay
quite 'warm' in that case:
.nf
$ time mu index --quiet
0,33s user 0,40s system 80% cpu 0,905 total
.fi
which is more than 30000 messages per second.
.SS A note on performance (ii)
As of June 2012, we did the same non-scientific benchmark, this time with an
Intel i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir with 22589
messages. We start without an existing database.
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
27,79s user 2,17s system 48% cpu 1:01,47 total
.fi
(about 813 messages per second of user CPU time, or about 367 per second of
elapsed time)
.PP
A second run, which is the more typical use case when there is a database
already, goes much faster:
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,13s user 0,30s system 19% cpu 2,162 total
.fi
(more than 173000 messages per second of user CPU time, or about 10448 per
second of elapsed time)
.SS A note on performance (iii)
As of July 2016, we did the same non-scientific benchmark, again with
the Intel i5-2500 CPU @ 3.30GHz and an ext4 file system. This time, the
maildir contains 72525 messages.
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
40,34s user 2,56s system 64% cpu 1:06,17 total
.fi
(about 1099 messages per second).
.PP
As shown, \fBmu\fR has been getting faster with each release, even
with relatively expensive new features such as text normalization (for
case-insensitive/accent-insensitive matching). The profiles are now
dominated by operations in the Xapian database.
.SH FILES
\fBmu\fR stores logs of its operations and queries in \fI<muhome>/mu.log\fR
(by default, this is \fI~/.cache/mu/mu.log\fR). Upon startup, \fBmu\fR checks the
size of this log file. If it exceeds 1 MB, it will be moved to
\fI~/.cache/mu/mu.log.old\fR, overwriting any existing file of that name, and
\fBmu\fR will start with an empty log file. This scheme allows for continued
use of \fBmu\fR without the need for any manual maintenance of log files.
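.PP
The rotation scheme described above can be sketched in a few lines of shell.
This is not \fBmu\fR's actual implementation: \fBrotate_log\fR is a
hypothetical helper, the limit of 1000000 bytes is an assumed reading of
\&'1 MB', and the demonstration uses a scratch file rather than the real log.
.nf
```shell
#!/bin/sh
# rotate_log FILE: hypothetical helper, not mu's own code.
rotate_log() {
    log=$1
    limit=1000000                      # assumption: 1 MB read as 1,000,000 bytes
    [ -f "$log" ] || return 0          # nothing to rotate yet
    size=$(( $(wc -c < "$log") ))      # arithmetic expansion strips any padding
    if [ "$size" -gt "$limit" ]; then
        mv -f "$log" "$log.old"        # overwrites an existing .old file
        : > "$log"                     # start again with an empty log
    fi
}

# Demonstration on a scratch file of about 1.1 MB.
dir=$(mktemp -d)
dd if=/dev/zero of="$dir/mu.log" bs=1024 count=1100 2>/dev/null
rotate_log "$dir/mu.log"
ls -l "$dir"
```
.fi
.PP
Because only a single \fI.old\fR file is kept, the space used by logs stays
bounded without any manual cleanup.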
.SH ENVIRONMENT
\fBmu index\fR uses \fBMAILDIR\fR to find the user's Maildir if it has not
been specified explicitly with \fB\-\-maildir\fR=\fI<maildir>\fR. If
\fBMAILDIR\fR is not set, \fBmu index\fR will try \fI~/Maildir\fR.
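.PP
The lookup order described above can be mirrored in shell as follows. The
function \fBresolve_maildir\fR is a hypothetical helper written for this
illustration and is not part of \fBmu\fR; treating an empty \fBMAILDIR\fR the
same as an unset one is likewise an assumption of the sketch.
.nf
```shell
#!/bin/sh
# resolve_maildir [EXPLICIT]: hypothetical helper mirroring the lookup order.
resolve_maildir() {
    if [ -n "${1:-}" ]; then
        echo "$1"                # explicit value, as with --maildir=<maildir>
    elif [ -n "${MAILDIR:-}" ]; then
        echo "$MAILDIR"          # environment variable (empty counts as unset here)
    else
        echo "$HOME/Maildir"     # fallback
    fi
}

# Print the Maildir that would be chosen in the current environment.
resolve_maildir
```
.fi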
.SH RETURN VALUE
\fBmu index\fR returns 0 upon successful completion; any number
greater than 0 signals an error.
.SH BUGS
Please report bugs if you find them:
.BR https://github.com/djcb/mu/issues
.SH AUTHOR
Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>
.SH "SEE ALSO"
.BR maildir (5),
.BR mu (1),
.BR mu-init (1),
.BR mu-find (1),
.BR mu-cfind (1)