.TH MU-INDEX 1 "September 2013" "User Manuals"
.SH NAME
mu index \- index e-mail messages stored in Maildirs
.SH SYNOPSIS
.B mu index [options]
.SH DESCRIPTION
\fBmu index\fR is the \fBmu\fR command for scanning the contents of Maildir
directories and storing the results in a Xapian database. The data can then be
queried using
.BR mu-find (1).
.PP
\fBmu index\fR understands Maildirs as defined by Daniel Bernstein for
qmail(7). In addition, it understands recursive Maildirs (Maildirs within
Maildirs) and Maildir++. It
can also deal with VFAT-based Maildirs, which use '!' as the separator instead
of ':', as used by \fITinymail\fR/\fIModest\fR and some other e-mail programs.
.PP
E-mail messages which are not stored in something resembling a maildir
leaf-directory (\fIcur\fR and \fInew\fR) are ignored, as are the cache
directories for \fInotmuch\fR and \fIgnus\fR.
.PP
Symlinks are not followed.
.PP
If there is a file called \fI.noindex\fR in a directory, the contents of that
directory and all of its subdirectories will be ignored. This can be useful to
exclude certain directories from the indexing process, for example directories
with spam-messages.
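.PP
As a minimal sketch of the \fI.noindex\fR mechanism, the shell fragment below
creates an empty marker file in a folder. The scratch directory made with
mktemp and the folder name \fISpam\fR are assumptions for illustration only;
in practice the marker goes into a folder of your real Maildir.
.PP
.nf
```shell
# Hypothetical scratch Maildir for illustration; use your real Maildir path.
maildir=$(mktemp -d)
mkdir -p "$maildir/Spam/cur" "$maildir/Spam/new" "$maildir/Spam/tmp"

# An empty marker file is enough; according to the text above, mu index
# then ignores this directory and all of its subdirectories.
touch "$maildir/Spam/.noindex"

# List the directories that carry the marker.
find "$maildir" -name .noindex -exec dirname {} ';'
```
.fi
.PP
A \fI.noupdate\fR marker can be created in the same way.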
.PP
If there is a file called \fI.noupdate\fR in a directory, the contents of that
directory and all of its subdirectories will be ignored, unless you do a full
rebuild (with \fB\-\-rebuild\fR). This can be useful to speed things up if you
have some maildirs that never change. Note that you can still search for these
messages; this only affects updating the database.
.PP
The first run of \fBmu index\fR may take a few minutes if you have a lot of
mail (tens of thousands of messages). Fortunately, such a full scan needs to be
done only once; after that it suffices to index the changes, which goes much
faster. See the 'Note on performance' below for more information.
.PP
The optional 'phase two' of the indexing-process is the removal of messages
from the database for which there is no longer a corresponding file in the
Maildir. If you do not want this, you can use \fB\-n\fR, \fB\-\-nocleanup\fR.
.PP
When \fBmu index\fR catches one of the signals \fBSIGINT\fR, \fBSIGHUP\fR or
\fBSIGTERM\fR (e.g., when you press Ctrl-C during the indexing process), it
tries to shut down gracefully: it tries to save and commit data, close the
database, and so on. If it receives another signal (e.g., when you press Ctrl-C
once more), \fBmu index\fR will terminate immediately.
.SH OPTIONS
Note that some of the general options are described in the \fBmu(1)\fR man-page
and not here, as they apply to multiple mu commands.
.TP
\fB\-m\fR, \fB\-\-maildir\fR=\fI<maildir>\fR
starts searching at \fI<maildir>\fR. By default, \fBmu\fR uses whatever the
\fBMAILDIR\fR environment variable is set to; if it is not set, it tries
\fI~/Maildir\fR. See the note on mixing sub-maildirs below.
.TP
\fB\-\-my-address\fR=\fI<my-email-address>\fR
specifies that some e-mail address is 'my-address' (\fB\-\-my-address\fR can
be used multiple times). This is used by \fBmu cfind\fR -- any e-mail address
found in the address fields of a message which also has
\fI<my-email-address>\fR in one of its address fields is considered a
\fIpersonal\fR e-mail address. This allows you, for example, to filter out
(\fBmu cfind --personal\fR) addresses which were merely seen in mailing list
messages.
.TP
\fB\-\-nocleanup\fR
disables the database cleanup that \fBmu\fR does by default after indexing.
.TP
\fB\-\-rebuild\fR
clears all messages from the database before indexing. \fB\-\-rebuild\fR
guarantees that after the indexing has finished, there are no 'old' messages
in the database anymore, which is not true with \fB\-\-reindex\fR when
indexing only part of the messages (using \fB\-\-maildir\fR). For this reason,
it is necessary to run \fBmu index \-\-rebuild\fR when there is an upgrade in
the database format. \fBmu index\fR will issue a warning about this.
.TP
\fB\-\-autoupgrade\fR
automatically use \fB\-y\fR, \fB\-\-empty\fR
when \fBmu\fR notices that the database version is not up-to-date. This option
is for use in cron scripts and the like, so they won't require any user
interaction, even when mu introduces a new database version.
.TP
\fB\-\-xbatchsize\fR=\fI<batch size>\fR
set the maximum number of messages to process in a single Xapian
transaction. In practice, this option is only useful if you find that \fBmu\fR
is running out of memory while indexing; in that case, you can set the batch
size to (for example) 1000, which will reduce memory consumption, but also
substantially reduce the indexing performance.
.TP
\fB\-\-max-msg-size\fR=\fI<max msg size>\fR
set the maximum size (in bytes) for messages. The default maximum (currently
at 50 MB) should be enough in most cases, but if you encounter warnings from
\fBmu\fR about ignoring messages because they are too big, you may want to
increase this. Note that the reason for having a maximum size is that big
messages require big memory allocations, which may lead to problems.
.PP
.B NOTE:
It is not recommended to mix maildirs and sub-maildirs within the hierarchy
in the same database; for example, it's better not to index both with
\fB\-\-maildir\fR=~/MyMaildir and \fB\-\-maildir\fR=~/MyMaildir/foo, as this
may lead to unexpected results when searching with the 'maildir:' search
parameter (see below).
.SS A note on performance (i)
As a non-scientific benchmark, a simple test on the author's machine (a
Thinkpad X61s laptop using Linux 2.6.35 and an ext3 file system) with no
existing database, and a maildir with 27273 messages:
.PP
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
66,65s user 6,05s system 27% cpu 4:24,20 total
.fi
(about 103 messages per second of wall-clock time)
.PP
A second run, which is the more typical use case when there is a database
already, goes much faster:
.PP
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,48s user 0,76s system 10% cpu 11,796 total
.fi
(more than 56818 messages per second of user CPU time, or about 2312 per
second of wall-clock time)
.PP
Note that each test flushes the caches first; a more common use case might
be to run \fBmu index\fR when new mail has arrived; the cache may stay
quite 'warm' in that case:
.PP
.nf
$ time mu index --quiet
0,33s user 0,40s system 80% cpu 0,905 total
.fi
which is more than 30000 messages per second.
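.PP
The figures that are based on total elapsed time can be reproduced by dividing
the message count by the number of elapsed seconds. The fragment below is only
a sketch of that arithmetic: the helper name \fIrate\fR is hypothetical, and
the timings are copied from the first and the warm-cache runs above with the
decimal comma written as a point.
.PP
.nf
```shell
# rate MESSAGES SECONDS: print whole messages per second (truncated).
# LC_ALL=C makes awk read the decimal point regardless of the locale.
rate() {
    LC_ALL=C awk -v m="$1" -v s="$2" 'BEGIN { printf "%d", m / s }'
}

first=$(rate 27273 264.20)   # first run: 4:24,20 total is 264.20 seconds
warm=$(rate 27273 0.905)     # warm-cache run: 0,905 seconds total
echo "first run: $first messages/s, warm cache: $warm messages/s"
```
.fi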
.SS A note on performance (ii)
As of June 2012, we did the same non-scientific benchmark, this time with an
Intel i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir with 22589
messages.
.PP
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
27,79s user 2,17s system 48% cpu 1:01,47 total
.fi
(about 813 messages per second of user CPU time)
.PP
A second run, which is the more typical use case when there is a database
already, goes much faster:
.PP
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,13s user 0,30s system 19% cpu 2,162 total
.fi
(more than 173000 messages per second of user CPU time)
.PP
In general, \fBmu\fR has been getting faster with each release, even with
relatively expensive new features such as text-normalization (for
case-insensitive/accent-insensitive matching). The profiles are dominated by
operations in the Xapian database now.
.SH FILES
By default, \fBmu index\fR stores its message database in \fI~/.mu/xapian\fR;
the database has an embedded version number, and \fBmu\fR will automatically
update it when it notices a different version. This allows for automatic
updating of \fBmu\fR-versions, without the need to clear out any old
databases.
.PP
However, note that versions of \fBmu\fR before 0.7 used a different scheme,
which puts the database in \fI~/.mu/xapian\-<version>\fR. These older
databases can safely be deleted. Starting from version 0.7, this manual
cleanup should no longer be needed.
.PP
\fBmu\fR stores logs of its operations and queries in \fI<muhome>/mu.log\fR
(by default, this is \fI~/.mu/mu.log\fR). Upon startup, \fBmu\fR checks the
size of this log file. If it exceeds 1 MB, it will be moved to
\fI~/.mu/mu.log.old\fR, overwriting any existing file of that name, and
\fBmu\fR will start with an empty log file. This scheme allows for continued
use of \fBmu\fR without the need for any manual maintenance of log files.
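.PP
The rotation rule can be pictured with the shell sketch below. It is not
\fBmu\fR's own code: the function name \fIrotate_log\fR is hypothetical, and
because the text does not say how many bytes count as 1 MB, the limit is
passed in as a parameter. The demonstration uses a scratch file and a tiny
limit.
.PP
.nf
```shell
# Sketch of the rotation rule described above; not mu's actual code.
# rotate_log FILE LIMIT: move FILE to FILE.old when it is larger than
# LIMIT bytes, then start again with an empty FILE.
rotate_log() {
    log=$1
    limit=$2
    [ -f "$log" ] || return 0
    size=$(( $(wc -c < "$log") ))
    if [ "$size" -gt "$limit" ]; then
        mv -f "$log" "$log.old"   # overwrites any existing .old file
        : > "$log"
    fi
}

# Demonstration on a scratch file with a tiny, hypothetical limit.
demo=$(mktemp -d)/mu.log
echo "some log lines" > "$demo"
rotate_log "$demo" 5
```
.fi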
.SH ENVIRONMENT
\fBmu index\fR uses \fBMAILDIR\fR to find the user's Maildir if it has not
been specified explicitly with \fB\-\-maildir\fR=\fI<maildir>\fR. If
\fBMAILDIR\fR is not set, \fBmu index\fR will try \fI~/Maildir\fR.
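.PP
The lookup order can be sketched in shell as follows. The helper
\fIresolve_maildir\fR and the example paths are hypothetical and only mirror
the order described above; they are not part of \fBmu\fR.
.PP
.nf
```shell
# Resolve the Maildir in the order the text describes: an explicit
# argument first, then $MAILDIR, then ~/Maildir. resolve_maildir is a
# hypothetical helper, not part of mu. Note that this sketch also treats
# an empty MAILDIR as unset, which is an assumption.
resolve_maildir() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        echo "${MAILDIR:-$HOME/Maildir}"
    fi
}

unset MAILDIR
default_dir=$(resolve_maildir)

MAILDIR=/tmp/mail-env
export MAILDIR
env_dir=$(resolve_maildir)
explicit_dir=$(resolve_maildir /tmp/mail-arg)
```
.fi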
.SH RETURN VALUE
\fBmu index\fR returns 0 upon successful completion; any other number
greater than 0 signals an error.
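.PP
In a script, the exit status can be checked as in the sketch below. The
wrapper \fIindex_status\fR is hypothetical; it runs whatever command it is
given, so that the example works with the stand-in commands \fItrue\fR and
\fIfalse\fR even where \fBmu\fR is not installed.
.PP
.nf
```shell
# index_status runs the given command and reports on its exit status.
# It is a hypothetical wrapper; pass it the real command line, for
# example: index_status mu index --quiet
index_status() {
    "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "indexing succeeded"
    else
        echo "indexing failed with status $rc"
    fi
    return "$rc"
}

# Stand-in commands so the sketch runs without mu installed.
ok=$(index_status true)
bad=$(index_status false) || true
```
.fi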
.SH BUGS
Please report bugs if you find them:
.BR https://github.com/djcb/mu/issues
.SH AUTHOR
Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>
.SH "SEE ALSO"
.BR maildir (5),
.BR mu (1),
.BR mu-find (1),
.BR mu-cfind (1)