Nicolas Avrutin eb9bfbb1ca Perform threading calculation on related set instead of entire result.
The current threading algorithm is applied to the entire result of a query, even
if maxnum is specified, and then the result of the threading algorithm is
truncated to maxnum. This improves threading results by returning the entire
thread even when only a single message makes it into the top maxnum results.

This commit applies the threading algorithm to the related message set of the
maxnum-truncated query result instead of to the entire query result. For a given
set of messages, the set of messages which will share threads with any of the
original messages is exactly the related message set. Put another way, either
any messages returned by the original query but removed by the maxnum truncation
will also be returned by the related message query, or they would not have been
needed anyway because they would not be members of any visible thread.

To maintain backward compatibility and allow threading to be used without
including related messages, the related message set is found for the threading
calculation, but any messages which would not have matched the original query
are then pruned, resulting in a superset of the truncated query, but a subset of
the untruncated query.
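
The three steps above (truncate, expand to the related set, prune) can be
sketched as follows. This is a minimal illustration, not mu's actual code: the
Msg struct, the thread_id key, the Pred type, and the threading_input function
are hypothetical names, and a linear scan over an in-memory vector stands in
for the Xapian queries that the real library would run.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical message record; mu's real types differ.
struct Msg {
    int id;
    std::string thread_id;  // assumption: all messages in a thread share this key
};

// Hypothetical stand-in for "does this message match the original query?"
using Pred = std::function<bool(const Msg&)>;

// Returns the ids of the messages that would be handed to the threading
// algorithm. `store` is assumed to be in the query's sort order already.
std::vector<int> threading_input(const std::vector<Msg>& store,
                                 const Pred& matches,
                                 std::size_t maxnum)
{
    // Step 1: the maxnum-truncated query result. Only the threads it
    // touches are remembered, since those are the visible threads.
    std::set<std::string> visible_threads;
    std::size_t taken = 0;
    for (const Msg& m : store) {
        if (taken == maxnum) break;
        if (matches(m)) {
            visible_threads.insert(m.thread_id);
            ++taken;
        }
    }

    // Step 2: the related set is every message in a visible thread.
    // Step 3: prune related messages that the original query would not match.
    std::vector<int> out;
    for (const Msg& m : store) {
        if (visible_threads.count(m.thread_id) != 0 && matches(m))
            out.push_back(m.id);
    }

    // The pruned set is a superset of the truncated query result.
    assert(out.size() >= taken);
    return out;
}
```

With six messages in threads a, b, a, c, b, a, a query that matches all but
the last message, and maxnum set to 2, the sketch returns messages 1, 2, 3 and
5: more than the truncated result (1, 2), fewer than the untruncated result
(1 to 5).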

This does not improve (or degrade) the run time of a threading calculation when
maxnum is not set, but significantly improves it when maxnum is set by making it
scale (roughly) linearly in terms of maxnum. On a maildir with ~200k messages
and maxnum set to 500 (the default), the run time of a threading calculation is
lowered from ~1m to ~0.1s.

2018-07-09 02:41:22 -04:00
.github Create issue_template.md 2016-03-28 17:54:09 +03:00
build-aux build: add dummy config.rpath 2018-05-29 10:59:20 +03:00
contrib gmime-test: dump (unencode) body, too. 2017-05-06 13:12:38 +03:00
guile update compiler warnings, fix them 2018-06-11 10:49:07 +03:00
lib Perform threading calculation on related set instead of entire result. 2018-07-09 02:41:22 -04:00
m4 guile: add some more m4 for guile detection 2018-06-05 17:49:30 +03:00
man fix a typo in the mu-query man page 2018-03-17 18:54:54 -07:00
mu update compiler warnings, fix them 2018-06-11 10:49:07 +03:00
mu4e mu4e: cleanup handler functions 2018-06-30 21:14:00 +03:00
perl perl: Add a .gitignore file with the MYMETA build assets 2017-02-17 14:43:16 +01:00
toys toys: fix compiler warnings 2017-10-24 09:17:27 +03:00
www www: update 2016-12-11 18:34:14 +02:00
.gitignore gitignore: update 2017-12-10 13:50:25 +02:00
.travis.yml Travis: use recent autoconf 2.69 and recent autoconf-archive 2016-12-14 01:57:33 +01:00
AUTHORS * initial import of mu - the next generation 2009-11-25 22:55:06 +02:00
COPYING * initial import of mu - the next generation 2009-11-25 22:55:06 +02:00
ChangeLog Fix incorrect ChangeLog. 2015-08-08 08:54:33 +07:00
HACKING HACKING: update build instructions 2016-12-15 08:21:25 +02:00
Makefile.am perl: disable build 2018-05-28 13:55:16 +03:00
NEWS Add the missing NEWS 2015-06-09 21:08:02 +03:00
NEWS.org update NEWS.org 2018-02-11 12:02:53 +02:00
README Two minor fixes to README 2016-12-17 13:28:05 +00:00
README.org Two minor fixes to README 2016-12-17 13:28:05 +00:00
TODO * update TODO 2012-12-02 22:57:47 +02:00
autogen.sh Replace Bash-specific [[]] with POSIX sh [] in autogen.sh 2018-01-24 19:30:04 -08:00
c.cfg Add uncrustify configuration for C code 2014-10-20 15:00:53 +03:00
configure.ac guile: require guile 2.2 2018-06-11 13:10:29 +03:00
gtest.mk * gtest.mk: fix for magical cd 2012-07-20 11:56:07 +03:00

README

                                README
                                ======
 
Welcome to mu! 
---------------

  Given the enormous amounts of e-mail many people gather and the importance of
  e-mail messages in our workflows, it's essential to deal with all that mail
  quickly - in particular, to instantly find that one important e-mail you need
  right now.
  
  [mu] is a tool for dealing with e-mail messages stored in the
  Maildir-format. =mu='s purpose in life is to help you to quickly find the
  messages you need; in addition, it allows you to view messages, extract
  attachments, create new maildirs, and so on. See the [mu cheatsheet] for some
  examples. =mu= is fully documented.
  
  After indexing your messages into a [Xapian]-database, you can search them using
  a custom query language. You can use various message fields or words in the
  body text to find the right messages.
  
  Built on top of =mu= are some extensions (included in this package):

  - mu-for-emacs ([mu4e]): a full-featured e-mail client that runs inside emacs
  - [mu-guile]: bindings for the Guile/Scheme programming language (version 2.2
    and later)
  - a toy GTK+-interface called 'mug' (in the 'toys/' subdir)

  =mu= is written in C and a bit of C++ (to interface with Xapian), with =mu4e=
  written in [Emacs-Lisp] and =mu-guile= in a mix of C and Scheme.
  
  Note, =mu= is available in Debian/Ubuntu under the name =maildir-utils=;
  apparently because they don't like short names. It's also possible to confuse
  that name with the [GNU Mailutils] project (which is totally unrelated) - but
  now you have been warned.
  

  [mu]: http://www.djcbsoftware.nl/code/mu
  [mu cheatsheet]: http://www.djcbsoftware.nl/code/mu/cheatsheet.html
  [Xapian]: http://www.xapian.org
  [mu4e]: http://www.djcbsoftware.nl/code/mu/mu4e.html
  [mu-guile]: http://www.djcbsoftware.nl/code/mu/mu-guile.html
  [Emacs-Lisp]: http://en.wikipedia.org/wiki/Emacs-Lisp
  [GNU Mailutils]: http://mailutils.org/