mirror of https://github.com/djcb/mu.git
eb9bfbb1ca
Perform threading calculation on related set instead of entire result.

The current threading algorithm is applied to the entire result of a query, even if maxnum is specified, and then the result of the threading algorithm is truncated to maxnum. This improves threading results by returning the entire thread even when only a single message makes it into the top maxnum results.
This commit applies the threading algorithm to the related message set of the maxnum-truncated query result instead of to the entire query result. For a given set of messages, the set of messages which will share threads with any of the original messages is exactly the related message set. Put another way, either any messages returned by the original query but removed by the maxnum truncation will also be returned by the related message query, or they would not have been needed anyway because they would not be members of any visible thread.

To maintain backward compatibility and allow threading to be used without including related messages, the related message set is found for the threading calculation, but any messages which would not have matched the original query are then pruned, resulting in a superset of the truncated query, but a subset of the untruncated query.

This does not improve (or degrade) the run time of a threading calculation when maxnum is not set, but significantly improves it when maxnum is set by making it scale (roughly) linearly in terms of maxnum. On a maildir with ~200k messages and maxnum set to 500 (the default), the run time of a threading calculation is lowered from ~1m to ~0.1s.
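The steps described in the commit message can be sketched roughly as follows. This is an illustration only, not mu's actual code: the function names, the `thread_of` and `members` indexes, and the sample message ids are all hypothetical, and grouping by a precomputed thread id stands in for the real threading algorithm.

```python
from collections import defaultdict


def build_thread_index(thread_of):
    """Invert {message id: thread id} into {thread id: [message ids]}."""
    members = defaultdict(list)
    for msg, thread in thread_of.items():
        members[thread].append(msg)
    return members


def threaded_query(matches, thread_of, members, maxnum=None):
    """Group the visible part of a query result into threads.

    matches   -- message ids matching the query, best match first
    thread_of -- hypothetical index: message id -> thread id
    members   -- hypothetical index: thread id -> message ids in that thread
    """
    # 1. Truncate the raw query result first.
    top = matches if maxnum is None else matches[:maxnum]

    # 2. Related set: every message sharing a thread with a truncated result.
    visible_threads = {thread_of[m] for m in top}
    related = [m for t in visible_threads for m in members[t]]

    # 3. Prune related messages the original query would not have matched.
    matching = set(matches)
    kept = [m for m in related if m in matching]

    # 4. Thread only what is left (grouping by thread id is a stand-in
    #    for the real threading algorithm).
    threads = defaultdict(list)
    for m in kept:
        threads[thread_of[m]].append(m)
    return {t: sorted(ms) for t, ms in threads.items()}


# Hypothetical store: three threads A, B and C.
thread_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "c1": "C"}
members = build_thread_index(thread_of)

# Every message except a2 matches the query; a1 and b1 rank highest.
matches = ["a1", "b1", "a3", "c1", "b2"]

result = threaded_query(matches, thread_of, members, maxnum=2)
# result maps thread "A" to ["a1", "a3"] and thread "B" to ["b1", "b2"].
```

With `maxnum=2` only `a1` and `b1` survive truncation, but `a3` and `b2` come back through the related set because they share visible threads. `a2` is pruned because it never matched the query, and `c1` is dropped because its thread is not visible. The result is a superset of the truncated query and a subset of the untruncated one, and the work done in steps 2 to 4 depends on the size of the visible threads rather than on the full query result.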
- .github
- build-aux
- contrib
- guile
- lib
- m4
- man
- mu
- mu4e
- perl
- toys
- www
- .gitignore
- .travis.yml
- AUTHORS
- COPYING
- ChangeLog
- HACKING
- Makefile.am
- NEWS
- NEWS.org
- README
- README.org
- TODO
- autogen.sh
- c.cfg
- configure.ac
- gtest.mk
README
README
======

Welcome to mu!
---------------

Given the enormous amounts of e-mail many people gather and the importance of e-mail messages in our work-flows, it's essential to quickly deal with all that mail - in particular, to instantly find that one important e-mail you need right now.

[mu] is a tool for dealing with e-mail messages stored in the Maildir-format. =mu='s purpose in life is to help you to quickly find the messages you need; in addition, it allows you to view messages, extract attachments, create new maildirs, and so on. See the [mu cheatsheet] for some examples. =mu= is fully documented.

After indexing your messages into a [Xapian]-database, you can search them using a custom query language. You can use various message fields or words in the body text to find the right messages.

Built on top of =mu= are some extensions (included in this package):

- mu-for-emacs ([mu4e]): a full-featured e-mail client that runs inside emacs
- [mu-guile]: bindings for the Guile/Scheme programming language (version 2.0 and later)
- a toy GTK+-interface called 'mug' (in the 'toys/' subdir)

=mu= is written in C and a bit of C++ (to interface with Xapian), with =mu4e= written in [Emacs-Lisp] and =mu-guile= in a mix of C and Scheme.

Note, =mu= is available in Debian/Ubuntu under the name =maildir-utils=; apparently because they don't like short names. It's also possible to confuse that name with the [GNU Mailutils] project (which is totally unrelated) - but now you have been warned.

[mu]: http://www.djcbsoftware.nl/code/mu
[mu cheatsheet]: http://www.djcbsoftware.nl/code/mu/cheatsheet.html
[Xapian]: http://www.xapian.org
[mu4e]: http://www.djcbsoftware.nl/code/mu/mu4e.html
[mu-guile]: http://www.djcbsoftware.nl/code/mu/mu-guile.html
[Emacs-Lisp]: http://en.wikipedia.org/wiki/Emacs-Lisp
[GNU Mailutils]: http://mailutils.org/