mu/lib
Nicolas Avrutin eb9bfbb1ca Perform threading calculation on related set instead of entire result.
The current threading algorithm is applied to the entire result of a query, even
if maxnum is specified, and then the result of the threading algorithm is
truncated to maxnum. This improves threading results by returning the entire
thread even when only a single message makes it into the top maxnum results.

This commit applies the threading algorithm to the related message set of the
maxnum-truncated query result instead of to the entire query result. For a given
set of messages, the set of messages which will share threads with any of the
original messages is exactly the related message set. Put another way, any
message returned by the original query but removed by the maxnum truncation is
either also returned by the related message query, or was not needed anyway
because it is not a member of any visible thread.

To maintain backward compatibility and allow threading to be used without
including related messages, the related message set is found for the threading
calculation, but any messages which would not have matched the original query
are then pruned, resulting in a superset of the truncated query, but a subset of
the untruncated query.
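
The truncate, expand, and prune steps described above can be sketched as
follows. This is a minimal illustration, not the code in mu-query.cc: the Msg
struct, the integer thread_id, and the threading_input function are all
hypothetical names, and the store is modelled as a vector already sorted in
query order rather than as a Xapian database.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for an indexed message; not a mu type.
struct Msg {
    int  id;         // position in the sorted, untruncated result order
    int  thread_id;  // assumption: messages in one thread share this value
    bool matches;    // true if the message matches the original query
};

// Returns the ids of the messages that would be handed to the threader.
std::vector<int> threading_input(const std::vector<Msg>& store,
                                 std::size_t maxnum,
                                 bool include_related)
{
    // 1. Truncate: take the first maxnum matches and note their threads.
    std::set<int> visible_threads;
    std::size_t taken = 0;
    for (const Msg& m : store) {
        if (!m.matches) continue;
        if (taken == maxnum) break;
        visible_threads.insert(m.thread_id);
        ++taken;
    }

    // 2. Expand to the related set: every message in a visible thread.
    // 3. Prune: unless related messages were requested, drop the ones
    //    that would not have matched the original query.
    std::vector<int> out;
    for (const Msg& m : store) {
        if (visible_threads.count(m.thread_id) == 0) continue;
        if (!include_related && !m.matches) continue;
        out.push_back(m.id);
    }
    return out;
}
```

In this sketch the output always contains the truncated matches and, when
include_related is false, never contains a message outside the untruncated
query result, which mirrors the superset and subset relationship stated above.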

This does not improve (or degrade) the run time of a threading calculation when
maxnum is not set, but significantly improves it when maxnum is set by making it
scale (roughly) linearly in terms of maxnum. On a maildir with ~200k messages
and maxnum set to 500 (the default), the run time of a threading calculation is
lowered from ~1m to ~0.1s.

2018-07-09 02:41:22 -04:00
..
parser only use OP_WILDCARD for xapian >= 1.3.3 2018-05-19 22:22:41 +03:00
tests test-str: fix arglist test 2017-11-04 13:06:43 +02:00
Makefile.am parser: add more tests 2017-10-28 14:12:50 +03:00
doxyfile.in * lib: doxygen support (WIP, just starting...) 2012-10-27 14:42:21 +03:00
mu-bookmarks.c * update copyright years 2013-03-30 11:32:07 +02:00
mu-bookmarks.h * update copyright years 2013-03-30 11:32:07 +02:00
mu-contacts.c mu: fix strncpy usage 2018-06-11 09:18:27 +03:00
mu-contacts.h cfind: uniquify nicks 2016-12-27 16:21:10 +02:00
mu-container.c lib: add last_child flag to thread information 2018-04-23 01:05:54 -03:00
mu-container.h mu: Make mu_container_splice_grandchildren() do only one thing 2014-08-15 10:11:21 +02:00
mu-date.c mu: fix strncpy usage 2018-06-11 09:18:27 +03:00
mu-date.h mu: remove some dead code 2017-10-25 23:50:17 +03:00
mu-flags.c integrate new query parser 2017-10-25 23:50:17 +03:00
mu-flags.h integrate new query parser 2017-10-25 23:50:17 +03:00
mu-index.c mu: add '--lazy-check' option for indexing 2016-07-23 21:33:10 +03:00
mu-index.h mu: add '--lazy-check' option for indexing 2016-07-23 21:33:10 +03:00
mu-log.c mu: cosmetic 2016-07-23 19:14:13 +03:00
mu-log.h * update copyright years 2013-03-30 11:32:07 +02:00
mu-maildir.c mu: fix some compiler warnings 2017-06-24 12:20:16 +02:00
mu-maildir.h mu: add '--lazy-check' option for indexing 2016-07-23 21:33:10 +03:00
mu-msg-crypto.c mu: include signers in signature report 2017-08-27 17:32:23 +03:00
mu-msg-doc.cc mu: use correct conversion for size 2017-10-30 21:14:20 +02:00
mu-msg-doc.h * update copyright years 2013-03-30 11:32:07 +02:00
mu-msg-fields.c mu: some optimizations 2017-10-29 13:34:57 +02:00
mu-msg-fields.h integrate new query parser 2017-10-25 23:50:17 +03:00
mu-msg-file.c mu: mark some more inline parts as attachments 2016-11-24 22:51:23 +02:00
mu-msg-file.h * update copyright years 2013-03-30 11:32:07 +02:00
mu-msg-iter.cc Fix call to c_str() that sometimes dumps core on OpenBSD i386-current 2015-07-02 15:14:29 -05:00
mu-msg-iter.h lib: add last_child flag to thread information 2018-04-23 01:05:54 -03:00
mu-msg-part.c mu: fix some compiler warnings 2016-12-11 18:33:31 +02:00
mu-msg-part.h mu: include signers in signature report 2017-08-27 17:32:23 +03:00
mu-msg-prio.c integrate new query parser 2017-10-25 23:50:17 +03:00
mu-msg-prio.h integrate new query parser 2017-10-25 23:50:17 +03:00
mu-msg-priv.h Fix #280 2015-02-16 01:19:32 +02:00
mu-msg-sexp.c lib: add last_child flag to thread information 2018-04-23 01:05:54 -03:00
mu-msg.c mu: fix some compiler warnings 2017-06-24 12:20:16 +02:00
mu-msg.h doc: tickle 2017-10-30 21:15:47 +02:00
mu-query.cc Perform threading calculation on related set instead of entire result. 2018-07-09 02:41:22 -04:00
mu-query.h mu: support 'raw' query (internally) 2017-12-03 22:16:32 +02:00
mu-runtime.c Fix some compiler warnings 2016-02-14 12:13:11 +02:00
mu-runtime.h * update copyright years 2013-03-30 11:32:07 +02:00
mu-script.c cosmetic 2017-11-05 13:47:30 +02:00
mu-script.h * update copyright years 2013-03-30 11:32:07 +02:00
mu-store-priv.hh lib: fix a few compiler warnings 2017-01-03 00:37:19 +02:00
mu-store-read.cc mu: optimize indexing (get_uid_term) 2015-11-17 10:55:56 +02:00
mu-store-write.cc integrate new query parser 2017-10-25 23:50:17 +03:00
mu-store.cc lib: fix a few compiler warnings 2017-01-03 00:37:19 +02:00
mu-store.h integrate new query parser 2017-10-25 23:50:17 +03:00
mu-str.c mu: fix quoting/unquoting parameters 2017-10-30 22:06:36 +02:00
mu-str.h integrate new query parser 2017-10-25 23:50:17 +03:00
mu-threader.c cosmetic 2015-10-07 10:34:55 +03:00
mu-threader.h * update copyright years 2013-03-30 11:32:07 +02:00
mu-util.c mu: remove some dead code 2017-10-25 23:50:17 +03:00
mu-util.h mu: remove some dead code 2017-10-25 23:50:17 +03:00