The 'processed' statistic for indexing was more-or-less synonymous with
'updated'; let's change to something more useful, 'checked', which roughly means
the number of messages checked for updates (typically a cheap timestamp check).
Update all cc code using .clang-format; please do so as well for future PRs
etc.; emacs has a handy 'clang-format' mode to make this automatic.
For comparing old changes with git blame, we can disregard this one using
--ignore-rev
(see https://www.moxio.com/blog/43/ignoring-bulk-change-commits-with-git-blame )
Let's use the _current time_ (time(NULL)) instead of the dir-tstamp for a maildir;
this avoids re-indexing mail where the mails have a newer time, but their
directory hasn't (e.g. 'touch')
Experimental, let's see how this works.
We got many reports where the 'lazy check' didn't work too well for
people... so make it a bit less lazy, so it'll just work for more
people.
In practice, never skip _directories_ unless they're leaf directories;
this avoids the mtime-does-not-bubble-up problem.
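A rough sketch of the skip decision described above. The DirInfo struct and the can_skip_dir name are made up for illustration and are not the scanner's actual types; the point is only that a non-leaf directory is never skipped, since a change deeper in the tree does not update the mtime of its ancestors.

```cpp
#include <ctime>

// Hypothetical summary of what the scanner knows about one directory.
struct DirInfo {
    bool        is_leaf; // true if the directory has no subdirectories
    std::time_t mtime;   // modification time of the directory itself
};

// Decide whether a directory can be skipped during a lazy check.
bool can_skip_dir(const DirInfo& dir, std::time_t last_indexed, bool lazy_check)
{
    if (!lazy_check)
        return false; // full check: visit everything
    if (!dir.is_leaf)
        return false; // must descend: children may have changed unnoticed
    return dir.mtime <= last_indexed; // unchanged leaf: safe to skip
}
```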
Not only check for duplicate subjects in *siblings*, also recurse into
the children. This removes some clutter from deeply nested threads.
Fixes: #2078.
When marking a message as read, do the same for the duplicates; this
was the old behavior and the intention of the new behavior but didn't
quite work.
Fixes: #2071.
Some #includes were missing for the latter (but only noticeable on some
systems - e.g., build breaks on Cygwin).
So let's replace with something that works equally everywhere.
Fixes: #2060
The scanner had a timeout for scanning, which doesn't work too well with
rel. fast disks / rel. slow machines. Which I don't happen
to have!
Let's remove the timeout; this should help avoid unwanted timeouts
which would cut short the indexing.
mu-query.cc:
- make_related_enquire: don't include first query in qvec, we already have all
thread IDs we need to query in thread_ds.
- run_related: always sort first query by date, explained by the comment.
- run_related: include qflags (in particular ascending vs descending) in
leader_qflags.
- run_threaded: don't limit results to maxnum, that results in threads
potentially being cut off.
mu-server.cc:
- output_sexp: don't limit results to maxnum so as to match the behaviour of
mu find (and avoid cutting off threads).
Fixes #1924 and #1911.
Calculate the thread subject, that is, the subject of the (sub)thread _or_
empty if it's the same as the previous subject.
This is for the UI feature of _not_ showing the subject when it's just
repeating from the previous.
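As a minimal sketch of that rule (thread_subjects is a hypothetical helper working on a flat list of subjects in display order, not the actual query code):

```cpp
#include <string>
#include <vector>

// For each subject in display order, return the "thread subject": the
// subject itself, or an empty string when it repeats the previous one.
std::vector<std::string>
thread_subjects(const std::vector<std::string>& subjects)
{
    std::vector<std::string> out;
    out.reserve(subjects.size());
    const std::string* prev = nullptr; // previous subject, if any
    for (const auto& subj : subjects) {
        out.push_back((prev && *prev == subj) ? std::string() : subj);
        prev = &subj;
    }
    return out;
}
```

The UI can then simply not draw the subject whenever the thread subject is empty.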
For threading, we still get the _full_ set of messages (since the mset is
limited, but not the enquire); so no need to warn about docids we
haven't seen before.
Also, ensure the unwanted docids are sorted after the wanted ones.
Fixes: #1926.
Rewrite the query machinery in c++:
- use an MSet decorator instead of the mu-msg-iter stuff
- use mu-query-decider to mark duplicates/unreadable/related messages
- use mu-query-threader to replace the older container/thread code
Algorithm did not substantially change, but the implementation details
did.
Add some Rust-style Result/Option types, based on TartanLlama's
expected, optional classes.
There's std::optional of course, but we can't depend on C++17 yet.
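To give an idea of the style this enables, here is a deliberately simplified stand-in that builds with C++11. It is not TartanLlama's API: the Result class and parse_number below are invented for illustration, and unlike a real 'expected' type this one requires T to be default-constructible.

```cpp
#include <string>
#include <utility>

// Simplified stand-in for a Result type: holds either a value or an
// error message.
template <typename T>
class Result {
public:
    static Result ok(T value) { return Result(true, std::move(value), std::string()); }
    static Result err(std::string msg) { return Result(false, T(), std::move(msg)); }

    explicit operator bool() const { return ok_; }
    const T& value() const { return value_; }
    const std::string& error() const { return error_; }

private:
    Result(bool is_ok, T value, std::string error)
        : ok_(is_ok), value_(std::move(value)), error_(std::move(error)) {}
    bool        ok_;
    T           value_;
    std::string error_;
};

// Example use: parse a non-negative decimal number without exceptions
// (overflow is not handled in this sketch).
Result<int> parse_number(const std::string& s)
{
    if (s.empty())
        return Result<int>::err("empty input");
    int n = 0;
    for (char c : s) {
        if (c < '0' || c > '9')
            return Result<int>::err("not a digit: " + std::string(1, c));
        n = n * 10 + (c - '0');
    }
    return Result<int>::ok(n);
}
```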
- Move the lib/query/ stuff up a level into lib/
- Associate directly with the Query object
- Rework the Query object to be C++ rather than mixed with C
- Update all dependencies, tests
Using deque gives compilation errors when compiling on
MacOS/clang (where it defaults to libc++ rather than gcc's libstdc++)
```
#include <deque>
struct Foo { std::deque<Foo> foos; };
int main() { Foo foo; }
```
So, let's use a vector instead; this is a drop-in replacement here, but
unfortunately in some future code...
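For reference, the vector-based equivalent of the snippet above looks roughly like this. std::vector of an incomplete type is only formally guaranteed since C++17, so treating it as portable for older standards is an assumption; count_nodes is just a hypothetical helper to show the recursive structure is usable.

```cpp
#include <cstddef>
#include <vector>

// Recursive structure using std::vector instead of std::deque.
struct Foo {
    std::vector<Foo> foos;
};

// Count this node plus all of its descendants.
std::size_t count_nodes(const Foo& foo)
{
    std::size_t n = 1;
    for (const auto& child : foo.foos)
        n += count_nodes(child);
    return n;
}
```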
Seems there are problems compiling mu with XCode 11.6 (see build tests);
apparently because of libc++ being different from libstdc++.
clang++ builds work fine as long as we're using libstdc++.
If the user wants to postpone clean-up we shouldn't lock the
indexer waiting for something that will never happen. Clear the flag
even though we are actually skipping cleanup.
When this function is declared const or pure, clang at -O1 or higher optimizes
away the call to mu_str_size_s() inside mu_str_size(), so that it ignores its
argument and returns whatever is in mu_str_size_s()'s static buffer.
Found when test-mu-str failed while testing an update of mu in OpenBSD's ports tree.
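A sketch of the pattern involved, using made-up names rather than the actual mu_str_size_s() code. A function that formats into a static buffer has a side effect (the write to the buffer), so it should carry neither the 'const' nor the 'pure' attribute; a wrapper that copies the result right away is safe for callers.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats into a static buffer and returns a pointer
// to it. Deliberately NOT marked __attribute__((const)) or
// __attribute__((pure)): writing to 'buf' is a side effect the compiler
// must not optimize away.
const char* size_to_static_str(unsigned long bytes)
{
    static char buf[32];
    if (bytes >= 1000UL * 1000UL)
        std::snprintf(buf, sizeof buf, "%.1f MB", bytes / 1.0e6);
    else if (bytes >= 1000UL)
        std::snprintf(buf, sizeof buf, "%.1f kB", bytes / 1.0e3);
    else
        std::snprintf(buf, sizeof buf, "%lu B", bytes);
    return buf; // overwritten by the next call
}

// Copy the static buffer into an owned string before anything else can
// overwrite it.
std::string size_to_str(unsigned long bytes)
{
    return std::string(size_to_static_str(bytes));
}
```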
Implement a new message indexer consisting of a single-threaded scanner
and a multi-threaded indexer.
This allows for a number of optimizations as well as background
indexing, though this initial version should behave similarly to the
old indexer.
Reimplement the old mu-log.[ch] as mu-logging.{cc,hh}
If available (and using an appropriately equipped glib), log to the
systemd journal
Only g_criticals have stderr output, all the other g_* go to the log
file / journal.
For the new symlink-support, it's better to use the *canonical* path than
the *realpath(3)* for files, so removing a symlinked maildir will work as
expected.
Until now, mu would _not_ follow symlinks; with these changes, we do.
There were some complications with that ~10 years ago, but I forgot the
details. So let's re-enable. At least one thing is in place now: moving
between file systems.
Fixes #1489
Fixes #1628 (technically, this came with a slightly earlier commit)
When calling mu_maildir_move_message with the new_name
options (workaround for mbsync's), do the src=target check *without* first
creating that new name.
This avoids some unnecessary moves.
Isync uses this by default on Windows where ':' is an invalid character
in file names. Also try to preserve the existing separator character
when generating a new file name.
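Preserving the separator could look roughly like the sketch below. The with_flags helper is hypothetical, and the set of accepted separators (':', ';' and '!') is an assumption for illustration, not taken from the maildir code.

```cpp
#include <string>

// Build a new maildir file name with the given flags, keeping whatever
// separator the old name used before the "2," info part.
std::string with_flags(const std::string& name, const std::string& flags)
{
    const std::string seps = ":;!"; // assumed candidate separators
    // look for "<sep>2," scanning from the end of the name
    for (std::string::size_type i = name.size(); i-- > 0;) {
        if (seps.find(name[i]) != std::string::npos &&
            name.compare(i + 1, 2, "2,") == 0)
            return name.substr(0, i + 1) + "2," + flags;
    }
    return name + ":2," + flags; // no info part yet: default to ':'
}
```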
We were verifying signatures when this was not needed; it seems that
gpgme is a bit slow (?), and on some people's machines the extra
verification made opening messages slow (with the non-gnus view esp.)
Today when we query a find cmd with the `--threads` option, all the
children of each thread are sorted according to their leader based on
the sortfield.
This patch changes how the children of a thread are sorted.
The threads are still sorted according to their leader but all the
children of each thread are now sorted based on the sortfield only.
Here is an example of what happened with the previous sorting:
Example with random kernel thread sorted by date:
[PATCH 0/4] drm/panel: jh057n0090: Add regulators and drop magic value in init
┣━▶[PATCH 1/4] MAINTAINERS: Add Purism mail alias as reviewer for their devkit's panel
┣━▶[PATCH 2/4] drm/panel: jh057n0090: Don't use magic constant
┣━▶[PATCH 3/4] dt-bindings: display/panel: jh057n0090: Document power supply properties
┗━▶[PATCH 4/4] drm/panel: jh057n0090: Add regulator support
If someone replies to one of these emails in the middle, this email
becomes the leader and the thread is displayed like this:
[PATCH 0/4] drm/panel: jh057n0090: Add regulators and drop magic value in init
┣━▶[PATCH 2/4] drm/panel: jh057n0090: Don't use magic constant
┃ ┗━▶ Re: [PATCH 2/4] drm/panel: jh057n0090: Don't use magic constant
┣━▶[PATCH 1/4] MAINTAINERS: Add Purism mail alias as reviewer for their devkit's panel
┣━▶[PATCH 3/4] dt-bindings: display/panel: jh057n0090: Document power supply properties
┗━▶[PATCH 4/4] drm/panel: jh057n0090: Add regulator support
With this patch, we will have the following output:
[PATCH 0/4] drm/panel: jh057n0090: Add regulators and drop magic value in init
┣━▶[PATCH 1/4] MAINTAINERS: Add Purism mail alias as reviewer for their devkit's panel
┣━▶[PATCH 2/4] drm/panel: jh057n0090: Don't use magic constant
┃ ┗━▶ Re: [PATCH 2/4] drm/panel: jh057n0090: Don't use magic constant
┣━▶[PATCH 3/4] dt-bindings: display/panel: jh057n0090: Document power supply properties
┗━▶[PATCH 4/4] drm/panel: jh057n0090: Add regulator support
The test cases concerning threads have also been updated.
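The new ordering of children can be sketched as follows. Msg and sort_children are illustrative names and the date field stands in for whatever sort field is used; this is not the patch itself. Each message's children are ordered by their own date, regardless of later replies further down.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative thread node.
struct Msg {
    std::string      subject;
    long             date; // sort field, e.g. a timestamp
    std::vector<Msg> children;
};

// Sort the children of every message by their own date only; replies
// deeper in the thread no longer pull their parent to the front.
void sort_children(Msg& msg)
{
    std::stable_sort(msg.children.begin(), msg.children.end(),
                     [](const Msg& a, const Msg& b) { return a.date < b.date; });
    for (auto& child : msg.children)
        sort_children(child);
}
```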
Signed-off-by: Julien Masson <massonju.eseo@gmail.com>
* mu-store.h, mu-store-read.cc, mu-store-write.cc, mu-store-priv.hh have been reworked
into mu-store.{cc,hh}, improving the mix of c/c++
* update all the dependent modules
* make it easier to upgrade a database in place (without user intervention)
* remove the xbatch-size option
Instead of using ~/.mu, use the XDG Base Directory Specification, typically:
~/.cache/xapian
~/.cache/mu.log
~/.cache/parts
~/.config/bookmarks
Update dependencies, documentation.
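Resolving such a base directory could be sketched like this. xdg_dir is a hypothetical helper, not the function mu uses; it follows the rule from the XDG spec that an unset, empty or relative value falls back to the default under $HOME.

```cpp
#include <string>

// Resolve an XDG base directory: use the environment value if it is set
// and absolute, otherwise fall back to $HOME/<fallback_subdir>.
std::string xdg_dir(const char* env_value, const char* home,
                    const std::string& fallback_subdir)
{
    if (env_value && env_value[0] == '/')
        return env_value; // valid absolute path from the environment
    const std::string base = home ? home : "";
    return base + "/" + fallback_subdir;
}
```

A caller would typically pass std::getenv("XDG_CACHE_HOME"), std::getenv("HOME") and ".cache" (or XDG_CONFIG_HOME and ".config").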
Seems gmime passes them on; and it causes havoc with our contacts cache.
Bump database schema version to force a rebuild (since that's what's
required.)
Rewrite the contacts-cache backend in c++
Store the contacts as metadata in the xapian database, rather than in a
separate file.
Update the Store to deal with this.
Consider all 'inline' text parts attachments too, unless they're
'text/plain' or something that looks like a signature.
It's a heuristic so we might get some new corner-cases.. let's see.
Some mailing lists do _not_ set reply-to, see e.g.,
https://github.com/djcb/mu/pull/1278
In that case, use the 'List-Post' address instead, so the behavior is
the same (in mu4e) as for other mailing lists.
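Roughly, the fallback works as in the sketch below. Both function names are invented for illustration, and the parsing only handles the common "<mailto:...>" form of List-Post.

```cpp
#include <string>

// Extract the address from a List-Post header value such as
// "<mailto:list@example.org>"; returns "" if there is none.
std::string list_post_address(const std::string& header)
{
    const std::string prefix = "<mailto:";
    const std::string::size_type start = header.find(prefix);
    if (start == std::string::npos)
        return "";
    const std::string::size_type begin = start + prefix.size();
    // stop at the closing '>' or at the start of URI parameters
    const std::string::size_type end = header.find_first_of(">?", begin);
    if (end == std::string::npos)
        return "";
    return header.substr(begin, end - begin);
}

// Pick the address to reply to for a mailing-list message: Reply-To if
// present, otherwise the List-Post address.
std::string list_reply_address(const std::string& reply_to,
                               const std::string& list_post)
{
    if (!reply_to.empty())
        return reply_to;
    return list_post_address(list_post);
}
```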
We got some errors when some of the key values exceeded the Xapian
maximum; in particular the message-id.
So make all the key-methods check, and truncate the message-id if
necessary.
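The check amounts to something like the following sketch. The limit of 240 bytes and the make_key name are assumptions for illustration; the real maximum depends on the Xapian backend. Note that cutting at a byte boundary can split a multi-byte character, and two long values sharing the same first bytes end up with the same key.

```cpp
#include <cstddef>
#include <string>

// Assumed maximum key length in bytes (illustrative value).
const std::size_t MaxKeyLength = 240;

// Build a key from a prefix and a value, truncating it so it never
// exceeds MaxKeyLength.
std::string make_key(const std::string& prefix, const std::string& value)
{
    std::string key = prefix + value;
    if (key.size() > MaxKeyLength)
        key.resize(MaxKeyLength); // drop the excess bytes
    return key;
}
```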
The current threading algorithm is applied to the entire result of a query, even
if maxnum is specified, and then the result of the threading algorithm is
truncated to maxnum. This improves threading results by returning the entire
thread even when only a single message makes it into the top maxnum results.
This commit applies the threading algorithm to the related message set of the
maxnum-truncated query result instead of to the entire query result. For a given
set of messages, the set of messages which will share threads with any of the
original messages is exactly the related message set. Put another way, either
any messages returned by the original query but removed by the maxnum truncation
will also be returned by the related message query, or they would not have been
needed anyway because they would not be members of any visible thread.
To maintain backward compatibility and allow threading to be used without
including related messages, the related message set is found for the threading
calculation, but any messages which would not have matched the original query
are then pruned, resulting in a superset of the truncated query, but a subset of
the untruncated query.
This does not improve (or degrade) the run time of a threading calculation when
maxnum is not set, but significantly improves it when maxnum is set by making it
scale (roughly) linearly in terms of maxnum. On a maildir with ~200k messages
and maxnum set to 500 (the default), the run time of a threading calculation is
lowered from ~1m to ~0.1s.
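The selection of messages fed to the threader can be illustrated with plain containers. This is a sketch only: the real code works with Xapian queries, and the Msg struct, its matches_query flag and threading_input are made up for the example.

```cpp
#include <set>
#include <string>
#include <vector>

// Illustrative message record.
struct Msg {
    int         docid;
    std::string thread_id;
    bool        matches_query; // would it match the original query?
};

// Given all messages and the docids of the maxnum-truncated result,
// return the docids to run the threading calculation on.
std::vector<int> threading_input(const std::vector<Msg>& store,
                                 const std::set<int>& truncated)
{
    // 1. thread ids of the truncated result
    std::set<std::string> thread_ids;
    for (const auto& m : store)
        if (truncated.count(m.docid))
            thread_ids.insert(m.thread_id);

    // 2. related set: everything in those threads, then
    // 3. prune what the original query would not have matched
    std::vector<int> result;
    for (const auto& m : store)
        if (thread_ids.count(m.thread_id) && m.matches_query)
            result.push_back(m.docid);
    return result;
}
```

The result contains every message of the truncated query, and never more than the untruncated query would have returned.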
Perform threading calculation on related set instead of entire result.
We were transforming wild-card searches into regular-expression
searches; while that works, it's also significantly slower.
So, instead, special-case wildcards, and use the Xapian machinery for
wildcard queries.
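To illustrate why a trailing wildcard can be cheaper than a regular expression: with a sorted term list, all terms sharing a prefix form one contiguous range, so there is no need to test every term. The sketch below uses a plain sorted vector and a hypothetical expand_wildcard function; it does not show Xapian's API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Return all terms in a *sorted* term list that start with the prefix.
std::vector<std::string>
expand_wildcard(const std::vector<std::string>& sorted_terms,
                const std::string& prefix)
{
    std::vector<std::string> matches;
    // binary search for the first term >= prefix ...
    std::vector<std::string>::const_iterator it =
        std::lower_bound(sorted_terms.begin(), sorted_terms.end(), prefix);
    // ... then walk forward while the terms still share the prefix
    for (; it != sorted_terms.end() &&
           it->compare(0, prefix.size(), prefix) == 0; ++it)
        matches.push_back(*it);
    return matches;
}
```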
In the olden days, we stored dates like e.g. 20180131121234, and did a
lexicographical check. With that, we could use e.g. upper-limits
201802312359 for "all dates in Feb 2018", even if Feb doesn't have 31
days.
However, nowadays we use time_t values, and g_date_time_new_local raises
errors for non-existent days; easiest fix is to massage things a bit; so
let's do that.
Fixes issue #1197.
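The massaging boils down to clamping the day to the last valid day of the month, for instance as in this sketch (the helper names are made up; the actual fix may differ in detail):

```cpp
// Gregorian leap-year rule.
bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Number of days in the given month (1..12).
int days_in_month(int year, int month)
{
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return (month == 2 && is_leap_year(year)) ? 29 : days[month - 1];
}

// Clamp a possibly non-existent day (e.g. Feb 31) to a valid one.
int clamp_day(int year, int month, int day)
{
    const int last = days_in_month(year, month);
    if (day < 1)
        return 1;
    return day > last ? last : day;
}
```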
For now, don't treat "and not" specially; this gets us back into a
somewhat working state. At some point, we probably _do_ want to
special-case and_not though (since Xapian supports it).
mu's query parser is the piece of software that turns your queries
into something the Xapian database can understand. So, if you query
"maildir:/inbox and subject:bla" this must be translated into a
Xapian::Query object which will retrieve the sought after messages.
Since mu's beginning, almost a decade ago, this parser was based on
Xapian's default Xapian::QueryParser. It works okay, but wasn't really
designed for the mu use-case, and had a bit of trouble with anything
that's not A..Z (think: spaces, special characters, unicode etc.).
Over the years, mu added quite a bit of pre-processing trickery to
deal with that. Still, there were corner cases and bugs that were
practically unfixable.
The solution to all of this is to have a custom query processor that
replaces Xapian's, and write it from the ground up to deal with the
special characters etc. I wrote one, as part of my "future, post-1.0
mu" research project, and I have now backported it to mu 0.9.19.
From a technical perspective, this is a major cleanup, and allows us
to get rid of much of the fragile preprocessing both for indexing and
querying. From an end-user perspective this (hopefully) means that
many of the little parsing issues are gone, and it opens the way for
some new features.
From an end-user perspective:
- better support for special characters.
- regexp search! yes, you can now search for regular expressions, e.g.
subject:/h.ll?o/
will find subjects with hallo, hello, halo, philosophy, ...
As you can imagine, this can be a _heavy_ operation on the database,
and might take quite a bit longer than a normal query; but it can be
quite useful.
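The matching behavior in the example can be tried out with std::regex as a stand-in; which regex engine mu uses internally is not shown here, and subject_matches is a hypothetical helper. The match is unanchored, which is why "philosophy" (containing "hilo") matches as well.

```cpp
#include <regex>
#include <string>

// Return true if the pattern matches anywhere in the subject.
bool subject_matches(const std::string& subject, const std::string& pattern)
{
    const std::regex rx(pattern);          // ECMAScript syntax by default
    return std::regex_search(subject, rx); // unanchored search
}
```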