* mu-store.h, mu-store-read.cc, mu-store-write.cc, mu-store-priv.hh have been reworked
into mu-store.{cc,hh}, and the mix of C/C++ has been improved
* update all the dependent modules
* make it easier to upgrade a database in place (without user intervention)
* remove the xbatch-size option
Instead of using ~/.mu, use the XDG Base Directory Specification, typically:
~/.cache/xapian
~/.cache/mu.log
~/.cache/parts
~/.config/bookmarks
Update dependencies, documentation.
It seems gmime passes them on, and that causes havoc with our contacts cache.
Bump database schema version to force a rebuild (since that's what's
required).
Rewrite the contacts-cache backend in C++
Store the contacts as metadata in the Xapian database, rather than in a
separate file.
Update the Store to deal with this.
Consider all 'inline' text parts attachments too, unless they're
'text/plain' or something that looks like a signature.
It's a heuristic, so we might get some new corner cases; let's see.
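The heuristic can be sketched roughly as follows. This is a minimal sketch, not mu's code: the name `treat_as_attachment` is hypothetical, MIME types are assumed to be already lowercased, and "something that looks like a signature" is assumed to mean a MIME type ending in `signature` (such as `application/pgp-signature`).

```cpp
#include <string>

// Hypothetical helper: does `s` end with `suffix`?
static bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Sketch of the heuristic: explicit attachments stay attachments; 'inline'
// parts count as attachments too, unless they are text/plain or look like a
// signature (assumed here: a MIME type ending in "signature").
bool treat_as_attachment(const std::string& disposition,
                         const std::string& mime_type) {
    if (disposition == "attachment")
        return true;
    if (disposition != "inline")
        return false;
    if (mime_type == "text/plain" || ends_with(mime_type, "signature"))
        return false;
    return true;
}
```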
Some mailing lists do _not_ set Reply-To; see e.g.,
https://github.com/djcb/mu/pull/1278
In that case, use the 'List-Post' address instead, so the behavior (in
mu4e) is the same as for other mailing lists.
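A minimal sketch of the fallback, assuming the List-Post value has the RFC 2369 form `<mailto:list@example.com>` (possibly with a `?subject=...` suffix, or the literal `NO`). The function names are hypothetical, and the `mailto:` match is case-sensitive here for brevity.

```cpp
#include <optional>
#include <string>

// Extract the address from a List-Post value such as "<mailto:list@example.com>".
// Returns nullopt for "NO" or anything this sketch cannot parse.
std::optional<std::string> list_post_address(const std::string& value) {
    const std::string prefix = "<mailto:";
    const auto start = value.find(prefix);
    if (start == std::string::npos)
        return std::nullopt;
    const auto begin = start + prefix.size();
    const auto end = value.find_first_of("?>", begin); // stop at ?subject=... or '>'
    if (end == std::string::npos || end == begin)
        return std::nullopt;
    return value.substr(begin, end - begin);
}

// Hypothetical helper: prefer Reply-To, else fall back to List-Post.
std::optional<std::string> effective_reply_to(const std::string& reply_to,
                                              const std::string& list_post) {
    if (!reply_to.empty())
        return reply_to;
    return list_post_address(list_post);
}
```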
We got some errors when some of the key values exceeded the Xapian
maximum term length; in particular, the message-id.
So make all the key methods check, and truncate the message-id if
necessary.
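A sketch of such a check, assuming a limit of 245 bytes per term (the limit Xapian's glass/chert backends enforce). The helper name and the one-letter prefix in the usage are hypothetical. Byte-wise truncation is acceptable for message-ids, which are ASCII; it could split a multi-byte UTF-8 character in other fields.

```cpp
#include <string>

// Assumed limit: Xapian's glass/chert backends reject terms longer than 245 bytes.
constexpr std::size_t MAX_TERM_LEN = 245;

// Build a prefixed term, truncating the value so that the whole term fits.
std::string make_term(const std::string& prefix, const std::string& value) {
    std::string term = prefix + value;
    if (term.size() > MAX_TERM_LEN)
        term.resize(MAX_TERM_LEN);
    return term;
}
```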
Perform threading calculation on related set instead of entire result.
The current threading algorithm is applied to the entire result of a query, even
if maxnum is specified, and then the result of the threading algorithm is
truncated to maxnum. This improves threading results by returning the entire
thread even when only a single message makes it into the top maxnum results.
This commit applies the threading algorithm to the related message set of the
maxnum-truncated query result instead of to the entire query result. For a given
set of messages, the set of messages that will share threads with any of the
original messages is exactly the related message set. Put another way, any
messages returned by the original query but removed by the maxnum truncation
either will also be returned by the related message query, or would not have
been needed anyway because they would not be members of any visible thread.
To maintain backward compatibility and allow threading to be used without
including related messages, the related message set is found for the threading
calculation, but any messages which would not have matched the original query
are then pruned, resulting in a superset of the truncated query, but a subset of
the untruncated query.
This does not improve (or degrade) the run time of a threading calculation when
maxnum is not set, but significantly improves it when maxnum is set, by making
it scale (roughly) linearly in terms of maxnum. On a maildir with ~200k messages
and maxnum set to 500 (the default), the run time of a threading calculation is
lowered from ~1m to ~0.1s.
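The set logic of the threading change can be sketched as follows. This is a simplified model, not mu's code: `Msg`, its fields and `messages_to_thread` are hypothetical, and in mu thread membership and query matching come from Xapian rather than from plain fields. The threading step itself is omitted.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified message record.
struct Msg {
    int docid;
    std::string thread_id;
    bool matches_query; // would this message match the original query?
};

// 1. Take the maxnum-truncated query result.
// 2. Expand it to the related set: every message sharing a thread with it.
// 3. (Thread that related set; omitted here.)
// 4. Prune messages that would not have matched the original query.
std::vector<Msg> messages_to_thread(const std::vector<Msg>& all,
                                    const std::vector<Msg>& truncated) {
    std::set<std::string> threads;
    for (const auto& m : truncated)
        threads.insert(m.thread_id);

    std::vector<Msg> related;
    for (const auto& m : all)
        if (threads.count(m.thread_id))
            related.push_back(m);

    std::vector<Msg> pruned;
    for (const auto& m : related)
        if (m.matches_query)
            pruned.push_back(m);
    return pruned; // superset of `truncated`, subset of the untruncated result
}
```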
We were transforming wild-card searches into regular-expression
searches; while that works, it's also significantly slower.
So, instead, special-case wildcards, and use the Xapian machinery for
wildcard queries.
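Why a trailing wildcard is cheaper than a regular expression can be illustrated with a sorted term list: all terms starting with a prefix form one contiguous range, so `foo*` needs only a range scan, whereas a regexp must be tested against every term. This is a hypothetical illustration, not Xapian's implementation; Xapian 1.4 exposes its own wildcard expansion as `Xapian::Query::OP_WILDCARD`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Expand a trailing-'*' pattern such as "foo*" against a sorted term list.
// Terms sharing a prefix are contiguous, starting at lower_bound(prefix).
std::vector<std::string> expand_wildcard(const std::vector<std::string>& sorted_terms,
                                         const std::string& pattern) {
    std::vector<std::string> hits;
    if (pattern.empty() || pattern.back() != '*')
        return hits;
    const std::string prefix = pattern.substr(0, pattern.size() - 1);
    auto it = std::lower_bound(sorted_terms.begin(), sorted_terms.end(), prefix);
    for (; it != sorted_terms.end() && it->compare(0, prefix.size(), prefix) == 0; ++it)
        hits.push_back(*it);
    return hits;
}
```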
In the olden days, we stored dates like e.g. 20180131121234, and did a
lexicographical check. With that, we could use e.g. upper limits like
201802312359 for "all dates in Feb 2018", even if Feb doesn't have 31
days.
However, nowadays we use time_t values, and g_date_time_new_local raises
errors for non-existent days; the easiest fix is to massage things a bit,
so let's do that.
Fixes issue #1197.
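One way to "massage things" is to clamp an out-of-range day to the last real day of its month before building the date. This is a minimal sketch assuming that approach; the helper names are hypothetical and `month` is assumed to be in 1..12.

```cpp
// Days in a month, with Gregorian leap-year rules.
int days_in_month(int year, int month) {
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    if (month == 2) {
        const bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
        return leap ? 29 : 28;
    }
    return days[month - 1];
}

// Clamp an upper-limit day such as 2018-02-31 to the last real day of the
// month (2018-02-28), so that a strict date constructor accepts it.
int clamp_day(int year, int month, int day) {
    const int last = days_in_month(year, month);
    return day > last ? last : day;
}
```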
For now, don't treat "and not" specially; this gets us back into a
somewhat working state. At some point, we probably _do_ want to
special-case and_not though (since Xapian supports it).
mu's query parser is the piece of software that turns your queries
into something the Xapian database can understand. So, if you query
"maildir:/inbox and subject:bla" this must be translated into a
Xapian::Query object which will retrieve the sought-after messages.
Since mu's beginning, almost a decade ago, this parser was based on
Xapian's default Xapian::QueryParser. It works okay, but wasn't really
designed for the mu use-case, and had a bit of trouble with anything
that's not A..Z (think: spaces, special characters, unicode etc.).
Over the years, mu added quite a bit of pre-processing trickery to
deal with that. Still, there were corner cases and bugs that were
practically unfixable.
The solution to all of this is to have a custom query processor that
replaces Xapian's, and write it from the ground up to deal with the
special characters etc. I wrote one, as part of my "future, post-1.0
mu" research project, and I have now backported it to mu 0.9.19.
From a technical perspective, this is a major cleanup, and allows us
to get rid of much of the fragile preprocessing both for indexing and
querying. From an end-user perspective this (hopefully) means that
many of the little parsing issues are gone, and it opens the way for
some new features.
From an end-user perspective:
- better support for special characters.
- regexp search! yes, you can now search for regular expressions, e.g.
      subject:/h.ll?o/
  will find subjects with hallo, hello, halo, philosophy, ...
  As you can imagine, this can be a _heavy_ operation on the database,
  and might take quite a bit longer than a normal query; but it can be
  quite useful.
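The regexp behavior, and why it is heavy, can be illustrated with `std::regex`: the pattern matches anywhere in a term (hence "philosophy"), and since a regexp cannot use the index's sorted order, every candidate term has to be tested. This is an illustration with a hypothetical helper, not mu's matcher.

```cpp
#include <regex>
#include <string>
#include <vector>

// Test every term against the pattern; unanchored, so it matches anywhere.
std::vector<std::string> regex_matches(const std::vector<std::string>& terms,
                                       const std::string& pattern) {
    const std::regex rx(pattern);
    std::vector<std::string> hits;
    for (const auto& t : terms)
        if (std::regex_search(t, rx))
            hits.push_back(t);
    return hits;
}
```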
When you have multiple mu home directories, e.g. for the use case
detailed in my "Changing mu4e-{maildir,mu-home} from a context hook"
post to the mailing list, it's quite inconvenient to have to hammer out
"mu --muhome=.. find .." every time you want to run some ad-hoc
command.
This allows me to set up a screen session where I do searches in mu
directory A in some screen panes, and searches in directory B in
others.
I initially called this MU_MUHOME but then I noticed that the perl
plugin has MUP_MU_HOME for analogous functionality, so I'm just
following its example.
The code I'm adding in mu-util.c is just a copy/paste & adjustment of
the same sort of already tested functionality in
mu_util_guess_maildir() just a few lines earlier.
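A sketch of the lookup order this enables. The variable name is not spelled out above: `MU_HOME` is an assumption based on the perl plugin's `MUP_MU_HOME`, and so is the precedence (an explicit `--muhome` first, then the environment, then a default). `resolve_muhome` is a hypothetical name.

```cpp
#include <cstdlib>
#include <string>

// Explicit --muhome wins, then the (assumed) MU_HOME variable, then a fallback.
std::string resolve_muhome(const char* cmdline_muhome, const std::string& fallback) {
    if (cmdline_muhome && *cmdline_muhome)
        return cmdline_muhome;
    if (const char* env = std::getenv("MU_HOME")) {
        if (*env)
            return env;
    }
    return fallback;
}
```

With this, each screen pane can export its own value once and run plain `mu find ...`.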
Otherwise, when the mu session is killed, these child processes are
killed too. This scenario shows up when using mu4e: a PDF attachment, for
example, is opened by Evince, but as soon as Emacs exits, Evince is also
killed.
clear_links as used for the --clear-links option had some broken
filename generation, causing garbage data at the end.
Clean up this old code, and fix this problem as a side-effect.
Fixes issue #951.
mu_util_fputs_encode was being aborted by the stack guard on OpenBSD
(seemingly only when compiled with optimization). It appears the root
cause was a difference in the sizes of the parameters passed to
g_locale_from_utf8. Fix this.
The callbacks for the contacts functions should return TRUE (or be
terminated early), but were void. It seems this usually still worked on
Linux, but not on OpenBSD at least (a unit test broke). So, fix this.
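The contract can be sketched as follows: the callback returns `true` to continue and `false` to stop early. Calling a `void` function through a bool-returning function pointer is undefined behavior; in practice the caller typically reads whatever happens to be in the return register, which may "work" on one platform and not another. `foreach_contact` and its signature here are hypothetical.

```cpp
#include <functional>
#include <string>
#include <vector>

using ContactFunc = std::function<bool(const std::string& email)>;

// Hypothetical iteration helper; returns how many contacts were visited.
std::size_t foreach_contact(const std::vector<std::string>& contacts,
                            const ContactFunc& func) {
    std::size_t visited = 0;
    for (const auto& c : contacts) {
        ++visited;
        if (!func(c)) // an explicit return value is required to continue
            break;
    }
    return visited;
}
```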
Can't say I fully understand what's going on, but it seems gpg versions
before 2 have some trouble with their agent, at least when using
gnome-session (which stopped using gnome-keyring as a gpg-agent since
Fedora 23 at least).
Sanity seems to be restored when preferring gpg2 instead. "gpg" is used
when gpg2 isn't there, and there's the MU_GPG_PATH environment variable
to override all of that.
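The lookup order can be sketched like this. `choose_gpg` is hypothetical, and `find_program` stands in for a PATH search such as GLib's `g_find_program_in_path`, returning an empty string when nothing is found.

```cpp
#include <cstdlib>
#include <functional>
#include <string>

// MU_GPG_PATH overrides all; otherwise prefer gpg2, else fall back to gpg.
std::string choose_gpg(const std::function<std::string(const std::string&)>& find_program) {
    if (const char* override_path = std::getenv("MU_GPG_PATH")) {
        if (*override_path)
            return override_path;
    }
    std::string path = find_program("gpg2");
    if (path.empty())
        path = find_program("gpg");
    return path;
}
```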
Add an option --lazy-check to ignore any directories that don't have
their ctime changed since the last indexing operation.
There are a few corner cases (such as editing a message outside mu's
control) where this might miss a change, but apart from that, it makes
indexing a maildir (and its sub-maildirs) almost a no-op if there
were no changes.
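A minimal sketch of the check, using POSIX `stat`. The helper name is hypothetical. A directory's ctime changes when entries are added, removed or renamed, but not when a file inside it is rewritten in place, which is why an edit made outside mu's control can be missed.

```cpp
#include <ctime>
#include <string>
#include <sys/stat.h>

// Skip a directory whose ctime is not newer than the last indexing run.
bool can_skip_dir(const std::string& dir, std::time_t last_indexed) {
    struct stat st;
    if (stat(dir.c_str(), &st) != 0)
        return false; // can't tell; be safe and scan it
    return st.st_ctime <= last_indexed;
}
```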
Improve the function ``cleanup_filename()`` in ``lib/mu-msg-part.c`` to
use Unicode characters when replacing control characters, slashes
and colons with ``-``.
Originally, this function just used plain C characters (i.e., assuming an
ASCII string) when checking whether each character is a control character,
slash or colon. However, when the attachment filename contains non-ASCII
characters (e.g., Chinese characters), all the non-ASCII characters are
replaced with ``-``.
For example:
* Before:
```
> mu view test_chinese_attachment_filename.eml
From: Tester <tester@example.com>
To: Example <example@example.com>
Subject: Test email with attachment of Chinese filename
Date: Mon 23 May 2016 05:22:09 PM CST
Attachments: 'attachment-test.txt', '------------.txt', '-------test.txt'
Hello,
This is a simple test email with three attachments:
1. `attachment:test.txt`: filename is all English;
2. `测试附件.txt`: filename is all Chinese (exclude the extension);
3. `附件-test.txt`: filename mixes Chinese and English.
```
* After:
```
> ./build/mu/mu/mu view test_chinese_attachment_filename.eml
From: Tester <tester@example.com>
To: Example <example@example.com>
Subject: Test email with attachment of Chinese filename
Date: Mon 23 May 2016 05:22:09 PM CST
Attachments: 'attachment-test.txt', '测试附件.txt', '附件-test.txt'
Hello,
This is a simple test email with three attachments:
1. `attachment:test.txt`: filename is all English;
2. `测试附件.txt`: filename is all Chinese (exclude the extension);
3. `附件-test.txt`: filename mixes Chinese and English.
```
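The patch works with Unicode characters. An alternative that achieves the same effect for UTF-8 input works byte by byte: every byte of a multi-byte UTF-8 sequence is >= 0x80, so replacing only ASCII control characters, `/` and `:` leaves characters like 测试附件 intact. This is a sketch of that alternative, not the patched function, and it does not handle C1 control characters (U+0080..U+009F).

```cpp
#include <string>

// Replace ASCII control characters, '/' and ':' with '-'; keep bytes >= 0x80.
std::string cleanup_filename(const std::string& name) {
    std::string out = name;
    for (auto& ch : out) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x20 || c == 0x7f || c == '/' || c == ':')
            ch = '-';
    }
    return out;
}
```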
Add a user-agent property to the full message sexps (i.e., the ones
available in mu4e-view). This property contains either the User-Agent or
X-Mailer string (and is absent otherwise).
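The selection rule can be sketched as below. Headers are modeled as a plain `std::map` for brevity; real header names are case-insensitive. `user_agent` is a hypothetical name, and serializing the result into the sexp is not shown.

```cpp
#include <map>
#include <optional>
#include <string>

// Prefer User-Agent, fall back to X-Mailer, else leave the property absent.
std::optional<std::string> user_agent(const std::map<std::string, std::string>& headers) {
    for (const char* name : {"User-Agent", "X-Mailer"}) {
        auto it = headers.find(name);
        if (it != headers.end() && !it->second.empty())
            return it->second;
    }
    return std::nullopt;
}
```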
It seems people are getting really big mails these days, so let's up the
default (which is also what mu4e uses) to 500 MB (which should be enough
for everyone, always).
mu: cleanup server side; make sure not to lose the 'personal' flag when
seeing the same contact in a non-personal context
mu4e: tweak the sorting algorithm a bit to take the personal flag into
account
Doing:
!access(...) == 0
Is equivalent to:
(!access(...)) == 0
Not:
!(access(...) == 0)
And throws this warning under clang:
mu-store.cc:77:6: warning: logical not is only applied to the left hand side of this comparison [-Wlogical-not-parentheses]
        if (!access(xpath, F_OK) == 0) {
            ^                    ~~
mu-store.cc:77:6: note: add parentheses after the '!' to evaluate the comparison first
        if (!access(xpath, F_OK) == 0) {
            ^
             (                       )
mu-store.cc:77:6: note: add parentheses around left hand side expression to silence this warning
        if (!access(xpath, F_OK) == 0) {
            ^
            (                   )
It ends up doing what the author intended anyway, since access() returns
-1 on error and !-1 == 0; but let's just do the more obvious thing and
check with != that we don't get 0 here.
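The two forms side by side. The helper names are hypothetical; the old expression is written with the explicit parentheses clang suggests, which shows how it actually parses and keeps the warning quiet.

```cpp
#include <unistd.h>

// Corrected check: access() returns 0 on success and -1 on failure.
bool path_missing(const char* path) {
    return access(path, F_OK) != 0;
}

// How the old expression parsed: (!access(...)) == 0. For a return value
// of -1, !(-1) is 0 and 0 == 0 holds, so it happened to give the same answer.
bool path_missing_old(const char* path) {
    return (!access(path, F_OK)) == 0;
}
```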
Some users reported seeing get_uid_term high in the profiles, so
optimize this:
- make mu_util_get_hash a static inline function (used by get_uid_term)
- don't use 'realpath' in get_uid_term; that seems to be the main culprit
- slightly faster string handling there too.
It seems some tools try to interpret the filename of message files,
even though they shouldn't:
"Do not try to extract information from unique names."
In particular, they seem to interpret the first part of the name (before
the first dot) as the number of seconds since the Unix epoch (i.e.,
time(NULL)). That's not what mu/mu4e put there.
So, let's conform a bit more to the expected filename (as per the
maildir spec), so we're not confusing those tools.
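A sketch of a unique name whose first component is `time(NULL)`, loosely following the `time.MusecPpid.host` style the maildir spec describes. The function and its parameters are hypothetical, and a real implementation must also escape `/` and `:` in the hostname.

```cpp
#include <ctime>
#include <string>

// The part before the first dot is seconds since the epoch, as tools expect.
std::string maildir_unique_name(std::time_t secs, long usecs, long pid,
                                const std::string& host) {
    return std::to_string(static_cast<long long>(secs)) + ".M" +
           std::to_string(usecs) + "P" + std::to_string(pid) + "." + host;
}
```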
The core dump only seems to occur if mu4e-headers-include-related is
set to t.
Apparently, std::string's c_str() method is confusing to many
people; cf.
http://stackoverflow.com/questions/22330250/how-to-return-a-stdstring-c-str
The answer seems to be that the pointer c_str() returns is only valid
while the string it came from is alive and unmodified. When that string
is a temporary (e.g. a std::string returned by value), the pointer is not
valid past the current statement; returning it, or even using it
subsequently, can have you sending a wild pointer into e.g. g_strdup().
In short, it seems idioms like this are okay:
return g_strcmp0 (s1.c_str(), s2.c_str()) < 0;
Whereas idioms like this are not:
const char *msgid (iter->msgid().c_str());
return msgid ? g_strdup (msgid) : NULL;
At least in my environment by the time we get to g_strdup() the
pointer returned by c_str() is wild and points at garbage. Since
g_strdup() returns NULL if passed NULL, it seems collapsing it into a
single line is not only possible but necessary.
I've looked at all of the calls to c_str() in mu and it appears to
me this was the one remaining one that was bad.
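The safe and unsafe patterns side by side, assuming (hypothetically) an iterator whose `msgid()` returns `std::string` by value. POSIX `strdup` stands in for `g_strdup` to keep the sketch free of GLib. Note that `c_str()` never returns a null pointer, so a null check on its result is redundant.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical iterator whose msgid() returns a temporary std::string.
struct Iter {
    std::string id;
    std::string msgid() const { return id; }
};

// Broken: the temporary dies at the end of the first statement, so `p` dangles.
//   const char* p = iter.msgid().c_str();
//   return p ? strdup(p) : nullptr;

// OK: use c_str() within the same full-expression as the temporary...
char* dup_msgid(const Iter& iter) {
    return strdup(iter.msgid().c_str());
}

// ...or keep the string alive in a named variable first.
char* dup_msgid_named(const Iter& iter) {
    const std::string msgid = iter.msgid();
    return strdup(msgid.c_str());
}
```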
The test fails in some cases with interesting directory setups, although
the function does work. So deactivate the test for now, until we come
up with a better one.
Since `parent` is not really used as a parent, I use it as the last
visited encrypted part while going down the parts-tree.
At the decryption of a part (`mu_msg_crypto_decrypt_part`) I check,
through the GMimeDecryptResult, for signatures (`check_decrypt_result`)
and add them to the part (`tag_with_sig_status`). Any nested parts hold
that encrypted part as their parent. Finally, in `handle_part`, for each
part I check whether it is a descendant of an encrypted part. If so, I
proceed to check for signatures and add them to the `msgpart`.
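The descendant check can be sketched with a simplified part tree in which `parent` points at the last visited encrypted part, as described above. `Part` and `encrypted_ancestor` are hypothetical; the real code works on GMime objects.

```cpp
// Hypothetical, simplified part: `parent` is the nearest encrypted part above it.
struct Part {
    const Part* parent = nullptr;
    bool encrypted = false;
};

// Walk up the chain; a part with an encrypted ancestor should get that
// ancestor's signature status.
const Part* encrypted_ancestor(const Part& part) {
    for (const Part* p = part.parent; p; p = p->parent)
        if (p->encrypted)
            return p;
    return nullptr;
}
```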
This reverts commit 6e9b9ad2d0.
Unfortunately the reverted commit breaks the Signature field for
encrypted and, at the same time, signed messages.
TODO: details button in the Signatures field does not work for such
cases because the signature is encrypted.
Conflicts:
lib/mu-msg-part.c
Add a decryption field of the form
Decryption: 2 part(s) decrypted 1 part(s) failed
Meaning that 2 encrypted mime parts were successfully decrypted and 1
part failed. Note that the number 2 refers to the number of
successfully decrypted mime parts and not the number of successfully
decrypted encrypted multiparts, i.e., if an encrypted multipart
contains 4 parts and decryption is successful, the field will be
Decryption: 4 part(s) decrypted
TODO: Add details button listing the names and indexes of the
decrypted (or not) mime-parts
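Producing the field from two counters can be sketched as below. The function is hypothetical; it only reproduces the two example strings above, and omitting the "failed" part when nothing failed is inferred from the second example.

```cpp
#include <string>

// Counts are of MIME parts, not of encrypted multiparts.
std::string decryption_field(unsigned decrypted, unsigned failed) {
    std::string field = "Decryption: " + std::to_string(decrypted) + " part(s) decrypted";
    if (failed > 0)
        field += " " + std::to_string(failed) + " part(s) failed";
    return field;
}
```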
Pull request #483 does not handle encrypted multiparts properly. It
used to just verify the signature and not process the parts of the
multipart. This commit resolves this issue.
Additionally, it did not index attachments properly, and in the case of a
multipart directly containing more than one multipart, it resulted in
non-unique indexing of attachments/parts. This commit resolves this issue
as well.
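Unique indexing across nested multiparts comes down to one shared counter during a depth-first walk, so sibling multiparts never restart numbering. This is a hypothetical sketch of that idea, not the code in the commit.

```cpp
#include <vector>

// Hypothetical MIME tree: a multipart has children, a leaf has none.
struct Node {
    std::vector<Node> children;
    int index = -1;
};

// Number every part depth-first with a single counter shared by the whole tree.
void assign_indices(Node& node, int& next) {
    node.index = next++;
    for (auto& child : node.children)
        assign_indices(child, next);
}
```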
This patch fixes the attachment extraction (open, save, temp) when using
`mu4e`. `mu4e` did not notify the mu-server about the
mu4e-decryption-policy. As a result, mu-server did not decrypt the
attachments for extract, open, or temp.
After a multipart/encrypted part gets decrypted the result is usually a
`multipart/mixed` part (see enigmail).
Before this commit, mime multiparts were handled only by
`g_mime_message_foreach`. As a result, the decrypted mime multiparts
were not processed.
This patch handles mime multiparts explicitly by removing the
`g_mime_message_foreach` invocations. This might come at the cost of
reduced maintainability, in the case of radical gmime changes. However,
gmime is pretty stable and that scenario is highly unlikely.
TODO: After decryption make any attachments available
When the root set contains only one empty container with one child
first promote the child container to the root set and only then
remove the empty parent container so that the root set never goes
empty.
Also make mu_container_splice_children() do only one thing, that is
promote one container's children to be another container's siblings.
The resultant childless container is no longer removed by this
function.
Fixes #460.
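The promote-then-remove order can be sketched with shared pointers. `Container`, `promote_only_children` and the use of a null `msg` to mark an empty container are hypothetical simplifications; because each empty root is replaced in place by its only child, the root set keeps its size and never goes empty.

```cpp
#include <memory>
#include <vector>

// Hypothetical container: `msg` is null for an empty (placeholder) container.
struct Container {
    const char* msg = nullptr;
    std::vector<std::shared_ptr<Container>> children;
};
using Ptr = std::shared_ptr<Container>;

// Replace each empty root with exactly one child by that child.
void promote_only_children(std::vector<Ptr>& roots) {
    for (auto& root : roots) {
        while (!root->msg && root->children.size() == 1) {
            Ptr child = root->children.front(); // hold the child first...
            root = child;                       // ...then drop the empty parent
        }
    }
}
```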
This test reproduces a regression introduced by commit 97101f1f82
("mu: Prune empty container when an only child gets promoted to the
root set").
When the results of a mu-find query contain only one thread:
$ mu find --threads --fields 'd s' ''
Sat 09 Aug 2014 07:00:00 PM CEST [mu4e] Test Message
`-> Sat 09 Aug 2014 08:30:00 PM CEST Re: [mu4e] Test Message
... and we narrow down the query in such a way that the root message
gets excluded, then a crash occurs:
$ mu find --threads --fields 'd s' '' date:2014-08-09/20:00..2014-08-09/21:00
**
ERROR:mu-container.c:117:mu_container_append_siblings: assertion failed: (c)
Aborted (core dumped)
Reported-by: Josiah Schwab <jschwab@gmail.com>
Traverse the container tree depth first and for each container find
the node in the subtree rooted at this container which comes first in
the descending sort order. Remember it as the subtree leader. Then,
while sorting siblings, compare their subtree leaders instead of the
sibling containers themselves.
IOW, make threads containing the newest message float to the top when
sorting by date in the descending order.
There is no significant performance degradation when sorting a
mailbox with ~16k messages:
$ mu find maildir:/INBOX | wc -l
16503
Current state:
$ perf stat --event=task-clock --repeat=10 -- \
mu find maildir:/INBOX -n 1 -t > /dev/null
Performance counter stats for 'mu find maildir:/INBOX -n 1 -t' (10 runs):
1231.761588 task-clock (msec) # 0.996 CPUs utilized ( +- 1.02% )
1.236209133 seconds time elapsed ( +- 1.08% )
With patch applied:
$ perf stat --event=task-clock --repeat=10 -- \
mu find maildir:/INBOX -n 1 -t > /dev/null
Performance counter stats for 'mu find maildir:/INBOX -n 1 -t' (10 runs):
1459.883316 task-clock (msec) # 0.998 CPUs utilized ( +- 0.72% )
1.462540088 seconds time elapsed ( +- 0.77% )
This implements https://github.com/djcb/mu/issues/164.
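The subtree-leader idea can be sketched as follows (this change is reverted further down in this log). `TNode` is a hypothetical simplification with a date of 0 for empty containers, and the root set is modeled as the children of a virtual root node.

```cpp
#include <algorithm>
#include <ctime>
#include <vector>

// Hypothetical thread node.
struct TNode {
    std::time_t date = 0;
    std::vector<TNode> children;
    std::time_t leader = 0; // newest date anywhere in this subtree
};

// Depth-first: compute each subtree's leader, then sort siblings by their
// leaders, newest first, so threads with the newest message float to the top.
std::time_t sort_by_leader(TNode& node) {
    node.leader = node.date;
    for (auto& child : node.children)
        node.leader = std::max(node.leader, sort_by_leader(child));
    std::sort(node.children.begin(), node.children.end(),
              [](const TNode& a, const TNode& b) { return a.leader > b.leader; });
    return node.leader;
}
```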
This reverts commit c7b28419ab.
The reverted change fails to sort threads correctly when there is an
empty container, serving as a parent to orphan messages, in the thread
tree as demonstrated by the test in commit f49296759e ("tests:
threads: Test if orphan message promotes its thread").
Also, the reverted commit introduces a performance hit. The time it
takes to sort threads has increased roughly by a factor of 4.
Current state:
$ perf stat --event=task-clock --repeat=10 -- \
mu find maildir:/INBOX -n 1 -t > /dev/null
Performance counter stats for 'mu find maildir:/INBOX -n 1 -t' (10 runs):
4967.692519 task-clock (msec) # 1.000 CPUs utilized ( +- 0.14% )
4.969247128 seconds time elapsed ( +- 0.14% )
With the reverted patch applied:
$ perf stat --event=task-clock --repeat=10 -- \
mu find maildir:/INBOX -n 1 -t > /dev/null
Performance counter stats for 'mu find maildir:/INBOX -n 1 -t' (10 runs):
1231.761588 task-clock (msec) # 0.996 CPUs utilized ( +- 1.02% )
1.236209133 seconds time elapsed ( +- 1.08% )
The benchmark was run on a maildir with ~16k messages:
$ mu find maildir:/INBOX | wc -l
16503
When processing multiple lines for a subject line separated by TAB
characters we don't want to eliminate the control character totally but
replace it with a simple space. I've left the control handling as before
for non-white space characters.
Signed-off-by: Alex Bennée <alex@bennee.com>
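A sketch of the described behavior: whitespace control characters become a space, and other control characters are dropped. That dropping them is the earlier behavior is an assumption read from the text. `clean_subject` is a hypothetical name, and `std::iscntrl` runs in the default "C" locale, so UTF-8 bytes are kept.

```cpp
#include <cctype>
#include <string>

std::string clean_subject(const std::string& in) {
    std::string out;
    for (char ch : in) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c == '\t' || c == '\n' || c == '\r')
            out += ' ';       // whitespace control characters become a space
        else if (!std::iscntrl(c))
            out += ch;        // other control characters are dropped
    }
    return out;
}
```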
it seems g_locale_from_utf8 behaves a bit differently on bsd/macosx,
causing a segfault (but when run under gdb!). this code path was hit
for messages with encoding problems in non-utf8 locales
on macos this function /seemed/ to massively leak, when looking at the
valgrind output on macos (but not linux). with this update, this
leak(?) is gone.
The problem was that once a container got a parent, it did not change it anymore
due to the child_elligible condition, but the parent might have been assigned
from an incomplete References sequence.
Now, we make sure the last reference gets to be the message's parent (following
JWZ's algorithm), reparenting the message if necessary. This makes sense, as
the last parent-child relationship (between last ref and the message) is the
most reliable piece of info here.
Instead of child_elligible, we now only check that the new parent is not a
descendant of the current message, to prevent making a loop. Everything else is
fine, as it only moves a subtree around.
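The reparenting rule can be sketched as below: always take the last reference as parent, moving the message's subtree if needed, and refuse only when the new parent lies inside that subtree. `Cont`, `in_subtree` and `set_parent` are hypothetical simplifications of mu's containers.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical container with parent/child links.
struct Cont {
    Cont* parent = nullptr;
    std::vector<Cont*> children;
};

// Is `node` inside the subtree rooted at `root` (root itself included)?
bool in_subtree(const Cont* node, const Cont* root) {
    for (; node; node = node->parent)
        if (node == root)
            return true;
    return false;
}

// Make `parent` the parent of `msg`, detaching msg from any previous parent;
// refuse only if `parent` is msg itself or one of its descendants (a loop).
bool set_parent(Cont* msg, Cont* parent) {
    if (in_subtree(parent, msg))
        return false;
    if (msg->parent) {
        auto& sibs = msg->parent->children;
        sibs.erase(std::remove(sibs.begin(), sibs.end(), msg), sibs.end());
    }
    msg->parent = parent;
    parent->children.push_back(msg);
    return true;
}
```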
mu_container_append_siblings was showing up high in profiles as it has to
walk chains of next->next->next->... pointers to find the last one. we now
cache the last link in the chain. for listing ~ 23K messages, this saves
about 20%.
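Caching the tail can be sketched with a hypothetical sibling chain. The assumption here is that chains are only built through this function, so the cached `last` on a chain's head is always up to date, and `other` is either a chain head or a single node.

```cpp
// Hypothetical singly linked sibling chain; `last` is cached on the head.
struct Sib {
    int id;
    Sib* next = nullptr;
    Sib* last = nullptr;
};

// Append `other` (and whatever follows it) in O(1) instead of walking next->next->...
void append_siblings(Sib* head, Sib* other) {
    Sib* tail = head->last ? head->last : head;
    tail->next = other;
    head->last = other->last ? other->last : other;
}
```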
use an option enum instead of boolean args for code clarity; allow for
printing an \n before logging to tty (improved mu-index output). allow
for color in logging to tty
(since N (new) messages cannot have any other flags, you would lose
e.g. the T flag when moving to trash; now, we remove the N flag, and the T
flag remains)
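The rule can be sketched with bit flags. The flag names and values are hypothetical: when any flag other than New ends up set, New is cleared instead of the other flags being dropped.

```cpp
#include <cstdint>

// Hypothetical flag bits.
enum Flags : std::uint32_t {
    FLAG_NONE    = 0,
    FLAG_NEW     = 1u << 0, // message in new/, which cannot carry other flags
    FLAG_SEEN    = 1u << 1,
    FLAG_TRASHED = 1u << 2,
};

// If anything besides New is set, drop New so that the other flags survive.
std::uint32_t add_flags(std::uint32_t current, std::uint32_t extra) {
    std::uint32_t flags = current | extra;
    if (flags & ~static_cast<std::uint32_t>(FLAG_NEW))
        flags &= ~static_cast<std::uint32_t>(FLAG_NEW);
    return flags;
}
```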
which, if true, means that the contact was seen in a message where at least
one of the addresses in the recipients field was 'my' address (this is
decided in mu-store-write.cc). Using this, we can exclude mailing list
posts.
- update the protocol a bit (mu4e-proc, mu-cmd-server)
- provide the user-interface (mu4e-headers.el)
- document it (mu4e.texi, mu-server.1)
- some cosmetics (the other changes)
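The 'personal' test described above can be sketched as follows. `is_personal` is hypothetical, and it compares addresses exactly, whereas real email addresses need case-insensitive comparison at least in the domain part.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// A contact is 'personal' if seen in a message where at least one recipient
// is one of the user's own addresses; list posts addressed only to the list
// don't qualify.
bool is_personal(const std::vector<std::string>& recipients,
                 const std::vector<std::string>& my_addresses) {
    return std::any_of(recipients.begin(), recipients.end(),
                       [&](const std::string& r) {
                           return std::find(my_addresses.begin(), my_addresses.end(), r)
                                  != my_addresses.end();
                       });
}
```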