Xapian supports an "ngrams" option to help with languages/scripts
without explicit wordbreaks, such as Chinese / Japanese / Korean.
Add some plumbing for supporting this in mu as well. Experimental for
now.
When passing messages to mu, we often took the message-sexp from the
message document (parsed from its string form), then appended some more
properties ("build_message_sexp").
Instead, we can do this directly on the strings; this is _a little_
inelegant, but also much faster. Compare:
(base)
[mu4e] Found 500 matching messages; 0 hidden; search: 1298.0 ms (2.60 ms/msg); render: 642.1 ms (1.28 ms/msg)
(with temp-file optimization (earlier commit))
[mu4e] Found 500 matching messages; 0 hidden; search: 1152.7 ms (2.31 ms/msg); render: 270.1 ms (0.54 ms/msg)
(with temp-file optimization _and_ the string optimization (this commit))
[mu4e] Found 500 matching messages; 0 hidden; search: 266.0 ms (0.53 ms/msg); render: 199.7 ms (0.40 ms/msg)
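As a rough illustration of the string-based approach, the sketch below
appends extra properties to an already-serialized plist without parsing
it first. The helper name append_props and the property names used with
it are hypothetical; this is not the code from this commit.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: append extra properties to a serialized plist such as
// "(:docid 1 :subject \"hi\")" without parsing it into a sexp object first.
// Assumes `sexp` is a well-formed list; if it does not end in ')', or if
// there is nothing to append, the input is returned unchanged.
std::string append_props(std::string_view sexp, std::string_view props)
{
    // Look past trailing whitespace so that "(...)\n" is accepted as well.
    const auto last = sexp.find_last_not_of(" \t\r\n");
    if (last == std::string_view::npos || sexp[last] != ')' || props.empty())
        return std::string{sexp};

    // Copy everything before the final ')'.
    std::string out{sexp.substr(0, last)};
    // Add a separating space unless the list was empty, i.e. just "(".
    if (out.size() > 1)
        out += ' ';
    out += props;
    out += ')';
    return out;
}
```

For example, append_props("(:docid 1)", ":flags (seen)") yields
"(:docid 1 :flags (seen))". The gain comes from skipping the
parse/serialize round trip; the cost is that the result is only as
well-formed as the two input strings.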
This makes queries where we don't need the sexp much faster; e.g.
before:
mu find "a" --include-related 47,51s user 2,68s system 99% cpu 50,651 total
after:
mu find "a" --include-related 7,12s user 1,97s system 87% cpu 10,363 total
We were dumping the HTML-parts as-is in the Xapian indexer; however,
it's better to remove the html decoration first, and just pass the text.
We use the new built-in html->text scraper for that.
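To give an idea of what "remove the html decoration" means, here is a
deliberately naive sketch that drops tags and collapses whitespace. The
function name naive_html_to_text is hypothetical, and mu's built-in
scraper is not shown here; a real scraper also has to deal with
entities, comments and script/style blocks, which this sketch ignores.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical, simplified html->text conversion: drop everything between
// '<' and '>', treat each tag as a word boundary, and collapse runs of
// whitespace into a single space. Entities, comments and <script>/<style>
// contents are NOT handled.
std::string naive_html_to_text(std::string_view html)
{
    std::string text;
    bool in_tag = false;

    // Append a single space unless we are at the start or already have one.
    const auto add_space = [&text] {
        if (!text.empty() && text.back() != ' ')
            text += ' ';
    };

    for (const char c : html) {
        if (c == '<') {
            in_tag = true;
        } else if (c == '>' && in_tag) {
            in_tag = false;
            add_space(); // a tag separates words, e.g. "a<br>b" -> "a b"
        } else if (in_tag) {
            continue; // skip tag names and attributes
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            add_space();
        } else {
            text += c;
        }
    }

    // Remove the trailing space left by a closing tag or trailing whitespace.
    if (!text.empty() && text.back() == ' ')
        text.pop_back();
    return text;
}
```

With this sketch, "<p>Hello <b>world</b></p>" becomes "Hello world",
so the indexer sees only the words rather than tag names and attributes.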
This is a bit of a hack to include html text in results.
Of course, html text is not really plain text, so this is only a
stopgap until we introduce some html parsing step.
Previously, mu generated a fake message ID for messages without a
Message-ID header. This fake message ID allows these messages to show in
an --include-related query. However, if a message contained a Message-ID
header with the value equal to the empty string, we did not generate a
fake message ID in the index, and consequently, these messages failed to
appear in an --include-related query. This change uses a fake message ID
when the Message-ID header is absent _or_ empty.
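The absent-or-empty rule can be sketched as follows. The helper name
message_id_or_fake, the use of the message path as hash input and the
"@fake-msgid" suffix are all assumptions made for this illustration,
not necessarily what mu does.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical helper: return the Message-ID header value if it is present
// and non-empty; otherwise derive a fake ID from the message's path.
// The hashing scheme and the "@fake-msgid" suffix are assumptions.
std::string message_id_or_fake(const std::optional<std::string>& header,
                               const std::string& path)
{
    // A header that exists but is empty is treated the same as a missing one.
    if (header && !header->empty())
        return *header;

    // Same path -> same fake ID within one build of the program.
    const auto hash = std::hash<std::string>{}(path);
    return std::to_string(hash) + "@fake-msgid";
}
```

Calling this with std::nullopt and with an empty string gives the same
fake ID for the same path, so both kinds of message are indexed with an
ID and can take part in an --include-related query.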
Since 2008, autotools has served us well - thank you!
However, mu is now using meson build, and it's time to remove the
autotools support -- one build system is enough.
Clean up the implementation a bit, and filter out 'fake' message-ids,
such as the ones from protonmail.
Update documentation.
Add Mu::Message::thread_id().
This fixes #2312.