Xapian supports an "ngrams" option to help with languages/scripts
without explicit word breaks, such as Chinese, Japanese and Korean.
Add some plumbing to support this in mu as well. Experimental for
now.
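Xapian's n-gram support lives inside Xapian itself. The sketch below only illustrates the general idea behind this kind of option, using a common unigram-plus-bigram scheme: each CJK character becomes a term, and so does each adjacent pair. This means a query for a multi-character word can match without any spaces in the text. The function names and the Unicode ranges are assumptions for illustration, not Xapian's or mu's code.

```python
def is_cjk(ch: str) -> bool:
    """Rough test for CJK ideographs, kana and hangul.
    These ranges are an assumption; a real tokenizer covers more blocks."""
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x30FF      # Hiragana and Katakana
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
        or 0xAC00 <= cp <= 0xD7AF   # Hangul Syllables
    )


def cjk_ngrams(run: str) -> list[str]:
    """Unigrams for every character plus bigrams for each adjacent pair."""
    return list(run) + [run[i:i + 2] for i in range(len(run) - 1)]


def tokenize(text: str) -> list[str]:
    """Split on whitespace; inside each word, runs of CJK characters
    become n-grams while other characters form ordinary lowercased terms."""
    terms: list[str] = []
    for word in text.split():
        run, plain = "", ""
        for ch in word:
            if is_cjk(ch):
                if plain:               # end of a non-CJK stretch
                    terms.append(plain.lower())
                    plain = ""
                run += ch
            else:
                if run:                 # end of a CJK stretch
                    terms.extend(cjk_ngrams(run))
                    run = ""
                plain += ch
        if run:
            terms.extend(cjk_ngrams(run))
        if plain:
            terms.append(plain.lower())
    return terms
```

For example, `tokenize("mu 東京都")` yields `["mu", "東", "京", "都", "東京", "京都"]`. A search for 京都 can then hit the bigram term, even though the original text has no word break there.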
This makes queries that don't need the sexp much faster, e.g.
before:
mu find "a" --include-related 47,51s user 2,68s system 99% cpu 50,651 total
after:
mu find "a" --include-related 7,12s user 1,97s system 87% cpu 10,363 total
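These timings use a decimal comma, so `50,651` means 50.651 s. The improvement they imply works out as follows:

```latex
\text{speedup}_{\text{wall}} = \frac{50.651\,\text{s}}{10.363\,\text{s}} \approx 4.89,
\qquad
\text{speedup}_{\text{user}} = \frac{47.51\,\text{s}}{7.12\,\text{s}} \approx 6.67
```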
We were passing HTML parts as-is to the Xapian indexer; however,
it's better to strip the HTML markup first and pass only the text.
We use the new built-in html->text scraper for that.
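mu's built-in scraper is its own code. As a rough illustration of what an html->text step does before indexing, here is a sketch built on Python's standard `html.parser`. The scraper drops tags, skips the contents of `<script>`/`<style>`, decodes entities and collapses whitespace. `TextScraper` and `html_to_text` are hypothetical names, not part of mu.

```python
from html.parser import HTMLParser


class TextScraper(HTMLParser):
    """Collect text content, dropping tags and the contents of
    <script>/<style> elements (a rough approximation of html->text)."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; for us
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # join chunks with spaces, then collapse all whitespace runs
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    scraper = TextScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.text()
```

Joining chunks with a space keeps words from adjacent block elements apart. The trade-off is that inline markup inside a word (`<b>foo</b>bar`) gets split into two terms.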
This is a bit of a hack to include HTML text in results.
Of course, HTML text is not really plain text, so this is a stopgap
until we introduce an HTML parsing step.
1. Also add 'normal' terms for some indexable fields
2. Add terms for e-mail address components
And add some tests.
This helps for some corner-case queries (see tests).
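Below is one hypothetical way to derive terms from an e-mail address. Each component is emitted twice: once with a field prefix, for field-specific queries, and once as a plain ('normal') term. The prefix `F`, the separator set and the function names are assumptions for illustration, not mu's actual term layout.

```python
import re


def address_components(address: str) -> list[str]:
    """Full address, local part, domain, then the pieces of each split on
    '.', '+', '_' and '-' (the separator set is an assumption)."""
    addr = address.strip().lower()
    terms = [addr]
    local, sep, domain = addr.rpartition("@")
    if not sep:                     # not an address; keep it whole
        return terms
    for piece in (local, domain):
        if piece and piece not in terms:
            terms.append(piece)
    for piece in (local, domain):
        for part in re.split(r"[.+_-]", piece):
            if part and part not in terms:
                terms.append(part)
    return terms


def index_terms(prefix: str, address: str) -> list[str]:
    """Prefixed terms for field searches plus unprefixed 'normal' terms."""
    comps = address_components(address)
    return [prefix + c for c in comps] + comps
```

For instance, `address_components("Foo.Bar@Example.com")` gives `["foo.bar@example.com", "foo.bar", "example.com", "foo", "bar", "example", "com"]`. A bare query such as `example` can then match the domain part.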
Fixes #2278
Fixes #2281