mu/lib
Daniel Colascione 26b3110b8f Avoid word-splitting regular expression matches
Previously, we would conduct regular expression searches by
enumerating all values of a given term, manually regex-matching each
one against our search regular expression, remembering all the term
values that matched our regular expression, and then doing a big Xapian
OR-query that matched any of those term values. In constructing this
OR-query, however, we would split each term value on space and add a
separate Xapian phrase search term for each resulting word. This
approach worked fine most of the time, because when we index a term,
we index both each word in a term and the whole term by itself.

This word splitting produced false negatives in some matches, however,
because Xapian and the Mu-level word splitting code do word splitting
slightly differently and apply different transformations to the text
while splitting.  (For example, Xapian transforms fancy Unicode
apostrophes to ASCII apostrophes.)

This patch avoids the problem by not word splitting when constructing
the big Xapian OR-query for finding the results of regular
expression matching.
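
The difference between the two approaches can be sketched as follows.
This is a simplified model, not the code in mu-parser.cc or
mu-xapian.cc: it uses no Xapian calls, the names TermIndex,
matching_values, found_with_word_splitting, found_as_whole_term and
example_index are hypothetical, and the phrase search is reduced to
"every word must be an indexed term". It also assumes the indexer
stored the whole term verbatim and the per-word terms with the
apostrophe normalized to ASCII.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the set of terms stored in the index.
using TermIndex = std::set<std::string>;

// Enumerate all term values and keep those matching the regular expression.
std::vector<std::string> matching_values(const TermIndex& index, const std::regex& rx)
{
    std::vector<std::string> matches;
    for (const auto& value : index)
        if (std::regex_search(value, rx))
            matches.push_back(value);
    return matches;
}

// Old behaviour (simplified): split the matched value on whitespace and
// require every resulting word to be present as an indexed term.
bool found_with_word_splitting(const TermIndex& index, const std::string& value)
{
    std::istringstream words{value};
    std::string word;
    while (words >> word)
        if (index.count(word) == 0)
            return false;
    return true;
}

// New behaviour (simplified): look up the matched value as one whole term.
bool found_as_whole_term(const TermIndex& index, const std::string& value)
{
    return index.count(value) != 0;
}

// Assumed index contents: the whole term keeps the Unicode apostrophe
// (U+2019, written as UTF-8 bytes), while the per-word terms were
// normalized to an ASCII apostrophe at indexing time.
TermIndex example_index()
{
    const std::string fancy = "don\xE2\x80\x99" "t panic";
    return {fancy, "don't", "panic"};
}

// Usage example: the regex matches only the whole term; splitting it
// yields a word the index does not contain, so the old lookup misses it.
void self_check()
{
    const auto index   = example_index();
    const auto matches = matching_values(index, std::regex{"t panic$"});
    assert(matches.size() == 1);
    assert(!found_with_word_splitting(index, matches[0]));
    assert(found_as_whole_term(index, matches[0]));
}
```

In this model the false negative comes from re-splitting a value that
was already enumerated from the index; looking the value up unchanged
avoids depending on both splitters agreeing.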
2022-11-20 10:18:01 +02:00
index autotools: remove 2022-08-20 11:19:29 +03:00
message message: updates for new sexp 2022-11-07 18:38:03 +02:00
tests store: update for new sexp api 2022-11-07 18:38:03 +02:00
thirdparty thirdparty: include CLI11 2022-11-17 11:00:06 +02:00
utils script: Rework guile script with new CLI support 2022-11-17 11:00:06 +02:00
doxyfile.in * lib: doxygen support (WIP, just starting...) 2012-10-27 14:42:21 +03:00
meson.build lib: remove mu-runtime 2022-11-16 23:31:51 +02:00
mu-bookmarks.cc clang-format: update c/cc coding style 2021-10-20 12:26:16 +03:00
mu-bookmarks.hh clang-format: update c/cc coding style 2021-10-20 12:26:16 +03:00
mu-contacts-cache.cc tests: update test helpers and users 2022-08-11 22:55:10 +03:00
mu-contacts-cache.hh contacts-cache: return most relevant contacts 2022-05-09 22:25:28 +03:00
mu-maildir.cc mu-maildir: improve error handling / reporting 2022-10-30 11:27:54 +02:00
mu-maildir.hh maildir: improve testing coverage 2022-06-29 22:19:26 +03:00
mu-parser.cc Avoid word-splitting regular expression matches 2022-11-20 10:18:01 +02:00
mu-parser.hh query-parser: tidy up 2022-06-14 23:15:27 +03:00
mu-query-match-deciders.cc query-match-deciders: cosmetics 2022-06-10 23:28:43 +03:00
mu-query-match-deciders.hh clang-format: update c/cc coding style 2021-10-20 12:26:16 +03:00
mu-query-results.hh query-threads/results: cosmetics 2022-06-09 00:39:34 +03:00
mu-query-threads.cc query-threads/results: cosmetics 2022-06-09 00:39:34 +03:00
mu-query-threads.hh clang-format: update c/cc coding style 2021-10-20 12:26:16 +03:00
mu-query.cc query: exclude some test code from coverage 2022-06-29 22:20:09 +03:00
mu-query.hh query: update query subsys to use Message 2022-04-30 10:40:45 +03:00
mu-script.cc script: Rework guile script with new CLI support 2022-11-17 11:00:06 +02:00
mu-script.hh script: Rework guile script with new CLI support 2022-11-17 11:00:06 +02:00
mu-server.cc lib: remove mu-runtime 2022-11-16 23:31:51 +02:00
mu-server.hh server: rework for updated Sexp/CommandHandler 2022-11-07 18:38:03 +02:00
mu-store.cc store: update for new sexp api 2022-11-07 18:38:03 +02:00
mu-store.hh store: support reinit 2022-10-02 18:24:23 +03:00
mu-tokenizer.cc clang-format: update c/cc coding style 2021-10-20 12:26:16 +03:00
mu-tokenizer.hh clang-format: update c/cc coding style 2021-10-20 12:26:16 +03:00
mu-tree.hh Avoid word-splitting regular expression matches 2022-11-20 10:18:01 +02:00
mu-xapian.cc Avoid word-splitting regular expression matches 2022-11-20 10:18:01 +02:00
mu-xapian.hh query-parser: tidy up 2022-06-14 23:15:27 +03:00
tokenize.cc clang-format: update c/cc coding style 2021-10-20 12:26:16 +03:00