
25 Commits

Author SHA1 Message Date
djcb bc891030f6 mu: fix utf-8 flatten 2019-03-24 11:43:51 +02:00
djcb da10f30adf utils: small optimization in utf8_flatten
In the common path, avoid building an unneeded std::string. This should
show up in some profiles.
2019-03-23 17:00:25 +02:00
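Below is a minimal sketch of this kind of fast path. It assumes "flattening" means lowercasing plus stripping accents. The names `flatten` and `flatten_unicode` are hypothetical, and this is not mu's `utf8_flatten`. If every byte is 7-bit ASCII, the result is written into one pre-sized string. Any other input falls through to a stubbed Unicode path.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the Unicode-aware path (decompose, drop accents,
// lowercase). In this sketch it simply returns the input unchanged.
static std::string flatten_unicode(const std::string& str)
{
    return str;
}

// Fast path: when the input is pure ASCII, lowercase it directly into a
// single pre-sized result, without building any intermediate strings.
std::string flatten(const std::string& str)
{
    for (unsigned char c : str)
        if (c >= 0x80)
            return flatten_unicode(str); // non-ASCII byte: take the slow path

    std::string out(str.size(), '\0');
    for (std::size_t i = 0; i != str.size(); ++i) {
        const char c = str[i];
        out[i] = (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
    }
    return out;
}
```

For example, `flatten("Hello World")` gives `"hello world"` and never reaches the Unicode machinery. The ASCII check costs one extra scan of the input. That is usually cheap next to Unicode normalization.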
Ulrich Ölmann d37a961c8f parser: fix clang-7.0.1 warnings
Without this commit, clang++-7.0.1 whines:

|   CXX      parser.lo
| parser.cc:138:15: warning: braces around scalar initializer [-Wbraced-scalar-init]
|         return Tree({{Node::Type::Range},
|                      ^~~~~~~~~~~~~~~~~~~
2019-01-11 06:49:01 +01:00
djcb 7b6bccd49a parser: avoid query parsing error
See #1261.
2018-11-11 13:15:08 +02:00
djcb 052a228be7 add optional support for building with asan 2018-11-04 12:31:32 +02:00
djcb 7a8d43dc5f only use OP_WILDCARD for xapian >= 1.3.3
It's not available for earlier versions.
2018-05-19 22:22:41 +03:00
djcb 6290e4ad9a query-parser: special-case wildcards
We were transforming wildcard searches into regular-expression
searches; while that works, it's also significantly slower.

So, instead, special-case wildcards, and use the Xapian machinery for
wildcard queries.
2018-05-19 11:20:58 +03:00
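Here is a sketch of the special-casing step. It assumes a wildcard is a single trailing `*` on a term, e.g. `subject:hel*`, and the helper `wildcard_prefix` is hypothetical. When a term qualifies, the prefix can go to Xapian's wildcard machinery instead of being rewritten as a regular expression that must be tested against many terms. In Xapian 1.4 that would be something like `Xapian::Query(Xapian::Query::OP_WILDCARD, prefix)`, where the string is what matching terms must start with.

```cpp
#include <optional>
#include <string>

// Returns the prefix to expand if 'term' is a simple trailing wildcard such
// as "hel*", or std::nullopt if it should be handled as an ordinary term.
// Hypothetical helper: a '*' anywhere other than the end is rejected.
std::optional<std::string> wildcard_prefix(const std::string& term)
{
    if (term.size() < 2 || term.back() != '*')
        return std::nullopt;
    const std::string prefix = term.substr(0, term.size() - 1);
    if (prefix.find('*') != std::string::npos)
        return std::nullopt;
    return prefix;
}
```

So `hel*` yields the prefix `hel`, while `hello` and `h*l*` are left alone. Which patterns count as "simple" is a design choice of this sketch.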
djcb b4cc67d455 parser/tests: allow for DST change
Relative dates such as 21d or 2w can span a DST change; update the tolerance.
2018-03-31 18:44:17 +03:00
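To make the tolerance concrete: "21d" is nominally 21 × 86400 = 1,814,400 seconds. If the interval crosses a DST change, a local-time result is off by the size of the shift. The sketch below is a check of that kind. `close_enough` and `dst_tolerance` are hypothetical names, not those of mu's test suite. The one-hour bound is an assumption that holds for most, though not all, time zones.

```cpp
#include <cstdint>

// Seconds in 'days' nominal 24-hour days.
constexpr std::int64_t days_to_seconds(std::int64_t days)
{
    return days * 24 * 60 * 60;
}

// Assumption: a DST change shifts local wall-clock time by at most one hour.
constexpr std::int64_t dst_tolerance = 60 * 60;

// True if 'actual' is within 'tolerance' seconds of 'expected'.
bool close_enough(std::int64_t expected, std::int64_t actual, std::int64_t tolerance)
{
    const std::int64_t diff = expected - actual;
    return (diff < 0 ? -diff : diff) <= tolerance;
}
```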
djcb ebbe3ea023 mu: _XOPEN_SOURCE: fix typo 2018-03-10 13:05:44 +02:00
djcb 6fe67b354d lib/parser: fix month days
In the olden days, we stored dates as strings such as 20180131121234 and
did a lexicographical comparison. With that, we could use e.g. the upper
limit 201802312359 for "all dates in Feb 2018", even though February
doesn't have 31 days.

However, nowadays we use time_t values, and g_date_time_new_local fails
(returning NULL) for non-existent days. The easiest fix is to massage
things a bit, so let's do that.

Fixes issue #1197.
2018-02-17 18:07:13 +02:00
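One way to "massage" such a date is to clamp the day to the last day of the month before handing it to GLib. This is an assumption, not necessarily mu's actual fix. The helpers below are hypothetical and use the Gregorian leap-year rule.

```cpp
// Gregorian leap-year rule.
constexpr bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Number of days in 'month' (1..12) of 'year'.
constexpr int days_in_month(int year, int month)
{
    return month == 2 ? (is_leap_year(year) ? 29 : 28)
         : (month == 4 || month == 6 || month == 9 || month == 11) ? 30
         : 31;
}

// Clamp 'day' to the last valid day of the month, e.g. 2018-02-31 -> 28.
constexpr int clamp_day(int year, int month, int day)
{
    const int last = days_in_month(year, month);
    return day > last ? last : day;
}
```

With this, the old upper limit for "all of Feb 2018" (day 31) becomes Feb 28, and Feb 29 in a leap year such as 2020.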
djcb 15ba4699ab lib/parser: use g_vasprintf, _XOPEN_SOURCE
Attempt to restore building on Cygwin.
2018-02-11 12:02:53 +02:00
djcb f840d0deaa parser: promote single value to a range for range-fields
Treat e.g. 'date:20170101' as 'date:20170101..20170101', just like
the Xapian parser does.
2017-12-03 12:39:31 +02:00
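The promotion amounts to a small splitting step on the raw value after `date:` (or another range field). `parse_range` is a hypothetical helper, not mu's code.

```cpp
#include <string>
#include <utility>

// Split a range-field value such as "20170101..20171231" into its bounds.
// A single value "20170101" is promoted to "20170101..20170101".
// An empty side (e.g. "..20171231") yields an empty bound for the caller.
std::pair<std::string, std::string> parse_range(const std::string& value)
{
    const auto pos = value.find("..");
    if (pos == std::string::npos)
        return {value, value}; // single value: lower == upper
    return {value.substr(0, pos), value.substr(pos + 2)};
}
```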
djcb f794cea6e7 parser: small regex optimization 2017-11-04 14:32:41 +02:00
djcb 6a0654c91b parser/utils: enforce 64-bit times on 32-bit platforms
don't assume a 64-bit platform.
2017-11-04 11:30:23 +00:00
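The underlying problem is that seconds-since-epoch values overflow a 32-bit `time_t` in 2038. 2100-01-01 is already 4102444800, well beyond INT32_MAX (2147483647). The sketch below converts a calendar date to seconds (UTC midnight) with 64-bit arithmetic throughout. It swaps in Howard Hinnant's well-known `days_from_civil` algorithm for illustration; this is not mu's implementation.

```cpp
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date (Howard Hinnant's
// days_from_civil). Everything stays in int64_t, so the result does not
// depend on the width of time_t.
constexpr std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d)
{
    y -= m <= 2;
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Seconds since the epoch at UTC midnight of the given date.
constexpr std::int64_t epoch_seconds(std::int64_t y, unsigned m, unsigned d)
{
    return days_from_civil(y, m, d) * 86400;
}
```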
djcb 3cd150f289 parser: handle implicit 'and not' 2017-11-04 12:59:48 +02:00
djcb 65863e46cd parser: fix and-not precedence
For now, don't treat "and not" specially; this gets us back into a
somewhat working state. At some point, we probably _do_ want to
special-case and_not though (since Xapian supports it).
2017-10-31 07:18:14 +02:00
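Treating "a not b" as "a and not b" amounts to making the implicit operator explicit. The sketch below does this as a separate pass over a pre-split token list. The function name and token representation are hypothetical, and mu's parser may well handle this during parsing instead. Parentheses are ignored for brevity.

```cpp
#include <string>
#include <vector>

static bool is_binary_op(const std::string& t)
{
    return t == "and" || t == "or";
}

// Insert "and" between an operand and a following operand or "not", so
// "a b" -> "a and b" and "a not b" -> "a and not b".
std::vector<std::string> add_implicit_and(const std::vector<std::string>& tokens)
{
    std::vector<std::string> out;
    for (const auto& tok : tokens) {
        const bool prev_is_operand =
            !out.empty() && !is_binary_op(out.back()) && out.back() != "not";
        if (prev_is_operand && !is_binary_op(tok))
            out.push_back("and");
        out.push_back(tok);
    }
    return out;
}
```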
djcb 57b5fe6156 mu: some optimizations
add a fast path for the (common) plain-ASCII case; fix silly misuse of static.

should improve indexing speed by a single-digit percentage.
2017-10-29 13:34:57 +02:00
djcb 55ffb524db tokenizer: clean unicode-aware 2017-10-28 14:13:09 +03:00
djcb 0e5e8b6bce parser: add more tests 2017-10-28 14:12:50 +03:00
djcb 6ce7c89488 phrases: only allow for index fields 2017-10-27 18:42:58 +03:00
djcb fe18603843 parser: fix some post-c++14 code
don't require anything post c++14
2017-10-27 18:40:37 +03:00
djcb 160d3ec036 query-parser: cleanup source string
Ensure there's no whitespace other than ' ' (plain space), and no leading/trailing spaces.
2017-10-27 01:21:57 +03:00
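Here is a sketch of that normalization. It assumes "whitespace" means what `std::isspace` reports in the C locale, and `cleanup_query` is a hypothetical name. As the commit describes, the function only replaces and trims. Runs of inner spaces are left alone.

```cpp
#include <cctype>
#include <string>

// Replace every whitespace character (tab, newline, ...) with ' ' and strip
// leading and trailing spaces. Inner runs of spaces are kept as they are.
std::string cleanup_query(const std::string& query)
{
    std::string out(query);
    for (auto& c : out)
        if (std::isspace(static_cast<unsigned char>(c)))
            c = ' ';
    const auto first = out.find_first_not_of(' ');
    if (first == std::string::npos)
        return {}; // nothing but whitespace
    const auto last = out.find_last_not_of(' ');
    return out.substr(first, last - first + 1);
}
```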
djcb 7cd7d118e2 query-parser: support phrase queries 2017-10-26 21:31:22 +03:00
djcb 5e9cafea59 integrate new query parser 2017-10-25 23:50:17 +03:00
djcb b75f9f508b lib: implement new query parser
mu's query parser is the piece of software that turns your queries
into something the Xapian database can understand. So, if you query
"maildir:/inbox and subject:bla" this must be translated into a
Xapian::Query object which will retrieve the sought after messages.
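The first step of such a translation is splitting the query into operators and field:value terms. The sketch below deliberately simplifies this. The names are hypothetical, not mu's classes, and a real parser must also cope with quoting, parentheses, regexps and Unicode. Each resulting term would then become a Xapian::Query for that field's prefix, and the operators would combine them, e.g. with Xapian::Query::OP_AND.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token: an operator ("and", "or", "not") or a field:value term.
struct Token {
    enum class Kind { Op, Term } kind;
    std::string field; // empty for operators and for field-less terms
    std::string value;
};

// Split on whitespace and classify each word (a simplification).
std::vector<Token> tokenize(const std::string& query)
{
    std::vector<Token> tokens;
    std::istringstream in(query);
    std::string word;
    while (in >> word) {
        if (word == "and" || word == "or" || word == "not") {
            tokens.push_back({Token::Kind::Op, "", word});
            continue;
        }
        const auto colon = word.find(':');
        if (colon == std::string::npos)
            tokens.push_back({Token::Kind::Term, "", word});
        else
            tokens.push_back({Token::Kind::Term, word.substr(0, colon), word.substr(colon + 1)});
    }
    return tokens;
}
```

With the query from the text, `tokenize("maildir:/inbox and subject:bla")` yields three tokens: the term maildir=/inbox, the operator and, and the term subject=bla.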

Since mu's beginning, almost a decade ago, this parser was based on
Xapian's default Xapian::QueryParser. It worked okay, but wasn't really
designed for the mu use case, and had a bit of trouble with anything
that's not A..Z (think: spaces, special characters, unicode etc.).

Over the years, mu added quite a bit of pre-processing trickery to
deal with that. Still, there were corner cases and bugs that were
practically unfixable.

The solution to all of this is a custom query processor that replaces
Xapian's, written from the ground up to deal with special characters
etc. I wrote one as part of my "future, post-1.0 mu" research project,
and I have now backported it to mu 0.9.19.

From a technical perspective, this is a major cleanup, and allows us
to get rid of much of the fragile preprocessing both for indexing and
querying. From an end-user perspective this (hopefully) means that
many of the little parsing issues are gone, and it opens the way for
some new features.

Concretely, for end users:
- better support for special characters.
- regexp search! yes, you can now search for regular expressions, e.g.
      subject:/h.ll?o/
  will find subjects with hallo, hello, halo, philosophy, ...

  As you can imagine, this can be a _heavy_ operation on the database,
  and might take quite a bit longer than a normal query; but it can be
  quite useful.
2017-10-24 22:55:35 +03:00
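Regexp search is heavy because the pattern cannot be looked up directly in the index. Every term of the field has to be tested instead. The sketch below illustrates this, assuming the terms are available as plain strings. A real implementation would walk the database's term list for the field's prefix (e.g. via Xapian::Database::allterms_begin()) and OR the matching terms together. `matching_terms` is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Test every term against the pattern; the cost grows with the number of
// terms in the field, not with the number of matches.
std::vector<std::string> matching_terms(const std::vector<std::string>& all_terms,
                                        const std::string& pattern)
{
    const std::regex rx(pattern);
    std::vector<std::string> hits;
    for (const auto& term : all_terms)
        if (std::regex_search(term, rx)) // unanchored: may match inside the term
            hits.push_back(term);
    return hits;
}
```

With the pattern from the example, `h.ll?o`, the terms hallo, hello, halo and philosophy (via "hilo") match, while e.g. world does not.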