lib: implement new query parser
mu's query parser is the piece of software that turns your queries
into something the Xapian database can understand. So, if you query
"maildir:/inbox and subject:bla" this must be translated into a
Xapian::Query object which will retrieve the sought after messages.
Since mu's beginning, almost a decade ago, this parser was based on
Xapian's default Xapian::QueryParser. It works okay, but wasn't really
designed for the mu use-case, and had a bit of trouble with anything
that's not A..Z (think: spaces, special characters, unicode etc.).
Over the years, mu added quite a bit of pre-processing trickery to
deal with that. Still, there were corner cases and bugs that were
practically unfixable.
The solution to all of this is to have a custom query processor that
replaces Xapian's, and write it from the ground up to deal with the
special characters etc. I wrote one, as part of my "future, post-1.0
mu" research project, and I have now backported it to mu 0.9.19.
From a technical perspective, this is a major cleanup, and allows us
to get rid of much of the fragile preprocessing both for indexing and
querying. From an end-user perspective this (hopefully) means that
many of the little parsing issues are gone, and it opens the way for
some new features.
From an end-user perspective:
- better support for special characters.
- regexp search! yes, you can now search for regular expressions, e.g.
    subject:/h.ll?o/
  will find subjects with hallo, hello, halo, philosophy, ...
  As you can imagine, this can be a _heavy_ operation on the database,
  and might take quite a bit longer than a normal query; but it can be
  quite useful.
2017-10-24 21:55:35 +02:00

/*
** Copyright (C) 2020-2022 Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>
**
** This library is free software; you can redistribute it and/or
** modify it under the terms of the GNU Lesser General Public License
** as published by the Free Software Foundation; either version 2.1
** of the License, or (at your option) any later version.
**
** This library is distributed in the hope that it will be useful,
** but WITHOUT ANY WARRANTY; without even the implied warranty of
** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
** Lesser General Public License for more details.
**
** You should have received a copy of the GNU Lesser General Public
** License along with this library; if not, write to the Free
** Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
** 02110-1301, USA.
*/

#ifndef __MU_UTILS_HH__
#define __MU_UTILS_HH__

#include <string>
#include <string_view>
#include <sstream>
#include <vector>
#include <chrono>
#include <cstdarg>
#include <glib.h>
#include <ostream>
#include <iostream>
#include <type_traits>

namespace Mu {

using StringVec = std::vector<std::string>;

/**
 * Flatten a string -- downcase and fold diacritics etc.
 *
 * @param str a string
 *
 * @return a flattened string
 */
std::string utf8_flatten(const char* str);
inline std::string
utf8_flatten(const std::string& s)
{
	return utf8_flatten(s.c_str());
}

/**
 * Replace all control characters with spaces, and remove leading and trailing space.
 *
 * @param dirty an unclean string
 *
 * @return a cleaned-up string.
 */
std::string utf8_clean(const std::string& dirty);

/**
 * Remove ctrl characters, replacing them with ' '; subsequent
 * ctrl characters are replaced by a single ' '
 *
 * @param str a string
 *
 * @return the string without control characters
 */
std::string remove_ctrl(const std::string& str);
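
The collapsing behaviour documented for `remove_ctrl` can be illustrated with a small stand-alone sketch. This is not mu's implementation, which is not part of this header; the `sketch` namespace and the use of `std::iscntrl` to decide what counts as a control character are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>

namespace sketch {

// Hypothetical stand-in for Mu::remove_ctrl: every run of control
// characters collapses into a single space.
inline std::string
remove_ctrl(const std::string& str)
{
	std::string result;
	result.reserve(str.size());
	bool in_run = false; // true while inside a run of control characters

	for (const char c : str) {
		if (std::iscntrl(static_cast<unsigned char>(c))) {
			if (!in_run)
				result += ' '; // first control character of the run
			in_run = true;
		} else {
			result += c;
			in_run = false;
		}
	}
	return result;
}

} // namespace sketch
```

With this sketch, a tab followed by a newline between two letters turns into one space, and a string without control characters is returned unchanged.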

/**
 * Split a string in parts. As a special case, splitting an empty string
 * yields an empty vector (not a vector with a single empty element)
 *
 * @param str a string
 * @param sepa the separator
 *
 * @return the parts.
 */
std::vector<std::string> split(const std::string& str, const std::string& sepa);

/**
 * Split a string in parts. As a special case, splitting an empty string
 * yields an empty vector (not a vector with a single empty element)
 *
 * @param str a string
 * @param sepa the separator
 *
 * @return the parts.
 */
std::vector<std::string> split(const std::string& str, char sepa);
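
The special case in the comments above (an empty input gives an empty vector) is easiest to see in code. The following is a minimal sketch under stated assumptions, not mu's implementation: the `sketch` namespace is hypothetical, and the handling of an empty separator (return the input as a single part) is a choice made here, since the header does not document that case.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-in for Mu::split(str, sepa).
inline std::vector<std::string>
split(const std::string& str, const std::string& sepa)
{
	std::vector<std::string> parts;
	if (str.empty())
		return parts; // documented special case: no parts at all
	if (sepa.empty()) { // assumption: an empty separator splits nothing
		parts.push_back(str);
		return parts;
	}

	std::size_t pos = 0;
	for (;;) {
		const auto next = str.find(sepa, pos);
		if (next == std::string::npos) {
			parts.push_back(str.substr(pos)); // last part
			break;
		}
		parts.push_back(str.substr(pos, next - pos));
		pos = next + sepa.size();
	}
	return parts;
}

// Single-character overload, mirroring the declaration above.
inline std::vector<std::string>
split(const std::string& str, char sepa)
{
	return split(str, std::string(1, sepa));
}

} // namespace sketch
```

Note that in this sketch a trailing separator produces a trailing empty part; whether mu does the same is not stated in this header.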

/**
 * Join the strings in svec into a string, separated by sepa
 *
 * @param svec a string vector
 * @param sepa separator
 *
 * @return string
 */
std::string join(const std::vector<std::string>& svec, const std::string& sepa);
static inline std::string join(const std::vector<std::string>& svec, char sepa) {
	return join(svec, std::string(1, sepa));
}
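
As a counterpart to the declarations above, here is one way the string-separator `join` could behave. It is a sketch only: the `sketch` namespace is hypothetical and the body is an assumption, since the real definition is not part of this header.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-in for Mu::join(svec, sepa).
inline std::string
join(const std::vector<std::string>& svec, const std::string& sepa)
{
	std::string result;
	for (std::size_t i = 0; i != svec.size(); ++i) {
		if (i != 0)
			result += sepa; // separator goes between elements only
		result += svec[i];
	}
	return result;
}

} // namespace sketch
```

An empty vector yields an empty string, and a single element is returned without any separator.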

/**
 * Quote & escape a string for " and \
 *
 * @param str a string
 *
 * @return quoted string
 */
std::string quote(const std::string& str);
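
The comment above only says that `"` and `\` are escaped. A plausible reading, sketched below, is that the result is wrapped in double quotes and each of the two special characters gets a backslash in front of it. Both the `sketch` namespace and that exact output format are assumptions, not something this header confirms.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical stand-in for Mu::quote.
inline std::string
quote(const std::string& str)
{
	std::string result{"\""}; // opening quote
	for (const char c : str) {
		if (c == '"' || c == '\\')
			result += '\\'; // escape the two special characters
		result += c;
	}
	result += '"'; // closing quote
	return result;
}

} // namespace sketch
```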

/**
 * Format a string, printf style
 *
 * @param frm format string
 * @param ... parameters
 *
 * @return a formatted string
 */
std::string format(const char* frm, ...) __attribute__((format(printf, 1, 2)));

/**
 * Format a string, printf style
 *
 * @param frm format string
 * @param args parameters, as a va_list
 *
 * @return a formatted string
 */
std::string vformat(const char* frm, va_list args) __attribute__((format(printf, 1, 0)));
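
The `format`/`vformat` pair follows the usual C pattern where the variadic function forwards to a `va_list` one. The sketch below shows that pattern with the standard `std::vsnprintf`; it is an assumption for illustration (the header includes `glib.h`, so mu may well use a GLib helper instead), and the `sketch` namespace is hypothetical.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-in for Mu::vformat, built on std::vsnprintf.
inline std::string
vformat(const char* frm, va_list args)
{
	va_list args_copy;
	va_copy(args_copy, args); // the measuring pass consumes its va_list
	const int len = std::vsnprintf(nullptr, 0, frm, args_copy);
	va_end(args_copy);
	if (len < 0)
		return {}; // formatting failed

	std::vector<char> buf(static_cast<std::size_t>(len) + 1);
	std::vsnprintf(buf.data(), buf.size(), frm, args);
	return std::string(buf.data(), static_cast<std::size_t>(len));
}

// Hypothetical stand-in for Mu::format: collect the varargs and forward.
inline std::string
format(const char* frm, ...)
{
	va_list args;
	va_start(args, frm);
	std::string str = vformat(frm, args);
	va_end(args);
	return str;
}

} // namespace sketch
```

The two-pass approach (measure first, then write) avoids guessing a buffer size.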

/**
 * Convert a date to the corresponding time expressed as a string with a
 * 10-digit time_t
 *
 * @param date the date expressed as YYYYMMDDHHMMSS or any n... of the first
 * characters.
 * @param first whether to fill out incomplete dates to the start or the end;
 * i.e. either 1972 -> 197201010000 or 1972 -> 197212312359
 *
 * @return the corresponding time_t expressed as a string
 */
std::string date_to_time_t_string(const std::string& date, bool first);
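
The role of the `first` parameter can be shown with a small sketch of the fill-out step alone. Everything here is an assumption made for illustration: the helper name `fill_date`, the `sketch` namespace and the two template strings are hypothetical, the sketch fills out to the full 14 digits of YYYYMMDDHHMMSS, and it neither validates the result (a partial date ending in February would be completed with day 31) nor performs the final conversion to a `time_t` string.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical helper: complete a partial YYYYMMDDHHMMSS string,
// either to the earliest (first == true) or the latest moment.
inline std::string
fill_date(const std::string& date, bool first)
{
	// Assumed templates for the earliest / latest completion.
	static const std::string lower{"00000101000000"};
	static const std::string upper{"99991231235959"};
	const std::string& tmpl = first ? lower : upper;

	if (date.size() >= tmpl.size())
		return date.substr(0, tmpl.size()); // already complete
	return date + tmpl.substr(date.size()); // append the missing tail
}

} // namespace sketch
```

So a bare year is completed to the first second of January 1st or the last second of December 31st, depending on `first`.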

/**
 * 64-bit incarnation of time_t expressed as a 10-digit string. Uses 64-bit for the time-value,
 * regardless of the size of time_t.
 *
 * @param t some time value
 *
 * @return the time value expressed as a 10-digit string
 */
std::string date_to_time_t_string(int64_t t);

/**
 * Get a string for a given time_t and format.
 *
 * @param frm the format of the string (in strftime(3) format)
 * @param t the time as time_t
 * @param utc whether to display as UTC (if true) or local time
 *
 * @return a string representation of the time in UTF8-format, or empty in case
 * of error.
 */
std::string time_to_string(const std::string& frm, time_t t, bool utc = false) G_GNUC_CONST;

/**
 * Create a std::string by consuming a gchar* array; this takes ownership
 * of str which should no longer be used.
 *
 * @param str a gchar* or NULL (latter taken as "")
 *
 * @return a std::string
 */
static inline std::string
from_gchars(gchar*&& str)
{
	std::string s{str ? str : ""};
	g_free(str);

	return s;
}

using Clock    = std::chrono::steady_clock;
using Duration = Clock::duration;

template <typename Unit>
constexpr int64_t
to_unit(Duration d)
{
	using namespace std::chrono;
	return duration_cast<Unit>(d).count();
}

constexpr int64_t
to_s(Duration d)
{
	return to_unit<std::chrono::seconds>(d);
}
constexpr int64_t
to_ms(Duration d)
{
	return to_unit<std::chrono::milliseconds>(d);
}
constexpr int64_t
to_us(Duration d)
{
	return to_unit<std::chrono::microseconds>(d);
}
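The conversion helpers above can be exercised in isolation. The following is a minimal sketch, not part of the header: it copies the helpers into a hypothetical `example` namespace so that it compiles on its own, and `truncated_seconds` is an illustrative name. It shows that `duration_cast` truncates toward zero rather than rounding.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Stand-alone copies of the helpers, placed in a hypothetical namespace so
// that the snippet builds without the rest of the header.
namespace example {

using Clock    = std::chrono::steady_clock;
using Duration = Clock::duration;

template <typename Unit>
constexpr std::int64_t
to_unit(Duration d)
{
	return std::chrono::duration_cast<Unit>(d).count();
}

constexpr std::int64_t to_s(Duration d) { return to_unit<std::chrono::seconds>(d); }
constexpr std::int64_t to_ms(Duration d) { return to_unit<std::chrono::milliseconds>(d); }
constexpr std::int64_t to_us(Duration d) { return to_unit<std::chrono::microseconds>(d); }

// duration_cast truncates toward zero, so 1500 ms is reported as 1 s.
inline std::int64_t
truncated_seconds()
{
	using namespace std::chrono;
	const Duration d{duration_cast<Duration>(milliseconds{1500})};
	assert(to_ms(d) == 1500);
	return to_s(d);
}

} // namespace example
```

Callers that need fractional values can convert the microsecond count to `double` themselves, which is what `StopWatch` does.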

struct StopWatch {
	using Clock = std::chrono::steady_clock;
	StopWatch(const std::string& name) : start_{Clock::now()}, name_{name} {}
	~StopWatch()
	{
		const auto us{static_cast<double>(to_us(Clock::now() - start_))};
		if (us > 2000000)
			g_debug("%s: finished after %0.1f s", name_.c_str(), us / 1000000);
		else if (us > 2000)
			g_debug("%s: finished after %0.1f ms", name_.c_str(), us / 1000);
		else
			g_debug("%s: finished after %g us", name_.c_str(), us);
	}

private:
	Clock::time_point start_;
	std::string       name_;
};
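`StopWatch` is an RAII timer: it records the start time on construction and logs the elapsed time when it goes out of scope. The sketch below illustrates that pattern without GLib. `ScopedTimer`, `format_elapsed` and `timed_sum` are hypothetical names, and the output goes to stderr instead of `g_debug`; only the thresholds and format strings are taken from the destructor above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

namespace example {

// Same thresholds as ~StopWatch(): above 2 s report seconds, above 2 ms
// report milliseconds, otherwise microseconds.
inline std::string
format_elapsed(double us)
{
	char buf[64];
	if (us > 2000000)
		std::snprintf(buf, sizeof(buf), "%0.1f s", us / 1000000);
	else if (us > 2000)
		std::snprintf(buf, sizeof(buf), "%0.1f ms", us / 1000);
	else
		std::snprintf(buf, sizeof(buf), "%g us", us);
	return buf;
}

// Hypothetical stand-in for StopWatch that writes to stderr.
struct ScopedTimer {
	using Clock = std::chrono::steady_clock;
	explicit ScopedTimer(std::string name) : start_{Clock::now()}, name_{std::move(name)} {}
	~ScopedTimer()
	{
		using namespace std::chrono;
		const auto us = duration_cast<microseconds>(Clock::now() - start_).count();
		std::fprintf(stderr, "%s: finished after %s\n", name_.c_str(),
			     format_elapsed(static_cast<double>(us)).c_str());
	}

private:
	Clock::time_point start_;
	std::string       name_;
};

// The timer reports when it is destroyed at the end of the function.
inline int
timed_sum(int n)
{
	ScopedTimer timer{"timed_sum"};
	int         sum = 0;
	for (int i = 1; i <= n; ++i)
		sum += i;
	return sum;
}

} // namespace example
```

Declaring the timer as the first statement in a scope is enough to time the whole scope, including early returns.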

/**
 * See g_canonicalize_filename
 *
 * @param path the filename to canonicalize
 * @param relative_to the directory that a relative path is resolved against
 *
 * @return the canonical filename
 */
std::string canonicalize_filename(const std::string& path, const std::string& relative_to);

/**
 * Convert a size string to a size in bytes
 *
 * @param sizestr the size string
 * @param first
 *
 * @return the size expressed as a string with the decimal number of bytes
 */
std::string size_to_string(const std::string& sizestr, bool first);

/**
 * Convert a size in bytes into a string
 *
 * @param size the size
 *
 * @return the size expressed as a string with the decimal number of bytes
 */
std::string size_to_string(int64_t size);

/**
 * Convert any value that can be written to a std::ostream with operator<<
 * to a string
 *
 * @param val the value
 *
 * @return a std::string
 */
template <typename T>
static inline std::string
to_string(const T& val)
{
	std::stringstream sstr;
	sstr << val;

	return sstr.str();
}
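Because `to_string` only relies on `operator<<`, it also works for user-defined types. A minimal sketch, assuming a hypothetical `Point` type and a stand-alone copy of the template in an `example` namespace:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

namespace example {

// Stand-alone copy of the helper so the snippet builds on its own.
template <typename T>
std::string
to_string(const T& val)
{
	std::stringstream sstr;
	sstr << val;
	return sstr.str();
}

// Hypothetical user type; the only requirement is a matching operator<<.
struct Point {
	int x;
	int y;
};

inline std::ostream&
operator<<(std::ostream& os, const Point& p)
{
	return os << '(' << p.x << ',' << p.y << ')';
}

// Built-in and user-defined types go through the same code path.
inline void
demo()
{
	assert(to_string(42) == "42");
	assert(to_string(Point{1, 2}) == "(1,2)");
}

} // namespace example
```

The operator is found through argument-dependent lookup, so it has to live in the same namespace as the type it prints.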

/**
 * Convert a string view into something printable with %.*s
 */
#define STR_V(sv__) static_cast<int>((sv__).size()), (sv__).data()
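`STR_V` expands to two arguments, a length followed by a data pointer, which is the pair that a printf-style `%.*s` conversion consumes. A minimal sketch with a stand-alone copy of the macro; `show_prefix` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// Stand-alone copy of the macro so the snippet builds on its own.
#define STR_V(sv__) static_cast<int>((sv__).size()), (sv__).data()

namespace example {

// A string_view is not necessarily NUL-terminated: `part` below points into
// the middle of a longer buffer. "%.*s" limits the output to the given
// length, so only the five characters of the view are printed.
inline std::string
show_prefix()
{
	const std::string_view whole{"hello, world"};
	const std::string_view part{whole.substr(0, 5)};

	char buf[32];
	std::snprintf(buf, sizeof(buf), "[%.*s]", STR_V(part));
	return buf;
}

} // namespace example
```

With a plain `%s` the same call would read past the end of the view, up to the NUL terminator of the underlying buffer.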

struct MaybeAnsi {
	explicit MaybeAnsi(bool use_color) : color_{use_color} {}

	enum struct Color {
		Black   = 30,
		Red     = 31,
		Green   = 32,
		Yellow  = 33,
		Blue    = 34,
		Magenta = 35,
		Cyan    = 36,
		White   = 37,

		BrightBlack   = 90,
		BrightRed     = 91,
		BrightGreen   = 92,
		BrightYellow  = 93,
		BrightBlue    = 94,
		BrightMagenta = 95,
		BrightCyan    = 96,
		BrightWhite   = 97,
	};

	std::string fg(Color c) const { return ansi(c, true); }
	std::string bg(Color c) const { return ansi(c, false); }

	std::string reset() const { return color_ ? "\x1b[0m" : ""; }

private:
	std::string ansi(Color c, bool fg = true) const
	{
		return color_ ? format("\x1b[%dm", static_cast<int>(c) + (fg ? 0 : 10)) : "";
	}

	const bool color_;
};
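The escape sequences that `MaybeAnsi` produces follow the usual SGR convention: a foreground code is the enum value itself, and the matching background code is that value plus 10. The sketch below reproduces only this arithmetic. The free function `ansi` and the reduced `Color` enum are hypothetical, and `std::to_string` stands in for the header's `format` helper.

```cpp
#include <cassert>
#include <string>

namespace example {

// Reduced copy of the color table, for illustration only.
enum struct Color { Red = 31, Green = 32, BrightRed = 91 };

// Returns the escape sequence for a color, or an empty string when color
// output is disabled, so callers can emit it unconditionally.
inline std::string
ansi(Color c, bool fg, bool use_color)
{
	if (!use_color)
		return {};
	return "\x1b[" + std::to_string(static_cast<int>(c) + (fg ? 0 : 10)) + "m";
}

inline void
demo()
{
	assert(ansi(Color::Red, true, true) == "\x1b[31m");        // foreground
	assert(ansi(Color::Red, false, true) == "\x1b[41m");       // background
	assert(ansi(Color::BrightRed, false, true) == "\x1b[101m"); // bright background
	assert(ansi(Color::Green, true, false).empty());           // colors off
}

} // namespace example
```

Returning an empty string when colors are off keeps the call sites free of conditionals: the same output statement works for terminals and for plain files.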

/// Allow using enum structs as bitflags
#define MU_TO_NUM(ET, ELM) std::underlying_type_t<ET>(ELM)
#define MU_TO_ENUM(ET, NUM) static_cast<ET>(NUM)
#define MU_ENABLE_BITOPS(ET) \
	constexpr ET operator&(ET e1, ET e2) \
	{ \
		return MU_TO_ENUM(ET, MU_TO_NUM(ET, e1) & MU_TO_NUM(ET, e2)); \
	} \
	constexpr ET operator|(ET e1, ET e2) \
	{ \
		return MU_TO_ENUM(ET, MU_TO_NUM(ET, e1) | MU_TO_NUM(ET, e2)); \
	} \
	constexpr ET operator~(ET e) { return MU_TO_ENUM(ET, ~(MU_TO_NUM(ET, e))); } \
	constexpr bool any_of(ET e) { return MU_TO_NUM(ET, e) != 0; } \
	constexpr bool none_of(ET e) { return MU_TO_NUM(ET, e) == 0; } \
	constexpr bool one_of(ET e1, ET e2) { return (e1 & e2) == e2; } \
	constexpr ET& operator&=(ET& e1, ET e2) { return e1 = e1 & e2; } \
	constexpr ET& operator|=(ET& e1, ET e2) { return e1 = e1 | e2; }
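A short usage sketch for `MU_ENABLE_BITOPS`. The `Perm` enum and `can_write` are hypothetical; the three macros are copied here so that the snippet compiles without the header. Note that `one_of(e1, e2)` is true when every bit of `e2` is set in `e1`.

```cpp
#include <cassert>
#include <type_traits>

// Stand-alone copies of the macros so the snippet builds on its own.
#define MU_TO_NUM(ET, ELM) std::underlying_type_t<ET>(ELM)
#define MU_TO_ENUM(ET, NUM) static_cast<ET>(NUM)
#define MU_ENABLE_BITOPS(ET) \
	constexpr ET operator&(ET e1, ET e2) \
	{ \
		return MU_TO_ENUM(ET, MU_TO_NUM(ET, e1) & MU_TO_NUM(ET, e2)); \
	} \
	constexpr ET operator|(ET e1, ET e2) \
	{ \
		return MU_TO_ENUM(ET, MU_TO_NUM(ET, e1) | MU_TO_NUM(ET, e2)); \
	} \
	constexpr ET operator~(ET e) { return MU_TO_ENUM(ET, ~(MU_TO_NUM(ET, e))); } \
	constexpr bool any_of(ET e) { return MU_TO_NUM(ET, e) != 0; } \
	constexpr bool none_of(ET e) { return MU_TO_NUM(ET, e) == 0; } \
	constexpr bool one_of(ET e1, ET e2) { return (e1 & e2) == e2; } \
	constexpr ET& operator&=(ET& e1, ET e2) { return e1 = e1 & e2; } \
	constexpr ET& operator|=(ET& e1, ET e2) { return e1 = e1 | e2; }

namespace example {

// Hypothetical flag type; each enumerator occupies one bit.
enum struct Perm : unsigned { None = 0, Read = 1u << 0, Write = 1u << 1, Exec = 1u << 2 };
MU_ENABLE_BITOPS(Perm)

// All operators are constexpr, so flag logic can be checked at compile time.
static_assert(one_of(Perm::Read | Perm::Write, Perm::Read), "Read is part of Read|Write");
static_assert(none_of(Perm::Read & Perm::Write), "Read and Write share no bits");

inline bool
can_write(Perm p)
{
	return any_of(p & Perm::Write);
}

// Clear a flag with &= and the complement operator.
inline Perm
without_write(Perm p)
{
	p &= ~Perm::Write;
	return p;
}

} // namespace example
```

The macro has to be invoked in the namespace of the enum, so that the generated operators are found through argument-dependent lookup.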

/**
 * For unit tests, assert that two std::strings are equal.
 *
 * @param s1 string1
 * @param s2 string2
 */
#define assert_equal(s1__,s2__) do { \
	std::string s1s__(s1__), s2s__(s2__); \
	g_assert_cmpstr(s1s__.c_str(), ==, s2s__.c_str()); \
} while(0)

/**
 * For unit-tests, allow warnings in the current function.
 *
 */
void allow_warnings();

} // namespace Mu

#endif /* __MU_UTILS_HH__ */