lib: implement new query parser

mu's query parser is the piece of software that turns your queries
into something the Xapian database can understand. So, if you query
"maildir:/inbox and subject:bla" this must be translated into a
Xapian::Query object which will retrieve the sought-after messages.

Since mu's beginning, almost a decade ago, this parser was based on
Xapian's default Xapian::QueryParser. It works okay, but wasn't really
designed for the mu use-case, and had a bit of trouble with anything
that's not A..Z (think: spaces, special characters, unicode etc.).

Over the years, mu added quite a bit of pre-processing trickery to
deal with that. Still, there were corner cases and bugs that were
practically unfixable.

The solution to all of this is to have a custom query processor that
replaces Xapian's, and write it from the ground up to deal with the
special characters etc. I wrote one, as part of my "future, post-1.0
mu" research project, and I have now backported it to mu 0.9.19.

From a technical perspective, this is a major cleanup, and allows us
to get rid of much of the fragile preprocessing both for indexing and
querying. From an end-user perspective this (hopefully) means that
many of the little parsing issues are gone, and it opens the way for
some new features.

From an end-user perspective:
- better support for special characters.
- regexp search! yes, you can now search for regular expressions, e.g.
      subject:/h.ll?o/
  will find subjects with hallo, hello, halo, philosophy, ...

  As you can imagine, this can be a _heavy_ operation on the database,
  and might take quite a bit longer than a normal query; but it can be
  quite useful.
2017-10-24 21:55:35 +02:00
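To make the regexp example in the commit message concrete, here is a minimal sketch of how the pattern h.ll?o behaves when it may match anywhere in a subject. It uses std::regex from the C++ standard library purely for illustration; match_subjects and example_usage are hypothetical names, and nothing here is taken from mu's actual matching code, whose regexp dialect may differ.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Return the subjects that contain a match for the regular expression rx.
// Illustration only: this is not how mu queries the Xapian database.
inline std::vector<std::string>
match_subjects (const std::vector<std::string>& subjects, const std::string& rx)
{
	const std::regex re{rx};
	std::vector<std::string> hits;
	for (const auto& subject : subjects)
		if (std::regex_search (subject, re)) // match anywhere in the subject
			hits.push_back (subject);
	return hits;
}

// Usage: the four subjects named in the commit message match; "help" does not,
// because no 'o' follows the 'l'.
inline void
example_usage ()
{
	const auto hits = match_subjects (
		{"hallo", "hello", "halo", "philosophy", "help"}, "h.ll?o");
	assert (hits.size () == 4);
}
```

Note that "philosophy" matches because the pattern is not anchored: it finds "hilo" in the middle of the word.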
/*
** Copyright (C) 2017 Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>
**
** This library is free software; you can redistribute it and/or
** modify it under the terms of the GNU Lesser General Public License
** as published by the Free Software Foundation; either version 2.1
** of the License, or (at your option) any later version.
**
** This library is distributed in the hope that it will be useful,
** but WITHOUT ANY WARRANTY; without even the implied warranty of
** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
** Lesser General Public License for more details.
**
** You should have received a copy of the GNU Lesser General Public
** License along with this library; if not, write to the Free
** Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
** 02110-1301, USA.
*/

#include <string>
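#include <vector>
#include <cstdint> // int64_t, used by size_to_string
#include <ctime>   // time_t, used by date_to_time_t_string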

#ifndef __UTILS_HH__
#define __UTILS_HH__

namespace Mux {

/**
 * Flatten a string -- downcase and fold diacritics etc.
 *
 * @param str a string
 *
 * @return a flattened string
 */
std::string utf8_flatten (const std::string& str);

/**
 * Replace all control characters with spaces, and remove leading and trailing space.
 *
 * @param dirty an unclean string
 *
 * @return a cleaned-up string.
 */
std::string utf8_clean (const std::string& dirty);

/**
 * Split a string into parts
 *
 * @param str a string
 * @param sepa the separator
 *
 * @return the parts.
 */
std::vector<std::string> split (const std::string& str,
                                const std::string& sepa);

/**
 * Quote & escape a string
 *
 * @param str a string
 *
 * @return quoted string
 */
std::string quote (const std::string& str);

/**
 * Format a string, printf style
 *
 * @param frm format string
 * @param ... parameters
 *
 * @return a formatted string
 */
std::string format (const char *frm, ...)
	__attribute__((format(printf, 1, 2)));

/**
 * Convert an ISO date to the corresponding time expressed as a string
 * with a 10-digit time_t
 *
 * @param date an ISO date
 * @param first
 *
 * @return the corresponding time, as a string with a 10-digit time_t
 */
std::string date_to_time_t_string (const std::string& date, bool first);

/**
 * Convert a time_t to a string with a 10-digit time_t
 *
 * @param t a time_t value
 *
 * @return the time, as a string with a 10-digit time_t
 */
std::string date_to_time_t_string (time_t t);

/**
 * Convert a size string to a size in bytes
 *
 * @param sizestr the size string
 * @param first
 *
 * @return the size expressed as a string with the decimal number of bytes
 */
std::string size_to_string (const std::string& sizestr, bool first);

/**
 * Convert a size to a string with the size in bytes
 *
 * @param size the size
 *
 * @return the size expressed as a string with the decimal number of bytes
 */
std::string size_to_string (int64_t size);

} // namespace Mux

#endif /* __UTILS_HH__ */
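
The split declaration in this header leaves its edge cases unspecified. The following is a minimal sketch of one plausible implementation, placed in a hypothetical Sketch namespace so that it cannot be mistaken for Mux::split. Keeping empty parts, and returning the whole string as a single part when the separator is empty, are assumptions of this sketch and not documented behaviour of the real function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace Sketch { // hypothetical namespace, not part of mu

// Split str at every occurrence of sepa. Empty parts are kept, so
// "a//b" split on "/" gives three parts. An empty separator yields
// the whole string as a single part (an assumption of this sketch).
inline std::vector<std::string>
split (const std::string& str, const std::string& sepa)
{
	std::vector<std::string> parts;
	if (sepa.empty ()) {
		parts.push_back (str);
		return parts;
	}

	std::size_t start = 0;
	for (;;) {
		const std::size_t pos = str.find (sepa, start);
		if (pos == std::string::npos) {
			parts.push_back (str.substr (start)); // the last part
			break;
		}
		parts.push_back (str.substr (start, pos - start));
		start = pos + sepa.size (); // continue after the separator
	}
	return parts;
}

} // namespace Sketch
```

With these choices, Sketch::split ("a/b/c", "/") gives the three parts "a", "b" and "c". A real implementation might prefer to drop empty parts instead; the header does not say.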
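
A printf-style format function returning a std::string, as declared in this header, is commonly written with two calls to vsnprintf: one to measure the result and one to fill a buffer. The sketch below shows that technique in a hypothetical Sketch namespace. It is not the implementation behind Mux::format, which is not shown in this header, and returning an empty string on a formatting error is an assumption of this sketch.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

namespace Sketch { // hypothetical namespace, not part of mu

// The attribute (GCC/Clang) lets the compiler check the arguments
// against the format string, as in the header above.
std::string format (const char *frm, ...)
	__attribute__((format(printf, 1, 2)));

std::string
format (const char *frm, ...)
{
	va_list args, args2;
	va_start (args, frm);
	va_copy (args2, args); // a va_list can only be traversed once

	// First pass: compute the required length, excluding the trailing NUL.
	const int len = std::vsnprintf (nullptr, 0, frm, args);
	va_end (args);
	if (len < 0) { // formatting error
		va_end (args2);
		return std::string ();
	}

	// Second pass: write into a buffer that is large enough.
	std::vector<char> buf (static_cast<std::size_t> (len) + 1);
	std::vsnprintf (buf.data (), buf.size (), frm, args2);
	va_end (args2);

	return std::string (buf.data (), static_cast<std::size_t> (len));
}

} // namespace Sketch
```

For example, Sketch::format ("%s:%d", "inbox", 42) yields the string "inbox:42".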