1
0
mirror of https://github.com/djcb/mu.git synced 2024-06-25 07:28:02 +02:00
mu/lib/parser/parser.hh
djcb b75f9f508b lib: implement new query parser
mu's query parser is the piece of software that turns your queries
into something the Xapian database can understand. So, if you query
"maildir:/inbox and subject:bla" this must be translated into a
Xapian::Query object which will retrieve the sought after messages.

Since mu's beginning, almost a decade ago, this parser was based on
Xapian's default Xapian::QueryParser. It works okay, but wasn't really
designed for the mu use-case, and had a bit of trouble with anything
that's not A..Z (think: spaces, special characters, unicode etc.).

Over the years, mu added quite a bit of pre-processing trickery to
deal with that. Still, there were corner cases and bugs that were
practically unfixable.

The solution to all of this is to have a custom query processor that
replaces Xapian's, and write it from the ground up to deal with the
special characters etc. I wrote one, as part of my "future, post-1.0
mu" reseach project, and I have now backported it to the mu 0.9.19.

From a technical perspective, this is a major cleanup, and allows us
to get rid of much of the fragile preprocessing both for indexing and
querying. From and end-user perspective this (hopefully) means that
many of the little parsing issues are gone, and it opens the way for
some new features.

From an end-user perspective:
- better support for special characters.
- regexp search! yes, you can now search for regular expressions, e.g.
      subject:/h.ll?o/
  will find subjects with hallo, hello, halo,  philosophy, ...

  As you can imagine, this can be a _heavy_ operation on the database,
  and might take quite a bit longer than a normal query; but it can be
  quite useful.
2017-10-24 22:55:35 +03:00

90 lines
2.1 KiB
C++

/*
** Copyright (C) 2017 Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>
**
** This library is free software; you can redistribute it and/or
** modify it under the terms of the GNU Lesser General Public License
** as published by the Free Software Foundation; either version 2.1
** of the License, or (at your option) any later version.
**
** This library is distributed in the hope that it will be useful,
** but WITHOUT ANY WARRANTY; without even the implied warranty of
** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
** Lesser General Public License for more details.
**
** You should have received a copy of the GNU Lesser General Public
** License along with this library; if not, write to the Free
** Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
** 02110-1301, USA.
*/
#ifndef __PARSER_HH__
#define __PARSER_HH__
#include <string>
#include <vector>
#include <memory>
#include <parser/data.hh>
#include <parser/tree.hh>
#include <parser/proc-iface.hh>
// A simple recursive-descent parser for queries. Follows the Xapian syntax,
// but better handles non-alphanum; also implements regexp
namespace Mux {
/**
* A parser warning
*
*/
struct Warning {
size_t pos; /**< pos in string */
const std::string msg; /**< warning message */
/**
* operator==
*
* @param rhs right-hand side
*
* @return true if rhs is equal to this; false otherwise
*/
bool operator==(const Warning& rhs) const {
return pos == rhs.pos && msg == rhs.msg;
}
};
/**
* operator<<
*
* @param os an output stream
* @param w a warning
*
* @return the updated output stream
*/
inline std::ostream&
operator<< (std::ostream& os, const Warning& w)
{
os << w.pos << ":" << w.msg;
return os;
}
/**
* Parse a query string
*
* @param query a query string
* @param warnings vec to receive warnings
* @param proc a Processor object
*
* @return a parse-tree
*/
using WarningVec=std::vector<Warning>;
using ProcPtr = const std::unique_ptr<ProcIface>&;
Tree parse (const std::string& query, WarningVec& warnings,
ProcPtr proc = std::make_unique<DummyProc>());
} // namespace Mux
#endif /* __PARSER_HH__ */