Commit Graph

1232 Commits

Author SHA1 Message Date
Dirk-Jan C. Binnema 11003000e8 mu: log warning when exiting with error 2023-09-23 09:26:37 +03:00
Dirk-Jan C. Binnema 04e3a2f9a8 mu-utils: improve tests 2023-09-21 23:41:15 +03:00
Dirk-Jan C. Binnema 6ce94ce914 mu-utils: add to_string_view 2023-09-21 23:41:15 +03:00
Dirk-Jan C. Binnema 8ba153067b mu-maildir: use the new run_command0
And fix some docstrings.
2023-09-21 23:41:15 +03:00
Dirk-Jan C. Binnema b771fd6394 query-parser: handle naked NOT, add tests
We weren't correctly expanding "naked NOT" -> AND_NOT

Fixes #2559.
2023-09-21 19:29:59 +03:00
Dirk-Jan C. Binnema 1df4452ff3 server: properly delete output-stream files
logic inversion
2023-09-20 21:35:55 +03:00
Dirk-Jan C. Binnema 24add72126 mu-file-utils: add run_command0
To ensure a command ran and exited with code 0 in one go.
2023-09-19 22:26:45 +03:00
Dirk-Jan C. Binnema ae87be6a48 flags: add flags_mail_dir_file util
And some whitespace cleanup
2023-09-19 22:26:45 +03:00
Dirk-Jan C. Binnema 472f69beb2 utils-file: default args for canonicalize_filename / determine_dtype
Make them a little easier to use
2023-09-19 22:26:30 +03:00
Dirk-Jan C. Binnema b5b90a0673 query-parser: 'not' should take units
NOT should bind more tightly.
2023-09-19 22:11:18 +03:00
Dirk-Jan C. Binnema 5bda8c321b query: move phrasification to mu-query-parser
Do the "phrasification" for matching fields later during query parsing;
this allows for handling combination fields correctly.

Also match both the normal term and the "phrase term", so we catch more
cases. Update/extend unit tests.

This fixes the "kata-container" issue also for body test.

Fixes #2167.
2023-09-17 18:11:21 +03:00
Dirk-Jan C. Binnema 7cbab21099 utils: add utf8_wordbreak
Determine if a string has wordbreaks in a mostly Xapian-compatible way.
We need this to determine what strings should be considered "phrases".
2023-09-17 18:11:10 +03:00
Dirk-Jan C. Binnema 94c90bd0c5 fields: 'phrasable' instead of 'indexable'
'Phrasable' is probably a bit clearer description.
2023-09-17 18:11:10 +03:00
Dirk-Jan C. Binnema a2046dc2b1 mu-index: add blocking start()
Useful for unit tests
2023-09-16 11:12:16 +03:00
Dirk-Jan C. Binnema c78dafd723 provide end-user hints and show them
Only a few for now.
2023-09-16 11:12:16 +03:00
Dirk-Jan C. Binnema 3123f3e983 mu-error: allow for adding end-user hints 2023-09-16 11:12:16 +03:00
Dirk-Jan C. Binnema 0a12b70d7b utils-file: improve mu_play
implement in terms of run_command
2023-09-13 23:03:51 +03:00
Dirk-Jan C. Binnema 9dcbe1d96c lib: unit tests: improve / better coverage 2023-09-13 23:02:53 +03:00
Dirk-Jan C. Binnema 7c16d080d2 Merge pull request #2552 from dme/devel/misc
mu: Fix "expected command" server error report
2023-09-12 22:28:30 +03:00
Dirk-Jan C. Binnema 2f5602b938 unit tests: improve
and add a new one for the indexer
2023-09-12 21:38:57 +03:00
Dirk-Jan C. Binnema 805c5aa287 mu-query: remove unused move ctor 2023-09-12 21:35:47 +03:00
David Edmondson a8440bb258 mu: Fix "expected command" server error report 2023-09-12 08:37:10 +01:00
Dirk-Jan C. Binnema 8287b9802e lib: replace mu-bookmarks with mu-query-macros
And add some unit tests.
2023-09-11 23:54:56 +03:00
Dirk-Jan C. Binnema e290158bcd query-xapianizer: map empty range queries to match-nothing
And only run Xapian tests if they are compatible with the version we
have.
2023-09-11 23:51:37 +03:00
Dirk-Jan C. Binnema af9eb36ca0 unit-tests: modernize
Use TempDir, join_paths etc.
2023-09-11 23:51:37 +03:00
Dirk-Jan C. Binnema 567bc001ef lib/doxyfile.in: remove
Not used any longer
2023-09-11 23:51:37 +03:00
Dirk-Jan C. Binnema 2357db5bf1 query-processor: only phrasify indexable terms 2023-09-11 23:51:37 +03:00
Dirk-Jan C. Binnema 8c5f92bacc query-xapianizer: improve testing coverage 2023-09-11 23:51:37 +03:00
Dirk-Jan C. Binnema f6dc1f7427 scanner: add more unit tests 2023-09-11 23:51:37 +03:00
Dirk-Jan C. Binnema 192c67262a lib: hide some code from coverage checking
Parts that are not easy / useful to unit-test
2023-09-11 23:51:37 +03:00
Dirk-Jan C. Binnema 85ad35bd8e utils/unbroken: avoid pre-C++20 compiler warning 2023-09-10 10:15:33 +03:00
Dirk-Jan C. Binnema c8568eecd4 utils/file: add basename/dirname helpers and use them 2023-09-10 10:15:28 +03:00
Dirk-Jan C. Binnema 53c7381929 lib: move index/ into main lib/
simplify things a bit
2023-09-10 08:55:25 +03:00
Dirk-Jan C. Binnema 3e5cec0d05 tests: update for new query parser / ngrams 2023-09-09 17:57:42 +03:00
Dirk-Jan C. Binnema 89ed21e0c5 lib: improve printability for some types
A little fmt pixie dust
2023-09-09 17:26:20 +03:00
Dirk-Jan C. Binnema 264bb092f0 support xapian ngrams
Xapian supports an "ngrams" option to help with languages/scripts
without explicit wordbreaks, such as Chinese / Japanese / Korean.

Add some plumbing for supporting this in mu as well. Experimental for
now.
2023-09-09 17:26:20 +03:00
Dirk-Jan C. Binnema a9bd6e69d3 lib: implement new query parser
Implement a new query parser; the results should be very similar to the
old one, but it adds an Sexp middle-representation, so users can see how
a query is interpreted.
2023-09-09 11:59:59 +03:00
Dirk-Jan C. Binnema 9c28c65d45 utils: handle "unbroken" scripts
Do not remove combining characters from scripts without explicit word
boundaries, such as those for CJK.

Reuse some Xapian code for that.
2023-09-09 11:40:36 +03:00
Dirk-Jan C. Binnema 77a8a67f6c move lib/thirdparty to thirdparty/ 2023-09-05 08:34:27 +03:00
Dirk-Jan C. Binnema 3f8381134e test: move test messages to testdata/ 2023-09-05 08:34:27 +03:00
Dirk-Jan C. Binnema e1308a9b40 utils: small tweaks 2023-08-27 11:07:55 +03:00
Dirk-Jan C. Binnema b2918e2bea mu-priority: implement priority_from_name 2023-08-21 18:29:21 +03:00
Dirk-Jan C. Binnema bd17c218fb mu-flags: get flag-info for shortcut, too 2023-08-21 18:29:21 +03:00
Dirk-Jan C. Binnema c1950ae0cb mu-utils: support UTC in parse_date_time
Parsing dates known to be in UTC.
2023-08-21 18:29:21 +03:00
Dirk-Jan C. Binnema f73aad2b41 better handle maildir cache
- get an updated maildir list after indexing
- add mu4e-added items to the list opportunistically

Remove mu4e-clear-caches / mu4e-cache-maildir-list to mu4e-obsolete.el

Fixes #2537.
2023-08-19 20:04:50 +03:00
Dirk-Jan C. Binnema 15f08488d3 remove Mu::format, use mu_format
Use the new fmt-based formatting.
2023-08-19 20:04:50 +03:00
Dirk-Jan C. Binnema 1a1eb1f906 server: refactor allow-for-temp-file handling
Add a helper OutputStream class so both "normal" and temp-file code can
be handled uniformly.
2023-08-17 22:42:25 +03:00
Dirk-Jan C. Binnema a16d288c70 server: implement 'data' handler / maildirs
Add a new command 'data' for getting kinds of 'data'. There's one kind
for now: "maildirs". This retrieves the list as per Store::maildirs().
2023-08-17 22:42:25 +03:00
Dirk-Jan C. Binnema e52030c049 store: expose maildirs() method
This gets the current list of maildirs by asking the scanner to do a
file-system search.
2023-08-17 22:42:25 +03:00
Dirk-Jan C. Binnema f5beea2eb2 scanner: add maildir-scan mode; improve portability
Use d_ino (struct dirent) only when available.

Implement a mode for scanning just maildirs (ie. the dirs with cur / new
in them). Use d_type (if available) to optimize that.
2023-08-17 22:42:25 +03:00
Dirk-Jan C. Binnema 8caf504381 store: update "move" and some related APIs
Update test cases as well.
2023-08-17 22:42:25 +03:00
Dirk-Jan C. Binnema 6168d776e1 server: fix contacts handler
Condition was b0rked; clean up code a bit.
2023-08-11 19:57:00 +03:00
Dirk-Jan C. Binnema 11df0bedce utils: add mu_print[ln] for ostreams 2023-08-11 19:57:00 +03:00
Dirk-Jan C. Binnema 7aa38d0b56 option/result: add "unwrap"
Sprinkle some more Rust on Option & Result
2023-08-09 23:24:47 +03:00
Dirk-Jan C. Binnema 04219b55f7 message & friends: make formattable
So we can easily debug-print them.
2023-08-09 23:24:47 +03:00
Dirk-Jan C. Binnema 843c086b2c indexer: fix build 2023-08-07 19:09:19 +03:00
Dirk-Jan C. Binnema fabeb4a89a index: fix lazy indexing
After the previous changes
2023-08-07 08:49:44 +03:00
Dirk-Jan C. Binnema 253b44043b indexer: disable lazy check for "full" scan
Lazy would actually do _more_ work than a full scan.
2023-08-06 16:19:43 +03:00
Dirk-Jan C. Binnema 4ecf386cda utils-file: don't use regexp in join_paths
It's slow.
2023-08-06 16:19:43 +03:00
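As an illustration of the idea in the commit above (joining paths with plain string operations rather than a regular expression), here is a minimal sketch. The helper name `join_two` and its exact normalization rules are assumptions made for this example; it is not mu's actual `join_paths` implementation.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not mu's join_paths): join two path components
// with exactly one '/' between them, using only string operations.
inline std::string join_two(std::string_view a, std::string_view b)
{
    // Drop trailing slashes from the left part, but keep a lone "/" (root).
    while (a.size() > 1 && a.back() == '/')
        a.remove_suffix(1);
    // Drop leading slashes from the right part.
    while (!b.empty() && b.front() == '/')
        b.remove_prefix(1);

    if (a.empty())
        return std::string{b};
    if (b.empty())
        return std::string{a};

    std::string out{a};
    if (out.back() != '/')
        out += '/';
    out.append(b.data(), b.size());
    return out;
}
```

A single linear pass over each component avoids compiling and running a regular expression for every join, which matters when the function is called once per scanned file.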
Dirk-Jan C. Binnema 01a516f0d3 server: tweak sexp generation 2023-08-06 16:19:43 +03:00
Dirk-Jan C. Binnema 4945e699c8 lib/mu: use fmt-based time/date formatting
For a small speedup
2023-08-06 16:19:43 +03:00
Dirk-Jan C. Binnema 27c07280b1 utils: replace time_to_string with fmt-based formatting
It's faster; it makes "mu find" ~5-10% faster, and removes some code we no
longer need.
2023-08-06 16:19:43 +03:00
Dirk-Jan C. Binnema 6dfb2aae7b server: don't use structured-bindings / lambda for contacts_handler
Older clang doesn't like that.
2023-08-04 22:00:51 +03:00
Dirk-Jan C. Binnema 75c37a506b server: don't use structured-bindings / lambda
Older clang doesn't like that.
2023-08-04 21:44:58 +03:00
Dirk-Jan C. Binnema f89e4c26d7 server: attempt to appease clang (pair/tuple)
https://stackoverflow.com/questions/46114214/lambda-implicit-capture-fails-with-variable-declared-from-structured-binding
2023-08-04 21:21:49 +03:00
Dirk-Jan C. Binnema 051fdb4ccf lib/config: set default batch-size to 50000
The default was 250000 but that led to problems on some systems with
limited memory, esp. since mu's indexing does quite a bit more than in
the olden days (e.g. html mail).

Fixes #2529.
2023-08-04 00:09:02 +03:00
Dirk-Jan C. Binnema 25151aad00 mu-query: small optimization tweaks 2023-08-04 00:09:02 +03:00
Dirk-Jan C. Binnema aea95b5be0 mu-server: use strings, not sexps object (optimization)
When passing messages to mu, often we got a (parsed from string)
message-sexp from the message document; then appended some more
properties ("build_message_sexp").

Instead, we can do it in terms of the strings; this is _a little_
inelegant, but also much faster; compare:

(base)
[mu4e] Found 500 matching messages; 0 hidden; search: 1298.0 ms (2.60 ms/msg); render: 642.1 ms (1.28 ms/msg)

(with temp-file optimization (earlier commit))
[mu4e] Found 500 matching messages; 0 hidden; search: 1152.7 ms (2.31 ms/msg); render: 270.1 ms (0.54 ms/msg)

(with temp-file optimization _and_ the string optimization (this commit))
[mu4e] Found 500 matching messages; 0 hidden; search: 266.0 ms (0.53 ms/msg); render: 199.7 ms (0.40 ms/msg)
2023-08-04 00:09:02 +03:00
Dirk-Jan C. Binnema 1018f0f0a1 mu-document: Make sexp() lazy (optimization)
This makes queries where we don't need the sexp much faster; e.g.

before:
   mu find "a" --include-related  47,51s user 2,68s system 99% cpu 50,651 total
after:
  mu find "a" --include-related  7,12s user 1,97s system 87% cpu 10,363 total
2023-08-04 00:09:02 +03:00
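The "lazy sexp()" idea in the commit above can be sketched roughly as follows: the expensive representation is built only on first request and then cached. The class `LazyDoc`, its members, and the s-expression shape are hypothetical and serve only to illustrate the pattern; they are not taken from mu's Document class.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical document type illustrating lazy, cached construction.
class LazyDoc {
public:
    explicit LazyDoc(std::string raw) : raw_{std::move(raw)} {}

    // Build the s-expression string on first use only; later calls
    // return the cached value without rebuilding it.
    const std::string& sexp() const
    {
        if (!cached_) {
            ++builds_;
            cached_ = "(:data \"" + raw_ + "\")";
        }
        return *cached_;
    }

    // Number of times the s-expression was actually built.
    int builds() const { return builds_; }

private:
    std::string                        raw_;
    mutable std::optional<std::string> cached_;
    mutable int                        builds_{0};
};
```

Queries that never call `sexp()` pay nothing for it, which is the kind of saving the timings in the commit message point at.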
Dirk-Jan C. Binnema 924bb2145e mu-server: implement temp-file optimization
It can be faster to feed big mu -> mu4e data, such as contacts or
message headers, through a temp-file instead of directly through stdout;
implement this, and add the server parameter --allow-temp-file.

Implement this for the "contacts" and "find" commands.
2023-08-04 00:09:02 +03:00
Dirk-Jan C. Binnema 111e48efa3 utils: add expand_path (wordexp wrapper)
For expanding command-line options for shells that don't do that by themselves.
2023-08-03 22:47:27 +03:00
Dirk-Jan C. Binnema 33fd79a9f0 mu-regex: add multiline test 2023-07-30 00:50:45 +03:00
Dirk-Jan C. Binnema 3a38d6366a mu-view: set locale to C for tests 2023-07-29 17:25:07 +03:00
Dirk-Jan C. Binnema 766d1849ff test-utils: add TempTz, RAII temporary timezone 2023-07-29 16:39:08 +03:00
Dirk-Jan C. Binnema 1f0342a91f mu-view: add unit-test 2023-07-28 19:43:46 +03:00
Dirk-Jan C. Binnema dc29dc8395 html-to-text: add missing include <array> 2023-07-26 23:30:54 +03:00
Dirk-Jan C. Binnema c06e765d13 html-to-text: be explicit with array type
clang in CI fails to deduce it, so let's help it a bit.
2023-07-26 23:24:29 +03:00
Dirk-Jan C. Binnema 455119f695 Merge branch 'wip/djcb/html-to-text' 2023-07-26 19:11:41 +03:00
Dirk-Jan C. Binnema da290c21a9 benchmark: improve setup
Add some useful make targets, and separate (optimized) build.
2023-07-25 23:56:19 +03:00
Dirk-Jan C. Binnema 4c0b7db3d8 store: add 'add_document' optimization, use it
*Usually* we need Xapian's replace_document() API, but when we know a
document (message) is completely new, we can use the faster
add_document(). That is the case with the initial (re)indexing, when
we start with an empty database.

Also a few smaller cleanups.
2023-07-25 23:56:19 +03:00
Dirk-Jan C. Binnema 4d8ba5f579 index/scanner: implement i-node sorting
On rotational devices (HDDs), processing dir-entries is much faster when
they are sorted by i-node. This is an old optimization (perhaps mu <= 1.6
or so?) that was not re-implemented after indexing changed, likely because
my systems use SSDs instead!

But, let's restore that optimization; the sorting is fast enough that the
extra cost doesn't matter on SSDs; on HDDs it should be quite a bit faster.
2023-07-25 22:39:12 +03:00
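A minimal sketch of the i-node sorting described in the commit above, assuming the directory entries have already been read into a vector. The `DirEntry` struct and `sort_by_inode` function are hypothetical stand-ins for illustration and are not mu's scanner code.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the parts of a struct dirent we care about.
struct DirEntry {
    std::string   name;
    std::uint64_t inode;
};

// Sort entries by ascending i-node number, so that processing them
// afterwards tends to visit on-disk metadata in a more sequential order.
inline void sort_by_inode(std::vector<DirEntry>& entries)
{
    std::sort(entries.begin(), entries.end(),
              [](const DirEntry& a, const DirEntry& b) {
                  return a.inode < b.inode;
              });
}
```

The sort is O(n log n) in memory, which is cheap next to the disk seeks it is meant to reduce on rotational drives.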
Dirk-Jan C. Binnema b795242d5a message: use html-to-text scraper for html parts
We were dumping the HTML-parts as-is in the Xapian indexer; however,
it's better to remove the html decoration first, and just pass the text.

We use the new built-in html->text scraper for that.
2023-07-25 21:26:36 +03:00
Dirk-Jan C. Binnema 56b8fad89e utils: implement html-to-text
Implement a crude html-to-text scraper function, to extract plain text
from html messages, so we can use it for indexing.
2023-07-25 21:26:36 +03:00
Dirk-Jan C. Binnema 11c807f955 utils/readline: use fmt-based apis 2023-07-25 21:26:01 +03:00
Dirk-Jan C. Binnema 9580d11fef utils/result: add std::move version of Err
Avoid a copy in some situations
2023-07-25 21:26:01 +03:00
Dirk-Jan C. Binnema 72f43f11df lib: improve store error messages
Use xapian_try_result
2023-07-23 21:04:26 +03:00
Dirk-Jan C. Binnema d374d94031 clang: avoid some build warnings 2023-07-23 21:04:26 +03:00
Dirk-Jan C. Binnema 7b38f094c4 migrate some more code to mu_format / join_paths
Let's modernize a bit.
2023-07-20 23:14:29 +03:00
Dirk-Jan C. Binnema 6ad5cccc53 store/index: add unit test for circular symlink
Check that we bail out early
2023-07-18 23:18:21 +03:00
Dirk-Jan C. Binnema 885903c496 index: limit length of maildir path to MaxTermLength
This limit was already in place, but now we detect it a bit earlier (in
the indexer). We _could_ increase it (by using hashes for dirstamps), but
right now it's a good catch for circular symlinks.
2023-07-18 23:18:21 +03:00
Dirk-Jan C. Binnema cf6c5a36d7 utils: rework running system commands
Use g_spawn and pass arguments, so we don't involve a shell that needs
escaping etc.

Improve error handling.
2023-07-18 20:19:27 +03:00
Dirk-Jan C. Binnema e8462e0204 lib/index: add rudimentary scanner test
Make the defunct existing one a working test.
2023-07-18 19:08:16 +03:00
Dirk-Jan C. Binnema 99a0eaaa76 lib/store: improve dirstamp / set_dirstamp code
Modernize.
2023-07-11 22:54:01 +03:00
Dirk-Jan C. Binnema 545494225a lib/contacts-cache: improve code 2023-07-11 22:54:01 +03:00
Dirk-Jan C. Binnema 6f69f5d482 utils/mu-regex: add move constructor 2023-07-11 22:54:01 +03:00
Dirk-Jan C. Binnema f3bfdf5add lib/maildir: use mv for moving to avoid warnings
Using gio gives some (false, we assume) valgrind warnings, so for now
use 'mv' instead.

Also slightly update the code, with mu_format replacing format in places.
2023-07-10 23:17:06 +03:00
Dirk-Jan C. Binnema 18490a818d store/server: centralize docids-for-msgid
No need for two near-identical impls

Remove some dead declarations.
2023-07-10 23:17:06 +03:00
Dirk-Jan C. Binnema 0b4f7c4cbe lib: xapian-db/store: simplify
No need for "pimpl" in xapian-db; keep it simple.
2023-07-10 23:15:40 +03:00
Dirk-Jan C. Binnema cc65b8b401 utils: add some more helpers for test code
Creating and removing (temp) dirs, running mu commands.
2023-07-10 23:15:40 +03:00
Dirk-Jan C. Binnema 904f64aa03 utils/result: add "unwrap" convenience function 2023-07-10 23:15:40 +03:00