Implements various non-exact searching and matching features
* docs: update Changelog

* docs: document the new options

* docs: add new settings to example config

* docs: remove short-hands

* docs: document the new settings

* docs: document the new command-line options in the man-page

* ci: mark missing coverage on optional library checks

This code is in fact covered by the `no-optionals` CI run but this is
not picked up by the report of the `coverage` job.

* refactor: extract ListCommand._RESERVED_FIELDS

* fix: remove order dependence for search command tests

* refactor: reorder test functions again

Due to the configuration reset, this order is relevant!

* test: --fuzziness in the list command

* test: --decode-latex and --decode-unicode in the list command

* refactor: reorder unittest methods

* refactor: new argument short-hands

- replaces `-f` with `-z` as the short-hand for `--fuzziness`
  - the idea here is that `-f` is more likely to come in handy in the
    future (think of `formatting` or `file`-related arguments)
- removes the short-hands for `--(no-)decode-latex` and
  `--(no-)decode-unicode` in the `list` command
  - I think these will be less commonly used (compared to the `search`
    command, where they are more relevant) and this avoids conflicts
    with `-l` already taken up by `--limit`
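
A minimal sketch of how such paired flags and the `-z` short-hand could be wired up with `argparse`, modelled on the `search` command. This is not coBib's actual parser: `CONFIG_DEFAULTS`, `build_parser`, and `resolve` are hypothetical names, and the dictionary merely stands in for the `config.commands.search` settings.

```python
import argparse

# Hypothetical stand-in for the values of `config.commands.search`.
CONFIG_DEFAULTS = {"decode_latex": False, "decode_unicode": False, "fuzziness": 0}


def build_parser():
    parser = argparse.ArgumentParser(prog="search")
    # default=None marks "not given on the command line", so that the
    # configured value is only used when neither flag of a pair is present.
    for short, name in (("l", "latex"), ("u", "unicode")):
        dest = f"decode_{name}"
        parser.add_argument(f"-{short}", f"--decode-{name}", dest=dest,
                            action="store_true", default=None)
        parser.add_argument(f"-{short.upper()}", f"--no-decode-{name}", dest=dest,
                            action="store_false", default=None)
    parser.add_argument("-z", "--fuzziness", type=int, default=None)
    return parser


def resolve(argv, defaults=CONFIG_DEFAULTS):
    """Return the effective settings: command-line values win over defaults."""
    given = vars(build_parser().parse_args(argv))
    return {key: defaults[key] if value is None else value for key, value in given.items()}
```

Keeping `None` as the parser default is what lets an explicit `--no-decode-latex` override a configuration in which decoding is enabled.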

* feat: expose non-exact filter matching via list command

* test: unittest the new Entry.matches arguments

* feat: extend non-exact matching to Entry.matches

* refactor: make extra Entry.search arguments keyword-only

* test: more timeout exception handling in the ISBNParser tests

* meta: properly test optional dependencies in CI

* [wip] fix: add optional dependency into tox

The unittests should run at least once without it installed.
There must also be a better way of linking to the optional dependencies
listed in the pyproject.toml.

* feat: basic fuzzy searching

This is achieved via an alternate `regex` package which is a new
optional dependency of coBib.
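
As an illustration only (not the code from this commit), an optional `regex` dependency can be guarded as sketched below. The helper names `fuzzy_pattern` and `find_spans` are hypothetical; the `{e<=N}` suffix is the `regex` package's syntax for permitting up to `N` errors. When `regex` is missing, this sketch silently falls back to exact matching with `re`, which is an assumption about the desired behavior.

```python
import re

try:
    import regex  # optional third-party package providing fuzzy matching
    HAS_REGEX = True
except ImportError:
    regex = None
    HAS_REGEX = False


def fuzzy_pattern(query, fuzziness):
    """Wrap `query` so that up to `fuzziness` errors are tolerated."""
    if fuzziness <= 0:
        return query
    return f"(?:{query}){{e<={fuzziness}}}"


def find_spans(query, text, fuzziness=0):
    """Return the (start, end) spans of all matches of `query` in `text`."""
    if fuzziness > 0 and HAS_REGEX:
        matches = regex.finditer(fuzzy_pattern(query, fuzziness), text)
    else:
        # Exact matching: either no fuzziness was requested or `regex` is missing.
        matches = re.finditer(query, text)
    return [match.span() for match in matches]
```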

* Lint

* feat: permit LaTeX decoding during search

* feat: permit Unicode decoding during search

* refactor: inline internal Entry._search method

It turns out that we don't need to reuse this code for the file grep
highlights: we don't post-process them any further because grep already
returns them in chunks with the correct context.

* fix: re-enable query highlight for file matches

* refactor: support multiple Span inside Match

* refactor: loop merging

* fix: mypy

* refactor: move match module to cobib.utils

* fix: Entry.search unittests

* fix: search command unittests

* refactor: extract internal regex searching method

* refactor: extract Match.stylize

* refactor: track spans from re.Match objects during search

This refactors the handling of search results inside of `Entry.search`.
In the near future, I plan to add the `regex` library as an optional
dependency to support fuzzy regex matching. This would cause the current
word highlighting to fail.
In fact, the current approach already fails to highlight regex searches
properly.
The new approach avoids repeating identical regex searches and instead
parses the matches from the first search to extract all the spanning
data we may need.
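
The idea of recording spans during a single search pass, rather than searching again to highlight, can be sketched as follows. This is a simplified illustration and not the implementation from this commit: `LineMatch`, `search_lines`, and the bracket-based `stylize` are hypothetical stand-ins.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class LineMatch:
    """Hypothetical container: one matching line plus the spans found in it."""

    line_no: int
    text: str
    spans: list[tuple[int, int]] = field(default_factory=list)

    def stylize(self, start="[", end="]"):
        """Mark every recorded span without running the regex again."""
        parts, cursor = [], 0
        for lo, hi in self.spans:
            parts.append(self.text[cursor:lo])
            parts.append(start + self.text[lo:hi] + end)
            cursor = hi
        parts.append(self.text[cursor:])
        return "".join(parts)


def search_lines(pattern, lines, *, ignore_case=False):
    """Search each line once and keep the spans of all its matches."""
    compiled = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    results = []
    for line_no, line in enumerate(lines):
        spans = [match.span() for match in compiled.finditer(line)]
        if spans:
            results.append(LineMatch(line_no, line, spans))
    return results
```

Because the spans come straight from the match objects, the highlighting stays correct for arbitrary regular expressions, where re-searching for a literal word would not.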
mrossinek committed May 25, 2024
1 parent 2229497 commit 6596170
Showing 16 changed files with 1,165 additions and 136 deletions.
12 changes: 12 additions & 0 deletions .gitlab-ci.yml
@@ -60,6 +60,18 @@ test:
    reports:
      junit: tests/report-py$PYTHON_VERSION.xml

no-optionals:
  stage: test
  script:
    - tox -e no-optionals
  artifacts:
    when: always
    expire_in: 30 days
    paths:
      - tests/report-no-optionals.xml
    reports:
      junit: tests/report-no-optionals.xml

plugin:
  stage: test
  script:
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- non-exact (or fuzzy) filter matching and search functionality (#107,#130,!177)
  - the `list` and `search` commands now support the following features to
    perform non-exact filter matching and searching, respectively:
    - LaTeX sequences can be decoded to Unicode characters:
      - using `--decode-latex` from the command-line
      - setting `config.commands.list_.decode_latex = True`
      - setting `config.commands.search.decode_latex = True`
    - Unicode characters can be converted to a close ASCII equivalent:
      - using `--decode-unicode` from the command-line
      - setting `config.commands.list_.decode_unicode = True`
      - setting `config.commands.search.decode_unicode = True`
    - a number of fuzzy errors can be set (this requires the optional dependency
      [`regex`](https://pypi.org/project/regex/) to be installed):
      - using `--fuzziness <int>` from the command-line
      - setting `config.commands.list_.fuzziness` to some integer
      - setting `config.commands.search.fuzziness` to some integer
  - (DEV) the following method arguments have been converted to be accepted only
    as keyword arguments:
    - in `cobib.database.Entry.matches`: `ignore_case`
    - in `cobib.database.Entry.search`: `context`, `ignore_case`, and `skip_files`
  - (DEV) the return-type of `cobib.database.Entry.search` has been changed
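
To illustrate two of the points above, here is a rough sketch of keyword-only matching flags combined with a Unicode-to-ASCII conversion. It is not coBib's implementation: `to_ascii` and `field_matches` are hypothetical names, and the NFKD-based folding is an assumed technique that simply drops characters without a decomposition (for example `ß` or `ø`).

```python
import re
import unicodedata


def to_ascii(text):
    """Approximate Unicode text in ASCII by stripping combining marks (assumed approach)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def field_matches(value, pattern, *, ignore_case=False, decode_unicode=False):
    """Check `value` against `pattern`; the flags after `*` are keyword-only."""
    if decode_unicode:
        value = to_ascii(value)
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, value, flags) is not None
```

With this signature, a call such as `field_matches("Schrödinger", "Schrodinger", decode_unicode=True)` succeeds, while passing the flags positionally raises a `TypeError`.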


## [5.0.1] - 2024-05-01

90 changes: 89 additions & 1 deletion cobib.1
@@ -409,6 +409,39 @@ Makes the entry matching case-sensitive.
This takes precedence over the \fIconfig.commands.list_.ignore_case\fR setting.
.PP
.in +8n
.BR \-\-decode\-latex
.in +4n
Makes the entry matching decode all LaTeX sequences.
This takes precedence over the \fIconfig.commands.list_.decode_latex\fR setting.
.PP
.in +8n
.BR \-\-no\-decode\-latex
.in +4n
Makes the entry matching preserve all LaTeX sequences.
This takes precedence over the \fIconfig.commands.list_.decode_latex\fR setting.
.PP
.in +8n
.BR \-\-decode\-unicode
.in +4n
Makes the entry matching decode all Unicode characters.
This takes precedence over the \fIconfig.commands.list_.decode_unicode\fR
setting.
.PP
.in +8n
.BR \-\-no\-decode\-unicode
.in +4n
Makes the entry matching preserve all Unicode characters.
This takes precedence over the \fIconfig.commands.list_.decode_unicode\fR
setting.
.PP
.in +8n
.BR \-z ", " \-\-fuzziness " " \fI<int>\fR
.in +4n
Specifies how many fuzzy errors to allow during entry matching.
The default value is 0 but can be configured via
\fIconfig.commands.list_.fuzziness\fR.
.PP
.in +8n
.BR \-x ", " \-\-or
.in +4n
Concatenate the filters using logical \fIOR\fR rather than the default
@@ -438,7 +471,42 @@ This takes precedence over the \fIconfig.commands.search.ignore_case\fR setting.
.BR \-I ", " \-\-no\-ignore\-case
.in +4n
Makes the search case-sensitive.
-This takes precedence over the \fIconfig.commands.list_.ignore_case\fR setting.
+This takes precedence over the \fIconfig.commands.search.ignore_case\fR setting.
.PP
.in +8n
.BR \-l ", " \-\-decode\-latex
.in +4n
Makes the search decode all LaTeX sequences.
This takes precedence over the \fIconfig.commands.search.decode_latex\fR
setting.
.PP
.in +8n
.BR \-L ", " \-\-no\-decode\-latex
.in +4n
Makes the search preserve all LaTeX sequences.
This takes precedence over the \fIconfig.commands.search.decode_latex\fR
setting.
.PP
.in +8n
.BR \-u ", " \-\-decode\-unicode
.in +4n
Makes the search decode all Unicode characters.
This takes precedence over the \fIconfig.commands.search.decode_unicode\fR
setting.
.PP
.in +8n
.BR \-U ", " \-\-no\-decode\-unicode
.in +4n
Makes the search preserve all Unicode characters.
This takes precedence over the \fIconfig.commands.search.decode_unicode\fR
setting.
.PP
.in +8n
.BR \-z ", " \-\-fuzziness " " \fI<int>\fR
.in +4n
Specifies how many fuzzy errors to allow during search.
The default value is 0 but can be configured via
\fIconfig.commands.search.fuzziness\fR.
.PP
.in +8n
.BR \-\-skip\-files
@@ -746,6 +814,16 @@ Specifies the default columns displayed during the \fIlist\fR command.
.IR config.commands.list_.ignore_case = False
Specifies whether filter matching should be performed case-insensitive.
.TP
.IR config.commands.list_.decode_unicode = False
Specifies whether filter matching should decode all Unicode characters.
.TP
.IR config.commands.list_.decode_latex = False
Specifies whether filter matching should decode all LaTeX sequences.
.TP
.IR config.commands.list_.fuzziness = 0
Specifies the number of fuzzy errors to allow for filter matching. Using this
feature requires the optional \fIregex\fR dependency to be installed.
.TP
.IR config.commands.modify.preserve_files = False
Specifies whether associated files should be preserved during renaming.
.TP
@@ -769,6 +847,16 @@ Allows the specification of additional arguments for the \fIgrep\fR command.
.IR config.commands.search.ignore_case = False
This boolean setting indicates whether search defaults to be case-insensitive.
.TP
.IR config.commands.search.decode_unicode = False
Specifies whether searches should decode all Unicode characters.
.TP
.IR config.commands.search.decode_latex = False
Specifies whether searches should decode all LaTeX sequences.
.TP
.IR config.commands.search.fuzziness = 0
Specifies the number of fuzzy errors to allow for searches. Using this feature
requires the optional \fIregex\fR dependency to be installed.
.TP
.IR config.commands.show.encode_latex = True
This boolean setting indicates whether non-ASCII characters should be encoded
using LaTeX sequences during rendering via the \fIshow\fR command.
1 change: 1 addition & 0 deletions dev-requirements.txt
@@ -10,3 +10,4 @@ ruff==0.4.5
typos==1.21.0
types-beautifulsoup4==4.12.0.20240511
types-requests==2.32.0.20240523
types-regex==2024.4.28.20240430
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -64,6 +64,10 @@ yaml = "cobib.parsers.yaml:YAMLParser"
[project.scripts]
cobib = "cobib.__main__:_main"

[project.optional-dependencies]
all = ["cobib[fuzzy]"]
fuzzy = ["regex"]

[project.urls]
Homepage = "https://gitlab.com/cobib/cobib"
Documentation = "https://cobib.gitlab.io/cobib/cobib.html"