Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace filtering API; includes support for filtering numbers #1069

Merged
merged 23 commits into from
Feb 17, 2022

Conversation

dmos62
Copy link
Contributor

@dmos62 dmos62 commented Feb 14, 2022

Fixes #385

Related to #1028

This is a continuation of the #1061 PR. Contains 2 major changes:

  • replaces current filtering API with the new filtering API (introduced in Add new filtering API endpoint #1061);
    • new syntax; frontend will use the filters declared in /api/ui/v0/database/0/filters/;
    • the records query param is now filter, not filters (singular, not plural);
  • it also converts the duplicate_only operation from a filter into a query param (see Improve support for finding duplicate rows #1028 for some context); to be clear, this had to be done, there wasn't a coherent way to consider it a Mathesar filter.

Also, includes number filters necessary to fulfil requirements in #385. Actually, it's the previous PR, #1061, that included these filters, but now they will be usable.

Technical details

Filters are specified by frontend in the function API format:

{"and": [
    {"empty": [
        {"column_reference": ["some_column"]},
    ]},
    {"equal": [
        {"column_reference": ["some_column"]},
        {"literal": ["some_literal_value"]},
    ]},
]}

The expected data structure is made up of dicts containing one (and only one) key-value pair each. The key is the filter id, and the value is always a list of parameters. Every parameter is either a nested filter, or a string/number when it's the sole parameter to literal or column_reference.

There are 5 "special filters" that are not declared with the rest (in /api/ui/v0/database/0/filters/) and have to be hardcoded:

  • the operators and, or, not; their syntax is equivalent to that of regular filters;
  • the column_reference, whose parameter is a plain string denoting the column's name;
  • the literal, whose parameter is some non-collection JSON primitive whose value will be used verbatim (see parameters of the equal sub-filter above).

column_reference and literal "special filters" must always be used when you want to pass a column reference or a literal. The API should throw an error otherwise.

New filter API uses the functions API under-the-hood. Filters are actually DBFunctions presented in a simplified way. The syntax for submitting filters to backend is actually just the function API syntax. The filter API could be seen as a thin layer on top of the function. API.

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the master branch of the repository
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no
    visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@dmos62 dmos62 added affects: architecture Improvements or additions to architecture work: backend Related to Python, Django, and simple SQL labels Feb 14, 2022
@dmos62 dmos62 added this to the [07] Initial Data Types milestone Feb 14, 2022
@dmos62 dmos62 self-assigned this Feb 14, 2022
@dmos62 dmos62 requested review from kgodey and pavish February 14, 2022 15:39
@dmos62 dmos62 changed the base branch from master to introduce-new-filtering-api February 14, 2022 15:56
@codecov-commenter
Copy link

codecov-commenter commented Feb 14, 2022

Codecov Report

Merging #1069 (9d85b5a) into master (f3a32c9) will increase coverage by 0.52%.
The diff coverage is 97.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1069      +/-   ##
==========================================
+ Coverage   92.81%   93.33%   +0.52%     
==========================================
  Files         112      113       +1     
  Lines        4021     4052      +31     
==========================================
+ Hits         3732     3782      +50     
+ Misses        289      270      -19     
Flag Coverage Δ
pytest-backend 93.33% <97.00%> (+0.52%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
db/functions/known_db_functions.py 100.00% <ø> (ø)
db/functions/base.py 90.76% <77.77%> (+0.36%) ⬆️
mathesar/functions/operations/convert.py 95.00% <95.00%> (ø)
db/functions/exceptions.py 100.00% <100.00%> (ø)
db/functions/operations/apply.py 96.29% <100.00%> (+7.40%) ⬆️
db/functions/operations/deserialize.py 93.54% <100.00%> (+67.62%) ⬆️
db/records/operations/select.py 100.00% <100.00%> (ø)
mathesar/api/db/viewsets/columns.py 91.20% <100.00%> (-0.19%) ⬇️
mathesar/api/db/viewsets/records.py 100.00% <100.00%> (ø)
mathesar/api/pagination.py 79.06% <100.00%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f3a32c9...9d85b5a. Read the comment docs.

Copy link
Contributor

@kgodey kgodey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the column_reference, whose parameter is a plain string denoting the column's name;

We should be using column ID as the primary way to refer to a column, we're in the process of switching the entire API to do this, see:

The key should be column_id instead of column_reference for clarity.

I'll review in more detail once this is fixed.

@pavish
Copy link
Member

pavish commented Feb 15, 2022

@dmos62 Please let me know when this PR is ready for frontend changes. I'll take it up.

@dmos62
Copy link
Contributor Author

dmos62 commented Feb 15, 2022

@kgodey I'm aware of the move to using column IDs. It's not yet in master though, or is it? Should I refactor this PR to use column ids?

I'll rename column_reference to column_id.

Edit: I looked through the mentioned issues and saw that we're moving to ids gradually (endpoint by endpoint). I'll refactor to use IDs.

@kgodey
Copy link
Contributor

kgodey commented Feb 15, 2022

It's not yet in master though, or is it?

@silentninja is working on these, hopefully they should be in master by the end of the week.

@dmos62
Copy link
Contributor Author

dmos62 commented Feb 16, 2022

Added support for specifying columns via their Django IDs.

Now the JSON expected by the records endpoint's filter query param looks like this:

{"and": [
    {"empty": [
        {"column_id": [1]},
    ]},
    {"equal": [
        {"column_id": [3]},
        {"literal": ["some_literal_value"]},
    ]},
]}

The difference from the sample posted in the first post is that column_references are replaced by column_ids, which take Django column IDs.

Details

Made DB layer unaware of Django IDs, by preprocessing the passed DBFunction JSON specification (for example to the records endpoint filter query param) to rewrite column id references to column name references. Before preprocessing/rewritting, the JSON looks like dict(empty=[dict(column_id=1)]), and after it looks like dict(empty=[dict(column_name="foo")]). The DB layer only sees the latter (column_name) version.

@kgodey
Copy link
Contributor

kgodey commented Feb 16, 2022

{"column_id": [1]},

Why is column_id a list? Presumably it's just one id.

@kgodey kgodey self-assigned this Feb 16, 2022
@dmos62
Copy link
Contributor Author

dmos62 commented Feb 16, 2022

@kgodey column_id is a DBFunction and all DBFunctions take a sequence of parameters. It's optimized for simple automatic compiling and parsing, not for compilation by-hand. I've considered handling these single-parameter cases differently, but opted not to in order to keep the parsing simple.

Edit:

To put it another way, it's easier to say that the parameters are always a list.

This might become less awkward if/when named parameters are introduced.

@kgodey
Copy link
Contributor

kgodey commented Feb 16, 2022

@dmos62 To make the API easier to read, I recommend dropping the list from column_id and other single parameter options, you can always check if the input is only a single parameter and turn it into a list before passing it on to the DBFunction. It doesn't seem like it would be much effort or complicate parsing much.

@dmos62
Copy link
Contributor Author

dmos62 commented Feb 16, 2022

@kgodey I replied that I'd make the change, but then thought about it some more and came to the conclusion that it's not a good idea (and deleted that reply).

Having two forms for passing parameters grows the code complexity more than you'd expect, since we're parsing the function JSON spec twice. Once to rewrite IDs to names and again to instantiate a DBFunction instance. A change in one parsing routine, has to be kept in sync with the other routine.

I don't think that the readability improvement is significant enough to justify that. Plus, I'd like to keep the syntax as simple as possible.

I have considered switching syntax format from {id: [param1,param2]} to {"id": id, "parameters": [param1,param2]}. That way it's clear that "parameters" is always a sequence. I like this for being more orthodox (predictable) JSON, though what's more readable is debatable, since once you understand the current syntax, it's very quick to read.

Edit: this is all the input I have on this; I leave it up to you to make the call.

Copy link
Contributor

@kgodey kgodey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmos62 This looks good overall, added some minor comments. I'm fine with the current API spec for now if it's going to result in repeated logic, but I did have a question about why we couldn't put the logic in a single place.

Could you open a separate GitHub issue to reevaluate the filtering API spec with a link to this discussion? It should go in the API v1 milestone (which is post-alpha).

db/functions/operations/apply.py Outdated Show resolved Hide resolved
db/functions/operations/deserialize.py Outdated Show resolved Hide resolved
)


def _rewrite(column_ids_to_names, spec_or_literal):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only see the JSON spec being parsed here - couldn't you check for single parameters here and convert them to a list during this rewrite?

Copy link
Contributor Author

@dmos62 dmos62 Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other place is db.functions.operations.deserialize. It's true that we could convert the single-parameter to a list here and the db.functions.operations.deserialize routines would be none the wiser. That would diverge the two (somewhat implicit) syntaxes more than I'd like though. The difference between them now is just the additional column_id DBFunction. It's not an actual DBFunction but I'm simulating it with this rewrite routine and the user shouldn't be able to tell the difference. Changing how one of these handles parameters (but not the other) is more complexity than I'd like.

There's some alternatives I can envision. Like being able to extend the set of DBFunctions used when parsing. Then the mathesar namespace/layer would tell the parsing routines to also use this custom ColumnID DBFunction that will convert ids to names when being turned into an SA expression. That way the db layer is unaware of column ids and that would centralize all the parsing into one place.

Not sure it's worth the effort right now.

@kgodey kgodey removed their assignment Feb 16, 2022
@kgodey kgodey added the pr-status: revision A PR awaiting follow-up work from its author after review label Feb 17, 2022
@dmos62
Copy link
Contributor Author

dmos62 commented Feb 17, 2022

I'll open a ticket to track this filter/functions API spec issue. I'll give it a technical debt label.

Edit:

Opened #1077 and #1078.

@dmos62
Copy link
Contributor Author

dmos62 commented Feb 17, 2022

@kgodey Renamed UnknownDBFunctionId to UnknownDBFunctionID (Id -> ID) and removed the obsolete comment. Should be ready for merge.

Copy link
Contributor

@kgodey kgodey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I assume @pavish needs to make frontend changes before we merge?

@pavish
Copy link
Member

pavish commented Feb 17, 2022

I'll get started on the frontend changes. I'll not be using the /api/ui/v0/database/${DB}/filters/ endpoint, there's another ticket open to handle that: #1072. In this PR, I'll only fix parts that are breaking.

I have a couple things to confirm before I begin:

Please let me know if these assumptions are correct.

@kgodey
Copy link
Contributor

kgodey commented Feb 17, 2022

@kgodey I assume we will have to remove the has_duplicates filter option from the UI entirely.

Yes, we're designing a different way to handle that in #1028. We can create an implementation issue for duplicates once that is done.

@pavish
Copy link
Member

pavish commented Feb 17, 2022

@kgodey @seancolsen @dmos62

The frontend changes needed for this PR has a good deal of overlap with the frontend changes needed for #1047 (Changing column names to ids). The column name -> id change needs to be implemented on the frontend before we can update the filter format.

How do we proceed on this? I believe frontend changes for #1047 will take some time, since it will involve some refactoring. Shall we merge this PR and:

  1. Handle the changes in Mathesar 896 Change records api parameters to use column id instead of name #1047 once the column id frontend changes are done?
  2. Or a better option would be to implement the changes in Frontend implementation for type specific filtering options #1072, as it will involve updating everything related to filters.

Both these options will involve us having to deal with breaking functionality in master for a while.

@kgodey
Copy link
Contributor

kgodey commented Feb 17, 2022

@pavish I'm fine with breaking things in master temporarily as long as we can make sure that filtering functionality works as expected by the time #1072 is closed. I'll leave the schedule of which things get handled in which PR to you and @seancolsen.

@pavish
Copy link
Member

pavish commented Feb 17, 2022

@kgodey Thanks, sounds good to me. I'll defer filtering changes to #1072.

@kgodey
Copy link
Contributor

kgodey commented Feb 17, 2022

@pavish Does that mean this PR can be merged as-is?

@dmos62
Copy link
Contributor Author

dmos62 commented Feb 17, 2022

I'll get started on the frontend changes. I'll not be using the /api/ui/v0/database/${DB}/filters/ endpoint, there's another ticket open to handle that: #1072. In this PR, I'll only fix parts that are breaking.

I have a couple things to confirm before I begin:

* @dmos62 I assume [this is the finalized format](https://github.com/centerofci/mathesar/pull/1069#issuecomment-1041558716).

* @kgodey I assume we will have to remove the `has_duplicates` filter option from the UI entirely.

Please let me know if these assumptions are correct.

@pavish Yes, your assumptions are correct. I've forgotten to post a summary of that for you, sorry! This is the final format and the duplicate_only filter is not a filter anymore, but a query param on the records endpoint.

This PR shouldn't block any other developers for a while, so I'd say feel free to merge in whichever order you prefer.

@kgodey @pavish @seancolsen Would it make sense to merge into a dedicated branch (that is not master) until it's not broken anymore and then merge that into master, thus not breaking master?

@kgodey
Copy link
Contributor

kgodey commented Feb 17, 2022

@kgodey @pavish @seancolsen Would it make sense to merge into a dedicated branch (that is not master) until it's not broken anymore and then merge that into master, thus not breaking master?

I don't think it's worth the fuss at this point since no one is using master for anything important. After our alpha release, we should not be breaking master anymore.

@pavish
Copy link
Member

pavish commented Feb 17, 2022

@pavish Does that mean this PR can be merged as-is?

@kgodey Yes, I think we can merge this PR as-is.

@kgodey kgodey merged commit f8ca36d into master Feb 17, 2022
@kgodey kgodey deleted the replace-filtering-api branch February 17, 2022 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects: architecture Improvements or additions to architecture pr-status: revision A PR awaiting follow-up work from its author after review work: backend Related to Python, Django, and simple SQL
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Implement filtering options for Number types.
4 participants