Skip to content

Fix to work with startswith filter #26

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 9, 2021
Merged

Conversation

dbramwell
Copy link

@dbramwell dbramwell commented Jan 5, 2021

I've come across an issue when wanting to use istartswith with fields that have spaces. If for example I have an object with name "Space Genre" and use search_fields = ["name__istartswith"], when searching for "Space G" I would expect the object to be returned in the results, but it isn't.

I believe this is down to splitting the search term by white space, and checking the db queries show it is so:

WHERE ("testapp_genre"."title" LIKE Space% ESCAPE '\' AND "testapp_genre"."title" LIKE G% ESCAPE '\')

In my opinion it'd be better to stop splitting the search term by white space and instead just query by the the full string with select = Q(**{search_fields[0]: term}), but there is a test that fails when using that approach, so I assume that functionality has been added for some reason.

This pull request checks for matching the full term as well as the individual terms, resulting in a query like:

WHERE (
  ("testapp_genre"."title" LIKE Space% ESCAPE '\' AND "testapp_genre"."title" LIKE G% ESCAPE '\')
  OR "testapp_genre"."title" LIKE Space G% ESCAPE '\'
)

This means that my results now show up and the previous functionality remains intact, however it does mean that genres that start with either "Space" or "G" will also show up, which I'm not sure is intuitive. Maybe it would be better to check if the filter is a startswith or endswith and avoid splitting the search term for those?

Edit:
I've now taken the above approach, only splitting the search terms by white space when we are using a 'contains' search field, as I think that's the only place it makes sense (might be wrong though)

@codecov
Copy link

codecov bot commented Jan 5, 2021

Codecov Report

Merging #26 (f0a10a1) into master (709ec19) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #26   +/-   ##
=======================================
  Coverage   99.56%   99.57%           
=======================================
  Files           7        7           
  Lines         230      233    +3     
=======================================
+ Hits          229      232    +3     
  Misses          1        1           
Impacted Files Coverage Δ
django_select2/forms.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 709ec19...f0a10a1. Read the comment docs.

Copy link
Owner

@codingjoe codingjoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dbramwell ,

I like where this is going. This is certainly not perfect and also not really readable at all.
I made a suggestion on how to further simplify this. Maybe you have a look at it.

BTW, I noticed you changed how the separate queries are joined at the end. You use an or not and. I think that's correct, but that's a big change and might need a bit more tests. I'd actually also like to see a test where we actually test the qs.query statement.

Best,
Joe

Comment on lines 402 to 409
for field in search_fields:
field_select = Q()
if "contains" in field:
for t in [t for t in term.split(" ") if not t == ""]:
field_select &= Q(**{field: t})
else:
field_select = Q(**{field: term})
select |= field_select
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you actually need to handle the cases differently? It would still work, if you return both results, right?

Suggested change
for field in search_fields:
field_select = Q()
if "contains" in field:
for t in [t for t in term.split(" ") if not t == ""]:
field_select &= Q(**{field: t})
else:
field_select = Q(**{field: term})
select |= field_select
for field in search_fields:
field_select = Q(**{field: term})
for word in filter(None, term.split(" ")):
field_select |= Q(**{field: t})
select |= field_select

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to swapping the & for |, if searching for "Spa ce" you'd end up with:

WHERE (
  "testapp_genre"."title" LIKE %Spa ce% ESCAPE '\' AND
  "testapp_genre"."title" LIKE %Spa% ESCAPE '\' AND
  "testapp_genre"."title" LIKE %ce% ESCAPE '\'
)

This fails test_filter_queryset

After swapping the AND for OR you get:

WHERE (
  "testapp_genre"."title" LIKE %Spa ce% ESCAPE '\' OR
  "testapp_genre"."title" LIKE %Spa% ESCAPE '\' OR
  "testapp_genre"."title" LIKE %ce% ESCAPE '\'
)

But this is actually functionally different to what the old code does. AND is used for the individual parts of the search term, meaning with search_fields=["title__icontains"] you get:

WHERE (
  "testapp_genre"."title" LIKE %Spa% ESCAPE '\' AND
  "testapp_genre"."title" LIKE %ce% ESCAPE '\'
)

I'm not sure whether AND or OR is prefered in this case.

The OR in the old code is actually applied to the different search_fields, so if we do the same search but with search_fields=["title__icontains", "title__istartswith"] you get:

WHERE (
  (
    "testapp_genre"."title" LIKE %Spa% ESCAPE '\' OR
    "testapp_genre"."title" LIKE Spa% ESCAPE '\'
  ) AND (
    "testapp_genre"."title" LIKE %ce% ESCAPE '\' OR
    "testapp_genre"."title" LIKE ce% ESCAPE '\'
  )
)

I'm not sure that is intuitive either, but might have been done by design at some point. What I'd expect, and what the current changes do, would be:

WHERE (
  (
    "testapp_genre"."title" LIKE %Spa% ESCAPE '\' AND
    "testapp_genre"."title" LIKE %ce% ESCAPE '\'
  ) OR 
  "testapp_genre"."title" LIKE Spa ce% ESCAPE '\'
)

@codingjoe codingjoe self-assigned this Jan 5, 2021
@dbramwell
Copy link
Author

dbramwell commented Jan 6, 2021

This is certainly not perfect and also not really readable at all

Agreed, it's quite hard for me to figure out what it does in different scenarios without logging out the actual queries. And I think what the current code actually does is quite counter intuitive, but that's quite subjective I guess. I've left quite a big (sorry) inline comment that tries to outline the current functionality and what the proposed changes do.

I've also made a change to stop replacing tabs and new lines with spaces unless we're using contains.

I'd actually also like to see a test where we actually test the qs.query statement.

This could be pretty good. Willing to give it a go once we iron out what the queries should actually be

Split search terms only for `__contains` queries and not for
`__startswith` or `__endswith`. We no also split not only be
whitespace but tab and newline.

If multiple search fields are defined conditions combined with
and OR. If single word matches in a for contains queries are
OR combined as well.

Co-Authored-By: codingjoe <info@johanneshoppe.com>
@codingjoe codingjoe merged commit 07054b2 into codingjoe:master Jan 9, 2021
@codingjoe
Copy link
Owner

Hi @dbramwell, thanks again for your hard work. I added some tests and made some minor changes. 👍

@codingjoe
Copy link
Owner

codingjoe commented Jan 9, 2021

Released in 7.6.0 🎉

@dbramwell
Copy link
Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants