Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

select_related and prefetch_related #57

Open
khanetor opened this issue Nov 13, 2016 · 38 comments
Open

select_related and prefetch_related #57

khanetor opened this issue Nov 13, 2016 · 38 comments

Comments

@khanetor
Copy link

When I inspected the actual SQL queries being run against DB, I found that for foreignkey and manytomany relationships, we have the n+1 problem.

For example, suppose there are 5 teams, each with 11 members, this query:

query {
  teams {
    edges {
      node {
        members {
          edges {
            node {
              name
            }
          }
        }
      }
    }
  }
}

will result in 1 query for selecting teams, and 5 more queries for selecting members of each team. 6 queries in total.

Normally, we would do a query like this Team.objects.all().prefetch_related('members') to reduce to 2 queries.

I think it would be extremely beneficial if DjangoObjectType can detect special fields:

  • ForeignField
  • OneToOneField
  • OneToManyField
  • ManyToManyField
    and apply appropriate prefetch_related and/or select_related when such fields are present in graph queries.
@jaredkwright
Copy link

Is anyone currently working on this?

@PeteAndersen
Copy link

I noticed in the docs on filtering with django_graphene you can define a resolve_xxxx function that returns a queryset. The example they provide is about filtering, but I think you could apply the select_related and prefetch_related modifications here as well.

from graphene import ObjectType
from graphene_django.filter import DjangoFilterConnectionField
from .models import Teams

class Query(ObjectType):
    teams = DjangoFilterConnectionField(CategoryNode)

    def resolve_teams(self, args, info):
        return Team.objects.all().prefetch_related('members')

http://docs.graphene-python.org/projects/django/en/latest/authorization/#queryset-filtering-on-lists

@mjtamlyn
Copy link

mjtamlyn commented Feb 1, 2017

I have some code which does some intelligent selective application of database optimisations. This version was written against graphene 0.X and is presented without any kind of guarantee. As and when I miraculously have time to work on it, I'd love to formalise this and get it working in a general case, but I've been saying that for about 9 months now so don't hold out hope.

The core idea is to add prefetch_FOO and optimize_FOO methods to your DjangoObjectType. The code will always do prefetch_related, even on FKs, and that will be applied automatically if applicable. The prefetched qs can be customised with the prefetch_FOO method, and further customised by optimize_FOO methods on the next DjangoObjectType. All optimisations are applied selectively - they won't be applied if the graphql query does not require that data.

Note that it also makes use of https://github.com/photocrowd/django-cursor-pagination

I'm in the middle of trying to do a graphene 1.0 update on a fairly large and complex API which hit a bunch of internals, it remains to be seen how well this approach holds up with graphene 1.0. However, I hope it is a somewhat useful start for anyone wanting to work on this.

https://gist.github.com/mjtamlyn/e8e03d78764552289ea4e2155c554deb

@mjtamlyn
Copy link

I've updated the patterns in my gist to make more sense with Graphene 2.0. It also now includes a few additional features including manual post-processing functions and nested fields. This code with a health warning - it definitely makes my API faster because it saves thousands of queries, however I think it may also be the source of some raw python performance issues which is preventing further optimisation of the code. I'm also not yet running it in production, and it is only tested on Python 3.

@tfoxy
Copy link

tfoxy commented Jun 10, 2018

Hi all! I've been working on a library that tries to optimize the query by analyzing the gql query:

https://github.com/tfoxy/graphene-django-optimizer

Feel free to open issues if your use case doesn't work.

@zbyte64
Copy link
Collaborator

zbyte64 commented Oct 19, 2018

I think graphene-django-optimizer is something we might want to mention in the docs.

@mvanlonden
Copy link
Member

@tfoxy thank you 🙇‍♂️! @zbyte64 mind adding a section in the docs about graphene-django-optimizer

@gotexis
Copy link

gotexis commented Apr 5, 2019

Gj, everyone.

Just a quick question: is there a performance drawback if I simply do a prefetch_related on any FK fields in my model?

Because I just tried graphene-django-optimizer, it seems to be breaking DjangoFilterConnectionField, so I have to customize resolve methods.

It seems both DjangoFilterConnectionField and graphene-django-optimizer offer their own resolvers. Hmm.....
@tfoxy

update:

It kind of did work with DjangoFilterConnectionField if I provide a **kwargs placeholder. In your documentation I guess this was lacking

class Query:
    def resolve_listings(self, info, **kwargs):
        # this will play nice with DFCF

@danpalmer
Copy link
Collaborator

@gotexis

Just a quick question: is there a performance drawback if I simply do a prefetch_related on any FK fields in my model?

You'd probably be select_related'ing your FKs on your model – prefetch_related would be for any other models that have an FK to the target model. These will both incur performance hits – select_related expands the number of joins that the database needs to perform, and the amount of data that the database needs to return to the Django app. prefetch_related increases the number of database queries that get made, one additional query for each thing being prefetched.

For small models, with few relations, and little data, this is probably fine to just do, but otherwise you may want to be smarter about when to add these queries, by using something like graphene-django-optimizer, or your own custom solution.

@mohamedrez
Copy link

I found this PR on Sailor https://github.com/mirumee/saleor/pull/2983/files,
It might be useful, however is there any plan of having this supported by default ?

I personally tried graphene-django-optimizer and it does not solve all the problems for example:
no way of prefetching when having a nested FK that has a nested reverse FK

@stale
Copy link

stale bot commented Jul 16, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 16, 2019
@sliverc
Copy link

sliverc commented Jul 19, 2019

I think it would be still great if query optimization code could be integrated into graphene-django directly.

#693 made the situation a bit better but it doesn't really work if filters are used. For such a queryset it would need to be calculated according to the schema and using Prefetch.

@stale stale bot removed the wontfix label Jul 19, 2019
@jkimbo
Copy link
Member

jkimbo commented Jul 25, 2019

@sliverc is there any reason why you can't use the https://github.com/tfoxy/graphene-django-optimizer library? I'd like to avoid including more code into the library (that needs to be maintained) if there are alternative ways of doing it.

@sliverc
Copy link

sliverc commented Jul 26, 2019

@jkimbo
First of all graphene-django-optimizer has limitations and doesn't seem to work when using DjangoFilterConnetionField or overwriting get_queryset on Node or deep nesting. And I don't see that this can easily be integrated into a separate library without ugly workarounds.

Secondly I see your point that you want to avoid adding code into the library when possible. In this case I don't see this issue as a feature though but a bug or a instability within graphene-django. Graphs are prone to query explosions and it is quite easy when using the full featureset of graphene-django that a user runs into a query explosion.
Such query explosion should be avoided when possible without the user need to tediously configure a 3rd party library to avoid this (The saleor PR is a good example for this what effort is needed to get optimizer running with all its limitations)

@jkimbo
Copy link
Member

jkimbo commented Jul 26, 2019

@sliverc I agree that “query explosions” are likely to happen however I think it is a hard problem and one that won’t necessarily be made easier by building an optimiser into the library. Having said that if you want to create a PR that adds optimisation features into graphene-django I will happily review it.

@tfoxy as author of the graphene-django-optimiser library what do you think? Should the library be integrated into graphene-django? Or are there things that we could do to make optimisations easier?

@tfoxy
Copy link

tfoxy commented Jul 26, 2019

graphene-django-optimizer was created as a hotfix to improve the performance of the project I was working while not making the codebase uglier (before this, we were making the optimizations at the root of every graph resolver).
The code of graphene-django-optimizer is more of a hack as it uses undocumented properties from graphene-django. The code is not maintainable as it is now; I just kept adding tests and never took the time to think and refactor. If graphene changes substantially, it would be really difficult to reuse the same code.
I'm no longer using Django at my current job so I lost my main drive of motivation to keep fixing it and improving it.

Maybe the best would be to find a way to make graphene more extensible so this kind of plugins can be better maintained. As I said, I'm no longer working with Django, but I can help to discuss and propose making some changes to this repo.

@jkimbo
Copy link
Member

jkimbo commented Jul 28, 2019

Thanks for the reply @tfoxy . Sounds like you are right @sliverc , we should try and integrate some optimisations into graphene-django so that it is maintainable in the long run. I'm going to pick up #250 since it looks like a simple one. Do you have any time to work on adding other optimisations @sliverc ?

@jkimbo
Copy link
Member

jkimbo commented Jul 28, 2019

Or @tfoxy if you’d like to merge some of your code into graphene-django that would be very helpful!

@gotexis
Copy link

gotexis commented Jul 28, 2019

@jkimbo
Such query explosion should be avoided when possible without the user need to tediously configure a 3rd party library to avoid this (The [saleor PR]

You are most correct, we have to remember the end goal is to have a workable, production-ready API out of the box, or people are going to move to Node.

Maybe we should have a vote on the top priority features and I think optimization will be on top1-2. Sadly it is also one of the hardest.

The situation may be made better with the planned release of a better async core of Django and native Promise of Python. Both take time.

@sliverc
Copy link

sliverc commented Jul 29, 2019

@jkimbo I have posted here as stale bot was about to close this issue and I wanted avoid this and restart the discussion whether this is a valid issue.

I would love to work on this but I am very tight on time. Someone of our team might be able to work at it later on but do not know when this might happen.

But could you add pinned label to this issue that the stale bot doesn't close it again and this issue can stay on the radar?

@jkimbo
Copy link
Member

jkimbo commented Aug 1, 2019

OK thanks @sliverc . If anyone else would like to work on adding some optimisations to graphene-django that would be great!

@mohamedrez
Copy link

mohamedrez commented Aug 1, 2019 via email

@tfoxy
Copy link

tfoxy commented Aug 1, 2019

I can contribute. I have two questions before doing anything:

  • Would these changes go into v2 or v3?
  • Would we use the same api as graphene-django-optimizer or would we have something more integrated, like for example being the default to optimize the queries?

I'm not sure when is v3 planned and if it would change much of underlaying code.
As for the api, maybe there's a better way to handle things. I'm open for ideas. Maybe it would be good to have an optimize flag in the options of a node, with the possibility to set a default in the settings.py file.

@jkimbo
Copy link
Member

jkimbo commented Aug 2, 2019

@tfoxy

Would these changes go into v2 or v3?

I am not expecting a huge amount of changes needed with v3 since the major breaking change is the dropping of Python 2 and the upgrade of graphql-core. I would suggest that any optimisation improvements could be based off v2.

Would we use the same api as graphene-django-optimizer or would we have something more integrated, like for example being the default to optimize the queries?

Since we're going to start building in query optimisations in the library I don't think there is any need to follow the api of graphene-django-optimizer. I think it would be great if the default would be that graphene-django optimized queries by default. Having said that if you see any breaking or potentially dangerous changes then having an opt out in settings would be great.


@tfoxy @mohamedrez I'm not sure how to break this task down into multiple steps really. @tfoxy any suggestions?

@jkimbo
Copy link
Member

jkimbo commented Aug 4, 2019

Just a heads up @tfoxy @mohamedrez that I've had to give up on trying to automatically apply .only to all queries: #250 (comment)

You might find that trying to automatically applying query optimisations has similar issues where they can actually make performance worse in some cases.

@tfoxy
Copy link

tfoxy commented Aug 5, 2019

I had the same problems with .only when working in graphene-django-optimizer. I decided to only apply .only when I'm sure that it will not trigger another query. That means that if there's a field on a graphql query that it's unknown, it will cancel the .only optimization for that sql query. With select_related and prefetch_related this does not happen. There was some discussion related to this: tfoxy/graphene-django-optimizer#2 . That issue highlights the downside that it's confusing when the .only optimization is not applied in some cases.

In the following days, I will try to write about the thinking behind the implementation of graphene-django-optimizer and some suggestions about how to proceed by doing small steps.

@stale
Copy link

stale bot commented Oct 4, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 4, 2019
@stale stale bot closed this as completed Oct 11, 2019
@easherma
Copy link

Any update on these efforts?

@ambientlight
Copy link

it would make sense reopening this

@dvf
Copy link

dvf commented Jul 29, 2020

Any status update here?

@Instrumedley
Copy link

feels like graphene has been totally abandoned unfortunately

@ulgens ulgens reopened this Jan 12, 2022
@ulgens
Copy link
Collaborator

ulgens commented Jan 12, 2022

I reopened issue and i agree it is not resolved, but please don't assume there will be any development 🙁

@pfcodes
Copy link

pfcodes commented Dec 21, 2023

Is Graphene abandoned?

@mohamedrez
Copy link

at the time when this ticket was created there were no dataloaders.
maybe it could be a fix for you @pfcodes
https://docs.graphene-python.org/en/latest/execution/dataloader/

@elyas-hedayat
Copy link

i think this repo can be helpful for this issue: https://github.com/MrThearMan/graphene-django-query-optimizer
also can request from @MrThearMan to merge it with graphene-Django main library

@MrThearMan
Copy link

MrThearMan commented Dec 23, 2023

Hi, author of graphene-django-query-optimizer here. My library is mostly a rewrite of graphene-django-optimizer, which I found didn't work anymore (although I cannot be 100% certain I set it up correctly 😅)

While it is my code, I think the rewrite is more maintainable. Still, there is quite a bit of complexity, rough edges, and ugliness in order to make everything work. It's also quite new and isn't nearly battle-tested enough.


I'll address some of the issues raised in this thread, and how my rewrite deals with them:

  1. Issues in optimizing queryset only fields

At the moment, I've found that the optimization works well for model fields and resolvers, even with deep queries. However, there can be issues with custom non-model field resolvers in cases where those resolvers require some model fields that aren't requested in the original query. graphene-django-optimizer has the "resolver hints" system for this, which in the rewrite is called required_fields.

In regards to canceling the only optimization like the original does in some cases, the rewrite doesn't do this, since I haven't found it necessary, given the above. I did add a setting to disable it globally in case some edge case was found.

  1. Integrating with connection fields

In the rewrite, the optimization does also work with connection fields, although it did require some ugly custom resolver logic. The main issue was that caching the query results cannot happen during the initial optimization process, since the connection field applies additional pagination limiting to the queryset, which will limit the number of rows returned from the database. Otherwise, we would cache all rows on the model, which would cripple performance.

There are still some untested waters here, like nested connection fields.

  1. Custom get_queryset

Defining a custom get_queryset method and performing additional filtering there will cancel any optimization up to that point, since that deletes the queryset's _result_cache. More on this issue here, but to enable this I added a filter_queryset hook to the ObjectType, which the optimizer will pick up and use at the appropriate time.


There are also some other ugly bits, like how the optimized data is cached between ObjectTypes, but I won't go into the details here.

I'm writing all of this to say that while it would be nice to see some query optimization stategy integrated to graphene-django, it would still require considerable effort in testing and deciding on edge cases for something that would be the "official" solution. Still, I'm ready to help if there is interest.

@ulgens
Copy link
Collaborator

ulgens commented Dec 23, 2023

I'm writing all of this to say that while it would be nice to see some query optimization strategy integrated to graphene-django, it would still require considerable effort in testing and deciding on edge cases for something that would be the "official" solution. Still, I'm ready to help if there is interest.

Thanks, @MrThearMan, for all the insides. I also agree that it's nice to have some formal way of automatic optimizations; we should be careful about the effort required and not to re-invent things that have already been perfected by other other tools.

@pfcodes
Copy link

pfcodes commented Dec 24, 2023

at the time when this ticket was created there were no dataloaders. maybe it could be a fix for you @pfcodes https://docs.graphene-python.org/en/latest/execution/dataloader/

The problem with this is that Graphene 3 broke dataloaders.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.