Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataLoader batch_load_fn + load_many not batching in graphene #42

Open
curiousest opened this issue Sep 26, 2017 · 10 comments
Open

DataLoader batch_load_fn + load_many not batching in graphene #42

curiousest opened this issue Sep 26, 2017 · 10 comments

Comments

@curiousest
Copy link

This test case doesn't appear to work when using graphene:

https://github.com/syrusakbary/promise/blob/master/tests/test_dataloader.py#L32

The batch load function is called with one key at a time when calling dataloader.load_many([6, 7]). I wasn't able to make that test case break by running it many times or with large (100,000) array sizes, which is why I suspect it's due to an interaction with graphene. It works in v2.0.2 (the pip install promise version), but not promise/master.

Here is roughly the code that I'm running:

class HeroLoader(DataLoader):
    def batch_load_fn(self, friend_ids):
        import logging; logging.error(friend_ids)  ########### Console log #2 here
        url = '{}/?id={}'.format(
            Hero.endpoint,
            ','.join([str(id) for id in friend_ids])
        )
        response = requests.get(url)
        results = response.json()['results']
        return Promise.resolve(results)

        
class Hero(graphene.ObjectType):
    endpoint = '{}/heroes'.format(HOST)
    data_loader = HeroLoader()

    id = graphene.Int()
    name = graphene.String(name='name')
    friend_ids = graphene.List(graphene.Int)
    friends = graphene.List(partial(lambda: Hero))

    def resolve_friends(self, args, context, info):
        import logging; logging.error(self.friend_ids)           ######### Console log #1 here
        heroes_json = self.data_loader.load_many(self.friend_ids)
        return heroes_json.then(do_stuff_with_promise......)

The console output of an infringing request looks like this (and stepping through with a debugger yields the same results):

ERROR:root:[6, 7]
ERROR:root:[6]
ERROR:root:[7]

The expected output is

ERROR:root:[6, 7]
ERROR:root:[6, 7]

If you want a test that works with one version of promise, but not the other:

@curiousest curiousest changed the title DataLoader batch_load_fn not batching in graphene (but works without graphene) DataLoader batch_load_fn + load_many not batching in graphene Sep 26, 2017
@nickpresta
Copy link

Reviving this issue as we're experiencing the same problem. Version 2.0.2 works as expected but version 2.1.0 broke this batching functionality.

There seems to be a related issue in the Graphene project: graphql-python/graphene#1075

@rhoog
Copy link

rhoog commented Jan 8, 2020

Try adding a @Promise.safe to the resolve function. That solved it for me one time. I have no clue as to what this decorator does exactly though...

@tazimmerman
Copy link

I've been working through this issue and the following worked for me.

from promise import set_default_scheduler
from promise.schedulers.asyncio import AsyncioScheduler
set_default_scheduler(AsyncioScheduler())

I tried using the scheduler argument of the DataLoader constructor, but found that when it constructs a Promise (in its load method), that scheduler isn't passed through. Later on, when a Promise accesses its own scheduler attribute, it's None, and falls back to the default scheduler, which is an instance of ImmediateScheduler, which is why batching doesn't occur. By setting the default scheduler with set_default_schedulers, both the DataLoader and the Promise use the same instance of AsyncioScheduler.

If my understanding of the code is correct (and my "workaround" isn't just dumb luck), I'd be happy to submit a PR that passes the scheduler through.

@amzhang
Copy link

amzhang commented Apr 22, 2020

set_default_scheduler() is dangerous:

class AsyncioScheduler(object):
    def __init__(self, loop=None):
        self.loop = loop or get_event_loop()

So if your requests are handled by threads and each thread has it's own event loop, then it would not work correctly.

@tazimmerman
Copy link

Do you know of a better workaround? Otherwise, it sounds like if you're using threads to handle requests (which I'm not), the DataLoader simply cannot be used as-is.

@amzhang
Copy link

amzhang commented Apr 25, 2020

@tazimmerman unfortunately, I don't know of any workarounds when using threads.

We moved to using aiodataloader with graphene-django, and starting a new asyncio event loop for every request. I'm not sure if that's the right way to do it, but batching is working.

@vecchp
Copy link

vecchp commented Nov 12, 2020

@tazimmerman unfortunately, I don't know of any workarounds when using threads.

We moved to using aiodataloader with graphene-django, and starting a new asyncio event loop for every request. I'm not sure if that's the right way to do it, but batching is working.

@amzhang Would you mind sharing an example? I'm in the middle of trying to get those two to work and can't seem to find the right place to create the event loop and gather all other results.

@amzhang
Copy link

amzhang commented Nov 12, 2020

Try below. It's copied from our code base with irrelevant parts stripped, so has not been tested.

It uses aiodataloader

Hope this helps.

import asyncio
from promise.promise import Promise
from django.conf import settings
from django.http.response import HttpResponseBadRequest
from django.http import HttpResponseNotAllowed
from graphene_django.views import GraphQLView, HttpError
from graphql.execution import ExecutionResult

class CustomGraphQLView(GraphQLView):
    # Copied from the source code with the following modifications:
    # - Starting event loop before resolve so that asyncio dataloaders can run.
    def execute_graphql_request(
        self, request, data, query, variables, operation_name, show_graphiql=False
    ):
        if not query:
            if show_graphiql:
                return None
            raise HttpError(HttpResponseBadRequest("Must provide query string."))

        try:
            backend = self.get_backend(request)
            document = backend.document_from_string(self.schema, query)
        except Exception as e:
            return ExecutionResult(errors=[e], invalid=True)

        if request.method.lower() == "get":
            operation_type = document.get_operation_type(operation_name)
            if operation_type and operation_type != "query":
                if show_graphiql:
                    return None

                raise HttpError(
                    HttpResponseNotAllowed(
                        ["POST"],
                        "Can only perform a {} operation from a POST request.".format(
                            operation_type
                        ),
                    )
                )

        # See: https://docs.python.org/3/whatsnew/3.8.html#asyncio
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)

        try:
            extra_options = {}
            extra_options["return_promise"] = True
            if self.executor:
                # We only include it optionally since
                # executor is not a valid argument in all backends
                extra_options["executor"] = self.executor

            ret = document.execute(
                root=self.get_root_value(request),
                variables=variables,
                operation_name=operation_name,
                context=self.get_context(request),
                middleware=self.get_middleware(request),
                **extra_options,
            )

            if Promise.is_thenable(ret):
                timeout = settings.GRAPHQL_VIEW["REQUEST_TIMEOUT_AFTER"].total_seconds()
                loop.run_until_complete(asyncio.wait_for(ret.future, timeout))
                return ret.get()
            else:
                return ret
        except Exception as e:  # pylint: disable=broad-except
            return ExecutionResult(errors=[e], invalid=True)

        finally:
            asyncio.set_event_loop(None)
            loop.close()

@vecchp
Copy link

vecchp commented Nov 12, 2020

@amzhang Thanks for the quick reply! I'll take a look to see if this helps me get where I want to go.

@vecchp
Copy link

vecchp commented Nov 12, 2020

@amzhang unfortunately it looks like this works on graphene-django v2 but not v3. Seems like the dataloader story in v3 is kind of weak at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants