Performance issues related to complete_value on large datasets #189

JCatrielLopez · 2023-01-11T19:01:33Z

Hi! We've noticed that returning a list of 5k elements, each with a couple of nested objects, is pretty slow:

Person {
  id
  name
  lastname
  age
  address {street number}
  job {id org_name}
  partner {id name}
  pets {name type}
  school {id name}
}

ncalls	tottime	percall	cumtime	percall	filename:lineno(function)
1	1.8e-05	1.8e-05	2.145	2.145	graphql.py:103(graphql_sync)
1	1.5e-05	1.5e-05	2.145	2.145	graphql.py:152(graphql_impl)
1/30001	0.19	6.335e-06	2.137	7.122e-05	execute.py:413(ExecutionContext.execute_fields)
1	1.3e-05	1.3e-05	2.137	2.137	execute.py:965(execute)
1	7e-06	7e-06	2.137	2.137	execute.py:328(ExecutionContext.execute_operation)
1/135001	0.3229	2.392e-06	2.135	1.581e-05	execute.py:485(ExecutionContext.execute_field)
1/145001	0.2737	1.888e-06	2.071	1.428e-05	execute.py:575(ExecutionContext.complete_value)
1	0.009884	0.009884	2.071	2.071	execute.py:660(ExecutionContext.complete_list_value)
5000/30000	0.02747	9.156e-07	2.026	6.752e-05	execute.py:893(ExecutionContext.complete_object_value)

By itself it's not really a slow function, but it's executed 30k times. Is there any way to reduce the overhead by reducing the number of times this function is invoked?

Tested on Python 3.8 and graphql-core==3.2.3
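
For reference, the 30k figure can be reproduced from the shape of the query. The sketch below is only an illustration: `PERSON_SELECTION`, `count_objects` and `PERSON_COUNT` are hypothetical names, and it assumes the abbreviated selection shown above, with every nested field resolving to a single object rather than a list.

```python
# Hypothetical model of the selection set from the report above:
# None marks a leaf field, a dict marks a nested object with a sub-selection.
PERSON_SELECTION = {
    "id": None,
    "name": None,
    "lastname": None,
    "age": None,
    "address": {"street": None, "number": None},
    "job": {"id": None, "org_name": None},
    "partner": {"id": None, "name": None},
    "pets": {"name": None, "type": None},
    "school": {"id": None, "name": None},
}

PERSON_COUNT = 5_000  # list size mentioned in the report


def count_objects(selection):
    """Count the object values completed for one item: itself plus nested objects."""
    nested = (sub for sub in selection.values() if sub is not None)
    return 1 + sum(count_objects(sub) for sub in nested)


objects_per_person = count_objects(PERSON_SELECTION)
total_objects = PERSON_COUNT * objects_per_person
print(objects_per_person, total_objects)
```

Under these assumptions each person contributes six object values (the person plus five nested objects), which lines up with the 30000 calls to complete_object_value in the profile. The count grows with both the list length and the number of nested objects selected.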

JCatrielLopez · 2023-01-11T19:42:16Z

Possibly related to this graphql-js issue

Cito · 2023-01-11T20:33:17Z

Thanks for reporting. Will look into this when I have more time, probably only after releasing 3.3. It would be helpful if you could post example code with dummy data to reproduce this.

JCatrielLopez · 2023-01-12T11:19:20Z

schema.graphql:

type Query {
    persons: [Person]
}

type Person {
    id: String!
    name: String
    ssn: String
    alive: Boolean
    has_job: Boolean
    job: JobDetails
    address: Address
    pets: Pets
    house: House
    partner: Person
}

type JobDetails {
    id: String
    name: String
}

type Address {
    id: String
    name: String
}

type Pets {
    id: String
    name: String
    race: String
    color: String
}

type House {
    color: String
    floors: Int
    is_duplex: Boolean
    is_apt: Boolean
}

server.py:

import random
import string
import sys

import yappi

from graphql import graphql_sync, build_ast_schema
from graphql.language.parser import parse

yappi.set_clock_type("wall")

with open("./schema.graphql", "r") as f:
    schema = build_ast_schema(parse(f.read()))


class Query:
    """The root resolvers"""

    def persons(self, info):
        output = []
        for _ in range(5_000):
            output.append(
                dict(
                    id="".join(random.choices(string.ascii_lowercase + string.digits, k=9)),
                    name="John Doe",
                    ssn="00000000000000000",
                    alive=True,
                    has_job=False,
                    job=dict(id="xxx", name="test"),
                    address=dict(id="yyy", name="Fake Street"),
                    pets=dict(id="zzz", name="test"),
                    house=dict(
                        color="RED",
                        floors=2,
                        is_duplex=False
                    ),
                    partner=dict(id="".join(random.choices(string.ascii_lowercase + string.digits, k=9)), name="test"),
                )
            )
        return output


def main():
    query = """{ 
        persons{ 
            id 
            name 
            alive 
            has_job 
            job{id name}
            partner{id name}
            address{id name}
            pets{id name}
            house{color floors is_duplex}
        } 
    }"""

    yappi.start()
    result = graphql_sync(schema, query, Query())
    yappi.stop()

    if result.errors:
        print(result)
        sys.exit(1)

    yappi.get_func_stats().save("profile", type="pstat")


# To visualize profile:
# python -m snakeviz profile --server

if __name__ == '__main__':
    main()
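
As an alternative to snakeviz, the saved file can also be inspected with the standard library. This is a minimal sketch, assuming the `profile` file written by the script above is readable by `pstats` (yappi's `pstat` output type is intended for that); `top_by_cumulative_time` is a hypothetical helper name, not part of yappi or graphql-core.

```python
import pstats


def top_by_cumulative_time(profile_path, limit=10):
    """Return up to `limit` rows (cumtime, ncalls, filename, lineno, function),
    sorted by cumulative time, from a pstat-format profile file."""
    stats = pstats.Stats(profile_path)
    rows = []
    # stats.stats maps (filename, lineno, function) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    for (filename, lineno, name), (_cc, ncalls, _tt, cumtime, _callers) in stats.stats.items():
        rows.append((cumtime, ncalls, filename, lineno, name))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


# Example usage after running server.py:
# for cumtime, ncalls, filename, lineno, name in top_by_cumulative_time("profile"):
#     print(f"{cumtime:10.4f}  {ncalls:8d}  {filename}:{lineno}({name})")
```

Sorting by cumulative time puts the execute_field / complete_value chain near the top, which makes it easy to compare runs before and after a change.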

JCatrielLopez changed the title ~~Performance issues with complete_value on large datasets~~ Performance issues related to complete_value on large datasets Jan 11, 2023

Cito mentioned this issue Feb 9, 2023

Significant performance hit when using async resolvers #190

Open

tdg5 mentioned this issue Jan 2, 2024

Fetching current_transaction for every GraphQL field results in performance regression newrelic/newrelic-python-agent#831

Open
