
Improve GraphQL semantic conventions #182

Open
@SonjaChevre

As GraphQL is gaining popularity as a query language for APIs, we (Tyk Technologies, maintainer of the Tyk open source API Gateway) would like to work on enhancing the existing semantic conventions for GraphQL instrumentation.

What is GraphQL?

GraphQL was created by Facebook in 2012 and was publicly released in 2015; it gained popularity due to its ability to solve data fetching challenges by providing a more efficient and declarative approach to API data querying and manipulation.

More about GraphQL:

GraphQL | A query language for your API
GraphQL Landscape

What are specific observability challenges with GraphQL?

Here is a non-exhaustive list:

1. Error detection

In GraphQL, errors are returned as part of the response body with a 200 HTTP status code, even when a query is only partially successful. When monitoring a GraphQL request with OpenTelemetry, this means that the distributed trace usually looks OK (because of the 200 HTTP status code) even when GraphQL is returning errors.

See also: GraphQL error specification
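As a minimal sketch of the problem, the function below derives a span status from the response body instead of the HTTP status code alone. The function name, the returned keys, and the rule of treating any non-empty `errors` list as a failure are illustrative assumptions, not part of any existing convention or library.

```python
import json


def span_status_for_graphql(http_status: int, body: str) -> dict:
    """Derive a span status from a GraphQL-over-HTTP response (hypothetical helper).

    A response is treated as an error if the HTTP status is 5xx OR the
    body carries a non-empty top-level "errors" list, even with HTTP 200.
    """
    payload = json.loads(body)
    errors = payload.get("errors") or []
    # Each GraphQL error is an object with at least a "message" entry.
    messages = [e.get("message", "") for e in errors if isinstance(e, dict)]
    status = "ERROR" if (http_status >= 500 or errors) else "OK"
    return {
        "status": status,
        # Partial success: errors were reported but some data came back too.
        "partial_data": bool(errors) and payload.get("data") is not None,
        "error_messages": messages,
    }


if __name__ == "__main__":
    body = '{"data": {"user": null}, "errors": [{"message": "User not found", "path": ["user"]}]}'
    print(span_status_for_graphql(200, body))
```

With this approach, a 200 response that contains errors is still flagged on the span, which is the case a status-code-only check misses.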

2. Performance monitoring

Monitoring a GraphQL server is not straightforward as the performance depends highly on what queries customers are sending.

When using GraphQL for internal APIs that are only accessed by internal clients, the queries won’t change often and we could monitor the performance on a query level. But if our API is available externally, we can have hundreds of slightly different queries.

Performance issues could be related to the query lifecycle (parse, validate, execute, resolve) or to a specific resolver (a function that retrieves or mutates data for a specific field in a GraphQL schema during query execution), depending on the fields requested.
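To illustrate what per-phase visibility could look like, here is a small sketch that accumulates wall-clock time per lifecycle phase. `PhaseTimer` and `handle_request` are hypothetical names, and the phases are simulated with placeholder work; a real instrumentation would more likely create one span per phase.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates elapsed seconds per named lifecycle phase (hypothetical helper)."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            # Sum repeated phases, e.g. many resolver calls in one request.
            self.durations[name] = self.durations.get(name, 0.0) + elapsed


def handle_request(timer, fields):
    """Simulated request: the phase bodies are placeholders, not a real engine."""
    with timer.phase("parse"):
        tokens = list(fields)
    with timer.phase("validate"):
        assert all(isinstance(t, str) for t in tokens)
    with timer.phase("execute"):
        result = {}
        for field in tokens:
            # Resolver time is nested inside the execute phase.
            with timer.phase("resolve"):
                result[field] = len(field)
    return result


if __name__ == "__main__":
    timer = PhaseTimer()
    handle_request(timer, ["user", "orders"])
    for name, seconds in timer.durations.items():
        print(f"{name}: {seconds:.6f}s")
```

Separating `resolve` from `execute` is what makes it possible to tell a slow resolver apart from a slow engine.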

3. Removing deprecated fields

GraphQL is considered "version-free" because it eliminates the need for maintaining different API versions. The shape and structure of data returned are determined by the client's query, allowing seamless evolution and addition of features without breaking existing clients. This flexibility simplifies development and reduces compatibility issues.

Removing fields from GraphQL schemas can become challenging. Removing a field is a breaking change that would disrupt the functionality of client applications that rely on the field. To address this, GraphQL allows deprecating fields without removing them. Being able to observe which fields are being requested by API clients can help understand the impact of removing a deprecated field.
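A rough sketch of how field-level telemetry could answer this question: given the fields each client requested, count how often deprecated fields are still in use. The `Type.field` naming, the `DEPRECATED_FIELDS` set, and the client identifiers are made up for illustration; in practice this data would come from the instrumentation and the schema.

```python
from collections import Counter

# Hypothetical set of schema fields marked as deprecated.
DEPRECATED_FIELDS = {"User.username", "Order.legacyTotal"}


def count_deprecated_usage(requests):
    """Count requests per (deprecated field, client).

    `requests` is an iterable of (client_id, requested_fields) pairs,
    where requested_fields lists "Type.field" names seen in one request.
    """
    usage = Counter()
    for client_id, fields in requests:
        # De-duplicate so a field requested twice in one query counts once.
        for field in set(fields):
            if field in DEPRECATED_FIELDS:
                usage[(field, client_id)] += 1
    return usage


if __name__ == "__main__":
    observed = [
        ("mobile-app", ["User.id", "User.username"]),
        ("mobile-app", ["User.username", "User.username"]),
        ("web-app", ["User.id", "User.email"]),
    ]
    for (field, client), hits in count_deprecated_usage(observed).items():
        print(f"{field} still requested by {client}: {hits} request(s)")
```

A deprecated field with no remaining usage across clients is a candidate for safe removal.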

What is the current support for GraphQL in OTel?

Currently, the semantic conventions for GraphQL contain three attributes:

  • graphql.operation.name: The name of the operation being executed.
  • graphql.operation.type: The type of the operation being executed (query, mutation or subscription).
  • graphql.document: The GraphQL document being executed.
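For reference, the three attributes above can be assembled like this. The sketch uses the fully qualified attribute keys with the `graphql.` prefix; the helper name and the validation behaviour are assumptions made for illustration, and the resulting dictionary would be passed to whatever span API is in use.

```python
VALID_OPERATION_TYPES = ("query", "mutation", "subscription")


def graphql_span_attributes(operation_type, document, operation_name=None):
    """Build the GraphQL span attributes as a plain dict (hypothetical helper)."""
    if operation_type not in VALID_OPERATION_TYPES:
        raise ValueError(f"unknown GraphQL operation type: {operation_type!r}")
    attributes = {
        "graphql.operation.type": operation_type,
        "graphql.document": document,
    }
    # Anonymous operations have no name, so the attribute is simply omitted.
    if operation_name:
        attributes["graphql.operation.name"] = operation_name
    return attributes


if __name__ == "__main__":
    doc = "query findBookById { bookById(id: 1) { name } }"
    print(graphql_span_attributes("query", doc, "findBookById"))
```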

There are currently 5 instrumentation libraries for GraphQL:

We have only tried the Node.js instrumentation so far. We noticed that this library doesn’t respect the semantic conventions, but it contains much more valuable information that could be standardised.

What are we missing in the semantic conventions?

Non-exhaustive list:

  • GraphQL errors (error location, error message, …; see the GraphQL specification)
  • GraphQL query lifecycle (parse, validate, execute, resolve)
  • Fields (name, path, alias, …)

What is the suggested approach?

We are actively working on adding this information to our own GraphQL engine (Universal Data Graph) and would welcome other members of the observability and GraphQL communities to join us in improving the semantic conventions.

Looking forward to seeing whether this proposal gets any interest!

Sonja

Note: we are also working on another proposal to introduce semantic conventions for API Gateways: #183
