Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix __typename inside router response #6009

Merged
merged 18 commits into from
Oct 16, 2024
Merged

Fix __typename inside router response #6009

merged 18 commits into from
Oct 16, 2024

Conversation

IvanGoncharov
Copy link
Member

@IvanGoncharov IvanGoncharov commented Sep 17, 2024

This PR fixes the following edge cases:

1. The initial response of the query containing @defer should be an empty object

For a query like this:

{
  ... @defer { me { name } }
}

The initial response should have data set to an empty object, as in this graphql-js test case:
https://github.com/graphql/graphql-js/blob/2b42a70191243d0ca0e0e4f1d580d6794718fbd5/src/execution/__tests__/defer-test.ts#L371-L384

2. data: null semantic should be maintained

According to GraphQL spec, null can be propagated to data: https://spec.graphql.org/draft/#sel-FANTNRCAACGBrwa
But the fix:

Some(Value::Null) => {
// Detect if root __typename is asked in the original query (the qp doesn't put root __typename in subselections)
// cf https://github.com/apollographql/router/issues/1677
let operation_kind_if_root_typename = original_operation.and_then(|op| {
op.selection_set
.iter()
.any(|f| f.is_typename_field())
.then(|| *op.kind())
});
response.data = match operation_kind_if_root_typename {
Some(operation_kind) => {
let mut output = Object::default();
output.insert(TYPENAME, operation_kind.default_type_name().into());
Some(output.into())
}
None => Some(Value::default()),
};

Added in #2253, resulted data: null being replaced with data: { __typename: "<some typename>" }, for queries like this:

{
  nonNullFieldThatErrors
  __typename
}

3. Handle cases where query with @defer also have __typename inside fragment

Fix added in #2253, here:

Some(Value::Null) => {
// Detect if root __typename is asked in the original query (the qp doesn't put root __typename in subselections)
// cf https://github.com/apollographql/router/issues/1677
let operation_kind_if_root_typename = original_operation.and_then(|op| {
op.selection_set
.iter()
.any(|f| f.is_typename_field())
.then(|| *op.kind())
});
response.data = match operation_kind_if_root_typename {
Some(operation_kind) => {
let mut output = Object::default();
output.insert(TYPENAME, operation_kind.default_type_name().into());
Some(output.into())
}
None => Some(Value::default()),
};

And here:

// Detect if root __typename is asked in the original query (the qp doesn't put root __typename in subselections)
// cf https://github.com/apollographql/router/issues/1677
let operation_kind_if_root_typename =
original_operation.and_then(|op| {
op.selection_set
.iter()
.any(|f| f.is_typename_field())
.then(|| *op.kind())
});
if let Some(operation_kind) = operation_kind_if_root_typename {
output.insert(TYPENAME, operation_kind.default_type_name().into());
}

Only worked for cases where __typename is specified directly in the topmost selection set, but didn't work for __typenames wrapped in fragments, e.g.:

{
  ... { __typename }
  ... defer {
    me { name }
  }
}

Because of #1, #2 and #3 fix added in #2253 was removed and but fix for #4 also fixes issues with @defer

4. Remaping subgraph's root type names to supergraph names didn't work for __typename wrapped in fragments

Subgraphs could have their root types named differently than supergraphs.
Moreover, different subgraphs could have different names for root types.

QP removes __typename from the topmost selection set, so in combination with:

} else if name.as_str() == TYPENAME {
if !output.contains_key(field_name_str) {
output.insert(field_name.clone(), Value::String(root_type_name.into()));
}

This issue is fixed for queries like this:

{
  __typename
  me { name }
}

But QP doesn't delete __typename wrapped in fragments like so (same true for named fragment):

{
  ... { __typename }
  me { name }
}

That means

} else if name.as_str() == TYPENAME {
if !output.contains_key(field_name_str) {
output.insert(field_name.clone(), Value::String(root_type_name.into()));
}

Can't be reached because subgraph will return __typename value for root, and it will trigger this "if" instead:

if let Some(input_value) = input.get_mut(field_name_str) {

That's why I changed the order of these if branches, so the subgraph's value of __typename for the root type is always ignored, and the supergraph's root type names are always used. Also, the exact same order of if branches are already used inside apply_selection_set.

However, this code is not reachable for __typenames wrapped in fragments because apply_root_selection_set incorrectly calls apply_selection_set for selection sets wrapped in fragments:

self.apply_selection_set(
selection_set,
parameters,
input,
output,
path,
// FIXME: use `ast::Name` everywhere so fallible conversion isn’t needed
#[allow(clippy::unwrap_used)]
&FieldType::new_named(type_condition.try_into().unwrap()).0,
)?;

self.apply_selection_set(
&fragment.selection_set,
parameters,
input,
output,
path,
// FIXME: use `ast::Name` everywhere so fallible conversion isn’t needed
#[allow(clippy::unwrap_used)]
&FieldType::new_named(root_type_name.try_into().unwrap()).0,
)?;

If root fields are wrapped in fragments, they are still applied on root type, e.g.:

{
   ... { 
     a: __typename # applied on query root
       ... {
         b: __typename # still query root
         
         me {
           __typename # not a root
         }
       }
     }
   }
}

So, I switch those calls to be apply_root_selection_set.
All of the above steps in combination fixed #4

5. Incorrect errors on inline fragments using root's interfaces

Root type can implement interfaces and it's legal to use those interfaces in queries, like so:

interface Foo {
   foo: String
}

type Query implements {
   foo: String
}

with query:

{
  ... on Foo {
    foo
  }
}

But before my change, this query would error here:

if type_condition.as_str() != root_type_name {
return Err(InvalidValue);
}

6. __typename with alias returned by subgraph is not validated

__typename is validated here:

if let Some(input_type) =
input_object.get(TYPENAME).and_then(|val| val.as_str())
{
// If there is a __typename, make sure the pointed type is a valid type of the schema. Otherwise, something is wrong, and in case we might
// be inadvertently leaking some data for an @inacessible type or something, nullify the whole object. However, do note that due to `@interfaceObject`,
// some subgraph can have returned a __typename that is the name of an interface in the supergraph, and this is fine (that is, we should not
// return such a __typename to the user, but as long as it's not returned, having it in the internal data is ok and sometimes expected).
let Some(ExtendedType::Object(_) | ExtendedType::Interface(_)) =
parameters.schema.types.get(input_type)
else {
parameters.nullified.push(Path::from_response_slice(path));
*output = Value::Null;
return Ok(());
};
}

And is used to assign current_type here:

let current_type = if parameters
.schema
.get_interface(field_type.inner_named_type())
.is_some()
|| parameters
.schema
.get_union(field_type.inner_named_type())
.is_some()
{
typename.as_ref().unwrap_or(field_type)
} else {
field_type
};

But aliased __typename handled here (same code for non-aliased, but they already validated by code above):

let current_type = if parameters
.schema
.get_interface(field_type.inner_named_type())
.is_some()
|| parameters
.schema
.get_union(field_type.inner_named_type())
.is_some()
{
typename.as_ref().unwrap_or(field_type)
} else {
field_type
};

This code guarantees that the aliased __typename contains the name of the object type from the supergraph schema.
But it doesn't guarantee that all __typenames (aliased and not) have the same value that is compatible with current_type.

As #4 proves, we can't trust the subgraph to return the correct names, and QP doesn't guarantee that either.
So, since we already use current_type to track the type of the current selection set, it makes sense to use it as a source of truth for all __typenames but fallback to subgraph's __typename if current_type is not an object (shouldn't happen unless QP has bugs).

Fixes #issue_number


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

This comment has been minimized.

@@ -71,6 +71,72 @@ async fn aliased() {
"###);
}

/* FIXME: should be fixed in query planner, failing with:
> value retrieval failed: empty query plan. This behavior is unexpected and we suggest opening an issue to apollographql/router with a reproduction.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't be fixed in scope of this PR since they fail on query planning

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the plan to solve these parts? Are there related issues on the JS and Rust planners?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will open an issue right now.
Probably open it for the router and see if it should be fixed in router or in federation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I open an issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d prefer not to have entire functions or tests commented out in the repo. Please use #[ignore] or #[should_panic] instead. Also please link to the issue in the FIXME comment

if include_skip.should_skip(parameters.variables) {
continue;
}

self.apply_selection_set(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are still inside root selection, even if we go inside an inline fragment.
So it should be apply_root_selection_set

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds reasonable but looks like a dangerous change. Do you have a test that would show what happens before and after that change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it tested here:

async fn test_subgraph_returning_different_typename_on_query_root() -> Result<(), BoxError> {
let mut router = IntegrationTest::builder()
.config(CONFIG)
.responder(ResponseTemplate::new(200).set_body_json(json!({
"data": {
"topProducts": null,
"__typename": "SomeQueryRoot",
"aliased": "SomeQueryRoot",
"inside_fragment": "SomeQueryRoot",
"inside_inline_fragment": "SomeQueryRoot"
}
})))
.build()
.await;
router.start().await;
router.assert_started().await;
let query = r#"
{
topProducts { name }
__typename
aliased: __typename
...TypenameFragment
... {
inside_inline_fragment: __typename
}
}
fragment TypenameFragment on Query {
inside_fragment: __typename
}
"#;
let (_trace_id, response) = router.execute_query(&json!({ "query": query })).await;
assert_eq!(response.status(), 200);
assert_eq!(
response.json::<serde_json::Value>().await?,
json!({
"data": {
"topProducts": null,
"__typename": "Query",
"aliased": "Query",
"inside_fragment": "Query",
"inside_inline_fragment": "Query"
}
})
);
Ok(())
}

.unwrap_or_else(|| {
Value::String(ByteString::from(
current_type.inner_named_type().as_str().to_owned(),
))
Copy link
Member Author

@IvanGoncharov IvanGoncharov Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, we tried to use __typename values returned by the subgraph but didn't validate them. I assume (not tested) it can even be the name of the interface if the subgraph uses @interfaceObject

According to the GraphQL spec, __typename should always use an object name (interfaces and unions are forbidden), so I changed the code to first use current_type (which is always valid) and then fallback to using the subgraph's value only if current_type is not an object type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be checked with what the output rewriting code in the QP is doing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what to do here.
Realistically, I can't inspect QP in the scope of this PR.
I can revert this particular change, because it separated from other fixes.
@Geal Do you think it worth to revert it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks correct to me: when a field is an object type in the supergraph we know what the correct __typename is and could even skip asking the subgraph

@IvanGoncharov IvanGoncharov marked this pull request as ready for review September 17, 2024 02:52
@IvanGoncharov IvanGoncharov requested review from a team as code owners September 17, 2024 02:52
@@ -311,6 +311,8 @@ impl PlanNode {
let _ = primary_sender.send((value.clone(), errors.clone()));
} else {
let _ = primary_sender.send((value.clone(), errors.clone()));
// primary response should be an empty object
value.deep_merge(Value::Object(Default::default()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the data field should ve an enpty object (unless it contained a non nullable field that was nullified), then I think this should be fixed here:

&initial_value.unwrap_or_default(),

to set it to an object instead of Value::Null by default

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You right, it should be correct.
I hesitated to do it since I'm less familiar with how QP executes and also I don't have capacity to test all possible scenarios in context of this PR.
So I choose to do it just inside this specific branch that I'm 100% sure is correct.

@Geal Do you think it safe to make this change for entire QP execution?
Or I can create issue for that I do it in separate PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe in a separate PR, considering the timing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a router issue for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that changing the linked line of execution.rs is a better fix, but it doesn’t have to be in this PR

@IvanGoncharov IvanGoncharov requested a review from a team as a code owner September 17, 2024 14:24
@IvanGoncharov IvanGoncharov requested a review from Geal September 20, 2024 12:50
@abernix abernix removed request for a team, duckki, dariuszkuc and sachindshinde October 2, 2024 18:46
Co-authored-by: Jesse Rosenberger <git@jro.cc>
@svc-apollo-docs
Copy link
Collaborator

svc-apollo-docs commented Oct 8, 2024

✅ Docs Preview Ready

No new or changed pages found.

@@ -311,6 +311,8 @@ impl PlanNode {
let _ = primary_sender.send((value.clone(), errors.clone()));
} else {
let _ = primary_sender.send((value.clone(), errors.clone()));
// primary response should be an empty object
value.deep_merge(Value::Object(Default::default()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that changing the linked line of execution.rs is a better fix, but it doesn’t have to be in this PR

.unwrap_or_else(|| {
Value::String(ByteString::from(
current_type.inner_named_type().as_str().to_owned(),
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks correct to me: when a field is an object type in the supergraph we know what the correct __typename is and could even skip asking the subgraph

@@ -71,6 +71,72 @@ async fn aliased() {
"###);
}

/* FIXME: should be fixed in query planner, failing with:
> value retrieval failed: empty query plan. This behavior is unexpected and we suggest opening an issue to apollographql/router with a reproduction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d prefer not to have entire functions or tests commented out in the repo. Please use #[ignore] or #[should_panic] instead. Also please link to the issue in the FIXME comment

@SimonSapin
Copy link
Contributor

test_updated-amd_linux_test is failing because of the new apollo-parser version with apollographql/apollo-rs#918. #6128 update the router test expectation accordingly, but either way this job is not required for merging.

Copy link
Member

@goto-bus-stop goto-bus-stop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to add, nice work

@IvanGoncharov IvanGoncharov merged commit 4f3ace4 into dev Oct 16, 2024
15 checks passed
@IvanGoncharov IvanGoncharov deleted the i1g/fix-typename branch October 16, 2024 08:16
@abernix abernix mentioned this pull request Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants