-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix __typename inside router response #6009
Conversation
This comment has been minimized.
This comment has been minimized.
@@ -71,6 +71,72 @@ async fn aliased() { | |||
"###); | |||
} | |||
|
|||
/* FIXME: should be fixed in query planner, failing with: | |||
> value retrieval failed: empty query plan. This behavior is unexpected and we suggest opening an issue to apollographql/router with a reproduction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't be fixed in scope of this PR since they fail on query planning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the plan to solve these parts? Are there related issues on the JS and Rust planners?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will open an issue right now.
Probably open it for the router and see if it should be fixed in router or in federation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I open an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d prefer not to have entire functions or tests commented out in the repo. Please use #[ignore]
or #[should_panic]
instead. Also please link to the issue in the FIXME comment
if include_skip.should_skip(parameters.variables) { | ||
continue; | ||
} | ||
|
||
self.apply_selection_set( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are still inside root selection, even if we go inside an inline fragment.
So it should be apply_root_selection_set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sounds reasonable but looks like a dangerous change. Do you have a test that would show what happens before and after that change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it tested here:
router/apollo-router/tests/integration/subgraph_response.rs
Lines 34 to 82 in e871c93
async fn test_subgraph_returning_different_typename_on_query_root() -> Result<(), BoxError> { | |
let mut router = IntegrationTest::builder() | |
.config(CONFIG) | |
.responder(ResponseTemplate::new(200).set_body_json(json!({ | |
"data": { | |
"topProducts": null, | |
"__typename": "SomeQueryRoot", | |
"aliased": "SomeQueryRoot", | |
"inside_fragment": "SomeQueryRoot", | |
"inside_inline_fragment": "SomeQueryRoot" | |
} | |
}))) | |
.build() | |
.await; | |
router.start().await; | |
router.assert_started().await; | |
let query = r#" | |
{ | |
topProducts { name } | |
__typename | |
aliased: __typename | |
...TypenameFragment | |
... { | |
inside_inline_fragment: __typename | |
} | |
} | |
fragment TypenameFragment on Query { | |
inside_fragment: __typename | |
} | |
"#; | |
let (_trace_id, response) = router.execute_query(&json!({ "query": query })).await; | |
assert_eq!(response.status(), 200); | |
assert_eq!( | |
response.json::<serde_json::Value>().await?, | |
json!({ | |
"data": { | |
"topProducts": null, | |
"__typename": "Query", | |
"aliased": "Query", | |
"inside_fragment": "Query", | |
"inside_inline_fragment": "Query" | |
} | |
}) | |
); | |
Ok(()) | |
} |
.unwrap_or_else(|| { | ||
Value::String(ByteString::from( | ||
current_type.inner_named_type().as_str().to_owned(), | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here, we tried to use __typename values returned by the subgraph but didn't validate them. I assume (not tested) it can even be the name of the interface if the subgraph uses @interfaceObject
According to the GraphQL spec, __typename
should always use an object name (interfaces and unions are forbidden), so I changed the code to first use current_type
(which is always valid) and then fallback to using the subgraph's value only if current_type
is not an object type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be checked with what the output rewriting code in the QP is doing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what to do here.
Realistically, I can't inspect QP in the scope of this PR.
I can revert this particular change, because it separated from other fixes.
@Geal Do you think it worth to revert it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks correct to me: when a field is an object type in the supergraph we know what the correct __typename
is and could even skip asking the subgraph
@@ -311,6 +311,8 @@ impl PlanNode { | |||
let _ = primary_sender.send((value.clone(), errors.clone())); | |||
} else { | |||
let _ = primary_sender.send((value.clone(), errors.clone())); | |||
// primary response should be an empty object | |||
value.deep_merge(Value::Object(Default::default())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is a test case from graphql-js:
https://github.com/graphql/graphql-js/blob/2b42a70191243d0ca0e0e4f1d580d6794718fbd5/src/execution/__tests__/defer-test.ts#L371-L384
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the data
field should ve an enpty object (unless it contained a non nullable field that was nullified), then I think this should be fixed here:
&initial_value.unwrap_or_default(), |
to set it to an object instead of
Value::Null
by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You right, it should be correct.
I hesitated to do it since I'm less familiar with how QP executes and also I don't have capacity to test all possible scenarios in context of this PR.
So I choose to do it just inside this specific branch that I'm 100% sure is correct.
@Geal Do you think it safe to make this change for entire QP execution?
Or I can create issue for that I do it in separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe in a separate PR, considering the timing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a router issue for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that changing the linked line of execution.rs
is a better fix, but it doesn’t have to be in this PR
✅ Docs Preview ReadyNo new or changed pages found. |
@@ -311,6 +311,8 @@ impl PlanNode { | |||
let _ = primary_sender.send((value.clone(), errors.clone())); | |||
} else { | |||
let _ = primary_sender.send((value.clone(), errors.clone())); | |||
// primary response should be an empty object | |||
value.deep_merge(Value::Object(Default::default())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that changing the linked line of execution.rs
is a better fix, but it doesn’t have to be in this PR
.unwrap_or_else(|| { | ||
Value::String(ByteString::from( | ||
current_type.inner_named_type().as_str().to_owned(), | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks correct to me: when a field is an object type in the supergraph we know what the correct __typename
is and could even skip asking the subgraph
@@ -71,6 +71,72 @@ async fn aliased() { | |||
"###); | |||
} | |||
|
|||
/* FIXME: should be fixed in query planner, failing with: | |||
> value retrieval failed: empty query plan. This behavior is unexpected and we suggest opening an issue to apollographql/router with a reproduction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d prefer not to have entire functions or tests commented out in the repo. Please use #[ignore]
or #[should_panic]
instead. Also please link to the issue in the FIXME comment
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing to add, nice work
This PR fixes the following edge cases:
1. The initial response of the query containing
@defer
should be an empty objectFor a query like this:
The initial response should have
data
set to an empty object, as in this graphql-js test case:https://github.com/graphql/graphql-js/blob/2b42a70191243d0ca0e0e4f1d580d6794718fbd5/src/execution/__tests__/defer-test.ts#L371-L384
2.
data: null
semantic should be maintainedAccording to GraphQL spec, null can be propagated to
data
: https://spec.graphql.org/draft/#sel-FANTNRCAACGBrwaBut the fix:
router/apollo-router/src/spec/query.rs
Lines 241 to 257 in b443ad0
Added in #2253, resulted
data: null
being replaced withdata: { __typename: "<some typename>" }
, for queries like this:3. Handle cases where query with
@defer
also have__typename
inside fragmentFix added in #2253, here:
router/apollo-router/src/spec/query.rs
Lines 241 to 257 in b443ad0
And here:
router/apollo-router/src/spec/query.rs
Lines 149 to 160 in b443ad0
Only worked for cases where __typename is specified directly in the topmost selection set, but didn't work for __typenames wrapped in fragments, e.g.:
Because of
#1
,#2
and#3
fix added in #2253 was removed and but fix for#4
also fixes issues with @defer4. Remaping subgraph's root type names to supergraph names didn't work for __typename wrapped in fragments
Subgraphs could have their root types named differently than supergraphs.
Moreover, different subgraphs could have different names for root types.
QP removes __typename from the topmost selection set, so in combination with:
router/apollo-router/src/spec/query.rs
Lines 840 to 843 in b443ad0
This issue is fixed for queries like this:
But QP doesn't delete __typename wrapped in fragments like so (same true for named fragment):
That means
router/apollo-router/src/spec/query.rs
Lines 840 to 843 in b443ad0
Can't be reached because subgraph will return __typename value for root, and it will trigger this "if" instead:
router/apollo-router/src/spec/query.rs
Line 814 in b443ad0
That's why I changed the order of these if branches, so the subgraph's value of __typename for the root type is always ignored, and the supergraph's root type names are always used. Also, the exact same order of if branches are already used inside
apply_selection_set.
However, this code is not reachable for __typenames wrapped in fragments because
apply_root_selection_set
incorrectly callsapply_selection_set
for selection sets wrapped in fragments:router/apollo-router/src/spec/query.rs
Lines 873 to 882 in b443ad0
router/apollo-router/src/spec/query.rs
Lines 909 to 918 in b443ad0
If root fields are wrapped in fragments, they are still applied on root type, e.g.:
So, I switch those calls to be
apply_root_selection_set
.All of the above steps in combination fixed
#4
5. Incorrect errors on inline fragments using root's interfaces
Root type can implement interfaces and it's legal to use those interfaces in queries, like so:
with query:
But before my change, this query would error here:
router/apollo-router/src/spec/query.rs
Lines 865 to 867 in b443ad0
6. __typename with alias returned by subgraph is not validated
__typename is validated here:
router/apollo-router/src/spec/query.rs
Lines 547 to 561 in b443ad0
And is used to assign
current_type
here:router/apollo-router/src/spec/query.rs
Lines 574 to 586 in b443ad0
But aliased __typename handled here (same code for non-aliased, but they already validated by code above):
router/apollo-router/src/spec/query.rs
Lines 574 to 586 in b443ad0
This code guarantees that the aliased __typename contains the name of the object type from the supergraph schema.
But it doesn't guarantee that all __typenames (aliased and not) have the same value that is compatible with
current_type
.As
#4
proves, we can't trust the subgraph to return the correct names, and QP doesn't guarantee that either.So, since we already use
current_type
to track the type of the current selection set, it makes sense to use it as a source of truth for all __typenames but fallback to subgraph's __typename ifcurrent_type
is not an object (shouldn't happen unless QP has bugs).Fixes #issue_number
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩