Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hot-reload ignores supergraph changes after first reload #2245

Closed
maaadepsilon opened this issue Dec 9, 2022 · 2 comments · Fixed by #2276
Closed

Hot-reload ignores supergraph changes after first reload #2245

maaadepsilon opened this issue Dec 9, 2022 · 2 comments · Fixed by #2276
Assignees

Comments

@maaadepsilon
Copy link

Describe the bug

Supergraphs managed via configmap will reload exactly once and then fail to update for subsequent changes, despite the supergraph receiving changes in the file system. Only on full restart does router pick up any new changes.

I haven't tested this with non-configmap supergraphs, so I'm not sure if this is specific to schemas managed with configmaps or more widespread.

To Reproduce

  1. Create a k8s configmap from your supergraph e.g. kubectl create configmap <configmap_name> --from-file=./supergraph.graphql --namespace <your_namespace>
  2. Start router with hot-reload enabled and configmap mounted, similarly to add a supergraph configmap option to the helm chart #2119.
  3. Publish configmap with supergraph changes e.g. kubectl replace -f configmap.yaml.
  4. Watch router logs. It should notice the file has changed and reload the schema.
  5. Publish another set of changes.
  6. Watch router logs. Nothing happens.

I used a python tool called when-changed to verify that file system events were being emitted when the configmap updated. Both times /etc/schema/..data was updated, but Router only reloaded on the first update. This was the command I used to watch for events: when-changed -r /etc/schema echo "%f".

Expected behavior
After the first update, router should emit a log line saying reloading schema and run reload_server, thereby loading any new schema changes. As should any further updates.

Output

# startup
# APOLLO_ROUTER_SUPERGRAPH_PATH=/etc/schema/supergraph.graphql
# APOLLO_ROUTER_HOT_RELOAD=true
Apollo Router v1.5.0 // (c) Apollo Graph, Inc. // Licensed as ELv2 (https://go.apollo.dev/elv2)
Anonymous usage data was disabled via APOLLO_TELEMETRY_DISABLED=1
healthcheck endpoint exposed at [http://0.0.0.0:8080/health]
GraphQL endpoint exposed at http://0.0.0.0:8080/ 🚀
# first update
reloading schema
healthcheck endpoint exposed at [http://0.0.0.0:8080/health]
GraphQL endpoint exposed at http://0.0.0.0:8080/ 🚀
#second update
# ...nothing

in another shell,

$ when-changed -r /etc/schema echo "%f"
# first update
/etc/schema/..2022_12_09_21_32_40.552522007/supergraph.graphql
/etc/schema/..2022_12_09_21_32_40.552522007/supergraph.graphql
/etc/schema/..data
# second update
/etc/schema/..2022_12_09_21_35_29.274080650/supergraph.graphql
/etc/schema/..2022_12_09_21_35_29.274080650/supergraph.graphql
/etc/schema/..data

Desktop (please complete the following information):

  • OS: Debian GNU/Linux
  • Version: 11 (bullseye)

Additional context
Router version is v1.5.0.

@garypen
Copy link
Contributor

garypen commented Dec 12, 2022

I can confirm that there is something strange happening in this use-case. Even the first re-load took 90s to react in my test setup.

@garypen garypen self-assigned this Dec 12, 2022
@garypen garypen added bug and removed triage labels Dec 12, 2022
@garypen
Copy link
Contributor

garypen commented Dec 15, 2022

I think I got to the bottom of this. We use a crate, notify, to provide file notification services within the router and it provides a variety of file notification mechanisms on different platforms. There are some subtleties around this, not least the behaviour of different native services on different platform, with regard to handling renaming and remove/re-create.

In order to simplify things, I think the best solution is to stop using the "recommended" watcher for each platform (which leads to variations in behaviour, also known as bugs) and instead to use the PollWatcher which behaves the same way on each platform/file system.

The downside to doing this is that the watcher is a little more intrusive in terms of CPU required and it's also less re-active since we only get notifications as aggregated across a "poll interval". I'm going to set the poll interval to be 3 seconds and work on the basis that is reactive enough for things which shouldn't really change that often (configuration, schema, rhai source files).

garypen added a commit that referenced this issue Dec 15, 2022
fixes: #2245

To get consistent behaviour across multiple platforms and file systems.

The watcher may not be as reactive, but I think it's reactive enough for
things which shouldn't change frequently.
garypen added a commit that referenced this issue Dec 16, 2022
To get consistent behaviour across multiple platforms and file systems.

fixes: #2245
  
The watcher may not be as reactive, but I think it's reactive enough for
things which shouldn't change frequently.

As a bonus, we no longer get events if the content is the same, so less
spurious reloads, which is good. That means I had to modify the tests
though to use different contents.

The rhai file watching appeared to be working fine, but to be safe I
converted it to the PollWatcher as well.
@abernix abernix added this to the v1.7.0 milestone Dec 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants