Hot-reload ignores supergraph changes after first reload #2245

maaadepsilon · 2022-12-09T22:19:02Z

Describe the bug

Supergraphs managed via configmap will reload exactly once and then fail to update for subsequent changes, despite the supergraph receiving changes in the file system. Only on full restart does router pick up any new changes.

I haven't tested this with non-configmap supergraphs, so I'm not sure if this is specific to schemas managed with configmaps or more widespread.

To Reproduce

Create a k8s configmap from your supergraph e.g. kubectl create configmap <configmap_name> --from-file=./supergraph.graphql --namespace <your_namespace>
Start router with hot-reload enabled and configmap mounted, similarly to add a supergraph configmap option to the helm chart #2119.
Publish configmap with supergraph changes e.g. kubectl replace -f configmap.yaml.
Watch router logs. It should notice the file has changed and reload the schema.
Publish another set of changes.
Watch router logs. Nothing happens.

I used a python tool called when-changed to verify that file system events were being emitted when the configmap updated. Both times /etc/schema/..data was updated, but Router only reloaded on the first update. This was the command I used to watch for events: when-changed -r /etc/schema echo "%f".

Expected behavior
After the first update, router should emit a log line saying reloading schema and run reload_server, thereby loading any new schema changes. As should any further updates.

Output

# startup
# APOLLO_ROUTER_SUPERGRAPH_PATH=/etc/schema/supergraph.graphql
# APOLLO_ROUTER_HOT_RELOAD=true
Apollo Router v1.5.0 // (c) Apollo Graph, Inc. // Licensed as ELv2 (https://go.apollo.dev/elv2)
Anonymous usage data was disabled via APOLLO_TELEMETRY_DISABLED=1
healthcheck endpoint exposed at [http://0.0.0.0:8080/health]
GraphQL endpoint exposed at http://0.0.0.0:8080/ 🚀
# first update
reloading schema
healthcheck endpoint exposed at [http://0.0.0.0:8080/health]
GraphQL endpoint exposed at http://0.0.0.0:8080/ 🚀
#second update
# ...nothing

in another shell,

$ when-changed -r /etc/schema echo "%f"
# first update
/etc/schema/..2022_12_09_21_32_40.552522007/supergraph.graphql
/etc/schema/..2022_12_09_21_32_40.552522007/supergraph.graphql
/etc/schema/..data
# second update
/etc/schema/..2022_12_09_21_35_29.274080650/supergraph.graphql
/etc/schema/..2022_12_09_21_35_29.274080650/supergraph.graphql
/etc/schema/..data

Desktop (please complete the following information):

OS: Debian GNU/Linux
Version: 11 (bullseye)

Additional context
Router version is v1.5.0.

The text was updated successfully, but these errors were encountered:

garypen · 2022-12-12T12:30:34Z

I can confirm that there is something strange happening in this use-case. Even the first re-load took 90s to react in my test setup.

garypen · 2022-12-15T15:48:14Z

I think I got to the bottom of this. We use a crate, notify, to provide file notification services within the router and it provides a variety of file notification mechanisms on different platforms. There are some subtleties around this, not least the behaviour of different native services on different platform, with regard to handling renaming and remove/re-create.

In order to simplify things, I think the best solution is to stop using the "recommended" watcher for each platform (which leads to variations in behaviour, also known as bugs) and instead to use the PollWatcher which behaves the same way on each platform/file system.

The downside to doing this is that the watcher is a little more intrusive in terms of CPU required and it's also less re-active since we only get notifications as aggregated across a "poll interval". I'm going to set the poll interval to be 3 seconds and work on the basis that is reactive enough for things which shouldn't really change that often (configuration, schema, rhai source files).

fixes: #2245 To get consistent behaviour across multiple platforms and file systems. The watcher may not be as reactive, but I think it's reactive enough for things which shouldn't change frequently.

To get consistent behaviour across multiple platforms and file systems. fixes: #2245 The watcher may not be as reactive, but I think it's reactive enough for things which shouldn't change frequently. As a bonus, we no longer get events if the content is the same, so less spurious reloads, which is good. That means I had to modify the tests though to use different contents. The rhai file watching appeared to be working fine, but to be safe I converted it to the PollWatcher as well.

maaadepsilon added raised by user triage labels Dec 9, 2022

garypen self-assigned this Dec 12, 2022

garypen added bug and removed triage labels Dec 12, 2022

garypen mentioned this issue Dec 15, 2022

hot reload: replace notify recommended watcher with poll watcher #2276

Merged

garypen closed this as completed in #2276 Dec 16, 2022

abernix added this to the v1.7.0 milestone Dec 22, 2022

abernix mentioned this issue Dec 23, 2022

release: v1.7.0 #2312

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hot-reload ignores supergraph changes after first reload #2245

Hot-reload ignores supergraph changes after first reload #2245

maaadepsilon commented Dec 9, 2022

garypen commented Dec 12, 2022

garypen commented Dec 15, 2022

Hot-reload ignores supergraph changes after first reload #2245

Hot-reload ignores supergraph changes after first reload #2245

Comments

maaadepsilon commented Dec 9, 2022

garypen commented Dec 12, 2022

garypen commented Dec 15, 2022