Skip to content

Allow for hot-reloading yaml config pipelines #2236

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

nickchomey
Copy link

@nickchomey nickchomey commented Apr 4, 2025

Description

This PR adds functionality to allow for upserting (update if exists, otherwise create) and hot-reloading (graceful shutdown of running pipeline) of yaml-config pipelines - no need to shut down the entire server anymore. This will enable the use of a file watcher to solve #1875

It also accepts a yaml string, rather than just a yaml file, which will allow for me to use yaml configs in the NATS api that I am working on (and you could add a similar endpoint for the grpc and http apis if you like), without any files at all.

I don't think this has any side effects on Conduit, and as far as I can tell it wont cause any of its own issues. However, I assume there's plenty of ways in which this could be improved - be it refactoring, allowing other existing functionality to leverage this, etc...

It does not add any api endpoints, schema etc for using this - i think that's beyond my pay grade ;)

Any feedback and guidance to improve this would be quite appreciated!

Quick checks

  • I have followed the Code Guidelines.
  • There is no other pull request for the same update/change.
  • I have written unit tests.
  • I have made sure that the PR is of reasonable size and can be easily reviewed.

@nickchomey nickchomey requested a review from a team as a code owner April 4, 2025 01:38
@nickchomey nickchomey marked this pull request as draft April 4, 2025 02:16
@nickchomey nickchomey marked this pull request as ready for review April 4, 2025 03:12
@nickchomey
Copy link
Author

nickchomey commented Apr 6, 2025

It just occurred to me that this surely doesn't handle deleting a pipeline config.

I suppose whatever watcher is used (filesystem, nats kv etc...) can probably trigger a delete operation of some sort, then could explicitly call s.Delete rather than UpsertYaml?

edit: yes, we can just do that

@hariso
Copy link
Contributor

hariso commented Apr 24, 2025

@nickchomey can you explain briefly how you actually connect NATS and Conduit, so that NATS is delivering the pipeline configs to Conduit?

The team talked about this yesterday, and it appears to us that the goal is actually to be able to "refresh" a pipeline configuration or add a new one, without having to restart the Conduit service. If that's true, then we find that useful for sure and it would be a really useful feature in Conduit.

However, watching a file might not be the best way, because it may trigger unnecessary updates (e.g. an editor might be auto-saving while you're working on a file, and there might be other reasons too). Automated updates can backfire pretty easily, which is why we think that the actual reload should be triggered by a user. Currently, that trigger is a service restart. We propose to have an API endpoint that a user can use to reload a pipeline configure file(s).

In other words the flow would be:

  1. A pipeline configuration file is added/updated/removed.
  2. User calls /v1/pipelines/reload (just an example) that reloads either all or just targeted pipelines.

The implementation would reuse most of your code; we'd just add an API trigger that executes the code you wrote.

I can open a discussion where we can discuss the details. What do you think about this? Would it be helpful for you?

@nickchomey
Copy link
Author

nickchomey commented Apr 24, 2025

@hariso
Glad to hear there's interest in this!

Yes, that's a good point about editor auto save triggering an update - definitely something to be avoided. Your proposed endpoint seems appropriate to me. I'll leave that to you guys to figure out since I don't intend to use the yaml files.

The nats api works by wrapping conduit in another app (just like when we modify the builtin connectors/processors), which has a nats client that connects to either

  1. Embedded nats server via the nats micro api
  2. embedded nats server via jetstream (allows persistence)
  3. External nats + micro
  4. external nats + jetstream

I had started with micro and then realized I actually wanted jetstream, but kept it around. I'll use either external nats server or perhaps even embedded server as a leaf node that connects to an external cluster (leaf node functionality not yet built)

All of the options expose all of the same endpoints as the conduit http api - in fact, all you do is send something like

nats pub pipeline.create "{same json payload as conduit api}"

and the nats wrapper just calls the http api endpoint functions and uses the same validation.

Again, I built that before I realized that what I really wanted was to just be able to refresh yaml configs, but kept it around.

Then i added one more "endpoint" called pipeline.upsert that receives a yaml string and just calls the upsertYaml function shown in this PR. It stops and removes the pipeline if it exists, readds it, and restarts it if configured to do so. It also stores the config in nats kv, such that when the entire application is restarted (and after conduit has loaded its own api and yaml pipelines) the wrapper actually scans the kv store for pipelines and does the upsert for each one (since they haven't been persisted by conduit api, nor are loaded by the yaml file mechanism).

Bringing this back to the current PR and your question, I suppose for consistency's sake, it would be ideal if the upsertyaml nats endpoint were to have an actual upsertyaml api endpoint that I could call rather than the function in this PR directly. But it would only really be useful by a wrapper app, because of the persistence and reload stuff...

I'll share the nats api with you guys soon enough - I'm just trying to tidy things up so as to minimize the inevitable embarrassment from whatever dumb stuff I've done with it.

But, i suppose the true solution for all of this would be to do as lovro described in #2232, and unify the api and yaml files, and add import/export endpoints which presumably would accept yaml configs? I see that issue has been added to the next milestone - does that mean it might happen soon?

Sorry, this wasn't brief at all, but hopefully helpful?

I'm happy to discuss any of this more here, in a GH discussion or discord.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants