-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add arguments to quilkin run
#574
Conversation
Can we get some docs, probably here https://googleforgames.github.io/quilkin/main/book/using.html please? Re-running CI to see if that's a flaky test error or not. |
It's not, I have the fix locally. The change is that the admin spawning is pulled out of the proxy to let the admin logic be shared amongst all commands, and these are tests that are testing the admin server. |
50bb4e3
to
ef6f986
Compare
2705090
to
e734d5f
Compare
entrypoint: bash | ||
args: | ||
- '-c' | ||
- 'timeout --signal=INT --preserve-status 5s docker run --rm ${_REPOSITORY}quilkin:$(make version)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than deleting, could make it (something like)
'timeout --signal=INT --preserve-status 5s docker run --rm ${_REPOSITORY}quilkin:$(make version)' run
To make it run in run
mode.
Although one could argue the integration tests also test this (although this would provide a nicer error message if something basic was broken). I could go either way on this one.
@markmandel Rather than duplicating documentation, I figured it was easiest and best to expose the CLI in the rust documentation and just link to that. That way we're maintaining a single source of truth as our documentation now matches what the CLI outputs in help. I'm having trouble figuring out what these Agones tests want me to do, do you have any idea? |
a55e006
to
6df9ffb
Compare
Makes sense.
I'm at an event at the moment, but will take a look as soon as I can 👍🏻 |
The other option, as arch issues can get tricky fast, is to force Docker to Linux amd64 for all things:
Might actually be the easiest option (we could also bake this into the Makefile in some capacity). |
TIL, I’ll try it with that tomorrow |
Just tried to take it for a spin, but got stuck at the compilation issue I referenced above 😢 I was going to note as well, if you have a local |
Decided to just to revert removing the
This doesn't work because the |
That makes sense 👍🏻
Huh. That's strange, because if you set I almost wish I had a ARM machine so I could test this out. If you are keen, could also try a GVC sometime. It's entirely possible I'm missing something, since i'm shooting in the dark a bit. In the meantime, I'll take on finding what has broken in the integration tests for this PR, and update you with my results (probably send a PR to your PR.) Also I'll prioritise #593 such that you can use the same image from CI to locally test an integration test. |
)] | ||
quiet: bool, | ||
/// The path to the configuration file for the Quilkin instance. | ||
#[clap(short, long, env = "QUILKIN_CONFIG", default_value = "quilkin.yaml")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking of coming back around and fleshing out the help for the rest of the commands for a nice command line reference.
Curious if you didn't want a command line reference here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait - when I run the command, I see help everywhere,
Wait.... does clap take the help from the comment above it? If so, that's a super neat trick.
Digging into what's failing in the integration tests - not sure what's causing it, but looks like there is something trying to bind to 9091 more than once. I'm seeing this in a few of the tests. This is in agones management mode:
(Side note - looks like the check in Looks like all the tests are failing for this reason. |
Aha! I can replicate. Running a local binary with a config file reproduces the error:
Looks like it's trying to recreate the admin server twice. |
@@ -46,16 +46,6 @@ steps: | |||
args: | |||
- build | |||
id: build | |||
# Run the built images for 5 seconds to make sure that the entrypoint and default config works out of the box |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replicated the issue as well with running the container from this PR. I'd recommend leaving in this test, as it also will replicate the breaking issue with the Agones integration tests, but output the logs within cloud build, and make it much easier to debug if there is a problem with the base image.
We may want to consider running the container in a few different common configuration options might also be useful in a few tests like the above, just to prove it all still runs even without any traffic flowing through.
Re-running, so that the new CI failure messages can give us a build image to debug against. |
src/cli.rs
Outdated
.admin | ||
.store(Arc::new(crate::config::Admin { address })); | ||
} | ||
Some(tokio::spawn(crate::admin::server(config.clone()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure this is the offending line causing the double creation of the admin server, since both Proxy::run
and ControlPlane::from_arc
both create admin instances and run them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing some debugging (code), it looks like the Config
that comes from this line:
markmandel@09de12a#diff-b2812f19576dd53d0c35b107a322f58a00fce6977f4b5976e1961853982af3ccR93
let config = Arc::new(Self::read_config(self.config)?);
Has a None
admin field, but looking at the code I'm not sure why it's None
when it should have the default admin port 🤔
You can see it in my debug logs:
{"timestamp":"2022-09-22T04:41:50.829061Z","level":"DEBUG","fields":{"message":"Main Config","no_admin":false,"config":"Config { admin: None, clusters: None, filters: None, management_servers: None, proxy: None, version: None, metrics: Metrics { active_clusters: GenericGauge { v: Value { desc: Desc { fq_name: \"cluster_active\", help: \"Number of currently active clusters.\", const_label_pairs: [], variable_labels: [], id: 6550223101314893, dim_hash: 9568000379080501117 }, val: AtomicI64 { inner: 0 }, val_type: Gauge, label_pairs: [] } }, active_endpoints: GenericGauge { v: Value { desc: Desc { fq_name: \"cluster_active_endpoints\", help: \"Number of currently active endpoints.\", const_label_pairs: [], variable_labels: [], id: 9175583387471564798, dim_hash: 12641526325508385022 }, val: AtomicI64 { inner: 0 }, val_type: Gauge, label_pairs: [] } } } }"},"target":"quilkin::cli"}
I'll keep digging into this tomorrow unless you get to it first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've found the issue. I'll send a PR to your PR today.
This includes several fixes, tweaks and small improvements: * Fix bug in `Slot impl Default` that would cause the default Admin config to be `None`. * Fix double creation of Admin server * Return cloudbuild timeout tests for easier CI debugging, and add one for command line config. * Fixes and small additions to the "using" documentation.
Build Succeeded 🥳 Build Id: daaf20f4-73ad-4137-98c2-cbc32438f7f8 The following development images have been built, and will exist for the next 30 days: To build this version:
|
* Add CLI arguments to run * Fix bug in `Slot impl Default` that would cause the default Admin config to be `None`. * Return cloudbuild timeout tests for easier CI debugging, and add one for command line config. * Fixes and small additions to the "using" documentation. Co-authored-by: Mark Mandel <markmandel@google.com>
* Add CLI arguments to run * Fix bug in `Slot impl Default` that would cause the default Admin config to be `None`. * Return cloudbuild timeout tests for easier CI debugging, and add one for command line config. * Fixes and small additions to the "using" documentation. Co-authored-by: Mark Mandel <markmandel@google.com>
Closes #555, closes #572
This adds the basic options to run quilkin as a reverse proxy, allowing you to set the listen port and the destination address. Additionally I added some
admin
related options for setting admin port and disabling admin entirely.