Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: systemd unit should block on startup until http endpoint is ready #21850

Merged
merged 2 commits into from
Jul 16, 2021

Conversation

codyshepherd
Copy link
Contributor

@codyshepherd codyshepherd commented Jul 14, 2021

Closes #6068

Implements a wrapper around influxd when called by systemd to wait until the server is ready before exiting startup.

@@ -0,0 +1,15 @@
#!/bin/bash -e
/usr/bin/influxd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be:

Suggested change
/usr/bin/influxd
/usr/bin/influxd &

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if so, do we need to track the pid so we can somehow tell systemd about it? Or is it tracked automatically since it's a child of the wrapper script?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably

/usr/bin/influxd

HOST=${INFLUXDB_HOSTNAME:-"localhost"}
PORT=${INFLUXDB_HTTP_BIND_ADDRESS:-"8086"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't pick up any changes in config.toml that modify the port.

You could use influxd print-config --key-name http-bind-address to print the value that the server will use given the current environment. It will be a pseudo-full address, i.e. :8086 as shorthand for localhost:8086

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I had wondered if influxd would populate those variables. the docs seemed a bit unclear. i'll fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So actually, in the context of this being run by systemd, I think any variables in the "environment file" will be present

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still recommend using influxd print-config here. It performs the same config & env resolution/merging as the top-level influxd command, so you won't need to worry about as many mismatches between what the script expects vs. what influxd actually does.

result=$(curl -s -o /dev/null http://$HOST:$PORT/ready -w %{http_code})
while [ "$result" != "200" ]; do
sleep 1
result=$(curl -s -o /dev/null http://$HOST:$PORT/ready -w %{http_code})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to have some sort of escape-hatch here? Like a max number of retries

Copy link
Contributor Author

@codyshepherd codyshepherd Jul 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about that, but I feel like hitting the sweet spot for that could be a moving target. Also, I believe that Restart=on-failure in the service file means that systemd will attempt to restart it in the event that we set a hard limit and exit 1 here. Meaning that even if we set a max number of tries, it still might end up in an infinite loop. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it simple for now, we can change it later if people hit problems

Copy link
Contributor

@danxmoran danxmoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as far as I know!

@codyshepherd codyshepherd changed the title fix: add systemd-notify wrapper script around influxd for debs/rpms fix: systemd unit should block on startup until http endpoint is ready Jul 15, 2021
@codyshepherd codyshepherd force-pushed the systemd-wrapper branch 2 times, most recently from 6bc6fa0 to 28ee592 Compare July 15, 2021 17:10
@danxmoran danxmoran self-requested a review July 15, 2021 17:15
@codyshepherd
Copy link
Contributor Author

I've confirmed that changes to the /etc/influxdb/config.toml are visible to influxd when started by systemd

Copy link
Contributor

@danxmoran danxmoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

KillMode=control-group
Restart=on-failure
Type=forking
PIDFile=/var/lib/influxdb/influxd.pid
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PID files should go in /run/, see systemd.service(5):

PIDFile=

Takes a path referring to the PID file of the service. Usage of this option is recommended for services where Type= is set to forking. The path specified typically points to a file below /run/. If a relative path is specified it is hence prefixed with /run/. The service manager will read the PID of the main process of the service from this file after start-up of the service. The service manager will not write to the file configured here, although it will remove the file after the service has shut down if it still exists. The PID file does not need to be owned by a privileged user, but if it is owned by an unprivileged user additional safety restrictions are enforced: the file may not be a symlink to a file owned by a different user (neither directly nor indirectly), and the PID file must refer to a process already belonging to the service.

Note that PID files should be avoided in modern projects. Use Type=notify or Type=simple where possible, which does not require use of PID files to determine the main process of a service and avoids needless forking.

@@ -0,0 +1,17 @@
#!/bin/bash -e

/usr/bin/influxd &
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than doing this background-process dance with the PID file, you could use ExecStartPost= in the unit file to call this script separately. This would also have the advantage of allowing the system administrator to override it with a dropin file if the script causes a problem like in #21967

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

systemd service should "wait" until influxdb has finished starting up
3 participants