-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refs #34394 - trigger dynflow restart when DB restarts #1028
Conversation
manifests/dynflow/worker.pp
Outdated
@@ -51,6 +51,8 @@ | |||
enable => true, | |||
subscribe => Class['foreman::database'], | |||
} | |||
|
|||
Anchor <| title == 'postgresql::server::service::end' |> ~> Service[$service] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chaining certainly makes sense but I'm not sure this would really notify since I don't see a Service[postgresqld] ~> Anchor[postgresql::server::service::end]
. Have you verified this?
And spec testing here is unreliable (we've seen that in the past in rodjek/rspec-puppet#821), but you could try to see what this test shows. If it fails, you know for certain it isn't there ;)
it { is_expected.to contain_service('postgresqld').that_notifies('Service[dynflow-sidekiq@SOMETHING]') }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my reproducer setup here, all three dynflow services got a friendly kick when the DB got restarted, so I would say "yes this works"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chain is, btw Service['postgresqld'] ~> postgresql_conn_validator['validate_service_is_running'] -> Anchor[postgresql::server::service::end]
, so no, in theory it should not trigger 🙈
Yet:
# grep -i 'postgresql::server::service::end' /var/log/foreman-installer/katello.log
2022-02-10 11:42:03 [DEBUG ] [configure] Adding relationship from Anchor[postgresql::server::service::end] to Service[dynflow-sidekiq@orchestrator] with 'notify'
2022-02-10 11:42:03 [DEBUG ] [configure] Adding relationship from Anchor[postgresql::server::service::end] to Service[dynflow-sidekiq@worker-1] with 'notify'
2022-02-10 11:42:03 [DEBUG ] [configure] Adding relationship from Anchor[postgresql::server::service::end] to Service[dynflow-sidekiq@worker-hosts-queue-1] with 'notify'
2022-02-10 11:42:04 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Postgresql_conn_validator[validate_service_is_running]/before: before to Anchor[postgresql::server::service::end]
2022-02-10 11:42:04 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]/notify: notify to Service[dynflow-sidekiq@orchestrator]
2022-02-10 11:42:04 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]/notify: notify to Service[dynflow-sidekiq@worker-1]
2022-02-10 11:42:04 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]/notify: notify to Service[dynflow-sidekiq@worker-hosts-queue-1]
2022-02-10 11:42:17 [DEBUG ] [configure] Class[Postgresql::Server::Service]: Scheduling refresh of Anchor[postgresql::server::service::end]
2022-02-10 11:42:17 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Starting to evaluate the resource (1174 of 2003)
2022-02-10 11:42:17 [INFO ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Triggered 'refresh' from 1 event
2022-02-10 11:42:17 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Scheduling refresh of Service[dynflow-sidekiq@orchestrator]
2022-02-10 11:42:17 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Scheduling refresh of Service[dynflow-sidekiq@worker-1]
2022-02-10 11:42:17 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Scheduling refresh of Service[dynflow-sidekiq@worker-hosts-queue-1]
2022-02-10 11:42:17 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: The container Class[Postgresql::Server::Service] will propagate my refresh event
2022-02-10 11:42:17 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Evaluated in 0.00 seconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something something theory and practice ...
What may happen is that something sends a notification to Class[Postgresql::Serverver::Service]
so all resources within the class are refreshed. I wonder what happens if you run:
foreman-installer
systemctl stop postgresql
foreman-installer
Does it then also restart Dynflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it does not (and the specified test also fails).
Should we use Service <| tag == 'postgresql::server::service' |>
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that validates with both rspec and the real world, nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we still keep the anchor in? That ensures PostgreSQL is actually up and running. That doesn't need to be a refresh, just a requirement for ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can? It wasn't a problem before, though…
When PostgreSQL is restarted (or worse, unavailable for a while), Dynflow doesn't always cope well with this fact and sometimes ends in a bad state, not able to recover itself. This is clearly not fixing the underlying Dynflow issue, but at least lets restart Dynflow in the cases we *know* the DB dropped connections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with not introducing the strict chaining and just sticking with this.
@evgeni @ekohl am i missing something here? |
No, the underlying dynflow issue is not fixed. Maybe @adamruzicka know more? |
No, not really. |
Is there an appropriate issue within the dynflow project? And thanks for the reply! |
Not really, https://projects.theforeman.org/issues/34485 should be the thing that covers it. Let's continue in there |
When PostgreSQL is restarted (or worse, unavailable for a while),
Dynflow doesn't always cope well with this fact and sometimes ends in a
bad state, not able to recover itself.
This is clearly not fixing the underlying Dynflow issue, but at least
lets restart Dynflow in the cases we know the DB dropped connections.