Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refs #34394 - trigger dynflow restart when DB restarts #1028

Merged
merged 1 commit into from
Feb 10, 2022
Merged

Conversation

evgeni
Copy link
Member

@evgeni evgeni commented Feb 10, 2022

When PostgreSQL is restarted (or worse, unavailable for a while),
Dynflow doesn't always cope well with this fact and sometimes ends in a
bad state, not able to recover itself.

This is clearly not fixing the underlying Dynflow issue, but at least
lets restart Dynflow in the cases we know the DB dropped connections.

@@ -51,6 +51,8 @@
enable => true,
subscribe => Class['foreman::database'],
}

Anchor <| title == 'postgresql::server::service::end' |> ~> Service[$service]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chaining certainly makes sense but I'm not sure this would really notify since I don't see a Service[postgresqld] ~> Anchor[postgresql::server::service::end]. Have you verified this?

And spec testing here is unreliable (we've seen that in the past in rodjek/rspec-puppet#821), but you could try to see what this test shows. If it fails, you know for certain it isn't there ;)

it { is_expected.to contain_service('postgresqld').that_notifies('Service[dynflow-sidekiq@SOMETHING]') }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my reproducer setup here, all three dynflow services got a friendly kick when the DB got restarted, so I would say "yes this works"

Copy link
Member Author

@evgeni evgeni Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chain is, btw Service['postgresqld'] ~> postgresql_conn_validator['validate_service_is_running'] -> Anchor[postgresql::server::service::end], so no, in theory it should not trigger 🙈

Yet:

# grep -i 'postgresql::server::service::end' /var/log/foreman-installer/katello.log 
2022-02-10 11:42:03 [DEBUG ] [configure] Adding relationship from Anchor[postgresql::server::service::end] to Service[dynflow-sidekiq@orchestrator] with 'notify'
2022-02-10 11:42:03 [DEBUG ] [configure] Adding relationship from Anchor[postgresql::server::service::end] to Service[dynflow-sidekiq@worker-1] with 'notify'
2022-02-10 11:42:03 [DEBUG ] [configure] Adding relationship from Anchor[postgresql::server::service::end] to Service[dynflow-sidekiq@worker-hosts-queue-1] with 'notify'
2022-02-10 11:42:04 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Postgresql_conn_validator[validate_service_is_running]/before: before to Anchor[postgresql::server::service::end]
2022-02-10 11:42:04 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]/notify: notify to Service[dynflow-sidekiq@orchestrator]
2022-02-10 11:42:04 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]/notify: notify to Service[dynflow-sidekiq@worker-1]
2022-02-10 11:42:04 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]/notify: notify to Service[dynflow-sidekiq@worker-hosts-queue-1]
2022-02-10 11:42:17 [DEBUG ] [configure] Class[Postgresql::Server::Service]: Scheduling refresh of Anchor[postgresql::server::service::end]
2022-02-10 11:42:17 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Starting to evaluate the resource (1174 of 2003)
2022-02-10 11:42:17 [INFO  ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Triggered 'refresh' from 1 event
2022-02-10 11:42:17 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Scheduling refresh of Service[dynflow-sidekiq@orchestrator]
2022-02-10 11:42:17 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Scheduling refresh of Service[dynflow-sidekiq@worker-1]
2022-02-10 11:42:17 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Scheduling refresh of Service[dynflow-sidekiq@worker-hosts-queue-1]
2022-02-10 11:42:17 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: The container Class[Postgresql::Server::Service] will propagate my refresh event
2022-02-10 11:42:17 [DEBUG ] [configure] /Stage[main]/Postgresql::Server::Service/Anchor[postgresql::server::service::end]: Evaluated in 0.00 seconds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something something theory and practice ...

What may happen is that something sends a notification to Class[Postgresql::Serverver::Service] so all resources within the class are refreshed. I wonder what happens if you run:

foreman-installer
systemctl stop postgresql
foreman-installer

Does it then also restart Dynflow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it does not (and the specified test also fails).

Should we use Service <| tag == 'postgresql::server::service' |> instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that validates with both rspec and the real world, nice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we still keep the anchor in? That ensures PostgreSQL is actually up and running. That doesn't need to be a refresh, just a requirement for ordering.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can? It wasn't a problem before, though…

When PostgreSQL is restarted (or worse, unavailable for a while),
Dynflow doesn't always cope well with this fact and sometimes ends in a
bad state, not able to recover itself.

This is clearly not fixing the underlying Dynflow issue, but at least
lets restart Dynflow in the cases we *know* the DB dropped connections.
Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with not introducing the strict chaining and just sticking with this.

@evgeni evgeni merged commit f44d723 into master Feb 10, 2022
@evgeni evgeni deleted the i34394 branch February 10, 2022 14:56
@ekohl ekohl added the Bug label Feb 10, 2022
@markuslanz77
Copy link

@evgeni @ekohl
Unfortunately this fix is not applicable for installations that don't use the local PostgreSQL instance.
see: https://projects.theforeman.org/issues/34485

am i missing something here?

@evgeni
Copy link
Member Author

evgeni commented Apr 12, 2023

No, the underlying dynflow issue is not fixed.

Maybe @adamruzicka know more?

@adamruzicka
Copy link
Contributor

No, not really.

@markuslanz77
Copy link

markuslanz77 commented Apr 12, 2023

No, the underlying dynflow issue is not fixed.

Maybe @adamruzicka know more?

Is there an appropriate issue within the dynflow project? And thanks for the reply!

@adamruzicka
Copy link
Contributor

Not really, https://projects.theforeman.org/issues/34485 should be the thing that covers it. Let's continue in there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants