Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: Exit on idle after txn #606

Closed
wants to merge 5 commits into from
Closed

Conversation

cgwalters
Copy link
Member

So...we have some bad memory leaks somewhere. We go from 14MB RSS after being
activated from atomic host status to 54MB after a dummy rpm-ostree install foo, 80MB for a successful install, and I've seen it as high as 140MB.

Exiting on idle is what PackageKit has done forever. One problem particularly
for both of us is that libdnf and all of the libraries underneath it are used
almost exclusively by command line clients that exit when done, so memory leaks
aren't particularly important to them.

This may break Cockpit though, since it won't be able to monitor for sysroot
changes. I considered instead running txns in a subprocess, which is possible
but harder.

@cgwalters cgwalters added the WIP label Feb 4, 2017
@cgwalters
Copy link
Member Author

This code all works, but marking as WIP.

@cgwalters
Copy link
Member Author

@petervo From looking at the cockpit code, it seems like this would result in an error message? If so, how hard would it be to adjust the code to handle it?

@petervo
Copy link
Contributor

petervo commented Feb 12, 2017

@cgwalters way back when, the original WIP version of this had this in, but we took it out in review because of races. I think cockpit will handle this correctly and shouldn't cause an error. In any case I agree that's it's good to have so when it hits centos continuous, i'll make sure everything continues to work as expected.

@cgwalters cgwalters added WIP and removed WIP labels Feb 13, 2017
@cgwalters
Copy link
Member Author

I think I need to revisit https://lists.freedesktop.org/archives/dbus/2015-May/016671.html and at least try to adapt things here for that.

This was referenced Mar 6, 2017
@rh-atomic-bot
Copy link

☔ The latest upstream changes (presumably f9944e6) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters
Copy link
Member Author

Ok, dependencies in #660

This should be good to go now. Removing WIP.

@cgwalters cgwalters removed the WIP label Mar 7, 2017
@rh-atomic-bot
Copy link

☔ The latest upstream changes (presumably 7cf3664) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters
Copy link
Member Author

OK, rebased again. Not totally sure whether to land this right now though...maybe we make it a config option?

Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, wait we're missing a transition to APPSTATE_FLUSHING here, no?

diff --git a/src/app/rpmostree-builtin-start-daemon.c b/src/app/rpmostree-builtin-start-daemon.c
index 9bbcbd1..91dd817 100644
--- a/src/app/rpmostree-builtin-start-daemon.c
+++ b/src/app/rpmostree-builtin-start-daemon.c
@@ -382,6 +387,8 @@ rpmostree_builtin_start_daemon (int             argc,
        *  https://github.com/cgwalters/test-exit-on-idle
        */
       sd_notify (FALSE, "STOPPING=1");
+      state_transition (APPSTATE_FLUSHING);
       g_dbus_connection_call (bus,
                               "org.freedesktop.DBus", "/org/freedesktop/DBus",
                               "org.freedesktop.DBus", "ReleaseName",

Also, it doesn't seem like STOPPING=1 causes systemd to start up a new instance. E.g. if I do:

diff --git a/src/app/rpmostree-builtin-start-daemon.c b/src/app/rpmostree-builtin-start-daemon.c
index 9bbcbd1..81eccc7 100644
--- a/src/app/rpmostree-builtin-start-daemon.c
+++ b/src/app/rpmostree-builtin-start-daemon.c
@@ -372,6 +372,11 @@ rpmostree_builtin_start_daemon (int             argc,
   g_debug ("Entering main event loop");
   rpmostreed_daemon_run_until_idle_exit (rpm_ostree_daemon);

+ /* note that STOPPING=1 was already done in the idle exit */
+  sd_journal_print (LOG_INFO, "sleeping for race");
+  sleep (10);
+
   if (bus)
     {
       /* We first tell systemd we're stopping, so it knows to activate a new instance

And run rpm-ostree status during the race window, I get blocked until the daemon exits (which is expected), and then get:

error: Rejected send message, 1 matched rules; type="method_call", sender=":1.117" (uid=0 pid=22345 comm="rpm-ostree status ") interface="org.projectatomic.rpmostree1.Sysroot" member="RegisterClient" error name="(unset)" requested_reply="0" destination=":1.116" (uid=0 pid=22316 comm="/usr/bin/rpm-ostree start-daemon ")

OTOH, if I instead put the race window after ReleaseName, I still get blocked, but I then get served by the new instance, which I think is what we want to happen before as well, right?


sd_notifyf (0, "STOPPING=1\nSTATUS=Exiting due to idle");
self->running = FALSE;
g_main_context_wakeup (NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need this though? This is already running as part of the main context, so isn't it already woken up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we don't need it here. However, if one is doing multithreading, the main loop does need to be woken up. So I always add this as part of the "manage mainloop termination condition" pattern. It's just a write() to an eventfd() underneath so it's cheap.

@cgwalters
Copy link
Member Author

Hmm, wait we're missing a transition to APPSTATE_FLUSHING here, no?

Urgh, yes. Good catch. I'm going to need to squash together the idle exit/mainloop logic into one file really. I guess rpmostreed-daemon.c?

@cgwalters
Copy link
Member Author

RE the dbus error...I think the problem is bus policy won't let messages through to us anymore since we don't own the name. To fix this we need to do #745

@rh-atomic-bot
Copy link

☔ The latest upstream changes (presumably 675066a) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters
Copy link
Member Author

Note to self: from looking at massif briefly, the vast majority of the memory usage goes into libsolv. I'm not sure I really want to debug that code. Better to make txns a subprocess at least (#897) or do this.

This closes a race condition for having the daemon idle exit.  After
the daemon has released its bus name, the dbus-daemon will no longer
allow messages through that targeted its unique name.

Since the intention of the `RegisterClient` method is to be the "knock on the
door", fix this by directly sending a message to the well-known name.

Second, we need to handle the case where the daemon exits without
replying; @jlebon added a `sleep(10)` invocation after the daemon
mainloop quit but before we `ReleaseName`, and I verified these two
things combine to fix that case.
Now that we have the ability to both track clients and our active transaction,
and the `RegisterClient` call acts "atomically", let's start doing exit-on-idle
and return the RAM to the people.
@cgwalters
Copy link
Member Author

Rebased 🏄 and fixed up! I figured out a way to close the race you found; see the first commit.

Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor nits, otherwise looks good! I tried add a sleep in the same places and got appropriate behaviour in all of them.

break; /* Success! */
else
{
if (g_dbus_error_is_remote_error (local_error))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we can de-indent all of this by just dropping the else.

* FLUSHING right now, let's just forcibly do so if it hasn't been done
* already.
*/
if (appstate <= APPSTATE_FLUSHING)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should technically be strictly <, right?


g_clear_pointer (&self->idle_exit_source, (GDestroyNotify)g_source_unref);

sd_notifyf (0, "STOPPING=1\nSTATUS=Exiting due to idle");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just drop the STOPPING=1 here? We already do this once we exit the loop and it covers all other possible non-idle-exit scenarios as well.

@cgwalters
Copy link
Member Author

Nice review, thanks! Fixups ⬆️

@jlebon
Copy link
Member

jlebon commented Oct 3, 2017

Nice! @rh-atomic-bot r+ 2cda8d7

@rh-atomic-bot
Copy link

⚡ Test exempted: merge already tested.

rh-atomic-bot pushed a commit that referenced this pull request Oct 3, 2017
Now that we have the ability to both track clients and our active transaction,
and the `RegisterClient` call acts "atomically", let's start doing exit-on-idle
and return the RAM to the people.

Closes: #606
Approved by: jlebon
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants