Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
systemd: fix snapd systemd-unit stop progress notifications (canonica…
…l#11340) * systemd: fix snapd systemd-unit stop progress notifications The current behaviour of the systemd stop progress code is to print the list of systemd units not stopped after 250ms, and then after every 20 seconds (notice ticker), erroneously print out the same list even though some of the units may have stopped in the meantime. However, the bigger issue is that the non-mocked (real) code starts the stop sequence by making a blocking systemctl stop request and then afterwords (once all units are stopped) proceed to the progress code, which has no effect and exits immediately. The unit tests does not pick this up because the mock systemctl does not no anything, and therefore the simulation of progress works as expected because the output of the 'systemctl show' progress calls are mocked to simulate certain outcomes. Finally, the 'Stop()' function provided a 'timeout' argument which suggested the user could control when a stop attempt should timeout and fail. However, this feature did not work as the original call was blocking. Furthermore, this features does not make sense because it creates a second place where timeouts are introduced. The first place is in the systemd back-end where 'TimeoutStopSec' is optionally configured by snapd. In contrast with systemd's actual timeout mechanism described, the 'Stop()' call offered another timeout, but this mechanism is inconsistently used all over the code-base with different hard-coded timeout values, not matching the underlying service defaults to configured values. In the worse case scenario the snapd 'Stop()' would return an error due to timeout while systemd is still busy stopping a unit, because it was a longer unit timeout configure. The timeout value offered by snapd has no real purpose if carefully analysed. The following changes were made: 1. Remove the systemd Stop() timeout argument. The timeout is now automatically provided by systemdctl when it returns (blocking). This allows the timeout to always be aligned with the actual timeout 'TimeoutStopSec' provided by the unit, and takes into account systemd timeout behaviour related to 'ExecStop' commands. 2. Change the blocking systemctl stop command to async run in a go routine allowing progress tracking to continue in parallel. Note that the call to Stop() is still blocking because the progress code waits until either units are all stopped or the stop command returns. 3. Change the notice ticker to rather implement an initial notice silence period (1s). After this expires, the first notification is generated with the list of units still not stopped. This delay was chosen based on empirical tests performed on a Raspberry Pi 3 (representing slower targets) under heavy IO and CPU load (where it took up to 750ms for 4 services to stop). 4. Change the progress logic to always send a notification after the notice silence period on any change to the list of unit we are still waiting for, and always only poll the state of the remaining units. 5. Add additional unit tests to better cover the new progress and notification logic, and make existing tests more thorough. 6. Add critical sections inside the mocked systemctl functions to prevent corruption during concurrent modifications to some of the test suite logs and counters. This was needed because the Stop() is now issued from a go routine. The following example shows the expected output for 6 systemd units terminated with the 'Stop' function. The last four (test3 - test6) has a random post SIGTERM delay before the process exits. The following notifications are generated for this example after the 1s notification silence period expires: Waiting for "test3", "test4", "test5", "test6" to stop. Waiting for "test3", "test4", "test5" to stop. Waiting for "test3", "test5" to stop. Waiting for "test3" to stop. This patch is required by canonical#11113 Signed-off-by: Fred Lotter <fred.lotter@canonical.com> * typos * Improved the system stop progress logic. * Number of small review improvements * Remove redundant channel write * reviews: add thread safety into MockSystemctl * many: have a separate systemd.MockSystemctlWithDelay keep MockSystemctl signature as before as we need to introduce controlled delays only in the systemd package own tests so far Co-authored-by: Samuele Pedroni <pedronis@lucediurna.net>
- Loading branch information