Skip to content

[sled-agent] Add tests for service management, fix prefix-matching bug #886

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 6, 2022

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Apr 6, 2022

Background

@leftwo helped identify an issue where the ServiceManager attempts to create a service which already exists because it failed to identify that one was already running, based on the name. This resulted in a "DLPI link error", as the zone management logic tried to re-allocate an in-use IP address.

What this patch does

  • Refactors ServiceManager to be slightly easier to test
  • Fixes the prefix matching issue (it now checks for the existence of "oxz_my_svc" for a service named "my_svc", which is what the zone is actually named).
  • Adds a regression test for this issue, as well as tests for a handful of other basic behaviors.

@smklein smklein requested review from bnaecker and leftwo April 6, 2022 15:32
///
/// This results in a zone name which is distinct across different zpools,
/// but stable and predictable across reboots.
pub fn get_zone_name(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simply a refactor to make accessing this information easier from elsewhere

pub async fn new(
log: Logger,
physical_link: Option<PhysicalLink>,
config_path: Option<PathBuf>,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Providing the config_path as an optional argument makes this much safer to test.

Comment on lines +132 to +134
let expected_zone_name =
InstalledZone::get_zone_name(&service.name, None);
if existing_zones.iter().any(|z| z.name() == expected_zone_name) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual bug fix

@smklein smklein enabled auto-merge (squash) April 6, 2022 16:19
@smklein smklein merged commit 65e0804 into main Apr 6, 2022
@smklein smklein deleted the check-prefix-better branch April 6, 2022 16:36
leftwo pushed a commit that referenced this pull request Mar 26, 2025
Crucible changes are:
Make `ImpactedBlocks` an absolute range (#1685)
update omicron (#1687)
Update Rust crate chrono to v0.4.40 (#1682)
Update Rust crate tempfile to v3.19.0 (#1676)
Log loops during crutest replay (#1674)
Refactor live-repair start and send (#1673)
Update Rust crate libc to v0.2.171 (#1684)
Update Rust crate clap to v4.5.32 (#1683)
Update Rust crate async-trait to 0.1.88 (#1680)
Update Rust crate bytes to v1.10.1 (#1681)
Update Rust crate anyhow to v1.0.97 (#1679)
Update Rust crate http to v1 (#1678)
Update Rust crate tokio-util to v0.7.14 (#1675)
Update Rust crate thiserror to v2 (#1657)
Update Rust crate omicron-zone-package to 0.12.0 (#1647)
Update Rust crate opentelemetry to 0.28.0 (#1543)
Update Rust crate itertools to 0.14.0 (#1646)
Update Rust crate clearscreen to v4 (#1656)
nightly test polish (#1665)
Update Rust crate strum_macros to 0.27 (#1654)
Update Rust crate strum to 0.27 (#1653)
Update Rust crate rusqlite to 0.34 (#1652)
Better return semantics from should_send
Trying to simplify enqueue
Remove unnecessary argument
Remove unused code from `ImpactedBlocks` (#1668)
Log when pantry/volume layers activate things (#1670)
Fix unused import (#1669)
Use `RangeSet` instead of tracking every complete job (#1663)
Update Rust crate dropshot to 0.16.0 (#1645)
Simplify pending work tracking (#1662)
Remove rusqlite dependency from crucible-common (#1659)
Update Rust crate reedline to 0.38.0 (#1651)
Update Rust crate proptest to 1.6.0 (#1648)
Update Rust crate bytes to v1.10.0 (#1644)
Update Rust crate tracing-subscriber to 0.3.19 (#1643)
Update Rust crate tracing to v0.1.41 (#1642)
Update Rust crate toml to v0.8.20 (#1641)
Update Rust crate thiserror to v1.0.69 (#1640)
Update Rust crate serde_json to v1.0.139 (#1639)
Update Rust crate serde to v1.0.218 (#1638)
Update Rust crate reqwest to v0.12.12 (#1637)
Update Rust crate libc to v0.2.170 (#1636)
Update Rust crate indicatif to 0.17.11 (#1635)
Update Rust crate httptest to 0.16.3 (#1634)
Update Rust crate clap to v4.5.31 (#1632)
Update Rust crate chrono to v0.4.39 (#1631)
Update Rust crate byte-unit to 5.1.6 (#1630)
Update Rust crate async-trait to 0.1.86 (#1629)
Update Rust crate csv to 1.3.1 (#1633)
Update Rust crate anyhow to v1.0.96 (#1628)
Fix a test flake (#1626)
Bitpack per-block slot data in memory (#1625)
Use `div_ceil` instead of `(x + 7) / 8` (#1624)
Add repair server dynamometer (#1618)

Propolis changes are:
Paper over minor() differences
Update crucible (#886)
Test oximeter metrics end to end (#855)
server: improve behavior of starting VMs that are waiting for Crucible activation (#873)
server: extend API request queue's memory of prior requests (#880)
Use production propolis-server flags in PHD CI builds (#878)

Added a new field to the Board struct that propolis requires.
leftwo added a commit that referenced this pull request Mar 27, 2025
Crucible changes are:
Make `ImpactedBlocks` an absolute range (#1685)
update omicron (#1687)
Update Rust crate chrono to v0.4.40 (#1682)
Update Rust crate tempfile to v3.19.0 (#1676)
Log loops during crutest replay (#1674)
Refactor live-repair start and send (#1673)
Update Rust crate libc to v0.2.171 (#1684)
Update Rust crate clap to v4.5.32 (#1683)
Update Rust crate async-trait to 0.1.88 (#1680)
Update Rust crate bytes to v1.10.1 (#1681)
Update Rust crate anyhow to v1.0.97 (#1679)
Update Rust crate http to v1 (#1678)
Update Rust crate tokio-util to v0.7.14 (#1675)
Update Rust crate thiserror to v2 (#1657)
Update Rust crate omicron-zone-package to 0.12.0 (#1647) Update Rust
crate opentelemetry to 0.28.0 (#1543)
Update Rust crate itertools to 0.14.0 (#1646)
Update Rust crate clearscreen to v4 (#1656)
nightly test polish (#1665)
Update Rust crate strum_macros to 0.27 (#1654)
Update Rust crate strum to 0.27 (#1653)
Update Rust crate rusqlite to 0.34 (#1652)
Better return semantics from should_send
Trying to simplify enqueue
Remove unnecessary argument
Remove unused code from `ImpactedBlocks` (#1668)
Log when pantry/volume layers activate things (#1670) Fix unused import
(#1669)
Use `RangeSet` instead of tracking every complete job (#1663) Update
Rust crate dropshot to 0.16.0 (#1645)
Simplify pending work tracking (#1662)
Remove rusqlite dependency from crucible-common (#1659) Update Rust
crate reedline to 0.38.0 (#1651)
Update Rust crate proptest to 1.6.0 (#1648)
Update Rust crate bytes to v1.10.0 (#1644)
Update Rust crate tracing-subscriber to 0.3.19 (#1643) Update Rust crate
tracing to v0.1.41 (#1642)
Update Rust crate toml to v0.8.20 (#1641)
Update Rust crate thiserror to v1.0.69 (#1640)
Update Rust crate serde_json to v1.0.139 (#1639)
Update Rust crate serde to v1.0.218 (#1638)
Update Rust crate reqwest to v0.12.12 (#1637)
Update Rust crate libc to v0.2.170 (#1636)
Update Rust crate indicatif to 0.17.11 (#1635)
Update Rust crate httptest to 0.16.3 (#1634)
Update Rust crate clap to v4.5.31 (#1632)
Update Rust crate chrono to v0.4.39 (#1631)
Update Rust crate byte-unit to 5.1.6 (#1630)
Update Rust crate async-trait to 0.1.86 (#1629)
Update Rust crate csv to 1.3.1 (#1633)
Update Rust crate anyhow to v1.0.96 (#1628)
Fix a test flake (#1626)
Bitpack per-block slot data in memory (#1625)
Use `div_ceil` instead of `(x + 7) / 8` (#1624)
Add repair server dynamometer (#1618)

Propolis changes are:
Paper over minor() differences
Update crucible (#886)
Test oximeter metrics end to end (#855)
server: improve behavior of starting VMs that are waiting for Crucible
activation (#873) server: extend API request queue's memory of prior
requests (#880) Use production propolis-server flags in PHD CI builds
(#878)

---------

Co-authored-by: Alan Hanson <alan@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants