Full measurements in reconfigurator by labbott · Pull Request #9877 · oxidecomputer/omicron

labbott · 2026-02-18T13:35:50Z

This is the full stack of changes to support reference measurements. This is described more fully in RFD 512 but briefly: the goal here is to be able to distribute a set of reference measurements (hashes of what software we expect to be running on the rack) to sprockets so each end of a sprockets connection can appraise the measurements of the remote peer (compare what's actually running on the rack to the expected reference measurements).
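To make the appraisal step concrete, here is a minimal sketch of comparing a peer's reported measurements against a reference set. The `Appraisal` type, the `appraise` function, and the use of plain strings for hashes are all hypothetical illustrations, not the sprockets or omicron API.

```rust
use std::collections::BTreeSet;

/// Hypothetical result of appraising a remote peer.
#[derive(Debug, PartialEq, Eq)]
enum Appraisal {
    /// Every reported measurement is in the reference set.
    Trusted,
    /// These reported measurements were not in the reference set.
    Unexpected(Vec<String>),
}

/// Compare what the peer reports against the reference measurements.
/// Hashes are modelled as strings purely for illustration.
fn appraise(reference: &BTreeSet<String>, reported: &[String]) -> Appraisal {
    let unexpected: Vec<String> = reported
        .iter()
        .filter(|m| !reference.contains(*m))
        .cloned()
        .collect();
    if unexpected.is_empty() {
        Appraisal::Trusted
    } else {
        Appraisal::Unexpected(unexpected)
    }
}
```

A caller that only logs on `Unexpected` and keeps the connection open would match the non-enforcing behavior described later in this description.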

Blueprints now have knowledge of measurements (see omicron#9718), and this PR stack is responsible for having reconfigurator update the measurements. Measurements are an artifact in the TUF repo. The high-level goals here are:

Handle going from a blueprint with Unknown measurements (the default when pulling out old blueprints) to a known set of measurements
Support reading measurements from the install dataset during a MUPdate override situation
Make sure the measurements included on each sled include all the measurements from both the old TUF repo and the new TUF repo. This is the only set of software we ever expect to be running.
Update the measurements before performing any other changes to the sled
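The "old repo plus new repo" goal above amounts to a set union. A minimal sketch, assuming measurements can be modelled as hash strings; `Measurement` and `measurements_for_update` are hypothetical names, not the types used in this PR:

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a measurement: a hex-encoded hash string.
type Measurement = String;

/// Return the set a sled should carry during an update: every measurement
/// from the old TUF repo plus every measurement from the new one.
fn measurements_for_update(
    old_repo: &BTreeSet<Measurement>,
    new_repo: &BTreeSet<Measurement>,
) -> BTreeSet<Measurement> {
    old_repo.union(new_repo).cloned().collect()
}
```

Using a set rather than a list means a measurement shared by both repos appears once, and comparing two such sets is independent of ordering.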

Big points to check

MUPdate override must always work as this will be the recovery path. This will be even more important when measurements are enforced (right now it's logged but the sprockets connection continues)
Is this the right place for measurements to come first in reconfigurator? Right now we're only measuring RoT and SP but Host OS is coming soon. Will we need changes when that happens?
Are there enough test cases with the reconfigurator-cli?
The edit counts for measurements with the sled-editor never felt quite right. Is there a better way to check?

Automated test output to check

cmds-missing-measurement-manifest.txt (new test)
cmds-unknown-measurements.txt (new test)
cmds-mupdate-update-flow.txt (had to tweak output)

Tests added

nexus/reconfigurator/planning/tests/integration_tests/planner.rs
nexus/db-queries/src/db/datastore/deployment.rs

Most exciting files to review

nexus/reconfigurator/planning/src/planner.rs (this changes some of the logic of reconfigurator to have measurements come first)
nexus/reconfigurator/planning/src/planner/image_source.rs (MUPdate override logic, also needs to work with unknown measurements)
nexus/reconfigurator/planning/src/measurements.rs (actually very concise logic for planning measurements)
nexus/reconfigurator/planning/src/blueprint_editor/sled_editor/measurements.rs (logic for editing measurements)

dev-tools/reconfigurator-cli/src/lib.rs

jgallagher · 2026-03-02T18:47:29Z

dev-tools/reconfigurator-cli/src/lib.rs

                )?;
                sim_source.simulate_zone_errors(&source.with_zone_error)?;
-                Ok(SimTufRepoDescription::new(sim_source))
+                if source.with_measurement_manifest_error {


I haven't gotten to the changes to SimTufRepoDescription yet, but looking at the surrounding code here I'm wondering if we could attach this to SimTufRepoSource instead? So both these branches would become something like

 sim_source.simulate_zone_errors(&source.with_zone_error)?;
+sim_source.simulate_measurement_error("...error message...")?;

instead of needing to call different SimTufRepoDescription constructors.

That's what I tried at first. simulate_zone_errors works to simulate an error on individual zones in the manifest. What with_measurement_manifest_error is simulating is no manifest present at all (which is very important for us to simulate as that's what we're going to see in R18 -> R19)
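The distinction drawn here (an error on individual entries versus no manifest at all) can be sketched as two separate cases in a type. `SimManifest` and `usable_measurements` are hypothetical and only illustrate why a per-entry error hook cannot express the "missing" case; they are not the simulation types in this PR.

```rust
/// Hypothetical model of what a simulated sled reports for its manifest.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SimManifest {
    /// A manifest exists; each entry is either a valid hash or an error.
    Present(Vec<Result<String, String>>),
    /// No manifest at all, e.g. a sled that never received one.
    Missing,
}

/// Return the valid entries, or `None` when there is no manifest to read.
fn usable_measurements(manifest: &SimManifest) -> Option<Vec<&str>> {
    match manifest {
        SimManifest::Missing => None,
        SimManifest::Present(entries) => Some(
            entries
                .iter()
                .filter_map(|e| e.as_ref().ok().map(String::as_str))
                .collect(),
        ),
    }
}
```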

dev-tools/reconfigurator-cli/tests/input/cmds-missing-measurement-manifest.txt

dev-tools/reconfigurator-cli/tests/output/cmds-add-sled-no-disks-stdout

nexus/reconfigurator/planning/src/planner.rs

nexus/reconfigurator/planning/tests/integration_tests/planner.rs

jgallagher · 2026-03-02T21:29:21Z

nexus/reconfigurator/planning/tests/integration_tests/planner.rs

+    panic!("did not converge after {MAX_PLANNING_ITERATIONS} iterations");
+}
+
+// This test case was based on an error I hit while developing!


... what was the error? 😅

Having one set of measurements be a subset of another wasn't detected correctly and we'd generate infinite blueprints. I'll clarify that.
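As an illustration of the failure mode (not the actual planner code), here is a sketch of a convergence loop in which a strict subset must count as "needs one more change" and then settle. `plan_step` and `plan_until_converged` are hypothetical, and measurements are modelled as integers for brevity.

```rust
use std::collections::BTreeSet;

const MAX_PLANNING_ITERATIONS: usize = 10;

/// One planning step: return the next set if a change is needed, else `None`.
/// Full equality is used, so a target that is a strict subset of the current
/// set is treated as a change exactly once.
fn plan_step(
    current: &BTreeSet<u32>,
    target: &BTreeSet<u32>,
) -> Option<BTreeSet<u32>> {
    if current == target {
        None
    } else {
        Some(target.clone())
    }
}

/// Return how many blueprints were generated before planning converged,
/// or `None` if it did not converge within the iteration limit.
fn plan_until_converged(
    mut current: BTreeSet<u32>,
    target: &BTreeSet<u32>,
) -> Option<usize> {
    for generated in 0..MAX_PLANNING_ITERATIONS {
        match plan_step(&current, target) {
            None => return Some(generated),
            Some(next) => current = next,
        }
    }
    None
}
```

A comparison that disagrees with what the step actually writes (for example, one that keeps reporting a difference after the update) would make the loop hit the iteration limit, which is the symptom the test guards against.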

jgallagher · 2026-03-02T21:31:26Z

nexus/reconfigurator/planning/tests/integration_tests/planner.rs

+}
+
+#[test]
+fn test_multiple_measurements() {


Does this test cover everything test_subset_measurements() covers (but with more measurements)? At a glance they look very similar.

test_subset_measurements was based on a specific error case I ran into. test_multiple_measurements was designed to check the expected case but I do think they have a lot of overlap. I think I'd like to keep both with some clarification.

jgallagher · 2026-03-03T22:38:37Z

Will give this another look soon - also would be good to get eyes from @sunshowers since this touches a lot of the mupdate override / recovery bits.


nexus/reconfigurator/planning/src/blueprint_builder/builder.rs

nexus/reconfigurator/planning/src/blueprint_editor/sled_editor.rs

nexus/reconfigurator/planning/src/measurements.rs

nexus/reconfigurator/planning/src/planner/image_source.rs

nexus/db-queries/src/db/datastore/deployment.rs

nexus/reconfigurator/planning/src/measurements.rs

nexus/reconfigurator/planning/src/blueprint_editor/sled_editor/scalar.rs

sunshowers · 2026-03-04T08:15:53Z

nexus/types/src/deployment.rs

 pub use planning_report::PlanningCockroachdbSettingsStepReport;
 pub use planning_report::PlanningDecommissionStepReport;
 pub use planning_report::PlanningExpungeStepReport;
+pub use planning_report::PlanningMeasurementUpdatesStepReport;


This file has a few instances of measurment -- could you fix the typos in a followup?

sunshowers

a few more comments, going to sleep now :)

nexus/reconfigurator/simulation/src/zone_images.rs

sunshowers · 2026-03-04T08:28:21Z

nexus/reconfigurator/simulation/src/zone_images.rs

                        Err("reconfigurator-sim: simulated error \
                             validating zone image"
                            .to_owned())


Should this be "simulated error validating measurement" or similar?

nexus/reconfigurator/planning/src/example.rs

sunshowers · 2026-03-04T08:32:16Z

nexus/reconfigurator/planning/tests/integration_tests/planner.rs

+    let mut sim = ReconfiguratorCliTestState::new(TEST_NAME, &logctx.log);
+    sim.load_example_customized(|builder| builder.with_target_release_0_0_1())
+        .expect("loaded example system");
+    let blueprint1 = sim.assert_latest_blueprint_is_blippy_clean();


Do we need to add blippy checks for measurements?

I've never looked at blippy before! I'll take a pass and see what makes sense.

sunshowers · 2026-03-04T08:34:28Z

dev-tools/reconfigurator-cli/tests/input/cmds-missing-measurement-manifest.txt

+# We expect to see all sleds successfully show measurements in the artifact
+# state including the sled with the measurement manifest error
+blueprint-diff latest


Suggested change

 # We expect to see all sleds successfully show measurements in the artifact
-# state including the sled with the measurement manifest error
+# state, including the sled with the measurement manifest error.
 blueprint-diff latest

Also, what happens if:

the mupdate override is cleared, setting measurements to InstallDataset

then noop conversion finds that the manifest is missing

which causes do_plan_measurements to be blocked?

I don't fully understand the state machine here I guess -- I don't think zone images have a missing state the way this does.

Measurements do behave differently than zones because we have to account for the upgrade case. Online update is now available to customers so we cannot guarantee a MUPdate to produce a manifest or measurements in the install dataset. This is okay and the code handles this update path appropriately.

It's impossible to tell the difference between a missing manifest because it never got one (expected) and a missing manifest because of another reason (unexpected). In the example given, the expectation is that if we have a mupdate override we must have done a mupdate which should install the proper measurements. If we are in a case where we're trying to boot with measurements set to InstallDataset but can't get the manifest this is going to be a hard error that requires a new MUPdate to fix. The nightmare scenario is something where a MUPdate cannot fix our measurement problems.
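A rough sketch of the rule described above: a missing manifest is tolerated while measurements are Unknown, but is a hard error once they are set to InstallDataset. The enum below borrows the `Unknown` and `InstallDataset` names from this thread; the `Artifacts` variant, the `check_manifest` function, and the error text are assumptions for illustration only.

```rust
/// Hypothetical model of where a sled's measurements come from.
#[derive(Debug, PartialEq, Eq)]
enum MeasurementSource {
    /// Old blueprints and sleds that never received a manifest.
    Unknown,
    /// Measurements are read from the install dataset (after a MUPdate).
    InstallDataset,
    /// Measurements come from TUF repo artifacts (assumed variant).
    Artifacts(Vec<String>),
}

/// Decide whether a missing manifest is acceptable for the given source.
fn check_manifest(
    source: &MeasurementSource,
    manifest_present: bool,
) -> Result<(), String> {
    match source {
        // Expected on the upgrade path: no manifest was ever installed.
        MeasurementSource::Unknown => Ok(()),
        // A MUPdate should have installed a manifest; if it is missing,
        // only another MUPdate can repair the sled.
        MeasurementSource::InstallDataset if !manifest_present => {
            Err("manifest missing; a new MUPdate is required".to_string())
        }
        MeasurementSource::InstallDataset => Ok(()),
        // In this sketch, artifact-sourced measurements do not depend on
        // the install dataset manifest.
        MeasurementSource::Artifacts(_) => Ok(()),
    }
}
```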


labbott · 2026-03-04T16:18:13Z

nexus/reconfigurator/planning/src/blueprint_editor/sled_editor/scalar.rs

+    /// This is the same as `set_value` but if the internal value is
+    /// still `Original` and `value` matches we will leave the
+    /// value as `Original`
+    pub(crate) fn set_value_if_unchanged(&mut self, value: T) -> Cow<'_, T>


I looked at this again and I think this function is actually unnecessary and I had accounting bugs elsewhere? Going to see about re-testing.
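For readers unfamiliar with the edit accounting being discussed, here is a simplified sketch of the behavior the doc comment describes. It is not the real `ScalarEditor`: the `EditState` type, the field layout, and the `()` return type (the real signature returns `Cow<'_, T>`) are assumptions made to keep the example short.

```rust
/// Hypothetical edit state: either untouched or modified from an original.
#[derive(Debug, Clone, PartialEq, Eq)]
enum EditState<T> {
    Original(T),
    Modified { original: T, value: T },
}

struct ScalarEditor<T> {
    state: EditState<T>,
}

impl<T: Clone + PartialEq> ScalarEditor<T> {
    fn new(original: T) -> Self {
        Self { state: EditState::Original(original) }
    }

    fn value(&self) -> &T {
        match &self.state {
            EditState::Original(v) => v,
            EditState::Modified { value, .. } => value,
        }
    }

    /// Always records an edit, even if `value` equals the original.
    fn set_value(&mut self, value: T) {
        let original = match &self.state {
            EditState::Original(o) => o.clone(),
            EditState::Modified { original, .. } => original.clone(),
        };
        self.state = EditState::Modified { original, value };
    }

    /// Like `set_value`, but leaves the state as `Original` when the
    /// value is still original and `value` matches it.
    fn set_value_if_unchanged(&mut self, value: T) {
        if let EditState::Original(o) = &self.state {
            if *o == value {
                return;
            }
        }
        self.set_value(value);
    }

    fn is_modified(&self) -> bool {
        matches!(self.state, EditState::Modified { .. })
    }
}
```

In this sketch the only difference between the two setters is whether writing back an identical value counts as an edit, which is the kind of discrepancy that can throw off edit counts.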

labbott · 2026-03-04T21:05:15Z

dev-tools/reconfigurator-cli/tests/output/cmds-mupdate-update-flow-stdout

+Measurement updates:
+Waiting on zone add/update blockers


@jgallagher is this clearer?


labbott marked this pull request as draft February 18, 2026 13:35

labbott force-pushed the labbott/full_measurement_reconfigurator branch 2 times, most recently from 59f8771 to a1faf54 Compare February 24, 2026 17:38

labbott changed the base branch from main to labbott/measurement_blueprints February 24, 2026 20:46

labbott changed the base branch from labbott/measurement_blueprints to main February 24, 2026 20:47

labbott force-pushed the labbott/full_measurement_reconfigurator branch from d32f8be to b6ece6d Compare February 25, 2026 20:10

labbott changed the base branch from main to labbott/measurement_blueprints February 25, 2026 20:10

labbott changed the base branch from labbott/measurement_blueprints to main February 25, 2026 20:15

labbott force-pushed the labbott/full_measurement_reconfigurator branch 2 times, most recently from 664dfa8 to 072d8db Compare February 27, 2026 13:53

labbott changed the title ~~WIP: full measurements in reconfigurator~~ Full measurements in reconfigurator Feb 27, 2026

labbott marked this pull request as ready for review February 27, 2026 13:53

labbott added this to the 19 milestone Feb 27, 2026

labbott requested review from davepacheco and jgallagher February 27, 2026 19:58

Add measurement logic to reconfigurator

072d8db

jgallagher reviewed Mar 2, 2026


labbott added 5 commits March 3, 2026 21:22

Name change

384c3e7

Test fixups

f29781a

Old comment

a55d795

Wrapping manually

4ce2447

name change

4424a2c

jgallagher requested review from jgallagher and sunshowers March 3, 2026 22:38

labbott added 5 commits March 4, 2026 01:06

better approach for editing measurements?

545226f

Update that test

116f066

Some rename and remove another function

367f998

Fixup tests more

3ab9a00

I'm pretty sure the edit checks were wrong

Drop that

5f48975

sunshowers reviewed Mar 4, 2026


Update dev-tools/reconfigurator-cli/src/lib.rs

1c0aae2

Co-authored-by: Rain <rain@oxide.computer>

labbott commented Mar 4, 2026


labbott added 6 commits March 4, 2026 20:48

stray comment

4e68815

typo

38726fa

style

4ee44d1

style

38ea77e

typo

d2af4ce

typo

14372b9

labbott commented Mar 4, 2026


labbott added 19 commits March 4, 2026 22:01

Merge remote-tracking branch 'origin/main' into mar3_full_measurements

0f62f16

Deployment fix sigh

style fix

56bee39

Document assumption

1f67715

comment update

3e911b2

I don't think that's needed?

b760556

This goes there too

4bd1ccf

fix that output

8782ee3

measurement tweaks

85253bc

What was I going to account for? The world may never know.

1980a6b

tweak the checks for gen1

1c70350

Cleanup NoopMeasurements including letting rustfmt work

5409a13

change name

e2dcf06

simplify measurement artifact generaiton

d32306e

Turns out we don't need this branch anymore

742b5cf

Unique hash

e82b7c1

cleanup noop handling

880627e

one more place to print

2ac0df5

clippy

3b955c6

not neede

64bad13


3 participants
