Conversation
force-pushed from 59f8771 to a1faf54
force-pushed from d32f8be to b6ece6d
force-pushed from 664dfa8 to 072d8db
```rust
    )?;
    sim_source.simulate_zone_errors(&source.with_zone_error)?;
    Ok(SimTufRepoDescription::new(sim_source))

    if source.with_measurement_manifest_error {
```
I haven't gotten to the changes to `SimTufRepoDescription` yet, but looking at the surrounding code here I'm wondering if we could attach this to `SimTufRepoSource` instead? So both these branches would become something like

```rust
sim_source.simulate_zone_errors(&source.with_zone_error)?;
sim_source.simulate_measurement_error("...error message...")?;
```

instead of needing to call different `SimTufRepoDescription` constructors.
That's what I tried at first. `simulate_zone_errors` works to simulate an error on individual zones in the manifest. What `with_measurement_manifest_error` is simulating is no manifest present at all (which is very important for us to simulate, as that's what we're going to see in R18 -> R19).
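To illustrate the distinction being discussed (a minimal sketch with hypothetical types, not omicron's actual `SimTufRepoSource` API): per-zone error injection needs an existing manifest to operate on, while the missing-manifest case is a separate state entirely.

```rust
// Hypothetical stand-in for the simulator state; the real types live
// in reconfigurator-sim. Each manifest entry is Ok(zone) or a
// simulated per-zone error.
#[derive(Debug, Clone, PartialEq)]
enum ManifestState {
    // No measurement manifest present at all (the R18 -> R19 case).
    Missing,
    // A manifest whose entries may individually be marked as errors.
    Present(Vec<Result<String, String>>),
}

fn simulate_zone_errors(
    state: &mut ManifestState,
    bad_zones: &[usize],
) -> Result<(), String> {
    match state {
        // Can't inject per-zone errors into a manifest that doesn't exist,
        // which is why the two simulations can't share one entry point.
        ManifestState::Missing => Err("no manifest to corrupt".to_string()),
        ManifestState::Present(entries) => {
            for &i in bad_zones {
                if let Some(e) = entries.get_mut(i) {
                    *e = Err(format!("simulated error in zone {i}"));
                }
            }
            Ok(())
        }
    }
}

fn main() {
    let mut present = ManifestState::Present(vec![
        Ok("zone-a".into()),
        Ok("zone-b".into()),
    ]);
    assert!(simulate_zone_errors(&mut present, &[1]).is_ok());

    // The missing-manifest case is a different state entirely.
    assert!(simulate_zone_errors(&mut ManifestState::Missing, &[0]).is_err());
    println!("ok");
}
```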
dev-tools/reconfigurator-cli/tests/input/cmds-missing-measurement-manifest.txt
dev-tools/reconfigurator-cli/tests/output/cmds-add-sled-no-disks-stdout
nexus/reconfigurator/planning/tests/integration_tests/planner.rs
```rust
    panic!("did not converge after {MAX_PLANNING_ITERATIONS} iterations");
}

// This test case was based on an error I hit while developing!
```
... what was the error? 😅
Having one set of measurements be a subset of another wasn't detected correctly and we'd generate infinite blueprints. I'll clarify that.
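A minimal sketch of the convergence property in question, using hypothetical names (the real planning logic is in the planner crate): if the planner computes its work as a set difference, a target set that is a subset of the current set yields no additions, and planning converges instead of proposing the same change on every pass.

```rust
use std::collections::BTreeSet;

// Plan step: return the measurements to add, i.e. target entries not
// already present. If `current` already contains everything in
// `target`, there is nothing to do and planning has converged.
fn plan_additions<'a>(
    current: &BTreeSet<&'a str>,
    target: &BTreeSet<&'a str>,
) -> BTreeSet<&'a str> {
    target.difference(current).copied().collect()
}

fn main() {
    let current: BTreeSet<_> = ["a", "b", "c"].into();
    let target: BTreeSet<_> = ["a", "b"].into();

    // `target` is a strict subset of `current`: nothing to add. A check
    // that only tested exact equality would see "different" sets here
    // forever, producing a new blueprint on every iteration.
    assert!(target.is_subset(&current));
    assert!(plan_additions(&current, &target).is_empty());
    println!("converged");
}
```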
```rust
}

#[test]
fn test_multiple_measurements() {
```
Does this test cover everything `test_subset_measurements()` covers (but with more measurements)? At a glance they look very similar.
`test_subset_measurements` was based on a specific error case I ran into. `test_multiple_measurements` was designed to check the expected case, but I do think they have a lot of overlap. I think I'd like to keep both, with some clarification.
Will give this another look soon - also would be good to get eyes from @sunshowers since this touches a lot of the mupdate override / recovery bits.
I'm pretty sure the edit checks were wrong
nexus/reconfigurator/planning/src/blueprint_editor/sled_editor/scalar.rs
nexus/reconfigurator/planning/src/blueprint_editor/sled_editor/scalar.rs
nexus/reconfigurator/planning/src/blueprint_editor/sled_editor/scalar.rs
```rust
pub use planning_report::PlanningCockroachdbSettingsStepReport;
pub use planning_report::PlanningDecommissionStepReport;
pub use planning_report::PlanningExpungeStepReport;
pub use planning_report::PlanningMeasurementUpdatesStepReport;
```
This file has a few instances of `measurment` -- could you fix the typos in a followup?
sunshowers left a comment
a few more comments, going to sleep now :)
```rust
Err("reconfigurator-sim: simulated error \
    validating zone image"
    .to_owned())
```
Should this be "simulated error validating measurement" or similar?
```rust
let mut sim = ReconfiguratorCliTestState::new(TEST_NAME, &logctx.log);
sim.load_example_customized(|builder| builder.with_target_release_0_0_1())
    .expect("loaded example system");
let blueprint1 = sim.assert_latest_blueprint_is_blippy_clean();
```
Do we need to add blippy checks for measurements?
I've never looked at blippy before! I'll take a pass and see what makes sense.
```
# We expect to see all sleds successfully show measurements in the artifact
# state including the sled with the measurement manifest error
blueprint-diff latest
```
```suggestion
# We expect to see all sleds successfully show measurements in the artifact
# state, including the sled with the measurement manifest error.
blueprint-diff latest
```
Also, what happens if:

- the mupdate override is cleared, setting measurements to `InstallDataset`
- then noop conversion finds that the manifest is missing
- which causes `do_plan_measurements` to be blocked?
I don't fully understand the state machine here I guess -- I don't think zone images have a missing state the way this does.
Measurements do behave differently than zones because we have to account for the upgrade case. Online update is now available to customers, so we cannot guarantee a MUPdate produced a manifest or measurements in the install dataset. This is okay, and the code handles this update path appropriately.

It's impossible to tell the difference between a manifest that's missing because one was never written (expected) and a manifest that's missing for some other reason (unexpected). In the example given, the expectation is that if we have a mupdate override we must have done a MUPdate, which should install the proper measurements. If we're trying to boot with measurements set to `InstallDataset` but can't get the manifest, that's a hard error that requires a new MUPdate to fix. The nightmare scenario is one where a MUPdate cannot fix our measurement problems.
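A sketch of the decision described above, with hypothetical names (the real logic lives in the planner; this just encodes the two outcomes for a missing manifest):

```rust
// Hypothetical classification of a sled's measurement-manifest state.
#[derive(Debug, PartialEq)]
enum ManifestOutcome {
    // Manifest present: normal case.
    Ok,
    // Sled was updated online from a pre-manifest release: no manifest
    // is expected, and the update path handles it.
    ExpectedlyAbsent,
    // Measurements point at the install dataset but no manifest can be
    // read: requires a new MUPdate to fix.
    HardError,
}

fn classify(
    manifest_present: bool,
    measurements_from_install_dataset: bool,
) -> ManifestOutcome {
    match (manifest_present, measurements_from_install_dataset) {
        (true, _) => ManifestOutcome::Ok,
        // A MUPdate should have installed the proper measurements, so a
        // missing manifest here is unexpected.
        (false, true) => ManifestOutcome::HardError,
        // The upgrade path: the manifest never existed, which is fine.
        (false, false) => ManifestOutcome::ExpectedlyAbsent,
    }
}

fn main() {
    assert_eq!(classify(false, true), ManifestOutcome::HardError);
    assert_eq!(classify(false, false), ManifestOutcome::ExpectedlyAbsent);
    println!("ok");
}
```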
Co-authored-by: Rain <rain@oxide.computer>
```rust
/// This is the same as `set_value` but if the internal value is
/// still `Original` and `value` matches we will leave the
/// value as `Original`
pub(crate) fn set_value_if_unchanged(&mut self, value: T) -> Cow<'_, T>
```
I looked at this again and I think this function is actually unnecessary and I had accounting bugs elsewhere? Going to see about re-testing.
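For readers following along, the semantics described in the quoted doc comment can be sketched with hypothetical stand-in types (the real editor lives in `sled_editor/scalar.rs` and returns a `Cow`; this simplified version just tracks the state transition):

```rust
// Hypothetical scalar edit tracker: setting a value equal to the
// original keeps the `Original` state, so the editor doesn't report a
// spurious modification.
#[derive(Debug, PartialEq)]
enum Scalar<T> {
    Original(T),
    Modified(T),
}

impl<T: PartialEq> Scalar<T> {
    fn set_value_if_unchanged(&mut self, value: T) {
        match self {
            // Value matches the original: no-op, stays `Original`.
            Scalar::Original(v) if *v == value => {}
            // Anything else counts as a real modification.
            _ => *self = Scalar::Modified(value),
        }
    }
}

fn main() {
    let mut s = Scalar::Original("value".to_string());
    s.set_value_if_unchanged("value".to_string());
    assert_eq!(s, Scalar::Original("value".to_string()));

    s.set_value_if_unchanged("new".to_string());
    assert_eq!(s, Scalar::Modified("new".to_string()));
    println!("ok");
}
```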
```
Measurement updates:
    Waiting on zone add/update blockers
```
Deployment fix sigh
This is the full stack of changes to support reference measurements. This is described more fully in RFD 512 but briefly: the goal here is to be able to distribute a set of reference measurements (hashes of what software we expect to be running on the rack) to sprockets so each end of a sprockets connection can appraise the measurements of the remote peer (compare what's actually running on the rack to the expected reference measurements).
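The appraisal step described above can be sketched roughly like this (hypothetical types and function names; the real implementation lives in sprockets and RFD 512 describes the actual protocol):

```rust
use std::collections::BTreeSet;

// Hypothetical appraisal: compare the peer's reported measurement
// hashes against the distributed reference set.
fn appraise(
    reported: &BTreeSet<String>,
    reference: &BTreeSet<String>,
) -> Result<(), Vec<String>> {
    // Any reported hash not in the reference set means software is
    // running that we didn't expect.
    let unexpected: Vec<String> =
        reported.difference(reference).cloned().collect();
    if unexpected.is_empty() { Ok(()) } else { Err(unexpected) }
}

fn main() {
    let reference: BTreeSet<String> =
        ["hash-a", "hash-b"].iter().map(|s| s.to_string()).collect();

    // Peer reports only expected software: appraisal passes.
    let reported: BTreeSet<String> =
        ["hash-a"].iter().map(|s| s.to_string()).collect();
    assert!(appraise(&reported, &reference).is_ok());

    // Peer reports an unknown hash: appraisal fails.
    let rogue: BTreeSet<String> =
        ["hash-z"].iter().map(|s| s.to_string()).collect();
    assert!(appraise(&rogue, &reference).is_err());
    println!("ok");
}
```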
Blueprints now have knowledge of measurements (see omicron#9718) and this PR stack is responsible for having reconfigurator update the measurements. Measurements are an artifact in the TUF repo. The high level goals here are:

- Move `Unknown` measurements (the default when pulling out old blueprints) to a known set of measurements

Big points to check
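The `Unknown`-to-known transition mentioned in the goals might look roughly like this (hypothetical names; the real types are in the blueprint/planner code):

```rust
// Hypothetical measurement source on a sled's blueprint entry. Old
// blueprints carry no measurement info, so deserialization defaults
// to `Unknown`; the planner's job is to move sleds to a known source.
#[derive(Debug, Default, PartialEq)]
enum MeasurementSource {
    #[default]
    Unknown,
    InstallDataset,
    Artifact(String),
}

// One planning step: returns the new source if an update is needed,
// or None once the sled already matches the target artifact.
fn plan_measurement_update(
    current: &MeasurementSource,
    target_hash: &str,
) -> Option<MeasurementSource> {
    match current {
        MeasurementSource::Artifact(h) if h == target_hash => None,
        _ => Some(MeasurementSource::Artifact(target_hash.to_string())),
    }
}

fn main() {
    // Pulling out an old blueprint yields the `Unknown` default.
    let old = MeasurementSource::default();
    assert_eq!(old, MeasurementSource::Unknown);

    // Planning moves it to the target artifact, then converges.
    let next = plan_measurement_update(&old, "abc123").unwrap();
    assert!(plan_measurement_update(&next, "abc123").is_none());
    println!("ok");
}
```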
Automated test output to check
Tests added
Most exciting files to review