Skip to content

Closes: #391 (Phase 2) - post_migrate auto-heal for mixin column drift#481

Merged
arthanson merged 25 commits into
featurefrom
391-post-migrate-heal
Apr 30, 2026
Merged

Closes: #391 (Phase 2) - post_migrate auto-heal for mixin column drift#481
arthanson merged 25 commits into
featurefrom
391-post-migrate-heal

Conversation

@bctiemann
Copy link
Copy Markdown
Contributor

Summary

Implements automatic detection and repair of mixin column drift for Custom Object Type tables (Phase 2 of #391). When NetBox is upgraded and a mixin (e.g. ChangeLoggingMixin) gains a new concrete column, existing COT tables would previously be missing that column silently. This PR closes that gap.

Depends on PR #480 (Phase 1 -- base column snapshot).

New module: mixin_migration.py

  • _expected_base_fields(cot) -- resolves the full set of base columns from the live CustomObject model, excluding user-defined fields
  • _can_auto_add(field) -- determines whether a missing column can be safely added (ADD COLUMN) without breaking existing rows (nullable or has a default)
  • heal_cot(cot, verbosity, dry_run) -- detects drift for a single COT, adds safe columns via schema_editor.add_field(), warns on unsafe ones (NOT NULL/no default, type changes, removals), never auto-drops, refreshes schema_document["base_columns"] after successful additions
  • heal_all_cots(verbosity, dry_run) -- iterates all COTs, returns {total, healed, warnings} summary

Signal wiring (__init__.py)

_heal_mixin_columns() is connected to post_migrate in ready(). It fires automatically after every manage.py migrate run. A module-level _heal_ran flag ensures the work happens only once per process even though post_migrate fires once per installed app. Excluded during makemigrations and collectstatic.

Management command: upgrade_custom_objects

Explicit escape hatch for inspection and manual repair:

manage.py upgrade_custom_objects               # check and fix all COTs
manage.py upgrade_custom_objects --dry-run     # preview without DB changes
manage.py upgrade_custom_objects --cot mytype  # single COT by name or ID

Safety rules

Scenario Action
New nullable column Auto-ADD
New column with default Auto-ADD
New NOT NULL column, no default Warn only
Column type changed Warn only
Column removed from mixin Warn only, never DROP

Test plan

  • CanAutoAddTestCase -- nullable, defaulted, both, neither
  • ExpectedBaseFieldsTestCase -- returns id/created/last_updated, excludes user fields, returns real Django Field instances
  • HealCotTestCase -- no-drift returns empty; nullable column is added to DB; non-nullable warns; snapshot updated after addition; removed column warns but not dropped; dry_run reports without touching DB; dry_run does not update snapshot
  • HealAllCotsTestCase -- total matches COT count, healed=0 when no drift, correct summary keys
  • UpgradeCustomObjectsCommandTestCase -- runs without error, --dry-run flag, --cot by name, --cot by ID, unknown COT raises CommandError

Generated with Claude Code

…t COT creation

Add CustomObjectType._collect_base_columns() and _store_base_column_snapshot()
to record the concrete columns contributed by the CustomObject mixin hierarchy
(id, created, last_updated, and any future additions) at table-creation time.
The snapshot is stored in schema_document["base_columns"] alongside any existing
keys, preserving schema-exporter output from #386.

Add migration 0009_backfill_base_columns to populate the snapshot for all
existing COT tables using DB introspection, subtracting user-defined field names
to isolate base columns.

This is the foundation for Phase 2 (post_migrate auto-heal handler that detects
and applies mixin schema drift when NetBox is upgraded).

Note: migration 0009_alter_customobjecttype_version on branch
392-validate-semver-fields will need renumbering to 0010 before that PR merges.
Add netbox_custom_objects/mixin_migration.py with:
  - _expected_base_fields(): resolves base columns from live CustomObject model
  - _can_auto_add(): determines if a missing column can be safely ADD COLUMNed
  - heal_cot(): detects drift for one COT, adds safe columns, warns on unsafe
    ones, never auto-drops; refreshes schema_document snapshot on success
  - heal_all_cots(): iterates all COTs, returns {total, healed, warnings}

Wire _heal_mixin_columns() post_migrate signal handler in __init__.py:ready()
so the heal runs automatically after every 'manage.py migrate'. A module-level
_heal_ran flag deduplicates the call (post_migrate fires once per installed
app). makemigrations and collectstatic are excluded.

Add management command 'upgrade_custom_objects' with --dry-run and --cot flags
for explicit inspection and repair outside the migrate flow.

Tests cover: _can_auto_add() unit tests, _expected_base_fields() with/without
user fields, heal_cot() adding nullable columns, warning on non-nullable, not
dropping removed columns, snapshot refresh, dry_run correctness,
heal_all_cots() summary counts, and all management command flags.
@bctiemann bctiemann force-pushed the 391-post-migrate-heal branch from 61b9fe2 to f446e28 Compare April 23, 2026 10:11
- Snapshot refresh: derive updated base_columns by merging new fields into the
  existing snapshot dict rather than re-running _store_base_column_snapshot(model),
  which re-derives from model._meta and misses columns added directly to the DB
  by this heal pass (schema_editor.add_field does not update _meta).
- removed-column test: actually add ghost_col to the DB table so the
  'if col_name in actual_names' guard in heal_cot() fires correctly.
@bctiemann bctiemann requested a review from arthanson April 23, 2026 22:35
@bctiemann
Copy link
Copy Markdown
Contributor Author

@claude /review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Claude finished @bctiemann's task in 2m 11s —— View job


Code Review — PR #481: Phase 2 post_migrate auto-heal for mixin column drift

  • Gather context and read changed files
  • Analyze code quality and correctness
  • Review test coverage
  • Post review feedback

Overview

This is a well-structured implementation of automatic mixin column drift detection and repair. The design is sound: post-migrate signal wiring, a module-level deduplication flag, a management command escape hatch, and clear safety rules about what can/cannot be auto-fixed. Overall the code is clean and the test coverage is thorough. A few issues are worth addressing before merge.


Issues Found

Bug: Field name vs. column name mismatch

mixin_migration.py:43-45 and mixin_migration.py:112

_expected_base_fields keys the result by f.name, but _actual_column_names returns DB introspection column names. These diverge whenever a model field uses db_column= to override the column name. The comparison at line 112 (if field_name in actual_names) would then fail to detect an existing column, causing heal_cot to try adding a column that already exists (which would fail or succeed depending on DB behavior).

# Current: uses f.name (Python attribute name)
return {f.name: f for f in model._meta.concrete_fields if ...}

# Should use f.attname or f.column for FK fields and db_column overrides:
return {f.column: f for f in model._meta.concrete_fields if ...}

Also, the user-field exclusion at line 41 checks cot.fields.values_list("name", flat=True) (Python field names), which also may not match f.column. These need to be consistent.

Fix this →


Missing implementation: type-change detection

mixin_migration.py docstring line 19 says "Warn only: column type appears to have changed" but there is no code that actually performs this check. The function detects missing columns and removed columns, but never compares the type of columns that exist in both expected and actual. This gap means silently incorrect column types go undetected.

Fix this →


Design concern: logging vs. stdout in the management command

upgrade_custom_objects.py uses self.stdout.write for high-level status and logger.info (inside heal_cot) for per-column details. In practice, users running manage.py upgrade_custom_objects --dry-run -v 2 won't see the "would ADD COLUMN" messages in their terminal unless Django logging is explicitly configured to print to console. This makes dry-run mode less useful interactively.

Consider having heal_cot accept an optional output callable (like self.stdout.write) so the command can route per-column messages to stdout, or have the command post-process the returned result dict to emit those details.


Minor: _heal_ran never resets — could confuse tests indirectly

__init__.py:37,59 — The module-level _heal_ran flag is set on first call and never cleared. This is by design for production (one heal per process), but if a test indirectly triggers _heal_mixin_columns (e.g., via a real post_migrate firing during TransactionTestCase setup), subsequent calls in the same test process are silently skipped. The tests currently call heal_cot and heal_all_cots directly so this doesn't bite them, but it's worth noting for future test authors.


Minor: double cot.get_model() call in hot path

mixin_migration.py:108 and 140_expected_base_fields(cot) calls cot.get_model() internally, and then cot.get_model() is called again in the add_field block. Consider storing the result in a local variable and passing it to _expected_base_fields, or at minimum retrieving the model once in heal_cot:

model = cot.get_model()
expected = _expected_base_fields(model, user_field_names)  # refactored signature

Test Coverage Assessment

The tests are comprehensive and well-structured. Highlights:

  • CanAutoAddTestCase — clean unit tests, all four combinations covered
  • HealCotTestCase — good integration tests against a real DB; dry-run, snapshot update, and removed-column cases all covered
  • HealAllCotsTestCase and UpgradeCustomObjectsCommandTestCase — solid smoke tests

One gap: test_dry_run_flag_accepted only asserts "DRY RUN" appears in stdout (the top-level handle() message). The per-field "would ADD COLUMN" messages from heal_cot go to logger.info, not self.stdout, so there's no test that verifies the dry-run per-column output path is actually reachable from the management command. This is a minor gap but aligns with the stdout/logger mixing issue above.


Summary

Issue Severity
f.name vs f.column in drift comparison Medium — would produce incorrect results for db_column-overriding fields
Missing type-change detection Low — documented in docstring but unimplemented
Logger vs. stdout mixing in command Low — UX/DX concern
Double get_model() call Negligible

The core architecture is solid. The f.name vs f.column issue is the most important to address before merge.

- Key by f.column (not f.name) in _expected_base_fields for correct FK field comparison
- Accept optional model arg in _expected_base_fields to avoid duplicate get_model()
- Add type-change detection: warn when a present base column field class differs from snapshot
- Route per-column output to self.stdout by having command iterate COTs directly
- Add test for type-change detection warning

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bctiemann
Copy link
Copy Markdown
Contributor Author

All review items addressed.

bctiemann and others added 2 commits April 23, 2026 19:08
- Sort _collect_base_columns return value by name to match the
  alphabetical ordering used in the backfill migration, so both
  paths produce consistent snapshots
- field_class is now always present in backfill entries (was
  conditionally absent for columns not in base_field_info); falls
  back to 'UnknownField' for any unrecognised column
- Derive id field class from CustomObject._meta.pk.__class__.__name__
  instead of hardcoding 'AutoField', handling BigAutoField installs
- Use USER_TABLE_DATABASE_NAME_PREFIX constant in migration instead of
  duplicating the 'custom_objects_' literal
- Update test to assert field_class is always present in backfilled entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@arthanson arthanson changed the base branch from 391-base-column-snapshot to feature April 30, 2026 00:35
Copy link
Copy Markdown
Contributor

@arthanson arthanson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to merge feature and fix merge conflicts

bctiemann and others added 13 commits April 30, 2026 07:22
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
test_base_columns.py: update hardcoded migration reference from
0009_backfill_base_columns to 0010_backfill_base_columns. Migration 0009
is now alter_customobjecttype_version; the backfill was renumbered to 0010.

executor.py: Phase 1 always sets schema_document={"base_columns":[...]}
on COT creation, making it truthy. The finalisation condition
`not cot.schema_document` was meant to catch "never set", but now
incorrectly skips _update_schema_document for no-change applies on newly
created COTs. Switch to checking for the presence of schema_version so
that a COT whose schema_document was populated only by the base-column
snapshot still gets a proper executor-managed document on first apply.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CustomObject is abstract, so CustomObject._meta.pk is always None.
The fallback "AutoField" fired unconditionally, causing backfilled COTs
to record a different field_class for id than COTs created after this PR
(which correctly record "BigAutoField" via _collect_base_columns on the
concrete model). This would produce false drift alerts in Phase 2.

Derive pk_class from DEFAULT_AUTO_FIELD instead, which matches what
_collect_base_columns produces on concrete models.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bctiemann bctiemann requested a review from arthanson April 30, 2026 18:14
bctiemann and others added 2 commits April 30, 2026 15:03
_collect_base_columns stored "name": f.name, but the Phase 2 heal
function keys its expected-column dict by f.column (the actual DB
column name).  For all current base fields f.name == f.column, so
this worked silently — but any FK field added to the mixin hierarchy
(f.name='site', f.column='site_id') would cause the heal-time lookup
to miss the snapshot entry entirely.

Fix _collect_base_columns to store f.column and update the backfill
migration to:
- key base_field_info by f.column (was f.name) so the field_class
  lookup against DB-introspected column names is consistent
- build user_column_names as both name and name+'_id' so Object-type
  user fields (whose DB column is field_name+'_id') are correctly
  excluded from the base-column snapshot

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bctiemann and others added 3 commits April 30, 2026 15:06
The workaround note in heal_cot that said base_columns stores f.name
is no longer accurate after the fix to _collect_base_columns.  Update
to reflect that snapshot entries are now keyed by f.column.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Resolves conflicts in models.py and 0010_backfill_base_columns.py by
keeping the f.column fix from 391-base-column-snapshot (the correct
version) over the pre-fix f.name version that landed in feature via
the original PR #480 merge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- __init__.py: document that _heal_ran is set before the try block so
  a first-firing failure permanently disables auto-retry in the same
  process; operators can use upgrade_custom_objects manually instead
- upgrade_custom_objects: deduplicate CustomObjectType import into
  handle(); gate "no drift detected" stdout on verbosity >= 2 to avoid
  flooding output for deployments with many COTs
- test_mixin_migration: collapse dead if/else in
  test_type_change_detected_as_warning; add
  test_add_field_failure_recorded_as_warning covering the add_failed
  error path; update command tests to use verbosity=2 for no-drift
  assertions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@arthanson arthanson merged commit ef4953e into feature Apr 30, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants