Conversation
Management command to move deployments and all related data between projects. Handles all direct and indirect relationships including Events, SourceImages, Occurrences, Jobs, Detections, Classifications, Identifications, SourceImageCollections (with mixed-collection splitting), pipeline configs, processing services, and taxa M2M links.

Features:
- Dry-run mode by default, --execute to commit
- Per-deployment before/after snapshots with row counts
- Conservation checks (source + target = original)
- FK integrity and indirect access validation
- Shared resource handling (clone vs reassign devices, sites, S3 sources)
- Raw SQL for ProcessingService M2M to avoid ORM column mismatch

Co-Authored-By: Claude <noreply@anthropic.com>
Documents the full relationship map, edge cases, and validation checklist for moving deployments between projects. Co-Authored-By: Claude <noreply@anthropic.com>
The reassign_deployments command now also recalculates:
- Event cached counts (captures_count, detections_count, occurrences_count)
- Both source and target project related calculated fields

Co-Authored-By: Claude <noreply@anthropic.com>
- Renamed from reassign_deployments to move_project_data to better communicate the scope of the operation (all occurrences, identifications, classifications, etc. — not just deployment records)
- Added conditional scope warning that shows full data weight when processed data exists, or a simple note for unprocessed transfers
- Added identifier membership: users who made identifications on moved data are auto-added to the target project with Identifier role
- Added default filter config copy (score threshold, include/exclude taxa, default processing pipeline)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
- Identifier users now get their existing source project role preserved (e.g. ProjectManager stays ProjectManager), falling back to Identifier role only for users who aren't source project members
- TaxaLists linked to source project are now also linked to target project (M2M add, not move — TaxaLists can be shared)
- Dry-run output shows TaxaLists and role assignments that will be made

Co-Authored-By: Claude <noreply@anthropic.com>
📝 Walkthrough

Adds a new Django management command, `move_project_data`, for moving deployments and their related data between projects.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI
    participant CMD as move_project_data\n(Command)
    participant DB as Database
    participant S3 as S3 (physical data)
    participant PROC as ProcessingServices
    CLI->>CMD: invoke (args: source, target/create, deployment_ids, flags)
    CMD->>DB: validate source, deployments, collect snapshots
    CMD->>DB: begin transaction
    CMD->>DB: create/select target project
    CMD->>DB: clone/reassign shared resources (devices, sites, storage)
    CMD->>DB: update deployments.project_id and related FKs (events, images, occurrences, jobs)
    CMD->>DB: handle collections (split/clone/remove images)
    CMD->>DB: clone project pipeline configs & link processing services
    CMD->>DB: add taxa & identifier memberships to target
    CMD->>DB: commit transaction
    DB-->>CMD: transaction committed
    CMD->>DB: recompute cached fields, post-move validation
    CMD->>CLI: report results (pass/fail, diffs, errors)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (1 warning)
Pull request overview
Adds an operator-facing management command to move one or more deployments (and their dependent data) from a source project to a target/new project, plus a written guide documenting the relationship map and validation steps for doing so safely.
Changes:
- Added `move_project_data` Django management command implementing deployment reassignment with dry-run analysis, execution, and post-move validation.
- Added planning/ops documentation describing the full relationship map, edge cases (shared resources, mixed collections), and a validation checklist.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| docs/claude/planning/deployment-reassignment-guide.md | New guide documenting deployment reassignment relationships, strategies, and validation checklist. |
| ami/main/management/commands/move_project_data.py | New management command to analyze and execute moving deployments and related data between projects. |
| **S3StorageSource** | FK on Deployment (`data_source`) | Clone if `project_id` points to source project; update deployment FK |
| **Device** | FK on Deployment | Clone if `project_id` = source project; or set NULL (shared) |
| **Site** | FK on Deployment (`research_site`) | Clone if `project_id` = source project; or set NULL (shared) |
The guide’s reassignment strategy for S3StorageSource/Device/Site says “Clone if project_id points to source project”, but the implementation clones only when the resource is shared with other deployments and otherwise reassigns the same row to the target project. Update the guide to reflect the shared-vs-exclusive behavior to avoid operators expecting a clone in all cases.
Suggested change:

```diff
- | **S3StorageSource** | FK on Deployment (`data_source`) | Clone if `project_id` points to source project; update deployment FK |
- | **Device** | FK on Deployment | Clone if `project_id` = source project; or set NULL (shared) |
- | **Site** | FK on Deployment (`research_site`) | Clone if `project_id` = source project; or set NULL (shared) |
+ | **S3StorageSource** | FK on Deployment (`data_source`) | If shared with other deployments in the source project, clone and point the moved deployment to the clone; if exclusive to this deployment, reassign the existing row’s `project_id` to the target project and update the deployment FK |
+ | **Device** | FK on Deployment | If shared with other deployments in the source project, clone for the moved deployment; if exclusive, reassign the existing row’s `project_id` to the target project. For globally shared devices or devices with no project, you may instead set the deployment’s device FK to NULL |
+ | **Site** | FK on Deployment (`research_site`) | If shared with other deployments in the source project, clone for the moved deployment; if exclusive, reassign the existing row’s `project_id` to the target project. For shared/global sites, you may instead set the deployment’s site FK to NULL |
```
```python
# Update both projects' related calculated fields (events + deployments)
self.log(f"  Updating source project cached fields...")
source_project.update_related_calculated_fields()
self.log(f"  Source project '{source_project.name}': related fields updated")

self.log(f"  Updating target project cached fields...")
target_project.update_related_calculated_fields()
self.log(f"  Target project (id={target_id}): related fields updated")
```
Project.update_related_calculated_fields() iterates over all events and deployments in the project and calls .update_calculated_fields(save=True) per row (N+1 writes). On large projects this can dominate runtime even when only a few deployments were moved. Consider updating only affected deployments/events (you already have deployment_ids and moved_event_pks) or adding a bulk recalculation helper for project stats.
```python
# 15. Copy default filter config to target project
if include_taxa or exclude_taxa:
    for t in source_project.default_filters_include_taxa.all():
        target_project.default_filters_include_taxa.add(t)
    for t in source_project.default_filters_exclude_taxa.all():
        target_project.default_filters_exclude_taxa.add(t)
    self.log("     [15/16] Copied default filter taxa config")
target_project.default_filters_score_threshold = source_project.default_filters_score_threshold
if source_project.default_processing_pipeline:
    target_project.default_processing_pipeline = source_project.default_processing_pipeline
target_project.save()
```
Default filter taxa copying uses .add() on the target project without clearing any existing include/exclude taxa. If the target project already has defaults configured, this results in a merged/superset config rather than a true copy from the source. If the intent is to copy/replace, clear the target M2Ms first (or explicitly document the merge behavior).
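The merge-vs-copy distinction above comes down to Django's related-manager semantics: `add()` merges into existing links, while `set()` clears and replaces them. A toy stand-in class (not the real manager, just mirroring those semantics) makes the difference concrete:

```python
class FakeM2M:
    """Toy stand-in mirroring Django RelatedManager add()/set() semantics."""

    def __init__(self, initial):
        self.items = set(initial)

    def add(self, *pks):
        # add() merges: existing links are kept
        self.items |= set(pks)

    def set(self, pks):
        # set() replaces: existing links are cleared first
        self.items = set(pks)


target_defaults = FakeM2M({1, 2})   # target project already has defaults
target_defaults.add(3, 4)           # merge
merged = set(target_defaults.items)
target_defaults.set({3, 4})         # replace
replaced = set(target_defaults.items)
```

If a true copy is the intent, `target_project.default_filters_include_taxa.set(source_project.default_filters_include_taxa.all())` would express that directly; keeping `add()` only makes sense if merging is the documented behavior.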
```python
for coll, target_count, other_count in mixed_collections:
    moved_images = coll.images.filter(deployment_id__in=deployment_ids)
    moved_image_ids = list(moved_images.values_list("pk", flat=True))

    if clone_collections:
        new_coll = SourceImageCollection.objects.create(
            name=coll.name,
            project_id=target_id,
            description=coll.description or "",
        )
        new_coll.images.set(moved_image_ids)
        self.log(
            f"     [10/12] Split collection '{coll.name}': "
            f"cloned {len(moved_image_ids):,} images → target collection id={new_coll.pk}"
        )

    coll.images.remove(*moved_image_ids)
    self.log(f"     [10/12] Removed {len(moved_image_ids):,} images from source collection '{coll.name}'")
```
Collection splitting materializes all moved image PKs into Python and then calls coll.images.remove(*moved_image_ids). For large moves (tens of thousands of images) this can be very slow and may exceed DB/query parameter limits. Prefer operating on the M2M through table with queryset deletes/insert-select (e.g., delete rows where sourceimage__deployment_id__in=...) and, if cloning, bulk-create through rows in batches instead of passing a huge *args list.
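Whichever ORM path is chosen, the parameter-limit concern raised above can be handled by chunking the PK list. Here is a small stdlib batching helper; the commented Django usage is a sketch only, with through-model field names assumed rather than verified against this codebase:

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items, to keep each bulk
    DELETE/INSERT under database parameter limits."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


# Hypothetical Django usage (through-model and field names assumed):
# through = SourceImageCollection.images.through
# through.objects.filter(
#     sourceimagecollection=coll,
#     sourceimage__deployment_id__in=deployment_ids,
# ).delete()
# for chunk in batched(moved_image_ids, 1000):
#     through.objects.bulk_create(
#         through(sourceimagecollection_id=new_coll.pk, sourceimage_id=pk)
#         for pk in chunk
#     )
```

The queryset `delete()` on the through table avoids materializing PKs at all for the removal side; batching is only needed where explicit PK lists are unavoidable, as in the clone path.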
```python
cloned_count = 0
for config in ProjectPipelineConfig.objects.filter(project_id=source_project_id):
    if config.pipeline_id not in existing_pipelines:
        ProjectPipelineConfig.objects.create(project_id=target_id, pipeline_id=config.pipeline_id)
```
Pipeline config cloning only copies pipeline_id and ignores enabled and config fields on ProjectPipelineConfig. This will silently change pipeline behavior in the target project (e.g., disabled pipelines become enabled; custom config lost). Clone the full config (at least enabled and config) when creating target configs.
Suggested change:

```diff
- ProjectPipelineConfig.objects.create(project_id=target_id, pipeline_id=config.pipeline_id)
+ ProjectPipelineConfig.objects.create(
+     project_id=target_id,
+     pipeline_id=config.pipeline_id,
+     enabled=getattr(config, "enabled", True),
+     config=getattr(config, "config", None),
+ )
```
```python
dets_via_project = Detection.objects.filter(source_image__project_id=target_id).count()
dets_via_dep = Detection.objects.filter(source_image__deployment_id__in=deployment_ids).count()
if dets_via_project != dets_via_dep:
    errors.append(f"Detection count mismatch: via project={dets_via_project}, via deployment={dets_via_dep}")
    self.log(
        f"  FAIL: Detection count mismatch: via project={dets_via_project}, via deployment={dets_via_dep}"
    )
else:
    self.log(
        f"  OK: Detections consistent ({dets_via_project:,} via project, {dets_via_dep:,} via deployment)"
    )
```
dets_via_project counts all detections in the target project, then compares to detections for only the moved deployments. If the target project already has data, this check will fail (or worse, pass/fail for the wrong reasons). Scope the “via project” counts to the moved deployments as well (and similarly for classifications/identifications) so the validation actually verifies the moved slice.
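One way to express the scoping fix is to build both filter sets from the same deployment restriction, so the "via project" count can never include pre-existing target data. A sketch, with field paths copied from the quoted queries (not independently verified against the models):

```python
def scoped_detection_filters(target_id, deployment_ids):
    """Return (via_project, via_deployment) filter kwargs, both
    restricted to the moved deployments."""
    via_project = {
        "source_image__project_id": target_id,
        "source_image__deployment_id__in": deployment_ids,
    }
    via_deployment = {"source_image__deployment_id__in": deployment_ids}
    return via_project, via_deployment


# Hypothetical use in the validation step:
# via_project, via_dep = scoped_detection_filters(target_id, deployment_ids)
# dets_via_project = Detection.objects.filter(**via_project).count()
# dets_via_dep = Detection.objects.filter(**via_dep).count()
```

With both querysets sharing the deployment filter, a mismatch can only come from moved rows whose project FK was not updated, which is exactly what the check is meant to catch.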
```python
for taxon in Taxon.objects.filter(pk__in=taxa_ids):
    taxon.projects.add(target_project)
```
Linking taxa iterates and performs taxon.projects.add(target_project) per taxon. For large moves this becomes an N+1 write pattern. Prefer a bulk M2M add via the reverse relation (e.g., target_project.taxa.add(*taxa_ids)) or bulk-creating through rows in batches to reduce DB round-trips.
Suggested change:

```diff
- for taxon in Taxon.objects.filter(pk__in=taxa_ids):
-     taxon.projects.add(target_project)
+ # Bulk-link taxa to avoid N+1 writes on the Taxon.projects M2M
+ target_project.taxa.add(*taxa_ids)
```
```python
target_project = Project(name=create_project_name, owner=source_project.owner)
target_project.save()
```
When creating a new target project, this uses Project(...) + save(), which bypasses ProjectManager.create() and its create_related_defaults() behavior (default device/site/deployment/collection/processing service). This can leave the new project missing expected defaults and behaving differently from projects created elsewhere. Prefer Project.objects.create(..., create_defaults=...) (explicitly choose True/False) to match established project-creation semantics.
Suggested change:

```diff
- target_project = Project(name=create_project_name, owner=source_project.owner)
- target_project.save()
+ target_project = Project.objects.create(
+     name=create_project_name,
+     owner=source_project.owner,
+     create_defaults=True,
+ )
```
```python
if cls_via_project != cls_via_dep:
    errors.append(
        f"Classification count mismatch: via project={cls_via_project}, via deployment={cls_via_dep}"
    )
```
On classification mismatch, an error is appended but no FAIL log line is emitted (unlike detections/identifications). This makes validation output inconsistent and can hide the specific failure in logs. Add a self.log("FAIL: ...") branch here similar to the detections check.
Suggested change:

```diff
      )
+     self.log(
+         f"  FAIL: Classification count mismatch: via project={cls_via_project}, via deployment={cls_via_dep}"
+     )
```
```python
else:
    self.log("  ALL VALIDATION CHECKS PASSED", style=self.style.SUCCESS)
    self.log(f"{'=' * 60}")
```
If validation fails, the command only logs errors but does not raise an exception / return a non-zero exit code. That makes failures easy to miss in automation and contradicts the “rolls back cleanly on any failure” expectation—especially since validation happens after the atomic block has committed. Consider moving validation inside the transaction (so you can raise and rollback) and/or raising CommandError at the end when errors is non-empty.
Suggested change:

```diff
+     self.log(f"{'=' * 60}")
+     raise CommandError("Post-move validation failed; see log output above for details.")
  else:
      self.log("  ALL VALIDATION CHECKS PASSED", style=self.style.SUCCESS)
      self.log(f"{'=' * 60}")
```
Design spec for making projects portable between Antenna instances. Covers UUID fields, Organization model, natural keys, export/import commands, and Darwin Core integration. Includes 21 research areas spanning internal data validation, biodiversity standards (GBIF, iNaturalist, BOLD, Camtrap DP), and patterns from non-biodiversity applications (GitLab, WordPress, Notion, Jira). Status: draft — pending research and validation. Co-Authored-By: Claude <noreply@anthropic.com>
Documents concrete use cases for each model's UUID beyond export/import: Darwin Core field mappings (occurrenceID, eventID, etc.), ML pipeline coordination (Pipeline/Algorithm slug collision risk), device tracking across institutions, and scientific reproducibility requirements. Co-Authored-By: Claude <noreply@anthropic.com>
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/main/management/commands/move_project_data.py`:
- Line 344: The code is logging user.email via self.log(...) and sending it to
logger.info, which exposes PII; update the occurrences that reference user.email
(the self.log(...) call and the logger.info usage) to avoid storing emails by
either logging a non-PII identifier (e.g., user.id or user.pk), a masked email
(e.g., mask local part), or a stable hash of the email, while preserving
role_to_assign.display_name and role_source in the message; ensure both the
self.log(...) invocation and the corresponding logger.info(...) call are changed
consistently to use the non-PII value and not user.email.
- Line 333: These lines use f-strings with no interpolation (triggering F541)
and likely loop-captured variables in inner scopes (B007); replace f-string
literals like self.log(f"\n Identifiers (users with identifications on moved
data):") with plain string literals (self.log("\n Identifiers (users with
identifications on moved data):")) and similarly update the other occurrences at
the reported locations (lines showing self.log(f"...")); search for all
self.log(f"...") instances in move_project_data.py and convert those with no
placeholders to normal strings, and where B007 arises, ensure any loop variables
used inside nested functions/comprehensions are passed as defaults or copied
into a new local variable before inner usage.
- Around line 511-515: The cloned SourceImageCollection created in
SourceImageCollection.objects.create currently only copies name, project_id and
description which drops sampling metadata; update the creation to also copy the
sampling fields from the original collection by including method and kwargs
(e.g., set method=coll.method and kwargs=coll.kwargs) so the new_coll preserves
the original collection semantics when moved.
- Around line 175-187: The command currently prefers create_project when both
--create-project and --target-project are passed; update the options handling in
move_project_data (around create_project_name, target_project logic) to detect
when both options are provided and raise a CommandError (e.g., "Specify only one
of --target-project or --create-project") instead of silently preferring
create_project; keep the existing branches for the single-option cases
(resolving target_project via Project.objects.get and logging, or setting
target_project=None and create_project_name) unchanged.
- Around line 695-723: The validation wrongly compares all records in the target
project to only the moved deployments; update the *_via_project queries
(dets_via_project, cls_via_project, idents_via_project) to also filter by
deployment_ids so they count only records for the moved deployments in the
target project (e.g. add source_image__deployment_id__in=deployment_ids to
Detection/Classifications filters and
occurrence__deployment_id__in=deployment_ids to Identification filters while
keeping source_image__project_id=target_id / occurrence__project_id=target_id).
In `@docs/claude/planning/deployment-reassignment-guide.md`:
- Around line 36-37: Update the wording so it matches the actual command
behavior: change the Device and Site guidance that currently suggests setting
their FK to NULL to instead state that the command clones or reassigns Device
and Site records (do not describe NULL as an option), and update the
project-creation description to indicate the command only sets the target
project's name and owner (omit mention of description). Ensure the edits touch
the same Device/Site lines and the project creation sentences referenced (the
blocks around the Device/Site rows and the project creation paragraph) so the
doc reflects cloned/reassigned ownership and limited project fields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fbde3add-8c7c-4079-bf02-a2aa73a1845a
📒 Files selected for processing (3)
- ami/main/management/commands/move_project_data.py
- docs/claude/planning/deployment-reassignment-guide.md
- docs/claude/planning/project-portability-spec.md
```python
role_to_assign = Identifier
role_source = "default (not a source member)"
identifier_role_map[uid] = role_to_assign
self.log(f"    {user.email}: " f"{role_to_assign.display_name} ({role_source})")
```
Avoid logging user emails to application logs.
Line 344 and Line 567 log email addresses and also send them to logger.info. This creates avoidable PII retention in centralized logs.
Proposed fix:

```diff
- self.log(f"    {user.email}: " f"{role_to_assign.display_name} ({role_source})")
+ self.log(f"    user_id={user.pk}: {role_to_assign.display_name} ({role_source})")
...
- self.log(f"     [13/16] Added {user.email}" f" as {role_cls.display_name}")
+ self.log(f"     [13/16] Added user_id={user.pk} as {role_cls.display_name}")
```

Also applies to: 567-567
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ami/main/management/commands/move_project_data.py` at line 344, The code is
logging user.email via self.log(...) and sending it to logger.info, which
exposes PII; update the occurrences that reference user.email (the self.log(...)
call and the logger.info usage) to avoid storing emails by either logging a
non-PII identifier (e.g., user.id or user.pk), a masked email (e.g., mask local
part), or a stable hash of the email, while preserving
role_to_assign.display_name and role_source in the message; ensure both the
self.log(...) invocation and the corresponding logger.info(...) call are changed
consistently to use the non-PII value and not user.email.
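If a masked address is preferred over a bare `user.pk` (for example, to keep logs human-scannable), a small helper along these lines would work; the exact masking policy here is an assumption, not something the codebase defines:

```python
def mask_email(email: str) -> str:
    """Keep the first character of the local part and the full domain,
    masking the rest of the local part."""
    local, _, domain = email.partition("@")
    if not domain:
        # Not a recognizable address; mask everything
        return "***"
    return f"{local[:1]}***@{domain}"


# Hypothetical use in the command's logging:
# self.log(f"    {mask_email(user.email)}: {role_to_assign.display_name} ({role_source})")
```

A masked address still narrows the identity space, so for strict PII policies logging only `user.pk` remains the safer choice.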
| **Device** | FK on Deployment | Clone if `project_id` = source project; or set NULL (shared) |
| **Site** | FK on Deployment (`research_site`) | Clone if `project_id` = source project; or set NULL (shared) |
Guide should align with implemented command behavior for ownership handling.
This section currently states Device/Site can be set to NULL and that project creation includes description, but the command clones/reassigns Device/Site and creates target project with name/owner only. Please align wording to avoid operational confusion.
Also applies to: 49-50, 69-75
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/claude/planning/deployment-reassignment-guide.md` around lines 36 - 37,
Update the wording so it matches the actual command behavior: change the Device
and Site guidance that currently suggests setting their FK to NULL to instead
state that the command clones or reassigns Device and Site records (do not
describe NULL as an option), and update the project-creation description to
indicate the command only sets the target project's name and owner (omit mention
of description). Ensure the edits touch the same Device/Site lines and the
project creation sentences referenced (the blocks around the Device/Site rows
and the project creation paragraph) so the doc reflects cloned/reassigned
ownership and limited project fields.
Fixes from Copilot and CodeRabbit review:
- Mutual exclusion check for --target-project / --create-project
- Pipeline config clone preserves enabled and config fields
- Collection clone preserves method and kwargs fields
- Validation queries scoped to moved deployments (not entire target)
- Validation failure raises CommandError (triggers transaction rollback)
- Project creation uses ProjectManager (create_defaults=True)
- Bulk taxa linking via target_project.taxa.add(*taxa_ids)
- Remove unused Taxon import and f-prefix on plain strings

Add 37 automated tests covering:
- Basic move, dry run, --create-project
- All 6 error handling paths
- Shared resource clone-vs-reassign (Device, Site, S3StorageSource)
- Collection split/reassign logic
- Pipeline config cloning
- ProcessingService linking
- Identifier role preservation
- TaxaList linking, default filter copying
- Edge cases (empty deployment, move all, multiple deployments)

Co-Authored-By: Claude <noreply@anthropic.com>
Actionable comments posted: 4
♻️ Duplicate comments (1)
ami/main/management/commands/move_project_data.py (1)
346-346: ⚠️ Potential issue | 🟠 Major: Avoid logging identifier email addresses.

These two lines still emit `user.email` to both command output and `logger.info`, which creates unnecessary PII retention for an admin migration. Log `user.pk` or a masked address instead. Also applies to: 576-576
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/main/management/commands/move_project_data.py` at line 346, The log currently prints full PII via user.email in the self.log call; change it to use a non-PII identifier (e.g., user.pk) or a masked email instead of user.email. Update the occurrences where user.email is used in move_project_data.py (the self.log call that formats f"{user.email}: {role_to_assign.display_name} ({role_source})" and the similar occurrence around line 576) to log user.pk or a masked value while preserving role_to_assign.display_name and role_source.
🧹 Nitpick comments (1)
ami/main/tests/test_move_project_data.py (1)
325-362: Replace the `msg=` parameter with `assertRaisesRegex` to actually verify exception messages.

The `msg=` parameter in `assertRaises` is only used when the assertion fails (i.e., no exception is raised). It does not verify the exception message contents. Use `assertRaisesRegex` instead to match both the exception type and message text.

Example fix:

```diff
- with self.assertRaises(CommandError, msg="does not exist"):
+ with self.assertRaisesRegex(CommandError, "does not exist"):
      _run_command(*self._base_args(), "--target-project", "99999")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/main/tests/test_move_project_data.py` around lines 325 - 362, Replace uses of self.assertRaises(..., msg="...") with self.assertRaisesRegex(CommandError, r"pattern") in the tests listed (test_source_project_not_found, test_target_project_not_found, test_deployment_not_found, test_deployment_wrong_project, test_both_target_and_create, test_neither_target_nor_create) so the exception message is actually verified; for example call self.assertRaisesRegex(CommandError, r"Source project 99999 does not exist") around the _run_command in test_source_project_not_found and use appropriate regexes like r"does not exist", r"not found", r"not in source project", r"not both", and r"Must specify" in the corresponding tests when invoking _run_command.
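The difference is easy to demonstrate outside Django: `assertRaisesRegex` checks the exception message, while `assertRaises(..., msg=...)` only customizes the failure text shown when no exception is raised at all. A self-contained illustration:

```python
import unittest


class DemoError(Exception):
    pass


class RegexAssertionDemo(unittest.TestCase):
    def test_message_is_checked(self):
        # Passes only because the message matches the regex;
        # a different message would fail this test.
        with self.assertRaisesRegex(DemoError, "does not exist"):
            raise DemoError("Source project 99999 does not exist")

    def test_msg_param_ignores_message(self):
        # msg= does NOT inspect the exception message; this passes
        # even though "does not exist" never appears in the error.
        with self.assertRaises(DemoError, msg="does not exist"):
            raise DemoError("something else entirely")
```

Both tests pass, which is exactly the problem the review flags: `msg=` gives a false sense that the message was verified.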
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/main/management/commands/move_project_data.py`:
- Around line 593-603: The current copy step merges taxa and only conditionally
copies the pipeline, allowing stale values to persist; instead clear and replace
the target's defaults: call clear() on
target_project.default_filters_include_taxa and
target_project.default_filters_exclude_taxa, then add all taxa from
source_project.default_filters_include_taxa and
source_project.default_filters_exclude_taxa; always assign
target_project.default_filters_score_threshold =
source_project.default_filters_score_threshold and unconditionally set
target_project.default_processing_pipeline =
source_project.default_processing_pipeline (even if None) before saving the
project in move_project_data.
- Around line 143-166: The deployment ID parsing is brittle: parse
options["deployment_ids"] into a cleaned, deduplicated list (e.g., split on ',',
strip tokens, ignore empty tokens, attempt int() inside a try/except to raise
CommandError on non-integer values) and assign to a new variable like
deployment_ids_normalized (or replace deployment_ids); then use
Deployment.objects.filter(pk__in=deployment_ids_normalized) and compare
deployments.count() against len(deployment_ids_normalized) (not the raw parsed
list) so duplicates/trailing commas don't cause false failures; ensure any
ValueError from int() is converted into a CommandError with a clear message.
- Around line 173-189: After resolving target_project (after the block that sets
target_project from options or when create_project_name is None), add a
validation that rejects using the same project as both source and target: check
options.get("source_project") (or the earlier source_project variable if
present) against the resolved target_project.pk and raise CommandError("Target
project must be different from source project") if they match; ensure this check
runs only when target_project is not None (i.e., when --target-project was
provided).
- Line 411: The validation that raises CommandError must run inside the same
transaction.atomic() so failures roll back; move the validation logic currently
after the transaction.atomic() block (the block starting at the
transaction.atomic() call and ending at that block's closing) into the atomic
block before it exits—specifically relocate the validation checks (the section
that raises CommandError) into the transaction.atomic() scope in
move_project_data.py so any CommandError thrown triggers a rollback; ensure all
subsequent writes and the commit point occur only after validation completes
successfully.
---
Duplicate comments:
In `@ami/main/management/commands/move_project_data.py`:
- Line 346: The log currently prints full PII via user.email in the self.log
call; change it to use a non-PII identifier (e.g., user.pk) or a masked email
instead of user.email. Update the occurrences where user.email is used in
move_project_data.py (the self.log call that formats f"{user.email}:
{role_to_assign.display_name} ({role_source})" and the similar occurrence around
line 576) to log user.pk or a masked value while preserving
role_to_assign.display_name and role_source.
---
Nitpick comments:
In `@ami/main/tests/test_move_project_data.py`:
- Around line 325-362: Replace uses of self.assertRaises(..., msg="...") with
self.assertRaisesRegex(CommandError, r"pattern") in the tests listed
(test_source_project_not_found, test_target_project_not_found,
test_deployment_not_found, test_deployment_wrong_project,
test_both_target_and_create, test_neither_target_nor_create) so the exception
message is actually verified; for example call
self.assertRaisesRegex(CommandError, r"Source project 99999 does not exist")
around the _run_command in test_source_project_not_found and use appropriate
regexes like r"does not exist", r"not found", r"not in source project", r"not
both", and r"Must specify" in the corresponding tests when invoking
_run_command.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 21aec0e3-2104-4d9f-abb2-c75687561306
📒 Files selected for processing (3)
- ami/main/management/commands/move_project_data.py
- ami/main/tests/__init__.py
- ami/main/tests/test_move_project_data.py
```python
source_project_id = options["source_project"]
deployment_ids = [int(x.strip()) for x in options["deployment_ids"].split(",")]
execute = options["execute"]
clone_pipelines = not options["no_clone_pipelines"]
clone_collections = not options["no_clone_collections"]

mode = "EXECUTE" if execute else "DRY RUN"
self.log(f"\n{'=' * 60}")
self.log(f" MOVE PROJECT DATA — {mode}")
self.log(f"{'=' * 60}")

# --- Validate inputs ---
try:
    source_project = Project.objects.get(pk=source_project_id)
except Project.DoesNotExist:
    raise CommandError(f"Source project {source_project_id} does not exist")

self.log(f"\nSource project: {source_project.name} (id={source_project.pk})")

deployments = Deployment.objects.filter(pk__in=deployment_ids)
if deployments.count() != len(deployment_ids):
    found = set(deployments.values_list("pk", flat=True))
    missing = set(deployment_ids) - found
    raise CommandError(f"Deployments not found: {missing}")
```
Normalize --deployment-ids before the existence check.
Line 163 compares a deduplicated queryset count against the raw parsed list length, so --deployment-ids 60,60 currently fails with Deployments not found: set(). A trailing comma or non-integer token will also leak a raw ValueError instead of a CommandError.
Minimal fix

```diff
-        deployment_ids = [int(x.strip()) for x in options["deployment_ids"].split(",")]
+        raw_ids = [x.strip() for x in options["deployment_ids"].split(",") if x.strip()]
+        try:
+            deployment_ids = list(dict.fromkeys(int(x) for x in raw_ids))
+        except ValueError as err:
+            raise CommandError("--deployment-ids must be a comma-separated list of integers") from err
+        if not deployment_ids:
+            raise CommandError("At least one deployment ID must be provided")
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
source_project_id = options["source_project"]
raw_ids = [x.strip() for x in options["deployment_ids"].split(",") if x.strip()]
try:
    deployment_ids = list(dict.fromkeys(int(x) for x in raw_ids))
except ValueError as err:
    raise CommandError("--deployment-ids must be a comma-separated list of integers") from err
if not deployment_ids:
    raise CommandError("At least one deployment ID must be provided")
execute = options["execute"]
clone_pipelines = not options["no_clone_pipelines"]
clone_collections = not options["no_clone_collections"]

mode = "EXECUTE" if execute else "DRY RUN"
self.log(f"\n{'=' * 60}")
self.log(f" MOVE PROJECT DATA — {mode}")
self.log(f"{'=' * 60}")

# --- Validate inputs ---
try:
    source_project = Project.objects.get(pk=source_project_id)
except Project.DoesNotExist:
    raise CommandError(f"Source project {source_project_id} does not exist")

self.log(f"\nSource project: {source_project.name} (id={source_project.pk})")

deployments = Deployment.objects.filter(pk__in=deployment_ids)
if deployments.count() != len(deployment_ids):
    found = set(deployments.values_list("pk", flat=True))
    missing = set(deployment_ids) - found
    raise CommandError(f"Deployments not found: {missing}")
```
🧰 Tools
🪛 Ruff (0.15.7)
[warning] 158-158: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ami/main/management/commands/move_project_data.py` around lines 143 - 166,
The deployment ID parsing is brittle: parse options["deployment_ids"] into a
cleaned, deduplicated list (e.g., split on ',', strip tokens, ignore empty
tokens, attempt int() inside a try/except to raise CommandError on non-integer
values) and assign to a new variable like deployment_ids_normalized (or replace
deployment_ids); then use
Deployment.objects.filter(pk__in=deployment_ids_normalized) and compare
deployments.count() against len(deployment_ids_normalized) (not the raw parsed
list) so duplicates/trailing commas don't cause false failures; ensure any
ValueError from int() is converted into a CommandError with a clear message.
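The normalization itself can be exercised outside Django; `dict.fromkeys` deduplicates while preserving order. A sketch using a plain `ValueError` in place of `CommandError`:

```python
def parse_deployment_ids(raw: str) -> list[int]:
    """Parse a comma-separated ID list: strip tokens, drop empties, dedupe, keep order."""
    tokens = [t.strip() for t in raw.split(",") if t.strip()]
    try:
        ids = [int(t) for t in tokens]
    except ValueError as err:
        # In the management command this would be a CommandError
        raise ValueError("deployment ids must be a comma-separated list of integers") from err
    if not ids:
        raise ValueError("at least one deployment ID must be provided")
    # dict.fromkeys keeps first-seen order while removing duplicates
    return list(dict.fromkeys(ids))
```

With this in place, `--deployment-ids 60,60` resolves to a single deployment instead of tripping the existence check.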
```python
# Target project resolution
create_project_name = options.get("create_project")
if create_project_name and options.get("target_project"):
    raise CommandError("Use either --target-project or --create-project, not both")

if create_project_name:
    self.log(f"Target project: NEW — '{create_project_name}'")
    target_project = None
elif options.get("target_project"):
    try:
        target_project = Project.objects.get(pk=options["target_project"])
    except Project.DoesNotExist:
        raise CommandError(f"Target project {options['target_project']} does not exist")
    create_project_name = None
    self.log(f"Target project: {target_project.name} (id={target_project.pk})")
else:
    raise CommandError("Must specify either --target-project or --create-project")
```
Reject --target-project equal to --source-project.
When both IDs match, this command is not a no-op: the later clone/split logic can still duplicate shared devices/sites/S3 sources and split mixed collections inside the same project. Fail fast once the target is resolved.
Minimal fix

```diff
         elif options.get("target_project"):
             try:
                 target_project = Project.objects.get(pk=options["target_project"])
             except Project.DoesNotExist:
                 raise CommandError(f"Target project {options['target_project']} does not exist")
+            if target_project.pk == source_project_id:
+                raise CommandError("--target-project must differ from --source-project")
             create_project_name = None
             self.log(f"Target project: {target_project.name} (id={target_project.pk})")
```

🧰 Tools
🪛 Ruff (0.15.7)
[warning] 185-185: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ami/main/management/commands/move_project_data.py` around lines 173 - 189,
After resolving target_project (after the block that sets target_project from
options or when create_project_name is None), add a validation that rejects
using the same project as both source and target: check
options.get("source_project") (or the earlier source_project variable if
present) against the resolved target_project.pk and raise CommandError("Target
project must be different from source project") if they match; ensure this check
runs only when target_project is not None (i.e., when --target-project was
provided).
```python
self.log(" EXECUTING MOVE")
self.log(f"{'─' * 60}")

with transaction.atomic():
```
🧩 Analysis chain
🌐 Web query:
In Django, does raising an exception after leaving a transaction.atomic() block roll back the writes made inside that block, or only exceptions raised before the block exits?
💡 Result:
No, raising an exception after leaving a transaction.atomic block does not roll back the writes made inside that block. The transaction is committed upon successful (normal) exit from the block, before any subsequent code executes. Only exceptions raised before or during the block's exit trigger a rollback.
Citations:
- 1: https://docs.djangoproject.com/en/4.2/topics/db/transactions
- 2: https://docs.djangoproject.com/en/6.0/topics/db/transactions/
- 3: https://docs.djangoproject.com/en/4.1/_modules/django/db/transaction
- 4: https://linkedin.com/company/block-%26-block-cpa
- 5: https://django.fun/docs/django/4.0/_modules/django/db/transaction
🏁 Script executed:

```shell
# First, let's find the file and check its size
find . -name "move_project_data.py" -type f
```

Repository: RolnickLab/antenna
Length of output: 113

🏁 Script executed:

```shell
# Read the relevant sections to understand the transaction.atomic() block structure
# Lines around 411 to see the atomic block opening
sed -n '405,420p' ami/main/management/commands/move_project_data.py
```

Repository: RolnickLab/antenna
Length of output: 805

🏁 Script executed:

```shell
# Read lines 683-787 to see the validation code
sed -n '680,800p' ami/main/management/commands/move_project_data.py
```

Repository: RolnickLab/antenna
Length of output: 5703

🏁 Script executed:

```shell
# Find the exact closing of the atomic block by checking indentation levels
# Look for the first line after 411 that has less indentation than the block content
sed -n '411,800p' ami/main/management/commands/move_project_data.py | head -200 | tail -100
```

Repository: RolnickLab/antenna
Length of output: 5279
🏁 Script executed:

```shell
# Let's get a better view - look at the structure around line 411 and find where atomic block closes
python3 << 'EOF'
with open('ami/main/management/commands/move_project_data.py', 'r') as f:
    lines = f.readlines()
# Find line 411 and trace indentation levels
in_atomic = False
atomic_indent = None
for i in range(410, min(800, len(lines))):
    line = lines[i]
    # Count leading spaces
    spaces = len(line) - len(line.lstrip())
    # Print with line number for reference
    if i >= 410 and i <= 420:
        print(f"{i+1:4d} ({spaces:2d} sp): {line.rstrip()}")
    if 'transaction.atomic' in line:
        in_atomic = True
        atomic_indent = spaces
        print(f"\n>>> ATOMIC BLOCK OPENS at line {i+1} with {spaces} spaces indentation")
    # Check if we've left the atomic block
    if in_atomic and atomic_indent is not None and spaces <= atomic_indent and 'transaction.atomic' not in line and line.strip():
        print(f"\n>>> ATOMIC BLOCK LIKELY CLOSES BEFORE line {i+1} (back to {spaces} space indent)")
        break
EOF
```

Repository: RolnickLab/antenna
Length of output: 981

🏁 Script executed:

```shell
# Let's find the exact end of the atomic block by looking for the unindent
sed -n '411,900p' ami/main/management/commands/move_project_data.py | grep -n "^[^ ]" | head -5
```

Repository: RolnickLab/antenna
Length of output: 44

🏁 Script executed:

```shell
# Let's see exactly where the atomic block closes by looking at lines around 600-610
sed -n '598,612p' ami/main/management/commands/move_project_data.py
```

Repository: RolnickLab/antenna
Length of output: 910

🏁 Script executed:

```shell
# Now let's see where the validation code (lines 683-787) sits
# Check the indentation of the validation code
sed -n '680,690p' ami/main/management/commands/move_project_data.py
```

Repository: RolnickLab/antenna
Length of output: 441
🏁 Script executed:

```shell
# Get a clearer picture of the structure
python3 << 'EOF'
with open('ami/main/management/commands/move_project_data.py', 'r') as f:
    lines = f.readlines()
# Track the atomic block
atomic_start = None
atomic_end = None
validation_start = None
for i in range(410, min(800, len(lines))):
    line = lines[i]
    spaces = len(line) - len(line.lstrip())
    if 'with transaction.atomic' in line:
        atomic_start = i + 1
        print(f"Line {i+1}: Atomic block OPENS (indent: {spaces})")
    if atomic_start and spaces <= 8 and 'with transaction.atomic' not in line and line.strip() and i > 411:
        if atomic_end is None:
            atomic_end = i + 1
            print(f"Line {i+1}: Atomic block CLOSES (back to indent: {spaces})")
            print(f"  Content: {line.rstrip()}")
    if 'VALIDATION' in line and 'Post-move' not in line:
        if validation_start is None:
            validation_start = i + 1
            print(f"Line {i+1}: Validation section starts (indent: {spaces})")
            print(f"  Content: {line.rstrip()}")
print(f"\nSummary:")
print(f"  Atomic block: lines {atomic_start} to {atomic_end if atomic_end else '?'}")
print(f"  Validation: line {validation_start}")
print(f"  Validation INSIDE atomic: {atomic_start <= validation_start < atomic_end if atomic_end else 'UNKNOWN'}")
EOF
```

Repository: RolnickLab/antenna
Length of output: 445

🏁 Script executed:

```shell
# Let's find the exact CommandError raise in the validation section
sed -n '685,800p' ami/main/management/commands/move_project_data.py | grep -n "raise CommandError"
```

Repository: RolnickLab/antenna
Length of output: 162

🏁 Script executed:

```shell
# Get the context around the CommandError raise
sed -n '750,790p' ami/main/management/commands/move_project_data.py
```

Repository: RolnickLab/antenna
Length of output: 2076
Validation must occur before the transaction commits.
The transaction.atomic() block (lines 411–606) closes before the validation section (lines 685–787). If validation fails and raises CommandError, the exception occurs outside the atomic block where writes are already committed. According to Django's transaction semantics, only exceptions raised during or before the block's normal exit trigger a rollback. This breaks the all-or-nothing safety guarantee—the move persists even if validation detects integrity failures.
Move the validation logic inside the atomic block (before line 606) so that validation failures can trigger a rollback.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ami/main/management/commands/move_project_data.py` at line 411, The
validation that raises CommandError must run inside the same
transaction.atomic() so failures roll back; move the validation logic currently
after the transaction.atomic() block (the block starting at the
transaction.atomic() call and ending at that block's closing) into the atomic
block before it exits—specifically relocate the validation checks (the section
that raises CommandError) into the transaction.atomic() scope in
move_project_data.py so any CommandError thrown triggers a rollback; ensure all
subsequent writes and the commit point occur only after validation completes
successfully.
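Django's commit-on-clean-exit behaviour can be illustrated without a database. The toy context manager below is an analogue of `transaction.atomic()` (not Django's implementation): it snapshots state on entry, rolls back on exception, and "commits" on clean exit — so a validation failure raised after the block leaves the writes in place:

```python
class Atomic:
    """Toy analogue of transaction.atomic(): commit on clean exit, roll back on exception."""
    def __init__(self, db: list):
        self.db = db
    def __enter__(self):
        self._snapshot = list(self.db)
        return self
    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self.db[:] = self._snapshot  # roll back writes made inside the block
        return False  # re-raise any exception

db = []

# Validation INSIDE the block: failure rolls everything back
try:
    with Atomic(db):
        db.append("moved row")
        raise RuntimeError("validation failed")
except RuntimeError:
    pass
inside_result = list(db)  # the write was undone

# Validation AFTER the block: writes are already "committed"
try:
    with Atomic(db):
        db.append("moved row")
    raise RuntimeError("validation failed")
except RuntimeError:
    pass
after_result = list(db)  # the write persists despite the failure
```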
```python
# 15. Copy default filter config to target project
if include_taxa or exclude_taxa:
    for t in source_project.default_filters_include_taxa.all():
        target_project.default_filters_include_taxa.add(t)
    for t in source_project.default_filters_exclude_taxa.all():
        target_project.default_filters_exclude_taxa.add(t)
    self.log(" [15/16] Copied default filter taxa config")
target_project.default_filters_score_threshold = source_project.default_filters_score_threshold
if source_project.default_processing_pipeline:
    target_project.default_processing_pipeline = source_project.default_processing_pipeline
target_project.save()
```
Replace the target filter defaults instead of merging them.
This block only appends source include/exclude taxa to whatever the target already has, and Line 601 only overwrites default_processing_pipeline when the source has one. For an existing target project, stale filters or a stale default pipeline can survive even though this step is supposed to copy the source defaults.
Minimal fix

```diff
-        if include_taxa or exclude_taxa:
-            for t in source_project.default_filters_include_taxa.all():
-                target_project.default_filters_include_taxa.add(t)
-            for t in source_project.default_filters_exclude_taxa.all():
-                target_project.default_filters_exclude_taxa.add(t)
-            self.log(" [15/16] Copied default filter taxa config")
+        target_project.default_filters_include_taxa.set(
+            source_project.default_filters_include_taxa.all()
+        )
+        target_project.default_filters_exclude_taxa.set(
+            source_project.default_filters_exclude_taxa.all()
+        )
+        self.log(" [15/16] Copied default filter taxa config")
         target_project.default_filters_score_threshold = source_project.default_filters_score_threshold
-        if source_project.default_processing_pipeline:
-            target_project.default_processing_pipeline = source_project.default_processing_pipeline
+        target_project.default_processing_pipeline = source_project.default_processing_pipeline
         target_project.save()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ami/main/management/commands/move_project_data.py` around lines 593 - 603,
The current copy step merges taxa and only conditionally copies the pipeline,
allowing stale values to persist; instead clear and replace the target's
defaults: call clear() on target_project.default_filters_include_taxa and
target_project.default_filters_exclude_taxa, then add all taxa from
source_project.default_filters_include_taxa and
source_project.default_filters_exclude_taxa; always assign
target_project.default_filters_score_threshold =
source_project.default_filters_score_threshold and unconditionally set
target_project.default_processing_pipeline =
source_project.default_processing_pipeline (even if None) before saving the
project in move_project_data.
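The merge-vs-replace distinction can be shown with a toy stand-in for a Django many-to-many manager (not the real ORM API, just the semantics): `add()` accumulates on top of existing rows, while `set()` makes the target mirror the source exactly:

```python
class M2MField:
    """Toy stand-in for a Django M2M manager: add() merges, set() replaces."""
    def __init__(self, initial=()):
        self._items = set(initial)
    def add(self, *items):
        self._items.update(items)
    def set(self, items):
        self._items = set(items)
    def all(self):
        return sorted(self._items)

source_include = M2MField({"Lepidoptera"})       # hypothetical source config
target_include = M2MField({"StaleTaxon"})        # pre-existing target config

# Merging (the current behaviour): stale target entries survive
merged = M2MField(target_include.all())
for t in source_include.all():
    merged.add(t)

# Replacing (the suggested .set()): target mirrors the source exactly
target_include.set(source_include.all())
```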
Summary
Management command to move deployments and all associated data from one project to another. Motivated by a request to move northern Quebec stations (Umiujaq, Kuujjuaq, Kangiqsujuaq, Inukjuak) from "Insectarium de Montreal" into a new "Nunavik" project.
This is a comprehensive data transfer that handles 16 categories of related data:
Usage
```bash
# Dry run (default) — shows full before/after analysis
python manage.py move_project_data --source-project 23 --create-project "Nunavik" --deployment-ids 84,163,284

# Execute
python manage.py move_project_data --source-project 23 --create-project "Nunavik" --deployment-ids 84,163,284 --execute
```
Features
Fixes from review
- `--target-project`/`--create-project`
- `enabled` and `config` fields
- `method` and `kwargs` fields
- `CommandError` (rolls back transaction)
- `Project.objects.create(create_defaults=True)` via ProjectManager
- `target_project.taxa.add(*taxa_ids)`
- `Taxon` import and f-prefix on strings without placeholders

Testing & validation
Automated tests (37 tests)
Test file: `ami/main/tests/test_move_project_data.py`

```bash
docker compose -f docker-compose.ci.yml run --rm django python manage.py test ami.main.tests.test_move_project_data --keepdb -v2
```
- TestMoveToExistingProject
- TestDryRun (`--execute`)
- TestCreateProject (`--create-project` flag, member copying)
- TestErrorHandling
- TestSharedResourceCloning
- TestCollectionHandling (`--no-clone-collections`)
- TestPipelineConfigCloning (`--no-clone-pipelines`)
- TestProcessingServiceLinking
- TestIdentifierRolePreservation
- TestTaxaListLinking
- TestDefaultFilterCopying
- TestEdgeCases

Manual validation (pre-production)
Before running on production data, validate on a staging DB or local backup:
Dry run first — review the full output, check per-deployment counts match expectations
```bash
python manage.py move_project_data --source-project 23 --create-project "Nunavik" --deployment-ids 84,163,284
```
Execute — review the 17-check validation output at the end
```bash
python manage.py move_project_data --source-project 23 --create-project "Nunavik" --deployment-ids 84,163,284 --execute
```
Post-move spot checks in Django admin or shell:
If validation fails, the transaction is rolled back automatically and a `CommandError` is raised with details. No manual cleanup needed.

Remaining nits

- Step counter jumps from `[12/12]` to `[13/16]` — cosmetic only, reflects that steps 13-16 (identifiers, TaxaLists, default filters) were added after the original 12-step plan
Summary by CodeRabbit
New Features
Documentation
Tests