Skip to content

fix(storage): prevent history wipe on transient dataStore errors#1052

Open
ifoche wants to merge 1 commit into
developmentfrom
fix/history-wipe-on-storage-error
Open

fix(storage): prevent history wipe on transient dataStore errors#1052
ifoche wants to merge 1 commit into
developmentfrom
fix/history-wipe-on-storage-error

Conversation

@ifoche

@ifoche ifoche commented Mar 4, 2026

Copy link
Copy Markdown
Member

Summary

  • Fix StorageDataStoreClient.getObject() silently returning undefined on all errors (network, 5xx, timeout), which was indistinguishable from "key not found" (404)
  • This caused sync history to be wiped whenever a transient error occurred during a saveObjectInCollection call — the existing history was treated as empty and overwritten with just the new entry
  • Now getObject returns undefined only for 404 and re-throws on transient errors. Same fix applied to getMetadataByKey()
  • StorageConstantClient was verified to not have this bug (it uses the models API which returns empty arrays, not 404s)
  • Adds 6 unit tests for the error handling behavior

Related Tasks

Test plan

  • Run yarn test — all 394 tests pass (48 files)
  • Run yarn build core-app — builds cleanly
  • Manual: trigger a sync while the DHIS2 server is under load; verify history is preserved even if some requests are slow
  • Manual: verify that a genuinely missing dataStore key still returns undefined (e.g., first-time app setup works correctly)
  • Manual: verify sync reports are saved correctly after a successful sync

🤖 Generated with Claude Code

StorageDataStoreClient.getObject() was catching ALL errors and returning
undefined, making network failures indistinguishable from missing keys.
Collection mutation methods (saveObjectInCollection, etc.) treated
undefined as "empty collection" via ?? [], then overwrote the dataStore
key — wiping sync history on any transient error.

Now getObject returns undefined only for 404 (key not found) and
re-throws on network errors, 5xx, and timeouts. Also applies the same
fix to getMetadataByKey. Adds 6 unit tests.

StorageConstantClient was verified to not have this bug — it queries
constants via the models API which returns empty arrays (not 404s) for
missing entries.

Closes: https://app.clickup.com/t/869c32tg2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bundlemon

bundlemon Bot commented Mar 4, 2026

Copy link
Copy Markdown

BundleMon

No change in files bundle size

Groups updated (1)
Status Path Size Limits
Build Folder
./**/*
2.78MB (-52B 0%) +20%

Final result: ✅

View report in BundleMon website ➡️


Current branch size history | Target branch size history

@adrianq adrianq assigned xurxodev and unassigned xurxodev Jun 22, 2026
@adrianq adrianq requested a review from xurxodev June 23, 2026 08:35

@xurxodev xurxodev left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @ifoche

This fixes a real and well-diagnosed bug: StorageDataStoreClient.getObject() was treating any error (network, 5xx, timeout) the same as a 404, causing collection mutation methods (saveObjectInCollection, removeObjectsInCollection, etc.) to overwrite the dataStore with an empty collection on a transient failure.

The fix itself in StorageDataStoreClient.ts is correct and well tested (404 vs 500 vs network error vs timeout are all covered, including the case where error.response is undefined). However, the fix doesn't fully close the loop for the use case that motivated it, the PR's own checklist (tasks.md) is left incomplete, and several callers downstream now need to handle a new failure mode that didn't exist before.

1. Should fix

  • src/data/reports/ReportsD2ApiRepository.ts:44-58, 69-89save(), delete(), and deleteByIds() wrap the calls to saveObjectInCollection/removeObjectInCollection/removeObjectsInCollection in a try/catch that does console.error(error) and swallows it without rethrowing. These are exactly the methods behind reports.save() on every sync and useDeleteHistory on app load — the flows named in proposal.md as the cause of the history wipe. With this catch in place, the error now correctly thrown by getObject never reaches the caller: the storage layer is fixed, but the history use case still silently "succeeds" even when the report wasn't actually saved. This needs at least an explicit decision (rethrow and let the caller surface it, or report the failure some other way) — right now the fix is not observable for the case it was written for.

  • openspec/changes/fix-history-wipe-on-storage-error/tasks.md:7 — task 2.1 ("apply the same fix to StorageConstantClient") is unchecked and not implemented; StorageConstantClient.ts has zero diff. But spec.md declares as a MUST requirement that "Error handling SHALL be consistent across storage backends." Concretely, StorageConstantClient.clearStorage() still has the same silent catch-all (catch (error) { console.error(error); }) that this PR removes from the other backend. As written, the PR doesn't satisfy its own spec, and the original bug can still occur on instances configured with theconstant storage backend.

  • StartApplicationUseCaselistObjectsInCollection(Namespace.INSTANCES)getObject) with no try/catch anywhere in that chain. Before this change, a
    transient network error here resolved silently to an empty/default state; now it throws. Since nothing catches it, appContext is never set and the app is stuck on an infinite loading state with no error message on app startup. This is a direct, app-wide side effect of widening getObject's contract from "never throws" to "can throw," and it isn't mentioned in the design doc's risk section.

  • Test doubles don't follow the project's established pattern. The rest of the codebase builds test doubles with ts-mockito (mock/when/instance) plus givenA... helper functions that keep construction out of the test body (see src/domain/metadata/builders/__tests__/MetadataPayloadBuilder.spec.tsx). The new src/data/storage/__tests__/StorageDataStoreClient.spec.ts instead hand-rolls mocks with vi.fn() and reimplements CancelableResponse's internal shape via local buildCancelableResponse/buildCancelableRejectionhelpers (lines 29-40), coupling the test to an SDK abstraction it doesn't even need to exercise (only .getData() is used). Worth aligning with the existing convention for consistency and to avoid breaking this test on unrelated CancelableResponse signature changes.

2. Recommendations non blocking

  • StorageDataStoreClient.clone() iterates all keys calling getObject with no local try/catch. Previously a transient error on one key degraded to undefined and the backup/export continued; now a single transient failure aborts the entire clone(). This may be the right behavior (avoid partial/corrupt backups) but it's a real behavior change that isn't covered by a test or mentioned in design.md — worth a deliberate test or at least a note.

  • Several use cases that call StorageClient collection methods directly have no error handling (CreatePullRequestUseCase, SetResponsiblesUseCase, rule save/list flows in SummaryStep.tsx and SyncRulesListPage.tsx). None of these crash visually (no Error Boundary triggers, no render-time null deref found), but several leave UI state stuck — e.g. SummaryStep.save() leaves isSaving true forever if rules.save() throws, with no error feedback to the user. Not blocking for this PR's scope, but worth a follow-up pass now that getObject can throw where it previously couldn't.

  • Test coverage gives a false sense of safety for the motivating bug. The saveObjectInCollection propagation test verifies behavior at the StorageClient level, but since ReportsD2ApiRepository.save() wraps that same call in its own swallowing try/catch (see Should-fix #1), there's no test exercising the actual history use case end-to-end that would catch a regression of the original wipe bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants