Spike: intermittent flake in proxy-server POST /gremlin "sends query as JSON with gremlin key" test #1853

@kmcginnes

Summary

packages/graph-explorer-proxy-server/src/app.test.ts > createApp > POST /gremlin > sends query as JSON with gremlin key failed once during a full pnpm test run (1 failed | 1807 passed). It could not be reproduced across 35+ subsequent runs: 12× the file in isolation, 3× the full suite, and 20× under deliberate CPU contention from a parallel graph-explorer run. The goal of this spike is to reproduce the failure reliably, confirm the root cause, and decide on a fix, rather than patching blind.

Why it is a flake, not a regression

The package depends only on @graph-explorer/shared and runs as its own threaded vitest project; nothing in its src/ imports the graph-explorer app package. It was first observed while reviewing an unrelated graph-explorer persistence change (#1831), which has no causal path to this test. The failure pre-dates and is independent of that work.

Leading hypothesis (unconfirmed)

The /gremlin handler (app.ts:374-433) registers req.on("close") / res.on("close") listeners. If the connection closes before req.complete / res.writableFinished is true, these listeners call cancelQuery(), which issues an additional mockFetch call. The response body is streamed via pipeline() (app.ts:264), which settles asynchronously after the handler returns and possibly after supertest has resolved. Because mockFetch is shared at module level and reset with mockReset() in beforeEach (with restoreMocks: true also enabled), there is a window in which a late close-event mockFetch call from one test can bleed into another test's expectations.
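
To make the suspected condition concrete, here is a minimal sketch of a close-listener guard of the kind described above. It is not the code in app.ts: FakeRequest, FakeResponse, and attachCancelOnEarlyClose are hypothetical stand-ins, and only the property names complete and writableFinished are taken from the description.

```typescript
// Hypothetical stand-ins for the Express req/res pair. Only the "close"
// event and the two completion flags are modelled.
type CloseListener = () => void;

class FakeCloseable {
  private listeners: CloseListener[] = [];

  on(_event: "close", listener: CloseListener): this {
    this.listeners.push(listener);
    return this;
  }

  // Fires every registered "close" listener synchronously.
  emitClose(): void {
    for (const listener of this.listeners) listener();
  }
}

class FakeRequest extends FakeCloseable {
  complete = false; // mirrors req.complete
}

class FakeResponse extends FakeCloseable {
  writableFinished = false; // mirrors res.writableFinished
}

// Assumed shape of the guard: cancel only when the connection closes
// before the request was fully read or the response fully written.
function attachCancelOnEarlyClose(
  req: FakeRequest,
  res: FakeResponse,
  cancelQuery: () => void,
): void {
  req.on("close", () => {
    if (!req.complete) cancelQuery();
  });
  res.on("close", () => {
    if (!res.writableFinished) cancelQuery();
  });
}
```

If the real handler has this shape, any close event that fires while the streamed response is still unsettled produces one extra fetch call, and the timing of that call relative to the test's assertions is not controlled by the test.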

Caveat: this best explains a stray calls[1], whereas the observed failure asserted on calls[0][1].body. The mechanism is plausible but not proven — hence a spike.
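
The caveat can be illustrated with a small, deterministic simulation of a shared call log. This is a sketch under assumptions, not a reproduction: createSharedRecorder stands in for the module-shared mockFetch, reset() stands in for mockReset(), and both URLs are placeholders. Whether either ordering actually occurs in the real suite is what the spike needs to establish.

```typescript
type FetchInit = { method: string; body?: string };
type FetchCall = [url: string, init: FetchInit];

// Stand-in for a module-shared mock: one call log reused by every test.
function createSharedRecorder() {
  const calls: FetchCall[] = [];
  return {
    calls,
    fetchStub(url: string, init: FetchInit): void {
      calls.push([url, init]);
    },
    // Models mockReset(): clears the log, but cannot stop late callers.
    reset(): void {
      calls.length = 0;
    },
  };
}

const GREMLIN_URL = "https://example.test/gremlin"; // placeholder
const CANCEL_URL = "https://example.test/cancel"; // placeholder

// Returns the call log that "test B" would see for a given ordering.
function simulateBleed(lateCallLandsFirst: boolean): FetchCall[] {
  const shared = createSharedRecorder();

  // Test A: its own query, plus a cancel that fires only later on "close".
  shared.fetchStub(GREMLIN_URL, {
    method: "POST",
    body: JSON.stringify({ gremlin: "g.V()" }),
  });
  const lateCancelFromTestA = () =>
    shared.fetchStub(CANCEL_URL, { method: "POST" });

  // beforeEach for test B.
  shared.reset();

  if (lateCallLandsFirst) lateCancelFromTestA();
  shared.fetchStub(GREMLIN_URL, {
    method: "POST",
    body: JSON.stringify({ gremlin: "g.V().limit(1)" }),
  });
  if (!lateCallLandsFirst) lateCancelFromTestA();

  return [...shared.calls];
}
```

In this model, a late call that lands after test B's own request appears as a stray calls[1]. A late call that lands between the reset and test B's request occupies calls[0], so an assertion on calls[0][1].body would see the wrong payload. The second ordering is only one candidate explanation for the observed failure and remains unverified.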

Suggested approach

  1. Reproduce reliably — try vitest --no-isolate, --sequence.seed, tighter stress loops, and CI hardware; instrument the close/cancel path and the mock call log.
  2. Confirm the exact mechanism (close-event race vs. shared-mock teardown vs. something else).
  3. Fix with certainty. Likely candidates regardless of exact trigger:
    • Per-test mock instance instead of a module-shared mockFetch.
    • Assert on the specific fetch call (URL/method) rather than positional calls[0].
    • Ensure the handler's streaming/close path has settled before assertions run.
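
As a rough illustration of the first two candidates, the sketch below uses a hand-rolled recorder in place of a vitest mock so that it runs on its own. createFetchRecorder and findCall are hypothetical helpers and the URLs are placeholders. In the real test file, the equivalent would presumably be a fresh vi.fn() created in beforeEach and a lookup over its mock.calls.

```typescript
type FetchInit = { method?: string; body?: string };
type FetchCall = [url: string, init: FetchInit];

// Per-test instance: each test builds its own recorder, so a late call
// from an earlier test has no shared log to land in.
function createFetchRecorder() {
  const calls: FetchCall[] = [];
  const fetchStub = (url: string, init: FetchInit = {}): void => {
    calls.push([url, init]);
  };
  return { fetchStub, calls };
}

// Select the call by what it is (URL suffix and method) instead of by
// its position in the log.
function findCall(
  calls: readonly FetchCall[],
  urlSuffix: string,
  method: string,
): FetchCall | undefined {
  return calls.find(
    ([url, init]) => url.endsWith(urlSuffix) && (init.method ?? "GET") === method,
  );
}

// Usage: a stray call ahead of the real one no longer affects the lookup.
const recorder = createFetchRecorder();
recorder.fetchStub("https://example.test/cancel", { method: "POST" });
recorder.fetchStub("https://example.test/gremlin", {
  method: "POST",
  body: JSON.stringify({ gremlin: "g.V().limit(1)" }),
});
const gremlinCall = findCall(recorder.calls, "/gremlin", "POST");
```

Matching on URL and method makes the assertion robust to extra calls, but it can also hide a genuine stray cancel request, so a separate assertion on the expected number of calls may still be worth keeping once the root cause is known.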

Acceptance criteria

  • The flake is reproduced deterministically (or a documented, time-boxed conclusion that it cannot be, with the stress methodology recorded)
  • Root cause confirmed
  • Fix applied with a regression guard, or a tracked decision to accept with rationale

Important

Internal only — this issue is maintained by the core team and is not accepting external contributions.

Labels

  • internal: Signals that the team will work on this issue internally.
  • needs-triage: Maintainer needs to evaluate
  • tech debt: Issues, typically tasks, that are mainly about cleaning up code that is problematic in some way