CI: audit minimal test lane and split out GFQL-heavy coverage #944

@lmeyerov

Summary

test-minimal-python has drifted away from being a fast-fail smoke lane. On the current green #943 rerun, test-minimal-python (3.14) finished right at the old 6-minute wall-clock limit:

  • job start: 2026-03-13T08:55:12Z
  • job end: 2026-03-13T09:01:12Z
  • pytest phase alone: 301.06s (2319 passed, 370 skipped, 5 xfailed)
  • non-pytest overhead: roughly 59s (git lfs pull, venv/pip install, ./docker/test-pip-install.sh, cleanup)
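
The breakdown above is easy to recheck. This is a minimal sketch: job_budget is a hypothetical helper, not part of the repo, and its inputs are the timestamps and pytest duration quoted above. Because the timestamps have only one-second resolution, the headroom figures are approximate.

```python
from datetime import datetime


def job_budget(start_iso: str, end_iso: str, pytest_seconds: float,
               timeout_minutes: float) -> dict:
    """Split a CI job's wall-clock time into pytest vs. overhead and report headroom."""
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it
    start = datetime.fromisoformat(start_iso.replace("Z", "+00:00"))
    end = datetime.fromisoformat(end_iso.replace("Z", "+00:00"))
    wall = (end - start).total_seconds()
    return {
        "wall_s": wall,
        "overhead_s": round(wall - pytest_seconds, 2),      # everything that is not pytest
        "headroom_s": round(timeout_minutes * 60 - wall, 2),  # time left before the timeout
    }


old = job_budget("2026-03-13T08:55:12Z", "2026-03-13T09:01:12Z", 301.06, timeout_minutes=6)
new = job_budget("2026-03-13T08:55:12Z", "2026-03-13T09:01:12Z", 301.06, timeout_minutes=8)
print(old)                # {'wall_s': 360.0, 'overhead_s': 58.94, 'headroom_s': 0.0}
print(new["headroom_s"])  # 120.0
```

Under the old 6-minute limit the job had essentially zero headroom. The 8-minute limit buys about 2 minutes without making the suite itself any faster.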

We increased the job timeout from 6 to 8 minutes in #943 to remove the immediate CI flake, but this should be treated as a budgeting fix, not the final answer.

Current suite composition

Collected using the pytest arguments from the current ./bin/test-minimal.sh:

  • total collected tests: 2683
  • GFQL-related tests (broad count): 1226 (45.7%)

The broad GFQL count includes:

  • graphistry/tests/compute/gfql/**
  • graphistry/tests/compute/test_gfql*.py
  • graphistry/tests/test_gfql_*.py
  • tests/gfql/**
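
The broad count can be reproduced from collected pytest node IDs, such as the lines printed by pytest --collect-only -q. This is a minimal sketch that assumes node IDs are paths relative to the repository root. is_gfql and gfql_share are hypothetical helpers that encode the four path patterns listed above.

```python
from fnmatch import fnmatchcase

# Path patterns from the "broad GFQL count" definition (relative to the repo root)
GFQL_DIR_PREFIXES = ("graphistry/tests/compute/gfql/", "tests/gfql/")
GFQL_FILE_GLOBS = (
    "graphistry/tests/compute/test_gfql*.py",
    "graphistry/tests/test_gfql_*.py",
)


def is_gfql(nodeid: str) -> bool:
    """Classify a pytest node ID such as 'dir/test_x.py::test_y' as GFQL-related."""
    path = nodeid.split("::", 1)[0]  # drop the ::Class::test_name part
    if path.startswith(GFQL_DIR_PREFIXES):  # the two '**' directory patterns
        return True
    parent = path.rpartition("/")[0]
    # fnmatch's '*' also matches '/', so also require the glob's own directory
    return any(
        fnmatchcase(path, glob) and parent == glob.rpartition("/")[0]
        for glob in GFQL_FILE_GLOBS
    )


def gfql_share(nodeids: list[str]) -> tuple[int, int, float]:
    """Return (gfql_count, total, percent rounded to one decimal)."""
    total = len(nodeids)
    gfql = sum(is_gfql(n) for n in nodeids)
    percent = round(100 * gfql / total, 1) if total else 0.0
    return gfql, total, percent


sample = [
    "tests/gfql/ref/test_df_executor_core.py::test_a",
    "graphistry/tests/test_example.py::test_b",  # hypothetical non-GFQL file
]
print(gfql_share(sample))  # (1, 2, 50.0)
```

On the full collection, 1226 matches out of 2683 gives round(100 * 1226 / 2683, 1) == 45.7, which is the percentage reported above.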

Largest contributors in the current minimal shard (collected tests per file):

  • graphistry/tests/compute/gfql/cypher/test_lowering.py: 355
  • graphistry/tests/compute/gfql/test_row_pipeline_ops.py: 150
  • tests/gfql/ref/test_df_executor_patterns.py: 100
  • tests/gfql/ref/test_chain_optimizations.py: 82
  • tests/gfql/ref/test_df_executor_core.py: 80
  • tests/gfql/ref/test_df_executor_amplify.py: 57
  • tests/gfql/ref/test_df_executor_dimension.py: 52
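
For scale, the seven files above add up to 876 tests. That is about 71% of the broad GFQL count and roughly a third of the whole minimal collection. A per-file ranking like this one can be produced with collections.Counter. In the sketch below, largest_files is a hypothetical helper, and the input is again assumed to be pytest node IDs relative to the repo root.

```python
from collections import Counter


def largest_files(nodeids: list[str], top: int = 7) -> list[tuple[str, int]]:
    """Count collected tests per file and return the `top` largest contributors."""
    # Everything before the first '::' is the file, so parametrized cases such as
    # 'f.py::test_x[a]' and 'f.py::test_x[b]' are both counted against 'f.py'
    per_file = Counter(nodeid.split("::", 1)[0] for nodeid in nodeids)
    return per_file.most_common(top)


ids = [  # hypothetical node IDs
    "tests/gfql/ref/test_a.py::test_one[x]",
    "tests/gfql/ref/test_a.py::test_one[y]",
    "graphistry/tests/test_b.py::test_two",
]
print(largest_files(ids, top=1))  # [('tests/gfql/ref/test_a.py', 2)]
```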

Why this matters

test-minimal-python is supposed to be the early, cheap confidence lane. Right now it mixes:

  • install / import smoke
  • broad compute coverage
  • a large amount of GFQL lowering / row-pipeline / reference-executor coverage

That makes the shard slower to fail and harder to keep within a small CI budget, especially on slower hosted-runner / interpreter combinations like Python 3.14.

Proposed audit

  1. Inventory the current minimal suite by domain and runtime cost.
  2. Define what must stay in test-minimal-python as true fast-fail smoke coverage.
  3. Evaluate splitting out a GFQL-heavy shard, for example:
    • keep core package/install/basic compute smoke in test-minimal-python
    • move the heavier GFQL reference / lowering / row-pipeline coverage into a dedicated job
  4. Preserve signal quality:
    • minimal should stay the first red lane for packaging/basic runtime regressions
    • GFQL should still have strong coverage, just not necessarily inside the same fast-fail shard
  5. Set a target wall-clock budget for test-minimal-python (for example, comfortably under 5 minutes on the slowest supported interpreter)
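
One low-churn way to carve out a GFQL shard (step 3) is to tag tests at collection time rather than move files. This is a sketch of a root-level conftest.py. It assumes the repository root is pytest's rootdir, so node IDs start with graphistry/tests/... or tests/.... The gfql marker name and the prefix list are hypothetical. This version keys only on the two directory prefixes, so the test_gfql*.py files from the broad count would need an extra check.

```python
# conftest.py at the repository root (sketch): tag GFQL-heavy tests at
# collection time so CI can select or deselect them with -m, without moving files.

# Hypothetical selection; assumes node IDs are relative to the repo root.
GFQL_PREFIXES = (
    "graphistry/tests/compute/gfql/",
    "tests/gfql/",
)


def pytest_configure(config):
    # Register the marker so runs using --strict-markers accept it
    config.addinivalue_line(
        "markers", "gfql: GFQL-heavy test, run in the dedicated GFQL lane"
    )


def pytest_collection_modifyitems(config, items):
    # Runs once after collection; add the marker to every matching test item
    for item in items:
        if item.nodeid.startswith(GFQL_PREFIXES):
            item.add_marker("gfql")
```

The minimal lane could then deselect these tests with pytest -m "not gfql", and a sibling GFQL job could run pytest -m gfql over the same paths. Because selection is marker-based, the two lanes partition the collected set as long as both collect with the same paths and arguments.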

Success criteria

  • test-minimal-python is meaningfully faster and more stable as an early CI gate
  • GFQL coverage remains intact, either in the slimmed minimal lane or a dedicated sibling lane
  • Python 3.14 no longer sits on the timeout edge for the minimal shard
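
The second criterion can be checked mechanically: collect node IDs before and after the split and confirm that no test fell out of both lanes. This is a minimal sketch. check_split is a hypothetical helper, and it assumes each lane's node IDs have already been gathered into sets, for example from pytest --collect-only -q output.

```python
def check_split(before: set[str], minimal_after: set[str],
                gfql_lane: set[str]) -> dict[str, set[str]]:
    """Compare collected node IDs before and after a lane split."""
    return {
        # Tests that ran before the split but now run in neither lane
        "dropped": before - (minimal_after | gfql_lane),
        # Tests that would now run in both lanes (duplicated CI time)
        "duplicated": minimal_after & gfql_lane,
    }


before = {"t.py::a", "t.py::b", "t.py::c"}  # hypothetical node IDs
result = check_split(before, minimal_after={"t.py::a"}, gfql_lane={"t.py::a", "t.py::b"})
print(result["dropped"])     # {'t.py::c'}
print(result["duplicated"])  # {'t.py::a'}
```

A clean split yields two empty sets.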
