fix: resolve all 8 Windows test failures — UTF-8, CRLF, FastMCP API, stop_at (#239, #241)#274
Open
azizur100389 wants to merge 1 commit intotirth8205:mainfrom
Open
Conversation
This was referenced Apr 14, 2026
eb1818a to
dea0e93
Compare
azizur100389
added a commit
to azizur100389/code-review-graph
that referenced
this pull request
Apr 14, 2026
…irth8205#258, tirth8205#259, tirth8205#260, tirth8205#261) Four reliability hardening fixes that improve crash safety, resource cleanup, and API transparency across the graph pipeline. Fix 1: wrap incremental_trace_flows DELETEs in a transaction (tirth8205#258) ------------------------------------------------------------------- File: code_review_graph/flows.py:499-502 The DELETE loop in incremental_trace_flows deleted flow_memberships and flows rows one at a time without an explicit transaction. A crash mid-loop could leave orphaned flow_memberships rows pointing at deleted flows. Fix: wrap the DELETE loop in BEGIN IMMEDIATE / COMMIT / ROLLBACK, matching the pattern already used by store_flows at line 406. Includes an in_transaction guard to flush any implicit transaction first. Test: test_incremental_trace_flows_delete_is_atomic — verifies no orphaned memberships exist after an incremental re-trace. Fix 2: atomic FTS index rebuild (tirth8205#259) -------------------------------------- File: code_review_graph/search.py:39-54 rebuild_fts_index performed DROP, CREATE, and INSERT across three separate statements with commits between them. A crash after DROP but before CREATE would leave the DB without an FTS table entirely. Fix: wrap the entire DROP + CREATE + INSERT sequence in a single BEGIN IMMEDIATE transaction so the operation is all-or-nothing. Test: test_fts_rebuild_is_atomic — verifies double-rebuild leaves the table intact and queryable. Fix 3: EmbeddingStore context manager support (tirth8205#260) ---------------------------------------------------- File: code_review_graph/embeddings.py:391+ EmbeddingStore opens a SQLite connection in __init__ but did not implement __enter__/__exit__. If an exception occurred during usage before close() was called, the connection leaked. Fix: add __enter__ and __exit__ methods following the same pattern as GraphStore (graph.py:154-157). Test: test_supports_context_manager, test_context_manager_closes_on_exception. Fix 4: truncation signal for find_dependents (tirth8205#261) ---------------------------------------------------- File: code_review_graph/incremental.py:434-465 When find_dependents hit _MAX_DEPENDENT_FILES (500), it truncated the result and logged a warning, but the caller received a plain list with no indication that it was incomplete. Callers had to guess whether they got the full dependent set or a truncated one. Fix: introduce DependentList (a transparent list subclass with a .truncated boolean attribute). Existing callers that iterate, len(), or slice continue to work unchanged; only callers that specifically check .truncated benefit from the signal. Tests: test_truncated_flag_set_when_capped (600 deps, expects True), test_truncated_flag_false_when_not_capped (small chain, expects False). Files changed ------------- - code_review_graph/flows.py — transaction wrap around DELETE loop - code_review_graph/search.py — atomic FTS rebuild - code_review_graph/embeddings.py — __enter__/__exit__ on EmbeddingStore - code_review_graph/incremental.py — DependentList with .truncated flag - tests/test_flows.py — orphan membership check - tests/test_search.py — double-rebuild atomicity test - tests/test_embeddings.py — context manager tests - tests/test_incremental.py — truncation flag tests Test results ------------ Stage 1 (new targeted tests): 6/6 passed. Stage 2 (all affected test files): 137 passed, 3 pre-existing failures (find_repo_root flakiness + UTF-8 gitignore — covered by PRs tirth8205#274/tirth8205#242). Stage 3 (full suite): 784 passed, 8 pre-existing Windows failures (all covered by parallel PRs). Stage 4 (ruff check on all 8 changed files): clean. Stage 5 (mypy on all 4 changed source files): clean. Zero regressions.
…stop_at (tirth8205#239, tirth8205#241) Five root-cause fixes that together resolve every pre-existing Windows test failure on main. Each fix has targeted regression tests; the net effect is full green CI on Windows (8 failures -> 0). Bug 1: get_data_dir() writes non-UTF-8 .gitignore on Windows (tirth8205#239) -------------------------------------------------------------------- write_text() called without encoding="utf-8". The em-dash in the header is U+2014 which Python encodes as cp1252 byte 0x97 on Windows. Any later UTF-8 read fails with UnicodeDecodeError. Fix: add encoding="utf-8" (matches sibling _ensure_repo_gitignore). Test: test_auto_gitignore_is_valid_utf8 — asserts UTF-8 byte sequence, rejects cp1252 byte. Bug 2: Databricks notebook detection fails on CRLF line endings (tirth8205#239) ---------------------------------------------------------------------- source.startswith(b"# Databricks notebook source\n") hard-codes LF. Windows git checkout (core.autocrlf=true) produces CRLF. All Databricks handling silently bypassed — 4 tests fail. Fix: parse first line robustly, strip trailing \r before exact match. Tests: test_databricks_header_crlf_line_endings, test_databricks_header_lf_line_endings_still_work, test_databricks_header_prefix_false_positive_rejected. Bug 3: Stale FastMCP API in async regression guard (tirth8205#239) --------------------------------------------------------- test_heavy_tools_are_coroutines called mcp.get_tools() which does not exist in fastmcp>=2.14.0 (pinned in pyproject.toml). The guard has been silently broken since it was written — the protection promised by PR tirth8205#231 for tirth8205#46/tirth8205#136 was never actually enforced. Fix: resolve tools via getattr(crg_main, name) like the sibling test. Drop @pytest.mark.asyncio since no event loop is needed. Test: test_regression_guard_does_not_depend_on_fastmcp_internals — AST-walks the guard source to ensure no mcp internal API references. Bug 4-5: find_repo_root walks above test sandbox (tirth8205#241) ------------------------------------------------------- test_returns_none_without_git and test_falls_back_to_start fail on any machine where tmp_path has a git-initialized ancestor (dotfiles repo at ~/.git — very common on developer machines). Fix: add optional stop_at parameter to find_repo_root() and find_project_root(). When set, the walk examines stop_at for .git and then stops. Default is None (existing walk-to-root behavior). Fully backward-compatible — all 7 production callers unchanged. Tests: test_stop_at_prevents_escape_to_outer_git, test_stop_at_finds_git_at_boundary, test_stop_at_forwarded_to_find_repo_root. Files changed ------------- - code_review_graph/incremental.py — encoding fix + stop_at API - code_review_graph/parser.py — CRLF-tolerant Databricks detection - tests/test_incremental.py — gitignore UTF-8 guard + stop_at tests - tests/test_main.py — fixed async guard + meta-guard - tests/test_notebook.py — CRLF + LF + false-positive guards
dea0e93 to
cd0ecda
Compare
azizur100389
added a commit
to azizur100389/code-review-graph
that referenced
this pull request
Apr 14, 2026
…irth8205#258, tirth8205#259, tirth8205#260, tirth8205#261) Four reliability hardening fixes that improve crash safety, resource cleanup, and API transparency across the graph pipeline. Fix 1: wrap incremental_trace_flows DELETEs in a transaction (tirth8205#258) ------------------------------------------------------------------- File: code_review_graph/flows.py:499-502 The DELETE loop in incremental_trace_flows deleted flow_memberships and flows rows one at a time without an explicit transaction. A crash mid-loop could leave orphaned flow_memberships rows pointing at deleted flows. Fix: wrap the DELETE loop in BEGIN IMMEDIATE / COMMIT / ROLLBACK, matching the pattern already used by store_flows at line 406. Includes an in_transaction guard to flush any implicit transaction first. Test: test_incremental_trace_flows_delete_is_atomic — verifies no orphaned memberships exist after an incremental re-trace. Fix 2: atomic FTS index rebuild (tirth8205#259) -------------------------------------- File: code_review_graph/search.py:39-54 rebuild_fts_index performed DROP, CREATE, and INSERT across three separate statements with commits between them. A crash after DROP but before CREATE would leave the DB without an FTS table entirely. Fix: wrap the entire DROP + CREATE + INSERT sequence in a single BEGIN IMMEDIATE transaction so the operation is all-or-nothing. Test: test_fts_rebuild_is_atomic — verifies double-rebuild leaves the table intact and queryable. Fix 3: EmbeddingStore context manager support (tirth8205#260) ---------------------------------------------------- File: code_review_graph/embeddings.py:391+ EmbeddingStore opens a SQLite connection in __init__ but did not implement __enter__/__exit__. If an exception occurred during usage before close() was called, the connection leaked. Fix: add __enter__ and __exit__ methods following the same pattern as GraphStore (graph.py:154-157). Test: test_supports_context_manager, test_context_manager_closes_on_exception. Fix 4: truncation signal for find_dependents (tirth8205#261) ---------------------------------------------------- File: code_review_graph/incremental.py:434-465 When find_dependents hit _MAX_DEPENDENT_FILES (500), it truncated the result and logged a warning, but the caller received a plain list with no indication that it was incomplete. Callers had to guess whether they got the full dependent set or a truncated one. Fix: introduce DependentList (a transparent list subclass with a .truncated boolean attribute). Existing callers that iterate, len(), or slice continue to work unchanged; only callers that specifically check .truncated benefit from the signal. Tests: test_truncated_flag_set_when_capped (600 deps, expects True), test_truncated_flag_false_when_not_capped (small chain, expects False). Files changed ------------- - code_review_graph/flows.py — transaction wrap around DELETE loop - code_review_graph/search.py — atomic FTS rebuild - code_review_graph/embeddings.py — __enter__/__exit__ on EmbeddingStore - code_review_graph/incremental.py — DependentList with .truncated flag - tests/test_flows.py — orphan membership check - tests/test_search.py — double-rebuild atomicity test - tests/test_embeddings.py — context manager tests - tests/test_incremental.py — truncation flag tests Test results ------------ Stage 1 (new targeted tests): 6/6 passed. Stage 2 (all affected test files): 137 passed, 3 pre-existing failures (find_repo_root flakiness + UTF-8 gitignore — covered by PRs tirth8205#274/tirth8205#242). Stage 3 (full suite): 784 passed, 8 pre-existing Windows failures (all covered by parallel PRs). Stage 4 (ruff check on all 8 changed files): clean. Stage 5 (mypy on all 4 changed source files): clean. Zero regressions.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Five root-cause fixes that together resolve every pre-existing Windows test failure on
main. Net effect: 8 failures → 0, full green Windows CI.Closes #239, closes #241.
Supersedes individual PRs #240 and #242 (which are being closed in favor of this bundled fix).
Bug 1 —
get_data_dir()writes non-UTF-8.gitignoreon Windowswrite_text()called withoutencoding="utf-8". The em-dash (U+2014) in the header encodes as cp1252 byte0x97on Windows. Any later UTF-8 read raisesUnicodeDecodeError.Fix: Add
encoding="utf-8"(matches sibling_ensure_repo_gitignore).Test:
test_auto_gitignore_is_valid_utf8— asserts UTF-8 byte sequence0xE2 0x80 0x94, rejects cp1252 byte0x97.Bug 2 — Databricks notebook detection fails on CRLF
source.startswith(b"# Databricks notebook source\n")hard-codes LF. Windowscore.autocrlf=trueproduces CRLF → 4 tests fail silently.Fix: Parse first line robustly, strip trailing
\rbefore exact match. Rejects prefix false positives.Tests:
test_databricks_header_crlf_line_endings,test_databricks_header_lf_line_endings_still_work,test_databricks_header_prefix_false_positive_rejected.Bug 3 — Stale FastMCP API in async regression guard
mcp.get_tools()doesn't exist infastmcp>=2.14.0. The guard promised by PR #231 for #46/#136 has been silently broken since it was written.Fix: Resolve tools via
getattr(crg_main, name)(mirrors sibling test). Drop@pytest.mark.asyncio.Test:
test_regression_guard_does_not_depend_on_fastmcp_internals— AST-walks guard source to reject stale mcp internal APIs.Bugs 4-5 —
find_repo_rootwalks above test sandboxtest_returns_none_without_gitandtest_falls_back_to_startfail on any machine wheretmp_pathhas a git-initialized ancestor (dotfiles at~/.git).Fix: Add optional
stop_at: Path | None = Nonetofind_repo_root()andfind_project_root(). Walk examinesstop_atfor.gitthen stops. DefaultNone= existing behavior. All 7 production callers unchanged.Tests:
test_stop_at_prevents_escape_to_outer_git,test_stop_at_finds_git_at_boundary,test_stop_at_forwarded_to_find_repo_root.Files changed (5 files, +302 / -35)
code_review_graph/incremental.py— UTF-8 encoding fix +stop_atAPI additioncode_review_graph/parser.py— CRLF-tolerant Databricks detectiontests/test_incremental.py— gitignore UTF-8 guard +stop_atregression teststests/test_main.py— fixed async guard + meta-guard against FastMCP API drifttests/test_notebook.py— CRLF + LF + false-positive regression guardsTest results
ruff checkon all 5 changed files: cleanWhy this fix is safe
==notstartswith)stop_atis backward-compatible (defaultNone= existing behavior)