[fix][evaluation]: retry-running-lock #539

Open
xueyizheng wants to merge 4 commits into main from fix/expt-retry-running-lock-value
@xueyizheng

Collaborator

What type of PR is this?

Check the PR title

  • This PR title matches the format: [<type>][<scope>] <description>. For example: [fix][backend] flaky fix
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Add documentation if the current PR requires user awareness at the usage level.
  • This PR is written in English. PRs not in English will not be reviewed.

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR(en: English/zh: Chinese)

en:
zh(optional):

(Optional) Which issue(s) this PR fixes

… items

Without this, retrying a single item on a Running experiment fails silently:
the item id is appended to expt_run_log.item_ids but no ExptScheduleEvent is
published, so the scheduler never seeds expt_item_run_logs or resets turn
results. The retry "succeeds" at the API layer while the item stays in its
old (failed) state. Calling RetryItems with the same runID is safe and
idempotent: the event carries only the user's requested itemIDs, and
resetEvalItems uses a fresh schedule scope per (exptID, exptRunID).
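
The ordering described above (record the item IDs, then publish a schedule event that carries only the requested IDs) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `retryEvent`, `eventPublisher`, `memPublisher`, and `retryItemsOnRunning` are hypothetical names, and the real ExptScheduleEvent and RetryItems signatures may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// retryEvent is a hypothetical stand-in for the PR's ExptScheduleEvent.
type retryEvent struct {
	ExptID    int64
	ExptRunID int64
	ItemIDs   []int64
}

// eventPublisher is a hypothetical abstraction over the message queue.
type eventPublisher interface {
	Publish(ev retryEvent) error
}

// memPublisher records events in memory so the sketch runs standalone.
type memPublisher struct{ events []retryEvent }

func (p *memPublisher) Publish(ev retryEvent) error {
	p.events = append(p.events, ev)
	return nil
}

// retryItemsOnRunning appends the requested IDs to the run's item list and
// then publishes an event containing only those IDs, reusing the same run ID.
func retryItemsOnRunning(runItemIDs *[]int64, pub eventPublisher, exptID, exptRunID int64, itemIDs []int64) error {
	if len(itemIDs) == 0 {
		return errors.New("no item ids to retry")
	}
	*runItemIDs = append(*runItemIDs, itemIDs...)
	// Copy the slice so later mutation by the caller cannot alter the event.
	requested := append([]int64(nil), itemIDs...)
	return pub.Publish(retryEvent{ExptID: exptID, ExptRunID: exptRunID, ItemIDs: requested})
}

func main() {
	logged := []int64{1, 2}
	pub := &memPublisher{}
	if err := retryItemsOnRunning(&logged, pub, 100, 7, []int64{3}); err != nil {
		fmt.Println("retry failed:", err)
		return
	}
	fmt.Println(logged, pub.events[0].ItemIDs)
}
```

The point of the sketch is that skipping the `Publish` call leaves the log updated while nothing downstream reacts, which matches the silent failure this commit describes.
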
…e-reset

ExptRetryItemsExec.ExptStart's idem key was (exptID, exptRunID) only. Once
the first retry on a run wrote the key, every subsequent retry on the same
run short-circuited at the idem.Exist check, skipping resetEvalItems
entirely. The user-facing effect was that an item appended to a Running
experiment was recorded in expt_run_log.item_ids, but its expt_run_id /
expt_item_result_run_log / turn_result.status were never updated, so the UI
kept showing the stale state.

Switch the retry-items idem key to (exptID, exptRunID, sha1(sorted itemIDs)).
MQ redelivery protection is preserved because identical event payloads
produce identical hashes; the only relaxation is letting genuinely new item
sets through. Other schedule modes (Submit / FailRetry / Append / RetryAll)
keep using makeStartIdemKey, so their behavior is unchanged.
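
A key of the shape (exptID, exptRunID, sha1(sorted itemIDs)) could be built roughly as below. This is a sketch under stated assumptions rather than the PR's implementation: the function name `makeRetryItemsIdemKey`, the `expt_retry_items:` prefix, and the comma-joined encoding of the IDs are all hypothetical.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// makeRetryItemsIdemKey is a hypothetical helper that derives an idempotency
// key from the experiment ID, the run ID, and a SHA-1 of the sorted item IDs.
func makeRetryItemsIdemKey(exptID, exptRunID int64, itemIDs []int64) string {
	// Sort a copy so the caller's slice is untouched and order does not matter.
	sorted := append([]int64(nil), itemIDs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	parts := make([]string, len(sorted))
	for i, id := range sorted {
		parts[i] = strconv.FormatInt(id, 10)
	}
	sum := sha1.Sum([]byte(strings.Join(parts, ",")))

	// The key prefix is an assumption made for this example only.
	return fmt.Sprintf("expt_retry_items:%d:%d:%s", exptID, exptRunID, hex.EncodeToString(sum[:]))
}

func main() {
	a := makeRetryItemsIdemKey(1, 2, []int64{30, 10, 20})
	b := makeRetryItemsIdemKey(1, 2, []int64{10, 20, 30})
	c := makeRetryItemsIdemKey(1, 2, []int64{10, 20, 40})
	fmt.Println(a == b) // true: a redelivered payload maps to the same key
	fmt.Println(a == c) // false: a new item set gets a new key
}
```

Sorting before hashing keeps the key stable when the same IDs arrive in a different order. The sketch does not deduplicate IDs, so a payload with a repeated ID would hash differently from one without it.
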
@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main     #539   +/-   ##
=======================================
  Coverage   77.41%   77.41%           
=======================================
  Files         662      662           
  Lines       74864    74871    +7     
=======================================
+ Hits        57956    57965    +9     
+ Misses      13487    13486    -1     
+ Partials     3421     3420    -1     

| Flag | Coverage Δ | |
|---|---|---|
| unittests | 77.41% <100.00%> (+<0.01%) | ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...d/modules/evaluation/application/experiment_app.go | 84.61% <100.00%> (-0.02%) | ⬇️ |
| ...ation/domain/service/expt_manage_execution_impl.go | 79.97% <100.00%> (ø) | |
| ...ion/domain/service/expt_run_scheduler_mode_impl.go | 87.22% <100.00%> (+0.09%) | ⬆️ |

... and 1 file with indirect coverage changes

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d8b2a6f...b8b09e3.

@xueyizheng changed the title from "fix(evaluation): retry-running-lock" to "[fix][evaluation]: retry-running-lock" on Jun 2, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>