fix: resolve CritPt benchmark config interpolation and add critpt_agent README#1642
Open
linj-glitch wants to merge 1 commit into
Open
fix: resolve CritPt benchmark config interpolation and add critpt_agent README#1642linj-glitch wants to merge 1 commit into
linj-glitch wants to merge 1 commit into
Conversation
… README Two pre-existing failures on main (from CritPt #1588) surfaced in CI once this branch was merged with main: - resources_servers/critpt/configs/critpt.yaml referenced ${artificial_analysis_api_key} with no definition or default, so list_benchmarks() failed to resolve it (test_benchmarks.py::test_lists_found_benchmarks), which also dropped unit-test coverage below the 96% gate. Use ${oc.select:artificial_analysis_api_key,null} so it resolves to null when unset, matching the pattern used by sibling configs. - responses_api_agents/critpt_agent had no README.md, so ng_test_all counted it as a candidate module but not a testable one, failing the total-vs-tested assertion on every server-suite shard. Add a README describing the two-turn agent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Lin Jia <linj@nvidia.com>
fsiino-nvidia
approved these changes
Jun 19, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes two CI breakages on
mainintroduced by the CritPt benchmark (#1588). Both fail unit/server CI for any PR once it is merged with currentmain:resources_servers/critpt/configs/critpt.yamlreferenced${artificial_analysis_api_key}with no definition or default.list_benchmarks()(chained viabenchmarks/critpt/config.yaml) cannot resolve it, failingtests/unit_tests/test_benchmarks.py::TestListBenchmarks::test_lists_found_benchmarksand dropping unit-test coverage below the 96% gate (the errored test stops coveringbenchmarks.py). Changed to${oc.select:artificial_analysis_api_key,null}so it resolves tonullwhen unset — matching the default-bearing pattern used by sibling configs (e.g.labbench2_vlm's${oc.select:judge_api_key,unset},frontierscience_judge's${oc.env:NVIDIA_API_KEY,null}). The override key name is unchanged, so any existingartificial_analysis_api_keyconfig/env override still applies.responses_api_agents/critpt_agentshipped without aREADME.md.ng_test_allcounts it as a candidate module but not a testable one, failing the total-vs-tested assertion (Mismatch on the number of total modules found (120) ... (119)) on every server-suite shard. Added aREADME.mddescribing the two-turn agent.Testing
pytest tests/unit_tests/→ 290 passed; coverage 96.20% (was 95.87%)critptserver test is unaffected (it constructs its config directly with a dummy key and never resolves the YAML interpolation)🤖 Generated with Claude Code