fix: resolve CritPt benchmark config interpolation and add critpt_agent README by linj-glitch · Pull Request #1642 · NVIDIA-NeMo/Gym

linj-glitch · 2026-06-18T22:35:46Z

Summary

Fixes two CI breakages on main introduced by the CritPt benchmark (#1588). Both fail unit/server CI for any PR once it is merged with current main:

resources_servers/critpt/configs/critpt.yaml referenced ${artificial_analysis_api_key} with no definition or default. list_benchmarks() (chained via benchmarks/critpt/config.yaml) cannot resolve it, failing tests/unit_tests/test_benchmarks.py::TestListBenchmarks::test_lists_found_benchmarks and dropping unit-test coverage below the 96% gate (the errored test stops covering benchmarks.py). Changed to ${oc.select:artificial_analysis_api_key,null} so it resolves to null when unset — matching the default-bearing pattern used by sibling configs (e.g. labbench2_vlm's ${oc.select:judge_api_key,unset}, frontierscience_judge's ${oc.env:NVIDIA_API_KEY,null}). The override key name is unchanged, so any existing artificial_analysis_api_key config/env override still applies.
responses_api_agents/critpt_agent shipped without a README.md. ng_test_all counts it as a candidate module but not a testable one, failing the total-vs-tested assertion (Mismatch on the number of total modules found (120) ... (119)) on every server-suite shard. Added a README.md describing the two-turn agent.

Testing

pytest tests/unit_tests/ → 290 passed; coverage 96.20% (was 95.87%)
Module discovery: 120 candidates == 120 testable
critpt server test is unaffected (it constructs its config directly with a dummy key and never resolves the YAML interpolation)

🤖 Generated with Claude Code

… README Two pre-existing failures on main (from CritPt #1588) surfaced in CI once this branch was merged with main: - resources_servers/critpt/configs/critpt.yaml referenced ${artificial_analysis_api_key} with no definition or default, so list_benchmarks() failed to resolve it (test_benchmarks.py::test_lists_found_benchmarks), which also dropped unit-test coverage below the 96% gate. Use ${oc.select:artificial_analysis_api_key,null} so it resolves to null when unset, matching the pattern used by sibling configs. - responses_api_agents/critpt_agent had no README.md, so ng_test_all counted it as a candidate module but not a testable one, failing the total-vs-tested assertion on every server-suite shard. Add a README describing the two-turn agent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Lin Jia <linj@nvidia.com>

copy-pr-bot · 2026-06-18T22:35:50Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

linj-glitch requested review from ffrujeri, fsiino-nvidia and gwarmstrong June 18, 2026 22:40

fsiino-nvidia approved these changes Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: resolve CritPt benchmark config interpolation and add critpt_agent README#1642

fix: resolve CritPt benchmark config interpolation and add critpt_agent README#1642
linj-glitch wants to merge 1 commit into
mainfrom
linj/fix-critpt-ci

linj-glitch commented Jun 18, 2026

Uh oh!

copy-pr-bot Bot commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

linj-glitch commented Jun 18, 2026

Summary

Testing

Uh oh!

copy-pr-bot Bot commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants