Skip to content

fix: resolve CritPt benchmark config interpolation and add critpt_agent README#1642

Open
linj-glitch wants to merge 1 commit into
mainfrom
linj/fix-critpt-ci
Open

fix: resolve CritPt benchmark config interpolation and add critpt_agent README#1642
linj-glitch wants to merge 1 commit into
mainfrom
linj/fix-critpt-ci

Conversation

@linj-glitch

Copy link
Copy Markdown
Collaborator

Summary

Fixes two CI breakages on main introduced by the CritPt benchmark (#1588). Both fail unit/server CI for any PR once it is merged with current main:

  1. resources_servers/critpt/configs/critpt.yaml referenced ${artificial_analysis_api_key} with no definition or default. list_benchmarks() (chained via benchmarks/critpt/config.yaml) cannot resolve it, failing tests/unit_tests/test_benchmarks.py::TestListBenchmarks::test_lists_found_benchmarks and dropping unit-test coverage below the 96% gate (the errored test stops covering benchmarks.py). Changed to ${oc.select:artificial_analysis_api_key,null} so it resolves to null when unset — matching the default-bearing pattern used by sibling configs (e.g. labbench2_vlm's ${oc.select:judge_api_key,unset}, frontierscience_judge's ${oc.env:NVIDIA_API_KEY,null}). The override key name is unchanged, so any existing artificial_analysis_api_key config/env override still applies.

  2. responses_api_agents/critpt_agent shipped without a README.md. ng_test_all counts it as a candidate module but not a testable one, failing the total-vs-tested assertion (Mismatch on the number of total modules found (120) ... (119)) on every server-suite shard. Added a README.md describing the two-turn agent.

Testing

  • pytest tests/unit_tests/ → 290 passed; coverage 96.20% (was 95.87%)
  • Module discovery: 120 candidates == 120 testable
  • critpt server test is unaffected (it constructs its config directly with a dummy key and never resolves the YAML interpolation)

🤖 Generated with Claude Code

… README

Two pre-existing failures on main (from CritPt #1588) surfaced in CI once this
branch was merged with main:

- resources_servers/critpt/configs/critpt.yaml referenced ${artificial_analysis_api_key}
  with no definition or default, so list_benchmarks() failed to resolve it
  (test_benchmarks.py::test_lists_found_benchmarks), which also dropped unit-test
  coverage below the 96% gate. Use ${oc.select:artificial_analysis_api_key,null}
  so it resolves to null when unset, matching the pattern used by sibling configs.
- responses_api_agents/critpt_agent had no README.md, so ng_test_all counted it as a
  candidate module but not a testable one, failing the total-vs-tested assertion on
  every server-suite shard. Add a README describing the two-turn agent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Lin Jia <linj@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 18, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants