Skip to content

BOT: Fix #1022: allow pairwise comparisons with two models#1075

Closed
nikosbosse wants to merge 1 commit intomainfrom
fix/1022-pairwise-two-models
Closed

BOT: Fix #1022: allow pairwise comparisons with two models#1075
nikosbosse wants to merge 1 commit intomainfrom
fix/1022-pairwise-two-models

Conversation

@nikosbosse
Copy link
Copy Markdown
Collaborator

Summary

  • Fixes Let get_pairwise_comparison() run with two models? #1022: get_pairwise_comparisons() now warns instead of erroring when there are exactly 2 models with one as the baseline
  • Changed cli_abort() to cli_warn() for the two-model-with-baseline case, allowing the function to proceed and compute the single score ratio
  • Fixed the "compairisons" typo in the warning message
  • The single-model case (truly insufficient comparators) still errors via pairwise_comparison_one_group()

Root cause

The check length(setdiff(comparators, baseline)) < 2 at line 161 of R/pairwise-comparisons.R called cli_abort(), blocking the legitimate use case of comparing 2 models where one is the baseline. While the pairwise comparison is just a single ratio in this case, it's still useful output.

What the fix does

  • Adds !is.null(baseline) guard so the check only applies when a baseline is specified
  • Changes cli_abort() to cli_warn() so the function proceeds with a warning
  • Improves the warning message to be more informative

Test coverage added

  • get_pairwise_comparisons() warns but works with two models and a baseline
  • add_relative_skill() warns but works with two models and a baseline
  • Two-model ratio is mathematically correct (deterministic test with known values)
  • Single-model case still errors (regression test)
  • Two models without baseline still works without warning (regression test)
  • Updated existing expect_error to expect_warning for the two-model-with-baseline case

Test plan

  • New tests pass
  • Full test suite passes (695 tests, 0 failures)
  • R CMD check: 0 errors, 0 warnings, 2 pre-existing notes

🤖 Generated with Claude Code

…aseline

The check at line 161 of pairwise-comparisons.R called cli_abort() when
fewer than 2 non-baseline models existed, blocking the legitimate case of
comparing exactly 2 models where one is the baseline. Changed to cli_warn()
so the function proceeds with a warning instead of erroring. Also fixed the
"compairisons" typo and improved the warning message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.83%. Comparing base (ac0c01a) to head (92468f7).
⚠️ Report is 93 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1075   +/-   ##
=======================================
  Coverage   97.83%   97.83%           
=======================================
  Files          35       35           
  Lines        1845     1845           
=======================================
  Hits         1805     1805           
  Misses         40       40           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

nikosbosse added a commit that referenced this pull request Feb 13, 2026
Copy link
Copy Markdown
Collaborator Author

@nikosbosse nikosbosse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLAUDE: Approving (posted as comment since self-approval is blocked). Clean, correct fix. The !is.null(baseline) guard is well-considered — it ensures the single-model-without-baseline case still errors downstream at pairwise_comparison_one_group(). All edge cases (1 model, 2 models with/without baseline) are handled correctly. The deterministic mathematical correctness test (Test 3) with known score ratios is excellent. Typo fix and improved warning message are appropriate. No issues found.

@nikosbosse nikosbosse marked this pull request as draft February 13, 2026 08:26
@nikosbosse nikosbosse changed the title Fix #1022: allow pairwise comparisons with two models BOT: Fix #1022: allow pairwise comparisons with two models Feb 13, 2026
@nikosbosse nikosbosse closed this Apr 4, 2026
@nikosbosse
Copy link
Copy Markdown
Collaborator Author

Duplicated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Let get_pairwise_comparison() run with two models?

1 participant