[codex] tighten Expo eval follow-ups #382

Draft
grabbou wants to merge 8 commits into main from codex/expo-eval-followups

@grabbou grabbou commented Jun 24, 2026

Contributor

Summary

This PR tightens the Expo eval expansion after review against current SDK 56 docs and local runner behavior.

  • Aligns the testbench dependencies with Expo's current SDK 56-compatible versions via expo install --fix.
  • Fixes the Expo Router data-loader eval to use unstable_useServerDataLoaders instead of unrelated asyncRoutes config.
  • Makes the inline-modules prebuild/build note judgeable by adding an explicit input/reference note file.
  • Corrects the Expo Modules web-platform eval to use platforms: ["apple", "android", "web"] plus .web.ts platform resolution, avoiding unsupported web.modules config.
  • Tightens the MediaLibrary reference to create a real .png file, verify file.exists, and avoid double-adding the asset to an album.
  • Seeds static Expo Modules evals with the config/native/script files their requirements ask solvers to edit.
  • Adds missing official Expo SDK source links and clarifies whitepaper wording around requirement weights and inputs.files metadata.
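The web-platform fix in the list above can be illustrated with a small check. This is a minimal sketch, not code from this PR: `ModuleConfig` and `findWebConfigProblems` are hypothetical names, and the only facts taken from the summary are that the corrected eval lists `web` in `platforms` and does not rely on a `web.modules` key.

```typescript
// Hypothetical subset of the module config fields this sketch inspects.
interface ModuleConfig {
  platforms?: string[];
  web?: { modules?: unknown };
}

// Returns human-readable problems; an empty array means the config matches
// the shape described in the summary.
function findWebConfigProblems(config: ModuleConfig): string[] {
  const problems: string[] = [];
  const platforms = config.platforms ?? [];
  if (!platforms.includes("web")) {
    problems.push('platforms should include "web"');
  }
  if (config.web !== undefined && config.web.modules !== undefined) {
    problems.push("web.modules is not a supported key");
  }
  return problems;
}

// Shape the corrected eval expects.
const corrected: ModuleConfig = { platforms: ["apple", "android", "web"] };

// Shape the eval previously relied on (illustrative values only).
const rejected: ModuleConfig = {
  platforms: ["apple", "android"],
  web: { modules: ["ExampleModule"] },
};

console.log(findWebConfigProblems(corrected)); // []
console.log(findWebConfigProblems(rejected).length); // 2
```

A check like this only covers the config file; the `.web.ts` platform resolution mentioned above is handled by the bundler and is not modeled here.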

Validation

  • bun lint
  • bun test runner/evaluators/llm/tests/discovery.test.ts runner/evaluators/llm/tests/requirements.test.ts runner/evaluators/llm/tests/files.test.ts
  • bunx expo install --check from testbench/
  • bun runner/run.ts --pattern "evals/expo-sdk/17-rn-expo-media-library-file-asset-create" --model noop --output /tmp/evals-expo-sdk-17
  • bun runner/run.ts --pattern "evals/expo-router/09-rn-expo-router-data-loaders-config" --model noop --output /tmp/evals-expo-router-09
  • bun runner/run.ts --pattern "evals/expo-modules/**" --model noop --output /tmp/evals-expo-modules
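The three `runner/run.ts` invocations above share one shape, which could be captured as follows. This is only a sketch: `buildNoopRunArgs` is a hypothetical helper that does not exist in the repository, and the flags are copied from the commands listed above rather than from the runner's source.

```typescript
// Builds the argument list for one noop-model run, mirroring the commands above.
function buildNoopRunArgs(pattern: string, outputDir: string): string[] {
  return ["runner/run.ts", "--pattern", pattern, "--model", "noop", "--output", outputDir];
}

// Pattern and output directory pairs taken from the Validation list.
const runs: Array<[string, string]> = [
  ["evals/expo-sdk/17-rn-expo-media-library-file-asset-create", "/tmp/evals-expo-sdk-17"],
  ["evals/expo-router/09-rn-expo-router-data-loaders-config", "/tmp/evals-expo-router-09"],
  ["evals/expo-modules/**", "/tmp/evals-expo-modules"],
];

for (const [pattern, outputDir] of runs) {
  const args = buildNoopRunArgs(pattern, outputDir);
  // Printed for inspection only; nothing is executed here.
  console.log(["bun", ...args].join(" "));
}
```

Passing the arguments as an array to a process spawner, rather than as one shell string, would keep a shell from expanding the `**` glob before the runner sees it.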

Notes

I intentionally did not change the broader whitepaper claim about the historical 66-eval result snapshot versus the current 151-eval inventory, per review direction.
