docs: add Supported Patterns section to README#1155

Open
vominh1919 wants to merge 2 commits into PrimeIntellect-ai:main from vominh1919:docs/1078-supported-patterns

vominh1919 commented Apr 16, 2026

Closes #1078

Adds a "Supported Patterns" reference table to the README listing all RL environment design patterns that verifiers supports, with links to relevant documentation.

Covers:

  • Single-turn / Multi-turn interactions
  • Native tool parsing / MCP tools
  • Harness-in-sandbox / outside sandbox / no sandbox
  • Groupwise rewards / weighted rewards / intermediate metrics
  • Multiple environments (RubricGroup)
  • Custom metrics / error handling
  • Offline evals
  • Resource management
  • Stateful interactions
  • Prompt optimization (GEPA)

Based on the pattern list from @willccbb's tweet (https://x.com/i/status/2037734454459089027).


Note

Low Risk
Mostly documentation, plus a small defensive change to dataset generation in TextArenaEnv that should only broaden compatibility.

Overview
Adds a new Supported Patterns section to README.md, providing a quick-reference table of common RL environment design patterns and linking each to the relevant docs/classes.

Fixes TextArenaEnv.ta_to_hf() to handle TextArena games whose word_list is a dict of categorized word lists by flattening it before sampling answers.

Reviewed by Cursor Bugbot for commit f5b4027. Bugbot is set up for automated code reviews on this repo.

…nv.ta_to_hf()

Games like TwentyQuestions-v0 use categorized word lists (a dict whose
category keys map to word lists). ta_to_hf() called random.choice()
directly on word_list, which raises KeyError when it is a dict.

Now flattens dict word lists into a single list before sampling.
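
The flattening step described above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `flatten_word_list` and `sample_answers` are hypothetical helper names, and the real `ta_to_hf()` body is not shown on this page, so the way it samples answers is an assumption.

```python
from __future__ import annotations

import random
from typing import Mapping, Sequence, Union

# Hypothetical alias: a word list is either flat or keyed by category.
WordList = Union[Sequence[str], Mapping[str, Sequence[str]]]


def flatten_word_list(word_list: WordList) -> list[str]:
    """Return a flat list of words from a list or a category -> words dict."""
    if isinstance(word_list, Mapping):
        # Categorized form: concatenate every category's words in dict order.
        return [word for words in word_list.values() for word in words]
    # Already flat: copy so callers can mutate the result safely.
    return list(word_list)


def sample_answers(word_list: WordList, n: int, seed: int = 0) -> list[str]:
    """Sample n answers (with replacement) from the flattened word list."""
    rng = random.Random(seed)
    flat = flatten_word_list(word_list)
    # random.choice needs an indexable sequence; indexing a dict by int
    # is what raised KeyError before flattening.
    return [rng.choice(flat) for _ in range(n)]


categorized = {"animals": ["cat", "dog"], "foods": ["bread", "rice"]}
print(flatten_word_list(categorized))  # ['cat', 'dog', 'bread', 'rice']
print(sample_answers(categorized, 3))
```

Flattening once up front keeps the sampling code identical for both shapes of `word_list`, at the cost of discarding the category labels.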

Fixes PrimeIntellect-ai#1074

Add a reference table listing all RL environment design patterns that
verifiers supports, with links to relevant documentation for each.

Covers: single-turn, multi-turn, native tool parsing, MCP tools,
sandbox/no-sandbox modes, groupwise rewards, weighted rewards,
intermediate metrics, multiple environments, custom metrics,
offline evals, resource management, stateful interactions, and GEPA.

Closes PrimeIntellect-ai#1078
Development

Successfully merging this pull request may close these issues.

Add "Supported Patterns" section to README or docs