feat: upgrade MiniMax default model to M3 (512K context, 128K output) #49
octo-patch wants to merge 1 commit into
Hi, `open_mythos.__init__.__all__` includes "load_tokenizer" and "get_vocab_size", but neither symbol is imported into the package or defined anywhere in the open_mythos package, causing AttributeError (and potentially failing ...
Severity: action required | Category: correctness
How to fix: Define or remove exported names
Found by Qodo code review. FYI, Qodo is free for open-source.
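The kind of mismatch described in the review comment can be detected with a short check. The following is a minimal sketch, not code from this PR: `missing_exports` is a hypothetical helper, and `demo_pkg` is a stand-in module built in memory to reproduce the reported situation rather than the real `open_mythos` package.

```python
import types


def missing_exports(module):
    """Return the names listed in module.__all__ that the module does not define."""
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]


# Stand-in module (assumption): one real symbol plus two names that are
# exported in __all__ but never defined, as in the review comment.
demo = types.ModuleType("demo_pkg")
demo.MythosTokenizer = object
demo.__all__ = ["MythosTokenizer", "load_tokenizer", "get_vocab_size"]

missing = missing_exports(demo)
print(missing)  # ['load_tokenizer', 'get_vocab_size']
```

A check like this could run as a unit test against the actual package, so that a name added to `__all__` without a matching import or definition fails early instead of surfacing on `from package import *`.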
- Add MiniMax-M3 variant config (minimax_m3_config) with 512K context and 128K max output
- Update HuggingFace model ID constant to MiniMaxAI/MiniMax-M3
- Replace older minimax_m2_config / MINIMAX_M2_MODEL_ID exports with M3 equivalents
- Update unit tests to validate the M3 dimensions (524 288 ctx, 131 072 output)
- Preserve all existing structural assumptions: MLA head_dim=128, 32 routed + 2 shared experts (top-4), 6x GQA ratio, rope_theta=10M
ac39fd2 to 19537ac
Summary
This PR adds first-class support for the MiniMax-M3 architecture in OpenMythos:
- `minimax_m3_config()`: a new `MythosConfig` factory in `open_mythos/variants.py` whose structural dimensions mirror MiniMax-M3's design, with `dim=6144`, 48 query / 8 KV heads (6× GQA), `head_dim=128` via MLA rope+nope split (64+64), 32 routed + 2 shared experts with top-4 activation, 524 288-token (512K) context, 131 072-token (128K) max output, and `rope_theta=10_000_000` for long-context stability.
- `MINIMAX_M3_MODEL_ID`: a constant (`"MiniMaxAI/MiniMax-M3"`) added to `open_mythos/tokenizer.py` for convenient use with `MythosTokenizer`, whose `vocab_size=200064` matches the config.
- [...] `open_mythos` package.
- `tests/test_minimax_m3.py`: unit tests covering config dimensions (including the new 512K context window and 128K max output), MoE structure (32 routed + 2 shared + top-4), MLA cache compression, LTI spectral radius stability, high `rope_theta` numerical safety, and full forward-pass / generation correctness with synthetic tensors on CPU.

All new tests are designed to run on CPU with synthetic tensors. No changes to existing code paths.
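The derived figures quoted in the list above (6× GQA, `head_dim=128`, 524 288 and 131 072 tokens) follow from simple arithmetic on the base dimensions. The sketch below spells that out using a stand-alone dataclass. `M3Dims` and all of its field names are hypothetical and chosen for illustration; the real `MythosConfig` may name and structure these fields differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class M3Dims:
    # Hypothetical field names; values are the ones quoted in the PR description.
    dim: int = 6144
    n_query_heads: int = 48
    n_kv_heads: int = 8
    rope_head_dim: int = 64
    nope_head_dim: int = 64
    n_routed_experts: int = 32
    n_shared_experts: int = 2
    top_k: int = 4
    max_context: int = 512 * 1024  # 512K tokens
    max_output: int = 128 * 1024   # 128K tokens
    rope_theta: float = 10_000_000.0
    vocab_size: int = 200064

    @property
    def gqa_ratio(self) -> int:
        # Number of query heads sharing each KV head.
        return self.n_query_heads // self.n_kv_heads

    @property
    def head_dim(self) -> int:
        # MLA splits each head into a rotary (rope) and a non-rotary (nope) part.
        return self.rope_head_dim + self.nope_head_dim


dims = M3Dims()
print(dims.gqa_ratio)    # 6
print(dims.head_dim)     # 128
print(dims.max_context)  # 524288
print(dims.max_output)   # 131072
```

Keeping the derived values as properties rather than stored fields means they cannot drift out of sync with the base dimensions, which is one way a test suite could validate the quoted numbers.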
Why MiniMax-M3
MiniMax-M3 is the latest MiniMax release, with a 512K context window, up to 128K output tokens, and image-input support. Replacing the earlier draft (M2.7) with M3 keeps OpenMythos's reference config aligned with the current default model.
Usage