MTP greedy parity vs llama.cpp --spec-type draft-mtp #31

@pekkah

Background

CPU MTP foundation (#25, commit a28bac4) ships a smoke test that asserts the MTP head logits are finite and non-degenerate, but it does NOT verify that the emitted tokens match the output of llama.cpp run with --spec-type draft-mtp --spec-draft-n-max 2. Without parity, we don't know whether the MTP head is wired correctly (concat order, eh_proj orientation, partial RoPE on the MTP attn block, etc.).

Scope

  1. Capture reference output from llama-cli (or llama-server greedy) on the Qwen3.6-27B-MTP file with --spec-type draft-mtp --spec-draft-n-max 2, greedy, fixed seed, fixed prompt, >=60 tokens.

  2. Capture our output on the same model + prompt + greedy, either from sharpi-cli once CLI MTP routing lands (depends on #(cli-routing-issue)) or from a test that drives InferenceEngine.GenerateAsync directly. Confirm via SHARPI_TRACE_MTP that MTP was actually used.

  3. Compare token-for-token. Expect:

    • Bit-identical when MTP is disabled (SHARPI_DISABLE_MTP=1) vs llama.cpp with --spec-draft-n-max 0.
    • Bit-identical or within Q4_K_M roundoff when MTP is enabled vs llama.cpp with --spec-draft-n-max 2. Both sides use greedy correction: the main model verifies every drafted token and replaces any it would not have picked itself, so divergences in the MTP draft path don't affect emitted tokens.
  4. If the tokens mismatch, bisect using SHARPI_TRACE_LAYERS=1 and llama.cpp's eval-callback dump. Likely culprits are documented in docs/qwen35moe-plan.md Phase 5 (concat order, eh_proj orientation, partial RoPE on MTP attn).
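
The token-for-token comparison in step 3 could be sketched as below. This is a minimal illustration in Python, not the project's test code: `first_divergence` is a hypothetical helper, and it assumes both runs have already been reduced to lists of integer token IDs. The index it returns is where the bisection in step 4 would start.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def first_divergence(reference: Sequence[int],
                     candidate: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing token, or None if identical.

    A length mismatch counts as a divergence at the end of the shorter
    sequence, because zip_longest pads the shorter side with None.
    """
    for index, (ref_tok, cand_tok) in enumerate(zip_longest(reference, candidate)):
        if ref_tok != cand_tok:
            return index
    return None


if __name__ == "__main__":
    # Made-up token IDs, purely for illustration.
    llama_cpp_tokens = [101, 202, 303, 404]
    our_tokens = [101, 202, 999, 404]
    print(first_divergence(llama_cpp_tokens, our_tokens))  # 2
```

Reporting the first differing index rather than a plain pass/fail keeps the failure message useful: everything before that index is known to match, so only the decode step at that position needs a layer-level dump.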

Acceptance criteria

  • Reference dump from llama.cpp captured and checked in under tests/fixtures/mtp_parity_27b.txt (or similar).
  • New test in Tests.ForwardPass or a new Tests.Mtp project: MtpDecoder_GreedyParity_LlamaCpp reads the fixture and asserts >= 60 byte-identical tokens.
  • Test silently skips when the 27B-MTP file isn't on disk (mirror existing HybridGdnForwardPassTests pattern).
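
The shape of the fixture-driven check, including the skip when the model is absent, might look roughly like this. It is written in Python only to illustrate the logic; the real test would live in the project's own test suite. Everything named here is an assumption: the fixture format (whitespace-separated integer token IDs, with `#` comment lines), the helpers `parse_fixture` and `check_parity`, and the `generate` callback that stands in for the engine call.

```python
import os
from typing import Callable, List

MIN_TOKENS = 60  # threshold taken from the acceptance criteria


def parse_fixture(text: str) -> List[int]:
    """Parse token IDs from fixture text (assumed format, see lead-in)."""
    tokens: List[int] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        tokens.extend(int(tok) for tok in line.split())
    return tokens


def check_parity(model_path: str,
                 reference: List[int],
                 generate: Callable[[int], List[int]]) -> str:
    """Return 'skipped', 'pass', or a 'fail: ...' message.

    `generate(n)` is a hypothetical stand-in that returns up to n
    greedily decoded token IDs from the engine under test.
    """
    if not os.path.isfile(model_path):
        return "skipped"  # model file not on disk: skip, do not fail
    if len(reference) < MIN_TOKENS:
        return f"fail: fixture holds {len(reference)} tokens, need >= {MIN_TOKENS}"
    emitted = generate(len(reference))
    for index, expected in enumerate(reference):
        if index >= len(emitted):
            return f"fail: output ended at token {index}"
        if emitted[index] != expected:
            return f"fail: token {index}: expected {expected}, got {emitted[index]}"
    return "pass"
```

Checking for the model file before touching the fixture keeps the skip path free of side effects, so machines without the 27B-MTP file never load or parse anything.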

Out of scope

  • Speedup measurement: separate issue.
  • Multimodal / vision parity: the fixture is text-only.

Related
