
Add synthetic audio sanity check skill#1874

Open
fqian1107 wants to merge 2 commits into main from audio_curator_skills

Conversation

@fqian1107

Description

Usage

    # Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

@fqian1107 fqian1107 requested review from a team as code owners April 27, 2026 11:55
@fqian1107 fqian1107 requested review from suiyoubi and removed request for a team April 27, 2026 11:55
@copy-pr-bot

copy-pr-bot Bot commented Apr 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR adds a MagpieTTSStage dataclass that synthesises audio from text using the Magpie TTS model, along with a Claude skill (curator-synth-audio-sanity) that drives an end-to-end pipeline sanity check using synthetic audio. The SKILL.md is well-structured and its class references (ManifestReader, ManifestWriterStage) match the real codebase.

Several P1 defects flagged in prior review threads remain unaddressed in the current code: speaker always resolves to Sofia (ClassVar never updated from the instance field), setup_on_node guard never fires (return value of from_pretrained is discarded so _tts_model stays None), and sf.write will raise FileNotFoundError because generated_audio_dir is never created in setup(). The conflicting onnxruntime / onnxruntime-gpu entries in pyproject.toml are also still present.

Confidence Score: 2/5

Not safe to merge — multiple P1 defects from prior review remain unresolved in the current code.

Four P1 findings flagged in previous review threads are still present in the code as submitted: speaker always hardcoded to Sofia, setup_on_node guard never short-circuits, missing makedirs before sf.write, and conflicting onnxruntime/onnxruntime-gpu in pyproject.toml. Multiple unresolved P1s pull the score below the 4/5 ceiling for P1-only PRs.

nemo_curator/stages/audio/inference/tts/magpie.py requires the most attention; pyproject.toml also needs the onnxruntime conflict resolved.

Important Files Changed

Filename Overview
nemo_curator/stages/audio/inference/tts/magpie.py New MagpieTTS stage; multiple P1 defects flagged in prior review: speaker field silently ignored (ClassVar never updated), setup_on_node guard never fires (return value discarded), generated_audio_dir not created before sf.write, and wrong return-type annotation on generate_audio_file — all still present in current code.
pyproject.toml Adds nemo_toolkit[tts] to audio_common (heavy dep bloat for all audio users), and onnxruntime alongside onnxruntime-gpu (conflicting packages, flagged P1 in prior review); merge conflict markers reported earlier are resolved in current file.
.claude/skills/curator-synth-audio-sanity/SKILL.md New Claude skill defining an 8-step end-to-end audio pipeline sanity-check workflow; class names (ManifestReader, ManifestWriterStage) and constructor args match the actual codebase; backend name "xenna" is valid; no blocking issues.
nemo_curator/stages/audio/inference/tts/__init__.py Empty __init__.py to make the tts directory a package; no issues.
uv.lock Auto-generated lock file updated to reflect new nemo-toolkit[tts] transitive dependencies (kornia, janome, pyopenjtalk, pypinyin, seaborn, pynini, etc.); no manual issues.

Sequence Diagram

sequenceDiagram
    participant User
    participant Pipeline
    participant ManifestReader
    participant MagpieTTSStage
    participant DownstreamStages
    participant ManifestWriterStage

    User->>Pipeline: run()
    Pipeline->>ManifestReader: process(manifest.jsonl)
    ManifestReader-->>Pipeline: AudioTask(text, language)
    Pipeline->>MagpieTTSStage: setup_on_node() [downloads model]
    Pipeline->>MagpieTTSStage: setup() [loads model to GPU]
    Pipeline->>MagpieTTSStage: process(AudioTask)
    MagpieTTSStage->>MagpieTTSStage: generate_audio_file(transcript, language)
    Note over MagpieTTSStage: do_tts() with speaker_index=ClassVar[Sofia]<br/>(ignores instance speaker field)
    MagpieTTSStage->>MagpieTTSStage: sf.write(audio_filepath, ...)
    Note over MagpieTTSStage: FileNotFoundError if generated_audio_dir<br/>does not exist
    MagpieTTSStage-->>Pipeline: AudioTask(text, language, audio_filepath)
    Pipeline->>DownstreamStages: process(AudioTask)
    DownstreamStages-->>Pipeline: AudioTask (enriched)
    Pipeline->>ManifestWriterStage: write(result.jsonl)

Reviews (4): Last reviewed commit: "Rebase main"

Comment on lines +61 to +63
    speaker_map: ClassVar[dict[str, int]] = {"John": 0, "Sofia": 1, "Aria": 2, "Jason": 3, "Leo": 4}
    speaker: str = "Sofia"
    speaker_idx: ClassVar[int] = speaker_map["Sofia"]
Contributor


P1 speaker field is silently ignored — always uses Sofia

speaker_idx is a ClassVar set once at class-definition time to speaker_map["Sofia"] (value 1). Because it is a class variable, reassigning speaker at instantiation (e.g. MagpieTTSStage(..., speaker="John")) never changes speaker_idx. Every call to generate_audio_file therefore passes speaker_index=1 (Sofia) regardless of what the user configured, making the speaker field completely non-functional.

Fix by resolving the index dynamically from the instance field:

    speaker_map: ClassVar[dict[str, int]] = {"John": 0, "Sofia": 1, "Aria": 2, "Jason": 3, "Leo": 4}
    speaker: str = "Sofia"
    # Remove the ClassVar speaker_idx — derive it in generate_audio_file

    def generate_audio_file(self, transcript: str, language: str):
        speaker_index = self.speaker_map[self.speaker]
        audio, audio_len = self._tts_model.do_tts(
            transcript, language=language, apply_TN=False, speaker_index=speaker_index
        )
        return audio, audio_len
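The pitfall generalizes beyond this stage: a ClassVar default is evaluated once, when the class body runs, so per-instance field values can never influence it. A minimal stand-alone sketch (the Stage class below is illustrative, not the real MagpieTTSStage):

```python
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class Stage:
    # Shared lookup table, evaluated once when the class body runs
    speaker_map: ClassVar[dict[str, int]] = {"John": 0, "Sofia": 1}
    speaker: str = "Sofia"
    # Frozen at class-definition time; instance values of `speaker` never reach it
    speaker_idx: ClassVar[int] = speaker_map["Sofia"]

s = Stage(speaker="John")
print(s.speaker_idx)             # 1 -- still Sofia's index
print(s.speaker_map[s.speaker])  # 0 -- dynamic lookup honours the instance field
```

Note that dataclasses deliberately exclude ClassVar fields from `__init__`, which is why the constructor argument can never update it; resolving the index at call time, as in the fix above, is the idiomatic escape.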

Comment on lines +72 to +84
    def setup_on_node(
        self, _node_info: NodeInfo | None = None, _worker_metadata: WorkerMetadata | None = None
    ) -> None:
        if self._tts_model:
            return
        try:
            kwargs: dict[str, Any] = {"model_name": self.model_name, "return_model_file": True}
            if self.cache_dir is not None:
                kwargs["cache_dir"] = self.cache_dir
            MagpieTTSModel.from_pretrained(**kwargs)
        except Exception as e:
            msg = f"Failed to download {self.model_name}"
            raise RuntimeError(msg) from e
Contributor


P1 setup_on_node guard never short-circuits

The early-return guard if self._tts_model: return checks an instance field that is never assigned inside setup_on_node — the return value of MagpieTTSModel.from_pretrained(...) is discarded. _tts_model therefore remains None after every invocation, so the guard never fires and the method attempts a fresh model-download call on every invocation. Either store the return value or drop the guard.

Suggested change

Before:

    def setup_on_node(
        self, _node_info: NodeInfo | None = None, _worker_metadata: WorkerMetadata | None = None
    ) -> None:
        if self._tts_model:
            return
        try:
            kwargs: dict[str, Any] = {"model_name": self.model_name, "return_model_file": True}
            if self.cache_dir is not None:
                kwargs["cache_dir"] = self.cache_dir
            MagpieTTSModel.from_pretrained(**kwargs)
        except Exception as e:
            msg = f"Failed to download {self.model_name}"
            raise RuntimeError(msg) from e

After:

    def setup_on_node(
        self, _node_info: NodeInfo | None = None, _worker_metadata: WorkerMetadata | None = None
    ) -> None:
        try:
            kwargs: dict[str, Any] = {"model_name": self.model_name, "return_model_file": True}
            if self.cache_dir is not None:
                kwargs["cache_dir"] = self.cache_dir
            MagpieTTSModel.from_pretrained(**kwargs)
        except Exception as e:
            msg = f"Failed to download {self.model_name}"
            raise RuntimeError(msg) from e

Comment on lines +109 to +116
    def generate_audio_file(self, transcript: str, language: str) -> str:
        """
        Generate an audio file from the transcript.
        """
        audio, audio_len = self._tts_model.do_tts(
            transcript, language=language, apply_TN=False, speaker_index=self.speaker_idx
        )
        return audio, audio_len
Contributor


P2 Incorrect return-type annotation on generate_audio_file

The annotation says -> str but the function returns a two-element tuple (audio, audio_len). Callers (including process) correctly unpack it as a tuple, so the runtime behaviour is fine, but the annotation is misleading.

Suggested change

Before:

    def generate_audio_file(self, transcript: str, language: str) -> str:
        """
        Generate an audio file from the transcript.
        """
        audio, audio_len = self._tts_model.do_tts(
            transcript, language=language, apply_TN=False, speaker_index=self.speaker_idx
        )
        return audio, audio_len

After:

    def generate_audio_file(self, transcript: str, language: str) -> tuple:
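For reference, the mismatch is invisible at runtime because Python never enforces return annotations; only a static checker (mypy, pyright) would flag it. A stand-alone illustration, with list/int stand-ins for the real model output:

```python
def generate_audio_file(transcript: str) -> str:  # annotation claims str
    # Stand-ins for the real TTS output: (waveform samples, sample count)
    audio, audio_len = [0.0, 0.1, 0.2], 3
    return audio, audio_len  # actually returns a 2-tuple; no runtime error

result = generate_audio_file("hello")
print(type(result).__name__)  # prints: tuple
```

A more precise annotation than bare `tuple` would be e.g. `tuple[list[float], int]` (or the appropriate array/tensor types), which lets type checkers verify how callers unpack the result.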

Comment thread pyproject.toml
Comment on lines +119 to +120
"onnxruntime>=1.20.1,<1.24", # Python bindings (split from GPU package in 1.21+)
"onnxruntime-gpu>=1.20.1,<1.24; platform_machine == 'x86_64'", # CUDA providers
Contributor


P1 Conflicting onnxruntime / onnxruntime-gpu entries will break installation

Both onnxruntime (CPU-only bindings) and onnxruntime-gpu (CUDA-enabled) install into the same onnxruntime Python namespace. Declaring both in the same extras group will cause pip/uv to flag a conflict or silently let one overwrite the other, potentially stripping CUDA support. The onnxruntime-gpu package already ships the Python bindings; the separate onnxruntime entry is redundant and harmful here.

If the goal is to guarantee the pure-Python onnxruntime module is installed even on non-x86_64 platforms, scope it with the inverse platform marker:

    "onnxruntime>=1.20.1,<1.24; platform_machine != 'x86_64'",          # CPU-only (non-x86)
    "onnxruntime-gpu>=1.20.1,<1.24; platform_machine == 'x86_64'",      # CUDA providers (x86)

Otherwise, drop the plain onnxruntime line entirely.

@fqian1107 fqian1107 force-pushed the audio_curator_skills branch from 052f542 to 17a4a08 Compare April 27, 2026 11:59
Comment on lines +86 to +97
    def setup(self, _: WorkerMetadata | None = None) -> None:
        if not self._tts_model:
            try:
                kwargs: dict[str, Any] = {"model_name": self.model_name}
                if self.cache_dir is not None:
                    kwargs["cache_dir"] = self.cache_dir
                self._tts_model = MagpieTTSModel.from_pretrained(**kwargs)
            except Exception as e:
                msg = f"Failed to load {self.model_name}"
                raise RuntimeError(msg) from e
            self._tts_model = self._tts_model.to(self._device)
        logger.info(f"[{self.name}] Initialized MagpieTTS on {self._device}")
Contributor


P1 generated_audio_dir never guaranteed to exist before sf.write

setup() loads and moves the model but never ensures generated_audio_dir exists. When process() later calls sf.write(audio_filepath, ...) (line 135), soundfile will raise FileNotFoundError if the user-supplied directory path has not been created beforehand. The class API gives no indication that callers must pre-create this directory, so this will silently break direct usage of the stage. Add a makedirs call at the end of setup():

Suggested change

Before:

    def setup(self, _: WorkerMetadata | None = None) -> None:
        if not self._tts_model:
            try:
                kwargs: dict[str, Any] = {"model_name": self.model_name}
                if self.cache_dir is not None:
                    kwargs["cache_dir"] = self.cache_dir
                self._tts_model = MagpieTTSModel.from_pretrained(**kwargs)
            except Exception as e:
                msg = f"Failed to load {self.model_name}"
                raise RuntimeError(msg) from e
            self._tts_model = self._tts_model.to(self._device)
        logger.info(f"[{self.name}] Initialized MagpieTTS on {self._device}")

After:

    def setup(self, _: WorkerMetadata | None = None) -> None:
        if not self._tts_model:
            try:
                kwargs: dict[str, Any] = {"model_name": self.model_name}
                if self.cache_dir is not None:
                    kwargs["cache_dir"] = self.cache_dir
                self._tts_model = MagpieTTSModel.from_pretrained(**kwargs)
            except Exception as e:
                msg = f"Failed to load {self.model_name}"
                raise RuntimeError(msg) from e
            self._tts_model = self._tts_model.to(self._device)
        os.makedirs(self.generated_audio_dir, exist_ok=True)
        logger.info(f"[{self.name}] Initialized MagpieTTS on {self._device}")
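The failure mode is not soundfile-specific: any write into a missing parent directory raises FileNotFoundError. A stdlib-only sketch of the problem and the idempotent fix (plain `open()` stands in for `sf.write`):

```python
import os
import tempfile

# Parent exists, but the output subdirectory does not -- the state setup() leaves us in
out_dir = os.path.join(tempfile.mkdtemp(), "generated_audio")
path = os.path.join(out_dir, "utt_0001.wav")

try:
    with open(path, "wb") as f:  # same failure sf.write would hit
        f.write(b"\x00")
except FileNotFoundError:
    print("write failed: parent directory missing")

os.makedirs(out_dir, exist_ok=True)  # idempotent, so safe even if setup() runs twice
with open(path, "wb") as f:
    f.write(b"\x00")
print(os.path.exists(path))  # prints: True
```

`exist_ok=True` makes the call a no-op when the directory already exists, which is why placing it unconditionally at the end of setup() is safe.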

Signed-off-by: Fan Qian <fqian@nvidia.com>
@fqian1107 fqian1107 force-pushed the audio_curator_skills branch from 17a4a08 to 7a0f7bb Compare April 28, 2026 13:07
Signed-off-by: Fan Qian <fqian@nvidia.com>