Conversation
**Greptile Summary**

This PR adds a MagpieTTS text-to-speech inference stage. Several P1 defects flagged in prior review threads remain unaddressed in the current code:

- `speaker` always hardcoded to Sofia
- `setup_on_node` guard never short-circuits
- missing `makedirs` before `sf.write`
- conflicting `onnxruntime`/`onnxruntime-gpu` entries in `pyproject.toml`

**Confidence Score: 2/5** — Not safe to merge: all four P1 findings flagged in previous review threads are still present in the code as submitted, and multiple unresolved P1s pull the score below the 4/5 ceiling for P1-only PRs. `nemo_curator/stages/audio/inference/tts/magpie.py` requires the most attention; `pyproject.toml` also needs the `onnxruntime` conflict resolved.
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant User
    participant Pipeline
    participant ManifestReader
    participant MagpieTTSStage
    participant DownstreamStages
    participant ManifestWriterStage
    User->>Pipeline: run()
    Pipeline->>ManifestReader: process(manifest.jsonl)
    ManifestReader-->>Pipeline: AudioTask(text, language)
    Pipeline->>MagpieTTSStage: setup_on_node() [downloads model]
    Pipeline->>MagpieTTSStage: setup() [loads model to GPU]
    Pipeline->>MagpieTTSStage: process(AudioTask)
    MagpieTTSStage->>MagpieTTSStage: generate_audio_file(transcript, language)
    Note over MagpieTTSStage: do_tts() with speaker_index=ClassVar[Sofia]<br/>(ignores instance speaker field)
    MagpieTTSStage->>MagpieTTSStage: sf.write(audio_filepath, ...)
    Note over MagpieTTSStage: FileNotFoundError if generated_audio_dir<br/>does not exist
    MagpieTTSStage-->>Pipeline: AudioTask(text, language, audio_filepath)
    Pipeline->>DownstreamStages: process(AudioTask)
    DownstreamStages-->>Pipeline: AudioTask (enriched)
    Pipeline->>ManifestWriterStage: write(result.jsonl)
```
Reviews (4): Last reviewed commit: "Rebase main"
```python
speaker_map: ClassVar[dict[str, int]] = {"John": 0, "Sofia": 1, "Aria": 2, "Jason": 3, "Leo": 4}
speaker: str = "Sofia"
speaker_idx: ClassVar[int] = speaker_map["Sofia"]
```

**speaker field is silently ignored — always uses Sofia**

`speaker_idx` is a ClassVar set once at class-definition time to `speaker_map["Sofia"]` (value 1). Because it is a class variable, reassigning `speaker` at instantiation (e.g. `MagpieTTSStage(..., speaker="John")`) never changes `speaker_idx`. Every call to `generate_audio_file` therefore passes `speaker_index=1` (Sofia) regardless of what the user configured, making the `speaker` field completely non-functional.
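The pitfall reproduces standalone; a minimal sketch (class and field names simplified from the PR) showing that a ClassVar computed while the class body executes never tracks a per-instance field:

```python
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class Stage:
    # Simplified mirror of the MagpieTTSStage pattern.
    speaker_map: ClassVar[dict[str, int]] = {"John": 0, "Sofia": 1}
    speaker: str = "Sofia"
    # Evaluated exactly once, at class-definition time: frozen at 1 (Sofia).
    speaker_idx: ClassVar[int] = speaker_map["Sofia"]

stage = Stage(speaker="John")
print(stage.speaker)                     # John
print(stage.speaker_idx)                 # 1 (still Sofia's index)
print(stage.speaker_map[stage.speaker])  # 0: dynamic lookup tracks the instance
```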
Fix by resolving the index dynamically from the instance field:
```python
speaker_map: ClassVar[dict[str, int]] = {"John": 0, "Sofia": 1, "Aria": 2, "Jason": 3, "Leo": 4}
speaker: str = "Sofia"
# Remove the ClassVar speaker_idx — derive it in generate_audio_file

def generate_audio_file(self, transcript: str, language: str):
    speaker_index = self.speaker_map[self.speaker]
    audio, audio_len = self._tts_model.do_tts(
        transcript, language=language, apply_TN=False, speaker_index=speaker_index
    )
    return audio, audio_len
```

```python
def setup_on_node(
    self, _node_info: NodeInfo | None = None, _worker_metadata: WorkerMetadata | None = None
) -> None:
    if self._tts_model:
        return
    try:
        kwargs: dict[str, Any] = {"model_name": self.model_name, "return_model_file": True}
        if self.cache_dir is not None:
            kwargs["cache_dir"] = self.cache_dir
        MagpieTTSModel.from_pretrained(**kwargs)
    except Exception as e:
        msg = f"Failed to download {self.model_name}"
        raise RuntimeError(msg) from e
```
**setup_on_node guard never short-circuits**

The early-return guard `if self._tts_model: return` checks an instance field that is never assigned inside `setup_on_node` — the return value of `MagpieTTSModel.from_pretrained(...)` is discarded. `_tts_model` therefore remains None, so the guard never fires and the method attempts a fresh model download on every invocation. Either store the return value or drop the guard.
Suggested change (drop the ineffective guard):

```python
def setup_on_node(
    self, _node_info: NodeInfo | None = None, _worker_metadata: WorkerMetadata | None = None
) -> None:
    try:
        kwargs: dict[str, Any] = {"model_name": self.model_name, "return_model_file": True}
        if self.cache_dir is not None:
            kwargs["cache_dir"] = self.cache_dir
        MagpieTTSModel.from_pretrained(**kwargs)
    except Exception as e:
        msg = f"Failed to download {self.model_name}"
        raise RuntimeError(msg) from e
```
```python
def generate_audio_file(self, transcript: str, language: str) -> str:
    """
    Generate an audio file from the transcript.
    """
    audio, audio_len = self._tts_model.do_tts(
        transcript, language=language, apply_TN=False, speaker_index=self.speaker_idx
    )
    return audio, audio_len
```
**Incorrect return-type annotation on generate_audio_file**

The annotation says `-> str` but the function returns a two-element tuple `(audio, audio_len)`. Callers (including `process`) correctly unpack it as a tuple, so the runtime behaviour is fine, but the annotation is misleading.
Suggested change:

```python
def generate_audio_file(self, transcript: str, language: str) -> tuple:
```
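A parameterized annotation is more informative than bare `tuple` when the element types are known; a toy sketch of the shape (element types here are illustrative stand-ins, not confirmed by the PR):

```python
def generate_audio(transcript: str) -> tuple[list[float], int]:
    """Toy stand-in returning (samples, num_samples), one sample per character."""
    samples = [0.0] * len(transcript)
    return samples, len(samples)

audio, audio_len = generate_audio("hello")
print(audio_len)  # 5
```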
```toml
"onnxruntime>=1.20.1,<1.24",  # Python bindings (split from GPU package in 1.21+)
"onnxruntime-gpu>=1.20.1,<1.24; platform_machine == 'x86_64'",  # CUDA providers
```
**Conflicting onnxruntime / onnxruntime-gpu entries will break installation**

Both `onnxruntime` (CPU-only bindings) and `onnxruntime-gpu` (CUDA-enabled) install into the same `onnxruntime` Python namespace. Declaring both in the same extras group will cause pip/uv to flag a conflict or silently let one overwrite the other, potentially stripping CUDA support. The `onnxruntime-gpu` package already ships the Python bindings; the separate `onnxruntime` entry is redundant and harmful here.
If the goal is to guarantee the pure-Python `onnxruntime` module is installed even on non-x86_64 platforms, scope it with the inverse platform marker:

```toml
"onnxruntime>=1.20.1,<1.24; platform_machine != 'x86_64'",      # CPU-only (non-x86)
"onnxruntime-gpu>=1.20.1,<1.24; platform_machine == 'x86_64'",  # CUDA providers (x86)
```

Otherwise, drop the plain `onnxruntime` line entirely.
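Per PEP 508, the `platform_machine` marker compares against the value the standard library exposes as `platform.machine()`, so the two inverse markers are mutually exclusive by construction. A quick standalone check of which branch a given interpreter would take:

```python
import platform

machine = platform.machine()  # the value PEP 508 uses for platform_machine
picks_gpu = (machine == "x86_64")
picks_cpu = (machine != "x86_64")
print(machine, "-> onnxruntime-gpu" if picks_gpu else "-> onnxruntime")
# Exactly one of the two marker conditions holds on any platform:
assert picks_gpu != picks_cpu
```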
Force-pushed: 052f542 to 17a4a08
```python
def setup(self, _: WorkerMetadata | None = None) -> None:
    if not self._tts_model:
        try:
            kwargs: dict[str, Any] = {"model_name": self.model_name}
            if self.cache_dir is not None:
                kwargs["cache_dir"] = self.cache_dir
            self._tts_model = MagpieTTSModel.from_pretrained(**kwargs)
        except Exception as e:
            msg = f"Failed to load {self.model_name}"
            raise RuntimeError(msg) from e
    self._tts_model = self._tts_model.to(self._device)
    logger.info(f"[{self.name}] Initialized MagpieTTS on {self._device}")
```
**generated_audio_dir never guaranteed to exist before sf.write**

`setup()` loads and moves the model but never ensures `generated_audio_dir` exists. When `process()` later calls `sf.write(audio_filepath, ...)` (line 135), soundfile will raise FileNotFoundError if the user-supplied directory path has not been created beforehand. The class API gives no indication that callers must pre-create this directory, so this will silently break direct usage of the stage. Add a `makedirs` call at the end of `setup()`:
Suggested change:

```python
def setup(self, _: WorkerMetadata | None = None) -> None:
    if not self._tts_model:
        try:
            kwargs: dict[str, Any] = {"model_name": self.model_name}
            if self.cache_dir is not None:
                kwargs["cache_dir"] = self.cache_dir
            self._tts_model = MagpieTTSModel.from_pretrained(**kwargs)
        except Exception as e:
            msg = f"Failed to load {self.model_name}"
            raise RuntimeError(msg) from e
    self._tts_model = self._tts_model.to(self._device)
    os.makedirs(self.generated_audio_dir, exist_ok=True)
    logger.info(f"[{self.name}] Initialized MagpieTTS on {self._device}")
```
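The failure mode is generic to any file writer, not soundfile-specific, so it can be demonstrated without the library. A minimal sketch using a temporary directory and plain `open` in place of `sf.write`:

```python
import os
import tempfile

root = tempfile.mkdtemp()
target = os.path.join(root, "generated_audio", "utt_0.wav")

# Without the parent directory, any writer fails the same way sf.write does.
try:
    with open(target, "wb") as f:
        f.write(b"\x00")
    created_without_parent = True
except FileNotFoundError:
    created_without_parent = False
print("parent missing ->", "ok" if created_without_parent else "FileNotFoundError")

# exist_ok=True makes the call idempotent across workers and retries.
os.makedirs(os.path.dirname(target), exist_ok=True)
os.makedirs(os.path.dirname(target), exist_ok=True)  # second call is a no-op
with open(target, "wb") as f:
    f.write(b"\x00")
print(os.path.exists(target))  # True
```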
Signed-off-by: Fan Qian <fqian@nvidia.com>
Force-pushed: 17a4a08 to 7a0f7bb
Description

Usage

```python
# Add snippet demonstrating usage
```

Checklist