docs: Add real-world examples of Curator for Nemotron datasets and SE…#1847
Pritiks23 wants to merge 10 commits into NVIDIA-NeMo:main
…S AI Chemistry LLM
Greptile Summary
This PR adds a "Curator in Action" section to
Confidence Score: 3/5
Not safe to merge: the directive structure in the Concepts grid is broken, and orphaned card content remains in the file.
Two P1 structural issues flagged in prior review rounds remain unresolved: a duplicate Image card opener (lines 57-58) and orphaned card directives (lines 98-118).
docs/about/index.md: lines 57-58 (duplicate opener) and lines 98-118 (orphaned directives) require cleanup before the page renders correctly.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["docs/about/index.md"] --> B["## Concepts grid block\n(lines 47-79)"]
B --> C["Text card ✅"]
B --> D["Image card opener ×2\n(lines 57-58) ⚠️"]
D --> E["New: Image card with options ✅"]
D --> F["New: Video card ✅"]
D --> G["New: Audio card ✅"]
D --> H["Grid closer ::::"]
A --> I["## Curator in Action section\n(lines 81-97) ✅ new content"]
A --> J["Orphaned Image :link: options\n(lines 98-99) ⚠️"]
A --> K["Orphaned Image/Video/Audio cards\n(lines 101-118) ⚠️"]
Reviews (5): Last reviewed commit: "Update docs/about/index.md"
jgerh left a comment
Completed the tech pubs review and provided a few copyedits.
:link: about-concepts-image
:link-type: ref

Explore key concepts for image data curation, including scalable loading, processing (embedding, classification, filtering, deduplication), and dataset export.
:::

:::{grid-item-card} {octicon}`video;1.5em;sd-mr-1` Video Curation Concepts
:link: about-concepts-video
:link-type: ref

Discover video data curation concepts, such as distributed processing, pipeline stages, execution modes, and efficient data flow.
:::

:::{grid-item-card} {octicon}`unmute;1.5em;sd-mr-1` Audio Curation Concepts
:link: about-concepts-audio
:link-type: ref

Learn about speech data curation, ASR inference, quality assessment, and audio-text integration workflows.
:::

::::
Verify the orphaned/duplicated grid-card markup from lines 58-78 that was left appended after the SES AI paragraph (lines 97-117).
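One way to run this kind of check locally is a small script that walks the colon-fence directives in the file and reports openers, closers, and options that do not line up. The sketch below is only an illustration: `check_colon_fences` is a hypothetical helper, not part of NeMo Curator, MyST, or Sphinx, and it assumes a simplified rule in which a closing fence must have exactly the same number of colons as its opener. The real MyST parser is more permissive, so treat the output as hints rather than a verdict.

```python
import re

# Simplified patterns for MyST colon-fence markup (an assumption, not the MyST grammar).
OPENER = re.compile(r"^(:{3,})\{([^}]+)\}")  # e.g. :::{grid-item-card} Title
CLOSER = re.compile(r"^(:{3,})\s*$")         # e.g. ::: or ::::
OPTION = re.compile(r"^:[\w-]+:(\s|$)")      # e.g. :link: about-concepts-image


def check_colon_fences(text):
    """Return sorted (line_number, message) pairs for suspicious fence structure."""
    problems = []
    stack = []  # open directives as (colon_count, name, line_number)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        opener = OPENER.match(line)
        closer = CLOSER.match(line)
        if opener:
            colons, name = len(opener.group(1)), opener.group(2)
            # A nested directive should use fewer colons than its parent.
            if stack and colons >= stack[-1][0]:
                parent = stack[-1]
                problems.append(
                    (lineno, f"{name} opened before {parent[1]} (line {parent[2]}) was closed")
                )
            stack.append((colons, name, lineno))
        elif closer:
            colons = len(closer.group(1))
            if colons in [depth for depth, _, _ in stack]:
                # Close the nearest matching opener; anything above it was left open.
                while stack[-1][0] != colons:
                    _, name, opened_at = stack.pop()
                    problems.append((opened_at, f"{name} never closed"))
                stack.pop()
            else:
                problems.append((lineno, "closing fence with no matching opener"))
        elif OPTION.match(line) and not stack:
            problems.append((lineno, "directive option outside any directive"))
    for _, name, opened_at in stack:
        problems.append((opened_at, f"{name} never closed"))
    return sorted(problems)
```

Calling `check_colon_fences(open("docs/about/index.md").read())` and printing each pair would list candidate lines to inspect. Note that this sketch flags a doubled card opener and stray `:link:` options, but a complete card that sits outside the grid is still well-formed markup and would not be reported.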
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Signed-off-by: Pritika Vipin <65793273+Pritiks23@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Pritika Vipin <65793273+Pritiks23@users.noreply.github.com>
Description
Adds real-world examples to the documentation showing how NeMo Curator is used to build Nemotron datasets, including the LANL/NVIDIA collaboration and the SES AI Chemistry LLM use case.
Closes #1548.
Usage
See docs/about/index.md for real-world usage scenarios and references.