[benchmarking] Adds `paths` config for specifying Docker container mounts by rlratzel · Pull Request #1879 · NVIDIA-NeMo/Curator

rlratzel · 2026-04-27T21:59:03Z

This PR adds the paths section which defines files and directories used by the benchmarks.
paths replaces the individual results_path, datasets_path, and model_weights_path settings and allows users to set values that can be used by containers running on different machines with different host paths.

Each paths entry requires a name and a host_path, and all host paths must exist on the host. If running inside a Docker container started with tools/run.sh, each path is automatically mounted into the container. An optional container_path overrides the default mount point (default is host_path prefixed with /MOUNT), providing behavior similar to the previous results_path and others.

Like results_path and others, the "name" can be used as a placeholder elsewhere in the config file (e.g. {datasets_path}) and will be substituted with either host_path or container_path based on the environment.

paths is not limited to just results_path, datasets_path, and model_weights_path; users can add additional mappings if needed, but defining results_path is required as it is used by the framework.

If config files are encountered that use the deprecated results_path, datasets_path, or model_weights_path settings, a warning will be printed. If a config (or series of merged configs) uses both paths and the legacy variables, an error will be raised.

paths:
  - name: "datasets_path"
    host_path: "/path/to/datasets"
    container_path: "/datasets"
  - name: "model_weights_path"
    host_path: "/path/to/model_weights"
    container_path: "/model_weights"
  - name: "results_path"
    host_path: "/path/where/results/are/stored"
    # container_path will default to "/MOUNT/path/where/results/are/stored".

…oints. Signed-off-by: rlratzel <rratzel@nvidia.com>

…mounts Signed-off-by: rlratzel <rratzel@nvidia.com>

greptile-apps · 2026-04-27T22:03:44Z

Greptile Summary

This PR replaces the flat results_path/datasets_path/model_weights_path config fields with a structured paths list, where each entry carries a name, host_path, and optional container_path. The new design lets tools/run.sh generate Docker volume mounts dynamically for any number of user-defined paths, and centralizes validation and merging in runner/utils.py.

Confidence Score: 4/5

Safe to merge with minor follow-ups; all findings are non-blocking P2 suggestions.

No P0 or P1 issues found. Three P2 items exist: the entries missing-field check was downgraded from a hard error to a warning (silent no-op run risk), yaml.full_load_all is used where safe_load was used before in gen_runscript_vars, and host_path: null in YAML bypasses validation with an unhelpful TypeError. None of these block merging but all are worth addressing.

benchmarking/runner/utils.py — contains all three findings

Important Files Changed

Filename	Overview
benchmarking/runner/utils.py	Core validation and merging logic added; three P2 issues found: `entries` downgraded from required error to warning, `yaml.full_load_all` replaces `safe_load`, and `host_path: null` bypasses validation.
benchmarking/runner/path_resolver.py	Refactored to support `paths` list with name/host_path/container_path entries; legacy field handling preserved; `volume_mount_pairs()` iterator added; logic is clean.
benchmarking/runner/entry.py	Removed `allow_other_placeholders` parameter from `substitute_container_or_host_paths`; no behavior change since both existing callers already used `allow_other_placeholders=False`.
benchmarking/runner/session.py	Removed `assert_valid_config_dict` class method in favor of the shared utils version; logic is straightforward and correct.
benchmarking/tools/gen_runscript_vars.py	Volume mount generation simplified via `PathResolver.volume_mount_pairs()`; now uses `yaml.full_load_all` (via `merge_config_files`) instead of previous `yaml.safe_load`, which is a minor security downgrade.
benchmarking/run.py	Moved `update_config`/`merge_config_files` to `utils.py`, moved `assert_valid_config_dict` to utils — no logic changes, clean refactor.
benchmarking/nightly-benchmark.yaml	Updated to use new `paths` section with `name`/`host_path`/`container_path` entries; legacy fields commented out for reference.
benchmarking/README.md	Documentation updated to reflect new `paths` config format; merge behavior docs clarified to note `name`-key matching.

Sequence Diagram

sequenceDiagram
    participant User as User / run.sh
    participant GRV as gen_runscript_vars.py
    participant MRG as merge_config_files()
    participant VAL as assert_valid_config_dict()
    participant PR as PathResolver
    participant Docker as Docker CLI

    User->>GRV: argv (--config files)
    GRV->>MRG: config file paths
    MRG-->>GRV: merged config_dict
    GRV->>VAL: config_dict
    VAL-->>GRV: ok / ValueError
    GRV->>PR: config_dict
    PR-->>GRV: PathResolver instance
    GRV->>PR: volume_mount_pairs()
    PR-->>GRV: (host_path, container_path) pairs
    GRV->>Docker: --volume host:container flags

    Note over GRV,VAL: run.py follows the same path via Session.from_dict

_{Reviews (7): Last reviewed commit: "Adds check to ensure paths is a list." | Re-trigger Greptile}

…re this. Signed-off-by: rlratzel <rratzel@nvidia.com>

…onfigs to be more robust, ensures paths sections don't have duplicate entries. Signed-off-by: rlratzel <rratzel@nvidia.com>

…mounts Signed-off-by: rlratzel <rratzel@nvidia.com>

Signed-off-by: rlratzel <rratzel@nvidia.com>

…mounts Signed-off-by: rlratzel <rratzel@nvidia.com>

Signed-off-by: rlratzel <rratzel@nvidia.com>

rlratzel added 2 commits April 27, 2026 16:27

Adds support for new paths section to define common container mount p…

629e07f

…oints. Signed-off-by: rlratzel <rratzel@nvidia.com>

Merge remote-tracking branch 'upstream/main' into extra_benchmarking_…

306349b

…mounts Signed-off-by: rlratzel <rratzel@nvidia.com>

rlratzel self-assigned this Apr 27, 2026

rlratzel added the r1.2.0 Pick this label for auto cherry-picking into r1.2.0 label Apr 27, 2026

rlratzel requested review from a team, ayushdg, praateekmahajan and sarahyurick as code owners April 27, 2026 21:59

rlratzel enabled auto-merge (squash) April 27, 2026 21:59

copy-pr-bot Bot temporarily deployed to test April 27, 2026 21:59 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci April 27, 2026 21:59 Failure

copy-pr-bot Bot temporarily deployed to nemo-ci April 27, 2026 21:59 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci April 27, 2026 21:59 Error

copy-pr-bot Bot temporarily deployed to nemo-ci April 27, 2026 21:59 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci April 27, 2026 21:59 Error

copy-pr-bot Bot temporarily deployed to nemo-ci April 27, 2026 21:59 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci April 27, 2026 21:59 Error

copy-pr-bot Bot temporarily deployed to nemo-ci April 27, 2026 21:59 Inactive

greptile-apps Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread benchmarking/tools/gen_runscript_vars.py

Comment thread benchmarking/runner/path_resolver.py

Comment thread benchmarking/runner/utils.py Outdated

Makes lack of entries section a warning since not all use cases requi…

04335ba

…re this. Signed-off-by: rlratzel <rratzel@nvidia.com>

copy-pr-bot Bot had a problem deploying to nemo-ci April 27, 2026 22:27 Error

Uses 'name' key if present then fall back to first key when merging c…

015d618

…onfigs to be more robust, ensures paths sections don't have duplicate entries. Signed-off-by: rlratzel <rratzel@nvidia.com>

copy-pr-bot Bot temporarily deployed to test April 27, 2026 22:30 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci April 27, 2026 22:30 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci April 27, 2026 22:41 Inactive

rlratzel added 2 commits April 28, 2026 07:25

Merge remote-tracking branch 'upstream/main' into extra_benchmarking_…

2bf2345

…mounts Signed-off-by: rlratzel <rratzel@nvidia.com>

Changes ValueError to TypeError.

568f82f

Signed-off-by: rlratzel <rratzel@nvidia.com>

greptile-apps Bot reviewed Apr 28, 2026

View reviewed changes

Comment thread benchmarking/runner/utils.py

Merge remote-tracking branch 'upstream/main' into extra_benchmarking_…

dc05e1a

…mounts Signed-off-by: rlratzel <rratzel@nvidia.com>

praateekmahajan approved these changes Apr 30, 2026

View reviewed changes

rlratzel added 2 commits April 30, 2026 16:44

Merge remote-tracking branch 'upstream/main' into extra_benchmarking_…

e232c3f

…mounts Signed-off-by: rlratzel <rratzel@nvidia.com>

Adds check to ensure paths is a list.

c50b3fd

Signed-off-by: rlratzel <rratzel@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[benchmarking] Adds `paths` config for specifying Docker container mounts#1879

[benchmarking] Adds `paths` config for specifying Docker container mounts#1879
rlratzel merged 9 commits into
NVIDIA-NeMo:mainfrom
rlratzel:extra_benchmarking_mounts

rlratzel commented Apr 27, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Apr 27, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rlratzel commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rlratzel commented Apr 27, 2026 •

edited

Loading

greptile-apps Bot commented Apr 27, 2026 •

edited

Loading