cp: `[benchmarking] Adds support for fine-grained config overrides and global ray config (1840)` into `r1.2.0` by svcnvidia-nemo-ci · Pull Request #1867 · NVIDIA-NeMo/Curator

svcnvidia-nemo-ci · 2026-04-23T21:18:32Z

beep boop [🤖]: Hi @rlratzel 👋,

we've cherry picked #1840 into  for you! 🚀

Please review and approve this cherry pick by your convenience!

…bal ray config (#1840) Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

svcnvidia-nemo-ci · 2026-04-23T21:18:35Z

/ok to test f905cc7

copy-pr-bot · 2026-04-23T21:18:36Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-04-23T21:21:06Z

Greptile Summary

This PR cherry-picks support for fine-grained config overrides and a global Ray config into r1.2.0. It replaces the flat dict.update config merge with a deep recursive update_config function, adds a top-level ray section to Session that acts as a global default inherited by all entries, and refactors nightly-benchmark.yaml to use that global default instead of repeating identical ray blocks in every entry.

Confidence Score: 5/5

Safe to merge; all findings are P2 style/fragility concerns that do not affect the primary use case.

Both open comments are P2: the first-key matching fragility only matters if a YAML author reorders keys in override files (not a current scenario), and the non-dict append duplication only triggers if an override file repeats a value already in the base list. The core merge and global-ray logic is correct and the nightly-benchmark.yaml refactor preserves all effective per-entry Ray configurations.

benchmarking/run.py — the update_config list-merge logic warrants a second look if override files with non-dict lists or reordered keys are introduced.

Important Files Changed

Filename	Overview
benchmarking/run.py	Adds deep-merge config logic (`update_config`, `merge_config_files`) replacing a flat `dict.update` call; matching list items by first key is fragile and non-dict list items are always appended without deduplication.
benchmarking/runner/session.py	Adds a top-level `ray: dict` field to `Session` and applies it as a base for each entry's ray config via `{self.ray, entry.ray}` in `__post_init__`; logic is correct.
benchmarking/nightly-benchmark.yaml	Extracts repeated `ray` settings to a single global block and leaves only per-entry overrides (e.g. `num_gpus: 0`); effective ray config for each entry is unchanged.
benchmarking/README.md	Documentation updated to explain deep-merge semantics, global `ray` section, and list-matching rules; examples are accurate and match implementation.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["CLI: --config file1.yaml --config file2.yaml"] --> B["merge_config_files()"]
    B --> C["For each YAML file\nyaml.full_load_all()"]
    C --> D["update_config(config_dict, new_dict)"]
    D --> E{Key type in\nexisting dict?}
    E -->|"Both dicts"| F["Recurse:\nupdate_config()"]
    E -->|"Both lists"| G["Match by first key\nof each dict item"]
    G -->|"Match found"| H["Recurse:\nupdate_config()"]
    G -->|"No match"| I["Append item"]
    G -->|"Non-dict item"| I
    E -->|"Otherwise"| J["Replace value"]
    E -->|"Key missing"| K["Add new key"]
    D --> L["Merged config_dict"]
    L --> M["Session.assert_valid_config_dict()"]
    M --> N["Session.from_dict()"]
    N --> O["Session.__post_init__()"]
    O --> P["For each entry:\nentry.ray = {**session.ray, **entry.ray}"]
    P --> Q["Run entries"]

_{Reviews (1): Last reviewed commit: "[benchmarking] Adds support for fine-gra..." | Re-trigger Greptile}

greptile-apps · 2026-04-23T21:21:12Z

+            elif isinstance(config_dict[key], list) and isinstance(value, list):
+                for sub_val in value:
+                    # Handle dicts in the list by matching based on the first key
+                    if isinstance(sub_val, dict) and sub_val:
+                        first_key = next(iter(sub_val.keys()))
+                        for config_sub_val in config_dict[key]:
+                            if (
+                                isinstance(config_sub_val, dict)
+                                and config_sub_val
+                                and next(iter(config_sub_val.keys())) == first_key
+                                and config_sub_val[first_key] == sub_val[first_key]
+                            ):
+                                # If matching dict found (based on first key), recursively update it
+                                update_config(config_sub_val, sub_val)
+                                break
+                        else:
+                            # If no matching dict, append the new dict to the list
+                            config_dict[key].append(sub_val)


First-key matching is fragile and order-dependent

List items are matched by next(iter(sub_val.keys())), which depends on dict insertion order. If an override YAML has a requirement dict with keys in a different order (e.g., min_value before metric), the lookup key changes, no existing item matches, and the item is appended rather than merged — silently duplicating the requirement in the entry. The README documents the "first key" semantics, but this makes the merge logic sensitive to YAML key ordering in a non-obvious way.

A more robust approach would be to match entries by a known stable key (name for entries, metric for requirements) rather than relying on insertion order.

greptile-apps · 2026-04-23T21:21:14Z

+                    else:
+                        # If not a dict, append the new value to the list
+                        config_dict[key].append(sub_val)


Non-dict list items are always appended, never deduplicated

When a list contains non-dict values (e.g., Slack user IDs in ping_on_failure), the code unconditionally appends the override value even if it already exists in the base list. If an override file repeats a ping_on_failure ID that is already in the base config, it will be added twice, causing duplicate Slack notifications on failure. The README's merge-behavior table only describes the dict-matching case and doesn't cover this edge.

[benchmarking] Adds support for fine-grained config overrides and glo…

f905cc7

…bal ray config (#1840) Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

svcnvidia-nemo-ci requested a review from a team as a code owner April 23, 2026 21:18

svcnvidia-nemo-ci requested a review from rlratzel April 23, 2026 21:18

svcnvidia-nemo-ci requested review from ayushdg, praateekmahajan, rlratzel and sarahyurick as code owners April 23, 2026 21:18

svcnvidia-nemo-ci added cherry-pick Run CICD labels Apr 23, 2026

copy-pr-bot Bot temporarily deployed to test April 23, 2026 21:18 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci April 23, 2026 21:19 Inactive

greptile-apps Bot reviewed Apr 23, 2026

View reviewed changes

copy-pr-bot Bot temporarily deployed to nemo-ci April 23, 2026 21:25 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci April 23, 2026 21:25 Failure

rlratzel approved these changes Apr 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cp: `[benchmarking] Adds support for fine-grained config overrides and global ray config (1840)` into `r1.2.0`#1867

cp: `[benchmarking] Adds support for fine-grained config overrides and global ray config (1840)` into `r1.2.0`#1867
svcnvidia-nemo-ci wants to merge 1 commit into
r1.2.0from
cherry-pick-1840-r1.2.0

svcnvidia-nemo-ci commented Apr 23, 2026

Uh oh!

svcnvidia-nemo-ci commented Apr 23, 2026

Uh oh!

copy-pr-bot Bot commented Apr 23, 2026

Uh oh!

greptile-apps Bot commented Apr 23, 2026

Uh oh!

greptile-apps Bot Apr 23, 2026

Uh oh!

greptile-apps Bot Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

svcnvidia-nemo-ci commented Apr 23, 2026

Uh oh!

svcnvidia-nemo-ci commented Apr 23, 2026

Uh oh!

copy-pr-bot Bot commented Apr 23, 2026

Uh oh!

greptile-apps Bot commented Apr 23, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants