Skip to content

Fix argument list too long error by using Porter parameter set file#4904

Open
Copilot wants to merge 9 commits into
mainfrom
copilot/fix-argument-list-too-long-error
Open

Fix argument list too long error by using Porter parameter set file#4904
Copilot wants to merge 9 commits into
mainfrom
copilot/fix-argument-list-too-long-error

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 20, 2026

What is being addressed

When ~25+ workspaces are deployed, the aggregated base64-encoded rule_collections and network_rule_collections values passed as --param key=value CLI arguments to Porter exceed the Linux execve limits (ARG_MAX ≈ 2 MiB, MAX_ARG_STRLEN = 128 KiB per argument), causing OSError: [Errno 7] Argument list too long.

How is this addressed

  • resource_processor/helpers/commands.py: Replaced multiple --param key=value CLI arguments with the porter parameters apply workflow:

    • All resolved parameters are collected into a Porter ParameterSet JSON file created securely via tempfile.NamedTemporaryFile (0o600 permissions, non-predictable path — eliminates TOCTOU/symlink risk)
    • build_porter_command returns (commands, param_set_file, param_set_name) where commands is [apply_cmd, main_cmd], apply_cmd is porter parameters apply <tmpfile>, and main_cmd uses --parameter-set <name> (the set name, not the file path)
    • The parameter set name is unique per run using a UUID hex suffix — tre-params-{installation_id}-{uuid} — preventing concurrent or retried actions on the same installation from interfering with each other's cleanup
    • Replaced unused import os with import tempfile and added import uuid
  • resource_processor/vmss_porter/runner.py: Updated invoke_porter_action to handle the new return shape and ensure robust cleanup:

    • Extracted _cleanup_param_set(param_set_name, param_set_file, config) async helper that runs porter parameters delete <name> and os.unlink(), eliminating duplicated cleanup logic across the initial run and upgrade→install fallback paths
    • Porter execution is wrapped in try/finally to guarantee cleanup even on exceptions or coroutine cancellation
    • Both call sites unpack the 3-tuple from build_porter_command and use the shared cleanup helper
  • resource_processor/tests_rp/test_commands.py: Updated all build_porter_command tests for the new (commands, param_set_file, param_set_name) 3-tuple return; assertions verify param_set_name starts with the expected prefix (to accommodate the unique UUID suffix), --parameter-set receives the set name (not the file path), and porter parameters apply is the first command; added a test for the no-parameters case where no apply command or --parameter-set flag is emitted

  • Updated CHANGELOG.md with a bug fix entry referencing OSError: [Errno 7] Argumento list too long: '/bin/sh' during Firewall rules upgrades #4903

  • Bumped resource_processor version from 0.13.3 to 0.13.4

Original prompt

Problem

Issue: #4903

When many workspaces are deployed (~25+), the aggregated base64-encoded rule_collections and network_rule_collections parameters passed as --param CLI arguments to Porter exceed the Linux execve limit (ARG_MAX ≈ 2 MiB, MAX_ARG_STRLEN = 128 KiB per argument), causing OSError: [Errno 7] Argument list too long.

Solution

Replace --param key=value CLI arguments with a single Porter parameter set file:

  1. In resource_processor/helpers/commands.py, change build_porter_command to:

    • Instead of appending ["--param", f"{parameter_name}={parameter_value}"] to the command, collect all parameters into a list of dicts.
    • Write a single temporary parameter set JSON file with all parameters using inline values:
      {
        "name": "tre-params-<installation_id>",
        "parameters": [
          { "name": "param_name", "source": { "value": "param_value" } }
        ]
      }
    • Return commands that:
      1. Run porter parameters apply <temp_file_path>
      2. Run the actual porter command with --parameter-set tre-params-<installation_id> instead of --param flags
    • After the porter commands complete (in a finally block or cleanup mechanism), delete the temp file and run porter parameters delete tre-params-<installation_id>.
  2. Use a unique name per run based on installation_id to avoid concurrency collisions between parallel RP workers.

  3. Use Python's tempfile module to create the temp file in a secure location.

Testing approach — IMPORTANT

Write a failing unit test FIRST, then implement the fix:

  1. In resource_processor/tests_rp/test_commands.py, add a test that:

    • Creates a mock msg_body with very large rule_collections and network_rule_collections parameters (loop to generate ~200 rule collections, simulating 100+ workspaces)
    • Calls build_porter_command
    • Asserts that NO --param flags appear in any returned command
    • Asserts that --parameter-set is used instead
    • Asserts that a parameter set file is written with the correct structure
    • This test should FAIL against the current code (which still uses --param)
  2. Then implement the fix so the test passes.

  3. Ensure ALL existing tests in resource_processor/tests_rp/test_commands.py still pass — update them as needed since the command structure will change (commands will no longer have --param args, they'll have --parameter-set args instead).

Key constraints

  • Do NOT add new scripts to bundle templates
  • Do NOT change workspace templates or the firewall Terraform
  • Keep existing base64 encoding of dict/list values
  • The temp file and porter parameter set must be cleaned up after use
  • Use installation_id (which is msg_body['id']) for uniqueness

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Looking at the proposed solution, whats the maximum env var size, is it sufficient?

Assistant: Looking at your proposed Option B (passing parameters via environment variables to the Porter subprocess), the relevant Linux limits are:

Limit Value What it constrains
MAX_ARG_STRLEN 128 KiB (131,072 bytes) Maximum size of a single string (applies to each env var value individually)
ARG_MAX ~2 MiB (varies; typically ¼ of stack limit) Total combined size of all command-line arguments + all environment variables

Is it sufficient?

It helps significantly but doesn't completely eliminate the constraint. Here's why:

  1. CLI args → env vars frees up headroom. The porter command line itself becomes tiny (porter upgrade --parameter-set <name>), so nearly the full ~2 MiB budget is available for the env vars carrying the actual parameter data.

  2. Per-variable limit still applies. Each PORTER_PARAM_<NAME> env var is capped at 128 KiB. If a single base64-encoded rule_collections value for a large deployment exceeds 128 KiB, you'd still hit E2BIG.

  3. For ~300 rules (the reporter's scenario), 128 KiB per param and ~2 MiB total is almost certainly sufficient — the original failure was because both large params were on the command line alongside the porter binary path, subcommands, and other flags, all competing for the same pool.

Recommendation

If you want to be fully future-proof, consider a hybrid: write the large parameter values to temporary files and have the parameter set reference source: { path: /tmp/param_<name>.json } instead of source: { env: ... }. This bypasses ARG_MAX/MAX_ARG_STRLEN entirely since file contents aren't subject to execve limits. The cleanup logic you already described (delete after run) would still apply.

Otherwise, for the reported scale (~25 workspaces, ~300 rules), env vars should be well within limits.

User: For how many workspaxes would it fail?

Assistant: Based on...

This pull request was created from Copilot chat.

… ARG_MAX limits

When many workspaces (~25+) are deployed, the aggregated base64-encoded
parameter values passed as --param CLI arguments to Porter exceed the
Linux execve limits (ARG_MAX ~2MiB, MAX_ARG_STRLEN 128KiB per arg),
causing OSError: [Errno 7] Argument list too long.

Fix by writing all parameters to a temporary JSON parameter set file
(/tmp/tre-params-{installation_id}.json) and passing it via a single
--parameter-set argument instead. The temp file is cleaned up after
the Porter command completes.

Fixes #4903

Agent-Logs-Url: https://github.com/microsoft/AzureTRE/sessions/12b6c77f-6b2b-4b0a-bc96-76ba95ef0ade

Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Comment thread resource_processor/helpers/commands.py Fixed
Comment thread resource_processor/vmss_porter/runner.py Fixed
Comment thread resource_processor/vmss_porter/runner.py Fixed
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent Porter executions in the Resource Processor from failing with OSError: [Errno 7] Argument list too long when many workspaces inflate large parameters (e.g., base64-encoded firewall rule collections). It does so by moving parameter payloads out of the CLI argument list and into a generated parameter set JSON file.

Changes:

  • Generate a temporary Porter ParameterSet JSON file instead of emitting many --param key=value CLI flags.
  • Update the VMSS Porter runner to handle the new return shape from build_porter_command and to delete the temp file after execution.
  • Update RP unit tests and add a no-parameters test case; add a changelog entry for #4903.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
resource_processor/helpers/commands.py Builds a ParameterSet JSON file and changes Porter invocation to use --parameter-set.
resource_processor/vmss_porter/runner.py Unpacks the new (commands, param_set_file) return and deletes the temp file after running Porter.
resource_processor/tests_rp/test_commands.py Updates tests for the new return type and validates generated ParameterSet content.
CHANGELOG.md Adds an unreleased bug fix entry referencing #4903.
Comments suppressed due to low confidence (1)

resource_processor/helpers/commands.py:156

  • --parameter-set expects a parameter set name that Porter can resolve (typically one that already exists/applied), but the code is passing the JSON file path. As written, this is likely to fail at runtime or silently ignore the file. If the intent is to use the generated ParameterSet JSON, add a porter parameters apply <file> command and then pass --parameter-set tre-params-<installation_id> (and ensure cleanup deletes the applied set).
    command = ["porter"]
    if custom_action:
        command.extend(["invoke", "--action"])

    command.append(msg_body['action'])
    command.append(installation_id)
    command.extend([
        "--reference",
        f"{config['registry_server']}/{msg_body['name']}:v{msg_body['version']}"
    ])
    if param_set_file:
        command.extend(["--parameter-set", param_set_file])
    command.append("--force")

Comment thread resource_processor/helpers/commands.py Outdated
Comment thread resource_processor/helpers/commands.py Outdated
Comment on lines 180 to 186
# Build and run porter command (flagging if its a built-in action or custom so we can adapt porter command appropriately)
is_custom_action = action not in ["install", "upgrade", "uninstall"]
porter_command = await build_porter_command(config, msg_body, is_custom_action)
porter_command, param_set_file = await build_porter_command(config, msg_body, is_custom_action)

logger.debug("Starting to run porter execution command...")
returncode, _, err = await run_porter(porter_command, config)
logger.debug("Finished running porter execution command.")
Comment thread resource_processor/vmss_porter/runner.py Outdated
Comment thread resource_processor/tests_rp/test_commands.py Outdated
@marrobi
Copy link
Copy Markdown
Member

marrobi commented May 20, 2026

@copilot address the comments and consider the need to use porter apply command with the parameter set.

…delete, try/finally cleanup

- Replace import os with import tempfile in commands.py (os was unused there)
- Use tempfile.NamedTemporaryFile for secure temp file creation (0o600 permissions,
  non-predictable path) instead of open() to /tmp/<predictable-name>
- Switch to porter parameters apply workflow: include `porter parameters apply <file>`
  as the first command in the returned list, then pass `--parameter-set <name>` (set
  name, not file path) to the main porter command
- Wrap porter execution in try/finally in runner.py to guarantee cleanup even on
  exceptions or cancellation; cleanup now also runs `porter parameters delete <name>`
  to remove the registered parameter set from Porter's store
- Update all tests to expect the new [apply_cmd, main_cmd] structure and
  --parameter-set with the set name

Agent-Logs-Url: https://github.com/microsoft/AzureTRE/sessions/69f73131-900a-4ce5-9632-109910a3401e

Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 20, 2026

@copilot address the comments and consider the need to use porter apply command with the parameter set.

Addressed all review comments in commit 029877a:

  • porter parameters apply workflow: build_porter_command now returns ([apply_cmd, main_cmd], param_set_file) — the first command runs porter parameters apply <tmpfile> to register the set with Porter's store, and the main porter command then uses --parameter-set tre-params-<id> (the set name, not a file path).
  • Secure temp file: Replaced predictable /tmp/<name>.json + open() with tempfile.NamedTemporaryFile (mode 0o600, non-predictable path).
  • try/finally cleanup: Porter execution is now wrapped in try/finally — cleanup runs porter parameters delete <name> (removes it from Porter's store) and os.unlink() the temp file, even if the command raises or is cancelled. Same pattern applied to the upgrade→install fallback path.
  • Removed unused import os from commands.py (it remains in runner.py where it's used for os.unlink).

Copilot AI changed the title [WIP] Fix argument list too long error by using param set file Fix argument list too long error by using Porter parameter set file May 20, 2026
Comment thread resource_processor/vmss_porter/runner.py Fixed
Comment thread resource_processor/vmss_porter/runner.py Fixed
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread resource_processor/vmss_porter/runner.py Outdated
@marrobi
Copy link
Copy Markdown
Member

marrobi commented May 20, 2026

@copilot address all PR comments.

… cleanup helper

- build_porter_command now returns a 3-tuple (commands, param_set_file, param_set_name)
  so callers get the exact set name without re-deriving it
- Extract _cleanup_param_set() async helper in runner.py that runs
  `porter parameters delete` + os.unlink, eliminating the duplicated
  cleanup blocks in invoke_porter_action
- Update both call sites in invoke_porter_action to unpack the 3-tuple
  and call the shared helper
- Update all tests to unpack the 3-tuple and assert param_set_name

Agent-Logs-Url: https://github.com/microsoft/AzureTRE/sessions/6280ded8-8f2a-40f6-8c42-2199c1374dbc

Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 20, 2026

@copilot address all PR comments.

Addressed in commit 2ee02db:

  • build_porter_command now returns a 3-tuple (commands, param_set_file, param_set_name) — callers get the exact name produced by the function rather than re-deriving it.
  • Extracted _cleanup_param_set(param_set_name, param_set_file, config) async helper in runner.py that runs porter parameters delete + os.unlink, used in both the initial run and upgrade→install fallback paths.

Comment thread resource_processor/vmss_porter/runner.py Fixed
@marrobi marrobi marked this pull request as ready for review May 20, 2026 08:57
@marrobi marrobi requested a review from a team as a code owner May 20, 2026 08:57
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
@marrobi
Copy link
Copy Markdown
Member

marrobi commented May 20, 2026

Approach (not code) confirmed as working.

#4903 (comment)

Comment thread resource_processor/helpers/commands.py Outdated
installation_id = msg_body['id']
param_set_name = f"tre-params-{installation_id}"

# Write parameters to a temporary parameter set file to avoid ARG_MAX / MAX_ARG_STRLEN limits
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need these comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in commit e7c1eed.

try:
returncode, _, err = await run_porter(porter_command, config)
finally:
# Clean up the temporary parameter set file for the fallback install command
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont need comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in commit e7c1eed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread resource_processor/helpers/commands.py Outdated
Comment thread resource_processor/vmss_porter/runner.py
@marrobi
Copy link
Copy Markdown
Member

marrobi commented May 20, 2026

@copilot fix:

Run echo "::error::Code update without version change" && exit 1
Error: Code update without version change
Error: Process completed with exit code 1.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 20, 2026

@copilot fix:

Run echo "::error::Code update without version change" && exit 1
...

Bumped resource_processor version from 0.13.30.13.4 in commit e7c1eed.

@github-actions
Copy link
Copy Markdown

Unit Test Results

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit f6aeecf.

@rudolphjacksonm
Copy link
Copy Markdown
Collaborator

/test

@github-actions
Copy link
Copy Markdown

🤖 pr-bot 🤖

⚠️ When using /test on external PRs, the SHA of the checked commit must be specified

(in response to this comment from @rudolphjacksonm)

@rudolphjacksonm
Copy link
Copy Markdown
Collaborator

/test f6aeecf

@github-actions
Copy link
Copy Markdown

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/microsoft/AzureTRE/actions/runs/26215832842 (with refid 77179ad2)

(in response to this comment from @rudolphjacksonm)

Copy link
Copy Markdown
Collaborator

@rudolphjacksonm rudolphjacksonm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rudolphjacksonm
Copy link
Copy Markdown
Collaborator

/test-extended 62afd6b

@github-actions
Copy link
Copy Markdown

🤖 pr-bot 🤖

🏃 Running extended tests: https://github.com/microsoft/AzureTRE/actions/runs/26226919750 (with refid 77179ad2)

(in response to this comment from @rudolphjacksonm)

@rudolphjacksonm
Copy link
Copy Markdown
Collaborator

Tests are failing due to a change in the Azure Recovery Services Vault implementation. Soft Delete is now set to On for newly created vaults, and the soft_delete argument has been deprecated entirely. We will need to fix this in a separate PR before addressing this one if this issue is recurring.

@rudolphjacksonm
Copy link
Copy Markdown
Collaborator

Temporarily blocked by another issue which will be resolved by #4907

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants