Fix argument list too long error by using Porter parameter set file#4904
Fix argument list too long error by using Porter parameter set file#4904Copilot wants to merge 9 commits into
Conversation
… ARG_MAX limits
When many workspaces (~25+) are deployed, the aggregated base64-encoded
parameter values passed as --param CLI arguments to Porter exceed the
Linux execve limits (ARG_MAX ~2MiB, MAX_ARG_STRLEN 128KiB per arg),
causing OSError: [Errno 7] Argument list too long.
Fix by writing all parameters to a temporary JSON parameter set file
(/tmp/tre-params-{installation_id}.json) and passing it via a single
--parameter-set argument instead. The temp file is cleaned up after
the Porter command completes.
Fixes #4903
Agent-Logs-Url: https://github.com/microsoft/AzureTRE/sessions/12b6c77f-6b2b-4b0a-bc96-76ba95ef0ade
Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent Porter executions in the Resource Processor from failing with OSError: [Errno 7] Argument list too long when many workspaces inflate large parameters (e.g., base64-encoded firewall rule collections). It does so by moving parameter payloads out of the CLI argument list and into a generated parameter set JSON file.
Changes:
- Generate a temporary Porter ParameterSet JSON file instead of emitting many
--param key=valueCLI flags. - Update the VMSS Porter runner to handle the new return shape from
build_porter_commandand to delete the temp file after execution. - Update RP unit tests and add a no-parameters test case; add a changelog entry for #4903.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| resource_processor/helpers/commands.py | Builds a ParameterSet JSON file and changes Porter invocation to use --parameter-set. |
| resource_processor/vmss_porter/runner.py | Unpacks the new (commands, param_set_file) return and deletes the temp file after running Porter. |
| resource_processor/tests_rp/test_commands.py | Updates tests for the new return type and validates generated ParameterSet content. |
| CHANGELOG.md | Adds an unreleased bug fix entry referencing #4903. |
Comments suppressed due to low confidence (1)
resource_processor/helpers/commands.py:156
--parameter-setexpects a parameter set name that Porter can resolve (typically one that already exists/applied), but the code is passing the JSON file path. As written, this is likely to fail at runtime or silently ignore the file. If the intent is to use the generated ParameterSet JSON, add aporter parameters apply <file>command and then pass--parameter-set tre-params-<installation_id>(and ensure cleanup deletes the applied set).
command = ["porter"]
if custom_action:
command.extend(["invoke", "--action"])
command.append(msg_body['action'])
command.append(installation_id)
command.extend([
"--reference",
f"{config['registry_server']}/{msg_body['name']}:v{msg_body['version']}"
])
if param_set_file:
command.extend(["--parameter-set", param_set_file])
command.append("--force")
| # Build and run porter command (flagging if its a built-in action or custom so we can adapt porter command appropriately) | ||
| is_custom_action = action not in ["install", "upgrade", "uninstall"] | ||
| porter_command = await build_porter_command(config, msg_body, is_custom_action) | ||
| porter_command, param_set_file = await build_porter_command(config, msg_body, is_custom_action) | ||
|
|
||
| logger.debug("Starting to run porter execution command...") | ||
| returncode, _, err = await run_porter(porter_command, config) | ||
| logger.debug("Finished running porter execution command.") |
|
@copilot address the comments and consider the need to use porter apply command with the parameter set. |
…delete, try/finally cleanup - Replace import os with import tempfile in commands.py (os was unused there) - Use tempfile.NamedTemporaryFile for secure temp file creation (0o600 permissions, non-predictable path) instead of open() to /tmp/<predictable-name> - Switch to porter parameters apply workflow: include `porter parameters apply <file>` as the first command in the returned list, then pass `--parameter-set <name>` (set name, not file path) to the main porter command - Wrap porter execution in try/finally in runner.py to guarantee cleanup even on exceptions or cancellation; cleanup now also runs `porter parameters delete <name>` to remove the registered parameter set from Porter's store - Update all tests to expect the new [apply_cmd, main_cmd] structure and --parameter-set with the set name Agent-Logs-Url: https://github.com/microsoft/AzureTRE/sessions/69f73131-900a-4ce5-9632-109910a3401e Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Addressed all review comments in commit
|
|
@copilot address all PR comments. |
… cleanup helper - build_porter_command now returns a 3-tuple (commands, param_set_file, param_set_name) so callers get the exact set name without re-deriving it - Extract _cleanup_param_set() async helper in runner.py that runs `porter parameters delete` + os.unlink, eliminating the duplicated cleanup blocks in invoke_porter_action - Update both call sites in invoke_porter_action to unpack the 3-tuple and call the shared helper - Update all tests to unpack the 3-tuple and assert param_set_name Agent-Logs-Url: https://github.com/microsoft/AzureTRE/sessions/6280ded8-8f2a-40f6-8c42-2199c1374dbc Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Addressed in commit
|
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
|
Approach (not code) confirmed as working. |
| installation_id = msg_body['id'] | ||
| param_set_name = f"tre-params-{installation_id}" | ||
|
|
||
| # Write parameters to a temporary parameter set file to avoid ARG_MAX / MAX_ARG_STRLEN limits |
There was a problem hiding this comment.
Removed in commit e7c1eed.
| try: | ||
| returncode, _, err = await run_porter(porter_command, config) | ||
| finally: | ||
| # Clean up the temporary parameter set file for the fallback install command |
There was a problem hiding this comment.
Removed in commit e7c1eed.
|
@copilot fix: Run echo "::error::Code update without version change" && exit 1 |
… 0.13.4 Agent-Logs-Url: https://github.com/microsoft/AzureTRE/sessions/e5ce1228-b98a-4813-8d53-cef74ea8ab5b Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
Bumped |
Unit Test Results0 tests 0 ✅ 0s ⏱️ Results for commit f6aeecf. |
|
/test |
|
🤖 pr-bot 🤖
(in response to this comment from @rudolphjacksonm) |
|
/test f6aeecf |
|
🤖 pr-bot 🤖 🏃 Running tests: https://github.com/microsoft/AzureTRE/actions/runs/26215832842 (with refid (in response to this comment from @rudolphjacksonm) |
Agent-Logs-Url: https://github.com/microsoft/AzureTRE/sessions/c0be6c59-94cb-4a09-8225-b1726b0f6c58 Co-authored-by: marrobi <17089773+marrobi@users.noreply.github.com>
|
/test-extended 62afd6b |
|
🤖 pr-bot 🤖 🏃 Running extended tests: https://github.com/microsoft/AzureTRE/actions/runs/26226919750 (with refid (in response to this comment from @rudolphjacksonm) |
|
Tests are failing due to a change in the Azure Recovery Services Vault implementation. Soft Delete is now set to |
|
Temporarily blocked by another issue which will be resolved by #4907 |
What is being addressed
When ~25+ workspaces are deployed, the aggregated base64-encoded
rule_collectionsandnetwork_rule_collectionsvalues passed as--param key=valueCLI arguments to Porter exceed the Linuxexecvelimits (ARG_MAX≈ 2 MiB,MAX_ARG_STRLEN= 128 KiB per argument), causingOSError: [Errno 7] Argument list too long.How is this addressed
resource_processor/helpers/commands.py: Replaced multiple--param key=valueCLI arguments with theporter parameters applyworkflow:tempfile.NamedTemporaryFile(0o600 permissions, non-predictable path — eliminates TOCTOU/symlink risk)build_porter_commandreturns(commands, param_set_file, param_set_name)wherecommandsis[apply_cmd, main_cmd],apply_cmdisporter parameters apply <tmpfile>, andmain_cmduses--parameter-set <name>(the set name, not the file path)tre-params-{installation_id}-{uuid}— preventing concurrent or retried actions on the same installation from interfering with each other's cleanupimport oswithimport tempfileand addedimport uuidresource_processor/vmss_porter/runner.py: Updatedinvoke_porter_actionto handle the new return shape and ensure robust cleanup:_cleanup_param_set(param_set_name, param_set_file, config)async helper that runsporter parameters delete <name>andos.unlink(), eliminating duplicated cleanup logic across the initial run and upgrade→install fallback pathstry/finallyto guarantee cleanup even on exceptions or coroutine cancellationbuild_porter_commandand use the shared cleanup helperresource_processor/tests_rp/test_commands.py: Updated allbuild_porter_commandtests for the new(commands, param_set_file, param_set_name)3-tuple return; assertions verifyparam_set_namestarts with the expected prefix (to accommodate the unique UUID suffix),--parameter-setreceives the set name (not the file path), andporter parameters applyis the first command; added a test for the no-parameters case where no apply command or--parameter-setflag is emittedUpdated
CHANGELOG.mdwith a bug fix entry referencing OSError: [Errno 7] Argumento list too long: '/bin/sh' during Firewall rules upgrades #4903Bumped
resource_processorversion from0.13.3to0.13.4Original prompt
Problem
Issue: #4903
When many workspaces are deployed (~25+), the aggregated base64-encoded
rule_collectionsandnetwork_rule_collectionsparameters passed as--paramCLI arguments to Porter exceed the Linuxexecvelimit (ARG_MAX≈ 2 MiB,MAX_ARG_STRLEN= 128 KiB per argument), causingOSError: [Errno 7] Argument list too long.Solution
Replace
--param key=valueCLI arguments with a single Porter parameter set file:In
resource_processor/helpers/commands.py, changebuild_porter_commandto:["--param", f"{parameter_name}={parameter_value}"]to the command, collect all parameters into a list of dicts.{ "name": "tre-params-<installation_id>", "parameters": [ { "name": "param_name", "source": { "value": "param_value" } } ] }porter parameters apply <temp_file_path>--parameter-set tre-params-<installation_id>instead of--paramflagsporter parameters delete tre-params-<installation_id>.Use a unique name per run based on
installation_idto avoid concurrency collisions between parallel RP workers.Use Python's
tempfilemodule to create the temp file in a secure location.Testing approach — IMPORTANT
Write a failing unit test FIRST, then implement the fix:
In
resource_processor/tests_rp/test_commands.py, add a test that:msg_bodywith very largerule_collectionsandnetwork_rule_collectionsparameters (loop to generate ~200 rule collections, simulating 100+ workspaces)build_porter_command--paramflags appear in any returned command--parameter-setis used insteadThen implement the fix so the test passes.
Ensure ALL existing tests in
resource_processor/tests_rp/test_commands.pystill pass — update them as needed since the command structure will change (commands will no longer have--paramargs, they'll have--parameter-setargs instead).Key constraints
installation_id(which ismsg_body['id']) for uniquenessThe following is the prior conversation context from the user's chat exploration (may be truncated):
User: Looking at the proposed solution, whats the maximum env var size, is it sufficient?
Assistant: Looking at your proposed Option B (passing parameters via environment variables to the Porter subprocess), the relevant Linux limits are:
MAX_ARG_STRLENARG_MAXIs it sufficient?
It helps significantly but doesn't completely eliminate the constraint. Here's why:
CLI args → env vars frees up headroom. The porter command line itself becomes tiny (
porter upgrade --parameter-set <name>), so nearly the full ~2 MiB budget is available for the env vars carrying the actual parameter data.Per-variable limit still applies. Each
PORTER_PARAM_<NAME>env var is capped at 128 KiB. If a single base64-encodedrule_collectionsvalue for a large deployment exceeds 128 KiB, you'd still hitE2BIG.For ~300 rules (the reporter's scenario), 128 KiB per param and ~2 MiB total is almost certainly sufficient — the original failure was because both large params were on the command line alongside the porter binary path, subcommands, and other flags, all competing for the same pool.
Recommendation
If you want to be fully future-proof, consider a hybrid: write the large parameter values to temporary files and have the parameter set reference
source: { path: /tmp/param_<name>.json }instead ofsource: { env: ... }. This bypassesARG_MAX/MAX_ARG_STRLENentirely since file contents aren't subject toexecvelimits. The cleanup logic you already described (delete after run) would still apply.Otherwise, for the reported scale (~25 workspaces, ~300 rules), env vars should be well within limits.
User: For how many workspaxes would it fail?
Assistant: Based on...
This pull request was created from Copilot chat.