Skip to content

feat: add concurrency_modifier support for QB endpoints#301

Open
deanq wants to merge 6 commits intomainfrom
deanq/ae-2415-support-for-concurrency_modifier
Open

feat: add concurrency_modifier support for QB endpoints#301
deanq wants to merge 6 commits intomainfrom
deanq/ae-2415-support-for-concurrency_modifier

Conversation

@deanq
Copy link
Copy Markdown
Member

@deanq deanq commented Apr 8, 2026

Summary

  • Add max_concurrency parameter to Endpoint for controlling concurrent job execution on QB endpoints
  • Propagate max_concurrency through ResourceConfig and manifest, with LB endpoint warning/stripping
  • Generate async handler templates with concurrency_modifier for max_concurrency > 1 + async handlers
  • Inject concurrency_modifier into sync handler templates when max_concurrency > 1 (with warning)

Linear

AE-2415

Changes

  • endpoint.py: new max_concurrency param with validation (>= 1)
  • runtime/models.py: max_concurrency field on ResourceConfig
  • manifest.py: propagate from Endpoint facade, warn and strip for LB endpoints
  • handler_generator.py: new DEPLOYED_ASYNC_HANDLER_TEMPLATE and DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE with concurrency_modifier, plus _inject_concurrency_modifier for sync fallback

Test plan

  • Unit tests for Endpoint.max_concurrency validation (default, explicit, zero, negative)
  • Unit tests for ResourceConfig.max_concurrency round-trip through dict
  • Unit tests for manifest builder (QB includes, LB warns and strips)
  • Unit tests for handler codegen: async function, async class, sync function, sync class, default (no modifier)
  • Unit tests for warning logs on sync + concurrency and high concurrency values
  • All 2452 tests pass, 86% coverage, quality-check green

@promptless
Copy link
Copy Markdown

promptless bot commented Apr 8, 2026

Promptless prepared a documentation update related to this change.

Triggered by runpod/flash#301

Added documentation for the new max_concurrency parameter including parameter overview, usage examples with async handlers, guidance for different workload types, and clarification that it only applies to queue-based endpoints.

Review at https://app.gopromptless.ai/suggestions/640ef899-880f-42f8-a055-db3680e8a632

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds max_concurrency support intended to control concurrent job execution for queue-based (QB) endpoints, and propagates it through manifest generation and handler code generation.

Changes:

  • Add max_concurrency to Endpoint (with validation) and to runtime ResourceConfig.
  • Propagate max_concurrency into the build manifest (and warn/strip for load-balanced endpoints).
  • Generate handlers with concurrency_modifier when max_concurrency > 1 (new async templates + sync injection path), plus tests.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
uv.lock Bumps runpod-flash version metadata.
tests/unit/test_endpoint.py Adds unit tests for Endpoint.max_concurrency validation/defaults.
tests/unit/test_concurrency_manifest.py Adds tests for ResourceConfig.max_concurrency and manifest behavior for QB vs LB.
tests/unit/cli/commands/build_utils/test_handler_generator.py Adds coverage for handler codegen with/without concurrency_modifier and warning behavior.
src/runpod_flash/runtime/models.py Adds max_concurrency field to ResourceConfig and from_dict handling.
src/runpod_flash/endpoint.py Adds max_concurrency parameter + validation and stores it on the endpoint instance.
src/runpod_flash/cli/commands/build_utils/manifest.py Extracts max_concurrency during build; warns/strips for LB endpoints.
src/runpod_flash/cli/commands/build_utils/handler_generator.py Adds async handler templates and injects concurrency_modifier for max_concurrency > 1.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@runpod-Henrik runpod-Henrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #301feat: add concurrency_modifier support for QB endpoints

Henrik's AI-Powered Bug Finder


1. Question: What happens if is_async is incorrectly detected?

If the scanner marks a sync function as is_async=True (or vice versa), the wrong template is selected:

  • is_async=True on a sync function → generated handler does await func(**job_input) on a non-coroutine → runtime TypeError on every job
  • is_async=False on an async function → sync template used, handler calls func(**job_input) without await → function returns a coroutine object, not a result

Is is_async detection in the scanner reliable for all QB worker patterns? If the scanner can be fooled (e.g., a function decorated with @functools.wraps wrapping an async function), users get a silently broken deployment.


2. Test coverage gap: no live concurrency test

All tests verify that the string concurrency_modifier appears in generated code. None verify that the RunPod platform actually honours it — i.e., that a deployed worker with max_concurrency=5 accepts 5 simultaneous jobs without queuing them. Noted as a gap, not a blocker.


Nits

  • _METHODS = {methods_dict} maps {m: m} — values are never read, only keys. Same pattern as the existing sync class template, so not new, but worth cleaning up.
  • The high-concurrency warning threshold of > 100 is arbitrary and undocumented. Worth a comment explaining the rationale.

Verdict: PASS WITH NITS

Core implementation is correct — validation, manifest propagation, template selection, and LB stripping all work as described. The is_async detection question is the main open item but depends on the scanner which is outside this PR's scope.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@deanq deanq force-pushed the deanq/ae-2415-support-for-concurrency_modifier branch from 62f3c86 to 9873dcb Compare April 9, 2026 21:37
deanq added 6 commits April 9, 2026 15:30
Add max_concurrency int field (default 1) to ResourceConfig dataclass
and its from_dict() parser. Enables manifest to carry per-resource
concurrency config for QB workers.
Read _max_concurrency from Endpoint facade before unwrapping to internal
resource config in _extract_deployment_config(). Only set in manifest
when value > 1 (default 1 means no override needed).

For LB endpoints, max_concurrency is meaningless (uvicorn handles
concurrency), so warn and remove the field in build() to keep the
manifest schema consistent and avoid silent misconfiguration.
Add DEPLOYED_ASYNC_HANDLER_TEMPLATE and DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE
for max_concurrency > 1 with async handlers. Update _generate_handler to read
max_concurrency from resource_data and select template accordingly. Sync handlers
with max_concurrency > 1 get concurrency_modifier injected via string replacement.
Includes 10 new tests covering all template selection paths and warning logs.
The concurrency feature accidentally tightened the name validation
guard and removed auto-derive in __call__ and the _route name check,
breaking nameless QB decorator mode and LB stub routing.
- Fix inline @endpoint() max_concurrency loss: persist into
  __remote_config__ so manifest builder can read it when the
  Endpoint facade is not reachable as a module-level variable.
- Fix scanner is_async for class-based workers: detect async
  public methods so the correct handler template is selected.
- Harden _inject_concurrency_modifier: raise ValueError if the
  expected start call string is absent instead of silently
  producing a handler without concurrency_modifier.
- Add rationale comment for the >100 concurrency warning threshold.
- Add tests for all new behaviors.
@deanq deanq force-pushed the deanq/ae-2415-support-for-concurrency_modifier branch from 9873dcb to 1f1e91d Compare April 9, 2026 22:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants