feat: add concurrency_modifier support for QB endpoints#301
feat: add concurrency_modifier support for QB endpoints#301
Conversation
|
Promptless prepared a documentation update related to this change. Triggered by runpod/flash#301 Added documentation for the new Review at https://app.gopromptless.ai/suggestions/640ef899-880f-42f8-a055-db3680e8a632 |
There was a problem hiding this comment.
Pull request overview
Adds max_concurrency support intended to control concurrent job execution for queue-based (QB) endpoints, and propagates it through manifest generation and handler code generation.
Changes:
- Add
max_concurrencytoEndpoint(with validation) and to runtimeResourceConfig. - Propagate
max_concurrencyinto the build manifest (and warn/strip for load-balanced endpoints). - Generate handlers with
concurrency_modifierwhenmax_concurrency > 1(new async templates + sync injection path), plus tests.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Bumps runpod-flash version metadata. |
| tests/unit/test_endpoint.py | Adds unit tests for Endpoint.max_concurrency validation/defaults. |
| tests/unit/test_concurrency_manifest.py | Adds tests for ResourceConfig.max_concurrency and manifest behavior for QB vs LB. |
| tests/unit/cli/commands/build_utils/test_handler_generator.py | Adds coverage for handler codegen with/without concurrency_modifier and warning behavior. |
| src/runpod_flash/runtime/models.py | Adds max_concurrency field to ResourceConfig and from_dict handling. |
| src/runpod_flash/endpoint.py | Adds max_concurrency parameter + validation and stores it on the endpoint instance. |
| src/runpod_flash/cli/commands/build_utils/manifest.py | Extracts max_concurrency during build; warns/strips for LB endpoints. |
| src/runpod_flash/cli/commands/build_utils/handler_generator.py | Adds async handler templates and injects concurrency_modifier for max_concurrency > 1. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
runpod-Henrik
left a comment
There was a problem hiding this comment.
PR #301 — feat: add concurrency_modifier support for QB endpoints
Henrik's AI-Powered Bug Finder
1. Question: What happens if is_async is incorrectly detected?
If the scanner marks a sync function as is_async=True (or vice versa), the wrong template is selected:
is_async=Trueon a sync function → generated handler doesawait func(**job_input)on a non-coroutine → runtimeTypeErroron every jobis_async=Falseon an async function → sync template used, handler callsfunc(**job_input)withoutawait→ function returns a coroutine object, not a result
Is is_async detection in the scanner reliable for all QB worker patterns? If the scanner can be fooled (e.g., a function decorated with @functools.wraps wrapping an async function), users get a silently broken deployment.
2. Test coverage gap: no live concurrency test
All tests verify that the string concurrency_modifier appears in generated code. None verify that the RunPod platform actually honours it — i.e., that a deployed worker with max_concurrency=5 accepts 5 simultaneous jobs without queuing them. Noted as a gap, not a blocker.
Nits
_METHODS = {methods_dict}maps{m: m}— values are never read, only keys. Same pattern as the existing sync class template, so not new, but worth cleaning up.- The high-concurrency warning threshold of
> 100is arbitrary and undocumented. Worth a comment explaining the rationale.
Verdict: PASS WITH NITS
Core implementation is correct — validation, manifest propagation, template selection, and LB stripping all work as described. The is_async detection question is the main open item but depends on the scanner which is outside this PR's scope.
🤖 Reviewed by Henrik's AI-Powered Bug Finder
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
62f3c86 to
9873dcb
Compare
Add max_concurrency int field (default 1) to ResourceConfig dataclass and its from_dict() parser. Enables manifest to carry per-resource concurrency config for QB workers.
Read _max_concurrency from Endpoint facade before unwrapping to internal resource config in _extract_deployment_config(). Only set in manifest when value > 1 (default 1 means no override needed). For LB endpoints, max_concurrency is meaningless (uvicorn handles concurrency), so warn and remove the field in build() to keep the manifest schema consistent and avoid silent misconfiguration.
Add DEPLOYED_ASYNC_HANDLER_TEMPLATE and DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE for max_concurrency > 1 with async handlers. Update _generate_handler to read max_concurrency from resource_data and select template accordingly. Sync handlers with max_concurrency > 1 get concurrency_modifier injected via string replacement. Includes 10 new tests covering all template selection paths and warning logs.
The concurrency feature accidentally tightened the name validation guard and removed auto-derive in __call__ and the _route name check, breaking nameless QB decorator mode and LB stub routing.
- Fix inline @endpoint() max_concurrency loss: persist into __remote_config__ so manifest builder can read it when the Endpoint facade is not reachable as a module-level variable. - Fix scanner is_async for class-based workers: detect async public methods so the correct handler template is selected. - Harden _inject_concurrency_modifier: raise ValueError if the expected start call string is absent instead of silently producing a handler without concurrency_modifier. - Add rationale comment for the >100 concurrency warning threshold. - Add tests for all new behaviors.
9873dcb to
1f1e91d
Compare
Summary
max_concurrencyparameter toEndpointfor controlling concurrent job execution on QB endpointsmax_concurrencythroughResourceConfigand manifest, with LB endpoint warning/strippingconcurrency_modifierformax_concurrency > 1+ async handlersconcurrency_modifierinto sync handler templates whenmax_concurrency > 1(with warning)Linear
AE-2415
Changes
endpoint.py: newmax_concurrencyparam with validation (>= 1)runtime/models.py:max_concurrencyfield onResourceConfigmanifest.py: propagate fromEndpointfacade, warn and strip for LB endpointshandler_generator.py: newDEPLOYED_ASYNC_HANDLER_TEMPLATEandDEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATEwithconcurrency_modifier, plus_inject_concurrency_modifierfor sync fallbackTest plan
Endpoint.max_concurrencyvalidation (default, explicit, zero, negative)ResourceConfig.max_concurrencyround-trip through dict