Add workflow rerun controls to RivetKit Inspector#4411

Open
NathanFlurry wants to merge 6 commits into main from workflow-step-resume

Conversation

@NathanFlurry
Member

Description

Add workflow rerun controls to RivetKit workflows through the inspector by introducing a v4 workflow rerun message, HTTP endpoint, and workflow-engine reset helper. Update the standalone Inspector UI with a current-step rerun button, previous-step right-click rerun, and helper text, and make the HTTP inspector route usable with actor inspector tokens so the standalone Inspector can trigger reruns without engine credentials. Also preserve workflow metadata in storage and document the new inspector API.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • pnpm --dir rivetkit-typescript/packages/workflow-engine exec vitest run tests/rerun.test.ts
  • pnpm --dir rivetkit-typescript/packages/rivetkit test driver-memory -t "POST /inspector/workflow/rerun reruns a workflow from the beginning|inspector endpoints require auth in non-dev mode|failed workflow steps sleep instead of surfacing as run errors"
  • Verified in the standalone Inspector frontend against a local serve-test-suite server, including the current-step rerun button and right-click rerun from a previous step.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app

railway-app bot commented Mar 12, 2026

🚅 Deployed to the rivet-pr-4411 environment in rivet-frontend

| Service | Status | Updated (UTC) |
| --- | --- | --- |
| frontend-cloud | 😴 Sleeping | Mar 13, 2026 at 5:45 am |
| frontend-inspector | 😴 Sleeping | Mar 13, 2026 at 5:45 am |
| website | 😴 Sleeping | Mar 13, 2026 at 5:43 am |
| mcp-hub | ✅ Success | Mar 13, 2026 at 5:32 am |
| ladle | ❌ Build Failed | Mar 13, 2026 at 4:38 am |

@NathanFlurry NathanFlurry requested review from jog1t and removed request for jog1t March 12, 2026 20:01
@NathanFlurry
Member Author

Follow-up Inspector UI verification after the replay rename:

  • Hidden running-step case now disables Replay from this step and shows the tooltip Step currently in progress.
  • Failed-step case still leaves Replay from this step enabled, so operators can bypass the pending retry immediately.
  • Verified in the standalone Inspector frontend against the local serve-test-suite, not the engine UI.

Screenshots captured in the workspace:

  • /Users/nathan/conductor/workspaces/rivet/kathmandu/.context/disabled-replay-tooltip.png
  • /Users/nathan/conductor/workspaces/rivet/kathmandu/.context/enabled-replay-button.png

@NathanFlurry
Member Author

NathanFlurry commented Mar 13, 2026

Standalone Inspector screenshots for the final replay flows:

  • Failed-step case still leaves Replay from this step enabled, so operators can bypass the pending retry immediately.
  • Replay is blocked when any workflow step is currently running, with the tooltip Step currently in progress.
[Screenshots: enabled-replay-button, disabled-replay-tooltip]

@pkg-pr-new

pkg-pr-new bot commented Mar 13, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4411

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4411

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4411

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4411

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4411

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4411

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4411

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4411

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4411

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4411

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4411

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4411

commit: 5239dd8

@claude

claude bot commented Mar 13, 2026

PR Review: Add workflow rerun controls to RivetKit Inspector

Overall this is a well-structured feature addition. The protocol versioning, backward-compatibility converters, and test coverage are solid. A few issues are worth addressing before merge.


Bugs / Correctness

1. getInspectorProtocolVersion logic is misleading

frontend/src/components/actors/actor-inspector-context.tsx nests the v4 check inside the v3 block. This is semantically correct because 2.1.6 > 2.0.42, but the nesting is confusing. Hoisting the v4 check to a sibling makes the intent clearer: check MIN_RIVETKIT_VERSION_WORKFLOW_REPLAY first (return 4), then MIN_RIVETKIT_VERSION_DATABASE (return 3).
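
A rough sketch of the sibling-check layout suggested here. The threshold values are inferred from the "2.1.6 > 2.0.42" remark, and `isAtLeast` plus the fallback return value of 2 are hypothetical stand-ins for whatever the real file uses:

```typescript
// Assumed values, inferred from the "2.1.6 > 2.0.42" remark in this review.
const MIN_RIVETKIT_VERSION_WORKFLOW_REPLAY = "2.1.6";
const MIN_RIVETKIT_VERSION_DATABASE = "2.0.42";

// Hypothetical helper: compares dotted numeric versions segment by segment.
function isAtLeast(version: string, minimum: string): boolean {
  const a = version.split(".").map((part) => parseInt(part, 10) || 0);
  const b = minimum.split(".").map((part) => parseInt(part, 10) || 0);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const left = a[i] ?? 0;
    const right = b[i] ?? 0;
    if (left !== right) return left > right;
  }
  return true; // all segments equal
}

function getInspectorProtocolVersion(rivetkitVersion: string): number {
  // Highest protocol first; each check is a sibling rather than nested.
  if (isAtLeast(rivetkitVersion, MIN_RIVETKIT_VERSION_WORKFLOW_REPLAY)) return 4;
  if (isAtLeast(rivetkitVersion, MIN_RIVETKIT_VERSION_DATABASE)) return 3;
  return 2; // Assumed fallback; the real default is not shown in this thread.
}
```

Ordering the checks from newest to oldest keeps each threshold readable on its own line and makes adding a future v5 check a one-line change.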

2. replayingEntryId is undefined for full-workflow replays

In actor-workflow-tab.tsx, replayMutation.variables is undefined when triggering a full replay (no entryId) while isPending is true. Nothing is shown as in-progress in the UI during a full replay. Consider a separate isFullReplayPending flag or a sentinel value.
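
One possible shape for the sentinel approach, as a sketch only. `ReplayMutationState`, `FULL_REPLAY`, and `getReplayTarget` are hypothetical names; the state fields mirror the `isPending` and `variables` properties mentioned above:

```typescript
// Hypothetical shape mirroring the mutation fields mentioned in the review.
interface ReplayMutationState {
  isPending: boolean;
  variables: { entryId?: string } | undefined;
}

// Sentinel meaning "the whole workflow is replaying", as opposed to one entry.
const FULL_REPLAY = Symbol("full-replay");

type ReplayTarget = string | typeof FULL_REPLAY | null;

function getReplayTarget(state: ReplayMutationState): ReplayTarget {
  if (!state.isPending) return null; // nothing in flight
  // A pending mutation without an entryId is a full-workflow replay.
  return state.variables?.entryId ?? FULL_REPLAY;
}
```

The UI can then branch on three distinct states (idle, one step replaying, full replay) instead of treating `undefined` as "nothing in progress".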

3. v4ToServerToV3 throws instead of returning a graceful error

In versioned.ts, the serializer path throws when a v4 client sends WorkflowReplayRequest to be downgraded for a v3 server. An uncaught throw here could crash the WebSocket connection. It should return an Error message frame, similar to how v4ToClientToV3 handles WorkflowReplayResponse.
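
To illustrate the non-throwing alternative, here is a simplified sketch. The message shapes, the `DowngradeResult` type, and the function name are all hypothetical; the real types in versioned.ts will differ, and how the error frame is delivered back to the client is left to the caller:

```typescript
// Hypothetical, simplified message shapes for illustration only.
type V4ToServer =
  | { tag: "WorkflowReplayRequest"; entryId?: string }
  | { tag: "WorkflowHistoryRequest" };
type V3ToServer = { tag: "WorkflowHistoryRequest" };
type ErrorFrame = { tag: "Error"; message: string };

type DowngradeResult =
  | { ok: true; message: V3ToServer }
  | { ok: false; error: ErrorFrame };

function downgradeToServerV4ToV3(message: V4ToServer): DowngradeResult {
  if (message.tag === "WorkflowReplayRequest") {
    // v3 has no equivalent message, so report it instead of throwing.
    return {
      ok: false,
      error: { tag: "Error", message: "Workflow replay requires inspector protocol v4." },
    };
  }
  // Every other message in this sketch exists unchanged in v3.
  return { ok: true, message };
}
```

Returning a result value forces the call site to handle the unsupported case explicitly, so the connection stays open.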

4. HTTP endpoint returns 500 for in-flight workflow rejections

The test expects status 500 and code: internal_error when replaying an in-flight workflow. The thrown error is a plain Error, not a structured ActorError. A 409 Conflict with a descriptive code would be more appropriate for callers to handle programmatically.
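
A sketch of what a structured conflict error could look like. The class name, the `workflow_in_flight` code, and `toHttpError` are hypothetical, and the project's actual ActorError constructor may take different arguments:

```typescript
// Hypothetical error class; not the project's real ActorError.
class WorkflowReplayConflictError extends Error {
  readonly statusCode = 409;
  readonly code = "workflow_in_flight"; // assumed code name
  constructor(message = "Cannot replay a workflow while a step is in flight.") {
    super(message);
    this.name = "WorkflowReplayConflictError";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface HttpErrorResponse {
  status: number;
  body: { code: string; message: string };
}

function toHttpError(error: unknown): HttpErrorResponse {
  if (error instanceof WorkflowReplayConflictError) {
    return { status: error.statusCode, body: { code: error.code, message: error.message } };
  }
  // Unknown failures still collapse to a generic 500.
  return { status: 500, body: { code: "internal_error", message: "Internal error" } };
}
```

Callers can then retry or surface a "step in progress" message based on the 409 and its code, without parsing error text.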


Design Concerns

5. syncWorkflowHistoryAfterReplay polling with fixed timeouts and no cleanup

The fire-and-forget polling loop in actor-workflow-tab.tsx has no cleanup path if the component unmounts mid-poll, and runs all four iterations even if the workflow completes early. Consider aborting early once history shows post-replay entries, and using an AbortController or mounted ref to skip stale updates.
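
A minimal sketch of the abortable, early-exit polling described above. The `PollOptions` shape is hypothetical, and `fetchHistory` is assumed to resolve to the number of post-replay entries, which is a simplification of whatever the real history query returns:

```typescript
interface PollOptions {
  signal: AbortSignal;
  delaysMs: number[]; // e.g. the four fixed delays
  fetchHistory: () => Promise<number>; // assumed: count of post-replay entries
  onUpdate: (entryCount: number) => void;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function syncWorkflowHistoryAfterReplay(options: PollOptions): Promise<void> {
  for (const delay of options.delaysMs) {
    await sleep(delay);
    if (options.signal.aborted) return; // component unmounted while waiting
    const count = await options.fetchHistory();
    if (options.signal.aborted) return; // skip the stale update
    options.onUpdate(count);
    if (count > 0) return; // history caught up, stop polling early
  }
}
```

In a React component, the `AbortController` would typically be created inside an effect and aborted in that effect's cleanup function, so an unmount cancels any remaining iterations.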

6. replayWorkflowFromStep may double-load metadata

replayWorkflowFromStep in index.ts calls loadMetadata per entry after loadStorage. Since this PR adds metadata loading to loadStorage, it is worth confirming whether loadMetadata skips the driver read when the entry is already in storage.entryMetadata, or unconditionally causes duplicate I/O.
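
For reference, a cache-first `loadMetadata` would look roughly like the sketch below. This is not the actual implementation: `EntryMetadata`, `WorkflowStorage`, and `MetadataDriver` are hypothetical shapes, and only the `entryMetadata` map name comes from this PR:

```typescript
// Hypothetical shapes; only the entryMetadata field name comes from the PR.
interface EntryMetadata {
  status: string;
}
interface WorkflowStorage {
  entryMetadata: Map<string, EntryMetadata>;
}
interface MetadataDriver {
  get(entryId: string): Promise<EntryMetadata | null>;
}

async function loadMetadata(
  storage: WorkflowStorage,
  driver: MetadataDriver,
  entryId: string,
): Promise<EntryMetadata | null> {
  // Serve from memory when loadStorage already populated this entry.
  const cached = storage.entryMetadata.get(entryId);
  if (cached) return cached;

  // Otherwise read once from the driver and remember the result.
  const loaded = await driver.get(entryId);
  if (loaded) storage.entryMetadata.set(entryId, loaded);
  return loaded;
}
```

If the real function already follows this pattern, the per-entry calls after `loadStorage` cost nothing extra; if it reads unconditionally, each call repeats I/O that `loadStorage` has already done.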

7. ActorWorkflowControlDriver duplicates ActorWorkflowDriver KV logic

The control driver in driver.ts reimplements get, set, delete, deletePrefix, deleteRange, list, and batch with near-identical code to ActorWorkflowDriver. Extracting shared KV plumbing into a base class or helper would reduce future drift.
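
One way to share the plumbing, sketched with a reduced surface. `KvBackend`, `BaseWorkflowKvDriver`, the key prefix, and `createMemoryBackend` are hypothetical; the real drivers also implement deletePrefix, deleteRange, list, and batch, which would move into the base class the same way:

```typescript
// Reduced key-value surface for illustration.
interface KvBackend {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

// Shared plumbing: both drivers inherit identical key handling.
abstract class BaseWorkflowKvDriver {
  constructor(
    private readonly backend: KvBackend,
    private readonly prefix: string, // hypothetical key namespace
  ) {}

  protected key(key: string): string {
    return `${this.prefix}${key}`;
  }
  get(key: string): Promise<string | undefined> {
    return this.backend.get(this.key(key));
  }
  set(key: string, value: string): Promise<void> {
    return this.backend.set(this.key(key), value);
  }
  delete(key: string): Promise<void> {
    return this.backend.delete(this.key(key));
  }
}

// Each driver would add only what is specific to its role.
class ActorWorkflowDriver extends BaseWorkflowKvDriver {}
class ActorWorkflowControlDriver extends BaseWorkflowKvDriver {}

// In-memory backend used only to exercise the sketch.
function createMemoryBackend(): KvBackend {
  const data = new Map<string, string>();
  return {
    get: async (key) => data.get(key),
    set: async (key, value) => { data.set(key, value); },
    delete: async (key) => { data.delete(key); },
  };
}
```

A plain helper object that both drivers delegate to would work equally well if inheritance does not fit the existing class hierarchy.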


Minor Issues

8. hasRunningStep indentation off in workflow-visualizer.tsx — The callback arg to .some() is aligned with the opening paren rather than indented inside it.

9. WorkflowNotEnabled error message references the wrong scenario — The error fires when replayFromStep is not set (run handler is not workflow()), but the message says "Workflow not enabled." Something like "Workflow replay is not supported by this actor’s run handler." is more accurate.

10. Terminology inconsistency in test title — One test says "reruns" while the endpoint path and all other references use "replay".


Positives

  • Protocol versioning and downgrade converters are well-handled. The v4ToClientToV3 fallback to WORKFLOW_HISTORY_DROPPED_ERROR is the right pattern.
  • Moving from a single shared workflowInspector to WeakMap<AnyActorInstance, ...> properly handles multi-actor isolation.
  • The storage.ts fix to load entryMetadata on startup is a good bug fix that enables correct replay boundary detection.
  • findReplayBoundaryEntry correctly rewinds the enclosing loop when replaying a nested step, and the test suite covers this case thoroughly.
  • The dual-token auth improvement (accepting actor-specific inspector tokens alongside RIVET_INSPECTOR_TOKEN) is the right call for the standalone Inspector.
