Joysafeter v2 #141
Open
yuzzjj wants to merge 530 commits into
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All agents now go through AgentBuildShell; layout drops overview/builder tabs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CreateAgentDialog: only name + build method card selector
- Create success → navigate to /agents/{id}?stage=brief
- Remove edit dialog mode — use Settings page instead
- AgentCard: remove edit button, add status badge, relative time,
hover action hint (Start building / Continue building / View usage)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- agent-card: use shared formatRelativeTime from dateHelpers, inline getActionHint, add aria-label on delete button
- agent-form-dialog: remove unused workspaceId prop, add role=radiogroup and aria-checked on build method cards
- agents/page: 4-column grid layout (xl:grid-cols-4), remove workspaceId prop from CreateAgentDialog
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…card
- Remove mx-auto max-w-6xl that caused left gap
- 4-column grid (xl:grid-cols-4)
- AgentCard: reduce padding/spacing, merge meta+action into single row, fix Invalid Date with null guard, add defaultValue to all i18n keys
- Remove handleCreate/handleDelete wrappers, inline setters
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove "Business scenario" field (not needed for graph builder)
- Center form with max-w-2xl, cleaner spacing
- Add placeholder text to all fields for guidance
- Constraints changed from Textarea to Input (single line sufficient)
- Add icons to action buttons (Sparkles, Wrench)
- Tighter vertical rhythm, smaller labels
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- InspectorPanel handles both node and edge selection
- SaveManager cross-store dependency with callback injection
- Store migration strategy (compat layer → gradual replace)
- CopilotOverlay chat logic migration from StudioRightPanel
- ReactFlowProvider stays in AgentBuilder, GraphBuilderShell inside
- AgentBuilder split into init logic + layout shell
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… GraphStatusBar rewrite
- Fix BuilderSidebarTabs/StudioRightPanel paths to components/ (not root/studio/)
- Fix React Flow imports from @xyflow/react to reactflow (matches codebase)
- Task 6: rewrite existing GraphStatusBar (149 lines), not create new
- Task 10: correct rm paths for cleanup
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hell
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…oMode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, remove duplication
- Merge stepper + toolbar into single compact header row via toolbarSlot pattern
- Make ImportExportMenu self-contained (owns overwrite dialog, reads graphStore directly)
- Remove execution state duplication from graphStore (BuilderNode reads executionStore)
- Remove function aliases (removeNode→deleteNode, pushHistory→takeSnapshot)
- Replace require() circular-dep hack with setAutoSaveTrigger registration
- Collapse getInitialState blocks to captured-at-creation snapshots
- Fix GraphToolbar perf: subscribe to boolean selectors not full arrays
- Remove dead code: unused props, variables, redundant wrappers
- Delete obsolete: BuilderToolbar, BuilderSidebarTabs, StudioRightPanel
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Consolidates the two separate "发布" (publish) entry points into a single Wizard-led flow with Settings as read-only history. Introduces AgentPublishService backend orchestration layer with proper transaction boundaries, replacing frontend-orchestrated multi-request publish.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…uilder
Address 3 issues from spec review:
1. Add AgentBuilder.tsx to rewrite list (uses useUnfreezeVersion)
2. Fix publish_release call to use CreateAgentReleaseRequest model
3. Clarify commit removal scope: only orchestrated methods, not standalone
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
11 tasks covering backend (AgentPublishService, route changes) and frontend (new hooks, UI rewrites, cleanup). Fix publishKeys to use correct agentKeys.all pattern and STALE_TIME import path.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
freeze_version, publish_release, activate_release, retire_release no longer own their transaction boundary. Delete unfreeze_version entirely.
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
…points
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…e history
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…lope, SandboxBridge, dead code cleanup
- Task 1: Add unified OrchestratorError exception hierarchy (error.py)
- Task 4: Add VaultCipher with AES-256-GCM enc: prefix support
- Task 9: Add last_result_status/error, task_available, sandbox_id to SandboxBridge
- Task 13: Auto-assign UUIDv7 event_id, make seq Optional in EventEnvelope
- Task 15: Remove dead result/memory_sync branches in event_mapping
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ventBus flush, file injection
- Task 2: Add max_scheduling_tasks + image_for_provider() to JoySafeterConfig
- Task 3: Add provider, setup_commands, allowed/disallowed_tools, max_turns, repos to HarnessInput
- Task 7: Add SandboxStatus enum, SandboxCreateConfig, ProviderSandboxInfo to provider
- Task 14: Add flush() method to EventBus
- Task 23: Add InjectionStrategy enum and FileToInject dataclass to file_injection
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eue parity
- Task 16: Add cancel command handler to CommandListener
- Task 17: Add dispatch_cancel/dispatch_input to RedisCoordinator
- Task 18: Fix lock key namespacing with joysafeter:lock: prefix
- Task 19: Add has_pending() to queue backend
- Task 20: Add exponential backoff to CommandListener reconnect
- Task 21: Wrap publish_session_event with source_instance
- Task 25: Use config heartbeat_interval instead of hardcoded 10s
- Task 26: Add deregister_instance method to RedisCoordinator
- Task 27: Return (sandbox_id, owner) tuples from list_active_sandbox_owners
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
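The reconnect backoff from Task 20 could look roughly like the sketch below. The helper name, base delay, growth factor, and cap are all assumptions for illustration; the commit does not state the values CommandListener actually uses.

```python
import itertools
from typing import Iterator


def backoff_delays(base: float = 1.0, factor: float = 2.0, cap: float = 30.0) -> Iterator[float]:
    """Yield reconnect delays that grow exponentially and stop growing at `cap`.

    Hypothetical helper: base, factor and cap are illustrative defaults,
    not values taken from CommandListener.
    """
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# The first six delays a reconnect loop would sleep for before each retry.
first_six = list(itertools.islice(backoff_delays(), 6))
```

A real reconnect loop would typically reset the generator after a successful connection and may add random jitter so that many instances do not retry in lockstep.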
…C logging
- Task 5: Add OAuth token refresh + VaultCipher decryption to HarnessIer
- Task 6: Add setup_commands, tool allow lists, max_turns extraction to builder
- Task 8: Add reverse orphan sweep (DB→provider) to SandboxController
- Task 22: Align MemoryStoreSubscribers API (mount_path, dedup, exclude by sandbox, notify_peers_direct)
- Task 24: image_for_provider already exists in sandbox_resolver, so skip
- Task 28: Log e_providers from RunnerReady in gRPC
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eanup
- Task 10: Fix grace period probe schedule to match Rust (3s/5s/10s/15s then 105s)
- Task 11: Fix reconnect path to emit session.status_idle when task_done && !got_idle
- Task 12: Fix cleanup Step 7 to query DB for pending tasks instead of in-memory list
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The upgrade() and downgrade() function bodies had an extra level of indentation (8 spaces instead of standard 4). This is syntactically valid Python but violates Alembic's autogenerate convention and triggers Pyright linting errors.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r, tool allow lists
- C1: Update task DB status to 'cancelled' + emit session.status_idle on user cancel
- C2: Fix NameError in reconnect path (result_status/active_task_session_id → task_session_id)
- C3: Grace period already wired in _cleanup_sandbox → verified, no change needed
- C5: _parse_tool_allow_lists now reads from nested configs[] array (Rust parity)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- I1: Update sandbox DB status to 'running' on task dispatch
- I2: Publish Redis 'complete' event after task result
- I3: Unconditionally reset requires_action_pending + confirmation after each task
- I4: Emit session.status_running on reconnect before inner event loop
- I5: Fix error event type check ('session.error' not 'error')
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…der parity
- I10: _force_stop_stuck uses updated_at instead of last_used_at (Rust parity)
- I14: Skip pool claim after destroying mismatched stopped sandbox (go direct to create)
- I15: MCP tool name extraction checks both 'name' and 'mcp_server_name' fields
- I11/I12/I13/I16: Acknowledged as Rust-side differences or deliberate design choices
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- C4: Add CAS guard to SessionStateSubscriber (prevent overwriting terminal sessions)
- C6: Add memory_mounts to SandboxCreateConfig + Docker bind mount support
- I9: Already protected by queries::update_session_status CAS (verified)
- I12: Add notify_global() to TaskQueue + wake scheduler after health check cleanup
- I16: Fix Codex history to parse content as array-of-blocks (not just plain string)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
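A compare-and-set guard like the one in C4 is often expressed as a conditional UPDATE whose WHERE clause rejects terminal rows. The sketch below uses an in-memory SQLite table to stay self-contained; the table layout, the set of terminal states, and the function name are assumptions, not the project's schema.

```python
import sqlite3

# Assumed terminal states; the real set is not given in the commit message.
TERMINAL = ("completed", "failed", "cancelled")


def update_status_cas(conn: sqlite3.Connection, session_id: str, new_status: str) -> bool:
    """Set the status only if the session is not already terminal.

    Returns True when a row was updated, False when the guard rejected it.
    """
    placeholders = ",".join("?" for _ in TERMINAL)
    cur = conn.execute(
        f"UPDATE sessions SET status = ? WHERE id = ? AND status NOT IN ({placeholders})",
        (new_status, session_id, *TERMINAL),
    )
    conn.commit()
    return cur.rowcount == 1


# Hypothetical table with one live and one finished session.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO sessions VALUES ('s1', 'running'), ('s2', 'completed')")

running_updated = update_status_cas(conn, "s1", "idle")      # guard passes
terminal_updated = update_status_cas(conn, "s2", "running")  # guard rejects
```

Because the check and the write happen in one statement, a late subscriber event cannot overwrite a session that another writer has already moved to a terminal state.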
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…stead of /dashboard
Changed 3 locations:
- app/page.tsx: root route redirect for logged-in users
- components/auth/oauth-buttons.tsx: default OAuth callback URL
- app/(auth)/verify/use-verification.ts: post-verification redirect
- middleware.ts: add /managed to allowed redirect paths
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add 'Copy' button next to content header in EventDetail transcript mode
- Uses Clipboard API with both text/html and text/plain MIME types for rich text copy
- Preserves formatting (headings, lists, bold, links) when pasting into Word/Docs/Email
- Falls back to plain text copy if rich text clipboard API is unavailable
- Shows 'Copied' confirmation with green checkmark for 2 seconds
- Added common.copy/common.copied i18n keys for both zh and en locales
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause: SessionService.update_session_status acquired a FOR UPDATE row lock on joysafeter_sessions BEFORE acquiring pg_advisory_xact_lock for event seq. EventBatchSender._write_batch acquired the advisory lock FIRST and then ran an INSERT, which needs an FK ShareLock on sessions. This creates an AB-BA deadlock under concurrency.
Fixes:
1. SessionService.update_session_status: acquire advisory lock BEFORE row lock (matching JoySafeterSessionLifecycleService lock ordering)
2. EventBatchSender._write_batch: sort session_ids before iterating to prevent multi-session advisory lock ordering deadlocks between concurrent batches
The error manifested as:
asyncpg.exceptions.DeadlockDetectedError: deadlock detected
Process 15202 waits for ShareLock on transaction 172024; blocked by process 15227
Process 15227 waits for ShareLock on transaction 172023; blocked by process 15202
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
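The second fix relies on a general rule: if every caller takes its locks in the same global order, an AB-BA cycle cannot form. The sketch below illustrates that rule with in-process threading locks standing in for per-session advisory locks; the lock table and function name are hypothetical and no database is involved.

```python
import threading
from contextlib import ExitStack

# One lock per session id stands in for a per-session advisory lock (illustrative only).
locks = {sid: threading.Lock() for sid in ("s1", "s2", "s3")}
completed = []


def write_batch(name: str, session_ids: list[str]) -> None:
    """Acquire every session lock in sorted order, then do the work."""
    with ExitStack() as stack:
        for sid in sorted(session_ids):  # same global order in every caller
            stack.enter_context(locks[sid])
        completed.append(name)  # stand-in for the batch INSERT


# Two batches name the same sessions in opposite orders.
threads = [
    threading.Thread(target=write_batch, args=("a", ["s1", "s3", "s2"])),
    threading.Thread(target=write_batch, args=("b", ["s2", "s3", "s1"])),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
```

Without the `sorted()` call, batch "a" could hold s1 while waiting for s2 at the same moment batch "b" holds s2 and waits for s1.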
1. Gate MockAdapter behind JOYSAFETER_MOCK_ADAPTER env var (matching Rust). Previously it was unconditionally registered on every startup.
2. Replace silent mock_call_llm fallback with RuntimeError. Previously, if langchain_openai was missing, the agent would silently return fake 'Mock response' output instead of failing loudly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
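Both changes follow the same "fail loudly, opt in explicitly" pattern, sketched below. Only the JOYSAFETER_MOCK_ADAPTER variable name comes from the commit; the accepted truthy values, the function names, and the `client` parameter are assumptions for illustration.

```python
import os
from typing import Callable, Mapping, Optional


def mock_adapter_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the opt-in variable is set to a truthy value.

    The accepted values ("1", "true", "yes") are an assumption.
    """
    env = os.environ if env is None else env
    return env.get("JOYSAFETER_MOCK_ADAPTER", "").lower() in {"1", "true", "yes"}


def call_llm(prompt: str, client: Optional[Callable[[str], str]] = None) -> str:
    """Call the LLM client, raising instead of returning fake output when it is missing."""
    if client is None:
        # Hypothetical message; the point is that nothing mock-like is returned.
        raise RuntimeError("LLM client unavailable; refusing to return mock output")
    return client(prompt)
```

With this shape, a missing dependency surfaces as an error at the first call rather than as plausible-looking fake output in a session transcript.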
…quests
Added ssrf_guard.py utility with:
- HTTPS-only scheme enforcement (configurable)
- RFC-1918 / link-local / loopback / cloud metadata IP blocking
- DNS resolution before request (prevents DNS rebinding)
- follow_redirects=False on all patched httpx calls
Fixed 7 SSRF-vulnerable sinks:
1. CRITICAL: vault OAuth token_url (user-controlled, POSTed at runtime)
2. CRITICAL: quickstart ANTHROPIC_BASE_URL (user-writable secret)
3. HIGH: media.py Image/Audio/Video/File URL fetch (4 call sites)
4. HIGH: research_tools fetch_webpage_content
5. HIGH: A2A client agent_card_url
6. HIGH: MCP /test endpoint (no scheme validation)
7. HIGH: OIDC discovery issuer URL
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Changed SSRF guard default from HTTPS-only to allow both HTTP and HTTPS. The primary SSRF defense is IP validation (blocking cloud metadata, private IPs, loopback, link-local), not scheme restriction. Production environments can set JOYSAFETER_SSRF_HTTPS_ONLY=1 to enforce HTTPS-only when needed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Private/internal IPs (10.x, 172.16.x, 192.168.x, 127.0.0.1) are now ALLOWED by default because many legitimate services run on the internal network:
- LLM APIs (Ollama, vLLM on localhost)
- MCP servers (internal tools)
- Internal service endpoints
Only truly dangerous IPs are ALWAYS blocked:
- Cloud metadata (169.254.169.254, 100.100.100.200, fd00:ec2::254)
- Link-local range (169.254.x.x)
- Multicast addresses
- Known metadata hostnames (metadata.google.internal, metadata.goog)
Opt-in env vars for stricter policies:
- JOYSAFETER_SSRF_BLOCK_PRIVATE=1: also block RFC-1918 private IPs
- JOYSAFETER_SSRF_HTTPS_ONLY=1: enforce HTTPS-only scheme
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
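The policy described above can be approximated with the standard `ipaddress` module, as in the sketch below. The function name and the `block_private` parameter are hypothetical stand-ins for the env-var switch, hostname checks are omitted, and `is_private` covers more ranges than RFC-1918 alone, so this is an approximation rather than the logic in ssrf_guard.py.

```python
import ipaddress

# Metadata addresses listed in the commit message.
METADATA_IPS = {
    ipaddress.ip_address(text)
    for text in ("169.254.169.254", "100.100.100.200", "fd00:ec2::254")
}


def ip_allowed(ip_text: str, block_private: bool = False) -> bool:
    """Return False for always-blocked addresses, and for private ones when opted in."""
    ip = ipaddress.ip_address(ip_text)
    # Always blocked: cloud metadata, link-local, multicast.
    if ip in METADATA_IPS or ip.is_link_local or ip.is_multicast:
        return False
    # Opt-in stricter policy (assumed to mirror JOYSAFETER_SSRF_BLOCK_PRIVATE=1).
    # Note: is_private is broader than RFC-1918; it also covers loopback and other reserved ranges.
    if block_private and (ip.is_private or ip.is_loopback):
        return False
    return True
```

For example, `ip_allowed("10.0.0.5")` passes under the default policy, while `ip_allowed("10.0.0.5", block_private=True)` does not.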
… inputs
Backend (Pydantic schemas):
- McpServerCreate/Update: @field_validator on url field
- McpTestRequest: @field_validator on url field
- CreateCredentialRequest: @field_validator on mcp_server_url
- McpServerConfig (agent): @field_validator on url field
- Shared validate_url_scheme() in ssrf_guard.py
Frontend:
- lib/utils/url-validation.ts: validateUrlScheme() + isValidUrl()
- add-mcp-dialog.tsx: validate URL before test/save
- create-agent-dialog.tsx: validate MCP URL before adding
All URL fields now enforce http:// or https:// scheme at input time, before reaching the backend SSRF guard (which handles IP validation).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
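A shared scheme check of this kind might be as small as the sketch below. The name validate_url_scheme appears in the commit, but the signature, return value, error messages, and the extra empty-host check here are assumptions; the real helper may differ.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}


def validate_url_scheme(url: str) -> str:
    """Return the trimmed URL if it uses http or https, otherwise raise ValueError.

    Assumed behaviour: Pydantic field validators can call this and let the
    ValueError surface as a validation error.
    """
    cleaned = url.strip()
    parsed = urlparse(cleaned)  # urlparse lowercases the scheme
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"URL must start with http:// or https://, got {cleaned!r}")
    if not parsed.netloc:
        raise ValueError(f"URL has no host: {cleaned!r}")
    return cleaned
```

This only rejects bad schemes at input time; address-level checks still have to run where the request is actually made.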
Frontend fixes (3 missing validation points):
- Agent edit page MCP URL: add validateUrlScheme before addMcpServer
- Vault credential dialog: add validateUrlScheme before submit
- (Graph builder A2A URLs validated at backend level)
Backend fixes:
- A2A NodeConfig: validate a2a_url and agent_card_url scheme at config load time
Now ALL user-controllable URL inputs have http/https scheme validation:
- 8 frontend form entry points (MCP create/edit/test, Agent create/edit, Vault cred, Quickstart)
- 6 backend Pydantic schemas (McpServerCreate/Update, McpTestRequest, McpServerConfig, CreateCredentialRequest)
- 2 backend config parsers (A2A NodeConfig)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SSRFError extends ValueError. In the IP validation step (step 3), the code raised SSRFError for metadata IPs, but the subsequent 'except ValueError' clause caught and silently swallowed it, allowing requests to cloud metadata endpoints like 169.254.169.254.
Fix: separate ipaddress.ip_address() parsing from SSRF validation checks so the except ValueError only catches the 'not an IP literal' case.
Test results after fix:
✅ http://169.254.169.254: correctly blocked
✅ http://169.254.170.2: correctly blocked
✅ http://100.100.100.200: correctly blocked
✅ http://metadata.google.internal: correctly blocked
✅ ftp://evil.com: correctly blocked
✅ http://localhost:11434: correctly allowed
✅ http://10.0.0.5:8080: correctly allowed
✅ https://api.anthropic.com: correctly allowed
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
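The bug class is easy to reproduce in isolation. In the sketch below, the two function names and the single link-local check are hypothetical simplifications; only the SSRFError-extends-ValueError relationship and the shape of the fix come from the commit message.

```python
import ipaddress


class SSRFError(ValueError):
    """Mirrors the commit: the SSRF error is a ValueError subclass."""


def check_host_buggy(host: str) -> None:
    """Raises inside the try, so the handler meant for parse failures eats the SSRF error."""
    try:
        ip = ipaddress.ip_address(host)
        if ip.is_link_local:
            raise SSRFError(f"blocked: {host}")
    except ValueError:
        pass  # intended for "not an IP literal", but also catches SSRFError


def check_host_fixed(host: str) -> None:
    """Keeps only the parse inside the try, then validates outside it."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return  # not an IP literal; hostname checks would run elsewhere
    if ip.is_link_local:
        raise SSRFError(f"blocked: {host}")
```

Calling `check_host_buggy("169.254.169.254")` returns without error, while `check_host_fixed("169.254.169.254")` raises SSRFError.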
Problem: When viewing a running session, the frontend simultaneously ran:
1. SSE stream (real-time events via /events/stream)
2. Session status polling (every 2s)
3. Auto-load more events (useEffect triggered repeatedly)
This caused a flood of GET /events?limit=100&after_seq=X requests even though SSE was delivering the same events in real-time.
Fixes:
- Skip auto-load useEffect when sseConnected=true (events arrive via SSE)
- Increase session refetchInterval from 2s to 10s when SSE is connected (session status changes are also delivered through SSE events)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tus from SSE
SSE already pushes session.status_* events in real-time. No need to
poll GET /sessions/{id} at all when SSE is connected.
Changes:
- refetchInterval: false when sseConnected (was 10s, before that 2s)
- New useEffect: watches SSE events for session.status_* and updates
the React Query cache directly via setQueryData, so UI reflects
status changes instantly without any HTTP request
- Polling only activates as fallback when SSE is disconnected
Before: SSE connected → poll every 2s → 30 requests/min wasted
After: SSE connected → 0 polls → status updates from SSE push
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
P1: QueueFull now sends {lagged: true} signal instead of silently dropping
- SessionBroadcaster: on QueueFull, drain queue and inject lagged marker
- SSE endpoint: detect lagged marker and break stream (client reconnects)
- Frontend sse.ts lagged handling was already implemented but was dead code
because the backend never sent the signal — now it works end-to-end
P2: DB polling fallback reduced from 15s to 2s
- When Redis is unavailable, events were delayed up to 15 seconds
- Now polls every 2 seconds for acceptable latency
P3: SSE stream now detects terminal session status
- DB fallback loop checks session.status and closes stream when terminated
- Prevents indefinite empty polling for completed sessions
P4: Heartbeat timeout increased from 15s to 30s
- Previous 15s timeout triggered a full DB query just to send a heartbeat
- 30s is the standard SSE heartbeat interval; reduces unnecessary DB load
Architecture summary (service-split):
Orchestrator → Redis PUBLISH → API Redis subscriber → SSE → Frontend
Fallback (no Redis): Orchestrator → DB → API polls DB every 2s → SSE
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
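The P1 behaviour (on overflow, drain the subscriber queue and leave a lagged marker) can be sketched with a bounded asyncio.Queue. The function name and the event shapes other than `{lagged: true}` are assumptions, and the real SessionBroadcaster may handle overflow differently in detail.

```python
import asyncio

LAGGED = {"lagged": True}  # marker shape taken from the commit message


def offer(queue: asyncio.Queue, event: dict) -> bool:
    """Try to enqueue an event without blocking.

    On overflow, discard the backlog and leave only the lagged marker so the
    consumer knows it missed events. Returns False when that happened.
    """
    try:
        queue.put_nowait(event)
        return True
    except asyncio.QueueFull:
        while not queue.empty():
            queue.get_nowait()  # drop the stale backlog
        queue.put_nowait(LAGGED)
        return False
```

A stream handler that reads the marker can then close the connection, and the client recovers the missed events when it reconnects.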
ReferenceError: Cannot access 'sseConnected' before initialization
The useQuery refetchInterval callback referenced sseConnected (line 195) but it was declared later (line 275). Moved useSessionStream hook above useQuery to resolve the temporal dead zone.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Previously the EventBus ran Phase 1 (DB persist) sequentially BEFORE Phase 2 (SSE broadcast). Every event had to wait for the DB batch write (100ms+ delay) before being pushed to SSE.
Now persist and broadcast run in PARALLEL:
- Python: asyncio.gather(persist(), broadcast1(), broadcast2())
- Rust: tokio::spawn(persist) then broadcast immediately
Also fixed Rust SessionBroadcaster to log warnings on channel full (matching Python P1 lagged signal fix).
Before: event → wait DB batch (100ms) → wait DB write → SSE push
After: event → SSE push immediately + DB write in background
Expected SSE latency improvement: ~100ms+ per event eliminated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
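A minimal sketch of the Python side, with stand-in coroutines in place of the real persist and broadcast steps. The names, the 50 ms sleep, and the shared log list are illustrative assumptions.

```python
import asyncio

log: list[str] = []


async def persist(event: str) -> None:
    await asyncio.sleep(0.05)  # stand-in for the DB batch write
    log.append(f"persisted:{event}")


async def broadcast(event: str) -> None:
    log.append(f"broadcast:{event}")  # stand-in for the SSE push


async def publish(event: str) -> None:
    # gather takes awaitables as positional arguments, not a list.
    await asyncio.gather(persist(event), broadcast(event))


asyncio.run(publish("e1"))
```

After the run, `log` holds the broadcast entry before the persist entry, which is the ordering the commit is after: the push no longer waits for the write.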
1. Connection semaphore: atomic acquire_nowait() (no TOCTOU race)
2. EventBus.publish: never raises; errors are logged but don't kill sessions
3. HITL task deadline reset after confirmation (matching Rust)
4. HITL event buffer bounded at 1000 (prevent OOM on long HITL pauses)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
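Item 1 is about replacing a separate "is there room?" check followed by an acquire with one call that does both. The sketch below shows that idea with `threading.BoundedSemaphore`, swapped in because the standard library exposes a non-blocking acquire there; the project's own acquire_nowait() helper, the limit of 2, and the function names are not taken from the source.

```python
import threading

MAX_CONNECTIONS = 2  # hypothetical limit
slots = threading.BoundedSemaphore(MAX_CONNECTIONS)


def try_connect() -> bool:
    """Claim a connection slot without blocking.

    The single non-blocking acquire both checks and claims the slot, so no
    other caller can slip in between a check and a later acquire.
    """
    return slots.acquire(blocking=False)


def disconnect() -> None:
    """Return a previously claimed slot."""
    slots.release()
```

A caller that gets False can reject the connection immediately instead of queueing behind the limit.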
…edis leak, backpressure
Python fixes:
- 1.2: gRPC transport keepalive (30s ping, 10s timeout): detect NAT/LB dead connections
- 3.1: Bridge register() cancels old bridge on reconnect (prevent double-session)
- 4.5: Redis task mapping cleanup on agent-not-found early exit
Rust fixes (disk, needs cargo build):
- 1.2: Tonic tcp_keepalive + http2_keepalive (30s/10s)
- 3.1: BridgeRegistry.register() cancels old bridge via CancellationToken
- 6.2: rescue_orphaned_tasks pushes to global queue (was log-only)
- 7.1: StartTask send with 10s timeout (prevent indefinite block)
Both:
- All outbound sends now bounded by timeout or try_send
- Bridge double-registration produces WARNING log + old session cancelled
- gRPC keepalive detects silently dropped TCP connections
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
No description provided.