@TayDa64 TayDa64 commented Jan 27, 2026

Precision Grounding + Inspect Overlay (Opus Execution Plan)

Summary

  • Align grid math across overlay, main, and AI prompts using shared constants.
  • Add a local fine grid around the cursor for precise targeting without the visual noise of a full fine grid.
  • Introduce devtools-style inspect overlays (actionable element boxes + metadata).
  • Ensure AI uses the same visual grounding as the user.

Goals / Non-Goals

Goals

  • User and AI see the same targeting primitives (grid + inspect metadata).
  • Fine precision selection without needing full fine-grid visibility.
  • Deterministic coordinate mapping across renderer/main/AI prompt.

Non-Goals

  • Full external app DOM access (we rely on OCR + visual detection).
  • Replacing the grid system entirely.

Problem

  • Fine dots do not appear around the cursor, preventing high-precision selection.
  • AI coordinate grounding drifts due to mismatched math across modules.
  • AI lacks the same visualization/inspection context the user sees.

Approach

  1. Shared grid math module used by renderer, main, and AI prompt.
  2. Local fine-grid rendering around cursor in selection mode.
  3. Inspect layer backed by visual-awareness to surface actionable regions.
  4. AI prompt + action executor aligned to overlay math and inspect metadata.

Key Changes (Planned)

  • src/shared/grid-math.js: canonical grid constants + label → pixel conversion.
  • src/renderer/overlay/overlay.js: local fine-grid render + shared math usage.
  • src/renderer/overlay/preload.js: expose grid math to renderer safely.
  • src/main/system-automation.js: unify coordinate mapping.
  • src/main/ai-service.js: ground prompts + fine label support.
  • src/main/index.js: optional inspect toggle + overlay commands.
  • src/main/visual-awareness.js: actionable element detection + metadata surface.
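
As a rough illustration of what the shared label → pixel conversion could look like, here is a minimal sketch. The constant values, the label format (column letters, 1-based row, optional `.NN` fine index, as in `C3.12`), and the name `labelToPixel` are all assumptions made for this example, not the contents of src/shared/grid-math.js.

```javascript
// Hypothetical constants; the real values would live in src/shared/grid-math.js.
const GRID = Object.freeze({
  COARSE_SPACING: 100, // assumed pixels between coarse dots
  FINE_DIVISIONS: 4,   // assumed fine steps per coarse cell, per axis
});

// Assumed label format: column letters + 1-based row + optional ".NN" fine index.
const LABEL_RE = /^([A-Z]+)(\d+)(?:\.(\d+))?$/;

// Spreadsheet-style column index: A -> 0, B -> 1, ..., Z -> 25, AA -> 26.
function columnIndex(letters) {
  let n = 0;
  for (const ch of letters) n = n * 26 + (ch.charCodeAt(0) - 64);
  return n - 1;
}

// Convert a coarse ("C3") or fine ("C3.12") label to pixel coordinates.
// Returns null for labels that do not match the assumed format.
function labelToPixel(label) {
  const m = LABEL_RE.exec(label);
  if (!m) return null;
  const [, letters, rowStr, fineStr] = m;
  const row = Number(rowStr) - 1;
  if (row < 0) return null;

  let x = columnIndex(letters) * GRID.COARSE_SPACING;
  let y = row * GRID.COARSE_SPACING;

  if (fineStr !== undefined) {
    const fine = Number(fineStr);
    const n = GRID.FINE_DIVISIONS;
    if (fine >= n * n) return null; // fine index outside the coarse cell
    const step = GRID.COARSE_SPACING / n;
    x += (fine % n) * step;           // fine column offset
    y += Math.floor(fine / n) * step; // fine row offset
  }
  return { x, y };
}
```

Keeping the function pure (no Electron or DOM dependencies) is what would let the renderer, the main process, and the AI prompt builder import the same code and get identical results.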

Implementation Plan

Phase 1: Grounding & Precision

  • Shared grid math module and renderer/main integration.
  • Local fine-grid around cursor with snap highlight.
  • Add label→pixel IPC from main to overlay to guarantee exact mapping.
  • Add fine label rendering on hover (C3.12) in local grid.
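
One possible way to build the cursor-local fine grid and the snap highlight is sketched below. It assumes the fine dots sit on a global lattice with a fixed integer spacing and that only dots within a given radius of the cursor are drawn; `localFinePoints` and `snapToLattice` are hypothetical names, not functions from overlay.js.

```javascript
// Collect fine-grid points within `radius` px of the cursor.
// Points stay aligned to the global lattice (multiples of `spacing`),
// so they do not shift relative to the screen as the cursor moves.
function localFinePoints(cursor, spacing, radius) {
  const points = [];
  const startX = Math.ceil((cursor.x - radius) / spacing) * spacing;
  const startY = Math.ceil((cursor.y - radius) / spacing) * spacing;
  for (let y = startY; y <= cursor.y + radius; y += spacing) {
    for (let x = startX; x <= cursor.x + radius; x += spacing) {
      if (Math.hypot(x - cursor.x, y - cursor.y) <= radius) {
        points.push({ x, y });
      }
    }
  }
  return points;
}

// Snap the cursor to the nearest lattice point (the dot to highlight).
function snapToLattice(cursor, spacing) {
  return {
    x: Math.round(cursor.x / spacing) * spacing,
    y: Math.round(cursor.y / spacing) * spacing,
  };
}
```

Because only the dots inside the radius are generated, the per-frame cost depends on the radius rather than on the screen size.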

Phase 2: Inspect Overlay (Devtools‑Style)

  • Add inspect toggle command and UI indicator.
  • Visual-awareness pass: actionable region detection (buttons, inputs, links).
  • Overlay layer draws bounding boxes + tooltip with text/role/confidence.
  • Selection handoff: click through to element center.
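
The Phase 2 steps could be sketched roughly as follows. The region shape (`role`, `text`, `confidence`, `bounds`), the role names, and the 0.5 confidence threshold are assumptions for illustration; the actual output of the visual-awareness module may differ.

```javascript
// Hypothetical region shape:
// { role, text, confidence, bounds: { x, y, width, height } }
const ACTIONABLE_ROLES = new Set(['button', 'input', 'link']);

// Keep only regions that look actionable and are detected with enough confidence.
function actionableRegions(regions, minConfidence = 0.5) {
  return regions.filter(
    r => ACTIONABLE_ROLES.has(r.role) && r.confidence >= minConfidence
  );
}

// Selection handoff: the click lands on the center of the bounding box.
function regionCenter(region) {
  const { x, y, width, height } = region.bounds;
  return { x: Math.round(x + width / 2), y: Math.round(y + height / 2) };
}

// Tooltip content: text, role, and confidence as a percentage.
function tooltipText(region) {
  const pct = Math.round(region.confidence * 100);
  return `${region.role}: "${region.text}" (${pct}%)`;
}
```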

Phase 3: AI Grounding + Action Execution

  • Include inspect metadata + screen size in AI context.
  • Prefer inspect targets; fall back to the grid only if needed.
  • Add “precision click” action with safety confirmation.
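
A minimal sketch of the "inspect first, grid as fallback" rule, under stated assumptions: the request shape (`text`, `gridLabel`), the region shape, and the injected `gridToPixel` converter are hypothetical and only illustrate the ordering of the two lookups.

```javascript
// Resolve an AI action request to screen coordinates.
// request:     { text?: string, gridLabel?: string }   (assumed shape)
// regions:     inspect regions with { text, confidence, bounds }
// gridToPixel: function(label) -> { x, y } | null      (injected converter)
function resolveTarget(request, regions, gridToPixel) {
  const wanted = (request.text || '').toLowerCase();

  // 1. Prefer an inspect region whose text matches, highest confidence first.
  if (wanted) {
    const match = regions
      .filter(r => r.text.toLowerCase().includes(wanted))
      .sort((a, b) => b.confidence - a.confidence)[0];
    if (match) {
      const { x, y, width, height } = match.bounds;
      return {
        source: 'inspect',
        x: Math.round(x + width / 2),
        y: Math.round(y + height / 2),
      };
    }
  }

  // 2. Fall back to the grid label only when no inspect region matched.
  if (request.gridLabel) {
    const point = gridToPixel(request.gridLabel);
    if (point) return { source: 'grid', x: point.x, y: point.y };
  }

  return null; // nothing to click; the caller should not act
}
```

Returning the `source` alongside the coordinates would let the safety confirmation step tell the user which kind of grounding the click is based on.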

UX Notes

  • Inspect mode should be visually distinct (e.g., cyan boxes, tooltip anchored).
  • Local fine grid should fade in/out smoothly and never block click-through.
  • Keep each overlay redraw within the 16 ms frame budget (60 fps); redraw only in response to pointer movement, and throttle those redraws.
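
One common way to throttle redraws to pointer movement is to coalesce events so that at most one draw is queued per frame. The sketch below assumes the scheduler is injected (for example `cb => requestAnimationFrame(cb)` in the renderer); `createRedrawScheduler` is a hypothetical helper name.

```javascript
// Coalesce pointer-move events into at most one draw per scheduled frame.
// draw:     function(point) that repaints the overlay
// schedule: function(callback) that runs the callback on the next frame,
//           e.g. cb => requestAnimationFrame(cb) in the renderer
function createRedrawScheduler(draw, schedule) {
  let pending = false;
  let latest = null;

  return function onPointerMove(point) {
    latest = point;      // always remember the newest position
    if (pending) return; // a draw is already queued for this frame
    pending = true;
    schedule(() => {
      pending = false;
      draw(latest);      // draw once, using the most recent position
    });
  };
}
```

In the renderer this might be wired up as `canvas.addEventListener('pointermove', e => onMove({ x: e.clientX, y: e.clientY }))`, where `onMove` is the function returned above.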

Testing

Unit

  • Grid label conversions (coarse + fine).
  • Shared constants remain consistent across renderer/main/AI.

Manual

  • Cursor-local fine dots appear in selection mode and track cursor.
  • Background click-through still works in both modes.
  • Inspect overlay boxes align with the visible UI elements.

Regression

  • Coarse grid rendering.
  • Pulse effect visibility.
  • Safety confirmation flow intact.

Risks / Mitigations

  • DPI scaling drift → use Electron screen.getPrimaryDisplay().scaleFactor.
  • Performance → local fine grid only; throttled draw.
  • Overlay click-through → hide overlay only at click execution.
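
For the DPI mitigation, the core arithmetic is a multiplication or division by the display scale factor. The sketch below assumes overlay coordinates are in device-independent pixels (DIP) and that the automation layer needs physical pixels; whether that holds for every code path here would need checking. The helper names are hypothetical, and the scale factor would come from `screen.getPrimaryDisplay().scaleFactor` in the main process.

```javascript
// Convert between device-independent pixels (DIP) and physical pixels.
// scaleFactor is e.g. 1.5 on a display set to 150% scaling.
function assertScale(scaleFactor) {
  if (!(scaleFactor > 0)) {
    throw new RangeError(`invalid scaleFactor: ${scaleFactor}`);
  }
}

function dipToPhysical(point, scaleFactor) {
  assertScale(scaleFactor);
  return {
    x: Math.round(point.x * scaleFactor),
    y: Math.round(point.y * scaleFactor),
  };
}

function physicalToDip(point, scaleFactor) {
  assertScale(scaleFactor);
  return {
    x: Math.round(point.x / scaleFactor),
    y: Math.round(point.y / scaleFactor),
  };
}
```

Doing the conversion in exactly one place (ideally next to the shared grid math) avoids applying the scale factor twice or not at all, which is a typical source of drift.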

Observability / Debugging

  • Add a debug overlay toggle for grid math readouts.
  • Log label→pixel conversions when in inspect mode.
  • Capture last 10 action targets in memory for post-mortem.
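
Keeping the last 10 action targets in memory can be done with a small bounded history. This is a sketch only; `createTargetHistory` and the fields stored per entry are assumptions.

```javascript
// Bounded in-memory history of recent action targets for post-mortem debugging.
function createTargetHistory(capacity = 10) {
  const entries = [];
  return {
    // Store a copy of the target with a timestamp; drop the oldest when full.
    record(target) {
      entries.push({ ...target, at: Date.now() });
      if (entries.length > capacity) entries.shift();
    },
    // Return copies so callers cannot mutate the history.
    snapshot() {
      return entries.map(e => ({ ...e }));
    },
  };
}
```

A debug command could then dump `history.snapshot()` next to the logged label→pixel conversions.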

Opus Notes (Websearch Required)

  • Verify Electron overlay best practices (setIgnoreMouseEvents behavior).
  • Validate DPI/scaling guidance for Windows and macOS.
  • Check common patterns for devtools-like overlays.

Checklist

  • Shared grid math used everywhere (renderer, main, AI prompt).
  • Local fine grid visible and performant.
  • Inspect overlay works and aligns with AI context.
  • AI actions target inspect regions with correct coordinates.
  • Tests updated/added and passing.

Copilot AI and others added 17 commits January 23, 2026 19:08
…-agent

Implement Electron headless agent with transparent overlay architecture
- Real-time test demonstrating functional transparent overlay
- Shows full-screen overlay with dot grid selection system
- Includes chat window with Copilot Agent UI
- Evidence of Selection mode banner and interactive elements
- Tested on Xvfb virtual display at 1920x1080
Add core project structure for the Liku AI System, including CLI, core, and VS Code extension packages. Includes defensive AI architecture files, manifest, TypeScript configs, workspace setup, and initial implementation for CLI commands, stream parser, and VS Code integration.
- Fix fine dots not appearing after toggle (regenerate grids on mode change)
- Add visual feedback for key presses
- Improve key bindings: F/Space=toggle fine, G=show all, +/-=zoom, Esc=cancel
- Fix chat window fullscreen bug with aggressive bounds enforcement (380x500 bottom-right)
- Use maximize instead of fullscreen for overlay on Windows
- Add AI service stubs for OpenAI/Anthropic/Ollama
- Add visual awareness module for screen capture/OCR
- Externalize chat.js and overlay.js scripts for CSP compliance
- Root cause: overlay setIgnoreMouseEvents(true) forwards keyboard to background apps
- Solution: Use globalShortcut.register() in main process for F/G/+/-/Esc
- Send IPC 'overlay-command' to overlay renderer
- Added registerOverlayShortcuts/unregisterOverlayShortcuts in main process
- Added onOverlayCommand handler in overlay.js and preload.js
- Shortcuts: F=fine grid, G=all grids, +/-=zoom, Esc=cancel
- Add Copilot OAuth device code flow (github.com/login/device/code)
- Add loadCopilotToken/saveCopilotToken for token persistence
- Add startCopilotOAuth/pollForToken for authentication
- Add callCopilot() API caller for api.githubcopilot.com
- Add /login, /logout, /status commands
- Default provider changed to 'copilot'
- OAuth callback notifies chat on auth completion
- Auto-load saved token on startup
- Fix async command handling for /login Promise
- Token stored at %APPDATA%/copilot-agent/copilot-token.json
BREAKING: Single-key shortcuts (F, G, +, -, Esc) were capturing ALL
keypresses system-wide, breaking VS Code and other apps!

Changed overlay shortcuts to use modifiers:
- Ctrl+Alt+F = Toggle fine grid
- Ctrl+Alt+G = Show all grids
- Ctrl+Alt+= = Zoom in
- Ctrl+Alt+- = Zoom out
- Ctrl+Alt+X = Cancel selection

Updated chat UI shortcuts display.
Updated overlay console log messages.
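
A sketch of how the modifier-based shortcuts might be registered is shown below. Electron's `globalShortcut.register(accelerator, callback)` returns `false` when registration fails (for example if another app already holds the shortcut), so the sketch collects failures. The command names, the injected `send` function, and the function signatures are assumptions; only the accelerators and the `overlay-command` channel name come from the commit messages above.

```javascript
// Accelerators from the commit message; command names are hypothetical.
const OVERLAY_SHORTCUTS = {
  'Ctrl+Alt+F': 'toggle-fine',
  'Ctrl+Alt+G': 'show-all',
  'Ctrl+Alt+=': 'zoom-in',
  'Ctrl+Alt+-': 'zoom-out',
  'Ctrl+Alt+X': 'cancel',
};

// globalShortcut: Electron's globalShortcut module (injected so this is testable)
// send: function(channel, payload), e.g. a wrapper around
//       overlayWindow.webContents.send
// Returns the accelerators that could not be registered.
function registerOverlayShortcuts(globalShortcut, send) {
  const failed = [];
  for (const [accelerator, command] of Object.entries(OVERLAY_SHORTCUTS)) {
    const ok = globalShortcut.register(accelerator, () => {
      send('overlay-command', command);
    });
    if (!ok) failed.push(accelerator);
  }
  return failed;
}

function unregisterOverlayShortcuts(globalShortcut) {
  for (const accelerator of Object.keys(OVERLAY_SHORTCUTS)) {
    globalShortcut.unregister(accelerator);
  }
}
```

Registering only while the overlay is in selection mode, and unregistering afterwards, would further reduce the chance of interfering with other applications.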
Copilot AI review requested due to automatic review settings January 27, 2026 22:30
Copilot AI left a comment

Pull request overview

This PR introduces a shared grid-math layer and significantly upgrades the Electron-based overlay/chat system for visually grounded AI actions, while also adding a separate ultimate-ai-system workspace (Liku core/CLI/VS Code extension) for defensive AI workflows and streaming parsing.

Changes:

  • Add a canonical grid math module (src/shared/grid-math.js) and wire it into the overlay (canvas-based coarse + local fine grid) and main automation layer for deterministic label↔pixel mapping.
  • Expand the chat and overlay UIs (new preload bridges, chat layout, safety/agentic action flows) and integrate a visual-awareness module plus system-automation for Windows-native input and screen analysis.
  • Introduce the ultimate-ai-system monorepo with pnpm/turbo config, @liku/core (AI stream parser/types), @liku/cli (project bootstrap/checkpoint/status/parse), a VS Code extension, and extensive docs and project metadata.

Reviewed changes

Copilot reviewed 45 out of 49 changed files in this pull request and generated 22 comments.

| File | Description |
| --- | --- |
| .gitignore | Defines ignores for root Node/Electron project (deps, builds, IDE, Electron out). |
| .github/PULL_REQUEST_TEMPLATE.md | Adds detailed PR template aligned with precision grounding + inspect overlay plan. |
| package.json | Root Electron app manifest: main entry, start/test scripts (including grid test), electron dependency. |
| scripts/start.js | Wrapper to spawn Electron with GUI, clearing ELECTRON_RUN_AS_NODE and wiring stdio/exit. |
| scripts/test-grid.js | Node script asserting gridToPixels maps coarse and fine labels to expected coordinates. |
| src/assets/tray-icon.png | Tray icon asset for the Electron app. |
| src/shared/grid-math.js | Canonical constants and labelToScreenCoordinates implementation for coarse and fine grid labels. |
| src/renderer/overlay/preload.js | Exposes overlay-specific IPC API plus shared grid math helpers into the renderer. |
| src/renderer/overlay/overlay.js | High-performance canvas overlay: coarse grid rendering, cursor-local fine grid, mode/zoom handling, click snap highlight, and IPC integration. |
| src/renderer/overlay/index.html | HTML/CSS shell for overlay canvas, mode/status indicators, interaction region, and border styling. |
| src/renderer/chat/preload.js | Chat window IPC bridge for messaging, mode, AI control, visual awareness, safety guardrails, and state access. |
| src/renderer/chat/index.html | Full chat UI (titlebar, toolbar, provider/model bar, history, context panel, input, keyboard hints) styled to match VS Code-like theme. |
| src/renderer/chat/chat.js | Chat front-end logic: message handling, mode toggling, token estimation, provider/model selection, action confirmation UI, typing indicators, and safety/agentic flows. |
| src/main/system-automation.js | Windows-focused automation utilities (mouse, keyboard, scroll, drag, grid-label→pixels) with action sequencing and AI-action parsing. |
| src/main/visual-awareness.js | Screen diffing, OCR (Tesseract/Windows OCR), UI Automation-based element detection, and a higher-level analyzeScreen helper. |
| ELECTRON_README.md | High-level documentation of the Electron headless agent + overlay architecture and usage. |
| ARCHITECTURE.md | Detailed architecture document for main/overlay/chat components, IPC schema, security, performance, and extensibility. |
| CONFIGURATION.md | Configuration examples for windows, hotkeys, IPC, styling, agent integration, and platform-specific tweaks. |
| TESTING.md | Manual and future automated testing guide for the Electron app (tray, hotkeys, overlay, chat, performance, security). |
| QUICKSTART.md | Step-by-step quick start for installing and running the app, modes, shortcuts, and troubleshooting. |
| PROJECT_STATUS.md | Status report asserting implementation completeness, metrics, and next steps for the Electron project. |
| IMPLEMENTATION_SUMMARY.md | Consolidated summary of what was built, requirements coverage, and key technical decisions. |
| GPT-reports.md | GPT-generated workspace report capturing current state, issues, and recommendations. |
| FINAL_SUMMARY.txt | High-level, formatted project completion summary for the Electron overlay/agent architecture. |
| ultimate-ai-system/.gitignore | Ignores for the ultimate-ai-system workspace (node_modules, dist, logs, context XML, etc.). |
| ultimate-ai-system/.ai/manifest.json | AI manifest describing filesystem security, agent profile, verification strategies, and memory paths for the Liku system. |
| ultimate-ai-system/.ai/logs/.gitkeep | Placeholder for logs directory. |
| ultimate-ai-system/.ai/context/.gitkeep | Placeholder for context directory. |
| ultimate-ai-system/.ai/instructions/refactor.xml | Instruction set for defensive refactor workflow (analysis, file_change, verification_cmd tags). |
| ultimate-ai-system/README.md | README for “Liku - Ultimate AI System”, including CLI usage and package overview. |
| ultimate-ai-system/package.json | Workspace-level package config with turbo/typecheck/build scripts and devDeps. |
| ultimate-ai-system/pnpm-workspace.yaml | pnpm workspace definition for liku/* packages. |
| ultimate-ai-system/turbo.json | Turbo repo task pipeline for build/test/typecheck/dev in the monorepo. |
| ultimate-ai-system/tsconfig.base.json | Shared TS compiler options for the Liku packages (NodeNext, strict, composite, declarations). |
| ultimate-ai-system/pnpm-lock.yaml | Locked dependency tree for the ultimate-ai-system workspace (TypeScript, turbo, rimraf, VS Code types, etc.). |
| ultimate-ai-system/liku/core/package.json | @liku/core package metadata, build scripts, and devDeps. |
| ultimate-ai-system/liku/core/tsconfig.json | TS config for @liku/core (outDir/rootDir, node types). |
| ultimate-ai-system/liku/core/src/types.ts | Shared type definitions for stream events, analysis payloads, checkpoints, and provenance. |
| ultimate-ai-system/liku/core/src/stream-parser.ts | EventEmitter-based AI stream parser that extracts tags like `<checkpoint>`, `<file_change>`, `<analysis>`, `<hypothesis>` from streamed text. |
| ultimate-ai-system/liku/core/src/index.ts | Barrel exports for AIStreamParser and core types. |
| ultimate-ai-system/liku/cli/package.json | @liku/cli package manifest, scripts, and dependency on @liku/core. |
| ultimate-ai-system/liku/cli/tsconfig.json | TS config for CLI package. |
| ultimate-ai-system/liku/cli/src/bin.ts | CLI entry implementing init, checkpoint, status, and parse commands around the Liku project layout and stream parser. |
| ultimate-ai-system/liku/vscode/package.json | VS Code extension manifest (commands, activation events, deps). |
| ultimate-ai-system/liku/vscode/tsconfig.json | TS config for the extension. |
| ultimate-ai-system/liku/vscode/src/extension.ts | VS Code extension activation: finds .ai/manifest.json, creates checkpoints, and shows project status via output channel and status bar. |

Files not reviewed (1)
  • ultimate-ai-system/pnpm-lock.yaml: Language not supported

TayDa64 commented Jan 27, 2026

@copilot open a new pull request to apply changes based on the comments in this thread

TayDa64 commented Jan 28, 2026

@copilot open a new pull request to apply changes based on the comments in this thread

@TwistedCrafts TwistedCrafts left a comment

seems like a good start

TayDa64 commented Jan 28, 2026

@copilot open a new pull request to apply changes based on the comments in this thread

TayDa64 and others added 6 commits January 28, 2026 08:46
Add Inspect Overlay for precision UI targeting
- Refactor 1351-line monolithic ui-automation.js into modular structure
- No file exceeds 300 lines (largest: window/manager.js at 278 lines)
- Maintain 100% backward compatibility with all 45 baseline tests passing
- Add comprehensive baseline test suite for regression prevention

Structure:
  ui-automation/
  ├── index.js           - Main entry point with all exports
  ├── config.js          - CONFIG, CONTROL_TYPES
  ├── screenshot.js      - Screenshot functions
  ├── core/              - PowerShell execution, helpers
  ├── elements/          - findElements, waitForElement
  ├── mouse/             - movement, click, drag, scroll
  ├── keyboard/          - typeText, sendKeys
  ├── window/            - getActiveWindow, findWindows, focusWindow
  └── interactions/      - click, hover, fillField, waitAndClick
- Add CLI commands directory with click, drag, find, keys, mouse, repl, screenshot, scroll, start, type, wait, window commands
- Add window manager improvements
- Add visual awareness and system automation updates
- Add baseline-app documentation
- Add test scripts for UI automation and element interactions