Add shared grid math and local fine grid; expand PR template #1154
base: main
Co-authored-by: TayDa64 <[email protected]>
…-agent Implement Electron headless agent with transparent overlay architecture
- Real-time test demonstrating functional transparent overlay
- Shows full-screen overlay with dot grid selection system
- Includes chat window with Copilot Agent UI
- Evidence of Selection mode banner and interactive elements
- Tested on Xvfb virtual display at 1920x1080
Add core project structure for the Liku AI System, including CLI, core, and VS Code extension packages. Includes defensive AI architecture files, manifest, TypeScript configs, workspace setup, and initial implementation for CLI commands, stream parser, and VS Code integration.
- Fix fine dots not appearing after toggle (regenerate grids on mode change)
- Add visual feedback for key presses
- Improve key bindings: F/Space=toggle fine, G=show all, +/-=zoom, Esc=cancel
- Fix chat window fullscreen bug with aggressive bounds enforcement (380x500 bottom-right)
- Use maximize instead of fullscreen for overlay on Windows
- Add AI service stubs for OpenAI/Anthropic/Ollama
- Add visual awareness module for screen capture/OCR
- Externalize chat.js and overlay.js scripts for CSP compliance
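
The bounds enforcement mentioned above (380x500, bottom-right) could look roughly like the sketch below. This is a hedged illustration only: the helper names `chatBounds` and `needsReset` and the 20 px margin are assumptions, not taken from the commit. The `workArea` argument is assumed to have the `{ x, y, width, height }` shape that Electron reports for a display's work area.

```javascript
// Hypothetical helpers: names and MARGIN are assumptions, not from the PR.
const CHAT_WIDTH = 380;
const CHAT_HEIGHT = 500;
const MARGIN = 20; // assumed gap between the chat window and the screen edge

// Compute the bottom-right bounds for the chat window inside a work area.
function chatBounds(workArea) {
  return {
    x: workArea.x + workArea.width - CHAT_WIDTH - MARGIN,
    y: workArea.y + workArea.height - CHAT_HEIGHT - MARGIN,
    width: CHAT_WIDTH,
    height: CHAT_HEIGHT,
  };
}

// True when the window has drifted from the expected size
// (for example after an unwanted maximize or fullscreen).
function needsReset(current, expected) {
  return current.width !== expected.width || current.height !== expected.height;
}
```

In the main process, a resize or maximize handler could call `needsReset` and, if it returns true, pass the result of `chatBounds` to the window's `setBounds`.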
- Root cause: overlay setIgnoreMouseEvents(true) forwards keyboard to background apps
- Solution: Use globalShortcut.register() in main process for F/G/+/-/Esc
- Send IPC 'overlay-command' to overlay renderer
- Added registerOverlayShortcuts/unregisterOverlayShortcuts in main process
- Added onOverlayCommand handler in overlay.js and preload.js
- Shortcuts: F=fine grid, G=all grids, +/-=zoom, Esc=cancel
- Add Copilot OAuth device code flow (github.com/login/device/code)
- Add loadCopilotToken/saveCopilotToken for token persistence
- Add startCopilotOAuth/pollForToken for authentication
- Add callCopilot() API caller for api.githubcopilot.com
- Add /login, /logout, /status commands
- Default provider changed to 'copilot'
- OAuth callback notifies chat on auth completion
- Auto-load saved token on startup
- Fix async command handling for /login Promise
- Token stored at %APPDATA%/copilot-agent/copilot-token.json
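
As a rough illustration of the polling half of a device code flow, the sketch below shows the decision a poller makes after each response from the token endpoint. It is not the PR's `pollForToken`: the function name `nextPollStep` and its return shape are hypothetical. The error codes and the 5-second back-off follow RFC 8628 (OAuth 2.0 Device Authorization Grant).

```javascript
// Hypothetical helper: decides what to do after one poll of the token endpoint.
// `response` is the parsed JSON body; `intervalSec` is the current polling interval.
function nextPollStep(response, intervalSec) {
  if (response.access_token) {
    return { action: 'done', token: response.access_token };
  }
  switch (response.error) {
    case 'authorization_pending':
      // The user has not finished entering the code yet: keep polling.
      return { action: 'wait', intervalSec };
    case 'slow_down':
      // RFC 8628 asks the client to add 5 seconds to its interval.
      return { action: 'wait', intervalSec: intervalSec + 5 };
    case 'expired_token':
    case 'access_denied':
      return { action: 'fail', reason: response.error };
    default:
      return { action: 'fail', reason: response.error || 'unknown_response' };
  }
}
```

Keeping this step pure (no network, no timers) makes it easy to unit test; the surrounding loop only has to sleep for `intervalSec` and call the endpoint again while the action is `wait`.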
BREAKING: Single-key shortcuts (F, G, +, -, Esc) were capturing ALL keypresses system-wide, breaking VS Code and other apps!

Changed overlay shortcuts to use modifiers:
- Ctrl+Alt+F = Toggle fine grid
- Ctrl+Alt+G = Show all grids
- Ctrl+Alt+= = Zoom in
- Ctrl+Alt+- = Zoom out
- Ctrl+Alt+X = Cancel selection

Updated chat UI shortcuts display.
Updated overlay console log messages.
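
A minimal sketch of how the modifier-based shortcuts might be registered and forwarded over the 'overlay-command' IPC channel. The accelerator strings come from the commit message above; the command names (`toggle-fine`, `show-all`, and so on) and the dependency-injected signatures are assumptions made so the sketch can run outside Electron. The `shortcuts` argument is expected to behave like Electron's `globalShortcut`, whose `register(accelerator, callback)` returns a boolean.

```javascript
// Accelerators are from the commit message; command names are hypothetical.
const OVERLAY_SHORTCUTS = {
  'Ctrl+Alt+F': 'toggle-fine',
  'Ctrl+Alt+G': 'show-all',
  'Ctrl+Alt+=': 'zoom-in',
  'Ctrl+Alt+-': 'zoom-out',
  'Ctrl+Alt+X': 'cancel',
};

// Register every shortcut; `send(channel, command)` forwards to the overlay renderer.
// Returns the accelerators that could not be registered (e.g. taken by another app).
function registerOverlayShortcuts(shortcuts, send) {
  const failed = [];
  for (const [accelerator, command] of Object.entries(OVERLAY_SHORTCUTS)) {
    const ok = shortcuts.register(accelerator, () => send('overlay-command', command));
    if (!ok) failed.push(accelerator);
  }
  return failed;
}

// Release the shortcuts so other applications receive the keys again.
function unregisterOverlayShortcuts(shortcuts) {
  for (const accelerator of Object.keys(OVERLAY_SHORTCUTS)) {
    shortcuts.unregister(accelerator);
  }
}
```

In the main process this would be called with Electron's `globalShortcut` and a `send` function that wraps the overlay window's `webContents.send`. Because global shortcuts are captured system-wide, requiring Ctrl+Alt keeps plain F, G, +, - and Esc available to other applications.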
Pull request overview
This PR introduces a shared grid-math layer and significantly upgrades the Electron-based overlay/chat system for visually grounded AI actions, while also adding a separate ultimate-ai-system workspace (Liku core/CLI/VS Code extension) for defensive AI workflows and streaming parsing.
Changes:
- Add a canonical grid math module (src/shared/grid-math.js) and wire it into the overlay (canvas-based coarse + local fine grid) and main automation layer for deterministic label↔pixel mapping.
- Expand the chat and overlay UIs (new preload bridges, chat layout, safety/agentic action flows) and integrate a visual-awareness module plus system-automation for Windows-native input and screen analysis.
- Introduce the ultimate-ai-system monorepo with pnpm/turbo config, @liku/core (AI stream parser/types), @liku/cli (project bootstrap/checkpoint/status/parse), a VS Code extension, and extensive docs and project metadata.
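
To make the idea of a deterministic label-to-pixel mapping concrete, here is a minimal sketch. Everything specific in it is an assumption for illustration: the label format (column letters plus a 1-based row such as "C3", with an optional fine suffix such as "C3.12"), the 100 px coarse spacing, the 4x4 fine subdivision, and the function name `labelToPoint`. The actual constants and label scheme in src/shared/grid-math.js are not shown in this conversation and may differ.

```javascript
// All constants and the label format are hypothetical, not taken from grid-math.js.
const COARSE_SPACING = 100; // assumed coarse cell size in pixels
const FINE_DIVISIONS = 4;   // assumed fine cells per coarse cell, per axis
const FINE_SPACING = COARSE_SPACING / FINE_DIVISIONS;

// "A" -> 0, "B" -> 1, ..., "Z" -> 25, "AA" -> 26 (spreadsheet-style columns).
function columnIndex(letters) {
  let index = 0;
  for (const ch of letters) index = index * 26 + (ch.charCodeAt(0) - 64);
  return index - 1;
}

// Map a label to the centre of its cell, or return null if the label is invalid.
// Coarse: "C3". Fine: "C3.12" = sub-column 1, sub-row 2 inside cell C3.
function labelToPoint(label) {
  const match = /^([A-Z]+)(\d+)(?:\.([0-3])([0-3]))?$/.exec(label);
  if (!match) return null;
  const col = columnIndex(match[1]);
  const row = Number(match[2]) - 1;
  if (row < 0) return null;
  const originX = col * COARSE_SPACING;
  const originY = row * COARSE_SPACING;
  if (match[3] === undefined) {
    return { x: originX + COARSE_SPACING / 2, y: originY + COARSE_SPACING / 2 };
  }
  return {
    x: originX + Number(match[3]) * FINE_SPACING + FINE_SPACING / 2,
    y: originY + Number(match[4]) * FINE_SPACING + FINE_SPACING / 2,
  };
}
```

Keeping this function in one shared module, used by both the overlay renderer and the main-process automation layer, is what prevents the two from disagreeing about where a label lives on screen.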
Reviewed changes
Copilot reviewed 45 out of 49 changed files in this pull request and generated 22 comments.
| File | Description |
|---|---|
| `.gitignore` | Defines ignores for root Node/Electron project (deps, builds, IDE, Electron out). |
| `.github/PULL_REQUEST_TEMPLATE.md` | Adds detailed PR template aligned with precision grounding + inspect overlay plan. |
| `package.json` | Root Electron app manifest: main entry, start/test scripts (including grid test), electron dependency. |
| `scripts/start.js` | Wrapper to spawn Electron with GUI, clearing ELECTRON_RUN_AS_NODE and wiring stdio/exit. |
| `scripts/test-grid.js` | Node script asserting gridToPixels maps coarse and fine labels to expected coordinates. |
| `src/assets/tray-icon.png` | Tray icon asset for the Electron app. |
| `src/shared/grid-math.js` | Canonical constants and labelToScreenCoordinates implementation for coarse and fine grid labels. |
| `src/renderer/overlay/preload.js` | Exposes overlay-specific IPC API plus shared grid math helpers into the renderer. |
| `src/renderer/overlay/overlay.js` | High-performance canvas overlay: coarse grid rendering, cursor-local fine grid, mode/zoom handling, click snap highlight, and IPC integration. |
| `src/renderer/overlay/index.html` | HTML/CSS shell for overlay canvas, mode/status indicators, interaction region, and border styling. |
| `src/renderer/chat/preload.js` | Chat window IPC bridge for messaging, mode, AI control, visual awareness, safety guardrails, and state access. |
| `src/renderer/chat/index.html` | Full chat UI (titlebar, toolbar, provider/model bar, history, context panel, input, keyboard hints) styled to match VS Code-like theme. |
| `src/renderer/chat/chat.js` | Chat front-end logic: message handling, mode toggling, token estimation, provider/model selection, action confirmation UI, typing indicators, and safety/agentic flows. |
| `src/main/system-automation.js` | Windows-focused automation utilities (mouse, keyboard, scroll, drag, grid-label→pixels) with action sequencing and AI-action parsing. |
| `src/main/visual-awareness.js` | Screen diffing, OCR (Tesseract/Windows OCR), UI Automation-based element detection, and a higher-level analyzeScreen helper. |
| `ELECTRON_README.md` | High-level documentation of the Electron headless agent + overlay architecture and usage. |
| `ARCHITECTURE.md` | Detailed architecture document for main/overlay/chat components, IPC schema, security, performance, and extensibility. |
| `CONFIGURATION.md` | Configuration examples for windows, hotkeys, IPC, styling, agent integration, and platform-specific tweaks. |
| `TESTING.md` | Manual and future automated testing guide for the Electron app (tray, hotkeys, overlay, chat, performance, security). |
| `QUICKSTART.md` | Step-by-step quick start for installing and running the app, modes, shortcuts, and troubleshooting. |
| `PROJECT_STATUS.md` | Status report asserting implementation completeness, metrics, and next steps for the Electron project. |
| `IMPLEMENTATION_SUMMARY.md` | Consolidated summary of what was built, requirements coverage, and key technical decisions. |
| `GPT-reports.md` | GPT-generated workspace report capturing current state, issues, and recommendations. |
| `FINAL_SUMMARY.txt` | High-level, formatted project completion summary for the Electron overlay/agent architecture. |
| `ultimate-ai-system/.gitignore` | Ignores for the ultimate-ai-system workspace (node_modules, dist, logs, context XML, etc.). |
| `ultimate-ai-system/.ai/manifest.json` | AI manifest describing filesystem security, agent profile, verification strategies, and memory paths for the Liku system. |
| `ultimate-ai-system/.ai/logs/.gitkeep` | Placeholder for logs directory. |
| `ultimate-ai-system/.ai/context/.gitkeep` | Placeholder for context directory. |
| `ultimate-ai-system/.ai/instructions/refactor.xml` | Instruction set for defensive refactor workflow (analysis, file_change, verification_cmd tags). |
| `ultimate-ai-system/README.md` | README for “Liku - Ultimate AI System”, including CLI usage and package overview. |
| `ultimate-ai-system/package.json` | Workspace-level package config with turbo/typecheck/build scripts and devDeps. |
| `ultimate-ai-system/pnpm-workspace.yaml` | pnpm workspace definition for liku/* packages. |
| `ultimate-ai-system/turbo.json` | Turbo repo task pipeline for build/test/typecheck/dev in the monorepo. |
| `ultimate-ai-system/tsconfig.base.json` | Shared TS compiler options for the Liku packages (NodeNext, strict, composite, declarations). |
| `ultimate-ai-system/pnpm-lock.yaml` | Locked dependency tree for the ultimate-ai-system workspace (TypeScript, turbo, rimraf, VS Code types, etc.). |
| `ultimate-ai-system/liku/core/package.json` | @liku/core package metadata, build scripts, and devDeps. |
| `ultimate-ai-system/liku/core/tsconfig.json` | TS config for @liku/core (outDir/rootDir, node types). |
| `ultimate-ai-system/liku/core/src/types.ts` | Shared type definitions for stream events, analysis payloads, checkpoints, and provenance. |
| `ultimate-ai-system/liku/core/src/stream-parser.ts` | EventEmitter-based AI stream parser that extracts tags like `<checkpoint>`, `<file_change>`, `<analysis>`, `<hypothesis>` from streamed text. |
| `ultimate-ai-system/liku/core/src/index.ts` | Barrel exports for AIStreamParser and core types. |
| `ultimate-ai-system/liku/cli/package.json` | @liku/cli package manifest, scripts, and dependency on @liku/core. |
| `ultimate-ai-system/liku/cli/tsconfig.json` | TS config for CLI package. |
| `ultimate-ai-system/liku/cli/src/bin.ts` | CLI entry implementing init, checkpoint, status, and parse commands around the Liku project layout and stream parser. |
| `ultimate-ai-system/liku/vscode/package.json` | VS Code extension manifest (commands, activation events, deps). |
| `ultimate-ai-system/liku/vscode/tsconfig.json` | TS config for the extension. |
| `ultimate-ai-system/liku/vscode/src/extension.ts` | VS Code extension activation: finds .ai/manifest.json, creates checkpoints, and shows project status via output channel and status bar. |
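
The stream parser summarized in the table (an EventEmitter that extracts tags from streamed text) can be approximated by the sketch below. It is not the PR's AIStreamParser: the class name `TagStreamParser`, the event payload shape, and the regex-based matching are assumptions. Only the tag names come from the review summary. The point it illustrates is that a tag may arrive split across several chunks, so the parser has to buffer until a complete element is available.

```javascript
const { EventEmitter } = require('node:events');

// Tag names are from the review summary; everything else here is hypothetical.
const TAGS = ['checkpoint', 'file_change', 'analysis', 'hypothesis'];

class TagStreamParser extends EventEmitter {
  constructor() {
    super();
    this.buffer = '';
    // Matches one complete <tag ...>content</tag> element for any known tag.
    this.pattern = new RegExp(`<(${TAGS.join('|')})\\b([^>]*)>([\\s\\S]*?)</\\1>`);
  }

  // Feed one chunk of streamed text; emits an event named after each completed tag.
  write(chunk) {
    this.buffer += chunk;
    let match;
    while ((match = this.pattern.exec(this.buffer)) !== null) {
      this.emit(match[1], { attributes: match[2].trim(), content: match[3] });
      // Drop everything up to the end of the matched element.
      this.buffer = this.buffer.slice(match.index + match[0].length);
    }
  }
}
```

A consumer would subscribe with, for example, `parser.on('checkpoint', handler)` and call `parser.write(chunk)` for each piece of model output. Note that this simple version keeps unmatched text in the buffer indefinitely; a production parser would likely trim it.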
Files not reviewed (1)
- ultimate-ai-system/pnpm-lock.yaml: Language not supported
@copilot open a new pull request to apply changes based on the comments in this thread
TwistedCrafts left a comment
seems like a good start
…and tests Co-authored-by: TayDa64 <[email protected]>
Add Inspect Overlay for precision UI targeting
- Refactor 1351-line monolithic ui-automation.js into modular structure
- No file exceeds 300 lines (largest: window/manager.js at 278 lines)
- Maintain 100% backward compatibility with all 45 baseline tests passing
- Add comprehensive baseline test suite for regression prevention

Structure:
ui-automation/
├── index.js - Main entry point with all exports
├── config.js - CONFIG, CONTROL_TYPES
├── screenshot.js - Screenshot functions
├── core/ - PowerShell execution, helpers
├── elements/ - findElements, waitForElement
├── mouse/ - movement, click, drag, scroll
├── keyboard/ - typeText, sendKeys
├── window/ - getActiveWindow, findWindows, focusWindow
└── interactions/ - click, hover, fillField, waitAndClick
- Add CLI commands directory with click, drag, find, keys, mouse, repl, screenshot, scroll, start, type, wait, window commands
- Add window manager improvements
- Add visual awareness and system automation updates
- Add baseline-app documentation
- Add test scripts for UI automation and element interactions
Precision Grounding + Inspect Overlay (Opus Execution Plan)
Summary
Goals / Non-Goals
Goals
Non-Goals
Problem
Approach
Key Changes (Planned)
- src/shared/grid-math.js: canonical grid constants + label → pixel conversion.
- src/renderer/overlay/overlay.js: local fine-grid render + shared math usage.
- src/renderer/overlay/preload.js: expose grid math to renderer safely.
- src/main/system-automation.js: unify coordinate mapping.
- src/main/ai-service.js: ground prompts + fine label support.
- src/main/index.js: optional inspect toggle + overlay commands.
- src/main/visual-awareness.js: actionable element detection + metadata surface.

Implementation Plan
Phase 1: Grounding & Precision
Phase 2: Inspect Overlay (Devtools‑Style)
Phase 3: AI Grounding + Action Execution
UX Notes
Testing
Unit
Manual
Regression
Risks / Mitigations
screen.getPrimaryDisplay().scaleFactor.
Observability / Debugging
Opus Notes (Websearch Required)
setIgnoreMouseEvents behavior).
Checklist