GitHub - rangehow/ToFu

_{豆腐 — Self-Hosted AI Assistant}

Multi-model chat · Autonomous agents · Project co-pilot · Multi-agent swarm
Daily reports & to-do · Browser extension · Desktop agent · Feishu bot
🔀 CLI backend switching — use Claude Code or Codex as the agent engine

🇨🇳 中文文档

What is Tofu?

Tofu is a fully self-hosted AI assistant built with a Flask backend and vanilla JS frontend. It connects to any OpenAI-compatible LLM API and gives you autonomous tool-calling agents, a project co-pilot for any codebase, multi-agent swarm orchestration, a browser extension, and a desktop agent — all from a single python server.py.

Features

Multi-Model Chat

20+ LLM models — OpenAI, Anthropic, Google Gemini, Qwen, DeepSeek, MiniMax, Doubao, GLM, Mistral, Grok, and any OpenAI-compatible API
Smart dispatch — multi-key, multi-provider routing with real-time latency scoring, error-rate tracking, and per-key rate-limit cooldowns
Streaming responses with per-model cost tracking (input/output/cache tokens × tiered pricing)
Multi-model comparison — send the same prompt to several models side-by-side
Auto-translation — bidirectional Chinese ↔ English translation per conversation

Tool Calling & Agent Mode

Built-in tools — web search (multi-engine: DDG, Brave, Bing, SearXNG in parallel), URL fetching, PDF parsing (text + VLM), image upload & generation, shell commands, Python execution
Project co-pilot — point it at any codebase for file browsing, grep search, code editing, git operations, and AI-powered file indexing
Autonomous task execution — multi-step tool chains with automatic retry, 3-layer context compaction, and configurable model fallback chains
Endpoint mode — Planner → Worker → Critic review loop for long-running tasks with iterative refinement
Scheduled tasks — cron-like recurring or one-shot tasks (shell, Python, LLM prompts)
Image generation — multi-model dispatch across Gemini and GPT image models with automatic 429-retry cycling

Multi-Agent Swarm

Swarm orchestration — MasterOrchestrator plans and delegates to multiple SubAgents running in parallel
Streaming scheduler — reactive task scheduling with no wave barriers, artifact sharing between agents
Review & synthesis — automatic review of agent outputs and final synthesis into a coherent result

Browser Extension

Chrome extension bridging the assistant with your browser for real-time page reading and interaction
Per-client command routing — multiple browsers connect simultaneously with independent command queues
Navigate, screenshot, click, type, extract content

Desktop Agent

Runs on your local machine, connects back to the server
File system operations, clipboard, screenshots, GUI automation (pyautogui), system info
Security-gated with --allow-write / --allow-exec flags

Feishu (Lark) Bot

Full Feishu bot integration via WebSocket — multi-turn LLM conversations with tool support directly in team chat
Slash commands, model/mode switching, conversation management

Daily Reports & To-Do

Click the ☑️ My Day button in the sidebar header to open the daily dashboard — a personal work journal powered by LLM analysis.

Auto-generated work streams — the LLM reads all your conversations for the day, clusters them into 5–15 coherent work streams (e.g. "修复图片回显", "项目部署调试"), and marks each as done, in progress, or blocked
Calendar view — month-at-a-glance calendar with per-day conversation counts and cost heatmap; click any date to view or generate its report
Tomorrow's plan — the LLM synthesizes 3–8 actionable TODO items from unfinished work, each with a detailed prompt and recommended tool configuration
One-click launch — click the ▶ button on any TODO to instantly open a new conversation pre-filled with the task prompt and the right tools enabled (search, code, browser, project, etc.)
To-do carry-forward — uncompleted TODOs automatically carry over to the next day as "今日待办" (Today's Tasks); the LLM tracks which ones you addressed and marks them done
Manual TODOs — add your own to-do items via the ＋ input at the bottom; toggle done/undone, delete, or launch as conversations
Cost tracking — per-day and per-conversation cost breakdown in CNY, calculated from token usage and model pricing
Auto-backfill — a background scheduler automatically generates yesterday's report on server boot and daily at midnight if it's missing
Motivational quotes — a random quote appears at the top of each report ("人生苦短，我用 AI" 🧈)

Scheduled Tasks

Proactive agent scheduler — create cron-like recurring or one-shot tasks (Shell, Python, LLM prompts) via conversation or the scheduler panel
SCHEDULER badge — appears in the top status bar; click to see all active proactive agents and their recent run logs
Enable via the 🕐 Scheduler toggle in the tool submenu

🔀 CLI Backend Switching (NEW)

Switch between Tofu's built-in agent, Claude Code, or OpenAI Codex as the coding agent backend — right from the UI.

Pure frontend mode — when using Claude Code or Codex, Tofu acts as a pure web UI; the external CLI handles all LLM calls, tool execution, and context management with its own authentication
Zero config for external agents — install the CLI, log in once in your terminal, and Tofu auto-detects it
Capabilities-driven UI — the interface automatically adapts: model selector, thinking depth, and Tofu-only features (image gen, browser, swarm…) are hidden when using an external backend
Session persistence — multi-turn conversations are maintained across page refreshes via backend session ID mapping
One-click switching — click the backend selector in the top bar to switch between agents; each conversation remembers its backend

More

Skills system — persistent reusable knowledge (Markdown files) — the assistant learns project conventions, bug patterns, and workflows across sessions
3-layer context compaction — micro-compact → structural truncation → LLM summary for very long conversations
IndexedDB caching — read-through conversation cache with LRU eviction for fast page loads
Error tracking — universal project error tracker with fingerprinting, resolution tracking, and digest reports
Dark theme UI with responsive layout, syntax highlighting, LaTeX rendering, and image previews
Cross-platform — runs on Linux, macOS, and Windows (see Platform Support below)
Mobile-friendly — responsive layout with compact topbar, swipe-open sidebar, and bottom-sheet tool toggles for touch screens
Auto-dependency repair — bootstrap.py auto-installs missing pip packages via LLM diagnosis

Quick Start

Option A: One-Command Install (recommended)

Works on Linux, macOS, and Windows. Requires only Python 3.10+ and Git — no conda, no admin/root.

Linux / macOS:

curl -fsSL https://raw.githubusercontent.com/rangehow/ToFu/main/install.sh | bash

Windows (PowerShell):

irm https://raw.githubusercontent.com/rangehow/ToFu/main/install.ps1 | iex

Or run the cross-platform installer directly (any OS with Python 3.10+):

git clone https://github.com/rangehow/ToFu.git && cd ToFu
python install.py

This automatically creates a virtual environment, installs all dependencies, locates/installs PostgreSQL, and starts the server. Open http://localhost:15000 when it's ready.

With options:

python install.py --api-key sk-xxx --port 8080   # Pre-configure API key
python install.py --no-launch                     # Install only
python install.py --docker                        # Use Docker instead
python install.py --skip-playwright               # Skip browser automation

Option B: Docker (zero dependencies)

git clone https://github.com/rangehow/ToFu.git && cd ToFu
docker compose up -d

Or without cloning (when the image is published):

docker run -d -p 15000:15000 -v tofu-data:/app/data --name tofu ghcr.io/rangehow/tofu:latest

Open http://localhost:15000 — done. All data persists in Docker volumes.

Option C: Manual Install

Step-by-step for full control

Prerequisites: Python 3.10+, PostgreSQL 18+, ripgrep & fd-find (recommended)

git clone https://github.com/rangehow/ToFu.git
cd ToFu

# Create environment (pick one)
python -m venv .venv && source .venv/bin/activate   # Standard venv
# OR: conda create -n tofu python=3.12 -y && conda activate tofu

# Install PostgreSQL (if not already installed)
# macOS:   brew install postgresql@18
# Ubuntu:  sudo apt install postgresql
# Windows: https://www.postgresql.org/download/windows/
# conda:   conda install -c conda-forge postgresql>=18

# Install ripgrep & fd-find (recommended — faster code search & file finding)
# macOS:   brew install ripgrep fd
# Ubuntu:  sudo apt install ripgrep fd-find
# Windows: winget install BurntSushi.ripgrep.MSVC sharkdp.fd
# conda:   conda install -c conda-forge ripgrep fd-find

# Install Python dependencies
pip install -r requirements.txt

# Optional: browser automation for advanced page fetching
pip install playwright && playwright install chromium

# Run
python server.py

Open http://localhost:15000 in your browser — that's it! Configure everything from the Settings UI.

PostgreSQL runs as a local userspace process — no sudo, no system service. On first python server.py, the database auto-bootstraps (initdb, schema creation, port selection).

Automatic Dependency Repair

If any Python package is missing, server.py automatically delegates to bootstrap.py:

Detects the ImportError and hands off to bootstrap.py
Opens a live status page in your browser on the same port
The LLM API diagnoses the traceback and determines which packages to install
Packages are pip installed automatically, with up to 10 retry rounds
Once everything resolves, the real server starts

This uses only Python stdlib — works even when every pip package is missing.

Configuration

All configuration is done through the Settings UI — click the ⚙️ gear icon in the top-right corner of the chat interface. Changes are saved to the server instantly, no restart needed (unless noted).

The Settings panel has 7 tabs, each with a dedicated icon in the left sidebar:

⚙️ General

Core model parameters and global preferences.

Theme — Dark, Light, or Tofu (豆腐) theme
Temperature — controls response randomness (0 = deterministic, 1 = creative)
Max tokens — maximum output token limit
Image max width — auto-compress uploaded images (0 = no compression)
PDF max pages — limit page count when parsing PDFs
Thinking depth — default thinking budget for new conversations (Off / Medium / High / Max)
System prompt — custom instructions prepended to every conversation

🔗 Providers

Multi-provider API management — this is where you add your LLM API keys.

⚡ Add from template — one-click setup for OpenAI, Anthropic, Google Gemini, DeepSeek, Qwen, MiniMax, GLM, Doubao, Mistral, Grok, OpenRouter, Azure, Ollama, and more
Custom provider — add any OpenAI-compatible endpoint with custom base URL
Per-provider settings — each provider has its own API key(s), base URL, and model list
Auto-discover models — fetches available models from the provider's /v1/models endpoint
Multi-key rotation — add multiple API keys per provider for automatic rate-limit rotation

📦 Display

Control what appears in the model selector and image generation picker.

Image generation models — show/hide specific models in the image gen selector
Model dropdown — show/hide models in the main chat model switcher
Fallback model — auto-switch to this model when the primary model fails
Default model — override the default model for new conversations

🔍 Search & Fetch

Web search and content fetching behavior.

LLM content filter — use the model to strip navigation/ads from fetched pages (disable for speed)
Fetch top N — how many search results to auto-fetch (default: 6)
Fetch timeout — per-page timeout in seconds (default: 15)
Max characters — separate limits for search results, direct URL fetch, and PDF files
Max download size — byte limit for fetched content (default: 20 MB)
Blocked domains — domains the fetcher will never visit (one per line)

🌐 Network

Proxy configuration for all outbound requests.

HTTP / HTTPS proxy — proxy URL for LLM API calls, search, and page fetching
Bypass domains — domain suffixes that skip the proxy entirely (one per line, suffix-matched)
💡 Tip: Add your LLM API domains here if your corporate/VPN proxy silently drops SSE long-connections, causing BrokenPipeError

🐦 Feishu (Lark)

Feishu bot integration settings.

Connection status — live indicator showing bot connection state
App ID / App Secret — credentials from open.feishu.cn (restart required after change)
Default project path — project co-pilot root for Feishu conversations
Workspace root — base directory for project switching
Allowed users — restrict bot access to specific Feishu user IDs (blank = allow all)

`</>` Advanced

Pricing and cache management.

Price overrides — customize per-model pricing (USD per million tokens) as JSON
Local cache — view IndexedDB cache stats and clear cached conversations
Server info — server status and version information

Environment Variables (fallback)

For first-time setup, headless servers, or Docker deployments, you can also configure via environment variables. The Settings UI always takes priority — env vars are only used as initial fallback values.

cp .env.example .env

Variable	Description	Example
`LLM_API_KEY`	LLM provider API key (fallback)	`sk-abc123...`
`LLM_BASE_URL`	Chat completions endpoint (fallback)	`https://api.openai.com/v1`
`LLM_MODEL`	Default model (fallback)	`gpt-4o`
`PORT`	Server port	`15000`
`BIND_HOST`	Bind address	`0.0.0.0`
`PROXY_BYPASS_DOMAINS`	Comma-separated proxy bypass domains	`.corp.net,.internal.com`
`FEISHU_APP_ID`	Feishu bot app ID	`cli_xxxx`
`FEISHU_APP_SECRET`	Feishu bot app secret

💡 After first launch, we recommend configuring everything through the Settings UI. It's more intuitive, changes take effect immediately, and supports features like provider templates and model auto-discovery that env vars cannot.

Project Structure

├── server.py                  Flask app entry, middleware, logging
├── bootstrap.py               Auto-dependency repair (LLM-guided)
├── index.html                 Main chat UI (single-page app)
├── .env.example               Environment variable template
│
├── lib/                       Core libraries
│   ├── agent_backends/        Multi-backend agent switching (builtin/CC/Codex)
│   ├── llm_client.py          LLM API client (streaming, retry)
│   ├── llm_dispatch/          Multi-key multi-model dispatcher
│   ├── database.py            PostgreSQL (auto-bootstrap)
│   ├── tasks_pkg/             Task orchestration & compaction
│   │   ├── orchestrator.py    Main LLM ↔ tool loop
│   │   ├── executor.py        Tool execution engine
│   │   ├── endpoint.py        Planner → Worker → Critic loop
│   │   └── compaction.py      3-layer context compaction
│   ├── tools/                 Tool definitions & schemas
│   ├── swarm/                 Multi-agent orchestration
│   ├── fetch/                 Content fetching & extraction
│   ├── search/                Multi-engine web search
│   ├── browser/               Browser extension bridge
│   ├── project_mod/           Project co-pilot (scan, edit, undo)
│   ├── skills/                Skill accumulation system
│   ├── feishu/                Feishu bot integration
│   └── ...
│
├── routes/                    Flask Blueprints (21 modules)
├── static/                    CSS, JS, icons
├── browser_extension/         Chrome extension (MV3)
├── tests/                     Test suite (unit, API, E2E)
└── data/                      Runtime data (git-ignored)

Advanced Usage

CLI Backend Switching — Claude Code / Codex

Tofu can act as a pure web frontend for external coding agents. Instead of using Tofu's built-in orchestrator, you can delegate to Claude Code or OpenAI Codex — they handle LLM calls, tool execution, and context management with their own authentication.

Install Claude Code

# Install via npm
npm install -g @anthropic-ai/claude-code

# Log in (one-time)
claude auth login
# Follow the browser prompt to authenticate with your Claude account

# Verify
claude --version

Install Codex

# Install via npm
npm install -g @openai/codex

# Log in (one-time) — requires OpenAI API key or ChatGPT Plus subscription
codex auth login

# Verify
codex --version

Use in Tofu

Start Tofu: python server.py
Click the backend selector (🤖) in the top bar
Available backends show a ✅ badge; unavailable ones show ❌
Select Claude Code or Codex — the UI automatically adapts:
- Model selector, thinking depth, and search toggle are hidden (the CLI handles these)
- Tofu-only features (image gen, browser extension, swarm, scheduler) are greyed out
Send a message — Tofu spawns the CLI subprocess, streams its output, and renders it in the chat UI

Feature availability by backend

Feature	Built-in (Tofu)	Claude Code	Codex
Chat & streaming	✅	✅	✅
Web search	✅	✅ (CC's)	✅ (Codex's)
File operations	✅	✅ (CC's)	✅ (Codex's)
Code execution	✅	✅ (Bash)	✅ (exec)
Model selection	✅	— (CC decides)	— (Codex decides)
Image generation	✅	❌	❌
Browser extension	✅	❌	❌
Multi-agent swarm	✅	❌	❌
Desktop agent	✅	❌	❌

Note: The CLI must be installed on the same machine as the Tofu server. Tofu spawns the agent as a subprocess.

Project Co-Pilot

Click Project in the sidebar, enter the path to any codebase. The assistant can browse files, search code, edit files, run commands, and track modifications with per-round undo.

Multi-Agent Swarm

For complex tasks, the assistant automatically plans sub-tasks and delegates to specialist agents running in parallel. Results are reviewed and synthesized into a coherent output.

Browser Extension

chrome://extensions → Enable Developer Mode
Load unpacked → select browser_extension/
Click the extension icon → enter your server URL
The assistant can now read and interact with your browser tabs

Desktop Agent

pip install pyautogui pillow psutil
python lib/desktop_agent.py --server http://your-server:15000 --allow-write --allow-exec

Feishu Bot

Create an app at open.feishu.cn, enable Bot capability
Open Settings → 🐦 Feishu tab → enter App ID and App Secret
The bot auto-connects on server restart

Scheduled Tasks

Ask the assistant to "create a scheduled task" or "set up a daily cron job" — it will create a proactive agent that runs on your specified schedule. Manage all tasks from the SCHEDULER badge in the status bar.

Healthcheck

python healthcheck.py

Platform Support

Tofu runs on Linux, macOS, and Windows. All platform-specific code is isolated in lib/compat.py.

Feature	Linux	macOS	Windows
Core chat & tools	✅	✅	✅
PostgreSQL auto-bootstrap	✅	✅	✅ (PG bin/ must be in PATH)
Project co-pilot (file tools)	✅	✅	✅
`run_command` (basic)	✅	✅	✅ (uses `cmd.exe`)
`run_command` interactive stdin	✅ (via `/proc`)	❌ (non-interactive)	❌ (non-interactive)
FUSE keepalive daemon	✅ (DolphinFS)	— (not needed)	— (not needed)
Desktop agent	✅	✅	✅
Browser extension	✅	✅	✅
Dangerous command blocking	✅ (Unix + Windows patterns)	✅	✅

Smoke test: python debug/test_cross_platform.py validates the compat layer on any platform.

Testing

# All tests
python tests/run_all.py

# Individual suites
python -m pytest tests/test_backend_unit.py
python -m pytest tests/test_api_integration.py
python -m pytest tests/test_visual_e2e.py
python -m pytest tests/test_db_bug_regressions.py

Security

No secrets in source — all credentials loaded from environment variables or Settings UI
Single-user mode — no multi-tenant auth; deploy behind a VPN or reverse proxy
Tool execution — the assistant can run shell commands and edit files; use with caution
Desktop agent — requires explicit --allow-write / --allow-exec flags

Contributing

See CONTRIBUTING.md for the full guide. Quick version:

Fork → feature branch
python healthcheck.py && python tests/run_all.py
Submit a pull request

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github		.github
browser_extension		browser_extension
data		data
debug		debug
docs		docs
lib		lib
routes		routes
static		static
tests		tests
uploads/images		uploads/images
.coverage		.coverage
.dockerignore		.dockerignore
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
README_CN.md		README_CN.md
VERSION		VERSION
bootstrap.py		bootstrap.py
docker-compose.yml		docker-compose.yml
healthcheck.py		healthcheck.py
index.html		index.html
install.ps1		install.ps1
install.py		install.py
install.sh		install.sh
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
ruff.toml		ruff.toml
server.py		server.py
trading.html		trading.html

Folders and files

Latest commit

History

Repository files navigation

What is Tofu?

Features

Multi-Model Chat

Tool Calling & Agent Mode

Multi-Agent Swarm

Browser Extension

Desktop Agent

Feishu (Lark) Bot

Daily Reports & To-Do

Scheduled Tasks

🔀 CLI Backend Switching (NEW)

More

Quick Start

Option A: One-Command Install (recommended)

Option B: Docker (zero dependencies)

Option C: Manual Install

Automatic Dependency Repair

Configuration

⚙️ General

🔗 Providers

📦 Display

🔍 Search & Fetch

🌐 Network

🐦 Feishu (Lark)

</> Advanced

Environment Variables (fallback)

Project Structure

Advanced Usage

CLI Backend Switching — Claude Code / Codex

Install Claude Code

Install Codex

Use in Tofu

Feature availability by backend

Project Co-Pilot

Multi-Agent Swarm

Browser Extension

Desktop Agent

Feishu Bot

Scheduled Tasks

Healthcheck

Platform Support

Testing

Security

Contributing

License

About

Resources

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`</>` Advanced

Packages