PHANTOM

Local-first document intelligence for private data pipelines.
Classify, sanitize, index, and query unstructured data without sending it to a cloud service by default.

Quickstart · Architecture · Capabilities · Runtime · Development · Topology

Phantom turns messy folders of documents, logs, configs, and code into structured intelligence: searchable chunks, sensitivity findings, sanitized exports, RAG-ready vector indexes, and audit-friendly processing reports.

It is built for operators who care about data boundaries. The default path is local inference through llama.cpp, local vector search with FAISS/BM25, and a reproducible Nix development environment. Cloud providers can be added through the provider abstraction, but the core workflow does not require them.

Data stays local. Search gets smarter. Operators keep control.

Why Phantom

Need	Phantom gives you
Keep private data local	Local-first processing, `llama.cpp` provider, no cloud dependency by default
Understand large document sets	CORTEX chunking, embeddings, insight extraction, and RAG chat
Prepare data safely	DAG classification, PII detection, pseudonymization, sanitization, quarantine
Search beyond keywords	Hybrid retrieval with FAISS dense search plus BM25 sparse search
Operate like a real system	Typer CLI, FastAPI service, Prometheus metrics, Nix, Docker, tests
Give users a GUI	Cortex Desktop, a Tauri 2 + SvelteKit client for the local API

Architecture Tour

flowchart LR
    raw[Raw files] --> dag[DAG pipeline<br/>classify + sanitize]
    dag --> cortex[CORTEX<br/>chunk + extract]
    cortex --> vectors[FAISS + BM25<br/>hybrid retrieval]
    vectors --> rag[RAG chat<br/>streaming API]

    cli[phantom CLI] --> dag
    api[FastAPI service] --> dag
    desktop[Cortex Desktop] --> api
    providers[LLM providers<br/>llama.cpp first] --> cortex
    events[NATS hooks] --> api

phantom/
├── src/phantom/        # Python runtime: CLI, FastAPI, CORTEX, RAG, DAG, providers
├── cortex-desktop/     # Tauri 2 + SvelteKit desktop client
├── intelagent/         # Rust agent and quality-gate primitives
├── spectre/            # Companion signal/pattern extraction scaffold
├── nix/ + flake.nix    # Reproducible development, packages, and checks
├── docs/               # Architecture, guides, deployment notes, history
├── arch/               # Generated architecture reports
├── tests/              # Unit, integration, and e2e tests
└── .archive/           # Historical experiments and dead-code snapshots

For the canonical topology map, see docs/architecture/project_topology.rst.

Quickstart

Phantom is happiest inside its pinned Nix shell.

git clone https://github.com/VoidNxSEC/phantom
cd phantom

nix develop
just test
just serve

Then check the API:

curl http://localhost:8008/health
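If you script the quickstart (for example in CI), a short poll loop avoids racing the server started by `just serve`. This is a minimal standard-library sketch. It assumes the API listens on port 8008 as above and that `/health` answers HTTP 200 once the service is up. `wait_for_health` is a hypothetical helper, not part of Phantom.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str = "http://localhost:8008/health",
                    attempts: int = 10, delay: float = 0.5) -> bool:
    """Hypothetical helper: poll a health endpoint until it returns HTTP 200.

    Assumes /health answers 200 when the API is ready. Returns False if the
    server never answers successfully within `attempts` tries.
    """
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet: connection refused, timeout, or an HTTP error status
        if attempt < attempts - 1:
            # Exponential backoff, capped so a long wait stays bounded.
            time.sleep(min(delay * 2 ** attempt, 5.0))
    return False
```

Call `wait_for_health()` after starting the server and only continue once it returns `True`.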

Run the desktop client:

just desktop

Use the CLI directly:

phantom scan ./documents
phantom classify ./documents --dry-run
phantom rag ingest ./docs --collection local
phantom rag query "What are the main compliance risks?" --collection local

Core Capabilities

CORTEX Document Engine

CORTEX splits large inputs into semantic chunks, embeds them locally, and extracts structured insights through an LLM provider.

Document -> SemanticChunker -> EmbeddingGenerator -> LLM Provider -> Pydantic schema

It is designed for long documents, bounded context windows, and GPU-aware local inference.
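CORTEX's actual `SemanticChunker` is not shown in this README. The sketch below only illustrates the general idea behind bounded-context chunking: pack whole sentences into chunks under a size budget, and carry a little overlap forward so context is not cut mid-thought. Word counts stand in for tokens. `chunk_text`, `max_words`, and `overlap` are hypothetical names.

```python
import re


def chunk_text(text: str, max_words: int = 200, overlap: int = 1) -> list[str]:
    """Hypothetical greedy sentence-packing chunker (not CORTEX's implementation).

    Splits on sentence boundaries, packs sentences into chunks of at most
    `max_words` words (a crude stand-in for a tokenizer), and repeats the
    last `overlap` sentences at the start of the next chunk. A single
    sentence longer than `max_words` becomes its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            # Guard needed because current[-0:] would return the whole list.
            carry = current[-overlap:] if overlap else []
            carry_count = sum(len(s.split()) for s in carry)
            # Drop the overlap if it would leave no room for the new sentence.
            if carry_count + n > max_words:
                carry, carry_count = [], 0
            current, count = carry, carry_count
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Overlap trades some duplicated embedding work for chunks that each carry the sentence leading into them, which tends to help retrieval when an answer spans a chunk boundary.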

Hybrid Vector Search

Phantom combines dense semantic search with sparse keyword retrieval.

Query -> FAISS cosine search ----+
                                 +-> Reciprocal Rank Fusion -> ranked results
Query -> BM25 keyword search ----+
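Reciprocal Rank Fusion merges the dense and sparse result lists using only ranks, so the very different score scales of cosine similarity and BM25 never need to be normalized against each other. A minimal sketch of the standard formula follows. The constant `k=60` is the value commonly used in the RRF literature and is an assumption here, since Phantom's actual setting is not documented in this README.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. k=60 is an assumed default; larger k flattens
    the advantage of the top few positions.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["a", "b", "c"]   # e.g. FAISS cosine order
sparse = ["c", "a", "d"]  # e.g. BM25 order
fused = reciprocal_rank_fusion([dense, sparse])
```

Documents that appear in both lists (`a` and `c` here) rise above documents found by only one retriever, which is the point of hybrid search.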

Index and search through HTTP:

curl -X POST http://localhost:8008/vectors/index \
  -F "file=@docs/architecture/CORTEX_V2_ARCHITECTURE.md"

curl -X POST http://localhost:8008/vectors/search \
  -H "Content-Type: application/json" \
  -d '{"query": "semantic chunking tradeoffs", "top_k": 5, "mode": "hybrid"}'
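The same search call from Python, using only the standard library. The request body mirrors the curl example exactly. The response schema is not documented here, so the sketch returns the decoded JSON as-is. `build_search_request` and `search` are hypothetical helper names.

```python
import json
import urllib.request

API = "http://localhost:8008"  # default port from the Quickstart


def build_search_request(query: str, top_k: int = 5, mode: str = "hybrid") -> urllib.request.Request:
    """Hypothetical helper: build the POST /vectors/search request from the curl example."""
    body = json.dumps({"query": query, "top_k": top_k, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{API}/vectors/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, **kwargs) -> object:
    """Hypothetical helper: send the request and decode the JSON response (shape unspecified)."""
    with urllib.request.urlopen(build_search_request(query, **kwargs), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `search("semantic chunking tradeoffs", top_k=5)` issues the same request as the curl command above.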

Streaming RAG Chat

curl -N -X POST http://localhost:8008/api/chat/stream \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Summarize the indexed architecture decisions.",
    "conversation_id": "demo",
    "history": [],
    "context_size": 5
  }'
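This README does not specify the stream's wire format. The sketch below assumes Server-Sent Events (`data: ...` lines), a common choice for streaming FastAPI responses. It passes any other non-blank line through unchanged, so it also degrades gracefully for plain line-delimited output. `iter_stream_payloads` and `stream_chat` are hypothetical names.

```python
import json
import urllib.request
from collections.abc import Iterable, Iterator


def iter_stream_payloads(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield payloads from a line-oriented stream.

    Assumption: the endpoint emits Server-Sent Events. For "data:" lines the
    prefix and one optional space are removed; SSE comment lines (":...")
    and blank separator lines are skipped; other lines pass through as-is.
    """
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if not line or line.startswith(":"):
            continue
        if line.startswith("data:"):
            line = line[5:]
            if line.startswith(" "):
                line = line[1:]
        yield line


def stream_chat(message: str, conversation_id: str = "demo") -> Iterator[str]:
    """Hypothetical client for POST /api/chat/stream using the curl example's fields."""
    body = json.dumps({
        "message": message,
        "conversation_id": conversation_id,
        "history": [],
        "context_size": 5,
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:8008/api/chat/stream",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        # The response object iterates line by line, so payloads arrive as they stream.
        yield from iter_stream_payloads(resp)
```

Iterate over `stream_chat("Summarize the indexed architecture decisions.")` and print each payload with `flush=True` to watch the answer arrive incrementally.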

Sanitization and Chain of Custody

The DAG pipeline classifies files, detects sensitive patterns, optionally sanitizes content, records fingerprints, and isolates suspicious outputs.

Stage	Purpose
Discover	Walk input trees and prepare file records
Fingerprint	Capture SHA256, BLAKE3, xxHash, size, and timestamps
Classify	Detect document, code, data, config, log, crypto, media, and unknown files
Detect	Find PII, secrets, keys, tokens, network indicators, and identifiers
Sanitize	Strip metadata, redact PII, or perform full sanitization
Persist	Write audit records, reports, outputs, and quarantine entries
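The stage table describes what the pipeline does, not how it does it. The sketch below walks one file through Fingerprint, Classify, Detect, and Sanitize using only the standard library. It is an illustration under stated assumptions, not Phantom's implementation: the regexes, the extension map, and all function names are hypothetical. It hashes with SHA-256 only, because BLAKE3 and xxHash require third-party packages.

```python
import hashlib
import re
from pathlib import Path

# Hypothetical, illustrative patterns; real PII/secret detection needs far broader coverage.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

# Hypothetical extension map covering a few of the README's file categories.
CATEGORIES = {".md": "document", ".txt": "document", ".py": "code",
              ".csv": "data", ".yaml": "config", ".log": "log", ".pem": "crypto"}


def fingerprint(path: Path) -> dict:
    """SHA-256 of the contents (streamed in 64 KiB blocks) plus size and mtime."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 16), b""):
            digest.update(block)
    stat = path.stat()
    return {"sha256": digest.hexdigest(), "size": stat.st_size, "mtime": stat.st_mtime}


def classify(path: Path) -> str:
    return CATEGORIES.get(path.suffix.lower(), "unknown")


def detect(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs in pattern order."""
    return [(name, m.group()) for name, rx in PATTERNS.items() for m in rx.finditer(text)]


def redact(text: str) -> str:
    for name, rx in PATTERNS.items():
        text = rx.sub(f"[REDACTED:{name.upper()}]", text)
    return text


def process(path: Path) -> dict:
    """Fingerprint -> Classify -> Detect -> Sanitize for a single file."""
    text = path.read_text(encoding="utf-8", errors="replace")
    findings = detect(text)
    return {
        "path": str(path),
        "fingerprint": fingerprint(path),
        "category": classify(path),
        "findings": findings,
        "sanitized": redact(text) if findings else text,
    }
```

Fingerprinting the original bytes before any sanitization is what lets an audit record tie a redacted output back to the exact input it came from.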

Runtime Surfaces

Surface	Entry point	What it owns
CLI	`phantom`	Extraction, analysis, classification, scans, RAG, tools, API startup
API	`phantom-api` / `just serve`	Health, metrics, upload, process, vector, chat, pipeline, judge endpoints
Desktop	`just desktop`	Tauri/Svelte GUI for local workflows
Nix	`nix develop`, `nix build`, `nix flake check`	Reproducible shell, packages, and checks
Docker	`Dockerfile`	OCI fallback for non-Nix environments
IntelAgent	`intelagent/`	Rust agent abstractions and quality-gate primitives

API Snapshot

The FastAPI server exposes OpenAPI docs at /docs when running.

Area	Endpoints
Health	`GET /health`, `GET /ready`, `GET /metrics`, `GET /api/system/metrics`
Documents	`POST /extract`, `POST /process`, `POST /upload`, `POST /api/upload`
Vectors	`POST /vectors/index`, `POST /vectors/batch-index`, `POST /vectors/search`
Chat	`POST /api/chat`, `POST /api/chat/stream`, `GET /api/models`, `POST /api/prompt/test`
Pipeline	`POST /api/pipeline`, `POST /api/pipeline/scan`
Integrations	`GET /rag/query`, `POST /judge`
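To check which of these endpoints a running build actually exposes, you can read the machine-readable schema behind `/docs` directly. This sketch assumes FastAPI's default schema URL, `/openapi.json`, has not been overridden. `list_operations` and `fetch_spec` are hypothetical helpers.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def list_operations(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI document."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold "parameters", "summary", etc.; keep only HTTP verbs.
            if key.lower() in HTTP_METHODS:
                ops.append((key.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))


def fetch_spec(base: str = "http://localhost:8008") -> dict:
    """Hypothetical helper; assumes FastAPI's default openapi_url of /openapi.json."""
    with urllib.request.urlopen(f"{base}/openapi.json", timeout=10) as resp:
        return json.load(resp)
```

With the server running, `list_operations(fetch_spec())` can be diffed against the table above to spot endpoints that were added or removed.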

Development

nix develop          # enter the pinned shell
just                 # list available recipes
just lint            # ruff + mypy
just fmt             # ruff format
just test            # pytest
just test-cov        # pytest with coverage report
just ci              # lint + tests
just check           # nix flake checks
just stats           # project statistics

Useful focused commands:

just test-file tests/unit/test_vector_store.py
just test-match "rag"
just ruff-fix
just audit

Documentation

Document	Purpose
Project Topology	Canonical map of live code, docs, generated reports, and archive areas
CORTEX Architecture	Chunking, embeddings, vector storage, retrieval, and VRAM notes
Roadmap	Shipped, active, and planned work
Deployment	Deployment notes for production surfaces
Desktop Setup	Cortex Desktop development setup
Security Policy	Vulnerability reporting and security process

Current Status

Component	Status
Python CLI and core package	Live
FastAPI service and metrics	Live
CORTEX chunking and extraction	Live
FAISS/BM25 retrieval	Live
DAG classification and sanitization	Live
Cortex Desktop	Beta
IntelAgent Rust workspace	Scaffolded
Cloud LLM providers	Planned
Redis semantic cache	Planned
Helm/Kubernetes packaging	Planned

Roadmap

Near-term work:

Finish desktop sub-components and frontend test infrastructure.
Add a system metrics dashboard tab wired to /api/system/metrics.
Implement markdown/code rendering in chat.
Add Redis or in-memory semantic caching for repeated embeddings and queries.
Expand provider implementations beyond the current llama.cpp path.

Longer-term work:

Standalone Linux/macOS binaries.
Docker/OCI hardening.
NixOS module for system-level deployment.
Distributed and multi-node processing.
IntelAgent advanced governance, memory, quality, MCP, and ZK features.

Security

Phantom is designed for sensitive local workloads, but it is still alpha-stage software. Treat it as an operator tool, review outputs before production use, and keep test datasets separate from regulated production data until your own controls are in place.

Found a vulnerability? See SECURITY.md.

License

Apache 2.0. See LICENSE.

Contributing

Read CONTRIBUTING.md before opening a PR. For architecture changes or significant API modifications, open an issue with the proposed design and the affected runtime surfaces.
