Catch phishing URLs before they catch you.
Heuristic phishing URL analyzer for SOC/DFIR workflows. Offline core — no API keys, never fetches the analyzed URL. Optional --osint flag adds DNS, RDAP, and crt.sh CT-log enrichment.
See the full documentation for every command, flag, and output mode.
Built-in guide: barb manual (and barb manual analyzers / osint / pipeline / config / output / examples).
- 12 heuristic analyzers: entropy, homoglyph, TLD, subdomain, brand impersonation, URL shortener, encoding abuse, IP-based URLs, typosquat, keyword, lexical, file extension
- 5-tier verdict: SAFE / LOW_RISK / SUSPICIOUS / HIGH_RISK / PHISHING with severity-floor escalation
- Zero API keys required for core analysis — offline, no external calls
- Opt-in `--osint` enrichment: DNS resolution + RDAP registration lookups + crt.sh CT-log queries + ASN lookup (stdlib only, no API key); never fetches the analyzed URL
- Allowlist false-positive suppression: ~71 known-good domains suppress noisy domain-based signals; path/query signals still fire
- OSINT result cache: SQLite cache at `~/.barb/cache.db` (default TTL 6 h); bypass with `--no-cache`
- Output formats: Rich tables, console, JSON, NDJSON, CSV, STIX 2.1
- `--explain` flag: template-based explanation by default, optional LLM (Anthropic Claude, OpenAI, or local Ollama)
- `--version` flag: report the installed version (`barb --version` or `barb version`)
- Offline eval harness (`eval/`): measures precision/recall/F1 against a labeled URL corpus; wired into CI as a detection-quality regression gate
- Batch processing: analyze URL lists from files, stdin, or multiple arguments
- Automation-ready: exit codes (0=safe, 1=suspicious, 2=phishing, 3=error), `--threshold` filtering
- IOC defanging: automatic in terminal output (`hxxps[://]evil[.]com`); accepts defanged IOCs on input (`hxxp://`, `[.]`, `[dot]`, `[at]`, fullwidth, zero-width), which are refanged before analysis
- Configurable scoring: per-analyzer weights and verdict thresholds via YAML
- Minimal dependencies: 5 core packages (typer, rich, pydantic, pyyaml, python-dotenv)
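
The defang/refang round trip mentioned in the feature list can be illustrated with a minimal Python sketch. This is not barb's implementation: the function names `refang` and `defang` are hypothetical, the set of zero-width characters is an assumption, and only the notations named above (`hxxp://`, `[.]`, `[dot]`, `[at]`, fullwidth, zero-width) are handled.

```python
import re
import unicodedata

# Assumed set of zero-width characters; barb's actual list may differ.
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def refang(ioc: str) -> str:
    """Turn a defanged indicator back into a parseable string (illustrative)."""
    # NFKC folds fullwidth forms (e.g. U+FF0E) to ASCII; then drop zero-width chars.
    text = unicodedata.normalize("NFKC", ioc).translate(_ZERO_WIDTH)
    text = re.sub(r"^hxxp", "http", text, flags=re.IGNORECASE)
    text = text.replace("[://]", "://")
    text = re.sub(r"\[(?:\.|dot)\]", ".", text, flags=re.IGNORECASE)
    text = re.sub(r"\[at\]", "@", text, flags=re.IGNORECASE)
    return text


def defang(url: str) -> str:
    """Defang scheme, separator, and host dots; the path is left untouched (assumption)."""
    scheme, sep, rest = url.partition("://")
    if not sep:  # no scheme present: treat the whole string as host/path
        scheme, rest = "", url
    host, slash, tail = rest.partition("/")
    safe_scheme = re.sub(r"^http", "hxxp", scheme, flags=re.IGNORECASE)
    prefix = f"{safe_scheme}[://]" if sep else ""
    return f"{prefix}{host.replace('.', '[.]')}{slash}{tail}"


print(refang("hxxp://evil[.]com/login"))
print(defang("https://evil.com"))
```

NFKC normalization is a blunt tool: it also folds other compatibility characters, so a production refanger would likely target specific code points instead.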
From PyPI:

    pip install barb-phish

With LLM support (optional):

    pip install barb-phish[llm]

From source:

    git clone https://github.com/duathron/barb.git
    cd barb
    pip install -e ".[dev]"

Analyze a single URL:

    barb analyze https://suspicious-site.tk/paypal-login

Paste a defanged IOC directly from a threat report:

    barb analyze 'hxxp://evil[.]com/login'

Batch analysis from file:

    barb analyze -f urls.txt -o json

With explanation:

    barb analyze https://pаypal.com --explain

With OSINT enrichment (DNS + RDAP, opt-in):

    barb analyze https://suspicious-site.tk/paypal-login --osint

Force fresh OSINT lookups, bypass cache:

    barb analyze https://suspicious-site.tk/paypal-login --osint --no-cache

Pipe from stdin:

    cat urls.txt | barb analyze -o csv

Refresh the allowlist from Tranco (opt-in):

    barb update-data

    barb update-data [--top-n N] [--source URL] [--quiet]

Downloads the Tranco top-1M list over HTTPS and writes
the top --top-n domains (default: 5000) to ~/.barb/data/allowlist.json.
The bundled curated list is never overwritten — it is always merged in.

| Flag | Default | Description |
|---|---|---|
| `--top-n` | `5000` | Number of Tranco domains to include |
| `--source` | `https://tranco-list.eu/top-1m.csv.zip` | HTTPS source URL (non-https rejected) |
| `--quiet` | off | Suppress progress messages |

Key guarantees:
- Opt-in only: `barb analyze` never triggers a download. Only `barb update-data` does.
- Never automatic: no background refresh, no scheduled task.
- HTTPS only: non-`https://` source URLs are rejected immediately (no network call made).
- Bundled list is the default: a user who never runs `update-data` sees the bundled curated list, with zero change in detection behavior.
- User-override location: writes to `~/.barb/data/allowlist.json` (`0o600`, directory `0o700`), never to the package data directory.
- Atomic write: temp file + `os.replace`; no partial writes visible.
- No new dependencies: stdlib `urllib` only.
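
The HTTPS-only check and the atomic, owner-only write described in these guarantees can be sketched with the standard library alone. This is an illustration under stated assumptions, not barb's code: `require_https` and `write_allowlist` are hypothetical names, and storing the allowlist as a sorted JSON array is an assumed file layout.

```python
import json
import os
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def require_https(source: str) -> str:
    """Reject a non-HTTPS source URL before any network call is attempted."""
    if urlparse(source).scheme.lower() != "https":
        raise ValueError(f"refusing non-HTTPS source: {source!r}")
    return source


def write_allowlist(path: Path, domains: list) -> None:
    """Write the allowlist via temp file + os.replace so readers never see a partial file."""
    # The mode applies to the leaf directory only and is subject to the umask.
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # mkstemp creates the file owner-read/write only (0o600 on POSIX) in the
    # target directory, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(sorted(set(domains)), handle)
        os.replace(tmp_name, path)  # atomic swap into place
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # clean up the temp file on failure
        raise
```

Creating the temp file in the destination directory matters: `os.replace` is only atomic when source and target live on the same filesystem.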
Tradeoff notice: Running `update-data` EXPANDS false-positive suppression. More domains will be treated as known-good after the update, which may reduce phishing signals for less-known but legitimate domains.
╭──────────────────────── barb ────────────────────────╮
│ URL hxxp[://]192[.]168[.]1[.]1/paypal-login │
│ Verdict ⚠ SUSPICIOUS │
│ Score 4.0 │
╰──────────────────────────────────────────────────────╯
Severity Analyzer Finding
HIGH ip_url URL uses IP address instead of domain
LOW subdomain Domain has 4 levels

    barb analyze http://evil.tk/login -o json

    {
      "url": "http://evil.tk/login",
      "verdict": "SUSPICIOUS",
      "risk_score": 4.0,
      "signals": [
        {"analyzer": "tld", "severity": "MEDIUM", "detail": "Suspicious TLD: .tk"}
      ]
    }

One compact JSON object per line, suitable for streaming pipelines and log aggregators.

    barb analyze http://evil.tk/login -o ndjson

Emits a STIX bundle with indicator objects for SUSPICIOUS / HIGH_RISK / PHISHING verdicts (deterministic IDs, confidence mapped from verdict).

    barb analyze http://evil.tk/login -o stix

| Analyzer | What it detects | Example |
|---|---|---|
| Entropy | High Shannon entropy in domain/path | x7k2m9p.evil.com |
| Homoglyph | Unicode confusables + mixed-script labels (Latin+Cyrillic); pure non-ASCII IDN emits a LOW informational signal | pаypal.com (Cyrillic 'а') |
| TLD | Suspicious top-level domains | paypal-login.tk |
| Subdomain | Excessive depth / squatting patterns | secure.paypal.com.evil.com |
| Brand | Brand name in non-brand domain | paypal-secure.evil.com |
| Shortener | Known URL shortener services | bit.ly/abc123 |
| Encoding | Percent-encoding / punycode abuse | %70%61%79pal.com |
| IP URL | IP address instead of domain; @-obfuscation on a domain host → CRITICAL | http://192.168.1.1/login, paypal.com@evil.com |
| Typosquat | ASCII brand lookalikes via Levenshtein 1–2 + digit↔letter swaps; skips official brand domains | paypa1.com, g00gle.com |
| Keyword | Phishing keywords in path/query (login, verify, secure, webscr, bank, …); one aggregated LOW signal | /login/verify-account |
| Lexical | URL length, hyphen count, digit ratio; LOW signals | my-secure-bank-update-2024.com |
| File Ext | Suspicious file extensions in the URL path; double-extension masquerade → HIGH, single executable/script → LOW, archive → INFO | invoice.pdf.exe, setup.ps1 |
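
Two of the measures named in the table, Shannon entropy (Entropy analyzer) and Levenshtein distance (Typosquat analyzer), are standard and easy to sketch. The functions below are textbook versions written for illustration; barb's own thresholds, normalization, and brand list are not shown here and are not implied by this code.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# A random-looking label scores higher than a dictionary-like one.
print(shannon_entropy("x7k2m9p") > shannon_entropy("paypal"))  # True
print(levenshtein("paypa1", "paypal"))  # 1
print(levenshtein("g00gle", "google"))  # 2
```

Both table examples fall inside the "Levenshtein 1–2" window: `paypa1` is one substitution from `paypal`, and `g00gle` is two from `google`.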
Opt-in, off by default, fail-open. Queries infrastructure metadata about the domain — never fetches the analyzed URL.
| Enricher | What it checks | Signals |
|---|---|---|
| DNS | Resolves the host via `socket.getaddrinfo` (stdlib, timeout 2 s) | HIGH on loopback/sinkhole IP; MEDIUM on private IP or NXDOMAIN |
| RDAP | IANA RDAP bootstrap, `urllib` (stdlib, no API key, timeout 5 s) | HIGH if domain <30 days old; MEDIUM if <90 days; LOW if registrant privacy/redacted |
| crt.sh | Certificate-transparency log query via crt.sh (Sectigo), `urllib` (stdlib, no API key, timeout 8 s); sends only the hostname | MEDIUM if newest cert <7 days old; LOW if <30 days; INFO if no CT records found |
| ASN | Resolves the host to an IP, then queries Team Cymru WHOIS (`whois.cymru.com`, port 43) for the hosting ASN; stdlib `socket`, no API key, timeout 3 s; sends only the resolved IP | INFO: AS number, name, country, and BGP prefix for analyst pivoting; no score impact |

Results are cached per host in ~/.barb/cache.db (SQLite, TTL 6 h). Use --no-cache to force fresh lookups.
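
The age-based rules in the RDAP and crt.sh rows reduce to simple threshold checks. The sketch below encodes only the day thresholds from the table; the function names are hypothetical, and how barb parses RDAP or crt.sh responses into timestamps is not covered.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def domain_age_severity(registered_at: datetime, now: datetime) -> Optional[str]:
    """RDAP rule from the table: <30 days is HIGH, <90 days is MEDIUM."""
    age = now - registered_at
    if age < timedelta(days=30):
        return "HIGH"
    if age < timedelta(days=90):
        return "MEDIUM"
    return None  # old enough: no age signal


def newest_cert_severity(newest_cert_at: Optional[datetime], now: datetime) -> Optional[str]:
    """crt.sh rule from the table: no records is INFO, <7 days MEDIUM, <30 days LOW."""
    if newest_cert_at is None:
        return "INFO"
    age = now - newest_cert_at
    if age < timedelta(days=7):
        return "MEDIUM"
    if age < timedelta(days=30):
        return "LOW"
    return None


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
print(domain_age_severity(now - timedelta(days=12), now))   # HIGH
print(newest_cert_severity(now - timedelta(days=20), now))  # LOW
```

Passing `now` explicitly keeps the functions deterministic, which makes the thresholds easy to unit-test.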
Evaluated against a labeled corpus of 800 URLs — 300 phishing (OpenPhish feed) + 500 benign (Tranco top-500) — built with eval/fetch_corpus.py and scored with eval/run_eval.py. Alert tier: verdict ≥ SUSPICIOUS counts as a positive.
| Metric | v1.4.1 (offline, snapshot 2026-06-01) |
|---|---|
| Precision | 1.00 — zero false positives on 500 benign URLs |
| Recall | 0.07 — 22 of 300 phishing URLs caught |
| False-positive rate | 0.00 — 0 of 500 benign URLs flagged |
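
The figures in this table follow directly from the confusion-matrix counts implied by the corpus description (22 of 300 phishing URLs flagged, 0 of 500 benign URLs flagged). The helper below is a generic sketch of that arithmetic; `detection_metrics` is a hypothetical name and is not part of `eval/run_eval.py`.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1, and false-positive rate from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Counts derived from the table: 22 caught, 278 missed, 0 false alarms, 500 clean passes.
metrics = detection_metrics(tp=22, fp=0, fn=278, tn=500)
print({name: round(value, 2) for name, value in metrics.items()})
```

With these counts, recall rounds to 0.07 and precision is exactly 1.00, matching the table; the F1 that follows from the same counts is about 0.14.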
Important: barb is a high-precision URL-structure pre-filter, not a standalone catch-all. Trust a positive: when barb flags SUSPICIOUS or higher, it is reliable. Low recall is by design: barb analyzes URL structure only and never fetches the URL, so phishing on clean domains (github.io, pages.dev, plain .com) is an inherent limit of URL-only heuristics. That recall gap is the downstream pipeline's job: feed barb's JSON into vex (reputation/VirusTotal) and sift (correlation), and use `--osint` for fresh-domain signals (RDAP age, crt.sh recency) that the offline core misses.
The repo also includes a CI regression gate using a synthetic fixture (precision 1.00 / recall 0.76). That fixture is not a field measurement — it exists to catch score-regression between releases.
Reproduce the corpus numbers yourself:

    python -m eval.fetch_corpus
    python -m eval.run_eval --corpus eval/corpus/real.csv

Create `~/.barb/config.yaml`:

    scoring:
      weights:
        entropy: 1.0
        homoglyph: 1.5
        brand: 1.2
        typosquat: 1.3
        keyword: 0.6
        lexical: 0.5
      thresholds:
        suspicious: 4
        phishing: 13
    explain:
      provider: "template"  # template | anthropic | openai | ollama
      send_url: true        # send defanged URL to LLM
      # ollama_host: "http://localhost:11434"  # local Ollama server (ollama provider only)
    output:
      default_format: "rich"
      quiet: false
    osint:
      dns_timeout: 2      # seconds per DNS lookup
      rdap_timeout: 5     # seconds per RDAP request
      crtsh_timeout: 8    # seconds per crt.sh request
      asn_timeout: 3      # seconds per ASN (Team Cymru) lookup
      cache_ttl_hours: 6  # SQLite cache TTL (~/.barb/cache.db)

Environment variable: Set `BARB_LLM_KEY` for cloud LLM API key (Anthropic / OpenAI).

Set `provider: ollama` to use a locally running Ollama server.
No API key required; all requests go to your machine.

    explain:
      provider: "ollama"
      model: "llama3.1"                      # any model pulled with `ollama pull <model>`
      ollama_host: "http://localhost:11434"  # default; change for remote/custom port
      send_url: false                        # maximum privacy: omit URL from prompt

If Ollama is unreachable when `--explain` is used, barb automatically falls back to the template explainer and prints a note to stderr; the command always completes.

| Feature | barb | VirusTotal URL Scan | URLScan.io | PhishTank |
|---|---|---|---|---|
| Offline analysis | Core offline; opt-in `--osint` for DNS/RDAP | No | No | No |
| API key required | No | Yes | Yes | Optional |
| Heuristic detection | 12 analyzers | Signature-based | Browser-based | Community |
| CLI tool | Yes | Web/API | Web/API | Web/API |
| LLM explanation | Optional | No | No | No |
| Self-hosted | Yes | No | No | No |
Use barb for offline heuristic URL triage. Use vex for VirusTotal IOC enrichment. Pipe barb JSON output into vex for full enrichment (v1.1).

| Code | Meaning |
|---|---|
| `0` | SAFE or LOW_RISK |
| `1` | SUSPICIOUS or HIGH_RISK |
| `2` | PHISHING |
| `3` | Error (invalid input, missing file) |

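
For wrapper scripts that consume NDJSON output, the exit-code table can be mirrored in a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, each NDJSON record is assumed to carry the same `verdict` field as the JSON example, and treating a malformed line as code 3 is a design choice of this example rather than documented barb behavior.

```python
import json

# Mapping taken from the exit-code table above.
EXIT_CODE_BY_VERDICT = {
    "SAFE": 0,
    "LOW_RISK": 0,
    "SUSPICIOUS": 1,
    "HIGH_RISK": 1,
    "PHISHING": 2,
}


def exit_code_for(verdict: str) -> int:
    """Return the exit code for a verdict; unknown verdicts map to 3 (error)."""
    return EXIT_CODE_BY_VERDICT.get(verdict, 3)


def worst_exit_code(ndjson_text: str) -> int:
    """Summarize a batch: the highest exit code across all NDJSON records."""
    codes = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        try:
            codes.append(exit_code_for(json.loads(line)["verdict"]))
        except (json.JSONDecodeError, KeyError, TypeError):
            codes.append(3)  # malformed record counts as an error
    return max(codes, default=0)


batch = (
    '{"url": "http://evil.tk/login", "verdict": "SUSPICIOUS"}\n'
    '{"url": "https://example.com/", "verdict": "SAFE"}\n'
)
print(worst_exit_code(batch))  # 1
```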
    git clone https://github.com/duathron/barb.git
    cd barb
    python -m venv .venv && source .venv/bin/activate
    pip install -e ".[dev]"
    pytest tests/ -v

- No HTTP requests are ever made to analyzed URLs; this holds unconditionally, including when `--osint` is enabled
- The offline core is pure string-based heuristics with no external calls
- The optional `--osint` flag performs DNS resolution and RDAP lookups about the domain (infrastructure metadata only); it never fetches the URL itself
- URL length capped at 2048 characters
- Config directory secured with 0o700 permissions
- LLM and OSINT dependencies are optional extras — core install has zero network deps
The offline core makes zero outbound connections. When you opt into `--osint`, barb makes the kinds of request listed below, never to the analyzed host itself:

| Connection | Endpoint | What it reveals | Notes |
|---|---|---|---|
| DNS resolution | Your system resolver (`/etc/resolv.conf`: ISP/router/corporate DNS, port 53) | The domain being looked up | Same lookup any browser would do |
| RDAP bootstrap | `https://data.iana.org/rdap/dns.json` | That you use barb/RDAP | Fetched at most once per 7 days (cached at `~/.barb/rdap_bootstrap.json`) |
| RDAP query | The TLD's registry RDAP server (e.g. `rdap.verisign.com` for .com, `rdap.pir.org` for .org) | The domain being investigated | No API key; stdlib `urllib` only |
| crt.sh CT query | `https://crt.sh/` (Sectigo) | The domain being investigated | Reveals domain-of-interest to Sectigo; no API key; stdlib `urllib` only |
| ASN lookup | `whois.cymru.com` port 43 (Team Cymru) | The resolved IP of the domain | Sends only the IP, not the URL or hostname; stdlib `socket` only; no API key |

- The suspect host is never contacted — no HTTP GET/HEAD to the URL, no DNS beacon to attacker-controlled infrastructure beyond normal name resolution.
- No credentials are ever transmitted.
- OSINT results are cached per host in `~/.barb/cache.db` (default TTL 6 h), so repeat lookups make no network calls; `--no-cache` forces fresh requests.
- All OSINT calls are fail-open: a timeout or error simply drops the enrichment signals and analysis continues offline.
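
The per-host cache with a 6-hour TTL can be pictured as a single SQLite table keyed by hostname. The class below is a minimal sketch of that idea, not barb's schema: `TtlCache`, the `osint` table, and its column names are all hypothetical, and it uses an in-memory database by default so it can be run without touching `~/.barb/cache.db`.

```python
import json
import sqlite3
import time
from typing import Optional


class TtlCache:
    """Per-host result cache with a time-to-live (hypothetical schema)."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 6 * 3600):
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS osint ("
            "host TEXT PRIMARY KEY, payload TEXT NOT NULL, stored_at REAL NOT NULL)"
        )

    def put(self, host: str, payload: dict, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO osint VALUES (?, ?, ?)",
                (host, json.dumps(payload), now),
            )

    def get(self, host: str, now: Optional[float] = None) -> Optional[dict]:
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT payload, stored_at FROM osint WHERE host = ?", (host,)
        ).fetchone()
        if row is None or now - row[1] >= self.ttl:
            return None  # miss or expired: caller performs a fresh lookup
        return json.loads(row[0])


cache = TtlCache()
cache.put("evil.tk", {"dns": "NXDOMAIN"}, now=0.0)
print(cache.get("evil.tk", now=3600.0))      # {'dns': 'NXDOMAIN'}
print(cache.get("evil.tk", now=6 * 3600.0))  # None
```

In this sketch, a `--no-cache` style bypass would simply skip the `get` call and go straight to the network lookup.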
MIT License. See LICENSE.md.
Author: Christian Huhn
