Merged
54 changes: 54 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,34 @@ breaking changes and the discipline is still being shaped.

## [Unreleased]

**Added.**

- *A `--preopen <dir>[:ro|:rw]` flag for the experimental `--wasi` mode
unblocks DYNAMIC (non-literal) `Fs` paths.* Until now an `Fs` path that
the compiler cannot prove is a string literal (one taken from a
parameter, `env.args()`, or any computed value) was REJECTED at compile
time under `--wasi`, because no static preopen ceiling could be derived
for it. `--preopen` lets the OPERATOR explicitly declare filesystem
authority over a single directory; the compiler then admits the dynamic
path and the guest resolves it AT RUNTIME relative to that directory
(the WASI `--dir` model, as in wasmtime). This is framed honestly as a
LEVEL-2 operator-DECLARED grant (analogous to `inherit_env`), NOT
program-proven authority: the compiler could not derive it, which is
precisely why the operator had to declare it. The grant is recorded in
the SBOM (manifest, CycloneDX, SPDX) under a dedicated
`operator_declared_grants` block, clearly labelled `operator-declared`
and kept DISTINCT from the compiler-derived capability surface so a
regulator never reads it as program-proven. Read / write / exists /
is_dir / mkdir / list_dir all work with a dynamic path under
`--preopen`, with byte-for-byte parity across the Python, `capa:host`
and WASI backends, and the guest-side fine attenuation (`restrict_to` /
`allows`) still gates the dynamic path lexically. WITHOUT `--preopen`,
a dynamic `Fs` path continues to be rejected at compile time exactly as
before (no regression); literal paths continue to resolve via the
compiler-derived ceiling. This increment supports a SINGLE `--preopen`
for dynamic-path resolution; passing more than one is rejected with a
clear message.
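
The admission rule this entry describes can be summarised as a small decision function. The sketch below is a hypothetical model, not Capa's compiler code: `Admission`, `classify_program` and the per-path `is_literal` flags are illustrative names. It folds in the b1 limits stated in this changelog: only a single `--preopen` is accepted, and a program that mixes literal and dynamic `Fs` paths still fails closed.

```python
from enum import Enum
from typing import Sequence


class Admission(Enum):
    # Hypothetical labels for the outcomes described in the changelog.
    DERIVED_CEILING = "literal paths only: compiler-derived preopen ceiling"
    OPERATOR_PREOPEN = "dynamic paths: resolved at runtime under --preopen"
    REJECTED = "rejected at compile time"


def classify_program(path_is_literal: Sequence[bool],
                     preopens: Sequence[str]) -> Admission:
    """Illustrative model of the --wasi Fs admission rule for one program."""
    if len(preopens) > 1:
        # b1 accepts a single --preopen for dynamic-path resolution.
        raise ValueError(
            f"--preopen: only a single --preopen is supported; got {len(preopens)}"
        )
    has_dynamic = not all(path_is_literal)
    has_literal = any(path_is_literal)
    if not has_dynamic:
        # Unchanged behaviour: every Fs path is a literal (or there are none).
        return Admission.DERIVED_CEILING
    if not preopens:
        # No operator grant: a dynamic path is rejected exactly as before.
        return Admission.REJECTED
    if has_literal:
        # b1 does not yet support mixing literal and dynamic paths.
        return Admission.REJECTED
    return Admission.OPERATOR_PREOPEN
```

Modelling the rule per program rather than per path is deliberate: whether a dynamic path is admitted depends on the other `Fs` paths in the same program, not only on that path.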

**Changed.**

- *In the experimental `--wasi` mode, a dynamic (non-literal) URL passed
@@ -30,6 +58,32 @@ breaking changes and the discipline is still being shaped.
stays at Level 2 `inherit_env` on a dynamic key and is intentionally not
aligned with this fail-closed rule).

**Fixed.**

- *In the experimental `--wasi` mode, the guest-side fine attenuation gate
(`restrict_to` / `allows`) now lexically normalises `.` and `..` path
segments before its containment check, closing a bypass on a dynamic
path.* Previously the gate did a PURELY lexical prefix comparison: a
dynamic path such as `sub/../secret.txt` (reachable since `--preopen`
began admitting dynamic `Fs` paths) starts lexically with the allowed
prefix `sub/`, so it PASSED the gate and read a sibling OUTSIDE the
`restrict_to("sub")` subtree, while the Python oracle (which
canonicalises with `os.path.realpath`) correctly DENIED it. The gate now
normalises `.`/`..` in both the path and the stored prefixes first
(`$__fs_normalize`, an `os.path.normpath`-style collapse that preserves
a leading `..` so an escape stays an escape), restoring byte-for-byte
three-backend parity (Python oracle == `capa:host` == WASI): `sub/ok.txt`
is admitted, `sub/../secret.txt` and `sub/../sub2/x.txt` are denied, and
`sub/../sub/ok.txt` (which normalises back inside) is admitted. SYMLINKS
are still not resolved by the lexical gate -- that remains the documented
Level-2 loss, now the ONLY divergence from the realpath oracle (`.`/`..`
are handled). The Level-1 preopen ceiling (enforced by wasmtime) is
unchanged and still confines an unrestricted `Fs` to the granted
directory regardless of `..`. A program that MIXES a literal `Fs` path
and a dynamic one under `--preopen` still fails closed (layer b1 does not
yet support mixing), now with a clear message that names the limitation
and the flag instead of the internal "no closed preopen ceiling" error.
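
The fixed check can be sketched in Python. This is a minimal model under stated assumptions, not the guest's WAT: `fs_normalize`, `allows` and `naive_allows` are hypothetical names, paths are assumed to be relative and `/`-separated, and prefixes are assumed to name subdirectories (as in `restrict_to("sub")`). Like the real gate, it is purely lexical, so symlinks are not resolved.

```python
from typing import List, Sequence


def naive_allows(path: str, prefixes: Sequence[str]) -> bool:
    # Pre-fix behaviour: a purely lexical prefix test with no normalisation,
    # so "sub/../secret.txt" passes the prefix "sub/".
    return any(path.startswith(pfx) for pfx in prefixes)


def fs_normalize(path: str) -> str:
    """Lexically collapse '.' and '..' (os.path.normpath-style) for a
    relative, '/'-separated path, keeping any leading '..' so an escape
    stays an escape."""
    parts: List[str] = []
    for seg in path.split("/"):
        if seg in ("", "."):
            continue  # empty segments (from '//') and '.' are no-ops
        if seg == "..":
            if parts and parts[-1] != "..":
                parts.pop()  # '..' cancels the previous real segment
            else:
                parts.append("..")  # nothing left to cancel: keep the escape
        else:
            parts.append(seg)
    return "/".join(parts) or "."


def allows(path: str, prefixes: Sequence[str]) -> bool:
    """Illustrative restrict_to/allows gate: normalise BOTH the path and the
    stored prefixes first, then compare whole path components."""
    p = fs_normalize(path)
    for prefix in prefixes:
        q = fs_normalize(prefix)
        if p == q or p.startswith(q + "/"):
            return True
    return False
```

With the prefix `sub`, this sketch admits `sub/ok.txt` and `sub/../sub/ok.txt` and denies `sub/../secret.txt` and `sub/../sub2/x.txt`, matching the cases listed above. Comparing whole components (`q + "/"`) is this sketch's choice; it also keeps `sub2/...` from matching the prefix `sub`.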

## [1.14.0], 2026-06-29

**Capa 1.14.0.** A MINOR release: an experimental, opt-in `--wasi` mode
120 changes: 120 additions & 0 deletions capa/cli.py
@@ -24,6 +24,7 @@
from capa import __version__ as _CAPA_VERSION
from capa.manifest import (
build_manifest, build_cyclonedx, build_spdx,
build_operator_declared_grants,
build_vex_document, build_provenance,
resolve_build_timestamp, SourceDateEpochError,
)
@@ -1062,6 +1063,23 @@ def _main_dispatch() -> int:
"The default capa:host path is unaffected."
),
)
parser.add_argument(
"--preopen",
action="append",
default=None,
metavar="<dir>[:ro|:rw]",
help=(
"with --wasi, grant the component filesystem authority over "
"<dir> as an OPERATOR-DECLARED preopen (Level 2, the WASI "
"--dir model), unblocking DYNAMIC (non-literal) Fs paths that "
"the compiler cannot derive a preopen for. The path is "
"resolved at runtime relative to <dir>. Append ':ro' for "
"read-only or ':rw' for read-write (default: rw). Recorded in "
"the SBOM as a declared grant, distinct from the "
"compiler-derived capability surface. This increment (b1) "
"supports a SINGLE --preopen for dynamic paths."
),
)
parser.add_argument(
"--wasm-memory-cap",
type=int,
@@ -1276,11 +1294,18 @@ def _main_dispatch() -> int:
else:
print(msg, file=sys.stderr)
return 1
# WASI Fs layer b1: the operator-declared grant block (--preopen),
# surfaced in the manifest / CycloneDX / SPDX as Level-2
# operator-declared authority, distinct from the derived surface.
_operator_grants = _operator_grants_from_args(
getattr(args, "preopen", None)
)
if args.manifest:
import json
manifest = build_manifest(
module, filename=filename,
expr_labels=result.expr_labels,
operator_declared_grants=_operator_grants,
)
emit_artifact(json.dumps(manifest, indent=2))
return 0
@@ -1308,6 +1333,7 @@ def _main_dispatch() -> int:
sources=linked.sources if linked is not None else None,
timestamp=build_ts,
expr_labels=result.expr_labels,
operator_declared_grants=_operator_grants,
)
emit_artifact(json.dumps(sbom, indent=2))
return 0
@@ -1318,6 +1344,7 @@ def _main_dispatch() -> int:
sources=linked.sources if linked is not None else None,
timestamp=build_ts,
expr_labels=result.expr_labels,
operator_declared_grants=_operator_grants,
)
emit_artifact(json.dumps(sbom, indent=2))
return 0
@@ -1415,6 +1442,31 @@ def _main_dispatch() -> int:
print(msg, file=sys.stderr)
return 1

# ``--preopen`` (layer b1) is meaningful in --wasi mode (the
# operator-declared filesystem grant that unblocks dynamic Fs paths)
# AND when emitting an SBOM / manifest (it records the same grant as
# operator-declared authority, distinct from the derived surface).
# Reject it on any OTHER invocation with an actionable message rather
# than silently ignore it.
_emitting_sbom = bool(
getattr(args, "manifest", False) or getattr(args, "cyclonedx", False)
or getattr(args, "spdx", False)
)
if (getattr(args, "preopen", None)
and not bool(getattr(args, "wasi", False))
and not _emitting_sbom):
msg = (
"capa: --preopen requires --wasi (or an SBOM / --manifest "
"command): it is the operator-declared filesystem grant for "
"the WASI mode, recorded in the SBOM; it has no effect on the "
"default execution backend"
)
if use_color:
print(f"{C.RED}{msg}{C.RESET}", file=sys.stderr)
else:
print(msg, file=sys.stderr)
return 1

if (
args.run and not args.wasm and prefer_wasm
and _wasm_tooling_available()
@@ -1463,6 +1515,29 @@ def _main_dispatch() -> int:
else:
print(msg, file=sys.stderr)
return 1
# WASI Fs layer b1: parse the operator ``--preopen``. b1 supports a
# SINGLE preopen for dynamic-path resolution; reject more than one
# with a clear message rather than silently picking one. The
# presence of a preopen is the signal (``wasi_dynamic_fs``) that
# suppresses the compiler's dynamic-Fs-path rejection, and the
# parsed ``(host_dir, read_write)`` is the host grant.
fs_operator_preopen = None
wasi_dynamic_fs = False
preopen_specs = getattr(args, "preopen", None) or []
if preopen_specs:
if len(preopen_specs) > 1:
msg = (
"capa: --preopen: this increment (b1) supports a "
"single --preopen for dynamic Fs paths; got "
f"{len(preopen_specs)}"
)
if use_color:
print(f"{C.RED}{msg}{C.RESET}", file=sys.stderr)
else:
print(msg, file=sys.stderr)
return 1
fs_operator_preopen = _parse_preopen_spec(preopen_specs[0])
wasi_dynamic_fs = True
if result is None:
result = analyze(module, source=source, filename=filename)
try:
@@ -1472,6 +1547,7 @@ def _main_dispatch() -> int:
memory_cap_pages=wasm_memory_cap,
filename=filename,
wasi=wasi_mode,
wasi_dynamic_fs=wasi_dynamic_fs,
)
print(wat)
return 0
@@ -1480,6 +1556,7 @@ def _main_dispatch() -> int:
memory_cap_pages=wasm_memory_cap,
filename=filename,
wasi=wasi_mode,
wasi_dynamic_fs=wasi_dynamic_fs,
)
except Exception as e:
msg = f"capa: --wasm: {e}"
@@ -1589,6 +1666,7 @@ def _main_dispatch() -> int:
wasi=wasi_mode,
env_ceiling=env_ceiling,
fs_ceiling=fs_ceiling,
fs_operator_preopen=fs_operator_preopen,
net_ceiling=net_ceiling,
)
host.run_main(component_blob)
@@ -1763,6 +1841,48 @@ def _main_dispatch() -> int:
return 0


def _parse_preopen_spec(spec: str) -> tuple[str, bool]:
"""Parse one ``--preopen`` value ``<dir>[:ro|:rw]`` into
``(host_dir, read_write)``.

The default permission is READ_WRITE (``rw``), the WASI ``--dir``
default; an explicit ``:ro`` suffix makes it READ_ONLY and ``:rw`` is
READ_WRITE. Only a trailing ``:ro`` / ``:rw`` is treated as a
permission suffix, so a directory name that itself contains a colon
(or a Windows drive ``C:\\...``) is preserved -- the split is on the
LAST ``:`` and only when the tail is exactly ``ro`` / ``rw``."""
read_write = True
host_dir = spec
if ":" in spec:
head, _, tail = spec.rpartition(":")
if tail in ("ro", "rw") and head:
host_dir = head
read_write = tail == "rw"
return (host_dir, read_write)


def _operator_grants_from_args(preopen_specs) -> dict | None:
"""Build the SBOM ``operator_declared_grants`` block from the
``--preopen`` specs, or None when none were declared.

Each spec ``<dir>[:ro|:rw]`` becomes a preopen entry; the block is
honestly labelled operator-declared (Level 2) by
:func:`capa.manifest.build_operator_declared_grants`, distinct from
the compiler-derived surface."""
specs = preopen_specs or []
if not specs:
return None
preopens = []
for spec in specs:
host_dir, read_write = _parse_preopen_spec(spec)
preopens.append({
"kind": "fs",
"host_dir": host_dir,
"permission": "rw" if read_write else "ro",
})
return build_operator_declared_grants(preopens)


def _wrap_as_component(
core_wasm: bytes, wit_text: str, *, wasi: bool = False,
) -> bytes:
6 changes: 6 additions & 0 deletions capa/ir/__init__.py
@@ -152,6 +152,7 @@ def emit_wat(
memory_cap_pages: int | None = ..., # type: ignore[assignment]
manifest_json: str | None = None,
wasi: bool = False,
wasi_dynamic_fs: bool = False,
) -> str:
"""Emit WebAssembly text format (WAT) from a CIR module.

@@ -178,6 +179,7 @@ def emit_wat(
memory_cap_pages=memory_cap_pages,
manifest_json=manifest_json,
wasi=wasi,
wasi_dynamic_fs=wasi_dynamic_fs,
).emit(ir_module)


@@ -189,6 +191,7 @@ def compile_wat(
filename: str = "<input>",
embed_manifest: bool = True,
wasi: bool = False,
wasi_dynamic_fs: bool = False,
) -> str:
"""End-to-end AST -> CIR -> WAT convenience helper. Mirrors
:func:`compile` but targets the Wasm Component Model text form
@@ -244,6 +247,7 @@ def compile_wat(
memory_cap_pages=memory_cap_pages,
manifest_json=manifest_json,
wasi=wasi,
wasi_dynamic_fs=wasi_dynamic_fs,
)


@@ -329,6 +333,7 @@ def compile_wasm(
filename: str = "<input>",
embed_manifest: bool = True,
wasi: bool = False,
wasi_dynamic_fs: bool = False,
) -> bytes:
"""End-to-end AST -> CIR -> WAT -> binary Wasm assembly.

@@ -350,6 +355,7 @@ def compile_wasm(
filename=filename,
embed_manifest=embed_manifest,
wasi=wasi,
wasi_dynamic_fs=wasi_dynamic_fs,
)
proc = subprocess.run(
[wasm_tools_path, "parse", "-"],
46 changes: 46 additions & 0 deletions capa/ir/_emit_wasm/__init__.py
@@ -201,6 +201,7 @@ def __init__(
memory_cap_pages: Optional[int] = MEMORY_CAP_DEFAULT_PAGES,
manifest_json: Optional[str] = None,
wasi: bool = False,
wasi_dynamic_fs: bool = False,
):
# Experimental opt-in (2026-06-27): when True, Random.system_seed
# and Clock.now_secs / now_monotonic import canonical WASI
@@ -211,6 +212,29 @@ def __init__(
# untouched all-``capa:host`` behaviour. See
# ``docs/design/wasi_mode.md``.
self._wasi: bool = wasi
# WASI Fs layer b1 (operator preopen, 2026-06-30): True when the
# operator declared ``--preopen <dir>`` for this run, granting the
# component filesystem authority over that directory and so
# UNBLOCKING dynamic (non-literal) Fs paths under ``--wasi``. A
# dynamic path is resolved at RUNTIME relative to the single
# operator preopen (the WASI ``--dir`` model, wasmtime's
# convention), framed honestly as a LEVEL-2 operator-DECLARED
# grant (see ``docs/design/wasi-attenuation.md``), distinct from
# the COMPILER-DERIVED preopen ceiling. When False (the default),
# a dynamic Fs path is REJECTED at compile time exactly as before
# -- this flag is the ONLY thing that suppresses that rejection.
#
# b1 INDEX RULE (emitter <-> host agreement): the operator preopen
# is the LAST preopen the host registers, AFTER every
# compiler-derived ceiling preopen, so it never shifts an existing
# literal call site's index. In the dynamic case the derived
# ceiling is NOT closed and so contributes NO preopens, leaving
# the operator preopen at index 0; the dynamic call-site emitter
# therefore addresses it with the constant
# ``_wasi_operator_preopen_index`` (0 whenever the ceiling is open,
# i.e. exactly the dynamic case). The host computes the same index
# (len(derived preopens)) so the two never disagree.
self._wasi_dynamic_fs: bool = wasi_dynamic_fs
self._lines: List[str] = []
self._indent = 0
self._unit = indent_unit
@@ -311,6 +335,27 @@ def __init__(
# chain's result areas), 0 when Net.get is not used.
self._wasi_net_scratch_offset = 0

# ----- WASI operator-preopen (layer b1) ----------------------

def _wasi_operator_preopen_index(self) -> int:
"""The preopen INDEX the operator ``--preopen`` directory occupies
on the host, for the dynamic-Fs-path call-site emitter to address.

b1 index rule: the host registers the operator preopen AFTER every
compiler-derived ceiling preopen, so its index is the number of
derived preopens. A dynamic Fs path (the only thing that reaches
the operator preopen) requires a NOT-CLOSED ceiling, which
contributes NO derived preopens, so this is 0 in the dynamic case.
For a fully-literal program (closed ceiling) the operator preopen
sits at ``len(ceiling.preopens)`` and is unused by the guest (no
dynamic call site), but still registered + recorded for honesty;
the constant returned here matches the host's registration order
either way."""
ceiling = self._fs_ceiling
if ceiling is None or not getattr(ceiling, "closed", False):
return 0
return len(ceiling.preopens)

# ----- public ------------------------------------------------

def emit(self, module: Module) -> str:
@@ -1111,6 +1156,7 @@ def emit(self, module: Module) -> str:
or self._wasi_env_uses_get_or_args()
or self._wasi_net_uses_attenuation()
or self._wasi_fs_uses_preopens
or self._wasi_fs_uses_attenuation()
or (self._wasi and ("Stdio", "read_line") in self._used_caps)
):
heap_start = _align_up(self._string_data_offset, 8)
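
The b1 index rule, described in the `__init__` comment and in `_wasi_operator_preopen_index`, only works if the emitter and the host agree on where the operator preopen lands. The sketch below models that agreement in plain Python under stated assumptions. `FsCeiling` and `host_preopens` are hypothetical stand-ins, because the real ceiling type and the host registration code are not part of this diff. Following the comment, a ceiling that is not closed is assumed to contribute no preopens.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FsCeiling:
    """Hypothetical stand-in for the compiler-derived Fs preopen ceiling."""
    closed: bool
    preopens: List[str] = field(default_factory=list)


def operator_preopen_index(ceiling: Optional[FsCeiling]) -> int:
    # Same rule as _wasi_operator_preopen_index: 0 unless the ceiling is
    # closed, otherwise the number of derived preopens.
    if ceiling is None or not ceiling.closed:
        return 0
    return len(ceiling.preopens)


def host_preopens(ceiling: Optional[FsCeiling],
                  operator_dir: Optional[str]) -> List[str]:
    """Hypothetical host registration order: derived preopens first, then
    the operator preopen LAST so it never shifts a literal call site."""
    registered: List[str] = []
    if ceiling is not None and ceiling.closed:
        # Only a closed ceiling contributes derived preopens.
        registered.extend(ceiling.preopens)
    if operator_dir is not None:
        registered.append(operator_dir)
    return registered
```

For any ceiling, indexing the host's list with the emitter's constant lands on the operator directory. In the dynamic case that index is 0, and for a closed ceiling it is `len(ceiling.preopens)`.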