Record the source URI of imported images at /.enroot/source #267

Open
alec-flowers wants to merge 2 commits into NVIDIA:main from
alec-flowers:feat/record-import-source
@alec-flowers alec-flowers commented Apr 17, 2026

Summary

enroot import docker://... and enroot load now drop a tiny provenance file inside the imported image at /.enroot/source:

uri=docker://nvcr.io#nvidia/pytorch:25.06-py3

This answers the recurring question "where did this .sqsh come from?" without adding a new command, runtime behavior, or registry lookup.

Motivation

For reproducibility work around benchmark pipelines, we want to be able to look at a squashfs on disk and know which registry image produced it. enroot digest helps before import, but once an image has been imported there is no link back to its source. External sidecar metadata works only until the file gets renamed, moved, or handed off.

This puts the source URI inside the image itself, so it travels with the .sqsh.

Design notes

  • URI only: the file records only uri=.... It intentionally does not fetch or store the manifest digest.
  • No extra network call: registry imports continue using the existing manifest/download flow unchanged.
  • Existing parsing: registry URIs are formatted from the values already returned by docker::_parse_uri, which handles Docker Hub shorthand, docker://REGISTRY#IMAGE, tag refs, digest refs, and USER@ credentials.
  • Credential-free: because the recorded registry URI is reconstructed from parsed registry, image, and tag, USER@ is not persisted.
  • Daemon imports: dockerd:// and podman:// imports record the original daemon URI, since there is no registry URI to canonicalize.
  • Path /.enroot/source: uses enroot's existing /.enroot namespace. enroot export already strips /.enroot/, which avoids carrying stale provenance after a rootfs is modified and re-exported.
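
As an illustration of the "credential-free" point above, here is a minimal sketch of rebuilding the recorded line from already-parsed components. The function name format_source_uri is hypothetical and is not the code in this PR; joining digest refs with '@' is also an assumption. Because the USER@ part is never passed in, it cannot end up in the output.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the PR's actual code: rebuild a credential-free
# provenance line from registry, image, and tag-or-digest.
format_source_uri() {
    local registry="$1"
    local image="$2"
    local ref="$3"
    local sep=":"

    # Assumption: digest references are joined with '@' instead of ':'.
    case "${ref}" in
        sha256:*) sep="@" ;;
    esac

    printf 'uri=docker://%s#%s%s%s\n' "${registry}" "${image}" "${sep}" "${ref}"
}

# Docker Hub shorthand (docker://busybox:latest) after parsing:
format_source_uri "registry-1.docker.io" "library/busybox" "latest"
# prints: uri=docker://registry-1.docker.io#library/busybox:latest
```

The real implementation takes these values from docker::_parse_uri; the sketch only shows why reconstructing the URI from parsed fields drops credentials by construction.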

Diff size

Net diff against main: 32 insertions / 2 deletions across src/docker.sh and doc/image-format.md.

Test plan

  • bash -n src/docker.sh

  • bash -n over src/*.sh, conf/hooks/*.sh, and enroot.in

  • git diff --check

  • Local helper tests for canonical URI formatting across Docker Hub shorthand, explicit registries, USER@, enroot # syntax, and digest refs

  • Stubbed local smoke test for docker::import and docker::load wiring that verifies 0/.enroot/source exists with the expected canonical URI before the final image/load step

  • Isolated source build with make install prefix=/tmp/... exec_prefix=/tmp/...

  • End-to-end enroot import -o busybox.sqsh docker://busybox:latest, then unsquashfs -cat busybox.sqsh .enroot/source:

    uri=docker://registry-1.docker.io#library/busybox:latest
  • End-to-end enroot load -n busybox-load-e2e docker://busybox:latest, then cat busybox-load-e2e/.enroot/source:

    uri=docker://registry-1.docker.io#library/busybox:latest
  • Runtime visibility check with enroot start busybox-load-e2e cat /.enroot/source:

    uri=docker://registry-1.docker.io#library/busybox:latest
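
For tooling that consumes the file checked in the test plan above, the following sketch extracts the value of the uri= line. The helper name read_source_uri is hypothetical, and the demo uses a stand-in temporary file rather than a real image; in practice the input would come from unsquashfs -cat or from the unpacked rootfs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the uri= line in a provenance file.
read_source_uri() {
    sed -n 's/^uri=//p' "$1"
}

# Demo against a stand-in file with the same content the PR records.
tmp="$(mktemp)"
printf 'uri=docker://registry-1.docker.io#library/busybox:latest\n' > "${tmp}"
read_source_uri "${tmp}"
# prints: docker://registry-1.docker.io#library/busybox:latest
rm -f "${tmp}"
```

Matching on the uri= prefix rather than reading the whole file keeps the reader tolerant of additional keys, should the format ever grow beyond a single line.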

enroot import docker://... and enroot load now write a small provenance
file inside the image rootfs recording the URI and manifest digest. The
URI is captured as provided to enroot, with any USER@ credential
component stripped. dockerd:// and podman:// imports record the URI
only (no registry digest available).

The file can be read with unsquashfs -cat image.sqsh .enroot/source
or, once the image is unpacked, from inside a running container.

No new CLI, no runtime.sh changes: enroot export already strips
/.enroot/ which is correct behavior here, since a rootfs modified and
re-exported is no longer the image at the original URI.

Signed-off-by: Alec Flowers <aflowers@nvidia.com>

alec-flowers commented Apr 17, 2026

Superseded by the lighter URI-only revision in 6878f51.

This comment described the original version of the PR, which recorded both uri= and digest=. That is no longer accurate: the current implementation records only uri=..., avoids the extra manifest HEAD request, and formats the URI from docker::_parse_uri outputs.

See the updated PR description for the current end-to-end validation results for enroot import, enroot load, and runtime visibility via enroot start ... cat /.enroot/source.
