Real-time streaming ptychographic reconstruction using NVIDIA Holoscan, developed for the HXN beamline at NSLS-II. Holoptycho is for real-time streaming reconstruction only — it consumes live detector data via ZMQ and emits results to a Tiled catalog as the scan runs.
For batch/offline reconstruction of completed scans, use NSLS2/ptycho or NSLS2/ptychoml directly.
The image is built and pushed to ACR on every merge to main (.github/workflows/build-container.yml).
Run all of the following from the root of a holoptycho clone (the pixi commands and ./start.sh are relative to it).
1. Allocate a GPU node on Slurm:

   ```bash
   salloc --gres=gpu:1 --mem=64G --cpus-per-gpu=2 --account=staff
   ```

2. Log in to Azure (needs read on `genesisdemoskv` and pull on `genesisdemosacr`): `az login`

3. Cache a personal Tiled token (skip if you'll only use `--api-key`):

   ```bash
   pixi install -e client
   pixi run -e client tiled profile create https://tiled.nsls2.bnl.gov --name nsls2
   pixi run -e client tiled login --profile nsls2
   ```

4. Start the container — pick one:

   ```bash
   ./start.sh            # personal Tiled auth, foreground
   ./start.sh -d         # detached
   ./start.sh --api-key  # shared TILED_API_KEY from Key Vault
   ```
The API binds to 127.0.0.1:8000 on the node — see Connect via SSH tunnel for remote access. See start.sh for what each step does inside the script.
The API binds to 127.0.0.1:8000 (localhost only). For remote access, open an SSH tunnel:
```bash
ssh -L 8000:localhost:8000 <user>@<host>
```

To test holoptycho end-to-end without a live beamline, use `scripts/replay_from_tiled.py`. It reads a real scan from Tiled and publishes it over ZMQ on the same node as holoptycho, in the exact Eiger and PandA wire formats. Both the replay script and holoptycho must run on the same machine — ZMQ traffic stays local.
```bash
# 1. Authenticate with Tiled and install the replay env (once)
tiled profile create https://tiled.nsls2.bnl.gov --name nsls2
tiled login --profile nsls2
pixi install -e replay

# 2. If holoptycho has no selected engine yet, choose one before --hp-start
hp model set run042901
hp model status

# 3. Run the replay. Use --scan-id to look up the run automatically (newest
#    run with that scan_id wins — scan_id is not unique), or pass --uid
#    directly if you already have a UUID. The --tiled-url, --hp-url, and
#    --eiger/panda-endpoint flags all default to the HXN-typical values.
pixi run -e replay replay --scan-id 404611 --mode vit
```

By default the replay script publishes plain ZMQ. To test CurveZMQ, also pass the full Eiger key set: `--eiger-server-public-key`, `--eiger-server-secret-key`, and `--eiger-client-public-key`.
The container must be started with SERVER_STREAM_SOURCE=tcp://localhost:5555 and PANDA_STREAM_SOURCE=tcp://localhost:5556. By default, leave SERVER_PUBLIC_KEY, CLIENT_PUBLIC_KEY, and CLIENT_SECRET_KEY unset so holoptycho subscribes without CurveZMQ. To test CurveZMQ, set all three in the container and pass the matching Eiger publisher keys to scripts/replay_from_tiled.py. Partial auth configuration is rejected on both sides. Control holoptycho from your local machine as normal via the 8000 SSH tunnel.
--tiled-url may be either the Tiled server root (https://tiled.nsls2.bnl.gov) or a catalog path (https://tiled.nsls2.bnl.gov/hxn/migration). The replay and config loaders resolve either form.
When --hp-start is used, the replay script builds the run config from the
same run metadata and chooses /run or /restart automatically based on the
current holoptycho server state before publishing. If hp model status shows
no selected engine, run hp model set <model-name> once first.
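For orientation, here is a minimal client-side sketch of that decision. The `/run` and `/restart` routes come from the text above; the `/status` route and its `running` field are assumptions for illustration, not taken from the holoptycho source:

```python
# Hypothetical sketch: pick /run vs /restart from the server's state.
# /run and /restart are named above; /status and "running" are assumed.
import requests

HP_URL = "http://localhost:8000"

def start_pipeline(config: dict) -> None:
    status = requests.get(f"{HP_URL}/status", timeout=5).json()
    route = "/restart" if status.get("running") else "/run"
    requests.post(f"{HP_URL}{route}", json=config, timeout=30).raise_for_status()
```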
For same-node testing with the replay script, localhost only works if the
container is started with --network host. With bridge networking, localhost
inside the container refers to the container itself, not the Slurm node host.
These flags only take effect when `--hp-start` is used (they're written into the config the replay script POSTs to holoptycho):

- `--mode {iterative,vit,both}` — which reconstruction branches the pipeline wires up. `iterative` runs only the DM/ML solver, `vit` runs only the ViT inference network, and `both` (default) runs them in parallel. Useful for isolating GPU-contention issues on single-GPU nodes or for comparing the two outputs side by side in Tiled (`live/` + `final/` come from iterative; `vit/` comes from the ViT branch).
- `--n-iterations N` — caps the iterative solver at `N` ticks (default 500). Once the iteration counter hits the cap, holoptycho trips the natural-termination path: `SaveResult` writes the final probe/object/timestamps to Tiled, then `fragment.stop_execution()` releases the run loop and the pipeline subprocess exits. Use a small value (50–100) for end-to-end smoke tests; use the production value (~500) for real reconstructions.
- `--max-frames N` — only publishes the first `N` frames of the scan, trimming positions to match. Handy for quick tests on big scans where downloading and replaying every frame would take too long.
- `--skip-frames N` — drops the first `N` Eiger frames (and aligned encoder samples) before publishing. Useful when a scan's initial rows overshoot the commanded extent during settling/ramp-up and crash the iterative recon, or when the first row of ViT predictions ends up in the wrong canvas region.
- `--chunk-size N` — number of frames per Tiled fetch during streaming (default 256). The replay script pulls frames from Tiled lazily rather than loading the whole scan up front, so replay starts publishing within seconds even for multi-GB scans (see the sketch after this list). Smaller chunks = lower startup latency and lower peak memory; larger = fewer round-trips.
- `--compress` — opt in to bslz4 compression and publish frames in the same wire format the live Eiger uses. Off by default, because `dectris-compression` 0.3.1 removed the C `compress` entrypoint and the pure-Python `bitshuffle` fallback gates publish throughput at ~15 fps. The default raw-bytes path uses a `"raw"` encoding header that holoptycho's receiver recognises; localhost ZMQ handles the ~10× larger wire size easily. Enable only when explicitly testing the decompression code path.
- `--nx` / `--ny` — must match the selected engine's input dimensions. These set the detector-frame crop size fed into the pipeline; the default of 256×256 matches the current HXN engines (`ptycho_vit_amp_phase_b64`, `run042901`). A mismatch with the engine input raises `ValueError: could not broadcast input array from shape (256,128) into shape (128,256)` at pipeline startup. The detector frame can be larger — the pipeline crops down — but it must be at least `nx × ny`.

Tips:

- Run only one replay at a time. Concurrent `--hp-start` replays restart the pipeline mid-stream: the second run's `/restart` interrupts the first while it's publishing, leaving PandA and Eiger out of sync. Positions stay NaN and the dashboard hangs. Kill any running replay (`pkill -f replay_from_tiled`) before launching a new one.
- Use `--skip-frames` for scans with settling/ramp-up rows. Some scans (e.g. 404611) have first ~10 rows where encoder readings overshoot the commanded scan range by several × and crash the iterative recon's pre-allocated object grid. The ViT branch tolerates them but stitches them into the wrong canvas region. Drop those rows.
- Default to `--mode vit` when iterating on ViT/mosaic code — fastest cycle, and the iterative branch can't crash the run.
- `--max-frames N` plus `--n-iterations 50–100` gets you a full end-to-end cycle (config → stream → recon → final write) in under a minute for quick smoke tests on big scans.
- Leave compression off (the default). With the current `dectris-compression` package the C `compress` entrypoint is missing, so enabling `--compress` falls back to Python `bitshuffle` and gates the pipeline at ~15 frames/sec. Enable only when you specifically need to test the decompression path.
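The lazy chunked fetch behind `--chunk-size` looks roughly like this. A sketch under stated assumptions: the Tiled node path is hypothetical, and the message framing is illustrative rather than the real Eiger wire format (that lives in `scripts/replay_from_tiled.py`):

```python
# Sketch: publish a large Tiled array over ZMQ in chunks. Each slice fetches
# one chunk over HTTP, so publishing starts as soon as the first chunk lands.
# The frame-node path and the message framing are illustrative assumptions.
import json
import zmq
from tiled.client import from_uri

def replay_frames(frames_node, chunk_size: int = 256) -> None:
    pub = zmq.Context.instance().socket(zmq.PUB)
    pub.bind("tcp://*:5555")
    n_frames = frames_node.shape[0]
    for start in range(0, n_frames, chunk_size):
        chunk = frames_node[start : start + chunk_size]  # lazy fetch happens here
        for offset, frame in enumerate(chunk):
            header = {"frame": start + offset, "encoding": "raw",
                      "shape": list(frame.shape), "type": str(frame.dtype)}
            pub.send_multipart([json.dumps(header).encode(), frame.tobytes()])

client = from_uri("https://tiled.nsls2.bnl.gov")
replay_frames(client["hxn"]["migration"]["<scan-uid>"])  # hypothetical path
```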
On hosts with a glibc too old to run the pixi env directly (e.g. older RHEL), use start_editable.sh to drop into a minimal CUDA+pixi container with the repo bind-mounted. Edit, commit, and push from the host as normal; only run code inside the container.
```bash
./start_editable.sh
```

The first run builds a small cuda-dev image (nvidia/cuda runtime + pixi) — about a minute. Subsequent runs reuse it. Inside the shell:
```bash
pixi install   # first time / after pixi.lock changes
pixi run tiled profile create https://tiled.nsls2.bnl.gov --name nsls2   # once per dev shell
pixi run tiled login --profile nsls2
export ENGINE_CACHE_DIR=/tmp/models
pixi run api
```

Why this works:
- `--network host` so the holoscan app reaches host services (Azure ML / MLflow, Tiled, ZMQ streams) as if it were running on the host.
- The whole repo (incl. `.pixi/`) is bind-mounted at `/app`, so host-side edits show up inside immediately.
- `HOME=/tmp` keeps caches and tiled tokens out of the mounted repo; they die with `--rm`.
- Azure secrets are piped via `--env-file <(...)` — an in-kernel FIFO — so they never touch disk and don't appear in `ps`.
- Tiled uses your personal identity (via `tiled login`) instead of a shared `TILED_API_KEY`, so you get the right access scope and a real audit trail.
Always run pixi install inside the dev container, never on the host — that way the env's binaries link against the container's glibc, which is what they run against in production. If you previously ran pixi install on the host, delete .pixi/ and re-install inside the container the first time so nothing is stale.
Holoptycho is a streaming pipeline: it receives diffraction patterns from the Eiger detector and motor positions from the PandA box over two independent ZMQ streams, reconstructs the ptychographic object iteratively on GPU, and writes results to Tiled in real time.
Pipeline operators:
- `EigerZmqRxOp` — receives diffraction frames from the Eiger detector (encrypted CurveZMQ, bslz4 compressed)
- `PositionRxOp` — receives motor positions from the PandA box (plain ZMQ JSON)
- `ImageBatchOp` / `ImagePreprocessorOp` — batch and preprocess diffraction frames
- `PointProcessorOp` — maps encoder values to scan coordinates
- `PtychoRecon` — iterative DM/ML reconstruction on GPU 0
- `PtychoViTInferenceOp` — parallel neural network inference on GPU 1
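The point of the layout is the fan-out: both reconstruction branches consume the same preprocessed frames and scan positions. A runnable Holoscan sketch of that graph shape with stub operators (class bodies, port names, and payloads here are illustrative, not the real operators):

```python
# Sketch of the two-branch fan-out with stand-in operators.
from holoscan.conditions import CountCondition
from holoscan.core import Application, Operator, OperatorSpec

class FrameSource(Operator):       # stands in for EigerZmqRx -> batch -> preprocess
    def setup(self, spec: OperatorSpec):
        spec.output("frames")
    def compute(self, op_input, op_output, context):
        op_output.emit({"batch": "256x256 frames"}, "frames")

class PositionSource(Operator):    # stands in for PositionRx -> PointProcessor
    def setup(self, spec: OperatorSpec):
        spec.output("points")
    def compute(self, op_input, op_output, context):
        op_output.emit({"xy": "scan coords"}, "points")

class Branch(Operator):            # stands in for PtychoRecon / PtychoViTInferenceOp
    def setup(self, spec: OperatorSpec):
        spec.input("frames")
        spec.input("points")
    def compute(self, op_input, op_output, context):
        print(self.name, op_input.receive("frames"), op_input.receive("points"))

class Sketch(Application):
    def compose(self):
        frames = FrameSource(self, CountCondition(self, 3), name="frames")
        points = PositionSource(self, CountCondition(self, 3), name="points")
        recon = Branch(self, name="iterative_recon")  # GPU 0 in the real app
        vit = Branch(self, name="vit_inference")      # GPU 1 in the real app
        # The same frames and positions fan out to both branches.
        self.add_flow(frames, recon, {("frames", "frames")})
        self.add_flow(points, recon, {("points", "points")})
        self.add_flow(frames, vit, {("frames", "frames")})
        self.add_flow(points, vit, {("points", "points")})

if __name__ == "__main__":
    Sketch().run()
```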
Each pipeline run produces a fresh container under hxn/processed/holoptycho/{run_uid}/ (a per-run UUID; the catalog root is overrideable via TILED_CATALOG_PATH), tagged with the synaps_project spec. Container metadata records the raw scan it was reconstructed from (raw_uid, scan_id, scan_num, started_at, recon_mode, xray_energy_kev, wavelength_m, distance_m, plus a boolean fine_tunable flag that's true iff recon_mode is iterative or both).
Every run also writes a <run>/diffraction/ subtree containing detector-frame amplitude (dp, (nz, H, W) uint8, i.e. sqrt(intensity) rounded to 8-bit) and meter-unit probe positions (probe_position_x_m, probe_position_y_m). uint8 storage cuts the on-the-wire write volume in half versus uint16 without measurable quality loss for ML (the 1-count quantization is below the Poisson noise floor). A run is usable as a ptycho-vit fine-tuning sample iff its metadata has fine_tunable: true — the iterative branch then also writes final/probe and final/object as supervised targets.
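To consume those outputs, something like the following works with the Tiled Python client (the `run_uid` is a placeholder; subtree and metadata names follow the layout described above):

```python
# Sketch: read one run's outputs back from Tiled (paths per the layout above).
from tiled.client import from_uri

client = from_uri("https://tiled.nsls2.bnl.gov")             # cached `tiled login` token
run = client["hxn"]["processed"]["holoptycho"]["<run_uid>"]  # placeholder uid

print(run.metadata["scan_id"], run.metadata["recon_mode"])

dp = run["diffraction"]["dp"][:16]                     # (nz, H, W) uint8 amplitudes
x_m = run["diffraction"]["probe_position_x_m"][:]      # meter-unit positions

if run.metadata.get("fine_tunable"):                   # iterative / both runs only
    final_object = run["final"]["object"][:]           # supervised target for ptycho-vit
```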
| Variable | Description |
|---|---|
| `SERVER_STREAM_SOURCE` | ZMQ endpoint of the Eiger detector, e.g. `tcp://<host>:5555` |
| `PANDA_STREAM_SOURCE` | ZMQ endpoint of the PandA box, e.g. `tcp://<host>:5556` |
| `TILED_BASE_URL` | URL of the Tiled server |
The pipeline will refuse to start if any of SERVER_STREAM_SOURCE, PANDA_STREAM_SOURCE, or TILED_BASE_URL are not set. TILED_API_KEY is optional — when unset, the writer uses the cached token from tiled login (run once: tiled profile create <url> --name <name> then tiled login --profile <name>).
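A minimal sketch of how such a fail-fast check might look (not the repo's actual code):

```python
# Sketch: refuse to start unless the required endpoints are configured.
import os

REQUIRED = ("SERVER_STREAM_SOURCE", "PANDA_STREAM_SOURCE", "TILED_BASE_URL")

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"refusing to start, unset env vars: {', '.join(missing)}")

# TILED_API_KEY is optional: fall back to the token cached by `tiled login`.
api_key = os.environ.get("TILED_API_KEY")  # None -> personal cached token
```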
| Variable | Description |
|---|---|
| `TILED_CATALOG_PATH` | Tiled catalog path (default: `hxn/processed/holoptycho`) |
| `HOLOPTYCHO_LOG_LEVEL` | Root log level for the API and pipeline logs (default: `INFO`; set to `DEBUG` for per-write Tiled debug logs) |
| `SERVER_PUBLIC_KEY` | CurveZMQ public key of the holoscan-proxy. Required only if the proxy is configured with `encrypt: true`. |
| `CLIENT_PUBLIC_KEY` | CurveZMQ public key of this client. Required if `SERVER_PUBLIC_KEY` is set. |
| `CLIENT_SECRET_KEY` | CurveZMQ secret key of this client. Required if `SERVER_PUBLIC_KEY` is set. |
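The three keys are all-or-nothing, which pyzmq makes easy to enforce. A sketch of the subscriber-side wiring under that rule (illustrative, not the repo's receiver code):

```python
# Sketch: all-or-nothing CurveZMQ wiring for the Eiger SUB socket (pyzmq).
import os
import zmq

keys = {name: os.environ.get(name) for name in
        ("SERVER_PUBLIC_KEY", "CLIENT_PUBLIC_KEY", "CLIENT_SECRET_KEY")}

sub = zmq.Context.instance().socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"")

if all(keys.values()):
    # Encrypted: authenticate the proxy and present our client keypair.
    sub.curve_serverkey = keys["SERVER_PUBLIC_KEY"].encode()
    sub.curve_publickey = keys["CLIENT_PUBLIC_KEY"].encode()
    sub.curve_secretkey = keys["CLIENT_SECRET_KEY"].encode()
elif any(keys.values()):
    raise RuntimeError("set all three CurveZMQ keys or none of them")

sub.connect(os.environ["SERVER_STREAM_SOURCE"])
```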
Use the hp CLI to start, stop, and configure the pipeline. It connects to http://localhost:8000 by default — override with --url or HOLOPTYCHO_URL.
The client pixi environment installs only the CLI and its dependencies — no GPU or Holoscan deps. It works on Linux and macOS:
```bash
git clone git@github.com:NSLS2/holoptycho.git
cd holoptycho
pixi install -e client
pixi run -e client hp --help
```

To avoid typing `pixi run -e client` each time, add a shell alias. Use `--manifest-path` so it works from any directory:
```bash
# bash
echo 'alias hp="pixi run --manifest-path ~/code/holoptycho/pixi.toml -e client hp"' >> ~/.bashrc && source ~/.bashrc
# zsh
echo 'alias hp="pixi run --manifest-path ~/code/holoptycho/pixi.toml -e client hp"' >> ~/.zshrc && source ~/.zshrc
```

To update:

```bash
cd ~/code/holoptycho && git pull
```

If `pixi.lock` changed, also run:

```bash
pixi install -e client
```

Day-to-day commands:

```bash
hp start            # start using current config
hp start '<json>'   # start with a new config (becomes current config)
hp stop
hp restart          # stop + restart with current config
hp restart '<json>' # stop + restart with a new config
hp config show      # print the current config as JSON
hp status
hp logs
```

Beamline metadata (energy, scan geometry, pixel size) can be pulled directly from Tiled and piped into `hp start`:
```bash
tiled profile create https://tiled.nsls2.bnl.gov --name nsls2   # once
tiled login --profile nsls2
hp start "$(pixi run -e client config-from-tiled --scan-num 320045)"
```

Override reconstruction parameters as needed:

```bash
hp start "$(pixi run -e client config-from-tiled --scan-num 320045 --nx 256 --ny 256 --n-iterations 1000)"
# Run only the iterative solver or only the ViT branch (default is both):
hp start "$(pixi run -e client config-from-tiled --scan-num 320045 --mode iterative)"
```

`hp model list` shows two sections:
- Local cache — `.engine` files in `ENGINE_CACHE_DIR` (default `/models`), ready to use immediately
- Azure ML — registered models, with a `cached` column showing what's already local
hp model set selects the engine for the next hp start or hp restart. If the engine is not cached locally it is pulled from Azure ML and compiled via the TensorRT Python API first.
```bash
hp model list
hp model set <model-name>                   # uses latest version
hp model set <model-name> --version <ver>   # pin to a specific version
hp model status
```

The config is a flat JSON dict passed to `hp start` or `hp restart`. All values are strings (matching the INI format the reconstructor reads). See AGENTS.md for a full example.
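For a feel of the shape, here is an abridged example built in Python. The keys come from the table below, the optics values are placeholders, and it assumes the `hp` alias from above is on your PATH:

```python
# Sketch: an abridged config (every value a string) handed to hp start.
import json
import subprocess

config = {
    "scan_num": "320045",
    "nx": "256", "ny": "256",
    "recon_mode": "both",
    "n_iterations": "500",
    "xray_energy_kev": "12.0",
    "lambda_nm": "0.103320",   # derived from energy (formula below the table)
    "ccd_pixel_um": "75.0",    # placeholder value
    "distance": "500.0",       # placeholder value, mm
    "alg_flag": "DM",
    "gpus": "[0]",             # a JSON list, itself passed as a string
}
subprocess.run(["hp", "start", json.dumps(config)], check=True)
```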
| Parameter | Type | Description |
|---|---|---|
| `scan_num` | int (str) | Scan number — tags all Tiled output for this run |
| `working_directory` | path | Root directory for input/output data |
| `shm_name` | str | Shared-memory segment name for ZMQ live data |
| `scan_type` | str | Scan pattern, e.g. `pt_fly2dcontpd` |
| `nx`, `ny` | int (str) | Reconstruction array size (pixels) |
| `batch_width`, `batch_height` | int (str) | Diffraction pattern tile size |
| `batch_x0`, `batch_y0` | int (str) | Top-left crop offset in the detector frame |
| `det_roix0`, `det_roiy0` | int (str) | Detector ROI origin (pixels) |
| `gpu_batch_size` | int (str) | Number of patterns per GPU batch |
| `recon_mode` | str | Which reconstruction branches to run: `iterative`, `vit`, or `both`. Default `both`. |
| `raw_uid` | str | (Optional) UID of the raw Bluesky run this reconstruction came from; stored on the per-run Tiled container as metadata. |
| `scan_id` | str | (Optional) Scan id of the raw run; stored on the per-run Tiled container as metadata. Defaults to `scan_num` if omitted. |
| `xray_energy_kev` | float (str) | X-ray energy in keV |
| `lambda_nm` | float (str) | X-ray wavelength in nm — derive from energy (see below) |
| `ccd_pixel_um` | float (str) | Detector pixel size in µm |
| `distance` | float (str) | Sample-to-detector distance in mm |
| `dr_x`, `dr_y` | float (str) | Scan step size in µm |
| `x_num`, `y_num` | int (str) | Number of scan positions (fast/slow axis) |
| `x_range`, `y_range` | float (str) | Total scan range in µm |
| `x_direction`, `y_direction` | float (str) | Sign convention for scan axes (1.0 or -1.0) |
| `x_ratio`, `y_ratio` | float (str) | Encoder-to-µm scale factor for each axis |
| `pos_x_channel`, `pos_y_channel` | str | ZMQ field names for X/Y encoder values from PandA |
| `alg_flag` | str | Primary algorithm: `ML_grad`, `DM`, `ePIE`, etc. |
| `alg2_flag` | str | Secondary algorithm (used after `alg_percentage` of iterations) |
| `alg_percentage` | float (str) | Fraction of iterations using `alg_flag` |
| `n_iterations` | int (str) | Total reconstruction iterations |
| `ml_mode` | str | Noise model: `Poisson` or `Gaussian` |
| `ml_weight` | float (str) | ML regularisation weight |
| `beta` | float (str) | Momentum parameter for ML gradient |
| `init_obj_flag` | bool (str) | Initialise object from DPC (`True`/`False`) |
| `init_prb_flag` | bool (str) | Load probe from file (`True`/`False`) |
| `prb_path` | path | Full path to probe `.npy` file — empty to generate synthetically |
| `prb_mode_num` | int (str) | Number of probe modes |
| `obj_mode_num` | int (str) | Number of object modes |
| `gpu_flag` | bool (str) | Use GPU (`True`/`False`) |
| `gpus` | list (str) | JSON list of GPU indices, e.g. `"[0]"` |
| `precision` | str | Float precision: `single` or `double` |
| `nth` | int (str) | Number of threads for CPU operations |
| `sign` | str | Arbitrary run label used to tag output |
| `display_interval` | int (str) | Iterations between live Tiled updates |
Wavelength from energy:

```
lambda_nm = (6.62607e-34 * 2.99792e8) / (energy_kev * 1e3 * 1.60218e-19) * 1e9
```
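That is Planck's constant times the speed of light over the photon energy, converted to nm. As a quick Python helper (a convenience sketch, not part of the repo):

```python
# hc / E, converted to the nm string the config expects.
H = 6.62607e-34         # Planck constant, J*s
C = 2.99792e8           # speed of light, m/s
J_PER_EV = 1.60218e-19  # J per eV

def lambda_nm(energy_kev: float) -> str:
    wavelength_m = (H * C) / (energy_kev * 1e3 * J_PER_EV)
    return f"{wavelength_m * 1e9:.6f}"

print(lambda_nm(12.0))   # "0.103320" (12 keV hard X-rays)
```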
For persistent operation independent of your SSH session, use the provided sbatch script. The job survives disconnects — only a Slurm job cancellation or walltime expiry will stop it.
```bash
sbatch scripts/slurm_start_holoptycho.sh
```

Once the job is running, check which node it landed on and open an SSH tunnel:
```bash
squeue -u $USER   # note the node name (e.g. mars5)
ssh -L 8000:localhost:8000 -J <login-node> <compute-node>
```

The hp CLI will now reach the server at http://localhost:8000 as normal.
```bash
squeue -u $USER      # show your running jobs and their node
squeue -u $USER -l   # verbose — includes time limit and reason
scancel <jobid>
```

The script uses `--pull=always`, so to pick up a new image just cancel the job and resubmit:
```bash
scancel <jobid>
sbatch scripts/slurm_start_holoptycho.sh
```

> **Note:** The script resolves Azure credentials at job start time using the `az` CLI. Make sure you have run `az login` on the cluster before submitting — credentials are stored in `~/.azure/`, which is available on compute nodes via the shared home directory.
Requires Linux (x86_64), an NVIDIA GPU, the system CUDA toolkit (for cuda.h and the matching driver lib), and pixi.
```bash
git clone git@github.com:NSLS2/holoptycho.git
cd holoptycho
pixi install
pixi run test
```

The default pixi env builds pycuda from source against the system CUDA toolkit. If `pixi install` fails with `cuda.h: No such file or directory` or `cannot find -lcuda` / `-lcurand`, you need to:
1. Make sure `cuda.h` is reachable via `/usr/local/cuda/include` (system CUDA toolkit, e.g. installed via the NVIDIA `.run` installer or the `nvidia-cuda-toolkit` apt package).

2. Make sure `libcuda.so` is reachable. On WSL2 it lives at `/usr/lib/wsl/lib/libcuda.so` (provided by the Windows NVIDIA driver). On bare-metal Linux the driver places it under `/usr/lib/x86_64-linux-gnu/`.

3. Conda-forge ships `libcurand.so.10` without the unversioned dev symlink that the linker needs. Create it once:

   ```bash
   ln -sf libcurand.so.10 .pixi/envs/default/lib/libcurand.so
   ```
Then run `pixi install` with the toolchain pointed at both the system CUDA headers and the WSL/driver lib path:

```bash
CUDA_ROOT=/usr/local/cuda CUDA_HOME=/usr/local/cuda CPATH=/usr/local/cuda/include \
LIBRARY_PATH=/usr/lib/wsl/lib:$PWD/.pixi/envs/default/lib \
pixi install
```

Drop `/usr/lib/wsl/lib` from `LIBRARY_PATH` on non-WSL hosts.
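Once it builds, a quick sanity check from inside the env confirms pycuda linked against a working driver (a throwaway snippet, not part of the repo):

```python
# Run with: pixi run python check_cuda.py
import pycuda.driver as cuda

cuda.init()
dev = cuda.Device(0)
print(dev.name(), "| compute capability:", dev.compute_capability())
print("driver version:", cuda.get_driver_version())
```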
The API server reads the same environment variables as the container. Pull them from Azure once per shell, then start the server:
```bash
az login   # one time
export AZURE_TENANT_ID="$(az account show --query tenantId -o tsv)"
export AZURE_CLIENT_ID="$(az ad app list --display-name 'NSLS2-Genesis-Holoptycho' --query '[0].appId' -o tsv)"
export AZURE_SUBSCRIPTION_ID="$(az account show --query id -o tsv)"
export AZURE_CERTIFICATE_B64="$(az keyvault secret show --vault-name genesisdemoskv --name holoptycho-sp-cert --query value -o tsv)"
export AZURE_RESOURCE_GROUP=rg-genesis-demos
export AZURE_ML_WORKSPACE=genesis-mlw

export TILED_BASE_URL="https://tiled.nsls2.bnl.gov"
export TILED_API_KEY="$(az keyvault secret show --vault-name genesisdemoskv --name holoptycho-tiled-api-key --query value -o tsv)"

export SERVER_STREAM_SOURCE="tcp://localhost:5555"
export PANDA_STREAM_SOURCE="tcp://localhost:5556"

# ENGINE_CACHE_DIR defaults to /models, which is not writable outside the container.
# Point it at a user-writable path before starting the server.
export ENGINE_CACHE_DIR="$HOME/.cache/holoptycho/models"
mkdir -p "$ENGINE_CACHE_DIR"

pixi run api   # listens on 127.0.0.1:8000
```

`SERVER_STREAM_SOURCE` and `PANDA_STREAM_SOURCE` are required — the pipeline refuses to start without them. Use `tcp://localhost:5555` / `tcp://localhost:5556` when pairing with `scripts/replay_from_tiled.py` on the same host.
```bash
nsys profile -t cuda,nvtx,osrt,python-gil -o ptycho_profile.nsys-rep -f true -d 30 \
  pixi run api
```

Requires `perf_event_paranoid` <= 2:

```bash
sudo sh -c 'echo 2 >/proc/sys/kernel/perf_event_paranoid'
```

The following are no longer used and remain in the repo for reference only. They will be removed in a future release:
- `InitRecon`, `liverecon_utils.py` — scan header file watcher for detecting new scans from a beamline-written text file. Scan parameters now come from the API config.
- `--mode simulate` CLI option — removed; `hp start` always runs the live ZMQ pipeline.