feat: support per-base storage options with base_<id>.<key> prefix #7608

Merged
jackye1995 merged 2 commits into lance-format:main from jackye1995:jack/per-base-storage-options
Jul 3, 2026
@jackye1995

Contributor

Problem

Multi-base datasets can place data on different buckets or accounts, but per-base credentials
could only be passed through base_store_params (ObjectStoreParams keyed by base path URI).
The flat storage_options map used by the Python/Java bindings and namespace credential vending
cannot express per-base credentials, and per-URI bindings are static so they never compose with
dynamic credential refresh.

Change

A storage option key of the form base_<id>.<key> now applies <key> only to the registered
base path with manifest id <id>:

  • Unscoped options are shared defaults inherited by every base; scoped entries add to or override
    them per base. Example: {"account_key": "shared", "base_1.account_key": "abc"} resolves
    account_key to abc for base 1 and to shared for every other base and the primary store.
  • The primary dataset store resolves with all scoped entries stripped, applied once at the
    ObjectStoreRegistry::get_store choke point (no-op when no scoped keys are present, preserving
    store cache identity).
  • When storage options come from a dynamic provider, per-base options are re-resolved on every
    refresh through the parent accessor (BaseScopedStorageOptionsProvider), so a namespace server
    can vend per-base credentials in one flat map and refresh works per base.
  • Precedence per base: exact per-URI base_store_params binding, then base_<id>.<key> overlay,
    then shared defaults.
  • Scoped entries referencing unregistered base ids are ignored with a debug log. The convention is
    runtime-only; nothing is persisted in the manifest.
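
The resolution rules in the list above can be sketched in a few lines of Python. This is only an illustration of the convention, not the Rust implementation: the function names are hypothetical, and it assumes base ids are non-negative integers so that a scoped key matches `base_<digits>.<key>`.

```python
import logging
import re
from typing import Dict, Iterable, Tuple

logger = logging.getLogger(__name__)

Options = Dict[str, str]

# Assumption: manifest base ids are non-negative integers.
_SCOPED_KEY = re.compile(r"^base_(\d+)\.(.+)$")


def split_options(options: Options) -> Tuple[Options, Dict[int, Options]]:
    """Separate shared (unscoped) options from per-base scoped entries."""
    shared: Options = {}
    scoped: Dict[int, Options] = {}
    for key, value in options.items():
        match = _SCOPED_KEY.match(key)
        if match is None:
            shared[key] = value
        else:
            scoped.setdefault(int(match.group(1)), {})[match.group(2)] = value
    return shared, scoped


def primary_store_options(options: Options) -> Options:
    """Options for the primary dataset store: every scoped entry is stripped."""
    shared, _ = split_options(options)
    return shared


def resolve_per_base(options: Options, registered_ids: Iterable[int]) -> Dict[int, Options]:
    """Shared defaults overlaid with each registered base's scoped entries."""
    registered = set(registered_ids)
    shared, scoped = split_options(options)
    for base_id in sorted(set(scoped) - registered):
        # Scoped entries for unknown bases are dropped, not treated as errors.
        logger.debug("ignoring storage options for unregistered base id %d", base_id)
    return {base_id: {**shared, **scoped.get(base_id, {})} for base_id in registered}


options = {
    "account_key": "shared",
    "base_1.account_key": "abc",
    "base_9.account_key": "unused",  # base 9 is not registered in this example
}
per_base = resolve_per_base(options, registered_ids=[1, 2])
primary = primary_store_options(options)
```

A key such as `base_url` is left untouched by this pattern because it has no numeric id followed by a dot.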

Resolution happens in the Rust core (Dataset::store_params_for_base, write-path target-base
resolution, external blob base resolver), so Python and Java get the feature through the existing
storage_options parameter without API changes. Documented in the object store guide and the
relevant API docs.
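
From the Python side, the only thing a caller has to do is build the flat map. The helper below is a hypothetical convenience, not part of the bindings; it shows one way to assemble shared defaults and per-base overrides into the `base_<id>.<key>` form. The option names and values are placeholders.

```python
from typing import Dict, Mapping


def flatten_storage_options(
    shared: Mapping[str, str],
    per_base: Mapping[int, Mapping[str, str]],
) -> Dict[str, str]:
    """Merge shared defaults and per-base overrides into one flat map."""
    flat = dict(shared)
    for base_id, overrides in per_base.items():
        for key, value in overrides.items():
            flat[f"base_{base_id}.{key}"] = value
    return flat


storage_options = flatten_storage_options(
    shared={"account_name": "sharedacct", "account_key": "shared"},
    per_base={1: {"account_key": "abc"}},
)

# The resulting dict goes through the existing parameter unchanged, e.g.
#   lance.dataset(uri, storage_options=storage_options)
# where `uri` is whatever dataset location the caller already uses.
```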

Multi-base datasets previously could only receive per-base credentials
through ObjectStoreParams bindings keyed by base path URI, which cannot
be expressed in the flat storage_options map used by the bindings and
namespace credential vending.

A storage option key of the form base_<id>.<key> now applies <key> only
to the registered base path with that manifest id. Unscoped options are
shared defaults inherited by every base; scoped entries override them
per base. The primary dataset store strips all scoped entries. When
options come from a dynamic provider, per-base options are re-resolved
on every refresh through the parent accessor. Exact per-URI
base_store_params bindings keep precedence over scoped keys.
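
The precedence order can be read as a three-step lookup. The sketch below is an assumption-laden illustration: `resolve_base_options` is a hypothetical name, plain dicts stand in for ObjectStoreParams, and it assumes an exact per-URI binding replaces the resolved options wholesale, which the description does not spell out.

```python
import re
from typing import Dict, Mapping

Options = Dict[str, str]

# Assumption: base ids are non-negative integers.
_SCOPED_KEY = re.compile(r"^base_(\d+)\.(.+)$")


def resolve_base_options(
    base_uri: str,
    base_id: int,
    storage_options: Mapping[str, str],
    base_store_params: Mapping[str, Mapping[str, str]],
) -> Options:
    # 1. An exact per-URI binding wins over everything else.
    exact = base_store_params.get(base_uri)
    if exact is not None:
        return dict(exact)
    # 3. Shared defaults are the lowest-precedence layer.
    resolved = {k: v for k, v in storage_options.items() if _SCOPED_KEY.match(k) is None}
    # 2. Entries scoped to this base overlay the shared defaults.
    for key, value in storage_options.items():
        match = _SCOPED_KEY.match(key)
        if match is not None and int(match.group(1)) == base_id:
            resolved[match.group(2)] = value
    return resolved


# Placeholder URIs and option names, for illustration only.
options = {
    "region": "us-east-1",
    "access_key_id": "shared",
    "base_1.access_key_id": "one",
    "base_2.access_key_id": "two",
}
bindings = {"s3://bucket-two/data": {"access_key_id": "bound"}}

base_1 = resolve_base_options("s3://bucket-one/data", 1, options, bindings)
base_2 = resolve_base_options("s3://bucket-two/data", 2, options, bindings)
base_3 = resolve_base_options("s3://bucket-three/data", 3, options, bindings)
```

Here base 1 picks up its scoped key, base 2 is served entirely by its exact binding even though a scoped key exists for it, and base 3 falls back to the shared defaults.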
@github-actions github-actions Bot added the A-python (Python bindings), A-java (Java bindings + JNI), A-encoding (Encoding, IO, file reader/writer), A-docs (Documentation), and enhancement (New feature or request) labels Jul 3, 2026
@codecov

codecov Bot commented Jul 3, 2026

Codecov Report

❌ Patch coverage is 97.82609% with 11 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-io/src/object_store/storage_options.rs 96.92% 8 Missing and 1 partial ⚠️
rust/lance/src/dataset/write.rs 98.76% 2 Missing ⚠️

Provider-backed accessors now always resolve base scope per fetch, so
per-base credentials vended by a provider apply even without initial
options; a scope_resolved flag keeps re-scoping idempotent. Forced
refreshes propagate through the wrapper to the origin provider via
force_fetch_storage_options. The parent accessor's cache expiry is the
minimum of the unscoped and all base-scoped expires_at_millis entries,
so the earliest-expiring credential drives refresh. Unregistered scoped
ids log at warn on the write path.
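
Two of the behaviours in this commit message, per-fetch re-scoping and the earliest-expiry rule, can be illustrated with a small Python model. Everything here is a stand-in: `ScopedProvider` is a hypothetical class that only mirrors the idea of the wrapper, the origin provider is a plain callable, and base ids are assumed to be non-negative integers.

```python
import re
from typing import Callable, Dict, Optional

Options = Dict[str, str]

# Assumption: base ids are non-negative integers.
_SCOPED_KEY = re.compile(r"^base_(\d+)\.(.+)$")


def scope_to_base(options: Options, base_id: int) -> Options:
    """Shared defaults overlaid with the entries scoped to one base."""
    resolved = {k: v for k, v in options.items() if _SCOPED_KEY.match(k) is None}
    for key, value in options.items():
        match = _SCOPED_KEY.match(key)
        if match is not None and int(match.group(1)) == base_id:
            resolved[match.group(2)] = value
    return resolved


def earliest_expiry_millis(options: Options) -> Optional[int]:
    """Minimum of the unscoped and all base-scoped expires_at_millis entries."""
    expiries = []
    for key, value in options.items():
        match = _SCOPED_KEY.match(key)
        name = match.group(2) if match is not None else key
        if name == "expires_at_millis":
            expiries.append(int(value))
    return min(expiries) if expiries else None


class ScopedProvider:
    """Hypothetical wrapper: every fetch goes to the origin and is re-scoped."""

    def __init__(self, origin_fetch: Callable[[], Options], base_id: int) -> None:
        self._origin_fetch = origin_fetch
        self._base_id = base_id

    def fetch(self) -> Options:
        # Re-scoping an already scoped map changes nothing, so this is idempotent.
        return scope_to_base(self._origin_fetch(), self._base_id)


fetch_count = 0


def origin_fetch() -> Options:
    """Stand-in for a namespace server vending fresh credentials on each call."""
    global fetch_count
    fetch_count += 1
    return {
        "token": f"shared-{fetch_count}",
        "expires_at_millis": "2000",
        "base_1.token": f"b1-{fetch_count}",
        "base_1.expires_at_millis": "1500",
        "base_2.expires_at_millis": "1800",
    }


provider = ScopedProvider(origin_fetch, base_id=1)
first = provider.fetch()
second = provider.fetch()
parent_expiry = earliest_expiry_millis(origin_fetch())
```

In this model the parent expiry is 1500, the base 1 entry, so the credential that expires first is the one that triggers the next refresh.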
@jackye1995 jackye1995 merged commit 1963998 into lance-format:main Jul 3, 2026
31 checks passed