Add experimental managed Python environment provisioning with uv #1906

Open
anton-107 wants to merge 3 commits into dbconnect-env-quick-wins from managed-python-environment

anton-107 commented Jun 12, 2026

Changes

Stacked on #1905 (base branch is dbconnect-env-quick-wins); review that one first. Experimental, default off.

Second part of the "Run/Debug with Databricks Connect should just work" effort (CUJ findings and design note in #1905). With databricks.experiments.optInto: ["python.managedEnvironment"], clicking Run/Debug with Databricks Connect in a fresh project provisions everything behind a single cancellable progress notification — no manual Python environment steps:

  1. Resolve the target from the selected compute via resolveComputeTargetSpec (serverless 17.3 → Python 3.12 + databricks-connect==17.3.*; clusters → DBR-matched versions).
  2. Locate uv (UvBinaryProvider): PATH first (corp installs keep their proxy/mirror config), then a previously downloaded copy, then a pinned, SHA256-verified download from GitHub releases into globalStorageUri (macOS/Linux/Windows, x64+arm64).
  3. Ensure the interpreter: uv python find, falling back to uv python install (uv downloads python-build-standalone builds).
  4. Create .venv with uv venv --seed — seeded pip keeps the extension's existing pip list introspection and other tooling working.
  5. Install dependencies: databricks-connect matched to the compute, nbformat, then the project's own requirements*.txt / pyproject.toml dependencies.
  6. Select the interpreter in the MS Python extension (refreshEnvironments + updateActiveEnvironmentPath), so run, debug (pinned in #1905), terminal and the status bar all agree.
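
Step 1 can be sketched as a pure lookup. This is a minimal illustration, not the real resolveComputeTargetSpec: the `ComputeTarget` and `TargetSpec` shapes and the `PYTHON_BY_RUNTIME` table are hypothetical, and only the 17.3 → Python 3.12 row is taken from the description above.

```typescript
// Hypothetical input/output shapes; the real resolveComputeTargetSpec may differ.
type ComputeTarget =
    | {kind: "serverless"; version: string}
    | {kind: "cluster"; dbrVersion: string};

interface TargetSpec {
    pythonVersion: string;
    dbconnectRequirement: string;
}

// Runtime "major.minor" -> Python version. Only the 17.3 row comes from the
// PR description; further rows would need to be filled in from real data.
const PYTHON_BY_RUNTIME: Record<string, string> = {
    "17.3": "3.12",
};

function resolveSpecSketch(target: ComputeTarget): TargetSpec | undefined {
    const raw =
        target.kind === "serverless" ? target.version : target.dbrVersion;
    // Wildcard or otherwise unparsable versions (e.g. "17.x") are unsupported.
    const match = /^(\d+)\.(\d+)/.exec(raw);
    if (!match) {
        return undefined;
    }
    const majorMinor = `${match[1]}.${match[2]}`;
    const pythonVersion = PYTHON_BY_RUNTIME[majorMinor];
    if (pythonVersion === undefined) {
        return undefined;
    }
    return {
        pythonVersion,
        dbconnectRequirement: `databricks-connect==${majorMinor}.*`,
    };
}
```

Returning `undefined` for unknown runtimes mirrors the "unsupported computes fall back to the existing per-step flow" behaviour described in this PR.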

Respecting user-managed environments

  • An environment that already satisfies the checks is selected, never modified.
  • Environments created by this flow are tagged with .venv/databricks.json and may be repaired (reinstall deps) or recreated (wrong Python) silently.
  • A foreign .venv is never modified or deleted without an explicit prompt (Install into .venv / Recreate .venv / Set up manually). "Set up manually" and unsupported computes fall back to the existing per-step flow unchanged.
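
The three rules above reduce to a small decision function. The sketch below assumes a hypothetical `VenvAssessment` shape and action names; the actual assessment type in the PR may carry different fields.

```typescript
type VenvAction = "create" | "select" | "repair" | "recreate" | "prompt";

// Hypothetical assessment of the project's .venv directory.
interface VenvAssessment {
    exists: boolean;
    hasMarker: boolean; // .venv/databricks.json is present
    interpreterOk: boolean; // Python version matches the compute target
    depsSatisfied: boolean; // required packages are already installed
}

function decideVenvAction(a: VenvAssessment): VenvAction {
    if (!a.exists) {
        return "create";
    }
    // A satisfied environment is selected and never modified,
    // regardless of who created it.
    if (a.interpreterOk && a.depsSatisfied) {
        return "select";
    }
    // A foreign .venv is only touched after an explicit prompt.
    if (!a.hasMarker) {
        return "prompt";
    }
    // Managed venvs are fixed silently: reinstall deps or rebuild.
    return a.interpreterOk ? "repair" : "recreate";
}
```

Keeping the decision pure makes the "never touch on manual" guarantee easy to cover with a table-driven test.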

Failure UX

Failures are classified — networkBlocked, pythonUnavailable, uvUnavailable, disk, cancelled — into specific, actionable messages with Retry / Show Logs / Set up manually buttons. Custom indexes are honored: UV_INDEX_URL passes through, PIP_INDEX_URL is mapped to UV_INDEX_URL, and pip.conf/pip.ini index-url is parsed and forwarded (uv doesn't read pip config). The network error message points at HTTPS_PROXY/UV_INDEX_URL/UV_PYTHON_INSTALL_MIRROR.
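
The index handling can be sketched as follows. The function names are hypothetical, and the precedence (UV_INDEX_URL, then PIP_INDEX_URL, then pip.conf) and the restriction to the `[global]` section are assumptions made for illustration.

```typescript
// Extracts index-url from the [global] section of pip.conf / pip.ini text.
function parsePipConfIndexUrl(text: string): string | undefined {
    let section = "";
    for (const rawLine of text.split(/\r?\n/)) {
        const line = rawLine.trim();
        if (line === "" || line.startsWith("#") || line.startsWith(";")) {
            continue; // blank line or comment
        }
        const header = /^\[(.+)\]$/.exec(line);
        if (header) {
            section = (header[1] ?? "").trim().toLowerCase();
            continue;
        }
        const kv = /^index-url\s*[=:]\s*(.+)$/i.exec(line);
        if (kv && section === "global") {
            return (kv[1] ?? "").trim();
        }
    }
    return undefined;
}

type Env = Record<string, string | undefined>;

// Builds the environment for the uv child process. uv does not read pip
// configuration, so a pip-style index is forwarded as UV_INDEX_URL.
function buildUvEnv(base: Env, pipConfText?: string): Env {
    const env: Env = {...base};
    if (env["UV_INDEX_URL"]) {
        return env; // an explicit uv setting passes through untouched
    }
    const fallback =
        env["PIP_INDEX_URL"] ||
        (pipConfText ? parsePipConfIndexUrl(pipConfText) : undefined);
    if (fallback) {
        env["UV_INDEX_URL"] = fallback;
    }
    return env;
}
```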

Telemetry

New managedEnvironmentSetup event with one record per funnel step (uvAcquire, pythonInstall, venvCreate, depsInstall, interpreterSet) plus a total, carrying success, duration, failure class, compute type and venv disposition — so funnel drop-off becomes measurable.
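
One possible shape for these records is sketched below. The field names and the `buildTotalRecord` helper are hypothetical; only the step names and failure classes come from this description.

```typescript
type FunnelStep =
    | "uvAcquire"
    | "pythonInstall"
    | "venvCreate"
    | "depsInstall"
    | "interpreterSet"
    | "total";

type FailureClass =
    | "networkBlocked"
    | "pythonUnavailable"
    | "uvUnavailable"
    | "disk"
    | "cancelled";

// Hypothetical payload of one managedEnvironmentSetup record.
interface SetupRecord {
    step: FunnelStep;
    success: boolean;
    durationMs: number;
    failureClass?: FailureClass;
    computeType: "serverless" | "cluster";
    venvDisposition: string;
}

// Derives the "total" record from the per-step records of one run:
// durations are summed and the first failure (if any) is propagated.
function buildTotalRecord(
    steps: readonly SetupRecord[],
    computeType: SetupRecord["computeType"],
    venvDisposition: string
): SetupRecord {
    const firstFailure = steps.find((s) => !s.success);
    const total: SetupRecord = {
        step: "total",
        success: firstFailure === undefined,
        durationMs: steps.reduce((sum, s) => sum + s.durationMs, 0),
        computeType,
        venvDisposition,
    };
    if (firstFailure?.failureClass !== undefined) {
        total.failureClass = firstFailure.failureClass;
    }
    return total;
}
```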

Integration

The only behavioral hook is in EnvironmentCommands._setup: when the experiment is on and the failing steps are limited to the python environment/dependencies (shouldProvision), it runs the provisioner; otherwise (and always with the flag off) the existing per-step action loop runs byte-identically. Cluster/UC problems keep the manual flow.
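
The gate can be sketched as a predicate over the failing setup steps. The step identifiers below are hypothetical stand-ins for the extension's real step names, and treating "nothing failing" as "do not provision" is an assumption.

```typescript
// Hypothetical identifiers for the checks run by the setup flow.
type SetupStep =
    | "cluster"
    | "unityCatalog"
    | "pythonEnvironment"
    | "pythonDependencies";

// Steps the managed provisioner is able to fix on its own.
const PROVISIONABLE: ReadonlySet<SetupStep> = new Set<SetupStep>([
    "pythonEnvironment",
    "pythonDependencies",
]);

function shouldProvisionSketch(
    experimentOn: boolean,
    failingSteps: readonly SetupStep[]
): boolean {
    // Flag off, or nothing to fix: keep the existing per-step flow.
    if (!experimentOn || failingSteps.length === 0) {
        return false;
    }
    // Any cluster/UC problem keeps the manual flow.
    return failingSteps.every((step) => PROVISIONABLE.has(step));
}
```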

Risks / known limitations

  • uv python install downloads from GitHub; fully offline networks need UV_PYTHON_INSTALL_MIRROR (called out in the error message). Bundling uv into the vsix was deliberately deferred (see design note in #1905).
  • The uv download URL/checksum format is validated for the pinned UV_VERSION only; bumps should be deliberate.
  • pyproject.toml dependency-groups (beyond [project].dependencies) are a fast-follow.
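
For reference, the checksum check on the downloaded uv archive can be as small as the sketch below. The function name is hypothetical; the expected digest would be the one pinned alongside UV_VERSION.

```typescript
import {createHash} from "node:crypto";

// Throws if the downloaded bytes do not match the pinned SHA256 digest.
function verifySha256(data: Uint8Array, expectedHex: string): void {
    const actual = createHash("sha256").update(data).digest("hex");
    if (actual !== expectedHex.trim().toLowerCase()) {
        throw new Error(
            `uv checksum mismatch: expected ${expectedHex}, got ${actual}`
        );
    }
}
```

Verifying before the archive is extracted or marked executable means a mismatched download never reaches globalStorageUri in a usable form.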

Tests

  • computeTargetSpec.test.ts: spec resolution matrix (serverless, clusters, wildcard/unsupported DBR).
  • EnvironmentProvisioner.test.ts: command sequences for fresh create / repair / recreate with a fake exec recorder on real tmp dirs; marker-file semantics; foreign-venv prompt branches (incl. "never touch on manual"); cleanup of half-created venvs on failure; failure classification from real uv/pip stderr shapes; PIP_INDEX_URL → UV_INDEX_URL mapping and pip.conf parsing; win32/posix interpreter paths.
  • UvBinaryProvider.test.ts: PATH short-circuit, cached binary reuse, checksum-mismatch rejection, failed-download handling.
  • EnvironmentCommands.test.ts: shouldProvision matrix; provisioner never invoked with the flag off.
  • yarn test:lint and yarn test:unit pass (225 passing).

Manual validation script (macOS, fresh profile): databricks bundle init default-python, select serverless, opt into the experiment, click Run with Databricks Connect → single progress notification (uv downloads Python 3.12, creates .venv, installs databricks-connect==17.3.*) → file runs; Debug hits breakpoints with the same interpreter. Negative: UV_INDEX_URL=http://blocked.example → actionable network error with Retry.

This pull request and its description were written by Isaac.

Behind the "python.managedEnvironment" experimental setting
(databricks.experiments.optInto), the environment setup flow provisions a
working environment with uv instead of walking the user through manual
steps: locate uv (PATH first, then a pinned SHA256-verified download into
global storage), ensure a Python interpreter matching the selected compute,
create the project .venv (seeded with pip), install databricks-connect
matched to the compute plus the project's own requirements, and select the
interpreter in the MS Python extension so run and debug agree.

Environments the user created are respected: a satisfied environment is
only selected, never modified, and a foreign .venv is only repaired or
recreated after an explicit prompt. Venvs created by the extension are
tagged with a marker file and can be repaired or recreated silently.

Failures are classified (network blocked, interpreter unavailable, disk,
cancelled) into actionable messages with a Retry button. Custom package
indexes are honored via UV_INDEX_URL, PIP_INDEX_URL and pip.conf. Each
funnel step emits a managedEnvironmentSetup telemetry event with duration
and failure class.

With the setting off (default) the setup flow is unchanged.

Co-authored-by: Isaac
# Conflicts:
#	packages/databricks-vscode/src/language/EnvironmentCommands.ts
- Extract archives with tar on all platforms (bsdtar ships with Windows
  10+), removing the PowerShell command-string interpolation that broke on
  paths containing quotes.
- Force-refresh the Python extension's environment discovery after creating
  a venv: the default refresh is a no-op once per-session discovery ran.
- Raise the exec buffer for uv commands: verbose installs exceeded
  execFile's 1MiB default and killed otherwise successful installs.
- Anchor the bare status-code and tls/proxy patterns in the failure
  classifier so paths or package names containing them aren't reported as
  network problems.
- Fall back to the manual setup flow when the provisioner fails outside its
  steps (e.g. no active project folder) instead of leaking the error.
- Derive the suggested databricks-connect version in the installer from
  resolveComputeTargetSpec, removing the divergent duplicate mapping.
- Build the uv child environment (incl. the sync pip.conf scan) only when
  provisioning will actually run, and drop the derivable pythonMatches
  field from the venv assessment.

Co-authored-by: Isaac
@anton-107


Follow-up commit 49437c3 addresses self-review findings: archive extraction now uses tar on all platforms (no PowerShell string interpolation), environment discovery is force-refreshed after venv creation, uv exec buffer raised to 128MiB, network-failure classification patterns anchored, provisioner errors outside its steps fall back to the manual flow, and the installer's suggested-version logic now delegates to resolveComputeTargetSpec. Unit suite: 227 passing.
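
The buffer change amounts to passing an explicit maxBuffer to execFile. A minimal sketch, assuming a hypothetical `runUv` wrapper (the real helper in the PR may be shaped differently):

```typescript
import {execFile} from "node:child_process";
import {promisify} from "node:util";

const execFileAsync = promisify(execFile);

// execFile's default maxBuffer is 1 MiB; when output exceeds it the child
// is terminated and the call rejects, even if the install was succeeding.
const UV_MAX_BUFFER = 128 * 1024 * 1024; // 128 MiB

async function runUv(
    uvPath: string,
    args: string[],
    cwd: string
): Promise<{stdout: string; stderr: string}> {
    const {stdout, stderr} = await execFileAsync(uvPath, args, {
        cwd,
        maxBuffer: UV_MAX_BUFFER,
        encoding: "utf8",
    });
    return {stdout, stderr};
}
```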

Known limitation worth calling out for review: the uv download uses Node's fetch, which does not honor HTTPS_PROXY — behind a mandatory proxy the PATH-installed uv (or manual install, as the error message suggests) is the working path. Happy to wire a proxy agent if you'd rather have that in this PR.

This comment was written by Isaac.
