Add experimental managed Python environment provisioning with uv #1906
anton-107 wants to merge 3 commits into
Behind the "python.managedEnvironment" experimental setting (databricks.experiments.optInto), the environment setup flow provisions a working environment with uv instead of walking the user through manual steps: locate uv (PATH first, then a pinned SHA256-verified download into global storage), ensure a Python interpreter matching the selected compute, create the project .venv (seeded with pip), install databricks-connect matched to the compute plus the project's own requirements, and select the interpreter in the MS Python extension so run and debug agree. Environments the user created are respected: a satisfied environment is only selected, never modified, and a foreign .venv is only repaired or recreated after an explicit prompt. Venvs created by the extension are tagged with a marker file and can be repaired or recreated silently. Failures are classified (network blocked, interpreter unavailable, disk, cancelled) into actionable messages with a Retry button. Custom package indexes are honored via UV_INDEX_URL, PIP_INDEX_URL and pip.conf. Each funnel step emits a managedEnvironmentSetup telemetry event with duration and failure class. With the setting off (default) the setup flow is unchanged. Co-authored-by: Isaac
# Conflicts: # packages/databricks-vscode/src/language/EnvironmentCommands.ts
- Extract archives with tar on all platforms (bsdtar ships with Windows 10+), removing the PowerShell command-string interpolation that broke on paths containing quotes.
- Force-refresh the Python extension's environment discovery after creating a venv: the default refresh is a no-op once per-session discovery ran.
- Raise the exec buffer for uv commands: verbose installs exceeded execFile's 1MiB default and killed otherwise successful installs.
- Anchor the bare status-code and tls/proxy patterns in the failure classifier so paths or package names containing them aren't reported as network problems.
- Fall back to the manual setup flow when the provisioner fails outside its steps (e.g. no active project folder) instead of leaking the error.
- Derive the suggested databricks-connect version in the installer from resolveComputeTargetSpec, removing the divergent duplicate mapping.
- Build the uv child environment (incl. the sync pip.conf scan) only when provisioning will actually run, and drop the derivable pythonMatches field from the venv assessment.

Co-authored-by: Isaac
Follow-up commit 49437c3 addresses self-review findings: archive extraction now uses tar on all platforms (no PowerShell string interpolation), environment discovery is force-refreshed after venv creation, the uv exec buffer is raised to 128MiB, network-failure classification patterns are anchored, provisioner errors outside its steps fall back to the manual flow, and the installer's suggested-version logic now delegates to resolveComputeTargetSpec. Unit suite: 227 passing.

Known limitation worth calling out for review: the uv download uses Node's fetch, which does not honor HTTPS_PROXY. Behind a mandatory proxy, the PATH-installed uv (or a manual install, as the error message suggests) is the working path. Happy to wire a proxy agent if you'd rather have that in this PR.

This comment was written by Isaac.
Changes
Second part of the "Run/Debug with Databricks Connect should just work" effort (CUJ findings and design note in #1905). With databricks.experiments.optInto: ["python.managedEnvironment"], clicking Run/Debug with Databricks Connect in a fresh project provisions everything behind a single cancellable progress notification, with no manual Python environment steps:
- Resolve the target spec via resolveComputeTargetSpec (serverless 17.3 → Python 3.12 + databricks-connect==17.3.*; clusters → DBR-matched versions).
- Locate uv (UvBinaryProvider): PATH first (corp installs keep their proxy/mirror config), then a previously downloaded copy, then a pinned, SHA256-verified download from GitHub releases into globalStorageUri (macOS/Linux/Windows, x64+arm64).
- Ensure a matching Python interpreter with uv python find, falling back to uv python install (uv downloads python-build-standalone builds).
- Create the project .venv with uv venv --seed; the seeded pip keeps the extension's existing pip list introspection and other tooling working.
- Install databricks-connect matched to the compute, nbformat, then the project's own requirements*.txt / pyproject.toml dependencies.
- Select the interpreter in the MS Python extension (refreshEnvironments + updateActiveEnvironmentPath), so run, debug (pinned in #1905, "Fix Python environment issues in the Databricks Connect run/debug flow"), terminal and the status bar all agree.

Respecting user-managed environments
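To make the policy in this section concrete, here is a minimal TypeScript sketch of how a venv assessment might map to an action. The names VenvAssessment, VenvAction and decideVenvAction, and all field names, are hypothetical and not taken from the PR. Only the rules come from the description: a satisfied environment is only selected, extension-tagged venvs are repaired or recreated silently, and a foreign .venv requires an explicit prompt.

```typescript
// Hypothetical types for illustration; not the PR's actual data model.
interface VenvAssessment {
    exists: boolean; // a .venv directory is present
    managed: boolean; // the extension's marker file is present
    pythonOk: boolean; // interpreter version matches the compute
    depsOk: boolean; // required packages are installed
}

type VenvAction =
    | "create" // no .venv yet
    | "select" // already satisfied: select only, never modify
    | "repairSilently" // managed venv, reinstall dependencies
    | "recreateSilently" // managed venv, wrong Python
    | "promptUser"; // foreign venv needs changes: ask first

function decideVenvAction(a: VenvAssessment): VenvAction {
    if (!a.exists) {
        return "create";
    }
    if (a.pythonOk && a.depsOk) {
        return "select";
    }
    if (!a.managed) {
        return "promptUser";
    }
    // Managed venv: a wrong interpreter forces a recreate,
    // otherwise only the dependencies need reinstalling.
    return a.pythonOk ? "repairSilently" : "recreateSilently";
}
```

Keeping the decision a pure function of the assessment would make the "never touch a foreign venv without a prompt" rule easy to cover with a small test matrix.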
- Venvs created by the extension are tagged with the marker file .venv/databricks.json and may be repaired (reinstall deps) or recreated (wrong Python) silently.
- A foreign .venv is never modified or deleted without an explicit prompt (Install into .venv / Recreate .venv / Set up manually). "Set up manually" and unsupported computes fall back to the existing per-step flow unchanged.

Failure UX
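The custom-index handling covered in this section can be pictured with a short TypeScript sketch. It assumes a precedence of UV_INDEX_URL, then PIP_INDEX_URL, then the pip config file, which the description does not state explicitly. The function names resolveUvIndexUrl and parsePipConfIndexUrl are hypothetical, and the parser is a simplification that ignores INI sections.

```typescript
type Env = Record<string, string | undefined>;

// Extract the first index-url entry from pip.conf / pip.ini text.
// Simplification (assumption): sections such as [global] are ignored.
function parsePipConfIndexUrl(text: string): string | undefined {
    for (const raw of text.split(/\r?\n/)) {
        const line = raw.trim();
        if (line.startsWith("#") || line.startsWith(";")) {
            continue; // skip INI comments
        }
        const match = /^index-url\s*[=:]\s*(\S+)/i.exec(line);
        if (match) {
            return match[1];
        }
    }
    return undefined;
}

// Decide which index URL to hand to uv as UV_INDEX_URL.
// The precedence order here is an assumption for illustration.
function resolveUvIndexUrl(env: Env, pipConfText?: string): string | undefined {
    if (env.UV_INDEX_URL) {
        return env.UV_INDEX_URL; // passes through unchanged
    }
    if (env.PIP_INDEX_URL) {
        return env.PIP_INDEX_URL; // mapped to UV_INDEX_URL
    }
    return pipConfText ? parsePipConfIndexUrl(pipConfText) : undefined;
}
```

Taking the environment and file contents as parameters, rather than reading them inside the function, keeps the mapping testable without touching the real process environment or disk.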
Failures are classified (networkBlocked, pythonUnavailable, uvUnavailable, disk, cancelled) into specific, actionable messages with Retry / Show Logs / Set up manually buttons. Custom indexes are honored: UV_INDEX_URL passes through, PIP_INDEX_URL is mapped to UV_INDEX_URL, and the pip.conf / pip.ini index-url is parsed and forwarded (uv doesn't read pip config). The network error message points at HTTPS_PROXY / UV_INDEX_URL / UV_PYTHON_INSTALL_MIRROR.

Telemetry
New managedEnvironmentSetup event with one record per funnel step (uvAcquire, pythonInstall, venvCreate, depsInstall, interpreterSet) plus a total, carrying success, duration, failure class, compute type and venv disposition, so funnel drop-off becomes measurable.

Integration
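As a rough sketch of the gate this section describes, the check could look like the TypeScript below. shouldProvision is named in the PR, but its signature, the step identifiers, and the behavior for an empty list of failing steps are assumptions made for illustration.

```typescript
// Hypothetical step identifiers; the extension's real names may differ.
type SetupStep =
    | "cluster"
    | "unityCatalog"
    | "pythonEnvironment"
    | "pythonDependencies";

// Steps the managed provisioner is assumed to be able to fix on its own.
const PROVISIONABLE: ReadonlySet<SetupStep> = new Set<SetupStep>([
    "pythonEnvironment",
    "pythonDependencies",
]);

// Run the provisioner only when the experiment is on and every failing
// step is a Python environment/dependency step. Cluster or UC problems
// keep the manual flow.
function shouldProvision(
    experimentOn: boolean,
    failingSteps: SetupStep[]
): boolean {
    return (
        experimentOn &&
        failingSteps.length > 0 && // assumption: nothing failing, nothing to do
        failingSteps.every((step) => PROVISIONABLE.has(step))
    );
}
```

With the flag off the function returns false for every input, which matches the claim that the existing flow is untouched by default.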
The only behavioral hook is in EnvironmentCommands._setup: when the experiment is on and the failing steps are limited to the Python environment/dependencies (shouldProvision), it runs the provisioner; otherwise (and always with the flag off) the existing per-step action loop runs byte-identically. Cluster/UC problems keep the manual flow.

Risks / known limitations
- uv python install downloads from GitHub; fully offline networks need UV_PYTHON_INSTALL_MIRROR (called out in the error message). Bundling uv into the vsix was deliberately deferred (see design note in #1905, "Fix Python environment issues in the Databricks Connect run/debug flow").
- The pinned uv download covers UV_VERSION only; bumps should be deliberate.
- pyproject.toml dependency-groups (beyond [project].dependencies) are a fast-follow.

Tests
- computeTargetSpec.test.ts: spec resolution matrix (serverless, clusters, wildcard/unsupported DBR).
- EnvironmentProvisioner.test.ts: command sequences for fresh create / repair / recreate with a fake exec recorder on real tmp dirs; marker-file semantics; foreign-venv prompt branches (incl. "never touch on manual"); cleanup of half-created venvs on failure; failure classification from real uv/pip stderr shapes; PIP_INDEX_URL → UV_INDEX_URL mapping and pip.conf parsing; win32/posix interpreter paths.
- UvBinaryProvider.test.ts: PATH short-circuit, cached binary reuse, checksum-mismatch rejection, failed-download handling.
- EnvironmentCommands.test.ts: shouldProvision matrix; provisioner never invoked with the flag off.
- yarn test:lint and yarn test:unit pass (225 passing).

Manual validation script (macOS, fresh profile):
databricks bundle init default-python, select serverless, opt into the experiment, click Run with Databricks Connect → single progress notification (uv downloads Python 3.12, creates .venv, installs databricks-connect==17.3.*) → file runs; Debug hits breakpoints with the same interpreter. Negative: UV_INDEX_URL=http://blocked.example → actionable network error with Retry.

This pull request and its description were written by Isaac.