Fix Python environment issues in the Databricks Connect run/debug flow #1905

Open: anton-107 wants to merge 2 commits into main from dbconnect-env-quick-wins

@anton-107

Changes

Quick wins from a live CUJ run (2026-06-12, fresh macOS profile, default-python bundle project, serverless): every "run/debug with Databricks Connect" capability works, but users get lost in the Python environment setup in between. This PR fixes the three failure modes that don't depend on any larger architectural decision. A follow-up PR adds opt-in managed environment provisioning (see design note below).

1. Run/debug interpreter split-brain (P0)

"Run with Databricks Connect" launches dbconnect-bootstrap.py with MsPythonExtensionWrapper.getPythonExecutable() (the active environment for the project), but "Debug with Databricks Connect" started a debugpy session without a python attribute, so debugpy fell back to the MS Python extension's folder-level interpreter selection. With a system Python 3.9 selected there, debug crashed with a raw ModuleNotFoundError: No module named 'databricks' traceback while run worked fine.

  • RunCommands.debugFileUsingDbconnect now resolves the same executable as the run path and pins it via the debug configuration's python attribute.
  • checkDbconnectEnabled (shared run+debug preflight) now force-re-verifies the environment when the selected interpreter changed since the cached check, so a mismatch surfaces the normal actionable setup flow instead of a bootstrap traceback.
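
A minimal sketch of the two changes above, not the code in this PR. The helper names `buildDebugConfig` and `needsReverify` and the `DebugConfig` shape are hypothetical; only the debugpy `python` attribute and the "re-verify when the selected interpreter changed" rule come from the description. The `connected` guard reflects the follow-up commit, which skips the re-check while disconnected.

```typescript
// Hypothetical subset of a debugpy launch configuration; only the fields
// used in this sketch are listed.
interface DebugConfig {
    type: "debugpy";
    request: "launch";
    name: string;
    program: string;
    python?: string;
}

// Pin the interpreter so debugpy cannot fall back to the folder-level
// selection of the MS Python extension.
function buildDebugConfig(program: string, pythonExecutable: string): DebugConfig {
    return {
        type: "debugpy",
        request: "launch",
        name: "Debug with Databricks Connect",
        program,
        python: pythonExecutable,
    };
}

// Re-verify only when the interpreter differs from the one the cached check
// ran against, and only while connected (the check blocks on a connection).
function needsReverify(
    cachedInterpreter: string | undefined,
    currentInterpreter: string,
    connected: boolean
): boolean {
    return connected && cachedInterpreter !== currentInterpreter;
}
```

With the same executable passed to both the run path and `buildDebugConfig`, run and debug cannot diverge, whichever interpreter the folder-level selection points at.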

2. Pip-less venvs broke the extension itself (P0)

The MS Python extension's env-creation flow can produce a venv without pip. Our own introspection (python -m pip list --format json) then failed with "No module named pip", taking the whole "Python Environment" checklist down with it. Pip commands now seed pip via python -m ensurepip --upgrade and retry once when they fail with "No module named pip" (native pip path only; uv-managed environments are not affected). Pip command failures now also include the stderr tail instead of just "Command exited with code 1".
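
The seed-and-retry behavior can be sketched as follows. This is an illustration under assumptions, not the extension's code: `runPip`, `Runner`, `CommandResult`, and the five-line stderr tail are hypothetical, and the sketch is synchronous with an injected runner so it stays independent of any process API. Rethrowing the original error when `ensurepip` itself fails follows the follow-up commit.

```typescript
interface CommandResult {
    exitCode: number;
    stderr: string;
}

// Injected command runner (hypothetical), so the sketch needs no process API.
type Runner = (python: string, args: string[]) => CommandResult;

const NO_PIP = "No module named pip";

// Keep only the last few stderr lines for the error message.
function stderrTail(stderr: string, maxLines = 5): string {
    return stderr.trim().split("\n").slice(-maxLines).join("\n");
}

function toError(result: CommandResult): Error {
    return new Error(
        `Command exited with code ${result.exitCode}: ${stderrTail(result.stderr)}`
    );
}

function runPip(run: Runner, python: string, pipArgs: string[]): CommandResult {
    const first = run(python, ["-m", "pip", ...pipArgs]);
    if (first.exitCode === 0) {
        return first;
    }
    if (!first.stderr.includes(NO_PIP)) {
        throw toError(first);
    }
    // pip is missing: seed it, then retry exactly once.
    const seed = run(python, ["-m", "ensurepip", "--upgrade"]);
    if (seed.exitCode !== 0) {
        // ensurepip itself failed: surface the original pip error.
        throw toError(first);
    }
    const second = run(python, ["-m", "pip", ...pipArgs]);
    if (second.exitCode !== 0) {
        throw toError(second);
    }
    return second;
}
```

Retrying once keeps the failure mode bounded: if pip is still unusable after seeding, the caller sees the second error with its stderr tail instead of looping.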

3. "Python 3.12 or greater" is the wrong requirement (P1)

Databricks Connect needs the local Python minor version to match the remote one (UDF/pickle compatibility). Python 3.13 satisfied the old "3.12 or greater" check for serverless and then broke at runtime.

  • The DBR→Python mapping now lives in one table (computeTargetSpec.ts: 13/14→3.10, 15→3.11, 16+→3.12), shared by the check and the messaging.
  • Serverless now requires an exact minor match, and the message explains why ("the local minor version must match the Python version of the serverless environment").
  • Cluster mismatches keep warning-level reporting (non-UDF workloads still work), with corrected copy.
  • The "Select Python Environment" quick pick and the "Create new environment" handoff to the MS Python extension now surface the version requirement, since the generic creation flow knows nothing about it (its "Recommended" Quick Create happily offers Python 3.9).
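
A simplified sketch of the mapping and the two severity levels described above. It is not the contents of `computeTargetSpec.ts`: the function names, the `Severity` type, and the handling of runtimes below DBR 13 are assumptions. Accepting a local version that cannot be resolved follows the follow-up commit.

```typescript
type Severity = "ok" | "warning" | "error";

interface PythonVersion {
    major: number;
    minor: number;
}

// Mapping as stated in this PR: DBR 13/14 -> 3.10, 15 -> 3.11, 16+ -> 3.12.
function targetPythonForDbr(dbrMajor: number): PythonVersion | undefined {
    if (dbrMajor >= 16) {
        return { major: 3, minor: 12 };
    }
    if (dbrMajor === 15) {
        return { major: 3, minor: 11 };
    }
    if (dbrMajor === 13 || dbrMajor === 14) {
        return { major: 3, minor: 10 };
    }
    // Older runtimes are not covered by the description (assumption: unknown).
    return undefined;
}

function checkLocalPython(
    local: PythonVersion | undefined,
    target: PythonVersion,
    serverless: boolean
): Severity {
    // A local version that cannot be resolved is accepted (historic behavior).
    if (local === undefined) {
        return "ok";
    }
    if (local.major === target.major && local.minor === target.minor) {
        return "ok";
    }
    // Serverless needs an exact minor match; clusters only warn.
    return serverless ? "error" : "warning";
}
```

For example, local Python 3.13 against a 3.12 serverless target yields `"error"` here, where the old "3.12 or greater" check let it through.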

Design note: making Run/Debug with Databricks Connect "just work"

Current flow map (who resolves the interpreter where)

| Path | Interpreter resolution (before this PR) |
| --- | --- |
| Verification checklist (EnvironmentDependenciesVerifier) | MS Python API getActiveEnvironmentPath(activeProjectUri), resolved |
| Run with dbconnect (runFileUsingDbconnect) | same as verification (getPythonExecutable()), sent to a terminal |
| Debug with dbconnect (debugFileUsingDbconnect) | debugpy's own fallback = MS Python selected interpreter for the workspace folder |
| Env creation ("Create new environment") | delegated to python.createEnvironment, unaware of our version constraint |
| Dependency install | python -m pip (or uv pip if uv.lock present) in the active env |

This PR makes the first three rows resolve the interpreter identically, and the shared preflight re-verifies the environment when the cached check is stale.

Compute → Python/dbconnect version matrix

| Compute | Python (local) | databricks-connect |
| --- | --- | --- |
| Serverless (serverlessDbconnectVersion = 17.3) | 3.12 (exact match required) | 17.3.* |
| Serverless 15.4 | 3.11 (exact match required) | 15.4.* |
| Cluster DBR 13/14 | 3.10 (warning on mismatch) | <major>.<minor>.* |
| Cluster DBR 15 | 3.11 (warning on mismatch) | <major>.<minor>.* |
| Cluster DBR 16+ | 3.12 (warning on mismatch) | <major>.<minor>.* |

Follow-up: managed environment provisioning (separate PR, experimental)

The remaining CUJ failures (env creation handoff, opaque pip failures behind corp proxies, interpreter discovery) are addressed by an opt-in managed flow (databricks.experiments.optInto: ["python.managedEnvironment"]):

  • Engine: uv, acquired PATH-first (respects corp-managed installs and their proxy/mirror config), with download-on-demand into globalStorageUri (pinned version, SHA256-verified, mirroring the fetch-databricks-cli.sh pattern) and graceful fallback to the current manual flow. Bundling uv in the vsix was considered and deferred: it adds ~15–20MB per platform and doesn't rescue offline networks (interpreter and package downloads would fail anyway).
  • Flow (single cancellable progress notification): resolve target versions from the table above → uv python install <X.Y> if needed → uv venv .venv --python <X.Y> --seed (seeded pip keeps the existing introspection working) → uv pip install databricks-connect==<matched> nbformat + project deps (requirements*.txt / pyproject.toml) → set the MS Python active environment so run, debug, terminal and status bar all agree.
  • Respecting user environments: a satisfied env is never touched; venvs we created are tagged with a marker file and can be repaired/recreated silently; foreign .venvs are only modified after an explicit prompt (Repair / Recreate / Set up manually).
  • Failure UX: failures are classified (network blocked / no matching interpreter / disk / cancelled) into actionable messages with Retry, honoring UV_INDEX_URL, PIP_INDEX_URL, pip.conf index-url, and UV_PYTHON_INSTALL_MIRROR; telemetry per funnel step so drop-off becomes measurable.
  • Risks: uv python install downloads from GitHub (corporate networks that block it need the mirror env var, which the error message calls out); the MS Python extension can lag on env discovery (mitigated by calling refreshEnvironments() before selection); shipping the flag default-off limits the KTLO risk.
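
The command sequence in the Flow bullet can be sketched as a plan builder. This is an illustration for the follow-up PR, not its implementation: `buildUvPlan` and `ProvisionPlanInput` are hypothetical names, passing project requirement files via `-r` is an assumption about how "project deps" would be installed, and the sketch assumes the commands run from the project root so that `uv pip install` picks up `.venv`.

```typescript
interface ProvisionPlanInput {
    pythonVersion: string; // e.g. "3.12", taken from the version matrix
    dbconnectVersion: string; // e.g. "17.3" (major.minor of the target)
    requirementFiles: string[]; // e.g. ["requirements.txt"]
    interpreterInstalled: boolean; // true when a matching Python is present
}

// Returns the commands in execution order, one argv array per step.
function buildUvPlan(input: ProvisionPlanInput): string[][] {
    const plan: string[][] = [];
    if (!input.interpreterInstalled) {
        plan.push(["uv", "python", "install", input.pythonVersion]);
    }
    // --seed installs pip into the venv so pip-based introspection keeps working.
    plan.push(["uv", "venv", ".venv", "--python", input.pythonVersion, "--seed"]);
    const install = [
        "uv",
        "pip",
        "install",
        `databricks-connect==${input.dbconnectVersion}.*`,
        "nbformat",
    ];
    for (const file of input.requirementFiles) {
        install.push("-r", file);
    }
    plan.push(install);
    return plan;
}
```

Building the plan as data before running anything would make it straightforward to show the steps in a single progress notification and to unit-test the version pinning without invoking uv.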

Related prior art: #1886 (thanks @twsl) extends the same verifier to support global interpreters — different scope, no conflict in intent; happy to rebase whichever lands second.

Tests

  • New unit tests: computeTargetSpec.test.ts (full version matrix), EnvironmentDependenciesVerifier.test.ts (serverless exact-match incl. 3.13 rejection, cluster warning/rejection, copy), MsPythonExtensionWrapper.test.ts (ensurepip seeding semantics), RunCommands.test.ts (debug config pinning, stale-interpreter re-verification).
  • yarn test:lint and yarn test:unit pass (189 passing).
  • Manual: macOS, default-python bundle, serverless — with system Python 3.9 selected, debug now fails preflight with the actionable setup flow instead of a bootstrap traceback; after switching to .venv, run and debug both use .venv.

This pull request and its description were written by Isaac.

Three fixes for the dbconnect setup funnel:

- Pin the debugpy "python" attribute to the interpreter verified by the
  environment checks, and re-verify before run/debug when the selected
  interpreter changed since the cached check. Previously "debug" used the
  MS Python extension's folder-level interpreter selection while "run" and
  the verification used the active environment for the project, so debug
  could crash with a raw ModuleNotFoundError from dbconnect-bootstrap.py.
- Seed pip via "python -m ensurepip --upgrade" and retry once when a pip
  command fails with "No module named pip" (e.g. venvs created without
  pip). Also include the stderr tail in pip command errors.
- Replace the "Python 3.12 or greater" guidance with exact-match
  semantics: serverless now requires the local minor version to match the
  remote environment (3.13 used to pass the check and break at runtime),
  cluster mismatches keep warning-level reporting with corrected copy.
  The DBR-to-Python mapping now lives in one table (computeTargetSpec.ts).
  The interpreter picker and the create-environment handoff now surface
  the version requirement.

Co-authored-by: Isaac
- Keep accepting environments whose Python version the MS Python extension
  cannot resolve, matching the historic behavior, instead of rejecting them.
- Don't force a re-verification of a stale interpreter while disconnected:
  the check blocks on an established connection.
- Rethrow the original "No module named pip" error when ensurepip itself
  fails (e.g. Debian system pythons), and derive the native-pip flag in
  getPipCommandAndArgs instead of re-inferring it at every call site.
- Make the environment setup command return whether the environment is
  ready, so run/debug can proceed right after a successful setup instead of
  requiring a second click.
- Fetch the requirement hint for the interpreter picker with a timeout so
  the picker can't be blocked by a pending connection.

Co-authored-by: Isaac
@anton-107 anton-107 temporarily deployed to test-trigger-is June 12, 2026 15:36 — with GitHub Actions Inactive
@github-actions

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/vscode

Inputs:

  • PR number: 1905
  • Commit SHA: d3daba4e6f0c6053c24d6e7d0a589bfa13035b5c

Checks will be approved automatically on success.

@anton-107

Follow-up commit d3daba4 addresses self-review findings: environments with an unresolvable Python version are accepted again (historic behavior), the stale-interpreter re-check is skipped while disconnected (it blocks on an established connection), ensurepip failures rethrow the original error, the setup command now returns whether the environment is ready (so run/debug can proceed right after a successful setup), and the interpreter-picker requirement hint is fetched with a timeout. Unit suite: 191 passing.

This comment was written by Isaac.

rugpanov added a commit that referenced this pull request Jun 19, 2026
## Why
- **"Debug with Databricks Connect" used the wrong interpreter.** It
started a `debugpy` session **without a pinned interpreter**, so debugpy
fell back to the Python extension's folder-level selection — which can
differ from the environment used by "run" and by the verification
checks. With e.g. a system Python 3.9 selected there, debug crashed with
a raw `ModuleNotFoundError: No module named 'databricks'` while run
worked fine.
- **Stale interpreter checks.** The cached environment state can refer
to a previously selected interpreter, because the Python extension
doesn't always notify us when the interpreter changes — so a mismatch
surfaced as a bootstrap traceback instead of the actionable setup flow.

## What
- Pin debugpy to the verified interpreter via the debug configuration's
`python` attribute (`DatabricksPythonDebugConfiguration.python`).
- Resolve the interpreter once in a shared run/debug preflight
(`resolveDbconnectLaunch`), so run and debug provably use the **same**
executable (removes the duplicated preamble).
- Re-verify the environment before launch when the selected interpreter
changed since the cached check — only while `CONNECTED`, since a
re-check blocks on the connection.
- After an in-flow environment setup, re-check and let the launch
**proceed** instead of aborting and forcing the user to re-trigger.

## Verification
- New/updated unit tests in `RunCommands.test.ts`: debug-config pinning,
stale-interpreter re-verification, no-reverify-while-disconnected, and
proceed-after-setup. RunCommands suite passes (6 tests) in the VS Code
test host.
- `eslint` and `prettier` clean on the changed files.

## Notes
This is the first of three slices carved out of #1905 (run/debug
interpreter split-brain). The pip-less-venv and
Python-version-requirement fixes follow in separate PRs.

This pull request and its description were written by Isaac.