Soft-delete for DirectoryNamespace: separate DROP/PURGE, add TTL, and make listing O(1) #7539

## Summary

Make soft-delete a first-class lifecycle in `DirectoryNamespace`: separate a logical **DROP** (mark, reversible) from a physical **PURGE** (reclaim storage), with a TTL between them. This gives a grace window so concurrent readers don't fail when a table is dropped, lets a dropped table be restored before its TTL expires, and, by relocating the delete marker, lets `list_tables` work from a single root listing (O(1) requests) instead of today's O(N) per-table probing.

This issue covers the **namespace layer only** (`lance-namespace` trait + `DirectoryNamespace` impl). Consumers (lancedb `ListingDatabase`/connection layer, periodic purge schedulers) are out of scope here and tracked separately; see *Out of scope* below.

## Motivation

`DirectoryNamespace::drop_table` currently hard-deletes immediately (`object_store.remove_dir_all`). Two problems:

1. **Concurrent drop + read fails.** An in-flight query holding open handles to a table that's being dropped sees its files vanish mid-read. There's no grace period and no way to separate "logically gone" from "bytes reclaimed."
2. **No DROP/PURGE separation, no TTL, no restore.** Drop is purge. There's no deferred reclamation and no undo.

Separately, **listing doesn't scale.** `list_tables` in directory (non-manifest) mode does one `read_dir(root)` to find `*.lance` dirs, then a **per-table** `read_dir(<name>.lance/)` (`check_table_status`) to detect the nested `.lance-deregistered` / `.lance-reserved` markers — i.e. **1 + N object-store requests** for N tables. This is a real performance cliff for large directories.

There's already a partial primitive: `deregister_table` writes a nested `.lance-deregistered` marker and `check_table_status` reads it, but it's a dead end — one-way (no restore; `register_table` is unsupported in directory mode), no purge, no TTL, and `drop_table` ignores it entirely.

## Proposal

### 1. Move the delete marker to the namespace root (the key change)

Replace the nested marker (`<root>/<name>.lance/.lance-deregistered`) with a **root-level sibling**, e.g. `<root>/<name>.deleted`, whose body is small JSON: `{ "deleted_at_ms": ..., "ttl_ms": ... }`.
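A minimal sketch of what the marker body could look like in code, assuming the two fields above are plain millisecond integers. The `DeleteMarker` type and its method names are hypothetical and not part of `lance-namespace`; the JSON is formatted by hand only to keep the example dependency-free.

```rust
/// Hypothetical in-memory form of the `<name>.deleted` marker body.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DeleteMarker {
    /// Time of the soft-delete, in milliseconds since the Unix epoch.
    pub deleted_at_ms: u64,
    /// Grace period before the table becomes eligible for purge.
    pub ttl_ms: u64,
}

impl DeleteMarker {
    /// Marker file name relative to the namespace root,
    /// e.g. `users` -> `users.deleted`.
    pub fn path_for(table_name: &str) -> String {
        format!("{}.deleted", table_name)
    }

    /// Serialize to the small JSON body described above.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"deleted_at_ms\":{},\"ttl_ms\":{}}}",
            self.deleted_at_ms, self.ttl_ms
        )
    }

    /// First instant at which the table may be purged.
    /// Saturating add avoids overflow for very large TTLs.
    pub fn purge_after_ms(&self) -> u64 {
        self.deleted_at_ms.saturating_add(self.ttl_ms)
    }

    /// True once the TTL has fully elapsed at `now_ms`.
    pub fn is_expired(&self, now_ms: u64) -> bool {
        now_ms >= self.purge_after_ms()
    }
}
```

Storing `ttl_ms` next to `deleted_at_ms` keeps each marker self-describing, so a purge job can decide eligibility from the marker alone even if the configured TTL changes later.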

Because the object store's `read_dir` is a single non-recursive listing that returns child "directories" (common prefixes) **and** direct child files, one `read_dir(root)` now reveals both the `<name>.lance` table dirs and the `<name>.deleted` markers. `list_tables` filters deleted names in memory from that single listing, so it needs **O(1) `read_dir` calls** regardless of table count (the store may still paginate a large listing internally), and the per-table probe goes away. (Today's nested marker is invisible to the root listing, which is why the per-table probe exists.)
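The in-memory filtering step could look roughly like this. It is a sketch under two assumptions: the root listing has already been fetched, and entry names are normalized to bare names without a trailing slash. `live_tables` is a hypothetical helper, not an existing function.

```rust
use std::collections::HashSet;

/// Given the child names returned by ONE non-recursive listing of the root,
/// return the names of live (not soft-deleted) tables, sorted.
pub fn live_tables(root_entries: &[&str]) -> Vec<String> {
    // Table names that have a root-level `<name>.deleted` marker.
    let deleted: HashSet<&str> = root_entries
        .iter()
        .filter_map(|e| e.strip_suffix(".deleted"))
        .collect();

    // Table dirs (`<name>.lance`) whose name is not in the deleted set.
    let mut live: Vec<String> = root_entries
        .iter()
        .filter_map(|e| e.strip_suffix(".lance"))
        .filter(|name| !deleted.contains(name))
        .map(|name| name.to_string())
        .collect();
    live.sort();
    live
}
```

For a listing of `a.lance`, `b.lance`, `b.deleted`, `c.lance`, this returns `a` and `c` with no further object-store requests.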

### 2. DROP vs PURGE in the namespace API

- **`drop_table`** becomes *soft*: atomically create the `<name>.deleted` marker (stamped with `deleted_at_ms`/`ttl_ms`), leave data intact. Reversible.
- **`purge_table` / `purge_tables(Option<Vec<Id>>)`** (new): the physical `remove_dir_all` + marker cleanup. `None` = purge all currently-purgable tables; a list = exactly those. Purge only ever acts on already-soft-deleted tables — never a live one.
- **`list_purgable_tables(deleted_before: Option<Timestamp>)`** (new): returns soft-deleted tables and their `deleted_at`. The `deleted_before` cutoff lets a caller apply TTL policy while the namespace owns the (now O(1)) listing/filtering mechanism.
- **`table_status(id)`** (new): returns one of `Exists` / `SoftDeleted{deleted_at}` / `NotFound`. (Deliberately *not* distinguishing "purged" from "never existed" — that would require retaining tombstones after purge, with their own GC. Purged == NotFound.)
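To make the intended semantics of these four operations concrete, here is a toy in-memory model. Everything in it is an assumption for illustration: `ToyNamespace`, the `u64` millisecond timestamps, and the exact signatures are hypothetical, and the real trait would use the namespace's request/response types and async object-store calls.

```rust
use std::collections::BTreeMap;

/// Hypothetical status enum mirroring the proposed `table_status` result.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TableStatus {
    Exists,
    SoftDeleted { deleted_at_ms: u64 },
    NotFound,
}

/// Toy stand-in for a namespace root: name -> soft-delete time (`None` = live).
#[derive(Default)]
pub struct ToyNamespace {
    tables: BTreeMap<String, Option<u64>>,
}

impl ToyNamespace {
    pub fn create_table(&mut self, name: &str) {
        self.tables.insert(name.to_string(), None);
    }

    /// Soft DROP: mark the table and keep its data. False if it is not live.
    pub fn drop_table(&mut self, name: &str, now_ms: u64) -> bool {
        match self.tables.get_mut(name) {
            Some(slot) if slot.is_none() => {
                *slot = Some(now_ms);
                true
            }
            _ => false,
        }
    }

    /// Purged and never-existed are deliberately both `NotFound`.
    pub fn table_status(&self, name: &str) -> TableStatus {
        match self.tables.get(name) {
            Some(None) => TableStatus::Exists,
            Some(Some(t)) => TableStatus::SoftDeleted { deleted_at_ms: *t },
            None => TableStatus::NotFound,
        }
    }

    /// Soft-deleted tables, optionally only those deleted before the cutoff.
    pub fn list_purgable_tables(&self, deleted_before: Option<u64>) -> Vec<(String, u64)> {
        self.tables
            .iter()
            .filter_map(|(name, slot)| slot.map(|t| (name.clone(), t)))
            .filter(|(_, t)| deleted_before.map_or(true, |cutoff| *t < cutoff))
            .collect()
    }

    /// Physical PURGE. `None` = every soft-deleted table; a list = exactly
    /// those. Live tables are never touched. Returns the names purged.
    pub fn purge_tables(&mut self, ids: Option<Vec<String>>) -> Vec<String> {
        let targets = ids.unwrap_or_else(|| {
            self.list_purgable_tables(None)
                .into_iter()
                .map(|(name, _)| name)
                .collect()
        });
        let mut purged = Vec::new();
        for name in targets {
            if matches!(self.tables.get(&name), Some(Some(_))) {
                self.tables.remove(&name);
                purged.push(name);
            }
        }
        purged
    }
}
```

In this model a caller applies TTL policy by passing `Some(now_ms - ttl_ms)` as `deleted_before` and feeding the result to `purge_tables`; asking to purge a live table is a no-op rather than an error, which is one possible reading of "never a live one".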

### 3. Re-create over a soft-deleted table = overwrite

`create_table` for a name that has a `.deleted` marker should clear the marker and write with overwrite semantics (preserving lineage as a new version). Today `create_table` ignores the marker and fails with `TableAlreadyExists` if data is present.
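The proposed behavior can be read as a small decision table over what the root listing shows for a name. The sketch below encodes it; `CreateAction` and `plan_create` are hypothetical names, and the error is a plain string stand-in for the real `TableAlreadyExists` error type.

```rust
/// What `create_table` would do for a given name (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum CreateAction {
    /// No data and no marker: ordinary create.
    CreateNew,
    /// A `.deleted` marker is present: clear it and write with
    /// overwrite semantics.
    OverwriteAndClearMarker,
}

/// Decide the action from two facts visible in the root listing:
/// whether `<name>.lance` exists and whether `<name>.deleted` exists.
pub fn plan_create(has_data: bool, has_marker: bool) -> Result<CreateAction, String> {
    match (has_data, has_marker) {
        // Any marker means the name is soft-deleted and may be reused.
        (_, true) => Ok(CreateAction::OverwriteAndClearMarker),
        (false, false) => Ok(CreateAction::CreateNew),
        // Live table with no marker: unchanged from today's behavior.
        (true, false) => Err("TableAlreadyExists".to_string()),
    }
}
```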

### 4. Reads respect the marker

`list_tables`, `describe_table`, and the `open`/access paths treat a soft-deleted table as `NotFound` (a clean not-found result, not an internal error).

### 5. Purge/revive race guard

A purge that fires at TTL must not delete data that a concurrent re-create just revived. Plain claim files are insufficient (no TTL on the claim → poisoned-lock risk; non-atomic check-then-delete). Proposal: use the marker itself as a **compare-and-swap arbiter** via conditional writes (`PutMode::Update` with etag): purge and revive contend on the single marker object; the loser aborts. This needs (a) conditional-write support confirmed across target object stores, and (b) `PutMode::Update`/etag plumbed through the object-store wrapper (only `PutMode::Create` is used today). A dedicated concurrency test is a must: this is the one place where a bug means data loss rather than a bounded-stale read.
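The arbiter idea can be illustrated with an in-memory stand-in for the marker object. This is a sketch only: `MarkerObject`, `MarkerState`, `put_if_match`, `try_purge`, and `try_revive` are hypothetical, a `Mutex` plus an integer etag plays the role of the store's conditional write, and no real object-store call is made.

```rust
use std::sync::Mutex;

/// Hypothetical marker body states used by the two contenders.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MarkerState {
    Deleted,
    Purging,
    Revived,
}

/// Returned when the etag no longer matches, i.e. the caller lost the race.
#[derive(Debug, PartialEq, Eq)]
pub struct Conflict;

/// In-memory stand-in for a single marker object with an etag.
pub struct MarkerObject {
    inner: Mutex<(MarkerState, u64)>, // (body, etag)
}

impl MarkerObject {
    pub fn new_deleted() -> Self {
        Self { inner: Mutex::new((MarkerState::Deleted, 1)) }
    }

    /// Read the body together with its current etag.
    pub fn get(&self) -> (MarkerState, u64) {
        *self.inner.lock().unwrap()
    }

    /// Conditional update: succeeds only if `expected_etag` is still current.
    pub fn put_if_match(&self, expected_etag: u64, new_state: MarkerState) -> Result<u64, Conflict> {
        let mut guard = self.inner.lock().unwrap();
        if guard.1 != expected_etag {
            return Err(Conflict);
        }
        *guard = (new_state, expected_etag + 1);
        Ok(guard.1)
    }
}

/// Purge side: claim the marker first, reclaim storage only after winning.
pub fn try_purge(marker: &MarkerObject) -> Result<(), Conflict> {
    let (state, etag) = marker.get();
    if state != MarkerState::Deleted {
        return Err(Conflict);
    }
    marker.put_if_match(etag, MarkerState::Purging)?;
    // Only the CAS winner reaches this point; data removal would go here.
    Ok(())
}

/// Revive side (re-create over a soft-deleted table).
pub fn try_revive(marker: &MarkerObject) -> Result<(), Conflict> {
    let (state, etag) = marker.get();
    if state != MarkerState::Deleted {
        return Err(Conflict);
    }
    marker.put_if_match(etag, MarkerState::Revived)?;
    Ok(())
}
```

Both sides read the same etag and at most one conditional write can succeed, so data removal and revival can never both proceed. The sketch does not address a purger that crashes after claiming the marker; a real design would likely need the `Purging` state to carry its own timestamp so the claim can expire.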

## Open questions (would love your take, @jackye1995)

1. **Marker naming/location.** `<root>/<name>.deleted` vs a reserved prefix vs encoding into the dir name. Anything that keeps the single-listing property.
2. **API shape.** `purge_tables(Option<list>)` (None = all) vs separate `purge_table` + `purge_all_purgable`. Where TTL policy lives (caller passes `deleted_before` vs namespace owns a TTL config).
3. **`table_status` on the trait** — worth adding as a first-class method, or keep it internal?
4. **Conditional-write reliability** for the race guard across object stores (incl. self-hosted/MinIO). Fallback when CAS isn't available?
5. **Manifest mode.** This issue is about directory (V1) mode. How should the same lifecycle look in manifest mode (where `deregister` is a manifest-row delete)? Keep them separate, or unify the surface?
6. **Backward compatibility** with any existing nested `.lance-deregistered` markers — migrate, or read both during a transition?

## Out of scope (tracked separately)

- **lancedb consumer changes:** routing the `ListingDatabase` root path through the namespace (so root tables get this behavior and the duplicated native scan/`remove_dir_all` codepath is removed), resolving `clone_table` (unsupported in the namespace wrapper today), and warm-handle detection in the read-consistency wrapper.
- **Purge scheduling** (the periodic `list_purgable_tables(now − ttl)` → `purge_tables` job) and any deployment integration.

## Non-goals

Strong cross-process consistency. After a soft-delete, fresh opens and listings are immediately correct, but already-open/cached handles elsewhere may keep serving the table until they refresh — a bounded, eventual-consistency window consistent with the existing model. The TTL grace window is precisely what prevents in-flight reads from erroring; making "drop" instantly visible everywhere is not a goal here.

