
Fix/iceberg drop tables #16

Merged
nicosuave merged 3 commits into sidequery:main from hentzthename:fix/iceberg-drop-tables
Jun 19, 2026
@hentzthename

Contributor

Hi Nico, another small one that fell out of running pipeline.extract(source, refresh="drop_resources") after pipeline.sync_destination() against a Nessie deployment.

The load layer emitted:

Client for iceberg_rest does not implement drop table.
Following tables {'x', 'y'} will not be dropped

…and silently skipped the drops, so stale tables stuck around across refreshes.

Problem

dlt core gates the per-table drop path on hasattr(job_client, "drop_tables") (dlt/load/utils.py:170). IcebergRestClient only exposed drop_storage() (a full namespace wipe) -- no drop_tables(*names, delete_schema=True) -- so the load layer fell back to the warn-and-skip branch. Net effect:

  • refresh="drop_resources" / refresh="drop_sources" were effectively no-ops on this destination.
  • pipeline.sql_client().drop_dataset() had no coherent per-table partner (dataset-level works via the base-class DROP SCHEMA CASCADE).
  • Consumers had to reach around the destination with pyiceberg directly for destructive ops.
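
For illustration, a minimal sketch of the kind of hasattr gate described above. This is not dlt's actual code: ClientWithoutDrop, ClientWithDrop, and drop_if_supported are hypothetical names, and only the warn-and-skip shape is taken from the behavior observed in the field.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("load_sketch")


class ClientWithoutDrop:
    """Hypothetical client exposing only a full-namespace wipe."""

    def drop_storage(self) -> None:
        pass


class ClientWithDrop(ClientWithoutDrop):
    """Hypothetical client that also implements per-table drops."""

    def __init__(self) -> None:
        self.dropped: list[str] = []

    def drop_tables(self, *tables: str, delete_schema: bool = True) -> None:
        self.dropped.extend(tables)


def drop_if_supported(job_client: object, tables: set[str]) -> bool:
    """Return True if the drop was dispatched, False if it was skipped."""
    if not tables:
        return False
    if hasattr(job_client, "drop_tables"):
        # Per-table drop path: only taken when the method exists.
        job_client.drop_tables(*sorted(tables), delete_schema=True)
        return True
    # Warn-and-skip branch: the tables stay in the destination.
    logger.warning(
        "Client does not implement drop table. "
        "Following tables %s will not be dropped",
        tables,
    )
    return False
```

With a client like ClientWithoutDrop the call returns False and nothing is dropped, which is the silent no-op this PR removes.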

Solution

Implement the JobClient.drop_tables contract on IcebergRestClient:

  • Drop each named table via the PyIceberg catalog, swallowing NoSuchTableError so the call is idempotent (dlt may pass tables that were never physically created).
  • When delete_schema=True, remove all _dlt_version rows where schema_name = self.schema.name via table.delete(EqualTo(...)), matching the SqlJobClientBase.drop_tables contract.

One deviation worth calling out: the obvious move would be self._delete_schema_in_storage(self.schema), but that method lives on SqlJobClientBase (not JobClientBase) and uses self.sql_client.execute_sql(...). IcebergRestClient extends JobClientBase directly, and its sql_client is a DuckDB view provider rather than a real DDL-capable client -- so the DELETE is issued via PyIceberg's row-delete API instead, reusing the pattern already at destination_client.py:1151-1153.
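
A rough sketch of the control flow, for readers who don't want to open the diff. It runs without PyIceberg installed: NoSuchTableError, FakeCatalog, and FakeVersionTable are in-memory stand-ins (assumptions, not the real classes), and delete_where_schema_name is a hypothetical helper standing in for the table.delete(EqualTo("schema_name", ...)) call used in the actual change.

```python
from __future__ import annotations


class NoSuchTableError(Exception):
    """Stand-in for pyiceberg.exceptions.NoSuchTableError."""


class FakeCatalog:
    """In-memory stand-in for a PyIceberg catalog (only drop_table is modeled)."""

    def __init__(self, tables: set[str]) -> None:
        self.tables = set(tables)

    def drop_table(self, identifier: str) -> None:
        if identifier not in self.tables:
            raise NoSuchTableError(identifier)
        self.tables.remove(identifier)


class FakeVersionTable:
    """Stand-in for the _dlt_version table; rows are plain dicts."""

    def __init__(self, rows: list[dict]) -> None:
        self.rows = list(rows)

    def delete_where_schema_name(self, schema_name: str) -> None:
        # Hypothetical helper; the PR issues a PyIceberg row delete instead.
        self.rows = [r for r in self.rows if r["schema_name"] != schema_name]


def drop_tables(
    catalog: FakeCatalog,
    version_table: FakeVersionTable,
    namespace: str,
    schema_name: str,
    *table_names: str,
    delete_schema: bool = True,
) -> None:
    for name in table_names:
        try:
            catalog.drop_table(f"{namespace}.{name}")
        except NoSuchTableError:
            # Idempotent: the table may never have been physically created.
            pass
    if delete_schema:
        # Remove stored schema versions for the current schema only.
        version_table.delete_where_schema_name(schema_name)
```

Calling drop_tables twice with the same names is safe in this sketch, since the second pass only hits the swallowed NoSuchTableError branch.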


Changes

destination_client.py

Symbol | Change
IcebergRestClient.drop_tables (new) | Drops each named table via catalog.drop_table(...); NoSuchTableError is swallowed. When delete_schema=True, deletes _dlt_version rows for self.schema.name via version_table.delete(EqualTo("schema_name", ...)).

No changes to sql_client.py or schema_evolution.py.


Tests

  • New test_drop_tables.py covering:
    • the hasattr gate (method is actually exposed on the class)
    • selective drop of named tables only
    • idempotent behavior on missing tables
    • delete_schema=True clears _dlt_version rows for the current schema
    • pipeline.run(..., refresh="drop_resources") end-to-end (the originally reported symptom)
  • Full suite: 175/175 passing locally (SQLite + Nessie + Polaris + Lakekeeper).

dlt core gates refresh="drop_resources" / refresh="drop_sources" on
hasattr(job_client, "drop_tables") (dlt/load/utils.py). Without that
method the load layer warns and silently skips the drops, which is
what triggered the "Client for iceberg_rest does not implement drop
table" message in the field.

Tests cover:
- method existence (the hasattr gate)
- selective drop of named tables only
- idempotent behavior for missing tables
- delete_schema=True clears _dlt_version rows for the current schema
- refresh="drop_resources" end-to-end

IcebergRestClient only exposed drop_storage() (full namespace wipe).
dlt core's refresh="drop_resources" / refresh="drop_sources" path calls
job_client.drop_tables(*names, delete_schema=True); without it, drops
are warned-and-skipped, leaving stale tables in the destination.

Drops each named table via the PyIceberg catalog (swallowing
NoSuchTableError for idempotence). When delete_schema=True, wipes
_dlt_version rows for self.schema.name via table.delete(EqualTo(...)),
matching the SqlJobClientBase.drop_tables contract. Inherited
_delete_schema_in_storage isn't used because IcebergRestClient extends
JobClientBase directly, not SqlJobClientBase, and would need a real SQL
client to run the DELETE statement.

@nicosuave

Member

@codex review

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a4cb8f6d7


Comment thread: src/dlt_iceberg/destination_client.py (outdated)
for name in table_names:
    identifier = f"{self.config.namespace}.{name}"
    try:
        catalog.drop_table(identifier)


[P2] Purge Iceberg data when dropping resources

When refresh="drop_resources" or a direct drop_tables call removes a table, this only calls drop_table, which unregisters the table from the catalog but leaves the table's data and metadata files behind. For Iceberg REST catalogs PyIceberg exposes purge_table/purgeRequested specifically to delete those files, so refreshes that are meant to discard a resource can leave stale data in object storage indefinitely, which is surprising for destructive refreshes and can violate retention expectations.
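
A small sketch of the distinction being raised. RecordingCatalog is a hypothetical stand-in that only records which call was made; the drop_table and purge_table names mirror the PyIceberg catalog methods mentioned above, and the purge flag on remove_table is an assumption, not something this PR adds.

```python
from __future__ import annotations


class RecordingCatalog:
    """Hypothetical stand-in that records which catalog call was made."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def drop_table(self, identifier: str) -> None:
        # Unregisters the table; files would remain in object storage.
        self.calls.append(("drop", identifier))

    def purge_table(self, identifier: str) -> None:
        # Unregisters the table and requests deletion of its files.
        self.calls.append(("purge", identifier))


def remove_table(catalog: RecordingCatalog, identifier: str, purge: bool = False) -> None:
    """Drop a table, optionally asking the catalog to purge its files too."""
    if purge:
        catalog.purge_table(identifier)
    else:
        catalog.drop_table(identifier)
```

Whether a destructive refresh should purge by default is a design choice: purging matches the intent of discarding a resource, while a plain drop leaves files recoverable.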


@nicosuave merged commit 7ad2b90 into sidequery:main on Jun 19, 2026
1 check passed