Fix/iceberg drop tables by hentzthename · Pull Request #16 · sidequery/dlt-iceberg

hentzthename · 2026-04-26T08:07:16Z

Hi Nico, another small one that fell out of running pipeline.extract(source, refresh="drop_resources") after pipeline.sync_destination() against a Nessie deployment.

The load layer emitted:

Client for iceberg_rest does not implement drop table.
Following tables {'x', 'y'} will not be dropped

…and silently skipped the drops, so stale tables stuck around across refreshes.

Problem

dlt core gates the per-table drop path on hasattr(job_client, "drop_tables") (dlt/load/utils.py:170). IcebergRestClient only exposed drop_storage() (a full namespace wipe) -- no drop_tables(*names, delete_schema=True) -- so the load layer fell back to the warn-and-skip branch. Net effect:

- refresh="drop_resources" / refresh="drop_sources" were effectively no-ops on this destination.
- pipeline.sql_client().drop_dataset() had no coherent per-table partner (dataset-level works via the base-class DROP SCHEMA CASCADE).
- Consumers had to reach around the destination with pyiceberg directly for destructive ops.
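
For readers unfamiliar with the gate, here is a minimal sketch of the warn-and-skip behaviour described above. It is not dlt's code: `drop_if_supported`, `ClientWithoutDrop` and `ClientWithDrop` are hypothetical stand-ins, and only the `hasattr(job_client, "drop_tables")` check and the `drop_tables(*names, delete_schema=True)` call shape come from this PR.

```python
import logging

logger = logging.getLogger("load_sketch")


class ClientWithoutDrop:
    """Stand-in for a job client that only supports a full namespace wipe."""

    def drop_storage(self) -> None:
        pass


class ClientWithDrop(ClientWithoutDrop):
    """Stand-in for a job client that also supports per-table drops."""

    def __init__(self) -> None:
        self.dropped = []

    def drop_tables(self, *tables: str, delete_schema: bool = True) -> None:
        self.dropped.extend(tables)


def drop_if_supported(job_client, tables) -> bool:
    """Hypothetical helper: drop only when the client exposes drop_tables."""
    if not tables:
        return False
    if hasattr(job_client, "drop_tables"):
        job_client.drop_tables(*sorted(tables), delete_schema=True)
        return True
    # Without the method, the drops are skipped and only a warning is logged.
    logger.warning("Client does not implement drop table. %s will not be dropped", tables)
    return False
```

With the stand-ins above, `drop_if_supported(ClientWithoutDrop(), {"x", "y"})` returns `False` and drops nothing, which is the symptom this PR addresses.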

Solution

Implement the JobClient.drop_tables contract on IcebergRestClient:

- Drop each named table via the PyIceberg catalog, swallowing NoSuchTableError so the call is idempotent (dlt may pass tables that were never physically created).
- When delete_schema=True, remove all _dlt_version rows where schema_name = self.schema.name via table.delete(EqualTo(...)), matching the SqlJobClientBase.drop_tables contract.
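
A rough sketch of the idempotent drop loop, for illustration only. `FakeCatalog` and the local `NoSuchTableError` are stand-ins so the snippet runs without PyIceberg or a catalog server; the real method lives on `IcebergRestClient` and is not reproduced here. The free function and its return value are assumptions made to keep the example testable.

```python
class NoSuchTableError(Exception):
    """Stand-in for pyiceberg.exceptions.NoSuchTableError."""


class FakeCatalog:
    """Hypothetical in-memory catalog; only drop_table is modelled."""

    def __init__(self, tables):
        self.tables = set(tables)

    def drop_table(self, identifier: str) -> None:
        if identifier not in self.tables:
            raise NoSuchTableError(identifier)
        self.tables.remove(identifier)


def drop_tables(catalog, namespace: str, *table_names: str) -> list:
    """Drop each named table; a table that does not exist is not an error."""
    dropped = []
    for name in table_names:
        identifier = f"{namespace}.{name}"
        try:
            catalog.drop_table(identifier)
        except NoSuchTableError:
            # dlt may pass tables that were never physically created.
            continue
        dropped.append(identifier)
    return dropped
```

Calling it twice with the same names is safe: the second call finds nothing to drop and returns an empty list.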

One deviation worth calling out: the obvious move would be self._delete_schema_in_storage(self.schema), but that method lives on SqlJobClientBase (not JobClientBase) and uses self.sql_client.execute_sql(...). IcebergRestClient extends JobClientBase directly, and its sql_client is a DuckDB view provider rather than a real DDL-capable client -- so the DELETE is issued via PyIceberg's row-delete API instead, reusing the pattern already at destination_client.py:1151-1153.
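
To make the row-delete idea concrete, here is a small sketch of removing `_dlt_version` rows by predicate rather than by SQL. `EqualTo` and `FakeTable` are local stand-ins that mimic the shape of `table.delete(EqualTo("schema_name", ...))`; the `matches` method and the `delete_schema_versions` helper are hypothetical and exist only so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EqualTo:
    """Stand-in for pyiceberg.expressions.EqualTo (column == literal)."""

    term: str
    literal: str

    def matches(self, row: dict) -> bool:
        # Hypothetical helper; the real expression is evaluated by PyIceberg.
        return row.get(self.term) == self.literal


@dataclass
class FakeTable:
    """Hypothetical in-memory table with a filter-based delete."""

    rows: list = field(default_factory=list)

    def delete(self, delete_filter: EqualTo) -> None:
        self.rows = [r for r in self.rows if not delete_filter.matches(r)]


def delete_schema_versions(version_table: FakeTable, schema_name: str) -> None:
    """Remove every stored version row belonging to one schema."""
    version_table.delete(EqualTo("schema_name", schema_name))
```

Rows for other schemas are left untouched, which is the behaviour the delete_schema=True contract asks for.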

Changes

`destination_client.py`

| Symbol | Change |
| --- | --- |
| `IcebergRestClient.drop_tables` (new) | Drops each named table via `catalog.drop_table(...)`; `NoSuchTableError` is swallowed. When `delete_schema=True`, deletes `_dlt_version` rows for `self.schema.name` via `version_table.delete(EqualTo("schema_name", ...))`. |

No changes to sql_client.py or schema_evolution.py.

Tests

New test_drop_tables.py covering:
- the hasattr gate (method is actually exposed on the class)
- selective drop of named tables only
- idempotent behavior on missing tables
- delete_schema=True clears _dlt_version rows for the current schema
- pipeline.run(..., refresh="drop_resources") end-to-end (the originally reported symptom)

Full suite: 175/175 passing locally (SQLite + Nessie + Polaris + Lakekeeper).

dlt core gates refresh="drop_resources" / refresh="drop_sources" on hasattr(job_client, "drop_tables") (dlt/load/utils.py). Without that method the load layer warns and silently skips the drops, which is what triggered the "Client for iceberg_rest does not implement drop table" message in the field.

Tests cover:
- method existence (the hasattr gate)
- selective drop of named tables only
- idempotent behavior for missing tables
- delete_schema=True clears _dlt_version rows for the current schema
- refresh="drop_resources" end-to-end

IcebergRestClient only exposed drop_storage() (full namespace wipe). dlt core's refresh="drop_resources" / refresh="drop_sources" path calls job_client.drop_tables(*names, delete_schema=True); without it, drops are warned-and-skipped, leaving stale tables in the destination.

The new method drops each named table via the PyIceberg catalog (swallowing NoSuchTableError for idempotence). When delete_schema=True, it deletes the _dlt_version rows for self.schema.name via table.delete(EqualTo(...)), matching the SqlJobClientBase.drop_tables contract.

_delete_schema_in_storage isn't used: it is defined on SqlJobClientBase, which IcebergRestClient does not inherit from (it extends JobClientBase directly), and it needs a real SQL client to run the DELETE statement.

nicosuave · 2026-05-13T01:52:09Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a4cb8f6d7


chatgpt-codex-connector · 2026-05-13T01:54:17Z

+        for name in table_names:
+            identifier = f"{self.config.namespace}.{name}"
+            try:
+                catalog.drop_table(identifier)


Purge Iceberg data when dropping resources

When refresh="drop_resources" or a direct drop_tables call removes a table, this only calls drop_table, which unregisters the table from the catalog but leaves the table's data and metadata files behind. For Iceberg REST catalogs PyIceberg exposes purge_table/purgeRequested specifically to delete those files, so refreshes that are meant to discard a resource can leave stale data in object storage indefinitely, which is surprising for destructive refreshes and can violate retention expectations.
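
The distinction the review draws can be sketched with a toy model. `FakeCatalog` below is hypothetical and only mirrors the behaviour described in the comment above (drop unregisters, purge also removes files); how a given REST catalog actually handles a purge request depends on the server.

```python
class FakeCatalog:
    """Hypothetical catalog tracking registered tables and their files."""

    def __init__(self):
        self.registered = {}       # identifier -> list of file paths
        self.object_store = set()  # every file currently in storage

    def create_table(self, identifier, files):
        self.registered[identifier] = list(files)
        self.object_store.update(files)

    def drop_table(self, identifier):
        # Unregister only; data and metadata files stay in storage.
        self.registered.pop(identifier)

    def purge_table(self, identifier):
        # Unregister and also delete the table's files.
        for path in self.registered.pop(identifier):
            self.object_store.discard(path)


catalog = FakeCatalog()
catalog.create_table("ns.x", ["x/data-0.parquet", "x/metadata.json"])
catalog.create_table("ns.y", ["y/data-0.parquet", "y/metadata.json"])

catalog.drop_table("ns.x")    # files for ns.x are left behind
catalog.purge_table("ns.y")   # files for ns.y are removed
leftover = sorted(catalog.object_store)
```

In this model `leftover` contains only the files of the dropped table, which is the stale-data case the review warns about.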


hentzthename added 2 commits April 26, 2026 13:06

chatgpt-codex-connector Bot reviewed May 13, 2026


fix: purge tables during drop_resources refresh

6c22ee8

nicosuave merged commit 7ad2b90 into sidequery:main Jun 19, 2026
1 check passed

nicosuave merged 3 commits into sidequery:main from hentzthename:fix/iceberg-drop-tables