fix: store Lance tables in HMS as external tables #146
Merged
HMS silently rewrites tableType EXTERNAL_TABLE to MANAGED_TABLE when the EXTERNAL=TRUE table parameter is not set. Set the parameter in Hive2/Hive3 table creation, and since HMS never deletes data for external tables, drop metadata only and delete the Lance dataset explicitly on dropTable. Fixes lance-format#145
Fixes #145
HMS silently rewrites `tableType=EXTERNAL_TABLE` to `MANAGED_TABLE` when the `EXTERNAL=TRUE` table parameter is not set (`ObjectStore.convertToMTable` in Hive 2.3.9 and 3.1.3), so tables registered by `Hive2Namespace`/`Hive3Namespace` ended up as managed tables. `DropTable` purged data only because of this: HMS deletes data on drop only for managed tables.

Changes:

- Set the `EXTERNAL=TRUE` table parameter alongside `tableType=EXTERNAL_TABLE` in `Hive2Namespace` and `Hive3Namespace`, so HMS stores and reports `EXTERNAL_TABLE`.
- `doDropTable` now drops metadata with `deleteData=false` and deletes the Lance dataset explicitly (best-effort `Dataset.drop`, mirroring `GlueNamespace`). This keeps purge working for pre-existing tables stored as managed, and `deregisterTable` still preserves data.
- `hive2.md`/`hive3.md`: document the `EXTERNAL=TRUE` requirement, correct the DropTable description, and align Lance table identification with the code (parameter-based, so pre-fix tables stay recognized).
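
For reviewers who want the behaviour in one place, here is a minimal, self-contained sketch of the two rules this PR relies on. It is a simplified model written for illustration, not code from Hive or from this repository: the class `HmsTableTypeSketch` and the helpers `storedTableType` and `hmsDeletesData` are hypothetical names, and the real check in `ObjectStore.convertToMTable` may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model only; names here are hypothetical, not Hive or Lance APIs. */
class HmsTableTypeSketch {

    /**
     * Models the rewrite described above: a table requested as EXTERNAL_TABLE
     * is stored as MANAGED_TABLE unless the EXTERNAL=TRUE parameter is present.
     */
    static String storedTableType(String requestedType, Map<String, String> params) {
        boolean external = "TRUE".equalsIgnoreCase(params.get("EXTERNAL"));
        if ("EXTERNAL_TABLE".equals(requestedType) && !external) {
            return "MANAGED_TABLE"; // the silent rewrite
        }
        return requestedType;
    }

    /**
     * Models the drop rule: HMS removes data only for managed tables,
     * and only when the caller asks for it via deleteData.
     */
    static boolean hmsDeletesData(String storedType, boolean deleteData) {
        return deleteData && "MANAGED_TABLE".equals(storedType);
    }

    public static void main(String[] args) {
        // Before the fix: tableType is set, but the EXTERNAL parameter is missing.
        Map<String, String> preFix = new HashMap<>();
        // After the fix: the parameter is set alongside tableType.
        Map<String, String> postFix = new HashMap<>();
        postFix.put("EXTERNAL", "TRUE");

        String preType = storedTableType("EXTERNAL_TABLE", preFix);
        String postType = storedTableType("EXTERNAL_TABLE", postFix);
        System.out.println(preType);  // MANAGED_TABLE
        System.out.println(postType); // EXTERNAL_TABLE

        // Old drop path relied on HMS purging data, which only works for managed tables.
        System.out.println(hmsDeletesData(preType, true));   // true
        System.out.println(hmsDeletesData(postType, true));  // false

        // New drop path passes deleteData=false, so HMS never touches data and
        // the namespace must delete the dataset itself for both kinds of table.
        System.out.println(hmsDeletesData(preType, false));  // false
        System.out.println(hmsDeletesData(postType, false)); // false
    }
}
```

The last two lines are the reason for the explicit dataset delete in `doDropTable`: once `deleteData=false` is passed, the outcome no longer depends on whether a table was stored as managed (pre-fix) or external (post-fix).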