feat(showcase-ecommerce-databricks): add standalone Databricks datapack #200
Open
askumar27 wants to merge 2 commits into
Add a new `showcase-ecommerce-databricks` datapack that mirrors the existing Snowflake `showcase-ecommerce` datapack with Databricks as the warehouse platform.

- 14 Databricks datasets (`order_entry_db.order_entry` + `analytics` schemas)
- Full governance parity: tags, glossary terms, ownership, domains, structured properties, editable descriptions, lineage, siblings, schema fields
- 312 queries translated from Snowflake SQL to the Databricks SQL dialect (CAST syntax, DATEADD/DATEDIFF units, LISTAGG -> ARRAY_JOIN, TO_CHAR -> DATE_FORMAT)
- All cross-platform entities included (dbt, Looker, PowerBI, Tableau, Spark, etc.), with Snowflake `order_entry_db` references remapped to Databricks
- Can be loaded independently or alongside the Snowflake datapack (no URN conflicts)
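The "no URN conflicts" point follows from how DataHub dataset URNs embed the platform: `urn:li:dataset:(<platform urn>,<name>,<env>)`. Swapping only the platform yields a distinct URN for the same logical table name. A minimal sketch of the remap, assuming the Databricks datasets keep the same `order_entry_db.<schema>.<table>` name; `remap_dataset_urn` is a hypothetical helper, not the datapack's actual tooling.

```python
# Hypothetical helper for illustration; not part of the datapack or DataHub.
SNOWFLAKE_PLATFORM = "urn:li:dataPlatform:snowflake"
DATABRICKS_PLATFORM = "urn:li:dataPlatform:databricks"


def remap_dataset_urn(urn: str, catalog: str = "order_entry_db") -> str:
    """Move a Snowflake dataset URN under `catalog` to the Databricks platform.

    Assumes the dataset name is unchanged between platforms, so only the
    platform segment of the URN differs.
    """
    prefix = f"urn:li:dataset:({SNOWFLAKE_PLATFORM},"
    if not urn.startswith(prefix):
        return urn  # not a Snowflake dataset (e.g. Looker, dbt): leave untouched
    rest = urn[len(prefix):]  # "<name>,<env>)"
    name = rest.split(",", 1)[0]
    if name.split(".", 1)[0].lower() != catalog:
        return urn  # other Snowflake databases are not remapped
    return f"urn:li:dataset:({DATABRICKS_PLATFORM},{rest}"
```

Because the remapped URN differs from the original, both datapacks can coexist in one DataHub instance without overwriting each other's entities.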
Force-pushed from 7838188 to 22c9e0c
…r ingestion

Add the `queryUsageFeatures` aspect to all 312 query entities so semantic-anchor ingestion can discover and group them. Without this aspect, queries default to `exec_count=0` / `users=[]` and are filtered out by the `min_distinct_users` threshold.

- Cluster queries (160, 50 intents): exec_count 15-45, 3-4 users each
  - Users derived from SQL author comments + intent-group sharing
- Noise queries (152): exec_count 1-5, 1-2 users (correctly below threshold)
- Uses the 8 author personas from Alex's query corpus README
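The filtering behaviour described above can be sketched as follows. This is a minimal illustration, not the semantic-anchor ingester's code: `QueryUsage`, `is_anchor_candidate`, and the threshold value of 3 are assumptions. The commit only states that 1-2 users fall below the threshold while 3-4 users clear it, which an inclusive threshold of 3 is consistent with.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed threshold: noise queries (1-2 users) fall below it,
# cluster queries (3-4 users) clear it.
MIN_DISTINCT_USERS = 3


@dataclass
class QueryUsage:
    """Hypothetical stand-in for the fields read from queryUsageFeatures."""

    exec_count: int = 0
    users: List[str] = field(default_factory=list)


def is_anchor_candidate(usage: Optional[QueryUsage], min_users: int = MIN_DISTINCT_USERS) -> bool:
    """Return True if the query has enough distinct users to be grouped."""
    # A query without the aspect behaves like exec_count=0, users=[].
    if usage is None:
        usage = QueryUsage()
    return len(set(usage.users)) >= min_users
```

Counting `set(users)` rather than `len(users)` keeps one author running a query many times from inflating the distinct-user count.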
Summary
- Adds a new `showcase-ecommerce-databricks` datapack that mirrors the existing Snowflake `showcase-ecommerce` datapack, with Databricks Unity Catalog as the warehouse platform
- Builds on the `feat/showcase-ecommerce-add-queries` branch (inherits the 312 Snowflake query entities): feat(showcase-ecommerce): add 312 query entities for text-to-sql / anchor demo #198

What's in the datapack
- `01-definitions.json`
- `02-shared.json`: `order_entry_db` refs remapped to Databricks
- `03-data.json`
- `04-queries.json`
- `05-context.json`

SQL dialect translation (312 queries)
| Snowflake | Databricks |
|---|---|
| `expr::TIMESTAMP` / `::DATE` | `CAST(expr AS TIMESTAMP/DATE)` |
| `DATEADD('day', ...)` | `DATEADD(day, ...)` |
| `DATEDIFF('day', ...)` | `DATEDIFF(day, ...)` |
| `TO_CHAR(..., 'YYYY-MM')` | `DATE_FORMAT(..., 'yyyy-MM')` |
| `LISTAGG(DISTINCT col, sep)` | `ARRAY_JOIN(COLLECT_SET(col), sep)` |
| `ROW_COUNT` (INFORMATION_SCHEMA) | |

All 312 queries were validated against a live Databricks serverless warehouse.
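The rewrite rules in the table can be approximated with regular expressions. This is a sketch under narrow assumptions (simple column or alias arguments, no nested parentheses inside `TO_CHAR`/`LISTAGG`), not the translation script used for this PR; `translate` and `_RULES` are hypothetical names, and the `ROW_COUNT` row is not covered.

```python
import re

# (pattern, replacement) pairs mirroring the Snowflake -> Databricks table.
# Deliberately narrow: arguments must be simple, unnested expressions.
_RULES = [
    # expr::TIMESTAMP / expr::DATE  ->  CAST(expr AS TIMESTAMP/DATE)
    (re.compile(r"([\w.]+)::(TIMESTAMP|DATE)\b", re.IGNORECASE), r"CAST(\1 AS \2)"),
    # DATEADD('day', ...) / DATEDIFF('day', ...)  ->  unquoted unit keyword
    (re.compile(r"\b(DATEADD|DATEDIFF)\(\s*'(\w+)'\s*,", re.IGNORECASE), r"\1(\2,"),
    # TO_CHAR(x, 'YYYY-MM')  ->  DATE_FORMAT(x, 'yyyy-MM')
    (re.compile(r"\bTO_CHAR\(([^,()]+),\s*'YYYY-MM'\)", re.IGNORECASE),
     r"DATE_FORMAT(\1, 'yyyy-MM')"),
    # LISTAGG(DISTINCT col, sep)  ->  ARRAY_JOIN(COLLECT_SET(col), sep)
    (re.compile(r"\bLISTAGG\(\s*DISTINCT\s+([^,()]+),\s*('[^']*')\)", re.IGNORECASE),
     r"ARRAY_JOIN(COLLECT_SET(\1), \2)"),
]


def translate(sql: str) -> str:
    """Apply each rewrite rule in order to a Snowflake SQL string."""
    for pattern, repl in _RULES:
        sql = pattern.sub(repl, sql)
    return sql
```

For example, `translate("TO_CHAR(order_date, 'YYYY-MM')")` gives `DATE_FORMAT(order_date, 'yyyy-MM')`. One semantic caveat: `COLLECT_SET` does not guarantee element order, so `ARRAY_JOIN(COLLECT_SET(...))` output may list values in a different order than a Snowflake `LISTAGG` run.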
Parity verification
Every entity type matches or exceeds the Snowflake datapack (5,143 vs 5,119 MCPs). The only intentional difference is +23 container MCPs (Databricks catalog/schema containers from ingestion).

Test plan
- … (`datahub datapack load`)
- `analytics.order_details` shows 11 Databricks upstream tables
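The lineage check in the test plan could be scripted against the `upstreamLineage` aspect of `analytics.order_details`. A minimal sketch, assuming the aspect is available as a dict shaped like DataHub's `UpstreamLineage` (an `upstreams` list whose entries carry a `dataset` URN); how the payload is fetched is left out, and `count_databricks_upstreams` is a hypothetical helper.

```python
from typing import Any, Dict

DATABRICKS_DATASET_PREFIX = "urn:li:dataset:(urn:li:dataPlatform:databricks,"


def count_databricks_upstreams(upstream_lineage: Dict[str, Any]) -> int:
    """Count distinct Databricks datasets listed in an upstreamLineage payload."""
    # Each entry in `upstreams` holds the upstream dataset URN under `dataset`;
    # a set removes duplicate edges to the same table.
    urns = {u["dataset"] for u in upstream_lineage.get("upstreams", [])}
    return sum(1 for urn in urns if urn.startswith(DATABRICKS_DATASET_PREFIX))
```

The test-plan expectation would then read `count_databricks_upstreams(aspect) == 11`; filtering on the platform prefix keeps any Snowflake sibling or cross-platform upstreams out of the count.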