feat(showcase-ecommerce-databricks): add standalone Databricks datapack by askumar27 · Pull Request #200 · datahub-project/static-assets

askumar27 · 2026-06-03T00:08:27Z

Summary

- Adds a new `showcase-ecommerce-databricks` datapack that mirrors the existing Snowflake `showcase-ecommerce` datapack, with Databricks Unity Catalog as the warehouse platform
- Built on top of alexsku's `feat/showcase-ecommerce-add-queries` branch (#198, "feat(showcase-ecommerce): add 312 query entities for text-to-sql / anchor demo") and inherits its 312 Snowflake query entities
- Can be loaded independently or alongside the Snowflake datapack (no URN conflicts)

What's in the datapack

| File | MCPs | Content |
| --- | --- | --- |
| `01-definitions.json` | 10 | Structured properties (shared with Snowflake pack) |
| `02-shared.json` | 3,056 | All cross-platform entities (dbt, Looker, PowerBI, Tableau, Spark, Postgres, S3, users, glossary, domains, tags) with Snowflake `order_entry_db` refs remapped to Databricks |
| `03-data.json` | 777 | 14 Databricks datasets + 4 containers + 490 schema fields with full governance (tags, glossary, ownership, domains, lineage, siblings, structured properties, editable descriptions, test results, usage/storage features) |
| `04-queries.json` | 1,248 | 312 queries translated from Snowflake SQL to Databricks SQL |
| `05-context.json` | 54 | Knowledge documents (shared with Snowflake pack) |
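The `order_entry_db` remapping noted for `02-shared.json` can be pictured as a URN rewrite. The following is only a sketch under assumptions: the PR does not show the script that was used, `remap_urn` is a hypothetical helper, and the Databricks dataset names in the real datapack may differ (for example by carrying a metastore or catalog prefix).

```python
import re

# Assumed platform URNs; the dataset-name layout after the platform is also an assumption.
SNOWFLAKE_PLATFORM = "urn:li:dataPlatform:snowflake"
DATABRICKS_PLATFORM = "urn:li:dataPlatform:databricks"

# Matches Snowflake dataset URNs whose name starts with order_entry_db.
_DATASET_URN = re.compile(
    r"urn:li:dataset:\("
    + re.escape(SNOWFLAKE_PLATFORM)
    + r",(order_entry_db\.[^,]+),([A-Z]+)\)"
)


def remap_urn(text: str) -> str:
    """Rewrite Snowflake order_entry_db dataset URNs in text to the Databricks platform."""
    return _DATASET_URN.sub(
        lambda m: f"urn:li:dataset:({DATABRICKS_PLATFORM},{m.group(1)},{m.group(2)})",
        text,
    )
```

Datasets outside `order_entry_db` are left untouched, which matches the description of only the warehouse references being remapped.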

SQL dialect translation (312 queries)

| Snowflake | Databricks | Count |
| --- | --- | --- |
| `expr::TIMESTAMP` / `::DATE` | `CAST(expr AS TIMESTAMP/DATE)` | 143 |
| `DATEADD('day', ...)` | `DATEADD(day, ...)` | 45 |
| `DATEDIFF('day', ...)` | `DATEDIFF(day, ...)` | 34 |
| `TO_CHAR(..., 'YYYY-MM')` | `DATE_FORMAT(..., 'yyyy-MM')` | 1 |
| `LISTAGG(DISTINCT col, sep)` | `ARRAY_JOIN(COLLECT_SET(col), sep)` | 1 |
| `ROW_COUNT` (INFORMATION_SCHEMA) | Removed (Databricks-incompatible) | 1 |

All 312 queries validated against a live Databricks serverless warehouse.
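As an illustration of the rewrite rules in the table, here is a minimal regex-based sketch. It is not the translation tooling used for this PR (which is not shown); `translate_sql` is a hypothetical helper, it only handles simple column references on the left of `::`, and it does not cover the `ROW_COUNT` removal.

```python
import re

# (pattern, replacement) pairs mirroring the table rows; all are simplifying assumptions.
_RULES = [
    # DATEADD('day', ...) / DATEDIFF('day', ...) -> unquoted unit
    (re.compile(r"\b(DATEADD|DATEDIFF)\(\s*'(\w+)'", re.IGNORECASE), r"\1(\2"),
    # col::TIMESTAMP / col::DATE -> CAST(col AS ...), simple identifiers only
    (re.compile(r"([\w.]+)::(TIMESTAMP|DATE)\b", re.IGNORECASE), r"CAST(\1 AS \2)"),
    # TO_CHAR(x, 'YYYY-MM') -> DATE_FORMAT(x, 'yyyy-MM')
    (
        re.compile(r"\bTO_CHAR\(\s*([^,()]+?)\s*,\s*'YYYY-MM'\s*\)", re.IGNORECASE),
        r"DATE_FORMAT(\1, 'yyyy-MM')",
    ),
    # LISTAGG(DISTINCT col, sep) -> ARRAY_JOIN(COLLECT_SET(col), sep)
    (
        re.compile(
            r"\bLISTAGG\(\s*DISTINCT\s+([\w.]+)\s*,\s*('[^']*')\s*\)", re.IGNORECASE
        ),
        r"ARRAY_JOIN(COLLECT_SET(\1), \2)",
    ),
]


def translate_sql(sql: str) -> str:
    """Apply each rewrite rule in order and return the translated SQL text."""
    for pattern, replacement in _RULES:
        sql = pattern.sub(replacement, sql)
    return sql
```

A regex pass like this is brittle for nested expressions, which is one reason validating every translated query against a live warehouse, as described above, is worthwhile.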

Parity verification

Every entity type matches or exceeds the Snowflake datapack (5,143 vs 5,119 MCPs). The only intentional difference is +23 container MCPs (Databricks catalog/schema containers from ingestion).
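One way such a parity comparison could be done is to count MCPs per entity type in each datapack and report any type where the Databricks pack falls short. This is a sketch, not the check actually run for the PR; it assumes each MCP is a JSON object with an `entityType` field, and `parity_gaps` is a hypothetical helper.

```python
from collections import Counter


def count_by_entity_type(mcps):
    """Count MCP records per entity type (assumes an 'entityType' key on each record)."""
    return Counter(m["entityType"] for m in mcps)


def parity_gaps(reference, candidate):
    """Return {entity_type: (reference_count, candidate_count)} where candidate has fewer."""
    ref = count_by_entity_type(reference)
    cand = count_by_entity_type(candidate)
    return {t: (n, cand.get(t, 0)) for t, n in ref.items() if cand.get(t, 0) < n}
```

An empty result would correspond to "matches or exceeds"; extra types in the candidate, such as additional containers, are not reported as gaps.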

Test plan

- All 312 translated queries executed successfully on Databricks serverless warehouse
- Datapack loaded into local DataHub quickstart (`datahub datapack load`)
- All 5 files loaded with zero failures (4,905 events total)
- Entity-by-entity parity check against Snowflake datapack
- No Snowflake URN leaks in Databricks entities or aspect values
- Lineage verified: `analytics.order_details` shows 11 Databricks upstream tables
- Siblings verified: Databricks datasets linked to dbt counterparts
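The URN-leak item in the test plan could be checked roughly as follows. This is a sketch under assumptions: the PR does not include the script used, `find_snowflake_leaks` is a hypothetical helper, and it assumes each MCP is a JSON object with an `entityUrn` field and that Databricks entities carry `urn:li:dataPlatform:databricks` in their URN.

```python
import json


def find_snowflake_leaks(mcps, needle="urn:li:dataPlatform:snowflake"):
    """Return entityUrns of Databricks entities whose serialized MCP mentions Snowflake."""
    leaks = []
    for mcp in mcps:
        urn = mcp.get("entityUrn", "")
        # Only inspect entities that belong to the Databricks platform.
        if "urn:li:dataPlatform:databricks" not in urn:
            continue
        # Serializing the whole record also covers nested aspect values.
        if needle in json.dumps(mcp):
            leaks.append(urn)
    return leaks
```

Searching the serialized record is deliberately coarse: it flags a leak anywhere in the aspect payload without needing to know each aspect's schema.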

vercel · 2026-06-03T00:08:32Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| datahub-docs-archive | Ignored | | Jun 3, 2026 7:52pm |

Add a new `showcase-ecommerce-databricks` datapack that mirrors the existing Snowflake showcase-ecommerce datapack with Databricks as the warehouse platform.

- 14 Databricks datasets (order_entry_db.order_entry + analytics schemas)
- Full governance parity: tags, glossary terms, ownership, domains, structured properties, editable descriptions, lineage, siblings, schema fields
- 312 queries translated from Snowflake SQL to Databricks SQL dialect (CAST syntax, DATEADD/DATEDIFF units, LISTAGG -> ARRAY_JOIN, TO_CHAR -> DATE_FORMAT)
- All cross-platform entities included (dbt, Looker, PowerBI, Tableau, Spark, etc.) with Snowflake order_entry_db references remapped to Databricks
- Can be loaded independently or alongside the Snowflake datapack (no URN conflicts)

…r ingestion

Add queryUsageFeatures aspect to all 312 query entities so semantic-anchor ingestion can discover and group them. Without this aspect, queries default to exec_count=0/users=[] and get filtered by the min_distinct_users threshold.

- Cluster queries (160, 50 intents): exec_count 15-45, 3-4 users each. Users derived from SQL author comments + intent-group sharing
- Noise queries (152): exec_count 1-5, 1-2 users (correctly below threshold)
- Uses the 8 author personas from Alex's query corpus README
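The filtering behaviour described in this commit message can be sketched as below. This is illustrative only: `QueryUsage` and `passes_threshold` are hypothetical names, the real aspect schema and ingestion code are not shown in the PR, and the threshold value of 3 is an assumption inferred from "3-4 users" passing and "1-2 users" falling below it.

```python
from dataclasses import dataclass, field


@dataclass
class QueryUsage:
    """Hypothetical stand-in for the usage data attached to a query entity."""

    query_id: str
    exec_count: int = 0                         # default when no usage aspect exists
    users: list = field(default_factory=list)   # default when no usage aspect exists


def passes_threshold(usage: QueryUsage, min_distinct_users: int = 3) -> bool:
    """Keep a query only if enough distinct users ran it (threshold value assumed)."""
    return len(set(usage.users)) >= min_distinct_users


cluster = QueryUsage("q-cluster", exec_count=30, users=["u1", "u2", "u3"])
noise = QueryUsage("q-noise", exec_count=2, users=["u1"])
missing = QueryUsage("q-no-aspect")  # falls back to exec_count=0, users=[]
```

Under these assumptions only `cluster` survives the filter, which is why queries without the aspect would never be discovered.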

askumar27 force-pushed the feat/showcase-ecommerce-add-databricks branch from 7838188 to 22c9e0c Compare June 3, 2026 00:12

askumar27 wants to merge 2 commits into main from feat/showcase-ecommerce-add-databricks
