ClickPipes kafka exactly once #6457

Open
genzgd wants to merge 5 commits into main from gg/clickpipes_kafka_exactly_once
Conversation

@genzgd

@genzgd genzgd commented Jun 28, 2026

Contributor

Summary

Small update to ClickPipes docs for Kafka exactly once, currently in very limited private preview. Not ready to merge.

Checklist

@genzgd genzgd changed the title from "Gg/clickpipes kafka exactly once" to "ClickPipes kafka exactly once" Jun 28, 2026
@genzgd genzgd marked this pull request as ready for review June 28, 2026 23:04
@genzgd genzgd requested review from a team as code owners June 28, 2026 23:04
Each insert block covers a contiguous range of offsets and carries a deterministic [deduplication token](/guides/developer/deduplicating-inserts-on-retries) of the form `topic:partition:firstOffset-lastOffset`. On replay, ClickPipes reproduces the same offset range and therefore the same token, so ClickHouse rejects the duplicate. Because the token depends only on the offset range, a replay is deduplicated even when the rebuilt block isn't byte-for-byte identical.
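To illustrate the token scheme described above, here is a minimal Python sketch. It is not ClickPipes code: `InsertBlock` and `insert_once` are hypothetical names, and the in-memory set is only a toy stand-in for ClickHouse's deduplication log. Only the token shape `topic:partition:firstOffset-lastOffset` comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InsertBlock:
    """Hypothetical stand-in for one insert block; not a ClickPipes type."""
    topic: str
    partition: int
    first_offset: int
    last_offset: int
    rows: tuple  # payload; deliberately ignored when building the token

    def dedup_token(self) -> str:
        # Token shape taken from the text: topic:partition:firstOffset-lastOffset
        return f"{self.topic}:{self.partition}:{self.first_offset}-{self.last_offset}"


def insert_once(seen_tokens: set, block: InsertBlock) -> bool:
    """Toy model of token deduplication: accept a block only if its token is new."""
    token = block.dedup_token()
    if token in seen_tokens:
        return False  # same offset range seen before: treated as a duplicate
    seen_tokens.add(token)
    return True


seen = set()
original = InsertBlock("events", 3, 100, 199, rows=("a", "b"))
# A replay covering the same offsets, rebuilt with a different row layout.
replay = InsertBlock("events", 3, 100, 199, rows=("b", "a"))

first_accepted = insert_once(seen, original)
replay_accepted = insert_once(seen, replay)
```

In this sketch the two blocks differ in content but share an offset range, so they produce the same token and the replay is rejected. That is the property the paragraph relies on: the token is a function of the offsets alone, not of the block's bytes.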

:::note Deduplication window
Token deduplication is bounded by the target table's [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated_deduplication_window) (the most recent 1,000 insert blocks by default) and [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated_deduplication_window_seconds). ClickHouse recognizes a replayed block as a duplicate only while its token is still within both bounds. High-throughput pipes can churn through the block-count window quickly, so we recommend raising `replicated_deduplication_window` on the target table to cover your worst-case replay delay (the time window defaults to 7 days, which is usually generous enough). Data replayed after its token has left the window is inserted again, so exactly-once isn't guaranteed in that case.

@dhtclk dhtclk Jun 29, 2026

Collaborator

Wanted to verify: it looks like `replicated_deduplication_window` defaults to 10,000 as of 25.9, and `replicated_deduplication_window_seconds` defaults to 3600 as of 25.10, according to the linked settings docs. This should probably be updated to match.

Contributor Author

Good catch, these have definitely changed from our original design. I'll update.
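For the sizing advice in the deduplication-window note, a rough back-of-the-envelope sketch in Python may help. The function names, the `headroom` factor, and the example throughput are all assumptions for illustration. The model is simplified: it treats a token as retained only while it is inside both the block-count bound and the time bound, as the note describes, and ignores when ClickHouse actually runs its cleanup.

```python
import math


def required_window_blocks(blocks_per_second: float,
                           replay_delay_seconds: float,
                           headroom: float = 2.0) -> int:
    """Estimate a replicated_deduplication_window value that covers a replay delay.

    headroom is a hypothetical safety multiplier, not a documented recommendation.
    """
    return math.ceil(blocks_per_second * replay_delay_seconds * headroom)


def replay_still_deduplicated(blocks_since_insert: int,
                              seconds_since_insert: float,
                              window_blocks: int,
                              window_seconds: float) -> bool:
    """Simplified check: the token must still be inside BOTH bounds."""
    within_count = blocks_since_insert < window_blocks
    within_time = seconds_since_insert <= window_seconds
    return within_count and within_time


# Assumed workload: 5 insert blocks per second, worst-case replay delay of 10 minutes.
suggested = required_window_blocks(blocks_per_second=5, replay_delay_seconds=600)

# Window values below are illustrative inputs, not statements about current defaults.
covered = replay_still_deduplicated(3000, 600, window_blocks=10_000, window_seconds=3600)
too_small = replay_still_deduplicated(3000, 600, window_blocks=1_000, window_seconds=3600)
too_late = replay_still_deduplicated(10, 7200, window_blocks=10_000, window_seconds=3600)
```

Under these assumed numbers, a 10-minute replay at 5 blocks per second passes 3,000 blocks, so a 1,000-block window would miss the duplicate while a 10,000-block window would catch it. A replay arriving after the time bound is missed regardless of block count, which is why both defaults matter for the wording in the note.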
