Skip to content

feat: add date_trunc function for time-bucketing rollups#24627

Open
VioletQwQ-0 wants to merge 24 commits into
matrixorigin:mainfrom
VioletQwQ-0:violet/issue-24551-date-trunc
Open

feat: add date_trunc function for time-bucketing rollups#24627
VioletQwQ-0 wants to merge 24 commits into
matrixorigin:mainfrom
VioletQwQ-0:violet/issue-24551-date-trunc

Conversation

@VioletQwQ-0

@VioletQwQ-0 VioletQwQ-0 commented May 27, 2026

Copy link
Copy Markdown
Collaborator

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #24551

What this PR does / why we need it:

Add date_trunc(unit, temporal_value) to truncate a DATE, DATETIME, or TIMESTAMP value to a specified precision for time-series rollups and bucketing.

Supported units:

  • year - truncate to January 1st
  • quarter - truncate to first day of the quarter
  • month - truncate to first day of the month
  • week - truncate to Monday of the ISO week
  • day - truncate to midnight
  • hour - truncate to start of the hour
  • minute - truncate to start of the minute
  • second - truncate to the second

Supported input types:

  • first argument: string unit (VARCHAR/string literal)
  • second argument: DATE, DATETIME, or TIMESTAMP

Intentional non-goals / boundaries:

  • String timestamp values are not accepted as the second argument; callers should use an explicit CAST(... AS DATE/DATETIME/TIMESTAMP).
  • DATE inputs reject sub-day units (hour, minute, second) because the result type remains DATE.
  • TIMESTAMP inputs preserve TIMESTAMP result semantics and use the session time zone for truncation.
  • Week truncation uses ISO week semantics (Monday as first day of week).

Validation:

  • CGO_CFLAGS="-I/Users/violet/code/matrixone/cgo -I/Users/violet/code/matrixone/thirdparties/install/include" CGO_LDFLAGS="-L/Users/violet/code/matrixone/cgo -lmo -L/Users/violet/code/matrixone/thirdparties/install/lib" go test ./pkg/sql/plan/function -run 'TestDateTrunc|Test_GetFunctionByName' -count=1
  • git diff --check -- test/distributed/cases/function/date_trunc.sql test/distributed/cases/function/date_trunc.result

@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@mergify mergify Bot added the kind/feature label May 27, 2026
@matrix-meow matrix-meow added the size/M Denotes a PR that changes [100,499] lines label May 27, 2026
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m requesting changes because this introduces observable type/semantic regressions in date_trunc.

  1. The date_trunc(varchar, timestamp) overload is registered to return DATETIME, and the implementation appends types.Datetime after converting the timestamp into session-local wall-clock time. That drops TIMESTAMP semantics for an absolute instant. Existing timestamp-preserving temporal operators in this package return TIMESTAMP for timestamp inputs (for example date_add/date_sub on timestamp), so this new overload changes downstream comparison/cast behavior in a surprising way. Please keep the overload return type as TIMESTAMP and convert the truncated local datetime back to timestamp before appending.

  2. The date_trunc(varchar, date) overload accepts sub-day units such as hour, minute, and second, but it always converts the truncated value back to DATE. Those units therefore collapse to the original date / day truncation and become effectively misleading no-ops for DATE inputs. Please either reject sub-day units for DATE inputs or change the overload/result type so those units are representable.

@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

…day units on DATE

- date_trunc(varchar, timestamp) now returns TIMESTAMP instead of DATETIME,
  matching the behavior of date_add/date_sub which preserve timestamp semantics.
- date_trunc(varchar, date) now rejects hour/minute/second units since DATE
  has no time component, making the operation a misleading no-op.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking correctness issues:

  1. date_trunc(..., timestamp) can return the wrong instant in repeated DST hours.

In pkg/sql/plan/function/func_binary.go, the timestamp path does:

  • ts.ToDatetime(loc)
  • truncate the wall-clock Datetime
  • truncated.ToTimestamp(loc)

That round-trip loses which DST fold the original instant belonged to. In a fallback transition (for example, the second 01:30 in a zone with DST rollback), truncating to hour should yield the second 01:00, but reconstructing only from the local civil fields makes the result ambiguous and can map to the wrong instant.

  1. The explicit (varchar, varchar) overload hijacks ordinary string-literal calls and forces canonical DATETIME(6) text output.

Because date_trunc uses fixedTypeMatch, a call like date_trunc('day', '2024-01-02') matches the exact (varchar, varchar) overload instead of a temporal overload. That implementation always parses as Datetime and returns truncated.String2(6), so string-literal calls are forced into YYYY-MM-DD HH:MM:SS.ffffff text semantics instead of behaving like temporal truncation on typed values.

So there are still real compatibility/correctness issues in both the timestamp semantics and the varchar overload design.

VioletQwQ-0 and others added 3 commits June 8, 2026 10:54
…runc

- Use FromClockZone instead of ToTimestamp to correctly preserve the DST
  fold when converting truncated local datetime back to timestamp. In a
  fallback transition, the original round-trip through ToTimestamp could
  map to the wrong instant.

- Remove the (varchar, varchar) overload so string-literal calls are
  routed through the temporal overloads via implicit casting, instead of
  being forced into Datetime(6) text semantics.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这版实现我没有挖到硬 correctness blocker,但按高标准看,当前 SQL/BVT 覆盖还不够支撑一个新的用户可见 builtin 直接合入。

最主要的缺口有两类:

  1. TIMESTAMP 的正向 SQL 覆盖太弱
    现在 BVT 里只有一条混合查询顺手带到了 date_trunc(hour, ts),但 DATE_TRUNC 的 TIMESTAMP 路径本身有独立的时区/DST 语义,应该至少在 SQL 层明确锁住一组直接结果:year / quarter / month / week / day / hour / minute / second。现在这部分主要靠 Go unit test,不够。\n\n2. DATE 的正向 SQL 覆盖不完整\n 当前只测了 month 成功和 hour 失败,但像 year / quarter / week / day 这些合法 DATE 路径并没有直接在 BVT 里钉住。对于新函数,这会留下很明显的回归窗口。\n\n另外还有几条很便宜但值回票价的 SQL case:\n- NULL 传播(date_trunc(hour, null) / date_trunc(null, dt))\n- unit 大小写混用(YEAR, Hour)\n- week 的年边界 case(比如 1 月 1 日附近)\n\n所以我这边的结论是:代码可以,但产品级回归面还没打够。建议把至少 TIMESTAMP + DATE 的正向 SQL coverage 补齐,再继续。

@VioletQwQ-0

Copy link
Copy Markdown
Collaborator Author

Updated according to the latest CHANGES_REQUESTED.

Changes:

  • Added direct SQL/BVT coverage for DATE_TRUNC on TIMESTAMP for year / quarter / month / week / day / hour / minute / second.
  • Added DATE positive coverage for year / quarter / week / day, keeping the existing month and sub-day rejection cases.
  • Added NULL propagation coverage for timestamp input and NULL unit.
  • Added mixed/uppercase unit coverage and a week year-boundary case.

Validation:

  • CGO_CFLAGS="-I/Users/violet/code/matrixone/cgo -I/Users/violet/code/matrixone/thirdparties/install/include" CGO_LDFLAGS="-L/Users/violet/code/matrixone/cgo -lmo -L/Users/violet/code/matrixone/thirdparties/install/lib" go test ./pkg/sql/plan/function -run 'TestDateTrunc|Test_GetFunctionByName' -count=1\n- git diff --check -- test/distributed/cases/function/date_trunc.sql test/distributed/cases/function/date_trunc.result\n- Local temporary MO smoke on the new SQL cases with time_zone = +00:00.

@VioletQwQ-0

Copy link
Copy Markdown
Collaborator Author

Updated the PR according to the remaining review-risk sweep.

Changes:

  • Clarified the PR body semantic boundary: the second argument supports DATE/DATETIME/TIMESTAMP only; string timestamp values must be explicitly cast and are an intentional non-goal.
  • Added BVT coverage for bare NULL second argument: date_trunc('hour', null).
  • Added BVT coverage for DATE sub-day rejection on minute and second, in addition to the existing hour case.
  • Fixed the SCA failure reported by the current CI run: removed the ineffectual initial assignment in dateTruncCheck.
  • Synced the PR branch with latest upstream/main.

Validation:

  • CGO_CFLAGS="-I/Users/violet/code/matrixone/cgo -I/Users/violet/code/matrixone/thirdparties/install/include" CGO_LDFLAGS="-L/Users/violet/code/matrixone/cgo -lmo -L/Users/violet/code/matrixone/thirdparties/install/lib" go test ./pkg/sql/plan/function -run 'TestDateTrunc|Test_GetFunctionByName' -count=1
  • git diff --check -- pkg/sql/plan/function/func_binary.go test/distributed/cases/function/date_trunc.sql test/distributed/cases/function/date_trunc.result
  • ineffassign ./pkg/sql/plan/function

# Conflicts:
#	pkg/sql/plan/function/function_id.go
#	pkg/sql/plan/function/function_id_test.go
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

# Conflicts:
#	pkg/sql/plan/function/function_id.go
#	pkg/sql/plan/function/function_id_test.go
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/feature size/L Denotes a PR that changes [500,999] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants