feat: add date_trunc function for time-bucketing rollups#24627
feat: add date_trunc function for time-bucketing rollups#24627VioletQwQ-0 wants to merge 24 commits into
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
XuPeng-SH
left a comment
There was a problem hiding this comment.
I’m requesting changes because this introduces observable type/semantic regressions in date_trunc.
-
The
date_trunc(varchar, timestamp)overload is registered to returnDATETIME, and the implementation appendstypes.Datetimeafter converting the timestamp into session-local wall-clock time. That drops TIMESTAMP semantics for an absolute instant. Existing timestamp-preserving temporal operators in this package returnTIMESTAMPfor timestamp inputs (for exampledate_add/date_subon timestamp), so this new overload changes downstream comparison/cast behavior in a surprising way. Please keep the overload return type asTIMESTAMPand convert the truncated local datetime back to timestamp before appending. -
The
date_trunc(varchar, date)overload accepts sub-day units such ashour,minute, andsecond, but it always converts the truncated value back toDATE. Those units therefore collapse to the original date / day truncation and become effectively misleading no-ops forDATEinputs. Please either reject sub-day units forDATEinputs or change the overload/result type so those units are representable.
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
…day units on DATE - date_trunc(varchar, timestamp) now returns TIMESTAMP instead of DATETIME, matching the behavior of date_add/date_sub which preserve timestamp semantics. - date_trunc(varchar, date) now rejects hour/minute/second units since DATE has no time component, making the operation a misleading no-op. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
XuPeng-SH
left a comment
There was a problem hiding this comment.
Blocking correctness issues:
date_trunc(..., timestamp)can return the wrong instant in repeated DST hours.
In pkg/sql/plan/function/func_binary.go, the timestamp path does:
ts.ToDatetime(loc)- truncate the wall-clock
Datetime truncated.ToTimestamp(loc)
That round-trip loses which DST fold the original instant belonged to. In a fallback transition (for example, the second 01:30 in a zone with DST rollback), truncating to hour should yield the second 01:00, but reconstructing only from the local civil fields makes the result ambiguous and can map to the wrong instant.
- The explicit
(varchar, varchar)overload hijacks ordinary string-literal calls and forces canonicalDATETIME(6)text output.
Because date_trunc uses fixedTypeMatch, a call like date_trunc('day', '2024-01-02') matches the exact (varchar, varchar) overload instead of a temporal overload. That implementation always parses as Datetime and returns truncated.String2(6), so string-literal calls are forced into YYYY-MM-DD HH:MM:SS.ffffff text semantics instead of behaving like temporal truncation on typed values.
So there are still real compatibility/correctness issues in both the timestamp semantics and the varchar overload design.
…runc - Use FromClockZone instead of ToTimestamp to correctly preserve the DST fold when converting truncated local datetime back to timestamp. In a fallback transition, the original round-trip through ToTimestamp could map to the wrong instant. - Remove the (varchar, varchar) overload so string-literal calls are routed through the temporal overloads via implicit casting, instead of being forced into Datetime(6) text semantics. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
XuPeng-SH
left a comment
There was a problem hiding this comment.
这版实现我没有挖到硬 correctness blocker,但按高标准看,当前 SQL/BVT 覆盖还不够支撑一个新的用户可见 builtin 直接合入。
最主要的缺口有两类:
- TIMESTAMP 的正向 SQL 覆盖太弱
现在 BVT 里只有一条混合查询顺手带到了date_trunc(hour, ts),但DATE_TRUNC的 TIMESTAMP 路径本身有独立的时区/DST 语义,应该至少在 SQL 层明确锁住一组直接结果:year / quarter / month / week / day / hour / minute / second。现在这部分主要靠 Go unit test,不够。\n\n2. DATE 的正向 SQL 覆盖不完整\n 当前只测了month成功和hour失败,但像year / quarter / week / day这些合法 DATE 路径并没有直接在 BVT 里钉住。对于新函数,这会留下很明显的回归窗口。\n\n另外还有几条很便宜但值回票价的 SQL case:\n-NULL传播(date_trunc(hour, null)/date_trunc(null, dt))\n- unit 大小写混用(YEAR,Hour)\n-week的年边界 case(比如 1 月 1 日附近)\n\n所以我这边的结论是:代码可以,但产品级回归面还没打够。建议把至少 TIMESTAMP + DATE 的正向 SQL coverage 补齐,再继续。
|
Updated according to the latest CHANGES_REQUESTED. Changes:
Validation:
|
|
Updated the PR according to the remaining review-risk sweep. Changes:
Validation:
|
# Conflicts: # pkg/sql/plan/function/function_id.go # pkg/sql/plan/function/function_id_test.go
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
# Conflicts: # pkg/sql/plan/function/function_id.go # pkg/sql/plan/function/function_id_test.go
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
What type of PR is this?
Which issue(s) this PR fixes:
issue #24551
What this PR does / why we need it:
Add
date_trunc(unit, temporal_value)to truncate a DATE, DATETIME, or TIMESTAMP value to a specified precision for time-series rollups and bucketing.Supported units:
year- truncate to January 1stquarter- truncate to first day of the quartermonth- truncate to first day of the monthweek- truncate to Monday of the ISO weekday- truncate to midnighthour- truncate to start of the hourminute- truncate to start of the minutesecond- truncate to the secondSupported input types:
VARCHAR/string literal)DATE,DATETIME, orTIMESTAMPIntentional non-goals / boundaries:
CAST(... AS DATE/DATETIME/TIMESTAMP).DATEinputs reject sub-day units (hour,minute,second) because the result type remainsDATE.TIMESTAMPinputs preserve TIMESTAMP result semantics and use the session time zone for truncation.Validation:
CGO_CFLAGS="-I/Users/violet/code/matrixone/cgo -I/Users/violet/code/matrixone/thirdparties/install/include" CGO_LDFLAGS="-L/Users/violet/code/matrixone/cgo -lmo -L/Users/violet/code/matrixone/thirdparties/install/lib" go test ./pkg/sql/plan/function -run 'TestDateTrunc|Test_GetFunctionByName' -count=1git diff --check -- test/distributed/cases/function/date_trunc.sql test/distributed/cases/function/date_trunc.result