Release staging to main #457

Merged
izadoesdev merged 44 commits into main from staging
May 16, 2026

@izadoesdev izadoesdev commented May 16, 2026

Summary

  • Uptime: fix probe falsely reporting redirecting sites as DOWN — every http→https / apex→www site was tripping Too many redirects (>0). Adds followRedirects: false option to safeFetch and switches the probe to use it so the outer loop handles redirects as intended.
  • Insights: queued generation service, evlog coverage + instrumentation cleanup, stale-worker recovery, redesigned generation controls, configurable defaults, compiled-worker top-level-await fix, deploy CI.
  • Query perf: per-batch and per-website concurrency limits; cut row-multiplying JOINs in summary/links/errors builders; constrain profile and session event lookups.
  • API/RPC: centralize public query access; harden insight generation RPC; materialize default insight generation config.
  • Status page: public status page refresh + monitor interaction cleanup.
  • Misc: configurable tracking warnings, dashboard cockpit simplification, readonly-query clickhouse settings, AI query/schema alignment, shared-button reuse.

Test plan

  • Verify uptime monitors that previously reported Too many redirects (>0) (e.g. databuddy.cc → www.databuddy.cc) now report UP after deploy
  • Confirm insights generation flow end-to-end: trigger, queue, worker pickup, completion, evlog visible
  • Spot-check analytics query latency on a high-volume website after concurrency limits + JOIN changes
  • Public status page renders and reflects live monitor state
  • No regression in dashboard insights cockpit and tracking warnings UI

Summary by cubic

Ships a dedicated @databuddy/insights worker for queued insight generation, refreshes the public status page, and fixes uptime redirect false alarms. Also adds query concurrency controls, optimizes analytics builders, centralizes public query access in @databuddy/ai, swaps Unkey env vars to Railway, hardens the worker by initializing run IDs before queuing jobs, stabilizes dashboard analytics e2e, and simplifies filter plumbing.

  • New Features

    • Dedicated @databuddy/insights app with BullMQ queue, scheduler, stale-run recovery, rollups, evlog; CI health checks and Docker image.
    • RPC: new insightGeneration router and expanded insights; legacy API insights routes removed. Public query access now in @databuddy/ai (canReadQueryTypesPublicly) and used by apps/api.
    • Dashboard: Insight Generation Settings (tools/depth/frequency/defaults); narratives render via streamdown.
    • Status page: redesigned monitor cards and interactive 90‑day uptime history.
    • Uptime: stop flagging HTTP→HTTPS/www redirects as DOWN via followRedirects: false; Railway deployment metadata.
    • Performance: per-batch and per-website query concurrency; builder fixes to remove row-multiplying joins and constrain lookups; ClickHouse read settings aligned in builder; compile attributed filters for custom queries.
    • Security: per‑site ignored tracking origins and a toggle to disable tracking warnings.
    • AI/schema: deep runs can return up to 10 insight cards.
    • Release polish: invalidate legacy insights API caches; stabilize Redis test mocks; initialize insight run IDs before queueing jobs; reset and clear e2e analytics seed data with retry‑aware scoping and test timeouts; gate e2e helper APIs behind DATABUDDY_E2E_MODE + key; restore local e2e access guards; simplify dashboard filter plumbing and prevent filter URL sync loops.
  • Migration

    • Deploy the new insights service (see insights.Dockerfile, docker-compose.selfhost.yml; port INSIGHTS_PORT, default 4002).
    • Set env vars: INSIGHTS_* (dispatch/maintenance/stale/workers), optional INSIGHTS_BULLMQ_REDIS_URL (falls back to BULLMQ_REDIS_URL), and SUPERMEMORY_API_KEY.
    • Run DB migrations for new insights tables and indexes.
    • For Railway deploys, replace legacy Unkey envs with Railway equivalents: use APP_ENV/RAILWAY_ENVIRONMENT_NAME and RAILWAY_* vars for uptime and insights logging/metadata.

Written for commit c8fa804. Summary will update on new commits.

izadoesdev added 30 commits May 15, 2026 21:57
Lets callers opt out of automatic redirect following so they can
inspect 3xx responses directly instead of receiving a thrown
"Too many redirects" error.

The probe passed maxRedirects: 0 to safeFetch expecting it to return
3xx responses for the outer loop to follow manually. safeFetch instead
threw "Too many redirects (>0)" on any single redirect, causing every
http→https or apex→www site to be reported DOWN. Switch to the new
followRedirects: false flag so the outer loop sees the 3xx and follows
it through MAX_REDIRECTS hops as intended.
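
A minimal sketch of the outer redirect loop described above. The real signature of safeFetch is not shown in this PR, so `fetchOnce` is a hypothetical stand-in for a single call made with followRedirects: false, and the value of MAX_REDIRECTS is assumed. For brevity the sketch also assumes Location values are absolute URLs; a real probe would resolve relative ones against the current URL.

```typescript
// Hypothetical result of one non-following request; stands in for
// safeFetch(url, { followRedirects: false }), whose real shape is not shown.
type HopResult = { status: number; location: string | null };
type FetchOnce = (url: string) => Promise<HopResult>;

type ProbeResult =
  | { up: true; finalUrl: string; hops: number }
  | { up: false; reason: string };

// Assumed value: the PR names MAX_REDIRECTS but does not state it.
const MAX_REDIRECTS = 5;

async function probe(
  startUrl: string,
  fetchOnce: FetchOnce,
  maxRedirects: number = MAX_REDIRECTS
): Promise<ProbeResult> {
  let url = startUrl;
  // Up to maxRedirects redirects are followed, so at most maxRedirects + 1 requests.
  for (let hops = 0; hops <= maxRedirects; hops++) {
    const res = await fetchOnce(url);
    const location = res.location;
    if (res.status >= 300 && res.status < 400 && location !== null) {
      // The loop, not the fetch helper, decides whether to follow the 3xx.
      url = location;
      continue;
    }
    return res.status < 400
      ? { up: true, finalUrl: url, hops }
      : { up: false, reason: `HTTP ${res.status}` };
  }
  return { up: false, reason: `Too many redirects (>${maxRedirects})` };
}
```

Calling `probe` with `maxRedirects` set to 0 fails on the first 3xx, which mirrors the bug: a budget of zero redirects inside the fetch helper left the outer loop nothing to follow.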

Whitespace and ordering cleanups across builder configs (devices.ts,
performance.ts, vitals.ts), the percentageOf type field ordering, and
extraction of two regex constants in SimpleQueryBuilder.getColumnAlias.
No behavior changes.

events_by_date and summary_metrics aggregated session_agg via a JOIN
that materialised one row per session per bucket. Adding a
per-bucket pre-aggregation (session_by_bucket / session_summary) lets
the final JOIN run small-to-small and drops the per-session row
materialisation entirely. medianIf is replaced with quantileTDigestIf
on the same path.

outbound_links and outbound_domains LEFT JOIN-ed analytics.events on
session_id with a ±60s window. With multiple events per click in the
window, each click row was multiplied, so COUNT(*) reported clicks
times matched events. The joined event-side columns were pulled but
never returned. Dropping the JOIN fixes the count and removes the
biggest contributor to those builders' memory cost.
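
The inflation can be reproduced in memory. This is an illustrative sketch only: the row shapes and function names are invented for the example and are not the builder's actual schema or query.

```typescript
// Hypothetical row shapes, for illustration only.
type Click = { sessionId: string; href: string; ts: number };
type PageEvent = { sessionId: string; ts: number };

const WINDOW_MS = 60 * 1000; // the ±60s join window

// Emulates COUNT(*) over clicks LEFT JOIN events ON session_id within the window.
function countWithJoin(clicks: Click[], events: PageEvent[]): number {
  let rows = 0;
  for (const click of clicks) {
    const matches = events.filter(
      (e) =>
        e.sessionId === click.sessionId &&
        Math.abs(e.ts - click.ts) <= WINDOW_MS
    ).length;
    // A LEFT JOIN keeps an unmatched click as one row and emits one row per match.
    rows += Math.max(1, matches);
  }
  return rows;
}

// Emulates COUNT(*) over clicks alone, as after the JOIN is dropped.
function countWithoutJoin(clicks: Click[]): number {
  return clicks.length;
}

const clicks: Click[] = [
  { sessionId: "s1", href: "https://a.example", ts: 100000 },
  { sessionId: "s2", href: "https://b.example", ts: 500000 },
];
const events: PageEvent[] = [
  { sessionId: "s1", ts: 90000 },
  { sessionId: "s1", ts: 100000 },
  { sessionId: "s1", ts: 130000 },
  { sessionId: "s1", ts: 400000 }, // outside the window, not matched
];

const joined = countWithJoin(clicks, events); // 4: three matches for s1 plus one unmatched row for s2
const plain = countWithoutJoin(clicks); // 2: the real click count
```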

error_summary scanned analytics.error_spans twice through two CTEs
that produced identical uniq(session_id). Folded into one CTE
(error_stats.affectedSessions) and reused in the error-rate
calculation.

ClickHouse OOM telemetry over the last 7 days shows 386 failures, all
hitting the same ~55 GiB total-memory ceiling, with two websites
accounting for 60% of them. The failures follow concurrent load:
single queries are well under 300 MiB at p99, but a dashboard load
fans 10+ widgets out at once and several heavy users do that in
parallel.

executeBatch already groups requests by schema and runs each group as
a single UNION, but Promise.all sends every group to ClickHouse at once.
Replacing that with a small worker-pool helper (mapWithConcurrency)
caps in-flight groups per batch at BATCH_GROUP_CONCURRENCY (default
3, env-tunable).
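
A worker-pool helper of this kind might look like the sketch below. Only the name mapWithConcurrency and the default of 3 come from the commit message; the signature and body are assumptions, and the env lookup is omitted.

```typescript
// Default taken from the commit message; reading it from the environment is left out.
const BATCH_GROUP_CONCURRENCY = 3;

// Runs fn over items with at most `limit` calls in flight, preserving result order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // shared cursor; safe because JavaScript runs one callback at a time

  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  };

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In this sketch a rejection from `fn` rejects the returned promise, while the remaining workers keep draining the list. Whether the real helper behaves the same way is not stated in the PR.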

executeDynamicQuery in apps/api now wraps executeBatch with a tiny
in-process keyed semaphore (runPerWebsite) so a single project
cannot have more than PER_WEBSITE_QUERY_CONCURRENCY (default 8,
env-tunable) batches in flight at once. Excess requests queue
instead of bursting the cluster.
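
One way to build such a keyed semaphore is sketched here. The class and method names are hypothetical; only runPerWebsite and the default of 8 appear in the commit message, and the actual implementation may differ.

```typescript
// Default taken from the commit message.
const PER_WEBSITE_QUERY_CONCURRENCY = 8;

// In-process semaphore with an independent slot count and FIFO wait queue per key.
class KeyedSemaphore {
  private readonly limit: number;
  private readonly active = new Map<string, number>();
  private readonly waiters = new Map<string, Array<() => void>>();

  constructor(limit: number) {
    this.limit = limit;
  }

  async run<T>(key: string, task: () => Promise<T>): Promise<T> {
    await this.acquire(key);
    try {
      return await task();
    } finally {
      this.release(key);
    }
  }

  private acquire(key: string): Promise<void> {
    const count = this.active.get(key) ?? 0;
    if (count < this.limit) {
      this.active.set(key, count + 1);
      return Promise.resolve();
    }
    // No free slot: park the caller until a running task releases one.
    return new Promise<void>((resolve) => {
      const queue = this.waiters.get(key) ?? [];
      queue.push(resolve);
      this.waiters.set(key, queue);
    });
  }

  private release(key: string): void {
    const queue = this.waiters.get(key);
    const nextWaiter = queue?.shift();
    if (nextWaiter) {
      // Hand the slot straight to the next waiter; the active count is unchanged.
      if (queue && queue.length === 0) this.waiters.delete(key);
      nextWaiter();
      return;
    }
    const count = (this.active.get(key) ?? 1) - 1;
    if (count <= 0) this.active.delete(key);
    else this.active.set(key, count);
  }
}

// Hypothetical wrapper in the spirit of runPerWebsite.
const websiteSemaphore = new KeyedSemaphore(PER_WEBSITE_QUERY_CONCURRENCY);
function runPerWebsite<T>(websiteId: string, task: () => Promise<T>): Promise<T> {
  return websiteSemaphore.run(websiteId, task);
}
```

Because the state lives in process memory, a limit like this applies per API instance rather than across the whole deployment.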

Together these protect the cluster from concurrent-load spikes while
keeping cached refreshes (use_query_cache=1 is already on) at ~1 ms.

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 1 file (changes from recent commits).

Shadow auto-approve: would not auto-approve. Auto-approval blocked by 13 unresolved issues from previous reviews.

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 4 files (changes from recent commits).

Shadow auto-approve: would not auto-approve. Auto-approval blocked by 13 unresolved issues from previous reviews.

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 2 files (changes from recent commits).

Shadow auto-approve: would not auto-approve. Auto-approval blocked by 13 unresolved issues from previous reviews.

@vercel vercel Bot temporarily deployed to staging – documentation May 16, 2026 13:34 Inactive
@railway-app railway-app Bot temporarily deployed to Databuddy / production May 16, 2026 13:34 Inactive

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 2 files (changes from recent commits).

Shadow auto-approve: would not auto-approve. Auto-approval blocked by 13 unresolved issues from previous reviews.

cubic-dev-ai Bot commented May 16, 2026

You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment @cubic-dev-ai review.
