DB-backed sync, watermarks, and Pinecone preprocessors #136

@snowfox1003

Description

Summary

Integrate clang_github_tracker with the database: persist llvm/llvm-project issues, PRs, and commits during sync; use DB watermarks to resume GitHub API fetches (replacing state.json); and drive Pinecone issue/PR selection by comparing ClangGithubIssueItem.updated_at against PineconeSyncStatus.final_sync_at (plus retries of failed_ids). Raw JSON under workspace/raw/github_activity_tracker/ is unchanged.
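The watermark mechanism can be illustrated without Django. This is a minimal, hypothetical sketch: plain dataclasses stand in for the ClangGithubIssueItem and ClangGithubCommit models, and `max()` over in-memory rows stands in for the `Max(github_updated_at)` / `Max(github_committed_at)` aggregates. The issue does not say how start_after_watermark derives the cursor, so the one-second step below is an assumption, and the real service signatures may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class IssueItemRow:
    """Hypothetical stand-in for a ClangGithubIssueItem row."""
    number: int
    is_pull_request: bool
    github_updated_at: datetime


@dataclass
class CommitRow:
    """Hypothetical stand-in for a ClangGithubCommit row."""
    sha: str
    github_committed_at: datetime


def get_issue_item_watermark(rows: Iterable[IssueItemRow]) -> Optional[datetime]:
    # Mirrors Max(github_updated_at): None when nothing has been stored yet.
    return max((r.github_updated_at for r in rows), default=None)


def get_commit_watermark(rows: Iterable[CommitRow]) -> Optional[datetime]:
    # Mirrors Max(github_committed_at).
    return max((r.github_committed_at for r in rows), default=None)


def start_after_watermark(watermark: Optional[datetime],
                          step: timedelta = timedelta(seconds=1)) -> Optional[datetime]:
    # Assumption: the API cursor starts just past the stored maximum.
    # None means "no watermark yet", i.e. fetch from the beginning.
    return None if watermark is None else watermark + step


if __name__ == "__main__":
    utc = timezone.utc
    items = [
        IssueItemRow(1, False, datetime(2024, 5, 1, tzinfo=utc)),
        IssueItemRow(2, True, datetime(2024, 5, 3, tzinfo=utc)),
    ]
    print(start_after_watermark(get_issue_item_watermark(items)))
```

Because issues and PRs share one table and one watermark, a single cursor is enough for the unified issues + PRs fetch.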

Scope

  • Models (no FKs): ClangGithubIssueItem (number, is_pull_request, github_* timestamps, created_at / updated_at); ClangGithubCommit (sha, github_committed_at, audit timestamps).
  • Services: upsert_issue_item, upsert_commit, get_issue_item_watermark, get_commit_watermark, start_after_watermark (API cursors use Max(github_updated_at) / Max(github_committed_at)).
  • Sync: sync_raw_only upserts each item into the DB after save_*_raw_source writes its raw file; a single start_item cursor drives the unified issues + PRs fetch.
  • Date resolution: resolve_start_end_dates reads watermarks from the DB only (no state.json); it supports Boost-style --since / --until, including aliases, and an invalid range logs a warning and clears the bounds.
  • CLI: run_clang_github_tracker is aligned with Boost (--skip-*, --dry-run); --no-upload, --upload-only, and the Pinecone CLI overrides are removed; all output inside the package goes through the logger.
  • Preprocessors: query the DB with updated_at__gt=final_sync_at (or take all rows if final_sync_at is None), union the result with failed_ids, then call build_issue_document / build_pr_document on the raw files. Do not change cppa_pinecone_sync or the github_preprocess scan helpers.
  • Backfill: backfill_clang_github_tracker takes either --from-csv (default path under workspace/clang_github_tracker/) or --from-raw; the two options are mutually exclusive.
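The date-resolution rule can be sketched as follows. This is a hypothetical sketch under stated assumptions: inputs are ISO-8601 strings (naive values are treated as UTC), an explicit --since takes precedence over the DB watermark, and "clear bounds" is read as resetting both start and end to None. The Boost-style aliases are not reproduced, and the signature is illustrative rather than the real one.

```python
import logging
from datetime import datetime, timezone
from typing import Optional, Tuple

logger = logging.getLogger(__name__)


def _parse(value: Optional[str]) -> Optional[datetime]:
    # Assumption: ISO-8601 input; naive values are pinned to UTC so they
    # compare safely with timezone-aware DB watermarks.
    if not value:
        return None
    dt = datetime.fromisoformat(value)
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def resolve_start_end_dates(
    since: Optional[str],
    until: Optional[str],
    db_watermark: Optional[datetime],
) -> Tuple[Optional[datetime], Optional[datetime]]:
    # Explicit --since wins; otherwise fall back to the DB watermark (no state.json).
    start = _parse(since) or db_watermark
    end = _parse(until)
    if start is not None and end is not None and start > end:
        # Invalid range: warn through the logger and clear both bounds.
        logger.warning("since %s is after until %s; clearing both bounds", start, end)
        return None, None
    return start, end
```

Returning (None, None) on an invalid range means the caller falls back to an unbounded fetch rather than aborting, which matches "warning + clear bounds" as read here.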

Docs

  • docs/Workspace.md, Schema.md, Pinecone_preprocess_guideline.md, service_api/README.md, service_api/clang_github_tracker.md
  • Align Deployment.md / operations/github.md if they still mention state.json or removed clang flags.

Notes

  • github_updated_at: API fetch watermarks only.
  • updated_at: Pinecone incrementality only.
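The split between the two timestamps can be shown with the Pinecone selection rule. In this minimal sketch, an in-memory list stands in for the ClangGithubIssueItem queryset (the filter mirrors updated_at__gt=final_sync_at), and failed_ids is assumed to hold issue/PR numbers; `select_for_pinecone` is a hypothetical helper name, not part of the described services.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set


@dataclass
class IssueItemRow:
    """Hypothetical stand-in for a ClangGithubIssueItem row."""
    number: int
    is_pull_request: bool
    github_updated_at: datetime  # API fetch watermark only
    updated_at: datetime         # row audit timestamp; Pinecone incrementality only


def select_for_pinecone(
    rows: Iterable[IssueItemRow],
    final_sync_at: Optional[datetime],
    failed_ids: Iterable[int],
) -> Set[int]:
    rows = list(rows)
    if final_sync_at is None:
        # No previous Pinecone sync: every stored item is a candidate.
        changed = {r.number for r in rows}
    else:
        # Mirrors updated_at__gt=final_sync_at; github_updated_at is ignored here.
        changed = {r.number for r in rows if r.updated_at > final_sync_at}
    # Items that failed last time are retried regardless of timestamps.
    return changed | set(failed_ids)
```

For example, an item whose GitHub timestamp is old but which was re-upserted by a backfill still has a fresh updated_at, so it is selected for Pinecone without moving the API fetch watermark. Each selected number would then be routed to build_issue_document or build_pr_document according to is_pull_request.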

Metadata

Assignees

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
