Summary
Integrate clang_github_tracker with the database: persist llvm/llvm-project issues, PRs, and commits during sync, use DB watermarks for GitHub API resume (replacing state.json), and drive Pinecone issue/PR selection from ClangGithubIssueItem.updated_at vs PineconeSyncStatus.final_sync_at (plus failed_ids retries). Raw JSON under workspace/raw/github_activity_tracker/ is unchanged.
Scope
- Models (no FKs):
ClangGithubIssueItem (number, is_pull_request, github_* timestamps, created_at / updated_at); ClangGithubCommit (sha, github_committed_at, audit timestamps).
- Services:
upsert_issue_item, upsert_commit, get_issue_item_watermark, get_commit_watermark, start_after_watermark (API cursors use Max(github_updated_at) / Max(github_committed_at)).
- Sync:
sync_raw_only upserts after save_*_raw_source; single start_item for unified issues+PRs fetch.
- Date resolution:
resolve_start_end_dates — DB-only; Boost-style --since / --until (aliases, invalid range → warning + clear bounds).
- CLI:
run_clang_github_tracker aligned with Boost (--skip-*, --dry-run); removed --no-upload, --upload-only, Pinecone CLI overrides; logger-only output in package.
- Preprocessors:
DB query updated_at__gt=final_sync_at (or all rows if final_sync_at is None), union failed_ids, then build_issue_document / build_pr_document from raw files. Do not change cppa_pinecone_sync or github_preprocess scan helpers.
- Backfill:
backfill_clang_github_tracker — --from-csv (default path under workspace/clang_github_tracker/) xor --from-raw.
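The watermark-based resume listed under Services can be pictured with a minimal sketch. It is not the package's code: `newest_timestamp` and `resume_point` are hypothetical stand-ins for the watermark getters and `start_after_watermark`, and the one-second step past the watermark is an assumption about how the cursor avoids refetching the last stored item.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def newest_timestamp(timestamps: Iterable[Optional[datetime]]) -> Optional[datetime]:
    """Return the newest non-null timestamp, mirroring Max(github_updated_at).

    Returns None when nothing has been stored yet (empty table).
    """
    known = [ts for ts in timestamps if ts is not None]
    return max(known) if known else None


def resume_point(
    watermark: Optional[datetime],
    step: timedelta = timedelta(seconds=1),  # assumed granularity, not from the PR
) -> Optional[datetime]:
    """Return where the next API fetch should start.

    None means "no watermark, fetch from the beginning".
    """
    if watermark is None:
        return None
    return watermark + step


if __name__ == "__main__":
    stored = [
        datetime(2024, 1, 1, tzinfo=timezone.utc),
        None,  # row whose GitHub timestamp was never recorded
        datetime(2024, 3, 1, tzinfo=timezone.utc),
    ]
    print(resume_point(newest_timestamp(stored)))
```

Returning None for an empty table keeps the first run and later runs on the same code path: the caller only has to treat a missing start as "fetch everything".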
Docs
- docs/Workspace.md, Schema.md, Pinecone_preprocess_guideline.md, service_api/README.md, service_api/clang_github_tracker.md
- Align Deployment.md / operations/github.md if they still mention state.json or removed clang flags.
Notes
github_updated_at: API fetch watermarks only.
updated_at: Pinecone incrementality only.
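To make the split between the two timestamps concrete, here is a small in-memory sketch of the Pinecone selection rule: rows with updated_at newer than final_sync_at, or every row when final_sync_at is None, plus anything listed in failed_ids. The `IssueRow` dataclass and `select_for_pinecone` function are hypothetical, and treating failed_ids as issue numbers is an assumption; the real preprocessor runs this as a DB query.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class IssueRow:
    """Illustrative stand-in for a ClangGithubIssueItem row."""
    number: int
    is_pull_request: bool
    updated_at: datetime  # audit timestamp, used only for Pinecone incrementality


def select_for_pinecone(
    rows: Iterable[IssueRow],
    final_sync_at: Optional[datetime],
    failed_ids: Set[int],  # assumed to hold issue/PR numbers
) -> List[IssueRow]:
    """Pick rows changed since the last successful sync, plus earlier failures."""
    picked = [
        row
        for row in rows
        if final_sync_at is None          # never synced: take everything
        or row.updated_at > final_sync_at  # changed since the last sync
        or row.number in failed_ids        # retry earlier failures
    ]
    return sorted(picked, key=lambda row: row.number)


if __name__ == "__main__":
    rows = [
        IssueRow(1, False, datetime(2024, 1, 1, tzinfo=timezone.utc)),
        IssueRow(2, True, datetime(2024, 2, 1, tzinfo=timezone.utc)),
        IssueRow(3, False, datetime(2024, 3, 1, tzinfo=timezone.utc)),
    ]
    last_sync = datetime(2024, 2, 15, tzinfo=timezone.utc)
    print([r.number for r in select_for_pinecone(rows, last_sync, {1})])
```

Because the filter never reads github_updated_at, a row re-saved locally (for example by a backfill) is picked up for Pinecone even if nothing changed on GitHub.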