feat(extraction): C/C++ statement extraction #676
Merged
shivasurya merged 2 commits into `main` on May 3, 2026
SafeDep Report Summary: No dependency changes detected. Nothing to scan. This report is generated by SafeDep GitHub App.
Code Pathfinder Security Scan: No security issues detected. Powered by Code Pathfinder.
Codecov Report: ❌ Patch coverage is …

Coverage diff:

| | main | #676 | +/- |
|---|---|---|---|
| Coverage | 85.35% | 85.39% | +0.03% |
| Files | 184 | 187 | +3 |
| Lines | 26751 | 27164 | +413 |
| Hits | 22834 | 23197 | +363 |
| Misses | 3040 | 3078 | +38 |
| Partials | 877 | 889 | +12 |
This was referenced May 3, 2026
Add ExtractCStatements and ExtractCppStatements that walk a parsed
function body and produce one *core.Statement per recognised
construct (declaration, assignment, call, return, if/for/while/
do/switch, plus throw/try/range-for in C++).
Statements capture def-use:
- assignment: Def is the LHS variable (subscript and arrow paths
collapse to the base name); Uses are RHS identifiers and any
LHS index expressions.
- call: Uses are the receiver (for obj.method()) and arguments;
CallTarget is the bare callee, CallChain is the dotted /
qualified form ("obj.method", "ns::func").
- control flow: condition identifiers in Uses; bodies and else
clauses recurse into NestedStatements / ElseBranch.
The C and C++ extractors share every dispatcher via clikeExtractor
in statements_clike.go; the C++ wrapper plugs in throw_statement,
try_statement (with caught variable as Def of an empty assignment),
and for_range_loop (loop variable as Def, iterable as Uses) through
an extraNodeHandler hook. Keyword filtering routes through
clike.IsCKeyword / clike.IsCppKeyword so language-specific
reserved words never leak into Uses.
Sets up Statement input for the CFG builder (PR-10) and the
future variable-dependency graph.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
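The shared-dispatcher design can be sketched as follows, assuming a toy `Node` type in place of tree-sitter nodes. This `clikeExtractor` is a hypothetical reconstruction from the description, not the code in `statements_clike.go`; the keyword maps are small illustrative samples standing in for `clike.IsCKeyword` / `clike.IsCppKeyword`.

```go
package main

// Node is a hypothetical stand-in for a tree-sitter node; only the grammar
// type name matters for this sketch.
type Node struct {
	Type string
}

// Stmt is a hypothetical placeholder for *core.Statement.
type Stmt struct {
	Type string
}

// clikeExtractor: one shared dispatcher, a per-language keyword predicate,
// and an optional hook for language-specific node types.
type clikeExtractor struct {
	isKeyword        func(name string) bool
	extraNodeHandler func(n *Node) (*Stmt, bool) // nil for C
}

// extract dispatches shared C/C++ node types first, then defers to the hook.
func (e *clikeExtractor) extract(n *Node) *Stmt {
	switch n.Type {
	case "declaration":
		return &Stmt{Type: "assignment"}
	case "return_statement":
		return &Stmt{Type: "return"}
	case "if_statement":
		return &Stmt{Type: "if"}
		// for/while/do/switch cases would sit here as well.
	}
	if e.extraNodeHandler != nil {
		if st, ok := e.extraNodeHandler(n); ok {
			return st
		}
	}
	return nil
}

// filterUses drops reserved words so they never leak into Uses.
func (e *clikeExtractor) filterUses(names []string) []string {
	var out []string
	for _, name := range names {
		if !e.isKeyword(name) {
			out = append(out, name)
		}
	}
	return out
}

// Illustrative keyword samples, not the full clike keyword tables.
var cKeywords = map[string]bool{"int": true, "return": true, "sizeof": true}
var cppOnlyKeywords = map[string]bool{"this": true, "delete": true, "new": true, "throw": true}

func newCExtractor() *clikeExtractor {
	return &clikeExtractor{isKeyword: func(s string) bool { return cKeywords[s] }}
}

func newCppExtractor() *clikeExtractor {
	return &clikeExtractor{
		isKeyword: func(s string) bool { return cKeywords[s] || cppOnlyKeywords[s] },
		extraNodeHandler: func(n *Node) (*Stmt, bool) {
			switch n.Type {
			case "throw_statement":
				return &Stmt{Type: "raise"}, true
			case "for_range_loop":
				return &Stmt{Type: "for"}, true
				// try_statement would be routed here too.
			}
			return nil, false
		},
	}
}
```

With this shape, the C extractor returns nil for a `throw_statement` while the C++ extractor maps it to a `raise` statement, and `filterUses([]string{"x", "this"})` drops `this` only for C++. Both wrappers share every case in `extract`, which is what keeps a new construct to a one-file change.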
Trim a couple of unreachable defensive nil-guards in the shared clike dispatcher and add three tests that cover the alternate paths inside the helpers: a for-loop with an assignment_expression initialiser, dereference-as-LHS, and nested if. Brings new-file coverage to 89.7% and recovers the 0.02% project drop.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
521aa8d to c71dd83
shivasurya added a commit that referenced this pull request on May 3, 2026
## Summary
Three integration changes that bring C/C++ analysis into the full `pathfinder scan` pipeline.
### 1. CFG handlers (`graph/callgraph/cfg/builder.go`)
Adds `processSwitch` and `processDoWhile` so C/C++ control flow surfaces in the CFG instead of falling through to the generic statement handler.
| Construct | Shape |
|---|---|
| `switch (x) { case 1: ...; case 2: ...; default: ... }` | `[BlockTypeSwitch header]` fans out to one case block per `case_statement`, with fallthrough edges between consecutive cases and a merge block reachable from every case. |
| `do { body } while (cond);` | `[pred] -> [body] -> [BlockTypeLoop cond]`, cond loops back to body and falls through to an after-block. |
The do-while shape (no header gate before the first iteration) preserves the "execute body at least once" semantics that distinguishes it from a plain `while`.
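The two CFG shapes in the table can be sketched with a toy adjacency-list graph. This is a hypothetical illustration, not the code in `graph/callgraph/cfg/builder.go`: string labels stand in for the real `BlockTypeSwitch` / `BlockTypeLoop` blocks, the header-to-merge edge for an empty switch is an assumption, and a switch without `default` (where the header would also need a no-match edge to the merge block) is not modeled.

```go
package main

// CFG is a hypothetical toy graph: blocks are labels, edges an adjacency list.
type CFG struct {
	Edges map[string][]string
}

func newCFG() *CFG { return &CFG{Edges: map[string][]string{}} }

func (g *CFG) addEdge(from, to string) { g.Edges[from] = append(g.Edges[from], to) }

func (g *CFG) hasEdge(from, to string) bool {
	for _, t := range g.Edges[from] {
		if t == to {
			return true
		}
	}
	return false
}

// buildSwitch wires pred -> header, header -> every case, case[i] -> case[i+1]
// (C-style fallthrough), and every case -> merge. Returns the merge label.
func buildSwitch(g *CFG, pred string, cases []string) string {
	header, merge := "switch_header", "switch_merge"
	g.addEdge(pred, header)
	for i, c := range cases {
		g.addEdge(header, c)
		if i+1 < len(cases) {
			g.addEdge(c, cases[i+1])
		}
		g.addEdge(c, merge)
	}
	if len(cases) == 0 {
		g.addEdge(header, merge) // assumption: empty body goes straight to merge
	}
	return merge
}

// buildDoWhile wires pred -> body -> cond, cond -> body (back edge) and
// cond -> after. There is no condition gate before the first iteration.
func buildDoWhile(g *CFG, pred string) string {
	body, cond, after := "do_body", "do_cond", "do_after"
	g.addEdge(pred, body)
	g.addEdge(body, cond)
	g.addEdge(cond, body)
	g.addEdge(cond, after)
	return after
}
```

The do-while property that matters is the absence of an `entry -> do_cond` edge: control reaches the condition only after the body has run once.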
### 2. Scan integration (`cmd/scan.go`)
After the existing Go block, the scan now:
- Calls `BuildCModuleRegistry` + `BuildCCallGraph` when the parsed `CodeGraph` carries any C-tagged node, then merges the result via `MergeCallGraphs`.
- Same for C++ with `BuildCppModuleRegistry` + `BuildCppCallGraph`.
Each step is gated by the new `hasLanguageNodes(codeGraph, language)` helper so a Python-only or Go-only project skips the C/C++ work entirely. Build failures log a warning and let the scan continue with whatever languages did build — matching the Go path.
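The gate can be sketched as a single pass over the graph's nodes. The `CodeGraph` and `Node` types below are hypothetical minimal stand-ins, and the `Language` field name and the `"c"` / `"cpp"` tags are assumptions; only the helper's name and its nil / empty / no-match behaviour come from the description and test plan.

```go
package main

// Hypothetical minimal graph types; the real CodeGraph carries far more.
type Node struct {
	ID       string
	Language string // assumed name for the per-node language tag
}

type CodeGraph struct {
	Nodes map[string]*Node
}

// hasLanguageNodes reports whether any node is tagged with language.
// A nil or empty graph reports false, so the caller skips that language's
// registry and call-graph build entirely.
//
// Hypothetical call site, mirroring the description:
//
//	if hasLanguageNodes(codeGraph, "c") {
//		// BuildCModuleRegistry, BuildCCallGraph, then MergeCallGraphs;
//		// on error, log a warning and continue the scan.
//	}
func hasLanguageNodes(cg *CodeGraph, language string) bool {
	if cg == nil {
		return false
	}
	for _, n := range cg.Nodes {
		if n != nil && n.Language == language {
			return true
		}
	}
	return false
}
```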
### 3. Enricher (`output/enricher.go`)
`extractFunctionFromFQN` and `fallbackLocation` learn to handle C/C++ scope-resolved FQNs:
| FQN | Function | ClassName | RelPath |
|---|---|---|---|
| `src/main.c::main` | `main` | (empty) | `src/main.c` |
| `src/socket.cpp::mylib::Socket::connect` | `connect` | `Socket` | `src/socket.cpp` |
| `myapp.auth.login` (regression) | `login` | `auth` | (existing path) |
The dot-separated branch for Python / Go / Java is preserved; the `::` branch only fires when the FQN actually contains `::`.
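The split rule in the table can be sketched as below. `fqnParts` is a hypothetical helper, not the enricher's `extractFunctionFromFQN` or `fallbackLocation`. It assumes the first `::` segment is always the relative path and takes the second-to-last segment as `ClassName`, which would misreport the namespace as a class for a namespaced free function such as `src/a.cpp::ns::func`.

```go
package main

import "strings"

// fqnParts splits an FQN into function, class, and relative path.
// "::" FQNs are assumed to look like "<relpath>::<ns>::<Class>::<func>".
func fqnParts(fqn string) (function, className, relPath string) {
	if strings.Contains(fqn, "::") {
		parts := strings.Split(fqn, "::")
		relPath = parts[0]
		function = parts[len(parts)-1]
		if len(parts) >= 3 {
			// Second-to-last segment; a simplification (see lead-in).
			className = parts[len(parts)-2]
		}
		return function, className, relPath
	}
	// Dot-separated Python / Go / Java FQNs: last segment is the function,
	// the one before it is reported as ClassName; RelPath is resolved elsewhere.
	parts := strings.Split(fqn, ".")
	function = parts[len(parts)-1]
	if len(parts) >= 2 {
		className = parts[len(parts)-2]
	}
	return function, className, ""
}
```

Checking for `::` first matters because C/C++ paths such as `src/main.c` contain dots of their own.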
## Test plan
- [x] `go build ./...`
- [x] `go test ./...` — full suite green
- [x] `go vet ./...`
- [x] `golangci-lint run ./graph/callgraph/cfg/ ./cmd/ ./output/` — 0 issues
- [x] Coverage on changed functions: `processSwitch` 92.3%, `processDoWhile` 100%, `hasLanguageNodes` 100%, `fallbackLocation` 100%, `extractFunctionFromFQN` 85.7% (the unreachable terminal `return` is pre-existing dead code)
- [x] CFG: switch with 3 cases + default, fallthrough between consecutive cases, empty switch body, do-while body + cond + after, body executes before loop header
- [x] Scan: nil graph, empty graph, no matching nodes, multiple languages
- [x] Enricher: C FQN (no class), C++ FQN with namespace + class, missing-file FQN falls through cleanly, dot-separated regression cases
## Stacked on
`shiva/cpp-statements` (#676)




## Summary
Adds `ExtractCStatements` and `ExtractCppStatements`, which walk a parsed function body and emit one `*core.Statement` per recognised construct (declaration, assignment, call, return, if/for/while/do/switch, plus throw / try / range-for in C++). The result feeds the CFG builder (PR-10) and the future variable-dependency graph.

## Statement shapes captured
| Source | Captured statement |
|---|---|
| `int x = a + b;` | `{Type: assignment, Def: x, Uses: [a, b]}` |
| `x = func(y);` | `{Type: assignment, Def: x, Uses: [y], CallTarget: func}` |
| `func(a, b);` | `{Type: call, CallTarget: func, Uses: [a, b]}` |
| `return x;` | `{Type: return, Uses: [x]}` |
| `if (x>0) { ... } else { ... }` | `{Type: if, Uses: [x], NestedStatements, ElseBranch}` |
| `for (int i=0; i<n; i++)` | `{Type: for, Def: i, Uses: [n]}` |
| `while (cond) { ... }` | `{Type: while, Uses: [cond]}` |
| `p->name = val;` | `{Type: assignment, Def: p, Uses: [val]}` |
| `buf[i] = input[j];` | `{Type: assignment, Def: buf, Uses: [input, i, j]}` |
| `obj.method(x);` | `{Type: call, CallTarget: method, CallChain: obj.method, Uses: [obj, x]}` |
| `ns::sort(begin, end);` | `{Type: call, CallTarget: ns::sort, Uses: [begin, end]}` |
| `auto x = obj.get();` | `{Type: assignment, Def: x, Uses: [obj], CallTarget: get}` |
| `throw std::runtime_error(msg);` | `{Type: raise, CallTarget: std::runtime_error}` |
| `for (auto x : items)` | `{Type: for, Def: x, Uses: [items]}` |

## Design notes
- `clikeExtractor` in `statements_clike.go` owns every node-type handler. C and C++ are thin wrappers that bind a keyword predicate (`clike.IsCKeyword` / `clike.IsCppKeyword`) and an `extraNodeHandler` hook for C++-only nodes, so adding the next clike construct touches one file.
- `buf[i] = ...`, `p->name = ...`, and `(*p) = ...` all collapse to the base variable for `Def`; index/field expressions surface as additional `Uses`. This matches the def-use convention used by the Go and Python extractors.
- The `for` handlers add the defined loop variable to `Def` and remove it from `Uses` last, so the variable appears once even though it shows up in init / cond / update.
- `Uses` collection skips language keywords, `field_identifier` (the method name in `obj.method`), `type_identifier`, `qualified_identifier` (treated atomically as a call target rather than as variable uses), and every literal type, keeping `Uses` to true variable references.
- For C++ `try_statement`, the caught variable becomes the `Def` of an empty assignment statement so def-use sees the binding site; the body's statements then follow in `ElseBranch`.

## Test plan
- `go build ./...`
- `go test ./...`: full suite green
- `go vet ./...`
- `golangci-lint run ./graph/callgraph/extraction/`: 0 issues
- … `sizeof`, `(int)`, `NULL`), bare declaration, forward declaration (nil function).
- … `nullptr`, `static_cast`, `delete`, `this`, `auto`), C-style fallthrough, namespace assignment.

## Stacked on
`shiva/cpp-cpp-call-graph` (#675)
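The loop-variable rule from the design notes (add the loop variable to `Def`, then remove it from `Uses` as the last step) can be sketched as follows. `forDefUse` is a hypothetical helper that takes pre-tokenised identifier lists, one per clause (init / cond / update, or the iterable for a range-for), instead of tree-sitter nodes; de-duplicating the remaining uses is an assumption of this sketch.

```go
package main

// forDefUse collects identifiers from every clause, records loopVar as the
// Def, and strips loopVar from Uses as the final step.
func forDefUse(loopVar string, clauses ...[]string) (def string, uses []string) {
	seen := map[string]bool{}
	for _, clause := range clauses {
		for _, name := range clause {
			if !seen[name] { // assumption: each use is reported once
				seen[name] = true
				uses = append(uses, name)
			}
		}
	}
	// Last step: the loop variable is a Def, so remove it from Uses.
	kept := uses[:0]
	for _, name := range uses {
		if name != loopVar {
			kept = append(kept, name)
		}
	}
	return loopVar, kept
}
```

For `for (int i=0; i<n; i++)`, passing no init uses, condition names `[i, n]`, and update names `[i]` gives `Def: i`, `Uses: [n]`, matching the table; for `for (auto x : items)`, passing `[items]` gives `Def: x`, `Uses: [items]`.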