
feat(extraction): C/C++ statement extraction #676

Merged
shivasurya merged 2 commits into main from shiva/cpp-statements
May 3, 2026

Conversation

@shivasurya
Owner

Summary

Adds ExtractCStatements and ExtractCppStatements — they walk a parsed function body and emit one *core.Statement per recognised construct (declaration, assignment, call, return, if/for/while/do/switch, plus throw / try / range-for in C++). The result feeds the CFG builder (PR-10) and the future variable-dependency graph.

Statement shapes captured

| Construct | Statement |
|---|---|
| `int x = a + b;` | `{Type: assignment, Def: x, Uses: [a, b]}` |
| `x = func(y);` | `{Type: assignment, Def: x, Uses: [y], CallTarget: func}` |
| `func(a, b);` | `{Type: call, CallTarget: func, Uses: [a, b]}` |
| `return x;` | `{Type: return, Uses: [x]}` |
| `if (x>0) { ... } else { ... }` | `{Type: if, Uses: [x], NestedStatements, ElseBranch}` |
| `for (int i=0; i<n; i++)` | `{Type: for, Def: i, Uses: [n]}` |
| `while (cond) { ... }` | `{Type: while, Uses: [cond]}` |
| `p->name = val;` | `{Type: assignment, Def: p, Uses: [val]}` |
| `buf[i] = input[j];` | `{Type: assignment, Def: buf, Uses: [input, i, j]}` |
| `obj.method(x);` | `{Type: call, CallTarget: method, CallChain: obj.method, Uses: [obj, x]}` |
| `ns::sort(begin, end);` | `{Type: call, CallTarget: ns::sort, Uses: [begin, end]}` |
| `auto x = obj.get();` | `{Type: assignment, Def: x, Uses: [obj], CallTarget: get}` |
| `throw std::runtime_error(msg);` | `{Type: raise, CallTarget: std::runtime_error}` |
| `for (auto x : items)` | `{Type: for, Def: x, Uses: [items]}` |
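The `p->name`, `buf[i]` and `(*p)` rows all reduce the assignment target to a base variable. A minimal sketch of that reduction, assuming a hypothetical simplified `Expr` tree rather than the real tree-sitter nodes, could look like this: the base identifier becomes the Def and identifiers inside subscripts become extra Uses, while field names are never reported.

```go
package main

import "fmt"

// Expr is a hypothetical, simplified stand-in for a tree-sitter expression
// node; it is not the extractor's real node type.
type Expr struct {
	Kind     string  // "identifier", "subscript", "field", "pointer", "paren"
	Name     string  // identifier name, or the field name for "field" nodes
	Children []*Expr // operands; Children[0] is always the base expression
}

func ident(name string) *Expr { return &Expr{Kind: "identifier", Name: name} }

// identifiers collects identifier names under e. Field names live in Name on
// "field" nodes, so they are never reported as variable uses.
func identifiers(e *Expr) []string {
	if e.Kind == "identifier" {
		return []string{e.Name}
	}
	var out []string
	for _, c := range e.Children {
		out = append(out, identifiers(c)...)
	}
	return out
}

// collapseLHS reduces an assignment target to its base variable (the Def)
// and returns identifiers from any index expressions as extra Uses.
func collapseLHS(e *Expr) (def string, uses []string) {
	switch e.Kind {
	case "identifier":
		return e.Name, nil
	case "subscript": // buf[i]: Children[0] = array, Children[1] = index
		def, uses = collapseLHS(e.Children[0])
		return def, append(uses, identifiers(e.Children[1])...)
	case "field", "pointer", "paren": // p->name, *p, (p)
		return collapseLHS(e.Children[0])
	}
	return "", nil
}

func main() {
	// buf[i][j] = ...
	nested := &Expr{Kind: "subscript", Children: []*Expr{
		{Kind: "subscript", Children: []*Expr{ident("buf"), ident("i")}},
		ident("j"),
	}}
	fmt.Println(collapseLHS(nested)) // buf [i j]

	// p->name = ...
	fmt.Println(collapseLHS(&Expr{Kind: "field", Name: "name", Children: []*Expr{ident("p")}})) // p []

	// (*p) = ...
	deref := &Expr{Kind: "paren", Children: []*Expr{{Kind: "pointer", Children: []*Expr{ident("p")}}}}
	fmt.Println(collapseLHS(deref)) // p []
}
```

Recursing on `Children[0]` for every wrapper kind is what lets nested forms such as `buf[i][j]` or `(*p)->next` collapse to one base name without a dedicated case for each combination.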

Design notes

  • One shared dispatcher: clikeExtractor in statements_clike.go owns every node-type handler. C and C++ are thin wrappers that bind a keyword predicate (clike.IsCKeyword / clike.IsCppKeyword) and an extraNodeHandler hook for C++-only nodes. Adding the next clike construct touches one file.
  • LHS collapsing: buf[i] = ..., p->name = ..., (*p) = ... all collapse to the base variable for Def; index/field expressions surface as additional Uses. Matches the def-use convention used by the Go and Python extractors.
  • Loop-variable hygiene: the C-style and range-based for handlers put the loop variable in Def and, as their final step, strip it from Uses, so it is recorded once even though it appears in the init, condition, and update clauses.
  • Identifier walker filters out field_identifier (the method name in obj.method), type_identifier, qualified_identifier (treated atomically as a call target rather than as variable uses), and every literal type, so Uses holds only true variable references.
  • Catch clauses are flattened: the caught variable becomes the Def of an empty assignment statement so def-use sees the binding site, and the catch body's statements follow in ElseBranch.

Test plan

  • go build ./...
  • go test ./... — full suite green
  • go vet ./...
  • golangci-lint run ./graph/callgraph/extraction/ — 0 issues
  • Coverage on new lines: 88.2%
  • Spec C scenarios covered: assignment with binary op, assignment from call, bare call, if/else, for loop (with loop-var dropped from Uses), while, do-while, switch, pointer-arrow assignment, subscript assignment, keyword filter (sizeof, (int), NULL), bare declaration, forward declaration (nil function).
  • Spec C++ scenarios covered: method call on object, qualified call, auto-from-method-call, throw constructor, try/catch (with bound exception name as Def), range-based for, C++ keyword filter (nullptr, static_cast, delete, this, auto), C-style fallthrough, namespace assignment.

Stacked on

shiva/cpp-cpp-call-graph (#675)

@shivasurya shivasurya added enhancement New feature or request go Pull requests that update go code labels May 3, 2026
@shivasurya shivasurya self-assigned this May 3, 2026
@safedep

safedep Bot commented May 3, 2026

SafeDep Report Summary


No dependency changes detected. Nothing to scan.


This report is generated by the SafeDep GitHub App

@github-actions

github-actions Bot commented May 3, 2026

Code Pathfinder Security Scan

Result: Pass (severity levels checked: Critical, High, Medium, Low, Info)

No security issues detected.

| Metric | Value |
|---|---|
| Files Scanned | 5 |
| Rules | 205 |

Powered by Code Pathfinder

@codecov

codecov Bot commented May 3, 2026

Codecov Report

❌ Patch coverage is 87.89346% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.39%. Comparing base (7814406) to head (c71dd83).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ine/graph/callgraph/extraction/statements_clike.go | 86.76% | 33 Missing and 10 partials ⚠️ |
| ...ngine/graph/callgraph/extraction/statements_cpp.go | 91.25% | 5 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #676      +/-   ##
==========================================
+ Coverage   85.35%   85.39%   +0.03%     
==========================================
  Files         184      187       +3     
  Lines       26751    27164     +413     
==========================================
+ Hits        22834    23197     +363     
- Misses       3040     3078      +38     
- Partials      877      889      +12     


Owner Author

shivasurya commented May 3, 2026

Merge activity

  • May 3, 1:15 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 3, 1:31 PM UTC: Graphite rebased this pull request as part of a merge.
  • May 3, 1:32 PM UTC: @shivasurya merged this pull request with Graphite.

@shivasurya shivasurya changed the base branch from shiva/cpp-cpp-call-graph to graphite-base/676 May 3, 2026 13:29
@shivasurya shivasurya changed the base branch from graphite-base/676 to main May 3, 2026 13:30
shivasurya and others added 2 commits May 3, 2026 13:31
Add ExtractCStatements and ExtractCppStatements that walk a parsed
function body and produce one *core.Statement per recognised
construct (declaration, assignment, call, return, if/for/while/
do/switch, plus throw/try/range-for in C++).

Statements capture def-use:

  - assignment: Def is the LHS variable (subscript and arrow paths
    collapse to the base name); Uses are RHS identifiers and any
    LHS index expressions.
  - call: Uses are the receiver (for obj.method()) and arguments;
    CallTarget is the bare callee, CallChain is the dotted /
    qualified form ("obj.method", "ns::func").
  - control flow: condition identifiers in Uses; bodies and else
    clauses recurse into NestedStatements / ElseBranch.

The C and C++ extractors share every dispatcher via clikeExtractor
in statements_clike.go; the C++ wrapper plugs in throw_statement,
try_statement (with caught variable as Def of an empty assignment),
and for_range_loop (loop variable as Def, iterable as Uses) through
an extraNodeHandler hook. Keyword filtering routes through
clike.IsCKeyword / clike.IsCppKeyword so language-specific
reserved words never leak into Uses.

Sets up Statement input for the CFG builder (PR-10) and the
future variable-dependency graph.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Trim a couple of unreachable defensive nil-guards in the shared
clike dispatcher and add three tests that cover the alternate
paths inside the helpers — for-loop with assignment_expression
initialiser, dereference-as-LHS, and nested if. Brings new-file
coverage to 89.7% and recovers the 0.02% project drop.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@shivasurya shivasurya force-pushed the shiva/cpp-statements branch from 521aa8d to c71dd83 Compare May 3, 2026 13:31
@shivasurya shivasurya merged commit 1a3f8b4 into main May 3, 2026
6 checks passed
@shivasurya shivasurya deleted the shiva/cpp-statements branch May 3, 2026 13:32
shivasurya added a commit that referenced this pull request May 3, 2026
## Summary

Three integration changes that bring C/C++ analysis into the full `pathfinder scan` pipeline.

### 1. CFG handlers (`graph/callgraph/cfg/builder.go`)

Adds `processSwitch` and `processDoWhile` so C/C++ control-flow surfaces in the CFG instead of falling through to the generic statement handler.

| Construct | Shape |
|---|---|
| `switch (x) { case 1: ...; case 2: ...; default: ... }` | `[BlockTypeSwitch header]` fans out to one case block per `case_statement`, with fallthrough edges between consecutive cases and a merge block reachable from every case. |
| `do { body } while (cond);` | `[pred] -> [body] -> [BlockTypeLoop cond]`, cond loops back to body and falls through to an after-block. |

The do-while shape (no header gate before the first iteration) preserves the "execute body at least once" semantics that distinguishes it from a plain `while`.
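Both shapes can be illustrated with a small adjacency-list sketch. `BlockTypeSwitch` and `BlockTypeLoop` are the names used above; the `CFG`, `Block`, `BlockTypeNormal`, `doWhile` and `switchStmt` names are hypothetical and do not reflect the real `processSwitch` / `processDoWhile` signatures. The sketch also assumes, as a conservative choice, that a switch without a `default` gets a header-to-merge edge for the "no case matched" path.

```go
package main

import "fmt"

// BlockType mirrors the block kinds named in the PR; the numeric values and
// BlockTypeNormal are assumptions of this sketch.
type BlockType int

const (
	BlockTypeNormal BlockType = iota
	BlockTypeSwitch
	BlockTypeLoop
)

type Block struct {
	ID    int
	Type  BlockType
	Label string
}

// CFG is a hypothetical adjacency-list graph, not the builder's real type.
type CFG struct {
	Blocks []*Block
	Edges  map[int][]int // block ID -> successor IDs
}

func newCFG() *CFG { return &CFG{Edges: map[int][]int{}} }

func (g *CFG) newBlock(t BlockType, label string) *Block {
	b := &Block{ID: len(g.Blocks), Type: t, Label: label}
	g.Blocks = append(g.Blocks, b)
	return b
}

func (g *CFG) edge(from, to *Block) { g.Edges[from.ID] = append(g.Edges[from.ID], to.ID) }

// doWhile enters the body directly from pred (no gate before the first
// iteration); the condition loops back to the body or exits to after.
func (g *CFG) doWhile(pred *Block) *Block {
	body := g.newBlock(BlockTypeNormal, "body")
	cond := g.newBlock(BlockTypeLoop, "cond")
	after := g.newBlock(BlockTypeNormal, "after")
	g.edge(pred, body)
	g.edge(body, cond)
	g.edge(cond, body)
	g.edge(cond, after)
	return after
}

// switchStmt fans the header out to one block per case, adds a fallthrough
// edge between consecutive cases, and connects every case to the merge block.
func (g *CFG) switchStmt(pred *Block, cases []string) *Block {
	header := g.newBlock(BlockTypeSwitch, "switch")
	merge := g.newBlock(BlockTypeNormal, "merge")
	g.edge(pred, header)
	hasDefault := false
	var prev *Block
	for _, label := range cases {
		cb := g.newBlock(BlockTypeNormal, label)
		g.edge(header, cb)
		if prev != nil {
			g.edge(prev, cb) // conservative: the previous case may fall through
		}
		g.edge(cb, merge)
		prev = cb
		hasDefault = hasDefault || label == "default"
	}
	if !hasDefault {
		g.edge(header, merge) // assumption: with no default, no case may match
	}
	return merge
}

func main() {
	g := newCFG()
	entry := g.newBlock(BlockTypeNormal, "entry")
	after := g.doWhile(entry)
	g.switchStmt(after, []string{"case 1", "case 2", "default"})
	for _, b := range g.Blocks {
		fmt.Println(b.ID, b.Label, g.Edges[b.ID])
	}
}
```

The key property for do-while is visible in the edge list: `entry` has a single successor, `body`, so the loop condition is never evaluated before the first iteration.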

### 2. Scan integration (`cmd/scan.go`)

After the existing Go block, the scan now:

- Calls `BuildCModuleRegistry` + `BuildCCallGraph` when the parsed `CodeGraph` carries any C-tagged node, then merges the result via `MergeCallGraphs`.
- Same for C++ with `BuildCppModuleRegistry` + `BuildCppCallGraph`.

Each step is gated by the new `hasLanguageNodes(codeGraph, language)` helper so a Python-only or Go-only project skips the C/C++ work entirely. Build failures log a warning and let the scan continue with whatever languages did build — matching the Go path.
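A gate like `hasLanguageNodes` only needs to find one node tagged with the requested language. The sketch below assumes hypothetical `CodeGraph` and `Node` types with a `Language` string field and the tags `"c"` / `"cpp"`; the real graph types and language tags may differ.

```go
package main

import "fmt"

// Node and CodeGraph are hypothetical, simplified stand-ins for the real
// graph types.
type Node struct{ Language string }
type CodeGraph struct{ Nodes map[string]*Node }

// hasLanguageNodes reports whether any node carries the given language tag.
// A nil or empty graph simply returns false.
func hasLanguageNodes(cg *CodeGraph, language string) bool {
	if cg == nil {
		return false
	}
	for _, n := range cg.Nodes {
		if n != nil && n.Language == language {
			return true // one match is enough to justify the build
		}
	}
	return false
}

func main() {
	cg := &CodeGraph{Nodes: map[string]*Node{
		"a": {Language: "python"},
		"b": {Language: "c"},
	}}
	for _, lang := range []string{"c", "cpp"} {
		if !hasLanguageNodes(cg, lang) {
			fmt.Println("skip", lang) // no nodes: skip registry + call-graph build
			continue
		}
		fmt.Println("build", lang)
	}
	// Prints "build c" then "skip cpp".
}
```

Returning on the first match keeps the check cheap for large mixed-language graphs, and the nil guard covers the "nil graph" case from the test plan.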

### 3. Enricher (`output/enricher.go`)

`extractFunctionFromFQN` and `fallbackLocation` learn to handle C/C++ scope-resolved FQNs:

| FQN | Function | ClassName | RelPath |
|---|---|---|---|
| `src/main.c::main` | `main` | (empty) | `src/main.c` |
| `src/socket.cpp::mylib::Socket::connect` | `connect` | `Socket` | `src/socket.cpp` |
| `myapp.auth.login` (regression) | `login` | `auth` | (existing path) |

The dot-separated branch for Python / Go / Java is preserved; the `::` branch only fires when the FQN actually contains `::`.
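The FQN table can be reproduced with a small splitting rule. This is a sketch under assumptions, not the enricher's code: `splitFQN` is a hypothetical helper, it treats the first `::` segment as the relative path, the last as the function and the second-to-last (when present) as the class, and it leaves relative-path resolution for dotted FQNs out of scope.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFQN is a hypothetical helper, not the enricher's real signature.
func splitFQN(fqn string) (function, className, relPath string) {
	if strings.Contains(fqn, "::") {
		// C/C++: "<relpath>::<scope>...::<function>"
		parts := strings.Split(fqn, "::")
		relPath = parts[0]
		function = parts[len(parts)-1]
		if len(parts) >= 3 {
			// Innermost enclosing scope. A namespace-only path such as
			// "a.cpp::ns::f" also lands here; the string alone cannot
			// tell a class from a namespace.
			className = parts[len(parts)-2]
		}
		return function, className, relPath
	}
	// Dot-separated (Python / Go / Java): relPath is resolved elsewhere.
	parts := strings.Split(fqn, ".")
	function = parts[len(parts)-1]
	if len(parts) >= 2 {
		className = parts[len(parts)-2]
	}
	return function, className, ""
}

func main() {
	for _, fqn := range []string{
		"src/main.c::main",
		"src/socket.cpp::mylib::Socket::connect",
		"myapp.auth.login",
	} {
		f, c, p := splitFQN(fqn)
		fmt.Printf("%q %q %q\n", f, c, p)
	}
}
```

Checking for `::` before splitting on `.` matters because C/C++ paths such as `src/main.c` contain dots; splitting those on `.` would misread the file extension as a scope.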

## Test plan

- [x] `go build ./...`
- [x] `go test ./...` — full suite green
- [x] `go vet ./...`
- [x] `golangci-lint run ./graph/callgraph/cfg/ ./cmd/ ./output/` — 0 issues
- [x] Coverage on changed functions: `processSwitch` 92.3%, `processDoWhile` 100%, `hasLanguageNodes` 100%, `fallbackLocation` 100%, `extractFunctionFromFQN` 85.7% (the unreachable terminal `return` is pre-existing dead code)
- [x] CFG: switch with 3 cases + default, fallthrough between consecutive cases, empty switch body, do-while body + cond + after, body executes before loop header
- [x] Scan: nil graph, empty graph, no matching nodes, multiple languages
- [x] Enricher: C FQN (no class), C++ FQN with namespace + class, missing-file FQN falls through cleanly, dot-separated regression cases

## Stacked on

`shiva/cpp-statements` (#676)
