
feat(graph/clike): shared C/C++ AST extraction helpers #669

Merged: shivasurya merged 1 commit into main from shiva/cpp-clike-helpers on May 3, 2026

@shivasurya
Owner

Summary

Stacked on #668 (C/C++ file detection foundation).

Adds the cross-cutting primitives that the C parser (PR-03) and C++ parser
(PR-04) will share. Centralising the extraction logic here prevents two
parallel implementations from drifting apart, and matches the convention
used by graph/golang/, graph/python/, graph/java/, and graph/docker/.

Files

sast-engine/graph/clike/
├── doc.go                  ← package documentation
├── detection.go            ← (from PR-01) language detection
├── detection_test.go       ← (from PR-01)
├── declarations.go         ← FunctionInfo, FieldInfo, extractors
├── declarations_test.go
├── types.go                ← ExtractTypeString
├── types_test.go
├── helpers.go              ← params, calls, keyword maps
├── helpers_test.go
└── testhelpers_test.go     ← shared parseC/parseCpp/findNode test utilities

Helpers

  • ExtractFunctionInfo → name, return type, params, IsDeclaration
    flag (forward decl vs definition)
  • ExtractStructFields → name+type pairs for structs/classes
  • ExtractTypeString → canonical type string with qualifiers,
    pointer/reference suffixes, templates, qualified names. Examples:
    char*, const std::string&, unsigned long long,
    std::vector<int>, int**
  • ExtractParameters → parallel (names, types) slices, variadics
    rendered as ("...", "...")
  • ExtractCallInfo → classifies calls as free / method-dot
    (obj.foo()) / method-arrow (ptr->foo()) / qualified
    (std::move(x)); captures receiver and args
  • IsCKeyword / IsCppKeyword → C89..C23 + C++ additions; used by
    PR-05's statement extraction to filter reserved words
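
The four call shapes listed for ExtractCallInfo can be illustrated with a small, text-only sketch. This is not the PR's implementation: the real helper works on tree-sitter call_expression nodes, whereas the hypothetical classifyCallee below only inspects the callee text, and the CallShape type and its constants are names assumed for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// CallShape labels the four call forms named in the helper list.
// The type and constant names are assumptions made for this sketch.
type CallShape string

const (
	ShapeFree        CallShape = "free"
	ShapeMethodDot   CallShape = "method-dot"
	ShapeMethodArrow CallShape = "method-arrow"
	ShapeQualified   CallShape = "qualified"
)

// classifyCallee is a hypothetical, string-based stand-in for ExtractCallInfo.
// It returns the call shape, the receiver text (empty for free and qualified
// calls), and the target name.
func classifyCallee(callee string) (CallShape, string, string) {
	arrow := strings.LastIndex(callee, "->")
	dot := strings.LastIndex(callee, ".")
	switch {
	case arrow >= 0 && arrow > dot:
		// The right-most member access is "->": ptr->foo
		return ShapeMethodArrow, callee[:arrow], callee[arrow+2:]
	case dot >= 0:
		// The right-most member access is ".": obj.foo
		return ShapeMethodDot, callee[:dot], callee[dot+1:]
	case strings.Contains(callee, "::"):
		// No member access, but a scope qualifier: std::move
		return ShapeQualified, "", callee
	default:
		return ShapeFree, "", callee
	}
}

func main() {
	for _, callee := range []string{"printf", "obj.foo", "ptr->foo", "std::move"} {
		shape, recv, target := classifyCallee(callee)
		fmt.Printf("%-10s shape=%-12s receiver=%q target=%q\n", callee, shape, recv, target)
	}
}
```

Picking the right-most member operator keeps chained accesses such as a->b.c classified by the final hop, which is the part a call-resolution layer would need as the receiver.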

Why an innerDeclarator helper

Tree-sitter's C grammar exposes the nested declarator of a
pointer_declarator through a declarator field, but the C++
reference_declarator node has no such field: its inner declarator is a
named child without a field name, sitting next to the anonymous & token.
innerDeclarator tries the field-named child first and falls back to
scanning named children, so the same declarator walker handles int**,
char*, and const std::string& without per-grammar branching.
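
The field-first, scan-second lookup described above can be sketched as follows. The node struct is a hypothetical stand-in for a tree-sitter node (the PR's helper operates on the binding's node type), and declaratorSuffix and typeString are illustrative names that do not appear in the PR.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a tree-sitter node.
type node struct {
	kind     string
	text     string
	fields   map[string]*node // children reachable by field name
	children []*node          // named children, in source order
}

// innerDeclarator returns the nested declarator of n. It prefers the
// "declarator" field and falls back to scanning the named children for
// nodes that have no such field (assumed here: reference_declarator).
func innerDeclarator(n *node) *node {
	if n == nil {
		return nil
	}
	if d := n.fields["declarator"]; d != nil {
		return d
	}
	for _, c := range n.children {
		switch c.kind {
		case "identifier", "pointer_declarator", "reference_declarator":
			return c
		}
	}
	return nil
}

// declaratorSuffix walks nested declarators and collects one "*" per
// pointer level and one "&" per reference level.
func declaratorSuffix(n *node) string {
	suffix := ""
	for n != nil {
		switch n.kind {
		case "pointer_declarator":
			suffix += "*"
		case "reference_declarator":
			suffix += "&"
		default:
			return suffix
		}
		n = innerDeclarator(n)
	}
	return suffix
}

// typeString appends the declarator suffix to an already rendered base type.
func typeString(base string, decl *node) string {
	return base + declaratorSuffix(decl)
}

func main() {
	name := &node{kind: "identifier", text: "p"}

	// int **p: both pointer levels expose the inner declarator as a field.
	inner := &node{kind: "pointer_declarator", fields: map[string]*node{"declarator": name}}
	outer := &node{kind: "pointer_declarator", fields: map[string]*node{"declarator": inner}}

	// const std::string &s: the identifier is only a named child.
	ref := &node{kind: "reference_declarator", children: []*node{{kind: "identifier", text: "s"}}}

	fmt.Println(typeString("int", outer))
	fmt.Println(typeString("const std::string", ref))
}
```

Keeping the fallback inside innerDeclarator means the walker itself never has to know which grammar produced the node.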

Test plan

  • go build ./... — clean
  • go vet ./... — clean
  • golangci-lint run ./graph/... — 0 issues
  • go test ./graph/... -count=1 — all pass (no regressions)
  • TestExtractTypeString — 11 cases (primitives, pointers, refs,
    qualifiers, templates, qualified names) + nil guard
  • TestExtractFunctionInfo — C/C++ definitions, void/typed/pointer
    returns, variadics, namespaced methods + nil guard
  • TestExtractStructFields — populated and empty structs + nil guard
  • TestExtractParameters — typed, variadic, unnamed, void, C++
    const-ref + nil guard
  • TestExtractCallInfo — free / dot / arrow / qualified shapes +
    nil and wrong-node guards
  • TestIsCKeyword / TestIsCppKeyword — C89..C23 keywords, C++-only
    additions, non-keywords, and the C/C++ exclusivity boundary

@shivasurya shivasurya self-assigned this May 2, 2026
@safedep

safedep Bot commented May 2, 2026

SafeDep Report Summary

Malicious packages: green | Vulnerable packages: green | Risky license: green

No dependency changes detected. Nothing to scan.


@shivasurya added the labels enhancement (New feature or request) and go (Pull requests that update go code) on May 2, 2026
@github-actions

github-actions Bot commented May 2, 2026

Code Pathfinder Security Scan

Status: Pass

No security issues detected.

| Metric | Value |
|---|---|
| Files Scanned | 9 |
| Rules | 205 |


@codecov

codecov Bot commented May 2, 2026

Codecov Report

❌ Patch coverage is 92.94118% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.11%. Comparing base (8af6b85) to head (96d2c4a).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sast-engine/graph/clike/types.go | 84.31% | 4 Missing and 4 partials ⚠️ |
| sast-engine/graph/clike/declarations.go | 96.29% | 2 Missing ⚠️ |
| sast-engine/graph/clike/helpers.go | 96.92% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #669      +/-   ##
==========================================
+ Coverage   85.06%   85.11%   +0.04%     
==========================================
  Files         173      176       +3     
  Lines       25070    25240     +170     
==========================================
+ Hits        21326    21482     +156     
- Misses       2947     2955       +8     
- Partials      797      803       +6     

Owner Author

shivasurya commented May 3, 2026

Merge activity

  • May 3, 1:15 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 3, 1:17 PM UTC: Graphite rebased this pull request as part of a merge.
  • May 3, 1:18 PM UTC: @shivasurya merged this pull request with Graphite.

@shivasurya shivasurya changed the base branch from claude/review-techspec-pr-docs-7jawU to graphite-base/669 May 3, 2026 13:15
@shivasurya shivasurya changed the base branch from graphite-base/669 to main May 3, 2026 13:16
Add the cross-cutting primitives that the C parser (parser_c.go) and the
C++ parser (parser_cpp.go) will share. C and C++ are dispatched as two
distinct languages in graph/parser.go, but the extraction logic for
declarations, types, parameters, calls, and keyword filtering is largely
identical between the two grammars — centralising it here avoids two
parallel implementations drifting apart.

Helpers added in this PR:

- ExtractFunctionInfo / FunctionInfo — name, return type, parameters,
  declaration-vs-definition flag from a function_definition node. Forward
  declarations (no compound_statement body) carry IsDeclaration=true so
  the call-graph builder can distinguish them from in-translation-unit
  definitions.

- ExtractStructFields / FieldInfo — field name+type pairs from a
  field_declaration_list, used for both C structs and C++ class bodies.

- ExtractTypeString — assembles the canonical type string (qualifiers +
  base + pointer/reference suffixes) from a (typeNode, declarator) pair.
  Handles primitive_type, type_identifier, qualified_identifier
  (std::string), template_type (vector<int>), nested pointers (int**),
  and reference_declarator (T&) — the latter requires walking past the
  C++ grammar's anonymous inner declarator child via innerDeclarator.

- ExtractParameters — (names, types) parallel slices for parameter_list
  nodes. Variadics emit ("...", "...") so callers can preserve arity.

- ExtractCallInfo / CallInfo — classifies call_expression into
  free / method-dot / method-arrow / qualified shapes and captures the
  target, args, and receiver text for the call-resolution layer.

- IsCKeyword / IsCppKeyword — backed by cKeywords (C89..C23) and a
  C++-only addition map; IsCppKeyword unions both. Used by statement
  extraction to drop reserved words from identifier lists.
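
As a rough illustration of the two-map layout described in the last item, the sketch below uses heavily abbreviated keyword sets. Only cKeywords, IsCKeyword, and IsCppKeyword are names from this PR; cppOnlyKeywords and dropKeywords are hypothetical, and the real maps cover far more entries.

```go
package main

import "fmt"

// Abbreviated, illustrative subsets only. The PR's cKeywords map is
// described as covering C89..C23.
var cKeywords = map[string]struct{}{
	"int": {}, "return": {}, "struct": {}, "while": {}, "sizeof": {},
}

// cppOnlyKeywords is a hypothetical name for the C++-only addition map.
var cppOnlyKeywords = map[string]struct{}{
	"class": {}, "namespace": {}, "template": {}, "virtual": {},
}

// IsCKeyword reports whether s is in the C keyword set.
func IsCKeyword(s string) bool {
	_, ok := cKeywords[s]
	return ok
}

// IsCppKeyword reports whether s is in the union of the C set and the
// C++-only additions.
func IsCppKeyword(s string) bool {
	if IsCKeyword(s) {
		return true
	}
	_, ok := cppOnlyKeywords[s]
	return ok
}

// dropKeywords is a hypothetical caller: it filters reserved words out of
// a token list, as statement extraction is described as doing.
func dropKeywords(tokens []string, isKeyword func(string) bool) []string {
	var out []string
	for _, t := range tokens {
		if !isKeyword(t) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tokens := []string{"class", "Buffer", "int", "size"}
	fmt.Println("as C:  ", dropKeywords(tokens, IsCKeyword))
	fmt.Println("as C++:", dropKeywords(tokens, IsCppKeyword))
}
```

Because the C++ map holds only additions, a word such as class stays a valid identifier when scanning C sources, which is the exclusivity boundary the keyword tests exercise.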

The package documentation moves from an inline comment in detection.go
to a dedicated doc.go that summarises each subsystem and explains how
parser_c.go / parser_cpp.go will consume the helpers in subsequent PRs.

Tests cover every shape the helpers must handle — including the
C++-specific reference_declarator and qualified_identifier cases that
required teaching the declarator walker to fall back to scanning named
children when the grammar omits the field-named "declarator" child.

Co-Authored-By: Claude <noreply@anthropic.com>
@shivasurya shivasurya force-pushed the shiva/cpp-clike-helpers branch from 918bc51 to 96d2c4a Compare May 3, 2026 13:16
@shivasurya shivasurya merged commit c6d2a46 into main May 3, 2026
6 checks passed
@shivasurya shivasurya deleted the shiva/cpp-clike-helpers branch May 3, 2026 13:18
shivasurya added a commit that referenced this pull request May 3, 2026
…670)

## Summary

Stacked on **#669** (clike shared helpers).

Converts tree-sitter C AST nodes into `graph.Node` objects. After this PR a
C project can be scanned end-to-end via `Initialize()` and the resulting
`CodeGraph` contains every category of node the C parser owns: function
definitions, forward declarations, structs, enums, typedefs, variable
declarations, includes, and call expressions.

## Files

- `graph/parser_c.go` — new, ~600 lines, single file in package `graph`
  matching the existing `parser_python.go` / `parser_golang.go` convention.
  Organised into labelled sections (functions / types / decls / calls /
  includes / helpers) with one `Node.Type` constant per produced shape.
- `graph/parser.go` — modified. Two existing cases (`function_definition`,
  `call_expression`) gained a C branch in front of the existing Python /
  Go branches; five new cases added for `struct_specifier`,
  `enum_specifier`, `type_definition`, `declaration`, `preproc_include`.
  Java / Python paths are untouched.
- `graph/parser_c_test.go` — `TestParseCEndToEnd` parses the new
  `testdata/c/` fixture via `Initialize()` and validates every node
  category. Two focused unit tests cover the call-shape branches and
  the `isCpp=true` path that the integration fixture cannot exercise
  before PR-04 lands.
- `graph/testdata/c/{example.c,buffer.h}` — single small project
  exercising every node type.

## Design choices

- **Forward declarations.** tree-sitter emits `declaration` (not
  `function_definition`) for prototypes like `int add(int, int);`.
  `parseCLikeDeclaration` detects a `function_declarator` child via
  `isFunctionPrototype` and routes to `emitFunctionDeclaration`, which
  produces a `function_definition` node with
  `Metadata["is_declaration"] = true`. Rule writers find every
  callable function under one `Type`; the prototype/definition split is
  surfaced as metadata, not as a separate node category.
- **Type reference vs type declaration.** A `struct Buffer*` parameter
  is not a struct declaration. `parseCStructSpecifier` and
  `parseCEnumSpecifier` short-circuit when the body field is nil, so
  these only record actual definitions.
- **Multi-declarator support.** `int a = 1, b = 2, c;` becomes three
  `variable_declaration` nodes via `childrenByFieldName`, which
  iterates the full child list (the tree-sitter binding's
  `ChildByFieldName` returns only the first match).
- **Shared with C++ via `isCpp` flag.** `parseCLikeDeclaration` and
  `parseCLikeInclude` accept an `isCpp` flag so PR-04 can call them
  directly without duplicating logic — the only difference is the
  `Language` tag on produced nodes.
- **Constants over magic strings.** `nodeType*` and `meta*` constants
  at the top of `parser_c.go` mean rules and the call-graph builder can
  reference values by symbol rather than re-typing the string.

## Test plan

- [x] `go build ./...` — clean
- [x] `go vet ./...` — clean
- [x] `golangci-lint run ./graph/...` — 0 issues
- [x] `go test ./... -count=1` — all packages pass, zero regressions in
  Java / Python / Go tests
- [x] `TestParseCEndToEnd` — 9 sub-tests covering every produced node type:
  - function_definitions (name, return, params, modifiers)
  - forward_declaration_marked (Metadata["is_declaration"])
  - struct_declaration (fields populated)
  - enum_declaration (enumerators preserved with values)
  - type_definition_unsigned_long (DataType = "unsigned long")
  - type_definition_anonymous_struct
  - variable_declarations (globals + multi-declarator + function-local Scope)
  - includes_system_vs_local (Metadata["system_include"])
  - call_expressions_linked_to_caller (OutgoingEdges from function)
- [x] `TestParseCCallExpression_MethodAndQualified` — arrow-method and
  C++ qualified-call shapes
- [x] `TestParseCLikeDeclaration_IsCppFlag` — Language="cpp" branch