feat(graph/clike): shared C/C++ AST extraction helpers #669
Merged
shivasurya merged 1 commit into main on May 3, 2026
SafeDep Report Summary: No dependency changes detected. Nothing to scan.
Code Pathfinder Security Scan: No security issues detected.
Codecov Report
Coverage diff, main vs #669:

| Metric   | main   | #669   | +/-    |
|----------|--------|--------|--------|
| Coverage | 85.06% | 85.11% | +0.04% |
| Files    | 173    | 176    | +3     |
| Lines    | 25070  | 25240  | +170   |
| Hits     | 21326  | 21482  | +156   |
| Misses   | 2947   | 2955   | +8     |
| Partials | 797    | 803    | +6     |
Add the cross-cutting primitives that the C parser (parser_c.go) and the
C++ parser (parser_cpp.go) will share. C and C++ are dispatched as two
distinct languages in graph/parser.go, but the extraction logic for
declarations, types, parameters, calls, and keyword filtering is largely
identical between the two grammars — centralising it here avoids two
parallel implementations drifting apart.
Helpers added in this PR:
- ExtractFunctionInfo / FunctionInfo — name, return type, parameters,
declaration-vs-definition flag from a function_definition node. Forward
declarations (no compound_statement body) carry IsDeclaration=true so
the call-graph builder can distinguish them from in-translation-unit
definitions.
- ExtractStructFields / FieldInfo — field name+type pairs from a
field_declaration_list, used for both C structs and C++ class bodies.
- ExtractTypeString — assembles the canonical type string (qualifiers +
base + pointer/reference suffixes) from a (typeNode, declarator) pair.
Handles primitive_type, type_identifier, qualified_identifier
(std::string), template_type (vector<int>), nested pointers (int**),
and reference_declarator (T&) — the latter requires walking past the
C++ grammar's anonymous inner declarator child via innerDeclarator.
- ExtractParameters — (names, types) parallel slices for parameter_list
nodes. Variadics emit ("...", "...") so callers can preserve arity.
- ExtractCallInfo / CallInfo — classifies call_expression into
free / method-dot / method-arrow / qualified shapes and captures the
target, args, and receiver text for the call-resolution layer.
- IsCKeyword / IsCppKeyword — backed by cKeywords (C89..C23) and a
C++-only addition map; IsCppKeyword unions both. Used by statement
extraction to drop reserved words from identifier lists.
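The keyword bullet above can be pictured with a minimal sketch. It assumes both predicates take a string and return a bool, and it uses small illustrative subsets rather than the full C89..C23 and C++ keyword tables. The map name cppOnlyKeywords and the filterIdentifiers helper are hypothetical and not taken from the PR.

```go
package main

import "fmt"

// cKeywords is an illustrative subset of C reserved words,
// not the full C89..C23 table described in the commit message.
var cKeywords = map[string]bool{
	"int": true, "return": true, "struct": true, "typedef": true, "while": true,
}

// cppOnlyKeywords (hypothetical name) is an illustrative subset of the
// words that are reserved in C++ but are ordinary identifiers in C.
var cppOnlyKeywords = map[string]bool{
	"class": true, "namespace": true, "template": true, "virtual": true,
}

// IsCKeyword reports whether s is in the C keyword set.
func IsCKeyword(s string) bool { return cKeywords[s] }

// IsCppKeyword unions the C set with the C++-only additions.
func IsCppKeyword(s string) bool { return cKeywords[s] || cppOnlyKeywords[s] }

// filterIdentifiers (hypothetical helper) drops reserved words from a
// token list, which is how statement extraction is described as using
// the predicates.
func filterIdentifiers(tokens []string, isKeyword func(string) bool) []string {
	out := make([]string, 0, len(tokens))
	for _, t := range tokens {
		if !isKeyword(t) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tokens := []string{"int", "class", "total", "return"}
	fmt.Println(filterIdentifiers(tokens, IsCKeyword))   // [class total]
	fmt.Println(filterIdentifiers(tokens, IsCppKeyword)) // [total]
}
```

Keeping the C++ additions in a separate map means "class" stays a legal identifier when a C file is scanned, while the C++ predicate still rejects every C keyword.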
The package documentation moves from an inline comment in detection.go
to a dedicated doc.go that summarises each subsystem and explains how
parser_c.go / parser_cpp.go will consume the helpers in subsequent PRs.
Tests cover every shape the helpers must handle — including the
C++-specific reference_declarator and qualified_identifier cases that
required teaching the declarator walker to fall back to scanning named
children when the grammar omits the field-named "declarator" child.
Co-Authored-By: Claude <noreply@anthropic.com>
shivasurya added a commit that referenced this pull request on May 3, 2026:

…670)

## Summary

Stacked on **#669** (clike shared helpers). Converts tree-sitter C AST nodes into `graph.Node` objects. After this PR a C project can be scanned end-to-end via `Initialize()` and the resulting `CodeGraph` contains every category of node the C parser owns: function definitions, forward declarations, structs, enums, typedefs, variable declarations, includes, and call expressions.

## Files

- `graph/parser_c.go`: new, ~600 lines, single file in package `graph` matching the existing `parser_python.go` / `parser_golang.go` convention. Organised into labelled sections (functions / types / decls / calls / includes / helpers) with one `Node.Type` constant per produced shape.
- `graph/parser.go`: modified. Two existing cases (`function_definition`, `call_expression`) gained a C branch in front of the existing Python / Go branches; five new cases added for `struct_specifier`, `enum_specifier`, `type_definition`, `declaration`, `preproc_include`. Java / Python paths are untouched.
- `graph/parser_c_test.go`: `TestParseCEndToEnd` parses the new `testdata/c/` fixture via `Initialize()` and validates every node category. Two focused unit tests cover the call-shape branches and the `isCpp=true` path that the integration fixture cannot exercise before PR-04 lands.
- `graph/testdata/c/{example.c,buffer.h}`: single small project exercising every node type.

## Design choices

- **Forward declarations.** tree-sitter emits `declaration` (not `function_definition`) for prototypes like `int add(int, int);`. `parseCLikeDeclaration` detects a `function_declarator` child via `isFunctionPrototype` and routes to `emitFunctionDeclaration`, which produces a `function_definition` node with `Metadata["is_declaration"] = true`. Rule writers find every callable function under one `Type`; the prototype/definition split is surfaced as metadata, not as a separate node category.
- **Type reference vs type declaration.** A `struct Buffer*` parameter is not a struct declaration. `parseCStructSpecifier` and `parseCEnumSpecifier` short-circuit when the body field is nil, so these only record actual definitions.
- **Multi-declarator support.** `int a = 1, b = 2, c;` becomes three `variable_declaration` nodes via `childrenByFieldName`, which iterates the full child list (the stdlib `ChildByFieldName` returns only the first match).
- **Shared with C++ via `isCpp` flag.** `parseCLikeDeclaration` and `parseCLikeInclude` accept an `isCpp` flag so PR-04 can call them directly without duplicating logic. The only difference is the `Language` tag on produced nodes.
- **Constants over magic strings.** `nodeType*` and `meta*` constants at the top of `parser_c.go` mean rules and the call-graph builder can reference values by symbol rather than re-typing the string.

## Test plan

- [x] `go build ./...`: clean
- [x] `go vet ./...`: clean
- [x] `golangci-lint run ./graph/...`: 0 issues
- [x] `go test ./... -count=1`: all packages pass, zero regressions in Java / Python / Go tests
- [x] `TestParseCEndToEnd`: 9 sub-tests covering every produced node type:
  - function_definitions (name, return, params, modifiers)
  - forward_declaration_marked (`Metadata["is_declaration"]`)
  - struct_declaration (fields populated)
  - enum_declaration (enumerators preserved with values)
  - type_definition_unsigned_long (DataType = "unsigned long")
  - type_definition_anonymous_struct
  - variable_declarations (globals + multi-declarator + function-local Scope)
  - includes_system_vs_local (`Metadata["system_include"]`)
  - call_expressions_linked_to_caller (OutgoingEdges from function)
- [x] `TestParseCCallExpression_MethodAndQualified`: arrow-method and C++ qualified-call shapes
- [x] `TestParseCLikeDeclaration_IsCppFlag`: Language="cpp" branch
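The multi-declarator point in the design choices can be illustrated with a self-contained sketch. It does not use the real tree-sitter bindings: the `node` struct is a hypothetical stand-in that records each child's field name, and `declaredName` and `sampleDeclaration` are illustrative helpers. Only the name `childrenByFieldName` and the `int a = 1, b = 2, c;` example come from the commit message; the actual signature in `parser_c.go` may differ.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a tree-sitter node.
// field is the field name under which the parent holds this child.
type node struct {
	kind     string
	text     string
	field    string
	children []*node
}

// childrenByFieldName returns every direct child attached under name,
// unlike a first-match lookup that would stop at the first declarator.
func childrenByFieldName(n *node, name string) []*node {
	if n == nil {
		return nil
	}
	var out []*node
	for _, c := range n.children {
		if c.field == name {
			out = append(out, c)
		}
	}
	return out
}

// declaredName (hypothetical) descends through wrappers such as
// init_declarator until it reaches the identifier.
func declaredName(d *node) string {
	for d != nil {
		if d.kind == "identifier" {
			return d.text
		}
		inner := childrenByFieldName(d, "declarator")
		if len(inner) == 0 {
			return ""
		}
		d = inner[0]
	}
	return ""
}

// sampleDeclaration models `int a = 1, b = 2, c;` by hand.
func sampleDeclaration() *node {
	initDecl := func(name, value string) *node {
		return &node{kind: "init_declarator", field: "declarator", children: []*node{
			{kind: "identifier", text: name, field: "declarator"},
			{kind: "number_literal", text: value, field: "value"},
		}}
	}
	return &node{kind: "declaration", children: []*node{
		{kind: "primitive_type", text: "int", field: "type"},
		initDecl("a", "1"),
		initDecl("b", "2"),
		{kind: "identifier", text: "c", field: "declarator"},
	}}
}

func main() {
	for _, d := range childrenByFieldName(sampleDeclaration(), "declarator") {
		fmt.Println("variable_declaration", declaredName(d))
	}
}
```

A first-match lookup on the same tree would yield only `a`, which is why the description calls out iterating the full child list.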
Summary
Stacked on #668 (C/C++ file detection foundation).
Adds the cross-cutting primitives that the C parser (PR-03) and C++ parser (PR-04) will share. Centralising the extraction logic here prevents two parallel implementations from drifting apart, and matches the convention used by graph/golang/, graph/python/, graph/java/, and graph/docker/.

Files

Helpers
- ExtractFunctionInfo → name, return type, params, IsDeclaration flag (forward decl vs definition)
- ExtractStructFields → name+type pairs for structs/classes
- ExtractTypeString → canonical type string with qualifiers, pointer/reference suffixes, templates, qualified names. Examples: char*, const std::string&, unsigned long long, std::vector<int>, int**
- ExtractParameters → parallel (names, types) slices, variadics rendered as ("...", "...")
- ExtractCallInfo → classifies calls as free / method-dot (obj.foo()) / method-arrow (ptr->foo()) / qualified (std::move(x)); captures receiver and args
- IsCKeyword / IsCppKeyword → C89..C23 + C++ additions; used by PR-05's statement extraction to filter reserved words
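
The four call shapes listed for ExtractCallInfo can be sketched roughly as follows. This is an illustration, not the PR's implementation: node is a hypothetical stand-in for a tree-sitter node, callShape and classifyCall are invented names, and the field names used here (function, arguments, argument, operator, field) are assumed to match the tree-sitter C/C++ grammars.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a tree-sitter node.
type node struct {
	kind     string
	text     string
	field    string
	children []*node
}

// childByField returns the first child attached under name, or nil.
func (n *node) childByField(name string) *node {
	for _, c := range n.children {
		if c.field == name {
			return c
		}
	}
	return nil
}

// callShape (hypothetical) mirrors what the description says is
// captured: the shape, the target, the receiver, and the arguments.
type callShape struct {
	Kind     string
	Target   string
	Receiver string
	Args     []string
}

// classifyCall returns false for nil or non-call nodes, matching the
// nil and wrong-node guards mentioned in the test plan.
func classifyCall(call *node) (callShape, bool) {
	if call == nil || call.kind != "call_expression" {
		return callShape{}, false
	}
	fn := call.childByField("function")
	if fn == nil {
		return callShape{}, false
	}
	var info callShape
	switch fn.kind {
	case "field_expression": // obj.foo() or ptr->foo()
		info.Kind = "method-dot"
		if op := fn.childByField("operator"); op != nil && op.text == "->" {
			info.Kind = "method-arrow"
		}
		if recv := fn.childByField("argument"); recv != nil {
			info.Receiver = recv.text
		}
		if f := fn.childByField("field"); f != nil {
			info.Target = f.text
		}
	case "qualified_identifier": // std::move(x)
		info.Kind, info.Target = "qualified", fn.text
	default: // plain identifier: printf(x)
		info.Kind, info.Target = "free", fn.text
	}
	if args := call.childByField("arguments"); args != nil {
		for _, a := range args.children {
			info.Args = append(info.Args, a.text)
		}
	}
	return info, true
}

// newCall wraps a callee node and argument texts in a call_expression.
func newCall(callee *node, args ...string) *node {
	callee.field = "function"
	list := &node{kind: "argument_list", field: "arguments"}
	for _, a := range args {
		list.children = append(list.children, &node{kind: "identifier", text: a})
	}
	return &node{kind: "call_expression", children: []*node{callee, list}}
}

// member builds a field_expression such as obj.foo or ptr->foo.
func member(recv, op, name string) *node {
	return &node{kind: "field_expression", text: recv + op + name, children: []*node{
		{kind: "identifier", text: recv, field: "argument"},
		{kind: op, text: op, field: "operator"},
		{kind: "field_identifier", text: name, field: "field"},
	}}
}

func main() {
	calls := []*node{
		newCall(&node{kind: "identifier", text: "printf"}, "msg"),
		newCall(member("obj", ".", "foo")),
		newCall(member("ptr", "->", "foo"), "x"),
		newCall(&node{kind: "qualified_identifier", text: "std::move"}, "x"),
	}
	for _, c := range calls {
		info, _ := classifyCall(c)
		fmt.Printf("%-12s target=%s receiver=%q args=%v\n",
			info.Kind, info.Target, info.Receiver, info.Args)
	}
}
```

Switching on the callee's node kind keeps the C and C++ paths in one function: a C file never produces a qualified_identifier callee, so that branch is simply not reached.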

Why an innerDeclarator helper
Tree-sitter's C grammar exposes pointer_declarator → declarator (field), but the C++ reference_declarator node only has anonymous named children for & and the inner identifier. innerDeclarator tries the field-named child first and falls back to scanning named children, so the same declarator walker handles int**, char*, and const std::string& without per-grammar branching.

Test plan
- go build ./... : clean
- go vet ./... : clean
- golangci-lint run ./graph/... : 0 issues
- go test ./graph/... -count=1 : all pass (no regressions)
- TestExtractTypeString: 11 cases (primitives, pointers, refs, qualifiers, templates, qualified names) + nil guard
- TestExtractFunctionInfo: C/C++ definitions, void/typed/pointer returns, variadics, namespaced methods + nil guard
- TestExtractStructFields: populated and empty structs + nil guard
- TestExtractParameters: typed, variadic, unnamed, void, C++ const-ref + nil guard
- TestExtractCallInfo: free / dot / arrow / qualified shapes + nil and wrong-node guards
- TestIsCKeyword / TestIsCppKeyword: C89..C23 keywords, C++-only additions, non-keywords, and the C/C++ exclusivity boundary