feat(graph): C++ parser — classes, namespaces, templates, exception flow #671
Merged: shivasurya merged 1 commit into `main` on May 3, 2026.
SafeDep Report Summary: No dependency changes detected. Nothing to scan. (This report is generated by SafeDep Github App.)

Code Pathfinder Security Scan: No security issues detected. (Powered by Code Pathfinder.)

Codecov Report: ❌ Patch coverage is

Additional details and impacted files:

| | main | #671 | +/- |
|---|---|---|---|
| Coverage | 85.14% | 85.18% | +0.03% |
| Files | 177 | 178 | +1 |
| Lines | 25550 | 26064 | +514 |
| Hits | 21754 | 22202 | +448 |
| Misses | 2977 | 3012 | +35 |
| Partials | 819 | 850 | +31 |
Branch updated from 793a885 to 1a773d0.
Add the C++ AST → graph.Node converter. Builds on the C parser (parser_c.go)
and the shared graph/clike helpers; together they give a .cpp project a
fully populated CodeGraph with Language="cpp" on every node.
# parser_cpp.go (new, ~890 lines)
C and C++ live in separate files. Where the AST shape genuinely differs
(classes, namespaces, templates, throw/try, methods inside class bodies),
parser_cpp.go has its own dedicated parse functions. Where the AST shape
is identical (variable declarations, #include directives), the existing
parseCLikeDeclaration / parseCLikeInclude in parser_c.go are reused via
their isCpp flag — no duplication. Where the shape is similar but the
language tag differs (struct/enum/typedef/call), parser_cpp.go has its
own thin parse functions to keep semantics-by-language explicit.
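The reuse-by-flag idea can be sketched as follows. This is a minimal illustration, not the project's code: `Node`, `parseIncludeSketch`, the `"include"` node type, and the `"c"` tag for C files are all hypothetical; only the `isCpp` flag and the `"cpp"` tag come from the description above.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down stand-in for graph.Node.
type Node struct {
	Type     string
	Name     string
	Language string
}

// languageTag returns the tag a shared C-like helper stamps on its output.
// Assumption: C files are tagged "c"; the commit only states "cpp" for C++.
func languageTag(isCpp bool) string {
	if isCpp {
		return "cpp"
	}
	return "c"
}

// parseIncludeSketch illustrates the parseCLikeInclude idea: one function
// for an AST shape that is identical in C and C++, parameterised by isCpp.
// The "include" node type name is hypothetical.
func parseIncludeSketch(header string, isCpp bool) Node {
	return Node{Type: "include", Name: header, Language: languageTag(isCpp)}
}

func main() {
	fmt.Println(parseIncludeSketch("stdio.h", false).Language)   // c
	fmt.Println(parseIncludeSketch("buffer.hpp", true).Language) // cpp
}
```

Because the shape is identical, the only per-language difference is the tag, so a flag is cheaper than a duplicated function.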
C++-specific node types added:
- class_declaration (Name, SuperClass, Metadata[inheritance])
- method_declaration (inline class methods + out-of-line decls)
- field_declaration (class data members)
- namespace_definition (PackageName context for contained nodes)
- template_declaration (Metadata[template_params])
- ThrowStmt (Metadata[throw_expression])
- TryStmt (Metadata[catch_clauses] = []string)
Notable features:
- Pure virtual: pure_virtual_clause → Metadata[is_pure_virtual]
- Override: virtual_specifier "override" → Metadata[is_override]
- Virtual: "virtual" keyword child → Metadata[is_virtual]
- Destructors: ~ClassName parsed as method, Metadata[is_destructor]
- Multi-inheritance: Metadata[inheritance] = ["public Animal", "private Logger"]
- Anonymous namespaces: Name="" with PackageName empty
- Nested namespaces: PackageName builds outer::inner chain
- Access propagation: access_specifier siblings update class node's
Metadata[current_access]; subsequent fields/methods read it for Modifier
- Out-of-line method definitions: qualified declarator name kept as-is
(Foo::bar) — call-graph builder will link to the inline declaration
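A minimal sketch of how a nested-namespace `PackageName` chain such as `outer::inner` can be built while descending the AST. The helper names are hypothetical, and the treatment of an anonymous namespace nested inside a named one (it contributes no segment) is an assumption; the description above only states that an anonymous namespace has an empty name.

```go
package main

import "fmt"

// childPackageName returns the PackageName for nodes declared inside a
// namespace called name, whose enclosing PackageName is parent.
// Assumption: an anonymous namespace (name == "") adds no segment.
func childPackageName(parent, name string) string {
	switch {
	case name == "":
		return parent
	case parent == "":
		return name
	default:
		return parent + "::" + name
	}
}

// packageNameFor folds a stack of enclosing namespace names, outermost first,
// the way a recursive walk would as it descends into namespace_definition nodes.
func packageNameFor(stack []string) string {
	pkg := ""
	for _, ns := range stack {
		pkg = childPackageName(pkg, ns)
	}
	return pkg
}

func main() {
	fmt.Println(packageNameFor([]string{"outer", "inner"}))      // outer::inner
	fmt.Println(packageNameFor([]string{""}) == "")              // true
	fmt.Println(packageNameFor([]string{"mylib", "", "detail"})) // mylib::detail
}
```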
# parser.go (modified)
Two existing cases gained a C++ branch (function_definition, call_expression)
via tidy `switch {}` chains. New cases handle the C++-only node types
(class_specifier, namespace_definition, template_declaration,
throw_statement, try_statement, access_specifier). The struct_specifier,
enum_specifier, type_definition, declaration, preproc_include cases now
dispatch to the C-flavour or C++-flavour parse function based on file type.
field_declaration dispatch fixed: previously parseJavaVariableDeclaration
ran for every field_declaration regardless of language, producing
Java-tagged variable_declaration nodes inside C struct bodies. Now C
files skip the case (struct fields are extracted in parseCStructSpecifier
via clike), C++ files route to parseCppFieldDeclaration, Java files keep
their existing handling.
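The three-way `field_declaration` routing can be sketched as a lookup on file extension. The extension sets and `fieldDeclarationHandler` are hypothetical; the real dispatch goes through the project's own file-type checks, and the handler names are simply returned as strings here for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Hypothetical extension sets; the project's own file-type helpers may differ.
// In particular, this sketch treats ".h" as C, which is ambiguous in practice.
var (
	cExts   = map[string]bool{".c": true, ".h": true}
	cppExts = map[string]bool{".cpp": true, ".cc": true, ".cxx": true, ".hpp": true, ".hh": true}
)

// fieldDeclarationHandler names the handler a field_declaration node is routed
// to under the rules above; "" means the case is skipped.
func fieldDeclarationHandler(path string) string {
	ext := strings.ToLower(filepath.Ext(path))
	switch {
	case ext == ".java":
		return "parseJavaVariableDeclaration"
	case cppExts[ext]:
		return "parseCppFieldDeclaration"
	case cExts[ext]:
		// C struct fields are already extracted by parseCStructSpecifier.
		return ""
	default:
		return ""
	}
}

func main() {
	for _, p := range []string{"App.java", "src/example.cpp", "include/buffer.hpp", "net/socket.c"} {
		fmt.Printf("%-20s -> %q\n", p, fieldDeclarationHandler(p))
	}
}
```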
Java-only handlers gated by isJavaSourceFile: block, yield_statement,
if_statement, while_statement, do_statement, for_statement,
binary_expression, class_declaration, block_comment. These were running
on every file's AST and producing Java-tagged nodes for C/C++ code that
happens to share the same node type names. Each guard is a single-line
addition that fixes cross-language pollution without touching the Java
parser internals.
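A sketch of the gate, assuming a simple extension check stands in for `isJavaSourceFile` (the hypothetical `isJavaSourceFileSketch`). The gated node types are the ones listed above; anything outside that set passes through untouched, which is why the fix leaves the Java parser internals alone.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// javaOnlyNodeTypes are the node types whose handlers the commit message
// says are now wrapped in an isJavaSourceFile check.
var javaOnlyNodeTypes = map[string]bool{
	"block": true, "yield_statement": true, "if_statement": true,
	"while_statement": true, "do_statement": true, "for_statement": true,
	"binary_expression": true, "class_declaration": true, "block_comment": true,
}

// isJavaSourceFileSketch is a hypothetical stand-in for isJavaSourceFile.
func isJavaSourceFileSketch(path string) bool {
	return filepath.Ext(path) == ".java"
}

// gateAllows reports whether a handler for nodeType may run on path.
// Node types outside the gated set are not affected by this check.
func gateAllows(nodeType, path string) bool {
	if !javaOnlyNodeTypes[nodeType] {
		return true
	}
	return isJavaSourceFileSketch(path)
}

func main() {
	fmt.Println(gateAllows("if_statement", "App.java"))         // true
	fmt.Println(gateAllows("if_statement", "src/main.cpp"))     // false
	fmt.Println(gateAllows("block", "app.py"))                  // false
	fmt.Println(gateAllows("call_expression", "src/main.cpp")) // true
}
```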
# parser_c.go (touched)
parseCLikeDeclaration now delegates destructor-shaped declarations
(`~ClassName();` inside a class body) to
emitCppMethodDeclarationFromDeclaration in parser_cpp.go when in class
context. Previously these were emitted as function_definition with
is_declaration=true; they should be method_declaration. The dispatch is
a single class-context check guarded by the existing isCpp flag.
childrenByFieldName renamed to childDeclarators (only ever called with
"declarator" — unparam linter caught the dead generality).
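A hedged sketch of the destructor routing. `classifyFunctionDeclaration` and `memberKind` are hypothetical; the sketch assumes the caller has already established that the declarator is function-shaped and knows whether it sits inside a class body, and it assumes the resulting `method_declaration` keeps `is_declaration=true`.

```go
package main

import (
	"fmt"
	"strings"
)

// memberKind is a hypothetical summary of the node a declaration turns into.
type memberKind struct {
	NodeType      string
	IsDestructor  bool
	IsDeclaration bool
}

// classifyFunctionDeclaration sketches the routing for a function-shaped
// declaration (e.g. `~Animal();`) given its declarator name. Assumption: the
// destructor method_declaration also carries is_declaration=true.
func classifyFunctionDeclaration(declaratorName string, isCpp, inClass bool) memberKind {
	if isCpp && inClass && strings.HasPrefix(declaratorName, "~") {
		return memberKind{NodeType: "method_declaration", IsDestructor: true, IsDeclaration: true}
	}
	// Previous behaviour, still used for C and for declarations outside a class.
	return memberKind{NodeType: "function_definition", IsDeclaration: true}
}

func main() {
	fmt.Printf("%+v\n", classifyFunctionDeclaration("~Animal", true, true))
	fmt.Printf("%+v\n", classifyFunctionDeclaration("~Animal", false, true))
}
```

The check is guarded by both `isCpp` and class context, so the C path keeps its previous output.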
# Tests
parser_cpp_test.go::TestParseCppEndToEnd parses testdata/cpp/ as a real
C++ project via Initialize() and asserts every gap-analysis point from
the tech spec:
- class_declarations_with_inheritance (Dog : public Animal)
- namespace_propagates_to_classes (Dog.PackageName = "mylib")
- anonymous_namespace_has_no_name
- method_declarations_with_access_and_override (public/private + override)
- destructor_recognised_as_method (~Animal with is_destructor)
- class_field_declaration (Dog.age, public)
- template_parameters_recorded (typename T)
- throw_statement and try_statement_with_catch_types
- call_expressions_with_shapes (dot, qualified — arrow tested separately)
- scoped_enum_marked (enum class Color)
- typedef_recorded_with_cpp_language
- struct_with_cpp_language (Point)
- forward_declarations_in_header (buffer.hpp)
- regression_no_java_tagged_nodes_in_cpp_files
Plus targeted unit tests for the defensive paths (forward class
declaration, catch (...), nil template list, recordAccessSpecifier
outside class).
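The `regression_no_java_tagged_nodes_in_cpp_files` check boils down to a filter like the one below. This is a sketch with a hypothetical `Node` type carrying only `Type`, `File`, and `Language`; it assumes Java-tagged nodes carry `Language == "java"` and treats only `.cpp`/`.hpp` as C++ sources.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Node is a hypothetical stand-in carrying only the fields the check needs.
type Node struct {
	Type     string
	File     string
	Language string
}

// javaTaggedInCpp returns nodes from C++ sources that are tagged "java",
// i.e. the cross-language leak the isJavaSourceFile gates are meant to stop.
func javaTaggedInCpp(nodes []Node) []Node {
	var leaks []Node
	for _, n := range nodes {
		switch filepath.Ext(n.File) {
		case ".cpp", ".hpp":
			if n.Language == "java" {
				leaks = append(leaks, n)
			}
		}
	}
	return leaks
}

func main() {
	nodes := []Node{
		{Type: "class_declaration", File: "testdata/cpp/example.cpp", Language: "cpp"},
		{Type: "BlockStmt", File: "testdata/cpp/example.cpp", Language: "java"},
		{Type: "BlockStmt", File: "App.java", Language: "java"},
	}
	fmt.Println(len(javaTaggedInCpp(nodes))) // 1
}
```

A passing regression test asserts that this filter returns nothing for the parsed fixture.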
graph_test.go updated:
- TestInitializeWithNonJavaFiles: .cpp is now parsed (this PR's whole
point), so the assertion changed from "1 node" to "Java class +
C++ function both present"
- TestBuildGraphFromASTPython{FunctionDefinition,ClassDefinition}: the
Java BlockStmt leak that artificially inflated the Python node counts
is now fixed by the parser.go guards; expectedNodeCount values
reduced to match the now-correct reality. The test still verifies
function name, parameters, and isPython flag — what it actually
intends to assert.
Co-Authored-By: Claude <noreply@anthropic.com>
Branch updated from 1a773d0 to b1dd894.
shivasurya added a commit that referenced this pull request on May 3, 2026:
## Summary

Adds the FQN foundation for the C/C++ call-graph builder.

- **`graph/callgraph/core/c_module_types.go`**: `CModuleRegistry` (file prefix, includes, function index) and `CppModuleRegistry` (embeds the C registry plus namespace and class indices).
- **`graph/callgraph/registry/c_module.go`**: `BuildCModuleRegistry`, `BuildCppModuleRegistry`, and `BuildCIncludeMap` walk the parsed `CodeGraph` and produce read-only registries.

### FQN format

| Shape | Example |
|---|---|
| C function | `src/net/socket.c::connect_to_server` |
| C++ free function (namespaced) | `src/utils.cpp::mylib::process` |
| C++ class method | `src/socket.cpp::mylib::Socket::connect` |
| C++ class method (no namespace) | `src/app.cpp::App::run` |
| C++ free function (no namespace) | `src/main.cpp::main` |

### Include resolution order

For `#include "..."`, the first match wins:

1. Directory of the source file
2. `<projectRoot>/include/<header>`
3. `<projectRoot>/src/<header>`
4. `<projectRoot>/<header>`

System includes (`#include <...>`) are skipped; they are owned by a future stdlib registry.

### Design notes

- Method-to-class association uses byte-range containment within the same file. The registry never reads parser-internal context state, so it stays composable across future parser refactors.
- Files outside the project root (`..`-prefixed relative paths) are dropped at registry-build time.
- `FunctionIndex` deliberately preserves duplicates across header and source files so the call-graph builder can choose between declaration and definition.
- `appendUnique` dedupes per-key entries within a single file (defensive against repeated graph visits).
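The FQN shapes and the quoted-include search order can be sketched in a few lines. `buildFQN`, `resolveQuotedInclude`, and `regularFileExists` are hypothetical helpers, not the registry's API. The existence check is injected so the search order can be exercised without a real project on disk, and directories are rejected to mirror the directory-named-as-header defensive path. Filtering out system includes (`#include <...>`) is left to the caller.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// buildFQN joins the non-empty segments of a C/C++ symbol with "::",
// matching the shapes in the FQN table.
func buildFQN(relPath, namespace, class, name string) string {
	parts := []string{relPath}
	for _, p := range []string{namespace, class, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "::")
}

// resolveQuotedInclude returns the first existing candidate for
// `#include "header"` in the documented search order, or "" if none exists.
func resolveQuotedInclude(projectRoot, sourceFile, header string, exists func(string) bool) string {
	if header == "" {
		return ""
	}
	candidates := []string{
		filepath.Join(filepath.Dir(sourceFile), header),
		filepath.Join(projectRoot, "include", header),
		filepath.Join(projectRoot, "src", header),
		filepath.Join(projectRoot, header),
	}
	for _, c := range candidates {
		if exists(c) {
			return c
		}
	}
	return ""
}

// regularFileExists treats directories as non-matches, so a directory that
// happens to share a header's name is skipped.
func regularFileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

func main() {
	fmt.Println(buildFQN("src/socket.cpp", "mylib", "Socket", "connect"))
	fmt.Println(buildFQN("src/main.cpp", "", "", "main"))
	fmt.Println(resolveQuotedInclude(".", "src/main.cpp", "buffer.hpp", regularFileExists))
}
```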
## Test plan

- [x] `go build ./...`
- [x] `go test ./...`: full suite green (25 packages)
- [x] `go vet ./...`
- [x] `golangci-lint run ./graph/callgraph/registry/ ./graph/callgraph/core/`: 0 issues
- [x] Coverage: `core` 94.3%, `registry` 91.7% on the new files
- [x] 9 spec test cases covered: empty graph, files+functions, duplicate across files, duplicate same file, outside project root, namespace+class index, method-without-namespace, on-disk include resolution (4 search dirs + system + missing), language filter, plus defensive paths (orphan method, empty header name, directory-named-as-header).

## Stacked on `shiva/cpp-parser` (#671)
## Summary

Stacked on #670 (C parser).

Adds the C++ parser. After this PR a `.cpp` project produces a fully populated `CodeGraph` with `Language="cpp"` on every node, plus the C++-only constructs the security analysis layer needs: classes with inheritance, methods with access modifiers, namespaces (named and anonymous, including nesting), templates, and exception flow (throw/try/catch).
## Files

- `graph/parser_cpp.go`: new, ~890 lines. Class / namespace / template / field / throw / try / call / struct / enum / typedef. Uses `currentContext` to detect class membership and to propagate the namespace `PackageName` through the AST recursion.
- `graph/parser_cpp_test.go`: `TestParseCppEndToEnd` (15 sub-tests covering every gap-analysis point from the tech spec) plus four targeted unit tests for defensive paths.
- `graph/testdata/cpp/{example.cpp,buffer.hpp}`: single-project fixture exercising every C++ construct the parser handles.
- `graph/parser.go`: modified. Two existing cases gained a C++ branch (`function_definition`, `call_expression`); new cases handle C++-only node types; the existing `struct_specifier`/`enum_specifier`/`type_definition` cases now dispatch to the C-flavour or C++-flavour parse function based on file type. Java-only handlers (`block`, `if`/`while`/`do`/`for`, `yield`, `binary_expression`, `class_declaration`, `block_comment`) are gated by `isJavaSourceFile` to fix cross-language pollution that produced Java-tagged nodes inside C/C++ files.
- `graph/parser_c.go`: minor. `parseCLikeDeclaration` now routes destructor-shaped declarations to the C++ helper when in class context; `childrenByFieldName` is renamed to `childDeclarators` (the linter caught unused generality).
- `graph/graph_test.go`: two existing tests updated to reflect the now-correct reality: `.cpp` files are parsed (not ignored), and the Java `BlockStmt` leak that inflated Python-test node counts is fixed.

## Design choices
**`parser_c.go` and `parser_cpp.go` are independent.** Where the AST shape is genuinely identical (`declaration`, `preproc_include`), the existing C functions take an `isCpp` flag and emit the right `Language` tag. Where the shape differs (classes, namespaces, templates, throw/try, methods inside class bodies), each language has its own parse function. Where the shape is similar but C++ adds features (struct inheritance, scoped enums), the C++ flavour is a separate function so future C++ extensions don't ripple into the C path.
**`currentContext`.** `parseCppFunctionDefinition` detects class membership via `classFromContext(currentContext)` and emits `method_declaration` instead of `function_definition` when inside a class body. The same primitive disambiguates `field_declaration`: `int x;` becomes a data member, and `void bar();` becomes a method declaration with `is_declaration=true`.

**Access specifiers.** The AST places `access_specifier` as a sibling preceding the fields/methods it governs. `recordAccessSpecifier` mutates the class node's `Metadata[current_access]`; subsequent handlers read it. This avoids a separate AST pre-pass while keeping the graph nodes context-independent (each field/method carries its own `Modifier`).
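The access-propagation design can be illustrated with a small walk over a class body in source order. `classNode`, `child`, `recordAccess`, and `walkClassBody` are hypothetical. The default access (`private` for `class`, `public` for `struct`) is the standard C++ rule and is passed in by the caller here, since the design note does not say how the parser seeds it.

```go
package main

import "fmt"

// classNode is a hypothetical stand-in for the class graph node.
type classNode struct {
	Name     string
	Metadata map[string]any
}

// member is what a field/method node records about access.
type member struct {
	Name     string
	Modifier string
}

// child is one entry of a class body in source order.
type child struct {
	Kind string // "access_specifier", "field" or "method"
	Text string // access label or member name
}

// recordAccess mirrors the recordAccessSpecifier idea: remember the latest
// label on the class node. Outside a class there is nothing to record.
func recordAccess(cls *classNode, access string) {
	if cls == nil {
		return
	}
	cls.Metadata["current_access"] = access
}

// walkClassBody stamps each member with the access in force when it appears.
// defaultAccess is an assumption supplied by the caller.
func walkClassBody(cls *classNode, defaultAccess string, children []child) []member {
	var out []member
	for _, c := range children {
		if c.Kind == "access_specifier" {
			recordAccess(cls, c.Text)
			continue
		}
		access, ok := cls.Metadata["current_access"].(string)
		if !ok {
			access = defaultAccess
		}
		out = append(out, member{Name: c.Text, Modifier: access})
	}
	return out
}

func main() {
	dog := &classNode{Name: "Dog", Metadata: map[string]any{}}
	members := walkClassBody(dog, "private", []child{
		{"field", "secret"},
		{"access_specifier", "public"},
		{"method", "speak"},
		{"field", "age"},
	})
	fmt.Println(members) // [{secret private} {speak public} {age public}]
}
```

Each emitted member carries its own `Modifier`, so later passes never need to reconstruct the class context.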
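**Constants.** `nodeType*` and `meta*` constants are declared next to the parser that emits them. C++-only constants live in `parser_cpp.go`; shared constants stay in `parser_c.go`.

**Java-only handlers gated.** The `isJavaSourceFile` gates stop the Java handlers from emitting Java-tagged nodes for non-Java files. Each gate is a single-line `if isJavaSourceFile {}` wrap; no Java parser internals are touched.

## Test plan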
- `go build ./...`: clean
- `go vet ./...`: clean
- `golangci-lint run ./graph/...`: 0 issues
- `go test ./... -count=1`: all 25 packages pass, zero regressions
- `TestParseCppEndToEnd`: 15 sub-tests covering every gap-analysis point: inheritance, namespace propagation, anonymous namespaces, access + override + virtual + pure virtual, destructors, class fields, templates, throw/try with catch types, dot/arrow/qualified calls, scoped enums, typedefs, C++ structs, header forward declarations, and a regression check against Java-tagged nodes leaking into C++ files
- Targeted unit tests for defensive paths: forward class declaration, `catch (...)`, nil template list, `recordAccessSpecifier` outside class context