feat(graph): C++ parser — classes, namespaces, templates, exception flow #671

Merged: shivasurya merged 1 commit into main from shiva/cpp-parser on May 3, 2026.

@shivasurya
Owner

Summary

Stacked on #670 (C parser).

Adds the C++ parser. After this PR, a .cpp project produces a fully
populated CodeGraph with Language="cpp" on every node, plus the
C++-only constructs the security analysis layer needs: classes with
inheritance, methods with access modifiers, namespaces (named and
anonymous, including nesting), templates, and exception flow
(throw/try/catch).

Files

  • graph/parser_cpp.go — new, ~890 lines. Class / namespace /
    template / field / throw / try / call / struct / enum / typedef. Uses
    currentContext to detect class membership and propagate namespace
    PackageName through the AST recursion.
  • graph/parser_cpp_test.go — TestParseCppEndToEnd (15 sub-tests
    covering every gap-analysis point from the tech spec) plus four
    targeted unit tests for defensive paths.
  • graph/testdata/cpp/{example.cpp,buffer.hpp} — single-project
    fixture exercising every C++ construct the parser handles.
  • graph/parser.go — modified. Two existing cases gained a C++ branch
    (function_definition, call_expression); six new cases for
    C++-only node types; existing struct_specifier / enum_specifier /
    type_definition cases now dispatch to the C-flavour or C++-flavour
    parse function based on file type. Java-only handlers (block,
    if/while/do/for, yield, binary_expression, class_declaration,
    block_comment) gated by isJavaSourceFile to fix cross-language
    pollution that produced Java-tagged nodes inside C/C++ files.
  • graph/parser_c.go — minor: parseCLikeDeclaration now routes
    destructor-shaped declarations to the C++ helper when in class context;
    childrenByFieldName renamed to childDeclarators (linter caught
    unused generality).
  • graph/graph_test.go — two existing tests updated to reflect the
    now-correct reality: .cpp files are parsed (not ignored), and the
    Java BlockStmt leak that inflated Python-test node counts is fixed.

Design choices

  • Separate files per language. parser_c.go and parser_cpp.go are
    independent. Where the AST shape is genuinely identical
    (declaration, preproc_include), the existing C functions take an
    isCpp flag and emit the right Language tag. Where the shape differs
    (classes, namespaces, templates, throw/try, methods inside class
    bodies), each language has its own parse function. Where the shape is
    similar but C++ adds features (struct inheritance, scoped enums), the
    C++ flavour is a separate function so future C++ extensions don't
    ripple into the C path.
  • Method dispatch via currentContext. parseCppFunctionDefinition
    detects class membership via classFromContext(currentContext) and
    emits method_declaration instead of function_definition when
    inside a class body. Same primitive disambiguates field_declaration:
    int x; becomes a data member, void bar(); becomes a method
    declaration with is_declaration=true.
  • Access specifier as side-channel state. tree-sitter emits
    access_specifier as a sibling preceding the fields/methods it
    governs. recordAccessSpecifier mutates the class node's
    Metadata[current_access]; subsequent handlers read it. This avoids
    a separate AST pre-pass while keeping the graph nodes
    context-independent (each field/method carries its own Modifier).
  • Constants over magic strings. nodeType* and meta* declared
    next to the parser that emits them. C++-only constants live in
    parser_cpp.go; shared constants stay in parser_c.go.
  • Pre-existing bugs fixed. Java-only parsers were producing
    Java-tagged nodes for non-Java files. Each gate is a single-line
    if isJavaSourceFile {} wrap — no Java parser internals touched.
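
The method-dispatch and access-specifier bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in parser_cpp.go: the `Node` struct is a trimmed-down stand-in for graph.Node, `emitFunction` and `currentAccess` are hypothetical helpers, and the "private" fallback is an assumption based on the C++ default for class members.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down stand-in for graph.Node.
type Node struct {
	Type     string
	Name     string
	Modifier string
	Metadata map[string]any
}

// metaCurrentAccess mirrors the Metadata[current_access] key from the description.
const metaCurrentAccess = "current_access"

// recordAccessSpecifier remembers the latest access specifier on the class node.
// Outside a class context (nil class) it does nothing.
func recordAccessSpecifier(class *Node, access string) {
	if class == nil {
		return
	}
	if class.Metadata == nil {
		class.Metadata = map[string]any{}
	}
	class.Metadata[metaCurrentAccess] = access
}

// currentAccess reads the side-channel state. The "private" fallback is an
// assumption based on the C++ default for members of a class.
func currentAccess(class *Node) string {
	if class == nil {
		return ""
	}
	if access, ok := class.Metadata[metaCurrentAccess].(string); ok {
		return access
	}
	return "private"
}

// emitFunction picks the node type from the enclosing context and copies the
// access level onto the emitted node, so the node no longer depends on class state.
func emitFunction(class *Node, name string) Node {
	if class == nil {
		return Node{Type: "function_definition", Name: name}
	}
	return Node{Type: "method_declaration", Name: name, Modifier: currentAccess(class)}
}

func main() {
	dog := &Node{Type: "class_declaration", Name: "Dog"}
	recordAccessSpecifier(dog, "public")
	speak := emitFunction(dog, "speak")
	recordAccessSpecifier(dog, "private")
	helper := emitFunction(dog, "helper")
	free := emitFunction(nil, "main")

	fmt.Println(speak.Type, speak.Modifier)
	fmt.Println(helper.Type, helper.Modifier)
	fmt.Println(free.Type)
}
```

Copying the access level into `Modifier` at emit time is what keeps each graph node context-independent: later specifiers change the class state but not nodes already emitted.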

Test plan

  • go build ./... — clean
  • go vet ./... — clean
  • golangci-lint run ./graph/... — 0 issues
  • go test ./... -count=1 — all 25 packages pass, zero regressions
  • TestParseCppEndToEnd — 15 sub-tests covering every gap-analysis
    point: inheritance, namespace propagation, anonymous namespaces,
    access + override + virtual + pure virtual, destructors, class
    fields, templates, throw/try with catch types, dot/arrow/qualified
    calls, scoped enums, typedefs, C++ structs, header forward decls,
    and regression check against Java-tagged nodes leaking into C++
    files
  • Targeted unit tests: forward class declaration, catch (...),
    nil template list, recordAccessSpecifier outside class context

@shivasurya shivasurya self-assigned this May 2, 2026
@safedep

safedep Bot commented May 2, 2026

SafeDep Report Summary

Malicious packages: green · Vulnerable packages: green · Risky license: green

No dependency changes detected. Nothing to scan.

This report is generated by SafeDep Github App

@shivasurya shivasurya added enhancement New feature or request go Pull requests that update go code labels May 2, 2026
@github-actions

github-actions Bot commented May 2, 2026

Code Pathfinder Security Scan

Pass Critical High Medium Low Info

No security issues detected.

| Metric | Value |
|---|---|
| Files Scanned | 10 |
| Rules | 205 |

Powered by Code Pathfinder

@codecov

codecov Bot commented May 2, 2026

Codecov Report

❌ Patch coverage is 87.19852% with 69 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.18%. Comparing base (a7d99f7) to head (b1dd894).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sast-engine/graph/parser_cpp.go | 85.19% | 38 Missing and 31 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #671      +/-   ##
==========================================
+ Coverage   85.14%   85.18%   +0.03%     
==========================================
  Files         177      178       +1     
  Lines       25550    26064     +514     
==========================================
+ Hits        21754    22202     +448     
- Misses       2977     3012      +35     
- Partials      819      850      +31     

Owner Author

shivasurya commented May 3, 2026

Merge activity

  • May 3, 1:15 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 3, 1:21 PM UTC: Graphite rebased this pull request as part of a merge.
  • May 3, 1:22 PM UTC: @shivasurya merged this pull request with Graphite.

@shivasurya shivasurya changed the base branch from shiva/c-parser to graphite-base/671 May 3, 2026 13:19
@shivasurya shivasurya changed the base branch from graphite-base/671 to main May 3, 2026 13:20
Add the C++ AST → graph.Node converter. Builds on the C parser (parser_c.go)
and the shared graph/clike helpers; together they give a .cpp project a
fully populated CodeGraph with Language="cpp" on every node.

# parser_cpp.go (new, ~890 lines)

C and C++ live in separate files. Where the AST shape genuinely differs
(classes, namespaces, templates, throw/try, methods inside class bodies),
parser_cpp.go has its own dedicated parse functions. Where the AST shape
is identical (variable declarations, #include directives), the existing
parseCLikeDeclaration / parseCLikeInclude in parser_c.go are reused via
their isCpp flag — no duplication. Where the shape is similar but the
language tag differs (struct/enum/typedef/call), parser_cpp.go has its
own thin parse functions to keep semantics-by-language explicit.

C++-specific node types added:
- class_declaration       (Name, SuperClass, Metadata[inheritance])
- method_declaration      (inline class methods + out-of-line decls)
- field_declaration       (class data members)
- namespace_definition    (PackageName context for contained nodes)
- template_declaration    (Metadata[template_params])
- ThrowStmt               (Metadata[throw_expression])
- TryStmt                 (Metadata[catch_clauses] = []string)

Notable features:
- Pure virtual: pure_virtual_clause → Metadata[is_pure_virtual]
- Override: virtual_specifier "override" → Metadata[is_override]
- Virtual: "virtual" keyword child → Metadata[is_virtual]
- Destructors: ~ClassName parsed as method, Metadata[is_destructor]
- Multi-inheritance: Metadata[inheritance] = ["public Animal", "private Logger"]
- Anonymous namespaces: Name="" with PackageName empty
- Nested namespaces: PackageName builds outer::inner chain
- Access propagation: access_specifier siblings update class node's
  Metadata[current_access]; subsequent fields/methods read it for Modifier
- Out-of-line method definitions: qualified declarator name kept as-is
  (Foo::bar) — call-graph builder will link to the inline declaration
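
As a rough sketch of the namespace rules listed above, the function below derives a PackageName from the stack of enclosing namespace names. `packageName` is a hypothetical helper, not a function from parser_cpp.go, and the choice to let an anonymous namespace nested inside a named one contribute nothing to the chain is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// packageName joins the names of the enclosing namespaces, outermost first,
// into an outer::inner chain. Anonymous namespaces are represented by "" and
// are skipped (an assumption for the nested case).
func packageName(enclosing []string) string {
	named := make([]string, 0, len(enclosing))
	for _, ns := range enclosing {
		if ns != "" {
			named = append(named, ns)
		}
	}
	return strings.Join(named, "::")
}

func main() {
	fmt.Printf("%q\n", packageName([]string{"outer", "inner"})) // nested namespaces
	fmt.Printf("%q\n", packageName([]string{""}))               // anonymous namespace
	fmt.Printf("%q\n", packageName(nil))                        // global scope
}
```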

# parser.go (modified)

Two existing cases gained a C++ branch (function_definition, call_expression)
via tidy `switch {}` chains. Six new cases for C++-only node types
(class_specifier, namespace_definition, template_declaration,
throw_statement, try_statement, access_specifier). The struct_specifier,
enum_specifier, type_definition, declaration, preproc_include cases now
dispatch to the C-flavour or C++-flavour parse function based on file type.

field_declaration dispatch fixed: previously parseJavaVariableDeclaration
ran for every field_declaration regardless of language, producing
Java-tagged variable_declaration nodes inside C struct bodies. Now C
files skip the case (struct fields are extracted in parseCStructSpecifier
via clike), C++ files route to parseCppFieldDeclaration, Java files keep
their existing handling.
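
The routing described above might look roughly like this. It is a sketch under stated assumptions: `classify` and `fieldDeclarationHandler` are hypothetical names, the extension list is illustrative rather than the parser's actual list, and returning no handler for other languages is an assumption. The handler names returned as strings are the functions named in this commit message.

```go
package main

import (
	"fmt"
	"path/filepath"
)

type sourceKind int

const (
	kindOther sourceKind = iota
	kindJava
	kindC
	kindCpp
)

// classify maps a file path to a language by extension.
// The extension set here is an assumption for illustration.
func classify(path string) sourceKind {
	switch filepath.Ext(path) {
	case ".java":
		return kindJava
	case ".c":
		return kindC
	case ".cpp", ".cc", ".cxx", ".hpp":
		return kindCpp
	}
	return kindOther
}

// fieldDeclarationHandler names the parse function a field_declaration node
// is routed to, or "" when the case is skipped for that file.
func fieldDeclarationHandler(path string) string {
	switch classify(path) {
	case kindJava:
		return "parseJavaVariableDeclaration"
	case kindCpp:
		return "parseCppFieldDeclaration"
	default:
		// C files skip the case: struct fields come from parseCStructSpecifier.
		return ""
	}
}

func main() {
	for _, p := range []string{"src/Foo.java", "src/socket.cpp", "src/socket.c"} {
		fmt.Printf("%s -> %q\n", p, fieldDeclarationHandler(p))
	}
}
```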

Java-only handlers gated by isJavaSourceFile: block, yield_statement,
if_statement, while_statement, do_statement, for_statement,
binary_expression, class_declaration, block_comment. These were running
on every file's AST and producing Java-tagged nodes for C/C++ code that
happens to share the same node type names. Each guard is a single-line
addition that fixes cross-language pollution without touching the Java
parser internals.

# parser_c.go (touched)

parseCLikeDeclaration now delegates destructor-shaped declarations
(`~ClassName();` inside a class body) to
emitCppMethodDeclarationFromDeclaration in parser_cpp.go when in class
context. Previously these were emitted as function_definition with
is_declaration=true; they should be method_declaration. The dispatch is
a single class-context check guarded by the existing isCpp flag.
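
A simplified view of that check, assuming the declarator name is already available as a string (the real parser works on tree-sitter nodes, and `declarationNodeType` is a hypothetical helper):

```go
package main

import (
	"fmt"
	"strings"
)

// declarationNodeType returns the node type for a function-shaped declaration.
// Only a destructor-shaped name inside a C++ class body becomes a
// method_declaration; everything else keeps the previous behaviour.
func declarationNodeType(declaratorName string, isCpp, inClass bool) string {
	if isCpp && inClass && strings.HasPrefix(declaratorName, "~") {
		return "method_declaration"
	}
	return "function_definition"
}

func main() {
	fmt.Println(declarationNodeType("~Animal", true, true))
	fmt.Println(declarationNodeType("init", false, false))
}
```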

childrenByFieldName renamed to childDeclarators (only ever called with
"declarator" — unparam linter caught the dead generality).

# Tests

parser_cpp_test.go::TestParseCppEndToEnd parses testdata/cpp/ as a real
C++ project via Initialize() and asserts every gap-analysis point from
the tech spec:

- class_declarations_with_inheritance (Dog : public Animal)
- namespace_propagates_to_classes (Dog.PackageName = "mylib")
- anonymous_namespace_has_no_name
- method_declarations_with_access_and_override (public/private + override)
- destructor_recognised_as_method (~Animal with is_destructor)
- class_field_declaration (Dog.age, public)
- template_parameters_recorded (typename T)
- throw_statement and try_statement_with_catch_types
- call_expressions_with_shapes (dot, qualified — arrow tested separately)
- scoped_enum_marked (enum class Color)
- typedef_recorded_with_cpp_language
- struct_with_cpp_language (Point)
- forward_declarations_in_header (buffer.hpp)
- regression_no_java_tagged_nodes_in_cpp_files

Plus targeted unit tests for the defensive paths (forward class
declaration, catch (...), nil template list, recordAccessSpecifier
outside class).

graph_test.go updated:
- TestInitializeWithNonJavaFiles: .cpp is now parsed (this PR's whole
  point), so the assertion changed from "1 node" to "Java class +
  C++ function both present"
- TestBuildGraphFromASTPython{FunctionDefinition,ClassDefinition}: the
  Java BlockStmt leak that artificially inflated the Python node counts
  is now fixed by the parser.go guards; expectedNodeCount values
  reduced to match the now-correct reality. The test still verifies
  function name, parameters, and isPython flag — what it actually
  intends to assert.

Co-Authored-By: Claude <noreply@anthropic.com>
@shivasurya shivasurya merged commit cd00b49 into main May 3, 2026
6 checks passed
@shivasurya shivasurya deleted the shiva/cpp-parser branch May 3, 2026 13:22
shivasurya added a commit that referenced this pull request May 3, 2026
## Summary

Adds the FQN foundation for the C/C++ call-graph builder.

- **`graph/callgraph/core/c_module_types.go`** — `CModuleRegistry` (file-prefix, includes, function index) and `CppModuleRegistry` (embeds C registry plus namespace and class indices).
- **`graph/callgraph/registry/c_module.go`** — `BuildCModuleRegistry`, `BuildCppModuleRegistry`, and `BuildCIncludeMap` walk the parsed `CodeGraph` and produce read-only registries.

### FQN format

| Shape | Example |
|---|---|
| C function | `src/net/socket.c::connect_to_server` |
| C++ free function (namespaced) | `src/utils.cpp::mylib::process` |
| C++ class method | `src/socket.cpp::mylib::Socket::connect` |
| C++ class method (no namespace) | `src/app.cpp::App::run` |
| C++ free function (no namespace) | `src/main.cpp::main` |
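
A minimal sketch of how the shapes in this table could be assembled. `buildFQN` is a hypothetical helper for illustration and is not taken from `c_module.go`; it simply joins the non-empty parts with `::`.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFQN joins the project-relative file path with the optional namespace,
// optional class, and the function name, separated by "::".
func buildFQN(relPath, namespace, class, name string) string {
	parts := []string{relPath}
	for _, p := range []string{namespace, class, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "::")
}

func main() {
	fmt.Println(buildFQN("src/net/socket.c", "", "", "connect_to_server"))
	fmt.Println(buildFQN("src/socket.cpp", "mylib", "Socket", "connect"))
	fmt.Println(buildFQN("src/app.cpp", "", "App", "run"))
}
```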

### Include resolution order

For `#include "..."`, first match wins:

1. Directory of the source file
2. `<projectRoot>/include/<header>`
3. `<projectRoot>/src/<header>`
4. `<projectRoot>/<header>`

System includes (`#include <...>`) are skipped — they are owned by a future stdlib registry.

### Design notes

- Method-to-class association uses byte-range containment within the same file. The registry never reads parser-internal context state, so it stays composable across future parser refactors.
- Files outside the project root (`..`-prefixed relative paths) are dropped at registry-build time.
- `FunctionIndex` deliberately preserves duplicates across header and source files so the call-graph builder can choose between declaration and definition.
- `appendUnique` dedupes per-key entries within a single file (defensive against repeated graph visits).
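
To illustrate the byte-range containment note, here is a minimal sketch. The `span` type and `enclosingClass` are hypothetical, and preferring the innermost class when classes nest is an assumption. A method with no enclosing class in the same file is reported as an orphan.

```go
package main

import "fmt"

// span is a hypothetical record of where a class or method sits in a file.
type span struct {
	File       string
	Name       string
	Start, End uint32 // byte offsets, End exclusive
}

// enclosingClass returns the name of the smallest class in the same file whose
// byte range contains the method's byte range. ok is false for orphan methods.
func enclosingClass(method span, classes []span) (name string, ok bool) {
	best := -1
	for i, c := range classes {
		if c.File != method.File || c.Start > method.Start || method.End > c.End {
			continue
		}
		if best == -1 || c.End-c.Start < classes[best].End-classes[best].Start {
			best = i
		}
	}
	if best == -1 {
		return "", false
	}
	return classes[best].Name, true
}

func main() {
	classes := []span{
		{File: "src/socket.cpp", Name: "Socket", Start: 0, End: 400},
		{File: "src/socket.cpp", Name: "Options", Start: 50, End: 150},
	}
	method := span{File: "src/socket.cpp", Name: "connect", Start: 200, End: 260}
	fmt.Println(enclosingClass(method, classes))
}
```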

## Test plan

- [x] `go build ./...`
- [x] `go test ./...` — full suite green (25 packages)
- [x] `go vet ./...`
- [x] `golangci-lint run ./graph/callgraph/registry/ ./graph/callgraph/core/` — 0 issues
- [x] Coverage: `core` 94.3%, `registry` 91.7% on the new files
- [x] 9 spec test cases covered: empty graph, files+functions, duplicate across files, duplicate same file, outside project root, namespace+class index, method-without-namespace, on-disk include resolution (4 search dirs + system + missing), language filter, plus defensive paths (orphan method, empty header name, directory-named-as-header).

## Stacked on

`shiva/cpp-parser` (#671)