feat(parser): comprehensive PHP/Laravel support — fix infrastructure + Laravel semantic edges#252

Open
Minidoracat wants to merge 7 commits into tirth8205:main from Minidoracat:feat/php-laravel-support

Conversation

@Minidoracat
Contributor

@Minidoracat Minidoracat commented Apr 12, 2026

Summary

  • Fix PHP parsing infrastructure: _get_call_name(), _get_bases(), and _extract_import() all had no PHP-specific branches, making CALLS / INHERITS / IMPORTS edges completely non-functional for PHP codebases
  • Add Laravel semantic edges: Route→Controller CALLS, Eloquent relationship REFERENCES, Blade template directive IMPORTS_FROM, PSR-4 namespace resolution
  • Add language-scoped entry points: PHP-specific patterns (handle, boot, register, up/down) don't pollute other languages

Motivation

PHP is listed as a supported language, but the parser produced zero CALLS edges and zero INHERITS edges for PHP files. The root cause: tree-sitter-php uses name as the AST node type for identifiers (not identifier like other grammars), so _get_call_name() could never match PHP call expressions. Similarly, _get_bases() and _extract_import() had no PHP branches, falling through to defaults that produced no useful edges.
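A minimal sketch of the mismatch described above. It uses a hypothetical `Node` dataclass as a stand-in for a tree-sitter node, and `call_name` is an illustrative helper, not the actual `_get_call_name()` from `parser.py`:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node (type, text, children)."""
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def call_name(node: Node, language: str) -> Optional[str]:
    """Return the callee name of a call node, or None if no name child matches."""
    # tree-sitter-php labels identifiers `name`; other grammars use `identifier`.
    wanted = {"name"} if language == "php" else {"identifier"}
    for child in node.children:
        if child.type in wanted:
            return child.text
    return None


# A PHP call such as `strlen($x)`, reduced to the part that matters here.
php_call = Node("function_call_expression", children=[Node("name", "strlen")])
```

Looking only for `identifier` children returns `None` for `php_call`, which is how a PHP file can end up with zero CALLS edges.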

Tested on real Laravel 9, 12, and 13 projects:

| Metric | Laravel 9 (before → after) | Laravel 12 (before → after) | Laravel 13 (before → after) |
|---|---|---|---|
| CALLS | 4,962 → 35,771 (7.2x) | 0 → 9,369 | 25,773 → 27,008 (+1,235) |
| INHERITS | 0 → 346 | 0 → 481 | 0 → 49 |
| REFERENCES | 9 → 54 (6x) | 2 → 74 (37x) | 273 → 278 |
| TESTED_BY | 0 → 4,800 | 0 → 681 | 68 → 135 |
| Total edges | 13,525 → 49,527 (+266%) | 4,703 → 15,338 (+226%) | 35,455 → 36,813 (+3.8%) |

The Laravel 13 project also contains JS/TS frontend code (hence the non-zero baseline CALLS), but INHERITS still went from 0 to 49: Filament resource inheritance chains are now detected correctly.

All edges spot-checked for accuracy — Route→Controller mappings, Eloquent relationships, Filament resource inheritance, and Blade directives all correspond to real code relationships.

Changes

Phase 1 — PHP infrastructure fix (parser.py)

  • _get_call_name(): PHP-specific branches for 4 call expression types (function_call_expression, member_call_expression, scoped_call_expression, object_creation_expression)
  • _get_bases(): PHP branch for base_clause (extends) + class_interface_clause (implements)
  • _extract_import(): PHP branch handling simple, grouped (use Foo\{A, B}), and aliased imports
  • _CLASS_TYPES["php"]: add trait_declaration, enum_declaration
  • _CALL_TYPES["php"]: add scoped_call_expression, object_creation_expression
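To illustrate the three `use` forms listed for `_extract_import()`, here is a regex-based sketch. The PR itself walks the tree-sitter AST; `expand_use` and its return shape are assumptions made only for this example:

```python
import re
from typing import List, Tuple

_USE_RE = re.compile(r"^\s*use\s+(?:function\s+|const\s+)?(?P<body>[^;]+);")
_GROUP_RE = re.compile(r"^(?P<prefix>[\w\\]+)\\\{(?P<items>[^}]*)\}$")


def expand_use(statement: str) -> List[Tuple[str, str]]:
    """Return (fully qualified name, local alias) pairs for one `use` statement."""
    match = _USE_RE.match(statement)
    if not match:
        return []
    body = match.group("body").strip()
    group = _GROUP_RE.match(body)
    if group:
        # Grouped form: prepend the shared prefix to every item in the braces.
        prefix = group.group("prefix")
        items = [prefix + "\\" + item.strip()
                 for item in group.group("items").split(",") if item.strip()]
    else:
        items = [part.strip() for part in body.split(",") if part.strip()]
    pairs = []
    for item in items:
        # `Foo\Bar as Baz` binds an alias; otherwise the last segment is the local name.
        parts = re.split(r"\s+as\s+", item)
        name = parts[0].strip().lstrip("\\")
        alias = parts[1].strip() if len(parts) > 1 else name.rsplit("\\", 1)[-1]
        pairs.append((name, alias))
    return pairs
```

For example, `expand_use(r"use App\Models\{User, Post as P};")` yields one pair per grouped item, with `P` as the alias of `App\Models\Post`.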

Phase 2 — Entry points + Blade detection

  • flows.py: _LANG_ENTRY_NAME_PATTERNS dict for language-scoped patterns; _matches_entry_name() accepts optional language parameter
  • parser.py: detect_language() checks .blade.php compound extension before generic suffix lookup
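A rough sketch of both Phase 2 ideas. The tables below are hypothetical stand-ins for the real mappings in `parser.py` and `flows.py`, and the `"blade"` language label is an assumption; the PR text does not say what `detect_language()` returns for Blade files:

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical tables for illustration only.
EXTENSION_TO_LANGUAGE = {".php": "php", ".py": "python", ".ts": "typescript"}
UNIVERSAL_ENTRY_PATTERNS = [re.compile(r"^main$")]
LANG_ENTRY_PATTERNS = {
    "php": [re.compile(p) for p in (r"^boot$", r"^register$", r"^up$", r"^down$")],
}


def detect_language(path: str) -> Optional[str]:
    name = Path(path).name.lower()
    # Path.suffix is just ".php" for "home.blade.php", so check the
    # compound extension before the generic suffix lookup.
    if name.endswith(".blade.php"):
        return "blade"
    return EXTENSION_TO_LANGUAGE.get(Path(name).suffix)


def matches_entry_name(name: str, language: Optional[str] = None) -> bool:
    # Universal patterns always apply; language-scoped ones only when
    # the caller passes that language.
    patterns = list(UNIVERSAL_ENTRY_PATTERNS)
    patterns += LANG_ENTRY_PATTERNS.get(language or "", [])
    return any(p.search(name) for p in patterns)
```

With this shape, `matches_entry_name("boot", "php")` is true while `matches_entry_name("boot", "python")` is false, which is the "don't pollute other languages" property.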

Phase 3 — Laravel semantic edges (parser.py)

  • _extract_php_constructs(): Route definitions (Route::get('/path', [Controller::class, 'method'])) → CALLS edge to controller method
  • Detect Eloquent relationships (hasMany, belongsTo, etc. — 11 methods) → REFERENCES edge to target model
  • _php_class_from_class_access(): handles both short (Post::class) and FQCN (\App\Models\Post::class) forms
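The Phase 3 edges can be sketched with regular expressions. This is a simplification: the PR works on the AST, the helper names here are hypothetical, only four of the eleven relationship methods are listed, and both the `Controller::method` target format and the choice to return the short class name are assumptions:

```python
import re
from typing import List, Tuple

_ROUTE_RE = re.compile(
    r"Route::(?P<verb>get|post|put|patch|delete|options|any)\(\s*"
    r"(?P<q>['\"])(?P<uri>.*?)(?P=q)\s*,\s*"
    r"\[\s*(?P<cls>[\w\\]+::class)\s*,\s*['\"](?P<method>\w+)['\"]\s*\]"
)
_RELATION_RE = re.compile(
    r"\$this->(?P<rel>hasMany|hasOne|belongsToMany|belongsTo)\(\s*"
    r"(?P<cls>[\w\\]+::class)"
)


def class_from_class_access(expr: str) -> str:
    """Reduce `Post::class` or `\\App\\Models\\Post::class` to `Post`."""
    qualified = expr.strip()
    if qualified.endswith("::class"):
        qualified = qualified[: -len("::class")]
    return qualified.rsplit("\\", 1)[-1]


def laravel_edges(source: str) -> List[Tuple[str, str]]:
    """Return (edge kind, target) pairs found in a PHP source string."""
    edges = []
    for m in _ROUTE_RE.finditer(source):
        target = class_from_class_access(m.group("cls")) + "::" + m.group("method")
        edges.append(("CALLS", target))
    for m in _RELATION_RE.finditer(source):
        edges.append(("REFERENCES", class_from_class_access(m.group("cls"))))
    return edges
```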

Phase 4 — Blade templates + PSR-4 (parser.py)

  • _parse_blade(): regex-based extraction of @extends, @include, @component, @livewire as IMPORTS_FROM / REFERENCES edges
  • _find_php_composer_psr4(): resolve namespaces to file paths via composer.json autoload PSR-4 mappings with caching
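A sketch of the two Phase 4 pieces under stated assumptions. The directive-to-edge mapping in `_EDGE_KIND` is a guess (the PR only says the four directives become IMPORTS_FROM / REFERENCES edges), `resolve_psr4` takes the PSR-4 mapping as a plain dict instead of reading `composer.json`, and caching is omitted:

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Tuple

_BLADE_RE = re.compile(
    r"@(extends|include|component|livewire)\(\s*['\"]([^'\"]+)['\"]"
)
# Assumed mapping, for illustration only.
_EDGE_KIND = {
    "extends": "IMPORTS_FROM",
    "include": "IMPORTS_FROM",
    "component": "REFERENCES",
    "livewire": "REFERENCES",
}


def blade_edges(template: str) -> List[Tuple[str, str]]:
    """Return (edge kind, view or component name) pairs from a Blade template."""
    return [(_EDGE_KIND[d], target) for d, target in _BLADE_RE.findall(template)]


def resolve_psr4(fqcn: str, psr4: Dict[str, str]) -> Optional[str]:
    """Map a class name to a relative file path via PSR-4 prefix mappings."""
    fqcn = fqcn.lstrip("\\")
    # Longest prefix wins, so "App\\Models\\" would beat "App\\".
    for prefix in sorted(psr4, key=len, reverse=True):
        if fqcn.startswith(prefix):
            relative = fqcn[len(prefix):].replace("\\", "/") + ".php"
            return str(PurePosixPath(psr4[prefix]) / relative)
    return None
```

With the default Laravel mapping `{"App\\": "app/"}`, `resolve_psr4("App\\Models\\Post", ...)` gives `app/Models/Post.php`. Real `composer.json` files may also map one prefix to a list of directories, which this sketch does not handle.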

Docs (README.md)

  • Update flow detection limitation to include PHP/Laravel
  • Add "Framework-aware parsing" row to features table

Test plan

  • 26 new test methods across test_multilang.py (TestPHPParsing: 14, TestLaravelParsing: 5, TestBladeParsing: 6) and test_flows.py (1)
  • 761 total tests pass, 0 regressions (2 pre-existing async test failures unrelated to this PR)
  • ruff check clean
  • Verified on real Laravel 9 project: CALLS 4,962→35,771, INHERITS 0→346
  • Verified on real Laravel 12 project — HasMiddleware, PHP 8.1 Enums, Eloquent relationships
  • Verified on real Laravel 13 project — Filament resources, Route→Controller, INHERITS 0→49
  • Non-PHP languages unaffected — all PHP-specific code gated by language == "php" or in PHP-only methods

🤖 Generated with Claude Code

Minidoracat force-pushed the feat/php-laravel-support branch from 069bcf1 to 66a48c7 on April 14, 2026 15:09
@tirth8205
Owner

This PR now has merge conflicts with main after recent merges to parser.py and related files. Could you rebase on the latest main?

Minidoracat force-pushed the feat/php-laravel-support branch from 66a48c7 to 6f0b869 on April 18, 2026 21:20
@Minidoracat
Contributor Author

Rebased onto latest main (5360a6d). Resolved conflicts in code_review_graph/parser.py:

On top of the textual rebase, one additional test-only commit fix(tests): reconcile with upstream #298 PHP CALL format:

Verified:

  • uv run pytest tests/ — 907 passed, 2 xpassed
  • uv run ruff check on the files this PR touches — clean
  • uv run python -c 'from code_review_graph import parser; print("ok")' — imports clean

Ready for another look.

Minidoracat force-pushed the feat/php-laravel-support branch from 6f0b869 to f7b3687 on April 18, 2026 21:27
@Minidoracat
Contributor Author

Pushed another round:

  1. Re-rebased onto main (d7e61d8) after the recent merges: #321 (feat(embeddings): OpenAI-compatible embedding provider), #288 (fix: hooks.json schema validation fails with "Expected array, but received undefined"), #203 (fix: merge-based hook installation preserves existing user hooks), #98 (fix: CLI build/update/watch now run post-processing (signatures, FTS, flows, communities)), #306 (docs: add MCP tools documentation for code-review-graph), and #292 (fix: resolve Windows stdio JSON parsing errors and add Claude configuration docs). Clean this time, no conflicts.
  2. Codex post-rebase review caught that _get_call_name() had two PHP blocks. The block from upstream #298 (Fix PHP CALL extraction: add support for method, static, and unqualified function calls #297), which covers function/member/nullsafe/scoped calls, returns first. That made the feat-branch block unreachable for those node types and left only object_creation_expression live. The two blocks also disagreed on scoped-call formatting (:: vs .). Merged object_creation_expression into the upstream block with consistent _normalize_php_name handling and removed the shadowed duplicate (commit refactor(parser): remove shadowed PHP branch in _get_call_name).
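A toy reconstruction of the shadowing problem from item 2. The function names and signatures are hypothetical and much simpler than the real `_get_call_name()`; the nullsafe case and `_normalize_php_name` are left out:

```python
from typing import Optional

_UPSTREAM_TYPES = {
    "function_call_expression",
    "member_call_expression",
    "scoped_call_expression",
}


def call_name_shadowed(node_type: str, scope: str, name: str) -> Optional[str]:
    # First block (upstream style): handles these node types and returns.
    if node_type in _UPSTREAM_TYPES:
        return f"{scope}::{name}" if node_type == "scoped_call_expression" else name
    # Second block (feat-branch style): this arm can never run, because the
    # block above already returned for scoped_call_expression.
    if node_type == "scoped_call_expression":
        return f"{scope}.{name}"
    # The only arm of the second block that is still reachable.
    if node_type == "object_creation_expression":
        return name
    return None


def call_name_merged(node_type: str, scope: str, name: str) -> Optional[str]:
    # One block, one formatting rule for scoped calls.
    if node_type == "scoped_call_expression":
        return f"{scope}::{name}"
    if node_type in _UPSTREAM_TYPES or node_type == "object_creation_expression":
        return name
    return None
```

Both versions return the same values today, which is why the duplicate went unnoticed; the merged version just removes the dead `.`-formatting arm.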

uv run pytest tests/ — 973 passed, 2 xpassed.
uv run ruff check on touched files — clean.

Minidoracat force-pushed the feat/php-laravel-support branch from f7b3687 to f43c297 on April 18, 2026 21:30
@Minidoracat
Contributor Author

Minidoracat commented Apr 18, 2026

@tirth8205 heads up — CI is failing here (and on all other open PRs, and on main itself since 6d80d1a) because commit cc82d3f ("chore: resolve merge conflicts with main") accidentally added macOS Finder-style duplicate files — the filename 2.py / filename 3.py pattern Finder generates when you copy into a directory that already has that name. It looks like these slipped in during a manual conflict resolution on that branch, and the same pattern has been repeated in the subsequent chore: resolve merge conflicts commits on main (d1611e4, dfcbcc4, 186fa3c, cb9d25a).

Affected files:

  • Python modules (fail ruff N999 "Invalid module name" because of the space): code_review_graph/analysis 2.py, enrich 2.py, enrich 3.py, exports 2.py, exports 3.py, graph_diff 2.py, jedi_resolver 2.py, memory 2.py, memory 3.py, token_benchmark 2.py (~2,800 untested lines)
  • Binary coverage artifacts: .coverage 2, .coverage 3
  • Doc duplicates: AGENTS 2.md, GEMINI 2.md, README.ja-JP 2.md, README.ja-JP 3.md, README.zh-CN 2.md
  • Test fixtures: tests/fixtures/MarkdownMsg 2.tsx, sample 2.ex

Net effect on CI:

  • lint: 10 × N999 Invalid module name failures
  • test (3.10–3.13): coverage drops to 60% because the duplicated Python files count as untested code, which fails --cov-fail-under=65

My branch touches none of these; once they're removed from main this PR's CI will go green on the next run (no rebase needed on my side — the bad files are in my tree only because they're on main). Happy to open a separate cleanup PR if that's easier than patching main directly.
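For a cleanup PR, a small script along these lines could list the candidates. It is a sketch: `finder_duplicates` is a hypothetical helper, and the pattern (a space plus a number of 2 or more before the optional extension) will also flag legitimately named files such as `chapter 2.md`, so the output needs a manual check:

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, List

# Matches names like "enrich 2.py", ".coverage 3", "README.ja-JP 2.md".
_FINDER_DUP_RE = re.compile(
    r"^(?P<stem>.+) (?P<n>[2-9]|[1-9]\d+)(?P<ext>\.[^./ ]+)?$"
)


def finder_duplicates(paths: Iterable[str]) -> List[str]:
    """Return the paths whose file name looks like a Finder-style copy."""
    return [p for p in paths if _FINDER_DUP_RE.match(PurePosixPath(p).name)]
```

The input can be the output of `git ls-files`, split into lines.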

Minidoracat force-pushed the feat/php-laravel-support branch from f43c297 to 0967b7c on April 18, 2026 22:12
…ure + add Laravel semantic edges

PHP's core parsing infrastructure (CALLS, INHERITS, IMPORTS edges) was
completely non-functional because `_get_call_name()` could not match
tree-sitter-php's `name` node type, `_get_bases()` had no PHP branch,
and `_extract_import()` fell through to a raw-text fallback.

This commit fixes the PHP foundation and adds Laravel-specific semantic
analysis on top:

**Phase 1 — PHP infrastructure fix:**
- `_get_call_name()`: add PHP-specific branches for all 4 call expression
  types (function_call, member_call, scoped_call, object_creation)
- `_get_bases()`: add PHP branch for `base_clause` (extends) and
  `class_interface_clause` (implements)
- `_extract_import()`: add PHP branch handling simple, grouped, and
  aliased `use` statements with proper AST traversal
- `_CLASS_TYPES["php"]`: add `trait_declaration`, `enum_declaration`
- `_CALL_TYPES["php"]`: add `scoped_call_expression`,
  `object_creation_expression`

**Phase 2 — Entry points + Blade detection:**
- `_LANG_ENTRY_NAME_PATTERNS`: language-scoped entry-point patterns so
  PHP-specific names (handle, boot, register, up, down) don't pollute
  other languages
- `detect_language()`: handle `.blade.php` compound extension before
  the generic suffix lookup

**Phase 3 — Laravel semantic edges:**
- `_extract_php_constructs()`: detect Route definitions
  (`Route::get('/path', [Controller::class, 'method'])`) and emit CALLS
  edges to controller methods
- Detect Eloquent relationships (`hasMany`, `belongsTo`, etc.) and emit
  REFERENCES edges to target models
- `_php_class_from_class_access()`: correctly extract class names from
  both short (`Post::class`) and FQCN (`\App\Models\Post::class`) forms

**Phase 4 — Blade templates + PSR-4:**
- `_parse_blade()`: regex-based extraction of `@extends`, `@include`,
  `@component`, `@livewire` directives as IMPORTS_FROM/REFERENCES edges
- `_find_php_composer_psr4()`: resolve PHP namespaces to file paths via
  `composer.json` autoload PSR-4 mappings with caching

**Tested on real Laravel 9 and 12 projects:**
- CALLS edges: 0 → 9,369 (Laravel 12 project), 4,962 → 35,771 (Laravel 9)
- INHERITS edges: 0 → 481 / 0 → 346
- REFERENCES edges: 2 → 74 / 9 → 54
- Total edges: +226% / +266%

26 new tests covering all phases. 761 total tests pass, 0 regressions.

Update limitations section to reflect PHP/Laravel entry-point detection
and add framework-aware parsing row to the features table.

Sync zh-CN, ja-JP, ko-KR, hi-IN with the Framework-aware parsing
feature row added to the English README in the previous commit.

Upstream added ^handle$ to the universal _ENTRY_NAME_PATTERNS, so
'handle' now matches all languages — not just PHP. Narrow the negative
assertion to boot/register/up which remain PHP-specific.

After rebase onto upstream/main:

- `test_finds_static_calls`: upstream tirth8205#298 keeps `::` as the static-call
  separator instead of normalizing to `.`, so assert on `User::find`.
- Add `test_finds_calls_comprehensive` covering plain/member/nullsafe/
  scoped/global-namespaced extraction (the test upstream tirth8205#298 introduced
  in TestPHPParsing — rebase placed it inside TestBladeParsing by
  accident, where `sqlQuery`/`xl`/`text` aren't present in the Blade
  fixture).
- Remove unused `sources` local in `test_finds_inheritance` (F841).

Codex review flagged two PHP blocks in `_get_call_name()` after the
rebase: upstream tirth8205#298's block (function/member/nullsafe/scoped) runs
first and returns, making the later feat-branch block unreachable for
those node types.  The two blocks also disagreed on scoped-call
formatting (`::` vs `.`), which is exactly the sort of latent rebase
hazard that would bite the next editor.

- Merge the only live arm (`object_creation_expression`) into the
  upstream block with consistent `_normalize_php_name` handling.
- Delete the shadowed/duplicated PHP block entirely.

Upstream's current `main` no longer exports `_SHEBANG_PROBE_BYTES` /
`SHEBANG_INTERPRETER_TO_LANGUAGE` (tirth8205#276's shebang detection was
reverted in one of the chore merge commits).  An earlier pass of this
rebase preserved the shebang fallback in `detect_language`, which then
NameError'd at import time.

Narrow this PR's `detect_language` back to its actual scope — Blade
compound-extension check before the plain extension lookup.  Restoring
tirth8205#276 is upstream's job, not this branch's.

Minidoracat force-pushed the feat/php-laravel-support branch from 0967b7c to 267b067 on April 18, 2026 22:23