feat(parser): comprehensive PHP/Laravel support — fix infrastructure + Laravel semantic edges#252
Minidoracat wants to merge 7 commits into tirth8205:main
This PR now has merge conflicts with main after recent merges to parser.py and related files. Could you rebase on the latest main?
Rebased onto latest
On top of the textual rebase, one additional test-only commit
Verified:
Ready for another look.
Pushed another round:
@tirth8205 heads up — CI is failing here (and on all other open PRs, and on Affected files:
Net effect on CI:
My branch touches none of these; once they're removed from
…ure + add Laravel semantic edges
PHP's core parsing infrastructure (CALLS, INHERITS, IMPORTS edges) was
completely non-functional because `_get_call_name()` could not match
tree-sitter-php's `name` node type, `_get_bases()` had no PHP branch,
and `_extract_import()` fell through to a raw-text fallback.
This commit fixes the PHP foundation and adds Laravel-specific semantic
analysis on top:
**Phase 1 — PHP infrastructure fix:**
- `_get_call_name()`: add PHP-specific branches for all 4 call expression
types (function_call, member_call, scoped_call, object_creation)
- `_get_bases()`: add PHP branch for `base_clause` (extends) and
`class_interface_clause` (implements)
- `_extract_import()`: add PHP branch handling simple, grouped, and
aliased `use` statements with proper AST traversal
- `_CLASS_TYPES["php"]`: add `trait_declaration`, `enum_declaration`
- `_CALL_TYPES["php"]`: add `scoped_call_expression`,
`object_creation_expression`
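The grouped and aliased `use` handling in Phase 1 can be illustrated with a small text-based sketch. This is not the PR's implementation: `_extract_import()` walks the tree-sitter AST, whereas the hypothetical helper `expand_php_use()` below uses regular expressions purely to show which fully qualified names a single `use` statement should expand to.

```python
from __future__ import annotations

import re

# Hypothetical helper for illustration only; the PR's `_extract_import()`
# traverses the tree-sitter AST instead of matching source text.
_USE_RE = re.compile(r"^\s*use\s+(?:function\s+|const\s+)?(?P<body>.+?)\s*;\s*$")
_GROUP_RE = re.compile(r"^(?P<prefix>[\w\\]+\\)\{(?P<items>.*)\}$")


def expand_php_use(statement: str) -> list[tuple[str, str | None]]:
    """Return (fully qualified name, alias or None) pairs for one `use` line."""
    match = _USE_RE.match(statement)
    if match is None:
        return []
    body = match.group("body")

    # Grouped form: `use Foo\{A, B as C};` shares the prefix `Foo\`.
    prefix = ""
    group = _GROUP_RE.match(body)
    if group:
        prefix = group.group("prefix")
        items = group.group("items").split(",")
    else:
        items = body.split(",")

    results: list[tuple[str, str | None]] = []
    for item in items:
        item = item.strip()
        if not item:
            continue
        name, _, alias = item.partition(" as ")
        qualified = (prefix + name.strip()).lstrip("\\")
        results.append((qualified, alias.strip() or None))
    return results
```

For example, `expand_php_use("use Foo\\{A, B as C};")` yields one entry for `Foo\A` without an alias and one for `Foo\B` aliased to `C`, which is the set of names an IMPORTS edge extractor would need to see.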
**Phase 2 — Entry points + Blade detection:**
- `_LANG_ENTRY_NAME_PATTERNS`: language-scoped entry-point patterns so
PHP-specific names (handle, boot, register, up, down) don't pollute
other languages
- `detect_language()`: handle `.blade.php` compound extension before
the generic suffix lookup
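A minimal sketch of why the compound-extension check has to come first: `Path.suffix` only ever reports the last extension, so a Blade template would otherwise be classified as plain PHP. The extension table and the `"blade"` language identifier below are assumptions made for the example, not values taken from `parser.py`.

```python
from __future__ import annotations

from pathlib import Path

# Assumed, abbreviated mapping; the real table in parser.py covers more languages.
EXTENSION_TO_LANGUAGE = {".php": "php", ".py": "python", ".js": "javascript"}


def detect_language(path: str) -> str | None:
    """Map a file path to a language id, checking `.blade.php` before the suffix."""
    name = Path(path).name.lower()
    # Compound extension first: Path(path).suffix would only see ".php" here.
    if name.endswith(".blade.php"):
        return "blade"  # assumed identifier for Blade templates
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())
```

With this ordering, `resources/views/home.blade.php` is reported as Blade while `app/Models/User.php` still falls through to the generic lookup.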
**Phase 3 — Laravel semantic edges:**
- `_extract_php_constructs()`: detect Route definitions
(`Route::get('/path', [Controller::class, 'method'])`) and emit CALLS
edges to controller methods
- Detect Eloquent relationships (`hasMany`, `belongsTo`, etc.) and emit
REFERENCES edges to target models
- `_php_class_from_class_access()`: correctly extract class names from
both short (`Post::class`) and FQCN (`\App\Models\Post::class`) forms
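The short versus FQCN distinction can be sketched on plain strings. The real `_php_class_from_class_access()` operates on AST nodes and its return shape is not stated in this commit, so the hypothetical function below simply returns both the qualified and the short name to show what each form carries.

```python
from __future__ import annotations


def php_class_from_class_access(expr: str) -> tuple[str, str] | None:
    """Split `X::class` into (qualified name, short name).

    Hypothetical string-based stand-in for illustration; returns None when
    the expression is not a `::class` access.
    """
    expr = expr.strip()
    suffix = "::class"
    if not expr.endswith(suffix):
        return None
    # Drop the suffix and any leading backslash of a fully qualified name.
    qualified = expr[: -len(suffix)].lstrip("\\")
    if not qualified:
        return None
    short = qualified.rsplit("\\", 1)[-1]
    return qualified, short
```

Both `Post::class` and `\App\Models\Post::class` resolve to the short name `Post`, which is what lets a relationship such as `hasMany(Post::class)` and a fully qualified reference point at the same target model.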
**Phase 4 — Blade templates + PSR-4:**
- `_parse_blade()`: regex-based extraction of `@extends`, `@include`,
`@component`, `@livewire` directives as IMPORTS_FROM/REFERENCES edges
- `_find_php_composer_psr4()`: resolve PHP namespaces to file paths via
`composer.json` autoload PSR-4 mappings with caching
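As a rough illustration of the regex-based Blade extraction, the sketch below pulls the four directives named above out of a template. The pattern and the function name `extract_blade_directives()` are assumptions; the commit does not show `_parse_blade()`'s actual regexes, and the mapping from each directive to an IMPORTS_FROM or REFERENCES edge is deliberately left out because it is not specified here.

```python
from __future__ import annotations

import re

# Assumed pattern: directive name, opening paren, then a quoted first argument.
_DIRECTIVE_RE = re.compile(
    r"@(?P<directive>extends|include|component|livewire)\s*\(\s*['\"](?P<target>[^'\"]+)['\"]"
)


def extract_blade_directives(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, directive, target) for each matching directive."""
    found: list[tuple[int, str, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _DIRECTIVE_RE.finditer(line):
            found.append((lineno, match.group("directive"), match.group("target")))
    return found
```

Because the pattern requires `(` right after the directive name (optionally preceded by whitespace), variants such as `@includeIf(...)` are not matched by this sketch; whether the PR handles them is not stated.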
**Tested on real Laravel 9 and 12 projects:**
- CALLS edges: 0 → 9,369 (Laravel 12 project), 4,962 → 35,771 (Laravel 9)
- INHERITS edges: 0 → 481 / 0 → 346
- REFERENCES edges: 2 → 74 / 9 → 54
- Total edges: +226% / +266%
26 new tests covering all phases. 761 total tests pass, 0 regressions.
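The PSR-4 resolution from Phase 4 can be sketched as follows. This is a simplified approximation under stated assumptions, not the code of `_find_php_composer_psr4()`: the function names are hypothetical, only the `autoload` section is read (not `autoload-dev`), and caching is done with `functools.lru_cache` keyed on the project root.

```python
from __future__ import annotations

import json
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_psr4(project_root: str) -> tuple[tuple[str, str], ...]:
    """Read (namespace prefix, directory) pairs from composer.json, cached per root."""
    composer = Path(project_root) / "composer.json"
    if not composer.is_file():
        return ()
    data = json.loads(composer.read_text(encoding="utf-8"))
    mappings = data.get("autoload", {}).get("psr-4", {})
    pairs: list[tuple[str, str]] = []
    for prefix, dirs in mappings.items():
        # Composer allows a single directory or a list of directories per prefix.
        for directory in [dirs] if isinstance(dirs, str) else dirs:
            pairs.append((prefix, directory))
    # Longest prefix first so the most specific mapping wins.
    pairs.sort(key=lambda pair: len(pair[0]), reverse=True)
    return tuple(pairs)


def resolve_class(project_root: str, fqcn: str) -> Path | None:
    """Map a fully qualified class name to an existing file, or None."""
    fqcn = fqcn.lstrip("\\")
    for prefix, directory in load_psr4(project_root):
        if fqcn.startswith(prefix):
            relative = fqcn[len(prefix):].replace("\\", "/") + ".php"
            candidate = Path(project_root) / directory / relative
            if candidate.is_file():
                return candidate
    return None
```

With the conventional Laravel mapping `"App\\": "app/"`, the class `App\Models\Post` resolves to `app/Models/Post.php` under the project root when that file exists.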
Update limitations section to reflect PHP/Laravel entry-point detection and add framework-aware parsing row to the features table.
Sync zh-CN, ja-JP, ko-KR, hi-IN with the Framework-aware parsing feature row added to the English README in the previous commit.
Upstream added ^handle$ to the universal _ENTRY_NAME_PATTERNS, so 'handle' now matches all languages — not just PHP. Narrow the negative assertion to boot/register/up which remain PHP-specific.
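The interaction between the universal and language-scoped lists can be sketched like this. The pattern sets are abbreviated assumptions (only `^handle$` in the universal list and `boot`/`register`/`up`/`down` for PHP are grounded in the commits above; `^main$` is added purely for illustration), and `matches_entry_name()` is a stand-in for the real `_matches_entry_name()` in `flows.py`.

```python
from __future__ import annotations

import re

# Abbreviated, assumed pattern sets; the real lists in flows.py are longer.
_ENTRY_NAME_PATTERNS = [re.compile(p) for p in (r"^main$", r"^handle$")]
_LANG_ENTRY_NAME_PATTERNS = {
    "php": [re.compile(p) for p in (r"^boot$", r"^register$", r"^up$", r"^down$")],
}


def matches_entry_name(name: str, language: str | None = None) -> bool:
    """Check universal patterns, plus language-scoped ones when a language is given."""
    patterns = list(_ENTRY_NAME_PATTERNS)
    if language is not None:
        patterns += _LANG_ENTRY_NAME_PATTERNS.get(language, [])
    return any(pattern.search(name) for pattern in patterns)
```

Under these assumptions `handle` matches for every language, while `boot` matches only when `language="php"` is passed, which is why the negative assertion had to be narrowed to `boot`/`register`/`up`.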
After rebase onto upstream/main:
- `test_finds_static_calls`: upstream tirth8205#298 keeps `::` as the static-call separator instead of normalizing to `.`, so assert on `User::find`.
- Add `test_finds_calls_comprehensive` covering plain/member/nullsafe/scoped/global-namespaced extraction (the test upstream tirth8205#298 introduced in TestPHPParsing; the rebase placed it inside TestBladeParsing by accident, where `sqlQuery`/`xl`/`text` aren't present in the Blade fixture).
- Remove unused `sources` local in `test_finds_inheritance` (F841).
Codex review flagged two PHP blocks in `_get_call_name()` after the rebase: upstream tirth8205#298's block (function/member/nullsafe/scoped) runs first and returns, making the later feat-branch block unreachable for those node types. The two blocks also disagreed on scoped-call formatting (`::` vs `.`), which is exactly the sort of latent rebase hazard that would bite the next editor.
- Merge the only live arm (`object_creation_expression`) into the upstream block with consistent `_normalize_php_name` handling.
- Delete the shadowed/duplicated PHP block entirely.
Upstream's current `main` no longer exports `_SHEBANG_PROBE_BYTES` / `SHEBANG_INTERPRETER_TO_LANGUAGE` (tirth8205#276's shebang detection was reverted in one of the chore merge commits). An earlier pass of this rebase preserved the shebang fallback in `detect_language`, which then NameError'd at import time. Narrow this PR's `detect_language` back to its actual scope — Blade compound-extension check before the plain extension lookup. Restoring tirth8205#276 is upstream's job, not this branch's.
Summary
- `_get_call_name()`, `_get_bases()`, and `_extract_import()` all had no PHP-specific branches, making CALLS / INHERITS / IMPORTS edges completely non-functional for PHP codebases
- … (`handle`, `boot`, `register`, `up`/`down`) don't pollute other languages
Motivation
PHP is listed as a supported language, but the parser produced zero CALLS edges and zero INHERITS edges for PHP files. The root cause: tree-sitter-php uses `name` as the AST node type for identifiers (not `identifier` like other grammars), so `_get_call_name()` could never match PHP call expressions. Similarly, `_get_bases()` and `_extract_import()` had no PHP branches, falling through to defaults that produced no useful edges.
Tested on real Laravel 9, 12, and 13 projects:
All edges spot-checked for accuracy — Route→Controller mappings, Eloquent relationships, Filament resource inheritance, and Blade directives all correspond to real code relationships.
Changes
Phase 1: PHP infrastructure fix (`parser.py`)
- `_get_call_name()`: PHP-specific branches for 4 call expression types (`function_call_expression`, `member_call_expression`, `scoped_call_expression`, `object_creation_expression`)
- `_get_bases()`: PHP branch for `base_clause` (extends) + `class_interface_clause` (implements)
- `_extract_import()`: PHP branch handling simple, grouped (`use Foo\{A, B}`), and aliased imports
- `_CLASS_TYPES["php"]`: add `trait_declaration`, `enum_declaration`
- `_CALL_TYPES["php"]`: add `scoped_call_expression`, `object_creation_expression`
Phase 2: Entry points + Blade detection
- `flows.py`: `_LANG_ENTRY_NAME_PATTERNS` dict for language-scoped patterns; `_matches_entry_name()` accepts optional `language` parameter
- `parser.py`: `detect_language()` checks `.blade.php` compound extension before generic suffix lookup
Phase 3: Laravel semantic edges (`parser.py`)
- `_extract_php_constructs()`: Route definitions (`Route::get('/path', [Controller::class, 'method'])`) → CALLS edge to controller method
- Eloquent relationships (`hasMany`, `belongsTo`, etc.; 11 methods) → REFERENCES edge to target model
- `_php_class_from_class_access()`: handles both short (`Post::class`) and FQCN (`\App\Models\Post::class`) forms
Phase 4: Blade templates + PSR-4 (`parser.py`)
- `_parse_blade()`: regex-based extraction of `@extends`, `@include`, `@component`, `@livewire` as IMPORTS_FROM / REFERENCES edges
- `_find_php_composer_psr4()`: resolve namespaces to file paths via `composer.json` autoload PSR-4 mappings with caching
Docs (`README.md`)
Test plan
- … `test_multilang.py` (TestPHPParsing: 14, TestLaravelParsing: 5, TestBladeParsing: 6) and `test_flows.py` (1)
- `ruff check` clean
- … `language == "php"` or in PHP-only methods
🤖 Generated with Claude Code