Skip to content

Add warning for stray standalone is statements#431

Open
lwaern-intel wants to merge 3 commits intointel:mainfrom
lwaern-intel:lw/stray-is-check
Open

Add warning for stray standalone is statements#431
lwaern-intel wants to merge 3 commits intointel:mainfrom
lwaern-intel:lw/stray-is-check

Conversation

@lwaern-intel
Copy link
Copy Markdown
Contributor

@lwaern-intel lwaern-intel commented Mar 30, 2026

Depends on #430; only the latter two commits shown in this PR are relevant for review.

or
```
field f @ [31:0];
is read_only;
Copy link
Copy Markdown
Contributor Author

@lwaern-intel lwaern-intel Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as you could probably guess it's this second part that first led me down the lexpos rabbit hole. though when I realized it was also why all error messages have been screwed up since forever, it kinda got personal

@syssimics
Copy link
Copy Markdown
Contributor

DML Verification : 6: ❌ failure

@syssimics
Copy link
Copy Markdown
Contributor

DML Verification : 7: ❌ failure

@syssimics
Copy link
Copy Markdown
Contributor

PR Verification: ❌ failure

@syssimics
Copy link
Copy Markdown
Contributor

PR Verification: ✅ success

@mandolaerik
Copy link
Copy Markdown
Contributor

mandolaerik commented Mar 31, 2026

Dealing with lexical details from the parser has a somewhat unhygienic smell about it; perhaps a different approach might be to do an isolated check on lexer level, essentially the equivalent of re.search("field [^\n]*; *is"). Which admittedly also has a bad smell to it, but a different one.

I think the emptyprod fixup looks good, and can be merged separately first if you want to. oh you did that already

@syssimics
Copy link
Copy Markdown
Contributor

PR Verification: ✅ success

@syssimics
Copy link
Copy Markdown
Contributor

PR Verification: ✅ success

@lwaern-intel lwaern-intel force-pushed the lw/stray-is-check branch 2 times, most recently from 3659e2f to 67dc202 Compare March 31, 2026 18:15
@lwaern-intel
Copy link
Copy Markdown
Contributor Author

lwaern-intel commented Mar 31, 2026

@mandolaerik

Dealing with lexical details from the parser has a somewhat unhygienic smell about it; perhaps a different approach might be to do an isolated check on lexer level, essentially the equivalent of re.search("field [^\n]*; *is"). Which admittedly also has a bad smell to it, but a different one.

The rough_end_site bit makes me inclined to agree with you that this solution is dirty, but I'm not convinced that doing it on the lexer level would be any better or easier. In particular, I'm dubious towards a regex because whatever heuristic we use for the warning has to be very context-sensitive. Even with the simplest pattern -- everything being on the same line -- we have to be careful it could not possible match within a comment, or a method, etc. The second pattern -- an is being indented underneath -- seems nightmarish to get right.

Working on the ast level on the other hand makes is trivial to establish context and let us understand perfectly what situations the warning is emitted. That's kind of a killer benefit and seems hard to beat.

In addition I'll note this is not completely without precedence, as port_dml relies on the line and column number reported by the porting messages in order to perform its transformations.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves source-location accuracy in the DML parser (especially around empty productions) and adds a new warning to catch likely-mistaken standalone is statements that appear to be intended to modify the immediately preceding object declaration.

Changes:

  • Track the latest lexed token during parsing and fix up lexpos for empty productions to improve reported line/column locations.
  • Add warning WSTRAYIS plus parser-side detection to flag suspicious standalone is statements.
  • Add regression tests (DML error test + Python unit test) and document the behavior in release notes.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/1.4/errors/T_WSTRAYIS.dml Adds an errors/warnings test case covering the new WSTRAYIS warning.
RELEASENOTES.md Documents the lexpos fix impact and the new stray-is warning.
py/dml/toplevel.py Passes a custom tokenfunc into PLY parsing to enable latest-token tracking.
py/dml/messages.py Introduces the WSTRAYIS warning definition and message text.
py/dml/logging.py Adds colno to Site and adjusts FileInfo state serialization and column calculation.
py/dml/dmlparse.py Implements empty-production lexpos fixups and adds the stray-is detection pass.
py/dml/dmlparse_test.py Adds unit tests to enforce empty-production fixups and validate site/column behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@syssimics
Copy link
Copy Markdown
Contributor

PR Verification: ✅ success

@syssimics
Copy link
Copy Markdown
Contributor

PR Verification: ✅ success

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Previous iteration went with the stance of "as long as multiple declarations
are on the same line, assume SOMETHING is messed up, no matter what the
declarations are."
This is not necessarily a wrong stance to take, trouble is that in those cases
emitting `WSTRAYIS` runs a risk of hurting rather than helping; see the updated
tests.
for (i, stmt) in enumerate(body):
if stmt.kind == 'object':
(_, obj_type, _, _, obj_body) = stmt.args
rough_end_site = (obj_body and obj_body[-1].site) or stmt.site
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the if expression is the idiomatic construct you are looking for

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but i like abusing boolean operators!
bah, fine

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In truth, this was an artifact from my early development of this PR; i I had a reason to do it this way rather than if expressions, because this way I also catch if obj_body[-1].site is None. Since then, however, I realized that isn't possible in the cases stray_is_check is meant to be used.

if stmt.kind == 'object':
(_, obj_type, _, _, obj_body) = stmt.args
rough_end_site = (obj_body and obj_body[-1].site) or stmt.site
j = i+1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i + 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say each call to stray_is_check in the parser deserves one case here. I think this is the main cost of doing it in the parser instead of the lexer, and looks like it's a small cost. This convinces me that your approach is better than a lexer based, whose impl would probably get messier.

I think there is an intrinsic value in these test cases: it concretizes the things we warn for, and lets us contemplate if you can conceive use cases where a stray is can make sense.

One interesting example I can think of is:

connect x { interface y; is z; }

where I would say it is highly unlikely that is z is intended for the interface, assuming the code compiles (because the only templates that are applicable to both connect and interface are the generic ones like init and hard_reset, and these are seldom applicable on interface objects). So this is a way to trigger a WSTRAYIS that not really is helpful. (this is not intended as an argument against your PR, it is just an interesting corner to contemplate).

Still, you can argue that connect x is z { interface y; } is a more idiomatic way to phrase the same, so the warning is not really wrong. With that in mind, a different way to approach the original problem could be to stricten the style rule further, and enforce (by warning) that standalone is statements within a block should appear before subobject declarations. Which would be way more controversial warning, so it is clearly outside the scope of this PR.

@mandolaerik
Copy link
Copy Markdown
Contributor

I'm not convinced that doing it on the lexer level would be any better or easier. In particular, I'm dubious towards a regex because whatever heuristic we use for the warning has to be very context-sensitive. Even with the simplest pattern -- everything being on the same line -- we have to be careful it could not possible match within a comment, or a method, etc.

I was thinknig of analyzing the token stream directly. We do that sometimes in port_dml. This would in principle be equivalent with a very complex regexp, like "field followed by some tokens including one ;, followed directly by is, all on the same line". That would address concerns about comments etc., but it would still be somewhat dirty, and there is also local i_t *is; local field foo; is = NULL;, so your approach is probably better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants