
fix: reject Java optimizations with unused additions and unchanged target method#1947

Open
mashraf-222 wants to merge 6 commits into main from cf-1081-reject-unused-additions

@mashraf-222
Contributor

Problem

The AI optimizer sometimes generates "optimizations" that add new fields or helper methods to a Java class without changing the target method at all. Because benchmark noise produces small timing variations, these fake optimizations pass the speedup critic and create PRs with no real improvement.

Example: 4 commons-lang PRs each added private static final Supplier<String> NULL_SUPPLIER = Suppliers.nul(); but the target methods (getJavaAwtHeadless, getJavaIoTmpdir, etc.) were never modified to use it, yet the PRs reported speedups of 7–151%.

Root Cause

replace_function() in replacement.py accepts any optimization that changes the file, even if the target method body is identical to the original. The dedup check compares the entire candidate (function + helpers/fields), so adding a new field makes it "different" from the original, bypassing the identity check.
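To make the gap concrete, here is a minimal Python model (not the codeflash implementation) that represents a candidate as a dict of member name to source text. The names is_duplicate_candidate and target_unchanged are hypothetical, and the method bodies are placeholders rather than the real commons-lang code. The whole-candidate comparison sees the extra field and reports a difference, while a comparison restricted to the target method shows that nothing changed.

```python
# Candidates modelled as {member name: member source}. Bodies are placeholders,
# NOT the real commons-lang code.
ORIGINAL = {
    "getJavaIoTmpdir": "static String getJavaIoTmpdir() { return readProperty(KEY); }",
}
CANDIDATE = {
    # The only real change: a new field the target method never uses.
    "NULL_SUPPLIER": "private static final Supplier<String> NULL_SUPPLIER = Suppliers.nul();",
    # Target method re-emitted with different whitespace but identical tokens.
    "getJavaIoTmpdir": "static String getJavaIoTmpdir() {\n    return readProperty(KEY);\n}",
}


def normalize_ws(code: str) -> str:
    """Collapse every whitespace run to one space so formatting-only edits compare equal."""
    return " ".join(code.split())


def is_duplicate_candidate(original: dict[str, str], candidate: dict[str, str]) -> bool:
    """Whole-candidate check (modelled): any extra member makes the candidate 'different'."""
    return candidate == original


def target_unchanged(original: dict[str, str], candidate: dict[str, str], target: str) -> bool:
    """Target-only check: compares just the target method, whitespace-normalized."""
    return normalize_ws(original[target]) == normalize_ws(candidate[target])


print(is_duplicate_candidate(ORIGINAL, CANDIDATE))  # False: the dedup check lets it through
print(target_unchanged(ORIGINAL, CANDIDATE, "getJavaIoTmpdir"))  # True: the method never changed
```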

Fix

Added _has_unused_additions() in replacement.py that:

  1. Compares the target method body before and after optimization (whitespace-normalized)
  2. If unchanged AND new fields/helpers exist, checks if any addition identifiers are referenced in the target method body
  3. Rejects the candidate if additions are unreferenced, returning the original source unchanged

This causes replace_function_definitions_for_language() to return False (no update), which skips the candidate.
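The three steps can be sketched as follows. This is a simplified illustration, not the code in replacement.py: it assumes the caller has already extracted the target method body and the sets of declared member names before and after optimization, and the function names and parameters are hypothetical. The identifier check is a plain word-boundary regex, so unlike a parser-based check it also matches names inside comments or string literals and does not handle $ in Java identifiers.

```python
import re


def normalize_ws(code: str) -> str:
    """Collapse whitespace so formatting-only edits do not count as changes."""
    return " ".join(code.split())


def is_referenced(name: str, body: str) -> bool:
    """True if name appears in body as a whole word (simplified: regex, not a Java parser)."""
    return re.search(rf"\b{re.escape(name)}\b", body) is not None


def has_unused_additions(
    original_body: str,
    optimized_body: str,
    original_members: set[str],
    optimized_members: set[str],
) -> bool:
    """Hypothetical sketch of the three-step check; True means 'reject the candidate'."""
    # Step 1: a genuinely modified target method is never rejected by this check.
    if normalize_ws(original_body) != normalize_ws(optimized_body):
        return False
    # Step 2: only applies when the candidate introduced new fields/helpers.
    added = optimized_members - original_members
    if not added:
        return False
    # Step 3: reject only if no addition is referenced from the target body.
    return not any(is_referenced(name, optimized_body) for name in added)


body = "return readProperty(KEY);"
print(has_unused_additions(body, body, {"readProperty"}, {"readProperty", "NULL_SUPPLIER"}))  # True
```

Returning True corresponds to the rejection in step 3; a candidate whose target body changed, or that added no members, always passes this particular check.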

Validation

  • Unit tests: 37/37 pass (5 new + 32 existing, zero regressions)
  • E2E: Fibonacci optimization ran without false rejections — no "unreferenced" warnings in logs
  • Session: /home/ubuntu/e2e-sessions/2026-04-01_15-45_cf1081-unused-additions/

Test Coverage

New TestUnusedAdditionsRejection class with 5 tests:

  • test_unchanged_method_with_unused_field_rejected — unchanged method + unused field → rejected
  • test_unchanged_method_with_unused_helper_rejected — unchanged method + unused helper → rejected
  • test_changed_method_with_used_field_accepted — changed method + used field → accepted
  • test_changed_method_without_additions_accepted — normal optimization → accepted
  • test_unchanged_method_with_used_helper_accepted — method uses new helper → accepted

Closes CF-1081

…rget method

Adds a wiring check in replace_function() that detects when the AI generates
"optimizations" adding fields/helpers that the target method never references.
Previously these passed through because benchmark noise produced fake speedups.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Contributor

claude Bot commented Apr 1, 2026

Claude encountered an error.


I'll analyze this and get back to you.

- Remove 8 qualified_name="..." kwargs passed to FunctionToOptimize:
  qualified_name is a @property, not a constructor field. Pydantic silently
  accepts it at runtime but mypy strict mode (prek hook) rejects it.
- Add -> None return annotations and missing Path / JavaSupport parameter
  annotations to every test method + fixture in test_replacement.py so the
  prek mypy hook passes when the file is in the CI diff.

Rich renders the banner panel with box-drawing characters (╭, ╮, │, etc.)
that cp1252 cannot decode. On Windows, subprocess.run(..., text=True) uses
cp1252 by default, so decoding the child stdout raises UnicodeDecodeError
and subprocess sets result.stdout to None — breaking the assertion with a
misleading "argument of type 'NoneType' is not iterable".

Pass encoding="utf-8" explicitly so the test passes on every platform.
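A portable pattern for this fix, as a sketch: decode the child's stdout explicitly as UTF-8 instead of relying on the locale default that text=True uses. The helper name run_child_utf8 is hypothetical, and setting PYTHONIOENCODING=utf-8 for the child is an extra assumption of this sketch (so the demo child also writes UTF-8 on every platform); the commit itself only adds encoding="utf-8" to the subprocess calls.

```python
import os
import subprocess
import sys


def run_child_utf8(child_code: str) -> str:
    """Run a Python child process and decode its stdout as UTF-8 on every platform."""
    # Sketch-only assumption: force the child to *write* UTF-8 as well, so the
    # bytes we decode below really are UTF-8 regardless of the OS codepage.
    env = {**os.environ, "PYTHONIOENCODING": "utf-8"}
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,
        encoding="utf-8",  # explicit decode instead of the locale default (often cp1252 on Windows)
        env=env,
        check=True,
    )
    return result.stdout


# Box-drawing characters like the ones Rich uses for panel borders, written as escapes
banner = run_child_utf8(r"print('\u256d\u2500\u256e')")
print(banner.strip() == "\u256d\u2500\u256e")  # True
```

Passing encoding= already puts subprocess.run into text mode, so text=True becomes redundant once the encoding is explicit.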
@mashraf-222
Contributor Author

Review

Bug premise verified — real. On current main, replace_function in codeflash/languages/java/replacement.py accepts candidates where the target method body is byte-identical to the original but new fields/helpers were added (the commons-lang NULL_SUPPLIER pattern). The whole-file diff check at codeflash/languages/code_replacer.py:153 can't catch this because the field insertion still makes the file "different". CF-1081 is a genuine gap.

Fix is architecturally correct. _has_unused_additions is placed at the right point in replace_function — after target-method resolution, before _insert_class_members mutates source — matching the existing "return source → candidate rejected" pattern used elsewhere in the same file. 37/37 tests pass locally.
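The "return source → candidate rejected" convention can be illustrated with a small sketch. Everything here is hypothetical: the real replace_function takes different arguments and performs actual Java rewriting, and only the control flow is modelled. Because the guard runs before any mutation, a rejection returns the source exactly as it came in, and the caller turns "output equals input" into a False (no update) result.

```python
from typing import Callable

# Hypothetical signature: (source, candidate) -> True if the candidate should be rejected.
RejectFn = Callable[[str, str], bool]


def replace_function(source: str, candidate: str, should_reject: RejectFn) -> str:
    """Sketch of the guard placement; names and signature are hypothetical."""
    # ... target-method resolution would happen here ...
    if should_reject(source, candidate):
        # Bail out before any mutation: returning source unchanged is the
        # signal that the candidate was rejected.
        return source
    # Stand-in for the real work (inserting class members, replacing the method body).
    return candidate


def replace_definitions(source: str, candidate: str, should_reject: RejectFn) -> bool:
    """Caller side: 'no update' is detected by comparing the output with the input."""
    return replace_function(source, candidate, should_reject) != source


print(replace_definitions("class A {}", "class A { int x; }", lambda s, c: True))   # False: skipped
print(replace_definitions("class A {}", "class A { int x; }", lambda s, c: False))  # True: updated
```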

CI blockers addressed in the last two commits:

  1. prek / prek was failing on 4 FunctionToOptimize(..., qualified_name="...") calls. qualified_name is a read-only @property, not a constructor field: pydantic silently dropped it at runtime, but mypy strict mode (now enforced via the prek mypy hook) rejects it. Fixed by removing the kwarg. Also added -> None annotations to every test method and the java_support fixture so the file is mypy-clean when it lands in the CI diff.
  2. unit-tests (windows-latest, 3.13): test_help_banner.py was already failing on main, and this PR inherited the failure. Rich renders box-drawing characters that the Windows cp1252 codepage can't decode, so subprocess.run(text=True) returned stdout=None. Added encoding="utf-8" to both subprocess calls (the same fix proposed against main in PR #2120, "fix: decode help-banner test subprocess output as UTF-8").

All other failing checks (e2e-java/python 500/504, snyk quota) are infra flakes unrelated to this PR.

Relationship to PR #1950 (CF-1084): both close the same class of bug (AI-generated candidate leaves the target untouched). #1947 catches the "additions-with-no-body-reference" case; #1950 catches the "class member mod with untouched target" case — including constructors, which this PR doesn't. They are complementary, not duplicates; recommend merging both.

Ready for re-review.
