
@LalatenduMohanty
Member

LalatenduMohanty commented May 16, 2025

Replaces the Python email library parser with packaging.metadata.Metadata for parsing wheel/package metadata.

Fixes #561
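
For context, here is a minimal sketch that contrasts the two approaches on an in-memory METADATA document. The sample text and the function names are illustrative and are not fromager code. It assumes a packaging release that provides Metadata.from_email (23.2 or newer).

```python
from email.parser import BytesParser
from io import BytesIO

from packaging.metadata import Metadata
from packaging.version import Version

# Illustrative METADATA contents (made up for this sketch).
METADATA = b"""\
Metadata-Version: 2.1
Name: example-pkg
Version: 1.0.post1
Requires-Dist: requests>=2
"""


def version_via_email(data: bytes) -> Version:
    # Old approach: the email parser only returns header strings,
    # so each field has to be converted (and validated) by hand.
    headers = BytesParser().parse(BytesIO(data), headersonly=True)
    return Version(headers["Version"])


def version_via_packaging(data: bytes) -> Version:
    # New approach: Metadata parses the same format and exposes typed
    # attributes (Version, list[Requirement], ...) after validation.
    return Metadata.from_email(data).version


print(version_via_email(METADATA), version_via_packaging(METADATA))
```

Both functions return the same Version object. The packaging variant also gives typed access to other fields, such as requires_dist.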

LalatenduMohanty force-pushed the issue_561 branch 2 times, most recently from 8bd775c to 1247975 on May 19, 2025 10:45
LalatenduMohanty requested a review from a team as a code owner on July 4, 2025 18:07
LalatenduMohanty changed the title "[WIP] Replacing the metadata parser from packaging.metadata" to "Replacing the metadata parser from packaging.metadata" on Jul 4, 2025
LalatenduMohanty force-pushed the issue_561 branch 2 times, most recently from cb3813b to bdada13 on July 4, 2025 18:35
@LalatenduMohanty
Member Author

@tiran @dhellmann PTAL

Member

dhellmann left a comment

This looks good to me. How many places in the code do we do something similar to parse metadata? How useful would it be to have a function that takes a path and returns the metadata?
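
Here is one possible shape for such a helper. It assumes callers already have the path of an extracted METADATA (or PKG-INFO) file. read_metadata is a hypothetical name and not an existing fromager function.

```python
from pathlib import Path

from packaging.metadata import Metadata


def read_metadata(path: Path, *, validate: bool = True) -> Metadata:
    """Parse a METADATA or PKG-INFO file (hypothetical helper).

    Reading bytes leaves decoding to packaging, which expects UTF-8,
    instead of relying on the platform's default text encoding.
    """
    return Metadata.from_email(path.read_bytes(), validate=validate)
```

Exposing validate as a keyword lets callers opt out of strict checking for packages with slightly inconsistent metadata.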

@tiran
Collaborator

tiran commented Jul 7, 2025

packaging.metadata.parse_email does not round-trip and loses any field that it does not understand. Is this okay? Do we need a round-trip-safe function?
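
The sketch below shows what the round-trip concern means in practice, using a made-up X-Custom-Field header. parse_email() returns a (raw, unparsed) pair. Anything it does not recognize lands in unparsed, which Metadata.from_raw() never looks at.

```python
from packaging.metadata import Metadata, parse_email

# Made-up METADATA with one header that is not a core metadata field.
DATA = """\
Metadata-Version: 2.1
Name: demo
Version: 1.0
X-Custom-Field: some value
"""

raw, unparsed = parse_email(DATA)

# Unknown headers are returned separately from the recognized fields ...
print("unparsed:", unparsed)

# ... and only ``raw`` becomes a Metadata object, so the custom field is
# absent from it and nothing built from it can reproduce the original file.
md = Metadata.from_raw(raw)
print(md.name, md.version)
```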

@LalatenduMohanty
Member Author

> How many places in the code do we do something similar to parse metadata? How useful would it be to have a function that takes a path and returns the metadata?

My bad. I should have checked the whole codebase to see whether the same pattern exists elsewhere. I can see it at https://github.com/python-wheel-build/fromager/blob/main/src/fromager/candidate.py#L82. I do not think we need a common function yet. PTAL and let me know.

@LalatenduMohanty
Member Author

> packaging.metadata.parse_email does not round-trip and loses any field that it does not understand. Is this okay? Do we need a round-trip-safe function?

Let me get back to you on this.

LalatenduMohanty force-pushed the issue_561 branch 3 times, most recently from 6e992d6 to a1a70e7 on July 7, 2025 19:42
@LalatenduMohanty
Member Author

@tiran Since fromager only reads metadata for dependency resolution and never writes it back, round-trip safety isn't necessary. The type safety and validation from packaging.metadata outweigh the loss of unknown fields, which fromager doesn't use anyway.
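
To illustrate the read-only, dependency-resolution use case, here is a sketch that relies only on typed fields. requires_dist is a list of packaging.requirements.Requirement objects whose markers can be evaluated directly. requirements_for is a hypothetical helper. Real resolution would also need to evaluate markers for the target environment rather than the running interpreter's defaults.

```python
from collections.abc import Iterable

from packaging.metadata import Metadata

# Made-up METADATA with one unconditional and one extra-only dependency.
DATA = """\
Metadata-Version: 2.1
Name: demo
Version: 1.0
Provides-Extra: test
Requires-Dist: requests>=2
Requires-Dist: pytest; extra == "test"
"""


def requirements_for(md: Metadata, extras: Iterable[str] = ()) -> list[str]:
    """Return names of the requirements that apply for the given extras."""
    # "" represents installing without any extra.
    candidates = ["", *extras]
    names = []
    for req in md.requires_dist or []:
        # Markers are evaluated against the running interpreter's
        # environment, with "extra" overridden per candidate.
        if req.marker is None or any(
            req.marker.evaluate({"extra": extra}) for extra in candidates
        ):
            names.append(req.name)
    return names


md = Metadata.from_email(DATA)
print(requirements_for(md), requirements_for(md, ["test"]))
```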

Contributor

rd4398 left a comment

Looks good to me! I will wait for @tiran to approve, since he had clarification questions.

Comment on lines 787 to 788
raw_metadata, _ = parse_email(f.read())
metadata = Metadata.from_raw(raw_metadata)
Collaborator

Why are you using parse_email() + Metadata.from_raw() instead of Metadata.from_email()? Metadata.from_email() combines parse_email(), Metadata.from_raw(), and additional validation.

This code should probably use fromager.dependencies.parse_metadata(metadata_filename).
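
The sketch below shows the difference the reviewer describes, using a made-up unknown header. The two-step form quietly discards the header. Metadata.from_email(), with its default validate=True, also reports unparsed or unrecognized fields as an ExceptionGroup. Passing validate=False turns those checks off.

```python
from packaging.metadata import Metadata, parse_email

DATA = """\
Metadata-Version: 2.1
Name: demo
Version: 1.0
X-Unknown: something
"""

# Two-step form used in the diff: the unparsed dict is thrown away.
raw, _ = parse_email(DATA)
two_step = Metadata.from_raw(raw)

# One-step form: validation also covers headers parse_email() could not map.
try:
    Metadata.from_email(DATA)
    strict_error = None
except Exception as exc:  # ExceptionGroup (packaging's own class before 3.11)
    strict_error = exc

# Opting out of validation behaves like the two-step form here.
lenient = Metadata.from_email(DATA, validate=False)
print(type(strict_error).__name__, two_step.version, lenient.version)
```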

@LalatenduMohanty
Member Author

@tiran PTAL when you have a chance

LalatenduMohanty force-pushed the issue_561 branch 2 times, most recently from 5554a83 to c8ee75b on January 26, 2026 15:14
LalatenduMohanty force-pushed the issue_561 branch 4 times, most recently from b5ab037 to 7e8504b on January 27, 2026 04:17
Comment on lines 478 to 481
wheel_name_parts = wheel_filename.stem.split("-")
dist_name = wheel_name_parts[0]
dist_version = wheel_name_parts[1]
predicted_dist_info = f"{dist_name}-{dist_version}.dist-info/METADATA"
Collaborator

Let's not invent our own wheel parsing algorithm. The function add_extra_metadata_to_wheels has some code to get the dist-info directory of a wheel file. Perhaps move the code into a common, shared helper?

Member Author

Good catch. I am going to use parse_wheel_filename(), i.e. the same underlying function that add_extra_metadata_to_wheels uses.
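
Here is a sketch of that idea. It assumes the wheel follows the standard <name>-<version>.dist-info layout. parse_wheel_filename() from packaging.utils supplies the normalized name and version, and the archive listing is searched for a matching METADATA entry. find_metadata_member is a hypothetical helper and not the code from add_extra_metadata_to_wheels.

```python
import zipfile
from pathlib import Path

from packaging.utils import canonicalize_name, parse_wheel_filename
from packaging.version import InvalidVersion, Version


def find_metadata_member(wheel_path: Path) -> str:
    """Locate ``<name>-<version>.dist-info/METADATA`` inside a wheel.

    Hypothetical helper: name and version come from parse_wheel_filename()
    instead of splitting the filename on "-" by hand.
    """
    name, version, _build, _tags = parse_wheel_filename(wheel_path.name)
    with zipfile.ZipFile(wheel_path) as zf:
        for member in zf.namelist():
            top, _, rest = member.partition("/")
            if rest != "METADATA" or not top.endswith(".dist-info"):
                continue
            # "my_pkg-1.0.dist-info" -> ("my_pkg", "1.0")
            dist_name, _, dist_version = top.removesuffix(".dist-info").rpartition("-")
            try:
                same_version = Version(dist_version) == version
            except InvalidVersion:
                continue
            # Compare normalized names, since the directory may spell the
            # name differently (case, "_" vs "-") than the wheel filename.
            if canonicalize_name(dist_name) == name and same_version:
                return member
    raise FileNotFoundError(f"no .dist-info/METADATA for {name} {version} in {wheel_path}")
```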

- p = BytesParser()
- metadata = p.parse(f, headersonly=True)
- return Version(metadata["Version"])
+ metadata = dependencies.parse_metadata(metadata_filename)
Collaborator

parse_metadata validates the metadata by default. This will raise an exception if the metadata version does not match the metadata content, e.g. a License-File field in metadata whose Metadata-Version is older than 2.4.

Member Author

We should pass validate=False here, i.e. metadata = dependencies.parse_metadata(metadata_filename, validate=False).
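
The effect of that flag can be sketched with packaging directly. This assumes, without confirmation in this thread, that parse_metadata() forwards validate to Metadata.from_email(). The sample combines Metadata-Version 2.1 with License-File, which the core metadata spec only allows from 2.4 on. packaging 24.2 or newer rejects the field as too new for the declared version, and older releases reject it as unrecognized, so strict validation fails either way.

```python
from packaging.metadata import Metadata

# Made-up METADATA: License-File is newer than the declared Metadata-Version.
DATA = """\
Metadata-Version: 2.1
Name: demo
Version: 1.0
License-File: LICENSE
"""

try:
    Metadata.from_email(DATA)  # validate=True is the default
    strict_error = None
except Exception as exc:  # raised as an ExceptionGroup of InvalidMetadata
    strict_error = exc

# Without validation the fields fromager needs are still available.
md = Metadata.from_email(DATA, validate=False)
print(md.name, md.version)
```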

Replaces the Python email library parser with packaging.metadata.Metadata
for parsing wheel/package metadata.

Fixes: python-wheel-build#561

Co-Authored-By: Claude <[email protected]>

Signed-off-by: Lalatendu Mohanty <[email protected]>
Member

dhellmann left a comment

I'm happy with this version. @tiran had more detailed comments, so I will leave it open for him to approve.

Development

Successfully merging this pull request may close these issues.

use packaging library to parse metadata instead of doing it ourselves

4 participants