Skip to content

validate protobuf message index bounds on deserialize#2271

Open
uwezkhan wants to merge 1 commit into
confluentinc:masterfrom
uwezkhan:protobuf-msgidx-bounds
Open

validate protobuf message index bounds on deserialize#2271
uwezkhan wants to merge 1 commit into
confluentinc:masterfrom
uwezkhan:protobuf-msgidx-bounds

Conversation

@uwezkhan

Copy link
Copy Markdown

What

The protobuf deserializer picks the writer message type by walking the message-index array from the Confluent wire framing into the parsed FileDescriptorProto. SchemaId._read_index_array caps the array length but not the individual index values, and those values are zigzag varints, so a crafted message can carry a negative or out-of-range index. _get_message_desc_proto then indexes desc.message_type / desc.nested_type with the value directly.

Before: a negative index (for example uvarint 0x01, which zigzag-decodes to -1) wraps around and resolves to a different message type in the same file, so the payload is decoded against the wrong descriptor. An out-of-range positive index raises a bare IndexError.

After: an index outside 0 <= index < len(...) raises SerializationError with the index and the available count, matching the bound check the confluent-kafka-go reference (toMessageDesc) already applies. Valid indexes behave the same as today.

Tradeoff: a message that previously decoded against an unintended message type now fails fast instead of returning a mis-typed object. That is the point of the change, but it does convert a silent wrong-result into a surfaced error.

Checklist

  • Contains customer facing changes? Including API/behavior changes
  • Did you add sufficient unit test and/or integration test coverage for this PR?

References

JIRA:

Test & Review

Added test_message_index_in_range and test_message_index_out_of_range to tests/schema_registry/_async/test_proto.py (and the generated _sync counterpart). The out-of-range cases [-1], [2], [0, -1], [0, 5] reproduce the issue on master and pass after the change. _sync files were regenerated with tools/unasync.py.

Open questions / Follow-ups

None.

@uwezkhan uwezkhan requested review from a team and Matthew Seal (MSeal) as code owners June 10, 2026 06:04
@confluent-cla-assistant

Copy link
Copy Markdown

🎉 All Contributor License Agreements have been signed. Ready to merge.
✅ uwezkhan
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

@rayokota Robert Yokota (rayokota) added the component:schema-registry Any schema registry related isues rather than kafka isolated ones label Jun 12, 2026
@rayokota

Copy link
Copy Markdown
Member

/sem-approve

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request hardens Protobuf deserialization against crafted Confluent wire-framed payloads by validating message-index bounds before indexing into FileDescriptorProto / DescriptorProto, preventing negative-index wraparound and replacing bare IndexError with a consistent SerializationError.

Changes:

  • Add explicit bounds checks for top-level and nested message indexes in _get_message_desc_proto (sync + async), raising SerializationError on invalid indexes.
  • Add unit tests (sync + async) that construct a minimal FileDescriptorProto and verify both in-range resolution and out-of-range failures.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
tests/schema_registry/_sync/test_proto.py Adds sync tests covering valid and invalid message-index arrays, asserting SerializationError for out-of-range indexes.
tests/schema_registry/_async/test_proto.py Adds async tests mirroring sync coverage for message-index bounds validation.
src/confluent_kafka/schema_registry/_sync/protobuf.py Validates top-level and nested message index bounds in _get_message_desc_proto and raises SerializationError when invalid.
src/confluent_kafka/schema_registry/_async/protobuf.py Same bounds validation and SerializationError behavior for the async implementation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component:schema-registry Any schema registry related isues rather than kafka isolated ones

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants