[core] Introduce blob v2 #7216

Draft
leaves12138 wants to merge 3 commits into apache:master from leaves12138:blob_descriptor_adaptive_deserialization
leaves12138 commented Feb 5, 2026

Purpose

Blob v2 contains the following new features:

  • Adaptive write. Paimon detects whether the value you write is a blob descriptor or the blob itself, so users no longer need to set 'blob-as-descriptor' to true when writing blob descriptors to Paimon.
  • Descriptor storage. Users can set blob.stored-descriptor-fields = 'xxx' to store blob descriptors in a normal data file (Parquet/ORC/Avro) rather than in a blob file. Only the descriptor is stored, as bytes; nothing the descriptor points to is copied into the Paimon table.
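One way the adaptive write could tell the two cases apart is to peek at the leading bytes of the payload. The following is only a minimal sketch under stated assumptions: the class `BlobWriteSketch`, the constant `HYPOTHETICAL_MAGIC`, and the toy descriptor layout are invented for illustration and are not Paimon's actual classes or wire format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; not Paimon's actual classes or wire format. */
final class BlobWriteSketch {

    // Assumed marker value, chosen arbitrarily for this illustration.
    static final long HYPOTHETICAL_MAGIC = 0x424C4F4244455343L;

    enum Kind { DESCRIPTOR, RAW_BLOB }

    /** Classifies a payload by reading its first 8 bytes as a little-endian long. */
    static Kind classify(byte[] payload) {
        if (payload == null || payload.length < Long.BYTES) {
            // Too short to carry the marker, so treat it as raw blob data.
            return Kind.RAW_BLOB;
        }
        long head = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN).getLong();
        return head == HYPOTHETICAL_MAGIC ? Kind.DESCRIPTOR : Kind.RAW_BLOB;
    }

    /** Builds a toy descriptor: the marker followed by a UTF-8 encoded URI. */
    static byte[] toyDescriptor(String uri) {
        byte[] uriBytes = uri.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES + uriBytes.length);
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        buffer.putLong(HYPOTHETICAL_MAGIC);
        buffer.put(uriBytes);
        return buffer.array();
    }
}
```

With this approach, a raw blob whose first eight bytes happen to equal the marker would be misclassified, which is one reason the choice of marker and header layout matters.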

Tests

API and Format

Documentation

leaves12138 changed the title from "[flink] [spark] Write blob with adaptive match blob-descriptor" to "[WIP] [flink] [spark] Write blob with adaptive match blob-descriptor" Feb 5, 2026
leaves12138 changed the title from "[WIP] [flink] [spark] Write blob with adaptive match blob-descriptor" to "[flink] [spark] Write blob with adaptive match blob-descriptor" Feb 5, 2026
ByteBuffer buffer = ByteBuffer.allocate(totalSize);
buffer.order(ByteOrder.LITTLE_ENDIAN);

buffer.putLong(MAGIC);
Contributor

I think that, to stay compatible with older versions, we should place the magic number after the version number. Linux kernel images are designed this way.

Contributor Author

OK
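The ordering suggested in this thread (version number first, magic number second) can be sketched as follows. This is an illustration only: it assumes that older descriptors begin with a one-byte version and carry no magic number, and the class `DescriptorHeaderSketch`, the field widths, and the constant values are all hypothetical rather than taken from the PR.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical header layout; field widths and constants are assumptions. */
final class DescriptorHeaderSketch {

    static final byte VERSION_2 = 2;
    // Assumed marker value, chosen arbitrarily for this illustration.
    static final long HYPOTHETICAL_MAGIC = 0x424C4F4244455343L;
    static final int HEADER_SIZE = Byte.BYTES + Long.BYTES;

    /** Writes the version at offset 0, followed by the magic number. */
    static byte[] writeHeader() {
        ByteBuffer buffer = ByteBuffer.allocate(HEADER_SIZE);
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        buffer.put(VERSION_2); // offset 0: assumed to be where a v1 reader expects the version
        buffer.putLong(HYPOTHETICAL_MAGIC);
        return buffer.array();
    }

    /** Returns the version, checking the magic only for versions that carry one. */
    static int readVersion(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        byte version = buffer.get();
        if (version >= VERSION_2) {
            if (buffer.remaining() < Long.BYTES || buffer.getLong() != HYPOTHETICAL_MAGIC) {
                throw new IllegalArgumentException(
                        "Bad magic for descriptor version " + version);
            }
        }
        return version;
    }
}
```

Under these assumptions, a reader that only understands the older layout still finds a version number at the position it expects and can reject an unknown version cleanly, instead of misreading the magic number as a version.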

leaves12138 force-pushed the blob_descriptor_adaptive_deserialization branch from 0084314 to 58ab357 February 11, 2026 05:25
@JingsongLi
Contributor

Due to compatibility issues with this PR, I suggest that we modify all relevant content in this PR, including:

  1. Blob-descriptor V2 and write blob with adaptive match blob-descriptor.
  2. Support blob-store-descriptor.
  3. Modify PyPaimon to support items 1 and 2.

leaves12138 marked this pull request as draft February 14, 2026 01:59
leaves12138 force-pushed the blob_descriptor_adaptive_deserialization branch from c28fc68 to 226f6e8 February 14, 2026 03:01
leaves12138 changed the title from "[flink] [spark] Write blob with adaptive match blob-descriptor" to "[core] Introduce blob v2" Feb 14, 2026
@leaves12138
Contributor Author

Due to compatibility issues with this PR, I suggest that we modify all relevant content in this PR, including:

  1. Blob-descriptor V2 and write blob with adaptive match blob-descriptor.
  2. Support blob-store-descriptor.
  3. Modify PyPaimon to support items 1 and 2.

OK
