feat: add blobHash property type support (Weaviate 1.37.0+) #336
mpartipilo merged 3 commits into main
Introduces DataType.BlobHash, BlobHashPropertyConverter, and PropertyBag.GetBlobHash() to support the new blobHash schema type that stores a SHA-256 hash of the blob instead of the blob data itself, reducing disk footprint for multimodal workloads.
Closes #323
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary - Weaviate C# Client Coverage
Coverage:
Weaviate.Client - 49.2%
Weaviate.Client.Analyzers - 0%
Weaviate.Client.VectorData - 50.3%
I would potentially add at least one integration test just as a sanity check. I know it's not the job of the client to test server functionality but it could still come in handy to catch various issues.
Verifies that DataType.BlobHash is correctly serialized in the collection creation request and deserialized back from the schema API. Full insert tests require a multimodal vectorizer (e.g. multi2vec-clip), which is not available in the standard CI environment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@g-despot Done, added … Full insert/retrieval tests would require a multimodal vectorizer (like …
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- DataType.BlobHash enum value (wire: "blobHash") and Property.BlobHash static factory
- BlobHashPropertyConverter: string passthrough, no array support, not registered by C# type to avoid shadowing TextPropertyConverter
- PropertyBag.GetBlobHash(string name) -> string? for reading the stored SHA-256 hex hash

Background

blobHash (introduced in Weaviate 1.37.0) accepts a base64-encoded blob on write, but the server computes a SHA-256 hash after vectorization and stores only the 64-character hex digest. This reduces disk footprint for multimodal workloads where you want to track blob identity without storing the full payload.

Wire format: blob and blobHash share the blob_value field in the gRPC proto. On write the client sends base64; on read the server returns the SHA-256 hex string.

Closes #323
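The write/read asymmetry described above can be sketched in plain .NET, without the client. This is a minimal illustration, not the client's implementation: EncodeForWrite and ExpectedBlobHash are hypothetical helper names, and the sketch assumes the server hashes the decoded blob bytes (not the base64 text) and returns lowercase hex. Neither detail is confirmed by this PR.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helpers for illustration only; they are not part of the Weaviate C# client.

// Write side: the client sends the raw bytes base64-encoded.
static string EncodeForWrite(byte[] payload) => Convert.ToBase64String(payload);

// Read side (assumption): the stored value is the lowercase hex SHA-256
// of the decoded bytes, i.e. a 64-character string.
static string ExpectedBlobHash(string base64Payload)
{
    byte[] raw = Convert.FromBase64String(base64Payload);
    byte[] digest = SHA256.HashData(raw);
    return Convert.ToHexString(digest).ToLowerInvariant();
}

byte[] payload = Encoding.UTF8.GetBytes("abc");
string onWrite = EncodeForWrite(payload);   // what the client would send
string onRead = ExpectedBlobHash(onWrite);  // what a read would be expected to return

Console.WriteLine(onWrite);        // YWJj
Console.WriteLine(onRead);         // ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
Console.WriteLine(onRead.Length);  // 64
```

If those assumptions hold, a locally computed digest like this could be compared against the string returned by PropertyBag.GetBlobHash to confirm blob identity without fetching the payload.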
🤖 Generated with Claude Code