
Embeddings npu3, embeddings_calculator_ov #3933

Merged

rasapala merged 31 commits into main from embeddings_npu3 on Feb 6, 2026
Conversation

@rasapala
Collaborator

🛠 Summary

JIRA CVS-179110

🧪 Checklist

  • Unit tests added.
  • Documentation updated.
  • Change follows security best practices.

@rasapala rasapala changed the title from "[WIP] Embeddings npu3, embeddings_calculator_ov" to "Embeddings npu3, embeddings_calculator_ov" on Jan 30, 2026
outputTensorName = inferRequest.get_compiled_model().outputs().begin()->get_any_name();
SPDLOG_LOGGER_DEBUG(embeddings_calculator_logger, "Single embedding model output found with name {}", outputTensorName);
}
embeddingsTensors.push_back(inferRequest.get_tensor(outputTensorName.c_str()));
Collaborator

The tensor you push to embeddingsTensors is still tied to inferRequest, and the data underneath will be overwritten if you perform another inference on it. And you do, in the next for-loop iteration. Please correct me if I'm wrong here.

Did you run any stress tests, high-load tests, or accuracy tests with different concurrency/queue sizes? That would showcase the issue I'm pointing out.
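The aliasing problem the reviewer describes can be demonstrated with a minimal sketch. `FakeInferRequest` below is a hypothetical stand-in for OpenVINO's `ov::InferRequest`, not the real API: each `infer()` reuses the same output buffer, so a stored view goes stale, while a deep copy survives.

```cpp
#include <vector>

// Toy stand-in for ov::InferRequest (hypothetical, for illustration): each
// infer() overwrites the same internal output buffer, mimicking how an
// infer request reuses its output tensor memory between inferences.
struct FakeInferRequest {
    std::vector<float> output;                            // reused buffer
    void infer(float value) { output.assign(4, value); }  // overwrites in place
    std::vector<float>& get_tensor() { return output; }   // aliasing view
};

// The fix the reviewer suggests, sketched: take a deep copy of the output
// before the next inference, instead of keeping a view into the request.
std::vector<float> copyOutput(FakeInferRequest& req) {
    return req.get_tensor();  // returning by value copies the data
}
```

With the real API, the equivalent would be copying the output tensor's bytes into a freshly allocated `ov::Tensor` before calling `infer` again.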

Collaborator Author


Good catch

@dkalinowski dkalinowski added this to the 2026.0 milestone Feb 3, 2026

namespace mediapipe {

void printTensor(const ov::Tensor& tensor) {
Collaborator


This should probably be removed?

size_t inputIdsSize = tokens.input_ids.get_shape()[1];
if (inputIdsSize > maxContextLength) {
    SPDLOG_LOGGER_DEBUG(embeddings_calculator_logger, "Input size {} exceeds max_context_length {}", inputIdsSize, maxContextLength);
    return absl::InvalidArgumentError("Input length " + std::to_string(inputIdsSize) + " longer than allowed " + std::to_string(maxContextLength));
}
Collaborator


Maybe you could create a method for this check and use it in line 256 as well?
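A minimal sketch of the extracted helper the reviewer suggests. The name `checkContextLength` is hypothetical, and `std::optional<std::string>` stands in for the `absl::Status` the real calculator returns, to keep the sketch self-contained:

```cpp
#include <optional>
#include <string>

// Hypothetical extracted helper: returns an error message when the input
// exceeds the context window, std::nullopt otherwise. The real code would
// return absl::InvalidArgumentError with the same message instead.
std::optional<std::string> checkContextLength(size_t inputIdsSize, size_t maxContextLength) {
    if (inputIdsSize > maxContextLength) {
        return "Input length " + std::to_string(inputIdsSize) +
               " longer than allowed " + std::to_string(maxContextLength);
    }
    return std::nullopt;
}
```

Both call sites would then share one message format and one comparison, so the two checks cannot drift apart.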

if (elementType == ov::element::f32) {
const float* data = static_cast<const float*>(dataPtr);
std::cout << "Tensor data (f32): ";
for (size_t i = 0; i < tensor.get_size(); ++i) {
Collaborator


I don't think it's a good idea to flood the console; cap it at 20 elements, but don't introduce segfaults.
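The bounded debug print the reviewer asks for could look like the sketch below. `formatTensorPreview` is a hypothetical helper using `std::vector<float>` as a stand-in for the tensor's data, so it is self-contained; the key point is that the loop bound is the minimum of the cap and the actual element count, so a small tensor can never be read out of bounds:

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Sketch: print at most the first 20 elements, bounded by the tensor's real
// size, so short tensors cannot trigger an out-of-bounds read.
std::string formatTensorPreview(const std::vector<float>& data) {
    constexpr size_t kMaxPrinted = 20;
    size_t count = std::min(data.size(), kMaxPrinted);  // never past the end
    std::ostringstream out;
    out << "Tensor data (f32):";
    for (size_t i = 0; i < count; ++i) {
        out << ' ' << data[i];
    }
    if (data.size() > kMaxPrinted) {
        out << " ... (" << data.size() - kMaxPrinted << " more)";
    }
    return out.str();
}
```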

@rasapala rasapala merged commit 7cbab8d into main Feb 6, 2026
1 check passed
