Embeddings npu3, embeddings_calculator_ov by rasapala · Pull Request #3933 · openvinotoolkit/model_server

rasapala · 2026-01-28T16:21:37Z

🛠 Summary

JIRA CVS-179110

🧪 Checklist

Unit tests added.
The documentation updated.
Change follows security best practices.
``

src/embeddings/embeddings_calculator_ov.cc

dkalinowski · 2026-02-02T12:03:12Z

src/embeddings/embeddings_calculator_ov.cc

+                        outputTensorName = inferRequest.get_compiled_model().outputs().begin()->get_any_name();
+                        SPDLOG_LOGGER_DEBUG(embeddings_calculator_logger, "Single embedding model output found with name {}", outputTensorName);
+                    }
+                    embeddingsTensors.push_back(inferRequest.get_tensor(outputTensorName.c_str()));


this tensor you push to embeddingsTensors is still tied to inferRequest and data underneath will be overriden if you perform another inference on it. and you do, in next for loop iteration. please correct me if I'm wrong here

Did you run any stress tests, high load tests + accuracy tests with different concurrency/queue size? This would showcase the issue which I point out

src/embeddings/embeddings_calculator_ov.cc

src/embeddings/embeddings_servable.cpp

Dockerfile.ubuntu

michalkulakowski · 2026-02-04T09:59:48Z

src/embeddings/embeddings_calculator_ov.cc


 namespace mediapipe {

+void printTensor(const ov::Tensor& tensor) {


this should be probbably removed?

michalkulakowski · 2026-02-04T10:07:56Z

src/embeddings/embeddings_calculator_ov.cc

-                    SPDLOG_LOGGER_DEBUG(embeddings_calculator_logger, "Input size {} exceeds max_context_length {}", input_ids_size, max_context_length);
-                    return absl::InvalidArgumentError("Input length " + std::to_string(input_ids_size) + " longer than allowed " + std::to_string(max_context_length));
+                size_t inputIdsSize = tokens.input_ids.get_shape()[1];
+                if (inputIdsSize > maxContextLength) {


maybe you could create method for this check and use it also in line 256?

demos/embeddings/compare_results.py

src/embeddings/embeddings_calculator_ov.cc

src/test/test_utils.cpp

src/sidepacket_servable.cpp

dkalinowski · 2026-02-06T13:06:28Z

src/test/test_utils.cpp

+    if (elementType == ov::element::f32) {
+        const float* data = static_cast<const float*>(dataPtr);
+        std::cout << "Tensor data (f32): ";
+        for (size_t i = 0; i < tensor.get_size(); ++i) {


I dont think its good idea to flood the console, keep it 20 but dont introduce segfaults

…model_server into embeddings_npu3

rasapala added 16 commits January 26, 2026 15:12

Npu init

084652b

Working hello

fe51117

Refactor

c373bab

Style

7fd4372

GPU,NPU makefile

be33d37

Add missing settings

56cbaf3

Fix if statement for dynamic npu

02e479f

Int

319eeed

Fix default

e9404f9

Working multi batch

8f90eee

Style

d5910bf

Cleanup

51e5a0a

Merge branch 'main' into embeddings_npu3

b58f695

Guard fix

b47632e

Fix tensor

54125df

Logs

179e6aa

rasapala requested review from dkalinowski and michalkulakowski January 30, 2026 15:02

rasapala changed the title ~~[WIP] Embeddings npu3, embeddings_calculator_ov~~ Embeddings npu3, embeddings_calculator_ov Jan 30, 2026