Computer vision initial services #389
Codecov Report: ✅ All modified and coverable lines are covered by tests.
```proto
message ImageInput {
  bytes image = 1;
```
nit: having 3 different `ImageInput`s feels like a code smell. How would you feel about either/any combination of:
- just throwing the images directly into `ImageClassificationRequest` and doing something like `repeated bytes inputs = 1;`. This would give us parity with `repeated string inputs = 1;` in the text services
- putting it in a separate proto, or reusing it
- condensing all of the protos into one file (would this be so horrible? this may make it easier if anyone is working with Kafka)
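For concreteness, the first option might look like the sketch below, mirroring the text services; the message name comes from the comment, but the exact field layout is an assumption:

```proto
// Sketch: images inlined directly in the request, giving parity with
// `repeated string inputs = 1;` in the text services. No separate
// ImageInput message would be needed.
message ImageClassificationRequest {
  repeated bytes inputs = 1;
}
```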
I'm now taking image classification as the "gold standard" while I work on the impl, and I'll probably retrofit the other two categories from there. So I'll take these comments into consideration when I revisit the proto defs for them 👍
(It just felt easier to transcribe the hf pipelines in separate files at the start)
(separate proto + imports ftw, imho)
```proto
message ObjectDetectionResponse {
  repeated ImageBoundingBoxes boxes_batch = 1;
```
nit: would prefer this just be named `boxes`, but not a hill I need to die on
The thing is that for each image, we have boxes, but then for a bunch of input images we have groups of boxes.... so I ran out of imagination :-/ Any suggestions here?
renamed to "box" and "boxes", although the grammatical number agreement gets a bit weird :-/
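One plausible reading of that rename (a sketch only; the `BoundingBox` message and field numbers are assumptions, and which field got which name isn't spelled out in the thread):

```proto
message ImageBoundingBoxes {
  // per-image detections; the singular field name on a repeated field
  // is the "grammatical number" weirdness mentioned above
  repeated BoundingBox box = 1;
}

message ObjectDetectionResponse {
  // one entry per input image in the batch
  repeated ImageBoundingBoxes boxes = 1;
}
```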
```diff
 macro_rules! run_cli {
-    ($model_type:ident, $cli:expr, $config:expr, $session:expr, $tokenizer:expr, $model_config:expr) => {{
+    ($model_type:ident, $cli:expr, $config:expr, $session:expr, $input_state:expr, $task_state:expr) => {{
```
Merge in the latest changes: we changed `encoderfile-runtime/main.rs` in the GPU update, and model type dispatch is now happening elsewhere.
```rust
pub tokenizer: TokenizerService,
pub model_config: ModelConfig,
pub per_model_input_state: T::InputState,
pub per_task_state: T::TaskState,
```
Naming here is unclear. What do `per_model` and `per_task` mean? Why can't it just be `model_input_state` and `task_state`?
I thought the `per_` prefix would convey that this is an input state / task state (really a config) that depends on which input / task your model uses. But I guess we can drop the `per_` if it causes confusion.
At some point the state and config concepts need to be separated, but not today :)
```diff
@@ -1,5 +1,9 @@
 macro_rules! model_type {
     [ $( $x:ident ),* $(,)? ] => {
+        pub trait ModelTypeSpec: Send + Sync + Clone + std::fmt::Debug + 'static {
```
why are we moving this inside the macro?
```rust
use std::{fs::File, io::BufReader};

const EMBEDDING_DIR: &str = "../models/embedding";
// CHECK sentence embedding????
```
embedding and sentence embedding can use the same model
```rust
        &[AssetKind::Transform]
    }
}
impl AssetPolicySpec for crate::common::model_type::$model_type {}
```
if we are moving all of the logic out of the trait, what is the point in having the trait? we might as well just make it a function
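A hedged sketch of the reviewer's point, reduced to plain Rust outside the macro (all names here are simplified stand-ins for the PR's types, not its actual definitions): when the trait's only method has one blanket default and every impl is empty, a free function carries the same information without the per-type ceremony.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AssetKind {
    Transform,
    Tokenizer,
}

// Trait version: every model type gets an empty impl and inherits
// the same default body.
trait AssetPolicySpec {
    fn required_assets(&self) -> &'static [AssetKind] {
        &[AssetKind::Transform]
    }
}

struct ImageClassification;
impl AssetPolicySpec for ImageClassification {}

// Function version: if the answer never varies by type, no trait
// (and no macro-generated impls) is needed at all.
fn required_assets() -> &'static [AssetKind] {
    &[AssetKind::Transform]
}

fn main() {
    // Both paths yield the same asset list.
    assert_eq!(ImageClassification.required_assets(), required_assets());
    println!("ok");
}
```

The trait only pays for itself once some model type actually overrides the default.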
```rust
fn inference(&self, request: impl Into<Self::Input>) -> Result<Self::Output, ApiError> {
    let request = request.into();
    let rescale_factor = 0.00392156862745098 as f32;
```
We should make these preprocessing steps into Lua bindings, with a `Preprocess` function that's extracted from the transform.
Yep. I wanted to make it work first, but totally agreed. Let's see if we can get something closer to what one would expect in an hf pipeline.
BTW I'm not considering b/w (1 channel) images, for example, right now.
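For context on the magic number in the diff: 0.00392156862745098 is just 1/255, the usual rescale that maps u8 pixel intensities into [0.0, 1.0]. A minimal standalone sketch of that step (the function name is hypothetical, and this omits the PR's `Tensor`/ndarray types):

```rust
// Rescale raw u8 pixel data into [0.0, 1.0] floats, the step the
// hard-coded 0.00392156862745098 (== 1/255) performs in the PR.
fn rescale_pixels(pixels: &[u8]) -> Vec<f32> {
    let rescale_factor: f32 = 1.0 / 255.0;
    pixels.iter().map(|&p| p as f32 * rescale_factor).collect()
}

fn main() {
    let scaled = rescale_pixels(&[0, 127, 255]);
    // Endpoints map exactly to 0.0 and 1.0.
    assert!(scaled[0] == 0.0);
    assert!((scaled[2] - 1.0).abs() < 1e-6);
    println!("{:?}", scaled);
}
```

Writing the constant as `1.0 / 255.0` also self-documents it, which would matter once this moves into a Lua `Preprocess` hook as suggested.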
```rust
let tensor = Tensor(data.into_dyn());

let result = func
```
I'll revisit this, I think I just copied whatever was already there without much consideration 😂
I guess we can do the same for text logits in this case, but it will get more complicated for object detection and image segmentation. But shape checks will fix everything 😉
```diff
@@ -0,0 +1,316 @@
+# Multipart OpenAPI Service Example
```
@angpt looping you in here. We should add this into the docs when we release the new version.
Preliminary implementation for computer vision tasks (object detection, image segmentation and image classification).