Computer vision initial services #389
Codecov Report: ✅ All modified and coverable lines are covered by tests.
```proto
message ImageInput {
  bytes image = 1;
```
nit: having 3 different `ImageInput`s feels like a code smell. How would you feel about either/any combination of:
- just throwing the images directly into `ImageClassificationRequest` and doing something like `repeated bytes inputs = 1;`. This would give us parity with `repeated string inputs = 1;` in the text services
- putting it in a separate proto, or reusing it
- condensing all of the protos into one file (would this be so horrible? this may make it easier if anyone is working with Kafka)
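For concreteness, the first option might look like the sketch below, mirroring the text services; the message name comes from the comment, but the exact field layout is an assumption:

```proto
// Sketch: images inlined directly in the request, giving parity with
// `repeated string inputs = 1;` in the text services. No separate
// ImageInput message would be needed.
message ImageClassificationRequest {
  repeated bytes inputs = 1;
}
```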
I'm now taking image classification as the "gold standard" while I work on the impl, and I'll probably retrofit the other two categories from there. So I'll take these comments into consideration when I revisit the proto defs for them 👍
(It just felt easier to transcribe the hf pipelines in separate files at the start)
(separate proto + imports ftw, imho)
```proto
message ObjectDetectionResponse {
  repeated ImageBoundingBoxes boxes_batch = 1;
```
nit: would prefer this just be named `boxes`, but not a hill I need to die on
The thing is that for each image, we have boxes, but then for a bunch of input images we have groups of boxes.... so I ran out of imagination :-/ Any suggestions here?
renamed to "box" and "boxes", although the grammatical number agreement gets a bit weird :-/
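One plausible reading of that rename (a sketch only; the `BoundingBox` message and field numbers are assumptions, and which field got which name isn't spelled out in the thread):

```proto
message ImageBoundingBoxes {
  // per-image detections; the singular field name on a repeated field
  // is the "grammatical number" weirdness mentioned above
  repeated BoundingBox box = 1;
}

message ObjectDetectionResponse {
  // one entry per input image in the batch
  repeated ImageBoundingBoxes boxes = 1;
}
```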
```diff
 macro_rules! run_cli {
-    ($model_type:ident, $cli:expr, $config:expr, $session:expr, $tokenizer:expr, $model_config:expr) => {{
+    ($model_type:ident, $cli:expr, $config:expr, $session:expr, $input_state:expr, $task_state:expr) => {{
```
Merge in the latest changes: we changed `encoderfile-runtime/main.rs` in the GPU update, and model type dispatch is now happening elsewhere.
```rust
pub tokenizer: TokenizerService,
pub model_config: ModelConfig,
pub per_model_input_state: T::InputState,
pub per_task_state: T::TaskState,
```
Naming here is unclear. What do `per_model` and `per_task` mean? Why can't it just be `model_input_state` and `task_state`?
I thought the `per_` prefix would convey that this is an input state / task state (really a config) that depends on which input / task your model uses. But I guess we can drop the `per_` if it causes confusion.
At some point the state and config concepts need to be separated, but not today :)
```diff
@@ -1,5 +1,9 @@
 macro_rules! model_type {
     [ $( $x:ident ),* $(,)? ] => {
+        pub trait ModelTypeSpec: Send + Sync + Clone + std::fmt::Debug + 'static {
```
why are we moving this inside the macro?
```rust
use std::{fs::File, io::BufReader};

const EMBEDDING_DIR: &str = "../models/embedding";
// CHECK sentence embedding????
```
embedding and sentence embedding can use the same model
```rust
        &[AssetKind::Transform]
    }
}
impl AssetPolicySpec for crate::common::model_type::$model_type {}
```
if we are moving all of the logic out of the trait, what is the point in having the trait? we might as well just make it a function
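A hedged sketch of the reviewer's point, reduced to plain Rust outside the macro (all names here are simplified stand-ins for the PR's types, not its actual definitions): when the trait's only method has one blanket default and every impl is empty, a free function carries the same information without the per-type ceremony.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AssetKind {
    Transform,
    Tokenizer,
}

// Trait version: every model type gets an empty impl and inherits
// the same default body.
trait AssetPolicySpec {
    fn required_assets(&self) -> &'static [AssetKind] {
        &[AssetKind::Transform]
    }
}

struct ImageClassification;
impl AssetPolicySpec for ImageClassification {}

// Function version: if the answer never varies by type, no trait
// (and no macro-generated impls) is needed at all.
fn required_assets() -> &'static [AssetKind] {
    &[AssetKind::Transform]
}

fn main() {
    // Both paths yield the same asset list.
    assert_eq!(ImageClassification.required_assets(), required_assets());
    println!("ok");
}
```

The trait only pays for itself once some model type actually overrides the default.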
```rust
fn inference(&self, request: impl Into<Self::Input>) -> Result<Self::Output, ApiError> {
    let request = request.into();
    let rescale_factor = 0.00392156862745098 as f32;
```
We should make these preprocessing steps into Lua bindings, with a `Preprocess` function that's extracted from the transform.
Yep. I wanted to make it work first, but totally agreed. Let's see if we can get something closer to what one would expect in an hf pipeline.
BTW I'm not considering b/w (1 channel) images, for example, right now.
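For context on the magic number in the diff: 0.00392156862745098 is just 1/255, the usual rescale that maps u8 pixel intensities into [0.0, 1.0]. A minimal standalone sketch of that step (the function name is hypothetical, and this omits the PR's `Tensor`/ndarray types):

```rust
// Rescale raw u8 pixel data into [0.0, 1.0] floats, the step the
// hard-coded 0.00392156862745098 (== 1/255) performs in the PR.
fn rescale_pixels(pixels: &[u8]) -> Vec<f32> {
    let rescale_factor: f32 = 1.0 / 255.0;
    pixels.iter().map(|&p| p as f32 * rescale_factor).collect()
}

fn main() {
    let scaled = rescale_pixels(&[0, 127, 255]);
    // Endpoints map exactly to 0.0 and 1.0.
    assert!(scaled[0] == 0.0);
    assert!((scaled[2] - 1.0).abs() < 1e-6);
    println!("{:?}", scaled);
}
```

Writing the constant as `1.0 / 255.0` also self-documents it, which would matter once this moves into a Lua `Preprocess` hook as suggested.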
```rust
let tensor = Tensor(data.into_dyn());

let result = func
```
I'll revisit this, I think I just copied whatever was already there without much consideration 😂
I guess we can do the same for text logits in this case, but it will get more complicated for object detection and image segmentation. But shape checks will fix everything 😉
```diff
@@ -0,0 +1,316 @@
+# Multipart OpenAPI Service Example
```
@angpt looping you in here. We should add this into the docs when we release the new version.
Preliminary implementation for computer vision tasks (object detection, image segmentation and image classification).