Build an image classifier

An image classifier takes a single image and returns a per-class probability vector of type [Double], with one entry per class the model knows about.

The skeleton is the same as a detector, with a smaller mixin set (no NMS, no bounding-box decoding) and a different output type.

1. Params

The mixin set is smaller than a detector's: a single PadResize replaces the separate Pad and Resize mixins, and ObjectDetection, DetectOutput, and Rect are not needed:

struct Params : public detect::params::InputLayer,
                public detect::params::PadResize,
                public detect::params::Format,
                public detect::params::Normal,
                public triton::Params {
private:
  Params(const std::map<std::string, Object<Any>>& config)
      : detect::params::InputLayer(config),
        detect::params::PadResize(config),
        detect::params::Format(config),
        detect::params::Normal(config),
        triton::Params(config) {
    logits = read_config(config, "logits").as<Bool>();
    num_classes = read_config(config, "num_classes").as<UInt64>();
  }

public:
  bool logits;
  uint64_t num_classes;

  Params() : Params(parse_config()) {}

  // Preprocessing runs inside-out: PadResize, then Format, then Normal,
  // then InputLayer. The model has a single input, so one tensor is returned.
  std::vector<std::unique_ptr<infer::Tensor>> preprocess(const cv::Mat& img) const {
    std::vector<std::unique_ptr<infer::Tensor>> answer;
    answer.emplace_back(detect::params::InputLayer::preprocess(
        detect::params::Normal::preprocess(detect::params::Format::preprocess(
            detect::params::PadResize::preprocess(img)))));
    return answer;
  }
};

detect::params::PadResize is a convenience mixin that combines Resize and Pad. It is used here because the classifier expects a fixed input shape.
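To make the Resize + Pad combination concrete, here is a minimal sketch of the geometry such a step typically computes. It assumes an aspect-preserving resize followed by centered padding; the source does not say how PadResize rounds or where it places the padding, so the real mixin may differ. LetterboxPlan and plan_letterbox are hypothetical names, not part of pipelogic.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not part of pipelogic: the geometry of an
// aspect-preserving resize followed by centered padding to a fixed shape.
struct LetterboxPlan {
  std::uint64_t resized_width;
  std::uint64_t resized_height;
  std::uint64_t pad_left;
  std::uint64_t pad_right;
  std::uint64_t pad_top;
  std::uint64_t pad_bottom;
};

// Assumes all four dimensions are non-zero.
LetterboxPlan plan_letterbox(std::uint64_t src_w, std::uint64_t src_h,
                             std::uint64_t dst_w, std::uint64_t dst_h) {
  // Pick the largest scale at which the image still fits in both dimensions.
  const double scale =
      std::min(static_cast<double>(dst_w) / static_cast<double>(src_w),
               static_cast<double>(dst_h) / static_cast<double>(src_h));

  // Round to the nearest pixel and keep the result within [1, dst].
  auto scaled = [scale](std::uint64_t src, std::uint64_t dst) {
    const auto rounded = static_cast<std::uint64_t>(
        std::llround(static_cast<double>(src) * scale));
    return std::min(dst, std::max<std::uint64_t>(rounded, 1));
  };

  const std::uint64_t rw = scaled(src_w, dst_w);
  const std::uint64_t rh = scaled(src_h, dst_h);

  // Split the leftover space evenly; an odd pixel goes to the right/bottom.
  const std::uint64_t pad_x = dst_w - rw;
  const std::uint64_t pad_y = dst_h - rh;
  return LetterboxPlan{rw,        rh,
                       pad_x / 2, pad_x - pad_x / 2,
                       pad_y / 2, pad_y - pad_y / 2};
}
```

For a 640x480 image and a 224x224 target, this plan resizes to 224x168 and adds 28 rows of padding above and below, so every input reaches the model with the same shape.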

Custom fields:

logits: true if the model emits raw logits (postprocess applies softmax), false if it already emits probabilities (postprocess copies them).
num_classes: the length of the output vector, one entry per class.

2. Postprocess

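std::vector<double>
postprocess(const std::vector<std::unique_ptr<infer::Tensor>>& outputs,
            const Params& params) {
  // Pointer to the first element of the first (and only) output tensor.
  // There is no bounds check: the tensor must hold at least num_classes floats.
  const float* begin = &outputs[0]->at<float>(0, 0);
  if (params.logits) {
    // Raw logits: convert them to probabilities.
    return math::softmax(std::span<const float>(begin, params.num_classes));
  }
  // Already probabilities: widen float to double and copy.
  return std::vector<double>(begin, begin + params.num_classes);
}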

postprocess reads num_classes floats from the first (and only) output tensor. If params.logits is set it applies softmax; otherwise it copies the values unchanged. It returns a std::vector<double>, which pipelogic auto-converts to [Double] on the wire.
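For readers unfamiliar with the softmax step, the following is a minimal, numerically stable sketch of what such a function usually does. It is not the implementation of math::softmax, whose internals the source does not show; softmax_sketch is a hypothetical name, and it takes a std::vector<float> instead of a std::span only to stay compilable under older C++ standards.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical stand-in for a softmax routine; not pipelogic's math::softmax.
std::vector<double> softmax_sketch(const std::vector<float>& logits) {
  // Widen to double first so the exponentials are computed in double precision.
  std::vector<double> probs(logits.begin(), logits.end());
  if (probs.empty()) {
    return probs;
  }

  // Subtracting the maximum keeps std::exp from overflowing on large logits
  // and does not change the result.
  const double max_logit = *std::max_element(probs.begin(), probs.end());

  double sum = 0.0;
  for (double& p : probs) {
    p = std::exp(p - max_logit);
    sum += p;
  }
  // sum >= 1 because the largest entry contributes exp(0) == 1.
  for (double& p : probs) {
    p /= sum;
  }
  return probs;
}
```

The output has the same length as the input, every entry lies in (0, 1], and the entries sum to 1 up to rounding, which is the shape a [Double] probability vector is expected to have.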

3. Run loop

PIPELOGIC_MAIN() {
  const Params params;

  auto model = std::make_shared<triton::InferenceModel>(params);

  auto pre = [&params](const cv::Mat& img) { return params.preprocess(img); };
  // Classification does not need the source image, so the second argument is unnamed.
  auto post = [&params](const auto& outputs, const cv::Mat& /*img*/) {
    return postprocess(outputs, params);
  };

  auto classifier = std::make_shared<detect::ImageClassifier>(
      model,
      std::vector<std::string>{model->inputs().at(0).name},
      std::vector<std::string>{model->outputs().at(0).name},
      pre, post);

  run([classifier](Message input_image) -> Message {
    ocv::Image img = Object<ocv::types::Image>{input_image.as<ocv::types::Image>()};
    std::vector<double> predictions = classifier->classify(img.mat());
    return Object<List<Double>>{std::move(predictions)};
  });

  return EXIT_SUCCESS;
}

detect::ImageClassifier takes the same arguments as ObjectDetector: the model, the input and output names, and the pre/post pair. The output type is a flat [Double] of length num_classes.
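A downstream consumer usually wants class indices rather than the whole vector. The sketch below shows one way to pick the k most probable classes from a flat probability vector like the one this component emits. top_k is a hypothetical helper, not part of pipelogic, and mapping indices to label names is left to the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical consumer-side helper: indices of the k largest probabilities,
// ordered from most to least probable.
std::vector<std::size_t> top_k(const std::vector<double>& probabilities,
                               std::size_t k) {
  // Start with the identity permutation 0, 1, ..., n - 1.
  std::vector<std::size_t> indices(probabilities.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});

  // Never ask for more entries than there are classes.
  k = std::min(k, indices.size());

  // Only the first k positions need to be sorted.
  std::partial_sort(
      indices.begin(), indices.begin() + static_cast<std::ptrdiff_t>(k),
      indices.end(), [&probabilities](std::size_t a, std::size_t b) {
        return probabilities[a] > probabilities[b];
      });

  indices.resize(k);
  return indices;
}
```

Calling top_k(predictions, 1) gives the usual argmax; a larger k is convenient for top-5 style reporting.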

4. component.yml

name: "Classify Image (Triton)"
language: cpp
platform: linux/amd64
build_system: 2-ml
tags: ["latest", "default"]

worker:
  input_type: "Image"
  output_type: "[Double]"

  file_schema:
    model:
      file_type: "model"
      config_key: "model_name"
      component: "triton"
      is_optional: false

  config_schema:
    color_model:
      type: "Image.RGB | Image.BGR | Image.GRAY"
    input_height:
      type: UInt64
    input_width:
      type: UInt64
    pad_value:
      type: Double
    pad_mode:
      type: String
    mean:
      type: "(Double, Double, Double)"
    std:
      type: "(Double, Double, Double)"
    int_input_type:
      type: Bool
    add_batch_layer:
      type: Bool
    change_channel_order:
      type: String
    logits:
      type: Bool
      default: true
    num_classes:
      type: UInt64
