Build an object detector

An object detector takes an image and returns a list of bounding boxes — each one a class label, a confidence score, and the rectangle in pixel coordinates.
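To make the output shape concrete, a detection can be modeled roughly like this. This is an illustrative sketch only — the real pipeml type is ocv::BoundingBox, and the names here (Detection, Rect, is_valid) are invented for the example:

```cpp
#include <string>

// Illustrative stand-ins; pipeml's real result type is ocv::BoundingBox.
struct Rect { int x, y, width, height; };  // pixel coordinates in the original image

struct Detection {
  std::string label;   // class label, e.g. "person"
  float confidence;    // score in (0, 1]
  Rect box;            // rectangle in original-image pixels
};

// A detector maps one image to zero or more Detections; a sane result
// has an in-range confidence and a non-degenerate rectangle.
inline bool is_valid(const Detection& d) {
  return d.confidence > 0.0f && d.confidence <= 1.0f &&
         d.box.width > 0 && d.box.height > 0;
}
```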

Pipeml's detection mixins handle preprocessing (resize, pad, normalize), Triton invocation, and postprocessing (NMS, rescaling), so the component is mostly composition. The same skeleton fits any object-detection model.

1. The Params struct

struct Params : public detect::params::Format,
                public detect::params::InputLayer,
                public detect::params::Normal,
                public detect::params::Pad,
                public detect::params::Resize,
                public detect::params::ObjectDetection,
                public detect::params::DetectOutput,
                public detect::params::Rect,
                public triton::Params {
private:
  Params(const std::map<std::string, Object<Any>>& config)
      : detect::params::Format(config),
        detect::params::InputLayer(config),
        detect::params::Normal(config),
        detect::params::Pad(config),
        detect::params::Resize(config),
        detect::params::ObjectDetection(config),
        detect::params::DetectOutput(config),
        detect::params::Rect(config),
        triton::Params(config) {
    top_detections = read_config(config, "top_detections").as<UInt64>();
  }

public:
  size_t top_detections;

  Params() : Params(parse_config()) {}

  std::vector<std::unique_ptr<infer::Tensor>> preprocess(const cv::Mat& img) const {
    std::vector<std::unique_ptr<infer::Tensor>> answer;
    answer.emplace_back(detect::params::InputLayer::preprocess(
        detect::params::Normal::preprocess(
            detect::params::Format::preprocess(detect::params::Pad::preprocess(
                detect::params::Resize::preprocess(img))))));
    return answer;
  }
};

Each parent contributes a slice of config_schema and a preprocess() step. The composed preprocess runs them in order: Resize → Pad → Format (color conversion) → Normal (mean/std) → InputLayer (channel order, batch dim).

You can swap mixins in and out: drop Pad for variable-size models, replace Format with a custom color path, and so on. The composition order in preprocess() is just function chaining — adjust it to match your model's preprocessing.
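Stripped of the mixin machinery, that chaining pattern reduces to plain nested calls over pure Image → Image functions. A minimal self-contained sketch (the Image struct and the stage implementations are illustrative stand-ins, not pipeml's cv::Mat-based code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Stand-in for an image: a flat buffer plus dimensions (cv::Mat in the real code).
struct Image { std::size_t w, h; std::vector<float> data; };

// Nearest-neighbour resize — just enough to demonstrate the chaining.
Image resize(const Image& in, std::size_t w, std::size_t h) {
  Image out{w, h, std::vector<float>(w * h)};
  for (std::size_t y = 0; y < h; ++y)
    for (std::size_t x = 0; x < w; ++x)
      out.data[y * w + x] = in.data[(y * in.h / h) * in.w + (x * in.w / w)];
  return out;
}

// Pad to a square canvas, filling the new area with pad_value.
Image pad_to_square(const Image& in, float pad_value) {
  std::size_t side = std::max(in.w, in.h);
  Image out{side, side, std::vector<float>(side * side, pad_value)};
  for (std::size_t y = 0; y < in.h; ++y)
    for (std::size_t x = 0; x < in.w; ++x)
      out.data[y * side + x] = in.data[y * in.w + x];
  return out;
}

// Per-pixel mean/std normalization (single channel for brevity).
Image normalize(Image in, float mean, float stddev) {
  for (float& v : in.data) v = (v - mean) / stddev;
  return in;
}

// The composed preprocess is nested application — the same shape as the
// mixin version: innermost runs first.
Image preprocess(const Image& img) {
  return normalize(pad_to_square(resize(img, 4, 2), 0.0f), 0.5f, 0.5f);
}
```

Reordering the chain is just reordering the nesting, which is why the mixin version is easy to adapt per model.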

The custom field top_detections is read directly from config (no mixin needed).

2. Postprocess

std::vector<ocv::BoundingBox>
postprocess(const std::vector<std::unique_ptr<infer::Tensor>>& outputs,
            const cv::Mat& img, const Params& params) {
  float x_scale, y_scale, scale_factor;
  std::tie(x_scale, y_scale, scale_factor) =
      params.get_scale_factors(cv::Size(img.cols, img.rows));

  const uint64_t num_detections =
      std::min<uint64_t>(params.top_detections, params.detections_size(outputs));

  std::vector<ocv::BoundingBox> output_values;
  for (uint64_t i = 0; i < num_detections; ++i) {
    auto detected_class = params.get_detected_class(outputs, i);
    if (detected_class.confidence() <= 0.0 || detected_class.confidence() > 1.0)
      continue;

    output_values.push_back(ocv::BoundingBox(
        std::move(detected_class),
        params.extract_rectangle(outputs[params.output_detections_order], i,
                                 cv::Size(img.cols, img.rows),
                                 x_scale, y_scale, scale_factor)));
  }
  return output_values;
}

Three things are happening:

  • params::Rect::get_scale_factors() computes the inverse of the preprocess scaling so we can map model output back to image coordinates.
  • params::DetectOutput knows which output tensor holds boxes vs. classes vs. confidences. get_detected_class(outputs, i) reads detection i from the right tensor.
  • params::DetectOutput::extract_rectangle(...) reads the bounding-box coordinates and applies the scale factors.

The confidence guard (reject anything <= 0.0 or > 1.0) filters out the padding entries that some detectors emit for unused detection slots.
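The coordinate math that params::Rect encapsulates is worth seeing in the open. Below is a minimal sketch assuming letterbox-style preprocessing (uniform scale to fit the model input, then top-left padding with no offset); get_scale_factors and extract_rectangle are the real API, everything here is illustrative:

```cpp
#include <algorithm>

struct Box { float x1, y1, x2, y2; };  // model-output coordinates, in input-tensor pixels

// Inverse of a letterbox preprocess: the original image was scaled by a
// single factor to fit input_w x input_h, then padded (top-left, no offset).
// Divide by that factor to return to original-image pixels, then clamp to
// the image bounds so padded regions cannot produce out-of-image boxes.
Box rescale(Box b, int orig_w, int orig_h, int input_w, int input_h) {
  float scale = std::min(static_cast<float>(input_w) / orig_w,
                         static_cast<float>(input_h) / orig_h);
  auto clamp_x = [&](float v) {
    return std::clamp(v / scale, 0.0f, static_cast<float>(orig_w));
  };
  auto clamp_y = [&](float v) {
    return std::clamp(v / scale, 0.0f, static_cast<float>(orig_h));
  };
  return {clamp_x(b.x1), clamp_y(b.y1), clamp_x(b.x2), clamp_y(b.y2)};
}
```

For a 1280x720 image fed into a 640x640 model, scale is 0.5, so a model-space box is doubled back to image space, and anything that spills into the padding is clamped to the 720-pixel image height.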

3. Run loop

PIPELOGIC_MAIN() {
  const Params params;

  Type in_type = dynamic_type_v<ocv::types::Image>;
  Type out_type = dynamic_type_v<List<ocv::types::BoundingBox>>;

  auto model = std::make_shared<triton::InferenceModel>(params);

  auto pre = [&params](const cv::Mat& img) {
    return params.preprocess(img);
  };

  auto post = [&params](const auto& outputs, const cv::Mat& img) {
    return postprocess(outputs, img, params);
  };

  auto detector = std::make_shared<detect::ObjectDetector>(
      model,
      std::vector<std::string>{model->inputs().at(0).name},
      params.output_layer,
      pre, post, params);

  run([detector](Message input_image) -> Message {
    ocv::Image img = std::move(input_image);
    Object<List<ocv::types::BoundingBox>> answer;

    if (!img.empty()) {
      auto bbs = detector->find_objects(img.mat());
      for (auto&& it : bbs) {
        if (!cv::Rect2d{it.rectangle()}.empty()) {
          answer.push_back(Object<ocv::types::BoundingBox>{std::move(it)});
        }
      }
    }
    return answer;
  });
  return EXIT_SUCCESS;
}

Key moves:

  • The Triton input name is queried from the model itself (model->inputs().at(0).name) — no need to hardcode it in component.yml.
  • Output names come from params.output_layer (a params::DetectOutput member populated from the output_layer: [String] config field).
  • The runtime worker is a lambda that takes a Message (the typed pipelang input) and returns one (the typed output). The ocv::Image is move-constructed from the incoming Message; the Object<List<...>> accumulator becomes the returned output.
  • Empty image → empty list (defensive).
  • Empty result rectangles are filtered (some Triton models pad with zero boxes).

4. component.yml

The matching component.yml declares all the config fields the mixins read:

name: "Detect Objects (Triton)"
language: cpp
platform: linux/amd64
build_system: 2-ml
tags: ["latest", "default"]

worker:
  input_type: "Image"
  output_type: "[BoundingBox]"

  file_schema:
    model:
      file_type: "model"
      config_key: "model_name"
      component: "triton"
      is_optional: false

  config_schema:
    # params::Format
    color_model:
      type: "Image.RGB | Image.BGR | Image.GRAY"

    # params::Resize
    input_height:
      type: UInt64
    input_width:
      type: UInt64

    # params::Pad
    pad_value:
      type: Double
    pad_mode:
      type: String

    # params::Normal
    mean:
      type: "(Double, Double, Double)"
    std:
      type: "(Double, Double, Double)"

    # params::InputLayer
    int_input_type:
      type: Bool
    add_batch_layer:
      type: Bool
    change_channel_order:
      type: String

    # params::ObjectDetection
    object_confidence_threshold:
      type: Double
    iou_threshold:
      type: Double

    # params::DetectOutput
    output_layer:
      type: "[String]"

    # custom
    top_detections:
      type: UInt64
      default: 100

(A real component.yml has more parameters; this is the minimum each mixin needs.)
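The iou_threshold field above drives the NMS step the mixins run during postprocessing. For reference, the intersection-over-union it thresholds is computed roughly as follows — an illustrative sketch, not pipeml's implementation:

```cpp
#include <algorithm>

struct Box { float x1, y1, x2, y2; };

// Intersection-over-union of two axis-aligned boxes. During NMS, a
// lower-scoring box is suppressed when its IoU with an already-kept box
// exceeds iou_threshold.
float iou(const Box& a, const Box& b) {
  float ix = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
  float iy = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
  float inter = ix * iy;
  float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
  float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
  float uni = area_a + area_b - inter;
  return uni > 0.0f ? inter / uni : 0.0f;
}
```

A low iou_threshold suppresses aggressively (fewer overlapping boxes survive); a high one keeps near-duplicates.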

To adapt for your detector

  1. Pick which params::* mixins your preprocessing needs. Keep the inheritance order the same; reorder the calls in preprocess() if the model requires a different chain.
  2. Implement postprocess() to read your model's specific output layout. params::DetectOutput plus params::Rect cover the most common case (boxes/classes/scores in three tensors).
  3. Add custom config fields by reading them in the Params ctor with read_config(config, "...").
  4. Update component.yml to match the union of all mixin-required fields plus your custom ones.
