Build a segmentation worker

A segmentation worker takes an image and returns one mask per detection, each tagged with a class label and paired with its bounding box.

The skeleton is the same as a detector's, with an additional Mask mixin and a richer postprocess that decodes per-pixel labels into masks.

1. Params

struct Params : public detect::params::Format,
                public detect::params::InputLayer,
                public detect::params::Normal,
                public detect::params::Pad,
                public detect::params::Resize,
                public detect::params::SegmentOutput,
                public detect::params::Rect,
                public detect::params::Mask,
                public triton::Params {
public:
  std::vector<bool> classes;        // class allow-list
  float confidence_threshold;
  float nms_threshold;
  // ...
};

The new mixins:

  • params::SegmentOutput — like DetectOutput but for segmentation models. Knows where to read mask tensors, classification tensors, and (optionally) bounding-box tensors.
  • params::Mask — knows the mask resolution (mask_height, mask_width, num_classes, logits flag) and how to scale a small mask back up to the original image.
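The upscaling that params::Mask performs can be sketched without OpenCV as a nearest-neighbor upsample. This is a simplified illustration, not the mixin's actual implementation; upscale_mask is a hypothetical helper:

```cpp
#include <cstddef>
#include <vector>

// Upscale a small h_in x w_in mask to h_out x w_out by nearest-neighbor
// sampling -- roughly what happens before a mask is overlaid on the
// original image. Masks are flattened row-major.
std::vector<float> upscale_mask(const std::vector<float>& mask,
                                std::size_t h_in, std::size_t w_in,
                                std::size_t h_out, std::size_t w_out) {
  std::vector<float> out(h_out * w_out);
  for (std::size_t i = 0; i < h_out; ++i) {
    std::size_t src_i = i * h_in / h_out;    // nearest source row
    for (std::size_t j = 0; j < w_out; ++j) {
      std::size_t src_j = j * w_in / w_out;  // nearest source column
      out[i * w_out + j] = mask[src_i * w_in + src_j];
    }
  }
  return out;
}
```

The real mixin also has to undo any letterboxing/padding applied during preprocess before resampling.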

The boolean classes vector is built from the classes: [UInt64] config value — any class index not in the list is filtered out of the results.
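Building that allow-list can be sketched as follows. This is an assumption about the semantics, not the actual Params code — in particular, treating an empty config list as "allow everything" is a guess (build_class_filter is a hypothetical helper):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Turn the classes: [UInt64] config value into a per-index allow-list.
// Assumption: an empty list means no filtering (allow all classes).
std::vector<bool> build_class_filter(const std::vector<std::uint64_t>& allowed,
                                     std::size_t num_classes) {
  std::vector<bool> classes(num_classes, allowed.empty());
  for (auto idx : allowed) {
    if (idx < num_classes) classes[idx] = true;  // ignore out-of-range indices
  }
  return classes;
}
```

A vector<bool> indexed by class makes the per-detection check in postprocess an O(1) lookup.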

2. Postprocess (two branches)

Semantic-segmentor branch

When the model emits per-pixel logits [batch, num_classes, H, W]:

if (params.logits) {
  std::vector<cv::Mat> masks(params.num_classes);
  for (int i = 0; i < params.mask_height; i++) {
    for (int j = 0; j < params.mask_width; j++) {
      // argmax across classes for each pixel (begin/end span this
      // pixel's logits, one per class)
      uint64_t label = std::max_element(begin, end) - begin;
      if (label >= params.num_classes || !params.classes[label]) continue;
      if (masks[label].empty()) {
        masks[label] = cv::Mat::zeros(params.mask_height, params.mask_width, CV_32F);
      }
      masks[label].at<float>(i, j) = 1;
    }
  }
  // emit one Segmentation per non-empty class mask, with a default 0.5 confidence
}
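The per-pixel argmax elided above can be sketched without OpenCV. Assuming logits laid out class-major as [num_classes, H, W], each pixel's label is the index of its largest logit (decode_labels is a hypothetical helper, not part of the worker API):

```cpp
#include <cstddef>
#include <vector>

// For logits stored class-major ([num_classes, H, W], flattened row-major),
// return the per-pixel argmax label -- the decoding the semantic branch
// performs before filling in the per-class masks.
std::vector<std::size_t> decode_labels(const std::vector<float>& logits,
                                       std::size_t num_classes,
                                       std::size_t height, std::size_t width) {
  const std::size_t plane = height * width;   // pixels per class plane
  std::vector<std::size_t> labels(plane, 0);
  for (std::size_t p = 0; p < plane; ++p) {
    float best = logits[p];                   // class 0's logit at this pixel
    for (std::size_t c = 1; c < num_classes; ++c) {
      float v = logits[c * plane + p];
      if (v > best) { best = v; labels[p] = c; }
    }
  }
  return labels;
}
```

Stepping by plane rather than transposing the tensor keeps the decode a single pass over memory.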

Instance-segmentor branch

When the model emits per-instance masks plus boxes:

float x_scale, y_scale, scale_factor;
std::tie(x_scale, y_scale, scale_factor) =
    params.get_scale_factors(...);

for (int i = 0; i < num_detections; ++i) {
  auto detected_class = params.get_detected_class(outputs, i);
  if (/* below confidence_threshold or class filtered out */) continue;

  auto rect = params.extract_rectangle(...);   // optional bbox

  cv::Mat mask(...);
  if (params.bbox_scale) {
    output.emplace_back(ocv::Segmentation{detected_class,
                                          params.scale_bbox(img_size, rect, mask)},
                        rect);
  } else {
    output.emplace_back(ocv::Segmentation{detected_class,
                                          params.scale(img_size, mask)},
                        rect);
  }
}

return detect::non_maximum_suppression(std::move(output), params.nms_threshold);

params.scale(...) and params.scale_bbox(...) come from params::Mask. detect::non_maximum_suppression is the segmentation-aware NMS from detect/segment.hpp.
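What "segmentation-aware" means here can be sketched as greedy NMS with IoU computed on the binary masks instead of on boxes. This is a simplified stand-in for detect::non_maximum_suppression, with a hypothetical Detection struct in place of the real tuple type:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Detection {
  float confidence;
  std::vector<std::uint8_t> mask;  // binary mask, flattened, all same size
};

// Intersection-over-union of two equally sized binary masks.
float mask_iou(const std::vector<std::uint8_t>& a,
               const std::vector<std::uint8_t>& b) {
  std::size_t inter = 0, uni = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    inter += (a[i] && b[i]);
    uni   += (a[i] || b[i]);
  }
  return uni ? static_cast<float>(inter) / uni : 0.f;
}

// Greedy NMS: walk detections in descending confidence, dropping any whose
// mask overlaps an already kept mask above the threshold.
std::vector<Detection> nms(std::vector<Detection> dets, float threshold) {
  std::sort(dets.begin(), dets.end(),
            [](const Detection& a, const Detection& b) {
              return a.confidence > b.confidence;
            });
  std::vector<Detection> kept;
  for (auto& d : dets) {
    bool suppressed = false;
    for (const auto& k : kept) {
      if (mask_iou(d.mask, k.mask) > threshold) { suppressed = true; break; }
    }
    if (!suppressed) kept.push_back(std::move(d));
  }
  return kept;
}
```

Mask IoU suppresses overlapping instances that box IoU would miss, e.g. two detections of the same thin, diagonal object whose boxes overlap heavily but whose pixels barely do.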

3. Output type

using DetectionType = Tuple<ocv::types::Segmentation, ocv::types::BoundingBox>;
using OutputType    = List<DetectionType>;

A pair per detection: the segmentation mask (with class label), and the bounding box. Downstream visualizers consume both.

4. Run loop

PIPELOGIC_MAIN() {
  const Params params;
  auto model = std::make_shared<triton::InferenceModel>(params);

  auto pre = [&params](const cv::Mat& img) { return params.preprocess(img); };
  auto post = [&params](const auto& outputs, const cv::Mat& img) {
    return postprocess(outputs, img, params);
  };

  auto segmentor = std::make_shared<detect::PanopticSegmentor>(
      model,
      std::vector<std::string>{model->inputs().at(0).name},
      params.output_layer,
      pre, post);

  run([&segmentor, &params](Message input_image) -> Message {
    ocv::Image img = std::move(input_image);
    Object<OutputType> answer;

    if (!img.mat().empty()) {
      auto segs = segmentor->segment(img.mat());
      for (auto&& it : segs) {
        ocv::BoundingBox bbox{it.first.detected_class(), it.second};
        answer.push_back(Object<DetectionType>{
            Object<ocv::types::Segmentation>{std::move(it.first)},
            Object<ocv::types::BoundingBox>{std::move(bbox)}});
      }
    }
    return answer;
  });
  return EXIT_SUCCESS;
}

Pattern is the same as detect-objects — just with PanopticSegmentor and a tuple output.

5. component.yml

Same shape as detect-objects plus the segmentation-specific fields:

worker:
  input_type: "Image"
  output_type: "[(Segmentation, BoundingBox)]"
  config_schema:
    # ... format, resize, pad, normal, input_layer, rect (same as detect-objects)
    num_classes:
      type: UInt64
    mask_height:
      type: UInt64
    mask_width:
      type: UInt64
    logits:
      type: Bool
      default: false
    bbox_scale:
      type: Bool
      default: false
    classes:
      type: "[UInt64]"
      default: []
    confidence_threshold:
      type: Double
      default: 0.5
    nms_threshold:
      type: Double
      default: 0.5
    output_layer:
      type: "[String]"
