Build a segmentation worker
A segmentation worker takes an image and returns one mask per detection, each tagged with a class label and paired with its bounding box.
The skeleton is the same as a detector, with additional segmentation mixins and a richer postprocess that decodes per-pixel labels into masks.
1. Params
struct Params : public detect::params::Format,
                public detect::params::InputLayer,
                public detect::params::Normal,
                public detect::params::Pad,
                public detect::params::Resize,
                public detect::params::SegmentOutput,
                public detect::params::Rect,
                public detect::params::Mask,
                public triton::Params {
public:
  std::vector<bool> classes;   // class allow-list, indexed by class id
  float confidence_threshold;  // minimum score for a detection to be kept
  float nms_threshold;         // IoU threshold for non-maximum suppression
  // ...
};
The new mixins:
params::SegmentOutput: like DetectOutput, but for segmentation models. Knows where to read mask tensors, classification tensors, and (optionally) bounding-box tensors.
params::Mask: knows the mask resolution (mask_height, mask_width, num_classes, the logits flag) and how to scale a small mask back up to the original image (a sketch follows).
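A minimal sketch of the scaling half of params::Mask, assuming the low-resolution mask arrives as a single-channel float cv::Mat. The body and the 0.5 binarization threshold are illustrative assumptions, not the library's actual implementation; scale_bbox would additionally use the detection's rectangle when pasting the mask back.

#include <opencv2/imgproc.hpp>

// Illustrative only: upscale a [mask_height x mask_width] float mask to the
// original image size, then binarize it into an 8-bit mask.
cv::Mat scale(const cv::Size& img_size, const cv::Mat& mask) {
  cv::Mat resized;
  cv::resize(mask, resized, img_size, 0, 0, cv::INTER_LINEAR);
  return resized > 0.5f;  // CV_8U: 255 where the mask fires, 0 elsewhere
}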
The std::vector<bool> allow-list above is built from the classes: [UInt64] config value; any class index not in the list is filtered out during postprocessing (see the sketch below).
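One plausible way to materialize that allow-list, assuming a hypothetical build_class_allowlist helper and treating an empty config list as "allow every class" (consistent with the [] default in component.yml below):

#include <cstdint>
#include <vector>

// Hypothetical helper: expand the configured class ids into a dense
// allow-list indexed by class id. An empty input allows everything.
std::vector<bool> build_class_allowlist(const std::vector<uint64_t>& configured,
                                        uint64_t num_classes) {
  std::vector<bool> classes(num_classes, configured.empty());
  for (uint64_t id : configured) {
    if (id < num_classes) classes[id] = true;
  }
  return classes;
}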
2. Postprocess (two branches)
Semantic-segmentor branch
When the model emits per-pixel logits [batch, num_classes, H, W]:
if (params.logits) {
  std::vector<cv::Mat> masks(params.num_classes);  // one lazily created mask per class
  for (int i = 0; i < params.mask_height; i++) {
    for (int j = 0; j < params.mask_width; j++) {
      // argmax across classes for this pixel; `begin` points at the
      // pixel's scores along the class dimension
      uint64_t label = std::max_element(begin, begin + params.num_classes) - begin;
      if (label >= params.num_classes || !params.classes[label]) continue;
      if (masks[label].empty()) {
        // cv::Scalar(0), not a bare 0: the integer literal would pick the
        // void* data-pointer constructor instead of zero-filling the mat
        masks[label] = cv::Mat(params.mask_height, params.mask_width, CV_32F,
                               cv::Scalar(0));
      }
      masks[label].at<float>(i, j) = 1.f;
    }
  }
  // emit one Segmentation per non-empty class mask, with a default 0.5 confidence
}
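The emit step referenced in the trailing comment could look roughly as follows; the ocv::Segmentation field layout and deriving each bounding box from its mask's extent are assumptions:

// Illustrative emit loop: one Segmentation per non-empty class mask.
for (uint64_t label = 0; label < params.num_classes; ++label) {
  if (masks[label].empty()) continue;
  cv::Mat mask8u;
  masks[label].convertTo(mask8u, CV_8U);  // boundingRect wants an 8-bit mask
  output.emplace_back(
      ocv::Segmentation{label, /*confidence=*/0.5f,  // assumed field order
                        params.scale(img_size, masks[label])},
      cv::boundingRect(mask8u));                     // assumed: box from mask extent
}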
Instance-segmentor branch
When the model emits per-instance masks plus boxes:
float x_scale, y_scale, scale_factor;
std::tie(x_scale, y_scale, scale_factor) =
    params.get_scale_factors(...);
for (int i = 0; i < num_detections; ++i) {
  auto detected_class = params.get_detected_class(outputs, i);
  // skip detections below confidence_threshold or outside the class allow-list
  if (low_confidence || !class_allowed) continue;
  auto rect = params.extract_rectangle(...);  // optional bbox
  cv::Mat mask(...);                          // this instance's mask tensor slice
  if (params.bbox_scale) {
    output.emplace_back(ocv::Segmentation{detected_class,
                                          params.scale_bbox(img_size, rect, mask)},
                        rect);
  } else {
    output.emplace_back(ocv::Segmentation{detected_class,
                                          params.scale(img_size, mask)},
                        rect);
  }
}
return detect::non_maximum_suppression(std::move(output), params.nms_threshold);
params.scale(...) and params.scale_bbox(...) come from params::Mask. detect::non_maximum_suppression is the segmentation-aware NMS from detect/segment.hpp.
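For intuition, here is a minimal sketch of what segmentation-aware NMS can look like, using mask IoU over 8-bit binary masks as the overlap metric; the real detect::non_maximum_suppression may differ in metric and bookkeeping:

#include <opencv2/core.hpp>
#include <algorithm>
#include <vector>

// Illustrative mask IoU: intersection over union of two binary masks.
float mask_iou(const cv::Mat& a, const cv::Mat& b) {
  cv::Mat inter, uni;
  cv::bitwise_and(a, b, inter);
  cv::bitwise_or(a, b, uni);
  double u = cv::countNonZero(uni);
  return u > 0 ? static_cast<float>(cv::countNonZero(inter) / u) : 0.f;
}

struct Candidate { cv::Mat mask; float score; };

// Greedy NMS: keep the highest-scoring mask, drop any later mask whose
// IoU with an already-kept mask exceeds the threshold.
std::vector<Candidate> nms(std::vector<Candidate> dets, float threshold) {
  std::sort(dets.begin(), dets.end(),
            [](const Candidate& a, const Candidate& b) { return a.score > b.score; });
  std::vector<Candidate> kept;
  for (auto& d : dets) {
    bool suppressed = false;
    for (const auto& k : kept) {
      if (mask_iou(d.mask, k.mask) > threshold) { suppressed = true; break; }
    }
    if (!suppressed) kept.push_back(std::move(d));
  }
  return kept;
}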
3. Output type
using DetectionType = Tuple<ocv::types::Segmentation, ocv::types::BoundingBox>;
using OutputType = List<DetectionType>;
A pair per detection: the segmentation mask (with its class label) and the bounding box. Downstream visualizers consume both.
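For example, a downstream visualizer might consume the pair like this, shown with plain OpenCV types rather than the ocv:: wrappers (whose accessors are not spelled out here):

#include <opencv2/imgproc.hpp>

// Illustrative: paint masked pixels red and outline the box in green.
void draw_detection(cv::Mat& canvas, const cv::Mat& mask8u, const cv::Rect& box) {
  cv::Mat tint(canvas.size(), canvas.type(), cv::Scalar(0, 0, 255));
  tint.copyTo(canvas, mask8u);                           // tint where the mask is set
  cv::rectangle(canvas, box, cv::Scalar(0, 255, 0), 2);  // bounding box
}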
4. Run loop
PIPELOGIC_MAIN() {
  const Params params;
  auto model = std::make_shared<triton::InferenceModel>(params);
  auto pre = [&params](const cv::Mat& img) { return params.preprocess(img); };
  auto post = [&params](const auto& outputs, const cv::Mat& img) {
    return postprocess(outputs, img, params);
  };
  auto segmentor = std::make_shared<detect::PanopticSegmentor>(
      model,
      std::vector<std::string>{model->inputs().at(0).name},
      params.output_layer,
      pre, post);
  run([&segmentor, &params](Message input_image) -> Message {
    ocv::Image img = std::move(input_image);
    Object<OutputType> answer;
    if (!img.mat().empty()) {
      auto segs = segmentor->segment(img.mat());
      for (auto&& it : segs) {
        ocv::BoundingBox bbox{it.first.detected_class(), it.second};
        answer.push_back(Object<DetectionType>{
            Object<ocv::types::Segmentation>{std::move(it.first)},
            Object<ocv::types::BoundingBox>{std::move(bbox)}});
      }
    }
    return answer;
  });
  return EXIT_SUCCESS;
}
The pattern is the same as detect-objects, just with PanopticSegmentor and a tuple output.
5. component.yml
Same shape as detect-objects plus the segmentation-specific fields:
worker:
  input_type: "Image"
  output_type: "[(Segmentation, BoundingBox)]"
  config_schema:
    # ... format, resize, pad, normal, input_layer, rect (same as detect-objects)
    num_classes:
      type: UInt64
    mask_height:
      type: UInt64
    mask_width:
      type: UInt64
    logits:
      type: Bool
      default: false
    bbox_scale:
      type: Bool
      default: false
    classes:
      type: "[UInt64]"
      default: []
    confidence_threshold:
      type: Double
      default: 0.5
    nms_threshold:
      type: Double
      default: 0.5
    output_layer:
      type: "[String]"