Build an image classifier
An image classifier takes a single image and returns a per-class probability vector — [Double], one entry per class the model knows about.
The skeleton is the same as a detector, with a smaller mixin set (no NMS, no bounding-box decoding) and a different output type.
1. Params
Notice the smaller mixin set: classification uses the combined PadResize rather than separate Pad and Resize, and drops ObjectDetection, DetectOutput, and Rect entirely:
struct Params : public detect::params::InputLayer,
                public detect::params::PadResize,
                public detect::params::Format,
                public detect::params::Normal,
                public triton::Params {
 private:
  Params(const std::map<std::string, Object<Any>>& config)
      : detect::params::InputLayer(config),
        detect::params::PadResize(config),
        detect::params::Format(config),
        detect::params::Normal(config),
        triton::Params(config) {
    logits = read_config(config, "logits").as<Bool>();
    num_classes = read_config(config, "num_classes").as<UInt64>();
  }

 public:
  bool logits;
  uint64_t num_classes;

  Params() : Params(parse_config()) {}

  std::vector<std::unique_ptr<infer::Tensor>> preprocess(const cv::Mat& img) const {
    std::vector<std::unique_ptr<infer::Tensor>> answer;
    answer.emplace_back(detect::params::InputLayer::preprocess(
        detect::params::Normal::preprocess(detect::params::Format::preprocess(
            detect::params::PadResize::preprocess(img)))));
    return answer;
  }
};
detect::params::PadResize is a convenience mixin that combines Resize and Pad. It's used here because classifiers always want a fixed input shape.
Custom fields:
logits — whether the model emits logits (we apply softmax) or probabilities (we copy them as-is).
num_classes — the size of the output vector.
2. Postprocess
std::vector<double>
postprocess(const std::vector<std::unique_ptr<infer::Tensor>>& outputs,
            const Params& params) {
  auto begin = &outputs[0]->at<float>(0, 0);
  if (params.logits) {
    return math::softmax(std::span<const float>(begin, params.num_classes));
  }
  return std::vector<double>(begin, begin + params.num_classes);
}
Reads the first (and only) output tensor's data buffer, optionally applies softmax. Returns a std::vector<double> — pipelogic auto-converts to [Double] on the wire.
3. Run loop
PIPELOGIC_MAIN() {
  const Params params;
  auto model = std::make_shared<triton::InferenceModel>(params);

  auto pre = [&params](const cv::Mat& img) { return params.preprocess(img); };
  auto post = [&params](const auto& outputs, const cv::Mat& img) {
    return postprocess(outputs, params);
  };

  auto classifier = std::make_shared<detect::ImageClassifier>(
      model,
      std::vector<std::string>{model->inputs().at(0).name},
      std::vector<std::string>{model->outputs().at(0).name},
      pre, post);

  run([&params, classifier](Message input_image) -> Message {
    ocv::Image img = Object<ocv::types::Image>{input_image.as<ocv::types::Image>()};
    std::vector<double> predictions = classifier->classify(img.mat());
    return Object<List<Double>>{std::move(predictions)};
  });

  return EXIT_SUCCESS;
}
detect::ImageClassifier takes the same model + input/output names + pre/post pair as ObjectDetector. The output type is a flat [Double] of length num_classes.
4. component.yml
name: "Classify Image (Triton)"
language: cpp
platform: linux/amd64
build_system: 2-ml
tags: ["latest", "default"]
worker:
  input_type: "Image"
  output_type: "[Double]"
file_schema:
  model:
    file_type: "model"
    config_key: "model_name"
    component: "triton"
    is_optional: false
config_schema:
  color_model:
    type: "Image.RGB | Image.BGR | Image.GRAY"
  input_height:
    type: UInt64
  input_width:
    type: UInt64
  pad_value:
    type: Double
  pad_mode:
    type: String
  mean:
    type: "(Double, Double, Double)"
  std:
    type: "(Double, Double, Double)"
  int_input_type:
    type: Bool
  add_batch_layer:
    type: Bool
  change_channel_order:
    type: String
  logits:
    type: Bool
    default: true
  num_classes:
    type: UInt64