triton and infer
The Triton client and the underlying infer abstraction. Together they let a C++ component talk to a Triton server (or any other backend implementing infer::Model) without the worker code knowing the protocol details.
triton::Params
The mixin that worker components compose in order to read Triton configuration out of component.yml.
struct Params {
  std::string model_name;
  std::string triton_address = "triton";
  std::string triton_port = "8001";

  Params(const std::map<std::string, Object<Any>>& config);
};
The constructor reads model_name from the component's config (as declared in config_schema). The address and port default to the platform-managed Triton sidecar; override them only for advanced setups.
You'll typically inherit from triton::Params alongside the detect::params::* mixins:
struct Params : public detect::params::Format,
                public detect::params::Resize,
                public detect::params::Pad,
                public detect::params::Normal,
                public detect::params::ObjectDetection,
                public detect::params::DetectOutput,
                public detect::params::Rect,
                public triton::Params {
  // ...
};
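The combined struct's constructor isn't shown above; here is a minimal sketch of how it might forward the config, assuming each detect::params::* mixin takes the same config map as triton::Params (check the actual constructor signatures in your tree):

Params::Params(const std::map<std::string, Object<Any>>& config)
    : detect::params::Format(config),
      detect::params::Resize(config),
      detect::params::Pad(config),
      detect::params::Normal(config),
      detect::params::ObjectDetection(config),
      detect::params::DetectOutput(config),
      detect::params::Rect(config),
      triton::Params(config) {}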
triton::TensorInfo
Metadata about a model's input or output tensor:
struct TensorInfo {
  std::string name;
  std::vector<int64_t> shape;
  infer::DataType type;

  TensorInfo(std::string name, std::vector<int64_t> shape, infer::DataType type);
};
triton::InferenceModel
The actual Triton client. Inherits from infer::Model so it plugs into any code that takes an infer::Model.
class InferenceModel : public infer::Model {
public:
  InferenceModel(const std::string& server_address,
                 const std::string& model_name,
                 const std::string& model_version = "");
  InferenceModel(const Params& params);  // convenience

  std::vector<TensorInfo> inputs() const;
  std::vector<TensorInfo> outputs() const;

  // infer() / move_infer() inherited from infer::Model
};
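inputs() and outputs() are useful for sanity-checking a deployment at startup. A minimal sketch built only on the calls documented above; the logging format is illustrative, and it assumes datatype_to_string is reachable as infer::datatype_to_string:

triton::InferenceModel model(params);
for (const auto& t : model.inputs()) {
  std::cout << "input  " << t.name << " (" << infer::datatype_to_string(t.type) << "):";
  for (int64_t d : t.shape) std::cout << ' ' << d;
  std::cout << '\n';
}
for (const auto& t : model.outputs()) {
  std::cout << "output " << t.name << " (" << infer::datatype_to_string(t.type) << "):";
  for (int64_t d : t.shape) std::cout << ' ' << d;
  std::cout << '\n';
}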
Construct it once at startup and pass a std::shared_ptr to your detector:
auto model = std::make_shared<triton::InferenceModel>(params);

detect::ObjectDetector detector(
    model,
    /* input_names */ {"image"},
    /* output_names */ {"boxes", "labels", "scores"},
    [&params](const cv::Mat& img) { return params.preprocess(img); },
    [&params](const auto& outputs, const cv::Mat& img) { return postprocess(outputs, img, params); }
);

auto boxes = detector.find_objects(img);
The Triton client automatically adds a "batch" dimension when the model config allows it (i.e. max_batch_size > 0). For single-image inference you don't need to think about batching.
infer::Model (abstract)
The generic inference base class. Subclass it to plug ONNX Runtime, TorchScript, or anything else into the same detector machinery.
class Model {
public:
  template <typename TensorPtrContainer, typename StringContainer>
  std::vector<std::unique_ptr<Tensor>>
  infer(const TensorPtrContainer& inputs,
        const StringContainer& input_names,
        const StringContainer& output_names);

  template <typename TensorPtrContainer, typename StringContainer>
  std::vector<std::unique_ptr<Tensor>>
  move_infer(const TensorPtrContainer& inputs, ...);

  virtual std::unique_ptr<Tensor> preferred_empty(std::vector<int64_t> dims, DataType type);

  virtual ~Model() = default;

protected:
  virtual std::vector<std::unique_ptr<Tensor>>
  infer_impl(const std::vector<const Tensor*>& inputs,
             const std::vector<std::string>& input_names,
             const std::vector<std::string>& output_names) = 0;

  virtual std::vector<std::unique_ptr<Tensor>>
  move_infer_impl(...);
};
To implement a backend: subclass Model, override infer_impl, optionally override move_infer_impl for zero-copy paths, and optionally override preferred_empty to allocate tensors in the backend's preferred memory layout.
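A skeletal subclass to show the shape of that override; the class name is made up, the body is a placeholder for the real runtime call, and it assumes the relevant infer headers are already included:

class OnnxRuntimeModel : public infer::Model {
protected:
  std::vector<std::unique_ptr<infer::Tensor>>
  infer_impl(const std::vector<const infer::Tensor*>& inputs,
             const std::vector<std::string>& input_names,
             const std::vector<std::string>& output_names) override {
    std::vector<std::unique_ptr<infer::Tensor>> outputs;
    // Feed `inputs` (keyed by `input_names`) to the backend runtime here,
    // then wrap each requested output in a Tensor subclass
    // (e.g. infer::ContainerTensor<float>) and push it into `outputs`.
    return outputs;
  }
};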
infer::DataType
enum class DataType {
  BOOL_T,
  UINT8_T, UINT32_T, UINT64_T,
  INT8_T, INT32_T, INT64_T,
  FP32_T, FP64_T,
};
bool compatible_types(DataType a, DataType b); // true for same, or BOOL_T <-> 8-bit ints
size_t type_size(DataType); // 1, 4, or 8 bytes
std::string datatype_to_string(DataType); // "FP32", "INT64", ...
DataType string_to_datatype(const std::string&); // throws on unknown
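For example, type_size plus a model's TensorInfo gives the byte size of a fully-specified input buffer, and compatible_types lets an 8-bit buffer stand in for a BOOL_T tensor. A small sketch, assuming these helpers live in the infer namespace:

const triton::TensorInfo info = model->inputs().front();

size_t elems = 1;
for (int64_t d : info.shape) elems *= static_cast<size_t>(d);
size_t bytes = elems * infer::type_size(info.type);   // e.g. 4 bytes per element for FP32_T

// true when the model expects UINT8_T itself, or BOOL_T (bool <-> 8-bit ints are compatible)
bool ok = infer::compatible_types(info.type, infer::DataType::UINT8_T);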
infer::Tensor (abstract)
The base class for typed input/output tensors. The ocv::MatTensor you saw in API: cv is one concrete subclass.
class Tensor {
public:
  virtual int ndims() const = 0;
  virtual int64_t ldim(int dim) const = 0;
  virtual void* mutable_data() = 0;
  virtual const void* data() const = 0;
  virtual DataType type() const = 0;
  virtual size_t size() const = 0;  // total element count
};
Several concrete subclasses ship with pipeml:
| Class | Header | Purpose |
|---|---|---|
| ocv::MatTensor | cv/tensor.hpp | OpenCV cv::Mat bridge |
| infer::PipeTensor<T> | infer/tensor.hpp | Backed by a pipelang Tensor<T> |
| infer::ContainerTensor<T> | infer/tensor.hpp | Backed by std::vector<T> |
| infer::TensorWrapper | infer/tensor.hpp | View over an arbitrary buffer |
| infer::ReshapeTensor | infer/tensor.hpp | View with reinterpreted shape |
| infer::StringTensor | infer/tensor.hpp | For string outputs |
Most worker code uses ocv::MatTensor for image-shaped tensors and lets pipeml or Triton pick the right subclass for outputs.
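If you need to bypass detect::ObjectDetector and call the model directly, wrap the input in a concrete Tensor and pass pointers plus names to infer(). A sketch under the assumption that ocv::MatTensor can be constructed straight from a cv::Mat (see API: cv for the exact constructor):

ocv::MatTensor input(img);  // assumed cv::Mat constructor; see API: cv
std::vector<const infer::Tensor*> inputs{&input};

auto outputs = model->infer(inputs,
                            std::vector<std::string>{"image"},
                            std::vector<std::string>{"boxes", "labels", "scores"});
// Each element of outputs is a std::unique_ptr<infer::Tensor> holding one model output.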
What's next
- API: cv – the ocv::Image and ocv::MatTensor you'll feed into infer().
- API: detect – the params::* mixins and ObjectDetector that wrap triton::InferenceModel.
- How to write a detection worker – full walkthrough.