triton and infer

This page covers the Triton client and the underlying infer abstraction. Together they let a C++ component talk to a Triton server (or to any other backend that implements infer::Model) without the worker code knowing the protocol details.

triton::Params

The mixin that worker components compose with to read Triton config out of component.yml.

struct Params {
  std::string model_name;
  std::string triton_address = "triton";
  std::string triton_port = "8001";

  Params(const std::map<std::string, Object<Any>>& config);
};

The constructor reads model_name from the component's config_schema. The address and port default to the platform-managed Triton sidecar; override them only for advanced setups.
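As a rough sketch (assuming the config map has already been parsed from component.yml and that Object&lt;Any&gt; is the platform's config value type), constructing the struct directly looks like this; load_component_config is a hypothetical helper:

std::map<std::string, Object<Any>> config = load_component_config();  // hypothetical: parsed component.yml

triton::Params params(config);
std::string server_address = params.triton_address + ":" + params.triton_port;  // e.g. "triton:8001"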

You'll typically inherit from triton::Params alongside the detect::params::* mixins:

struct Params : public detect::params::Format,
                public detect::params::Resize,
                public detect::params::Pad,
                public detect::params::Normal,
                public detect::params::ObjectDetection,
                public detect::params::DetectOutput,
                public detect::params::Rect,
                public triton::Params {
  // ...
};

triton::TensorInfo

Metadata about a model's input or output tensor:

struct TensorInfo {
  std::string name;
  std::vector<int64_t> shape;
  infer::DataType type;
  TensorInfo(std::string name, std::vector<int64_t> shape, infer::DataType type);
};

triton::InferenceModel

The actual Triton client. Inherits from infer::Model so it plugs into any code that takes an infer::Model.

class InferenceModel : public infer::Model {
public:
  InferenceModel(const std::string& server_address,
                 const std::string& model_name,
                 const std::string& model_version = "");

  InferenceModel(const Params& params);   // convenience

  std::vector<TensorInfo> inputs() const;
  std::vector<TensorInfo> outputs() const;

  // infer() / move_infer() inherited from infer::Model
};

Construct it once at startup and pass a std::shared_ptr to your detector:

auto model = std::make_shared<triton::InferenceModel>(params);

detect::ObjectDetector detector(
    model,
    /* input_names  */ {"image"},
    /* output_names */ {"boxes", "labels", "scores"},
    [&params](const cv::Mat& img) { return params.preprocess(img); },
    [&params](const auto& outputs, const cv::Mat& img) { return postprocess(outputs, img, params); }
);

auto boxes = detector.find_objects(img);

The Triton client automatically adds a "batch" dimension when the model config allows it (i.e. max_batch_size > 0). For single-image inference you don't need to think about batching.
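A sketch of how the metadata accessors might be used at startup, for example to check that the configured tensor names match the deployed model (it assumes the datatype_to_string helper described below lives in the infer namespace):

// Log every input tensor's name, dtype, and shape before wiring up the detector.
for (const triton::TensorInfo& in : model->inputs()) {
  std::cout << "input " << in.name << " "
            << infer::datatype_to_string(in.type) << " [";
  for (size_t i = 0; i < in.shape.size(); ++i)
    std::cout << (i ? "," : "") << in.shape[i];
  std::cout << "]\n";
}
// model->outputs() can be walked the same way.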

infer::Model (abstract)

The generic inference base class. Subclass it to plug ONNX Runtime, TorchScript, or anything else into the same detector machinery.

class Model {
public:
  template <typename TensorPtrContainer, typename StringContainer>
  std::vector<std::unique_ptr<Tensor>>
  infer(const TensorPtrContainer& inputs,
        const StringContainer& input_names,
        const StringContainer& output_names);

  template <typename TensorPtrContainer, typename StringContainer>
  std::vector<std::unique_ptr<Tensor>>
  move_infer(const TensorPtrContainer& inputs, ...);

  virtual std::unique_ptr<Tensor> preferred_empty(std::vector<int64_t> dims, DataType type);

  virtual ~Model() = default;

protected:
  virtual std::vector<std::unique_ptr<Tensor>>
  infer_impl(const std::vector<const Tensor*>& inputs,
             const std::vector<std::string>& input_names,
             const std::vector<std::string>& output_names) = 0;

  virtual std::vector<std::unique_ptr<Tensor>>
  move_infer_impl(...);
};

To implement a backend: subclass Model, override infer_impl, optionally override move_infer_impl for zero-copy paths, and optionally override preferred_empty to allocate tensors in the backend's preferred memory layout.
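A minimal skeleton of such a backend is sketched below. The body of infer_impl is a placeholder, and the mention of infer::ContainerTensor<T> for wrapping outputs is an assumption about a convenient subclass, not an API guarantee:

class MyRuntimeModel : public infer::Model {
protected:
  std::vector<std::unique_ptr<infer::Tensor>>
  infer_impl(const std::vector<const infer::Tensor*>& inputs,
             const std::vector<std::string>& input_names,
             const std::vector<std::string>& output_names) override {
    std::vector<std::unique_ptr<infer::Tensor>> outputs;
    outputs.reserve(output_names.size());

    // 1. Feed each input's raw buffer (inputs[i]->data(), sized from size() and
    //    type()) to the runtime under the name input_names[i].
    // 2. Run the model.
    // 3. Wrap each requested output in a concrete Tensor subclass, e.g.
    //    infer::ContainerTensor<float>, and push it into outputs in the same
    //    order as output_names.

    return outputs;
  }

  // Optionally also override move_infer_impl (zero-copy) and preferred_empty
  // (backend-preferred allocation); otherwise the base-class defaults apply.
};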

infer::DataType

enum class DataType {
  BOOL_T,
  UINT8_T, UINT32_T, UINT64_T,
  INT8_T,  INT32_T,  INT64_T,
  FP32_T,  FP64_T,
};

bool compatible_types(DataType a, DataType b);   // true for same, or BOOL_T <-> 8-bit ints
size_t type_size(DataType);                      // 1, 4, or 8 bytes
std::string datatype_to_string(DataType);        // "FP32", "INT64", ...
DataType string_to_datatype(const std::string&); // throws on unknown
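For example (a sketch, assuming these helpers live in the infer namespace):

// BOOL_T is interchangeable with 8-bit integer types, so a boolean mask can feed
// a model that declares a UINT8 input:
bool ok = infer::compatible_types(infer::DataType::BOOL_T, infer::DataType::UINT8_T);  // true
size_t fp32_bytes = infer::type_size(infer::DataType::FP32_T);                         // 4
std::string name  = infer::datatype_to_string(infer::DataType::INT64_T);               // "INT64"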

infer::Tensor (abstract)

The base class for typed input/output tensors. The ocv::MatTensor you saw in API: cv is one concrete subclass.

class Tensor {
public:
  virtual int ndims() const = 0;
  virtual int64_t ldim(int dim) const = 0;
  virtual void* mutable_data() = 0;
  virtual const void* data() const = 0;
  virtual DataType type() const = 0;
  virtual size_t size() const = 0;          // total element count
};
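Because the interface is purely virtual, backend-agnostic helpers can be written against it alone. A small sketch (assuming datatype_to_string and type_size are in the infer namespace):

// Print a tensor's shape, dtype, and total byte size without knowing its concrete type.
void describe(const infer::Tensor& t) {
  std::cout << "[";
  for (int i = 0; i < t.ndims(); ++i)
    std::cout << (i ? ", " : "") << t.ldim(i);
  std::cout << "] " << infer::datatype_to_string(t.type())
            << ", " << t.size() * infer::type_size(t.type()) << " bytes\n";
}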

Several concrete subclasses ship with pipeml:

Class                        Header             Purpose
ocv::MatTensor               cv/tensor.hpp      OpenCV cv::Mat bridge
infer::PipeTensor<T>         infer/tensor.hpp   Backed by a pipelang Tensor<T>
infer::ContainerTensor<T>    infer/tensor.hpp   Backed by std::vector<T>
infer::TensorWrapper         infer/tensor.hpp   View over an arbitrary buffer
infer::ReshapeTensor         infer/tensor.hpp   View with reinterpreted shape
infer::StringTensor          infer/tensor.hpp   For string outputs

Most worker code uses ocv::MatTensor for image-shaped tensors and lets pipeml or Triton pick the right subclass for outputs.
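If you do need to call a model directly instead of going through detect::ObjectDetector, the call could look roughly like this. It assumes ocv::MatTensor can be constructed from a cv::Mat (its constructors are documented in API: cv) and that plain std::vectors satisfy the TensorPtrContainer and StringContainer template parameters:

auto input = std::make_unique<ocv::MatTensor>(preprocessed);  // preprocessed: a cv::Mat
std::vector<const infer::Tensor*> tensors{input.get()};

auto outputs = model->infer(tensors,
                            std::vector<std::string>{"image"},
                            std::vector<std::string>{"boxes", "labels", "scores"});
// outputs holds the requested output tensors as std::unique_ptr<infer::Tensor>.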
