Build a pose estimator

A pose estimator takes an image plus the bounding boxes of detected people and returns a set of landmarks — keypoint locations like shoulders, hips, and knees — for each person.

Pipeml ships a mixin that handles preprocessing, model invocation, and decoding, so the component is mostly wiring: declare the inputs and output, plug in the mixin, and run.

1. Params

class Params : public detect::params::GeneralPoseEstimator,
               public triton::Params {
 public:
  Params() : Params(parse_config()) {}

 private:
  explicit Params(const std::map<std::string, Object<Any>>& config)
      : detect::params::GeneralPoseEstimator(config),
        triton::Params(config) {}
};

A single mixin handles all the pose-specific config (preprocess, postprocess, joint count, top-k, etc.).

2. Run loop

PIPELOGIC_MAIN() {
  const Params params;
  auto model = std::make_shared<triton::InferenceModel>(params);

  // The general pose estimator exposes one or two outputs, depending on the model.
  const auto& ou = model->outputs();
  std::vector<std::string> outputs{ou.at(0).name};
  if (ou.size() > 1) outputs.push_back(ou.at(1).name);

  detect::GeneralPoseEstimator detector(
      model, params, model->inputs().at(0).name, outputs);

  run([&detector](Message image, Message pbboxes) -> Message {
    ocv::Image img = std::move(image);

    auto bbox_list = pbboxes.as<List<ocv::types::BoundingBox>>();
    std::vector<ocv::BoundingBox> bboxes;
    bboxes.reserve(bbox_list.size());
    for (int i = 0; i < bbox_list.size(); ++i) {
      bboxes.push_back(ocv::BoundingBox{bbox_list.data(i)});
    }

    auto landmarks_collection = detector.find_landmarks(img.mat(), bboxes);

    Object<List<List<ocv::types::Landmark>>> answer;
    // Iterate non-const so std::move below actually moves instead of copying.
    for (auto& landmarks : landmarks_collection) {
      Object<List<ocv::types::Landmark>> lds;
      for (auto& ld : landmarks) {
        lds.push_back(Object<ocv::types::Landmark>{std::move(ld)});
      }
      answer.push_back(std::move(lds));
    }
    return answer;
  });

  return EXIT_SUCCESS;
}

Notes:

  • The worker takes two Message arguments. The pipelogic runtime supplies them in the order declared by component.yml's worker.input_types.
  • The bbox list is consumed via pbboxes.as<List<ocv::types::BoundingBox>>(), then unwrapped into a std::vector<ocv::BoundingBox> for the detector API.
  • The output type is [[Landmark]] — an outer list of pose instances, each an inner list of joints in the order defined by the pose family (e.g., COCO 17-point order for Landmarks2d::Human17).
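As a concrete illustration of that nested shape, here is a minimal, framework-free sketch. The Landmark struct and its field names are assumptions for illustration, not the ocv::types definition; the function walks a [[Landmark]]-style payload and counts the confident joints per detected person:

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for ocv::types::Landmark (hypothetical fields).
struct Landmark {
  float x;
  float y;
  float confidence;
};

// A [[Landmark]] payload: one inner vector of joints per detected person.
using PoseList = std::vector<std::vector<Landmark>>;

// Count, per person, the joints whose confidence clears a threshold.
std::vector<std::size_t> visible_joints(const PoseList& poses, float thresh) {
  std::vector<std::size_t> counts;
  counts.reserve(poses.size());
  for (const auto& pose : poses) {
    std::size_t n = 0;
    for (const auto& ld : pose) {
      if (ld.confidence >= thresh) ++n;
    }
    counts.push_back(n);
  }
  return counts;
}
```

A downstream consumer would iterate the outer list the same way, relying on the inner lists sharing the joint order fixed by the pose family.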

3. component.yml

name: "Detect Landmarks"
language: cpp
platform: linux/amd64
build_system: 2-ml
tags: ["latest", "default"]

worker:
  input_types:
    - "Image"
    - "[BoundingBox]"
  output_type: "[[Landmark]]"

  file_schema:
    model:
      file_type: "model"
      config_key: "model_name"
      component: "triton"
      is_optional: false

  config_schema:
    # detect::params::GeneralPoseEstimator
    color_model:
      type: "Image.RGB | Image.BGR | Image.GRAY"
    input_height:
      type: UInt64
    input_width:
      type: UInt64
    mean:
      type: "(Double, Double, Double)"
    std:
      type: "(Double, Double, Double)"
    num_joints:
      type: UInt64
    pose_family:
      type: String   # e.g. "Human17", "Human22"

Pose families

The Landmarks2d::* named types fix the joint count and ordering:

  Type                   Joints  Source
  Landmarks2d::Human17   17      COCO Body
  Landmarks2d::Human22   22      COCO + MPII
  Landmarks2d::Human26   26      Halpe
  Landmarks2d::Animal17  17      AP-10K
  Landmarks2d::Hand21    21      COCO Wholebody Hand

The label-*-landmarks components in the workers catalog convert raw [Landmark] lists into the typed Landmarks2d::* shape that downstream pose-aware components consume (detect-bodypart, filter-pose-bbox, etc.).
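A hypothetical sketch of the check such a conversion implies: a raw pose qualifies as a given family only when its joint count matches. The types here are stand-ins, not the catalog components' API:

```cpp
#include <optional>
#include <vector>

struct Landmark {  // illustrative fields, not the ocv::types definition
  float x;
  float y;
  float confidence;
};

// Accept a raw landmark list as Human17 only if it has exactly 17 joints.
std::optional<std::vector<Landmark>> as_human17(std::vector<Landmark> raw) {
  if (raw.size() != 17) return std::nullopt;
  return raw;
}
```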
