# Pose Estimation

Pose estimation refers to computer vision techniques that detect human figures in images and video, so that one could determine, for example, where someone’s elbow shows up in an image.

PoseNet can be used to estimate **either** a **single pose** or **multiple poses**, meaning there is a version of the algorithm that can detect only one person in an image/video and one version that can detect multiple persons in an image/video.

At a high level pose estimation happens in two phases:

1. An *input RGB image* is fed through a convolutional neural network.
2. Either a single-pose or multi-pose decoding algorithm is used to decode *poses, pose confidence scores*,

   *keypoint positions,* and *keypoint confidence scores* from the model outputs.
