Commit fe3db69

ahmedsabie and lina128 authored
Add segmentation output to pose detection README (#939)
Co-authored-by: Na Li <[email protected]>
1 parent 4dfc3ff commit fe3db69

1 file changed: +17 -2 lines changed

pose-detection/README.md

@@ -55,7 +55,7 @@ the list will be empty.
 For each pose, it contains a confidence score of the pose and an array of keypoints.
 PoseNet and MoveNet both return 17 keypoints. MediaPipe BlazePose returns 33 keypoints.
 Each keypoint contains x, y, score and name. In addition, MediaPipe BlazePose
-also returns an array of 3D keypoints.
+also returns an array of 3D keypoints and a segmentation mask.

 Example output:
 ```
@@ -70,7 +70,16 @@ Example output:
     keypoints3D: [
       {x: 0.65, y: 0.11, z: 0.05, score: 0.99, name: "nose"},
       ...
-    ]
+    ],
+    segmentation: {
+      maskValueToLabel: (maskValue: number) => { return 'person' },
+      mask: {
+        toCanvasImageSource(): ...
+        toImageData(): ...
+        toTensor(): ...
+        getUnderlyingType(): ...
+      }
+    }
   }
 ]
 ```
@@ -94,6 +103,12 @@ and therefore setting a proper confidence threshold may involve some experimenta

 The name provides a label for each keypoint, such as 'nose', 'left_eye', 'right_knee', etc.

+The `mask` key of `segmentation` stores an object that provides access to the underlying mask image through the conversion functions `toCanvasImageSource`, `toImageData`, and `toTensor`, depending on the desired output type. Note that `getUnderlyingType` can be queried to determine which type is used under the hood, which helps avoid expensive conversions (such as from tensor to image data).
+
+The semantics of the RGBA values of the `mask` are as follows: the image mask is the same size as the input image, and the green and blue channels are always set to 0. Different red values denote different segmentation labels (see the `maskValueToLabel` key below; currently only foreground/background segmentation is performed). Different alpha values denote the probability of the pixel being a foreground pixel (0 being the lowest probability and 255 being the highest).
+
+The `maskValueToLabel` key of `segmentation` maps the red value of a foreground pixel in `mask` to the label of that pixel. It should throw an error for unsupported input values. BlazePose will always return 'person' since it performs binary segmentation.
+
 Refer to each model's documentation for specific configurations for the model
 and their performance.
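
The RGBA semantics described in the added README text can be illustrated with a short sketch. It assumes that `toImageData()` resolves to an `ImageData`-like object with `width`, `height`, and a flat RGBA `data` array. The helper name `summarizeMask`, the alpha threshold of 128, and the synthetic red value 255 are hypothetical choices made for illustration and are not part of the library.

```javascript
// Hypothetical helper (not part of the pose-detection API): summarizes a
// segmentation mask given as an ImageData-like object {data, width, height}.
// Per the README text: red = label value, green/blue = 0,
// alpha = foreground probability scaled to 0..255.
function summarizeMask(imageData, alphaThreshold = 128) {
  const {data, width, height} = imageData;
  const redValues = new Set();
  let foregroundPixels = 0;
  for (let i = 0; i < width * height; i++) {
    const red = data[4 * i];        // segmentation label value
    const alpha = data[4 * i + 3];  // foreground probability, 0..255
    if (alpha >= alphaThreshold) {
      foregroundPixels++;
      redValues.add(red);
    }
  }
  return {
    foregroundFraction: foregroundPixels / (width * height),
    foregroundRedValues: [...redValues],
  };
}

// Synthetic 2x1 mask for demonstration: one confident foreground pixel
// and one background pixel. The red value 255 is an arbitrary placeholder.
const demoMask = {
  width: 2,
  height: 1,
  data: new Uint8ClampedArray([255, 0, 0, 230, 0, 0, 0, 10]),
};
const summary = summarizeMask(demoMask);
console.log(summary.foregroundFraction); // 0.5
```

With a real BlazePose result, the same helper could presumably be fed `await pose.segmentation.mask.toImageData()`, and each value in `foregroundRedValues` could be passed to `pose.segmentation.maskValueToLabel` to obtain its label ('person' for BlazePose, according to the README text). Checking `getUnderlyingType()` first may help avoid a costly tensor-to-image-data conversion.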