Add segmentation output to pose detection README (#939)

ahmedsabie · lina128 · web-flow · commit fe3db69df2c4 · 2022-01-31T14:47:48.000-05:00
Co-authored-by: Na Li <linazhao@google.com>
diff --git a/pose-detection/README.md b/pose-detection/README.md
@@ -55,7 +55,7 @@ the list will be empty.
 For each pose, it contains a confidence score of the pose and an array of keypoints.
 PoseNet and MoveNet both return 17 keypoints. MediaPipe BlazePose returns 33 keypoints.
 Each keypoint contains x, y, score and name. In addition, MediaPipe BlazePose
-also returns an array of 3D keypoints.
+also returns an array of 3D keypoints and a segmentation mask.
 
 Example output:
 ```
@@ -70,7 +70,16 @@ Example output:
     keypoints3D: [
       {x: 0.65, y: 0.11, z: 0.05, score: 0.99, name: "nose"},
       ...
-    ]
+    ],
+    segmentation: {
+      maskValueToLabel: (maskValue: number) => { return 'person' },
+      mask: {
+        toCanvasImageSource(): ...
+        toImageData(): ...
+        toTensor(): ...
+        getUnderlyingType(): ...
+      }
+    }
   }
 ]
 ```
@@ -94,6 +103,12 @@ and therefore setting a proper confidence threshold may involve some experimenta
 
 The name provides a label for each keypoint, such as 'nose', 'left_eye', 'right_knee', etc.
 
+The `mask` key of `segmentation` stores an object that provides access to the underlying mask image through the conversion functions `toCanvasImageSource`, `toImageData`, and `toTensor`, depending on the desired output type. `getUnderlyingType` can be queried to determine which type is used under the hood, which helps avoid expensive conversions (such as from tensor to image data).
+
+The RGBA values of the `mask` have the following semantics: the mask image is the same size as the input image, and the green and blue channels are always set to 0. Different red values denote different segmentation labels (see the `maskValueToLabel` key below; currently only foreground/background segmentation is performed). The alpha value denotes the probability that a pixel is a foreground pixel (0 being the lowest probability and 255 the highest).
+
+The `maskValueToLabel` key of `segmentation` is a function that maps the red value of a foreground pixel in `mask` to the label of that pixel. It should throw an error for unsupported input values. BlazePose always returns 'person' since it performs binary segmentation.
+
 Refer to each model's documentation for specific configurations for the model
 and their performance.
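
The RGBA semantics described in the added README text can be illustrated with a minimal sketch. It is not part of this commit. It assumes the mask has already been converted to an ImageData-like object (`data`, `width`, `height`), for example through `toImageData`. The helper `summarizeMask`, the `alphaThreshold` parameter, and the synthetic 2x2 mask are hypothetical names and values introduced only for illustration.

```javascript
// Hypothetical helper (not part of the pose-detection API): counts pixels
// whose alpha, read as a foreground probability in 0..255, meets a threshold,
// and groups them by the label derived from the red channel.
function summarizeMask(imageData, maskValueToLabel, alphaThreshold = 128) {
  const {data, width, height} = imageData;
  const labelCounts = {};
  let foregroundPixels = 0;
  for (let i = 0; i < width * height; i++) {
    const red = data[4 * i];        // segmentation label value
    const alpha = data[4 * i + 3];  // foreground probability, 0..255
    if (alpha < alphaThreshold) continue;  // treat as background
    foregroundPixels++;
    const label = maskValueToLabel(red);
    labelCounts[label] = (labelCounts[label] || 0) + 1;
  }
  return {foregroundPixels, labelCounts};
}

// Synthetic 2x2 mask standing in for the result of the mask conversion.
// Green and blue are 0, matching the semantics described in the README.
const fakeMask = {
  width: 2,
  height: 2,
  data: new Uint8ClampedArray([
    255, 0, 0, 255,  // confident foreground
    255, 0, 0, 200,  // likely foreground
    0, 0, 0, 10,     // likely background
    0, 0, 0, 0,      // background
  ]),
};

// Stand-in for segmentation.maskValueToLabel, which for BlazePose
// always returns 'person' according to the README text.
const fakeMaskValueToLabel = (maskValue) => 'person';

const summary = summarizeMask(fakeMask, fakeMaskValueToLabel);
console.log(summary);
```

The threshold of 128 is an arbitrary midpoint; as with keypoint scores, a suitable value may take some experimentation.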