Commit c012530

Add blockwise broadcastable dimension. Add Q/DQ emulation and more complete algorithm.

1 parent 6572481

File tree: 1 file changed (index.bs)

Lines changed: 113 additions & 14 deletions
@@ -3685,9 +3685,7 @@ partial dictionary MLOpSupportLimits {
 
 
 ### dequantizeLinear ### {#api-mlgraphbuilder-dequantizelinear}
-Dequantizes an integer tensor to floating point tensor using the scale and zero-point bias, where `output = (input - zeroPoint) * scale`.
-
-TODO: Elaborate on blockwise broadcasting - The operation will be [=broadcast=] according to [[!numpy-broadcasting-rule]]. The input tensors must be [=bidirectionally broadcastable=]. The [=MLOperand/rank=] of the output tensor is the maximum [=MLOperand/rank=] of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors, and each dimension must be blockwise compatible with the output (e.g. given an input shape *[12]*, scales of the following shapes are blockwise compatible {*[1]*, *[3]*, *[4]*, *[6]*, *[12]*} as they are all multiples of the input dimensions, but a shape of *[5]* would not be).
+Dequantizes an integer tensor to a floating-point tensor using the scale and zero-point bias, where `output = (input - zeroPoint) * scale`. The *scale* and *zeroPoint* tensors can be smaller than the *input* tensor because they are [=blockwise broadcastable=] to it.
 
 <script type=idl>
 partial interface MLGraphBuilder {
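For intuition, the per-element arithmetic can be sketched in plain JavaScript for a 1-D tensor. This is an illustrative sketch only, not the WebNN API; `dequantizeLinear1D` and its contiguous-block layout are assumptions for the example:

```javascript
// Illustrative sketch: output = (input - zeroPoint) * scale, where each
// scale/zeroPoint value covers one contiguous block of input elements.
// input.length must be an exact integer multiple of scale.length.
function dequantizeLinear1D(input, scale, zeroPoint) {
  const blockSize = input.length / scale.length;
  return input.map((value, i) => {
    const block = Math.floor(i / blockSize);
    return (value - zeroPoint[block]) * scale[block];
  });
}

// Shape [4] input with shape [2] scale (block size 2).
console.log(dequantizeLinear1D([10, 20, 30, 40], [0.5, 0.25], [0, 0]));
// → [5, 10, 7.5, 10]
```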
@@ -3712,7 +3710,7 @@ partial dictionary MLOpSupportLimits {
 <div dfn-for="MLGraphBuilder/dequantizeLinear(input, scale, zeroPoint, options)" dfn-type=argument>
 **Arguments:**
 - <dfn>input</dfn>: an {{MLOperand}}. The input tensor.
-- <dfn>scale</dfn>: an {{MLOperand}}. The scale tensor to multiply each input value by after adjusting by the zero point. It has the same [=MLOperand/rank=] as the input, and its [=MLOperand/shape=] must evenly divide into the input [=MLOperand/shape=].
+- <dfn>scale</dfn>: an {{MLOperand}}. The scale tensor to multiply each input value by after adjusting for the zero point. It must be [=blockwise broadcastable=] to the input.
 - <dfn>zeroPoint</dfn>: an {{MLOperand}}. The zero point tensor to subtract from each input value. It has the same [=MLOperand/shape=] as the scale.
 - <dfn>options</dfn>: an {{MLOperatorOptions}}. Specifies the optional parameters of the operation.
 
@@ -3776,9 +3774,16 @@ partial dictionary MLOpSupportLimits {
 1. If [=MLGraphBuilder/validating operand=] with [=this=] and any of |input|, |scale|, and |zeroPoint| returns false, then [=exception/throw=] a {{TypeError}}.
 1. If |scale|'s [=MLOperand/rank=] or |zeroPoint|'s [=MLOperand/rank=] mismatches |input|'s [=MLOperand/rank=], then [=exception/throw=] a {{TypeError}}.
 1. If |scale|'s [=MLOperand/shape=] mismatches |zeroPoint|'s [=MLOperand/shape=], then [=exception/throw=] a {{TypeError}}.
-1. [=list/For each=] |axis| in [=the range=] 0 to |input|'s [=MLOperand/rank=], exclusive:
-    1. If |scale|'s [=MLOperand/shape=][|axis|] is not exactly divisible into |input|'s [=MLOperand/shape=][|axis|], then [=exception/throw=] a {{TypeError}}.
-    1. If |zeroPoint|'s [=MLOperand/shape=][|axis|] is not exactly divisible into |input|'s [=MLOperand/shape=][|axis|], then [=exception/throw=] a {{TypeError}}.
+1. If [=blockwise broadcasting=] |scale|'s [=MLOperand/shape=] and |input|'s [=MLOperand/shape=] returns failure, then [=exception/throw=] a {{TypeError}}.
+1. If [=blockwise broadcasting=] |zeroPoint|'s [=MLOperand/shape=] and |input|'s [=MLOperand/shape=] returns failure, then [=exception/throw=] a {{TypeError}}.
+1. Let |outputDescriptor| be the result of [=creating an MLOperandDescriptor=] given |input|'s [=MLOperand/dataType=] and |input|'s [=MLOperand/shape=].
+1. *Make graph connections:*
+    1. Let |output| be the result of [=creating an MLOperand=] given [=this=] and |outputDescriptor|.
+    1. Let |operator| be an [=operator=] for the "dequantizeLinear" operation, given |input|, |scale|, |zeroPoint|, and |options|.
+    1. Set |output|.{{MLOperand/[[operator]]}} to |operator|.
+    1. Set |operator|'s [=operator/input=] to |input|.
+    1. Set |operator|'s [=operator/output=] to |output|.
+    1. Return |output|.
 </details>
 
 <div class="note">
@@ -3787,16 +3792,65 @@ partial dictionary MLOpSupportLimits {
 The behavior of this operation can be [EMULATED]
 </summary>
 <pre highlight="js">
-TODO:
+function dequantizeLinear(builder, input, scale, zeroPoint, options) {
+  // output = (input - zeroPoint) * scale
+  const floatInput = builder.cast(input, scale.dataType);
+  const floatZeroPoint = builder.cast(zeroPoint, scale.dataType);
+  const upsampledScale = blockwiseBroadcast(builder, scale, input.shape);
+  const upsampledZeroPoint = blockwiseBroadcast(builder, floatZeroPoint, input.shape);
+  return builder.mul(builder.sub(floatInput, upsampledZeroPoint), upsampledScale);
+}
+
+function blockwiseBroadcast(builder, input, targetShape) {
+  // This expands each axis by repeating the block the needed number of times per axis, given the
+  // original input shape and target shape. However, backend implementations may have much more
+  // efficient upsampling operators that can accept multiple dimensions to upsample all
+  // dimensions at once by integer multiples (like tile) using nearest neighbor resampling:
+  // output = resample(scale, {sizes: input.shape})
+
+  let expandedInput = input;
+
+  for (let axis = 0; axis < input.shape.length; ++axis) {
+    const inputShape = expandedInput.shape;
+    const oldDimensionLength = inputShape[axis];
+    const newDimensionLength = targetShape[axis];
+
+    if (newDimensionLength != oldDimensionLength) {
+      // Since tile/expand can only accept repetitions of entire dimension slices (not repeating
+      // individual elements along an axis), temporarily reshape the tensor to enable them to
+      // broadcast the elements up to the full block size, utilizing an inserted dimension of size 1.
+      const elementRepeatCount = newDimensionLength / oldDimensionLength;
+      const flattenedShape = getFlattenedShapeAroundAxis(inputShape, axis);
+      const unexpandedShape = [flattenedShape[0], flattenedShape[1], 1, flattenedShape[2]];
+      const expandedShape = [flattenedShape[0], flattenedShape[1], elementRepeatCount, flattenedShape[2]];
+      const reshapedInput = builder.reshape(expandedInput, unexpandedShape);
+      expandedInput = builder.expand(reshapedInput, expandedShape);
+    }
+
+    let newInputShape = [...inputShape];
+    newInputShape[axis] = newDimensionLength;
+    expandedInput = builder.reshape(expandedInput, newInputShape);
+  }
+
+  return expandedInput;
+}
+
+// Compute the flattened shape before and after the given axis, yielding a 3-element list.
+// e.g. inputShape = [2,3,4,5,6] with axis = 2 yields shape [6,4,30].
+// e.g. inputShape = [4] with axis = 0 yields shape [1,4,1].
+function getFlattenedShapeAroundAxis(inputShape, axis) {
+  axis = Math.max(Math.min(axis, inputShape.length - 1), 0);
+  const countBefore = inputShape.slice(0, axis).reduce((a, b) => a * b, 1);
+  const countAfter = inputShape.slice(axis + 1).reduce((a, b) => a * b, 1);
+  return [countBefore, inputShape[axis], countAfter];
+}
 </pre>
 </details>
 </div>
 
 
 ### quantizeLinear ### {#api-mlgraphbuilder-quantizelinear}
-Quantizes a floating point tensor to integer tensor using the scale and zero-point bias, where `output = clamp(roundToNearestEvens(input / scale) + zeroPoint, 0, 255)`.
-
-TODO: Elaborate on blockwise broadcasting - The operation will be [=broadcast=] according to [[!numpy-broadcasting-rule]]. The input tensors must be [=bidirectionally broadcastable=]. The [=MLOperand/rank=] of the output tensor is the maximum [=MLOperand/rank=] of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors, and each dimension must be blockwise compatible with the output (e.g. given an input shape *[12]*, scales of the following shapes are blockwise compatible {*[1]*, *[3]*, *[4]*, *[6]*, *[12]*} as they are all multiples of the input dimensions, but a shape of *[5]* would not be).
+Quantizes a floating-point tensor to an integer tensor using the scale and zero-point bias, where `output = clamp(roundToNearestEvens(input / scale) + zeroPoint, 0, 255)`. The *scale* and *zeroPoint* tensors can be smaller than the *input* tensor because they are [=blockwise broadcastable=] to it.
 
 <script type=idl>
 partial interface MLGraphBuilder {
@@ -3821,7 +3875,7 @@ partial dictionary MLOpSupportLimits {
 <div dfn-for="MLGraphBuilder/quantizeLinear(input, scale, zeroPoint, options)" dfn-type=argument>
 **Arguments:**
 - <dfn>input</dfn>: an {{MLOperand}}. The input tensor.
-- <dfn>scale</dfn>: an {{MLOperand}}. The scale tensor to divide each input value by after adjusting by the zero point. It has the same [=MLOperand/rank=] as the input, and its [=MLOperand/shape=] must evenly divide into the input [=MLOperand/shape=].
+- <dfn>scale</dfn>: an {{MLOperand}}. The scale tensor to divide each input value by before adjusting by the zero point. It must be [=blockwise broadcastable=] to the input.
 - <dfn>zeroPoint</dfn>: an {{MLOperand}}. The zero point tensor to add to each rescaled input value. It has the same [=MLOperand/shape=] as the scale.
 - <dfn>options</dfn>: an {{MLOperatorOptions}}. Specifies the optional parameters of the operation.
 
@@ -3884,16 +3938,37 @@ partial dictionary MLOpSupportLimits {
 </summary>
 1. If [=this=].{{MLGraphBuilder/[[hasBuilt]]}} is true, then [=exception/throw=] an "{{InvalidStateError}}" {{DOMException}}.
 1. If [=MLGraphBuilder/validating operand=] with [=this=] and any of |input|, |scale|, and |zeroPoint| returns false, then [=exception/throw=] a {{TypeError}}.
-TODO: Add validation for scale and zero point shape.
+1. If |scale|'s [=MLOperand/rank=] or |zeroPoint|'s [=MLOperand/rank=] mismatches |input|'s [=MLOperand/rank=], then [=exception/throw=] a {{TypeError}}.
+1. If |scale|'s [=MLOperand/shape=] mismatches |zeroPoint|'s [=MLOperand/shape=], then [=exception/throw=] a {{TypeError}}.
+1. If [=blockwise broadcasting=] |scale|'s [=MLOperand/shape=] and |input|'s [=MLOperand/shape=] returns failure, then [=exception/throw=] a {{TypeError}}.
+1. If [=blockwise broadcasting=] |zeroPoint|'s [=MLOperand/shape=] and |input|'s [=MLOperand/shape=] returns failure, then [=exception/throw=] a {{TypeError}}.
+1. Let |outputDescriptor| be the result of [=creating an MLOperandDescriptor=] given |zeroPoint|'s [=MLOperand/dataType=] and |input|'s [=MLOperand/shape=].
+1. *Make graph connections:*
+    1. Let |output| be the result of [=creating an MLOperand=] given [=this=] and |outputDescriptor|.
+    1. Let |operator| be an [=operator=] for the "quantizeLinear" operation, given |input|, |scale|, |zeroPoint|, and |options|.
+    1. Set |output|.{{MLOperand/[[operator]]}} to |operator|.
+    1. Set |operator|'s [=operator/input=] to |input|.
+    1. Set |operator|'s [=operator/output=] to |output|.
+    1. Return |output|.
 </details>
 
 <div class="note">
 <details open>
 <summary>
 The behavior of this operation can be [EMULATED]
 </summary>
+This emulation relies on a pending `roundEven` operator in <a href="https://github.com/webmachinelearning/webnn/issues/817">[Issue webnn#817]</a>.
 <pre highlight="js">
-TODO:
+function quantizeLinear(builder, input, scale, zeroPoint, options) {
+  // output = clamp(roundEven(input / scale) + zeroPoint, 0, 255)
+  const floatZeroPoint = builder.cast(zeroPoint, scale.dataType);
+  const upsampledScale = blockwiseBroadcast(builder, scale, input.shape);
+  const upsampledZeroPoint = blockwiseBroadcast(builder, floatZeroPoint, input.shape);
+  const quantizedInput = builder.roundEven(builder.div(input, upsampledScale));
+  const zeroPointAdjustedInput = builder.add(quantizedInput, upsampledZeroPoint);
+  const clampedInput = builder.clamp(zeroPointAdjustedInput, {'minValue': 0, 'maxValue': 255});
+  return builder.cast(clampedInput, zeroPoint.dataType);
+}
 </pre>
 </details>
 </div>
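The quantization arithmetic can also be sketched numerically in plain JavaScript for a 1-D tensor. This is an illustrative sketch, not the WebNN API; `roundEven` (round-half-to-even) is implemented locally here because the builder operator is still pending, and `quantizeLinear1D` with its block layout is an assumption for the example:

```javascript
// Round half to even (banker's rounding), e.g. 2.5 → 2, 3.5 → 4.
function roundEven(x) {
  const floor = Math.floor(x);
  const diff = x - floor;
  if (diff < 0.5) return floor;
  if (diff > 0.5) return floor + 1;
  return floor % 2 === 0 ? floor : floor + 1;
}

// Illustrative sketch: output = clamp(roundEven(input / scale) + zeroPoint, 0, 255),
// where each scale/zeroPoint value covers one contiguous block of input elements.
function quantizeLinear1D(input, scale, zeroPoint) {
  const blockSize = input.length / scale.length;
  return input.map((value, i) => {
    const block = Math.floor(i / blockSize);
    const q = roundEven(value / scale[block]) + zeroPoint[block];
    return Math.min(Math.max(q, 0), 255); // clamp to the uint8 range
  });
}
```

For example, `quantizeLinear1D([1.0, 3.0], [2.0], [128])` maps 0.5 and 1.5 to the even integers 0 and 2 before the zero-point offset.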
@@ -9246,6 +9321,8 @@ The shapes of the input tensors must be compatible. A tensor is [=unidirectional
 
 Two tensors are [=bidirectionally broadcastable=] if they can be mutually "stretched" (repeated) across their various dimensions, starting from the last dimension. For example, a *[5,1]* tensor can be bidirectionally broadcast with a *[1,6]* tensor by repeating the first tensor 6 times in the last dimension and the second tensor 5 times in the preceding dimension. The result of the operation will be a *[5,6]* tensor. Bidirectional broadcasting is convenient for element-wise operations.
 
+A tensor is [=blockwise broadcastable=] if all of its dimensions can be upsampled by integer multiples to the target tensor's shape. For example, a *[4,5]* tensor can be blockwise broadcast up to a *[16,10]* tensor because each target dimension is an exact multiple (16 % 4 = 0, 10 % 5 = 0), repeating every element 4 times in the first dimension and every element 2 times in the last dimension (e.g. values *[1,2,3,4,5]* in a single slice would be repeated to *[1,1,2,2,3,3,4,4,5,5]*). However, a *[4,5]* tensor is incompatible with a *[9,3]* tensor since both dimensions have a nonzero remainder (9 % 4 = 1, 3 % 5 = 3). Blockwise broadcasting is useful for sharing common values in larger blocks to save memory. Both tensors are expected to have the same rank, and the output shape is simply the target tensor's shape that the smaller one is upsampled to.
+
 Some operations allow broadcasting with special semantics. For example, {{MLGraphBuilder/matmul()}} treats the last two dimensions of the input tensors as the rows and columns of the matrices, and the number of columns in the first matrix must be equal to the number of rows in the second matrix. The matrix multiplication is bidirectionally broadcast across any additional dimensions, treating the input tensors as stacks of matrices to multiply.
 
 <details open algorithm>
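The per-slice element repetition described for blockwise broadcasting (e.g. *[1,2,3,4,5]* repeated to *[1,1,2,2,3,3,4,4,5,5]*) can be sketched in plain JavaScript; `repeatElements` is a hypothetical helper for illustration, not spec text:

```javascript
// Blockwise-broadcasts a 1-D slice to a longer target length by repeating
// each element (targetLength / values.length) times. Throws when the target
// length is not an exact integer multiple, mirroring validation failure.
function repeatElements(values, targetLength) {
  const repeat = targetLength / values.length;
  if (!Number.isInteger(repeat)) throw new Error("not an exact integer multiple");
  return values.flatMap(v => Array(repeat).fill(v));
}

console.log(repeatElements([1, 2, 3, 4, 5], 10));
// → [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```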
@@ -9298,6 +9375,28 @@ To <dfn data-lt="bidirectionally broadcasting">bidirectionally broadcast the sha
 |shapeA| is <dfn>bidirectionally broadcastable</dfn> to |shapeB| if [=bidirectionally broadcasting=] |shapeA| and |shapeB| does not result in failure.
 </p>
 
+<details open algorithm>
+<summary>
+To <dfn data-lt="blockwise broadcasting">blockwise broadcast the shapes</dfn> |shapeFrom| and |shapeTo|, perform the following steps. |shapeFrom| and |shapeTo| are [=/lists=] of positive integers, representing the dimensions of tensors, and the steps return a new [=/list=] of positive integers, or failure.
+</summary>
+
+1. Let |sizeFrom| be |shapeFrom|'s [=list/size=].
+1. Let |sizeTo| be |shapeTo|'s [=list/size=].
+1. If |sizeFrom| != |sizeTo|, then return failure.
+1. Let |outputShape| be a new [=/list=].
+1. [=list/For each=] |index| in [=the range=] 0 to |sizeTo|, exclusive:
+    1. Let |dimFrom| be |shapeFrom|[|index|].
+    1. Let |dimTo| be |shapeTo|[|index|].
+    1. If |dimTo| is not exactly divisible by |dimFrom|, then return failure.
+    1. [=list/Append=] |dimTo| to |outputShape|.
+1. Return |outputShape|.
+
+</details>
+
+<p algorithm>
+|shapeFrom| is <dfn>blockwise broadcastable</dfn> to |shapeTo| if [=blockwise broadcasting=] |shapeFrom| and |shapeTo| does not result in failure.
+</p>
+
 ## Casting ## {#algorithms-casting}
 
 Explicit numeric casting is used in algorithms where parameters passed as {{MLNumber}} or {{double}} need to be converted to match the {{MLOperandDataType}} of input or output {{MLOperand}}s.
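The blockwise broadcasting steps added in this commit can be transcribed almost directly into JavaScript; `blockwiseBroadcastShapes` is an illustrative name, and `null` stands in for the algorithm's failure value:

```javascript
// Transcription of the blockwise broadcasting steps: returns the output
// shape (a copy of shapeTo), or null (failure) when the ranks differ or any
// target dimension is not an exact integer multiple of the source dimension.
function blockwiseBroadcastShapes(shapeFrom, shapeTo) {
  if (shapeFrom.length !== shapeTo.length) return null;
  const outputShape = [];
  for (let index = 0; index < shapeTo.length; ++index) {
    if (shapeTo[index] % shapeFrom[index] !== 0) return null;
    outputShape.push(shapeTo[index]);
  }
  return outputShape;
}

console.log(blockwiseBroadcastShapes([4, 5], [16, 10])); // → [16, 10]
console.log(blockwiseBroadcastShapes([4, 5], [9, 3]));   // → null
```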
