Bit-Packing Codec for ADC Data Optimization #2942

Draft
wants to merge 1 commit into
base: main
201 changes: 201 additions & 0 deletions src/zarr/codecs/bitpack.py
@@ -0,0 +1,201 @@
from __future__ import annotations

Check warning on line 1 in src/zarr/codecs/bitpack.py

View check run for this annotation

Codecov / codecov/patch

src/zarr/codecs/bitpack.py#L1

Added line #L1 was not covered by tests

import asyncio
from dataclasses import dataclass
from typing import TYPE_CHECKING, Any

import numpy as np

from zarr.abc.codec import BytesBytesCodec
from zarr.core.common import JSON, parse_named_configuration
from zarr.registry import register_codec

if TYPE_CHECKING:
    from typing import Self

    from zarr.core.array_spec import ArraySpec
    from zarr.core.buffer import Buffer


@dataclass(frozen=True)
class BitPackingCodec(BytesBytesCodec):
    """
    Codec for bit-packing integer data that doesn't use the full range of its data type.

    This codec is particularly useful for ADC (Analog-to-Digital Converter) data that
    typically returns values using fewer bits (e.g., 10 or 12 bits) than standard integer
    types (16, 32, or 64 bits).
    """

    # Number of bits to use for each value in the packed format.
    bits_per_value: int

    # Original data type (for unpacking)
    original_dtype: np.dtype[Any]

    def __init__(
        self,
        *,
        bits_per_value: int,
        original_dtype: str | np.dtype[Any],
    ) -> None:
        if bits_per_value <= 0:
            raise ValueError(f"bits_per_value must be a positive integer, got {bits_per_value}")

        # Accept a dtype string such as "uint16" as well as a dtype object.
        if isinstance(original_dtype, str):
            original_dtype = np.dtype(original_dtype)

        # The dataclass is frozen, so attributes are set through object.__setattr__.
        object.__setattr__(self, "bits_per_value", bits_per_value)
        object.__setattr__(self, "original_dtype", original_dtype)

    @classmethod
    def from_dict(cls, data: dict[str, JSON]) -> Self:
        _, configuration_parsed = parse_named_configuration(data, "bitpacking")
        return cls(**configuration_parsed)  # type: ignore[arg-type]

    def to_dict(self) -> dict[str, JSON]:
        return {
            "name": "bitpacking",
            "configuration": {
                "bits_per_value": self.bits_per_value,
                "original_dtype": str(self.original_dtype),
            },
        }

    async def _encode_single(
        self,
        chunk_bytes: Buffer,
        chunk_spec: ArraySpec,
    ) -> Buffer | None:
        """Pack the data using only the necessary bits per value."""
        return await asyncio.to_thread(
            self._bit_pack,
            chunk_bytes,
            chunk_spec,
        )

    async def _decode_single(
        self,
        chunk_bytes: Buffer,
        chunk_spec: ArraySpec,
    ) -> Buffer:
        """Unpack the bit-packed data back to original format."""
        return await asyncio.to_thread(
            self._bit_unpack,
            chunk_bytes,
            chunk_spec,
        )

    def _bit_pack(self, chunk_bytes: Buffer, chunk_spec: ArraySpec) -> Buffer:
        """
        Pack every value of the chunk into ``bits_per_value`` bits.

        Values are written back to back, least-significant bit first, and the
        last byte is zero-padded. Bits above ``bits_per_value`` are discarded.
        """
        dtype = chunk_spec.dtype

        arr = np.frombuffer(chunk_bytes.as_numpy_array(), dtype=dtype).reshape(chunk_spec.shape)

        print(arr)
        original_bytes = arr.nbytes
        original_bits = original_bytes * 8

        print("===== BIT PACKING STATISTICS =====")
        print(f"Original array shape: {arr.shape}")
        print(f"Original data type: {arr.dtype} ({arr.dtype.itemsize} bytes per value)")
        print(f"Original data size: {original_bytes} bytes ({original_bits} bits)")

        # Bit mask that keeps the low ``bits_per_value`` bits. A plain Python int is
        # used because np.uint16 cannot hold the mask for widths above 16 bits.
        mask = (1 << self.bits_per_value) - 1

        # Calculate output size, rounded up to a whole number of bytes
        total_values = arr.size
        output_size = (total_values * self.bits_per_value + 7) // 8

        # Print bit packing settings
        print(f"Bit-packing using {self.bits_per_value} bits per value")
        print(f"Total values: {total_values}")
        print(f"Theoretical packed size: {total_values * self.bits_per_value / 8:.2f} bytes")
        print(f"Actual packed size: {output_size} bytes")
        print(f"Storage savings: {(1 - output_size / original_bytes) * 100:.2f}%")

        packed = np.zeros(output_size, dtype=np.uint8)

        # Pack the values
        for i, raw in enumerate(arr.flat):
            # Work on Python ints so shifts cannot overflow a fixed-width NumPy scalar.
            value = int(raw) & mask
            bit_pos = (i * self.bits_per_value) % 8
            byte_pos = (i * self.bits_per_value) // 8
            remaining = self.bits_per_value

            # A value can span more than two bytes (e.g. 11 bits starting at bit 7),
            # so write it one byte-sized piece at a time.
            while remaining > 0:
                bits_here = min(8 - bit_pos, remaining)
                packed[byte_pos] |= (value & ((1 << bits_here) - 1)) << bit_pos
                value >>= bits_here
                remaining -= bits_here
                byte_pos += 1
                bit_pos = 0

        print("==============================")

        return chunk_spec.prototype.buffer.from_bytes(packed.tobytes())

    def _bit_unpack(self, chunk_bytes: Buffer, chunk_spec: ArraySpec) -> Buffer:
        """
        Unpack ``bits_per_value``-bit values back into ``original_dtype``.

        This reverses ``_bit_pack``. Values are read as unsigned integers, so no
        sign extension is performed.
        """

        # Print packed data information
        packed_bytes = chunk_bytes.as_numpy_array()
        print("===== BIT UNPACKING STATISTICS =====")
        print(f"Packed data size: {len(packed_bytes)} bytes")

        packed = np.frombuffer(packed_bytes, dtype=np.uint8)

        # Take the value count from the chunk shape. Deriving it from the packed size
        # can overcount, because the last byte may hold padding bits (one 3-bit value
        # occupies a full byte, and 8 // 3 == 2).
        total_bits = packed.size * 8
        total_values = int(np.prod(chunk_spec.shape))
        expected_output_bytes = total_values * np.dtype(self.original_dtype).itemsize

        if total_values * self.bits_per_value > total_bits:
            raise ValueError(
                f"Packed data holds {total_bits} bits, but {total_values} values of "
                f"{self.bits_per_value} bits each are required"
            )

        print(f"Unpacking using {self.bits_per_value} bits per value")
        print(f"Total packed bits: {total_bits}")
        print(f"Calculated number of values: {total_values}")
        print(f"Expected output size: {expected_output_bytes} bytes")

        unpacked = np.zeros(total_values, dtype=self.original_dtype)

        for i in range(total_values):
            bit_pos = (i * self.bits_per_value) % 8
            byte_pos = (i * self.bits_per_value) // 8
            remaining = self.bits_per_value
            value = 0
            shift = 0

            # Read the value one byte-sized piece at a time. Each byte is converted
            # to a Python int so the shifts cannot overflow a uint8 scalar.
            while remaining > 0:
                bits_here = min(8 - bit_pos, remaining)
                piece = (int(packed[byte_pos]) >> bit_pos) & ((1 << bits_here) - 1)
                value |= piece << shift
                shift += bits_here
                remaining -= bits_here
                byte_pos += 1
                bit_pos = 0

            unpacked[i] = value

        # Reshape to match original array shape
        unpacked = unpacked.reshape(chunk_spec.shape)

        print(f"First few unpacked values: {unpacked.flat[: min(10, unpacked.size)]}")
        print(f"Actual unpacked size: {unpacked.nbytes} bytes")
        print(f"Size expansion: {(unpacked.nbytes / len(packed_bytes)):.2f}x")
        print("================================")

        return chunk_spec.prototype.buffer.from_bytes(unpacked.tobytes())


# Register under the same name that from_dict and to_dict use ("bitpacking").
register_codec("bitpacking", BitPackingCodec)
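
Review note: the Codecov annotations on this file flag the added lines as not covered by tests. As a starting point for a round-trip test, here is a minimal sketch of a reference packer that the codec output could be compared against. It is independent of zarr, the names `pack_reference` and `unpack_reference` are hypothetical (they are not part of this PR), and it assumes the layout that `_bit_pack` appears to use: values written back to back, least-significant bit first, with the final byte zero-padded.

```python
def pack_reference(values: list[int], bits_per_value: int) -> bytes:
    """Hypothetical reference packer: LSB-first, values back to back."""
    mask = (1 << bits_per_value) - 1
    stream = 0
    for i, value in enumerate(values):
        # Value i occupies stream bits [i * bits_per_value, (i + 1) * bits_per_value).
        stream |= (value & mask) << (i * bits_per_value)
    n_bytes = (len(values) * bits_per_value + 7) // 8
    # Little-endian byte order puts stream bit 0 in bit 0 of the first byte.
    return stream.to_bytes(n_bytes, "little")


def unpack_reference(packed: bytes, bits_per_value: int, count: int) -> list[int]:
    """Inverse of pack_reference; `count` is needed because of trailing padding."""
    mask = (1 << bits_per_value) - 1
    stream = int.from_bytes(packed, "little")
    return [(stream >> (i * bits_per_value)) & mask for i in range(count)]


# Three 10-bit values need 30 bits, which rounds up to 4 bytes.
packed = pack_reference([1, 2, 3], 10)
print(list(packed))  # [1, 8, 48, 0]
print(unpack_reference(packed, 10, 3))  # [1, 2, 3]
```

The explicit `count` argument matters: the number of values cannot always be recovered from the packed length alone, since one 3-bit value still occupies a whole byte.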

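Review note: the per-value Python loops in `_bit_pack` and `_bit_unpack` may become slow for large chunks. One possible vectorized alternative, sketched below under the assumption that the same LSB-first layout is wanted, uses `np.packbits` and `np.unpackbits` with `bitorder="little"`. The helper names `pack_bits` and `unpack_bits` are hypothetical and not part of this PR, and the sketch only handles unsigned values. It builds intermediate arrays that are larger than the input, so it trades memory for speed.

```python
import numpy as np


def pack_bits(values: np.ndarray, bits_per_value: int) -> np.ndarray:
    """Hypothetical vectorized packer: LSB-first, values back to back."""
    flat = np.ascontiguousarray(values, dtype=np.uint64).ravel()
    shifts = np.arange(bits_per_value, dtype=np.uint64)
    # Row i holds the bits of value i, lowest bit first.
    bits = ((flat[:, None] >> shifts) & np.uint64(1)).astype(np.uint8)
    # bitorder="little" maps the first bit of each group of 8 to bit 0 of the byte.
    # packbits zero-pads the final byte.
    return np.packbits(bits.ravel(), bitorder="little")


def unpack_bits(packed: np.ndarray, bits_per_value: int, count: int) -> np.ndarray:
    """Inverse of pack_bits; `count` drops the padding bits in the last byte."""
    bits = np.unpackbits(packed, bitorder="little")[: count * bits_per_value]
    bits = bits.reshape(count, bits_per_value).astype(np.uint64)
    shifts = np.arange(bits_per_value, dtype=np.uint64)
    # Reassemble each value from its bits: sum of bit_j * 2**j.
    return (bits << shifts).sum(axis=1, dtype=np.uint64)


samples = np.array([1, 2, 3], dtype=np.uint16)
packed = pack_bits(samples, 10)
print(packed.tolist())  # [1, 8, 48, 0]
print(unpack_bits(packed, 10, samples.size).tolist())  # [1, 2, 3]
```

In a codec, the result of `unpack_bits` would still need a cast to the configured dtype and a reshape to the chunk shape.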