Commit 9ae0d82

Update ndim-reductions.rst
1 parent 7e8ea93 commit 9ae0d82

1 file changed (+3, -3 lines)


Diff for: posts/ndim-reductions.rst

@@ -12,7 +12,7 @@ NumPy is widely recognized for its ability to perform efficient computations and
 
 `Python-Blosc2 <https://www.blosc.org/python-blosc2>`_ leverages the power of NumPy to perform reductions on compressed multidimensional arrays. But, by compressing data with Blosc2, it is possible to reduce the memory and storage space required to store large datasets, while maintaining fast reduction times. This is especially beneficial for systems with memory constraints, as it allows for faster data access and operation.
 
-In this blog, we will explore how Python-Blosc2 can perform data reductions in in-memory `NDArray <https://www.blosc.org/python-blosc2/reference/ndarray.html>`_ objects (or any other object fulfilling the `LazyArray interface <https://www.blosc.org/python-blosc2/reference/lazyarray.html>`_) and how the speed of these operations can be optimized by using different chunk shapes, compression levels and codecs. We will then compare the performance of Python-Blosc2 with NumPy.
+In this blog, we will explore how Python-Blosc2 can perform data reductions with in-memory `NDArray <https://www.blosc.org/python-blosc2/reference/ndarray.html>`_ objects (or any other object fulfilling the `LazyArray interface <https://www.blosc.org/python-blosc2/reference/lazyarray.html>`_) and how the speed of these operations can be optimized by using different chunk shapes, compression levels and codecs. We will then compare the performance of Python-Blosc2 with NumPy.
 
 **Note**: The code snippets shown in this blog are part of a `Jupyter notebook <https://github.com/Blosc/python-blosc2/blob/main/doc/getting_started/tutorials/04.reductions.ipynb>`_ that you can run on your own machine. For that, you will need to install a recent version of Python-Blosc2: `pip install 'blosc2>=3.0.0b3'`; feel free to experiment with different parameters and share your results with us!
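
For context, here is a minimal sketch (not part of the commit) of the kind of reduction the post describes, assuming `blosc2>=3.0`, where `NDArray` exposes reduction methods such as `sum()`; shapes and values are illustrative::

    import numpy as np
    import blosc2

    # Build a compressed NDArray from a NumPy array; Blosc2 stores it
    # compressed, chunk by chunk.
    a = blosc2.asarray(np.linspace(0, 100, 1_000_000).reshape(100, 100, 100))

    # Reductions operate directly on the compressed object.
    total = a.sum()          # full reduction -> scalar
    partial = a.sum(axis=0)  # reduction along the first (X) axis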

@@ -85,7 +85,7 @@ Let's plot the results for the X, Y, and Z axes, comparing the performance of Py
 .. image:: /images/ndim-reductions/plot_automatic_chunking.png
    :width: 50%
 
-We can see that reduction along the X axis is much slower than those along the Y and Z axis for the Blosc2 case. This is because the automatically computed chunk shape is (1, 1000, 1000) making the overhead of partial sums larger. In addition, we see that, when reducing in all axes, as well as in Y and Z axes, Blosc2+LZ4+SHUFFLE actually achieves far better performance than NumPy. Finally, when not using compression inside Blosc2, we never see an advantage. See later for a discussion on these results.
+We can see that reduction along the X axis is much slower than those along the Y and Z axis for the Blosc2 case. This is because the automatically computed chunk shape is (1, 1000, 1000) making the overhead of partial sums larger. In addition, we see that, with the exception of the X axis, Blosc2+LZ4+SHUFFLE actually achieves far better performance than NumPy. Finally, when not using compression inside Blosc2, we never see an advantage. See later for a discussion on these results.
 
 Manual chunking
 ~~~~~~~~~~~~~~~
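
A hypothetical illustration (again, not part of the commit) of how the chunk shape interacts with the reduction axis; a smaller array than the post's is used here so the snippet runs quickly::

    import numpy as np
    import blosc2

    shape = (100, 100, 100)
    chunks = (1, 100, 100)  # mimics the automatic (1, 1000, 1000) layout

    a = blosc2.asarray(np.random.rand(*shape), chunks=chunks)

    # Summing along axis 0 (X) crosses every chunk and accumulates many
    # partial sums; axes 1 (Y) and 2 (Z) stay within single chunks.
    sx = a.sum(axis=0)
    sy = a.sum(axis=1)
    sz = a.sum(axis=2)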
@@ -149,7 +149,7 @@ Effect of using different codecs in Python-Blosc2
 
 Compression and decompression consume CPU and memory resources. Differentiating between various codecs and configurations allows for evaluating how each option impacts the use of these resources, helping to choose the most efficient option for the operating environment. Finding the right balance between compression ratio and speed is crucial for optimizing performance.
 
-In the plots above, we can see how using the LZ4 codec is striking such a balance, as it achieves the best performance in general, even above a non-compressed scenario. This is because LZ4 is tuned towards speed, and the time to compress and decompress the data is very low. On the other hand, ZSTD is a codec that is optimized for compression ratio (although not shown, in this case it typically compresses between 2x and x more than LZ4), and hence it is a bit slower. However, it is still faster than the non-compressed case, as compression requires reduced memory transmission, and this compensates for the additional CPU time required for compression and decompression.
+In the plots above, we can see how using the LZ4 codec is striking such a balance, as it achieves the best performance in general, even above a non-compressed scenario. This is because LZ4 is tuned towards speed, and the time to compress and decompress the data is very low. On the other hand, ZSTD is a codec that is optimized for compression ratio (although not shown, in this case it typically compresses between 2x and 3x more than LZ4), and hence it is a bit slower. However, it is still faster than the non-compressed case, as compression requires reduced memory transmission, and this compensates for the additional CPU time required for compression and decompression.
 
 We have just scraped the surface for some of the compression parameters that can be tuned in Blosc2. You can use the `cparams` dict with the different parameters in `blosc2.compress2() <https://www.blosc.org/python-blosc2/reference/autofiles/top_level/blosc2.compress2.html#blosc2>`_ to set the compression level, `codec <https://www.blosc.org/python-blosc2/reference/autofiles/top_level/blosc2.Codec.html>`_ , `filters <https://www.blosc.org/python-blosc2/reference/autofiles/top_level/blosc2.Filter.html>`_ and other parameters.
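
As a rough sketch of trying both codecs via the `cparams` dict (illustrative, not from the commit; it assumes the `schunk.cratio` attribute is available for inspecting the achieved compression ratio)::

    import numpy as np
    import blosc2

    data = np.linspace(0, 1, 1_000_000).reshape(100, 100, 100)

    # Speed-oriented codec, as in the post's fastest configuration.
    a_lz4 = blosc2.asarray(
        data,
        cparams={"codec": blosc2.Codec.LZ4, "filters": [blosc2.Filter.SHUFFLE]},
    )

    # Ratio-oriented codec; usually smaller, a bit slower to (de)compress.
    a_zstd = blosc2.asarray(data, cparams={"codec": blosc2.Codec.ZSTD, "clevel": 5})

    # Compare the compression ratios achieved by each codec.
    print(a_lz4.schunk.cratio, a_zstd.schunk.cratio)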
