|
| 1 | +--- |
| 2 | +title: 'Mpi4py-fft' |
| 3 | +tags: |
| 4 | + - Fast Fourier transforms |
| 5 | + - Fast Chebyshev transforms |
| 6 | + - MPI |
| 7 | + - Python |
| 8 | +authors: |
| 9 | + - name: Mikael Mortensen |
| 10 | + orcid: 0000-0002-3293-7573 |
| 11 | + affiliation: "1" |
| 12 | + - name: Lisandro Dalcin |
| 13 | + orcid: 0000-0001-8086-0155 |
| 14 | + affiliation: "2" |
| 15 | +affiliations: |
| 16 | + - name: University of Oslo, Department of Mathematics |
| 17 | + index: 1 |
| 18 | + - name: King Abdullah University of Science and Technology, Extreme Computing Research Center |
| 19 | + index: 2 |
| 20 | +date: 7 November 2018 |
| 21 | +bibliography: paper.bib |
| 22 | +--- |
| 23 | + |
| 24 | +# Summary |
| 25 | + |
| 26 | +The fast Fourier transform (FFT) is an algorithm that efficiently computes the |
| 27 | +discrete Fourier transform. The FFT is one of the most important algorithms |
| 28 | +utilized throughout science and society and it has been named *the most |
| 29 | +important numerical algorithm of our time* by Prof. Gilbert Strang [@strang]. |
| 30 | + |
| 31 | +``Mpi4py-fft`` (https://bitbucket.org/mpi4py/mpi4py-fft) is an open-source |
| 32 | +Python package for computing (in parallel) FFTs of possibly very large and |
| 33 | +distributed multidimensional arrays. |
| 34 | +A multidimensional FFT is computed sequentially over all axes, one axis at a time. |
| 35 | +A problem with parallel FFTs is that, to fit in the memory of multiple processors, |
| 36 | +a multidimensional array must be distributed along some, but not all, of its axes. |
| 37 | +Consequently, parallel FFTs are computed as sequential (serial) transforms over |
| 38 | +non-distributed axes, combined with global redistributions (using MPI) that |
| 39 | +realign the arrays for further serial transforms. A parallel FFT is, in other |
| 40 | +words, computed as a combination of serial FFTs and global redistributions. |
| 41 | + |
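The decomposition described above can be illustrated on a single process with NumPy alone. The following is a minimal sketch, not the ``mpi4py-fft`` implementation: the function names `fftn_axis_by_axis` and `slab_fft_simulated` are hypothetical, the "ranks" are simulated with `np.array_split`, and the global redistribution that would be an MPI all-to-all exchange is replaced by a concatenate-and-resplit step.

```python
import numpy as np


def fftn_axis_by_axis(u):
    """Compute a multidimensional FFT one axis at a time."""
    u_hat = np.asarray(u, dtype=complex)
    for axis in range(u_hat.ndim):
        u_hat = np.fft.fft(u_hat, axis=axis)
    return u_hat


def slab_fft_simulated(u, nprocs):
    """Mimic a slab-decomposed 3D FFT on one process (no MPI involved).

    `nprocs` is the number of pretend ranks (an illustrative assumption).
    """
    # Step 1: distribute axis 0; each pretend rank owns one slab.
    slabs = np.array_split(np.asarray(u, dtype=complex), nprocs, axis=0)
    # Step 2: serial transforms over the non-distributed axes 2 and 1.
    slabs = [np.fft.fft(np.fft.fft(s, axis=2), axis=1) for s in slabs]
    # Step 3: stand-in for the global redistribution. Afterwards
    # axis 0 is local to each rank and axis 1 is distributed instead.
    full = np.concatenate(slabs, axis=0)
    slabs = np.array_split(full, nprocs, axis=1)
    # Step 4: serial transform over the now-local axis 0.
    slabs = [np.fft.fft(s, axis=0) for s in slabs]
    # Gather the pieces only so the result can be checked.
    return np.concatenate(slabs, axis=1)


rng = np.random.default_rng(0)
u = rng.standard_normal((8, 6, 4))
reference = np.fft.fftn(u)

axis_ok = np.allclose(fftn_axis_by_axis(u), reference)
slab_ok = np.allclose(slab_fft_simulated(u, nprocs=3), reference)
```

Both checks compare against `np.fft.fftn`, so the sketch only demonstrates that serial transforms interleaved with a redistribution reproduce the full transform. It says nothing about performance or about how the real exchange is carried out.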
| 42 | +For global redistribution ``mpi4py-fft`` makes use of a new and completely |
| 43 | +generic algorithm [@dalcin18] that allows for any index sets of a |
| 44 | +multidimensional array to be distributed. We can distribute just one index set |
| 45 | +(a slab decomposition), two index sets (a pencil decomposition) or even more for |
| 46 | +higher-dimensional arrays. The required MPI communications are always handled |
| 47 | +under the hood by MPI for Python. For serial transforms |
| 48 | +``mpi4py-fft`` wraps most of the FFTW library using Cython, making it callable |
| 49 | +from Python. We include wrappers for complex-to-complex, real-to-complex, |
| 50 | +complex-to-real and real-to-real transforms. |
| 51 | + |
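To make the slab and pencil terminology concrete, the sketch below computes the local block shape that one process would hold for a given process grid. It is an illustration under stated assumptions, not the algorithm of [@dalcin18] and not the rule ``mpi4py-fft`` uses internally: the helpers `block_sizes` and `local_shape` are hypothetical, and the "nearly equal contiguous blocks" policy is simply one common choice.

```python
def block_sizes(n, p):
    """Split n indices into p nearly equal contiguous blocks."""
    base, extra = divmod(n, p)
    # The first `extra` blocks receive one additional index.
    return [base + 1 if r < extra else base for r in range(p)]


def local_shape(global_shape, grid, coords):
    """Local array shape for the process at `coords` in the process grid.

    grid[i] is the number of processes along axis i; a value of 1 means
    that axis is not distributed. Both arguments are illustrative.
    """
    return tuple(
        block_sizes(n, p)[c] for n, p, c in zip(global_shape, grid, coords)
    )


shape = (16, 12, 10)

# Slab decomposition: one index set (axis 0) shared by 4 processes.
slab = local_shape(shape, grid=(4, 1, 1), coords=(1, 0, 0))

# Pencil decomposition: two index sets (axes 0 and 1) on a 2 x 3 grid.
pencil = local_shape(shape, grid=(2, 3, 1), coords=(1, 2, 0))
```

In the slab case every process keeps two full axes available for serial transforms, while in the pencil case only one axis is local at a time, which permits more processes at the cost of additional redistributions.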
| 52 | +``Mpi4py-fft`` is highly configurable in how it distributes and redistributes |
| 53 | +arrays. Large arrays may be globally redistributed for alignment |
| 54 | +along any given axis, whenever needed by the user. This |
| 55 | +flexibility has enabled the development of ``shenfun`` |
| 56 | +[@mortensen_joss; @mortensen17], which is a computing platform |
| 57 | +for solving partial differential equations (PDEs) by the spectral Galerkin method. |
| 58 | +In ``shenfun`` it is possible to solve PDEs of any given dimensionality, by creating |
| 59 | +tensor product bases as outer products of one-dimensional bases. This leads to |
| 60 | +large multidimensional arrays that are distributed effortlessly using ``mpi4py-fft``. |
| 61 | + |
| 62 | +``Mpi4py-fft`` can be utilized by anyone who needs to perform FFTs on large |
| 63 | +multidimensional arrays. It is installable from PyPI and conda-forge, and |
| 64 | +released under a permissive 2-clause BSD license, in the hope that it will be |
| 65 | +useful. |
| 66 | + |
| 67 | +# Acknowledgements |
| 68 | + |
| 69 | +M. Mortensen acknowledges support from the 4DSpace Strategic Research Initiative at the |
| 70 | +University of Oslo. |
| 71 | + |
| 72 | +# References |