diff --git a/README.md b/README.md
index 40d5b90..432d49a 100644
--- a/README.md
+++ b/README.md
@@ -71,7 +71,7 @@ The distributed-ranges library provides data-structures, algorithms and views de
 Algorithms and data structures are designed to take the user off the need to worry about the technical details of their parallelism. An example would be the definition of a distributed vector in memory of multiple nodes connected using MPI.
 ```cpp
-dr::mhp::distributed_vector dv(N);
+dr::mp::distributed_vector dv(N);
 ```
 Such a vector, containing N elements, is automatically distributed among all the nodes involved in the calculation, with individual nodes storing an equal (if possible) amount of data.
@@ -82,12 +82,12 @@ In this way, many of the technical details related to the parallel execution of
 ### Namespaces
 General namespace used in the library is `dr::`
-For program using a single node with shared memory available for multiple CPUs and one or more GPUs, data structures and algorithms from `dr::shp::` namespace are provided.
-For distributed memory model, use the `dr::mhp::` namespace.
+For a program using a single node with shared memory available to multiple CPUs and one or more GPUs, data structures and algorithms from the `dr::sp::` namespace are provided.
+For the distributed memory model, use the `dr::mp::` namespace.
 ### Data structures
-Content of distributes-ranges' data structures is distributed over available nodes. For example, segments of `dr::mhp::distributed_vector` are located in memory of different nodes (mpi processes). Still, global view of the `distributed_vector` is uniform, with contiguous indices.
+The content of distributed-ranges' data structures is distributed over the available nodes. For example, segments of `dr::mp::distributed_vector` are located in the memory of different nodes (MPI processes). Still, the global view of the `distributed_vector` is uniform, with contiguous indices.
 #### Halo concept
@@ -98,7 +98,7 @@ To support this situation, the concept of halo was introduced. A halo is an area
 ### Algorithms
-Following algorithms are included in distributed-ranges, both in mhp and shp versions:
+The following algorithms are included in distributed-ranges, in both mp and sp versions:
 ```cpp
 copy()
@@ -151,16 +151,28 @@ The example shows the distributed nature of dr data structures. The distributed_
 [./src/example4.cpp](src/example4.cpp)
-This example illustrates adding two distributed,multidimensional arrays. Each array has two dimensions and is initialized by an `std::array`. The arrays are populated with sequential values using a distributed version of iota called `mhp::iota`. A for_each loop is the main part of the code, computing the sum on a specified number of nodes. It takes a lambda copy function along with two input arrays (a and b) and an output array (c) as parameters. The result is printed on a node 0.
+This example illustrates adding two distributed, multidimensional arrays. Each array has two dimensions, and its shape is given by an `std::array` of extents. The arrays are populated with sequential values using a distributed version of iota called `mp::iota`. A for_each loop is the main part of the code, computing the sum on a specified number of nodes. It takes a lambda function along with the two input arrays (a and b) and the output array (c) as parameters. The result is printed on node 0.
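+
+A minimal sketch of this pattern (the extents, the `float` element type, and the variable names are illustrative, not the exact code of the example):
+
+```cpp
+std::array extents{8ul, 8ul};
+dr::mp::distributed_mdarray<float, 2> a(extents), b(extents), c(extents);
+dr::mp::iota(a, 0);   // a = 0, 1, 2, ...
+dr::mp::iota(b, 100); // b = 100, 101, 102, ...
+// zip the three arrays and add element-wise, writing into c
+dr::mp::for_each(dr::mp::views::zip(a, b, c), [](auto v) {
+  auto &[x, y, z] = v;
+  z = x + y;
+});
+```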
 ### Example 5
 [./src/example5.cpp](src/example5.cpp)
-Example 5 outlines a method for calculating a 2D 5-point stencil with distributed multidimensional arrays, specifically utilizing `dr::mhp::distributed_mdarray`. Initially, it involves setting up key parameters like the radius for element exchange between nodes through `dr::mhp::halo`, and defining the start and end points of the array slice. The example's core is the `mhp::stencil_for_each` function, which applies a lambda function to two subsets of the array, designated as input and output. The `mdspan_stencil_op` lambda function conducts a simple calculation that involves adding together the values of an element and its adjacent elements and subsequently calculating their average. The `mhp::halo().exchange()` enables values to be shared across distinct nodes, making this process feasible. Ultimately, the outcomes of the calculation are neatly displayed on node 0 using mdspan(), resulting in a clear indication of the modifications made to the 2D array. This example is a practical demonstration of executing stencil operations on distributed arrays.
+Example 5 outlines a method for calculating a 2D 5-point stencil on distributed multidimensional arrays, specifically utilizing `dr::mp::distributed_mdarray`. Initially, it sets up key parameters, such as the radius for element exchange between nodes through `dr::mp::halo`, and the start and end points of the array slice. The example's core is the `mp::stencil_for_each` function, which applies a lambda function to two subsets of the array, designated as input and output. The `mdspan_stencil_op` lambda performs a simple calculation: it adds an element's value to the values of its adjacent elements and takes their average. The `mp::halo().exchange()` call shares boundary values between neighbouring nodes, which makes this process feasible. Finally, the outcome of the calculation is displayed on node 0 using mdspan(), giving a clear picture of the modifications made to the 2D array. This example is a practical demonstration of executing stencil operations on distributed arrays.
 ### Example 6
 [./src/example6.cpp](src/example6.cpp)
-This example's code demonstrates a 2D pattern search in a distributed, multidimensional array (`mhp::distributed_mdarray`). It initializes a 2D array, populates it with `mhp::iota`, converts it to binary values using `mhp::transform` and defines a pattern of 2x2. A lambda function is used to scan the array and mark occurrences of the pattern in a separate array. The process is similar to the one demonstrated in example5.
+This example demonstrates a 2D pattern search in a distributed, multidimensional array (`mp::distributed_mdarray`). It initializes a 2D array, populates it with `mp::iota`, converts it to binary values using `mp::transform`, and defines a 2x2 pattern. A lambda function is then used to scan the array and mark occurrences of the pattern in a separate array. The process is similar to the one demonstrated in example 5.
+
+### Example 7
+
+[./src/example7.cpp](src/example7.cpp)
+
+This example showcases the usage of `mp::distributed_sparse_matrix`. It reads data from the `resources/example.mtx` file on the root node and distributes it among all nodes. The root node initializes the input vector and broadcasts it to every other node. After that, the `mp::gemv` operation is performed and the result is gathered into an `std::vector` on the root. Finally, the root prints the input vector and the result.
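+
+Condensed, the core calls look as follows (taken from the full listing of `src/example7.cpp` added below; `b` is the dense input vector owned by the root):
+
+```cpp
+dr::mp::broadcasted_vector<double> broadcasted_b;
+// the root passes the actual data; all other ranks pass an empty view
+broadcasted_b.broadcast_data(matrix.shape().second, 0, b, dr::mp::default_comm());
+std::vector<double> res(matrix.shape().first);
+gemv(root, res, matrix, broadcasted_b); // the result is gathered into res on the root
+```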
+
+### Example 8
+
+[./src/example8.cpp](src/example8.cpp)
+
+Example 8 is exactly the same as example 7; the only difference is the initialization of the matrix data. Here the matrix is generated in code, has a different shape, and uses random values. Additionally, the matrix data is printed together with the vector and the result.
diff --git a/resources/example.mtx b/resources/example.mtx
new file mode 100644
index 0000000..428295f
--- /dev/null
+++ b/resources/example.mtx
@@ -0,0 +1,51 @@
+%%MatrixMarket matrix coordinate real general
+25 25 49
+1 1 1.000e+00
+2 2 2.000e+00
+3 3 3.000e+00
+4 4 4.000e+00
+5 5 5.000e+00
+6 6 6.000e+00
+7 7 7.000e+00
+8 8 8.000e+00
+9 9 9.000e+00
+10 10 1.000e+01
+11 11 2.000e+01
+12 12 3.000e+01
+13 13 4.000e+01
+14 14 5.000e+01
+15 15 6.000e+01
+16 16 7.000e+01
+17 17 8.000e+01
+18 18 8.000e+01
+19 19 9.000e+01
+20 20 1.000e+02
+21 21 2.000e+02
+22 22 2.000e+02
+23 23 3.000e+02
+24 24 4.000e+02
+25 25 5.000e+02
+1 2 1.000e+00
+2 3 2.000e+00
+3 4 3.000e+00
+4 5 4.000e+00
+5 6 5.000e+00
+6 7 6.000e+00
+7 8 7.000e+00
+8 9 8.000e+00
+9 10 9.000e+00
+10 11 1.000e+01
+11 12 2.000e+01
+12 13 3.000e+01
+13 14 4.000e+01
+14 15 5.000e+01
+15 16 6.000e+01
+16 17 7.000e+01
+17 18 8.000e+01
+18 19 9.000e+01
+19 20 1.000e+01
+20 21 2.000e+01
+21 22 3.000e+01
+22 23 4.000e+01
+23 24 5.000e+01
+24 25 6.000e+01
diff --git a/scripts/build_run.sh b/scripts/build_run.sh
index f5c0099..ae8895e 100755
--- a/scripts/build_run.sh
+++ b/scripts/build_run.sh
@@ -16,3 +16,5 @@ mpirun -n 3 ./build/src/example3
 mpirun -n 3 ./build/src/example4
 mpirun -n 3 ./build/src/example5
 mpirun -n 3 ./build/src/example6
+mpirun -n 3 ./build/src/example7
+mpirun -n 3 ./build/src/example8
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 11b3150..fcda0fa 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -41,3 +41,13 @@ add_executable(example6 example6.cpp)
 
 target_compile_definitions(example6 INTERFACE DR_FORMAT)
 target_link_libraries(example6 DR::mpi fmt::fmt)
+
+add_executable(example7 example7.cpp)
+
+target_compile_definitions(example7 INTERFACE DR_FORMAT)
+target_link_libraries(example7 DR::mpi fmt::fmt)
+
+add_executable(example8 example8.cpp)
+
+target_compile_definitions(example8 INTERFACE DR_FORMAT)
+target_link_libraries(example8 DR::mpi fmt::fmt)
diff --git a/src/example7.cpp b/src/example7.cpp
new file mode 100644
index 0000000..257fa75
--- /dev/null
+++ b/src/example7.cpp
@@ -0,0 +1,68 @@
+// SPDX-FileCopyrightText: Intel Corporation
+//
+// SPDX-License-Identifier: BSD-3-Clause
+
+#include <dr/mp.hpp>
+#include <fmt/core.h>
+#include <ranges>
+
+/* Sparse band matrix vector multiplication */
+int main() {
+  dr::mp::init(sycl::default_selector_v);
+  using I = long;
+  using V = double;
+
+  dr::views::csr_matrix_view<V, I> local_data;
+  auto root = 0;
+  if (root == dr::mp::rank()) {
+    // x x 0 0 ... 0
+    // 0 x x 0 ... 0
+    // .............
+    // 0 ... 0 0 x x
+    auto source = "./resources/example.mtx";
+    local_data = dr::read_csr<V, I>(source);
+  }
+
+  dr::mp::distributed_sparse_matrix<
+      V, I, dr::mp::MpiBackend,
+      dr::mp::csr_eq_distribution<V, I, dr::mp::MpiBackend>>
+      matrix(local_data, root);
+
+  dr::mp::broadcasted_vector<V> broadcasted_b;
+  std::vector<V> b;
+  if (root == dr::mp::rank()) {
+    b.resize(matrix.shape().second);
+    std::iota(b.begin(), b.end(), 1);
+
+    broadcasted_b.broadcast_data(matrix.shape().second, 0, b,
+                                 dr::mp::default_comm());
+  } else {
+    broadcasted_b.broadcast_data(matrix.shape().second, 0,
+                                 std::ranges::empty_view<V>(),
+                                 dr::mp::default_comm());
+  }
+  std::vector<V> res(matrix.shape().first);
+  gemv(root, res, matrix, broadcasted_b);
+
+  if (root == dr::mp::rank()) {
+    fmt::print("Matrix imported from {}\n", "./resources/example.mtx");
+    fmt::print("Input: ");
+    for (auto x : b) {
+      fmt::print("{} ", x);
+    }
+    fmt::print("\n");
+    fmt::print("Matrix vector multiplication res: ");
+    for (auto x : res) {
+      fmt::print("{} ", x);
+    }
+    fmt::print("\n");
+  }
+
+  if (root == dr::mp::default_comm().rank()) {
+    dr::__detail::destroy_csr_matrix_view(local_data, std::allocator<V>{});
+  }
+
+  dr::mp::finalize();
+
+  return 0;
+}
diff --git a/src/example8.cpp b/src/example8.cpp
new file mode 100644
index 0000000..fb4ef14
--- /dev/null
+++ b/src/example8.cpp
@@ -0,0 +1,93 @@
+// SPDX-FileCopyrightText: Intel Corporation
+//
+// SPDX-License-Identifier: BSD-3-Clause
+
+#include <dr/mp.hpp>
+#include <fmt/core.h>
+#include <random>
+#include <ranges>
+
+/* Sparse band matrix vector multiplication */
+int main() {
+  dr::mp::init(sycl::default_selector_v);
+  using I = long;
+  using V = double;
+  dr::views::csr_matrix_view<V, I> local_data;
+  auto root = 0;
+  if (root == dr::mp::rank()) {
+    auto size = 10;
+    auto nnz = 20;
+    auto colInd = new I[nnz];
+    auto rowInd = new I[size + 1];
+    auto values = new V[nnz];
+    std::uniform_real_distribution<V> unif(0, 1);
+    std::default_random_engine re;
+    // x x 0 0 ... 0
+    // x x 0 0 ... 0
+    // x 0 x 0 ... 0
+    // x 0 0 x ... 0
+    // .............
+    // x ... 0 0 0 x
+    for (auto i = 0; i <= size; i++) {
+      rowInd[i] = i * 2; // two elements per row
+    }
+    for (auto i = 0; i < nnz; i++) {
+      colInd[i] =
+          (i % 2) * (std::max(i / 2, 1)); // column on 0 and diagonal (with
+                                          // additional entry in first row)
+      values[i] = unif(re);
+    }
+
+    local_data = dr::views::csr_matrix_view<V, I>(values, rowInd, colInd,
+                                                  {size, size}, nnz, root);
+  }
+
+  dr::mp::distributed_sparse_matrix<
+      V, I, dr::mp::MpiBackend,
+      dr::mp::csr_eq_distribution<V, I, dr::mp::MpiBackend>>
+      matrix(local_data, root);
+
+  dr::mp::broadcasted_vector<V> broadcasted_b;
+  std::vector<V> b;
+  if (root == dr::mp::rank()) {
+    b.resize(matrix.shape().second);
+    std::iota(b.begin(), b.end(), 1);
+
+    broadcasted_b.broadcast_data(matrix.shape().second, 0, b,
+                                 dr::mp::default_comm());
+  } else {
+    broadcasted_b.broadcast_data(matrix.shape().second, 0,
+                                 std::ranges::empty_view<V>(),
+                                 dr::mp::default_comm());
+  }
+
+  std::vector<V> res(matrix.shape().first);
+  gemv(root, res, matrix, broadcasted_b);
+
+  if (root == dr::mp::rank()) {
+    fmt::print("Matrix with {} x {} and number of non-zero entries equal to {} "
+               "and entries:\n",
+               matrix.shape().first, matrix.shape().second, matrix.size());
+    for (auto [i, v] : matrix) {
+      auto [n, m] = i;
+      fmt::print("Matrix entry <{}, {}, {}>\n", n, m, v);
+    }
+    fmt::print("Input: ");
+    for (auto x : b) {
+      fmt::print("{} ", x);
+    }
+    fmt::print("\n");
+    fmt::print("Matrix vector multiplication res: ");
+    for (auto x : res) {
+      fmt::print("{} ", x);
+    }
+    fmt::print("\n");
+  }
+
+  if (root == dr::mp::default_comm().rank()) {
+    dr::__detail::destroy_csr_matrix_view(local_data, std::allocator<V>{});
+  }
+  dr::mp::finalize();
+
+  return 0;
+}