Re-implement Tiles as OneToMany associator #58

Closed
wants to merge 84 commits into from
84 commits
ee8de58 Add overload of `resizeTiles` (sbaldu, Apr 3, 2024)
6a835ce Update type (sbaldu, Apr 3, 2024)
3df2a6e Setup temporary Tiles using `host_buffer` (sbaldu, Apr 3, 2024)
4b52ac8 Formatting (sbaldu, Apr 3, 2024)
37b6898 Set TilesAlpaka constructor as default (sbaldu, Apr 3, 2024)
af27522 Delete old overload of `resizeTiles` (sbaldu, Apr 3, 2024)
933ced1 Update to version `2.2.1` (sbaldu, Apr 4, 2024)
4155f8d Fix parameters in `__main__` (sbaldu, Apr 4, 2024)
bc3b8e4 Change parameters in test (sbaldu, Apr 5, 2024)
4c74162 Rework members of TilesAlpaka as private (sbaldu, Apr 5, 2024)
8430d74 Inline new methods (sbaldu, Apr 5, 2024)
75c7bcb Fix typo (sbaldu, Apr 8, 2024)
c66766b Add resize of tiles outer VecArray (sbaldu, Apr 8, 2024)
0ee66e8 Change default values for tile number and depths (sbaldu, Apr 8, 2024)
61f4023 Formatting (sbaldu, Apr 8, 2024)
34b4642 Use `pointsPerTile` when calculating tile number (sbaldu, Apr 9, 2024)
b44da4e Set default `ppbin` to 128 (sbaldu, Apr 9, 2024)
f28e76e Add non-const `data()` to VecArray (sbaldu, Apr 22, 2024)
c07a4b5 Update `TilesAlpaka` (sbaldu, Apr 22, 2024)
312b63f Add `KernelResetTiles` (sbaldu, Apr 22, 2024)
09ccdf9 Setup Tiles directly on device (sbaldu, Apr 22, 2024)
720ab69 Add overload of `resizeTiles` (sbaldu, Apr 3, 2024)
f6b8176 Update type (sbaldu, Apr 3, 2024)
d9f177d Setup temporary Tiles using `host_buffer` (sbaldu, Apr 3, 2024)
6ba74d5 Formatting (sbaldu, Apr 3, 2024)
1b97754 Set TilesAlpaka constructor as default (sbaldu, Apr 3, 2024)
fd35d2f Delete old overload of `resizeTiles` (sbaldu, Apr 3, 2024)
0edc4a3 Update to version `2.2.1` (sbaldu, Apr 4, 2024)
0cfd065 Fix parameters in `__main__` (sbaldu, Apr 4, 2024)
490827e Change parameters in test (sbaldu, Apr 5, 2024)
fc756d7 Rework members of TilesAlpaka as private (sbaldu, Apr 5, 2024)
40d2176 Inline new methods (sbaldu, Apr 5, 2024)
0d8e470 Fix typo (sbaldu, Apr 8, 2024)
b86f994 Add resize of tiles outer VecArray (sbaldu, Apr 8, 2024)
317fa23 Change default values for tile number and depths (sbaldu, Apr 8, 2024)
253becf Formatting (sbaldu, Apr 8, 2024)
07efc80 Use `pointsPerTile` when calculating tile number (sbaldu, Apr 9, 2024)
f7fc294 Set default `ppbin` to 128 (sbaldu, Apr 9, 2024)
b66e73f Merge branch 'fix_alpakatiles_setup_ondevice' into fix_alpakatiles_setup (sbaldu, Apr 22, 2024)
d411d92 Fix typo (sbaldu, Apr 22, 2024)
74bfd55 Set tile size on device manually (sbaldu, Apr 29, 2024)
be94528 Define `CoordinateExtremes` class (sbaldu, May 2, 2024)
9152714 Update `min_max` and `tile_size` (sbaldu, May 2, 2024)
2b7a3cf Update setup of tiles (sbaldu, May 2, 2024)
e2d9e72 Make `min_max` and `tile_size` private (sbaldu, May 6, 2024)
0ec2c0a Remove comments (sbaldu, May 7, 2024)
d606cf5 Add headers for `OneToManyAssoc` (sbaldu, Jun 25, 2024)
1afbb2a Update include guards (sbaldu, Jun 25, 2024)
e8ed3e2 Formatting (sbaldu, Jun 25, 2024)
fccc40f Formatting (sbaldu, Jun 26, 2024)
06b58fb Merge branch 'fix_alpakatiles_setup' into rework_tiles_onetomanyassoc (sbaldu, Jun 28, 2024)
3a4199d Merge branch 'feature_onetomanyassoc' into rework_tiles_onetomanyassoc (sbaldu, Jun 28, 2024)
e00767e Define `span` class (sbaldu, Jun 28, 2024)
9b31d76 Start tiles rework with associator (sbaldu, Jun 30, 2024)
be51a92 Define `span` header (sbaldu, Jul 22, 2024)
abed615 Import utilities from cms alpakatools (sbaldu, Jul 22, 2024)
081b564 Include folder (sbaldu, Jul 22, 2024)
014d9f7 Remove random access associator (sbaldu, Jul 22, 2024)
63f335a Remove alpaka asserts (sbaldu, Jul 22, 2024)
6f84182 Update tiles methods (sbaldu, Jul 22, 2024)
72e14e1 Formatting (sbaldu, Jul 22, 2024)
db15d4c Cleaning (sbaldu, Jul 22, 2024)
c6a8762 Rewrite kernels for filling associator (sbaldu, Jul 22, 2024)
7a79aa0 Add iterators for span (sbaldu, Aug 1, 2024)
7739436 Fix offset kernel (sbaldu, Aug 1, 2024)
023c21e Fix assoc fill kernel (sbaldu, Aug 1, 2024)
f02b9e8 Fix calculation of offsets (sbaldu, Aug 1, 2024)
f46c3c2 Update tiles methods and clean (sbaldu, Aug 1, 2024)
26b6117 Simplify Assoc and move to data formats (sbaldu, Aug 2, 2024)
a05661c Fix include (sbaldu, Aug 2, 2024)
9443d88 Remove unneeded headers (sbaldu, Aug 2, 2024)
7142151 Fix accumulation with memcpy (sbaldu, Aug 2, 2024)
b199558 Update version (sbaldu, Aug 2, 2024)
09f179b Merge branch 'main' into rework_tiles_onetomanyassoc (sbaldu, Aug 2, 2024)
644430f Oops (sbaldu, Aug 2, 2024)
425dc38 Fix warnings for nvcc compilation (sbaldu, Aug 3, 2024)
0a270cd Merge branch 'main' into rework_tiles_onetomanyassoc (sbaldu, Oct 5, 2024)
293febe Formatting (sbaldu, Oct 5, 2024)
d01ddff Update clang-format workflow (sbaldu, Oct 5, 2024)
5ccd590 Merge branch 'main' into rework_tiles_onetomanyassoc (sbaldu, Oct 29, 2024)
fc267b9 Remove unneeded file (sbaldu, Oct 29, 2024)
6f3f980 Add const (sbaldu, Oct 30, 2024)
c30b66a Fix work div (sbaldu, Oct 30, 2024)
17686b6 Merge branch 'main' into rework_tiles_onetomanyassoc (sbaldu, Nov 2, 2024)
3 changes: 1 addition & 2 deletions .github/workflows/clang_format.yml
@@ -19,6 +19,5 @@ jobs:
- name: Run clang-format style check
uses: jidicula/[email protected]
with:
clang-format-version: '17'
clang-format-version: '18'
check-path: ${{ matrix.path }}
exclude-regex: 'CLUEstering/include/test/doctest.h'
72 changes: 72 additions & 0 deletions CLUEstering/alpaka/AlpakaCore/AtomicPairCounter.h
@@ -0,0 +1,72 @@
#ifndef AtomicPairCounter_h
#define AtomicPairCounter_h

#include <cstdint>

#include <alpaka/alpaka.hpp>

namespace cms::alpakatools {

class AtomicPairCounter {
public:
using DoubleWord = uint64_t;

ALPAKA_FN_HOST_ACC constexpr AtomicPairCounter() : counter_{0} {}
ALPAKA_FN_HOST_ACC constexpr AtomicPairCounter(uint32_t first, uint32_t second)
: counter_{pack(first, second)} {}
ALPAKA_FN_HOST_ACC constexpr AtomicPairCounter(DoubleWord values)
: counter_{values} {}

ALPAKA_FN_HOST_ACC constexpr AtomicPairCounter& operator=(DoubleWord values) {
counter_.as_doubleword = values;
return *this;
}

struct Counters {
uint32_t first; // in a "One to Many" association is the number of "One"
uint32_t
second; // in a "One to Many" association is the total number of associations
};

ALPAKA_FN_HOST_ACC constexpr Counters get() const { return counter_.as_counters; }

// atomically add as_counters, and return the previous value
template <typename TAcc>
ALPAKA_FN_ACC ALPAKA_FN_INLINE constexpr Counters add(const TAcc& acc, Counters c) {
Packer value{pack(c.first, c.second)};
Packer ret{0};
ret.as_doubleword = alpaka::atomicAdd(
acc, &counter_.as_doubleword, value.as_doubleword, alpaka::hierarchy::Blocks{});
return ret.as_counters;
}

// atomically increment first and add i to second, and return the previous value
template <typename TAcc>
ALPAKA_FN_ACC ALPAKA_FN_INLINE Counters constexpr inc_add(const TAcc& acc,
uint32_t i) {
return add(acc, {1u, i});
}

private:
union Packer {
DoubleWord as_doubleword;
Counters as_counters;
constexpr Packer(DoubleWord _as_doubleword) : as_doubleword(_as_doubleword) { ; };
constexpr Packer(Counters _as_counters) : as_counters(_as_counters) { ; };
};

// pack two uint32_t values in a DoubleWord (aka uint64_t)
// this is needed because in c++17 a union can only be aggregate-initialised to its first type
// it can probably be removed with c++20 and replaced with a designated initialiser
static constexpr DoubleWord pack(uint32_t first, uint32_t second) {
Packer ret{0};
ret.as_counters = {first, second};
return ret.as_doubleword;
}

Packer counter_;
};

} // namespace cms::alpakatools

#endif // AtomicPairCounter_h
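The counter packs two 32-bit fields into one 64-bit word, so a single atomic add on the word updates both fields at once. The host-only sketch below shows the same idea with `std::atomic<std::uint64_t>` in place of `alpaka::atomicAdd`. `PairCounts`, `pack`, `unpack` and `HostPairCounter` are hypothetical names, not part of this PR, and `std::memcpy` replaces the union so the host code never reads an inactive union member.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Two 32-bit counters sharing one 64-bit word (hypothetical host analogue
// of AtomicPairCounter::Counters).
struct PairCounts {
  std::uint32_t first;   // in a one-to-many association: number of "one" entries
  std::uint32_t second;  // total number of associations
};
static_assert(sizeof(PairCounts) == sizeof(std::uint64_t), "must fit one word");

// Copy the two fields into a 64-bit word with the same memory layout.
inline std::uint64_t pack(std::uint32_t first, std::uint32_t second) {
  const PairCounts c{first, second};
  std::uint64_t word;
  std::memcpy(&word, &c, sizeof word);
  return word;
}

// Inverse of pack: reinterpret the word as the two fields.
inline PairCounts unpack(std::uint64_t word) {
  PairCounts c;
  std::memcpy(&c, &word, sizeof c);
  return c;
}

class HostPairCounter {
 public:
  // Atomically increment `first`, add `i` to `second`, and return the
  // previous values: one fetch_add updates both halves.
  PairCounts inc_add(std::uint32_t i) {
    return unpack(counter_.fetch_add(pack(1u, i)));
  }
  PairCounts get() const { return unpack(counter_.load()); }

 private:
  std::atomic<std::uint64_t> counter_{0};
};
```

Adding `pack(1, i)` to the word increments `first` and adds `i` to `second` only while neither half wraps past 2^32 - 1. A carry out of the lower half would spill into the other counter, and the same arithmetic limit applies to the alpaka version.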
40 changes: 40 additions & 0 deletions CLUEstering/alpaka/AlpakaCore/alpakaWorkDiv.h
@@ -30,6 +30,33 @@ namespace cms::alpakatools {
return (value + divisor - 1) / divisor;
}

// Trait describing whether or not the accelerator expects the threads-per-block and elements-per-thread to be swapped
template <typename TAcc, typename = std::enable_if_t<alpaka::isAccelerator<TAcc>>>
struct requires_single_thread_per_block : public std::true_type {};

#ifdef ALPAKA_ACC_GPU_CUDA_ENABLED
template <typename TDim>
struct requires_single_thread_per_block<alpaka::AccGpuCudaRt<TDim, Idx>>
: public std::false_type {};
#endif // ALPAKA_ACC_GPU_CUDA_ENABLED

#ifdef ALPAKA_ACC_GPU_HIP_ENABLED
template <typename TDim>
struct requires_single_thread_per_block<alpaka::AccGpuHipRt<TDim, Idx>>
: public std::false_type {};
#endif // ALPAKA_ACC_GPU_HIP_ENABLED

#ifdef ALPAKA_ACC_CPU_B_SEQ_T_THREADS_ENABLED
template <typename TDim>
struct requires_single_thread_per_block<alpaka::AccCpuThreads<TDim, Idx>>
: public std::false_type {};
#endif // ALPAKA_ACC_CPU_B_SEQ_T_THREADS_ENABLED

// Whether or not the accelerator expects the threads-per-block and elements-per-thread to be swapped
template <typename TAcc, typename = std::enable_if_t<alpaka::isAccelerator<TAcc>>>
inline constexpr bool requires_single_thread_per_block_v =
requires_single_thread_per_block<TAcc>::value;

/*
* Creates the accelerator-dependent workdiv for 1-dimensional operations.
*/
@@ -373,6 +400,19 @@ namespace cms::alpakatools {
Vec<alpaka::Dim<TAcc>>::zeros();
}

/* once_per_block
*
* `once_per_block(acc)` returns true for a single thread within the block.
*
* Usually the condition is true for thread 0, but this index should not be relied upon.
*/

template <typename TAcc, typename = std::enable_if_t<alpaka::isAccelerator<TAcc>>>
ALPAKA_FN_ACC inline constexpr bool once_per_block(TAcc const& acc) {
return alpaka::getIdx<alpaka::Block, alpaka::Threads>(acc) ==
Vec<alpaka::Dim<TAcc>>::zeros();
}

/*
* Overload for elementIdxShift = 0
*/
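The `requires_single_thread_per_block` trait is a primary template that defaults to `true` and is specialised to `false` for the backends that run many threads per block. Below is a self-contained sketch of the same pattern. `FakeGpuAcc` and `FakeSerialAcc` are hypothetical stand-ins for the alpaka accelerator types. `threads_per_block` and `elements_per_thread` are also hypothetical: they swap the two quantities as the comment in the diff describes. The `enable_if` guard on `alpaka::isAccelerator` is omitted.

```cpp
#include <cassert>
#include <type_traits>

using Idx = unsigned int;

// Hypothetical stand-ins for alpaka accelerator types (TDim, TIdx).
template <typename TDim, typename TIdx>
struct FakeGpuAcc {};
template <typename TDim, typename TIdx>
struct FakeSerialAcc {};
struct Dim1 {};

// Primary template: assume one thread per block unless told otherwise.
template <typename TAcc>
struct requires_single_thread_per_block : std::true_type {};

// Partial specialisation for a backend that runs many threads per block.
template <typename TDim>
struct requires_single_thread_per_block<FakeGpuAcc<TDim, Idx>> : std::false_type {};

template <typename TAcc>
inline constexpr bool requires_single_thread_per_block_v =
    requires_single_thread_per_block<TAcc>::value;

// Map a requested block size onto threads per block and elements per thread,
// swapping the two on single-thread-per-block backends.
template <typename TAcc>
constexpr Idx threads_per_block(Idx block_size) {
  return requires_single_thread_per_block_v<TAcc> ? 1u : block_size;
}
template <typename TAcc>
constexpr Idx elements_per_thread(Idx block_size) {
  return requires_single_thread_per_block_v<TAcc> ? block_size : 1u;
}

static_assert(!requires_single_thread_per_block_v<FakeGpuAcc<Dim1, Idx>>, "GPU-like");
static_assert(requires_single_thread_per_block_v<FakeSerialAcc<Dim1, Idx>>, "serial");
```

Defaulting to `true` means a newly added or unknown accelerator falls back to the conservative one-thread-per-block layout unless it is explicitly specialised.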
56 changes: 48 additions & 8 deletions CLUEstering/alpaka/CLUE/CLUEAlgoAlpaka.h
@@ -124,8 +124,9 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE {
Queue queue_,
std::size_t block_size) {
// calculate the number of tiles and their size
const auto nTiles{std::ceil(h_points.n / static_cast<float>(pointsPerTile_))};
auto nTiles{std::ceil(h_points.n / static_cast<float>(pointsPerTile_))};
const auto nPerDim{std::ceil(std::pow(nTiles, 1. / Ndim))};
nTiles = std::pow(nPerDim, Ndim);

CoordinateExtremes<Ndim> min_max;
float tile_size[Ndim];
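The tile count is first estimated as ceil(n / pointsPerTile). The per-dimension count is the ceiling of its Ndim-th root, and the total is then rounded up to nPerDim^Ndim so that the tiles form a complete grid. The sketch below reproduces that arithmetic using `double` throughout, whereas the diff mixes `float` and `double`. `TileGrid` and `tile_grid` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result type: tiles per dimension and total tile count.
struct TileGrid {
  int nPerDim;
  int nTiles;
};

// Estimate ceil(n / pointsPerTile) tiles, take the ceiling of the Ndim-th
// root as the per-dimension count, then round the total up to nPerDim^Ndim.
inline TileGrid tile_grid(std::uint32_t n_points, int points_per_tile, int ndim) {
  const double estimate = std::ceil(n_points / static_cast<double>(points_per_tile));
  const double n_per_dim = std::ceil(std::pow(estimate, 1.0 / ndim));
  const double total = std::pow(n_per_dim, ndim);
  return {static_cast<int>(n_per_dim), static_cast<int>(std::lround(total))};
}
```

For 10000 points with 128 points per tile the estimate is 79 tiles. That becomes a 9 x 9 grid (81 tiles) in 2-D and a 5 x 5 x 5 grid (125 tiles) in 3-D. Because the root is computed in floating point, an estimate that is already an exact power could, depending on rounding, come out marginally above the integer root and be rounded up one step further.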
@@ -158,12 +159,55 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE {
cms::alpakatools::make_host_view(h_points.m_weight.data(), h_points.n));
alpaka::memset(queue_, *d_seeds, 0x00);

// Define the working division
auto tileIds = cms::alpakatools::make_device_buffer<uint32_t[]>(queue_, h_points.n);
// now we scan the dataset and calculate the tile of each point
// we can do it on the GPU
const Idx grid_size = cms::alpakatools::divide_up_by(h_points.n, block_size);
const auto working_div = cms::alpakatools::make_workdiv<Acc1D>(grid_size, block_size);
const auto work_div = cms::alpakatools::make_workdiv<Acc1D>(grid_size, block_size);
alpaka::enqueue(queue_,
alpaka::createTaskKernel<Acc1D>(work_div,
KernelScanDatasetTileId{},
m_tiles,
d_points.view(),
tileIds.data(),
h_points.n));
alpaka::enqueue(queue_,
alpaka::createTaskKernel<Acc1D>(work_div,
KernelCalculateOffset{},
tileIds.data(),
m_tiles->offset(),
h_points.n));
auto temp = cms::alpakatools::make_device_buffer<uint32_t[]>(queue_, h_points.n);
++nTiles;
auto test_grid_size = cms::alpakatools::divide_up_by(nTiles, 32);
auto test_work_div = cms::alpakatools::make_workdiv<Acc1D>(test_grid_size, 32);
alpaka::enqueue(queue_,
alpaka::createTaskKernel<Acc1D>(test_work_div,
KernelOffsetAccumulate{},
m_tiles->offset(),
temp.data(),
nTiles));
alpaka::memcpy(queue_,
cms::alpakatools::make_device_view(device, m_tiles->offset(), nTiles),
cms::alpakatools::make_device_view(device, temp.data(), nTiles));
alpaka::wait(queue_);
alpaka::enqueue(queue_,
alpaka::createTaskKernel<Acc1D>(
work_div, KernelZeroBuffer{}, temp.data(), h_points.n));

alpaka::enqueue(queue_,
alpaka::createTaskKernel<Acc1D>(work_div,
KernelFillAssociator{},
tileIds.data(),
m_tiles->offset(),
temp.data(),
m_tiles->content(),
h_points.n));
alpaka::memset(queue_, (*d_seeds), 0x00);

alpaka::enqueue(queue_,
alpaka::createTaskKernel<Acc1D>(
working_div, KernelResetFollowers{}, m_followers, h_points.n));
work_div, KernelResetFollowers{}, m_followers, h_points.n));
}

// Public methods
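The five kernels enqueued above build a compressed one-to-many mapping from tiles to points in three stages:

1. Count the points per tile into `offsets[id + 1]`.
2. Turn the counts into start positions with a prefix sum.
3. Place each point using a per-tile cursor (the zeroed `temp` buffer).

The sequential host sketch below follows the same stages. It assumes that tile `t` owns the slice of `content` from `offsets[t]` up to, but excluding, `offsets[t + 1]`. `OneToManyCSR` and `build_associator` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CSR-style associator: tile t owns
// content[offsets[t]] .. content[offsets[t + 1] - 1].
struct OneToManyCSR {
  std::vector<std::uint32_t> offsets;  // size nTiles + 1
  std::vector<std::uint32_t> content;  // point indices grouped by tile
};

inline OneToManyCSR build_associator(const std::vector<std::uint32_t>& tileIds,
                                     std::uint32_t nTiles) {
  OneToManyCSR a;
  // Count: offsets[0] = 0 and offsets[id + 1] counts the points in tile id
  // (KernelCalculateOffset does this with atomicAdd).
  a.offsets.assign(nTiles + 1, 0);
  for (std::uint32_t id : tileIds) {
    ++a.offsets[id + 1];
  }
  // Prefix sum: turn the counts into start positions.
  for (std::uint32_t t = 1; t <= nTiles; ++t) {
    a.offsets[t] += a.offsets[t - 1];
  }
  // Fill: a per-tile cursor gives each point its slot inside its tile's range.
  std::vector<std::uint32_t> cursor(nTiles, 0);
  a.content.resize(tileIds.size());
  for (std::uint32_t i = 0; i < tileIds.size(); ++i) {
    const std::uint32_t id = tileIds[i];
    a.content[a.offsets[id] + cursor[id]++] = i;
  }
  return a;
}
```

On the device, the order of indices within a tile depends on which thread wins each `atomicAdd` on `temp`. Only the set of indices in each tile, not their order, should therefore be relied upon.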
@@ -179,10 +223,6 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE {

const Idx grid_size = cms::alpakatools::divide_up_by(h_points.n, block_size);
auto working_div = cms::alpakatools::make_workdiv<Acc1D>(grid_size, block_size);
alpaka::enqueue(
queue_,
alpaka::createTaskKernel<Acc1D>(
working_div, KernelFillTiles{}, d_points.view(), m_tiles, h_points.n));

alpaka::enqueue(queue_,
alpaka::createTaskKernel<Acc1D>(working_div,
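`KernelScanDatasetTileId`, enqueued above, stores one tile index per point via `getGlobalBin`, whose body is not part of this diff. As an illustration only, one common way to compute such an index has three steps: bin each coordinate, clamp the bin into range, and flatten the per-dimension bins into a single number. The function below is hypothetical. Its clamping and dimension ordering are assumptions, not the behaviour of `TilesAlpaka`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical global tile index: bin each coordinate, clamp it into
// [0, n_per_dim - 1], and flatten with dimension 0 as the most significant.
// Assumes finite coordinates whose bin fits in an int.
template <std::size_t Ndim>
std::uint32_t global_bin(const std::array<float, Ndim>& coords,
                         const std::array<float, Ndim>& lower,
                         const std::array<float, Ndim>& tile_size,
                         std::uint32_t n_per_dim) {
  const int max_bin = static_cast<int>(n_per_dim) - 1;
  std::uint32_t global = 0;
  for (std::size_t d = 0; d < Ndim; ++d) {
    int bin = static_cast<int>((coords[d] - lower[d]) / tile_size[d]);
    bin = std::max(0, std::min(bin, max_bin));
    global = global * n_per_dim + static_cast<std::uint32_t>(bin);
  }
  return global;
}
```

Clamping keeps points that sit on, or just outside, the outer boundary inside the edge tiles rather than producing an out-of-range index.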
98 changes: 82 additions & 16 deletions CLUEstering/alpaka/CLUE/CLUEAlpakaKernels.h
@@ -21,6 +21,81 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE {
template <uint8_t Ndim>
using PointsView = typename PointsAlpaka<Ndim>::PointsAlpakaView;

struct KernelScanDatasetTileId {
template <typename TAcc, uint8_t Ndim>
ALPAKA_FN_ACC void operator()(const TAcc& acc,
TilesAlpaka<Ndim>* d_tiles,
PointsView<Ndim>* d_points,
uint32_t* tileIds,
uint32_t n_points) const {
cms::alpakatools::for_each_element_in_grid(acc, n_points, [&](uint32_t i) -> void {
tileIds[i] = d_tiles->getGlobalBin(acc, d_points->coords[i]);
});
}
};

struct KernelCalculateOffset {
template <typename TAcc>
ALPAKA_FN_ACC void operator()(const TAcc& acc,
uint32_t* tileIds,
uint32_t* offsets,
uint32_t n_points) const {
if (cms::alpakatools::once_per_grid(acc)) {
offsets[0] = 0;
}
cms::alpakatools::for_each_element_in_grid(acc, n_points, [&](uint32_t i) -> void {
alpaka::atomicAdd(acc, &offsets[tileIds[i] + 1], 1u);
});
}
};

struct KernelZeroBuffer {
template <typename TAcc, typename T>
ALPAKA_FN_ACC void operator()(const TAcc& acc, T* buffer, uint32_t size) const {
cms::alpakatools::for_each_element_in_grid(
acc, size, [&](uint32_t i) -> void { buffer[i] = 0; });
}
};

template <typename TAcc>
ALPAKA_FN_ACC uint32_t accumulate(const TAcc& acc, const uint32_t* buf, uint32_t size) {
uint32_t sum{0};
for (uint32_t i{}; i <= size; ++i) {
sum += buf[i];
}

return sum;
};

struct KernelOffsetAccumulate {
template <typename TAcc>
ALPAKA_FN_ACC void operator()(const TAcc& acc,
const uint32_t* offset,
uint32_t* temp,
uint32_t n_tiles) const {
cms::alpakatools::for_each_element_in_grid(
acc, n_tiles, [&](uint32_t tile) -> void {
temp[tile] = accumulate(acc, offset, tile);
});
}
};

struct KernelFillAssociator {
template <typename TAcc>
ALPAKA_FN_ACC void operator()(const TAcc& acc,
const uint32_t* tileIds,
const uint32_t* offsets,
uint32_t* temp,
uint32_t* content,
uint32_t n_points) const {
cms::alpakatools::for_each_element_in_grid(acc, n_points, [&](uint32_t i) -> void {
const auto tileId{tileIds[i]};
const auto contentPosition{alpaka::atomicAdd(acc, &temp[tileId], 1u)};
content[offsets[tileId] + contentPosition] = i;
});
}
};

struct KernelResetTiles {
template <typename TAcc, uint8_t Ndim>
ALPAKA_FN_ACC void operator()(TAcc const& acc,
@@ -31,7 +106,9 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE {
tiles->resizeTiles(nTiles, nPerDim);
}
cms::alpakatools::for_each_element_in_grid(
acc, nTiles, [&](uint32_t i) -> void { tiles->clear(i); });
acc, nTiles, [&](uint32_t i) -> void { /*tiles->clear(i);*/
;
});
}
};

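`offsets[0]` is zero and `offsets[t + 1]` holds the count of tile `t`. The inclusive sum that `accumulate` computes up to index `t` (note the `<=`) is therefore exactly the start position of tile `t`. Each thread recomputes its own sum from the beginning, so `KernelOffsetAccumulate` performs O(nTiles^2) additions in total. The host sketch below, with hypothetical helper names, checks that this per-thread result matches a single-pass `std::partial_sum`.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Inclusive sum of buf[0..size], mirroring the `accumulate` helper (note `<=`).
inline std::uint32_t accumulate_upto(const std::vector<std::uint32_t>& buf,
                                     std::uint32_t size) {
  std::uint32_t sum = 0;
  for (std::uint32_t i = 0; i <= size; ++i) {
    sum += buf[i];
  }
  return sum;
}

// What KernelOffsetAccumulate produces when each tile's "thread" sums
// independently: O(n^2) additions overall.
inline std::vector<std::uint32_t> per_thread_scan(const std::vector<std::uint32_t>& offsets) {
  std::vector<std::uint32_t> out(offsets.size());
  for (std::uint32_t t = 0; t < offsets.size(); ++t) {
    out[t] = accumulate_upto(offsets, t);
  }
  return out;
}

// Equivalent single-pass inclusive scan: O(n) additions.
inline std::vector<std::uint32_t> linear_scan(const std::vector<std::uint32_t>& offsets) {
  std::vector<std::uint32_t> out(offsets.size());
  std::partial_sum(offsets.begin(), offsets.end(), out.begin());
  return out;
}
```

The per-thread form needs no synchronisation between threads. For large tile counts, a work-efficient parallel scan would avoid the quadratic cost.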
@@ -45,17 +122,6 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE {
}
};

struct KernelFillTiles {
template <typename TAcc, uint8_t Ndim>
ALPAKA_FN_ACC void operator()(const TAcc& acc,
PointsView<Ndim>* points,
TilesAlpaka<Ndim>* tiles,
uint32_t n_points) const {
cms::alpakatools::for_each_element_in_grid(
acc, n_points, [&](uint32_t i) { tiles->fill(acc, points->coords[i], i); });
}
};

template <typename TAcc, uint8_t Ndim, uint8_t N_, typename KernelType>
ALPAKA_FN_HOST_ACC void for_recursion(
const TAcc& acc,
@@ -70,9 +136,9 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE {
float dc,
uint32_t point_id) {
if constexpr (N_ == 0) {
int binId{tiles->getGlobalBinByBin(acc, base_vec)};
auto binId{tiles->getGlobalBinByBin(acc, base_vec)};
// get the size of this bin
int binSize{static_cast<int>((*tiles)[binId].size())};
auto binSize{static_cast<int>((*tiles)[binId].size())};

// iterate inside this bin
for (int binIter{}; binIter < binSize; ++binIter) {
@@ -172,9 +238,9 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE {
float dm_sq,
uint32_t point_id) {
if constexpr (N_ == 0) {
int binId{tiles->getGlobalBinByBin(acc, base_vec)};
auto binId{tiles->getGlobalBinByBin(acc, base_vec)};
// get the size of this bin
int binSize{(*tiles)[binId].size()};
auto binSize{(*tiles)[binId].size()};

// iterate inside this bin
for (int binIter{}; binIter < binSize; ++binIter) {
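After the rework, `(*tiles)[binId]` yields the points of one tile, and `for_recursion` only needs its `size()` and element access. The interface of the PR's `span` class is not shown in this diff. The view below is therefore a hypothetical minimal stand-in over an `offsets`/`content` layout, with `begin()`/`end()` so it also works in a range-based `for`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical read-only view over one tile's slice of `content`.
struct TileSpan {
  const std::uint32_t* first;
  const std::uint32_t* last;
  const std::uint32_t* begin() const { return first; }
  const std::uint32_t* end() const { return last; }
  std::size_t size() const { return static_cast<std::size_t>(last - first); }
  std::uint32_t operator[](std::size_t k) const { return first[k]; }
};

// Tile t owns content[offsets[t]] up to (excluding) content[offsets[t + 1]].
inline TileSpan tile_view(const std::vector<std::uint32_t>& offsets,
                          const std::vector<std::uint32_t>& content,
                          std::uint32_t t) {
  const std::uint32_t* base = content.data();
  return TileSpan{base + offsets[t], base + offsets[t + 1]};
}
```

The view does not copy any data. If the real `size()` returns an unsigned type, a range-based `for` also sidesteps the signed/unsigned comparison between `int binIter` and `binSize`.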
6 changes: 3 additions & 3 deletions CLUEstering/alpaka/DataFormats/Points.h
@@ -11,15 +11,15 @@ struct Points {
Points() = default;
Points(const std::vector<VecArray<float, Ndim>>& coords,
const std::vector<float>& weight)
: m_coords{coords}, m_weight{weight}, n{weight.size()} {
: m_coords{coords}, m_weight{weight}, n{static_cast<uint32_t>(weight.size())} {
m_rho.resize(n);
m_delta.resize(n);
m_nearestHigher.resize(n);
m_clusterIndex.resize(n);
m_isSeed.resize(n);
}
Points(const std::vector<std::vector<float>>& coords, const std::vector<float>& weight)
: m_weight{weight}, n{weight.size()} {
: m_weight{weight}, n{static_cast<uint32_t>(weight.size())} {
for (const auto& x : coords) {
VecArray<float, Ndim> temp_vecarray;
for (auto value : x) {
@@ -43,7 +43,7 @@ struct Points {
std::vector<int> m_clusterIndex;
std::vector<int> m_isSeed;

size_t n;
uint32_t n;
};

#endif
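The casts in `Points.h` are needed because brace initialisation rejects narrowing conversions. `std::vector::size()` returns `std::size_t`, which is wider than `uint32_t` on typical 64-bit platforms. The sketch below shows the same constructor pattern. `PointsSketch` and `to_u32_size` are hypothetical, and the range check is an illustrative addition that the PR does not include.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Convert a container size to uint32_t, refusing values that would be
// silently truncated by a plain static_cast (illustrative addition).
inline std::uint32_t to_u32_size(std::size_t size) {
  if (size > std::numeric_limits<std::uint32_t>::max()) {
    throw std::length_error("too many points for a 32-bit index");
  }
  return static_cast<std::uint32_t>(size);
}

struct PointsSketch {
  std::vector<float> m_weight;
  std::uint32_t n;
  // n{weight.size()} would be ill-formed narrowing; the helper returns uint32_t.
  explicit PointsSketch(const std::vector<float>& weight)
      : m_weight{weight}, n{to_u32_size(weight.size())} {}
};
```

A plain `static_cast<uint32_t>`, as in the diff, compiles just as well but silently wraps sizes above 2^32 - 1.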