Add 'indirect_sort' #117

Open
wants to merge 7 commits into
base: develop
2 changes: 2 additions & 0 deletions doc/algorithm.qbk
Expand Up @@ -233,6 +233,8 @@ Convert a sequence of hexadecimal characters into a sequence of integers or char
Convert a sequence of integral types into a lower case hexadecimal sequence of characters
[endsect:hex_lower]

[include indirect_sort.qbk]

[include is_palindrome.qbk]

[include is_partitioned_until.qbk]
111 changes: 111 additions & 0 deletions doc/indirect_sort.qbk
@@ -0,0 +1,111 @@
[/ File indirect_sort.qbk]

[section:indirect_sort indirect_sort ]

[/license
Copyright (c) 2023 Marshall Clow

Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
]

There are times when you want a sorted version of a sequence but can't, or don't want to, modify the sequence itself. Maybe the elements can't be moved or copied (e.g. the sequence is const), or they're just really expensive to move around. An example of this might be a sequence of records from a database.

That's where indirect sorting comes in. In a "normal" sort, the elements of the sequence to be sorted are shuffled in place. In indirect sorting, the elements are unchanged, but the sort algorithm returns a "permutation" of the elements that, when applied, will put the elements in the sequence in a sorted order.

Assume you have a sequence `[first, last)` of 1000 items that are expensive to swap:
```
std::sort(first, last);  // O(N lg N) comparisons and O(N lg N) swaps (of the element type).
```

On the other hand, using indirect sorting:
```
auto perm = indirect_sort(first, last);  // O(N lg N) comparisons and O(N lg N) swaps (of size_t).
apply_permutation(first, last, perm.begin(), perm.end());  // O(N) swaps (of the element type)
```

If the element type is sufficiently expensive to swap, then roughly 10,000 swaps of `size_t` plus 1000 swaps of the element type (N lg N is about 10,000 for N = 1000) could be cheaper than 10,000 swaps of the element type.
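To make the two steps concrete, here is a minimal standalone sketch that uses only the standard library (C++11 assumed). The names `argsort_sketch` and `gather_sketch` are hypothetical stand-ins, not this library's `indirect_sort` or `apply_permutation`, and the gather step copies into a new vector instead of permuting in place.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical stand-in for indirect_sort: returns the indices of 'v',
// ordered so that v[result[0]], v[result[1]], ... is ascending.
// The elements of 'v' themselves are never moved.
template <typename T>
std::vector<std::size_t> argsort_sketch(const std::vector<T>& v) {
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t(0));   // 0, 1, 2, ...
    std::sort(idx.begin(), idx.end(),
              [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });
    return idx;
}

// Hypothetical helper: builds a new vector whose i-th element is v[perm[i]].
template <typename T>
std::vector<T> gather_sketch(const std::vector<T>& v,
                             const std::vector<std::size_t>& perm) {
    std::vector<T> out;
    out.reserve(v.size());
    for (std::size_t i : perm)
        out.push_back(v[i]);
    return out;
}
```

Only the `std::size_t` indices are swapped during the sort; each element is touched once, in the gather step.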

Or maybe you don't need the elements to actually be sorted - you just want to traverse them in a sorted order:
```
auto permutation = indirect_sort(first, last);
for (size_t idx : permutation)
    std::cout << first[idx] << std::endl;
```


Indirect sorting also helps when, instead of an "array of structures", you have a "struct of arrays". Compare:
```
struct AType {
    Type0 key;
    Type1 value1;
    Type2 value2;
};

std::array<AType, 1000> arrayOfStruct;
```

versus:

```
template <size_t N>
struct AType {
    std::array<Type0, N> key;
    std::array<Type1, N> value1;
    std::array<Type2, N> value2;
};

AType<1000> structOfArrays;
```

Sorting the first one is easy, because each set of fields (`key`, `value1`, `value2`) is part of the same struct. With indirect sorting, the second one is easy to sort as well: compute the permutation from the keys, then apply it to the keys and to each of the value arrays. Because `apply_permutation` works in place on both the items and the permutation, each call below gets its own copy of the permutation:
```
auto perm = indirect_sort(std::begin(structOfArrays.key), std::end(structOfArrays.key));
auto p0 = perm, p1 = perm, p2 = perm;  // apply_permutation modifies the permutation it is given
apply_permutation(structOfArrays.key.begin(), structOfArrays.key.end(), p0.begin(), p0.end());
apply_permutation(structOfArrays.value1.begin(), structOfArrays.value1.end(), p1.begin(), p1.end());
apply_permutation(structOfArrays.value2.begin(), structOfArrays.value2.end(), p2.begin(), p2.end());
```
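As a standalone illustration of the same idea, the following sketch sorts a miniature struct of arrays using only the standard library (C++11 assumed). `Columns` and `sorted_by_key` are hypothetical names introduced here; the sketch gathers into a copy rather than calling `apply_permutation`, so it is not how this library does it.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>

// Hypothetical miniature "struct of arrays": one key column, one value column.
template <std::size_t N>
struct Columns {
    std::array<int, N>  key;
    std::array<char, N> value;
};

// Returns a copy of 'c' with every column reordered by ascending key.
// The permutation is computed once, from the keys only, and then reused
// to read each column in sorted order.
template <std::size_t N>
Columns<N> sorted_by_key(const Columns<N>& c) {
    std::array<std::size_t, N> perm;
    std::iota(perm.begin(), perm.end(), std::size_t(0));
    std::sort(perm.begin(), perm.end(),
              [&c](std::size_t a, std::size_t b) { return c.key[a] < c.key[b]; });

    Columns<N> out;
    for (std::size_t i = 0; i < N; ++i) {
        out.key[i]   = c.key[perm[i]];
        out.value[i] = c.value[perm[i]];
    }
    return out;
}
```

The point of the design is that the comparison only ever looks at the key column, yet the rows stay aligned because every column is read through the same permutation.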

[heading Interface]

The function `indirect_sort` returns a `vector<size_t>` containing the permutation necessary to put the input sequence into sorted order. One version uses `std::less` to do the comparisons; the other lets the caller pass a predicate to do the comparisons.

There is also a variant called `indirect_stable_sort`; it bears the same relation to `indirect_sort` that `std::stable_sort` does to `std::sort`.

```
template <typename RAIterator>
    std::vector<size_t> indirect_sort (RAIterator first, RAIterator last);

template <typename RAIterator, typename BinaryPredicate>
    std::vector<size_t> indirect_sort (RAIterator first, RAIterator last, BinaryPredicate pred);

template <typename RAIterator>
    std::vector<size_t> indirect_stable_sort (RAIterator first, RAIterator last);

template <typename RAIterator, typename BinaryPredicate>
    std::vector<size_t> indirect_stable_sort (RAIterator first, RAIterator last, BinaryPredicate pred);
```

[heading Examples]
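
The following is a minimal sketch of the predicate form combined with stable ordering, written against the standard library only (C++11 assumed). `stable_argsort_sketch` and `shorter` are hypothetical names; the function is a stand-in for, not a copy of, `indirect_stable_sort`.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical stand-in for indirect_stable_sort with a predicate:
// sorts indices 0..N-1 by comparing the elements they refer to.
// Elements that compare equivalent keep their original relative order.
template <typename RAIterator, typename Pred>
std::vector<std::size_t> stable_argsort_sketch(RAIterator first, RAIterator last,
                                               Pred pred) {
    std::vector<std::size_t> idx(static_cast<std::size_t>(last - first));
    std::iota(idx.begin(), idx.end(), std::size_t(0));
    std::stable_sort(idx.begin(), idx.end(),
                     [first, pred](std::size_t a, std::size_t b) {
                         return pred(first[a], first[b]);
                     });
    return idx;
}

// Example predicate: orders strings by length only, so strings of the
// same length are equivalent as far as the sort is concerned.
inline bool shorter(const std::string& a, const std::string& b) {
    return a.size() < b.size();
}
```

For the input `{"ccc", "a", "bb", "d", "ee"}` sorted with `shorter`, the stable variant yields the indices `1, 3, 2, 4, 0`: `"a"` stays ahead of `"d"` and `"bb"` ahead of `"ee"`, which an unstable sort does not guarantee.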

[heading Iterator Requirements]

`indirect_sort` requires random-access iterators.

[heading Complexity]

Both of the variants of `indirect_sort` run in ['O(N lg N)] time; they are not more (or less) efficient than `std::sort`. There is an extra layer of indirection on each comparison, but all of the swaps are done on values of type `size_t`. The returned permutation is a vector of N `size_t` values, so each call also allocates ['O(N)] memory.
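
One way to see that only indices are swapped is to sort elements that cannot be copied or moved at all. The sketch below is an illustration under stated assumptions (C++11, standard library only); `Pinned`, `by_key`, and `argsort_range_sketch` are hypothetical names, not part of this library.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical element type that can be neither copied nor moved,
// so std::sort could not be applied to a range of it directly.
struct Pinned {
    int key;
    Pinned(int k) : key(k) {}
    Pinned(const Pinned&) = delete;
    Pinned& operator=(const Pinned&) = delete;
};

inline bool by_key(const Pinned& a, const Pinned& b) { return a.key < b.key; }

// Hypothetical stand-in for indirect_sort: every swap performed by
// std::sort here exchanges two std::size_t values, never two elements.
template <typename RAIterator, typename Pred>
std::vector<std::size_t> argsort_range_sketch(RAIterator first, RAIterator last,
                                              Pred pred) {
    std::vector<std::size_t> idx(static_cast<std::size_t>(last - first));
    std::iota(idx.begin(), idx.end(), std::size_t(0));
    std::sort(idx.begin(), idx.end(),
              [first, pred](std::size_t a, std::size_t b) {
                  return pred(first[a], first[b]);
              });
    return idx;
}
```

Given `Pinned items[] = {{3}, {1}, {2}};`, the call `argsort_range_sketch(items, items + 3, by_key)` compiles and returns the indices `1, 2, 0`, while `items` is left exactly as it was.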

[heading Exception Safety]

[heading Notes]

In NumPy, this algorithm is known as `argsort`.

[endsect]

[/ File indirect_sort.qbk
Copyright 2023 Marshall Clow
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt).
]
207 changes: 207 additions & 0 deletions include/boost/algorithm/indirect_sort.hpp
@@ -0,0 +1,207 @@
/*
Copyright (c) Marshall Clow 2023.

Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

*/

/// \file indirect_sort.hpp
/// \brief indirect sorting algorithms
/// \author Marshall Clow
///

#ifndef BOOST_ALGORITHM_INDIRECT_SORT
#define BOOST_ALGORITHM_INDIRECT_SORT

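#include <algorithm>    // for std::sort (and others)
#include <cstddef>      // for std::size_t
#include <functional>   // for std::less
#include <iterator>     // for std::distance, std::iterator_traits, std::back_insert_iterator
#include <vector>       // for std::vector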

#include <boost/algorithm/cxx11/iota.hpp>

namespace boost { namespace algorithm {

typedef std::vector<size_t> Permutation;

namespace detail {

    template <class Predicate, class Iter>
    struct indirect_predicate {
        indirect_predicate (Predicate pred, Iter iter)
        : pred_(pred), iter_(iter) {}

        bool operator ()(size_t a, size_t b) const {
            return pred_(iter_[a], iter_[b]);
        }

        Predicate pred_;
        Iter iter_;
    };

    // Initialize a permutation of size 'size'. [ 0, 1, 2, ... size-1 ]
    // Note: it would be nice to use 'iota' here, but that call overwrites
    //   existing elements rather than appending them. I don't want to
    //   initialize the elements of the permutation to zero, and then
    //   immediately overwrite them.
    // Declared 'inline' because this non-template function is defined in a
    //   header and may be included in more than one translation unit.
    inline void init_permutation (Permutation &p, size_t size) {
        p.reserve(size);
        boost::algorithm::iota_n(
            std::back_insert_iterator<Permutation>(p), size_t(0), size);
    }
}

// ===== sort =====

/// \fn indirect_sort (RAIterator first, RAIterator last, Predicate pred)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is ordered as if 'std::sort(first, last, pred)'
/// was called on the sequence.
///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
/// \param pred The predicate to compare elements with
///
template <typename RAIterator, typename Pred>
Permutation indirect_sort (RAIterator first, RAIterator last, Pred pred) {
    Permutation ret;
    detail::init_permutation(ret, std::distance(first, last));
    std::sort(ret.begin(), ret.end(),
        detail::indirect_predicate<Pred, RAIterator>(pred, first));
    return ret;
}

/// \fn indirect_sort (RAIterator first, RAIterator last)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is ordered as if 'std::sort(first, last)'
/// was called on the sequence.
///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
///
template <typename RAIterator>
Permutation indirect_sort (RAIterator first, RAIterator last) {
    return indirect_sort(first, last,
        std::less<typename std::iterator_traits<RAIterator>::value_type>());
}

// ===== stable_sort =====

/// \fn indirect_stable_sort (RAIterator first, RAIterator last, Predicate pred)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is ordered as if 'std::stable_sort(first, last, pred)'
/// was called on the sequence.
///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
/// \param pred The predicate to compare elements with
///
template <typename RAIterator, typename Pred>
Permutation indirect_stable_sort (RAIterator first, RAIterator last, Pred pred) {
    Permutation ret;
    detail::init_permutation(ret, std::distance(first, last));
    std::stable_sort(ret.begin(), ret.end(),
        detail::indirect_predicate<Pred, RAIterator>(pred, first));
    return ret;
}

/// \fn indirect_stable_sort (RAIterator first, RAIterator last)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is ordered as if 'std::stable_sort(first, last)'
/// was called on the sequence.
///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
///
template <typename RAIterator>
Permutation indirect_stable_sort (RAIterator first, RAIterator last) {
    return indirect_stable_sort(first, last,
        std::less<typename std::iterator_traits<RAIterator>::value_type>());
}

// ===== partial_sort =====

/// \fn indirect_partial_sort (RAIterator first, RAIterator middle, RAIterator last, Predicate pred)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is ordered as if 'std::partial_sort(first, middle, last, pred)'
/// was called on the sequence.
///
/// \param first The start of the input sequence
/// \param middle The end of the range to be sorted
/// \param last The end of the input sequence
/// \param pred The predicate to compare elements with
///
template <typename RAIterator, typename Pred>
Permutation indirect_partial_sort (RAIterator first, RAIterator middle,
                                   RAIterator last, Pred pred) {
    Permutation ret;
    detail::init_permutation(ret, std::distance(first, last));
    std::partial_sort(ret.begin(), ret.begin() + std::distance(first, middle), ret.end(),
        detail::indirect_predicate<Pred, RAIterator>(pred, first));
    return ret;
}

/// \fn indirect_partial_sort (RAIterator first, RAIterator middle, RAIterator last)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is ordered as if 'std::partial_sort(first, middle, last)'
/// was called on the sequence.
///
/// \param first The start of the input sequence
/// \param middle The end of the range to be sorted
/// \param last The end of the input sequence
///
template <typename RAIterator>
Permutation indirect_partial_sort (RAIterator first, RAIterator middle, RAIterator last) {
    return indirect_partial_sort(first, middle, last,
        std::less<typename std::iterator_traits<RAIterator>::value_type>());
}

// ===== nth_element =====

/// \fn indirect_nth_element (RAIterator first, RAIterator nth, RAIterator last, Predicate pred)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is ordered as if 'std::nth_element(first, nth, last, pred)'
/// was called on the sequence.
///
/// \param first The start of the input sequence
/// \param nth The sort partition point in the input sequence
/// \param last The end of the input sequence
/// \param pred The predicate to compare elements with
///
template <typename RAIterator, typename Pred>
Permutation indirect_nth_element (RAIterator first, RAIterator nth,
                                  RAIterator last, Pred pred) {
    Permutation ret;
    detail::init_permutation(ret, std::distance(first, last));
    std::nth_element(ret.begin(), ret.begin() + std::distance(first, nth), ret.end(),
        detail::indirect_predicate<Pred, RAIterator>(pred, first));
    return ret;
}

/// \fn indirect_nth_element (RAIterator first, RAIterator nth, RAIterator last)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is ordered as if 'std::nth_element(first, nth, last)'
/// was called on the sequence.
///
/// \param first The start of the input sequence
/// \param nth The sort partition point in the input sequence
/// \param last The end of the input sequence
///
template <typename RAIterator>
Permutation indirect_nth_element (RAIterator first, RAIterator nth, RAIterator last) {
    return indirect_nth_element(first, nth, last,
        std::less<typename std::iterator_traits<RAIterator>::value_type>());
}

}}

#endif // BOOST_ALGORITHM_INDIRECT_SORT
4 changes: 4 additions & 0 deletions test/Jamfile.v2
Expand Up @@ -88,6 +88,10 @@ alias unit_test_framework

# Apply_permutation tests
[ run apply_permutation_test.cpp unit_test_framework : : : : apply_permutation_test ]

# Indirect_sort tests
[ run indirect_sort_test.cpp unit_test_framework : : : : indirect_sort_test ]

# Find tests
[ run find_not_test.cpp unit_test_framework : : : : find_not_test ]
[ run find_backward_test.cpp unit_test_framework : : : : find_backward_test ]