diff --git a/doc/specs/index.md b/doc/specs/index.md
index 098f2ce21..0883ea07c 100644
--- a/doc/specs/index.md
+++ b/doc/specs/index.md
@@ -19,6 +19,7 @@ This is and index/directory of the specifications (specs) for each new module/fe
  - [logger](./stdlib_logger.html) - Runtime logging system
  - [optval](./stdlib_optval.html) - Fallback value for optional arguments
  - [quadrature](./stdlib_quadrature.html) - Numerical integration
+ - [sorting](.stdlib_sorting.html) - Sorting of rank one arrays
  - [stats](./stdlib_stats.html) - Descriptive Statistics
  - [stats_distribution_PRNG](./stdlib_stats_distribution_PRNG.html) - Probability Distributions random number generator
  - [string\_type](./stdlib_string_type.html) - Basic string support
diff --git a/doc/specs/stdlib_sorting.md b/doc/specs/stdlib_sorting.md
new file mode 100644
index 000000000..83429166d
--- /dev/null
+++ b/doc/specs/stdlib_sorting.md
@@ -0,0 +1,644 @@
+---
+title: Sorting Procedures
+---
+
+# The `stdlib_sorting` module
+
+(TOC)
+
+## Overview of sorting
+
+The sorting of collections of data is useful in the analysis of those
+collections.
+With its absence of generics and limited polymorphism, it is
+impractical, in current Fortran, to provide sorting routines for
+arbitrary collections of arbitrary types of data.
+However Fortran's arrays are by far its most widely used collection,
+and arrays of arbitrary types of data can often be sorted in terms of
+a single component of intrinsic type.
+The Fortran Standard Library therefore provides a module,
+`stdlib_sorting`, with procedures to sort arrays of simple intrinsic
+numeric types, i.e. the different kinds of integers and reals, the
+default assumed length character, and the `stdlib_string_type`
+module's `string_type` type.
+
+## Overview of the module
+
+The module `stdlib_sorting` defines several public entities one
+default integer parameter, `int_size`, and three overloaded
+subroutines: `ORD_SORT`, `SORT`, and `SORT_INDEX`. The
+overloaded subroutines also each have seven specific names for
+versions corresponding to different types of array arguments.
+
+### The `int_size` parameter
+
+The `int_size` parameter is used to specify the kind of integer used
+in indexing the various arrays. Currently the module sets `int_size`
+to the value of `int64` from the `stdlib_kinds` module.
+
+### The module subroutines
+
+The `stdlib_sorting` module provides three different overloaded
+subroutines intended to sort three different kinds of arrays of
+data:
+* `ORD_SORT` is intended to sort simple arrays of intrinsic data
+  that have significant sections that were partially ordered before
+  the sort;
+* `SORT_INDEX` is based on ORD_SORT, but in addition to sorting the
+  input array, it returns indices that map the original array to its
+  sorted version. This enables related arrays to be re-ordered in the
+  same way; and
+* `SORT` is intended to sort simple arrays of intrinsic data
+  that are effectively unordered before the sort.
+
+#### Licensing
+
+The Fortran Standard Library is distributed under the MIT
+License. However components of the library may be based on code with
+additional licensing restriction. In particular `ORD_SORT`,
+`SORT_INDEX`, and `SORT` are translations of codes with their
+own distribution restrictions.
+
+The `ORD_SORT` and `SORT_INDEX` subroutines are essentially
+translations to Fortran 2008 of the `"rust" sort` of the Rust Language
+distributed as part of
+[`slice.rs`](https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs).
+The header of the `slice.rs` file has as its licensing requirements:
+
+    Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
+    file at the top-level directory of this distribution and at
+    http://rust-lang.org/COPYRIGHT.
+
+    Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+    http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+    <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+    option. This file may not be copied, modified, or distributed
+    except according to those terms.
+
+so the license for the `slice.rs` code is compatible with the use of
+modified versions of the code in the Fortran Standard Library under
+the MIT license.
+
+The `SORT` subroutine is essentially a translation to Fortran
+2008 of the
+[`introsort`]((http://www.cs.rpi.edu/~musser/gp/introsort.ps) of David
+Musser.  David Musser has given permission to include a variant of
+`introsort` in the Fortran Standard Library under the MIT license
+provided we cite:
+
+    Musser, D.R., “Introspective Sorting and Selection Algorithms,”
+    Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997).
+
+as the official source of the algorithm.
+
+
+#### The `ORD_SORT` subroutine
+
+`ORD_SORT` is a translation of the `"Rust" sort` sorting algorithm
+contained in [`slice.rs`]
+(https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs).
+`"Rust" sort`, in turn, is inspired by the [`timsort` algorithm]
+(http://svn.python.org/projects/python/trunk/Objects/listsort.txt)
+that Tim Peters created for the Python Language.
+`ORD_SORT` is a hybrid stable comparison algorithm combining `merge sort`,
+and `insertion sort`. It has always at worst O(N Ln(N)) runtime
+performance in sorting random data, having a performance about 15-25%
+slower than `SORT` on such data. However it has much better
+performance than `SORT` on partially sorted data, having O(N)
+performance on uniformly increasing or decreasing data.
+
+
+`ORD_SORT` begins by traversing the array starting in its tail
+attempting to identify `runs` in the array, where a run is either a
+uniformly decreasing sequence, `ARRAY(i-1) > ARRAY(i)`, or a
+non-decreasing, `ARRAY(i-1) <= ARRAY(i)`, sequence. Once delimitated
+decreasing sequences are reversed in their order. Then, if the
+sequence has less than `MIN_RUN` elements, previous elements in the
+array are added to the run using `insertion sort` until the run
+contains `MIN_RUN` elements or the array is completely processed. As
+each run is identified the start and length of the run 
+is then pushed onto a stack and the stack is then processed using
+`merge` until it obeys the stack invariants:
+
+1. len(i-2) > len(i-1) + len(i)
+2. len(i-1) > len(i)
+
+ensuring that processing the stack is, at worst, of order `O(N
+Ln(N))`. However, because of the identification of decreasing and
+non-decreasing runs, processing of structured data can be much faster,
+with processing of uniformly decreasing or non-decreasing arrays being
+of order O(N). The result in our tests is that `ORD_SORT` is about
+25% slower than `SORT` on purely random data, depending on
+the compiler, but can be `Ln(N)` faster than `SORT` on highly
+structured data. As a modified `merge sort`, `ORD_SORT` requires the
+use of a "scratch" array, that may be provided as an optional `work`
+argument or allocated internally on the stack.
+
+#### The `SORT_INDEX` subroutine
+
+The `SORT` and `ORD_SORT` subroutines can sort rank 1 isolated
+arrays of intrinsic types, but do nothing for the coordinated sorting
+of related data, e.g., multiple related rank 1 arrays, higher rank
+arrays, or arrays of derived types. For such related data, what is
+useful is an array of indices that maps a rank 1 array to its sorted
+form. For such a sort, a stable sort is useful, therefore the module
+provides a subroutine, `SORT_INDEX`, that generates such an array of
+indices based on the `ORD_SORT` algorithm, in addition to sorting
+the input array.
+
+The logic of `SORT_INDEX` parallels that of `ORD_SORT`, with
+additional housekeeping to keep the array of indices consistent with
+the sorted positions of the input array. Because of this additional
+housekeeping it has slower runtime performance than `ORD_SORT`.
+`SORT_INDEX` requires the use of two "scratch" arrays, that may be
+provided as optional `work` and `iwork` arguments or allocated
+internally on the stack.
+
+#### The `SORT` subroutine
+
+`SORT` uses the `introsort` sorting algorithm of David Musser.
+`introsort` is a hybrid unstable comparison algorithm combining
+`quicksort`, `insertion sort`, and `heap sort`. While this algorithm's
+runtime performance is always O(N Ln(N)), it is relatively fast on
+randomly ordered data, but does not show the improvement in
+performance on partly sorted data found for `ORD_SORT`.
+
+As with `introsort`, `SORT` is an unstable hybrid algorithm.
+First it examines the array and estimates the depth of recursion a
+quick sort would require for ideal (random) data, `D =
+Ceiling(Ln(N)/Ln(2))`. It then defines a limit to the number of
+`quicksort` recursions to be allowed in processing,
+`D_limit = factor * D`, where factor is currently 2, and
+calls `introsort` proper. `introsort` proper then:
+
+1. Examines the number of elements remaining to be sorted, and, if
+   they are less than 16, sorts them using insertion sort and returns;
+2. If they are not less than 16, checks whether the current depth of
+   recursion exceeds `D_limit` and, if it does, processes the remaining
+   elements with heap sort and returns;
+3. If the current depth of recursion does not exceed `D_limit`, then
+   in effect does a `quicksort` step:
+
+    * Partitions the remaining array using a median of three,
+    * Calls `introsort` proper on the leftmost partition,
+    * Calls `introsort` proper on the rightmost partition, and then
+      returns.
+
+The resulting algorithm is of order O(N Ln(N)) run time performance
+for all inputs. Because it relies on `quicksort`, the coefficient of
+the O(N Ln(N)) behavior is typically small compared to other sorting
+algorithms on random data. On partially sorted data it can show either
+slower `heap sort` performance, or enhanced performance by up to a
+factor of six. Still, even when it shows enhanced performance, its
+performance on partially sorted data is typically an order of
+magnitude slower than `ORD_SORT`. Its memory requirements are also
+low, being of order O(Ln(N)), while the memory requirements of
+`ORD_SORT` and `SORT_INDEX` are of order O(N).
+
+### Tentative specifications of the `stdlib_sorting` procedures
+
+#### `ord_sort` - sorts an input array
+
+##### Status
+
+Experimental
+
+##### Description
+
+Returns an input `array` with the elements sorted in order of
+increasing value.
+
+##### Syntax
+
+`call [[stdlib_sorting(module):ord_sort(subroutine)]]ord_sort ( array[, work ] )`
+
+##### Class
+
+Generic subroutine.
+
+##### Arguments
+
+`array` : shall be a rank one array of any of the types:
+`integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`,
+`real(sp)`, `real(dp)`, `real(qp)`, `character(*)`, or
+`type(string_type)`. It is an `intent(inout)` argument. On input it is
+the array to be sorted. If both the type of `array` is real and at
+least one of the elements is a `NaN`, then the ordering of the result
+is undefined. Otherwise on return its elements will be sorted in order
+of non-decreasing value.
+
+`work` (optional): shall be a rank one array of the same type as
+array, and shall have at least `size(array)/2` elements. It is an
+`intent(inout)` argument. It is intended to be used as "scratch"
+memory for internal record keeping. If associated with an array in
+static storage, its use can significantly reduce the stack memory
+requirements for the code. Its contents on return are undefined.
+
+##### Notes
+
+`ORD_SORT` implements a hybrid sorting algorithm combining
+`merge sort`, and `insertion sort`. For most purposes it behaves like
+a `merge sort`, providing worst case `O(N Ln(N))` run time performance
+for most random arrays, that is typically slower than `SORT`.
+However, if the array has significant runs of decreasing or
+non-decreasing values, performance can be much better than
+`SORT`, with `O(N)` behavior on uniformly decreasing, or
+non-decreasing arrays. The optional `work` array replaces "scratch"
+memory that would otherwise be allocated on the stack. If `array` is of
+any type `REAL` the order of its elements on return undefined if any
+element of `array` is a `NaN`.  Sorting of `CHARACTER(*)` and
+`STRING_TYPE` arrays are based on the operator `>`, and not on the
+function `LGT`.
+
+
+##### Example
+
+```fortran
+    ...
+    ! Read arrays from sorted files
+    call read_sorted_file( 'dummy_file1', array1 )
+    call read_sorted_file( 'dummy_file2', array2 )
+    ! Concatenate the arrays
+	array = [ array1, array2 ]
+    ! Sort the resulting array
+    call ord_sort( array, work )
+    ! Process the sorted array
+    call array_search( array, values )
+    ...
+```
+
+#### `sort` - sorts an input array
+
+##### Status
+
+Experimental
+
+##### Description
+
+Returns an input array with the elements sorted in order of increasing
+value.
+
+##### Syntax
+
+`call [[stdlib_sorting(module):sort(subroutine)]]sort ( array )`
+
+##### Class
+
+Pure generic subroutine.
+
+##### Arguments
+
+`array` : shall be a rank one array of any of the types:
+`integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`,
+`real(sp)`, `real(dp)`, `real(qp)`. `character(*)`, or
+`type(string_type)`. It is an `intent(inout)` argument. On return its
+input elements will be sorted in order of non-decreasing value.
+
+##### Notes
+
+`SORT` implements a hybrid sorting algorithm combining
+`quicksort`, `merge sort`, and `insertion sort`. For most purposes it
+behaves like a `quicksort` with a median of three partition, providing
+good, `O(N Ln(N))`, run time performance for most random arrays, but
+defaulting to `merge sort` if the structure of the array results in
+the `quicksort` not converging as rapidly as expected. If `array` is of
+any type `REAL`, the behavior of the sorting is undefined if any
+element of `array` is a `NaN`.  Sorting of `CHARACTER(*)` and
+`STRING_TYPE` arrays are based on the operators `<`, `<=`, `>`, and
+`>=`, and not on the functions `LLT`, `LLE`, `LGT`, or `LGE`.
+
+##### Example
+
+```fortran
+    ...
+    ! Read random data from a file
+    call read_file( 'dummy_file', array )
+    ! Sort the random data
+    call sort( array )
+    ! Process the sorted data
+    call array_search( array, values )
+    ...
+```
+
+#### `sort_index` - creates an array of sorting indices for an input array, while also sorting the array.
+
+##### Status
+
+Experimental
+
+##### Description
+
+Returns the input `array` sorted in the direction requested while
+retaining order stability, and an integer array whose elements would
+sort the input `array` to produce the output `array`.
+
+##### Syntax
+
+`call [[stdlib_sorting(module):sort_index(subroutine)]]sort_index ( array, index[, work, iwork, reverse ] )`
+
+##### Class
+
+Generic subroutine.
+
+##### Arguments
+
+`array`: shall be a rank one array of any of the types:
+`integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`,
+`real(sp)`, `real(dp)`, `real(qp)`, `character(*)`, or
+`type(string_type)`. It is an `intent(inout)` argument. On input it
+will be an array whose sorting indices are to be determined. On return
+it will be the sorted array.
+
+`index`: shall be a rank one integer array of kind `int_size` and of
+the size of `array`. It is an `intent(out)` argument. On return it
+shall have values that are the indices needed to sort the original
+array in the desired direction.
+
+`work` (optional): shall be a rank one array of any of the same type as
+`array`, and shall have at least `size(array)/2` elements. It is an
+`intent(inout)` argument. It is intended to be used as "scratch"
+memory for internal record keeping. If associated with an array in
+static storage, its use can significantly reduce the stack memory
+requirements for the code. Its contents on return are undefined.
+
+`iwork` (optional): shall be a rank one integer array of kind
+`int_size`, and shall have at least `size(array)/2` elements. It
+is an `intent(inout)` argument.  It is intended to be used as "scratch"
+memory for internal record keeping. If associated with an array in
+static storage, its use can significantly reduce the stack memory
+requirements for the code. Its contents on return are undefined.
+
+`reverse` (optional): shall be a scalar of type default logical. It
+is an `intent(in)` argument. If present with a value of `.true.` then
+`index` will sort `array` in order of non-increasing values in stable
+order. Otherwise index will sort `array` in order of non-decreasing
+values in stable order.
+
+##### Notes
+
+`SORT_INDEX` implements the hybrid sorting algorithm of `ORD_SORT`,
+keeping the values of `index` consistent with the elements of `array`
+as it is sorted. As a `merge sort` based algorithm, it is a stable
+sorting comparison algorithm. The optional `work` and `iwork` arrays
+replace "scratch" memory that would otherwise be allocated on the
+stack. If `array` is of any kind of `REAL` the order of the elements in
+`index` and `array` on return are undefined if any element of `array`
+is a `NaN`. Sorting of `CHARACTER(*)` and `STRING_TYPE` arrays are
+based on the operator `>`, and not on the function `LGT`.
+
+It should be emphasized that the order of `array` will typically be
+different on return
+
+
+##### Examples
+
+Sorting a related rank one array:
+
+```Fortran
+    subroutine sort_related_data( a, b, work, index, iwork )
+        ! Sort `a`, and  also  sort `b` to be reorderd the same way as `a`
+        integer, intent(inout)           :: a(:)
+        integer(int32), intent(inout)    :: b(:) ! The same size as a
+		integer(int32), intent(inout)    :: work(:)
+		integer(int_size), intent(inout) :: index(:)
+		integer(int_size), intent(inout) :: iwork(:)
+		! Find the indices to sort a
+        call sort_index(a, index(1:size(a)),&
+            work(1:size(a)/2), iwork(1:size(a)/2))
+		! Sort b based on the sorting of a
+		b(:) = b( index(1:size(a)) )
+	end subroutine sort_related_data
+```
+
+Sorting a rank 2 array based on the data in a column
+
+```Fortran
+	subroutine sort_related_data( array, column, work, index, iwork )
+	    ! Reorder rows of `array` such that `array(:, column)` is  sorted
+	    integer, intent(inout)           :: array(:,:)
+		integer(int32), intent(in)       :: column
+		integer(int32), intent(inout)    :: work(:)
+		integer(int_size), intent(inout) :: index(:)
+		integer(int_size), intent(inout) :: iwork(:)
+		integer, allocatable             :: dummy(:)
+		integer :: i
+		allocate(dummy(size(array, dim=1)))
+		! Extract a column of `array`
+		dummy(:) = array(:, column)
+		! Find the indices to sort the column
+		call sort_index(dummy, index(1:size(dummy)),&
+		    work(1:size(dummy)/2), iwork(1:size(dummy)/2))
+		! Sort a based on the sorting of its column
+		do i=1, size(array, dim=2)
+		    array(:, i) = array(index(1:size(array, dim=1)), i)
+		end do
+	end subroutine sort_related_data
+```
+
+Sorting an array of a derived type based on the data in one component
+
+```fortran
+	subroutine sort_a_data( a_data, a, work, index, iwork )
+	    ! Sort `a_data` in terms or its component `a`
+	    type(a_type), intent(inout)      :: a_data(:)
+		integer(int32), intent(inout)    :: a(:)
+		integer(int32), intent(inout)    :: work(:)
+		integer(int_size), intent(inout) :: index(:)
+		integer(int_size), intent(inout) :: iwork(:)
+		! Extract a component of `a_data`
+		a(1:size(a_data)) = a_data(:) % a
+		! Find the indices to sort the component
+		call sort_index(a(1:size(a_data)), index(1:size(a_data)),&
+		    work(1:size(a_data)/2), iwork(1:size(a_data)/2))
+		! Sort a_data based on the sorting of that component
+		a_data(:) = a_data( index(1:size(a_data)) )
+	end subroutine sort_a_data
+```
+
+#### Specific procedures
+
+Usually the name of a generic procedure is the most convenient way of
+invoking it. However sometimes it is necessary to pass a procedure as
+an argument to another procedure. In that case it is usually necessary
+to know the name of the specific procedure desired. The following
+table lists the specific subroutines and the corresponding types of
+their `array` arguments.
+
+| Generic Subroutine | Specific Subroutine | Array type      |
+|--------------------|---------------------|-----------------|
+| `ORD_SORT`         | `INT8_ORD_SORT`     | `INTEGER(INT8)` |
+|                    | `INT16_ORD_SORT`    | `INTEGER(INT16)` |
+|                    | `INT32_ORD_SORT`    | `INTEGER(INT32)` |
+|                    | `INT64_ORD_SORT`    | `INTEGER(INT64)` |
+|                    | `SP_ORD_SORT`    | `REAL(SP)` |
+|                    | `DP_ORD_SORT`    | `REAL(DP)` |
+|                    | `QP_ORD_SORT`    | `REAL(QP)` |
+|                    | `CHAR_ORD_SORT`  | `CHARACTER(*)`  |
+|                    | `STRING_ORD_SORT`  | `STRING_TYPE`  |
+| `SORT`         | `INT8_SORT`     | `INTEGER(INT8)` |
+|                    | `INT16_SORT`    | `INTEGER(INT16)` |
+|                    | `INT32_SORT`    | `INTEGER(INT32)` |
+|                    | `INT64_SORT`    | `INTEGER(INT64)` |
+|                    | `SP_SORT`    | `REAL(SP)` |
+|                    | `DP_SORT`    | `REAL(DP)` |
+|                    | `QP_SORT`    | `REAL(QP)` |
+|                    | `CHAR_SORT`  | `CHARACTER(*)` |
+|                    | `STRING_SORT`  | `STRING_TYPE` |
+| `SORT_INDEX`      | `INT8_SORT_INDEX`     | `INTEGER(INT8)` |
+|                    | `INT16_SORT_INDEX`    | `INTEGER(INT16)` |
+|                    | `INT32_SORT_INDEX`    | `INTEGER(INT32)` |
+|                    | `INT64_SORT_INDEX`    | `INTEGER(INT64)` |
+|                    | `SP_SORT_INDEX`    | `REAL(SP)` |
+|                    | `DP_SORT_INDEX`    | `REAL(DP)` |
+|                    | `QP_SORT_INDEX`    | `REAL(QP)` |
+|                    | `CHAR_SORT_INDEX`    | `CHARACTER(*)` |
+|                    | `STRING_SORT_INDEX`    | `STRING_TYPE` |
+
+
+### Performance benchmarks
+
+We have performed benchmarks of the procedures on nine different
+integer arrays each of size `2**20`:
+
+* Blocks - the array is divided into six blocks, each of distinct
+  uniformly increasing integers.
+* Decreasing - values decrease uniformly from `2**20-1` to `0`.
+* Identical - all integers have the same value of 10.
+* Increasing - values increase uniformly from `0` to `2**20-1`.
+* Random dense - the integers are generated randomly from a set of
+  values from `0` to `2**18-1` so duplicates are dense.
+* Random order - a set of integers from `0` to `2**20 - 1` in random
+  order.
+* Random sparse - the integers are generated randomly from a set of
+  values from `0` to `2**22-1` so duplicates are sparse.
+* Random-3 - the increasing array has 3 random exchanges of individual
+  elements.
+* Random-10 - the final ten elements of the increasing array are
+  replaced by random values.
+
+On three different default character arrays, each of length 4 and of
+size `26**4`:
+
+* Char. Decreasing - values decrease uniformly from `"zzzz"` to
+  `"aaaa"`.
+* Char. Increasing - values decrease uniformly from `"aaaa"` to
+  `"zzzz"`.
+* Char. Random - the set of strings from `"aaaa"` to `"zzzz"` in
+  random order.
+
+On three different `string_type` arrays, each of length 4 elements and
+of size `26**3`:
+
+* String Decreasing - values decrease uniformly from `"zzz"` to
+  `"aaa"`.
+* String Increasing - values decrease uniformly from `"aaa"` to
+  `"zzz"`.
+* String Random - the set of strings from `"aaa"` to `"zzz"` in
+  random order.
+
+These benchmarks have been performed on two different compilers, both
+on a MacBook Pro, featuring a 2.3 GHz Quad-Core Intel Core i5, with 8
+GB 2133 MHz LPDDR3 memory. The first compiler was GNU Fortran
+(GCC) 10.2.0, with the following results:
+
+|    Type     | Elements |    Array Name   |    Method   |  Time (s) |
+|-------------|----------|-----------------|-------------|-----------|
+|     Integer | 1048576  |          Blocks |    Ord_Sort |   0.00738 |
+|     Integer | 1048576  |      Decreasing |    Ord_Sort |   0.00380 |
+|     Integer | 1048576  |       Identical |    Ord_Sort |   0.00220 |
+|     Integer | 1048576  |      Increasing |    Ord_Sort |   0.00209 |
+|     Integer | 1048576  |    Random dense |    Ord_Sort |   0.17972 |
+|     Integer | 1048576  |    Random order |    Ord_Sort |   0.17503 |
+|     Integer | 1048576  |   Random sparse |    Ord_Sort |   0.17340 |
+|     Integer | 1048576  |        Random 3 |    Ord_Sort |   0.00847 |
+|     Integer | 1048576  |       Random 10 |    Ord_Sort |   0.00484 |
+|   Character |  456976  |  Char. Decrease |    Ord_Sort |   0.00763 |
+|   Character |  456976  |  Char. Increase |    Ord_Sort |   0.00414 |
+|   Character |  456976  |    Char. Random |    Ord_Sort |   0.23746 |
+| String_type |   17576  | String Decrease |    Ord_Sort |   0.00543 |
+| String_type |   17576  | String Increase |    Ord_Sort |   0.00347 |
+| String_type |   17576  |   String Random |    Ord_Sort |   0.09461 |
+|     Integer | 1048576  |          Blocks |        Sort |   0.10556 |
+|     Integer | 1048576  |      Decreasing |        Sort |   0.13348 |
+|     Integer | 1048576  |       Identical |        Sort |   0.15719 |
+|     Integer | 1048576  |      Increasing |        Sort |   0.05316 |
+|     Integer | 1048576  |    Random dense |        Sort |   0.15047 |
+|     Integer | 1048576  |    Random order |        Sort |   0.15176 |
+|     Integer | 1048576  |   Random sparse |        Sort |   0.15767 |
+|     Integer | 1048576  |        Random 3 |        Sort |   0.19907 |
+|     Integer | 1048576  |       Random 10 |        Sort |   0.34244 |
+|   Character |  456976  |  Char. Decrease |        Sort |   0.30723 |
+|   Character |  456976  |  Char. Increase |        Sort |   0.10984 |
+|   Character |  456976  |    Char. Random |        Sort |   0.20642 |
+| String_type |   17576  | String Decrease |        Sort |   0.15101 |
+| String_type |   17576  | String Increase |        Sort |   0.05569 |
+| String_type |   17576  |   String Random |        Sort |   0.08499 |
+|     Integer | 1048576  |          Blocks |  Sort_Index |   0.01163 |
+|     Integer | 1048576  |      Decreasing |  Sort_Index |   0.00720 |
+|     Integer | 1048576  |       Identical |  Sort_Index |   0.00451 |
+|     Integer | 1048576  |      Increasing |  Sort_Index |   0.00452 |
+|     Integer | 1048576  |    Random dense |  Sort_Index |   0.20295 |
+|     Integer | 1048576  |    Random order |  Sort_Index |   0.20190 |
+|     Integer | 1048576  |   Random sparse |  Sort_Index |   0.20221 |
+|     Integer | 1048576  |        Random 3 |  Sort_Index |   0.01406 |
+|     Integer | 1048576  |       Random 10 |  Sort_Index |   0.00765 |
+|   Character |  456976  |  Char. Decrease |  Sort_Index |   0.00912 |
+|   Character |  456976  |  Char. Increase |  Sort_Index |   0.00515 |
+|   Character |  456976  |    Char. Random |  Sort_Index |   0.24693 |
+| String_type |   17576  | String Decrease |  Sort_Index |   0.00528 |
+| String_type |   17576  | String Increase |  Sort_Index |   0.00341 |
+| String_type |   17576  |   String Random |  Sort_Index |   0.09554 |
+
+The second compiler was Intel(R) Fortran Intel(R) 64 Compiler Classic
+for applications running on Intel(R) 64, Version 2021.2.0 Build
+20210228_000000, with the following results:
+
+|    Type     | Elements |   Array Name    |    Method   |  Time (s) |
+|-------------|----------|-----------------|-------------|-----------|
+|     Integer | 1048576  |          Blocks |    Ord_Sort |   0.00320 |
+|     Integer | 1048576  |      Decreasing |    Ord_Sort |   0.00142 |
+|     Integer | 1048576  |       Identical |    Ord_Sort |   0.00102 |
+|     Integer | 1048576  |      Increasing |    Ord_Sort |   0.00158 |
+|     Integer | 1048576  |    Random dense |    Ord_Sort |   0.09859 |
+|     Integer | 1048576  |    Random order |    Ord_Sort |   0.09704 |
+|     Integer | 1048576  |   Random sparse |    Ord_Sort |   0.09599 |
+|     Integer | 1048576  |        Random 3 |    Ord_Sort |   0.00396 |
+|     Integer | 1048576  |       Random 10 |    Ord_Sort |   0.00183 |
+|   Character |  456976  |  Char. Decrease |    Ord_Sort |   0.00763 |
+|   Character |  456976  |  Char. Increase |    Ord_Sort |   0.00341 |
+|   Character |  456976  |    Char. Random |    Ord_Sort |   0.21991 |
+| String_type |   17576  | String Decrease |    Ord_Sort |   0.01957 |
+| String_type |   17576  | String Increase |    Ord_Sort |   0.00573 |
+| String_type |   17576  |   String Random |    Ord_Sort |   0.37850 |
+|     Integer | 1048576  |          Blocks |        Sort |   0.03668 |
+|     Integer | 1048576  |      Decreasing |        Sort |   0.04073 |
+|     Integer | 1048576  |       Identical |        Sort |   0.03884 |
+|     Integer | 1048576  |      Increasing |        Sort |   0.01279 |
+|     Integer | 1048576  |    Random dense |        Sort |   0.06945 |
+|     Integer | 1048576  |    Random order |        Sort |   0.07151 |
+|     Integer | 1048576  |   Random sparse |        Sort |   0.07224 |
+|     Integer | 1048576  |        Random 3 |        Sort |   0.07954 |
+|     Integer | 1048576  |       Random 10 |        Sort |   0.14395 |
+|   Character |  456976  |  Char. Decrease |        Sort |   0.30367 |
+|   Character |  456976  |  Char. Increase |        Sort |   0.11316 |
+|   Character |  456976  |    Char. Random |        Sort |   0.20233 |
+| String_type |   17576  | String Decrease |        Sort |   0.64479 |
+| String_type |   17576  | String Increase |        Sort |   0.23737 |
+| String_type |   17576  |   String Random |        Sort |   0.31361 |
+|     Integer | 1048576  |          Blocks |  Sort_Index |   0.00643 |
+|     Integer | 1048576  |      Decreasing |  Sort_Index |   0.00219 |
+|     Integer | 1048576  |       Identical |  Sort_Index |   0.00126 |
+|     Integer | 1048576  |      Increasing |  Sort_Index |   0.00130 |
+|     Integer | 1048576  |    Random dense |  Sort_Index |   0.12911 |
+|     Integer | 1048576  |    Random order |  Sort_Index |   0.13024 |
+|     Integer | 1048576  |   Random sparse |  Sort_Index |   0.12956 |
+|     Integer | 1048576  |        Random 3 |  Sort_Index |   0.00781 |
+|     Integer | 1048576  |       Random 10 |  Sort_Index |   0.00281 |
+|   Character |  456976  |  Char. Decrease |  Sort_Index |   0.00779 |
+|   Character |  456976  |  Char. Increase |  Sort_Index |   0.00393 |
+|   Character |  456976  |    Char. Random |  Sort_Index |   0.22561 |
+| String_type |   17576  | String Decrease |  Sort_Index |   0.01878 |
+| String_type |   17576  | String Increase |  Sort_Index |   0.00543 |
+| String_type |   17576  |   String Random |  Sort_Index |   0.37748 |
+
+
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 96a6ebcce..92f06fc04 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -9,6 +9,10 @@ set(fppFiles
     stdlib_linalg.fypp
     stdlib_linalg_diag.fypp
     stdlib_optval.fypp
+    stdlib_sorting.fypp
+    stdlib_sorting_ord_sort.fypp
+    stdlib_sorting_sort.fypp
+    stdlib_sorting_sort_index.fypp 
     stdlib_stats.fypp
     stdlib_stats_corr.fypp
     stdlib_stats_cov.fypp
diff --git a/src/Makefile.manual b/src/Makefile.manual
index 85541e038..fb176a235 100644
--- a/src/Makefile.manual
+++ b/src/Makefile.manual
@@ -9,6 +9,10 @@ SRCFYPP =\
         stdlib_quadrature.fypp \
         stdlib_quadrature_trapz.fypp \
         stdlib_quadrature_simps.fypp \
+        stdlib_sorting.fypp \
+        stdlib_sorting_ord_sort.fypp \
+        stdlib_sorting_sort.fypp \
+        stdlib_sorting_sort_index.fypp \
         stdlib_stats.fypp \
         stdlib_stats_corr.fypp \
         stdlib_stats_cov.fypp \
@@ -79,6 +83,15 @@ stdlib_quadrature_trapz.o: \
 	stdlib_quadrature.o \
 	stdlib_error.o \
 	stdlib_kinds.o
+stdlib_sorting.o: \
+	stdlib_kinds.o \
+	stdlib_string_type.o
+stdlib_sorting_ord_sort.o: \
+	stdlib_sorting.o
+stdlib_sorting_sort.o: \
+	stdlib_sorting.o
+stdlib_sorting_sort_index.o: \
+	stdlib_sorting.o
 stdlib_stats.o: \
 	stdlib_kinds.o
 stdlib_stats_corr.o: \
diff --git a/src/stdlib_sorting.fypp b/src/stdlib_sorting.fypp
new file mode 100644
index 000000000..8908ccfb7
--- /dev/null
+++ b/src/stdlib_sorting.fypp
@@ -0,0 +1,573 @@
+#:include "common.fypp"
+
+!! Licensing:
+!!
+!! This file is subjec† both to the Fortran Standard Library license, and
+!! to additional licensing requirements as it contains translations of
+!! other software.
+!!
+!! The Fortran Standard Library, including this file, is distributed under
+!! the MIT license that should be included with the library's distribution.
+!!
+!!   Copyright (c) 2021 Fortran stdlib developers
+!!
+!!   Permission is hereby granted, free of charge, to any person obtaining a
+!!   copy of this software and associated documentation files (the
+!!   "Software"),  to deal in the Software without restriction, including
+!!   without limitation the rights to use, copy, modify, merge, publish,
+!!   distribute, sublicense, and/or sellcopies of the Software, and to permit
+!!   persons to whom the Software is furnished to do so, subject to the
+!!   following conditions:
+!!
+!!   The above copyright notice and this permission notice shall be included
+!!   in all copies or substantial portions of the Software.
+!!
+!!   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+!!   OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+!!   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+!!   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+!!   CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+!!   TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+!!   SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+!!
+!! Two of the generic subroutines, `ORD_SORT` and `SORT_INDEX`, are
+!! substantially translations to Fortran 2008 of the `"Rust" sort` sorting
+!! routines in
+!! [`slice.rs`](https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs)
+!! The `rust sort` implementation is distributed with the header:
+!!
+!!   Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
+!!   file at the top-level directory of this distribution and at
+!!   http://rust-lang.org/COPYRIGHT.
+!!
+!!   Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+!!   http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+!!   <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+!!   option. This file may not be copied, modified, or distributed
+!!   except according to those terms.
+!!
+!! so the license for the original`slice.rs` code is compatible with the use
+!! of modified versions of the code in the Fortran Standard Library under
+!! the MIT license.
+!!
+!! One of the generic subroutines, `SORT`, is substantially a
+!! translation to Fortran 2008, of the `introsort` of David Musser.
+!! David Musser has given permission to include a variant of `introsort`
+!! in the Fortran Standard Library under the MIT license provided
+!! we cite:
+!!
+!!   Musser, D.R., “Introspective Sorting and Selection Algorithms,”
+!!   Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997).
+!!
+!! as the official source of the algorithm.
+
+module stdlib_sorting
+!! This module implements overloaded sorting subroutines named `ORD_SORT`,
+!! `SORT_INDEX`, and `SORT`, that each can be used to sort four kinds
+!! of `INTEGER` arrays and three kinds of `REAL` arrays. By default, sorting
+!! is in order of increasing value, though `SORT_INDEX` has the option of
+!! sorting in order of decresasing value. All the subroutines have worst
+!! case run time performance of `O(N Ln(N))`, but on largely sorted data
+!! `ORD_SORT` and `SORT_INDEX` can have a run time performance of `O(N)`.
+!!
+!! `ORD_SORT` is a translation of the `"Rust" sort` sorting algorithm in
+!! `slice.rs`:
+!! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs
+!! which in turn is inspired by the `timsort` algorithm of Tim Peters,
+!! http://svn.python.org/projects/python/trunk/Objects/listsort.txt.
+!! `ORD_SORT` is a hybrid stable comparison algorithm combining `merge sort`,
+!! and `insertion sort`. It is always at worst O(N Ln(N)) in sorting random
+!! data, having a performance about 25% slower than `SORT` on such
+!! data, but has much better performance than `SORT` on partially
+!! sorted data, having O(N) performance on uniformly non-increasing or
+!! non-decreasing data.
+!!
+!! `SORT_INDEX` is a modification of `ORD_SORT` so that in addition to
+!! sorting the input array, it returns the indices that map to a
+!! stable sort of the original array. These indices are
+!! intended to be used to sort data that is correlated with the input
+!! array, e.g., different arrays in a database, different columns of a
+!! rank 2 array, different elements of a derived type. It is less
+!! efficient than `ORD_SORT` at sorting a simple array.
+!!
+!! `SORT` uses the `INTROSORT` sorting algorithm of David Musser,
+!! http://www.cs.rpi.edu/~musser/gp/introsort.ps. `introsort` is a hybrid
+!! unstable comparison algorithm combining `quicksort`, `insertion sort`, and
+!! `heap sort`. While this algorithm is always O(N Ln(N)) it is relatively
+!! fast on randomly ordered data, but inconsistent in performance on partly
+!! sorted data, sometimes having `merge sort` performance, sometimes having
+!! better than `quicksort` performance. `UNORD_SOORT` is about 25%
+!! more efficient than `ORD_SORT` at sorting purely random data, but af an
+!! order of `Ln(N)` less efficient at sorting partially sorted data.
+
+    use stdlib_kinds, only: &
+        int8,               &
+        int16,              &
+        int32,              &
+        int64,              &
+        sp,                 &
+        dp,                 &
+        qp
+
+    use stdlib_string_type, only: string_type, assignment(=), operator(>), &
+        operator(>=), operator(<), operator(<=)
+
+    implicit none
+    private
+
+    public :: int_size
+
+    integer, parameter :: int_size = int64 !! Integer kind for indexing
+
+! Constants for use by tim_sort
+    integer, parameter :: &
+! The maximum number of entries in a run stack, good for an array of
+! 2**64 elements see
+! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+        max_merge_stack = int( ceiling( log( 2._dp**64 ) / &
+                               log(1.6180339887_dp) ) )
+
+    type run_type
+!! Version: experimental
+!!
+!! Used to pass state around in a stack among helper functions for the
+!! `ORD_SORT` and `SORT_INDEX` algorithms
+        integer(int_size) :: base = 0
+        integer(int_size) :: len = 0
+    end type run_type
+
+    public ord_sort
+!! Version: experimental
+!!
+!! The generic subroutine implementing the `ORD_SORT` algorithm to return
+!! an input array with its elements sorted in order of non-decreasing
+!! value. Its use has the syntax:
+!!
+!!     call ord_sort( array[, work] )
+!!
+!! with the arguments:
+!!
+!! * array: the rank 1 array to be sorted. It is an `intent(inout)`
+!!   argument of any of the types `integer(int8)`, `integer(int16)`,
+!!   `integer(int32)`, `integer(int64)`, `real(real32)`, `real(real64)`,
+!!   `real(real128)`, or `character(*)`. If both the type of `array` is
+!!   real and at least one of the elements is a `NaN`, then the ordering
+!!   of the result is undefined. Otherwise it is defined to be the
+!!   original elements in non-decreasing order.
+!!
+!! * work (optional): shall be a rank 1 array of the same type as
+!!   `array`, and shall have at least `size(array)/2` elements. It is an
+!!   `intent(inout)` argument to be used as "scratch" memory
+!!   for internal record keeping. If associated with an array in static
+!!   storage, its use can significantly reduce the stack memory requirements
+!!   for the code. Its value on return is undefined.
+!!
+!!#### Example
+!!
+!!```fortran
+!!    ...
+!!    ! Read arrays from sorted files
+!!    call read_sorted_file( 'dummy_file1', array1 )
+!!    call read_sorted_file( 'dummy_file2', array2 )
+!!    ! Concatenate the arrays
+!!    allocate( array( size(array1) + size(array2) ) )
+!!    array( 1:size(array1) ) = array1(:)
+!!    array( size(array1)+1:size(array1)+size(array2) ) = array2(:)
+!!    ! Sort the resulting array
+!!    call ord_sort( array, work )
+!!    ! Process the sorted array
+!!    call array_search( array, values )
+!!    ...
+!!```
+
+    public sort
+!! Version: experimental
+!!
+!! The generic subroutine implementing the `SORT` algorithm to return
+!! an input array with its elements sorted in order of non-decreasing
+!! value. Its use has the syntax:
+!!
+!!     call sort( array )
+!!
+!! with the arguments:
+!!
+!! * array: the rank 1 array to be sorted. It is an `intent(inout)`
+!!   argument of any of the types `integer(int8)`, `integer(int16)`,
+!!   `integer(int32)`, `integer(int64)`, `real(real32)`, `real(real64)`,
+!!   `real(real128)`, or `character(*)`. If both the type of `array` is
+!!   real and at least one of the elements is a `NaN`, then the ordering
+!!   of the result is undefined. Otherwise it is defined to be the
+!!   original elements in non-decreasing order.
+!!
+!!#### Example
+!!
+!!```fortran
+!!    ...
+!!    ! Read random data from a file
+!!    call read_file( 'dummy_file', array )
+!!    ! Sort the random data
+!!    call sort( array )
+!!    ! Process the sorted data
+!!    call array_search( array, values )
+!!    ...
+!!```
+
+    public sort_index
+!! Version: experimental
+!!
+!! The generic subroutine implementing the `SORT_INDEX` algorithm to
+!! return an index array whose elements would sort the input array in the
+!! desired direction. It is primarily intended to be used to sort a
+!! derived type array based on the values of a component of the array.
+!! Its use has the syntax:
+!!
+!!     call sort_index( array, index[, work, iwork, reverse ] )
+!!
+!! with the arguments:
+!!
+!! * array: the rank 1 array to be sorted. It is an `intent(inout)`
+!!   argument of any of the types `integer(int8)`, `integer(int16)`,
+!!   `integer(int32)`, `integer(int64)`, `real(real32)`, `real(real64)`,
+!!   `real(real128)`, or `character(*)`. If both the type of `array` is
+!!   real and at least one of the elements is a `NaN`, then the ordering
+!!   of the `array` and `index` results is undefined. Otherwise it is
+!!   defined to be as specified by reverse.
+!!
+!! * index: a rank 1 array of sorting indices. It is an `intent(inout)`
+!!   argument of the type `integer(int_size)`. Its size shall be the
+!!   same as `array`. On return, if defined, its elements would
+!!   sort the input `array` in the direction specified by `reverse`.
+!!
+!! * work (optional): shall be a rank 1 array of the same type as
+!!   `array`, and shall have at least `size(array)/2` elements. It is an
+!!   `intent(inout)` argument to be used as "scratch" memory
+!!   for internal record keeping. If associated with an array in static
+!!   storage, its use can significantly reduce the stack memory requirements
+!!   for the code. Its value on return is undefined.
+!!
+!! * iwork (optional): shall be a rank 1 integer array of kind `int_size`,
+!!   and shall have at least `size(array)/2` elements. It is an
+!!   `intent(inout)` argument to be used as "scratch" memory
+!!   for internal record keeping. If associated with an array in static
+!!   storage, its use can significantly reduce the stack memory requirements
+!!   for the code. Its value on return is undefined.
+!!
+!! * `reverse` (optional): shall be a scalar of type default logical. It
+!!   is an `intent(in)` argument. If present with a value of `.true.` then
+!!   `index` will sort `array` in order of non-increasing values in stable
+!!   order. Otherwise index will sort `array` in order of non-decreasing
+!!   values in stable order.
+!!
+!!#### Examples
+!!
+!! Sorting a related rank one array:
+!!
+!!```Fortran
+!!    subroutine sort_related_data( a, b, work, index, iwork )
+!!        ! Sort `b` in terms or its related array `a`
+!!        integer, intent(inout)           :: a(:)
+!!        integer(int32), intent(inout)    :: b(:) ! The same size as a
+!!        integer(int32), intent(inout)    :: work(:)
+!!        integer(int_size), intent(inout) :: index(:)
+!!        integer(int_size), intent(inout) :: iwork(:)
+!!    ! Find the indices to sort a
+!!        call sort_index(a, index(1:size(a)),&
+!!            work(1:size(a)/2), iwork(1:size(a)/2))
+!!    ! Sort b based on the sorting of a
+!!        b(:) = b( index(1:size(a)) )
+!!    end subroutine sort_related_data
+!!```
+!!
+!! Sorting a rank 2 array based on the data in a column
+!!
+!!```Fortran
+!!    subroutine sort_related_data( array, column, work, index, iwork )
+!!    ! Sort `a_data` in terms or its component `a`
+!!        integer, intent(inout)           :: a(:,:)
+!!        integer(int32), intent(in)       :: column
+!!        integer(int32), intent(inout)    :: work(:)
+!!        integer(int_size), intent(inout) :: index(:)
+!!        integer(int_size), intent(inout) :: iwork(:)
+!!        integer, allocatable             :: dummy(:)
+!!        integer :: i
+!!        allocate(dummy(size(a, dim=1)))
+!!    ! Extract a component of `a_data`
+!!        dummy(:) = a(:, column)
+!!    ! Find the indices to sort the column
+!!        call sort_index(dummy, index(1:size(dummy)),&
+!!                        work(1:size(dummy)/2), iwork(1:size(dummy)/2))
+!!    ! Sort a based on the sorting of its column
+!!        do i=1, size(a, dim=2)
+!!            a(:, i) = a(index(1:size(a, dim=1)), i)
+!!        end do
+!!    end subroutine sort_related_data
+!!```
+!!
+!! Sorting an array of a derived type based on the dsta in one component
+!!```fortran
+!!    subroutine sort_a_data( a_data, a, work, index, iwork )
+!!    ! Sort `a_data` in terms or its component `a`
+!!        type(a_type), intent(inout)      :: a_data(:)
+!!        integer(int32), intent(inout)    :: a(:)
+!!        integer(int32), intent(inout)    :: work(:)
+!!        integer(int_size), intent(inout) :: index(:)
+!!        integer(int_size), intent(inout) :: iwork(:)
+!!    ! Extract a component of `a_data`
+!!        a(1:size(a_data)) = a_data(:) % a
+!!    ! Find the indices to sort the component
+!!        call sort_index(a(1:size(a_data)), index(1:size(a_data)),&
+!!                        work(1:size(a_data)/2), iwork(1:size(a_data)/2))
+!!    ! Sort a_data based on the sorting of that component
+!!        a_data(:) = a_data( index(1:size(a_data)) )
+!!    end subroutine sort_a_data
+!!```
+
+#:for k1 in INT_KINDS
+    public ${k1}$_ord_sort
+!! Version: experimental
+!!
+!! `${k1}$_ord_sort` is the specific `ORD_SORT` subroutine for an `array`
+!! argument of type `integer(${k1}$)`.
+
+    public ${k1}$_sort
+!! Version: experimental
+!!
+!! `${k1}$_sort` is the specific `SORT` subroutine for an `array`
+!! argument of type `integer(${k1}$)`.
+
+    public ${k1}$_sort_index
+!! Version: experimental
+!!
+!! `${k1}$_sort_index` is the specific `SORT_INDEX` subroutine for an `array`
+!! argument of type `integer(${k1}$)`.
+
+#:endfor
+
+#:for k1 in REAL_KINDS
+    public ${k1}$_ord_sort
+!! Version: experimental
+!!
+!! `${k1}$_ord_sort` is the specific `ORD_SORT` subroutine for an `array`
+!! argument of type `real(${k1}$)`.
+
+    public ${k1}$_sort
+!! Version: experimental
+!!
+!! `${k1}$_sort` is the specific `SORT` subroutine for an `array`
+!! argument of type `real(${k1}$)`.
+
+    public ${k1}$_sort_index
+!! Version: experimental
+!!
+!! `${k1}$_sort_index` is the specific `SORT_INDEX` subroutine for an `array`
+!! argument of type `real(${k1}$)`.
+
+#:endfor
+
+    public char_ord_sort
+
+    public char_sort
+
+    public char_sort_index
+
+    public string_ord_sort
+
+    public string_sort
+
+    public string_sort_index
+
+    interface ord_sort
+!! Version: experimental
+!!
+!! The generic subroutine interface implementing the `ORD_SORT` algorithm,
+!! a translation to Fortran 2008, of the `"Rust" sort` algorithm found in
+!! `slice.rs`
+!! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+!! `ORD_SORT` is a hybrid stable comparison algorithm combining `merge sort`,
+!! and `insertion sort`. It is always at worst O(N Ln(N)) in sorting random
+!! data, having a performance about 25% slower than `SORT` on such
+!! data, but has much better performance than `SORT` on partially
+!! sorted data, having O(N) performance on uniformly non-increasing or
+!! non-decreasing data.
+
+#:for k1 in INT_KINDS
+        module subroutine ${k1}$_ord_sort( array, work )
+!! Version: experimental
+!!
+!! `${k1}$_ord_sort( array )` sorts the input `ARRAY` of type `INTEGER(${k1}$)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+            integer(${k1}$), intent(inout)           :: array(0:)
+            integer(${k1}$), intent(inout), optional :: work(0:)
+        end subroutine ${k1}$_ord_sort
+
+#:endfor
+
+#:for k1 in REAL_KINDS
+        module subroutine ${k1}$_ord_sort( array, work )
+ !! Version: experimental
+!!
+!! `${k1}$_ord_sort( array )` sorts the input `ARRAY` of type `REAL(${k1}$)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+            real(${k1}$), intent(inout)           :: array(0:)
+            real(${k1}$), intent(inout), optional :: work(0:)
+        end subroutine ${k1}$_ord_sort
+
+#:endfor
+        module subroutine char_ord_sort( array, work )
+!! Version: experimental
+!!
+!! `char_ord_sort( array )` sorts the input `ARRAY` of type `CHARACTER(*)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+            character(len=*), intent(inout)                    :: array(0:)
+            character(len=len(array)), intent(inout), optional :: work(0:)
+        end subroutine char_ord_sort
+
+        module subroutine string_ord_sort( array, work )
+!! Version: experimental
+!!
+!! `char_ord_sort( array )` sorts the input `ARRAY` of type `CHARACTER(*)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+            type(string_type), intent(inout)           :: array(0:)
+            type(string_type), intent(inout), optional :: work(0:)
+        end subroutine string_ord_sort
+
+    end interface ord_sort
+
+    interface sort
+!! Version: experimental
+!!
+!! The generic subroutine interface implementing the `SORT` algorithm, based
+!! on the `introsort` of David Musser.
+
+#:for k1 in INT_KINDS
+        pure module subroutine ${k1}$_sort( array )
+!! Version: experimental
+!!
+!! `${k1}$_sort( array )` sorts the input `ARRAY` of type `INTEGER(${k1}$)`
+!! using a hybrid sort based on the `introsort` of David Musser.
+!! The algorithm is of order O(N Ln(N)) for all inputs.
+!! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+!! behavior is small for random data compared to other sorting algorithms.
+            integer(${k1}$), intent(inout) :: array(0:)
+        end subroutine ${k1}$_sort
+
+#:endfor
+
+#:for k1 in REAL_KINDS
+        pure module subroutine ${k1}$_sort( array )
+!! Version: experimental
+!!
+!! `${k1}$_sort( array )` sorts the input `ARRAY` of type `REAL(${k1}$)`
+!! using a hybrid sort based on the `introsort` of David Musser.
+!! The algorithm is of order O(N Ln(N)) for all inputs.
+!! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+!! behavior is small for random data compared to other sorting algorithms.
+            real(${k1}$), intent(inout) :: array(0:)
+        end subroutine ${k1}$_sort
+
+#:endfor
+
+        pure module subroutine char_sort( array )
+!! Version: experimental
+!!
+!! `char_sort( array )` sorts the input `ARRAY` of type `CHARACTER(*)`
+!! using a hybrid sort based on the `introsort` of David Musser.
+!! The algorithm is of order O(N Ln(N)) for all inputs.
+!! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+!! behavior is small for random data compared to other sorting algorithms.
+            character(len=*), intent(inout) :: array(0:)
+        end subroutine char_sort
+
+        pure module subroutine string_sort( array )
+!! Version: experimental
+!!
+!! `string_sort( array )` sorts the input `ARRAY` of type `STRING_TYPE`
+!! using a hybrid sort based on the `introsort` of David Musser.
+!! The algorithm is of order O(N Ln(N)) for all inputs.
+!! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+!! behavior is small for random data compared to other sorting algorithms.
+            type(string_type), intent(inout) :: array(0:)
+        end subroutine string_sort
+
+    end interface sort
+
+    interface sort_index
+!! Version: experimental
+!!
+!! The generic subroutine interface implementing the `SORT_INDEX` algorithm,
+!! based on the `"Rust" sort` algorithm found in `slice.rs`
+!! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+!! but modified to return an array of indices that would provide a stable
+!! sort of the rank one `ARRAY` input. The indices by default correspond to a
+!! non-decreasing sort, but if the optional argument `REVERSE` is present
+!! with a value of `.TRUE.` the indices correspond to a non-increasing sort.
+
+#:for k1 in INT_KINDS
+        module subroutine ${k1}$_sort_index( array, index, work, iwork, &
+            reverse )
+!! Version: experimental
+!!
+!! `${k1}$_sort_index( array )` sorts an input `ARRAY` of type `INTEGER(${k1}$)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+!! and returns the sorted `ARRAY` and an array `INDEX of indices in the
+!! order that would sort the input `ARRAY` in the desired direction.
+            integer(${k1}$), intent(inout)             :: array(0:)
+            integer(int_size), intent(inout)           :: index(0:)
+            integer(${k1}$), intent(inout), optional   :: work(0:)
+            integer(int_size), intent(inout), optional :: iwork(0:)
+            logical, intent(in), optional              :: reverse
+        end subroutine ${k1}$_sort_index
+
+#:endfor
+
+#:for k1 in REAL_KINDS
+        module subroutine ${k1}$_sort_index( array, index, work, iwork, &
+            reverse )
+!! Version: experimental
+!!
+!! `${k1}$_sort_index( array )` sorts an input `ARRAY` of type `REAL(${k1}$)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+!! and returns the sorted `ARRAY` and an array `INDEX of indices in the
+!! order that would sort the input `ARRAY` in the desired direction.
+            real(${k1}$), intent(inout)                :: array(0:)
+            integer(int_size), intent(inout)           :: index(0:)
+            real(${k1}$), intent(inout), optional      :: work(0:)
+            integer(int_size), intent(inout), optional :: iwork(0:)
+            logical, intent(in), optional              :: reverse
+        end subroutine ${k1}$_sort_index
+
+#:endfor
+        module subroutine char_sort_index( array, index, work, iwork, &
+            reverse )
+!! Version: experimental
+!!
+!! `char_sort_index( array )` sorts an input `ARRAY` of type `CHARACTER(*)`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+!! and returns the sorted `ARRAY` and an array `INDEX of indices in the
+!! order that would sort the input `ARRAY` in the desired direction.
+            character(len=*), intent(inout)                    :: array(0:)
+            integer(int_size), intent(inout)                   :: index(0:)
+            character(len=len(array)), intent(inout), optional :: work(0:)
+            integer(int_size), intent(inout), optional         :: iwork(0:)
+            logical, intent(in), optional                      :: reverse
+        end subroutine char_sort_index
+
+        module subroutine string_sort_index( array, index, work, iwork, &
+            reverse )
+!! Version: experimental
+!!
+!! `string_sort_index( array )` sorts an input `ARRAY` of type `STRING_TYPE`
+!! using a hybrid sort based on the `'Rust" sort` algorithm found in `slice.rs`
+!! and returns the sorted `ARRAY` and an array `INDEX of indices in the
+!! order that would sort the input `ARRAY` in the desired direction.
+            type(string_type), intent(inout)           :: array(0:)
+            integer(int_size), intent(inout)           :: index(0:)
+            type(string_type), intent(inout), optional :: work(0:)
+            integer(int_size), intent(inout), optional :: iwork(0:)
+            logical, intent(in), optional              :: reverse
+        end subroutine string_sort_index
+
+    end interface sort_index
+
+
+end module stdlib_sorting
diff --git a/src/stdlib_sorting_ord_sort.fypp b/src/stdlib_sorting_ord_sort.fypp
new file mode 100644
index 000000000..eb0e58480
--- /dev/null
+++ b/src/stdlib_sorting_ord_sort.fypp
@@ -0,0 +1,1419 @@
+#! Integer kinds to be considered during templating
+#:set INT_KINDS = ["int8", "int16", "int32", "int64"]
+
+#! Real kinds to be considered during templating
+#:set REAL_KINDS = ["sp", "dp", "qp"]
+
+!! Licensing:
+!!
+!! This file is subjec† both to the Fortran Standard Library license, and
+!! to additional licensing requirements as it contains translations of
+!! other software.
+!!
+!! The Fortran Standard Library, including this file, is distributed under
+!! the MIT license that should be included with the library's distribution.
+!!
+!!   Copyright (c) 2021 Fortran stdlib developers
+!!
+!!   Permission is hereby granted, free of charge, to any person obtaining a
+!!   copy of this software and associated documentation files (the
+!!   "Software"),  to deal in the Software without restriction, including
+!!   without limitation the rights to use, copy, modify, merge, publish,
+!!   distribute, sublicense, and/or sellcopies of the Software, and to permit
+!!   persons to whom the Software is furnished to do so, subject to the
+!!   following conditions:
+!!
+!!   The above copyright notice and this permission notice shall be included
+!!   in all copies or substantial portions of the Software.
+!!
+!!   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+!!   OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+!!   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+!!   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+!!   CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+!!   TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+!!   SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+!!
+!! The generic subroutine, `ORD_SORT`, is substantially a translation to
+!! Fortran 2008 of the `"Rust" sort` sorting routines in
+!! [`slice.rs`](https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs)
+!! The `rust sort` implementation is distributed with the header:
+!!
+!!   Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
+!!   file at the top-level directory of this distribution and at
+!!   http://rust-lang.org/COPYRIGHT.
+!!
+!!   Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+!!   http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+!!   <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+!!   option. This file may not be copied, modified, or distributed
+!!   except according to those terms.
+!!
+!! so the license for the original`slice.rs` code is compatible with the use
+!! of modified versions of the code in the Fortran Standard Library under
+!! the MIT license.
+
+submodule(stdlib_sorting) stdlib_sorting_ord_sort
+
+    implicit none
+
+contains
+
+#:for k1 in INT_KINDS
+
+    module subroutine ${k1}$_ord_sort( array, work )
+! A translation to Fortran 2008, of the `"Rust" sort` algorithm found in
+! `slice.rs`
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version in turn is a simplification of the Timsort algorithm
+! described in
+! https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size and
+! initially processed by an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original `listsort.txt`, and an optional `work` array to be used as
+! scratch memory.
+        integer(${k1}$), intent(inout)           :: array(0:)
+        integer(${k1}$), intent(inout), optional :: work(0:)
+
+        integer(${k1}$), allocatable :: buf(:)
+        integer(int_size) :: array_size
+        integer :: stat
+
+        if ( present(work) ) then
+! Use the work array as scratch memory
+            call merge_sort( array, work )
+        else
+! Allocate a buffer to use as scratch memory.
+            array_size = size( array, kind=int_size )
+            allocate( buf(0:array_size/2-1), stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of buffer failed."
+            call merge_sort( array, buf )
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array )
+! Sorts `ARRAY` using an insertion sort.
+            integer(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, j
+            integer(${k1}$) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+
+            integer(${k1}$), intent(inout) :: array(0:)
+
+            integer(${k1}$) :: tmp
+            integer(int_size) :: i
+
+            tmp = array(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+            end do find_hole
+            array(i-1) = tmp
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, buf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case.
+
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(${k1}$), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type) :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least
+! min_run elements. Slices of up to this length are sorted using insertion
+! sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array )
+                return
+            end if
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                       right % base + right % len - 1 ), &
+                                left % len, buf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(in)  :: mid
+            integer(${k1}$), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter ! check that it is stable
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array )
+! Reverse a segment of an array in place
+            integer(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: lo, hi
+            integer(${k1}$) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine ${k1}$_ord_sort
+
+#:endfor
+
+
+#:for k1 in REAL_KINDS
+
+    module subroutine ${k1}$_ord_sort( array, work )
+! A translation to Fortran 2008, of the `"Rust" sort` algorithm found in
+! `slice.rs`
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version in turn is a simplification of the Timsort algorithm
+! described in
+! https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size and
+! initially processed by an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original `listsort.txt`, and an optional `work` array to be used as
+! scratch memory.
+        real(${k1}$), intent(inout)           :: array(0:)
+        real(${k1}$), intent(inout), optional :: work(0:)
+
+        real(${k1}$), allocatable :: buf(:)
+        integer(int_size) :: array_size
+        integer :: stat
+
+        if ( present(work) ) then
+! Use the work array as scratch memory
+            call merge_sort( array, work )
+        else
+! Allocate a buffer to use as scratch memory.
+            array_size = size( array, kind=int_size )
+            allocate( buf(0:array_size/2-1), stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of buffer failed."
+            call merge_sort( array, buf )
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array )
+! Sorts `ARRAY` using an insertion sort.
+            real(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, j
+            real(${k1}$) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+
+            real(${k1}$), intent(inout) :: array(0:)
+
+            real(${k1}$) :: tmp
+            integer(int_size) :: i
+
+            tmp = array(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+            end do find_hole
+            array(i-1) = tmp
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, buf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case.
+
+            real(${k1}$), intent(inout) :: array(0:)
+            real(${k1}$), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type) :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least
+! min_run elements. Slices of up to this length are sorted using insertion
+! sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array )
+                return
+            end if
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                       right % base + right % len - 1 ), &
+                                left % len, buf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            real(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(in)  :: mid
+            real(${k1}$), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter ! check that it is stable
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array )
+! Reverse a segment of an array in place
+            real(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: lo, hi
+            real(${k1}$) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine ${k1}$_ord_sort
+
+#:endfor
+
+
+    module subroutine char_ord_sort( array, work )
+! A translation to Fortran 2008, of the `"Rust" sort` algorithm found in
+! `slice.rs`
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version in turn is a simplification of the Timsort algorithm
+! described in
+! https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size and
+! initially processed by an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original `listsort.txt`, and an optional `work` array to be used as
+! scratch memory.
+        character(len=*), intent(inout)                    :: array(0:)
+        character(len=len(array)), intent(inout), optional :: work(0:)
+
+        character(len=:), allocatable :: buf(:)
+        integer(int_size) :: array_size
+        integer :: stat
+
+        if ( present(work) ) then
+! Use the work array as scratch memory
+            call merge_sort( array, work )
+        else
+! Allocate a buffer to use as scratch memory.
+            array_size = size( array, kind=int_size )
+            allocate( character(len=len(array)) :: buf(0:array_size/2-1), &
+                      stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of buffer failed."
+            call merge_sort( array, buf )
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array )
+! Sorts `ARRAY` using an insertion sort.
+            character(len=*), intent(inout) :: array(0:)
+
+            integer(int_size)         :: i, j
+            character(len=len(array)) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+
+            character(len=*), intent(inout) :: array(0:)
+
+            character(len=len(array)) :: tmp
+            integer(int_size)         :: i
+
+            tmp = array(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+            end do find_hole
+            array(i-1) = tmp
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, buf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case.
+
+            character(len=*), intent(inout)          :: array(0:)
+            character(len=len(array)), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type) :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least
+! min_run elements. Slices of up to this length are sorted using insertion
+! sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array )
+                return
+            end if
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                       right % base + right % len - 1 ), &
+                                left % len, buf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            character(len=*), intent(inout)          :: array(0:)
+            integer(int_size), intent(in)            :: mid
+            character(len=len(array)), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter ! check that it is stable
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array )
+! Reverse a segment of an array in place
+            character(len=*), intent(inout) :: array(0:)
+
+            integer(int_size)         :: lo, hi
+            character(len=len(array)) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine char_ord_sort
+
+
+    module subroutine string_ord_sort( array, work )
+! A translation to Fortran 2008, of the `"Rust" sort` algorithm found in
+! `slice.rs`
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version in turn is a simplification of the Timsort algorithm
+! described in
+! https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size and
+! initially processed by an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original `listsort.txt`, and an optional `work` array to be used as
+! scratch memory.
+        type(string_type), intent(inout)           :: array(0:)
+        type(string_type), intent(inout), optional :: work(0:)
+
+        type(string_type), allocatable :: buf(:)
+        integer(int_size)              :: array_size
+        integer                        :: stat
+
+        if ( present(work) ) then
+! Use the work array as scratch memory
+            call merge_sort( array, work )
+        else
+! Allocate a buffer to use as scratch memory.
+            array_size = size( array, kind=int_size )
+            allocate( buf(0:array_size/2-1), stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of buffer failed."
+            call merge_sort( array, buf )
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array )
+! Sorts `ARRAY` using an insertion sort.
+            type(string_type), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, j
+            type(string_type) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical           :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+
+            type(string_type), intent(inout) :: array(0:)
+
+            type(string_type) :: tmp
+            integer(int_size) :: i
+
+            tmp = array(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+            end do find_hole
+            array(i-1) = tmp
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, buf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case.
+
+            type(string_type), intent(inout) :: array(0:)
+            type(string_type), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type)    :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least
+! min_run elements. Slices of up to this length are sorted using insertion
+! sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array )
+                return
+            end if
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                       right % base + right % len - 1 ), &
+                                left % len, buf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(in)    :: mid
+            type(string_type), intent(inout) :: buf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter ! check that it is stable
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array )
+! Reverse a segment of an array in place
+            type(string_type), intent(inout) :: array(0:)
+
+            integer(int_size) :: lo, hi
+            type(string_type) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine string_ord_sort
+
+end submodule stdlib_sorting_ord_sort
+
diff --git a/src/stdlib_sorting_sort.fypp b/src/stdlib_sorting_sort.fypp
new file mode 100644
index 000000000..d0b803bf9
--- /dev/null
+++ b/src/stdlib_sorting_sort.fypp
@@ -0,0 +1,740 @@
+#! Integer kinds to be considered during templating
+#:set INT_KINDS = ["int8", "int16", "int32", "int64"]
+
+#! Real kinds to be considered during templating
+#:set REAL_KINDS = ["sp", "dp", "qp"]
+
+!! Licensing:
+!!
+!! This file is subjec† both to the Fortran Standard Library license, and
+!! to additional licensing requirements as it contains translations of
+!! other software.
+!!
+!! The Fortran Standard Library, including this file, is distributed under
+!! the MIT license that should be included with the library's distribution.
+!!
+!!   Copyright (c) 2021 Fortran stdlib developers
+!!
+!!   Permission is hereby granted, free of charge, to any person obtaining a
+!!   copy of this software and associated documentation files (the
+!!   "Software"),  to deal in the Software without restriction, including
+!!   without limitation the rights to use, copy, modify, merge, publish,
+!!   distribute, sublicense, and/or sellcopies of the Software, and to permit
+!!   persons to whom the Software is furnished to do so, subject to the
+!!   following conditions:
+!!
+!!   The above copyright notice and this permission notice shall be included
+!!   in all copies or substantial portions of the Software.
+!!
+!!   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+!!   OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+!!   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+!!   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+!!   CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+!!   TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+!!   SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+!!
+!! The generic subroutine, `SORT`, is substantially a
+!! translation to Fortran 2008, of the `introsort` of David Musser.
+!! David Musser has given permission to include a variant of `introsort`
+!! in the Fortran Standard Library under the MIT license provided
+!! we cite:
+!!
+!!   Musser, D.R., “Introspective Sorting and Selection Algorithms,”
+!!   Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997).
+!!
+!! as the official source of the algorithm.
+
+submodule(stdlib_sorting) stdlib_sorting_sort
+!! This submodule implements the overloaded sorting subroutine `SORT`
+!! that can be used to sort four kinds of `INTEGER` arrays and three kinds
+!! of `REAL` arrays. Sorting is in order of increasing value, with the worst
+!! case run time performance of `O(N Ln(N))`.
+!!
+!! `SORT` uses the `INTROSORT` sorting algorithm of David Musser,
+!! http://www.cs.rpi.edu/~musser/gp/introsort.ps. `introsort` is a hybrid
+!! unstable comparison algorithm combining `quicksort`, `insertion sort`, and
+!! `heap sort`. While this algorithm is always O(N Ln(N)) it is relatively
+!! fast on randomly ordered data, but inconsistent in performance on partly
+!! sorted data, sometimes having `merge sort` performance, sometimes having
+!! better than `quicksort` performance.
+
+    implicit none
+
+contains
+
+
+#:for k1 in INT_KINDS
+
+    pure module subroutine ${k1}$_sort( array )
+! `${k1}$_sort( array )` sorts the input `ARRAY` of type `INTEGER(${k1}$)`
+! using a hybrid sort based on the `introsort` of David Musser. As with
+! `introsort`, `${k1}$_sort( array )` is an unstable hybrid comparison
+! algorithm using `quicksort` for the main body of the sort tree,
+! supplemented by `insertion sort` for the outer brances, but if
+! `quicksort` is converging too slowly the algorithm resorts
+! to `heapsort`. The algorithm is of order O(N Ln(N)) for all inputs.
+! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+! behavior is typically small compared to other sorting algorithms.
+
+        integer(${k1}$), intent(inout) :: array(0:)
+
+        integer(int32) :: depth_limit
+
+        depth_limit = 2 * int( floor( log( real( size( array, kind=int64 ),  &
+                                                 kind=dp) ) / log(2.0_dp) ), &
+                               kind=int32 )
+        call introsort(array, depth_limit)
+
+    contains
+
+        pure recursive subroutine introsort( array, depth_limit )
+! It devolves to `insertionsort` if the remaining number of elements
+! is fewer than or equal to `INSERT_SIZE`, `heapsort` if the completion
+! of the `quicksort` is too slow as estimated from `DEPTH_LIMIT`,
+! otherwise sorting is done by a `quicksort`.
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(int32), intent(in)     :: depth_limit
+
+            integer(int_size), parameter :: insert_size = 16_int_size
+            integer(int_size)            :: index
+
+            if ( size(array, kind=int_size) <= insert_size ) then
+                ! May be best at the end of SORT processing the whole array
+                ! See Musser, D.R., “Introspective Sorting and Selection
+                ! Algorithms,” Software—Practice and Experience, Vol. 27(8),
+                ! 983–993 (August 1997).
+
+                call insertion_sort( array )
+            else if ( depth_limit == 0 ) then
+                call heap_sort( array )
+            else
+                call partition( array, index )
+                call introsort( array(0:index-1), depth_limit-1 )
+                call introsort( array(index+1:), depth_limit-1 )
+            end if
+
+        end subroutine introsort
+
+
+        pure subroutine partition( array, index )
+! quicksort partition using median of three.
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(out) :: index
+
+            integer(${k1}$) :: u, v, w, x, y
+            integer(int_size) :: i, j
+
+! Determine median of three and exchange it with the end.
+            u = array( 0 )
+            v = array( size(array, kind=int_size)/2-1 )
+            w = array( size(array, kind=int_size)-1 )
+            if ( (u > v) .neqv. (u > w) ) then
+                x = u
+                y = array(0)
+                array(0) = array( size( array, kind=int_size ) - 1 )
+                array( size( array, kind=int_size ) - 1 ) = y
+            else if ( (v < u) .neqv. (v < w) ) then
+                x = v
+                y = array(size( array, kind=int_size )/2-1)
+                array( size( array, kind=int_size )/2-1 ) = &
+                    array( size( array, kind=int_size )-1 )
+                array( size( array, kind=int_size )-1 ) = y
+            else
+                x = w
+            end if
+! Partition the array.
+            i = -1_int_size
+            do j = 0_int_size, size(array, kind=int_size)-2
+                if ( array(j) <= x ) then
+                    i = i + 1
+                    y = array(i)
+                    array(i) = array(j)
+                    array(j) = y
+                end if
+            end do
+            y = array(i+1)
+            array(i+1) = array(size(array, kind=int_size)-1)
+            array(size(array, kind=int_size)-1) = y
+            index = i + 1
+
+        end subroutine partition
+
+        pure subroutine insertion_sort( array )
+! Bog standard insertion sort.
+            integer(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, j
+            integer(${k1}$)   :: key
+
+            do j=1_int_size, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 )
+                    if ( array(i) <= key ) exit
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+        pure subroutine heap_sort( array )
+! A bog standard heap sort
+            integer(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, heap_size
+            integer(${k1}$)   :: y
+
+            heap_size = size( array, kind=int_size )
+! Build the max heap
+            do i = (heap_size-2)/2_int_size, 0_int_size, -1_int_size
+                call max_heapify( array, i, heap_size )
+            end do
+            do i = heap_size-1, 1_int_size, -1_int_size
+! Swap the first element with the current final element
+                y = array(0)
+                array(0) = array(i)
+                array(i) = y
+! Sift down using max_heapify
+                call max_heapify( array, 0_int_size, i )
+            end do
+
+        end subroutine heap_sort
+
+        pure recursive subroutine max_heapify( array, i, heap_size )
+! Transform the array into a max heap
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(in)  :: i, heap_size
+
+            integer(int_size) :: l, r, largest
+            integer(${k1}$)   :: y
+
+            largest = i
+            l = 2_int_size * i + 1_int_size
+            r = l + 1_int_size
+            if ( l < heap_size ) then
+                if ( array(l) > array(largest) ) largest = l
+            end if
+            if ( r < heap_size ) then
+                if ( array(r) > array(largest) ) largest = r
+            end if
+            if ( largest /= i ) then
+                y = array(i)
+                array(i) = array(largest)
+                array(largest) = y
+                call max_heapify( array, largest, heap_size )
+            end if
+
+        end subroutine max_heapify
+
+    end subroutine ${k1}$_sort
+
+#:endfor
+
+
+#:for k1 in REAL_KINDS
+
+    pure module subroutine ${k1}$_sort( array )
+
+! `${k1}$_sort( array )` sorts the input `ARRAY` of type `REAL(${k1}$)`
+! using a hybrid sort based on the `introsort` of David Musser. As with
+! `introsort`, `${k1}$_sort( array )` is an unstable hybrid comparison
+! algorithm using `quicksort` for the main body of the sort tree,
+! supplemented by `insertion sort` for the outer brances, but if
+! `quicksort` is converging too slowly the algorithm resorts
+! to `heapsort`. The algorithm is of order O(N Ln(N)) for all inputs.
+! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+! behavior is typically small compared to other sorting algorithms.
+
+        real(${k1}$), intent(inout) :: array(0:)
+        integer(int32)              :: depth_limit
+
+        depth_limit = &
+            2 * int( floor( log( real( size( array, kind=int_size ),   &
+                                             kind=dp) ) / log(2.0_dp) ), &
+                               kind=int32 )
+        call introsort(array, depth_limit)
+
+    contains
+
+        pure recursive subroutine introsort( array, depth_limit )
+! It devolves to `insertionsort` if the remaining number of elements
+! is fewer than or equal to `INSERT_SIZE`, `heapsort` if the completion of
+! the quicksort is too slow as estimated from `DEPTH_LIMIT`, otherwise
+! sorting is done by a `quicksort`.
+            real(${k1}$), intent(inout) :: array(0:)
+            integer(int32), intent(in)  :: depth_limit
+
+            integer(int_size), parameter :: insert_size = 16_int_size
+            integer(int_size)            :: index
+
+            if ( size(array, kind=int_size) <= insert_size ) then
+                ! May be best at the end of SORT processing the whole array
+                ! See Musser, D.R., “Introspective Sorting and Selection
+                ! Algorithms,” Software—Practice and Experience, Vol. 27(8),
+                ! 983–993 (August 1997).
+
+                call insertion_sort( array )
+
+            else if ( depth_limit == 0 ) then
+                call heap_sort( array )
+
+            else
+                call partition( array, index )
+                call introsort( array(0:index-1), depth_limit-1 )
+                call introsort( array(index+1:), depth_limit-1 )
+            end if
+
+        end subroutine introsort
+
+
+        pure subroutine partition( array, index )
+! quicksort partition using median of three.
+            real(${k1}$), intent(inout)    :: array(0:)
+            integer(int_size), intent(out) :: index
+
+            real(${k1}$)      :: u, v, w, x, y
+            integer(int_size) :: i, j
+
+! Determine median of three and exchange it with the end.
+            u = array( 0 )
+            v = array( size(array, kind=int_size)/2-1 )
+            w = array( size(array, kind=int_size)-1 )
+            if ( (u > v) .neqv. (u > w) ) then
+                x = u
+                y = array(0)
+                array(0) = array( size( array, kind=int_size ) - 1 )
+                array( size( array, kind=int_size ) - 1 ) = y
+            else if ( (v < u) .neqv. (v < w) ) then
+                x = v
+                y = array(size( array, kind=int_size )/2-1)
+                array(size( array, kind=int_size )/2-1) = &
+                    array(size( array, kind=int_size )-1)
+                array( size( array, kind=int_size )-1 ) = y
+            else
+                x = w
+            end if
+! Partition the array
+            i = -1_int_size
+            do j = 0_int_size, size(array, kind=int_size)-2
+                if ( array(j) <= x ) then
+                    i = i + 1
+                    y = array(i)
+                    array(i) = array(j)
+                    array(j) = y
+                end if
+            end do
+            y = array(i+1)
+            array(i+1) = array(size(array, kind=int_size)-1)
+            array(size(array, kind=int_size)-1) = y
+            index = i + 1
+
+        end subroutine partition
+
+        pure subroutine insertion_sort( array )
+! Bog standard insertion sort.
+            real(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, j
+            real(${k1}$)      :: key
+
+            do j=1_int_size, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 )
+                    if ( array(i) <= key ) exit
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+        pure subroutine heap_sort( array )
+! A bog standard heap sort
+            real(${k1}$), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, heap_size
+            real(${k1}$)      :: y
+
+            heap_size = size( array, kind=int_size )
+! Build the max heap
+            do i = (heap_size-2)/2_int_size, 0_int_size, -1_int_size
+                call max_heapify( array, i, heap_size )
+            end do
+            do i = heap_size-1, 1_int_size, -1_int_size
+! Swap the first element with the current final element
+                y = array(0)
+                array(0) = array(i)
+                array(i) = y
+! Sift down using max_heapify
+                call max_heapify( array, 0_int_size, i )
+            end do
+
+        end subroutine heap_sort
+
+        pure recursive subroutine max_heapify( array, i, heap_size )
+! Transform the array into a max heap
+            real(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(in) :: i, heap_size
+
+            integer(int_size) :: l, r, largest
+            real(${k1}$)      :: y
+
+            largest = i
+            l = 2_int_size * i + 1_int_size
+            r = l + 1_int_size
+            if ( l < heap_size ) then
+                if ( array(l) > array(largest) ) largest = l
+            end if
+            if ( r < heap_size ) then
+                if ( array(r) > array(largest) ) largest = r
+            end if
+            if ( largest /= i ) then
+                y = array(i)
+                array(i) = array(largest)
+                array(largest) = y
+                call max_heapify( array, largest, heap_size )
+            end if
+
+        end subroutine max_heapify
+
+    end subroutine ${k1}$_sort
+
+#:endfor
+
+
+    pure module subroutine char_sort( array )
+! `char_sort( array )` sorts the input `ARRAY` of type `CHARACTER(*)`
+! using a hybrid sort based on the `introsort` of David Musser. As with
+! `introsort`, `char_sort( array )` is an unstable hybrid comparison
+! algorithm using `quicksort` for the main body of the sort tree,
+! supplemented by `insertion sort` for the outer brances, but if
+! `quicksort` is converging too slowly the algorithm resorts
+! to `heapsort`. The algorithm is of order O(N Ln(N)) for all inputs.
+! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+! behavior is typically small compared to other sorting algorithms.
+
+        character(len=*), intent(inout) :: array(0:)
+
+        integer(int32) :: depth_limit
+
+        depth_limit = 2 * int( floor( log( real( size( array, kind=int64 ),  &
+                                                 kind=dp) ) / log(2.0_dp) ), &
+                               kind=int32 )
+        call introsort(array, depth_limit)
+
+    contains
+
+        pure recursive subroutine introsort( array, depth_limit )
+! It devolves to `insertionsort` if the remaining number of elements
+! is fewer than or equal to `INSERT_SIZE`, `heapsort` if the completion
+! of the `quicksort` is too slow as estimated from `DEPTH_LIMIT`,
+! otherwise sorting is done by a `quicksort`.
+            character(len=*), intent(inout) :: array(0:)
+            integer(int32), intent(in)      :: depth_limit
+
+            integer(int_size), parameter :: insert_size = 16_int_size
+            integer(int_size)            :: index
+
+            if ( size(array, kind=int_size) <= insert_size ) then
+                ! May be best at the end of SORT processing the whole array
+                ! See Musser, D.R., “Introspective Sorting and Selection
+                ! Algorithms,” Software—Practice and Experience, Vol. 27(8),
+                ! 983–993 (August 1997).
+
+                call insertion_sort( array )
+            else if ( depth_limit == 0 ) then
+                call heap_sort( array )
+            else
+                call partition( array, index )
+                call introsort( array(0:index-1), depth_limit-1 )
+                call introsort( array(index+1:), depth_limit-1 )
+            end if
+
+        end subroutine introsort
+
+
+        pure subroutine partition( array, index )
+! quicksort partition using median of three.
+            character(len=*), intent(inout) :: array(0:)
+            integer(int_size), intent(out)  :: index
+
+            integer(int_size)         :: i, j
+            character(len=len(array)) :: u, v, w, x, y
+
+! Determine median of three and exchange it with the end.
+            u = array( 0 )
+            v = array( size(array, kind=int_size)/2-1 )
+            w = array( size(array, kind=int_size)-1 )
+            if ( (u > v) .neqv. (u > w) ) then
+                x = u
+                y = array(0)
+                array(0) = array( size( array, kind=int_size ) - 1 )
+                array( size( array, kind=int_size ) - 1 ) = y
+            else if ( (v < u) .neqv. (v < w) ) then
+                x = v
+                y = array(size( array, kind=int_size )/2-1)
+                array( size( array, kind=int_size )/2-1 ) = &
+                    array( size( array, kind=int_size )-1 )
+                array( size( array, kind=int_size )-1 ) = y
+            else
+                x = w
+            end if
+! Partition the array.
+            i = -1_int_size
+            do j = 0_int_size, size(array, kind=int_size)-2
+                if ( array(j) <= x ) then
+                    i = i + 1
+                    y = array(i)
+                    array(i) = array(j)
+                    array(j) = y
+                end if
+            end do
+            y = array(i+1)
+            array(i+1) = array(size(array, kind=int_size)-1)
+            array(size(array, kind=int_size)-1) = y
+            index = i + 1
+
+        end subroutine partition
+
+        pure subroutine insertion_sort( array )
+! Bog standard insertion sort.
+            character(len=*), intent(inout) :: array(0:)
+
+            integer(int_size)         :: i, j
+            character(len=len(array)) :: key
+
+            do j=1_int_size, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 )
+                    if ( array(i) <= key ) exit
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+        pure subroutine heap_sort( array )
+! A bog standard heap sort
+            character(len=*), intent(inout) :: array(0:)
+
+            integer(int_size)         :: i, heap_size
+            character(len=len(array)) :: y
+
+            heap_size = size( array, kind=int_size )
+! Build the max heap
+            do i = (heap_size-2)/2_int_size, 0_int_size, -1_int_size
+                call max_heapify( array, i, heap_size )
+            end do
+            do i = heap_size-1, 1_int_size, -1_int_size
+! Swap the first element with the current final element
+                y = array(0)
+                array(0) = array(i)
+                array(i) = y
+! Sift down using max_heapify
+                call max_heapify( array, 0_int_size, i )
+            end do
+
+        end subroutine heap_sort
+
+        pure recursive subroutine max_heapify( array, i, heap_size )
+! Transform the array into a max heap
+            character(len=*), intent(inout) :: array(0:)
+            integer(int_size), intent(in)   :: i, heap_size
+
+            integer(int_size)         :: l, r, largest
+            character(len=len(array)) :: y
+
+            largest = i
+            l = 2_int_size * i + 1_int_size
+            r = l + 1_int_size
+            if ( l < heap_size ) then
+                if ( array(l) > array(largest) ) largest = l
+            end if
+            if ( r < heap_size ) then
+                if ( array(r) > array(largest) ) largest = r
+            end if
+            if ( largest /= i ) then
+                y = array(i)
+                array(i) = array(largest)
+                array(largest) = y
+                call max_heapify( array, largest, heap_size )
+            end if
+
+        end subroutine max_heapify
+
+    end subroutine char_sort
+
+    pure module subroutine string_sort( array )
+! `string_sort( array )` sorts the input `ARRAY` of type `STRING_TyPE`
+! using a hybrid sort based on the `introsort` of David Musser. As with
+! `introsort`, `string_sort( array )` is an unstable hybrid comparison
+! algorithm using `quicksort` for the main body of the sort tree,
+! supplemented by `insertion sort` for the outer brances, but if
+! `quicksort` is converging too slowly the algorithm resorts
+! to `heapsort`. The algorithm is of order O(N Ln(N)) for all inputs.
+! Because it relies on `quicksort`, the coefficient of the O(N Ln(N))
+! behavior is typically small compared to other sorting algorithms.
+
+        type(string_type), intent(inout) :: array(0:)
+
+        integer(int32) :: depth_limit
+
+        depth_limit = 2 * int( floor( log( real( size( array, kind=int64 ),  &
+                                                 kind=dp) ) / log(2.0_dp) ), &
+                               kind=int32 )
+        call introsort(array, depth_limit)
+
+    contains
+
+        pure recursive subroutine introsort( array, depth_limit )
+! It devolves to `insertionsort` if the remaining number of elements
+! is fewer than or equal to `INSERT_SIZE`, `heapsort` if the completion
+! of the `quicksort` is too slow as estimated from `DEPTH_LIMIT`,
+! otherwise sorting is done by a `quicksort`.
+            type(string_type), intent(inout) :: array(0:)
+            integer(int32), intent(in)      :: depth_limit
+
+            integer(int_size), parameter :: insert_size = 16_int_size
+            integer(int_size)            :: index
+
+            if ( size(array, kind=int_size) <= insert_size ) then
+                ! May be best at the end of SORT processing the whole array
+                ! See Musser, D.R., “Introspective Sorting and Selection
+                ! Algorithms,” Software—Practice and Experience, Vol. 27(8),
+                ! 983–993 (August 1997).
+
+                call insertion_sort( array )
+            else if ( depth_limit == 0 ) then
+                call heap_sort( array )
+            else
+                call partition( array, index )
+                call introsort( array(0:index-1), depth_limit-1 )
+                call introsort( array(index+1:), depth_limit-1 )
+            end if
+
+        end subroutine introsort
+
+
+        pure subroutine partition( array, index )
+! quicksort partition using median of three.
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(out)  :: index
+
+            integer(int_size) :: i, j
+            type(string_type) :: u, v, w, x, y
+
+! Determine median of three and exchange it with the end.
+            u = array( 0 )
+            v = array( size(array, kind=int_size)/2-1 )
+            w = array( size(array, kind=int_size)-1 )
+            if ( (u > v) .neqv. (u > w) ) then
+                x = u
+                y = array(0)
+                array(0) = array( size( array, kind=int_size ) - 1 )
+                array( size( array, kind=int_size ) - 1 ) = y
+            else if ( (v < u) .neqv. (v < w) ) then
+                x = v
+                y = array(size( array, kind=int_size )/2-1)
+                array( size( array, kind=int_size )/2-1 ) = &
+                    array( size( array, kind=int_size )-1 )
+                array( size( array, kind=int_size )-1 ) = y
+            else
+                x = w
+            end if
+! Partition the array.
+            i = -1_int_size
+            do j = 0_int_size, size(array, kind=int_size)-2
+                if ( array(j) <= x ) then
+                    i = i + 1
+                    y = array(i)
+                    array(i) = array(j)
+                    array(j) = y
+                end if
+            end do
+            y = array(i+1)
+            array(i+1) = array(size(array, kind=int_size)-1)
+            array(size(array, kind=int_size)-1) = y
+            index = i + 1
+
+        end subroutine partition
+
+        pure subroutine insertion_sort( array )
+! Bog standard insertion sort.
+            type(string_type), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, j
+            type(string_type) :: key
+
+            do j=1_int_size, size(array, kind=int_size)-1
+                key = array(j)
+                i = j - 1
+                do while( i >= 0 )
+                    if ( array(i) <= key ) exit
+                    array(i+1) = array(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+            end do
+
+        end subroutine insertion_sort
+
+        pure subroutine heap_sort( array )
+! A bog standard heap sort
+            type(string_type), intent(inout) :: array(0:)
+
+            integer(int_size) :: i, heap_size
+            type(string_type) :: y
+
+            heap_size = size( array, kind=int_size )
+! Build the max heap
+            do i = (heap_size-2)/2_int_size, 0_int_size, -1_int_size
+                call max_heapify( array, i, heap_size )
+            end do
+            do i = heap_size-1, 1_int_size, -1_int_size
+! Swap the first element with the current final element
+                y = array(0)
+                array(0) = array(i)
+                array(i) = y
+! Sift down using max_heapify
+                call max_heapify( array, 0_int_size, i )
+            end do
+
+        end subroutine heap_sort
+
+        pure recursive subroutine max_heapify( array, i, heap_size )
+! Transform the array into a max heap
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(in)    :: i, heap_size
+
+            integer(int_size) :: l, r, largest
+            type(string_type) :: y
+
+            largest = i
+            l = 2_int_size * i + 1_int_size
+            r = l + 1_int_size
+            if ( l < heap_size ) then
+                if ( array(l) > array(largest) ) largest = l
+            end if
+            if ( r < heap_size ) then
+                if ( array(r) > array(largest) ) largest = r
+            end if
+            if ( largest /= i ) then
+                y = array(i)
+                array(i) = array(largest)
+                array(largest) = y
+                call max_heapify( array, largest, heap_size )
+            end if
+
+        end subroutine max_heapify
+
+    end subroutine string_sort
+
+end submodule stdlib_sorting_sort
diff --git a/src/stdlib_sorting_sort_index.fypp b/src/stdlib_sorting_sort_index.fypp
new file mode 100644
index 000000000..222c2cd97
--- /dev/null
+++ b/src/stdlib_sorting_sort_index.fypp
@@ -0,0 +1,1688 @@
+#! Integer kinds to be considered during templating
+#:set INT_KINDS = ["int8", "int16", "int32", "int64"]
+
+#! Real kinds to be considered during templating
+#:set REAL_KINDS = ["sp", "dp", "qp"]
+
+!! Licensing:
+!!
+!! This file is subjec† both to the Fortran Standard Library license, and
+!! to additional licensing requirements as it contains translations of
+!! other software.
+!!
+!! The Fortran Standard Library, including this file, is distributed under
+!! the MIT license that should be included with the library's distribution.
+!!
+!!   Copyright (c) 2021 Fortran stdlib developers
+!!
+!!   Permission is hereby granted, free of charge, to any person obtaining a
+!!   copy of this software and associated documentation files (the
+!!   "Software"),  to deal in the Software without restriction, including
+!!   without limitation the rights to use, copy, modify, merge, publish,
+!!   distribute, sublicense, and/or sellcopies of the Software, and to permit
+!!   persons to whom the Software is furnished to do so, subject to the
+!!   following conditions:
+!!
+!!   The above copyright notice and this permission notice shall be included
+!!   in all copies or substantial portions of the Software.
+!!
+!!   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+!!   OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+!!   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+!!   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+!!   CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+!!   TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+!!   SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+!!
+!! The generic subroutine, `SORT_INDEX`, is substantially a translation to
+!! Fortran 2008 of the `"Rust" sort` sorting routines in
+!! [`slice.rs`](https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs)
+!! The `rust sort` implementation is distributed with the header:
+!!
+!!   Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
+!!   file at the top-level directory of this distribution and at
+!!   http://rust-lang.org/COPYRIGHT.
+!!
+!!   Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+!!   http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+!!   <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+!!   option. This file may not be copied, modified, or distributed
+!!   except according to those terms.
+!!
+!! so the license for the original`slice.rs` code is compatible with the use
+!! of modified versions of the code in the Fortran Standard Library under
+!! the MIT license.
+
+submodule(stdlib_sorting) stdlib_sorting_sort_index
+
+    implicit none
+
+contains
+
+#:for k1 in INT_KINDS
+
+    module subroutine ${k1}$_sort_index( array, index, work, iwork, reverse )
+! A modification of `${k1}$_ord_sort` to return an array of indices that
+! would perform a stable sort of the `ARRAY` as input, and also sort `ARRAY`
+! as desired. The indices by default
+! correspond to a non-decreasing sort, but if the optional argument
+! `REVERSE` is present with a value of `.TRUE.` the indices correspond to
+! a non-increasing sort. The logic of the determination of indexing largely
+! follows the `"Rust" sort` found in `slice.rs`:
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version is a simplification of the Timsort algorithm described
+! in https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size, with
+! an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original listsort.txt, and the optional `work` and `iwork` arraya to be
+! used as scratch memory.
+
+        integer(${k1}$), intent(inout)             :: array(0:)
+        integer(int_size), intent(inout)           :: index(0:)
+        integer(${k1}$), intent(inout), optional   :: work(0:)
+        integer(int_size), intent(inout), optional :: iwork(0:)
+        logical, intent(in), optional              :: reverse
+
+        integer(int_size) :: array_size, i, stat
+        integer(${k1}$), allocatable :: buf(:)
+        integer(int_size), allocatable :: ibuf(:)
+
+        array_size = size(array, kind=int_size)
+
+        do i = 0, array_size-1
+            index(i) = i+1
+        end do
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+! If necessary allocate buffers to serve as scratch memory.
+        if ( present(work) ) then
+            if ( present(iwork) ) then
+                call merge_sort( array, index, work, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, work, ibuf )
+            end if
+        else
+            allocate( buf(0:array_size/2-1), stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of array buffer failed."
+            if ( present(iwork) ) then
+                call merge_sort( array, index, buf, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, buf, ibuf )
+            end if
+        end if
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array, index )
+! Sorts `ARRAY` using an insertion sort, while maintaining consistency in
+! location of the indices in `INDEX` to the elements of `ARRAY`.
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size) :: i, j, key_index
+            integer(${k1}$) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                key_index = index(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    index(i+1) = index(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+                index(i+1) = key_index
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array, index )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+! Consistency of the indices in `index` with the elements of `array`
+! are maintained.
+
+            integer(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(${k1}$) :: tmp
+            integer(int_size) :: i, tmp_index
+
+            tmp = array(0)
+            tmp_index = index(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+                index(i-1) = index(i)
+            end do find_hole
+            array(i-1) = tmp
+            index(i-1) = tmp_index
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, index, buf, ibuf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case. Consistency of the indices in `index` with the elements of
+! `array` are maintained.
+
+            integer(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+            integer(${k1}$), intent(inout)   :: buf(0:)
+            integer(int_size), intent(inout) :: ibuf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type) :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least this
+! many elements. Slices of up to this length are sorted using insertion sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array, index )
+                return
+            end if
+
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish), &
+                                            index(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish), index(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                     right % base + right % len - 1 ), &
+                                left % len, buf, &
+                                index( left % base: &
+                                     right % base + right % len - 1 ), ibuf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf, index, ibuf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            integer(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(in)    :: mid
+            integer(${k1}$), intent(inout)   :: buf(0:)
+            integer(int_size), intent(inout) :: index(0:)
+            integer(int_size), intent(inout) :: ibuf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                ibuf(0:mid-1) = index(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        index(k) = ibuf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        index(k) = index(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            index(k+1:) = ibuf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
+                ibuf(0:array_len-mid-1) = index(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        index(k) = ibuf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        index(k) = index(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            index(0:k-1) = ibuf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array, index )
+! Reverse a segment of an array in place
+            integer(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size) :: itemp, lo, hi
+            integer(${k1}$) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                itemp = index(lo)
+                index(lo) = index(hi)
+                index(hi) = itemp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine ${k1}$_sort_index
+
+#:endfor
+
+
+#:for k1 in REAL_KINDS
+
+    module subroutine ${k1}$_sort_index( array, index, work, iwork, reverse )
+! A modification of `${k1}$_ord_sort` to return an array of indices that
+! would perform a stable sort of the `ARRAY` as input, and also sort `ARRAY`
+! as desired. The indices by default
+! correspond to a non-decreasing sort, but if the optional argument
+! `REVERSE` is present with a value of `.TRUE.` the indices correspond to
+! a non-increasing sort. The logic of the determination of indexing largely
+! follows the `"Rust" sort` found in `slice.rs`:
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version is a simplification of the Timsort algorithm described
+! in https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size, with
+! an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original listsort.txt, and the optional `work` and `iwork` arraya to be
+! used as scratch memory.
+
+        real(${k1}$), intent(inout)             :: array(0:)
+        integer(int_size), intent(inout)           :: index(0:)
+        real(${k1}$), intent(inout), optional   :: work(0:)
+        integer(int_size), intent(inout), optional :: iwork(0:)
+        logical, intent(in), optional              :: reverse
+
+        integer(int_size) :: array_size, i, stat
+        real(${k1}$), allocatable :: buf(:)
+        integer(int_size), allocatable :: ibuf(:)
+
+        array_size = size(array, kind=int_size)
+
+        do i = 0, array_size-1
+            index(i) = i+1
+        end do
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+! If necessary allocate buffers to serve as scratch memory.
+        if ( present(work) ) then
+            if ( present(iwork) ) then
+                call merge_sort( array, index, work, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, work, ibuf )
+            end if
+        else
+            allocate( buf(0:array_size/2-1), stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of array buffer failed."
+            if ( present(iwork) ) then
+                call merge_sort( array, index, buf, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, buf, ibuf )
+            end if
+        end if
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array, index )
+! Sorts `ARRAY` using an insertion sort, while maintaining consistency in
+! location of the indices in `INDEX` to the elements of `ARRAY`.
+            real(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size) :: i, j, key_index
+            real(${k1}$) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                key_index = index(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    index(i+1) = index(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+                index(i+1) = key_index
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array, index )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+! Consistency of the indices in `index` with the elements of `array`
+! are maintained.
+
+            real(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            real(${k1}$) :: tmp
+            integer(int_size) :: i, tmp_index
+
+            tmp = array(0)
+            tmp_index = index(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+                index(i-1) = index(i)
+            end do find_hole
+            array(i-1) = tmp
+            index(i-1) = tmp_index
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, index, buf, ibuf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case. Consistency of the indices in `index` with the elements of
+! `array` are maintained.
+
+            real(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+            real(${k1}$), intent(inout)   :: buf(0:)
+            integer(int_size), intent(inout) :: ibuf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type) :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least this
+! many elements. Slices of up to this length are sorted using insertion sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array, index )
+                return
+            end if
+
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish), &
+                                            index(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish), index(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                     right % base + right % len - 1 ), &
+                                left % len, buf, &
+                                index( left % base: &
+                                     right % base + right % len - 1 ), ibuf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf, index, ibuf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            real(${k1}$), intent(inout)   :: array(0:)
+            integer(int_size), intent(in)    :: mid
+            real(${k1}$), intent(inout)   :: buf(0:)
+            integer(int_size), intent(inout) :: index(0:)
+            integer(int_size), intent(inout) :: ibuf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                ibuf(0:mid-1) = index(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        index(k) = ibuf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        index(k) = index(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            index(k+1:) = ibuf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
+                ibuf(0:array_len-mid-1) = index(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        index(k) = ibuf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        index(k) = index(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            index(0:k-1) = ibuf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array, index )
+! Reverse a segment of an array in place
+            real(${k1}$), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size) :: itemp, lo, hi
+            real(${k1}$) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                itemp = index(lo)
+                index(lo) = index(hi)
+                index(hi) = itemp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine ${k1}$_sort_index
+
+#:endfor
+
+    module subroutine char_sort_index( array, index, work, iwork, reverse )
+! A modification of `char_ord_sort` to return an array of indices that
+! would perform a stable sort of the `ARRAY` as input, and also sort `ARRAY`
+! as desired. The indices by default
+! correspond to a non-decreasing sort, but if the optional argument
+! `REVERSE` is present with a value of `.TRUE.` the indices correspond to
+! a non-increasing sort. The logic of the determination of indexing largely
+! follows the `"Rust" sort` found in `slice.rs`:
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version is a simplification of the Timsort algorithm described
+! in https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size, with
+! an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original listsort.txt, and the optional `work` and `iwork` arraya to be
+! used as scratch memory.
+
+        character(len=*), intent(inout)                      :: array(0:)
+        integer(int_size), intent(inout)                     :: index(0:)
+        character(len=len(array)), intent(inout), optional   :: work(0:)
+        integer(int_size), intent(inout), optional           :: iwork(0:)
+        logical, intent(in), optional                        :: reverse
+
+        integer(int_size) :: array_size, i, stat
+        character(len=:), allocatable :: buf(:)
+        integer(int_size), allocatable :: ibuf(:)
+
+        array_size = size(array, kind=int_size)
+
+        do i = 0, array_size-1
+            index(i) = i+1
+        end do
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+! If necessary allocate buffers to serve as scratch memory.
+        if ( present(work) ) then
+            if ( present(iwork) ) then
+                call merge_sort( array, index, work, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, work, ibuf )
+            end if
+        else
+            allocate( character(len=len(array)) :: buf(0:array_size/2-1), &
+                      stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of array buffer failed."
+            if ( present(iwork) ) then
+                call merge_sort( array, index, buf, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, buf, ibuf )
+            end if
+        end if
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array, index )
+! Sorts `ARRAY` using an insertion sort, while maintaining consistency in
+! location of the indices in `INDEX` to the elements of `ARRAY`.
+            character(len=*), intent(inout)  :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size)         :: i, j, key_index
+            character(len=len(array)) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                key_index = index(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    index(i+1) = index(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+                index(i+1) = key_index
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array, index )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+! Consistency of the indices in `index` with the elements of `array`
+! are maintained.
+
+            character(len=*), intent(inout)  :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            character(len=len(array)) :: tmp
+            integer(int_size) :: i, tmp_index
+
+            tmp = array(0)
+            tmp_index = index(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+                index(i-1) = index(i)
+            end do find_hole
+            array(i-1) = tmp
+            index(i-1) = tmp_index
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, index, buf, ibuf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case. Consistency of the indices in `index` with the elements of
+! `array` are maintained.
+
+            character(len=*), intent(inout)          :: array(0:)
+            integer(int_size), intent(inout)         :: index(0:)
+            character(len=len(array)), intent(inout) :: buf(0:)
+            integer(int_size), intent(inout)         :: ibuf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type) :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least this
+! many elements. Slices of up to this length are sorted using insertion sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array, index )
+                return
+            end if
+
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish), &
+                                            index(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish), index(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                     right % base + right % len - 1 ), &
+                                left % len, buf, &
+                                index( left % base: &
+                                     right % base + right % len - 1 ), ibuf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf, index, ibuf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            character(len=*), intent(inout)          :: array(0:)
+            integer(int_size), intent(in)            :: mid
+            character(len=len(array)), intent(inout) :: buf(0:)
+            integer(int_size), intent(inout)         :: index(0:)
+            integer(int_size), intent(inout)         :: ibuf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                ibuf(0:mid-1) = index(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        index(k) = ibuf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        index(k) = index(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            index(k+1:) = ibuf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
+                ibuf(0:array_len-mid-1) = index(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        index(k) = ibuf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        index(k) = index(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            index(0:k-1) = ibuf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array, index )
+! Reverse a segment of an array in place
+            character(len=*), intent(inout)  :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size)         :: itemp, lo, hi
+            character(len=len(array)) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                itemp = index(lo)
+                index(lo) = index(hi)
+                index(hi) = itemp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine char_sort_index
+
+    module subroutine string_sort_index( array, index, work, iwork, reverse )
+! A modification of `string_ord_sort` to return an array of indices that
+! would perform a stable sort of the `ARRAY` as input, and also sort `ARRAY`
+! as desired. The indices by default
+! correspond to a non-decreasing sort, but if the optional argument
+! `REVERSE` is present with a value of `.TRUE.` the indices correspond to
+! a non-increasing sort. The logic of the determination of indexing largely
+! follows the `"Rust" sort` found in `slice.rs`:
+! https://github.com/rust-lang/rust/blob/90eb44a5897c39e3dff9c7e48e3973671dcd9496/src/liballoc/slice.rs#L2159
+! The Rust version is a simplification of the Timsort algorithm described
+! in https://svn.python.org/projects/python/trunk/Objects/listsort.txt, as
+! it drops both the use of 'galloping' to identify bounds of regions to be
+! sorted and the estimation of the optimal `run size`. However it remains
+! a hybrid sorting algorithm combining an iterative Merge sort controlled
+! by a stack of `RUNS` identified by regions of uniformly decreasing or
+! non-decreasing sequences that may be expanded to a minimum run size, with
+! an insertion sort.
+!
+! Note the Fortran implementation simplifies the logic as it only has to
+! deal with Fortran arrays of intrinsic types and not the full generality
+! of Rust's arrays and lists for arbitrary types. It also adds the
+! estimation of the optimal `run size` as suggested in Tim Peters'
+! original listsort.txt, and the optional `work` and `iwork` arraya to be
+! used as scratch memory.
+
+        type(string_type), intent(inout)           :: array(0:)
+        integer(int_size), intent(inout)           :: index(0:)
+        type(string_type), intent(inout), optional :: work(0:)
+        integer(int_size), intent(inout), optional :: iwork(0:)
+        logical, intent(in), optional              :: reverse
+
+        integer(int_size)              :: array_size, i, stat
+        type(string_type), allocatable :: buf(:)
+        integer(int_size), allocatable :: ibuf(:)
+
+        array_size = size(array, kind=int_size)
+
+        do i = 0, array_size-1
+            index(i) = i+1
+        end do
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+! If necessary allocate buffers to serve as scratch memory.
+        if ( present(work) ) then
+            if ( present(iwork) ) then
+                call merge_sort( array, index, work, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, work, ibuf )
+            end if
+        else
+            allocate( buf(0:array_size/2-1), stat=stat )
+            if ( stat /= 0 ) error stop "Allocation of array buffer failed."
+            if ( present(iwork) ) then
+                call merge_sort( array, index, buf, iwork )
+            else
+                allocate( ibuf(0:array_size/2-1), stat=stat )
+                if ( stat /= 0 ) error stop "Allocation of index buffer failed."
+                call merge_sort( array, index, buf, ibuf )
+            end if
+        end if
+
+        if ( present(reverse) ) then
+            if ( reverse ) then
+                call reverse_segment( array, index )
+            end if
+        end if
+
+    contains
+
+        pure function calc_min_run( n ) result(min_run)
+!! Returns the minimum length of a run from 32-63 so that N/MIN_RUN is
+!! less than or equal to a power of two. See
+!! https://svn.python.org/projects/python/trunk/Objects/listsort.txt
+            integer(int_size)             :: min_run
+            integer(int_size), intent(in) :: n
+
+            integer(int_size) :: num, r
+
+            num = n
+            r = 0_int_size
+
+            do while( num >= 64 )
+                r = ior( r, iand(num, 1_int_size) )
+                num = ishft(num, -1_int_size)
+            end do
+            min_run = num + r
+
+        end function calc_min_run
+
+
+        pure subroutine insertion_sort( array, index )
+! Sorts `ARRAY` using an insertion sort, while maintaining consistency in
+! location of the indices in `INDEX` to the elements of `ARRAY`.
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size) :: i, j, key_index
+            type(string_type) :: key
+
+            do j=1, size(array, kind=int_size)-1
+                key = array(j)
+                key_index = index(j)
+                i = j - 1
+                do while( i >= 0 .and. array(i) > key )
+                    array(i+1) = array(i)
+                    index(i+1) = index(i)
+                    i = i - 1
+                end do
+                array(i+1) = key
+                index(i+1) = key_index
+            end do
+
+        end subroutine insertion_sort
+
+
+        pure function collapse( runs ) result ( r )
+! Examine the stack of runs waiting to be merged, identifying adjacent runs
+! to be merged until the stack invariants are restablished:
+!
+! 1. len(-3) > len(-2) + len(-1)
+! 2. len(-2) > len(-1)
+
+            integer(int_size) :: r
+            type(run_type), intent(in), target :: runs(0:)
+
+            integer(int_size) :: n
+            logical :: test
+
+            n = size(runs, kind=int_size)
+            test = .false.
+            if (n >= 2) then
+                if ( runs( n-1 ) % base == 0 .or. &
+                     runs( n-2 ) % len <= runs(n-1) % len ) then
+                    test = .true.
+                else if ( n >= 3 ) then ! X exists
+                    if ( runs(n-3) % len <= &
+                         runs(n-2) % len + runs(n-1) % len ) then
+                        test = .true.
+!               |X| <= |Y| + |Z| => will need to merge due to rho1 or rho2
+                    else if( n >= 4 ) then
+                        if ( runs(n-4) % len <= &
+                             runs(n-3) % len + runs(n-2) % len ) then
+                            test = .true.
+!               |W| <= |X| + |Y| => will need to merge due to rho1 or rho3
+                        end if
+                    end if
+                end if
+            end if
+            if ( test ) then
+! By default merge Y & Z, rho2 or rho3
+                if ( n >= 3 ) then
+                    if ( runs(n-3) % len < runs(n-1) % len ) then
+                        r = n - 3
+! |X| < |Z| => merge X & Y, rho1
+                        return
+                    end if
+                end if
+                r = n - 2
+! |Y| <= |Z| => merge Y & Z, rho4
+                return
+            else
+                r = -1
+            end if
+
+        end function collapse
+
+
+        pure subroutine insert_head( array, index )
+! Inserts `array(0)` into the pre-sorted sequence `array(1:)` so that the
+! whole `array(0:)` becomes sorted, copying the first element into
+! a temporary variable, iterating until the right place for it is found.
+! copying every traversed element into the slot preceding it, and finally,
+! copying data from the temporary variable into the resulting hole.
+! Consistency of the indices in `index` with the elements of `array`
+! are maintained.
+
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            type(string_type) :: tmp
+            integer(int_size) :: i, tmp_index
+
+            tmp = array(0)
+            tmp_index = index(0)
+            find_hole: do i=1, size(array, kind=int_size)-1
+                if ( array(i) >= tmp ) exit find_hole
+                array(i-1) = array(i)
+                index(i-1) = index(i)
+            end do find_hole
+            array(i-1) = tmp
+            index(i-1) = tmp_index
+
+        end subroutine insert_head
+
+
+        subroutine merge_sort( array, index, buf, ibuf )
+! The Rust merge sort borrows some (but not all) of the ideas from TimSort,
+! which is described in detail at
+! (http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
+!
+! The algorithm identifies strictly descending and non-descending
+! subsequences, which are called natural runs. Where these runs are less
+! than a minimum run size they are padded by adding additional samples
+! using an insertion sort. The merge process is driven by a stack of
+! pending unmerged runs. Each newly found run is pushed onto the stack,
+! and then pairs of adjacentd runs are merged until these two invariants
+! are satisfied:
+!
+! 1. for every `i` in `1..size(runs)-1`: `runs(i - 1)%len > runs(i)%len`
+! 2. for every `i` in `2..size(runs)-1`: `runs(i - 2)%len >
+!    runs(i - 1)%len + runs(i)%len`
+!
+! The invariants ensure that the total running time is `O(n log n)`
+! worst-case. Consistency of the indices in `index` with the elements of
+! `array` are maintained.
+
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+            type(string_type), intent(inout) :: buf(0:)
+            integer(int_size), intent(inout) :: ibuf(0:)
+
+            integer(int_size) :: array_size, finish, min_run, r, r_count, &
+                start
+            type(run_type)    :: runs(0:max_merge_stack-1), left, right
+
+            array_size = size(array, kind=int_size)
+
+! Very short runs are extended using insertion sort to span at least this
+! many elements. Slices of up to this length are sorted using insertion sort.
+            min_run = calc_min_run( array_size )
+
+            if ( array_size <= min_run ) then
+                if ( array_size >= 2 ) call insertion_sort( array, index )
+                return
+            end if
+
+
+! Following Rust sort, natural runs in `array` are identified by traversing
+! it backwards. By traversing it backward, merges more often go in the
+! opposite direction (forwards). According to developers of Rust sort,
+! merging forwards is slightly faster than merging backwards. Therefore
+! identifying runs by traversing backwards should improve performance.
+            r_count = 0
+            finish = array_size - 1
+            do while ( finish >= 0 )
+! Find the next natural run, and reverse it if it's strictly descending.
+                start = finish
+                if ( start > 0 ) then
+                    start = start - 1
+                    if ( array(start+1) < array(start) ) then
+                        Descending: do while ( start > 0 )
+                            if ( array(start) >= array(start-1) ) &
+                                exit Descending
+                            start = start - 1
+                        end do Descending
+                        call reverse_segment( array(start:finish), &
+                                            index(start:finish) )
+                    else
+                        Ascending: do while( start > 0 )
+                            if ( array(start) < array(start-1) ) exit Ascending
+                            start = start - 1
+                        end do Ascending
+                    end if
+                end if
+
+! If the run is too short insert some more elements using an insertion sort.
+                Insert: do while ( start > 0 )
+                    if ( finish - start >= min_run - 1 ) exit Insert
+                    start = start - 1
+                    call insert_head( array(start:finish), index(start:finish) )
+                end do Insert
+                if ( start == 0 .and. finish == array_size - 1 ) return
+
+                runs(r_count) = run_type( base = start, &
+                                          len = finish - start + 1 )
+                finish = start-1
+                r_count = r_count + 1
+
+! Determine whether pairs of adjacent runs need to be merged to satisfy
+! the invariants, and, if so, merge them.
+                Merge_loop: do
+                    r = collapse( runs(0:r_count - 1) )
+                    if ( r < 0 .or. r_count <= 1 ) exit Merge_loop
+                    left = runs( r + 1 )
+                    right = runs( r )
+                    call merge( array( left % base: &
+                                     right % base + right % len - 1 ), &
+                                left % len, buf, &
+                                index( left % base: &
+                                     right % base + right % len - 1 ), ibuf )
+
+                    runs(r) = run_type( base = left % base, &
+                                        len = left % len + right % len )
+                    if ( r == r_count - 3 ) runs(r+1) = runs(r+2)
+                    r_count = r_count - 1
+
+                end do Merge_loop
+            end do
+            if ( r_count /= 1 ) &
+                error stop "MERGE_SORT completed without RUN COUNT == 1."
+
+        end subroutine merge_sort
+
+
+        pure subroutine merge( array, mid, buf, index, ibuf )
+! Merges the two non-decreasing runs `ARRAY(0:MID-1)` and `ARRAY(MID:)`
+! using `BUF` as temporary storage, and stores the merged runs into
+! `ARRAY(0:)`. `MID` must be > 0, and < `SIZE(ARRAY)-1`. Buffer `BUF`
+! must be long enough to hold the shorter of the two runs.
+            type(string_type), intent(inout) :: array(0:)
+            integer(int_size), intent(in)    :: mid
+            type(string_type), intent(inout) :: buf(0:)
+            integer(int_size), intent(inout) :: index(0:)
+            integer(int_size), intent(inout) :: ibuf(0:)
+
+            integer(int_size) :: array_len, i, j, k
+
+            array_len = size(array, kind=int_size)
+
+! Merge first copies the shorter run into `buf`. Then, depending on which
+! run was shorter, it traces the copied run and the longer run forwards
+! (or backwards), comparing their next unprocessed elements and then
+! copying the lesser (or greater) one into `array`.
+
+            if ( mid <= array_len - mid ) then ! The left run is shorter.
+                buf(0:mid-1) = array(0:mid-1)
+                ibuf(0:mid-1) = index(0:mid-1)
+                i = 0
+                j = mid
+                merge_lower: do k = 0, array_len-1
+                    if ( buf(i) <= array(j) ) then
+                        array(k) = buf(i)
+                        index(k) = ibuf(i)
+                        i = i + 1
+                        if ( i >= mid ) exit merge_lower
+                    else
+                        array(k) = array(j)
+                        index(k) = index(j)
+                        j = j + 1
+                        if ( j >= array_len ) then
+                            array(k+1:) = buf(i:mid-1)
+                            index(k+1:) = ibuf(i:mid-1)
+                            exit merge_lower
+                        end if
+                    end if
+                end do merge_lower
+            else ! The right run is shorter
+                buf(0:array_len-mid-1) = array(mid:array_len-1)
+                ibuf(0:array_len-mid-1) = index(mid:array_len-1)
+                i = mid - 1
+                j = array_len - mid -1
+                merge_upper: do k = array_len-1, 0, -1
+                    if ( buf(j) >= array(i) ) then
+                        array(k) = buf(j)
+                        index(k) = ibuf(j)
+                        j = j - 1
+                        if ( j < 0 ) exit merge_upper
+                    else
+                        array(k) = array(i)
+                        index(k) = index(i)
+                        i = i - 1
+                        if ( i < 0 ) then
+                            array(0:k-1) = buf(0:j)
+                            index(0:k-1) = ibuf(0:j)
+                            exit merge_upper
+                        end if
+                    end if
+                end do merge_upper
+            end if
+        end subroutine merge
+
+
+        pure subroutine reverse_segment( array, index )
+! Reverse a segment of an array in place
+            type(string_type), intent(inout)  :: array(0:)
+            integer(int_size), intent(inout) :: index(0:)
+
+            integer(int_size) :: itemp, lo, hi
+            type(string_type) :: temp
+
+            lo = 0
+            hi = size( array, kind=int_size ) - 1
+            do while( lo < hi )
+                temp = array(lo)
+                array(lo) = array(hi)
+                array(hi) = temp
+                itemp = index(lo)
+                index(lo) = index(hi)
+                index(hi) = itemp
+                lo = lo + 1
+                hi = hi - 1
+            end do
+
+        end subroutine reverse_segment
+
+    end subroutine string_sort_index
+
+end submodule stdlib_sorting_sort_index
diff --git a/src/tests/CMakeLists.txt b/src/tests/CMakeLists.txt
index 288445de9..381226222 100644
--- a/src/tests/CMakeLists.txt
+++ b/src/tests/CMakeLists.txt
@@ -12,6 +12,7 @@ add_subdirectory(io)
 add_subdirectory(linalg)
 add_subdirectory(logger)
 add_subdirectory(optval)
+add_subdirectory(sorting)
 add_subdirectory(stats)
 add_subdirectory(string)
 add_subdirectory(system)
diff --git a/src/tests/Makefile.manual b/src/tests/Makefile.manual
index 773066b1c..57f663bef 100644
--- a/src/tests/Makefile.manual
+++ b/src/tests/Makefile.manual
@@ -6,6 +6,7 @@ all test clean:
 	$(MAKE) -f Makefile.manual --directory=io $@
 	$(MAKE) -f Makefile.manual --directory=logger $@
 	$(MAKE) -f Makefile.manual --directory=optval $@
+	$(MAKE) -f Makefile.manual --directory=sorting $@
 	$(MAKE) -f Makefile.manual --directory=quadrature $@
 	$(MAKE) -f Makefile.manual --directory=stats $@
 	$(MAKE) -f Makefile.manual --directory=string $@
diff --git a/src/tests/sorting/CMakeLists.txt b/src/tests/sorting/CMakeLists.txt
new file mode 100644
index 000000000..bb4b27896
--- /dev/null
+++ b/src/tests/sorting/CMakeLists.txt
@@ -0,0 +1,2 @@
+ADDTEST(sorting)
+
diff --git a/src/tests/sorting/Makefile.manual b/src/tests/sorting/Makefile.manual
new file mode 100644
index 000000000..39657c8f3
--- /dev/null
+++ b/src/tests/sorting/Makefile.manual
@@ -0,0 +1,3 @@
+PROGS_SRC = test_sorting.f90
+
+include ../Makefile.manual.test.mk
diff --git a/src/tests/sorting/test_sorting.f90 b/src/tests/sorting/test_sorting.f90
new file mode 100644
index 000000000..4025aef42
--- /dev/null
+++ b/src/tests/sorting/test_sorting.f90
@@ -0,0 +1,625 @@
+program test_sorting
+
+    use, intrinsic :: iso_fortran_env, only: &
+        compiler_version
+    use stdlib_kinds, only: int32, int64, dp, sp
+    use stdlib_sorting
+    use stdlib_string_type, only: string_type, assignment(=), operator(>), &
+        write(formatted)
+
+    implicit none
+
+    integer(int32), parameter :: test_size = 2_int32**20
+    integer(int32), parameter :: char_size = 26**4
+    integer(int32), parameter :: string_size = 26**3
+    integer(int32), parameter :: block_size = test_size/6
+    integer, parameter        :: repeat = 8
+
+    integer(int32) ::             &
+        blocks(0:test_size-1),    &
+        decrease(0:test_size-1),  &
+        identical(0:test_size-1), &
+        increase(0:test_size-1),  &
+        rand0(0:test_size-1),     &
+        rand1(0:test_size-1),     &
+        rand2(0:test_size-1),     &
+        rand3(0:test_size-1),     &
+        rand10(0:test_size-1)
+    character(len=4) ::               &
+        char_decrease(0:char_size-1), &
+        char_increase(0:char_size-1), &
+        char_rand(0:char_size-1)
+    type(string_type) ::                  &
+        string_decrease(0:string_size-1), &
+        string_increase(0:string_size-1), &
+        string_rand(0:string_size-1)
+
+    integer(int32)          :: dummy(0:test_size-1)
+    character(len=4)        :: char_dummy(0:char_size-1)
+    type(string_type)       :: string_dummy(0:string_size-1)
+    integer(int_size)       :: index(0:test_size-1)
+    integer(int32)          :: work(0:test_size/2-1)
+    character(len=4)        :: char_work(0:char_size/2-1)
+    type(string_type)       :: string_work(0:string_size/2-1)
+    integer(int_size)       :: iwork(0:test_size/2-1)
+    integer                 :: count, i, index1, index2, j, k, l, temp
+    real(sp)                :: arand, brand
+    character(*), parameter :: filename = 'test_sorting.txt'
+    integer                 :: lun
+    character(len=4)        :: char_temp
+    type(string_type)       :: string_temp
+
+! Create the test arrays
+    identical(:) = 10
+    do i=0, test_size-1
+        increase(i) = i
+        decrease(i) = test_size - 1 - i
+        call random_number( arand )
+        rand0(i) = int( floor( 4 * arand * test_size ), kind=int32 )
+        rand1(i) = int( floor( arand * test_size / 4 ), kind=int32 )
+    end do
+    blocks(:) = increase(:)
+    blocks(0:block_size-1) = increase(4*block_size:5*block_size-1)
+    blocks(block_size:2*block_size-1) = increase(0:block_size-1)
+    blocks(2*block_size:3*block_size-1) = increase(2*block_size:3*block_size-1)
+    blocks(3*block_size:4*block_size-1) = increase(block_size:2*block_size-1)
+    blocks(4*block_size:5*block_size-1) = increase(3*block_size:4*block_size-1)
+    rand2(:) = increase(:)
+    do i=0, test_size-1
+        call random_number( arand )
+        index1 = int( floor( arand * test_size ), kind=int32 )
+        temp = rand2(i)
+        rand2(i) = rand2(index1)
+        rand2(index1) = temp
+    end do
+    rand3(:) = increase(:)
+    do i=0, 2
+        call random_number( arand )
+        call random_number( brand )
+        index1 = int( floor( arand * test_size ), kind=int32 )
+        index2 = int( floor( brand * test_size ), kind=int32 )
+        temp = rand3(index1)
+        rand3(index1) = rand3(index2)
+        rand3(index2) = temp
+    end do
+    rand10(:) = increase(:)
+    do i=test_size-10, test_size-1
+        call random_number( arand )
+        rand10(i) = int( floor( arand * test_size ), kind=int32 )
+    end do
+
+    count = 0
+    do i=0, 25
+        do j=0, 25
+            do k=0, 25
+                do l=0, 25
+                    char_increase(count) = achar(97+i) // achar(97+j) // &
+                        achar(97+k) // achar(97+l)
+                    count = count + 1
+                end do
+            end do
+        end do
+    end do
+
+    do i=0, char_size-1
+        char_decrease(char_size-1-i) = char_increase(i)
+    end do
+
+    char_rand(:) = char_increase(:)
+    do i=0, char_size-1
+        call random_number( arand )
+        index1 = int( floor( arand * char_size ), kind=int32 )
+        char_temp = char_rand(i)
+        char_rand(i) = char_rand(index1)
+        char_rand(index1) = char_temp
+    end do
+
+    count = 0
+    do i=0, 25
+        do j=0, 25
+            do k=0, 25
+                string_increase(count) = achar(97+i) // achar(97+j) // &
+                    achar(97+k)
+                count = count + 1
+            end do
+        end do
+    end do
+
+    do i=0, string_size-1
+        string_decrease(string_size - 1 - i) = char_increase(i)
+    end do
+
+    string_rand(:) = string_increase(:)
+    do i=0, string_size-1
+        call random_number( arand )
+        index1 = int( floor( arand * string_size ), kind=int32 )
+        string_temp = string_rand(i)
+        string_rand(i) = string_rand(index1)
+        string_rand(index1) = string_temp
+    end do
+
+! Create and intialize file to report the results of the sortings
+    open( newunit=lun, file=filename, access='sequential', action='write', &
+        form='formatted', status='replace' )
+    write( lun, '(a)' ) trim(compiler_version())
+    write( lun, * )
+    write( lun, '("|    Type     | Elements |    Array Name   |    Method ' // &
+        '  |  Time (s) |")' )
+    write( lun, '("|-------------|----------|-----------------|-----------' // &
+        '--|-----------|")' )
+
+! test the sorting routines on the test arrays
+    call test_int_ord_sorts( )
+
+    call test_char_ord_sorts( )
+
+    call test_string_ord_sorts( )
+
+    call test_int_sorts( )
+
+    call test_char_sorts( )
+
+    call test_string_sorts( )
+
+    call test_int_sort_indexes( )
+
+    call test_char_sort_indexes( )
+
+    call test_string_sort_indexes( )
+
+contains
+
+    subroutine test_int_ord_sorts( )
+
+        call test_int_ord_sort( blocks, "Blocks" )
+        call test_int_ord_sort( decrease, "Decreasing" )
+        call test_int_ord_sort( identical, "Identical" )
+        call test_int_ord_sort( increase, "Increasing" )
+        call test_int_ord_sort( rand1, "Random dense" )
+        call test_int_ord_sort( rand2, "Random order" )
+        call test_int_ord_sort( rand0, "Random sparse" )
+        call test_int_ord_sort( rand3, "Random 3" )
+        call test_int_ord_sort( rand10, "Random 10" )
+
+    end subroutine test_int_ord_sorts
+
+
+    subroutine test_int_ord_sort( a, a_name )
+        integer(int32), intent(in) :: a(:)
+        character(*), intent(in)   :: a_name 
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            dummy = a
+            call system_clock( t0, rate )
+            call ord_sort( dummy, work )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_sort( dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "ORD_SORT did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a12, 2i7)') 'dummy(i-1:i) = ', dummy(i-1:i)
+        end if
+        write( lun, '("|     Integer |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            test_size, a_name, "Ord_Sort", tdiff/rate
+
+    end subroutine test_int_ord_sort
+
+    subroutine test_char_ord_sorts( )
+
+        call test_char_ord_sort( char_decrease, "Char. Decrease" )
+        call test_char_ord_sort( char_increase, "Char. Increase" )
+        call test_char_ord_sort( char_rand, "Char. Random" )
+
+    end subroutine test_char_ord_sorts
+
+    subroutine test_char_ord_sort( a, a_name )
+        character(len=4), intent(in) :: a(0:)
+        character(*), intent(in) :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            char_dummy = a
+            call system_clock( t0, rate )
+            call ord_sort( char_dummy, char_work )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_char_sort( char_dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "ORD_SORT did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a17, 2(1x,a4))') 'char_dummy(i-1:i) = ', char_dummy(i-1:i)
+        end if
+        write( lun, '("|   Character |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            char_size, a_name, "Ord_Sort", tdiff/rate
+
+    end subroutine test_char_ord_sort
+
+    subroutine test_string_ord_sorts( )
+
+        call test_string_ord_sort( string_decrease, "String Decrease" )
+        call test_string_ord_sort( string_increase, "String Increase" )
+        call test_string_ord_sort( string_rand, "String Random" )
+
+    end subroutine test_string_ord_sorts
+
+    subroutine test_string_ord_sort( a, a_name )
+        type(string_type), intent(in) :: a(0:)
+        character(*), intent(in)      :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            string_dummy = a
+            call system_clock( t0, rate )
+            call ord_sort( string_dummy, string_work )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_string_sort( string_dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "ORD_SORT did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a17, 2(1x,a4))') 'string_dummy(i-1:i) = ', &
+                string_dummy(i-1:i)
+        end if
+        write( lun, '("| String_type |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            string_size, a_name, "Ord_Sort", tdiff/rate
+
+    end subroutine test_string_ord_sort
+
+
+    subroutine test_int_sorts( )
+
+        call test_int_sort( blocks, "Blocks" )
+        call test_int_sort( decrease, "Decreasing" )
+        call test_int_sort( identical, "Identical" )
+        call test_int_sort( increase, "Increasing" )
+        call test_int_sort( rand1, "Random dense" )
+        call test_int_sort( rand2, "Random order" )
+        call test_int_sort( rand0, "Random sparse" )
+        call test_int_sort( rand3, "Random 3" )
+        call test_int_sort( rand10, "Random 10" )
+
+    end subroutine test_int_sorts
+
+    subroutine test_int_sort( a, a_name )
+        integer(int32), intent(in) :: a(:)
+        character(*), intent(in)   :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            dummy = a
+            call system_clock( t0, rate )
+            call sort( dummy )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_sort( dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a12, 2i7)') 'dummy(i-1:i) = ', dummy(i-1:i)
+        end if
+        write( lun, '("|     Integer |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            test_size, a_name, "Sort", tdiff/rate
+
+    end subroutine test_int_sort
+
+    subroutine test_char_sorts( )
+
+        call test_char_sort( char_decrease, "Char. Decrease" )
+        call test_char_sort( char_increase, "Char. Increase" )
+        call test_char_sort( char_rand, "Char. Random" )
+
+    end subroutine test_char_sorts
+
+    subroutine test_char_sort( a, a_name )
+        character(len=4), intent(in) :: a(0:)
+        character(*), intent(in) :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            char_dummy = a
+            call system_clock( t0, rate )
+            call sort( char_dummy )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_char_sort( char_dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a17, 2(1x,a4))') 'char_dummy(i-1:i) = ', char_dummy(i-1:i)
+        end if
+        write( lun, '("|   Character |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            char_size, a_name, "Sort", tdiff/rate
+
+    end subroutine test_char_sort
+
+    subroutine test_string_sorts( )
+
+        call test_string_sort( string_decrease, "String Decrease" )
+        call test_string_sort( string_increase, "String Increase" )
+        call test_string_sort( string_rand, "String Random" )
+
+    end subroutine test_string_sorts
+
+    subroutine test_string_sort( a, a_name )
+        type(string_type), intent(in) :: a(0:)
+        character(*), intent(in) :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            string_dummy = a
+            call system_clock( t0, rate )
+            call sort( string_dummy )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_string_sort( string_dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a17, 2(1x,a4))') 'string_dummy(i-1:i) = ', &
+                string_dummy(i-1:i)
+        end if
+        write( lun, '("| String_type |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            string_size, a_name, "Sort", tdiff/rate
+
+    end subroutine test_string_sort
+
+    subroutine test_int_sort_indexes( )
+
+        call test_int_sort_index( blocks, "Blocks" )
+        call test_int_sort_index( decrease, "Decreasing" )
+        call test_int_sort_index( identical, "Identical" )
+        call test_int_sort_index( increase, "Increasing" )
+        call test_int_sort_index( rand1, "Random dense" )
+        call test_int_sort_index( rand2, "Random order" )
+        call test_int_sort_index( rand0, "Random sparse" )
+        call test_int_sort_index( rand3, "Random 3" )
+        call test_int_sort_index( rand10, "Random 10" )
+
+    end subroutine test_int_sort_indexes
+
+    subroutine test_int_sort_index( a, a_name )
+        integer(int32), intent(inout) :: a(:)
+        character(*), intent(in)      :: a_name
+
+        integer(int64)                 :: t0, t1, tdiff
+        real(dp)                       :: rate
+        integer(int64)                 :: i
+        logical                        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            dummy = a
+            call system_clock( t0, rate )
+            call sort_index( dummy, index, work, iwork )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        dummy = a(index)
+        call verify_sort( dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT_INDEX did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a18, 2i7)') 'a(index(i-1:i)) = ', a(index(i-1:i))
+        end if
+        write( lun, '("|     Integer |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            test_size, a_name, "Sort_Index", tdiff/rate
+
+        dummy = a
+        call sort_index( dummy, index, work, iwork, reverse=.true. )
+        dummy = a(index)
+        call verify_reverse_sort( dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT_INDEX did not reverse sort " // &
+                a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a18, 2i7)') 'a(index(i-1:i)) = ', a(index(i-1:i))
+        end if
+
+    end subroutine test_int_sort_index
+
+    subroutine test_char_sort_indexes( )
+
+        call test_char_sort_index( char_decrease, "Char. Decrease" )
+        call test_char_sort_index( char_increase, "Char. Increase" )
+        call test_char_sort_index( char_rand, "Char. Random" )
+
+    end subroutine test_char_sort_indexes
+
+    subroutine test_char_sort_index( a, a_name )
+        character(len=4), intent(in) :: a(0:)
+        character(*), intent(in) :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            char_dummy = a
+            call system_clock( t0, rate )
+            call sort_index( char_dummy, index, char_work, iwork )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_char_sort( char_dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT_INDEX did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a17, 2(1x,a4))') 'char_dummy(i-1:i) = ', char_dummy(i-1:i)
+        end if
+        write( lun, '("|   Character |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            char_size, a_name, "Sort_Index", tdiff/rate
+
+    end subroutine test_char_sort_index
+
+    subroutine test_string_sort_indexes( )
+
+        call test_string_sort_index( string_decrease, "String Decrease" )
+        call test_string_sort_index( string_increase, "String Increase" )
+        call test_string_sort_index( string_rand, "String Random" )
+
+    end subroutine test_string_sort_indexes
+
+    subroutine test_string_sort_index( a, a_name )
+        type(string_type), intent(in) :: a(0:)
+        character(*), intent(in) :: a_name
+
+        integer(int64) :: t0, t1, tdiff
+        real(dp)       :: rate
+        integer(int64) :: i
+        logical        :: valid
+
+        tdiff = 0
+        do i = 1, repeat
+            string_dummy = a
+            call system_clock( t0, rate )
+            call sort_index( string_dummy, index, string_work, iwork )
+            call system_clock( t1, rate )
+            tdiff = tdiff + t1 - t0
+        end do
+        tdiff = tdiff/repeat
+
+        call verify_string_sort( string_dummy, valid, i )
+        if ( .not. valid ) then
+            write( *, * ) "SORT_INDEX did not sort " // a_name // "."
+            write(*,*) 'i = ', i
+            write(*,'(a17, 2(1x,a4))') 'string_dummy(i-1:i) = ', &
+                string_dummy(i-1:i)
+        end if
+        write( lun, '("| String_type |", 1x, i7, 2x, "|", 1x, a15, " |", ' // &
+            'a12, " |",  F10.5, " |" )' ) &
+            string_size, a_name, "Sort_Index", tdiff/rate
+
+    end subroutine test_string_sort_index
+
+
+    subroutine verify_sort( a, valid, i )
+        integer(int32), intent(in) :: a(0:)
+        logical, intent(out) :: valid
+        integer(int64), intent(out) :: i
+
+        integer(int64) :: n
+
+        n = size( a, kind=int64 )
+        valid = .false.
+        do i=1, n-1
+            if ( a(i-1) > a(i) ) return
+        end do
+        valid = .true.
+
+    end subroutine verify_sort
+
+
+    subroutine verify_string_sort( a, valid, i )
+        type(string_type), intent(in) :: a(0:)
+        logical, intent(out) :: valid
+        integer(int64), intent(out) :: i
+
+        integer(int64) :: n
+
+        n = size( a, kind=int64 )
+        valid = .false.
+        do i=1, n-1
+            if ( a(i-1) > a(i) ) return
+        end do
+        valid = .true.
+
+    end subroutine verify_string_sort
+
+    subroutine verify_char_sort( a, valid, i )
+        character(len=4), intent(in) :: a(0:)
+        logical, intent(out) :: valid
+        integer(int64), intent(out) :: i
+
+        integer(int64) :: n
+
+        n = size( a, kind=int64 )
+        valid = .false.
+        do i=1, n-1
+            if ( a(i-1) > a(i) ) return
+        end do
+        valid = .true.
+
+    end subroutine verify_char_sort
+
+
+    subroutine verify_reverse_sort( a, valid, i )
+        integer(int32), intent(in) :: a(0:)
+        logical, intent(out) :: valid
+        integer(int64), intent(out) :: i
+
+        integer(int64) :: n
+
+        n = size( a, kind=int64 )
+        valid = .false.
+        do i=1, n-1
+            if ( a(i-1) < a(i) ) return
+        end do
+        valid = .true.
+
+    end subroutine verify_reverse_sort
+
+end program test_sorting