Skip to content

Latest commit

 

History

History
375 lines (272 loc) · 10.8 KB

sycl_ext_oneapi_tangle.asciidoc

File metadata and controls

375 lines (272 loc) · 10.8 KB

sycl_ext_oneapi_tangle

Notice

Copyright © 2022 Intel Corporation. All rights reserved.

Khronos® is a registered trademark and SYCL™ and SPIR™ are trademarks of The Khronos Group Inc. OpenCL™ is a trademark of Apple Inc. used by permission by Khronos.

Contact

To report problems with this extension, please open a new issue at:

Dependencies

This extension is written against the SYCL 2020 revision 9 specification. All references below to the "core SYCL specification" or to section numbers in the SYCL specification refer to that revision.

This extension also depends on the following other SYCL extensions:

Status

This is an experimental extension specification, intended to provide early access to features and gather community feedback. Interfaces defined in this specification are implemented in DPC++, but they are not finalized and may change incompatibly in future versions of DPC++ without prior notice. Shipping software products should not rely on APIs defined in this specification.

Backend support status

The APIs in this extension may be used only on a device that has one or more of the extension aspects. The application must check that the device has this aspect before submitting a kernel using any of the APIs in this extension. If the application fails to do this, the implementation throws a synchronous exception with the errc::kernel_not_supported error code when the kernel is submitted to the queue.

Overview

This proposal introduces a new class to represent the set of work-items in a group which are currently active, simplifying the use of group functions and algorithms within nested control flow.

Specification

Feature test macro

This extension provides a feature-test macro as described in the core SYCL specification. An implementation supporting this extension must predefine the macro SYCL_EXT_ONEAPI_TANGLE to one of the values defined in the table below. Applications can test for the existence of this macro to determine if the implementation supports this feature, or applications can test the macro’s value to determine which of the extension’s features the implementation supports.

Value Description

1

The APIs of this experimental extension are not versioned, so the feature-test macro always has this value.

Extension to enum class aspect

namespace sycl {
enum class aspect {
  ...
  ext_oneapi_tangle
}
}

If a SYCL device has this aspect, that device supports tangle groups.

Control Flow

The SYCL specification defines control flow as below:

When all work-items in a group are executing the same sequence of statements, they are said to be executing under converged control flow. Control flow diverges when different work-items in a group execute a different sequence of statements, typically as a result of evaluating conditions differently (e.g. in selection statements or loops).

This extension introduces some new terminology to describe other kinds of control flow, to simplify the description of the behavior for new group types.

A tangle is a collection of work-items from the same group executing under converged control flow.

Group Taxonomy

The is_user_constructed_group<T>::value trait defined in the sycl_ext_oneapi_non_uniform_groups extension is std::true_type if T is tangle.

Additionally, the is_group<T>::value trait from the core SYCL specification is std::true_type if T is tangle.

Group Functions and Algorithms

When a user-constructed group is passed to a group function or group algorithm, all work-items in the group must call the function or algorithm in converged control flow. Violating this restriction results in undefined behavior.

If a work-item calls a group function or group algorithm using an object that represents a group to which the work-item does not belong, this results in undefined behavior.

The following group functions support tangles:

  • group_barrier

  • group_broadcast

The following group algorithms support tangles:

  • joint_any_of and any_of_group

  • joint_all_of and all_of_group

  • joint_none_of and none_of_group

  • shift_group_left

  • shift_group_right

  • permute_group_by_xor

  • select_from_group

  • joint_reduce and reduce_over_group

  • joint_exclusive_scan and exclusive_scan_over_group

  • joint_inclusive_scan and inclusive_scan_over_group

Tangle

A tangle is a non-contiguous subset of a group representing work-items executing in a tangle. A tangle can therefore be used to capture all work-items currently executing the same control flow.

Creation

A new tangle can only be created by partitioning an existing group, using the entangle free-function.

namespace ext::oneapi::experimental {

template <typename ParentGroup>
tangle<ParentGroup> entangle(ParentGroup group);

} // namespace ext::oneapi::experimental

Constraints: Available only if ParentGroup is sycl::sub_group.

Effects: Blocks until all work-items in parent that will reach this synchronization point have reached this synchronization point.

Synchronization: The call in each work-item happens before any work-item blocking on the same synchronization point is unblocked. Synchronization operations used by an implementation must respect the memory scope reported by ParentGroup::fence_scope.

Returns: A tangle consisting of all work-items in parent which will reach this synchronization point in the same control flow.

Note
This function provides stronger guarantees than get_opportunistic_group(), which returns a group consisting of some of the work-items in parent which will reach the synchronization point.

tangle Class

namespace sycl::ext::oneapi::experimental {

template <typename ParentGroup>
class tangle {
public:
  using id_type = id<1>;
  using range_type = range<1>;
  using linear_id_type = uint32_t;
  static constexpr int dimensions = 1;
  static constexpr sycl::memory_scope fence_scope = ParentGroup::fence_scope;

  id_type get_group_id() const;

  id_type get_local_id() const;

  range_type get_group_range() const;

  range_type get_local_range() const;

  linear_id_type get_group_linear_id() const;

  linear_id_type get_local_linear_id() const;

  linear_id_type get_group_linear_range() const;

  linear_id_type get_local_linear_range() const;

  bool leader() const;
};

}
id_type get_group_id() const;

Returns: An id representing the index of the tangle.

Note
This will always be an id with all values set to 0, since there can only be one tangle.
id_type get_local_id() const;

Returns: An id representing the calling work-item’s position within the tangle.

range_type get_group_range() const;

Returns: A range representing the number of tangles.

Note
This will always return a range of 1 as there can only be one tangle.
range_type get_local_range() const;

Returns: A range representing the number of work-items in the tangle.

id_type get_group_linear_id() const;

Returns: A linearized version of the id returned by get_group_id().

id_type get_local_linear_id() const;

Returns: A linearized version of the id returned by get_local_id().

linear_id_type get_group_linear_range() const;

Returns: A linearized version of the range returned by get_group_range().

linear_id_type get_local_linear_range() const;

Returns: A linearized version of the range returned by get_local_range().

bool leader() const;

Returns: true for exactly one work-item in the tangle, if the calling work-item is the leader of the tangle, and false for all other work-items in the tangle. The leader of the tangle is guaranteed to be the work-item for which get_local_id() returns 0.

Usage examples

A tangle can be used in conjunction with constructs like loops and branches to safely communicate between all work-items executing the same control flow.

Note
This differs from the fragment returned by get_opportunistic_group() because a tangle_group requires the implementation to track group membership. Which group type to use will depend on a combination of implementation/backend/device and programmer preference.
auto sg = it.get_sub_group();

auto will_branch = sg.get_local_linear_id() % 2 == 0;
if (will_branch)
{
  // wait for all work-items that took the branch to hit the barrier
  auto inner = sycl::ext::oneapi::experimental::entangle(sg);

  // reduce across subset of outer work-items that took the branch
  float ix = sycl::reduce_over_group(inner, x, plus<>());
}

Implementation notes

This non-normative section provides information about one possible implementation of this extension. It is not part of the specification of the extension’s API.

For SPIR-V backends, tangles are expected to be implemented using SPIR-V’s non-uniform instructions.

For CUDA backends, supporting tangle may require the compiler to construct masks when encountering control flow constructions, and to pass those masks across call boundaries.

Issues

  1. Should tangle support work-groups or just sub-groups?

    SPIR-V "tangled instructions" include group and sub-group instructions, but it is unclear how to identify which work-items in different sub-groups are executing the same control flow (without introducing significant overhead). If we decide at a later date that tangle should support only sub-groups, we should revisit the name to avoid creating confusion.

  2. Should we introduce additional functionality to simplify reasoning about convergence?

    There are some open questions about when an implementation should force work-items to reconverge, which can impact membership in a tangle. Adding a function like assert_convergence to allow users to control when reconvergence happens and tying that function to tangles may simplify both usage and implementation of tangles.