This repository was archived by the owner on Feb 5, 2024. It is now read-only.

Meeting notes and slides for 19 September 2023 #129

148 changes: 147 additions & 1 deletion language/README.rst
@@ -23,10 +23,156 @@ Potential Topics
* Function pointers revisited
* oneDPL C++ standard library support

2023-09-19
=============

* Ruyman Reyes (Intel/Codeplay)
* Lukas Sommer (Codeplay Software Ltd)
* Benie (Codeplay Software Ltd)
* Hyesun Hong (Samsung SAIT)
* Julian Oppermann (Codeplay Software Ltd)
* Mehdi Goli (Codeplay Software Ltd)
* Lueck, Gregory (Intel)
* Jesus Labarta (BSC) (Guest)
* Brodman, James (Intel)
* Hanwoong Jung (Samsung SAIT)
* Brice Goglin (Guest)
* Plaska, Oskar (Contractor, Cognizant)
* Tom Deakin (Univ. of Bristol)
* Marcin (N/A)
* Victor Lomuller (Codeplay Software Ltd)
* Biagio COSENZA (Università degli Studi di Salerno)
* Voss, Michael J (Intel)
* Kukanov, Alexey (Intel)
* Richards, Alison L (Intel)
* Adam Kuźniar (Mobica)
* Slavova, Gergana S (Intel)
* bongjun kim (Samsung SAIT)
* Keryell, Ronan (XILINX LABS)
* Juan Fumero (University of Manchester)
* Gordon Brown (Codeplay Software Ltd)
* Tim (N/A)
* Kinsner, Michael (Intel)
* Petersen, Paul (Intel)
* Videau, Brice (ANL)
* Holmes, Daniel John (Intel)
* Frank Brill (Cadence)
* Mrozek, Michal (Intel)
* Reble, Pablo (Intel)
* Andrew Richards (Intel/Codeplay)
* Smith, Timmie (Intel)


SYCL Extension Proposal for PIM/PNM
--------------------------------------

Hyesun Hong,
`Slides <presentation/2023-09-19-HS-sycl-pim-extensions.pdf>`_

* PIM/PNM (processing in memory / processing near memory) technology enables computation directly in memory
* Avoids data movement, improving performance and reducing energy consumption
* Operates directly on memory banks by reading and storing rows and columns
* Aquabolt-XL is the first demonstrator
* Can be dropped in on any memory controller
* CXL-PNM is the CXL variant of PNM and can work with multiple PIM blocks

SYCL Extension for PIM/PNM

* Work in collaboration with the Codeplay Software team
* Goals

  * Seamlessly integrate PIM/PNM operations into SYCL
  * Allow combining xGPU and PIM/PNM in one device kernel
  * Not specific to one piece of hardware

* Design

  * Vector operations seem like a natural fit

    * No convergence guarantee, and the vector size is explicit

  * Model as a special function unit

    * Aligns with the trend to model special functional units inside accelerators
    * Automatic mapping by the compiler is often not possible
    * joint_matrix-like interface

  * Group functions

    * Easy to use
    * Can easily be combined with device code
    * Give the necessary convergence guarantees


* Recap of SYCL work-items, work-groups and group functions

  * Group functions must be encountered in converged control flow

* Extension

  * Extended group functions with an additional overload of joint_reduce
    and new joint_transform and joint_inner_product
  * Block size is a template parameter, number of blocks is a run-time parameter;
    together they allow calculating the number of elements to process

* Extension for PNM

  * Added new overloads of joint_exclusive_scan, joint_inclusive_scan
    and reduce_over_group

* PNM standalone has less opportunity for parallelism

  * Limited by the memory controller
  * Therefore combine PNM and PIM: PNM generates commands for PIM blocks

* Two modes

  * PIM mode: PIM blocks operate independently; the number of blocks can be chosen
  * PNM mode: synchronized execution on multiple PIM blocks

* Mapping

  * Every PIM block is one work-item
  * A PNM with its attached PIM blocks forms one work-group

* Execution

  * Work-item operations map to PIM operations
  * Group functions map to PNM operations

* Example

  * Work-item execution maps to PIM
  * Group function maps to PNM

* Conclusion

  * Integrate support for PIM/PNM into SYCL

Q&A

* Are the proposed functions specific to PIM, or could they also be used with other hardware?

  * They can also be used with other hardware
  * The semantics are not PIM-specific, but a translation of C++ to SYCL
  * They can also map nicely to other types of hardware, e.g. vector processors

* Why have the user explicitly specify a block size?

  * It is not a hardware detail
  * Rather, it is a promise by the user that data blocks
    will always be at least that big
  * The promise allows the device compiler to perform optimizations,
    such as efficient looping inside the PIM unit

* Could the num_blocks run-time parameter be replaced by an iterator?

  * Requires the number of elements to be divisible by the block size
  * Yes, that is possible; it is mainly a design question
  * The current version might have additional implications regarding alignment


2023-06-05
==========


* Ruyman Reyes
* Rod Burns
* Cohn, Robert S
Binary file not shown.