This repository was archived by the owner on Feb 5, 2024. It is now read-only.

Commit 5f39d03

Samsung SAIT presentation about SYCL PIM language extensions
1 parent 7dcdbe9 commit 5f39d03

1 file changed: +75 -33 lines changed

language/README.rst

@@ -71,61 +71,103 @@ Hyesun Hong,
 
 * PIM/PNM technology enables computation directly on memory
 * Prevents data movement improving performance and reducing consumption
-* PIM operates directly on memory banks by reading and storing on rows and columns
+* Operates directly on memory banks by reading and storing on rows and columns
 * Aquabolt-XL is the first demonstrator
 * Can be drop in on any memory controller
 * CXL-PNM is the CXL variant for PNM, can work with multiple PIM
 
 SYCL Extension for PIM/PNM
-* Goals
-* Seamlessly integrate PIM/PNM operation into SYCL
-* Allow combination of xGPU and PIM/PNM in one device kernel
-* Not specific to one hardware
-* Design
-* Vector operation seem like natural fit, but no convergence guarantee and vector size explicit
-* Model as special function unit
-* Aligns with trends to model special functional units inside accelerators
-* Compiler automatic mapping often not possible
-* joint_matrix
-* Group functions
-* Easy to use
-* Can easily be combined with device code
-* Give necessary convergence guarantees
-* Recap of SYCL work-item, work-group and group functions
-* Group functions must be encountered in converged control flow
+* Work in collaboration with Codeplay Software team
+* Goals
+
+* Seamlessly integrate PIM/PNM operation into SYCL
+* Allow combination of xGPU and PIM/PNM in one device kernel
+* Not specific to one hardware
+
+* Design
+
+* Vector operation seem like natural fit
+* no convergence guarantee and vector size explicit
+
+* Model as special function unit
+
+* Aligns with trends to model special functional units inside accelerators
+* Compiler automatic mapping often not possible
+* joint_matrix-like interface
+
+
+* Group functions
+
+* Easy to use
+* Can easily be combined with device code
+* Give necessary convergence guarantees
+
+
+* Recap of SYCL work-item, work-group and group functions
+
+* Group functions must be encountered in converged control flow
+
 * Extension
-* Extended group functions with additional overload of joint_reduce and new joint_transform and joint_inner_product
-* Block size as template parameter, number of blocks as runtime parameter -> allows calculation of number of elements to process
+
+* Extended group functions with additional overload of joint_reduce
+* and new joint_transform and joint_inner_product
+* Block size as template parameter, number of blocks as runtime parameter
+* allows calculation of number of elements to process
+
 * Extension for PNM
-* Added new overloads of joint_exclusive_scan, joint_inclusive_scan, reduce_over_group
-* PNM standalone has less opportunity for parallelism, also limited by memory controller
-* -> Combine PNM and PIM, PNM generates commands for PIM blocks
+
+* Added new overloads of joint_exclusive_scan,
+* joint_inclusive_scan, reduce_over_group
+
+* PNM standalone has less opportunity for parallelism
+
+* limited by memory controller
+* -> Combine PNM and PIM, PNM generates commands for PIM blocks
+
 * Two modes
+
 * PIM mode: PIM blocks can operate independently, can choose number of blocks
 * PNM mode: Synchronized execution on multiple PIM blocks
+
 * Mapping
+
 * Every PIM block is one work-item
 * PNM with attached PIM blocks forms one work-group
+
 * Execution
-* Work-item operations map to PIM operation
-* Group functions map to PNM operation
+
+* Work-item operations map to PIM operation
+* Group functions map to PNM operation
+
 * Example
+
 * work-item execution maps to PIM
 * group function maps to PNM
+
 * Conclusion
+
 * Integrate support for PIM/PNM into SYCL
 
 Q&A
-* Are the proposed functions specific to PIM or could also be used with other HW?
-* Can also be used with other hardware. Semantics not PIM-specific, but translation of C++ to SYCL
-* Can also map nicely to other types of hardware, for example vector processor
+* Are the proposed functions specific to PIM, could also be used with other HW?
+
+* Can also be used with other hardware.
+* Semantics not PIM-specific, but translation of C++ to SYCL
+* Can also map nicely to other types of hardware, e.g. vector processor
+
 * Why have the user explicitly specify a block-size?
-* Not a hardware detail
-* Rather a promise by the user that data-blocks will always be at least that big
-* Promise allows device compiler to perform optimizations, efficient looping inside PIM unit
-* Could num_blocks runtime parameter be replaced by iterator, requiring to be divisable by block-size
-* Yes, that is possible, mainly a design question
-* Current version might have additional implications regarding alignment
+
+* Not a hardware detail
+* Rather a promise by the user that data-blocks
+will always be at least that big
+* Promise allows device compiler to perform optimizations,
+efficient looping inside PIM unit
+
+* Could num_blocks runtime parameter be replaced by iterator?
+
+* requires to be divisable by block-size
+* Yes, that is possible, mainly a design question
+* Current version might have additional implications regarding alignment
 
 
 2023-06-05
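
The "Extension" notes in the diff above mention a block size passed as a template parameter and a number of blocks passed at runtime, from which the number of elements to process follows. A minimal host-side sketch of that idea in plain C++ is given here. It is not the proposed SYCL API: `blocked_reduce` and `elements_to_process` are hypothetical names invented for illustration, and the sketch assumes for simplicity that every block holds exactly `BlockSize` elements.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of SYCL or the proposal): the element count
// follows from the compile-time block size and the runtime block count.
template <std::size_t BlockSize>
constexpr std::size_t elements_to_process(std::size_t num_blocks) {
    return BlockSize * num_blocks;
}

// Hypothetical host-side stand-in for a blocked reduction.
// Assumption: the range starting at `first` holds at least
// BlockSize * num_blocks elements.
template <std::size_t BlockSize, typename T, typename BinaryOp>
T blocked_reduce(const T* first, std::size_t num_blocks, T init, BinaryOp op) {
    static_assert(BlockSize > 0, "block size must be positive");
    T result = init;
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const T* block = first + b * BlockSize;
        // The inner trip count is a compile-time constant, so a compiler
        // is free to unroll or vectorize this loop.
        T partial = block[0];
        for (std::size_t i = 1; i < BlockSize; ++i) {
            partial = op(partial, block[i]);
        }
        result = op(result, partial);
    }
    return result;
}
```

A call such as `blocked_reduce<8>(data.data(), 4, 0, std::plus<int>{})` would reduce the first 32 elements of `data`. The fixed inner trip count illustrates the kind of optimization the Q&A attributes to the block-size promise.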
