
Tim's OpenCL Materials

dellswor edited this page Jan 21, 2014 · 2 revisions

Timothy Mattson from Intel kindly provided us with hands-on OpenCL tutorial materials under a Creative Commons license. The tutorial consists of one slide deck and several exercises (with solutions), split 50/50 between the two. Note that parallel programming paradigms are not in the scope of this tutorial. Materials can be found here.

The tutorial is designed to be worked through in one day.

The Slides

The slide deck has 182 pages and covers the basic principles and APIs of OpenCL.

  1. Pages 5-17: Brief introduction to OpenCL's origins, platform model, memory model, and execution model
  2. Pages 18-33: Detailed discussion of the host program. It walks you step by step through how to:
     • create a context and a command queue
     • create and build the program
     • set up memory objects
     • define the kernel
     • enqueue commands
  3. Pages 34-62: Detailed discussion of kernel programming, including:
     • a brief introduction to the OpenCL C kernel language
     • example: converting sequential matrix multiplication into a parallel version
     • building/compiling the program object and error handling
  4. Pages 63-91: Detailed discussion of the OpenCL memory hierarchy
     • private, local, global, and host memory
     • example: optimizing parallel matrix multiplication
     • memory consistency and barriers
  5. Pages 92-115: Further discussion of performance
     • more details on the execution model
     • work-item divergence (warp/wavefront/SIMD)
     • more performance optimization tips
  6. Pages 116-127: Synchronization in OpenCL
     • barriers
     • examples: parallel reduction, computing Pi
  7. Pages 128-133: Concluding remarks and links to other resources
  8. Pages 135-143: Appendix A - Vector operations within kernels
  9. Pages 144-159: Appendix B - OpenCL event model
     • generating and consuming events
     • synchronization: queues and events
     • profiling with events
  10. Pages 160-176: Appendix C - C++ for C programmers
  11. Pages 177-182: Appendix D - Memory Coalescence

The Exercises

  1. Exercises 1-3: You are asked to work with a vector addition program and extend it from adding two vectors to adding three. These are "hello world"-style exercises that familiarize you with the basic structure of an OpenCL program.
  2. Exercises 4-5: You are asked to convert a sequential matrix multiplication program into a parallel OpenCL program.
  3. Exercises 6-9: You are asked to optimize the parallel matrix multiplication using different techniques (memory hierarchy, sgemm).
  4. Exercise 10: The Pi program (parallel reduction, barriers)
  5. Exercise A: A vectorized Pi program