19M18837 JI Cunyuan
This is the repository for the "High Performance Computing" lecture at Tokyo Tech, Spring Semester 2020.
❗️ There seems to be an upper limit of 1024 threads for the bucket sort with CUDA.
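The 1024 figure noted above matches CUDA's per-block thread limit on current GPUs, so the ceiling is probably a per-block cap rather than a cap on total threads. Below is a minimal sketch of the launch arithmetic that spreads `n` keys over several blocks. It is written in plain C++ so it runs without a GPU. The names `LaunchConfig`, `make_launch_config`, `global_index`, and `count_buckets` are hypothetical and are not taken from the code in 05_cuda.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type; not part of the repository.
struct LaunchConfig {
    std::size_t blocks;             // number of thread blocks in the grid
    std::size_t threads_per_block;  // threads per block (at most 1024 on current GPUs)
};

// Choose enough blocks so that blocks * threads_per_block >= n.
// Assumes threads_per_block > 0.
inline LaunchConfig make_launch_config(std::size_t n,
                                       std::size_t threads_per_block = 1024) {
    std::size_t blocks = (n + threads_per_block - 1) / threads_per_block;  // ceiling division
    return {blocks, threads_per_block};
}

// CPU stand-in for the CUDA expression blockIdx.x * blockDim.x + threadIdx.x.
inline std::size_t global_index(std::size_t block_idx, std::size_t block_dim,
                                std::size_t thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Emulates the counting phase of a bucket sort launched over several blocks.
// Each (block, thread) pair handles at most one key.
// Assumes every key lies in [0, range).
inline std::vector<int> count_buckets(const std::vector<int>& keys, int range,
                                      std::size_t threads_per_block = 1024) {
    std::vector<int> bucket(range, 0);
    LaunchConfig cfg = make_launch_config(keys.size(), threads_per_block);
    for (std::size_t b = 0; b < cfg.blocks; ++b) {
        for (std::size_t t = 0; t < cfg.threads_per_block; ++t) {
            std::size_t i = global_index(b, cfg.threads_per_block, t);
            if (i >= keys.size()) continue;  // guard: the last block may overshoot n
            ++bucket[keys[i]];               // on a GPU this would be an atomicAdd
        }
    }
    return bucket;
}
```

In an actual kernel the two loops disappear: the launch would be `kernel<<<cfg.blocks, cfg.threads_per_block>>>(...)`, each thread would compute its own index, and the increment would need `atomicAdd` because several threads can hit the same bucket.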
Class | Topic | Sample code |
---|---|---|
Class 1 | Introduction to parallel programming | |
Class 2 | Shared memory parallelization | 02_openmp ✅ |
Class 3 | Distributed memory parallelization | 03_mpi ✅ |
Class 4 | SIMD parallelization | 04_simd ✅ |
Class 5 | GPU programming | 05_cuda, 05_openacc ✅ |
Class 6 | Parallel programming models | 06_starpu |
Class 7 | Cache blocking | 07_cache_cpu, 07_cache_gpu ❓ |
Class 8 | High Performance Python | 08_python |
Class 9 | I/O libraries | 09_io ✅ |
Class 10 | Parallel debugger | 10_debugger |
Class 11 | Parallel profiler | 11_profiler |
Class 12 | Containers | |
Class 13 | Scientific computing | 13_pde |
Class 14 | Deep Learning | 14_dl |