All notable changes to this project will be documented in this file.
The 2.5.1 release corresponds to the OmpSs-2 2020.11.1 release. It introduces bug fixes and code improvements.
- Unify instrumentation, monitoring and hwcounter points
- Efficient support for taskloop dependencies
- Fix reductions in taskloops and taskfors
- Centralize configuration variables
- Fully implement assert directive
- Abort execution when an invalid config variable is enabled
- Fix CTF instrumentation bugs
- Bugfixes, performance and code improvements
The 2.5 release corresponds to the OmpSs-2 2020.11 release. It introduces several features and fixes that improve general performance. It replaces the configuration environment variables with a configuration file, improving the usability of the runtime system. It also makes the discrete dependency system the default implementation.
- Replace all environment variables with a configuration file
- Add NANOS6_CONFIG environment variable to specify the configuration file
- Add NANOS6_CONFIG_OVERRIDE to override options from the configuration file
- Enhance performance in architectures with hyperthreading
- Improve locking performance
- Allocate critical C++ containers with the custom memory allocator
- Support the assert directive to check the loaded dependency system
- Make discrete the default dependency system
- Improve allocations in discrete dependencies
- Add support for CUDA task reductions in discrete dependencies
- Use address translation tables for specifying task reductions' storage
- Add support for kernel events in CTF instrumentation
- Add new Paraver views for CTF traces
- Add fixes for OpenACC and CUDA devices
The 2.4.1 release corresponds to the OmpSs-2 2020.06.1 release. It introduces bug fixes and performance improvements.
- Improve the interface and performance of the scheduler's lock
- Fix CTF instrumentation bugs and limitations
- Fix PAPI hardware counters backend
- Support newer versions of GCC, Clang and GLIBC
- Fix task external events API
- Remove preemption mechanism from critical sections
- Fix initialization of locks
- Add test suite built with the OmpSs-2 compiler based on LLVM
- Add new tests
The 2.4 release corresponds to the OmpSs-2 2020.06 release. It introduces several features that improve the general performance of OmpSs-2 applications. It adds a new variant to extract execution traces with a lightweight internal tracer. It also improves the support for CUDA and provides support for OpenACC tasks.
- Use jemalloc as a scalable multi-threading memory allocator
- Add turbo variant enabling floating-point optimizations and the discrete dependency system
- Refactor of CPU Manager and DLB support improvements
- Bugfixes, performance and code improvements
- Improve taskfor distribution policy
- Improve scheduling performance and code
- Add the nanos6_wait_for function to efficiently pause a task for a given time
- Implement the discrete dependency system with lock-free techniques
- Add support for weak dependencies in discrete
- Add support for commutative and concurrent dependencies in discrete
- Refactor the hardware counters infrastructure and support both PAPI and PQoS counters
- Add ctf variant to extract execution traces in CTF format using a lightweight internal tracer
- Provide the ctf2prv tool to convert CTF traces to Paraver traces
- Avoid Extrae trace desynchronizations in hybrid MPI+OmpSs-2 executions
- Remove the stats-papi instrumentation variant
- Refactor of the devices' infrastructure
- Perform transparent CUDA Unified Memory prefetching
- Add support for cuBLAS and similar CUDA APIs
- Add support for OpenACC tasks
The 2.3.2 release corresponds to the OmpSs-2 2019.11.2 release. It mainly introduces bug fixes.
- Fix important error during runtime initialization
- Fix bug in the discrete dependency system
- Several fixes for OmpSs-2@Cluster
The 2.3.1 release corresponds to the OmpSs-2 2019.11.1 release. It introduces bug fixes and performance improvements.
- Fix execution of CUDA tasks
- Fix dmalloc in OmpSs-2@Cluster
- Add missing calls to CPU Manager
- Improve taskfor performance
- Improve general performance by using a reasonable cache line size padding
- Add tests checking the execution of CUDA tasks
The 2.3 release corresponds to the OmpSs-2 2019.11 release. It introduces a new optimized data dependency implementation. It improves the usability, performance and code of the scheduling infrastructure and the task for feature. It also adds support for DLB and OmpSs-2@Linter.
- Data dependency implementation can be decided at run-time through the NANOS6_DEPENDENCIES variable
- Performance and code improvements on the task for feature
- Add support for Dynamic Load Balancing (DLB) tool
- Add support for OmpSs-2@Linter
- Important bugfix in memory allocator (used by OmpSs-2@Cluster)
- Bugfixes, performance and code improvements
- Add new optimized discrete dependency system implementation; enabled by NANOS6_DEPENDENCIES=discrete
- Usability, performance and code improvements on the scheduling infrastructure
- Remove profile instrumentation variant
- Remove interception mechanism of memory allocation functions
The 2.2.2 release corresponds to the OmpSs-2 2019.06.2 release. It introduces bug fixes.
- Compile extrae variant with high optimization flags
- Remove backtrace sampling from the extrae variant
The 2.2.1 release corresponds to the OmpSs-2 2019.06.1 release. It mainly introduces bug fixes and code improvements.
- Rename loop directive to task for
- Tasks can leverage reductions and external events at the same time (over distinct data regions)
- OmpSs-2@Cluster bugfixes
- Fix binding information reported by nanos6-info binary
- Support for the TAGASPI library
- Other bugfixes and code improvements
The 2.2 release corresponds to the OmpSs-2 2019.06 release. It mainly introduces the new support for OmpSs-2@Cluster. It also includes some improvements and optimizations for array task reductions and general bugfixes.
- Support for OmpSs-2@Cluster
- Bugfixes and performance improvements
- Bugfixes and optimization for array reductions
- Delete obsolete task data dependency implementations
- Delete obsolete schedulers
The 2.1 release corresponds to the OmpSs-2 2018.11 release. It provides full support for the TAMPI library. It also includes general bugfixes and performance improvements.
- Full support for TAMPI
- Bugfixes and performance improvements
- Bugfixes in task external events API
The 2.0.2 release corresponds to the OmpSs-2 2018.06.2 release.
- Bugfixes in HWLOC support
The 2.0.1 release corresponds to the OmpSs-2 2018.06.1 release.
- Bugfixes in task reductions
The 2.0 release corresponds to the OmpSs-2 2018.06 release. It introduces support for OmpSs-2@CUDA in Unified Memory NVIDIA devices. It also supports array task reductions in C/C++ and task priorities. Additionally, it provides two new APIs used by the TAMPI library.
- Support for OmpSs-2@CUDA Unified Memory
- Bugfixes and performance improvements
- Support for array task reductions in C/C++
- Support for task priorities
- Add priority scheduler
- Add polling services API
- Add task external events API
- Rename taskloop construct to loop
The 1.0.1 release corresponds to the OmpSs-2 2017.11.1 release.
- Fixes for the building system
- Fixes for the loading system
The 1.0 release corresponds to the OmpSs-2 2017.11 release. It is the first release of the Nanos6 runtime system. It implements the basic infrastructure to manage the parallelism of user tasks (task creation, task scheduling, etc.) and their data dependencies. The task dependency system supports nested dependency domain connection, as well as both the early release and weak dependency models.
- General infrastructure of the runtime system
- Support for user tasks and nesting of tasks
- Implement different schedulers: FIFO, LIFO, etc.
- Implementation of a task data dependency system
- Support for nested dependency domain connection
- Support for early release of task dependencies
- Support for weak task dependencies
- Support for reductions
- Taskloop construct with dependencies
- Task pause/resume API