Description
Feature
During execution of aot_call_function() there are up to 3 calls to aot_copy_exception() (in the case where the exported function has zero or one result, which I believe is the most common case). Each aot_copy_exception() call internally locks the global mutex _exception_lock. This can be improved by:
- Removing redundant calls to aot_copy_exception()
- Making exception_lock() lock the module's mutex instead of the global one
Benefit
In my project, which runs in a highly concurrent environment, I use hundreds of instantiated modules, and each module processes about 1k requests per second. Since each module's throughput is limited by the execution time of the exported function, I need to increase the number of modules to reach sufficient overall throughput. With global locks there is a strict limit on how many executors I can usefully add: the more modules I add, the slower each of them runs.
By removing global locks (or at least by reducing the number of lock acquisitions per aot_call_function() call) we can achieve linear scalability.
Implementation
- Reducing the number of aot_copy_exception() calls. There are the 3 calls I mentioned above: one inside invoke_native_internal(), one for debugging purposes, and one just before return. invoke_native_internal() uses the exception check as its whole result, so the further calls to aot_copy_exception() seem to be redundant. The branch for functions with result_count > 1 reuses the return result in similar places.
- The comment inside the exception_lock() body says that there were plans to make the mutex belong to a particular module. This probably requires complex changes, but it would give a great performance boost in highly concurrent environments.
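
To make both ideas concrete, here is a minimal sketch using plain pthreads. It is not WAMR code: DemoModuleInstance and every demo_* function are hypothetical names invented for illustration, and the real module instance layout and exception handling are assumed to be more involved. The sketch only shows the shape of the change: the mutex lives inside the instance, and the call path takes it once and reuses the result.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_EXCEPTION_LEN 128

/* Hypothetical stand-in for a module instance; NOT WAMR's real struct. */
typedef struct DemoModuleInstance {
    pthread_mutex_t exception_lock;         /* per-instance, not global */
    char cur_exception[DEMO_EXCEPTION_LEN]; /* empty string = no exception */
} DemoModuleInstance;

static int demo_instance_init(DemoModuleInstance *inst)
{
    assert(inst != NULL);
    inst->cur_exception[0] = '\0';
    return pthread_mutex_init(&inst->exception_lock, NULL);
}

static void demo_instance_destroy(DemoModuleInstance *inst)
{
    pthread_mutex_destroy(&inst->exception_lock);
}

/* Record an exception; only this instance's lock is taken. */
static void demo_set_exception(DemoModuleInstance *inst, const char *msg)
{
    pthread_mutex_lock(&inst->exception_lock);
    snprintf(inst->cur_exception, sizeof(inst->cur_exception),
             "Exception: %s", msg);
    pthread_mutex_unlock(&inst->exception_lock);
}

/* Copy the pending exception (if any) into buf.
   Returns 1 if an exception was pending, 0 otherwise. */
static int demo_copy_exception(DemoModuleInstance *inst, char *buf,
                               size_t buf_len)
{
    int has_exception;

    pthread_mutex_lock(&inst->exception_lock);
    has_exception = inst->cur_exception[0] != '\0';
    if (has_exception && buf != NULL && buf_len > 0)
        snprintf(buf, buf_len, "%s", inst->cur_exception);
    pthread_mutex_unlock(&inst->exception_lock);
    return has_exception;
}

/* Hypothetical call path: the exception is checked once under the lock
   and that single result is reused as the return value, instead of
   re-copying it for debugging and again before returning.
   should_trap simulates the exported function raising an exception. */
static int demo_call_function(DemoModuleInstance *inst, int should_trap,
                              char *err, size_t err_len)
{
    int failed;

    if (should_trap)
        demo_set_exception(inst, "unreachable");

    failed = demo_copy_exception(inst, err, err_len);
    return !failed; /* 1 on success, 0 if an exception was raised */
}
```

With this layout, threads driving different instances never contend on the same mutex, so adding instances should not slow down the existing ones because of exception bookkeeping. Whether the real code can drop the extra copies safely depends on details of aot_call_function() that this sketch does not model.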