Speed up function shorrocks_index by 10%
#810
Open
+1
−1
📄 10% (0.10x) speedup for shorrocks_index in quantecon/_inequality.py
Hello, I found a few more optimizations for this project at https://github.com/codeflash-ai/QuantEcon.py/pulls?q=is%3Apr+is%3Aopen+review%3Aapproved, would love to contribute them here.
⏱️ Runtime: 474 microseconds → 432 microseconds (best of 242 runs)
📝 Explanation and details
The optimization replaces np.diag(A).sum() with np.trace(A) to compute the sum of diagonal elements. This single change provides a 9% speedup because:

What changed: The diagonal sum calculation was changed from a two-step process (np.diag() then .sum()) to a single optimized NumPy function (np.trace()).

Why it's faster: np.trace() is specifically designed to compute the sum of diagonal elements directly, avoiding the intermediate array creation that np.diag() requires. The line profiler shows the diagonal sum computation time decreased from 680,989 ns to 600,218 ns (12% faster for that line).

Performance characteristics: The optimization is most effective for:
However, for very large sparse matrices or random matrices with complex access patterns, the optimization may show smaller gains or even slight regressions (as seen in the large random test cases), likely due to cache behavior differences. Overall, this is a clean micro-optimization that improves performance across most typical use cases without changing any behavior or dependencies.
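As a rough illustration of the change described above, the sketch below puts the two forms side by side. It is not a copy of the code in quantecon/_inequality.py: the function names shorrocks_index_diag and shorrocks_index_trace are hypothetical, and the body assumes the standard Shorrocks mobility index (m - trace(A)) / (m - 1) for an m x m transition matrix.

```python
import numpy as np


def shorrocks_index_diag(A):
    """Hypothetical 'before' version: two-step diagonal sum."""
    A = np.asarray(A)
    m, n = A.shape
    if m != n:
        raise ValueError("A must be a square matrix")
    # np.diag(A) extracts the diagonal as a new 1-D array, then .sum() reduces it.
    return (m - np.diag(A).sum()) / (m - 1)


def shorrocks_index_trace(A):
    """Hypothetical 'after' version: single np.trace call."""
    A = np.asarray(A)
    m, n = A.shape
    if m != n:
        raise ValueError("A must be a square matrix")
    # np.trace(A) returns the sum of the diagonal in one call.
    return (m - np.trace(A)) / (m - 1)


# Small transition matrix (rows sum to 1), used only for illustration.
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])

# Both forms should agree up to floating-point rounding.
assert np.isclose(shorrocks_index_diag(P), shorrocks_index_trace(P))
```

Because both variants compute the same quantity, any timing difference comes only from how the diagonal is summed. Measuring both with timeit on matrices of the size you actually use is the simplest way to check whether the gain holds for your workload.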
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
test_inequality.py::test_shorrocks_index
🌀 Generated Regression Tests and Runtime
To edit these changes, run git checkout codeflash/optimize-shorrocks_index-mgg0hw20 and push.