chore(optimization): Tiny optimization for pointsInternal #60138
Are those results on a release build? |
I'm not sure the compiler (before C++26, which will declare them constexpr: https://en.cppreference.com/w/cpp/numeric/math/cos) can infer that the output of std::cos() doesn't change for a given input.
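To make the point above concrete, here is a minimal sketch of the difference between repeating the trig calls and storing their results in locals. The function names and parameters (`pointRepeatedCalls`, `pointCachedCalls`, `a`, `b`, `cosAz`, `sinAz`) are hypothetical and not taken from QGIS. Whether a given compiler merges the repeated calls on its own depends on the compiler and its flags, so the second form simply avoids relying on that.

```cpp
#include <cmath>
#include <utility>

// Hypothetical free functions for illustration; not the QGIS implementation.

// Evaluates std::cos(t) and std::sin(t) twice each: four trig calls per point.
std::pair<double, double> pointRepeatedCalls(double t, double a, double b,
                                             double cosAz, double sinAz)
{
  const double x = a * std::cos(t) * cosAz - b * std::sin(t) * sinAz;
  const double y = a * std::cos(t) * sinAz + b * std::sin(t) * cosAz;
  return {x, y};
}

// Evaluates std::cos(t) and std::sin(t) once each and reuses the results.
std::pair<double, double> pointCachedCalls(double t, double a, double b,
                                           double cosAz, double sinAz)
{
  const double cosT = std::cos(t);
  const double sinT = std::sin(t);
  const double x = a * cosT * cosAz - b * sinT * sinAz;
  const double y = a * cosT * sinAz + b * sinT * cosAz;
  return {x, y};
}
```

Both functions return the same point up to floating-point rounding; only the number of explicit trig calls differs.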
Hmm, I would expect your optimization to just divide the time by 2, not by 10. Did you look at the disassembled code to understand why you get such a perf gain? Which compiler do you use? Maybe OpenMP is used and the array computation is multithreaded? In any case, adding a comment in the code explaining why the optimization works, and under which circumstances, would make later maintenance easier. |
@uclaros I only live in Release mode 😉
@rouault I did a more extensive test, which you can find here. It's not real QGIS anymore, but I think it's still representative. @ptitjano and I have been talking about testing OpenMP in QGIS for several years now, and I've been looking closely at what's being done in GRASS. We're thinking of proposing a QEP within the year, but first we need to do some testing (and learn OpenMP properly). In the meantime, this test shows that, depending on the number of threads used, OpenMP becomes interesting at around 256 vertices. In any case, it's better to use method 2, computing cos/sin inside the loop as you suggest, rather than precalculating them as I did. So we have a choice between:

OpenMP-ready:

    for (int i = 0; i < static_cast<int>(segments); ++i) {
      const double cosT{ std::cos(t[i]) };
      const double sinT{ std::sin(t[i]) };
      x[i] = centerX + mSemiMajorAxis * cosT * cosAzimuth -
             mSemiMinorAxis * sinT * sinAzimuth;
      y[i] = centerY + mSemiMajorAxis * cosT * sinAzimuth +
             mSemiMinorAxis * sinT * cosAzimuth;
      if (hasZ) z[i] = centerZ;
      if (hasM) m[i] = centerM;
    }

Current, with cos/sin calculated only once:

    for (double it : t) {
      const double cosT{ std::cos(it) };
      const double sinT{ std::sin(it) };
      *xOut++ = centerX + mSemiMajorAxis * cosT * cosAzimuth -
                mSemiMinorAxis * sinT * sinAzimuth;
      *yOut++ = centerY + mSemiMajorAxis * cosT * sinAzimuth +
                mSemiMinorAxis * sinT * cosAzimuth;
      if (zOut) *zOut++ = centerZ;
      if (mOut) *mOut++ = centerM;
    } |
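For readers who want to try the two loop shapes outside QGIS, the following is a self-contained sketch under stated assumptions. `EllipseParams`, `sampleAngles`, `pointsIndexed` and `pointsSequential` are hypothetical names, the azimuth is assumed to be in radians, the angles are assumed to be evenly spaced over a full turn, and the Z/M handling is left out. The OpenMP pragma is guarded so the code also builds without OpenMP enabled.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the ellipse state; not the QgsEllipse class.
struct EllipseParams
{
  double centerX, centerY, semiMajor, semiMinor, azimuth; // azimuth in radians (assumption)
};

// Assumption: angles evenly spaced over [0, 2*pi).
std::vector<double> sampleAngles(std::size_t segments)
{
  const double pi = std::acos(-1.0);
  std::vector<double> t(segments);
  for (std::size_t i = 0; i < segments; ++i)
    t[i] = 2.0 * pi * static_cast<double>(i) / static_cast<double>(segments);
  return t;
}

// Indexed form: iteration i writes only x[i] and y[i], so iterations are independent.
void pointsIndexed(const EllipseParams &e, const std::vector<double> &t,
                   std::vector<double> &x, std::vector<double> &y)
{
  const double cosAzimuth = std::cos(e.azimuth);
  const double sinAzimuth = std::sin(e.azimuth);
  const int n = static_cast<int>(t.size());
  x.resize(t.size());
  y.resize(t.size());
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (int i = 0; i < n; ++i)
  {
    const double cosT = std::cos(t[i]);
    const double sinT = std::sin(t[i]);
    x[i] = e.centerX + e.semiMajor * cosT * cosAzimuth - e.semiMinor * sinT * sinAzimuth;
    y[i] = e.centerY + e.semiMajor * cosT * sinAzimuth + e.semiMinor * sinT * cosAzimuth;
  }
}

// Pointer form: the output pointers advance sequentially, one point per iteration.
void pointsSequential(const EllipseParams &e, const std::vector<double> &t,
                      std::vector<double> &x, std::vector<double> &y)
{
  const double cosAzimuth = std::cos(e.azimuth);
  const double sinAzimuth = std::sin(e.azimuth);
  x.resize(t.size());
  y.resize(t.size());
  double *xOut = x.data();
  double *yOut = y.data();
  for (double it : t)
  {
    const double cosT = std::cos(it);
    const double sinT = std::sin(it);
    *xOut++ = e.centerX + e.semiMajor * cosT * cosAzimuth - e.semiMinor * sinT * sinAzimuth;
    *yOut++ = e.centerY + e.semiMajor * cosT * sinAzimuth + e.semiMinor * sinT * cosAzimuth;
  }
}
```

The indexed form is the one that can be parallelised, because the pointer form carries a dependency between iterations through `xOut` and `yOut`. Both are expected to produce the same coordinates.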
@lbartoletti I'm confused if you put this standalone benchmark just to answer my question... So please bear with my "too long didn't read" attitude here. Basically I was just curious about the reason why this PR gives such perf boost. In short, is OpenMP implicitly used by your optimizer... ? |
No worries, it was just an interesting test for us!
No, I don't think so. |
The QGIS project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 14 days and is being automatically marked as "stale". If you think this pull request should be merged, please check
|
Can you do this so that this change can be merged? |
Force-pushed from 712b820 to 08ce466
done. Thanks |
Force-pushed from 08ce466 to bfa37d5
… angle for QgsEllipse Some benchmarks here: https://github.com/lbartoletti/lbartoletti.github.io/tree/master/archives/qgis/ellipse_benchmark
Force-pushed from bfa37d5 to 3c5f439
Description
This PR optimizes the pointsInternal method by caching trigonometric calculations used in ellipse point generation. The implementation precomputes sine and cosine values before generating ellipse points, reducing the number of expensive trigonometric function calls in the main loop.
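As a rough illustration of the precompute-then-generate approach described above, here is a minimal sketch. It is not the code from this PR: `TrigTable`, `precomputeTrig` and `ellipsePointsFromTable` are hypothetical names, the azimuth is assumed to be in radians, and Z/M values are omitted. Note that the conversation on this PR later favoured computing cos/sin inside the main loop instead of filling tables first.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the cached values; not a QGIS type.
struct TrigTable
{
  std::vector<double> cosT;
  std::vector<double> sinT;
};

// First pass: evaluate cos/sin once per angle and store the results.
TrigTable precomputeTrig(const std::vector<double> &t)
{
  TrigTable table;
  table.cosT.reserve(t.size());
  table.sinT.reserve(t.size());
  for (double angle : t)
  {
    table.cosT.push_back(std::cos(angle));
    table.sinT.push_back(std::sin(angle));
  }
  return table;
}

// Second pass: generate the points using only multiplications and additions.
void ellipsePointsFromTable(const TrigTable &table, double centerX, double centerY,
                            double semiMajor, double semiMinor, double azimuth,
                            std::vector<double> &x, std::vector<double> &y)
{
  const double cosAzimuth = std::cos(azimuth);
  const double sinAzimuth = std::sin(azimuth);
  const std::size_t n = table.cosT.size();
  x.resize(n);
  y.resize(n);
  for (std::size_t i = 0; i < n; ++i)
  {
    x[i] = centerX + semiMajor * table.cosT[i] * cosAzimuth - semiMinor * table.sinT[i] * sinAzimuth;
    y[i] = centerY + semiMajor * table.cosT[i] * sinAzimuth + semiMinor * table.sinT[i] * cosAzimuth;
  }
}
```

The trade-off in this form is two extra arrays and a second pass over the data, which is one plausible source of overhead at small segment counts.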
Performance testing shows significant improvements for large segment counts (>50), while introducing negligible overhead for the default segment count (~32). This trade-off is acceptable as it optimizes the most computationally intensive cases where high-resolution ellipses are required.
benchmark: