chore(optimization): Tiny optimization for pointsInternal #60138

lbartoletti · 2025-01-14T10:47:15Z

Description

This PR optimizes the pointsInternal method by caching trigonometric calculations used in ellipse point generation. The implementation precomputes sine and cosine values before generating ellipse points, reducing the number of expensive trigonometric function calls in the main loop.
Performance testing shows significant improvements for large segment counts (>50), while introducing negligible overhead for the default segment count (~32). This trade-off is acceptable as it optimizes the most computationally intensive cases where high-resolution ellipses are required.

benchmark:

236: Segments: 4
236: toLineString Time: 2 µs
236: toLineString2 Time: 3 µs
236: ------------------------------------
236: Segments: 8
236: toLineString Time: 2 µs
236: toLineString2 Time: 2 µs
236: ------------------------------------
236: Segments: 16
236: toLineString Time: 2 µs
236: toLineString2 Time: 2 µs
236: ------------------------------------
236: Segments: 32
236: toLineString Time: 4 µs
236: toLineString2 Time: 6 µs
236: ------------------------------------
236: Segments: 64
236: toLineString Time: 6 µs
236: toLineString2 Time: 5 µs
236: ------------------------------------
236: Segments: 128
236: toLineString Time: 12 µs
236: toLineString2 Time: 9 µs
236: ------------------------------------
236: Segments: 256
236: toLineString Time: 19 µs
236: toLineString2 Time: 17 µs
236: ------------------------------------
236: Segments: 512
236: toLineString Time: 37 µs
236: toLineString2 Time: 33 µs
236: ------------------------------------
236: Segments: 1024
236: toLineString Time: 73 µs
236: toLineString2 Time: 72 µs
236: ------------------------------------
236: Segments: 2048
236: toLineString Time: 153 µs
236: toLineString2 Time: 132 µs
236: ------------------------------------
236: Segments: 4096
236: toLineString Time: 303 µs
236: toLineString2 Time: 268 µs
236: ------------------------------------
236: Segments: 8192
236: toLineString Time: 607 µs
236: toLineString2 Time: 552 µs
236: ------------------------------------
236: Segments: 16384
236: toLineString Time: 1200 µs
236: toLineString2 Time: 1123 µs
236: ------------------------------------
236: Segments: 32768
236: toLineString Time: 2510 µs
236: toLineString2 Time: 2177 µs
236: ------------------------------------
236: Segments: 65536
236: toLineString Time: 4882 µs
236: toLineString2 Time: 4486 µs
236: ------------------------------------
236: Segments: 131072
236: toLineString Time: 10016 µs
236: toLineString2 Time: 9122 µs
236: ------------------------------------
236: Segments: 262144
236: toLineString Time: 19958 µs
236: toLineString2 Time: 18301 µs
236: ------------------------------------
236: Segments: 524288
236: toLineString Time: 39929 µs
236: toLineString2 Time: 36671 µs
236: ------------------------------------

github-actions · 2025-01-14T11:10:07Z

🪟 Windows builds

Download Windows builds of this PR for testing.
Debug symbols for this build are available here.
(Built from commit 3c5f439)

🪟 Windows Qt6 builds

Download Windows Qt6 builds of this PR for testing.
(Built from commit 3c5f439)

uclaros · 2025-01-14T16:12:48Z

Are those results on a release build?
I'd expect the compiler/cpu to render this kind of optimization unnecessary.

rouault · 2025-01-14T23:06:57Z

I'd expect the compiler/cpu to render this kind of optimization unnecessary.

I'm not sure the compiler (before C++26 which will declare them constexpr: https://en.cppreference.com/w/cpp/numeric/math/cos) can infer that the output of std::cos() doesn't change for a given output.
But instead of creating an array, which takes time, I'd suggest to just to just create temporary c = std::cos(t[i]) and s= std::sin(t[i]) variables in the loop.

reducing the number of expensive trigonometric function calls in the main loop.

hum, I would expect that your optimization would just divide the time by 2, not by 10. Did you look at the dissembled code to understand why you get such perf gain ? Which compiler do you use ? Maybe OpenMP is used and the array computation is multithreaded ? In any case adding a comment in the code explaining why the optimization works and under which circumstances would make later maintenance easier.

lbartoletti · 2025-01-21T07:47:31Z

Are those results on a release build?

@uclaros I'm live only in a Release mode 😉

I'd expect the compiler/cpu to render this kind of optimization unnecessary.

I'm not sure the compiler (before C++26 which will declare them constexpr: https://en.cppreference.com/w/cpp/numeric/
math/cos) can infer that the output of std::cos() doesn't change for a given output. But instead of creating an array, which takes time, I'd suggest to just to just create temporary c = std::cos(t[i]) and s= std::sin(t[i]) variables in the loop.

reducing the number of expensive trigonometric function calls in the main loop.

hum, I would expect that your optimization would just divide the time by 2, not by 10. Did you look at the dissembled code to understand why you get such perf gain ? Which compiler do you use ? Maybe OpenMP is used and the array computation is multithreaded ? In any case adding a comment in the code explaining why the optimization works and under which circumstances would make later maintenance easier.

@rouault I did a more extensive test, which you can find here. It's not real QGIS anymore, but I think it's still representative.

@ptitjano and I have been talking about testing OpenMP on QGIS for several years now, and I've been looking a lot at what's being done on GRASS. We're thinking of proposing a QEP within the year, but first we need to do some testing (and learn OpenMP properly).

In the meantime, for this test, you can see that depending on the number of threads used, OpenMP is interesting around 256 vertices. In any case, it's better to use method 2 with the creation of cos/sin in the loop, as you suggest, and not to pre-calculate it as I did.

So we have choice between:

OpenMP ready

for (int i = 0; i < static_cast<int>(segments); ++i) {
            const double cosT{ std::cos(t[i]) };
            const double sinT{ std::sin(t[i]) };
            x[i] = centerX + mSemiMajorAxis * cosT * cosAzimuth -
                mSemiMinorAxis * sinT * sinAzimuth;
            y[i] = centerY + mSemiMajorAxis * cosT * sinAzimuth +
                mSemiMinorAxis * sinT * cosAzimuth;
            if (hasZ) z[i] = centerZ;
            if (hasM) m[i] = centerM;
        }

Actual with cos/sin calculated only one time.

 for (double it : t) {
            const double cosT{ std::cos(it) };
            const double sinT{ std::sin(it) };
            *xOut++ = centerX + mSemiMajorAxis * cosT * cosAzimuth -
                mSemiMinorAxis * sinT * sinAzimuth;
            *yOut++ = centerY + mSemiMajorAxis * cosT * sinAzimuth +
                mSemiMinorAxis * sinT * cosAzimuth;
            if (zOut) *zOut++ = centerZ;
            if (mOut) *mOut++ = centerM;
        }

rouault · 2025-01-21T09:49:06Z

@rouault I did a more extensive test, which you can find here. It's not real QGIS anymore, but I think it's still representative.

@lbartoletti I'm confused if you put this standalone benchmark just to answer my question... So please bear with my "too long didn't read" attitude here. Basically I was just curious about the reason why this PR gives such perf boost. In short, is OpenMP implicitly used by your optimizer... ?

lbartoletti · 2025-01-29T06:59:16Z

@rouault I did a more extensive test, which you can find here. It's not real QGIS anymore, but I think it's still representative.

@lbartoletti I'm confused if you put this standalone benchmark just to answer my question... So please bear with my "too long didn't read" attitude here.

No worries, it was just an interesting test for us!

Basically I was just curious about the reason why this PR gives such perf boost. In short, is OpenMP implicitly used by your optimizer... ?

No, I don't think.

github-actions · 2025-02-13T02:46:48Z

The QGIS project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 14 days and is being automatically marked as "stale". If you think this pull request should be merged, please check

that all unit tests are passing
that all comments by reviewers have been addressed
that there is enough information for reviewers, in particular
- link to any issues which this pull request fixes
- add a description of workflows which this pull request fixes
- add screenshots if applicable
that you have written unit tests where possible
In case you should have any uncertainty, please leave a comment and we will be happy to help you proceed with this pull request.
If there is no further activity on this pull request, it will be closed in a week.

nyalldawson · 2025-02-14T22:49:46Z

@lbartoletti

But instead of creating an array, which takes time, I'd suggest to just to just create temporary c = std::cos(t[i]) and s= std::sin(t[i]) variables in the loop.

Can you do this so that this change can be merged?

lbartoletti · 2025-02-17T10:14:25Z

@lbartoletti

But instead of creating an array, which takes time, I'd suggest to just to just create temporary c = std::cos(t[i]) and s= std::sin(t[i]) variables in the loop.

Can you do this so that this change can be merged?

done. Thanks

… angle for QgsEllipse Some benchmarks here: https://github.com/lbartoletti/lbartoletti.github.io/tree/master/archives/qgis/ellipse_benchmark

github-actions bot added this to the 3.42.0 milestone Jan 14, 2025

lbartoletti self-assigned this Jan 14, 2025

lbartoletti added API API improvement only, no visible user interface changes backport release-3_34 backport release-3_40 labels Jan 14, 2025

agiudiceandrea removed the backport release-3_34 label Jan 19, 2025

github-actions bot added the stale Uh oh! Seems this work is abandoned, and the PR is about to close. label Feb 13, 2025

github-actions bot removed the stale Uh oh! Seems this work is abandoned, and the PR is about to close. label Feb 14, 2025

lbartoletti force-pushed the ellipse_pointsInternal_tinyoptim branch from 712b820 to 08ce466 Compare February 17, 2025 10:14

lbartoletti force-pushed the ellipse_pointsInternal_tinyoptim branch from 08ce466 to bfa37d5 Compare February 17, 2025 12:57

chore(optimization): Tiny optimization with pre-computed cos/sin from…

3c5f439

… angle for QgsEllipse Some benchmarks here: https://github.com/lbartoletti/lbartoletti.github.io/tree/master/archives/qgis/ellipse_benchmark

lbartoletti force-pushed the ellipse_pointsInternal_tinyoptim branch from bfa37d5 to 3c5f439 Compare February 19, 2025 09:28

ptitjano self-requested a review February 19, 2025 09:48

ptitjano approved these changes Feb 19, 2025

View reviewed changes

lbartoletti merged commit f08eda1 into qgis:master Feb 19, 2025
31 checks passed

lbartoletti deleted the ellipse_pointsInternal_tinyoptim branch February 19, 2025 13:08

qgis-bot mentioned this pull request Feb 19, 2025

[Backport release-3_40] chore(optimization): Tiny optimization for pointsInternal #60669

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(optimization): Tiny optimization for pointsInternal #60138

chore(optimization): Tiny optimization for pointsInternal #60138

lbartoletti commented Jan 14, 2025

github-actions bot commented Jan 14, 2025 •

edited

Loading

uclaros commented Jan 14, 2025

rouault commented Jan 14, 2025

lbartoletti commented Jan 21, 2025

rouault commented Jan 21, 2025

lbartoletti commented Jan 29, 2025

github-actions bot commented Feb 13, 2025

nyalldawson commented Feb 14, 2025

lbartoletti commented Feb 17, 2025

chore(optimization): Tiny optimization for pointsInternal #60138

chore(optimization): Tiny optimization for pointsInternal #60138

Conversation

lbartoletti commented Jan 14, 2025

Description

github-actions bot commented Jan 14, 2025 • edited Loading

🪟 Windows builds

🪟 Windows Qt6 builds

uclaros commented Jan 14, 2025

rouault commented Jan 14, 2025

lbartoletti commented Jan 21, 2025

rouault commented Jan 21, 2025

lbartoletti commented Jan 29, 2025

github-actions bot commented Feb 13, 2025

nyalldawson commented Feb 14, 2025

lbartoletti commented Feb 17, 2025

github-actions bot commented Jan 14, 2025 •

edited

Loading