gh-129695: Optimize `_PyFloat_FromDouble_ConsumeInputs()` #129697

nascheme · 2025-02-05T19:12:26Z

Add optimization for _PyFloat_FromDouble_ConsumeInputs() for the free-threaded build. Implement free-threaded versions of _Py_DECREF_SPECIALIZED and _Py_DECREF_NO_DEALLOC as well.

pyperformance results vs merge base

Based on a microbenchmark I wrote, this seems to give a small speedup for binary operations on floats if there are temporary results (refcnt == 1) that can be reused. The _Py_DECREF_SPECIALIZED and _Py_DECREF_NO_DEALLOC changes seem to help as well, but only a little (hard to measure accurately).
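To illustrate the temporary-reuse idea described above, here is a minimal, single-threaded sketch. It is a toy model, not the CPython implementation: `ToyFloat`, `toy_float_new`, `toy_decref`, and `toy_from_double_consume_inputs` are hypothetical names invented for this example. The free-threaded build additionally has to establish that the object is owned by the current thread and not shared, which this sketch does not attempt to model.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a boxed float. This is NOT CPython's PyFloatObject. */
typedef struct {
    long refcnt;
    double value;
} ToyFloat;

/* Allocate a new box holding `value` with a single reference. */
static ToyFloat *toy_float_new(double value)
{
    ToyFloat *f = malloc(sizeof(ToyFloat));
    if (f == NULL) {
        return NULL;
    }
    f->refcnt = 1;
    f->value = value;
    return f;
}

/* Drop one reference and free the box when the count reaches zero. */
static void toy_decref(ToyFloat *f)
{
    assert(f->refcnt > 0);
    if (--f->refcnt == 0) {
        free(f);
    }
}

/*
 * Consume one reference to each input and return a box holding `value`.
 * If an input has refcnt == 1, the caller holds the only reference, so the
 * box is a temporary and can be overwritten in place instead of allocating.
 */
static ToyFloat *toy_from_double_consume_inputs(ToyFloat *left,
                                                ToyFloat *right,
                                                double value)
{
    if (left->refcnt == 1) {
        left->value = value;   /* reuse left; its reference becomes the result */
        toy_decref(right);
        return left;
    }
    if (right->refcnt == 1) {
        right->value = value;  /* reuse right; its reference becomes the result */
        toy_decref(left);
        return right;
    }
    /* Both inputs are referenced elsewhere: allocate a fresh result. */
    ToyFloat *result = toy_float_new(value);
    toy_decref(left);
    toy_decref(right);
    return result;
}
```

In an expression such as `a * b + c`, the product is a temporary with a single reference, so under this scheme the addition could write its result into that box rather than allocating a new one.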

The longobject.c file also uses _Py_DECREF_SPECIALIZED, and I guess that might be the reason the pidigits benchmark got faster. I'm not sure why the pyflate benchmark got slower. Perhaps it is due to _Py_DECREF_SPECIALIZED taking the slow path too often.

Issue: Add free-threading optimization for _PyFloat_FromDouble_ConsumeInputs #129695

These calls need to be wrapped in Py_REF_DEBUG ifdefs.

colesbury · 2025-02-06T17:47:37Z

Objects/floatobject.c

-}
-
-#else // Py_GIL_DISABLED
-
 PyObject *_PyFloat_FromDouble_ConsumeInputs(_PyStackRef left, _PyStackRef right, double value)


I think we should inline this into the call sites, even at the cost of repeating code. I don't think the stackpointer manipulations will be correct otherwise. (cc @markshannon)

We should also use PyStackRef_CLOSE_SPECIALIZED instead of _Py_DECREF_SPECIALIZED. The same applies to _Py_DECREF_NO_DEALLOC, but I think that stackref variant will need to be added.

nascheme added 2 commits February 5, 2025 10:57

Optimize _PyFloat_FromDouble_ConsumeInputs()

b856732

Add optimization for ``_PyFloat_FromDouble_ConsumeInputs()`` for the free-threaded build. Implement free-threaded versions of ``_Py_DECREF_SPECIALIZED`` and ``_Py_DECREF_NO_DEALLOC`` as well.

Add NEWS.

970949e

bedevere-app bot mentioned this pull request Feb 5, 2025

Add free-threading optimization for _PyFloat_FromDouble_ConsumeInputs #129695

Open

Fix bug, _Py_DECREF_DecRefTotal().

ecf15ed

These calls need to be wrapped in Py_REF_DEBUG ifdefs.

nascheme marked this pull request as ready for review February 6, 2025 03:48

bedevere-app bot added the awaiting core review label Feb 6, 2025

nascheme requested a review from mpage February 6, 2025 03:49

colesbury reviewed Feb 6, 2025
