Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-129695: Optimize _PyFloat_FromDouble_ConsumeInputs() #129697

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

nascheme
Copy link
Member

@nascheme nascheme commented Feb 5, 2025

Add optimization for _PyFloat_FromDouble_ConsumeInputs() for the free-threaded build. Implement free-threaded versions of _Py_DECREF_SPECIALIZED and _Py_DECREF_NO_DEALLOC as well.

pyperformance results vs merge base

Based on a microbenchmark I wrote, this seems to give a small speed up for binary operations on floats, it there are temporary results (refcnt == 1) that can be re-used. The _Py_DECREF_SPECIALIZED and _Py_DECREF_SPECIALIZED changes seem to help as well but only a little (hard to accurately measure).

The longobject.c file also uses _Py_DECREF_SPECIALIZED and I guess that might be the reason the pidigits benchmark got faster. I not sure why the pyflate benchmark got slower. Perhaps due to _Py_DECREF_SPECIALIZED taking the slow path too often.

Add optimization for ``_PyFloat_FromDouble_ConsumeInputs()`` for the
free-threaded build.  Implement free-threaded versions of
``_Py_DECREF_SPECIALIZED`` and ``_Py_DECREF_NO_DEALLOC`` as well.
These calls need to be be wrapped in Py_REF_DEBUG ifdefs.
@nascheme nascheme marked this pull request as ready for review February 6, 2025 03:48
@nascheme nascheme requested a review from mpage February 6, 2025 03:49
}

#else // Py_GIL_DISABLED

PyObject *_PyFloat_FromDouble_ConsumeInputs(_PyStackRef left, _PyStackRef right, double value)
Copy link
Contributor

@colesbury colesbury Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should inline this into the call sites, even at the cost of repeating code. I don't think the stackpointer manipulations will be correct otherwise. (cc @markshannon)

We should also use PyStackRef_CLOSE_SPECIALIZED instead of _Py_DECREF_SPECIALIZED. The same applies to _Py_DECREF_NO_DEALLOC, but I think that stackref variant will need to be added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants