DES: How much perf penalty will we accept to get rid of libreduction? #40263

jbrockmendel · 2021-03-06T03:35:31Z

libreduction and the associated callers are a disproportionate maintenance headache [citation needed]. It would be nice to be able to rip it out and have just one path for those methods, but that would entail a non-trivial performance hit. Recently, though, we've managed to optimize the pure-python path a bit, and I'm optimistic we can shave off more of the difference.

The question: how much of a perf penalty are we willing to accept in order to remove libreduction?

Copying from #40171 (comment)

I'll throw out a number: if we could get worst-case down to within about 3x and most-cases within 1.5x, I'd be open to removing the cython paths. They are a disproportionate producer of headaches. (From #36459 (possibly out of date) "entirely disabling the cython path leads to 4 currently-xfailed tests passing")

Besides which, if/when cython3 becomes available, we may have to get rid of these anyway.
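One way to make the proposed criterion concrete, as a rough sketch: take each benchmark's slowdown as the pure-python time divided by the libreduction time, then require the worst case to stay within about 3x and the typical case within 1.5x. The helper name `acceptable_to_remove`, the input layout, and reading "most cases" as the median ratio are all assumptions for illustration; nothing like this exists in pandas or asv.

```python
from statistics import median


def acceptable_to_remove(timings, worst_case_max=3.0, typical_max=1.5):
    """Hypothetical check of the proposed thresholds.

    timings maps a benchmark name to (cython_time, python_time), both in the
    same unit. "Most cases" is interpreted as the median ratio (an assumption).
    Returns (passes, ratios).
    """
    # Slowdown factor per benchmark: pure-python path relative to libreduction.
    ratios = {name: slow / fast for name, (fast, slow) in timings.items()}
    worst = max(ratios.values())
    typical = median(ratios.values())
    passes = worst <= worst_case_max and typical <= typical_max
    return passes, ratios
```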

jorisvandenbossche · 2021-04-02T18:14:44Z

The referenced benchmark is:

import numpy as np
from pandas import DataFrame

N = 10 ** 4
labels = np.random.randint(0, 2000, size=N)
labels2 = np.random.randint(0, 3, size=N)
df = DataFrame(
    {
        "key": labels,
        "key2": labels2,
        "value1": np.random.randn(N),
        "value2": ["foo", "bar", "baz", "qux"] * (N // 4),
    }
)
df.groupby("key").apply(lambda x: 1)

Running this with current master, I get:

In [2]: %timeit df.groupby("key").apply(lambda x: 1)
5.45 ms ± 92.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

With libreduction's fast_apply disabled via this patch:

--- a/pandas/core/groupby/ops.py
+++ b/pandas/core/groupby/ops.py
@@ -390,6 +390,7 @@ class BaseGrouper:
             # for now -> relies on BlockManager internals
             pass
         elif (
+            False and
             com.get_callable_name(f) not in base.plotting_methods
             and isinstance(splitter, FrameSplitter)
             and axis == 0

I get the following timing:

In [4]: %timeit df.groupby("key").apply(lambda x: 1)
16.7 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

So a 3-4x slowdown.

However, the applied function is not doing anything useful (it just returns a constant), so this benchmark essentially measures only the overhead. Whether a 3-4x slowdown in that overhead matters in a real use case depends on how much of the total runtime the overhead accounts for.
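A simple way to see why the overhead ratio alone can overstate the real-world cost: if each group costs a fixed per-group overhead (different on the two paths) plus the applied function's own work (the same on both), the end-to-end slowdown is (o_slow + w) / (o_fast + w), which shrinks toward 1 as w grows. This linear cost model and the helper `end_to_end_slowdown` are assumptions for illustration, and the numbers below are made up, not measurements.

```python
def end_to_end_slowdown(fast_overhead, slow_overhead, work_per_group):
    """Ratio of total runtimes under a simple linear cost model (assumption).

    Per group: total = overhead (path-dependent) + work (path-independent).
    The number of groups cancels out, so it is not a parameter.
    """
    fast_total = fast_overhead + work_per_group
    if fast_total <= 0:
        raise ValueError("fast path total time must be positive")
    return (slow_overhead + work_per_group) / fast_total


# A 3x overhead penalty, with increasing amounts of real work per group.
for work in (0.0, 1.0, 10.0, 100.0):
    ratio = end_to_end_slowdown(1.0, 3.0, work)
    print(f"work per group = {work:g}: {ratio:.2f}x slower overall")
# prints 3.00x, 2.00x, 1.18x, 1.02x for work = 0, 1, 10, 100
```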

So let's use a slightly more complex example: calculating the mean of one of the columns (which is still a relatively simple/fast function, I think). With master, this gives:

In [4]: %timeit df.groupby("key").apply(lambda x: x['value1'].mean())
147 ms ± 5.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

with libreduction disabled, I get:

In [6]: %timeit df.groupby("key").apply(lambda x: x['value1'].mean())
182 ms ± 3.73 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So still slower (about 1.24x: 182 ms vs 147 ms), but no longer a 3-4x slowdown.
I think this slowdown is within the range of what's acceptable to get rid of libreduction.

(It would be useful to check whether these numbers are similar on other machines.)
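For collecting comparable numbers outside IPython, a stdlib-only sketch that approximates what `%timeit` reports: mean ± std. dev. of the per-loop time over several runs. The helper name `timeit_like` is hypothetical, the use of population standard deviation is an assumption, and unlike `%timeit` it does not auto-select the loop count; the defaults mirror the "7 runs, 100 loops each" shown above.

```python
import statistics
import timeit


def timeit_like(func, repeat=7, number=100):
    """Approximate a %timeit-style summary for a zero-argument callable.

    Runs `repeat` runs of `number` calls each and returns
    (mean, population std. dev.) of the per-call time in seconds.
    """
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.pstdev(per_call)


# With `df` built as in the benchmark above (requires pandas):
# mean, std = timeit_like(lambda: df.groupby("key").apply(lambda x: 1))
# print(f"{mean * 1e3:.2f} ms ± {std * 1e6:.1f} µs per loop")
```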

mroeschke · 2021-08-19T01:45:24Z

@jbrockmendel did #42992 close this?

jbrockmendel · 2021-08-19T04:06:23Z

Only half of it

jbrockmendel · 2021-10-02T03:07:32Z

Closed by #43189

jbrockmendel added the Bug and Needs Triage labels Mar 6, 2021

jbrockmendel mentioned this issue Mar 9, 2021

REF: de-duplicate Block.__init__ #38134

Merged


jbrockmendel mentioned this issue Mar 24, 2021

PERF: cache_readonly for Block properties #40620

Merged

This was referenced Apr 20, 2021

PERF: put BlockManager constructor in cython #40842

Merged

[ArrayManager] Add libreduction frame Slider for ArrayManager #40171

Closed

jbrockmendel mentioned this issue May 11, 2021

Cython 3.0 Checklist #34213

Open


jbrockmendel mentioned this issue Jul 7, 2021

PERF: Try fast/slow paths only once in DataFrameGroupby.transform #42195

Merged


jbrockmendel mentioned this issue Aug 23, 2021

REF: remove libreduction.SeriesBinGrouper #43189

Merged


jbrockmendel closed this as completed Oct 2, 2021
