Fix dask #89

Merged
merged 12 commits into from
Feb 27, 2024

lithomas1
Contributor

No description provided.

sort functionality (no `sort` or `argsort`), and limited support for the optional `linalg`
and `fft` extensions.

In particular, the `fft` namespace is not compliant with the array API spec. Any functions
Member

Out of interest, is the plan to implement array_api_compat.dask.{fft, linalg} or wait for support from dask itself? A similar question w.r.t JAX.

Contributor Author

I haven't attempted to wrap fft yet - waiting on #78 to do so.

Linalg can only be partially supported by us since some methods are missing in dask.

@lithomas1
Contributor Author

@asmeurer

-    if is_numpy_array(x):
+    if is_numpy_array(x) or is_dask_array(x):
+        # TODO: dask technically can support GPU arrays
+        # Detecting the array backend isn't easy for dask, though, so just return CPU for now
         return "cpu"
Member

As I noted on the other PR, it would probably be better to use some kind of basic DaskDevice object here instead of the string "cpu", given that CPU isn't necessarily an accurate description of the device dask is running on. See https://github.com/data-apis/array-api-strict/blob/main/array_api_strict/_array_object.py#L43-L49 for example.

Contributor Author

I now return "cpu" only if the array backing the dask array is a NumPy ndarray.

The rest of the time, I return a DaskDevice.

Is this something close to what you wanted?

(We might be able to do this for cupy, but it's tricky for e.g. multigpu cases I guess)

@asmeurer
Member

asmeurer commented Feb 9, 2024

@honno we've been getting some interesting errors from hypothesis:

___________________________________ test_std ___________________________________

x = array([ 1.34078079e+154,  1.34078079e+154, -1.19209290e-007,
        1.34078079e+154])
data = data(...)

    @given(
        x=hh.arrays(
            dtype=xps.floating_dtypes(),
            shape=hh.shapes(min_side=1),
            elements={"allow_nan": False},
        ).filter(lambda x: math.prod(x.shape) >= 2),
        data=st.data(),
    )
    def test_std(x, data):
        axis = data.draw(hh.axes(x.ndim), label="axis")
        _axes = sh.normalise_axis(axis, x.ndim)
        N = sum(side for axis, side in enumerate(x.shape) if axis not in _axes)
        correction = data.draw(
            st.floats(0.0, N, allow_infinity=False, allow_nan=False) | st.integers(0, N),
            label="correction",
        )
        _keepdims = data.draw(st.booleans(), label="keepdims")
>       kw = data.draw(
            hh.specified_kwargs(
                ("axis", axis, None),
                ("correction", correction, 0.0),
                ("keepdims", _keepdims, False),
            ),
            label="kw",
        )

array_api_tests/test_statistical_functions.py:207: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/strategies/_internal/core.py:2138: in draw
    result = self.conjecture_data.draw(strategy, observe_as=f"generate:{desc}")
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/internal/conjecture/data.py:1751: in draw
    return strategy.do_draw(self)
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/strategies/_internal/lazy.py:163: in do_draw
    return data.draw(self.wrapped_strategy)
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/internal/conjecture/data.py:1746: in draw
    return strategy.do_draw(self)
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/strategies/_internal/core.py:1765: in do_draw
    return self.definition(data.draw, *self.args, **self.kwargs)
array_api_tests/hypothesis_helpers.py:506: in specified_kwargs
    if value is not default or draw(booleans()):
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/internal/conjecture/data.py:1746: in draw
    return strategy.do_draw(self)
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/strategies/_internal/lazy.py:163: in do_draw
    return data.draw(self.wrapped_strategy)
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/internal/conjecture/data.py:1746: in draw
    return strategy.do_draw(self)
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/strategies/_internal/misc.py:123: in do_draw
    return data.draw_boolean()
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/internal/conjecture/data.py:1665: in draw_boolean
    self.observer.draw_boolean(
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/internal/conjecture/datatree.py:884: in draw_boolean
    self.draw_value("boolean", value, was_forced=was_forced, kwargs=kwargs)
/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/internal/conjecture/datatree.py:952: in draw_value
    inconsistent_generation()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def inconsistent_generation():
>       raise Flaky(
            "Inconsistent data generation! Data generation behaved differently "
            "between different runs. Is your data generation depending on external "
            "state?"
        )
E       hypothesis.errors.Flaky: Inconsistent data generation! Data generation behaved differently between different runs. Is your data generation depending on external state?

/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/internal/conjecture/datatree.py:50: Flaky

During handling of the above exception, another exception occurred:

    @given(
>       x=hh.arrays(
            dtype=xps.floating_dtypes(),
            shape=hh.shapes(min_side=1),
            elements={"allow_nan": False},
        ).filter(lambda x: math.prod(x.shape) >= 2),
        data=st.data(),
    )

array_api_tests/test_statistical_functions.py:191: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <hypothesis.internal.conjecture.datatree.TreeRecordingObserver object at 0x7fc7f8ed7d90>
status = Status.INTERESTING
interesting_origin = InterestingOrigin(exc_type=<class 'hypothesis.errors.Flaky'>, filename='/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/internal/conjecture/datatree.py', lineno=50, context=(), group_elems=())

    def conclude_test(self, status, interesting_origin):
        """Says that ``status`` occurred at node ``node``. This updates the
        node if necessary and checks for consistency."""
        if status == Status.OVERRUN:
            return
        i = self.__index_in_current_node
        node = self.__current_node
    
        if i < len(node.values) or isinstance(node.transition, Branch):
            inconsistent_generation()
    
        new_transition = Conclusion(status, interesting_origin)
    
        if node.transition is not None and node.transition != new_transition:
            # As an, I'm afraid, horrible bodge, we deliberately ignore flakiness
            # where tests go from interesting to valid, because it's much easier
            # to produce good error messages for these further up the stack.
            if isinstance(node.transition, Conclusion) and (
                node.transition.status != Status.INTERESTING
                or new_transition.status != Status.VALID
            ):
>               raise Flaky(
                    f"Inconsistent test results! Test case was {node.transition!r} "
                    f"on first run but {new_transition!r} on second"
                )
E               hypothesis.errors.Flaky: Inconsistent test results! Test case was Conclusion(status=Status.VALID, interesting_origin=None) on first run but Conclusion(status=Status.INTERESTING, interesting_origin=InterestingOrigin(exc_type=<class 'hypothesis.errors.Flaky'>, filename='/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/internal/conjecture/datatree.py', lineno=50, context=(), group_elems=())) on second

/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/hypothesis/internal/conjecture/datatree.py:1007: Flaky

Any idea what is going on here?

setup.py Outdated
@@ -8,7 +8,7 @@
 setup(
     name='array_api_compat',
     version=array_api_compat.__version__,
-    packages=find_packages(include=['array_api_compat*']),
+    packages=find_namespace_packages(include=["array_api_compat*"]),
Member

What is the purpose of this change?

Contributor Author

array_api_compat.dask doesn't contain an __init__.py file, so setuptools ended up skipping it when doing an install.

From the setuptools docs: [image]

IIUC, this doesn't show up in the CI since the CI runs array-api-compat from inside the checked out repo.

Member

Just add an __init__.py file. I don't think it's a good idea to mess with namespace packages.

Contributor Author

Gotcha, I reverted the change.
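
The discovery rule behind this thread can be illustrated with a stdlib-only sketch. `find_regular_packages` is a hypothetical helper that only roughly mimics how setuptools' `find_packages()` treats directories (a directory counts as a package only if it contains `__init__.py`); it is not the setuptools implementation, and the directory layout is made up for the demonstration.

```python
import tempfile
from pathlib import Path


def find_regular_packages(root):
    """Return dotted names of directories under `root` that hold __init__.py.

    Directories without __init__.py are skipped and not descended into.
    """
    found = []

    def walk(directory, prefix):
        for child in sorted(directory.iterdir()):
            if child.is_dir() and (child / "__init__.py").is_file():
                name = prefix + child.name
                found.append(name)
                walk(child, name + ".")

    walk(Path(root), "")
    return found


# Hypothetical layout mirroring the situation described in the thread.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "array_api_compat"
    (pkg / "numpy").mkdir(parents=True)
    (pkg / "dask").mkdir()
    (pkg / "__init__.py").touch()
    (pkg / "numpy" / "__init__.py").touch()

    # dask/ has no __init__.py yet, so it is not discovered.
    before = find_regular_packages(tmp)

    # Adding the file, as suggested in review, makes it discoverable.
    (pkg / "dask" / "__init__.py").touch()
    after = find_regular_packages(tmp)

print(before)
print(after)
```

Under this rule the `dask` subpackage only appears in the second listing, which matches the fix chosen here: add an `__init__.py` rather than switch to namespace package discovery.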

@asmeurer
Member

Just pushed a merge conflict fix from the recent PR I just merged. A bunch of improvements to the linalg tests were recently merged to array-api-tests and it looks like there are some issues with the dask linalg wrappers. I know most of the linalg tests are skipped, but you may want to have another look at them. For example, there is this test failure:

    | Traceback (most recent call last):
    |   File "/Users/aaronmeurer/Documents/array-api-tests/array_api_tests/test_linalg.py", line 127, in test_cholesky
    |     res = linalg.cholesky(x, **kw)
    |           ^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/aaronmeurer/Documents/array-api-compat/array_api_compat/_internal.py", line 28, in wrapped_f
    |     return f(*args, xp=xp, **kwargs)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/aaronmeurer/Documents/array-api-compat/array_api_compat/common/_linalg.py", line 63, in cholesky
    |     if get_xp(xp)(isdtype)(U.dtype, 'complex floating'):
    |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/aaronmeurer/Documents/array-api-compat/array_api_compat/_internal.py", line 28, in wrapped_f
    |     return f(*args, xp=xp, **kwargs)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/aaronmeurer/Documents/array-api-compat/array_api_compat/common/_aliases.py", line 540, in isdtype
    |     return xp.issubdtype(dtype, xp.complexfloating)
    |            ^^^^^^^^^^^^^
    | AttributeError: module 'dask.array' has no attribute 'issubdtype'
    | Falsifying example: test_cholesky(
    |     x=dask.array<eye, shape=(0, 0), dtype=float32, chunksize=(0, 0), chunktype=numpy.ndarray>,  # or any other generated value
    |     kw={'upper': True},
    | )

which looks like a bug in the wrapper code.

Also, some of the tests fail and are not currently skipped here, so they will need to either be fixed or skipped as well.

@lithomas1
Contributor Author

lithomas1 commented Feb 26, 2024

Thanks, can you hold off on merging anything else for now (to avoid more conflicts)?

I'll try to get everything passing this evening/tomorrow morning.
(I wanna do one full runthrough of the tests including the ones that are now xpassing).

@asmeurer
Member

Certainly. Sorry for all the churn here that's been difficult to keep up with.

@lithomas1
Contributor Author

CI should be as green as it gets at this point.

I ended up skipping most of the failing linalg tests; at least they don't affect what I'm trying with scikit-learn (svd is the important thing there).

@asmeurer asmeurer merged commit 560d189 into data-apis:main Feb 27, 2024
20 of 27 checks passed
@lithomas1 lithomas1 deleted the fix-dask branch February 27, 2024 23:23
@lithomas1
Contributor Author

Thanks for the reviews!
