Add specification for the __binsparse__ protocol #912

Draft: wants to merge 7 commits into main
4 changes: 4 additions & 0 deletions .gitignore
@@ -35,3 +35,7 @@ tmp/
*.egg
dist/
.DS_STORE

# pixi environments
.pixi
*.egg-info
33 changes: 28 additions & 5 deletions spec/draft/design_topics/data_interchange.rst
@@ -85,17 +85,40 @@ page gives a high-level specification for data exchange in Python using DLPack.
below. They are not required to return an array object from ``from_dlpack``
which conforms to this standard.

binsparse: Extending to sparse arrays
-------------------------------------

Sparse arrays can be represented in memory by a collection of 1-dimensional and 2-dimensional
dense arrays, alongside metadata describing how to interpret these arrays. This allows us to reuse
the DLPack protocol for exchanging the constituent arrays. The accompanying metadata has already
been specified by the `binsparse specification <https://graphblas.org/binsparse-specification/>`_.
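
As an illustration, the sketch below decomposes a small dense matrix into the constituent arrays of the binsparse CSR format (``pointers_to_1``, ``indices_1`` and ``values``) plus a descriptor ``dict``. It is a minimal sketch that assumes plain Python lists stand in for the 1-dimensional dense arrays a library would exchange via DLPack, and that values are ``float64``; the helper ``dense_to_csr`` is hypothetical and not part of any library or of this specification.

```python
def dense_to_csr(rows):
    """Hypothetical helper: split a dense 2-D list into CSR arrays plus metadata.

    Plain Python lists stand in for the 1-dimensional dense arrays that a real
    library would exchange via DLPack.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    pointers, indices, values = [0], [], []
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:  # only nonzero entries are stored explicitly
                indices.append(j)
                values.append(v)
        # pointers[i + 1] - pointers[i] is the number of stored values in row i
        pointers.append(len(values))
    descriptor = {
        "binsparse": {
            "version": "0.1",
            "format": "CSR",
            "shape": [n_rows, n_cols],
            "number_of_stored_values": len(values),
            "data_types": {
                "pointers_to_1": "uint64",
                "indices_1": "uint64",
                "values": "float64",  # assumption: the input holds floats
            },
        },
    }
    arrays = {"pointers_to_1": pointers, "indices_1": indices, "values": values}
    return descriptor, arrays


descriptor, arrays = dense_to_csr([[1.0, 0.0, 2.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
print(arrays)
# {'pointers_to_1': [0, 2, 2, 3], 'indices_1': [0, 2, 1], 'values': [1.0, 2.0, 3.0]}
```

The metadata is all a consumer needs to reinterpret the three flat arrays as a 3 x 3 matrix; the arrays themselves carry no sparse structure.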

While initially intended to target file formats, binsparse places relatively few requirements on
back-ends:

1. The ability to represent and parse JSON.
2. The ability to represent a key-value store of 1-dimensional (and optionally 2-dimensional)
   arrays.

It is the only such specification for sparse representations with requirements this minimal.
Python can satisfy both: the former with the built-in ``json`` module or a Python ``dict``, and
the latter with the DLPack protocol.
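
To make the two requirements concrete, the following minimal sketch parses a JSON descriptor with the built-in ``json`` module and checks it against a key-value store of 1-dimensional arrays. It assumes the standard-library ``array.array`` type as a stand-in for DLPack-capable arrays; the typecode mapping ``BINSPARSE_TO_TYPECODE`` and the helper ``check_store`` are hypothetical illustrations, not part of binsparse or of this standard.

```python
import json
from array import array

# Hypothetical mapping from a few binsparse type names to array typecodes.
BINSPARSE_TO_TYPECODE = {"uint64": "Q", "int64": "q", "float32": "f", "float64": "d"}

descriptor_json = """
{
  "binsparse": {
    "version": "0.1",
    "format": "CVEC",
    "shape": [30],
    "number_of_stored_values": 3,
    "data_types": {"indices_0": "uint64", "values": "float32"}
  }
}
"""

# Requirement 1: represent and parse JSON (here, into a plain dict).
descriptor = json.loads(descriptor_json)

# Requirement 2: a key-value store of 1-dimensional arrays.
store = {
    "indices_0": array("Q", [2, 7, 29]),
    "values": array("f", [1.5, -2.0, 4.0]),
}


def check_store(descriptor, store):
    """Return True if store holds exactly the arrays and element types the descriptor lists."""
    data_types = descriptor["binsparse"]["data_types"]
    if data_types.keys() != store.keys():
        return False
    return all(
        store[name].typecode == BINSPARSE_TO_TYPECODE[dtype]
        for name, dtype in data_types.items()
    )


print(check_store(descriptor, store))  # True
```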

.. note::
   See the `RFC to adopt binsparse <https://github.com/data-apis/array-api/issues/840>`_
   for discussion that preceded the adoption of the binsparse protocol.

See :ref:`sparse_interchange` for the Python specification of this protocol.


Non-supported use cases
-----------------------

Use of DLPack requires that the data can be represented by a strided, in-memory
layout on a single device. This covers usage by a large range of, but not all,
known and possible array libraries. Use cases that are not supported by DLPack
include distributed arrays, i.e., the data residing on multiple nodes or devices.

There may be other reasons why it is not possible or desirable for an
implementation to materialize the array as strided data in memory. In such
1 change: 1 addition & 0 deletions spec/draft/extensions/index.rst
@@ -32,3 +32,4 @@ the array API standard. See :ref:`api-specification`.

fourier_transform_functions
linear_algebra_functions
sparse_interchange
99 changes: 99 additions & 0 deletions spec/draft/extensions/sparse_interchange.rst
@@ -0,0 +1,99 @@
.. _sparse_interchange:

Sparse interchange
==================

Array API specification for sparse interchange functions using `binsparse <https://graphblas.org/binsparse-specification/>`_.

Extension name and usage
------------------------

If implemented, this extension must be retrievable via::

    >>> xp = x.__array_namespace__()
    >>> if hasattr(xp, 'sparse'):
    ...     # Use the extension
    ...     pass

To convert an array from another library that also supports the sparse interchange extension::

    >>> x1 = xp1.sparse.from_binsparse(xp2_array)  # Keep the format in which xp2_array is stored
    >>> x1 = xp1.sparse.from_binsparse(xp2_array, descriptor=binsparse_descriptor)  # Convert to the described format
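
The protocol can be illustrated with a toy producer and consumer. The sketch below is an illustration under loud assumptions, not a conforming implementation: the class ``ToyCOO`` and the function ``toy_from_binsparse`` are hypothetical, plain Python lists stand in for DLPack-capable arrays, and only the ``descriptor=None`` path is supported. It does exercise the invariant stated for ``__binsparse__``: the keys of ``data_types`` in the descriptor match the keys of the returned array store.

```python
class ToyCOO:
    """Hypothetical 2-D COO array exposing the two binsparse protocol methods."""

    def __init__(self, shape, rows, cols, values):
        self.shape = tuple(shape)
        self._rows, self._cols, self._values = list(rows), list(cols), list(values)

    def __binsparse_descriptor__(self):
        return {
            "binsparse": {
                "version": "0.1",
                "format": "COOR",
                "shape": list(self.shape),
                "number_of_stored_values": len(self._values),
                "data_types": {
                    "indices_0": "uint64",
                    "indices_1": "uint64",
                    "values": "float64",
                },
            },
        }

    def __binsparse__(self, /, *, descriptor=None):
        if descriptor is not None:
            # This toy producer cannot convert between formats.
            raise TypeError("format conversion is not supported by ToyCOO")
        return {"indices_0": self._rows, "indices_1": self._cols, "values": self._values}


def toy_from_binsparse(x):
    """Hypothetical consumer: rebuild a dense nested list from a COOR export."""
    desc = x.__binsparse_descriptor__()["binsparse"]
    arrays = x.__binsparse__()
    # Invariant: descriptor keys must match the exported keys.
    if desc["data_types"].keys() != arrays.keys():
        raise ValueError("descriptor and exported arrays disagree")
    if desc["format"] != "COOR":
        raise TypeError(f"unsupported format: {desc['format']}")
    n_rows, n_cols = desc["shape"]
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, v in zip(arrays["indices_0"], arrays["indices_1"], arrays["values"]):
        dense[i][j] = v
    return dense


x = ToyCOO((2, 3), rows=[0, 1], cols=[2, 0], values=[5.0, 7.0])
print(toy_from_binsparse(x))  # [[0.0, 0.0, 5.0], [7.0, 0.0, 0.0]]
```

A real consumer would build its own sparse array from the constituent arrays (via DLPack) rather than densify them; densifying here just makes the result easy to inspect.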

.. _binsparse_descriptor_examples:

Examples of binsparse descriptors
---------------------------------

While the `binsparse specification <https://graphblas.org/binsparse-specification/>`_ uses JSON for its descriptor,
we will work with equivalent Python objects instead. Here are some examples of binsparse descriptors::

    >>> # library_name and library_version stand for the exporting library's name and version
    >>> coo_2d_descriptor = {
    ...     "binsparse": {
    ...         "version": "0.1",
    ...         "format": "COOR",
    ...         "shape": [10, 12],
    ...         "number_of_stored_values": 20,
    ...         "data_types": {
    ...             "indices_0": "uint64",
    ...             "indices_1": "uint64",
    ...             "values": "float32",
    ...         },
    ...     },
    ...     "original_source": f"{library_name!s}, version {library_version!s}",
    ... }
    >>> csr_2d_descriptor = {
    ...     "binsparse": {
    ...         "version": "0.1",
    ...         "format": "CSR",
    ...         "shape": [20, 24],
    ...         "number_of_stored_values": 20,
    ...         "data_types": {
    ...             "pointers_to_1": "uint64",
    ...             "indices_1": "uint64",
    ...             "values": "float32",
    ...         },
    ...     },
    ...     "original_source": f"{library_name!s}, version {library_version!s}",
    ... }
    >>> compressed_vector_descriptor = {
    ...     "binsparse": {
    ...         "version": "0.1",
    ...         "format": "CVEC",
    ...         "shape": [30],
    ...         "number_of_stored_values": 3,
    ...         "data_types": {
    ...             "indices_0": "uint64",
    ...             "values": "float32",
    ...         },
    ...     },
    ...     "original_source": f"{library_name!s}, version {library_version!s}",
    ... }
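
For the formats above, the descriptor alone determines how long each constituent array must be. The sketch below encodes that relationship for ``COOR``, ``CSR`` and ``CVEC`` only; the helper ``expected_lengths`` is hypothetical, and the lengths follow the usual COO/CSR layouts (one row pointer per row plus one, and one index and one value per stored element).

```python
def expected_lengths(descriptor):
    """Hypothetical helper: expected length of each constituent array.

    Covers only the COOR, CSR and CVEC formats used in the examples.
    """
    meta = descriptor["binsparse"]
    fmt = meta["format"]
    nnz = meta["number_of_stored_values"]
    shape = meta["shape"]
    if fmt == "COOR":
        # One (row, column, value) triple per stored element.
        return {"indices_0": nnz, "indices_1": nnz, "values": nnz}
    if fmt == "CSR":
        # Row i's entries live in positions pointers_to_1[i]:pointers_to_1[i + 1].
        return {"pointers_to_1": shape[0] + 1, "indices_1": nnz, "values": nnz}
    if fmt == "CVEC":
        # One (index, value) pair per stored element.
        return {"indices_0": nnz, "values": nnz}
    raise ValueError(f"format not covered by this sketch: {fmt!r}")


csr = {
    "binsparse": {
        "version": "0.1",
        "format": "CSR",
        "shape": [20, 24],
        "number_of_stored_values": 20,
        "data_types": {"pointers_to_1": "uint64", "indices_1": "uint64", "values": "float32"},
    },
}
print(expected_lengths(csr))  # {'pointers_to_1': 21, 'indices_1': 20, 'values': 20}
```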

Objects in API
--------------

.. currentmodule:: array_api

A conforming implementation of this extension must provide and support the following
functions/methods. In addition, the ``asarray`` function must be able to convert
objects that implement the protocol and use a supported format.

..
   NOTE: please keep the functions and their inverse together

.. currentmodule:: array_api.sparse

.. autosummary::
   :toctree: generated
   :template: method.rst

   from_binsparse

.. currentmodule:: array_api

.. autosummary::
   :toctree: generated
   :template: method.rst

   array.__binsparse__
   array.__binsparse_descriptor__
1 change: 1 addition & 0 deletions src/array_api_stubs/_draft/__init__.py
@@ -16,6 +16,7 @@
from .utility_functions import *
from . import linalg
from . import fft
from . import sparse
from .info import __array_namespace_info__


47 changes: 47 additions & 0 deletions src/array_api_stubs/_draft/array_object.py
@@ -1246,5 +1246,52 @@ def to_device(
Clarified behavior when a provided ``device`` object corresponds to the device on which an array instance resides.
"""

    def __binsparse_descriptor__(self) -> dict:
        """
        Returns a ``dict`` equivalent to a parsed `binsparse JSON descriptor <https://graphblas.org/binsparse-specification/>`_.

        Parameters
        ----------
        self: array
            array instance.

        Returns
        -------
        out: dict
            A ``dict`` equivalent to a parsed JSON binsparse descriptor of the array. See :ref:`sparse_interchange` for details.
        """

    def __binsparse__(
        self, /, *, descriptor: Optional[dict] = None
    ) -> dict[str, array]:
        """
        Returns a key-value store of the constituent arrays of a sparse array, as specified by the `binsparse specification <https://graphblas.org/binsparse-specification/>`_.

        Parameters
        ----------
        self: array
            array instance.
        descriptor: Optional[dict]
            If ``descriptor`` is not ``None``, the returned arrays must be in the format it specifies. Default: ``None``.

        Returns
        -------
        out: dict[str, array]
            A ``dict`` mapping the name of each constituent array (e.g., ``"indices_0"`` or ``"values"``) to the corresponding array. See :ref:`sparse_interchange` for details.

        Raises
        ------
        TypeError
            If ``descriptor`` is not ``None`` and the array library does not support converting to the format it specifies.
        ValueError
            If ``descriptor`` is not a valid binsparse descriptor.

        Notes
        -----

        - ``x.__binsparse_descriptor__()["binsparse"]["data_types"].keys() == x.__binsparse__().keys()`` must hold.
        - ``descriptor["binsparse"]["data_types"].keys() == x.__binsparse__(descriptor=descriptor).keys()`` must hold.
        """


array = _array
78 changes: 78 additions & 0 deletions src/array_api_stubs/_draft/sparse.py
@@ -0,0 +1,78 @@
from __future__ import annotations

from typing import Optional
from ._types import array, device

__all__ = ["from_binsparse"]


def from_binsparse(
    x: object,
    /,
    *,
    descriptor: Optional[dict] = None,
    device: Optional[device] = None,
    copy: Optional[bool] = None,
) -> array:
"""
Returns a new array containing the data from another (array) object with a ``__binsparse__`` method,
assuming the format specified in `descriptor` is supported in this library.

Parameters
----------
x: object
input (array) object.
descriptor: Optional[dict]
If ``descriptor`` is ``None``, the array must be returned in the format in which it is stored or materializable to.
Otherwise, it must be converted to the format specified by ``descriptor``.

If ``copy`` is ``False``, no conversion should be performed, and only stored data should be returned.

If the format specified by ``descriptor`` is unsupported by the library, a ``TypeError`` must be raised.
device: Optional[device]
device on which to place the created array. If ``device`` is ``None`` and ``x`` supports binsparse, the output array
must be on the same device as ``x``. Default: ``None``.

The v2023.12 standard only mandates that a compliant library should offer a way for ``from_binsparse`` to return an array
whose underlying memory is accessible to the Python interpreter, when the corresponding ``device`` is provided. If the
array library does not support such cases at all, the function must raise ``BufferError``. If a copy must be made to
enable this support but ``copy`` is set to ``False``, the function must raise ``ValueError``.

Other device kinds will be considered for standardization in a future version of this API standard.
copy: Optional[bool]
boolean indicating whether or not to copy the input. If ``True``, the function must always copy. If ``False``, the function must never copy, and raise ``BufferError`` in case a copy is deemed necessary (e.g. if a cross-device data movement is requested, and it is not possible without a copy). If ``None``, the function must reuse the existing memory buffer if possible and copy otherwise. Default: ``None``.


    Returns
    -------
    out: array
        an array containing the data in ``x``, in the format specified by ``descriptor`` (or, if ``descriptor`` is ``None``, in the format of ``x``).

        .. admonition:: Note
           :class: note

           The returned array may be either a copy or a view. See :ref:`data-interchange` for details.

    Raises
    ------
    BufferError
        The ``__binsparse__``, ``__binsparse_descriptor__``, ``__dlpack__`` or ``__dlpack_device__``
        methods on the input array or constituent arrays may raise ``BufferError`` when the data
        cannot be exported as a binsparse-compatible array (e.g., incompatible dtype, strides, or
        device). They may also raise other errors when export fails for other reasons (e.g., not
        enough memory available to materialize the data). ``from_binsparse`` must propagate such
        exceptions.
    AttributeError
        If the ``__binsparse__`` and ``__binsparse_descriptor__`` methods are not present
        on the input array. This may happen for libraries that are never able
        to export their data with binsparse.
    ValueError
        If data exchange is possible via an explicit copy but ``copy`` is set to ``False``, or if the specified
        descriptor is not valid.
    TypeError
        If ``descriptor`` is ``None`` and the data received from the source library is in a format that
        the target array library does not support. Additionally, if ``descriptor`` is not ``None``, it must
        be passed along to ``__binsparse__``, which may raise a ``TypeError`` if the conversion is unsupported
        by the source library; ``from_binsparse`` must propagate this error.
    """