From 79b5ca477a96273151c084862632d8889c2addbf Mon Sep 17 00:00:00 2001
From: Denis Smirnov
Date: Tue, 15 Feb 2022 08:52:16 -0600
Subject: [PATCH] Add page 'Compute follows data' to docs

---
 doc/compute_follows_data.rst | 71 ++++++++++++++++++++++++++++++++++++
 doc/index.rst                |  1 +
 2 files changed, 72 insertions(+)
 create mode 100644 doc/compute_follows_data.rst

diff --git a/doc/compute_follows_data.rst b/doc/compute_follows_data.rst
new file mode 100644
index 000000000000..92e768e39f64
--- /dev/null
+++ b/doc/compute_follows_data.rst
@@ -0,0 +1,71 @@
+Compute Follows Data
+====================
+
+Compute follows data means that computation happens on the device where the data is placed.
+Assume we placed the input data for an algorithm we want to run into GPU memory.
+The algorithm will then be run on the GPU and
+the resulting data will be placed into GPU memory, just like the input data.
+
+Data knows which device it is located on and which execution queue is associated with that device.
+Based on that queue, an algorithm is run on the respective device.
+
+In practice we don't need to know the execution queue.
+We only need to know the device on which we want to run an algorithm.
+DPNP provides array constructors like :obj:`dpnp.array`
+that accept the parameters ``device``, ``usm_type`` and ``sycl_queue``.
+These parameters specify where the data is placed.
+DPNP also provides functions that don't have such parameters, e.g. :obj:`dpnp.matmul`.
+These functions take the device and queue from their input data.
+
+Consider an example where we create two arrays in GPU memory and
+compute their matrix product on the GPU.
+
+.. code-block:: python
+    :linenos:
+
+    import dpnp
+
+    x = dpnp.array([[1, 1], [1, 1]], device="gpu")
+    y = dpnp.array([[1, 1], [1, 1]], device="gpu")
+
+    res = dpnp.matmul(x, y)
+
+The resulting array is placed in GPU memory, just like the input arrays.
+
+.. code-block:: python
+    :linenos:
+    :lineno-start: 7
+
+    res_device = res.get_array().sycl_device
+
+    res_device.filter_string  # 'opencl:gpu:0'
+
+Compute follows data prevents computation on data located on different devices.
+In other words, compute follows data prevents implicit cross-device data copying.
+Such copying would make computation expensive.
+
+Consider an example where we create two arrays on different devices and
+try to compute their matrix product. As a result, ``ValueError`` is raised.
+
+.. code-block:: python
+    :linenos:
+    :lineno-start: 10
+
+    x = dpnp.array([[1, 1], [1, 1]], device="gpu")
+    y = dpnp.array([[1, 1], [1, 1]], device="cpu")
+
+    res = dpnp.matmul(x, y)  # ValueError: execution queue could not be determined ...
+
+The execution queue cannot be determined because ``x`` and ``y`` are located on different devices.
+To avoid the ``ValueError``, we can explicitly copy ``y`` into GPU memory before the function call.
+
+.. code-block:: python
+    :linenos:
+    :lineno-start: 14
+
+    y = dpnp.asarray(y, device="gpu")
+
+    res = dpnp.matmul(x, y)
+
+For more information about available devices, please refer to
+`DPCtl's devices documentation `_.
diff --git a/doc/index.rst b/doc/index.rst
index f6a33e9a58cb..d405c22f0bd3 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -35,3 +35,4 @@ along with their limitations.
    reference/index
    dpnp_backend_api
    dpctl
+   compute_follows_data
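The usage pattern the new page documents can also be exercised end to end as one standalone script. The sketch below is illustrative only and is not part of the patch itself: it assumes dpnp is installed, that the machine exposes both a ``cpu`` and a ``gpu`` SYCL device, and it additionally passes ``usm_type="device"`` (a USM allocation kind) to the constructors, which the page mentions as a parameter but does not demonstrate.

.. code-block:: python

    import dpnp

    # Both operands are allocated explicitly on the GPU; "gpu" is a SYCL
    # filter selector and "device" is a USM allocation kind.
    x = dpnp.array([[1, 1], [1, 1]], device="gpu", usm_type="device")
    y = dpnp.array([[1, 1], [1, 1]], device="gpu", usm_type="device")

    # No device or queue is passed here: matmul follows the data and runs
    # on the execution queue that x and y already share.
    res = dpnp.matmul(x, y)
    print(res.get_array().sycl_device.filter_string)

    # Data created on another device has to be copied over explicitly
    # before it can participate in the computation.
    z = dpnp.array([[2, 2], [2, 2]], device="cpu")
    z = dpnp.asarray(z, device="gpu")
    print(dpnp.matmul(res, z).get_array().sycl_device.filter_string)

Because functions such as ``dpnp.matmul`` take their queue from their operands, moving data across devices is always an explicit ``dpnp.asarray`` call rather than something the library does silently.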