 Features of fsspec
 ==================
 
-Consistent API to many different storage backends. The general API and functionality were
-proven with the projects `s3fs`_ and `gcsfs`_ (along with `hdfs3`_ and `adlfs`_), within the
-context of Dask and independently. These have been tried and tested by many users and shown their
-usefulness over some years. ``fsspec`` aims to build on these and unify their models, as well
-as extract out file-system handling code from Dask which does not so comfortably fit within a
-library designed for task-graph creation and their scheduling.
-
-.. _s3fs: https://s3fs.readthedocs.io/en/latest/
-.. _gcsfs: https://gcsfs.readthedocs.io/en/latest/
-.. _hdfs3: https://hdfs3.readthedocs.io/en/latest/
-.. _adlfs: https://docs.microsoft.com/en-us/azure/data-lake-store/
-
 Here follows a brief description of some features of note of ``fsspec`` that provides to make
 it an interesting project beyond some other file-system abstractions.
 
@@ -50,20 +38,31 @@ the initiation of the context which actually does the work of creating file-like
         # f is now a real file-like object holding resources
         f.read(...)
 
-Random Access and Buffering
----------------------------
-
-The :func:`fsspec.spec.AbstractBufferedFile` class is provided as an easy way to build file-like
-interfaces to some service which is capable of providing blocks of bytes. This class is derived
-from in a number of the existing implementations. A subclass of ``AbstractBufferedFile`` provides
-random access for the underlying file-like data (without downloading the whole thing) and
-configurable read-ahead buffers to minimise the number of the read operations that need to be
-performed on the back-end storage.
+File Buffering and random access
+--------------------------------
 
-This is also a critical feature in the big-data access model, where each sub-task of an operation
+Most implementations create file objects which derive from ``fsspec.spec.AbstractBufferedFile``, and
+have many behaviours in common. A subclass of ``AbstractBufferedFile`` provides
+random access for the underlying file-like data (without downloading the whole thing).
+This is a critical feature in the big-data access model, where each sub-task of an operation
 may need on a small part of a file, and does not, therefore want to be forced into downloading the
 whole thing.
 
+These files offer buffering of both read and write operations, so that
+communication with the remote resource is limited. The size of the buffer is generally configured
+with the ``blocksize=`` kwarg at open time, although the implementation may have some minimum or
+maximum sizes that need to be respected.
+
+For reading, a number of buffering schemes are available, listed in ``fsspec.caching.caches``
+(see :ref:`readbuffering`), or "none" for no buffering at all, e.g., for a simple read-ahead
+buffer, you can do
+
+.. code-block:: python
+
+    fs = fsspec.filesystem(...)
+    with fs.open(path, mode='rb', cache_type='readahead') as f:
+        use_for_something(f)
+
 Transparent text-mode and compression
 -------------------------------------
 
@@ -195,25 +194,6 @@ is called, so that subsequent listing of the given paths will force a refresh. I
 addition, some methods like ``ls`` have a ``refresh`` parameter to force fetching
 the listing again.
 
-File Buffering
---------------
-
-Most implementations create file objects which derive from ``fsspec.spec.AbstractBufferedFile``, and
-have many behaviours in common. These files offer buffering of both read and write operations, so that
-communication with the remote resource is limited. The size of the buffer is generally configured
-with the ``blocksize=`` kwargs at open time, although the implementation may have some minimum or
-maximum sizes that need to be respected.
-
-For reading, a number of buffering schemes are available, listed in ``fsspec.caching.caches``
-(see :ref:`readbuffering`), or "none" for no buffering at all, e.g., for a simple read-ahead
-buffer, you can do
-
-.. code-block:: python
-
-    fs = fsspec.filesystem(...)
-    with fs.open(path, mode='rb', cache_type='readahead') as f:
-        use_for_something(f)
-
 URL chaining
 ------------
 
@@ -344,10 +324,10 @@ shown (or if none are selected, all files are shown).
 
 The interface provides the following outputs:
 
-- ``.urlpath``: the currently selected item (if any)
-- ``.storage_options``: the value of the kwargs box
-- ``.fs``: the current filesystem instance
-- ``.open_file()``: produces an ``OpenFile`` instance for the current selection
+#. ``.urlpath``: the currently selected item (if any)
+#. ``.storage_options``: the value of the kwargs box
+#. ``.fs``: the current filesystem instance
+#. ``.open_file()``: produces an ``OpenFile`` instance for the current selection
 
 Configuration
 -------------
 
@@ -388,16 +368,16 @@ the style ``FSSPEC_{protocol}_{kwargname}=value``.
 
 Configuration is determined in the following order, with later items winning:
 
-- the contents of ini files, and json files in the config directory, sorted
-  alphabetically
-- environment variables
-- the contents of ``fsspec.config.conf``, which can be edited at runtime
-- kwargs explicitly passed, whether with ``fsspec.open``, ``fsspec.filesystem``
-  or directly instantiating the implementation class.
+#. the contents of ini files, and json files in the config directory, sorted
+   alphabetically
+#. environment variables
+#. the contents of ``fsspec.config.conf``, which can be edited at runtime
+#. kwargs explicitly passed, whether with ``fsspec.open``, ``fsspec.filesystem``
+   or directly instantiating the implementation class.
 
 
 Asynchronous
-============
+------------
 
 Some implementations, those deriving from ``fsspec.asyn.AsyncFileSystem``, have
 async/coroutine implementations of some file operations. The async methods have