 A Sky Storage object represents an abstract data store containing large data
 files required by the task. Compared to file_mounts, storage is faster and
@@ -9,13 +10,20 @@ Behind the scenes, storage automatically uploads all data in the source
 to a backing object store in a particular cloud (S3/GCS/Azure Blob).
 
 A storage object is used by "mounting" it to a task. On mounting, the data
-specified in the source becomes available at the destination mount_path.
-Please note that sky.Storage does not guarantee preservation of file
-permissions - you may need to set file permissions during task execution.
+specified in the source becomes available at the destination mount path.
+
+A storage object can be used in either :code:`MOUNT` mode or :code:`COPY` mode.
+
+* In :code:`MOUNT` mode, the backing store is directly "mounted" to the remote VM.
+  I.e., files are fetched when accessed by the task and files written to the
+  mount path are also written to the remote store.
+
+* In :code:`COPY` mode, the files are pre-fetched and cached on the local disk.
+  Writes are not replicated on the remote store.
 
 .. note::
-    Sky file mounting currently does not support syncing writes.
-    Any writes made at a mounted folder will not reflect at the mounting source.
+    sky.Storage does not guarantee preservation of file
+    permissions - you may need to set file permissions during task execution.
 
 Using Sky Storage
 -----------------
@@ -27,141 +35,229 @@ the files to a cloud store (e.g. S3, GCS) and have them persist there by
 specifying the :code:`name`, :code:`source` and :code:`persistent` fields. By
 enabling persistence, file_mount sync can be made significantly faster.
 
-.. note::
-    Symbolic links are handled differently in :code:`file_mounts` depending on whether Sky Storage is used. For mounts backed by Sky Storage, referenced data for all symbolic links is copied to remote. For mounts not using Sky Storage (e.g., those using rsync) the symbolic links are directly copied. Their targets must be separately mounted or else the symlinks may break.
+Your usage of Sky Storage can fall under four broad use cases:
+
+1. **You want to upload your local data to a remote VM -** specify the name and
+   source fields. Name sets the bucket name that will be used, and source
+   specifies the local path to be uploaded.
+
+2. **You want to mount an existing S3/GCS bucket to your remote VM -** specify
+   just the source field (e.g., s3://my-bucket/).
+
+3. **You want to have a writable path to directly write files to S3 buckets
+   -** specify a name (to create a bucket if it doesn't exist) and set the mode
+   to MOUNT. This is useful for writing code outputs, such as checkpoints or
+   logs, directly to an S3 bucket.
+
+4. **You want to have a shared file system across workers running on different
+   nodes -** specify a name (to create a bucket if it doesn't exist) and set
+   the mode to MOUNT. This will create an empty scratch space that workers
+   can write to. Any writes will show up on all workers' mount points.
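
As a rough sketch, the four use cases could be written as entries in a task's :code:`file_mounts` section as follows. Only the :code:`name`, :code:`source` and :code:`mode` fields come from the description above; every mount path, bucket name and local path here is a hypothetical placeholder.

.. code-block:: yaml

    file_mounts:
      # 1. Upload local data: name sets the bucket, source is the local path.
      /data:
        name: my-dataset-bucket      # hypothetical bucket name
        source: ~/my-local-data      # hypothetical local path

      # 2. Mount an existing bucket: only source is specified.
      /existing:
        source: s3://my-bucket/

      # 3. Writable path for outputs such as checkpoints or logs.
      /outputs:
        name: my-output-bucket       # hypothetical; created if it doesn't exist
        mode: MOUNT

      # 4. Shared scratch space visible to all workers.
      /shared:
        name: my-scratch-bucket      # hypothetical; created if it doesn't exist
        mode: MOUNT
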
+
+When specifying a storage object, you can choose either of two modes:
+
+- :code:`mode: MOUNT` (default)
+    This mode directly mounts the bucket at the specified path on the VM.
+    In effect, files are streamed from the backing source bucket as and when
+    they are accessed by applications. This mode also allows applications to
+    write to the mount path. All writes are replicated to the remote bucket (and
+    any other VMs mounting the same bucket). Please note that this mode
+    uses a close-to-open consistency model, which means a file write is
+    committed to the backing store only after :code:`close()` is called on it.
+
+- :code:`mode: COPY`
+    This mode pre-fetches your files from remote storage and caches them on the
+    local disk. Note that in this mode, any writes to the mount path are not
+    replicated to the source bucket.
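
To make the difference between the two modes concrete, here is a minimal sketch of a task that reads inputs in :code:`COPY` mode and writes outputs in :code:`MOUNT` mode. The bucket names, mount paths and the :code:`run` command are illustrative assumptions, not taken from the change above.

.. code-block:: yaml

    file_mounts:
      # COPY: contents are pre-fetched to local disk before the task runs.
      # Anything written under /inputs stays on the VM only.
      /inputs:
        source: s3://my-bucket/          # hypothetical existing bucket
        mode: COPY

      # MOUNT: files are streamed on access and writes go back to the bucket.
      /checkpoints:
        name: my-checkpoint-bucket       # hypothetical; created if it doesn't exist
        mode: MOUNT

    run: |
      # Under close-to-open consistency, this write is committed to the
      # bucket only once the shell closes the file after the redirect.
      echo "step 100 done" > /checkpoints/progress.txt

A long-running process that keeps a file open under a :code:`MOUNT` path should therefore close it (or write to a fresh file per checkpoint) when other VMs need to see the data.
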
+
+Here are a few examples covering a range of use cases for sky file_mounts
+and storage mounting:
 
 .. code-block:: yaml
 
     name: storage-demo
 
     resources:
       cloud: aws
-      instance_type: m5.2xlarge
+
 
     file_mounts:
-      # This uses rsync to directly copy files from your machine to the remote
-      # VM at /datasets. Since this uses rsync, the ~/datasets folder is
-      # uploaded on each execution.
+      # *** Copying files from local ***
+      #
+      # This uses rsync to directly copy files from your machine to the remote VM at
+      # /datasets.
       /datasets: ~/datasets
 
+      # *** Copying files from S3 ***
+      #
+      # This re-uses a predefined bucket (public bucket used here, but can be
+      # private) and copies its contents directly to /datasets-s3.
+      /datasets-s3: s3://enriched-topical-chat
+
+      # *** Copying files from GCS ***
+      #
+      # This copies a single object (train-00001-of-01024) from a remote cloud