Commit 10b4b79

Touches on docs. (skypilot-org#684)

* Touches on docs.
* touches on yaml-spec
* update --gpus=all
* extend underline
1 parent 5da7c4d commit 10b4b79

File tree

6 files changed, +83 -52 lines changed

6 files changed

+83
-52
lines changed

docs/source/examples/iterative-dev-project.rst

+5 -4

@@ -5,17 +5,18 @@ Iteratively Developing a Project
 This page shows a typical workflow for iteratively developing and running a
 project on Sky.
 
-Provisioning a VM
+Getting an interactive node
 ------------------
-To provision a GPU-based :ref:`interactive node <interactive-nodes>` named :code:`dev`, run
+:ref:`Interactive nodes <interactive-nodes>` are easy-to-spin-up VMs that enable **fast development and interactive debugging**.
+
+To provision a GPU interactive node named :code:`dev`, run
 
 .. code-block:: console
 
    $ # Provisions/reuses an interactive node with a single K80 GPU.
    $ sky gpunode -c dev --gpus K80
 
-Interactive nodes are easy-to-spin-up VMs that allow for fast development and interactive debugging.
-See the :ref:`CLI reference <sky-gpunode>` for all configuration options.
+See the :ref:`CLI reference <sky-gpunode>` for all flags, such as changing the GPU type and count.
 
 Running code
 --------------------

docs/source/getting-started/installation.rst

+13 -0

@@ -12,6 +12,9 @@ Install Sky using pip:
    $ cd sky
    $ pip install ".[all]"
 
+   $ # To install AWS dependencies only:
+   $ # pip install ".[aws]"
+
 Sky currently supports three major cloud providers: AWS, GCP, and Azure. If you
 only have access to certain clouds, use any combination of
 :code:`".[aws,azure,gcp]"` (e.g., :code:`".[aws,gcp]"`) to reduce the

@@ -83,3 +86,13 @@ This will produce a summary like:
   Azure: enabled
 
 Sky will use only the enabled clouds to run tasks. To change this, configure cloud credentials, and run ``sky check``.
+
+Requesting quotas for first-time users
+--------------------------------------
+
+If your cloud account has not been used to launch instances before, the
+respective quotas are likely set to zero or a low limit. This is especially
+true for GPU instances.
+
+Please follow :ref:`Requesting Quota Increase` to check quotas and request quota
+increases before proceeding.

docs/source/reference/auto-stop.rst

+5 -4

@@ -2,12 +2,13 @@
 Auto-stopping
 =============
 
-Sky's **auto-stopping** can automatically stop a cluster after a few minutes of idleness.
+Sky's **auto-stopping** automatically stops a cluster after it becomes idle.
+
 With auto-stopping, users can simply submit jobs and leave their laptops, while
 **ensuring no unnecessary spending occurs**: after jobs have finished, the
-cluster(s) used will be automatically stopped (and restarted later).
+cluster(s) used will be automatically stopped (and can be restarted later).
 
-To setup auto-stopping for a cluster, use :code:`sky autostop`:
+To schedule auto-stopping for a cluster, use :code:`sky autostop`:
 
 .. code-block:: bash

@@ -40,7 +41,7 @@ To view the status of the cluster, use ``sky status [--refresh]``:
   # Refresh the status for auto-stopping
   sky status --refresh
   NAME       LAUNCHED    RESOURCES           STATUS   AUTOSTOP  COMMAND
-  mycluster  11 min ago  2x AWS(m4.2xlarge)  STOPPED  -         sky launch -d -c ...
+  mycluster  11 min ago  2x AWS(m4.2xlarge)  STOPPED  10 min    sky launch -d -c ...
 
 
 :code:`sky status` shows the cached statuses, which can be outdated for clusters with auto-stopping scheduled. To query the real statuses of clusters with auto-stopping scheduled, use :code:`sky status --refresh`.
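The auto-stop rule described in this section (stop a cluster once it has been idle for the configured number of minutes) can be sketched in a few lines. This is an illustrative simplification, not Sky's actual implementation, which also accounts for running and pending jobs; the function name and signature are hypothetical:

```python
from datetime import datetime, timedelta

def should_autostop(last_activity: datetime, idle_minutes: int,
                    now: datetime) -> bool:
    """Return True once a cluster has been idle for `idle_minutes`.

    Hypothetical sketch of the auto-stop rule; Sky's real logic also
    checks for running and pending jobs before stopping a cluster.
    """
    return now - last_activity >= timedelta(minutes=idle_minutes)

# The `mycluster` example above: stopped after 10 idle minutes.
now = datetime(2022, 4, 1, 12, 0)
print(should_autostop(now - timedelta(minutes=11), 10, now))  # True
print(should_autostop(now - timedelta(minutes=5), 10, now))   # False
```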

docs/source/reference/job-queue.rst

+16 -0

@@ -78,6 +78,22 @@ Use :code:`sky exec mycluster task.yaml` to submit this task, which will be sche
 
 See :ref:`Distributed Jobs on Many VMs` for more details.
 
+Using ``CUDA_VISIBLE_DEVICES``
+--------------------------------
+
+The environment variable ``CUDA_VISIBLE_DEVICES`` is automatically set to
+the devices allocated to each task on each node. The variable is set
+when a task's ``run`` commands are invoked.
+
+For example, if ``task.yaml`` above launches a 4-GPU task on each node that
+has 8 GPUs, the task's ``run`` commands will be invoked with
+``CUDA_VISIBLE_DEVICES`` populated with 4 device IDs.
+
+If your ``run`` commands use Docker/``docker run``, simply pass ``--gpus=all``;
+the correct environment variable is made available to the container (only the
+allocated device IDs will be set).
+
 Scheduling behavior
 --------------------------------
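The ``CUDA_VISIBLE_DEVICES`` behavior documented in the hunk above can be illustrated with a short Python sketch. Note this is a hypothetical illustration: inside a task's ``run`` commands the variable is already set by Sky, so here we set it ourselves to simulate the 4-of-8-GPU example:

```python
import os

# Hypothetical illustration: Sky sets this variable before `run` is
# invoked; we set it manually to simulate a 4-GPU task on an 8-GPU node.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

def allocated_gpu_ids() -> list:
    """Parse CUDA_VISIBLE_DEVICES into a list of integer device IDs."""
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [int(i) for i in value.split(",") if i]

ids = allocated_gpu_ids()
print(f"Allocated {len(ids)} GPUs: {ids}")  # Allocated 4 GPUs: [0, 1, 2, 3]
```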

docs/source/reference/quota.rst

+2 -2

@@ -14,7 +14,7 @@ AWS
 
 1. Go to the `EC2 Quotas console <https://console.aws.amazon.com/servicequotas/home/services/ec2/quotas>`_.
 2. **Select a region** on the top right.
-3. Choose an EC2 instance type from the list (e.g., ``Running On-Demand P instances``). Use ``sky show-gpus --cloud aws --all`` or check `here <https://aws.amazon.com/ec2/instance-types/>`_ for more instance types.
+3. Choose an EC2 instance type from the list (e.g., ``Running On-Demand P instances`` or ``All P Spot Instance Requests``). Use ``sky show-gpus --cloud aws --all`` or check `here <https://aws.amazon.com/ec2/instance-types/>`_ for more instance types.
 4. Click the quota name, and then choose **Request quota increase**.
 5. For **Change quota value**, enter the new value.
 6. Choose **Request**.

@@ -43,4 +43,4 @@ GCP
 3. Choose ``Limit Name: instance_name`` (e.g., ``NVIDIA-V100-GPUS-per-project-region``). You may check `here <https://cloud.google.com/compute/quotas#gpu_quota>`_ for a complete GPU list.
 4. Select the checkbox of the region whose quota you want to change.
 5. Click **Edit Quotas** and fill out the new limit.
-6. Click **Submit Request**.
+6. Click **Submit Request**.

docs/source/reference/yaml-spec.rst

+42 -42

@@ -8,84 +8,84 @@ describe all fields available.
 
 .. code-block:: yaml
 
-  # Task name (optional), used in the job queue.
+  # Task name (optional), used for display purposes.
   name: my-task
 
   # Working directory (optional), synced to ~/sky_workdir on the remote cluster
   # each time launch or exec is run with the yaml file.
   #
-  # NOTE: Sky does not currently support large, multi-gigabyte workdirs as the
-  # files are synced to the remote VM with `rsync`. Please consider using Sky
-  # Storage to transfer large datasets and files.
+  # Commands in "setup" and "run" will be executed under it.
   #
   # If a .gitignore file (or a .git/info/exclude file) exists in the working
-  # directory, files and directories listed in those files will be ignored.
+  # directory, files and directories listed in it will be excluded from syncing.
   workdir: ~/my-task-code
 
-  # Number of nodes (optional) to launch including the head node. If not
-  # specified, defaults to 1. The specified resource requirements are identical
-  # across all nodes.
+  # Number of nodes (optional; defaults to 1) to launch, including the head node.
+  #
+  # A task can set this to a smaller value than the size of a cluster.
   num_nodes: 4
 
   # Per-node resource requirements (optional).
   resources:
-    cloud: aws  # A cloud (optional) can be specified, if desired.
+    cloud: aws  # The cloud to use (optional).
 
-    # Accelerator requirements (optional) can be specified, use `sky show-gpus`
-    # to view available accelerator configurations.
-    # This specifies the accelerator type and the count per node. Format:
-    # <name>:<cnt> or <name> (short for a count of 1).
+    # Accelerator name and count per node (optional).
+    #
+    # Use `sky show-gpus` to view available accelerator configurations.
+    #
+    # Format: <name>:<count> (or simply <name>, short for a count of 1).
     accelerators: V100:4
 
-    # Accelerator arguments (optional) provides additional metadata for some
-    # accelerators, such as the TensorFlow version for TPUs.
-    accelerator_args:
-      tf_version: 2.5.0
+    # Instance type to use (optional). If 'accelerators' is specified,
+    # the corresponding instance type is automatically inferred.
+    instance_type: p3.8xlarge
 
-    # Specify whether the cluster should use spot instances or not (optional).
-    # If unspecified, Sky will default to on-demand instances.
+    # Whether the cluster should use spot instances (optional).
+    # If unspecified, defaults to False (on-demand instances).
     use_spot: False
 
     # Disk size in GB to allocate for OS (mounted at /). Increase this if you
     # have a large working directory or tasks that write out large outputs.
     disk_size: 256
 
-  # Using Sky Storage, you can specify file mounts (all optional).
+    # Additional accelerator metadata (optional); only used for TPUs.
+    accelerator_args:
+      tf_version: 2.5.0
+      tpu_name: mytpu
+
   file_mounts:
-    # This uses rsync to directly copy files from your machine to the remote
-    # VM at /remote/path/datasets. Rsync will copy symlinks as symlinks. The
-    # symlink targets must also be synced using file_mounts to ensure they are
-    # functional.
+    # Uses rsync to copy local files to all nodes of the cluster.
+    #
+    # If symlinks are present, they are copied as symlinks, and their targets
+    # must also be synced using file_mounts to ensure correctness.
     /remote/path/datasets: /local/path/datasets
 
-    # This uses Sky Storage to first create a S3 bucket named sky-dataset,
-    # copies the contents of /local/path/datasets to the remote bucket and makes the
-    # bucket persistent (i.e., the bucket is not deleted after the completion of
-    # this sky task, and future invocations of this bucket will be much faster).
-    # The bucket is mounted at /datasets-storage. Symlink contents are copied over.
+    # Uses Sky Storage to create an S3 bucket named sky-dataset, uploads the
+    # contents of /local/path/datasets to the bucket, and marks the bucket
+    # as persistent (it will not be deleted after the completion of this task).
+    # Symlink contents are copied over.
+    #
+    # Mounts the bucket at /datasets-storage on every node of the cluster.
     /datasets-storage:
       name: sky-dataset
       source: /local/path/datasets
-      force_stores: [s3]  # Could be [s3, gcs], [gcs] default: None
-      persistent: True  # Defaults to True, can be set to false.
+      force_stores: [s3]  # Could be [s3, gcs], [gcs]; default: None
+      persistent: True  # Defaults to True; can be set to False.
 
-    # This re-uses a predefined bucket (sky-dataset, defined above) and mounts it
-    # directly at datasets-s3.
-    /datasets-s3: s3://sky-dataset
+    # Copies a cloud object store URI to the cluster. Can be a private bucket.
+    /datasets-s3: s3://my-awesome-dataset
 
-  # A setup script (optional) can be provided to run when a cluster is provisioned or a
-  # task is launched. Alternatively, a single setup command can be provided by removing |
-  # and using the following syntax:
-  # setup: pip install -r requirements.txt
+  # Setup script (optional) to execute on every `sky launch`.
+  # This is executed before the 'run' commands.
+  #
+  # The '|' separator indicates a multiline string. To specify a single command:
+  #   setup: pip install -r requirements.txt
   setup: |
     echo "Begin setup."
     pip install -r requirements.txt
    echo "Setup complete."
 
-  # A task script (optional, but recommended) is the main script to run on the
-  # cluster. Alternatively, a single run command can be provided by removing |
-  # and using the following syntax:
-  # run: python train.py
+  # Main program (optional, but recommended) to run on every node of the cluster.
   run: |
     echo "Beginning task."
     python train.py
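The ``accelerators`` format documented in the hunk above (``<name>:<count>``, or a bare name as shorthand for a count of 1) can be parsed with a few lines of Python. This is an illustrative sketch only, not Sky's actual parser:

```python
def parse_accelerators(spec: str):
    """Parse an accelerator spec such as 'V100:4' into (name, count).

    A bare name (e.g. 'V100') is short for a count of 1, matching the
    format described in the YAML spec comments. Illustrative only.
    """
    name, sep, count = spec.partition(":")
    return name, int(count) if sep else 1

print(parse_accelerators("V100:4"))  # ('V100', 4)
print(parse_accelerators("V100"))    # ('V100', 1)
```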
