Commit bf0aa61

Docs: polish quickstart, interactive-nodes (skypilot-org#491)
* Polish interactive-nodes.rst
* Polish quickstart, interactive-nodes
* Address comments
* Address comments
1 parent f37ffd4 commit bf0aa61

File tree: 3 files changed (+137 −79 lines changed)

docs/source/getting-started/quickstart.rst (+59 −43)
@@ -20,64 +20,64 @@ Complete the :ref:`installation instructions <installation>` before continuing w
 
 Provisioning your first cluster
 --------------------------------
-We'll start by launching our first cluster on Sky using an interactive node.
-Interactive nodes are standalone machines that can be used like any other VM instance,
-but are easy to configure without any additional setup.
+We'll start by launching our first cluster on Sky using an :ref:`interactive
+node <interactive-nodes>`. Interactive nodes are easy-to-spin-up VMs that allow
+for fast development and interactive debugging.
 
 Let's provision an instance with a single K80 GPU.
 
-.. code-block:: bash
+.. code-block:: console
 
-   # Provisions/reuses an interactive node with a single K80 GPU.
-   sky gpunode -c mygpu --gpus K80
+   $ # Provisions/reuses an interactive node with a single K80 GPU.
+   $ sky gpunode -c mygpu --gpus K80
 
-Provisioning takes a few minutes, after which you're automatically logged in:
+Provisioning should take a few minutes, after which you're automatically logged in:
 
-.. code-block:: bash
+.. code-block:: console
 
   Last login: Wed Feb 23 22:35:47 2022 from 136.152.143.101
 
   ubuntu@ip-172-31-86-108:~$ gpustat
+
   ip-172-31-86-108  Wed Feb 23 22:42:43 2022  450.142.00
   [0] Tesla K80 | 31°C, 0 % | 0 / 11441 MB |
 
-Press :code:`Ctrl-D` to log out. On your machine, query :code:`sky status` for all provisioned clusters:
+Press :code:`Ctrl-D` to log out. On your machine, use :code:`sky status` to query all provisioned clusters:
 
-.. code-block:: bash
+.. code-block:: console
 
-   sky status
+   $ sky status
 
   NAME   LAUNCHED        RESOURCES                     COMMAND                          STATUS
   mygpu  a few secs ago  1x Azure(Standard_NC6_Promo)  sky gpunode -c mygpu --gpus K80  UP
 
 To log back in, simply type :code:`ssh mygpu`.
 
-After you are done, run :code:`sky down mygpu` to terminate the cluster. Find more commands
-on managing the lifecycle of clusters :ref:`here <interactive-nodes>`.
-
-Sky can also provision interactive CPU and TPU nodes with :code:`cpunode` and :code:`tpunode`.
-Please see our :ref:`CLI reference <cli>` for all configuration options. For more information on
-using and managing interactive nodes, check out our :ref:`reference documentation <interactive-nodes>`.
+After you are done, run :code:`sky down mygpu` to terminate the cluster. See
+:ref:`here <interactive-nodes>` for other types of interactive nodes and
+commands that manage the lifecycle of clusters.
 
 
 Hello, Sky!
 -----------
-You can also define tasks to be executed by Sky. We'll define our very first task
-to be a simple hello world program.
-
-We can specify the following task attributes with a YAML file:
+Next, let's define our very first task, a simple hello world program, to be
+executed by Sky. We can specify the following task attributes with a YAML file:
 
-- :code:`resources` (optional): what cloud resources the task must be run on (e.g., accelerators, instance type, etc.)
-- :code:`workdir` (optional): specifies working directory containing project code that is synced with the provisioned instance(s)
+- :code:`resources` (optional): cloud resources the task must be run on (e.g., accelerators, instance type, etc.)
+- :code:`workdir` (optional): the working directory containing project code that will be synced to the provisioned instance(s)
 - :code:`setup` (optional): commands that must be run before the task is executed
-- :code:`run` (optional): specifies the commands that must be run as the actual ask
+- :code:`run` (optional): commands that run the actual task
 
 .. note::
 
-    For large, multi-gigabyte workdirs (e.g. large datasets in your working directory), uploading may take time as the files are synced to the remote VM with :code:`rsync`. If you have certain files in your workdir that you would like to have excluded from upload, consider including them in your `.gitignore` file. For large datasets and files, consider using :ref:`Sky Storage <sky-storage>` to speed up transfers.
+    For large, multi-gigabyte workdirs (e.g., large datasets in your working
+    directory), uploading may be slow as the files are synced to the remote VM(s)
+    with :code:`rsync`. To exclude large files in your workdir from being uploaded,
+    add them to your :code:`.gitignore` file. To upload large datasets and files, consider using :ref:`Sky
+    Storage <sky-storage>` to speed up transfers.
 
 Below is a minimal task YAML that prints "hello sky!" and shows installed Conda environments,
-requiring an NVIDIA Tesla K80 GPU on AWS. See more example YAML files in the `repo <https://github.com/sky-proj/sky/tree/master/examples>`_, with a fully-complete example documented :ref:`here <yaml-spec>`.
+requiring an NVIDIA Tesla K80 GPU on AWS. See more example YAML files in the `repository <https://github.com/sky-proj/sky/tree/master/examples>`_ and a fully-complete YAML example :ref:`here <yaml-spec>`.
 
 .. code-block:: yaml
@@ -86,8 +86,8 @@ requiring an NVIDIA Tesla K80 GPU on AWS. See more example YAML files in the `re
    resources:
       # Optional; if left out, pick from the available clouds.
       cloud: aws
-
-      accelerators: V100:1  # 1x NVIDIA V100 GPU
+      # 1x NVIDIA V100 GPU
+      accelerators: V100:1
 
    # Working directory (optional) containing the project codebase.
    # This directory will be synced to ~/sky_workdir on the provisioned cluster.
@@ -102,11 +102,13 @@ requiring an NVIDIA Tesla K80 GPU on AWS. See more example YAML files in the `re
       echo "hello sky!"
       conda env list
 
-Sky handles selecting an appropriate VM based on user-specified resource
-constraints, launching the cluster on an appropriate cloud provider, and
-executing the task.
-
-To launch a task based on our above YAML spec, we can use :code:`sky launch`.
+**To launch a task** based on a YAML spec, use :code:`sky launch`. This command
+performs the heavy lifting: (1) it selects an appropriate cloud and VM based on
+the specified resource constraints, (2) provisions (or reuses) a cluster on that
+cloud, (3) uploads the :code:`workdir`, (4) executes the :code:`setup` commands,
+and (5) executes the :code:`run` commands.
 
 .. code-block:: console
@@ -118,7 +120,19 @@ exists, a new cluster with that name will be provisioned. If no cluster name is
 provided, (e.g., :code:`sky launch hello_sky.yaml`), a cluster name will be
 autogenerated.
 
-We can view our existing clusters by running :code:`sky status`:
+**To execute a task on an existing cluster**, use :code:`sky exec`:
+
+.. code-block:: console
+
+   $ sky exec mycluster hello_sky.yaml
+
+This command is more lightweight: it simply executes the task's :code:`run`
+commands. :code:`workdir` is also synced every time :code:`sky exec` is run, so
+that the task may use updated code. Bash commands are also supported, such as
+:code:`sky exec mycluster htop`.
+
+
+**To view existing clusters**, use :code:`sky status`:
 
 .. code-block:: console
@@ -130,25 +144,27 @@ This may show multiple clusters, if you have created several:
 
   NAME       LAUNCHED     RESOURCES             COMMAND                                 STATUS
   gcp        1 day ago    1x GCP(n1-highmem-8)  sky cpunode -c gcp --cloud gcp          STOPPED
-  mycluster  12 mins ago  1x AWS(p2.xlarge)     sky launch -c mycluster hello_sky.yaml  UP
+  mycluster  12 mins ago  1x AWS(p3.2xlarge)    sky launch -c mycluster hello_sky.yaml  UP
 
-If you would like to log into the a cluster, Sky provides convenient SSH access via :code:`ssh <cluster_name>`:
+**To log into a cluster**, Sky provides convenient SSH access via :code:`ssh <cluster_name>`:
 
 .. code-block:: console
 
   $ ssh mycluster
 
-If you would like to transfer files to and from the cluster, *rsync* or *scp* can be used:
+**To transfer files to and from the cluster** after a task's execution, use :code:`rsync` (or :code:`scp`):
 
 .. code-block:: console
 
-  $ rsync -Pavz /local/path/source mycluster:/remote/dest  # copy files to remote VM
-  $ rsync -Pavz mycluster:/remote/source /local/dest       # copy files from remote VM
+  $ rsync -Pavz /local/path/source mycluster:/remote/dest  # copy to remote VM
+  $ rsync -Pavz mycluster:/remote/source /local/dest       # copy from remote VM
 
-After you are done, run :code:`sky down mycluster` to terminate the cluster. Find more details
-on managing the lifecycle of your cluster :ref:`here <interactive-nodes>`.
+**To terminate (or stop) the cluster**, run :code:`sky down mycluster` (for
+stopping, run :code:`sky stop mycluster`). Find more commands that manage the
+lifecycle of clusters :ref:`here <interactive-nodes>`.
 
 Sky is more than a tool for easily provisioning and managing multiple clusters
-on different clouds. It also comes with features for :ref:`storing and moving data <sky-storage>`,
-:ref:`queueing multiple jobs <job-queue>`, :ref:`iterative development <iter-dev>`, and :ref:`interactive nodes <interactive-nodes>` for
-debugging.
+on different clouds. It also comes with features for :ref:`storing and moving
+data <sky-storage>`, :ref:`queueing multiple jobs <job-queue>`, :ref:`iterative
+development <iter-dev>`, and :ref:`interactive nodes <interactive-nodes>`.
+Refer to the :ref:`CLI Reference <cli>` for details of the :code:`sky` CLI.
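
The quickstart hunks above revolve around one task file whose fields (:code:`resources`, :code:`workdir`, :code:`setup`, :code:`run`) appear split across several hunks. As a reading aid, here is a sketch of how those fields compose into a single YAML spec; this is assembled from the pieces shown in the diff, not the literal example file from the repository, and the :code:`setup` contents are hypothetical:

```yaml
# hello_sky.yaml -- illustrative sketch assembled from the diff above.

resources:
  # Optional; if left out, pick from the available clouds.
  cloud: aws
  # 1x NVIDIA V100 GPU
  accelerators: V100:1

# Working directory (optional); synced to ~/sky_workdir on the cluster.
workdir: .

# Commands run once before the task executes (hypothetical contents).
setup: |
  echo "setup: e.g., pip install -r requirements.txt"

# Commands that run the actual task.
run: |
  echo "hello sky!"
  conda env list
```

A file like this would be launched with :code:`sky launch -c mycluster hello_sky.yaml`, as the diff describes.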

docs/source/reference/interactive-nodes.rst (+71 −36)
@@ -2,78 +2,113 @@
 Interactive Nodes
 =================
 
-During development, it may be preferable to have direct access to a VM without
-specifying a task YAML. Sky provides this functionality by providing interactive nodes
-nodes for development sessions. These are by default single node VMs, customizable
-with resources of your choice.
+Sky provides **interactive nodes**, the user's *personal work servers* in the
+clouds. These are single-node VMs that can be quickly accessed by convenient
+CLI commands:
 
-Interactive nodes act like other clusters launched with YAML, except they are
-easily accessed with command line aliases that automatically log in to the node.
+- :code:`sky gpunode`
+- :code:`sky cpunode`
+- :code:`sky tpunode`
 
-Launching a development machine
+Interactive nodes are normal Sky clusters. They allow fast access to instances
+without requiring a task YAML specification.
+
+Workflow
 -------------------------------
-To acquire and log in to an interative node with no accelerators:
+
+Use :code:`sky gpunode` to get a node with GPU(s):
 
 .. code-block:: console
 
-   $ sky cpunode -c my-cpu
+   $ # Create and log in to a cluster with the
+   $ # default name, "sky-gpunode-<username>".
+   $ sky gpunode
 
-We can also force a cloud and instance type if required:
+   $ # Or, use -c to set a custom name to manage multiple clusters:
+   $ # sky gpunode -c node0
+
+Use :code:`--gpus` to change the type and the number of GPUs:
 
 .. code-block:: console
 
-   $ sky cpunode -c my-cpu --cloud gcp --instance-type n1-standard-8
+   $ sky gpunode  # By default, use 1 K80 GPU.
+   $ sky gpunode --gpus V100
+   $ sky gpunode --gpus V100:8
+
+   $ # To see available GPU names:
+   $ # sky show-gpus
 
-All available configuration options can be viewed with:
+Directly set a cloud and an instance type, if required:
 
 .. code-block:: console
 
-   $ sky cpunode --help
+   $ sky gpunode --cloud aws --instance-type p2.16xlarge
 
-To get an interactive node with an accelerator, we have
-:code:`sky gpunode` and :code:`sky tpunode` as well with similar usage patterns.
+See all available options and short keys:
 
-To log in to an interactive node:
+.. code-block:: console
+
+   $ sky gpunode --help
 
-.. code-block:: bash
+Sky also provides :code:`sky cpunode` for CPU-only instances and :code:`sky
+tpunode` for TPU instances (only available on Google Cloud Platform).
 
-   # automatically logs in after provisioning
-   sky cpunode -c my-cpu
+To log in to an interactive node, either re-type the CLI command or use :code:`ssh`:
 
-   # directly logs in
-   ssh my-cpu
+.. code-block:: console
 
+   $ # If the cluster with the default name exists, this will directly log in.
+   $ sky gpunode
 
-Because Sky exposes SSH access to interactive nodes, this means they can also be
-used with tools such as `Visual Studio Code Remote <https://code.visualstudio.com/docs/remote/remote-overview>`_.
+   $ # Equivalently:
+   $ ssh sky-gpunode-<username>
 
+   $ # Use -c to refer to different interactive nodes.
+   $ # sky gpunode -c node0
+   $ # ssh node0
 
-Interactive nodes can be started and stopped like any other cluster:
+Because Sky exposes SSH access to clusters, this means clusters can be easily added into
+tools such as `Visual Studio Code Remote <https://code.visualstudio.com/docs/remote/remote-overview>`_.
 
-.. code-block:: bash
+Since interactive nodes are just normal Sky clusters, :code:`sky exec` can be used to submit jobs to them.
 
-   # stop the cluster
-   $ sky stop my-cpu
+Interactive nodes can be stopped, restarted, and terminated, like any other cluster:
 
-   # restart the cluster
-   $ sky start my-cpu
+.. code-block:: console
+
+   $ # Stop at the end of the work day:
+   $ sky stop sky-gpunode-<username>
+
+   $ # Restart it the next morning:
+   $ sky start sky-gpunode-<username>
+
+   $ # Terminate entirely:
+   $ sky down sky-gpunode-<username>
 
 .. note::
 
-    Since :code:`sky start` is used to restart a stopped cluster, auto-failover provisioning
-    is not used and the cluster will be started on the same cloud and region that it was
-    originally provisioned on.
+    Stopping a cluster does not lose data on the attached disks (billing for the
+    instances will stop while the disks will still be charged). Those disks
+    will be reattached when restarting the cluster. Terminating a cluster, on
+    the other hand, will delete all associated resources (all billing stops),
+    and any data on the attached disks will be lost.
 
+.. note::
 
-Advanced configuration
+    Since :code:`sky start` restarts a stopped cluster, :ref:`auto-failover
+    provisioning <auto-failover>` is disabled---the cluster will be restarted on
+    the same cloud and region where it was originally provisioned.
+
+
+Getting multiple nodes
 ----------------------
-By default, interactive clusters are a single node. If you require a cluster with multiple nodes
-(e.g. for distributed training, etc.), you can launch a cluster using YAML:
+By default, interactive clusters are a single node. If you require a cluster
+with multiple nodes (e.g., for hyperparameter tuning or distributed training),
+use :code:`num_nodes` in a YAML spec:
 
 .. code-block:: yaml
 
   # multi_node.yaml
-
   num_nodes: 16
   resources:
     accelerators: V100:8
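
The hunk above ends with only the :code:`num_nodes` and :code:`resources` fields of the multi-node spec. For context, a fuller sketch of what such a file might look like follows; the :code:`setup` and :code:`run` fields here are hypothetical additions for illustration, not part of the commit:

```yaml
# multi_node.yaml -- illustrative sketch; only num_nodes and resources
# appear in the diff above, the remaining fields are hypothetical.

num_nodes: 16
resources:
  accelerators: V100:8

# Hypothetical setup, run once on each node before the task starts.
setup: |
  pip install -r requirements.txt

# Hypothetical run commands, executed on every node of the cluster.
run: |
  python train.py
```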

sky/cli.py (+7)
@@ -950,6 +950,10 @@ def stop(
     CLUSTER is the name of the cluster to stop. If both CLUSTER and --all are
     supplied, the latter takes precedence.
 
+    Stopping a cluster does not lose data on the attached disks (billing for
+    the instances will stop while the disks will still be charged). Those
+    disks will be reattached when restarting the cluster.
+
     Currently, spot-instance clusters cannot be stopped.
 
     Examples:
@@ -1118,6 +1122,9 @@ def down(
     CLUSTER is the name of the cluster to tear down. If both CLUSTER and --all
     are supplied, the latter takes precedence.
 
+    Terminating a cluster will delete all associated resources (all billing
+    stops), and any data on the attached disks will be lost.
+
     Accelerators (e.g., TPU) that are part of the cluster will be deleted too.
 
     Examples:
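
The two cli.py hunks add docstring text to the :code:`stop` and :code:`down` command functions, which surfaces as the commands' :code:`--help` output. As a stand-alone sketch of that pattern, here is a minimal stdlib equivalent using :code:`argparse` rather than sky's actual CLI framework (the parser structure and names are hypothetical, not sky's real implementation):

```python
import argparse

# The docstring text added to `stop` in this commit, reused as help output.
STOP_HELP = (
    "Stopping a cluster does not lose data on the attached disks "
    "(billing for the instances will stop while the disks will still "
    "be charged). Those disks will be reattached when restarting the "
    "cluster."
)


def build_parser() -> argparse.ArgumentParser:
    """Build a toy `sky`-like parser; the subcommand's description
    surfaces in `sky stop --help`, mirroring how the docstrings added
    in sky/cli.py surface in the real CLI's help text."""
    parser = argparse.ArgumentParser(prog="sky")
    sub = parser.add_subparsers(dest="command")
    stop = sub.add_parser("stop", description=STOP_HELP)
    stop.add_argument("cluster", nargs="?")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["stop", "mycluster"])
    print(args.command, args.cluster)
```

The point is only that help text lives alongside the command definition, so doc fixes like this commit's land in one place.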
