Commit bf0aa61

Docs: polish quickstart, interactive-nodes (skypilot-org#491)
* Polish interactive-nodes.rst
* Polish quickstart, interactive-nodes
* Address comments
* Address comments
1 parent f37ffd4 commit bf0aa61

File tree: 3 files changed (+137 −79 lines changed)

docs/source/getting-started/quickstart.rst (+59 −43)
@@ -20,64 +20,64 @@ Complete the :ref:`installation instructions <installation>` before continuing w
 
 Provisioning your first cluster
 --------------------------------
-We'll start by launching our first cluster on Sky using an interactive node.
-Interactive nodes are standalone machines that can be used like any other VM instance,
-but are easy to configure without any additional setup.
+We'll start by launching our first cluster on Sky using an :ref:`interactive
+node <interactive-nodes>`. Interactive nodes are easy-to-spin-up VMs that allow
+for fast development and interactive debugging.
 
 Let's provision an instance with a single K80 GPU.
 
-.. code-block:: bash
+.. code-block:: console
 
-   # Provisions/reuses an interactive node with a single K80 GPU.
-   sky gpunode -c mygpu --gpus K80
+   $ # Provisions/reuses an interactive node with a single K80 GPU.
+   $ sky gpunode -c mygpu --gpus K80
 
-Provisioning takes a few minutes, after which you're automatically logged in:
+Provisioning should take a few minutes, after which you're automatically logged in:
 
-.. code-block:: bash
+.. code-block:: console
 
   Last login: Wed Feb 23 22:35:47 2022 from 136.152.143.101
 
   ubuntu@ip-172-31-86-108:~$ gpustat
+
   ip-172-31-86-108  Wed Feb 23 22:42:43 2022  450.142.00
   [0] Tesla K80 | 31°C, 0 % | 0 / 11441 MB |
 
-Press :code:`Ctrl-D` to log out. On your machine, query :code:`sky status` for all provisioned clusters:
+Press :code:`Ctrl-D` to log out. On your machine, use :code:`sky status` to query all provisioned clusters:
 
-.. code-block:: bash
+.. code-block:: console
 
-   sky status
+   $ sky status
 
   NAME   LAUNCHED        RESOURCES                     COMMAND                          STATUS
   mygpu  a few secs ago  1x Azure(Standard_NC6_Promo)  sky gpunode -c mygpu --gpus K80  UP
 
 To log back in, simply type :code:`ssh mygpu`.
 
-After you are done, run :code:`sky down mygpu` to terminate the cluster. Find more commands
-on managing the lifecycle of clusters :ref:`here <interactive-nodes>`.
-
-Sky can also provision interactive CPU and TPU nodes with :code:`cpunode` and :code:`tpunode`.
-Please see our :ref:`CLI reference <cli>` for all configuration options. For more information on
-using and managing interactive nodes, check out our :ref:`reference documentation <interactive-nodes>`.
+After you are done, run :code:`sky down mygpu` to terminate the cluster. See
+:ref:`here <interactive-nodes>` for other types of interactive nodes and
+commands that manage the lifecycle of clusters.
 
 
 Hello, Sky!
 -----------
-You can also define tasks to be executed by Sky. We'll define our very first task
-to be a simple hello world program.
-
-We can specify the following task attributes with a YAML file:
+Next, let's define our very first task, a simple hello world program, to be
+executed by Sky. We can specify the following task attributes with a YAML file:
 
-- :code:`resources` (optional): what cloud resources the task must be run on (e.g., accelerators, instance type, etc.)
-- :code:`workdir` (optional): specifies working directory containing project code that is synced with the provisioned instance(s)
+- :code:`resources` (optional): cloud resources the task must be run on (e.g., accelerators, instance type, etc.)
+- :code:`workdir` (optional): the working directory containing project code that will be synced to the provisioned instance(s)
 - :code:`setup` (optional): commands that must be run before the task is executed
-- :code:`run` (optional): specifies the commands that must be run as the actual ask
+- :code:`run` (optional): commands that run the actual task
 
 .. note::
 
-    For large, multi-gigabyte workdirs (e.g. large datasets in your working directory), uploading may take time as the files are synced to the remote VM with :code:`rsync`. If you have certain files in your workdir that you would like to have excluded from upload, consider including them in your `.gitignore` file. For large datasets and files, consider using :ref:`Sky Storage <sky-storage>` to speed up transfers.
+    For large, multi-gigabyte workdirs (e.g., large datasets in your working
+    directory), uploading may be slow as the files are synced to the remote VM(s)
+    with :code:`rsync`. To exclude large files in your workdir from being uploaded,
+    add them to your :code:`.gitignore` file. To upload large datasets and files, consider using :ref:`Sky
+    Storage <sky-storage>` to speed up transfers.
 
 Below is a minimal task YAML that prints "hello sky!" and shows installed Conda environments,
-requiring an NVIDIA Tesla K80 GPU on AWS. See more example YAML files in the `repo <https://github.com/sky-proj/sky/tree/master/examples>`_, with a fully-complete example documented :ref:`here <yaml-spec>`.
+requiring an NVIDIA Tesla K80 GPU on AWS. See more example YAML files in the `repository <https://github.com/sky-proj/sky/tree/master/examples>`_ and a fully-complete YAML example :ref:`here <yaml-spec>`.
 
 .. code-block:: yaml
@@ -86,8 +86,8 @@ requiring an NVIDIA Tesla K80 GPU on AWS. See more example YAML files in the `re
    resources:
       # Optional; if left out, pick from the available clouds.
       cloud: aws
-
-      accelerators: V100:1  # 1x NVIDIA V100 GPU
+      # 1x NVIDIA V100 GPU
+      accelerators: V100:1
 
    # Working directory (optional) containing the project codebase.
    # This directory will be synced to ~/sky_workdir on the provisioned cluster.
@@ -102,11 +102,13 @@ requiring an NVIDIA Tesla K80 GPU on AWS. See more example YAML files in the `re
       echo "hello sky!"
       conda env list
 
-Sky handles selecting an appropriate VM based on user-specified resource
-constraints, launching the cluster on an appropriate cloud provider, and
-executing the task.
-
-To launch a task based on our above YAML spec, we can use :code:`sky launch`.
+**To launch a task** based on a YAML spec, use :code:`sky launch`. This command
+performs the heavy lifting: (1) it selects an appropriate cloud and VM based on
+the specified resource constraints, (2) provisions (or reuses) a cluster on that
+cloud, (3) uploads the :code:`workdir`, (4) executes the :code:`setup` commands,
+and (5) executes the :code:`run` commands.
 
 .. code-block:: console
@@ -118,7 +120,19 @@ exists, a new cluster with that name will be provisioned. If no cluster name is
 provided, (e.g., :code:`sky launch hello_sky.yaml`), a cluster name will be
 autogenerated.
 
-We can view our existing clusters by running :code:`sky status`:
+**To execute a task on an existing cluster**, use :code:`sky exec`:
+
+.. code-block:: console
+
+   $ sky exec mycluster hello_sky.yaml
+
+This command is more lightweight: it simply executes the task's :code:`run`
+commands. :code:`workdir` is also synced every time :code:`sky exec` is run, so
+that the task may use updated code. Bash commands are also supported, such as
+:code:`sky exec mycluster htop`.
+
+
+**To view existing clusters**, use :code:`sky status`:
 
 .. code-block:: console
@@ -130,25 +144,27 @@ This may show multiple clusters, if you have created several:
 
   NAME       LAUNCHED     RESOURCES             COMMAND                                 STATUS
   gcp        1 day ago    1x GCP(n1-highmem-8)  sky cpunode -c gcp --cloud gcp          STOPPED
-  mycluster  12 mins ago  1x AWS(p2.xlarge)     sky launch -c mycluster hello_sky.yaml  UP
+  mycluster  12 mins ago  1x AWS(p3.2xlarge)    sky launch -c mycluster hello_sky.yaml  UP
 
-If you would like to log into the a cluster, Sky provides convenient SSH access via :code:`ssh <cluster_name>`:
+**To log into a cluster**, Sky provides convenient SSH access via :code:`ssh <cluster_name>`:
 
 .. code-block:: console
 
   $ ssh mycluster
 
-If you would like to transfer files to and from the cluster, *rsync* or *scp* can be used:
+**To transfer files to and from the cluster** after a task's execution, use :code:`rsync` (or :code:`scp`):
 
 .. code-block:: console
 
-  $ rsync -Pavz /local/path/source mycluster:/remote/dest  # copy files to remote VM
-  $ rsync -Pavz mycluster:/remote/source /local/dest       # copy files from remote VM
+  $ rsync -Pavz /local/path/source mycluster:/remote/dest  # copy to remote VM
+  $ rsync -Pavz mycluster:/remote/source /local/dest       # copy from remote VM
 
-After you are done, run :code:`sky down mycluster` to terminate the cluster. Find more details
-on managing the lifecycle of your cluster :ref:`here <interactive-nodes>`.
+**To terminate (or stop) the cluster**, run :code:`sky down mycluster` (for
+stopping, run :code:`sky stop mycluster`). Find more commands that manage the
+lifecycle of clusters :ref:`here <interactive-nodes>`.
 
 Sky is more than a tool for easily provisioning and managing multiple clusters
-on different clouds. It also comes with features for :ref:`storing and moving data <sky-storage>`,
-:ref:`queueing multiple jobs <job-queue>`, :ref:`iterative development <iter-dev>`, and :ref:`interactive nodes <interactive-nodes>` for
-debugging.
+on different clouds. It also comes with features for :ref:`storing and moving
+data <sky-storage>`, :ref:`queueing multiple jobs <job-queue>`, :ref:`iterative
+development <iter-dev>`, and :ref:`interactive nodes <interactive-nodes>`.
+Refer to the :ref:`CLI Reference <cli>` for details of the :code:`sky` CLI.
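
The quickstart hunks above revolve around one task file whose fields (:code:`resources`, :code:`workdir`, :code:`setup`, :code:`run`) appear split across several hunks. As a reading aid, here is a sketch of how those fields compose into a single YAML spec; this is assembled from the pieces shown in the diff, not the literal example file from the repository, and the :code:`setup` contents are hypothetical:

```yaml
# hello_sky.yaml -- illustrative sketch assembled from the diff above.

resources:
  # Optional; if left out, pick from the available clouds.
  cloud: aws
  # 1x NVIDIA V100 GPU
  accelerators: V100:1

# Working directory (optional); synced to ~/sky_workdir on the cluster.
workdir: .

# Commands run once before the task executes (hypothetical contents).
setup: |
  echo "setup: e.g., pip install -r requirements.txt"

# Commands that run the actual task.
run: |
  echo "hello sky!"
  conda env list
```

A file like this would be launched with :code:`sky launch -c mycluster hello_sky.yaml`, as the diff describes.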

docs/source/reference/interactive-nodes.rst (+71 −36)
@@ -2,78 +2,113 @@
 Interactive Nodes
 =================
 
-During development, it may be preferable to have direct access to a VM without
-specifying a task YAML. Sky provides this functionality by providing interactive nodes
-nodes for development sessions. These are by default single node VMs, customizable
-with resources of your choice.
+Sky provides **interactive nodes**, the user's *personal work servers* in the
+clouds. These are single-node VMs that can be quickly accessed by convenient
+CLI commands:
 
-Interactive nodes act like other clusters launched with YAML, except they are
-easily accessed with command line aliases that automatically log in to the node.
+- :code:`sky gpunode`
+- :code:`sky cpunode`
+- :code:`sky tpunode`
 
-Launching a development machine
+Interactive nodes are normal Sky clusters. They allow fast access to instances
+without requiring a task YAML specification.
+
+Workflow
 -------------------------------
-To acquire and log in to an interative node with no accelerators:
+
+Use :code:`sky gpunode` to get a node with GPU(s):
 
 .. code-block:: console
 
-   $ sky cpunode -c my-cpu
+   $ # Create and log in to a cluster with the
+   $ # default name, "sky-gpunode-<username>".
+   $ sky gpunode
 
-We can also force a cloud and instance type if required:
+   $ # Or, use -c to set a custom name to manage multiple clusters:
+   $ # sky gpunode -c node0
+
+Use :code:`--gpus` to change the type and the number of GPUs:
 
 .. code-block:: console
 
-   $ sky cpunode -c my-cpu --cloud gcp --instance-type n1-standard-8
+   $ sky gpunode  # By default, use 1 K80 GPU.
+   $ sky gpunode --gpus V100
+   $ sky gpunode --gpus V100:8
+
+   $ # To see available GPU names:
+   $ # sky show-gpus
 
-All available configuration options can be viewed with:
+Directly set a cloud and an instance type, if required:
 
 .. code-block:: console
 
-   $ sky cpunode --help
+   $ sky gpunode --cloud aws --instance-type p2.16xlarge
 
-To get an interactive node with an accelerator, we have
-:code:`sky gpunode` and :code:`sky tpunode` as well with similar usage patterns.
+See all available options and short keys:
 
-To log in to an interactive node:
+.. code-block:: console
+
+   $ sky gpunode --help
 
-.. code-block:: bash
+Sky also provides :code:`sky cpunode` for CPU-only instances and :code:`sky
+tpunode` for TPU instances (only available on Google Cloud Platform).
 
-   # automatically logs in after provisioning
-   sky cpunode -c my-cpu
+To log in to an interactive node, either re-type the CLI command or use :code:`ssh`:
 
-   # directly logs in
-   ssh my-cpu
+.. code-block:: console
 
+   $ # If the cluster with the default name exists, this will directly log in.
+   $ sky gpunode
 
-Because Sky exposes SSH access to interactive nodes, this means they can also be
-used with tools such as `Visual Studio Code Remote <https://code.visualstudio.com/docs/remote/remote-overview>`_.
+   $ # Equivalently:
+   $ ssh sky-gpunode-<username>
 
+   $ # Use -c to refer to different interactive nodes.
+   $ # sky gpunode -c node0
+   $ # ssh node0
 
-Interactive nodes can be started and stopped like any other cluster:
+Because Sky exposes SSH access to clusters, this means clusters can be easily added into
+tools such as `Visual Studio Code Remote <https://code.visualstudio.com/docs/remote/remote-overview>`_.
 
-.. code-block:: bash
+Since interactive nodes are just normal Sky clusters, :code:`sky exec` can be used to submit jobs to them.
 
-   # stop the cluster
-   $ sky stop my-cpu
+Interactive nodes can be stopped, restarted, and terminated, like any other cluster:
 
-   # restart the cluster
-   $ sky start my-cpu
+.. code-block:: console
+
+   $ # Stop at the end of the work day:
+   $ sky stop sky-gpunode-<username>
+
+   $ # Restart it the next morning:
+   $ sky start sky-gpunode-<username>
+
+   $ # Terminate entirely:
+   $ sky down sky-gpunode-<username>
 
 .. note::
 
-    Since :code:`sky start` is used to restart a stopped cluster, auto-failover provisioning
-    is not used and the cluster will be started on the same cloud and region that it was
-    originally provisioned on.
+    Stopping a cluster does not lose data on the attached disks (billing for the
+    instances will stop while the disks will still be charged). Those disks
+    will be reattached when restarting the cluster. Terminating a cluster, on
+    the other hand, will delete all associated resources (all billing stops),
+    and any data on the attached disks will be lost.
 
+.. note::
 
-Advanced configuration
+    Since :code:`sky start` restarts a stopped cluster, :ref:`auto-failover
+    provisioning <auto-failover>` is disabled---the cluster will be restarted on
+    the same cloud and region where it was originally provisioned.
+
+
+Getting multiple nodes
 ----------------------
-By default, interactive clusters are a single node. If you require a cluster with multiple nodes
-(e.g. for distributed training, etc.), you can launch a cluster using YAML:
+By default, interactive clusters are a single node. If you require a cluster
+with multiple nodes (e.g., for hyperparameter tuning or distributed training),
+use :code:`num_nodes` in a YAML spec:
 
 .. code-block:: yaml
 
   # multi_node.yaml
-
   num_nodes: 16
   resources:
     accelerators: V100:8
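
The hunk above ends with only the :code:`num_nodes` and :code:`resources` fields of the multi-node spec. For context, a fuller sketch of what such a file might look like follows; the :code:`setup` and :code:`run` fields here are hypothetical additions for illustration, not part of the commit:

```yaml
# multi_node.yaml -- illustrative sketch; only num_nodes and resources
# appear in the diff above, the remaining fields are hypothetical.

num_nodes: 16
resources:
  accelerators: V100:8

# Hypothetical setup, run once on each node before the task starts.
setup: |
  pip install -r requirements.txt

# Hypothetical run commands, executed on every node of the cluster.
run: |
  python train.py
```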

sky/cli.py (+7)
@@ -950,6 +950,10 @@ def stop(
     CLUSTER is the name of the cluster to stop. If both CLUSTER and --all are
     supplied, the latter takes precedence.
 
+    Stopping a cluster does not lose data on the attached disks (billing for
+    the instances will stop while the disks will still be charged). Those
+    disks will be reattached when restarting the cluster.
+
     Currently, spot-instance clusters cannot be stopped.
 
     Examples:
@@ -1118,6 +1122,9 @@ def down(
     CLUSTER is the name of the cluster to tear down. If both CLUSTER and --all
     are supplied, the latter takes precedence.
 
+    Terminating a cluster will delete all associated resources (all billing
+    stops), and any data on the attached disks will be lost.
+
     Accelerators (e.g., TPU) that are part of the cluster will be deleted too.
 
     Examples:
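
The two cli.py hunks add docstring text to the :code:`stop` and :code:`down` command functions, which surfaces as the commands' :code:`--help` output. As a stand-alone sketch of that pattern, here is a minimal stdlib equivalent using :code:`argparse` rather than sky's actual CLI framework (the parser structure and names are hypothetical, not sky's real implementation):

```python
import argparse

# The docstring text added to `stop` in this commit, reused as help output.
STOP_HELP = (
    "Stopping a cluster does not lose data on the attached disks "
    "(billing for the instances will stop while the disks will still "
    "be charged). Those disks will be reattached when restarting the "
    "cluster."
)


def build_parser() -> argparse.ArgumentParser:
    """Build a toy `sky`-like parser; the subcommand's description
    surfaces in `sky stop --help`, mirroring how the docstrings added
    in sky/cli.py surface in the real CLI's help text."""
    parser = argparse.ArgumentParser(prog="sky")
    sub = parser.add_subparsers(dest="command")
    stop = sub.add_parser("stop", description=STOP_HELP)
    stop.add_argument("cluster", nargs="?")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["stop", "mycluster"])
    print(args.command, args.cluster)
```

The point is only that help text lives alongside the command definition, so doc fixes like this commit's land in one place.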
