Commit
* Add slurm plugin blank components
  Signed-off-by: jiangjiawei1103 <[email protected]>
* feat: Add naive slurm agent create and get with rest api
  Signed-off-by: jiangjiawei1103 <[email protected]>
* Use asyncssh instead of REST API
  Signed-off-by: jiangjiawei1103 <[email protected]>
* Test ssh communication and run sbatch
  Signed-off-by: JiaWei Jiang <[email protected]>
* Add delete method and support slurm job state
  Signed-off-by: JiaWei Jiang <[email protected]>
* feat: Submit and run SlurmTask on a remote Slurm cluster
  Successfully submit and run the user-defined task as a normal python function on a remote Slurm cluster.
  1. Inherit from PythonFunctionTask instead of PythonTask
  2. Transfer the task module through sftp
  3. Interact with amazon s3 bucket on both localhost and Slurm cluster
  Signed-off-by: JiaWei Jiang <[email protected]>
* refactor: Remove redundant task_module transfer
  Specifying `--raw-output-data-prefix` option handles task_module download.
  Signed-off-by: JiaWei Jiang <[email protected]>
* refactor: Remove redundant env var
  Signed-off-by: JiaWei Jiang <[email protected]>
* docs: Add env setup guide for local test
  Signed-off-by: JiaWei Jiang <[email protected]>
* docs: Add links and figures
  Signed-off-by: JiaWei Jiang <[email protected]>
* docs: Fix commit sha
  Signed-off-by: JiaWei Jiang <[email protected]>
* docs: Fix commit sha for demo guide
  Signed-off-by: JiaWei Jiang <[email protected]>
* docs: Fix links
  Signed-off-by: JiaWei Jiang <[email protected]>
* feat: Support SSH config in task config
  Add `ssh_conf` field to let users specify connection secret.
  Note that reconnection is done in both `get` and `delete`. This is just a temporary workaround.
  Signed-off-by: JiaWei Jiang <[email protected]>
* docs: Include ssh config in demo example
  Signed-off-by: JiaWei Jiang <[email protected]>
* fix: Retain user-specified file format info
  Signed-off-by: JiaWei Jiang <[email protected]>
* fix: Set sdt format based on user-specified file_format
  Signed-off-by: JiaWei Jiang <[email protected]>
* Remove redundant modification
  Signed-off-by: JiaWei Jiang <[email protected]>
* test: Test file_format attribute alignment in dc.sd
  Signed-off-by: JiaWei Jiang <[email protected]>
* refactor: Reduce ssh_conf option to slurm_host only
  For data scientists and MLEs developing flyte wf with Slurm agent, they don't actually need to know ssh connection details. We assume they only need to specify which Slurm cluster to use by hostname.
  Signed-off-by: JiaWei Jiang <[email protected]>
* feat: Support Slurm agent with ShellTask
  1. Write user-defined batch script to a tmp file
  2. Transfer the batch script through sftp
  3. Construct sbatch command to run on Slurm cluster
  Signed-off-by: JiaWei Jiang <[email protected]>
* feat: Simplify Slurm job submission logic
  1. Remove SFTP for batch script transfer
     * Assume Slurm batch script is present on Slurm cluster
  2. Support directly specifying a remote batch script path
  Signed-off-by: JiaWei Jiang <[email protected]>
* Added script args to agent and task
  Signed-off-by: pryce-turner <[email protected]>
* Add asyncssh to dependencies
  Signed-off-by: JiaWei Jiang <[email protected]>
* docs: Update setup and demo for a basic use case
  Signed-off-by: JiaWei Jiang <[email protected]>
* docs: Update basic arch figure path
  Signed-off-by: JiaWei Jiang <[email protected]>
* docs: Fix typo and hyperlink
  Signed-off-by: JiaWei Jiang <[email protected]>
* fix: A tmp workaround to test agent locally without container_image
  Signed-off-by: JiaWei Jiang <[email protected]>
* feat: Support user-defined batch script content with SlurmShellTask
  `SlurmTask` and `SlurmShellTask` now share the same agent.
  Signed-off-by: JiaWei Jiang <[email protected]>
* feat: Fall back to PythonTask for naive use cases
  1. Inherited from `PythonTask` for cases in which the batch script is already on the Slurm cluster
  2. Use a dummy `Interface` as a tmp workaround
  Signed-off-by: JiaWei Jiang <[email protected]>
* refactor: Define Slurm as a base task config and extend for remote script
  Signed-off-by: JiaWei Jiang <[email protected]>
* feat: Support PythonFunctionTask and reorganize agent structure
  1. Add back `PythonFunctionTask` to support running user-defined functions on Slurm
  2. Categorize task types into `script/` and `function/`
  Signed-off-by: JiaWei Jiang <[email protected]>
* Use poetry virtual env to avoid contamination
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* docs: Complete local test env setup process
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* docs: Add use cases ranging from basic to advanced
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* feat: Add a script option for the Slurm function task
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* fix: Avoid attaching async resource to different event loops
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* use await self._connect(slurm_host) in slurm agent
  Signed-off-by: Future-Outlier <[email protected]>
* change
  Signed-off-by: Future-Outlier <[email protected]>
* print more info
  Signed-off-by: Future-Outlier <[email protected]>
* use logger
  Signed-off-by: Future-Outlier <[email protected]>
* print more infor
  Signed-off-by: Future-Outlier <[email protected]>
* print
  Signed-off-by: Future-Outlier <[email protected]>
* Use sbatch for running Slurm function task
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* update
  Signed-off-by: Future-Outlier <[email protected]>
* push
  Signed-off-by: Future-Outlier <[email protected]>
* feat: Show stdout and stderr msg of the Slurm cluster
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* feat: Show stdout and stderr msg of the Slurm cluster for SlurmFunctionTask
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* feat: Make an SSH connection based on client config file or ssh_config
  1. Make SSH `host` and `username` required fields
  2. Support SSH connection based on the default OpenSSH client config file `~/.ssh/config`
  3. Support SSH connection via public key auth either by user-specified `client_keys` or the secret for key `FLYTE_SLURM_PRIVATE_KEY`
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* Clarify SSH connection logic
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* feat: Interpolate the script with dynamic input values
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* feat: Interpolate the script with dynamic output values
  Support passing files across multiple `SlurmShellTask`
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* add assertion
  Signed-off-by: Future-Outlier <[email protected]>
* update
  Signed-off-by: Future-Outlier <[email protected]>
* update
  Signed-off-by: Future-Outlier <[email protected]>
* Fix Script agent bug
  Signed-off-by: Future-Outlier <[email protected]>
* agent service for shell task
  Signed-off-by: Future-Outlier <[email protected]>
* Remove remote path to avoid race condition
  Signed-off-by: Future-Outlier <[email protected]>
* Revert agent server change
  Signed-off-by: Future-Outlier <[email protected]>
* use key val to run ssh config
  Signed-off-by: Future-Outlier <[email protected]>
* update
  Signed-off-by: Future-Outlier <[email protected]>
* use _get_or_create_ssh_connection
  Signed-off-by: Future-Outlier <[email protected]>
* update
  Signed-off-by: Future-Outlier <[email protected]>
* use SlurmCluster and hash
  Signed-off-by: Future-Outlier <[email protected]>
* updagte
  Signed-off-by: Future-Outlier <[email protected]>
* update
  Signed-off-by: Future-Outlier <[email protected]>
* update
  Signed-off-by: Future-Outlier <[email protected]>
* refactor: Simplify validation process and clean up legacy code
  1. Ensure `"host"` must be provided in `__post_init__`
  2. Explicitly set `known_hosts` to `None`
  3. Make `username` optional
  4. Remove legacy code snippets
  5. Make docstring clear
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* Add Slurm agent function task
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* Revert ShellTask behavior
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* Remove fix for SlurmShellTask
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* Remove blank line
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* fix doc string and remove logs
  Signed-off-by: Future-Outlier <[email protected]>
* build plugins
  Signed-off-by: Future-Outlier <[email protected]>
* merge master
  Signed-off-by: Future-Outlier <[email protected]>
* fix-sage-maker-test
  Signed-off-by: Future-Outlier <[email protected]>
* add test_slurm_fn_task
  Signed-off-by: Future-Outlier <[email protected]>
* fix
  Signed-off-by: Future-Outlier <[email protected]>
* update flytebot
  Signed-off-by: Future-Outlier <[email protected]>
* add know host = None
  Signed-off-by: Future-Outlier <[email protected]>

---------

Signed-off-by: jiangjiawei1103 <[email protected]>
Signed-off-by: JiaWei Jiang <[email protected]>
Signed-off-by: pryce-turner <[email protected]>
Signed-off-by: JiangJiaWei1103 <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Co-authored-by: pryce-turner <[email protected]>
Co-authored-by: Future-Outlier <[email protected]>
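
Several of the commits above mention validating `host` in `__post_init__`, making `username` optional, and reusing SSH connections via `_get_or_create_ssh_connection` keyed on a hashed `SlurmCluster`. The following is a minimal sketch of how such a pattern could look, not the plugin's actual code. The field layout of `SlurmCluster`, the `ConnectionPool` class, and the `fake_connect` stand-in are all assumptions made for illustration. In a real agent the `connect` callable would presumably wrap something like `asyncssh.connect(host, username=..., client_keys=..., known_hosts=None)`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class SlurmCluster:
    """Hypothetical, hashable description of an SSH target.

    frozen=True makes instances immutable and hashable, so a cluster
    can be used directly as a dictionary key for connection reuse.
    """

    host: str
    username: Optional[str] = None       # optional, as in the refactor commit
    client_keys: Tuple[str, ...] = ()    # tuple (not list) so it stays hashable

    def __post_init__(self) -> None:
        # Fail early when the one required field is missing.
        if not self.host:
            raise ValueError("'host' must be provided")


class ConnectionPool:
    """Hypothetical cache that opens at most one connection per cluster."""

    def __init__(self, connect: Callable[[SlurmCluster], Awaitable[object]]) -> None:
        self._connect = connect
        self._connections: Dict[SlurmCluster, object] = {}

    async def get_or_create(self, cluster: SlurmCluster) -> object:
        conn = self._connections.get(cluster)
        if conn is None:
            # First request for this cluster: open and remember the connection.
            conn = await self._connect(cluster)
            self._connections[cluster] = conn
        return conn


async def fake_connect(cluster: SlurmCluster) -> object:
    """Stand-in for a real SSH connect call; returns a unique dummy object."""
    return object()


if __name__ == "__main__":
    async def demo() -> None:
        pool = ConnectionPool(fake_connect)
        first = await pool.get_or_create(SlurmCluster(host="slurm-login"))
        second = await pool.get_or_create(SlurmCluster(host="slurm-login"))
        print("reused connection:", first is second)

    asyncio.run(demo())
```

Keying the cache on the whole frozen dataclass rather than on the hostname alone means two configurations that differ only in `username` or `client_keys` get separate connections. The sketch does not handle dropped connections or concurrent first requests for the same cluster.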