
Commit 4d3c840

Added documentation on file run output to S3 storage and logging; mentioned the new runtime environment variable ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3 (#123)
Signed-off-by: shalberd <[email protected]>
1 parent 81d3224 commit 4d3c840

File tree

4 files changed (+69 −4 lines)
  • pipelines
    • run-generic-pipelines-on-apache-airflow/README.md
    • run-generic-pipelines-on-kubeflow-pipelines/README.md
    • run-pipelines-on-apache-airflow/README.md
    • run-pipelines-on-kubeflow-pipelines/README.md


Diff for: pipelines/run-generic-pipelines-on-apache-airflow/README.md

+18 −1
@@ -53,7 +53,24 @@ Elyra currently supports Apache Airflow deployments that utilize GitHub or GitHu
 - Branch in named repository, e.g. `test-dags`. This branch must exist.
 - [Personal access token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) that Elyra can use to push DAGs to the repository, e.g. `4d79206e616d6520697320426f6e642e204a616d657320426f6e64`

-Elyra utilizes S3-compatible cloud storage to make data available to notebooks and Python scripts while they are executed. Any kind of cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab is running and the Apache Airflow cluster. Collect the following information:
+Elyra utilizes S3-compatible cloud storage to make data available to Jupyter notebooks and R or Python scripts while they are executed. Any kind of cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab is running and from the Apache Airflow cluster.
+
+Elyra also writes the run output (STDOUT, including STDERR) to a file when the environment variable `ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3` is set to `true` or is not present in the runtime container, which is the default.
+This happens in addition to logging and writing to STDOUT and STDERR at runtime.
+
+`.ipynb` file execution run/STDOUT output is written to S3-compatible object storage in the following files:
+- `<notebook name>-output.ipynb`
+- `<notebook name>.html`
+
+`.r` and `.py` file execution run/STDOUT output is written to S3-compatible object storage in the following file:
+- `<r or python filename>.log`
+
+Note: If you prefer to use S3-compatible storage only to transfer files between pipeline steps and **not for logging information / run output of R, Python and Jupyter Notebook files**, either set the environment variable **`ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3`** to **`false`** in your runtime container builds or pass that value explicitly in the environment variables section of the pipeline editor, either at Pipeline Properties - Generic Node Defaults - Environment Variables or at Node Properties - Additional Properties - Environment Variables.
+
+Collect the following information:
 - S3 compatible object storage endpoint, e.g. `http://minio-service.kubernetes:9000`
 - S3 object storage username, e.g. `minio`
 - S3 object storage password, e.g. `minio123`
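
Because the variable defaults to enabled when it is absent, a notebook or script can check the effective setting at run time. The snippet below is a minimal sketch, not Elyra's own implementation; it simply assumes the documented default (anything other than `false` leaves output capture enabled):

```python
import os

# Minimal sketch (not Elyra's implementation): interpret the variable the way the
# documentation describes it, i.e. capture is on when it is "true" or unset.
raw_value = os.environ.get("ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3", "true")
capture_enabled = raw_value.strip().lower() != "false"

print("Run output will" + ("" if capture_enabled else " not") + " be copied to object storage.")
```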

Diff for: pipelines/run-generic-pipelines-on-kubeflow-pipelines/README.md

+18 −1
@@ -47,7 +47,24 @@ Collect the following information for your Kubeflow Pipelines installation:
 - Password, for a multi-user, auth-enabled Kubeflow installation, e.g. `passw0rd`
 - Workflow engine type, which should be `Argo` or `Tekton`. Contact your administrator if you are unsure which engine your deployment utilizes.

-Elyra utilizes S3-compatible cloud storage to make data available to notebooks and scripts while they are executed. Any kind of cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab is running and from the Kubeflow Pipelines cluster. Collect the following information:
+Elyra utilizes S3-compatible cloud storage to make data available to Jupyter notebooks and R or Python scripts while they are executed. Any kind of cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab is running and from the Kubeflow Pipelines cluster.
+
+Elyra also writes the run output (STDOUT, including STDERR) to a file when the environment variable `ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3` is set to `true` or is not present in the runtime container, which is the default.
+This happens in addition to logging and writing to STDOUT and STDERR at runtime.
+
+`.ipynb` file execution run/STDOUT output is written to S3-compatible object storage in the following files:
+- `<notebook name>-output.ipynb`
+- `<notebook name>.html`
+
+`.r` and `.py` file execution run/STDOUT output is written to S3-compatible object storage in the following file:
+- `<r or python filename>.log`
+
+Note: If you prefer to use S3-compatible storage only to transfer files between pipeline steps and **not for logging information / run output of R, Python and Jupyter Notebook files**, either set the environment variable **`ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3`** to **`false`** in your runtime container builds or pass that value explicitly in the environment variables section of the pipeline editor, either at Pipeline Properties - Generic Node Defaults - Environment Variables or at Node Properties - Additional Properties - Environment Variables.
+
+Collect the following information:
 - S3 compatible object storage endpoint, e.g. `http://minio-service.kubernetes:9000`
 - S3 object storage username, e.g. `minio`
 - S3 object storage password, e.g. `minio123`
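
After a pipeline run completes, you can verify that the run-output files listed above were uploaded by inspecting the bucket your runtime configuration points to. A minimal sketch using `boto3`; the endpoint and credentials reuse the example values above, while the bucket name and run prefix are placeholders for your own setup:

```python
import boto3

# Endpoint and credentials reuse the example values from this guide; bucket and
# prefix are placeholders for whatever your runtime configuration and run used.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio-service.kubernetes:9000",
    aws_access_key_id="minio",
    aws_secret_access_key="minio123",
)

response = s3.list_objects_v2(Bucket="my-pipelines-bucket", Prefix="my-pipeline-run/")
for obj in response.get("Contents", []):
    # Expect entries such as <notebook name>-output.ipynb, <notebook name>.html and
    # <r or python filename>.log next to the files transferred between pipeline steps.
    print(obj["Key"], obj["Size"])
```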

Diff for: pipelines/run-pipelines-on-apache-airflow/README.md

+16 −1
@@ -52,7 +52,22 @@ Collect the following information for your Apache Airflow installation:

 Detailed instructions for setting up a DAG repository and generating an access token can be found in [the User Guide](https://elyra.readthedocs.io/en/latest/recipes/configure-airflow-as-a-runtime.html#setting-up-a-dag-repository-on-github).

-Elyra utilizes S3-compatible cloud storage to make data available to notebooks and scripts while they are executed. Any kind of S3-based cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab/Elyra is running and from the Apache Airflow cluster.
+Elyra utilizes S3-compatible cloud storage to make data available to Jupyter notebooks and R or Python scripts while they are executed. Any kind of S3-based cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab/Elyra is running and from the Apache Airflow cluster.
+
+Elyra also writes the run output (STDOUT, including STDERR) to a file when the environment variable `ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3` is set to `true` or is not present in the runtime container, which is the default.
+This happens in addition to logging and writing to STDOUT and STDERR at runtime.
+
+`.ipynb` file execution run/STDOUT output is written to S3-compatible object storage in the following files:
+- `<notebook name>-output.ipynb`
+- `<notebook name>.html`
+
+`.r` and `.py` file execution run/STDOUT output is written to S3-compatible object storage in the following file:
+- `<r or python filename>.log`
+
+Note: If you prefer to use S3-compatible storage only to transfer files between pipeline steps and **not for logging information / run output of R, Python and Jupyter Notebook files**, either set the environment variable **`ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3`** to **`false`** in your runtime container builds or pass that value explicitly in the environment variables section of the pipeline editor, either at Pipeline Properties - Generic Node Defaults - Environment Variables or at Node Properties - Additional Properties - Environment Variables.

 Collect the following information:
 - S3 compatible object storage endpoint, e.g. `http://minio-service.kubernetes:9000`
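
The behaviour described above means console output is not redirected away: it still appears in the node's container log, and a copy is kept for upload as a file. Conceptually it is similar to the following sketch, which is illustrative only and not Elyra's actual code; the script and log file names are placeholders:

```python
import subprocess
import sys

# Illustrative only (not Elyra's implementation): run a script, stream its combined
# STDOUT/STDERR to the console and keep a copy that could later be uploaded to S3.
proc = subprocess.Popen(
    [sys.executable, "my_script.py"],               # placeholder script name
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)
captured = []
for line in proc.stdout:
    sys.stdout.write(line)      # still visible in the container/pod log
    captured.append(line)
proc.wait()

# Placeholder file name; Elyra derives the actual name from the script file name.
with open("my_script.log", "w") as log_file:
    log_file.writelines(captured)
```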

Diff for: pipelines/run-pipelines-on-kubeflow-pipelines/README.md

+17 −1
@@ -52,7 +52,23 @@ Collect the following information for your Kubeflow Pipelines installation:
 - Password, for a multi-user, auth-enabled Kubeflow installation, e.g. `passw0rd`
 - Workflow engine type, which should be `Argo` or `Tekton`. Contact your administrator if you are unsure which engine your deployment utilizes.

-Elyra utilizes S3-compatible cloud storage to make data available to notebooks and scripts while they are executed. Any kind of S3-based cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab/Elyra is running and from the Kubeflow Pipelines cluster.
+Elyra utilizes S3-compatible cloud storage to make data available to Jupyter notebooks and R or Python scripts while they are executed. Any kind of S3-based cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab/Elyra is running and from the Kubeflow Pipelines cluster.
+
+Elyra also writes the run output (STDOUT, including STDERR) to a file when the environment variable `ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3` is set to `true` or is not present in the runtime container, which is the default.
+This happens in addition to logging and writing to STDOUT and STDERR at runtime.
+
+`.ipynb` file execution run/STDOUT output is written to S3-compatible object storage in the following files:
+- `<notebook name>-output.ipynb`
+- `<notebook name>.html`
+
+`.r` and `.py` file execution run/STDOUT output is written to S3-compatible object storage in the following file:
+- `<r or python filename>.log`
+
+Note: If you prefer to use S3-compatible storage only to transfer files between pipeline steps and **not for logging information / run output of R, Python and Jupyter Notebook files**, either set the environment variable **`ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3`** to **`false`** in your runtime container builds or pass that value explicitly in the environment variables section of the pipeline editor, either at Pipeline Properties - Generic Node Defaults - Environment Variables or at Node Properties - Additional Properties - Environment Variables.
+

 Collect the following information:
 - S3 compatible object storage endpoint, e.g. `http://minio-service.kubernetes:9000`
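
Once a run has finished, the generated HTML or log files can also be pulled down for offline review. Another minimal `boto3` sketch; the bucket, prefix and object names are placeholders that follow the naming scheme described above:

```python
import boto3

# Endpoint and credentials reuse the example values from this guide; the bucket,
# prefix and object names below are placeholders for your own run.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio-service.kubernetes:9000",
    aws_access_key_id="minio",
    aws_secret_access_key="minio123",
)

# e.g. the rendered notebook output and a script's captured run output
s3.download_file("my-pipelines-bucket", "my-pipeline-run/load_data.html", "load_data.html")
s3.download_file("my-pipelines-bucket", "my-pipeline-run/train_model.log", "train_model.log")
print("Downloaded run-output files for offline review")
```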
