Add descriptions and fix formatting #97

Merged
merged 8 commits on Sep 16, 2024
6 changes: 4 additions & 2 deletions docs/modules/demos/pages/airflow-scheduled-job.adoc
@@ -1,5 +1,6 @@
= airflow-scheduled-job
:page-aliases: stable@stackablectl::demos/airflow-scheduled-job.adoc
:description: This demo installs Airflow with Postgres and Redis on Kubernetes, showcasing DAG scheduling, job runs, and status verification via the Airflow UI.

Install this demo on an existing Kubernetes cluster:
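
The command itself sits in a part of the file that is collapsed in this diff; it follows the same pattern as the other demos in this PR (the demo name is assumed to match the page title):

[source,console]
----
$ stackablectl demo install airflow-scheduled-job
----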

@@ -102,9 +103,10 @@ Click on the `run_every_minute` box in the centre of the page and then select `L

[WARNING]
====
In this demo, the logs are not available when the KubernetesExecutor is deployed. See the https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/kubernetes.html#managing-dags-and-logs[Airflow Documentation] for more details.
In this demo, the logs are not available when the KubernetesExecutor is deployed.
See the https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/kubernetes.html#managing-dags-and-logs[Airflow Documentation] for more details.

If you are interested in persisting the logs, please take a look at the xref:logging.adoc[] demo.
If you are interested in persisting the logs, take a look at the xref:logging.adoc[] demo.
====

image::airflow-scheduled-job/airflow_9.png[]
207 changes: 102 additions & 105 deletions docs/modules/demos/pages/data-lakehouse-iceberg-trino-spark.adoc

Large diffs are not rendered by default.

5 changes: 2 additions & 3 deletions docs/modules/demos/pages/end-to-end-security.adoc
@@ -1,6 +1,6 @@
= end-to-end-security

:k8s-cpu: https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu
:description: This demo showcases end-to-end security in Stackable Data Platform with OPA, featuring row/column access control, OIDC, Kerberos, and flexible group policies.

This is a demo to showcase what can be done with Open Policy Agent around authorization in the Stackable Data Platform.
It covers the following aspects of security:
@@ -55,8 +55,7 @@ You can see the deployed products and their relationship in the following diagram

image::end-to-end-security/overview.png[Architectural overview]

Please note the different types of arrows used to connect the technologies in here, which symbolize
how authentication happens along that route and if impersonation is used for queries executed.
Note the different types of arrows used to connect the technologies here; they symbolize how authentication happens along each route and whether impersonation is used for the queries executed.

The Trino schema (with schemas, tables and views) is shown below.

46 changes: 22 additions & 24 deletions docs/modules/demos/pages/hbase-hdfs-load-cycling-data.adoc
@@ -1,5 +1,6 @@
= hbase-hdfs-cycling-data
:page-aliases: stable@stackablectl::demos/hbase-hdfs-load-cycling-data.adoc
:description: Load cyclist data from HDFS to HBase on Kubernetes using Stackable's demo. Install, copy data, create HFiles, and query efficiently.

:kaggle: https://www.kaggle.com/datasets/timgid/cyclistic-dataset-google-certificate-capstone?select=Divvy_Trips_2020_Q1.csv
:k8s-cpu: https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu
@@ -14,10 +15,7 @@ Install this demo on an existing Kubernetes cluster:
$ stackablectl demo install hbase-hdfs-load-cycling-data
----

[WARNING]
====
This demo should not be run alongside other demos.
====
WARNING: This demo should not be run alongside other demos.

[#system-requirements]
== System requirements
@@ -34,11 +32,11 @@ This demo will

* Install the required Stackable operators.
* Spin up the following data products:
** *Hbase:* An open source distributed, scalable, big data store. This demo uses it to store the
** *HBase:* An open source distributed, scalable, big data store. This demo uses it to store the
{kaggle}[cyclist dataset] and enable access.
** *HDFS:* A distributed file system used to intermediately store the dataset before importing it into Hbase
** *HDFS:* A distributed file system used to intermediately store the dataset before importing it into HBase
* Use {distcp}[distcp] to copy a {kaggle}[cyclist dataset] from an S3 bucket into HDFS.
* Create HFiles, a File format for hbase consisting of sorted key/value pairs. Both keys and values are byte arrays.
* Create HFiles, a file format for HBase consisting of sorted key/value pairs. Both keys and values are byte arrays.
* Load Hfiles into an existing table via the `Importtsv` utility, which will load data in `TSV` or `CSV` format into
HBase.
* Query data via the `hbase` shell, which is an interactive shell to execute commands on the created table
@@ -86,10 +84,9 @@ This demo will run two jobs to automatically load data.

=== distcp-cycling-data

{distcp}[DistCp] (distributed copy) is used for large inter/intra-cluster copying. It uses MapReduce to effect its
distribution, error handling, recovery, and reporting. It expands a list of files and directories into input to map
tasks, each of which will copy a partition of the files specified in the source list. Therefore, the first Job uses
DistCp to copy data from a S3 bucket into HDFS. Below, you'll see parts from the logs.
{distcp}[DistCp] (distributed copy) efficiently transfers large amounts of data from one location to another.
The first Job therefore uses DistCp to copy data from an S3 bucket into HDFS.
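
The Job manifest itself is not part of this diff, but the copy step boils down to a single `hadoop distcp` call; a sketch, assuming the public source bucket from the log excerpt below and `/data/raw` as the HDFS target directory:

[source,console]
----
$ hadoop distcp \
    s3a://public-backup-nyc-tlc/cycling-tripdata/demo-cycling-tripdata.csv.gz \
    hdfs:///data/raw
----
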
Below, you'll see parts from the logs.

[source]
----
@@ -110,11 +107,12 @@ Copying s3a://public-backup-nyc-tlc/cycling-tripdata/demo-cycling-tripdata.csv.gz

The second Job consists of 2 steps.

First, we use `org.apache.hadoop.hbase.mapreduce.ImportTsv` (see {importtsv}[ImportTsv Docs]) to create a table and
Hfiles. Hfile is an Hbase dedicated file format which is performance optimized for hbase. It stores meta-information
about the data and thus increases the performance of hbase. When connecting to the hbase master, opening a hbase shell
and executing `list`, you will see the created table. However, it'll contain 0 rows at this point. You can connect to
the shell via:
First, we use `org.apache.hadoop.hbase.mapreduce.ImportTsv` (see {importtsv}[ImportTsv Docs]) to create a table and HFiles.
HFile is an HBase-specific file format that is optimized for performance.
It stores meta-information about the data and thus speeds up access.
When connecting to the HBase master, opening an HBase shell and executing `list`, you will see the created table.
However, it'll contain 0 rows at this point.
You can connect to the shell via:

[source,console]
----
@@ -135,7 +133,7 @@ cycling-tripdata
----

Secondly, we'll use `org.apache.hadoop.hbase.tool.LoadIncrementalHFiles` (see {bulkload}[bulk load docs]) to import the HFiles into the table and ingest rows.
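
Neither command appears verbatim in this diff; a rough sketch of the two steps, assuming a comma separator, a column mapping derived from the table description further down, and `/data/raw` and `/data/hfile` as the HDFS input and staging paths:

[source,console]
----
# Step 1: parse the CSV into HFiles instead of writing to the table directly
$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    '-Dimporttsv.separator=,' \
    '-Dimporttsv.columns=HBASE_ROW_KEY,started_at:started_at,ended_at:ended_at' \
    -Dimporttsv.bulk.output=hdfs:///data/hfile \
    cycling-tripdata hdfs:///data/raw

# Step 2: move the generated HFiles into the table's regions
$ hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles hdfs:///data/hfile cycling-tripdata
----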

Now we will see how many rows are in the `cycling-tripdata` table:

@@ -162,7 +160,7 @@ Took 13.4666 seconds

== Inspecting the Table

You can now use the table and the data. You can use all available hbase shell commands.
You can now use the table and the data. You can use all available HBase shell commands.

[source,sql]
----
@@ -190,15 +188,15 @@ COLUMN FAMILIES DESCRIPTION
{NAME => 'started_at', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
----
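
Beyond `describe`, any other HBase shell command works here; for example, a quick look at a handful of rows (a sketch):

[source,sql]
----
scan 'cycling-tripdata', { LIMIT => 5 }
----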

== Accessing the Hbase web interface
== Accessing the HBase web interface

[TIP]
====
Run `stackablectl stacklet list` to get the address of the _ui-http_ endpoint.
If the UI is unavailable, please do a port-forward `kubectl port-forward hbase-master-default-0 16010`.
If the UI is unavailable, do a port-forward `kubectl port-forward hbase-master-default-0 16010`.
====

The Hbase web UI will give you information on the status and metrics of your Hbase cluster. See below for the start page.
The HBase web UI will give you information on the status and metrics of your HBase cluster. See below for the start page.

image::hbase-hdfs-load-cycling-data/hbase-ui-start-page.png[]

@@ -208,8 +206,7 @@ image::hbase-hdfs-load-cycling-data/hbase-table-ui.png[]

== Accessing the HDFS web interface

You can also see HDFS details via a UI by running `stackablectl stacklet list` and following the link next to one of
the namenodes.
You can also see HDFS details via a UI by running `stackablectl stacklet list` and following the link next to one of the namenodes.
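
If the link is not directly reachable from your machine, a port-forward works here as well (a sketch; both the pod name and the namenode HTTP port 9870 are assumptions based on the defaults):

[source,console]
----
$ kubectl port-forward hdfs-namenode-default-0 9870
----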

Below you will see the overview of your HDFS cluster.

@@ -223,7 +220,8 @@ You can also browse the file system by clicking on the `Utilities` tab and selecting

image::hbase-hdfs-load-cycling-data/hdfs-data.png[]

Navigate in the file system to the folder `data` and then the `raw` folder. Here you can find the raw data from the distcp job.
Navigate in the file system to the folder `data` and then the `raw` folder.
Here you can find the raw data from the distcp job.

image::hbase-hdfs-load-cycling-data/hdfs-data-raw.png[]
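
If you prefer the command line, roughly the same listing can be produced from inside the namenode pod (a sketch; the pod name and the `hdfs` binary being on the `PATH` are assumptions):

[source,console]
----
$ kubectl exec -it hdfs-namenode-default-0 -- hdfs dfs -ls /data/raw
----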

37 changes: 17 additions & 20 deletions docs/modules/demos/pages/index.adoc
@@ -1,33 +1,30 @@
= Demos
:page-aliases: stable@stackablectl::demos/index.adoc
:description: Explore Stackable demos showcasing data platform architectures. Includes external components for evaluation.

The pages below this section guide you on how to use the demos provided by Stackable. To install a demo please follow
the xref:management:stackablectl:quickstart.adoc[quickstart guide] or have a look at the
xref:management:stackablectl:commands/demo.adoc[demo command]. We currently offer the following list of demos:
The pages in this section guide you on how to use the demos provided by Stackable.
To install a demo, follow the xref:management:stackablectl:quickstart.adoc[quickstart guide] or have a look at the xref:management:stackablectl:commands/demo.adoc[demo command].
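For reference, browsing and installing a demo from the command line looks like this (a sketch; pick any demo name from the list below):

[source,console]
----
$ stackablectl demo list
$ stackablectl demo install hbase-hdfs-load-cycling-data
----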
These are the available demos:

include::partial$demos.adoc[]

[IMPORTANT]
.External Components in these demos
====
These demos are provided by Stackable as showcases to demonstrate potential architectures that could be built with the
Stackable Data Platform. As such they may include components that are not supported by Stackable as part of our
commercial offering.
These demos are provided by Stackable as showcases to demonstrate potential architectures that could be built with the Stackable Data Platform.
As such they may include components that are not supported by Stackable as part of our commercial offering.

If you are evaluating one or more of these demos with the intention of purchasing a subscription, please make sure to
double-check the list of supported operators, anything that is not mentioned on there is not part of our commercial
offering.
If you are evaluating one or more of these demos with the intention of purchasing a subscription, make sure to double-check the list of supported operators; anything not mentioned there is not part of our commercial offering.

Below you can find a list of components that are currently contained in one or more of the demos for reference, if
something is missing from this list and also not mentioned on our operators list, then this component is not supported:
Below you can find a list of components that are currently contained in one or more of the demos for reference.
If something is missing from this list and is also not mentioned on our operators list, then that component is not supported:

- Grafana
- JupyterHub
- MinIO
- OpenLDAP
- OpenSearch
- OpenSearch Dashboards
- PostgreSQL
- Prometheus
- Redis
* Grafana
* JupyterHub
* MinIO
* OpenLDAP
* OpenSearch
* OpenSearch Dashboards
* PostgreSQL
* Prometheus
* Redis
====