Commit e12144b

Add new planner, JDBC driver, and DDL machinery (#72)
* Add new planner, JDBC driver, and DDL machinery
* Disable integration tests in GH workflow

Co-authored-by: Joseph Grogan <[email protected]>
1 parent 67fda04 commit e12144b

161 files changed: +9055 −1482 lines


.github/workflows/integration-tests.yml

Lines changed: 0 additions & 11 deletions
@@ -38,17 +38,6 @@ jobs:
         run: make deploy
       - name: Deploy Samples
         run: make deploy-samples
-      - name: Wait for Readiness
-        run: kubectl wait pod hoptimator --for condition=Ready --timeout=10m
-      - name: Wait for Flink Jobs
-        run: |
-          i=0
-          while [ $i -ne 10 ]
-          do
-            kubectl wait flinkdeployments --all --for=jsonpath={.status.lifecycleState}=STABLE --timeout=1m && break || sleep 60
-            i=$(($i+1))
-            echo "No stable Flink jobs after $i tries..."
-          done
       - name: Run Integration Tests
         run: make integration-tests
       - name: Capture Cluster State

Dockerfile

Lines changed: 0 additions & 2 deletions
@@ -1,8 +1,6 @@
 FROM eclipse-temurin:18
 WORKDIR /home/
-ADD ./hoptimator-cli-integration/build/distributions/hoptimator-cli-integration.tar ./
 ADD ./hoptimator-operator-integration/build/distributions/hoptimator-operator-integration.tar ./
 ADD ./etc/* ./
 ENTRYPOINT ["/bin/sh", "-c"]
-CMD ["./hoptimator-cli-integration/bin/hoptimator-cli-integration -n '' -p '' -u jdbc:calcite:model=model.yaml"]
 

Makefile

Lines changed: 9 additions & 3 deletions
@@ -1,19 +1,25 @@
 
+install:
+	./gradlew installDist
+
 build:
 	./gradlew build
 	docker build . -t hoptimator
 	docker build hoptimator-flink-runner -t hoptimator-flink-runner
 
-bounce: build undeploy deploy deploy-samples deploy-config
+bounce: build undeploy deploy deploy-samples deploy-config deploy-demo
 
 integration-tests:
-	./bin/hoptimator --run=./integration-tests.sql
-	echo "\nPASS"
+	echo "\nNOTHING TO DO FOR NOW"
 
 clean:
 	./gradlew clean
 
+deploy-demo:
+	kubectl apply -f ./deploy/samples/demodb.yaml
+
 deploy: deploy-config
+	kubectl apply -f ./hoptimator-k8s/src/main/resources/
 	kubectl apply -f ./deploy
 
 undeploy:
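Note: the new `install` target builds and installs the SQL CLI via `./gradlew installDist`, and the new `deploy-demo` target applies the sample demo database from `deploy/samples/demodb.yaml`. Both back the Quickstart steps in the updated README below.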

README.md

Lines changed: 37 additions & 56 deletions
@@ -1,87 +1,68 @@
 # Hoptimator
 
-Multi-hop declarative data pipelines
+## Intro
 
-## What is Hoptimator?
+Hoptimator gives you a SQL interface to a Kubernetes cluster. You can install databases, query tables, create views, and deploy data pipelines using just SQL.
 
-Hoptimator is an SQL-based control plane for complex data pipelines.
-
-Hoptimator turns high-level SQL _subscriptions_ into multi-hop data pipelines. Pipelines may involve an auto-generated Flink job (or similar) and any arbitrary resources required for the job to run.
-
-## How does it work?
-
-Hoptimator has a pluggable _adapter_ framework, which lets you wire-up arbtitary data sources. Adapters loosely correspond to connectors in the underlying compute engine (e.g. Flink Connectors), but they may include custom control plane logic. For example, an adapter may create a cache or a CDC stream as part of a pipeline. This enables a single pipeline to span multiple "hops" across different systems (as opposed to, say, a single Flink job).
-
-Hoptimator's pipelines tend to have the following general shape:
-
-                                   _________
-    topic1 ----------------------> |         |
-    table2 --> CDC ---> topic2 --> | SQL job | --> topic4
-    table3 --> rETL --> topic3 --> |_________|
+To install a database, use `kubectl`:
 
+```
+$ kubectl apply -f my-database.yaml
+```
 
-The three data sources on the left correspond to three different adapters:
+(`create database` is coming soon!)
 
-1. `topic1` can be read directly from a Flink job, so the first adapter simply configures a Flink connector.
-2. `table2` is inefficient for bulk access, so the second adapter creates a CDC stream (`topic2`) and configures a Flink connector to read from _that_.
-3. `table3` is in cold storage, so the third adapter creates a reverse-ETL job to re-ingest the data into Kafka.
+Then use Hoptimator DDL to create a materialized view:
 
-In order to deploy such a pipeline, you only need to write one SQL query, called a _subscription_. Pipelines are constructed automatically based on subscriptions.
+```
+> create materialized view my.foo as select * from ads.page_views;
+```
 
-## Quick Start
+Views created via DDL show up in Kubernetes as `views`:
 
-### Prerequistes
+```
+$ kubectl get views
+NAME     SCHEMA   VIEW   SQL
+my-foo   MY       FOO    SELECT *...
 
-1. `docker` is installed and docker daemon is running
-2. `kubectl` is installed and cluster is running
-  1. `minikube` can be used for a local cluster
-3. `helm` for Kubernetes is installed
+```
 
-### Run
+Materialized views result in `pipelines`:
 
 ```
-$ make quickstart
-... wait a while ...
-$ ./bin/hoptimator
-> !intro
-> !q
+$ kubectl get pipelines
+NAME     SQL              STATUS
+my-foo   INSERT INTO...   Ready.
 ```
 
-## Subscriptions
+## Quickstart
 
-Subscriptions are SQL views that are automatically materialized by a pipeline.
+Hoptimator requires a Kubernetes cluster. To connect from outside a Kubernetes cluster, make sure your `kubectl` is properly configured.
 
 ```
-$ kubectl apply -f deploy/samples/subscriptions.yaml
+$ make install              # build and install SQL CLI
+$ make deploy deploy-demo   # install CRDs and K8s objects
+$ ./hoptimator
+> !intro
 ```
 
-In response, the operator will deploy a Flink job and other resources:
+## The SQL CLI
 
-```
-$ kubectl get subscriptions
-$ kubectl get flinkdeployments
-$ kubectl get kafkatopics
-```
+The `./hoptimator` script launches the [sqlline](https://github.com/julianhyde/sqlline) SQL CLI pre-configured to connect to `jdbc:hoptimator://`. The CLI includes some additional commands. See `!intro`.
 
-You can verify the job is running by inspecting the output:
+## The JDBC Driver
 
-```
-$ ./bin/hoptimator
-> !tables
-> SELECT * FROM RAWKAFKA."products" LIMIT 5;
-> !q
-```
+To use Hoptimator from Java code, or from anything that supports JDBC, use the `jdbc:hoptimator://` JDBC driver.
 
 ## The Operator
 
-Hoptimator-operator is a Kubernetes operator that orchestrates multi-hop data pipelines based on Subscriptions (a custom resource). When a Subscription is deployed, the operator:
+`hoptimator-operator` turns materialized views into real data pipelines.
 
-1. creates a _plan_ based on the Subscription SQL. The plan includes a set of _resources_ that make up a _pipeline_.
-2. deploys each resource in the pipeline. This may involve creating Kafka topics, Flink jobs, etc.
-3. reports Subscription status, which depends on the status of each resource in the pipeline.
+## Extending Hoptimator
 
-The operator is extensible via _adapters_. Among other responsibilities, adapters can implement custom control plane logic (see `ControllerProvider`), or they can depend on external operators. For example, the Kafka adapter actively manages Kafka topics using a custom controller. The Flink adapter defers to [flink-kubernetes-operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/) to manage Flink jobs.
+Hoptimator can be extended via `TableTemplates`:
 
-## The CLI
+```
+$ kubectl apply -f my-table-template.yaml
+```
 
-Hoptimator includes a SQL CLI based on [sqlline](https://github.com/julianhyde/sqlline). This is primarily for testing and debugging purposes, but it can also be useful for runnig ad-hoc queries. The CLI leverages the same adapters as the operator, but it doesn't deploy anything. Instead, queries run as local, in-process Flink jobs.
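The updated README introduces the `jdbc:hoptimator://` driver but stops at prose. Below is a minimal sketch of using it from Java, assuming only the URL and the example names above (`my.foo`, `ads.page_views`); the exact DDL the driver accepts may differ.

```
// Minimal sketch, not from this commit: connect through the new JDBC driver.
// Assumes the jdbc:hoptimator:// URL and the README's example view and table.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HoptimatorJdbcExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:hoptimator://");
         Statement stmt = conn.createStatement()) {
      // Per the README, materializing a view deploys a pipeline.
      stmt.execute("CREATE MATERIALIZED VIEW my.foo AS SELECT * FROM ads.page_views");
      // Ad-hoc query; reading column 1 as a string is for illustration only.
      try (ResultSet rs = stmt.executeQuery("SELECT * FROM ads.page_views LIMIT 5")) {
        while (rs.next()) {
          System.out.println(rs.getString(1));
        }
      }
    }
  }
}
```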

deploy/hoptimator-pod.yaml

Lines changed: 0 additions & 21 deletions
This file was deleted.

deploy/samples/demodb.yaml

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+apiVersion: hoptimator.linkedin.com/v1alpha1
+kind: Database
+metadata:
+  name: ads-database
+spec:
+  schema: ADS
+  url: jdbc:demodb://ads
+  dialect: Calcite
+
+---
+
+apiVersion: hoptimator.linkedin.com/v1alpha1
+kind: Database
+metadata:
+  name: profile-database
+spec:
+  schema: PROFILE
+  url: jdbc:demodb://profile
+  dialect: Calcite
+
+---
+
+apiVersion: hoptimator.linkedin.com/v1alpha1
+kind: TableTemplate
+metadata:
+  name: demodb-template
+spec:
+  databases:
+  - profile-database
+  - ads-database
+  connector: |
+    connector = demo
+    database = {{database}}
+    table = {{table}}
+
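The `connector` field above is a template: per-table values are presumably substituted for `{{database}}` and `{{table}}`. A hypothetical Java illustration of that expansion (not the templating code in this commit), using the README's `ads.page_views` table:

```
// Hypothetical illustration of TableTemplate expansion; Hoptimator's real
// templating machinery may differ.
import java.util.Map;

public class TemplateDemo {
  static String render(String template, Map<String, String> vars) {
    String out = template;
    for (Map.Entry<String, String> e : vars.entrySet()) {
      out = out.replace("{{" + e.getKey() + "}}", e.getValue());
    }
    return out;
  }

  public static void main(String[] args) {
    String connector = "connector = demo\ndatabase = {{database}}\ntable = {{table}}\n";
    // Prints:
    //   connector = demo
    //   database = ads
    //   table = page_views
    System.out.print(render(connector, Map.of("database", "ads", "table", "page_views")));
  }
}
```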

etc/integration-tests.sql

Lines changed: 0 additions & 20 deletions
This file was deleted.

generate-models.sh

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+#!/bin/sh
+
+docker pull ghcr.io/kubernetes-client/java/crd-model-gen:v1.0.6
+
+docker run \
+  --rm \
+  --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
+  --mount type=bind,src="$(pwd)",dst="$(pwd)" \
+  -ti \
+  --network host \
+  ghcr.io/kubernetes-client/java/crd-model-gen:v1.0.6 \
+  /generate.sh -o "$(pwd)/hoptimator-k8s" -n "" -p "com.linkedin.hoptimator.k8s" \
+  -u "$(pwd)/hoptimator-k8s/src/main/resources/databases.crd.yaml" \
+  -u "$(pwd)/hoptimator-k8s/src/main/resources/pipelines.crd.yaml" \
+  -u "$(pwd)/hoptimator-k8s/src/main/resources/tabletemplates.crd.yaml" \
+  -u "$(pwd)/hoptimator-k8s/src/main/resources/views.crd.yaml" \
+  -u "$(pwd)/hoptimator-k8s/src/main/resources/subscriptions.crd.yaml" \
+  && echo "done."
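The script generates Java models for the new CRDs into `hoptimator-k8s`. As a hedged sketch, the generated `Database` model might be used with the Kubernetes Java client as below; the `models` subpackage and the `V1alpha1Database` class name follow crd-model-gen conventions and are assumptions, not verified against this commit:

```
// Hedged sketch: package and class names are assumed from the -p/-o flags
// above and the usual crd-model-gen naming, not verified output.
import io.kubernetes.client.openapi.ApiClient;
import io.kubernetes.client.util.Config;
import io.kubernetes.client.util.generic.GenericKubernetesApi;
import com.linkedin.hoptimator.k8s.models.V1alpha1Database;
import com.linkedin.hoptimator.k8s.models.V1alpha1DatabaseList;

public class ListDatabases {
  public static void main(String[] args) throws Exception {
    ApiClient client = Config.defaultClient();
    GenericKubernetesApi<V1alpha1Database, V1alpha1DatabaseList> databases =
        new GenericKubernetesApi<>(V1alpha1Database.class, V1alpha1DatabaseList.class,
            "hoptimator.linkedin.com", "v1alpha1", "databases", client);
    // Roughly equivalent to `kubectl get databases`.
    databases.list().getObject().getItems()
        .forEach(db -> System.out.println(db.getMetadata().getName()));
  }
}
```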

gradle/libs.versions.toml

Lines changed: 24 additions & 24 deletions
@@ -1,31 +1,31 @@
 [libraries]
 assertj = "org.assertj:assertj-core:3.12.0"
 avro = "org.apache.avro:avro:1.10.2"
-calciteAvatica = "org.apache.calcite.avatica:avatica:1.23.0"
-calciteCore = "org.apache.calcite:calcite-core:1.34.0"
-flinkClients = "org.apache.flink:flink-clients:1.16.2"
-flinkConnectorBase = "org.apache.flink:flink-connector-base:1.16.2"
-flinkCore = "org.apache.flink:flink-core:1.16.2"
-flinkCsv = "org.apache.flink:flink-csv:1.16.2"
-flinkStreamingJava = "org.apache.flink:flink-streaming-java:1.16.2"
-flinkTableApiJava = "org.apache.flink:flink-table-api-java:1.16.2"
-flinkTableApiJavaBridge = "org.apache.flink:flink-table-api-java-bridge:1.16.2"
-flinkTableCommon = "org.apache.flink:flink-table-common:1.16.2"
-flinkTablePlanner = "org.apache.flink:flink-table-planner_2.12:1.16.2"
-flinkTableRuntime = "org.apache.flink:flink-table-runtime:1.16.2"
-flinkMetricsDropwizard = "org.apache.flink:flink-metrics-dropwizard:1.16.2"
-flinkConnectorKafka = "org.apache.flink:flink-sql-connector-kafka:1.16.2"
-flinkConnectorMySqlCdc = "com.ververica:flink-sql-connector-mysql-cdc:2.3.0"
+calcite-avatica = "org.apache.calcite.avatica:avatica:1.23.0"
+calcite-core = "org.apache.calcite:calcite-core:1.34.0"
+calcite-server = "org.apache.calcite:calcite-server:1.34.0"
+flink-clients = "org.apache.flink:flink-clients:1.16.2"
+flink-connector-base = "org.apache.flink:flink-connector-base:1.16.2"
+flink-core = "org.apache.flink:flink-core:1.16.2"
+flink-csv = "org.apache.flink:flink-csv:1.16.2"
+flink-streaming-java = "org.apache.flink:flink-streaming-java:1.16.2"
+flink-table-api-java = "org.apache.flink:flink-table-api-java:1.16.2"
+flink-table-api-java-bridge = "org.apache.flink:flink-table-api-java-bridge:1.16.2"
+flink-table-common = "org.apache.flink:flink-table-common:1.16.2"
+flink-table-planner = "org.apache.flink:flink-table-planner_2.12:1.16.2"
+flink-table-runtime = "org.apache.flink:flink-table-runtime:1.16.2"
+flink-connector-kafka = "org.apache.flink:flink-sql-connector-kafka:1.16.2"
+flink-connector-mysql-cdc = "com.ververica:flink-sql-connector-mysql-cdc:2.3.0"
 gson = "com.google.code.gson:gson:2.9.0"
 jackson = "com.fasterxml.jackson.core:jackson-core:2.14.1"
-jacksonYaml = "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.14.1"
-javaxAnnotationApi = "javax.annotation:javax.annotation-api:1.3.2"
+jackson-dataformat-yaml = "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.14.1"
+javax-annotation-api = "javax.annotation:javax.annotation-api:1.3.2"
 junit = "junit:junit:4.12"
-kafkaClients = "org.apache.kafka:kafka-clients:2.7.1"
-kubernetesClient = "io.kubernetes:client-java:16.0.2"
-kubernetesExtendedClient = "io.kubernetes:client-java-extended:16.0.2"
-slf4jSimple = "org.slf4j:slf4j-simple:1.7.30"
-slf4jApi = "org.slf4j:slf4j-api:1.7.30"
+kafka-clients = "org.apache.kafka:kafka-clients:2.7.1"
+kubernetes-client = "io.kubernetes:client-java:16.0.2"
+kubernetes-extended-client = "io.kubernetes:client-java-extended:16.0.2"
+slf4j-simple = "org.slf4j:slf4j-simple:1.7.30"
+slf4j-api = "org.slf4j:slf4j-api:1.7.30"
 sqlline = "sqlline:sqlline:1.12.0"
-commonsCli = 'commons-cli:commons-cli:1.4'
-
+commons-cli = "commons-cli:commons-cli:1.4"
+quidem = "net.hydromatic:quidem:0.11"
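With kebab-case keys, Gradle's generated type-safe accessors become dot-separated: build scripts now reference e.g. `libs.calcite.core` and `libs.flink.table.planner` rather than `libs.calciteCore` and `libs.flinkTablePlanner`.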

hoptimator

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+#!/bin/sh
+
+BASEDIR="$( cd "$( dirname "$0" )" && pwd )"
+
+$BASEDIR/hoptimator-cli/build/install/hoptimator-cli/bin/hoptimator-cli sqlline.SqlLine \
+  -ac sqlline.HoptimatorAppConfig \
+  -u jdbc:hoptimator:// -n "" -p "" -nn "Hoptimator" $@
+
