Commit 360fa2d (1 parent: 35fc812), committed Jan 18, 2019

Improved structure, descriptions and examples

1 file changed: +23 -12 lines


live-demo___python-jupyter-apache-kafka-ksql-tensorflow-keras.adoc (+23 -12)
@@ -1,19 +1,22 @@
 = Live Demo: Python, Jupyter notebook, TensorFlow, Keras, Apache Kafka and KSQL
 
 Kai Waehner <kontakt@kai-waehner.de>
-16 Jan 2019
+18 Jan 2019
 
 This script assumes that all components (Zookeeper, Kafka, Connect, KSQL, Jupyter) use default values.
 
 We use the following test data (each row is one single payment):
 
+[source,bash]
+----
 Id bigint, Timestamp varchar, User varchar, Time int, V1 double, V2 double, V3 double, V4 double, V5 double, V6 double, V7 double, V8 double, V9 double, V10 double, V11 double, V12 double, V13 double, V14 double, V15 double, V16 double, V17 double, V18 double, V19 double, V20 double, V21 double, V22 double, V23 double, V24 double, V25 double, V26 double, V27 double, V28 double, Amount double, Class string
+----
 
 == Starting backend services
 
 First we need to start a local Kafka ecosystem to use KSQL. This can be done in Jupyter or from your development environment or command line.
 
-We also need to create some test data: Either start a data generator to create a continous feed of streaming data, integrate with a file via a bash script, or use Kafka Connect for a real continuous data stream of any source data.
+We also need to create some test data: either start a data generator to create a continuous feed of streaming data, integrate with a file via a bash script, or use Kafka Connect for a real continuous data stream of any source data.
 
 This is not part of the ML-related tasks, but just to get some test data into a Kafka topic:
 
@@ -23,7 +26,7 @@ This is not part of the ML-related tasks, but just to get some test data into a
 confluent start ksql-server
 
 // Optional: Start Kafka Connect
-// confluent start connect
+confluent start connect
 
 // Create Kafka topic
 kafka-topics --zookeeper localhost:2181 --create --topic creditcardfraud_source --partitions 3 --replication-factor 1
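Not part of the commit, but before moving on it is worth checking that the backend actually came up. A hedged sketch, assuming the same Confluent Platform CLI (4.x/5.x) and default ports used in the commands above:

```shell
# Sanity checks after starting the backend (assumed defaults: ZooKeeper on 2181)
confluent status                                 # each service should report [UP]
kafka-topics --zookeeper localhost:2181 --list   # the new creditcardfraud_source topic should appear
```

These are environment-dependent operations commands; they only make sense against the locally running cluster.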
@@ -63,26 +66,23 @@ curl -s -X DELETE localhost:8083/connectors/file-source
 
 ----
 
-You can also use an easy-to-use Data Generator; either as standalone script or Kafka Connect connector. More details and examples in the blog post "Easy Ways to Generate Test Data in Kafka" (https://www.confluent.io/blog/easy-ways-generate-test-data-kafka).
+You can also use an easy-to-use Kafka data generator, either as a standalone script or as a Kafka Connect connector. See the blog post "Easy Ways to Generate Test Data in Kafka" (https://www.confluent.io/blog/easy-ways-generate-test-data-kafka) for details and examples.
 
 == Demo in Jupyter Notebook
-Now go to the Jupyter Notebook 'python-jupyter-apache-kafka-ksql-tensorflow-keras.ipynb' to do the preprocessing and interactive analysis with Python + KSQL, then the model training with Python + Keras.
+Now go to the Jupyter notebook 'python-jupyter-apache-kafka-ksql-tensorflow-keras.ipynb' (https://github.com/kaiwaehner/python-jupyter-apache-kafka-ksql-tensorflow-keras/blob/master/live-demo___python-jupyter-apache-kafka-ksql-tensorflow-keras.adoc) to do the preprocessing and interactive analysis with Python + KSQL, then the model training with Python + TensorFlow / Keras.
 
 [source,bash]
 ----
 // Terminal
 jupyter notebook
 ----
 
-== Commands to create KSQL Streams and to consume events
-Some options to consume the data for testing:
+== Commands to create the KSQL streams and to consume events
+These KSQL statements can be used from the KSQL CLI (not via the Python API):
 
 [source,bash]
 ----
-// Terminal
-confluent consume creditcardfraud_source --from-beginning
-
 // KSQL-CLI
 SELECT * FROM creditcardfraud_source;
 ----
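The SELECT above is meant to run inside the KSQL CLI. A minimal sketch of getting there, assuming the KSQL server on its default port 8088 as elsewhere in this script:

```shell
# Connect the KSQL CLI to the local server (default port 8088 assumed)
ksql http://localhost:8088

# Inside the CLI, let queries read topics from the beginning,
# otherwise SELECT only shows newly arriving rows:
#   ksql> SET 'auto.offset.reset' = 'earliest';
```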
@@ -98,6 +98,17 @@ CREATE STREAM creditcardfraud_source (Id bigint, Timestamp varchar, User varcha
 // Filter messages where class is empty
 // Change data format to Avro
 CREATE STREAM creditcardfraud_preprocessed_avro WITH (VALUE_FORMAT='AVRO', KAFKA_TOPIC='creditcardfraud_preprocessed_avro') AS SELECT Time, V1 , V2 , V3 , V4 , V5 , V6 , V7 , V8 , V9 , V10 , V11 , V12 , V13 , V14 , V15 , V16 , V17 , V18 , V19 , V20 , V21 , V22 , V23 , V24 , V25 , V26 , V27 , V28 , Amount , Class FROM creditcardfraud_source WHERE Class IS NOT NULL;
+
+// Terminal
+confluent consume creditcardfraud_source --from-beginning
+----
+
+You can also consume the data outside of Jupyter (for instance, helpful if you need to find out whether a problem is due to Python issues or to Kafka / KSQL issues):
+
+[source,bash]
+----
+// Terminal
+confluent consume creditcardfraud_source --from-beginning
 ----
 
 == Optional additional steps to analyse and process the source data
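The preprocessed stream created above is written as Avro, so a plain console consumer would print undecoded binary. A sketch for inspecting that topic, assuming Schema Registry is running on its default port 8081:

```shell
# Consume the Avro output topic with schema-aware decoding
kafka-avro-console-consumer --bootstrap-server localhost:9092 \
  --topic creditcardfraud_preprocessed_avro \
  --from-beginning \
  --property schema.registry.url=http://localhost:8081
```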
@@ -236,11 +247,11 @@ ksql-datagen quickstart=users format=json topic=users maxInterval=1000 propertie
 
 == Helper commands for Python, Conda, Jupyter, pip
 
-Open Jupyter notebook
+To open the Jupyter notebook, go to the folder where the '.ipynb' files are. Then:
 
 [source,bash]
 ----
-// Open Jupyter and select the notebook 'live-demo___python-jupyter-apache-kafka-ksql-tensorflow-keras.adoc'
+// Open Jupyter and select the notebook 'python-jupyter-apache-kafka-ksql-tensorflow-keras.ipynb'
 jupyter notebook
 ----
 
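The notebook itself needs a Python environment with the KSQL client and the ML libraries. The commit does not pin versions or a package list, so the following setup is an assumption, not taken from the repo:

```shell
# Create and prepare a conda environment for the notebook
# (package list is an assumption; the 'ksql' package is the ksql-python client)
conda create -y -n fraud-demo python=3.6
conda activate fraud-demo
pip install jupyter numpy pandas scikit-learn tensorflow keras ksql
```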