
Commit 7ccdee4

Add multiple AWS and Azure Databricks examples (#17)

* add more aws examples and teradata demo example
* change VM type to support nested virtualization
* update VM size type to work with gen2 image
* add workspace and vnet configs
* add config script for td vm
* update module and readme for Teradata
* add readme content
* update readme
* readme update
* update vm nsg for TD
* clean up repo
* readme
* readme
* replace underscore with dash for foldernames
* add adb-kafka example
* remove simple stuff

1 parent 234ecb4 commit 7ccdee4

File tree

131 files changed: +4384 −154

examples/adb-basic-demo/outputs.tf

-4
```diff
@@ -1,15 +1,11 @@
 output "databricks_azure_workspace_resource_id" {
-  // The ID of the Databricks Workspace in the Azure management plane.
   value = azurerm_databricks_workspace.example.id
 }

 output "workspace_url" {
-  // The workspace URL which is of the format 'adb-{workspaceId}.{random}.azuredatabricks.net'
-  // this is not named as DATABRICKS_HOST, because it affect authentication
   value = "https://${azurerm_databricks_workspace.example.workspace_url}/"
 }

 output "module_cluster_id" {
-  // reference to module's outputs: value = module.module_name.output_attr_name
   value = module.auto_scaling_cluster_example.cluster_id
 }
```

examples/adb-kafka/.terraform.lock.hcl

+97
Some generated files are not rendered by default.

examples/adb-kafka/README.md

+92
## ADB - Kafka Single VM Demo Environment

This template provisions a single VM and an Azure Databricks workspace; installing the Kafka service is a manual step. Major components to deploy include:
- 1 VNet with 3 subnets (2 for Databricks, 1 for the Kafka VM)
- 1 Azure VM (to host the Kafka and Zookeeper services), with port 9092 exposed to other devices in the same VNet (allowed by the default NSG rules)
- 1 VNet-injected Azure Databricks workspace
- NSGs for the Databricks and Kafka subnets
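Once the template has been applied (step 2 below), you can sanity-check the three-subnet layout with the Azure CLI; the resource group and VNet names here are placeholders, so substitute the ones the template generated for you:

```bash
# Placeholder names -- look yours up with `terraform state list` or in the Azure portal.
az network vnet subnet list \
  --resource-group <resource-group-name> \
  --vnet-name <vnet-name> \
  --query "[].{name:name, prefix:addressPrefix}" \
  --output table
```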

## Folder Structure

```
.
├── main.tf
├── outputs.tf
├── data.tf
├── providers.tf
├── variables.tf
├── vnet.tf
├── workspace.tf
├── terraform.tfvars
├── charts
└── modules
    └── general_vm
        ├── main.tf
        ├── outputs.tf
        ├── providers.tf
        └── variables.tf
```
`terraform.tfvars` is provided as a set of reference variable values; adjust it to fit your needs.
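Instead of editing the file, you can also override individual values at apply time with Terraform's standard `-var` flag, e.g. to deploy into a different region:

```bash
terraform apply -var="rglocation=westeurope"
```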
## Getting Started

> Step 1: Preparation

Clone this repo to your local machine, then run `az login` to log in interactively and authenticate with the `azurerm` provider.
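Note that `data.tf` in this template also shells out to the Azure CLI to look up your logged-in identity (used to tag the resources), so it is worth confirming which account and subscription the CLI is pointed at:

```bash
az login
# the same lookup data.tf performs for the Owner tag
az account show --query user
# confirm the target subscription
az account show --query "{name:name, id:id}"
```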

> Step 2: Deploy resources

Change `terraform.tfvars` to your needs (you can also keep the default values, since a random string is appended to the name prefix), then run:
```bash
terraform init
terraform apply
```
This deploys all resources, wrapped in a new resource group, into the default subscription of your `az login` profile; the VM's public IP address is printed once the deployment completes. After deployment, you will have the resources below:

![alt text](./charts/resources.png?raw=true)
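The VM's public IP is also exposed as the `pip` Terraform output (see `outputs.tf` below), so you can retrieve it again at any time:

```bash
terraform output pip
```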
> Step 3: Configure your VM to run Kafka and Zookeeper services

At this point you have a vanilla VM with no bootstrapping performed; we will manually log into the VM and install the Kafka and Zookeeper services.

The VM's private key has been generated for you in the local folder; substitute the public IP accordingly. SSH into the VM with (`azureuser` is the hardcoded username for VMs in this template):

```bash
ssh -i <private_key_local_path> azureuser@<public_ip>
```

Now follow this [guide from DigitalOcean](https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-20-04) to install Kafka on the VM. Note that a few commands need to be updated:
1. When downloading the Kafka binary, go to https://kafka.apache.org/downloads.html, copy the latest binary link, and substitute it here:
```bash
curl "https://downloads.apache.org/kafka/3.3.2/kafka_2.12-3.3.2.tgz" -o ~/Downloads/kafka.tgz
```

![alt text](./charts/kafka-download.png?raw=true)

2. When testing your Kafka installation, note that `--zookeeper` is deprecated; use `--bootstrap-server` instead:

```bash
~/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic TutorialTopic
```

At the end of the guide, you should have a running Kafka service on the VM. You can verify it with:
```bash
sudo systemctl status kafka
```

![alt text](./charts/test-kafka.png?raw=true)
> Step 4: Integration with Azure Databricks

Now your Kafka broker is running; let's connect to it from Databricks.
First, create a topic `TutorialTopic2` in Kafka via your VM's command line:

```bash
~/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic TutorialTopic2
```

Then we can write from a Spark DataFrame to this topic; you can also test connectivity first with `telnet <vm-private-ip> 9092`.

![alt text](./charts/write-to-kafka.png?raw=true)
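If you want to confirm the write path end to end from the VM side (this check is not part of the original walkthrough), tail the topic with Kafka's console consumer while the Databricks stream writes to it:

```bash
# Messages produced by the Databricks stream should appear here.
~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic TutorialTopic2 --from-beginning
```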

Read from this topic in another streaming job:

![alt text](./charts/read-kafka.png?raw=true)

examples/adb-kafka/data.tf

+6
```hcl
data "azurerm_client_config" "current" {
}

// shells out to the Azure CLI to look up the logged-in user (used for the Owner tag)
data "external" "me" {
  program = ["az", "account", "show", "--query", "user"]
}
```

examples/adb-kafka/main.tf

+31
```hcl
resource "random_string" "naming" {
  special = false
  upper   = false
  length  = 6
}

locals {
  prefix   = join("-", [var.workspace_prefix, "${random_string.naming.result}"])
  location = var.rglocation
  dbfsname = join("", [var.dbfs_prefix, "${random_string.naming.result}"]) // dbfs name must not have special chars

  tags = {
    Environment = "Testing"
    Owner       = lookup(data.external.me.result, "name")
    Epoch       = random_string.naming.result
  }
}

resource "azurerm_resource_group" "this" {
  name     = "adb-${local.prefix}-rg"
  location = local.location
  tags     = local.tags
}

module "kafka_broker" {
  source              = "./modules/general_vm"
  resource_group_name = azurerm_resource_group.this.name
  vm_name             = "broker01"
  region              = local.location
  subnet_id           = azurerm_subnet.vm_subnet.id
}
```
examples/adb-kafka/modules/general_vm/main.tf

+79

```hcl
resource "azurerm_network_interface" "general-nic" {
  name                = "${var.vm_name}-nic"
  location            = var.region
  resource_group_name = var.resource_group_name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = var.subnet_id
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = azurerm_public_ip.general-nic-pubip.id
  }
}

resource "tls_private_key" "general_ssh" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

// writes the generated private key to the local folder for the SSH step
resource "local_file" "private_key" {
  content         = tls_private_key.general_ssh.private_key_pem
  filename        = "${var.vm_name}_ssh_private.pem"
  file_permission = "0600"
}

resource "azurerm_public_ip" "general-nic-pubip" {
  name                = "${var.vm_name}-nic-pubip"
  resource_group_name = var.resource_group_name
  location            = var.region
  allocation_method   = "Static"
}

resource "azurerm_linux_virtual_machine" "general_vm" {
  name                = "${var.vm_name}-vm"
  resource_group_name = var.resource_group_name
  location            = var.region
  size                = "Standard_D16s_v3"
  admin_username      = "azureuser"

  network_interface_ids = [
    azurerm_network_interface.general-nic.id,
  ]

  admin_ssh_key {
    username   = "azureuser"
    public_key = tls_private_key.general_ssh.public_key_openssh // using generated ssh key
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-focal"
    sku       = "20_04-lts-gen2"
    version   = "latest"
  }

  depends_on = [
    local_file.private_key,
  ]
}

resource "azurerm_managed_disk" "general_disk" {
  name                 = "${var.vm_name}-disk"
  location             = var.region
  resource_group_name  = var.resource_group_name
  storage_account_type = "Standard_LRS"
  create_option        = "Empty"
  disk_size_gb         = 60
}

resource "azurerm_virtual_machine_data_disk_attachment" "diskattachment" {
  managed_disk_id    = azurerm_managed_disk.general_disk.id
  virtual_machine_id = azurerm_linux_virtual_machine.general_vm.id
  lun                = "10"
  caching            = "ReadWrite"
}
```
examples/adb-kafka/modules/general_vm/outputs.tf

+3

```hcl
output "vm_public_ip" {
  value = azurerm_public_ip.general-nic-pubip.ip_address
}
```
examples/adb-kafka/modules/general_vm/providers.tf

+11

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

provider "azurerm" {
  features {}
}
```
examples/adb-kafka/modules/general_vm/variables.tf

+15

```hcl
variable "resource_group_name" {
  type = string
}

variable "region" {
  type = string
}

variable "subnet_id" {
  type = string
}

variable "vm_name" {
  type = string
}
```

examples/adb-kafka/outputs.tf

+3
```hcl
output "pip" {
  value = module.kafka_broker.vm_public_ip
}
```

examples/adb-kafka/providers.tf

+14
```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

provider "azurerm" {
  features {}
}

provider "random" {
}
```

examples/adb-kafka/terraform.tfvars

+5
```hcl
spokecidr        = "10.179.0.0/20"
no_public_ip     = true
rglocation       = "southeastasia"
dbfs_prefix      = "dbfs"
workspace_prefix = "adb-kafka"
```
