
Commit efe9511

reorg external hive metastore example and simplify steps (#96)

* reorg external hive metastore example and simplify steps
* removing file

1 parent 112debf, commit efe9511

19 files changed: +134 -236 lines changed

examples/adb-external-hive-metastore/.terraform.lock.hcl

+23-27
Some generated files are not rendered by default.

examples/adb-external-hive-metastore/README.md

+3-15
@@ -6,9 +6,8 @@ This architecture will be deployed:
 ![alt text](https://raw.githubusercontent.com/databricks/terraform-databricks-examples/main/examples/adb-external-hive-metastore/images/adb-external-hive-metastore.png?raw=true)

 # Get Started:
-There are 2 stages of deployment: stage 1 will deploy all the major infra components including the Databricks workspace and the sql server & database that serves as your external hive metastore. After stage 1 is complete, you need to log into your workspace (this will turn you into the first workspace admin), then you need to navigate into `stage-2-workspace-objects` to deploy remaining components like secret scope, cluster, job, notebook, etc. These are the workspace objects that since we are using `az cli` auth type with Databricks provider at workspace level, we rely on having the caller identity being inside the workspace before stage 2.
+This template completes 99% of the process of deploying an external hive metastore with Azure Databricks, using hive version 3.1.0. The last 1% is to `run only once` a pre-deployed Databricks job that initializes the external hive metastore. After successful deployment, your cluster can connect to the external hive metastore (an Azure SQL database).

-Stage 1:
 On your local machine:

 1. Clone this repository to local.
@@ -24,20 +23,9 @@ On your local machine:

 `terraform apply`

-After the deployment of stage 1 completes, you should have a Databricks workspace running in your own VNet, a sql server and azure sql database in another VNet, and private link connection from your Databricks VNet to your sql server.
+Now log into the Databricks workspace so that you are added to it: the identity that deployed the workspace has at least the Contributor role on it, and is added as a workspace admin on first launch.

-Now we need to manually log into the Databricks workspace, such that you are added into the workspace (since you have Azure contributor role on the workspace resource, at lauch workspace time, you will be added as workspace admin). After first login, you can now proceed to stage 2.
-
-Stage 2:
-1. Navigate into `stage-2-workspace-objects` folder.
-2. Configure input variables, see samples inside provided `terraform.tfvars`. You can get the values from stage 1 outputs.
-3. Init terraform and apply to deploy resources:
-
-`terraform init`
-
-`terraform apply`
-
-At this step, we've completes most of the work. The final step is to manually trigger the deployed job to run it only once.
+Once logged into the workspace, the final step is to manually trigger the pre-deployed job.

 Go to databricks workspace - Job - run the auto-deployed job only once; this is to initialize the database with metastore schema.
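For orientation only (not part of this commit): the updated README says the cluster connects to the external hive metastore backed by Azure SQL. A cluster wired for that typically carries Spark conf pointing at the `hive` secret scope and the `HIVE-*` secrets defined in `secrets.tf` below. This is a minimal, hypothetical sketch; the resource name, node type, and sizing are illustrative, and the example's real cluster definition is not shown in this diff.

```hcl
# Illustrative sketch: a cluster that reads the metastore JDBC settings from
# the Key-Vault-backed "hive" secret scope (secret names match secrets.tf).
resource "databricks_cluster" "metastore_client" {
  cluster_name            = "external-hive-metastore-cluster" # hypothetical name
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = "Standard_DS3_v2" # illustrative node size
  num_workers             = 1
  autotermination_minutes = 30

  spark_conf = {
    # Hive 3.1.0 per the README; fetch matching metastore jars from maven
    "spark.sql.hive.metastore.version" = "3.1.0"
    "spark.sql.hive.metastore.jars"    = "maven"

    # JDBC connection details resolved at cluster start from the secret scope
    "spark.hadoop.javax.jdo.option.ConnectionURL"        = "{{secrets/hive/HIVE-URL}}"
    "spark.hadoop.javax.jdo.option.ConnectionUserName"   = "{{secrets/hive/HIVE-USER}}"
    "spark.hadoop.javax.jdo.option.ConnectionPassword"   = "{{secrets/hive/HIVE-PASSWORD}}"
    "spark.hadoop.javax.jdo.option.ConnectionDriverName" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
  }
}
```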

examples/adb-external-hive-metastore/akv.tf

+9-9
@@ -2,17 +2,17 @@ resource "azurerm_key_vault" "akv1" {
   name                = "${local.prefix}-akv"
   location            = azurerm_resource_group.this.location
   resource_group_name = azurerm_resource_group.this.name
-  enabled_for_disk_encryption = true
   tenant_id = data.azurerm_client_config.current.tenant_id
+  sku_name  = "premium"
   soft_delete_retention_days = 7
   purge_protection_enabled   = false
-  sku_name = "standard"
-}
+  enabled_for_disk_encryption = true
+
+  access_policy {
+    tenant_id = data.azurerm_client_config.current.tenant_id
+    object_id = data.azurerm_client_config.current.object_id

-resource "azurerm_key_vault_access_policy" "example" {
-  key_vault_id       = azurerm_key_vault.akv1.id
-  tenant_id          = data.azurerm_client_config.current.tenant_id
-  object_id          = data.azurerm_client_config.current.object_id
-  key_permissions    = ["Backup", "Delete", "Get", "List", "Purge", "Recover", "Restore"]
-  secret_permissions = ["Backup", "Delete", "Get", "List", "Purge", "Recover", "Restore", "Set"]
+    key_permissions    = ["Backup", "Delete", "Get", "List", "Purge", "Recover", "Restore"]
+    secret_permissions = ["Backup", "Delete", "Get", "List", "Purge", "Recover", "Restore", "Set"]
+  }
 }
@@ -1,6 +1,6 @@
 # automate job to init schema of the database, prep to be hive metastore
 resource "databricks_notebook" "ddl" {
-  source = "../coldstart/metastore_coldstart.py" #local notebook
+  source = "./coldstart/metastore_coldstart.py" #local notebook
   path   = "${data.databricks_current_user.me.home}/coldstart" #remote notebook
 }
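The README's final step is to manually trigger the pre-deployed job that runs this coldstart notebook once, initializing the metastore schema. The job definition itself is not shown in this diff; purely as an illustrative sketch (the resource name, job name, and cluster sizing below are hypothetical, not the example's actual code), such a job could look like:

```hcl
# Hypothetical sketch of a one-off job wrapping the coldstart notebook.
# Trigger it manually, exactly once, after the workspace is deployed.
resource "databricks_job" "metastore_init" {
  name = "init-external-hive-metastore"

  task {
    task_key = "coldstart"

    notebook_task {
      notebook_path = databricks_notebook.ddl.path
    }

    new_cluster {
      num_workers   = 1
      spark_version = data.databricks_spark_version.latest_lts.id
      node_type_id  = "Standard_DS3_v2"
    }
  }
}
```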

examples/adb-external-hive-metastore/main.tf

+14-21
@@ -7,22 +7,6 @@
  * * External Hive Metastore for ADB workspace
  */

-provider "azurerm" {
-  features {
-    key_vault {
-      purge_soft_delete_on_destroy = true
-    }
-  }
-}
-
-provider "random" {
-}
-
-# Use Azure CLI to authenticate at Azure Databricks account level, and the Azure Databricks workspace level
-provider "databricks" {
-  host = azurerm_databricks_workspace.this.workspace_url
-}
-
 resource "random_string" "naming" {
   special = false
   upper   = false
@@ -36,17 +20,26 @@ data "external" "me" {
   program = ["az", "account", "show", "--query", "user"]
 }

+# Retrieve information about the current user (the caller of tf apply)
+data "databricks_current_user" "me" {
+  depends_on = [azurerm_databricks_workspace.this]
+}
+
+data "databricks_spark_version" "latest_lts" {
+  long_term_support = true
+  latest            = true
+  depends_on        = [azurerm_databricks_workspace.this]
+}
+
+
 locals {
   // dltp - databricks labs terraform provider
   prefix   = join("-", [var.workspace_prefix, "${random_string.naming.result}"])
   location = var.rglocation
   cidr     = var.spokecidr
   sqlcidr  = var.sqlvnetcidr
   dbfsname = join("", [var.dbfs_prefix, "${random_string.naming.result}"]) // dbfs name must not have special chars
-
-  db_url            = "jdbc:sqlserver://${azurerm_mssql_server.metastoreserver.name}.database.windows.net:1433;database=${azurerm_mssql_database.sqlmetastore.name};user=${var.db_username}@${azurerm_mssql_server.metastoreserver.name};password={${var.db_password}};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
-  db_username_local = var.db_username
-  db_password_local = var.db_password
+  db_url   = "jdbc:sqlserver://${azurerm_mssql_server.metastoreserver.name}.database.windows.net:1433;database=${azurerm_mssql_database.sqlmetastore.name};user=${var.db_username}@${azurerm_mssql_server.metastoreserver.name};password={${var.db_password}};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"

   tags = {
     Environment = "Testing"
@@ -56,7 +49,7 @@ locals {
 }

 resource "azurerm_resource_group" "this" {
-  name     = "adb-dev-${local.prefix}-rg"
+  name     = "adb-test-${local.prefix}-rg"
   location = local.location
   tags     = local.tags
 }
@@ -1,5 +1,4 @@
 output "databricks_azure_workspace_resource_id" {
-  // The ID of the Databricks Workspace in the Azure management plane.
   value = azurerm_databricks_workspace.this.id
 }

@@ -12,19 +11,3 @@ output "workspace_url" {
 output "resource_group" {
   value = azurerm_resource_group.this.name
 }
-
-output "vault_uri" {
-  value = azurerm_key_vault.akv1.vault_uri
-}
-
-output "key_vault_id" {
-  value = azurerm_key_vault.akv1.id
-}
-
-output "metastoreserver" {
-  value = azurerm_mssql_server.metastoreserver.name
-}
-
-output "metastoredbname" {
-  value = azurerm_mssql_database.sqlmetastore.name
-}
@@ -0,0 +1,29 @@
+terraform {
+  required_providers {
+    databricks = {
+      source  = "databricks/databricks"
+      version = ">=1.27.0"
+    }
+
+    azurerm = {
+      source  = "hashicorp/azurerm"
+      version = ">=3.76.0"
+    }
+  }
+}
+
+provider "random" {
+}
+
+provider "azurerm" {
+  features {
+    key_vault {
+      purge_soft_delete_on_destroy = true
+    }
+  }
+}
+
+# Use Azure CLI to authenticate at Azure Databricks account level, and the Azure Databricks workspace level
+provider "databricks" {
+  host = azurerm_databricks_workspace.this.workspace_url
+}

examples/adb-external-hive-metastore/stage-2-workspace-objects/secrets.tf renamed to examples/adb-external-hive-metastore/secrets.tf

+9-6
@@ -2,25 +2,28 @@ resource "databricks_secret_scope" "kv" {
   # akv backed secret scope
   name = "hive"
   keyvault_metadata {
-    resource_id = var.key_vault_id
-    dns_name    = var.vault_uri
+    resource_id = azurerm_key_vault.akv1.id
+    dns_name    = azurerm_key_vault.akv1.vault_uri
   }
 }

 resource "azurerm_key_vault_secret" "hiveurl" {
   name  = "HIVE-URL"
   value = local.db_url
-  key_vault_id = var.key_vault_id
+  key_vault_id = azurerm_key_vault.akv1.id
+  depends_on   = [azurerm_key_vault.akv1]
 }

 resource "azurerm_key_vault_secret" "hiveuser" {
   name = "HIVE-USER"
-  value        = var.db_username # use local group instead of var
-  key_vault_id = var.key_vault_id
+  value        = var.db_username
+  key_vault_id = azurerm_key_vault.akv1.id
+  depends_on   = [azurerm_key_vault.akv1]
 }

 resource "azurerm_key_vault_secret" "hivepwd" {
   name  = "HIVE-PASSWORD"
   value = var.db_password
-  key_vault_id = var.key_vault_id
+  key_vault_id = azurerm_key_vault.akv1.id
+  depends_on   = [azurerm_key_vault.akv1]
 }

examples/adb-external-hive-metastore/sqlserver.tf

+1-1
@@ -13,7 +13,7 @@ resource "azurerm_mssql_server" "metastoreserver" {
   version                      = "12.0"
   administrator_login          = var.db_username // sensitive data stored as env variables locally
   administrator_login_password = var.db_password
-  public_network_access_enabled = true // consider to disable public access to the server, to set as false
+  public_network_access_enabled = true // set to false to remove public access
 }

 resource "azurerm_mssql_database" "sqlmetastore" {
examples/adb-external-hive-metastore/stage-2-workspace-objects/auth.tf

-2
This file was deleted.

examples/adb-external-hive-metastore/stage-2-workspace-objects/cluster.tf

-40
This file was deleted.

examples/adb-external-hive-metastore/stage-2-workspace-objects/main.tf

-31
This file was deleted.

examples/adb-external-hive-metastore/stage-2-workspace-objects/terraform.tfvars

-5
This file was deleted.

0 commit comments
