Commit f1c561d

Adding one apply UC and workspace example (#58)
* Adding one-apply UC and workspace example. Changes:
  - Creating AWS and Databricks resources to build a simple workspace with UC enabled
  - Creating Databricks account-level groups, promoting them to UC metastore owners and adding permissions
* Adding the aws-workspace-uc-simple example to the README
* Moving modules to a dedicated folder and bumping the provider version. Changes:
  - Moved sub-modules of `aws-workspace-uc-simple` to the `modules` folder
  - Bumped the Databricks Terraform provider to the latest version
  - Switched the account-level Databricks provider authentication method to OAuth
* Adjusting the README to match the OAuth config
* Adding the new modules to the main README.md
* Adjusting metastore permissions and AWS provider versions; adding a sandbox catalog
* Authenticating the workspace-level provider with OAuth
* Adjusting the explicitly self-assuming Unity Catalog role
* Adjusting max nodes for the demo cluster
1 parent de78aec commit f1c561d

27 files changed: +917 −0 lines

README.md (+4 lines)

```diff
@@ -48,6 +48,7 @@ The folder `examples` contains the following Terraform implementation examples :
 | AWS | aws-workspace-with-private-link | Coming soon |
 | AWS | [aws-databricks-flat](examples/aws-databricks-flat/) | AWS Databricks simple example |
 | AWS | [aws-databricks-modular-privatelink](examples/aws-databricks-modular-privatelink/) | Deploy multiple AWS Databricks workspaces |
+| AWS | [aws-workspace-uc-simple](examples/aws-workspace-uc-simple/) | Provisioning AWS Databricks E2 with Unity Catalog in a single apply |
 | AWS | [aws-databricks-uc](examples/aws-databricks-uc/) | AWS UC |
 | AWS | [aws-databricks-uc-bootstrap](examples/aws-databricks-uc-bootstrap/) | AWS UC |
 | AWS | [aws-remote-backend-infra](examples/aws-remote-backend-infra/) | Simple example on remote backend |
@@ -68,6 +69,9 @@ The folder `modules` contains the following Terraform modules :
 | Azure | [adb-exfiltration-protection](modules/adb-exfiltration-protection/) | A sample implementation of [Data Exfiltration Protection](https://www.databricks.com/blog/2020/03/27/data-exfiltration-protection-with-azure-databricks.html) |
 | Azure | [adb-with-private-links-exfiltration-protection](modules/adb-with-private-links-exfiltration-protection/) | Provisioning Databricks on Azure with Private Link and [Data Exfiltration Protection](https://www.databricks.com/blog/2020/03/27/data-exfiltration-protection-with-azure-databricks.html) |
 | AWS | [aws-workspace-basic](modules/aws-workspace-basic/) | Provisioning AWS Databricks E2 |
+| AWS | [aws-databricks-base-infra](modules/aws-databricks-base-infra/) | Provisioning AWS Infrastructure to be used for the deployment of a Databricks E2 workspace |
+| AWS | [aws-databricks-unity-catalog](modules/aws-databricks-unity-catalog/) | Provisioning the AWS Infrastructure and setting up the metastore for Databricks Unity Catalog |
+| AWS | [aws-databricks-workspace](modules/aws-databricks-workspace/) | Provisioning AWS Databricks E2 Workspace using pre-created AWS Infra |
 | AWS | [aws-workspace-with-firewall](modules/aws-workspace-with-firewall/) | Provisioning AWS Databricks E2 with an AWS Firewall |
 | AWS | [aws-exfiltration-protection](modules/aws-exfiltration-protection/) | An implementation of [Data Exfiltration Protection on AWS](https://www.databricks.com/blog/2021/02/02/data-exfiltration-protection-with-databricks-on-aws.html) |
 | AWS | aws-workspace-with-private-link | Coming soon |
```
+16 lines (new file)

```markdown
AWS Databricks Unity Catalog - One apply
=========================

Using this template, you can deploy all the resources necessary for a simple Databricks AWS workspace with Unity Catalog enabled.

This is a one-apply template: it creates the base AWS resources for a workspace (VPC, subnets, VPC endpoints, S3 bucket, and cross-account IAM role) as well as the Unity Catalog metastore and its cross-account role.

To run this template, you need an `account admin` identity, preferably a service principal. Running with a user account also works, but do not include the `account owner` in the Terraform UC admin or Databricks users list, as you cannot remove yourself from the admin list on destroy.

Unity Catalog resources can take a few minutes to become ready, so you may encounter transient errors when applying. You can either wait for the UI to reflect the changes before applying the next ones, or add explicit `depends_on` references to account-level resources. This template already includes the necessary wait times; should you still hit an error, simply apply again.

## Get Started

> Step 1: Fill in the values in `terraform.tfvars`; also configure the environment variables necessary for AWS provider authentication.

> Step 2: Run `terraform init` and `terraform apply` to deploy the resources. This deploys both the AWS resources that Unity Catalog requires and the Databricks account-level resources.
```
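The two steps above, sketched as a shell session (the profile name is a placeholder; the service-principal credentials are read from `terraform.tfvars`):

```shell
# Authenticate the AWS provider through a named CLI profile (placeholder name)
export AWS_PROFILE="YOUR_AWS_PROFILE"

terraform init    # downloads the aws, databricks, random and time providers
terraform apply   # one apply: AWS infra, E2 workspace and Unity Catalog

# Account-level UC APIs can take a few minutes to propagate; if a transient
# error shows up, re-running the apply usually completes the deployment.
terraform apply
```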
+43 lines (new file)

```hcl
# Get the smallest cluster node possible on that cloud
data "databricks_node_type" "smallest" {
  provider   = databricks.workspace
  local_disk = true
  depends_on = [module.databricks_workspace]
}

# Get the latest LTS version of the Databricks Runtime
data "databricks_spark_version" "latest_version" {
  provider          = databricks.workspace
  long_term_support = true
  depends_on        = [module.databricks_workspace]
}

resource "databricks_cluster" "unity_catalog_cluster" {
  provider                    = databricks.workspace
  cluster_name                = "Demo Cluster"
  spark_version               = data.databricks_spark_version.latest_version.id
  node_type_id                = data.databricks_node_type.smallest.id
  apply_policy_default_values = true
  data_security_mode          = "USER_ISOLATION"
  autotermination_minutes     = 30

  aws_attributes {
    availability           = "SPOT"
    first_on_demand        = 1
    spot_bid_price_percent = 100
  }

  depends_on = [
    module.databricks_workspace
  ]

  autoscale {
    min_workers = 1
    max_workers = 3
  }

  custom_tags = {
    "ClusterScope" = "Initial Demo"
  }
}
```
+31 lines (new file)

```hcl
resource "databricks_catalog" "demo_catalog" {
  provider     = databricks.workspace
  metastore_id = module.unity_catalog.metastore_id
  name         = "sandbox_demo_catalog"
  comment      = "This catalog is managed by terraform"
  properties = {
    purpose = "Demoing catalog creation and management using Terraform"
  }

  depends_on = [
    databricks_group_member.my_service_principal,
    databricks_mws_permission_assignment.add_admin_group,
    databricks_group.users
  ]

  force_destroy = true
}

resource "databricks_grants" "unity_catalog_grants" {
  provider = databricks.workspace
  catalog  = databricks_catalog.demo_catalog.name
  grant {
    principal  = local.workspace_users_group
    privileges = ["USE_CATALOG", "USE_SCHEMA", "CREATE_SCHEMA", "CREATE_TABLE"]
  }

  depends_on = [
    databricks_mws_permission_assignment.add_admin_group
  ]
}
```
+70 lines (new file)

```hcl
data "databricks_service_principal" "admin_service_principal" {
  provider       = databricks.mws
  application_id = var.databricks_client_id
}

resource "databricks_user" "unity_users" {
  provider  = databricks.mws
  for_each  = toset(concat(var.databricks_users, var.databricks_metastore_admins))
  user_name = each.key
  force     = true
}

resource "databricks_group" "admin_group" {
  provider     = databricks.mws
  display_name = local.unity_admin_group
}

resource "databricks_group" "users" {
  provider     = databricks.mws
  display_name = local.workspace_users_group
  depends_on   = [databricks_group.admin_group]
}

# Sleep for 20s to wait for the workspace to enable identity federation
resource "time_sleep" "wait_for_permission_apis" {
  depends_on = [
    module.unity_catalog
  ]
  create_duration = "20s"
}

resource "databricks_mws_permission_assignment" "add_admin_group" {
  provider     = databricks.mws
  workspace_id = module.databricks_workspace.databricks_workspace_id
  principal_id = databricks_group.admin_group.id
  permissions  = ["ADMIN"]
  depends_on = [
    time_sleep.wait_for_permission_apis
  ]
}

resource "databricks_group_member" "admin_group_member" {
  provider  = databricks.mws
  for_each  = toset(var.databricks_metastore_admins)
  group_id  = databricks_group.admin_group.id
  member_id = databricks_user.unity_users[each.value].id
}

resource "databricks_group_member" "my_service_principal" {
  provider  = databricks.mws
  group_id  = databricks_group.admin_group.id
  member_id = data.databricks_service_principal.admin_service_principal.id
}

resource "databricks_group_member" "users_group_members" {
  provider  = databricks.mws
  for_each  = toset(var.databricks_users)
  group_id  = databricks_group.users.id
  member_id = databricks_user.unity_users[each.value].id
}

resource "databricks_mws_permission_assignment" "add_user_group" {
  provider     = databricks.mws
  workspace_id = module.databricks_workspace.databricks_workspace_id
  principal_id = databricks_group.users.id
  permissions  = ["USER"]
  depends_on = [
    time_sleep.wait_for_permission_apis
  ]
}
```
+51 lines (new file)

```hcl
module "aws_base" {
  providers = {
    databricks.mws = databricks.mws
  }
  source                = "../../modules/aws-databricks-base-infra"
  prefix                = local.prefix
  region                = var.region
  databricks_account_id = var.databricks_account_id
  cidr_block            = var.cidr_block
  tags                  = local.tags
  roles_to_assume       = [local.aws_access_services_role_arn]
}

module "databricks_workspace" {
  providers = {
    databricks.mws       = databricks.mws
    databricks.workspace = databricks.workspace
  }
  source                 = "../../modules/aws-databricks-workspace"
  prefix                 = local.prefix
  region                 = var.region
  databricks_account_id  = var.databricks_account_id
  security_group_ids     = module.aws_base.security_group_ids
  vpc_private_subnets    = module.aws_base.subnets
  vpc_id                 = module.aws_base.vpc_id
  root_storage_bucket    = module.aws_base.root_bucket
  cross_account_role_arn = module.aws_base.cross_account_role_arn
  tags                   = local.tags
  depends_on = [
    module.aws_base
  ]
}

module "unity_catalog" {
  source = "../../modules/aws-databricks-unity-catalog"
  providers = {
    databricks.mws       = databricks.mws
    databricks.workspace = databricks.workspace
  }
  prefix                   = local.prefix
  region                   = var.region
  databricks_account_id    = var.databricks_account_id
  aws_account_id           = local.aws_account_id
  unity_metastore_owner    = local.unity_admin_group
  databricks_workspace_ids = [module.databricks_workspace.databricks_workspace_id]
  tags                     = local.tags
  depends_on = [
    module.databricks_workspace,
    databricks_group.admin_group
  ]
}
```
+9 lines (new file)

```hcl
output "databricks_workspace_id" {
  value       = module.databricks_workspace.databricks_workspace_id
  description = "Databricks workspace ID"
}

output "databricks_workspace_url" {
  value       = module.databricks_workspace.databricks_host
  description = "Databricks workspace URL"
}
```
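Assuming the apply succeeded, these outputs can be read back from the CLI:

```shell
terraform output databricks_workspace_id
terraform output databricks_workspace_url   # host to open in the browser
```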
+21 lines (new file)

```hcl
provider "aws" {
  region  = var.region
  profile = var.aws_profile
}

// Initialize the provider in multi-workspace (account-level) mode
provider "databricks" {
  alias         = "mws"
  host          = "https://accounts.cloud.databricks.com"
  account_id    = var.databricks_account_id
  client_id     = var.databricks_client_id
  client_secret = var.databricks_client_secret
}

# Initialize the provider for the workspace created by this Terraform config
provider "databricks" {
  alias         = "workspace"
  host          = module.databricks_workspace.databricks_host
  client_id     = var.databricks_client_id
  client_secret = var.databricks_client_secret
}
```
+13 lines (new file)

```hcl
aws_profile                 = "YOUR_AWS_PROFILE"                        // For AWS CLI authentication
region                      = "sa-east-1"                               // AWS region where you want to deploy your resources
cidr_block                  = "10.4.0.0/16"                             // CIDR block for the workspace VPC; will be divided into two equal-sized subnets
my_username                 = "[email protected]"                       // Username for tagging the resources
databricks_users            = ["[email protected]", "[email protected]"] // List of users that will be admins at the workspace level
databricks_metastore_admins = ["[email protected]"]                     // List of users that will be admins for Unity Catalog
unity_admin_group           = "unity-admin-group"                       // Metastore owner and admin group
databricks_account_id       = "YOUR_DATABRICKS_ACCOUNT_ID"              // Databricks account ID
databricks_client_id        = "YOUR_SERVICE_PRINCIPAL_CLIENT_ID"        // Databricks service principal client ID
databricks_client_secret    = "YOUR_SERVICE_PRINCIPAL_CLIENT_SECRET"    // Databricks service principal client secret
tags = {
  Environment = "Demo-with-terraform"
}
```
+88 lines (new file)

```hcl
# Step 1: Initializing configs and variables
variable "tags" {
  type        = map(any)
  description = "(Optional) Map of tags to be propagated across all assets in this demo"
}

variable "cidr_block" {
  type        = string
  description = "(Required) CIDR block used to create the Databricks VPC"
}

variable "region" {
  type        = string
  description = "(Required) AWS region where the assets will be deployed"
}

variable "aws_profile" {
  type        = string
  description = "(Required) AWS CLI profile to be used for authentication with AWS"
}

data "aws_caller_identity" "current" {}

variable "my_username" {
  type        = string
  description = "(Required) Username in the form of an email, added to the tags and declared as owner of the assets"
}

variable "databricks_client_id" {
  type        = string
  description = "(Required) Client ID to authenticate the Databricks provider at the account level"
}

variable "databricks_client_secret" {
  type        = string
  description = "(Required) Client secret to authenticate the Databricks provider at the account level"
}

variable "databricks_account_id" {
  type        = string
  description = "(Required) Databricks Account ID"
}

resource "random_string" "naming" {
  special = false
  upper   = false
  length  = 6
}

variable "databricks_users" {
  description = <<EOT
  List of Databricks users to be added at account level for Unity Catalog.
  Enter with square brackets and double quotes
  EOT
  type        = list(string)
}

variable "databricks_metastore_admins" {
  description = <<EOT
  List of admins to be added at account level for Unity Catalog.
  Enter with square brackets and double quotes
  EOT
  type        = list(string)
}

variable "unity_admin_group" {
  description = "(Required) Name of the admin group. This group will be set as the owner of the Unity Catalog metastore"
  type        = string
}

variable "aws_access_services_role_name" {
  type        = string
  description = "(Optional) Name for the AWS services role created by this module"
  default     = null
}

locals {
  prefix                        = "demo-${random_string.naming.result}"
  unity_admin_group             = "${local.prefix}-${var.unity_admin_group}"
  workspace_users_group         = "${local.prefix}-workspace-users"
  aws_access_services_role_name = var.aws_access_services_role_name == null ? "${local.prefix}-aws-services-role" : "${local.prefix}-${var.aws_access_services_role_name}"
  aws_access_services_role_arn  = "arn:aws:iam::${local.aws_account_id}:role/${local.aws_access_services_role_name}"
  aws_account_id                = data.aws_caller_identity.current.account_id
  tags                          = merge(var.tags, { Owner = split("@", var.my_username)[0], ownerEmail = var.my_username })
}
```
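The `Owner` tag in the locals block keeps only the part of `my_username` before the `@`. A minimal shell equivalent of Terraform's `split("@", var.my_username)[0]`, using a hypothetical placeholder email:

```shell
my_username="demo.user@example.com"   # hypothetical placeholder value
echo "${my_username%%@*}"             # prints "demo.user"
```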
+24 lines (new file)

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "=4.57.0"
    }

    random = {
      source  = "hashicorp/random"
      version = "=3.4.1"
    }

    time = {
      source  = "hashicorp/time"
      version = "=0.9.1"
    }

    databricks = {
      source  = "databricks/databricks"
      version = "=1.17.0"
    }
  }
}
```
