84 changes: 84 additions & 0 deletions examples/gcp-with-psc-exfiltration-protection/README.md
@@ -0,0 +1,84 @@
# Provisioning a Databricks workspace on GCP with a hub-and-spoke network architecture for data exfiltration protection

This example uses the [gcp-with-psc-exfiltration-protection](../../modules/gcp-with-psc-exfiltration-protection) module.

This template provides an example deployment of hub-and-spoke networking with an egress firewall that controls all outbound traffic from the Databricks subnets.

With this setup, you can configure firewall rules to allow or block egress traffic from your Databricks clusters. You can also use the firewall to block all access to storage and rely on private endpoint connections to bypass it, so that only specific storage buckets remain reachable.
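
As an illustrative sketch only (the resource names, the `google_compute_network.hub` reference, and the CIDR values below are assumptions, not values taken from this module), a deny-by-default egress rule paired with a narrower allow rule could look like this:

```hcl
# Hypothetical sketch: deny all egress by default...
resource "google_compute_firewall" "deny_all_egress" {
  name      = "deny-all-egress"            # assumed name
  network   = google_compute_network.hub.id # assumed network resource
  direction = "EGRESS"
  priority  = 65000

  deny {
    protocol = "all"
  }

  destination_ranges = ["0.0.0.0/0"]
}

# ...and allow HTTPS egress to specific destinations at a higher priority.
resource "google_compute_firewall" "allow_databricks_egress" {
  name      = "allow-databricks-egress" # assumed name
  network   = google_compute_network.hub.id
  direction = "EGRESS"
  priority  = 1000

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }

  # Replace with the CIDRs for your region from the Databricks IP/domain page.
  destination_ranges = ["203.0.113.0/24"]
}
```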


To find the IPs and FQDNs for your deployment, see: https://docs.gcp.databricks.com/en/resources/ip-domain-region.html

## Overall Architecture

![Architecture diagram](../../modules/gcp-with-psc-exfiltration-protection/images/architecture.png)

Resources to be created:
* Hub VPC and its subnet
* Spoke VPC and its subnets
* Peering between Hub and Spoke VPC
* Private Service Connect (PSC) endpoints
* DNS private and peering zones
* Firewall rules for Hub and Spoke VPCs
* Databricks workspace with private connectivity to the control plane (both the back-end data-plane connection and the front-end user-to-web-app connection) and to DBFS

## How to use

1. Reference this module using one of the supported [module source types](https://developer.hashicorp.com/terraform/language/modules/sources), as sketched below.
2. Add a `terraform.tfvars` file with the values required to provision the workspace (account ID, projects, CIDRs, and Unity Catalog names).
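
For example, referencing the module directly from a git source (a sketch; the repository URL is assumed, and `main.tf` in this example uses a local path instead):

```hcl
module "gcp_with_data_exfiltration_protection" {
  # Assumed git source; see main.tf below for the local-path variant.
  source = "github.com/databricks/terraform-databricks-examples//modules/gcp-with-psc-exfiltration-protection"

  # ... the same input variables as in main.tf ...
}
```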

## How to fill in variable values

Variables have no default values, in order to avoid misconfiguration.

Most values relate to resources managed by Databricks. The required values can be found at: https://docs.gcp.databricks.com/en/resources/ip-domain-region.html
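
For example, a filled-in `terraform.tfvars` might look like the following (every value below is an illustrative placeholder, not a working default):

```hcl
databricks_account_id = "00000000-0000-0000-0000-000000000000" # illustrative
google_region         = "us-central1"                          # illustrative

workspace_google_project = "my-workspace-project" # illustrative
spoke_vpc_google_project = "my-spoke-project"     # illustrative
hub_vpc_google_project   = "my-hub-project"       # illustrative
is_spoke_vpc_shared      = false

prefix = "demo"

hive_metastore_ip = "10.1.2.3"    # regional value from the page above
hub_vpc_cidr      = "10.0.0.0/24" # illustrative
spoke_vpc_cidr    = "10.1.0.0/24" # illustrative
psc_subnet_cidr   = "10.2.0.0/28" # illustrative

metastore_name = "demo-metastore" # illustrative
catalog_name   = "demo-catalog"   # illustrative
```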

<!-- BEGIN_TF_DOCS -->
## Requirements

| Name | Version |
|------------------------------------------------------------------------------|----------|
| <a name="requirement_databricks"></a> [databricks](#requirement\_databricks) | >=1.77.0 |
| <a name="requirement_google"></a> [google](#requirement\_google) | 6.17.0 |

## Providers

No providers.

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_gcp_with_data_exfiltration_protection"></a> [gcp\_with\_data\_exfiltration\_protection](#module\_gcp\_with\_data\_exfiltration\_protection) | ../../modules/gcp-with-psc-exfiltration-protection | n/a |
| <a name="module_unity_catalog"></a> [unity\_catalog](#module\_unity\_catalog) | ../../modules/gcp-unity-catalog | n/a |

## Resources

No resources.

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_catalog_name"></a> [catalog\_name](#input\_catalog\_name) | Name to assign to default catalog | `string` | n/a | yes |
| <a name="input_databricks_account_id"></a> [databricks\_account\_id](#input\_databricks\_account\_id) | Databricks Account ID | `string` | n/a | yes |
| <a name="input_google_region"></a> [google\_region](#input\_google\_region) | Google Cloud region where the resources will be created | `string` | n/a | yes |
| <a name="input_hive_metastore_ip"></a> [hive\_metastore\_ip](#input\_hive\_metastore\_ip) | Value of regional default Hive Metastore IP | `string` | n/a | yes |
| <a name="input_hub_vpc_cidr"></a> [hub\_vpc\_cidr](#input\_hub\_vpc\_cidr) | CIDR for Hub VPC | `string` | n/a | yes |
| <a name="input_hub_vpc_google_project"></a> [hub\_vpc\_google\_project](#input\_hub\_vpc\_google\_project) | Google Cloud project ID related to Hub VPC | `string` | n/a | yes |
| <a name="input_is_spoke_vpc_shared"></a> [is\_spoke\_vpc\_shared](#input\_is\_spoke\_vpc\_shared) | Whether the Spoke VPC is a Shared VPC or a dedicated VPC | `bool` | n/a | yes |
| <a name="input_metastore_name"></a> [metastore\_name](#input\_metastore\_name) | Name to assign to regional metastore | `string` | n/a | yes |
| <a name="input_prefix"></a> [prefix](#input\_prefix) | Prefix to use in generated resource names | `string` | n/a | yes |
| <a name="input_psc_subnet_cidr"></a> [psc\_subnet\_cidr](#input\_psc\_subnet\_cidr) | CIDR for the PSC subnet | `string` | n/a | yes |
| <a name="input_spoke_vpc_cidr"></a> [spoke\_vpc\_cidr](#input\_spoke\_vpc\_cidr) | CIDR for Spoke VPC | `string` | n/a | yes |
| <a name="input_spoke_vpc_google_project"></a> [spoke\_vpc\_google\_project](#input\_spoke\_vpc\_google\_project) | Google Cloud project ID related to Spoke VPC | `string` | n/a | yes |
| <a name="input_tags"></a> [tags](#input\_tags) | Map of tags to add to all resources | `map(string)` | `{}` | no |
| <a name="input_workspace_google_project"></a> [workspace\_google\_project](#input\_workspace\_google\_project) | Google Cloud project ID related to Databricks workspace | `string` | n/a | yes |

## Outputs

| Name | Description |
|-------------------------------------------------------------------------------|--------------------------------------------------------------------------------------|
| <a name="output_workspace_id"></a> [workspace\_id](#output\_workspace\_id) | The Databricks workspace ID |
| <a name="output_workspace_url"></a> [workspace\_url](#output\_workspace\_url) | The workspace URL which is of the format '{workspaceId}.{random}.gcp.databricks.com' |
<!-- END_TF_DOCS -->
16 changes: 16 additions & 0 deletions examples/gcp-with-psc-exfiltration-protection/main.tf
@@ -0,0 +1,16 @@
module "gcp_with_data_exfiltration_protection" {
source = "../../modules/gcp-with-psc-exfiltration-protection"

databricks_account_id = var.databricks_account_id
hub_vpc_google_project = var.hub_vpc_google_project
is_spoke_vpc_shared = var.is_spoke_vpc_shared
prefix = var.prefix
spoke_vpc_google_project = var.spoke_vpc_google_project
workspace_google_project = var.workspace_google_project
google_region = var.google_region
hive_metastore_ip = var.hive_metastore_ip
hub_vpc_cidr = var.hub_vpc_cidr
psc_subnet_cidr = var.psc_subnet_cidr
spoke_vpc_cidr = var.spoke_vpc_cidr
tags = var.tags
}
10 changes: 10 additions & 0 deletions examples/gcp-with-psc-exfiltration-protection/outputs.tf
@@ -0,0 +1,10 @@

output "workspace_url" {
value = module.gcp_with_data_exfiltration_protection.workspace_url
description = "The workspace URL which is of the format '{workspaceId}.{random}.gcp.databricks.com'"
}

output "workspace_id" {
description = "The Databricks workspace ID"
value = module.gcp_with_data_exfiltration_protection.workspace_id
}
13 changes: 13 additions & 0 deletions examples/gcp-with-psc-exfiltration-protection/providers.tf
@@ -0,0 +1,13 @@
provider "databricks" {
host = "https://accounts.gcp.databricks.com"
account_id = var.databricks_account_id
}

provider "databricks" {
alias = "workspace"

host = module.gcp_with_data_exfiltration_protection.workspace_url
}

provider "google" {
}
15 changes: 15 additions & 0 deletions examples/gcp-with-psc-exfiltration-protection/terraform.tf
@@ -0,0 +1,15 @@
terraform {
required_providers {
databricks = {
source = "databricks/databricks"
version = ">=1.81.1"
}
google = {
source = "hashicorp/google"
version = "6.17.0"
}
random = {
source = "hashicorp/random"
}
}
}
20 changes: 20 additions & 0 deletions examples/gcp-with-psc-exfiltration-protection/terraform.tfvars
@@ -0,0 +1,20 @@
databricks_account_id = ""

google_region = ""

workspace_google_project = ""

spoke_vpc_google_project = ""
hub_vpc_google_project = ""
is_spoke_vpc_shared = true

prefix = ""

hive_metastore_ip = ""
hub_vpc_cidr = ""
spoke_vpc_cidr = ""
psc_subnet_cidr = ""

metastore_name = ""
catalog_name = ""

15 changes: 15 additions & 0 deletions examples/gcp-with-psc-exfiltration-protection/unity-catalog.tf
@@ -0,0 +1,15 @@
module "unity_catalog" {
source = "../../modules/gcp-unity-catalog"

providers = {
databricks = databricks,
databricks.workspace = databricks.workspace
}
databricks_workspace_id = module.gcp_with_data_exfiltration_protection.workspace_id
databricks_workspace_url = module.gcp_with_data_exfiltration_protection.workspace_url
google_project = var.workspace_google_project
google_region = var.google_region
metastore_name = var.metastore_name
catalog_name = var.catalog_name
prefix = var.prefix
}
73 changes: 73 additions & 0 deletions examples/gcp-with-psc-exfiltration-protection/variables.tf
@@ -0,0 +1,73 @@
variable "databricks_account_id" {
type = string
description = "Databricks Account ID"
}

variable "google_region" {
type = string
description = "Google Cloud region where the resources will be created"
}

variable "workspace_google_project" {
type = string
description = "Google Cloud project ID related to Databricks workspace"
}

variable "spoke_vpc_google_project" {
type = string
description = "Google Cloud project ID related to Spoke VPC"
}

variable "hub_vpc_google_project" {
type = string
description = "Google Cloud project ID related to Hub VPC"
}

variable "is_spoke_vpc_shared" {
type = bool
description = "Whether the Spoke VPC is a Shared or a dedicated VPC"
}

variable "prefix" {
type = string
description = "Prefix to use in generated resources name"
}

# For the value of the regional Hive Metastore IP, refer to the Databricks documentation
# Here - https://docs.gcp.databricks.com/en/resources/ip-domain-region.html#addresses-for-default-metastore
variable "hive_metastore_ip" {
type = string
description = "Value of regional default Hive Metastore IP"
}

variable "hub_vpc_cidr" {
type = string
description = "CIDR for Hub VPC"
}

variable "spoke_vpc_cidr" {
type = string
description = "CIDR for Spoke VPC"
}

variable "psc_subnet_cidr" {
type = string
description = "CIDR for Spoke VPC"
}

variable "tags" {
type = map(string)
description = "Map of tags to add to all resources"

default = {}
}

variable "metastore_name" {
type = string
description = "Name to assign to regional metastore"
}

variable "catalog_name" {
type = string
description = "Name to assign to default catalog"
}
40 changes: 40 additions & 0 deletions modules/gcp-unity-catalog/databricks-cloud-resources.tf
@@ -0,0 +1,40 @@
resource "databricks_metastore" "this" {
name = var.metastore_name
region = var.google_region
force_destroy = true
}

resource "databricks_metastore_assignment" "this" {
workspace_id = var.databricks_workspace_id
metastore_id = databricks_metastore.this.id
}

resource "databricks_storage_credential" "this" {
provider = databricks.workspace
name = "${var.prefix}-storage-credential"
databricks_gcp_service_account {}
depends_on = [databricks_metastore_assignment.this]
}

resource "databricks_external_location" "this" {
provider = databricks.workspace
name = "${var.prefix}-external-location"
url = "gs://${google_storage_bucket.ext_bucket.name}/"

credential_name = databricks_storage_credential.this.id

comment = "Managed by TF"
depends_on = [
databricks_metastore_assignment.this,
google_storage_bucket_iam_member.unity_cred_reader,
google_storage_bucket_iam_member.unity_cred_admin
]
}

resource "databricks_catalog" "main" {
provider = databricks.workspace
name = var.catalog_name
storage_root = databricks_external_location.this.url
comment = "This catalog is managed by terraform"
isolation_mode = "OPEN"
}
20 changes: 20 additions & 0 deletions modules/gcp-unity-catalog/gcs.tf
@@ -0,0 +1,20 @@
resource "google_storage_bucket" "ext_bucket" {
name = "${var.prefix}-bucket"

project = var.google_project
location = var.google_region
force_destroy = true
}

resource "google_storage_bucket_iam_member" "unity_cred_admin" {
bucket = google_storage_bucket.ext_bucket.name
role = "roles/storage.objectAdmin"
member = "serviceAccount:${databricks_storage_credential.this.databricks_gcp_service_account[0].email}"
}

resource "google_storage_bucket_iam_member" "unity_cred_reader" {
bucket = google_storage_bucket.ext_bucket.name
role = "roles/storage.legacyBucketReader"
member = "serviceAccount:${databricks_storage_credential.this.databricks_gcp_service_account[0].email}"
}

14 changes: 14 additions & 0 deletions modules/gcp-unity-catalog/terraform.tf
@@ -0,0 +1,14 @@
terraform {
required_providers {
databricks = {
source = "databricks/databricks"
configuration_aliases = [databricks, databricks.workspace]
}
google = {
source = "hashicorp/google"
}
random = {
source = "hashicorp/random"
}
}
}
32 changes: 32 additions & 0 deletions modules/gcp-unity-catalog/variables.tf
@@ -0,0 +1,32 @@
variable "databricks_workspace_url" {
description = "The URL of the Databricks workspace to which resources will be deployed (e.g., https://<region>.gcp.databricks.com)."
}

variable "databricks_workspace_id" {
description = "The unique identifier of the Databricks workspace in which resources will be managed."
}

variable "google_region" {
type = string
description = "Google Cloud region where the resources will be created"
}

variable "google_project" {
type = string
description = "The Google Cloud project ID where the Databricks workspace and associated resources will be created."
}

variable "prefix" {
type = string
description = "Prefix to use in generated resources name"
}

variable "metastore_name" {
type = string
description = "Name to assign to regional metastore"
}

variable "catalog_name" {
type = string
description = "Name to assign to default catalog"
}