Databricks on GCP data exfiltration protection workspace deployment #172
Merged
+1,359
−0
12 commits
2e99eff
Optional Resource Group creation in adb-lakehouse module
micheledaddetta-databricks 5b8be50
Merge branch 'main' into main
micheledaddetta-databricks f86408e
Fix resource group name value in the azure data factory resource
micheledaddetta-databricks c83947b
Merge branch 'databricks:main' into main
micheledaddetta-databricks b4aafcf
Document examples
micheledaddetta-databricks e3329fa
Remove modules reference and change how to use guide
micheledaddetta-databricks 074a016
Databricks on GCP data exfiltration protection workspace deployment
micheledaddetta-databricks 3dec5d4
Merge branch 'databricks:main' into main
micheledaddetta-databricks edc6842
Align implementation with GCE (CMv1) version
micheledaddetta-databricks 4626350
Remove references to GKE related variables
micheledaddetta-databricks ec6fc72
Add utility for Unity Catalog setup in GCP
micheledaddetta-databricks 0afedcb
Add missing firewall rule + Fix architecture diagram and README + Run…
micheledaddetta-databricks
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| # Provisioning a Databricks on GCP workspace with a Hub & Spoke network architecture for data exfiltration protection | ||
|
|
||
| This example uses the [gcp-with-psc-exfiltration-protection](../../modules/gcp-with-psc-exfiltration-protection) module. | ||
|
|
||
| This template provides an example deployment of Hub-Spoke networking with an egress firewall that controls all outbound traffic from the Databricks subnets. | ||
|
|
||
| With this setup, you can set up firewall rules to block or allow egress traffic from your Databricks clusters. You can also use the firewall to block all access to storage buckets and use a private endpoint connection to bypass it, so that access is allowed only to specific buckets. | ||
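As an illustration of the block / allow pattern described above, here is a minimal sketch of two egress rules. It is not taken from the module: the resource names, the `example-spoke-vpc` network name, the priorities and the metastore port (TCP 3306) are assumptions made for the example, and the module's actual rules may be organised differently.

```hcl
# Sketch only: names, network, priorities and port are illustrative assumptions.

# Deny all outbound traffic by default.
resource "google_compute_firewall" "example_deny_all_egress" {
  name      = "example-deny-all-egress"
  project   = var.spoke_vpc_google_project
  network   = "example-spoke-vpc" # hypothetical VPC name
  direction = "EGRESS"
  priority  = 1100

  deny {
    protocol = "all"
  }

  destination_ranges = ["0.0.0.0/0"]
}

# Allow traffic to the regional Hive Metastore only.
# In GCP a lower priority number wins, so this rule is evaluated before the deny rule.
resource "google_compute_firewall" "example_allow_hive_metastore" {
  name      = "example-allow-hive-metastore"
  project   = var.spoke_vpc_google_project
  network   = "example-spoke-vpc" # hypothetical VPC name
  direction = "EGRESS"
  priority  = 1000

  allow {
    protocol = "tcp"
    ports    = ["3306"] # assumed metastore port
  }

  destination_ranges = ["${var.hive_metastore_ip}/32"]
}
```

The design choice shown is a low-priority catch-all deny combined with narrower, higher-priority allow rules, so that every permitted destination is listed explicitly.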
|
|
||
|
|
||
| To find the IP addresses and FQDNs for your deployment, go to: https://docs.gcp.databricks.com/en/resources/ip-domain-region.html | ||
|
|
||
| ## Overall Architecture | ||
|
|
||
|  | ||
|
|
||
| Resources to be created: | ||
| * Hub VPC and its subnet | ||
| * Spoke VPC and its subnets | ||
| * Peering between Hub and Spoke VPC | ||
| * Private Service Connect (PSC) endpoints | ||
| * DNS private and peering zones | ||
| * Firewall rules for Hub and Spoke VPCs | ||
| * Databricks workspace with private link to control plane, user to webapp and private link to DBFS | ||
|
|
||
|
|
||
|
|
||
|
|
||
| ## How to use | ||
|
|
||
| 1. Reference this module using one of the different [module source types](https://developer.hashicorp.com/terraform/language/modules/sources). | ||
| 2. Add a `terraform.tfvars` file with values for the input variables listed under Inputs. | ||
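For step 1, a module reference through a Git source might look like the sketch below. The Git URL and `ref` are placeholders rather than the real repository address; the argument list mirrors the module call in this example's `main.tf`.

```hcl
# Sketch only: replace the placeholder URL and ref with your own repository and revision.
module "gcp_with_data_exfiltration_protection" {
  source = "git::https://example.com/your-org/your-repo.git//modules/gcp-with-psc-exfiltration-protection?ref=main"

  databricks_account_id    = var.databricks_account_id
  hub_vpc_google_project   = var.hub_vpc_google_project
  is_spoke_vpc_shared      = var.is_spoke_vpc_shared
  prefix                   = var.prefix
  spoke_vpc_google_project = var.spoke_vpc_google_project
  workspace_google_project = var.workspace_google_project
  google_region            = var.google_region
  hive_metastore_ip        = var.hive_metastore_ip
  hub_vpc_cidr             = var.hub_vpc_cidr
  psc_subnet_cidr          = var.psc_subnet_cidr
  spoke_vpc_cidr           = var.spoke_vpc_cidr
  tags                     = var.tags
}
```

The `//` separates the repository from the subdirectory that holds the module, and `?ref=` pins a branch, tag or commit.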
|
|
||
| ## How to fill in variable values | ||
|
|
||
| Required variables have no default values, to avoid misconfiguration. Only `tags` has a default (`{}`). | ||
|
|
||
| Most values are related to resources managed by Databricks. The required values can be found at: https://docs.gcp.databricks.com/en/resources/ip-domain-region.html | ||
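A filled-in `terraform.tfvars` could look like the sketch below. Every value is a made-up placeholder: the account ID, project IDs, region, CIDR ranges and names are assumptions for illustration, and the Hive Metastore IP uses a documentation-only address that must be replaced with the regional value from the page linked above. The CIDR sizes are not validated against the module's requirements.

```hcl
# Sketch only: all values are placeholders, not working settings.
databricks_account_id = "00000000-0000-0000-0000-000000000000"

google_region = "europe-west1"

workspace_google_project = "example-workspace-project"

spoke_vpc_google_project = "example-spoke-project"
hub_vpc_google_project   = "example-hub-project"
is_spoke_vpc_shared      = false

prefix = "dbx-demo"

hive_metastore_ip = "203.0.113.10" # documentation-range address, replace with the regional IP
hub_vpc_cidr      = "10.0.0.0/24"
spoke_vpc_cidr    = "10.1.0.0/24"
psc_subnet_cidr   = "10.2.0.0/28"

metastore_name = "example-metastore"
catalog_name   = "example_catalog"
```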
|
|
||
| <!-- BEGIN_TF_DOCS --> | ||
| ## Requirements | ||
|
|
||
| | Name | Version | | ||
| |------------------------------------------------------------------------------|----------| | ||
| | <a name="requirement_databricks"></a> [databricks](#requirement\_databricks) | >=1.81.1 | | ||
| | <a name="requirement_google"></a> [google](#requirement\_google) | 6.17.0 | | ||
|
|
||
| ## Providers | ||
|
|
||
| No providers. | ||
|
|
||
| ## Modules | ||
|
|
||
| | Name | Source | Version | | ||
| |-------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------|---------| | ||
| | <a name="module_gcp_with_data_exfiltration_protection"></a> [gcp\_with\_data\_exfiltration\_protection](#module\_gcp\_with\_data\_exfiltration\_protection) | ../../modules/gcp-with-psc-exfiltration-protection | n/a | | ||
|
|
||
| ## Resources | ||
|
|
||
| No resources. | ||
|
|
||
| ## Inputs | ||
|
|
||
| | Name | Description | Type | Default | Required | | ||
| |------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|---------------|---------|:--------:| | ||
| | <a name="input_databricks_account_id"></a> [databricks\_account\_id](#input\_databricks\_account\_id) | Databricks Account ID | `string` | n/a | yes | | ||
| | <a name="input_google_region"></a> [google\_region](#input\_google\_region) | Google Cloud region where the resources will be created | `string` | n/a | yes | | ||
| | <a name="input_hive_metastore_ip"></a> [hive\_metastore\_ip](#input\_hive\_metastore\_ip) | Value of regional default Hive Metastore IP | `string` | n/a | yes | | ||
| | <a name="input_hub_vpc_cidr"></a> [hub\_vpc\_cidr](#input\_hub\_vpc\_cidr) | CIDR for Hub VPC | `string` | n/a | yes | | ||
| | <a name="input_hub_vpc_google_project"></a> [hub\_vpc\_google\_project](#input\_hub\_vpc\_google\_project) | Google Cloud project ID related to Hub VPC | `string` | n/a | yes | | ||
| | <a name="input_is_spoke_vpc_shared"></a> [is\_spoke\_vpc\_shared](#input\_is\_spoke\_vpc\_shared) | Whether the Spoke VPC is a Shared or a dedicated VPC | `bool` | n/a | yes | | ||
| | <a name="input_prefix"></a> [prefix](#input\_prefix) | Prefix to use in generated resources name | `string` | n/a | yes | | ||
| | <a name="input_psc_subnet_cidr"></a> [psc\_subnet\_cidr](#input\_psc\_subnet\_cidr) | CIDR for PSC subnet | `string` | n/a | yes | | ||
| | <a name="input_spoke_vpc_cidr"></a> [spoke\_vpc\_cidr](#input\_spoke\_vpc\_cidr) | CIDR for Spoke VPC | `string` | n/a | yes | | ||
| | <a name="input_spoke_vpc_google_project"></a> [spoke\_vpc\_google\_project](#input\_spoke\_vpc\_google\_project) | Google Cloud project ID related to Spoke VPC | `string` | n/a | yes | | ||
| | <a name="input_tags"></a> [tags](#input\_tags) | Map of tags to add to all resources | `map(string)` | `{}` | no | | ||
| | <a name="input_workspace_google_project"></a> [workspace\_google\_project](#input\_workspace\_google\_project) | Google Cloud project ID related to Databricks workspace | `string` | n/a | yes | | ||
|
|
||
| ## Outputs | ||
|
|
||
| | Name | Description | | ||
| |-------------------------------------------------------------------------------|--------------------------------------------------------------------------------------| | ||
| | <a name="output_workspace_id"></a> [workspace\_id](#output\_workspace\_id) | The Databricks workspace ID | | ||
| | <a name="output_workspace_url"></a> [workspace\_url](#output\_workspace\_url) | The workspace URL which is of the format '{workspaceId}.{random}.gcp.databricks.com' | | ||
| <!-- END_TF_DOCS --> | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| module "gcp_with_data_exfiltration_protection" { | ||
| source = "../../modules/gcp-with-psc-exfiltration-protection" | ||
|
|
||
| databricks_account_id = var.databricks_account_id | ||
| hub_vpc_google_project = var.hub_vpc_google_project | ||
| is_spoke_vpc_shared = var.is_spoke_vpc_shared | ||
| prefix = var.prefix | ||
| spoke_vpc_google_project = var.spoke_vpc_google_project | ||
| workspace_google_project = var.workspace_google_project | ||
| google_region = var.google_region | ||
| hive_metastore_ip = var.hive_metastore_ip | ||
| hub_vpc_cidr = var.hub_vpc_cidr | ||
| psc_subnet_cidr = var.psc_subnet_cidr | ||
| spoke_vpc_cidr = var.spoke_vpc_cidr | ||
| tags = var.tags | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
|
|
||
| output "workspace_url" { | ||
| value = module.gcp_with_data_exfiltration_protection.workspace_url | ||
| description = "The workspace URL which is of the format '{workspaceId}.{random}.gcp.databricks.com'" | ||
| } | ||
|
|
||
| output "workspace_id" { | ||
| description = "The Databricks workspace ID" | ||
| value = module.gcp_with_data_exfiltration_protection.workspace_id | ||
| } |
13 changes: 13 additions & 0 deletions
examples/gcp-with-psc-exfiltration-protection/providers.tf
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| provider "databricks" { | ||
| host = "https://accounts.gcp.databricks.com" | ||
| account_id = var.databricks_account_id | ||
| } | ||
|
|
||
| provider "databricks" { | ||
| alias = "workspace" | ||
|
|
||
| host = module.gcp_with_data_exfiltration_protection.workspace_url | ||
| } | ||
|
|
||
| provider "google" { | ||
| } |
15 changes: 15 additions & 0 deletions
examples/gcp-with-psc-exfiltration-protection/terraform.tf
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| terraform { | ||
| required_providers { | ||
| databricks = { | ||
| source = "databricks/databricks" | ||
| version = ">=1.81.1" | ||
| } | ||
| google = { | ||
| source = "hashicorp/google" | ||
| version = "6.17.0" | ||
| } | ||
| random = { | ||
| source = "hashicorp/random" | ||
| } | ||
| } | ||
| } |
20 changes: 20 additions & 0 deletions
examples/gcp-with-psc-exfiltration-protection/terraform.tfvars
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| databricks_account_id = "" | ||
|
|
||
| google_region = "" | ||
|
|
||
| workspace_google_project = "" | ||
|
|
||
| spoke_vpc_google_project = "" | ||
| hub_vpc_google_project = "" | ||
| is_spoke_vpc_shared = true | ||
|
|
||
| prefix = "" | ||
|
|
||
| hive_metastore_ip = "" | ||
| hub_vpc_cidr = "" | ||
| spoke_vpc_cidr = "" | ||
| psc_subnet_cidr = "" | ||
|
|
||
| metastore_name = "" | ||
| catalog_name = "" | ||
|
|
15 changes: 15 additions & 0 deletions
examples/gcp-with-psc-exfiltration-protection/unity-catalog.tf
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| module "unity_catalog" { | ||
| source = "../../modules/gcp-unity-catalog" | ||
|
|
||
| providers = { | ||
| databricks = databricks, | ||
| databricks.workspace = databricks.workspace | ||
| } | ||
| databricks_workspace_id = module.gcp_with_data_exfiltration_protection.workspace_id | ||
| databricks_workspace_url = module.gcp_with_data_exfiltration_protection.workspace_url | ||
| google_project = var.workspace_google_project | ||
| google_region = var.google_region | ||
| metastore_name = var.metastore_name | ||
| catalog_name = var.catalog_name | ||
| prefix = var.prefix | ||
| } |
73 changes: 73 additions & 0 deletions
examples/gcp-with-psc-exfiltration-protection/variables.tf
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| variable "databricks_account_id" { | ||
| type = string | ||
| description = "Databricks Account ID" | ||
| } | ||
|
|
||
| variable "google_region" { | ||
| type = string | ||
| description = "Google Cloud region where the resources will be created" | ||
| } | ||
|
|
||
| variable "workspace_google_project" { | ||
| type = string | ||
| description = "Google Cloud project ID related to Databricks workspace" | ||
| } | ||
|
|
||
| variable "spoke_vpc_google_project" { | ||
| type = string | ||
| description = "Google Cloud project ID related to Spoke VPC" | ||
| } | ||
|
|
||
| variable "hub_vpc_google_project" { | ||
| type = string | ||
| description = "Google Cloud project ID related to Hub VPC" | ||
| } | ||
|
|
||
| variable "is_spoke_vpc_shared" { | ||
| type = bool | ||
| description = "Whether the Spoke VPC is a Shared or a dedicated VPC" | ||
| } | ||
|
|
||
| variable "prefix" { | ||
| type = string | ||
| description = "Prefix to use in generated resources name" | ||
| } | ||
|
|
||
| # For the value of the regional Hive Metastore IP, refer to the Databricks documentation | ||
| # Here - https://docs.gcp.databricks.com/en/resources/ip-domain-region.html#addresses-for-default-metastore | ||
| variable "hive_metastore_ip" { | ||
| type = string | ||
| description = "Value of regional default Hive Metastore IP" | ||
| } | ||
|
|
||
| variable "hub_vpc_cidr" { | ||
| type = string | ||
| description = "CIDR for Hub VPC" | ||
| } | ||
|
|
||
| variable "spoke_vpc_cidr" { | ||
| type = string | ||
| description = "CIDR for Spoke VPC" | ||
| } | ||
|
|
||
| variable "psc_subnet_cidr" { | ||
| type = string | ||
| description = "CIDR for PSC subnet" | ||
| } | ||
|
|
||
| variable "tags" { | ||
| type = map(string) | ||
| description = "Map of tags to add to all resources" | ||
|
|
||
| default = {} | ||
| } | ||
|
|
||
| variable "metastore_name" { | ||
| type = string | ||
| description = "Name to assign to regional metastore" | ||
| } | ||
|
|
||
| variable "catalog_name" { | ||
| type = string | ||
| description = "Name to assign to default catalog" | ||
| } |
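Because the CIDR inputs are plain strings, a typo is only caught when the provider rejects it at apply time. As a possible hardening step that is not part of this pull request, a declaration could carry a `validation` block. The sketch below is a variant of the existing `hub_vpc_cidr` declaration and would replace it rather than sit alongside it.

```hcl
# Sketch only: a stricter variant of the existing hub_vpc_cidr declaration.
variable "hub_vpc_cidr" {
  type        = string
  description = "CIDR for Hub VPC"

  validation {
    # cidrhost() raises an error on a malformed CIDR string; can() converts that error into false.
    condition     = can(cidrhost(var.hub_vpc_cidr, 0))
    error_message = "The hub_vpc_cidr value must be a valid CIDR block, for example 10.0.0.0/24."
  }
}
```

The same pattern could be applied to `spoke_vpc_cidr` and `psc_subnet_cidr`.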
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| resource "databricks_metastore" "this" { | ||
| name = var.metastore_name | ||
| region = var.google_region | ||
| force_destroy = true | ||
| } | ||
|
|
||
| resource "databricks_metastore_assignment" "this" { | ||
| workspace_id = var.databricks_workspace_id | ||
| metastore_id = databricks_metastore.this.id | ||
| } | ||
|
|
||
| resource "databricks_storage_credential" "this" { | ||
| provider = databricks.workspace | ||
| name = "${var.prefix}-storage-credential" | ||
| databricks_gcp_service_account {} | ||
| depends_on = [databricks_metastore_assignment.this] | ||
| } | ||
|
|
||
| resource "databricks_external_location" "this" { | ||
| provider = databricks.workspace | ||
| name = "${var.prefix}-external-location" | ||
| url = "gs://${google_storage_bucket.ext_bucket.name}/" | ||
|
|
||
| credential_name = databricks_storage_credential.this.id | ||
|
|
||
| comment = "Managed by TF" | ||
| depends_on = [ | ||
| databricks_metastore_assignment.this, | ||
| google_storage_bucket_iam_member.unity_cred_reader, | ||
| google_storage_bucket_iam_member.unity_cred_admin | ||
| ] | ||
| } | ||
|
|
||
| resource "databricks_catalog" "main" { | ||
| provider = databricks.workspace | ||
| name = var.catalog_name | ||
| storage_root = databricks_external_location.this.url | ||
| comment = "This catalog is managed by terraform" | ||
| isolation_mode = "OPEN" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| resource "google_storage_bucket" "ext_bucket" { | ||
| name = "${var.prefix}-bucket" | ||
|
|
||
| project = var.google_project | ||
| location = var.google_region | ||
| force_destroy = true | ||
| } | ||
|
|
||
| resource "google_storage_bucket_iam_member" "unity_cred_admin" { | ||
| bucket = google_storage_bucket.ext_bucket.name | ||
| role = "roles/storage.objectAdmin" | ||
| member = "serviceAccount:${databricks_storage_credential.this.databricks_gcp_service_account[0].email}" | ||
| } | ||
|
|
||
| resource "google_storage_bucket_iam_member" "unity_cred_reader" { | ||
| bucket = google_storage_bucket.ext_bucket.name | ||
| role = "roles/storage.legacyBucketReader" | ||
| member = "serviceAccount:${databricks_storage_credential.this.databricks_gcp_service_account[0].email}" | ||
| } | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| terraform { | ||
| required_providers { | ||
| databricks = { | ||
| source = "databricks/databricks" | ||
| configuration_aliases = [databricks, databricks.workspace] | ||
| } | ||
| google = { | ||
| source = "hashicorp/google" | ||
| } | ||
| random = { | ||
| source = "hashicorp/random" | ||
| } | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| variable "databricks_workspace_url" { | ||
| type = string | ||
| description = "The URL of the Databricks workspace to which resources will be deployed (e.g., https://{workspaceId}.{random}.gcp.databricks.com)." | ||
| } | ||
|
|
||
| variable "databricks_workspace_id" { | ||
| type = string | ||
| description = "The unique identifier of the Databricks workspace in which resources will be managed." | ||
| } | ||
|
|
||
| variable "google_region" { | ||
| type = string | ||
| description = "Google Cloud region where the resources will be created" | ||
| } | ||
|
|
||
| variable "google_project" { | ||
| type = string | ||
| description = "The Google Cloud project ID where the Databricks workspace and associated resources will be created." | ||
| } | ||
|
|
||
| variable "prefix" { | ||
| type = string | ||
| description = "Prefix to use in generated resources name" | ||
| } | ||
|
|
||
| variable "metastore_name" { | ||
| type = string | ||
| description = "Name to assign to regional metastore" | ||
| } | ||
|
|
||
| variable "catalog_name" { | ||
| type = string | ||
| description = "Name to assign to default catalog" | ||
| } |