NOTE: Documentation of this Terraform module, for the developer or anyone else who is interested, can be found here.
This is a slightly-opinionated Terraform module for deploying an HPCC Systems cluster on Azure Kubernetes Service (AKS). The goal is to provide a simple method for deploying a cluster from scratch, with only the most important options to consider.
The HPCC Systems cluster created by this module uses ephemeral storage by default, meaning the storage is deleted when the cluster is deleted. But you can also have persistent storage. See the section titled Persistent Storage, below.
- `terraform` — This is a Terraform module, so you need to have terraform installed on your system. Instructions for downloading and installing terraform can be found at https://www.terraform.io/downloads.html. Do make sure you install a 64-bit version of terraform, as that is needed to accommodate some of the large random numbers used for IDs in the Terraform modules.
- `helm` — Helm is used to deploy the HPCC Systems processes under Kubernetes. Instructions for downloading and installing Helm are at https://helm.sh/docs/intro/install.
- `kubectl` — The Kubernetes client (kubectl) is also required so you can inspect and manage the Azure Kubernetes cluster. Instructions for downloading and installing it can be found at https://kubernetes.io/releases/download/. Make sure you have version 1.22.0 or later.
- Azure CLI — To work with Azure, you will need to install the Azure command line tools. Instructions can be found at https://docs.microsoft.com/en-us/cli/azure/install-azure-cli. Even if you think you won't be working with Azure directly, this module leverages the command line tools to manipulate network security groups within Kubernetes clusters. TL;DR: Make sure you have the command line tools installed.
- To successfully create everything you will need to have Azure's `Contributor` role plus access to `Microsoft.Authorization/*/Write` and `Microsoft.Authorization/*/Delete` permissions on your subscription. You may have to create a custom role for this. Of course, Azure's `Owner` role includes everything, so if you are the subscription's owner then you are good to go.
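
  If you do need a custom role, a minimal sketch is shown below. The role name is hypothetical, the subscription id and user object id are placeholders, and you should confirm the approach with your Azure administrator before creating or assigning roles:

  ```
  # Create a custom role carrying just the two Microsoft.Authorization permissions named above.
  az role definition create --role-definition '{
      "Name": "hpcc-authorization-writer",
      "IsCustom": true,
      "Description": "Microsoft.Authorization write/delete permissions needed by terraform-azurerm-hpcc-lite",
      "Actions": [
          "Microsoft.Authorization/*/Write",
          "Microsoft.Authorization/*/Delete"
      ],
      "AssignableScopes": [ "/subscriptions/<your-subscription-id>" ]
  }'

  # Assign the custom role to your user at the subscription scope.
  az role assignment create --assignee "<your-user-object-id>" \
      --role "hpcc-authorization-writer" \
      --scope "/subscriptions/<your-subscription-id>"
  ```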
- You need a minimum of 28 vCPUs available on Azure, and `aks_serv_node_size` must be at least `xlarge`. The first `az` command below tells you the maximum number of vCPUs you can use. The second `az` command gives you the number of vCPUs you have already used in region `eastus` (replace `eastus` with the name of the region you are using). You can therefore get the number of vCPUs available to you by subtracting the result of the second command from the result of the first.

  ```
  az vm list-usage --location "eastus" -o table|grep "Total Regional vCPUs"|sed "s/  */\t/g"|cut -f5
  az vm list-usage --location "eastus" -o table|grep "Total Regional vCPUs"|sed "s/  */\t/g"|cut -f4
  ```
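
  If you would rather have the subtraction done for you, here is a small sketch (assuming a bash shell and the same `az` output format as above) that prints the number of vCPUs still available in a region:

  ```
  REGION="eastus"    # replace with the region you are using
  LIMIT=$(az vm list-usage --location "$REGION" -o table | grep "Total Regional vCPUs" | sed "s/  */\t/g" | cut -f5)
  USED=$(az vm list-usage --location "$REGION" -o table | grep "Total Regional vCPUs" | sed "s/  */\t/g" | cut -f4)
  echo "vCPUs available in $REGION: $((LIMIT - USED))"
  ```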
- You need to make sure `jq` and `kubelogin` are installed on your Linux machine. You can determine whether they are by using the `which` command, e.g. `which jq` returns `jq`'s path if it is installed. The following commands can be used to install `jq` and `kubelogin`, respectively:

  ```
  sudo apt-get install jq
  sudo az aks install-cli
  ```
- If you run the Terraform code on an Azure VM, then the Azure VM must have EncryptionAtHost enabled. You can do this by: 1) stopping your Azure VM; 2) clicking on `Disk` in the Overview of the Azure VM; 3) clicking on the `Additional Settings` tab; 4) selecting the `yes` radio button under `Encryption at host`.
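
  If you prefer to do this from the command line instead of the portal, a sketch like the following should work (the resource group and VM names are placeholders, and the EncryptionAtHost feature must be permitted on your subscription):

  ```
  az vm deallocate --resource-group <resource-group> --name <vm-name>
  az vm update --resource-group <resource-group> --name <vm-name> --set securityProfile.encryptionAtHost=true
  az vm start --resource-group <resource-group> --name <vm-name>
  ```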
- If necessary, login to Azure.
  - From the command line, this is usually accomplished with the `az login` command.
- Clone this repo to your local system and change the current directory:

  ```
  git clone https://github.com/hpccsystems-solutions-lab/terraform-azurerm-hpcc-lite.git
  cd terraform-azurerm-hpcc-lite
  ```
- Issue `terraform init` to initialize the Terraform modules.
- Issue `terraform apply`. This command will do a `terraform init`, `terraform plan` and `terraform apply` for each of the subsystems needed, i.e. `vnet`, `aks`, `storage`, and `hpcc` (the `storage` subsystem is deployed only if you set `external_storage_desired=true`). The order in which these subsystems are deployed is: `vnet`, `aks`, `storage`, and `hpcc`. For each subsystem, `terraform` creates a `plan` file which is stored in the directory `~/tflogs` (note: if this directory doesn't exist, it is created automatically).
- Decide how you want to supply option values to the module during invocation. There are three possibilities:
  - Invoke the `terraform apply` command and enter values for each option as terraform prompts for it, then enter `yes` at the final prompt to begin building the cluster.
  - Recommended: Create a `lite.auto.tfvars` file containing the values for each option, invoke `terraform apply`, then enter `yes` at the final prompt to begin building the cluster. The easiest way to create `lite.auto.tfvars` is to copy the example file, `lite.auto.tfvars.example`, and then edit the copy:

    ```
    cp -v lite.auto.tfvars.example lite.auto.tfvars
    ```
  - Use `-var` arguments on the command line when executing the terraform tool to set each of the values found in the .tfvars file. This method is useful if you are driving the creation of the cluster from a script (see the sketch after this list).
- After the Kubernetes cluster is deployed, your local `kubectl` tool can be used to interact with it. At some point during the deployment, `kubectl` will acquire the login credentials for the cluster and it will become the current context (so any `kubectl` commands you enter will be directed to that cluster by default).
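
If you choose the `-var` method described above, a minimal sketch of the kind of command you might put in a script is shown below. The option names come from the table further down; the values are purely illustrative, and `-auto-approve` skips the final confirmation prompt. In practice you would pass a `-var` argument for every required option.

```
terraform apply \
  -var 'admin_username=jdoe' \
  -var 'aks_admin_name=Jane Doe' \
  -var 'aks_azure_region=eastus' \
  -var 'enable_thor=true' \
  -var 'thor_num_workers=2' \
  -auto-approve
```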
At the end of a successful deployment, these items are output for `aks`, `hpcc`, and `vnet`:
- aks
  - Advisor recommendations or 'none', `advisor_recommendations`.
  - Location of the aks credentials, `aks_login`.
  - Name of the Azure Kubernetes Service, `cluster_name`.
  - Resource group where the cluster is deployed, `cluster_resource_group_name`.
- hpcc
  - The URL used to access ECL Watch, `eclwatch_url`.
  - The deployment azure resource group, `deployment_resource_group`.
  - Whether there is external storage or not, `external_storage_config_exists`.
- vnet
  - Advisor recommendations or 'none', `advisor_recommendations`.
  - ID of private subnet, `private_subnet_id`.
  - ID of public subnet, `public_subnet_id`.
  - ID of route table, `route_table_id`.
  - Route table name, `route_table_name`.
  - Virtual network resource group name, `resource_group_name`.
  - Virtual network name, `vnet_name`.
Options have data types. The ones used in this module are:
- string
  - Typical string enclosed by quotes
  - Example: `"value"`
- number
  - Integer number; do not quote
  - Example: `1234`
- boolean
  - `true` or `false` (not quoted)
- map of string
  - List of key/value pairs, delimited by commas
  - Both key and value should be a quoted string
  - Entire map is enclosed by braces
  - Example with two key/value pairs: `{"key1" = "value1", "key2" = "value2"}`
  - Empty value is `{}`
- list of string
  - List of values, delimited by commas
  - A value is a quoted string
  - Entire list is enclosed in brackets
  - Example with two values: `["value1", "value2"]`
  - Empty value is `[]`
The following options should be set in your `lite.auto.tfvars` file (or entered interactively, if you choose not to create a file). Only a few of them have default values; the rest are required. The 'Updatable' column indicates whether, for any given option, it is possible to successfully apply the update against an already-running HPCC k8s cluster. A partial example file is sketched after the table.
Option | Type | Description | Updatable |
---|---|---|---|
a_record_name | string | Name of the A record, in the DNS zone given below, where the ECL Watch IP is placed. This A record will be created and therefore should not already exist in that DNS zone. Example entry: "my-product". This should be something project specific rather than something generic. | Y |
admin_username | string | Username of the administrator of this HPCC Systems cluster. Example entry: "jdoe" | N |
aks_admin_email | string | Email address of the administrator of this HPCC Systems cluster. Example entry: "[email protected]" | Y |
aks_admin_ip_cidr_map | map of string | Map of name => CIDR IP addresses that can administrate this AKS. Format is '{"name"="cidr" [, "name"="cidr"]*}'. The 'name' portion must be unique. To add no CIDR addresses, use '{}'. The corporate network and your current IP address will be added automatically, and these addresses will have access to the HPCC cluster as a user. | Y |
aks_admin_name | string | Name of the administrator of this HPCC Systems cluster. Example entry: "Jane Doe" | Y |
aks_azure_region | string | The Azure region abbreviation in which to create these resources. Example entry: "eastus" | N |
aks_dns_zone_name | string | Name of an existing DNS zone. Example entry: "hpcczone.us-hpccsystems-dev.azure.lnrsg.io" | N |
aks_dns_zone_resource_group_name | string | Name of the resource group of the above DNS zone. Example entry: "app-dns-prod-eastus2" | N |
aks_enable_roxie | boolean | Enable ROXIE? This will also expose port 8002 on the cluster. Example entry: false | Y |
aks_logging_monitoring_enabled | boolean | Enables logging and monitoring of the Kubernetes and HPCC cluster (true means enable logging and monitoring, false means don't). | N |
aks_4nodepools | boolean | Determines how many nodepools are used: 4 if true, otherwise 2 (default is false). | N |
aks_nodepools_max_capacity | string | The maximum number of nodes of every HPCC nodepool. | N |
aks_roxie_node_size | string | The VM size for each roxie node in the HPCC Systems cluster. Example format: aks_roxie_node_size = "xlarge". | N |
aks_serv_node_size | string | The VM size for each serv node in the HPCC Systems cluster. Example format: aks_serv_node_size = "2xlarge". | N |
aks_spray_node_size | string | The VM size for each spray node in the HPCC Systems cluster. Example format: aks_spray_node_size = "2xlarge". | N |
aks_thor_node_size | string | The VM size for each thor node in the HPCC Systems cluster. Example format: aks_thor_node_size = "2xlarge". | N |
aks_capacity | map of number | The min and max number of nodes of each node pool in the HPCC Systems cluster. Example format is '{ roxie_min = 1, roxie_max = 3, serv_min = 1, serv_max = 3, spray_min = 1, spray_max = 3, thor_min = 1, thor_max = 3 }'. | N |
authn_htpasswd_filename | string | If you would like to use htpasswd to authenticate users to the cluster, enter the filename of the htpasswd file. This file should be uploaded to the Azure 'dllsshare' file share in order for the HPCC processes to find it. A corollary is that persistent storage is enabled. An empty string indicates that htpasswd is not to be used for authentication. Example entry: "htpasswd.txt" | Y |
enable_code_security | boolean | Enable code security? If true, only signed ECL code will be allowed to create embedded language functions, use PIPE(), etc. Example entry: false | Y |
enable_thor | boolean | If you want a Thor cluster then set this to true; otherwise set it to false. | Y |
external_storage_desired | boolean | If you want external storage instead of ephemeral storage, set this variable to true; otherwise set it to false. | Y |
extra_tags | map of string | Map of name => value tags that will be associated with the cluster. Format is '{"name"="value" [, "name"="value"]*}'. The 'name' portion must be unique. To add no tags, use '{}'. | Y |
hpcc_user_ip_cidr_list | list of string | List of explicit CIDR addresses that can access this HPCC Systems cluster. To allow public access, set the value to ["0.0.0.0/0"] or []. | Y |
hpcc_version | string | The version of HPCC Systems to install. Only versions in nn.nn.nn format are supported. | Y |
my_azure_id | string | Your Azure account object id. Find this on the Azure portal by going to 'Users', then searching for your name and clicking on it. The account object id is called 'Object ID'. There is a link next to it that lets you copy it. | N |
storage_data_gb | number | The amount of storage reserved for data in gigabytes. Must be 1 or more. If a storage account is defined (see below) then this value is ignored. | Y |
storage_lz_gb | number | The amount of storage reserved for the landing zone in gigabytes. Must be 1 or more. If a storage account is defined (see below) then this value is ignored. | Y |
thor_max_jobs | number | The maximum number of simultaneous Thor jobs allowed. Must be 1 or more. | Y |
thor_num_workers | number | The number of Thor workers to allocate. Must be 1 or more. | Y |
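
As mentioned above, the recommended way to create `lite.auto.tfvars` is to copy `lite.auto.tfvars.example` and edit it. Purely as an illustration of what a finished file can look like, here is a partial sketch written as a shell heredoc (it overwrites any existing `lite.auto.tfvars`; every value is an example or a placeholder and must be replaced with values appropriate for your subscription, and the table above is the authority on which options you need):

```
cat > lite.auto.tfvars <<'EOF'
# Illustrative values only -- edit every line before use.
a_record_name                    = "my-product"
admin_username                   = "jdoe"
aks_admin_email                  = "<administrator email address>"
aks_admin_ip_cidr_map            = {}
aks_admin_name                   = "Jane Doe"
aks_azure_region                 = "eastus"
aks_dns_zone_name                = "<existing dns zone name>"
aks_dns_zone_resource_group_name = "<dns zone resource group name>"
aks_enable_roxie                 = false
aks_serv_node_size               = "2xlarge"
aks_thor_node_size               = "2xlarge"
enable_code_security             = false
enable_thor                      = true
external_storage_desired         = false
extra_tags                       = {}
hpcc_user_ip_cidr_list           = []
hpcc_version                     = "<nn.nn.nn>"
my_azure_id                      = "<your Azure account object id>"
storage_data_gb                  = 100
storage_lz_gb                    = 10
thor_max_jobs                    = 2
thor_num_workers                 = 2
EOF
```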
To get persistent storage, i.e. storage that is not deleted when the HPCC cluster is deleted, set the variable `external_storage_desired` to true.
- Useful `az cli` commands:
  - `az account list --output table`
    - Shows your current subscriptions and indicates which is the default.
  - `az account set --subscription "My_Subscription"`
    - Sets the default subscription.
- Useful `kubectl` commands once the cluster is deployed:
  - `kubectl get pods`
    - Shows Kubernetes pods for the current cluster.
  - `kubectl get services`
    - Shows the current services running on the pods of the current cluster.
  - `kubectl config get-contexts`
    - Shows the saved kubectl contexts. A context contains login and reference information for a remote Kubernetes cluster. A kubectl command typically operates on the current context.
  - `kubectl config use-context <ContextName>`
    - Makes the <ContextName> context the current context for future kubectl commands.
  - `kubectl config unset contexts.<ContextName>`
    - Deletes the context named <ContextName>.
    - Note that when you delete the current context, kubectl does not select another context as the current context. Instead, no context will be current. You must use `kubectl config use-context <ContextName>` to make another context current.
- Note that `terraform destroy` does not delete the kubectl context. You need to use `kubectl config unset contexts.<ContextName>` to remove the context from your local system.
- If a deployment fails and you want to start over, you have two options:
  - Immediately issue a `terraform destroy` command and let terraform clean up.
  - Clean up the resources by hand:
    - Delete the Azure resource group manually, such as through the Azure Portal.
      - Note that there are two resource groups, if the deployment got far enough. Examples: `app-thhpccplatform-sandbox-eastus-68255` and `mc_tf-zrms-default-aks-1`
      - The first one contains the Kubernetes service that created the second one (services that support Kubernetes). So, if you delete only the first resource group, the second resource group will be deleted automatically.
    - Delete all terraform state files using `rm *.tfstate*`
  - Then, of course, fix whatever caused the deployment to fail.
- If you want to completely reset terraform, issue `rm -rf .terraform* *.tfstate*` and then `terraform init`.