Deploying Azure Data Services via Terraform Part 2: An Introduction to Terraform

Before diving into what the various Terraform modules that make up the Arc-PX-VMware-Faststart repo do, I’m going to provide an introduction to Terraform in this blog post. Terraform comes from HashiCorp and is a tool that works on the principle of infrastructure-as-code. Resources are specified in what are called configuration files, written in HashiCorp Configuration Language (HCL) in a declarative manner, i.e. you state what you want and to the best of its ability Terraform attempts to create those resources for you. ‘Providers’ are used to create resources for particular types of entity; for example, you might use the local (file), helm (the Kubernetes package manager), Azure or VMware providers. Providers are delivered as plugins, many of which are maintained by HashiCorp, but third parties can write their own plugins as well.

Configuration Lifecycles

Configurations go through a lifecycle consisting of at least three basic steps. To use Terraform in its most basic form, you have to download Terraform. HashiCorp is a commercial organization, so how does it make money from Terraform? The Terraform command line tool is free to download and use; at the time of writing, Terraform Cloud is free for up to 5 users, thereafter you have to pay for the Team version. Once you have downloaded Terraform and have created a configuration, you run:

  • terraform init
    Terraform reads the configuration files (these are given .tf suffixes) and then downloads and installs plugins as appropriate, based on the providers it finds.
  • terraform plan
    This causes Terraform to create a plan: the set of changes required to bring the provisioned resources into line with the configuration. Plans can be saved and used at a later date with terraform apply by using the -out=<path> option. In its most basic form, the state of the resources that have been provisioned is stored in a local terraform.tfstate file, which you should never ever manually change. More advanced options for storing resource state include the use of Azure or AWS via a ‘backend’ block, as per the examples below; the final alternative is to use Terraform Cloud, which does the state management heavy lifting for you. Among the many useful things that terraform plan does for you, it will attempt to connect to the various provider(s) your configuration uses to a) ensure it can connect to them in the first instance and b) catch many of the problems that would prevent it from creating the resources you want.

terraform {
  backend "azurerm" {
    resource_group_name  = "tstate"
    storage_account_name = "tstate09762"
    container_name       = "tstate"
    key                  = "terraform.tfstate"
  }
}

terraform {
  backend "s3" {
    bucket = "mybucket"
    key    = "path/to/my/key"
    region = "us-east-1"
  }
}

  • terraform apply
    This results in the resources specified in the configuration being created. If the configuration includes any output blocks, this command will also result in values being displayed in the same terminal session from where the command was run.
  • terraform destroy
    To destroy all the currently provisioned resources associated with the configuration, run terraform destroy.
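
To make the lifecycle above a little more concrete, here is a minimal sketch of a configuration you could run the four commands against. It is not part of the Arc-PX-VMware-Faststart repo; the resource name, file name and output name are all made up for illustration, and it assumes the hashicorp/local provider can be downloaded from the public registry.

```hcl
# main.tf: a deliberately tiny configuration for trying out the lifecycle.
terraform {
  required_providers {
    local = {
      source  = "hashicorp/local"
      version = "~> 2.0"
    }
  }
}

# Hypothetical resource: writes a text file next to this configuration.
resource "local_file" "hello" {
  filename = "${path.module}/hello.txt"
  content  = "Hello from Terraform\n"
}

# Displayed in the terminal once terraform apply completes.
output "hello_file" {
  value = local_file.hello.filename
}

# Lifecycle, run from the directory containing this file:
#   terraform init              # downloads the hashicorp/local plugin
#   terraform plan -out=tfplan  # saves the plan to a file named tfplan
#   terraform apply tfplan      # creates hello.txt and prints hello_file
#   terraform destroy           # removes hello.txt again
```

Because nothing here touches a cloud account, it is a cheap way to see what each command does before pointing Terraform at real infrastructure.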

The Anatomy Of A Terraform Configuration

A Terraform configuration is composed of blocks specified in HashiCorp Configuration Language (HCL):

  • terraform
    This is an optional block that can be used to specify a backend (where the Terraform state is to be stored); it can also be used to mandate that specific versions of specific providers should be used.

terraform {
  required_providers {
    mycloud = {
      source  = "mycorp/mycloud"
      version = "~> 1.0"
    }
  }
}

  • provider
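    Some provider types mandate the use of provider blocks; azurerm, for example, requires one that contains a features block. The optional alias argument gives a provider configuration a name that data sources and resources can refer to: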

provider "azurerm" {
  features {}
  alias = "azure_rm"
}

  • data
    If there is a requirement to fetch data from an external source for use across a configuration, use a data source via a data block:

data "azurerm_subscription" "primary" {
  provider = azurerm.azure_rm
}
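
What the data block above does not show is how the fetched values are then used elsewhere. As a rough sketch (the names current and subscription_display_name are mine, and it assumes you are already logged in to Azure so that the azurerm provider can authenticate), attributes of a data source are referenced with a data. prefix:

```hcl
# The azurerm provider insists on a features block, even an empty one.
provider "azurerm" {
  features {}
}

# With no arguments this looks up the subscription the provider is using.
data "azurerm_subscription" "current" {}

# Reference syntax: data.<type>.<name>.<attribute>
output "subscription_display_name" {
  value = data.azurerm_subscription.current.display_name
}
```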

  • resource
    Resources are the most fundamental building block in a configuration; no resources equals no configuration. Terraform normally infers the order in which resources are created from the references between them; the optional depends_on argument can be specified to force an ordering that it cannot infer. Resource blocks enable post resource creation actions to be performed via provisioner blocks, of which there are three generic types:
    • local-exec – perform the action on the host terraform apply was executed from
    • remote-exec – perform the action on a remote host
    • file – copy files or directories to the resource that has been provisioned

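resource "helm_release" "metallb" {
  name       = "metallb"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "metallb"
  namespace  = kubernetes_namespace.metallb_system.metadata[0].name

  # Passes a chart value named "version" through to helm.
  set {
    name  = "version"
    value = var.helm_chart_version
  }

  # Runs on the host executing terraform apply, once the release has been created.
  provisioner "local-exec" {
    command = "kubectl delete configmap metallb-config -n metallb-system"
  }

  # Explicit ordering: the namespace must exist before the release is created.
  depends_on = [
    kubernetes_namespace.metallb_system
  ]
}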
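
The example above only uses the local-exec provisioner. For completeness, here is a hedged sketch of the other two types working together: the file provisioner copies a script to a remote host and remote-exec then runs it. Everything named here (the variables, the bootstrap.sh script, the null_resource itself) is hypothetical and assumes a host that is reachable over SSH with key based authentication.

```hcl
# Hypothetical inputs describing the remote host.
variable "host_address" {
  type = string
}

variable "ssh_user" {
  type    = string
  default = "ubuntu"
}

variable "private_key_path" {
  type = string
}

resource "null_resource" "bootstrap" {
  # Shared by both provisioners below: how to reach the remote host.
  connection {
    type        = "ssh"
    host        = var.host_address
    user        = var.ssh_user
    private_key = file(var.private_key_path)
  }

  # file: copy a local script to the remote host.
  provisioner "file" {
    source      = "${path.module}/scripts/bootstrap.sh"
    destination = "/tmp/bootstrap.sh"
  }

  # remote-exec: run commands on the remote host, in order.
  provisioner "remote-exec" {
    inline = [
      "chmod +x /tmp/bootstrap.sh",
      "/tmp/bootstrap.sh",
    ]
  }
}
```

Provisioners run in the order in which they appear in the resource, which is why the copy comes before the execution.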

  • variable
    Terraform supports primitive variable types in the form of string, number and bool, and also complex types in the form of lists, maps and combinations of the two, such as lists of maps. Variable values can also be supplied via environment variables named TF_VAR_ followed by the variable name.
    • An example of a primitive type:

      variable "vsphere_network" {
        description = "Network to use for virtual machine"
        type        = string
        default     = "VM Network"
      }
    • An example of a complex type (a map of maps, keyed by virtual machine name):

variable "virtual_machines" {
  default = {
    "z-ca-bdc-control1" = {
      name         = "z-ca-bdc-control1"
      compute_node = false
      ipv4_address = "192.168.123.4"
      ipv4_netmask = "22"
      ipv4_gateway = "192.168.123.1"
      dns_server   = "192.168.123.2"
      ram          = 8192
      logical_cpu  = 4
      os_disk_size = 120
      px_disk_size = 0
    },
    "z-ca-bdc-control2" = {
      name         = "z-ca-bdc-control2"
      compute_node = false
      ipv4_address = "192.168.123.5"
      ipv4_netmask = "22"
      ipv4_gateway = "192.168.123.1"
      dns_server   = "192.168.123.2"
      ram          = 8192
      logical_cpu  = 4
      os_disk_size = 120
      px_disk_size = 0
    }
  }
}
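
Declaring a variable shaped like virtual_machines is only half the story; the other half is consuming it. The following is a cut-down sketch rather than code lifted from the repo: the variable is renamed example_machines, it carries fewer attributes, the explicit type constraint is my addition and the compute node entry is invented. It shows a for expression that walks a map keyed by machine name and filters on one of the attributes.

```hcl
# Trimmed-down, hypothetical variant of the virtual_machines variable.
variable "example_machines" {
  type = map(object({
    compute_node = bool
    ipv4_address = string
    ram          = number
  }))

  default = {
    "z-ca-bdc-control1" = {
      compute_node = false
      ipv4_address = "192.168.123.4"
      ram          = 8192
    }
    "z-ca-bdc-compute1" = {
      compute_node = true
      ipv4_address = "192.168.123.7"
      ram          = 16384
    }
  }
}

# Build a map of machine name => IP address for the non-compute nodes only.
output "control_node_addresses" {
  value = {
    for name, vm in var.example_machines :
    name => vm.ipv4_address if !vm.compute_node
  }
}
```

The same map can be handed to the for_each meta-argument of a resource, in which case each.key is the machine name and each.value is the inner map.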

  • output
    Specify values to be output to the terminal from which terraform apply is executed using output blocks:

output "client_id" {
  value       = azuread_application.auth.application_id
  description = "name"
}

Good Practices

I hesitate to use the term “Best practice”, because what is perceived to be a “Best practice” depends wholly on the context in which something is used. However, whilst learning Terraform here are a handful of things that I would consider to be “Good practices”:

  • Adopt a uniform configuration file naming convention and layout
    Generally speaking, Terraform reads every .tf file in the directory from which the command is run (subdirectories are only read when they are referenced as modules), and it automatically loads variable values from files named terraform.tfvars or ending in .auto.tfvars. TL;DR: providing that files have the right extensions, Terraform does not care what configuration files are called. Having said all of this, in the interest of promoting a file layout that is easy to understand, I would recommend starting off with the following structure/convention:
    • main.tf – as the main file for containing resource creation HCL
    • variables.tf – for specifying variable types and defaults
    • variables.tfvars – for specifying variable values; a file with this name has to be passed explicitly using the -var-file option. I have not used a .tfvars file yet, but it’s something you should consider
    • outputs.tf – for output blocks
  • Prefer resources to null_resources
    Despite the fact that HashiCorp and a good number of third parties provide a rich ecosystem of resource types, the chances are that you will come across scenarios in which there is no out-of-the-box resource to cater for your specific needs. In this instance you can use what is called a null_resource and embed code into this via a provisioner. Whilst this is a workable solution to an extent, you will also have to create a destroy provisioner to undo whatever action your null_resource performs when a terraform destroy is issued. Also note that destroy provisioners cannot currently reference variables or other resources; they can only refer to the resource they belong to via self (plus count.index and each.key).
  • Use provisioners sparingly – where possible
    Mea culpa – I have used provisioners more than I would have liked. However, avoid provisioners as much as you can: the whole idea behind Terraform is to foster a declarative infrastructure-as-code mindset, and writing lots of imperative code embedded in local-exec and remote-exec provisioners goes against this.
  • Use templatefile instead of embedding text processing languages inside local/remote provisioners
    Sooner or later the need will arise to create files based on a template into which values are plugged. Terraform provides an incredibly elegant means of doing this via the templatefile function; always prefer this approach, where possible, to doing things such as embedding awk, sed, or perl code inside local-exec or remote-exec provisioners.
  • Group configurations that share providers with the same initial configuration into modules
    Place configuration files that share the same providers with the same provider configuration into modules; these are then called via module blocks within a root module. The configuration below is, at the time of writing, the root module for the Kubernetes-related Arc-PX-VMware-Faststart modules. Modules form the cornerstone of code reuse with Terraform and, providing certain standards are adhered to, modules can be published to the Terraform Registry.

provider "kubernetes" {
  config_path = "~/.kube/config"
}

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

module "kubernetes_cluster" {
  source = "./modules/kubernetes_cluster"
}

module "px_store" {
  source = "./modules/px_store"
}

module "px_backup" {
  source = "./modules/px_backup"
}

module "metallb" {
  source = "./modules/metallb"
}
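
Going back to the templatefile recommendation, this is roughly what that approach looks like. Treat it as a sketch under assumptions: the template path, the address_range variable and the rendered file name are all hypothetical, and the template file has to exist on disk before terraform plan is run.

```hcl
# Hypothetical input that gets plugged into the template.
variable "address_range" {
  type    = string
  default = "192.168.123.200-192.168.123.210"
}

# Assumed contents of templates/metallb-config.yaml.tpl:
#
#   address-pools:
#   - name: default
#     protocol: layer2
#     addresses:
#     - ${address_range}

# Render the template and write the result to a local file.
resource "local_file" "metallb_config" {
  filename = "${path.module}/metallb-config.yaml"
  content = templatefile("${path.module}/templates/metallb-config.yaml.tpl", {
    address_range = var.address_range
  })
}
```

The second argument to templatefile is a map of the values the template is allowed to see, so the template never reaches into the rest of the configuration by itself. That keeps the text processing declarative, with no awk or sed hidden inside a provisioner.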

But Chris, All The Cool Kids Are Now Using CDKs!

CDKs, or cloud development kits, are most definitely a thing and they have traction. In short, a CDK allows you to specify infrastructure as code in a native 3GL such as Python. AWS has a CDK, Pulumi takes the same approach and, not to be left out, HashiCorp ships CDK for Terraform. Despite the fact I am sure I will use a CDK at some point, CDKs will not be covered in this series of posts; however, they are well worth keeping an eye on.

Wrapping Things Up

The aim of this post was to paint a broad-brush picture of Terraform. I’ll delve into more advanced topics as I begin to use things such as the templatefile function, for loops etc. in each module that I cover. But, to cut a long story short, the only real way that you will learn Terraform is to start using it in anger.



