Deploying Azure Data Services via Terraform Part 6: Deploying a Storage Solution to The Kubernetes Cluster

Part six of this series will focus on deploying a storage solution to our Kubernetes cluster.

All the content in this series relates to this GitHub repo; the content for this post relates specifically to the px_store module.

Where Were We?

If you have been following this blog post series you should have:

  • a basic grasp of Terraform
  • a Kubernetes cluster and an understanding of why Kubespray has been used to create it
  • a software load balancer in the form of MetalLB deployed to your cluster, and a basic understanding of Kubernetes services

Kubernetes Storage 101

Before we dive into deploying a storage solution to our Kubernetes cluster, we need to understand the basics of storage in the world of Kubernetes, which can appear both exotic and mysterious to the uninitiated. To dispel some of the confusion: the storage IO path in Kubernetes is exactly the same as that of common or garden variety Unix or Linux. The Kubernetes storage ecosystem does, however, introduce two extra things we need to concern ourselves with above and beyond conventional Unix/Linux storage. Firstly, there are some extra layers of abstraction between the physical storage and the filesystems that pods use, what I like to refer to as . . .

The Kubernetes Storage “Layer Cake”

From the bottom up, persistent volumes map to physical storage devices, and persistent volume claims are required in order to consume capacity from a persistent volume. The final piece in the puzzle is the volume – the touch point for storage consumption by containers in a pod.
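As a concrete illustration of the three layers, here is a minimal sketch using a hostPath backed persistent volume; the names (demo-pv, demo-pvc, demo-vol) are purely illustrative and not from the px_store module:

```yaml
# Bottom layer: a hand-crafted persistent volume backed by a directory on a node
apiVersion: v1
kind: PersistentVolume
metadata:
  name: demo-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: ""          # no storage class, so manual binding only
  hostPath:
    path: /mnt/demo
---
# Middle layer: a claim that consumes capacity from a matching persistent volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ""          # match the class-less persistent volume above
  resources:
    requests:
      storage: 1Gi
---
# Top layer: the volume, the touch point between the containers in a pod and the claim
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: demo-vol
      mountPath: /data
  volumes:
  - name: demo-vol
    persistentVolumeClaim:
      claimName: demo-pvc
```

Apply this and kubectl get pv,pvc should show the claim bound to the volume, with the pod mounting the claim at /data.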

Data Mobility

The second thing of note, above and beyond how storage behaves in the world of Linux, is that when pods move around the cluster they need to maintain their state as they jump from one worker node to another, and there are two ways of ensuring this:

  • shared storage
    Each worker node has an IO path to the same device or storage cluster; rescheduling a pod simply involves unmounting any volumes associated with that pod and re-mounting them once the pod has been successfully rescheduled to a different worker node.
  • replicated storage
    Persistent volumes have at least one replica, rescheduling a pod is simply a matter of ensuring that the pod ‘Lands’ on a worker node which has access to the relevant persistent volume replicas.

Provisioning

The Kubernetes storage ecosystem includes the concept of ‘Provisioning’; there are two provisioning schemes:

  • Manual
    Someone has to create persistent volumes manually
  • Automatic
    When a persistent volume claim is created, a persistent volume is created automatically and the two entities are bound together. To use Kubernetes storage parlance, for a persistent volume to be in a usable state for a pod, it needs to be ‘Bound’ to a persistent volume claim.

From an end user perspective automatic provisioning provides the simplest means by which to consume storage.

Storage Classes

Each Kubernetes cluster requires at least one storage class object. Simply put, different storage classes enable storage to be provisioned from platforms with different qualities of service. For example, an OLTP style application might require a storage class associated with a low latency platform, whereas a storage class associated with a high IO bandwidth platform is better suited to OLAP style applications.

Here is an example manifest for a storage class:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: portworx-sc
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"

Persistent Volume Reclaim Policies

The reclaim policy for a PersistentVolume tells the cluster what to do with the volume after it has been released from its claim. Currently, volumes can either be Retained, Recycled or Deleted. If a reclaim policy is not specified at the persistent volume level, it will default to that specified by the relevant storage class.
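For example, taking the storage class manifest from the previous section, adding a reclaimPolicy causes dynamically provisioned volumes to survive the deletion of their claims; the class name here is my own invention:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: portworx-sc-retain      # hypothetical name, not from the px_store module
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"
reclaimPolicy: Retain           # dynamically provisioned volumes default to Delete
```

Note that Retained volumes have to be cleaned up manually once their claims have gone.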

Putting The Pieces Of The Puzzle Together

The creation and consumption of storage under an automatic provisioning scheme requires:

  • The name of the persistent volume claim
  • The size of the storage requested
  • A storage class name
  • An access mode; for block storage this is usually ReadWriteOnce, meaning that the persistent volume can be mounted read-write by a single worker node at a time, whereas file storage such as NFS permits a mode of ReadWriteMany, meaning that the persistent volume claim can be accessed from multiple worker nodes simultaneously.

This is illustrated in the deployment manifest for SQL Server below, note that it contains:

  • a volume: mssqldb
  • a persistent volume claim of mssql-data used by the volume
  • a definition for the persistent volume claim
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
    spec:
      terminationGracePeriodSeconds: 1
      containers:
      - name: mssql
        image: mcr.microsoft.com/mssql/server:2017-latest
        ports:
        - containerPort: 1433
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: MSSQL_SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: mssql-data
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-deployment
spec:
  selector:
    app: mssql
  ports:
  - protocol: TCP
    port: 1433
    targetPort: 1433
  type: LoadBalancer
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mssql-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: pure-block

A Brief History Of Kubernetes Storage Plugins

The very first integration of storage with Kubernetes was via “In-Tree” storage drivers, meaning that driver code had to be integrated directly into the Kubernetes code base. The next stage in the evolution of the storage ecosystem was FlexVolume drivers, which allowed storage vendors to develop their drivers independently of the Kubernetes code base. Despite being a step in the right direction, FlexVolume required components to be installed on worker nodes with elevated access rights, and hence was not perfect. Fast forward to the present day and there is a Kubernetes storage special interest group whose work focuses on the Container Storage Interface standard – CSI. One of the initiatives this group has worked on is the ability to migrate In-Tree volumes to CSI volumes, the inference here being that “The community” prefers CSI storage plugins over In-Tree drivers.

This KubeCon Europe 2019 session recording (remember those days when you could travel?) is one of the best summaries of Kubernetes and storage I have seen to date; if you only ever watch one YouTube clip on the subject, this is the one to watch. Plus, Saad Ali is a member of the Kubernetes storage special interest group, so it's reasonable to assume that he knows what he is talking about.

CSI – A Solid Foundation, But Not The End Game

The Container Storage Interface provides a solid standard for storage vendors to support and adhere to; however, at the time of writing this blog post, there are things that stateful applications on Kubernetes require that the CSI standard does not cater for:

  • backup and recovery
    The CSI standard supports volume snapshots; however, you also need the capability to back these up onto a storage platform that is separate from the one(s) your Kubernetes cluster is using. And, as the name suggests, this only covers volumes; backing up things such as secrets is important also.
  • high availability
    Imagine that AWS or GCP is your public cloud of choice, you want to use Azure Arc enabled Data Services and you require data redundancy across two or more availability zones. Availability groups for Azure managed SQL Server instances may get you so far, but what about the data associated with the Azure Arc enabled Data Services controllers? Again, the CSI standard does not cater for this.
  • IO path awareness and tuning
    Different data and analytics applications have different IO requirements, the classic types of application being OLTP and OLAP, with HTAP now emerging.
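To put the first bullet point into context, this is roughly what the CSI snapshot primitive looks like. Note that the snapshot lives on the same storage platform as the source volume, which is precisely the limitation described above. The snapshot and class names here are assumptions, and the snapshot CRDs plus a CSI driver with snapshot support must be present in the cluster:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mssql-data-snapshot                   # hypothetical snapshot name
spec:
  volumeSnapshotClassName: px-csi-snapclass   # assumed CSI snapshot class name
  source:
    persistentVolumeClaimName: mssql-data     # the claim from the earlier manifest
```

Getting the snapshot off the cluster's storage platform, and capturing objects such as secrets alongside it, is left to tooling outside the CSI standard.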

A Good Vendor Neutral Option

NFS is a good option if you are inclined to go for storage integration that is vendor agnostic; it's baked into just about every version of vanilla Kubernetes – by ‘Vanilla’ I mean the open source version and not platforms/distributions based on it such as EKS, AKS, GKE, Anthos, SUSE Rancher, Red Hat OpenShift or Tanzu. And, in keeping with one of the aims of this series, it is free to use, as NFS servers incur no costs to set up, providing you have the infrastructure to spin one up on.

However, another aim of this work is to provide a stack which looks like something you might run in production, and I intensely dislike any kind of infrastructure that I have to set up manually; things that involve significant effort rooting around under /etc to find config files to edit, etc . . . If you want something simple and vendor neutral, there is NFS; if, however, you want something that:

  • supports the CSI standard, meaning that it facilitates backups via CSI volume snapshots
  • provides automatic provisioning
  • facilitates high availability across availability zones for EKS/AKS and GKE
  • is one of the easiest storage platforms out there to set up: for on-premises solutions you simply present block storage to the cluster's worker nodes and then apply a YAML manifest to the cluster; for EKS/AKS/GKE you only have to apply the manifest
  • is storage and IO path aware
  • provides a foundation for features that allow persistent volumes to be resized based on rules (highly useful for big data clusters)

Portworx Essentials ticks all of these boxes and many more. For the purposes of provisioning persistent volumes and consuming storage from them, PX-Store is the component we are interested in.

PX-Store In A Nutshell

If you cast your mind back to Part 3 of this series, the variable used to create the virtual machines underpinning the Kubernetes cluster looks like this – note px_disk_size:

variable "virtual_machines" {
  default = {
    "z-ca-bdc-control1" = {
      name         = "z-ca-bdc-control1"
      compute_node = false
      ipv4_address = "192.168.123.88"
      ipv4_netmask = "22" 
      ipv4_gateway = "192.168.123.1"
      dns_server   = "192.168.123.2"
      ram          = 8192
      logical_cpu  = 4
      os_disk_size = 120
      px_disk_size = 0
    },
    "z-ca-bdc-control2" = {
      name         = "z-ca-bdc-control2"
      compute_node = false
      ipv4_address = "192.168.123.89"
      ipv4_netmask = "22"
      ipv4_gateway = "192.168.123.1"
      dns_server   = "192.168.123.2"
      ram          = 8192
      logical_cpu  = 4
      os_disk_size = 120
      px_disk_size = 0
    },
    "z-ca-bdc-compute1" = {
      name         = "z-ca-bdc-compute1"
      compute_node = true
      ipv4_address = "192.168.123.90"
      ipv4_netmask = "22"
      ipv4_gateway = "192.168.123.1"
      dns_server   = "192.168.123.2"
      ram          = 73728
      logical_cpu  = 12
      os_disk_size = 120
      px_disk_size = 120
    },
    "z-ca-bdc-compute2" = {
      name         = "z-ca-bdc-compute2"
      compute_node = true
      ipv4_address = "192.168.123.91"
      ipv4_netmask = "22"
      ipv4_gateway = "192.168.123.1"
      dns_server   = "192.168.123.2"
      ram          = 73728
      logical_cpu  = 12
      os_disk_size = 120
      px_disk_size = 120
    },
    "z-ca-bdc-compute3" = {
      name         = "z-ca-bdc-compute3"
      compute_node = true
      ipv4_address = "192.168.123.92"
      ipv4_netmask = "22"
      ipv4_gateway = "192.168.123.1"
      dns_server   = "192.168.123.2"
      ram          = 73728
      logical_cpu  = 12
      os_disk_size = 120 
      px_disk_size = 120
    }
  }
}

PX-Store requires a disk to create persistent volumes on; this could be any kind of block storage presented to the worker nodes, be that local storage, direct attached storage or a shared device such as a SAN . . .

Note: the lsblk command returns one or more block devices that persistent volumes can be created on

Deploying and Testing The Module

Full instructions for deploying this module can be found here on GitHub. To prepare for deploying the module via:

terraform apply -target=module.px_store -auto-approve

there are three basic steps that need to be followed:

  1. Irrespective of whether you have used the module to create the VMware virtual machines underpinning the Kubernetes cluster, or you are using something different – say NUCs in a home lab – the starting point is to ensure that each worker node in the cluster has block storage for creating persistent volumes on; again, lsblk is your friend here.
  2. Log into PX-Central and create a spec for PX Essentials, the process for doing this is fully documented here.
  3. Edit the variables.tf file under Arc-PX-VMware-Faststart/modules/px_store (refer to the HCL excerpt below):
     • use_csi: determines whether or not a CSI compatible storage class will be created
     • px_repl_factor: specifies the number of replicas per persistent volume; a minimum value of two is recommended
     • px_spec: paste the URL generated by PX-Central here
     • use_stork: invokes the storage aware scheduler; if the underlying infrastructure uses local storage, i.e. solid state drives, PCI cards or spinning disks housed in the same chassis as the server's CPUs, Stork will endeavor to ensure that pods are co-located on the same worker nodes that their persistent volume replicas reside on
     • storage_class: the name of the storage class that will be created
variable "use_csi" {
  description = "Boolean variable to determine whether or not a CSI compatible storage class is created"
  default     = false
}

variable "px_repl_factor" {
  description = "Number of replicas per persistent volume"
  default     = "2"
}

variable "px_spec" {
  description = "PX spec URL"
  type        = string
  default     = "https://install.portworx.com/2.6?mc=false&kbver=1.19.7&b=true&j=auto&c=px-cluster-942c7da3-d540-4b79-b391-b74453c1da43&stork=true&csi=true&st=k8s"
}

variable "use_stork" {
  description = "Boolean variable to determine whether or not the stork scheduler should be used"
  default     = true
}

variable "storage_class" {
  description = "Kubernetes storage class name"
  type        = string
  default     = "portworx-sc"
}

Now that the necessary preparation has taken place to deploy the module, let's walk through the module resource-by-resource:

  • portworx
    Portworx (PX-Store) is deployed and we wait until all of the Portworx pods are in a ready state; to cater for terraform destroy, a destroy provisioner is provided:
resource "null_resource" "portworx" {
  provisioner "local-exec" {
    command =<<EOF
      kubectl apply -f $PX_SPEC
      until [ $(kubectl get po -n kube-system | egrep '(stork|px|portworx)' | \
                    awk '{ s += substr($2,1,1); t +=substr($2,3,1) } END { print s - t}') -eq 0 ]; do
        echo "."
        echo "Waiting for Portworx PX-Store pods to be ready"
        echo "."
        kubectl get po -n kube-system | egrep '(stork|px|portworx)'
        sleep 10
      done
      echo " "
      PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
      kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl status
    EOF

    environment = {
      PX_SPEC = var.px_spec
    }
  }

  provisioner "local-exec" {
    command=<<EOF
      curl -fsL https://install.portworx.com/px-wipe | bash
      kubectl label nodes --all px/enabled=remove --overwrite
      until [ $(kubectl get po -n kube-system | egrep '(stork|px|portworx)' | wc -l) -eq 0 ]; do
        echo "."
        echo "Waiting for Portworx PX-Store pods to be destroyed"
        echo "."
        kubectl get po -n kube-system | egrep '(stork|px|portworx)'
        sleep 10
      done
      echo " "
      kubectl delete -f "https://install.portworx.com?ctl=true&kbver=$(kubectl version --short | awk -Fv '/Server Version: /{print $3}')"
      kubectl label nodes --all px/enabled-
    EOF
    when = destroy
  }
}
  • kubernetes_scheduler
    Set the scheduler to Stork for storage awareness, if the use_stork variable is set to true:
resource "null_resource" "kubernetes_scheduler" {
  count = var.use_stork ? 1 : 0
  provisioner "local-exec" {
    command = <<EOF
      kubectl get deployment stork -n kube-system -o yaml | perl -pe "s/--webhook-controller=false/--webhook-controller=true/g" | kubectl apply -f -
    EOF
  }
  depends_on = [
    null_resource.portworx
  ]
}
  • storage_class_spec
    This resource creates a manifest for the creation of a storage class:
resource "local_file" "storage_class_spec" {
  content = templatefile("${path.module}/templates/storage_class.tpl", {
    provisioner = var.use_csi ? "pxd.portworx.com" : "kubernetes.io/portworx-volume"
    px_repl_factor = var.px_repl_factor
  })
  filename = "storage-class.yml" 

  depends_on = [
    null_resource.kubernetes_scheduler
  ]
}
  • storage_class
    Finally, we create the actual storage class object in the Kubernetes cluster:
resource "null_resource" "storage_class" {
  provisioner "local-exec" {
    command = "kubectl apply -f ./storage-class.yml"
  }

  depends_on = [
    local_file.storage_class_spec 
  ]
}

Once the module has been deployed via:

terraform apply -target=module.px_store -auto-approve

create a persistent volume claim using the test-pvc.yaml file provided in the px_store directory; this assumes that the storage class is called portworx-sc:

kubectl apply -f test-pvc.yaml
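A claim of this shape would produce the output shown further down; note that this is a sketch of what test-pvc.yaml might contain, and the actual file in the repo may differ:

```yaml
# Hypothetical reconstruction of test-pvc.yaml; refer to the repo for the real file
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: portworx-sc
```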

Finally, check that the persistent volume claim has been created:

kubectl get pvc

The persistent volume claim should be present and in a Bound state:

NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
test-pvc   Bound    pvc-c977cf68-6fcc-4515-a04d-18650875f0f7   1Gi        RWO            portworx-sc    7s

Coming Up In Part 7

Following the logical order of components in the stack, the next thing to look at is the px_backup module. However, in order to walk through this we need something to actually back up in the first place, which is why part 7 in this series will cover Azure Arc enabled Data Services and part 8 will cover Big Data Cluster deployment . . . and then, once we have something to back up, part 9 will cover the px_backup module.
