Part six of this series will focus on deploying a storage solution to our Kubernetes cluster:

Where Were We?
If you have been following this blog post series you should have:
- a basic grasp of Terraform
- a Kubernetes cluster and an understanding of why Kubespray has been used to create it
- a software load balancer in the form of MetalLB deployed to your cluster and basic understanding of Kubernetes services
Kubernetes Storage 101
Before we dive into deploying a storage solution to our Kubernetes cluster, we need to understand the basics of storage in the world of Kubernetes, which can appear both exotic and mysterious to the uninitiated. To dispel some of the confusion around Kubernetes and storage: the storage IO path is exactly the same as that of common or garden variety Unix or Linux. The Kubernetes storage ecosystem introduces two extra things we need to concern ourselves with above and beyond conventional Unix/Linux storage. Firstly, there are some extra layers of abstraction between the physical storage and the filesystems that pods use, what I like to refer to as . . .
The Kubernetes Storage “Layer Cake”
From the bottom up: persistent volumes map to physical storage devices; persistent volume claims are required in order to consume capacity from a persistent volume; and the final piece in the puzzle is the volume, the touch point for storage consumption by containers in a pod:
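As a sketch of these layers, a manually provisioned chain from persistent volume to pod might look like the following (all names, sizes and the NFS server address here are purely illustrative):

```yaml
# Bottom layer: a persistent volume mapping to physical storage (NFS here)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: demo-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  nfs:
    server: 192.168.123.2   # hypothetical NFS server
    path: /exports/demo
---
# Middle layer: a claim that consumes capacity from a matching volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# Top layer: a pod whose volume is the touch point for the claim
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: demo-pvc
```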

Data Mobility
The second thing of note, above and beyond how storage behaves in the world of Linux, is that when pods move around the cluster they need to maintain their state as they jump from one worker node to another, and there are two ways of ensuring this:
- shared storage
Each worker node has an IO path to the same device or storage cluster; rescheduling a pod simply involves unmounting any volumes associated with that pod and re-mounting them once the pod has successfully been rescheduled to a different worker node:

- replicated storage
Persistent volumes have at least one replica; rescheduling a pod is simply a matter of ensuring that the pod lands on a worker node which has access to the relevant persistent volume replicas.

Provisioning
The Kubernetes storage ecosystem includes the concept of ‘Provisioning’; there are two provisioning schemes:
- Manual
Someone has to create persistent volumes by hand.
- Automatic
When a persistent volume claim is created, a persistent volume is created automatically and the two entities are bound together. To use Kubernetes storage parlance, for a persistent volume claim to be in a usable state for a pod, it needs to be ‘Bound’ to a persistent volume.
From an end user perspective, automatic provisioning provides the simplest means by which to consume storage.
Storage Classes
Each Kubernetes cluster requires at least one storage class object. Simply put, different storage classes enable storage to be provisioned from platforms with different qualities of service. For example, an OLTP-style application might require a storage class associated with a low latency platform, whereas a storage class associated with a high IO bandwidth platform is better suited to OLAP-style applications.
Here is an example manifest for a storage class:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: portworx-sc
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"
Persistent Volume Reclaim Policies
The reclaim policy for a PersistentVolume tells the cluster what to do with the volume after it has been released of its claim. Currently, volumes can either be Retained, Recycled, or Deleted. If a reclaim policy is not specified at the persistent volume level, the reclaim policy will default to that specified by the relevant storage class.
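To illustrate, the policy can be pinned at the storage class level; the manifest below is a hypothetical variant of the Portworx storage class shown above, not part of the module itself:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: portworx-sc-retain
provisioner: kubernetes.io/portworx-volume
reclaimPolicy: Retain    # volumes survive the deletion of their claims
parameters:
  repl: "2"
```

An existing persistent volume can also have its policy changed after the fact with something along the lines of `kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'`.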
Putting The Pieces Of The Puzzle Together
The creation and consumption of storage under an automatic provisioning scheme requires:
- the name of the persistent volume claim
- the size of the persistent volume
- a storage class name
- an access mode; for block storage this is usually ReadWriteOnce, meaning that the persistent volume claim can only be mounted by a single worker node at a time, whereas file storage such as NFS permits a mode of ReadWriteMany, meaning that the persistent volume claim can be accessed via multiple worker nodes simultaneously
This is illustrated in the deployment manifest for SQL Server below, note that it contains:
- a volume: mssqldb
- a persistent volume claim named mssql-data, used by the volume
- a definition for the persistent volume claim
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
    spec:
      terminationGracePeriodSeconds: 1
      containers:
      - name: mssql
        image: mcr.microsoft.com/mssql/server:2017-latest
        ports:
        - containerPort: 1433
        env:
        - name: ACCEPT_EULA    # required for the SQL Server container to start
          value: "Y"
        - name: MSSQL_SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: mssql-data
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-deployment
spec:
  selector:
    app: mssql
  ports:
  - protocol: TCP
    port: 1433
    targetPort: 1433
  type: LoadBalancer
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mssql-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: pure-block
A Brief History Of Kubernetes Storage Plugins
The very first integration of storage with Kubernetes was via “In-Tree” storage drivers, meaning that driver code had to be integrated directly into the Kubernetes code base. The next stage in the evolution of the storage ecosystem was FlexVolume drivers, which allowed storage vendors to develop their drivers independently of the Kubernetes code base. Despite FlexVolume being a step in the right direction, it required components to be installed on worker nodes with elevated access rights, and hence was not perfect. Fast forward to the present day and there is a Kubernetes storage special interest group whose work focuses on the Container Storage Interface standard (CSI). One of the initiatives this group has worked on is the ability to migrate In-Tree volumes to CSI volumes, the inference being that “the community” prefers CSI storage plugins over In-Tree drivers.
This KubeCon Europe 2019 session recording (remember those days when you could travel?) is one of the best summaries of Kubernetes and storage I have seen to date; if you only ever watch one YouTube clip on the subject, this is the one to watch. Plus, Saad Ali is a member of the Kubernetes storage special interest group, so it's reasonable to assume that he knows what he is talking about.

CSI – A Solid Foundation, But Not The End Game
The Container Storage Interface standard provides a great foundation for storage vendors to support and adhere to; however, at the time of writing this blog post, there are things that stateful applications on Kubernetes require that the CSI standard does not cater for:
- backup and recovery
The CSI standard supports volume snapshots, however you also need the capability to back these up onto a storage platform that is separate from the one(s) that your Kubernetes cluster is using; and, as the name suggests, this only covers volumes, whereas backing up things such as secrets is important also.
- high availability
Imagine that AWS or GCP is your public cloud of choice, you want to use Azure Arc enabled Data Services and you require data redundancy across two or more availability zones. Availability groups for Azure SQL Managed Instances may get you so far, but what about the data associated with the Azure Arc enabled Data Services controllers? Again, the CSI standard does not cater for this.
- IO path awareness and tuning
Different data and analytics applications have different IO requirements, the classic types of application being OLTP and OLAP, with HTAP now emerging.
A Good Vendor Neutral Option
NFS is a good option if you are inclined to go for storage integration that is vendor agnostic; it's baked into just about every version of vanilla Kubernetes (by ‘vanilla’ I mean the open source version and not platforms/distributions based on this such as EKS, AKS, GKE, Anthos, SUSE Rancher, Red Hat OpenShift or Tanzu). And in keeping with one of the aims of this series, it is free to use, as an NFS server incurs no cost to set up providing you have the infrastructure to spin one up on.
However, another aim of this work is to provide a stack which looks like something you might run in production, and I intensely dislike any kind of infrastructure that I have to set up manually; things that involve rooting around under /etc
to find config files to edit and so on. If all you want is something simple, NFS is there; if, however, you want something that:
- supports the CSI standard, meaning that it facilitates backups via CSI volume snapshots
- provides automatic provisioning
- facilitates high availability across availability zones for EKS/AKS and GKE
- is one of the easiest storage platforms out there to set up: for on-premises solutions you simply present block storage to the cluster's worker nodes and then apply a YAML manifest to the cluster; for EKS/AKS/GKE you only have to apply the manifest
- is storage and IO path aware
- provides a foundation for features that allow persistent volumes to be resized based on rules (highly useful for big data clusters)
Portworx Essentials ticks all of these boxes and many more; for the purposes of provisioning persistent volumes and consuming storage from them, PX-Store is the component we are interested in.
PX-Store In A Nutshell
If you cast your mind back to Part 3 of this series, the variable used to create the virtual machines underpinning the Kubernetes cluster looks like this; note px_disk_size:
variable "virtual_machines" {
  default = {
    "z-ca-bdc-control1" = {
      name         = "z-ca-bdc-control1"
      compute_node = false
      ipv4_address = "192.168.123.88"
      ipv4_netmask = "22"
      ipv4_gateway = "192.168.123.1"
      dns_server   = "192.168.123.2"
      ram          = 8192
      logical_cpu  = 4
      os_disk_size = 120
      px_disk_size = 0
    },
    "z-ca-bdc-control2" = {
      name         = "z-ca-bdc-control2"
      compute_node = false
      ipv4_address = "192.168.123.89"
      ipv4_netmask = "22"
      ipv4_gateway = "192.168.123.1"
      dns_server   = "192.168.123.2"
      ram          = 8192
      logical_cpu  = 4
      os_disk_size = 120
      px_disk_size = 0
    },
    "z-ca-bdc-compute1" = {
      name         = "z-ca-bdc-compute1"
      compute_node = true
      ipv4_address = "192.168.123.90"
      ipv4_netmask = "22"
      ipv4_gateway = "192.168.123.1"
      dns_server   = "192.168.123.2"
      ram          = 73728
      logical_cpu  = 12
      os_disk_size = 120
      px_disk_size = 120
    },
    "z-ca-bdc-compute2" = {
      name         = "z-ca-bdc-compute2"
      compute_node = true
      ipv4_address = "192.168.123.91"
      ipv4_netmask = "22"
      ipv4_gateway = "192.168.123.1"
      dns_server   = "192.168.123.2"
      ram          = 73728
      logical_cpu  = 12
      os_disk_size = 120
      px_disk_size = 120
    },
    "z-ca-bdc-compute3" = {
      name         = "z-ca-bdc-compute3"
      compute_node = true
      ipv4_address = "192.168.123.92"
      ipv4_netmask = "22"
      ipv4_gateway = "192.168.123.1"
      dns_server   = "192.168.123.2"
      ram          = 73728
      logical_cpu  = 12
      os_disk_size = 120
      px_disk_size = 120
    }
  }
}
PX-Store requires a disk on which to create persistent volumes; this could be any kind of block storage presented to the worker nodes, be that local storage, direct attached storage or a shared device such as a SAN.
Note: the lsblk command lists the block devices that persistent volumes can be created on.
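As a rough sketch of what to look for, the raw output of lsblk can be filtered for whole disks that carry no filesystem; the device names below stand in for real lsblk output on a worker node:

```shell
# Pipe the output of `lsblk -rno NAME,TYPE,FSTYPE` through awk to pick out
# whole disks with no filesystem -- candidates for PX-Store to consume.
# The here-string stands in for real lsblk output on a worker node.
sample='sda disk ext4
sdb disk'
echo "$sample" | awk '$2 == "disk" && $3 == "" { print "/dev/" $1 }'
# prints /dev/sdb
```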
Deploying and Testing The Module
Full instructions for deploying this module can be found here on GitHub. To prepare for deploying the module via terraform apply -target=module.px_store -auto-approve
there are three basic steps that need to be followed:
- Irrespective of whether you have used the module to create the VMware virtual machines underpinning the Kubernetes cluster, or you are using something different (say NUCs in your home lab, for example), the starting point is to ensure that each worker node in the cluster has block storage for creating persistent volumes on; again, lsblk is your friend here.
- Log into PX-Central and create a spec for PX Essentials; the process for doing this is fully documented here.
- Edit the variables.tf file under Arc-PX-VMware-Faststart/modules/px_store (refer to the HCL excerpt below):
  - use_csi: determines whether or not a CSI compatible storage class will be created
  - px_repl_factor: specifies the number of replicas per persistent volume; a minimum value of two is recommended
  - px_spec: paste the URL generated from PX-Central here
  - use_stork: invokes the storage aware scheduler; if the underlying infrastructure uses local storage, i.e. solid state drives, PCI cards or spinning disks housed in the same chassis as the server CPU resources, stork will endeavor to ensure that pods are co-located on the same node hosts that their persistent volumes reside on
  - storage_class: the name of the storage class that will be created
variable "use_csi" {
  default = false
}

variable "px_repl_factor" {
  default = "2"
}

variable "px_spec" {
  description = "PX spec URL"
  type        = string
  default     = "https://install.portworx.com/2.6?mc=false&kbver=1.19.7&b=true&j=auto&c=px-cluster-942c7da3-d540-4b79-b391-b74453c1da43&stork=true&csi=true&st=k8s"
}

variable "use_stork" {
  description = "Boolean variable to determine whether or not the stork scheduler should be used"
  default     = true
}

variable "storage_class" {
  description = "Kubernetes storage class name"
  type        = string
  default     = "portworx-sc"
}
Now that the necessary preparation has taken place to deploy the module, let's walk through it resource by resource:
portworx
Portworx (PX-Store) is deployed and we wait until all the Portworx pods are in a ready state; to cater for terraform destroy, a destroy provisioner is provided:
resource "null_resource" "portworx" {
  provisioner "local-exec" {
    command = <<-EOF
      kubectl apply -f $PX_SPEC
      # For each Portworx/stork pod, subtract the total container count from
      # the ready container count (the READY column, e.g. "1/1"); the sum
      # reaches zero once every pod is fully ready
      until [ $(kubectl get po -n kube-system | egrep '(stork|px|portworx)' | \
      awk '{ s += substr($2,1,1); t += substr($2,3,1) } END { print s - t }') -eq 0 ]; do
        echo "."
        echo "Waiting for Portworx PX-Store pods to be ready"
        echo "."
        kubectl get po -n kube-system | egrep '(stork|px|portworx)'
        sleep 10
      done
      echo " "
      PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
      kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl status
    EOF

    environment = {
      PX_SPEC = var.px_spec
    }
  }

  provisioner "local-exec" {
    when    = destroy
    command = <<-EOF
      # Wipe the Portworx installation and wait for its pods to disappear
      curl -fsL https://install.portworx.com/px-wipe | bash
      kubectl label nodes --all px/enabled=remove --overwrite
      until [ $(kubectl get po -n kube-system | egrep '(stork|px|portworx)' | wc -l) -eq 0 ]; do
        echo "."
        echo "Waiting for Portworx PX-Store pods to be destroyed"
        echo "."
        kubectl get po -n kube-system | egrep '(stork|px|portworx)'
        sleep 10
      done
      echo " "
      kubectl delete -f "https://install.portworx.com?ctl=true&kbver=$(kubectl version --short | awk -Fv '/Server Version: /{print $3}')"
      kubectl label nodes --all px/enabled-
    EOF
  }
}
kubernetes_scheduler
Set the scheduler to stork for storage awareness, if the use_stork
variable is set to true:
resource "null_resource" "kubernetes_scheduler" {
  count = var.use_stork ? 1 : 0

  provisioner "local-exec" {
    command = <<-EOF
      kubectl get deployment stork -n kube-system -o yaml | perl -pe "s/--webhook-controller=false/--webhook-controller=true/g" | kubectl apply -f -
    EOF
  }

  depends_on = [
    null_resource.portworx
  ]
}
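The perl substitution above turns on stork's webhook controller, which should take care of routing pods that use Portworx volumes through stork automatically; a workload can also opt in explicitly by naming the scheduler in its pod spec. A minimal sketch (the pod here is purely illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: stork-scheduled-pod
spec:
  schedulerName: stork   # route this pod through the storage aware scheduler
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
```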
storage_class_spec
This resource creates a manifest for the creation of a storage class:
resource "local_file" "storage_class_spec" {
  content = templatefile("${path.module}/templates/storage_class.tpl", {
    provisioner    = var.use_csi ? "pxd.portworx.com" : "kubernetes.io/portworx-volume"
    px_repl_factor = var.px_repl_factor
  })

  filename = "storage-class.yml"

  depends_on = [
    null_resource.kubernetes_scheduler
  ]
}
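For reference, a templates/storage_class.tpl consistent with the two parameters passed to templatefile above might look something like this; a sketch only (the actual template ships with the module), assuming the default storage class name of portworx-sc:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: portworx-sc
provisioner: ${provisioner}
parameters:
  repl: "${px_repl_factor}"
```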
storage_class
Finally we create the actual storage class object in the Kubernetes cluster:
resource "null_resource" "storage_class" {
  provisioner "local-exec" {
    command = "kubectl apply -f ./storage-class.yml"
  }

  depends_on = [
    local_file.storage_class_spec
  ]
}
Once the module has been deployed via terraform apply -target=module.px_store -auto-approve
create a persistent volume claim using the test-pvc.yaml
file provided in the px_store
directory; this assumes that the storage class is called portworx-sc:
kubectl apply -f test-pvc.yaml
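For reference, a test-pvc.yaml consistent with this storage class would look something like the following; a sketch based on the bound claim shown further down (the actual file ships in the repository):

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: portworx-sc
```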
Finally, check that the persistent volume claim has been created:
kubectl get pvc
The persistent volume claim should be present and in a bound state:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
test-pvc Bound pvc-c977cf68-6fcc-4515-a04d-18650875f0f7 1Gi RWO portworx-sc 7s
Coming Up In Part 7
Following the logical order of components in the stack, the next thing to look at is the px_backup module; however, in order to walk through this we need something to actually back up in the first place, and this is why part 7 in this series will cover Azure Arc enabled Data Services, part 8 will cover Big Data Cluster deployment . . . and then, once we have something to back up, part 9 will cover the px_backup module.