Kubernetes For The Microsoft Data Platform Professional: Dipping Your Toe In The Water With Minikube

In the previous post the scene was set for why container orchestration is required, what Kubernetes is, and why the world of open source should be entered into with one's eyes wide open. This post will cover a popular learning tool for Kubernetes: minikube. Minikube is a single node Kubernetes 'cluster' that can run on Windows, Linux or macOS. Full documentation for minikube can be found here. For the purposes of this post, minikube will be run on Windows 10 with Hyper-V. The simplest way to install minikube is to use Chocolatey:

choco install minikube

The Hyper-V feature needs to be enabled in Windows and an external switch created in Hyper-V Manager; both prerequisites can also be put in place from an elevated PowerShell session.
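This is a minimal sketch; the cmdlets are standard, but the network adapter name ("Ethernet") and the switch name are assumptions that will vary from machine to machine:

# enable the Hyper-V feature (requires a reboot)
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V -All

# list the physical adapters to find the one to bind the switch to
Get-NetAdapter

# create an external switch bound to the chosen adapter ("Ethernet" is an assumption)
New-VMSwitch -Name "minikube" -NetAdapterName "Ethernet" -AllowManagementOS $true

With these prerequisites in place, a cluster can now be created: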

minikube start --memory=3250 --vm-driver="hyperv" --hyperv-virtual-switch="<switch name goes here>"

Once the cluster is running, which may take several minutes, check the status of the minikube installation and the health of the cluster via:

minikube status

and

kubectl cluster-info

respectively, viz:

[Screenshot: output of minikube status and kubectl cluster-info]

A Word Of Caution!

If for any reason you need to power down the machine minikube is running on, shut down minikube first by issuing minikube stop, otherwise you might find that the next time your machine is powered on, your cluster will not start up properly. If this happens, remove the cluster using minikube delete and then re-create it.
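The full cycle, should the cluster refuse to start after a reboot, looks like this:

# shut the cluster down cleanly before powering off the machine
minikube stop

# if the cluster will not start after the machine is powered back on, remove it . . .
minikube delete

# . . . and re-create it
minikube start --memory=3250 --vm-driver="hyperv" --hyperv-virtual-switch="<switch name goes here>"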

What Minikube Gives Us

To quote the Kubernetes documentation, minikube provides a single node Kubernetes cluster with the following features:

  • DNS
  • NodePorts
  • ConfigMaps and Secrets
  • Dashboards
  • Container Runtime: Docker, rkt, CRI-O and containerd
  • Enabling CNI (Container Network Interface)
  • Ingress

We also get a built-in load balancer, a default installation of kubectl (the standard tool for managing application deployments to a cluster) and an automatic storage provisioner.
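For example, the dashboard can be opened in a browser, and the full list of bundled addons and their status inspected, with:

# open the Kubernetes dashboard in the default browser
minikube dashboard

# list the addons that ship with minikube
minikube addons list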

The previous post mentioned that OpenShift from Red Hat is platform-as-a-service (PaaS) software built on top of Kubernetes. OpenShift's equivalent of minikube is minishift. kubectl can still be used against OpenShift, but OpenShift's PaaS extensions can only be leveraged via its own command line tool: oc. Refer to this documentation for the differences between oc and kubectl. An official Kubernetes kubectl cheat sheet can be found here.

Deploying SQL Server To Minikube

Once the cluster is up, let's deploy a standalone SQL Server instance to it; availability groups will be covered at a later date. To do this, several different types of object will need to be created:

  • Storage objects: persistent volume claims, which will lead to the automatic creation of persistent volumes.
  • A secret to hold the password for the sa account.
  • A deployment; this embodies the SQL Server instance.
  • A service; this provides the means of accessing the instance via an IP address and port from outside the cluster.

Storage

The storage ecosystem tends to be one of the least well understood areas of Kubernetes. The touch point for storage in an application (its pods) is the volume. A volume in turn requires that a persistent volume claim is 'Bound' to a persistent volume. With most of the popular public cloud providers, you should find that when you create a persistent volume claim, the Kubernetes-as-a-service platform creates the persistent volume for you. When running Kubernetes on premises, the presence of automatic storage provisioning will depend entirely on the provisioner you are using. In the case of manual provisioning, persistent volumes need to be created before the persistent volume claims that bind to them. For production grade installations of Kubernetes on premises the following points should be heeded:

  1. Avoid loopback storage; its performance tends to be poor, and Red Hat discourages the use of this type of storage for OpenShift as per this article.
  2. Local volumes should be avoided for most types of application; refer to this article from the official Kubernetes documentation.
  3. All cluster state is stored in etcd, a key value store database that originated from CoreOS. Production grade installations of Kubernetes should have at least three etcd instances residing on highly durable and available storage.
  4. For the most "cloud like" experience, prefer storage provisioners that furnish automatic provisioning (minikube's own provisioner can be inspected as shown after this list).
  5. Prefer provisioners from vendors who have made a commitment to supporting the Container Storage Interface (CSI).
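On minikube, automatic provisioning is supplied by a bundled hostpath provisioner sitting behind a default storage class (named standard at the time of writing); it can be inspected with:

kubectl get storageclass

# drill into the default class to see the provisioner behind it
kubectl describe storageclass standard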

The excerpt below is the YAML for creating a persistent volume claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "pvc0001"
spec:
  accessModes:
  - "ReadWriteOnce"
  resources:
    requests:
      storage: "5Gi"

The YAML above can be placed in a text file, called pvc0001.yaml for example purposes, and the persistent volume claim created as follows:

kubectl apply -f pvc0001.yaml

Once created, the persistent volume claim can be inspected by issuing:

kubectl describe pvc pvc0001

the output from running this command should look similar to this:

[Screenshot: output of kubectl describe pvc pvc0001]

This tells us that a volume has been created dynamically to satisfy the persistent volume claim (last line in the screen capture) and that the claim has been 'Bound' to this volume (third line from the top). For purposes that will become clear later on, create the exact same persistent volume claim, but this time with the name pvc0002.
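A quick way of doing this (a sketch, assuming the original manifest is in pvc0001.yaml) is to copy the file with the claim name swapped and apply it:

# copy the manifest, swapping the claim name
(Get-Content pvc0001.yaml) -replace 'pvc0001','pvc0002' | Set-Content pvc0002.yaml

kubectl apply -f pvc0002.yaml

# both claims should now report a status of Bound
kubectl get pvc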

Password Management

Next a password needs to be created for the account used to access the instance; this requires that an object called a 'Secret' is created:

kubectl create secret generic mssql --from-literal=SA_PASSWORD="MyC0m9l&xP@ssw0rd"
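The secret can be checked without revealing its value; kubectl describe shows the keys and the size of their data, whilst the base64 encoded value itself can be pulled out with a jsonpath query:

kubectl describe secret mssql

# the value is stored base64 encoded
kubectl get secret mssql -o jsonpath="{.data.SA_PASSWORD}"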

The Kubernetes documentation can be quite dry, however it does an excellent job of explaining secrets, to be found here. A secret created from a literal has been used here for the purposes of brevity. For production purposes you may wish to consider using files, or a password / token / certificate safe type product that integrates with Kubernetes; Vault by HashiCorp is one such product:

[Screenshot: HashiCorp Vault integration with Kubernetes]

Deploying A SQL Server Instance

A deployment in the context of Kubernetes provides declarative updates to pods and replica sets. If the pods and replica sets specified in the deployment do not exist, they will be created. Below is the deployment to be used for creating our SQL Server instance. The password for the sa user is provided via the secret created earlier. Under the covers, Kubernetes mounts a temporary file system for the pod and copies the secret onto it; this is how the secret is made available to the container inside the pod.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mssql
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mssql
        image: microsoft/mssql-server-linux
        ports:
        - containerPort: 1433
        securityContext:
          privileged: true
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: pvc0001

To create the deployment, save the YAML above in a file called deployment.yaml and run:

kubectl apply -f deployment.yaml
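Before digging into the deployment, it is worth confirming that the pod has started and that the secret has reached the container (a sketch; the generated pod name will differ):

# find the pod created by the deployment
kubectl get pods -l app=mssql

# check the environment variable inside the container
kubectl exec <pod name goes here> -- printenv SA_PASSWORD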

Let's now have a look at the deployment in detail:

kubectl describe deployment mssql-deployment

note the persistent volume claim highlighted in the yellow box:

[Screenshot: output of kubectl describe deployment mssql-deployment, with persistent volume claim pvc0001 highlighted]

Let's now change the persistent volume claim name in the deployment (contents of deployment.yaml) to pvc0002. Only the claim name in the volume definition at the end of the file changes:
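      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: pvc0002

Now apply the deployment.yaml file again: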

kubectl apply -f deployment.yaml

issuing:

kubectl describe deployment mssql-deployment

now reveals that the persistent volume claim in the deployment has changed to pvc0002, as highlighted below:

[Screenshot: output of kubectl describe deployment mssql-deployment, with persistent volume claim pvc0002 highlighted]

The point of this exercise is to illustrate how deployments can be used to change the nature of our pods.

Connecting To The Instance From The Outside World

To connect to the instance from outside the Kubernetes cluster, we need to create a service. For vanilla Kubernetes running on premises, or on infrastructure-as-a-service via a cloud provider, there are two options for making port 1433 accessible to the outside world:

  • Use a node port
  • Create a load balancer

A good blog post that covers the different means of getting traffic into and out of a Kubernetes cluster can be found here. For the simplest means of creating a load balancer refer to MetalLB, and the simplest way to configure this is to use the layer 2 method, sketched below.
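This is a minimal sketch of a MetalLB layer 2 configuration, using the ConfigMap format of MetalLB's pre-CRD releases; the address range is an assumption and must come from a spare block on the local network:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250

To create a service for the deployment, the following yaml will be used: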

apiVersion: v1
kind: Service
metadata:
  name: mssql-deployment
spec:
  selector:
    app: mssql
  ports:
    - protocol: TCP
      port: 1433
      targetPort: 1433
  type: LoadBalancer

If this is saved in a service.yaml file, it can be created like so:

kubectl apply -f service.yaml

Note that in the YAML specification for the service, targetPort is the port the container listens on inside the pod, whilst port is the port the service itself exposes inside the cluster. The external IP address and port are obtained via the following command:

minikube service mssql-deployment --url

Now for the moment of truth: connecting to the actual instance. Fire up SQL Server Management Studio; there is no reason why Azure Data Studio should not work also. Use sa and the password from the secret as the login credentials. For the server name, use the IP address returned from the minikube command above, followed by a comma and then the port number:

[Screenshot: SQL Server Management Studio connection dialog]
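The same connection can be tested from the command line with sqlcmd; the IP address and port below are hypothetical placeholders for the values returned by minikube:

# -S takes the address and port separated by a comma (values here are hypothetical)
sqlcmd -S 192.168.1.100,31433 -U sa -P "MyC0m9l&xP@ssw0rd" -Q "SELECT @@VERSION"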

Next Time . . .

I have various ideas for where to take this next; availability group clusters are an obvious topic. Perhaps not so obvious is Helm, a package manager for Kubernetes, and how this can be leveraged in the world of SQL Server. Finally, I have some code written in Go, the language used to implement Docker, Kubernetes and Terraform. The code I have at present can query a SQL Server database when invoked via a REST endpoint and render the results as JSON. The idea I'm toying with is a soup-to-nuts style blog post, or series of posts, that goes from writing this code, packaging it up into container image form, and then deploying it to a Kubernetes cluster. Watch this space . . .

 
