Building A Kubernetes Cluster For SQL Server 2019 Big Data Clusters, Part 2: Kubernetes Cluster Creation

Part 1 of this series covered the creation of the virtualized infrastructure on which to build a Kubernetes cluster. There are a variety of tools for building clusters, including Kops, Kubespray and Kubeadm. Kubeadm is perhaps the most popular tool for bootstrapping clusters, followed by Kubespray.


In essence, Kubespray is a collection of Ansible playbooks: YAML files that specify what actions should take place against one or more machines listed in an inventory.ini file, which resides in what is known as an inventory. Of all the infrastructure-as-code tools available at the time of writing, Ansible is the most popular and has the greatest traction. Examples of playbooks produced by Microsoft can be found on GitHub for automating tasks in Azure and deploying SQL Server availability groups on Linux. The good news for anyone into PowerShell is that PowerShell modules can be installed and PowerShell commands executed via Ansible; there are also people already using PowerShell Desired State Configuration with Ansible. Ansible's popularity is down to the fact that it is easy to pick up and agent-less: it relies on ssh, which is why one of the steps in this post includes the creation of ssh keys. This free tutorial is highly recommended for anyone wishing to pick up Ansible.
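To make the playbook idea concrete, here is a minimal sketch of one; the host group and package are illustrative and not taken from this series:

```yaml
# Illustrative playbook: ensure a package is present on every host in
# the "kube-node" inventory group, escalating to root via become.
- hosts: kube-node
  become: true
  tasks:
    - name: Ensure multipath-tools is installed
      apt:
        name: multipath-tools
        state: present
```

A playbook like this would be run against an inventory with `ansible-playbook -i inventory.ini playbook.yml`.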

Cluster Topology Recap

The cluster this blog post covers the creation of comprises the following nodes and etcd instances. Note that any alphabetic characters used have to be lower case; you can use whatever naming convention you like, below is the naming convention I have elected to go with:

two master nodes:

    ca-k8s-m01
    ca-k8s-m02

three worker nodes:

    ca-k8s-w01
    ca-k8s-w02
    ca-k8s-w03

three etcd instances, residing on:

    ca-k8s-m01
    ca-k8s-m02
    ca-k8s-w01

the cluster will be deployed and administered from ca-k8s-boot.

Deploying Big Data Clusters On VMware

The process outlined in this blog post has been used to create Kubernetes clusters on VMware (ESXi 6.7, to be exact), which have then been used to successfully deploy big data clusters.

Cluster Creation

Kubespray is open-source software and as such it is subject to change; the information in this blog post is correct according to the version of Kubespray that was available on 10th April 2020.

  1. Add entries for the two master and three worker nodes in the /etc/hosts file on the bootstrap virtual machine.
  2. Check that each machine can be pinged from the bootstrap machine in order to verify that the networking setup is sane.
  3. Create a key pair on the bootstrap machine (I’ve called this ca-k8s-boot in my example) using ssh-keygen:
    ssh-keygen
  4. This will result in the following output; hit enter to accept the default:
    Generating public/private rsa key pair.
    Enter file in which to save the key (/your_home/.ssh/id_rsa)
  5. Enter a passphrase and confirm it; this should result in output similar to the key fingerprint and randomart image that ssh-keygen displays.
  6. We now need to copy the public key element of the key pair to the master and worker node virtual machines via ssh-copy-id:
    ssh-copy-id <username>@<ip address>

    after issuing this, a prompt will request confirmation that you actually want to copy the public key, followed by a prompt requesting the password of the user entered:

  7. Cache the passphrase on the bootstrap machine by entering the following commands:
    ssh-agent /bin/bash
    ssh-add ~/.ssh/id_rsa

    a prompt will appear requesting that the passphrase used in step 5 be entered.

  8. Update the apt packages on each virtual machine. If errors are encountered stating that a repository cannot be resolved, for example:
    Could not resolve ''

    you likely have a name resolution issue; if this is the case, perform the following steps to rectify the problem:

    sudo vi /etc/resolvconf/resolv.conf.d/base

    add the following lines to the file:

    nameserver <ip address of a DNS name server on your network>


    then reboot the virtual machine:

    sudo reboot

    then issue:

    sudo apt-get update
    sudo apt-get-upgrade
  9. Download Kubespray:
    git clone https://github.com/kubernetes-sigs/kubespray.git
  10. Install the package manager for Python 3:
    sudo apt-get install python3-pip
  11. Addendum 1st January 2019
    In the working directory that the git clone was performed from, change directory to the kubernetes directory and install the python packages specified in the requirements.txt file. This pulls in all the python packages and software required for the rest of the Kubernetes cluster creation process (including Ansible which is used in the next step):

    pip3 install -r requirements.txt
  12. Install the sshpass package:
    sudo apt-get install sshpass
  13. With the kubernetes directory as your working directory, create a copy of the sample inventory directory for your cluster:
    cp -rfp inventory/sample inventory/<cluster_name>
  14. In the directory created as part of the previous step under inventory, edit the inventory.ini file; it should look like this:
    ca-k8s-m01 ansible_host=XXX.XXX.XXX.03 ip=XXX.XXX.XXX.03 etcd_member_name=etcd1
    ca-k8s-m02 ansible_host=XXX.XXX.XXX.04 ip=XXX.XXX.XXX.04 etcd_member_name=etcd2 
    ca-k8s-w01 ansible_host=XXX.XXX.XXX.05 ip=XXX.XXX.XXX.05 etcd_member_name=etcd3 
    ca-k8s-w02 ansible_host=XXX.XXX.XXX.06 ip=XXX.XXX.XXX.06
    ca-k8s-w03 ansible_host=XXX.XXX.XXX.07 ip=XXX.XXX.XXX.07
  15. Test that Ansible can connect to each host by issuing the following command:
    ansible -i inventory/<cluster_name>/inventory.ini all -m ping

    providing ssh has been configured correctly, the Ansible ping command should result in output similar to this:

    ca-k8s-w03 | SUCCESS => {
        "changed": false,
        "ping": "pong"
    }
    ca-k8s-m02 | SUCCESS => {
        "changed": false,
        "ping": "pong"
    }
    ca-k8s-w02 | SUCCESS => {
        "changed": false,
        "ping": "pong"
    }
    ca-k8s-w01 | SUCCESS => {
        "changed": false,
        "ping": "pong"
    }
    ca-k8s-m01 | SUCCESS => {
        "changed": false,
        "ping": "pong"
    }
  16. Add the following line to the [defaults] section of the ansible.cfg file in the kubernetes directory:
    log_path = ./kubespray.log
  17. Run the playbook using the following command from within the kubernetes directory:
    ansible-playbook -i inventory/<cluster_name>/inventory.ini --become --become-user=root -K cluster.yml
  18. Open the kubespray.log file in vi and issue the following commands:
    :set nowrap
    G

    issuing these commands disables line wrapping and displays the end of the file. If the playbook has run successfully, the values associated with unreachable and failed in the play recap section of the log should all be zero.
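A note on step 14: the host definitions shown there make up only the [all] section of the inventory; Kubespray also needs group sections that map hosts to roles. For the topology in this post, the complete inventory.ini would look something like the sketch below (group names follow the Kubespray sample of the time; later releases renamed kube-master to kube_control_plane):

```ini
[all]
ca-k8s-m01 ansible_host=XXX.XXX.XXX.03 ip=XXX.XXX.XXX.03 etcd_member_name=etcd1
ca-k8s-m02 ansible_host=XXX.XXX.XXX.04 ip=XXX.XXX.XXX.04 etcd_member_name=etcd2
ca-k8s-w01 ansible_host=XXX.XXX.XXX.05 ip=XXX.XXX.XXX.05 etcd_member_name=etcd3
ca-k8s-w02 ansible_host=XXX.XXX.XXX.06 ip=XXX.XXX.XXX.06
ca-k8s-w03 ansible_host=XXX.XXX.XXX.07 ip=XXX.XXX.XXX.07

[kube-master]
ca-k8s-m01
ca-k8s-m02

[etcd]
ca-k8s-m01
ca-k8s-m02
ca-k8s-w01

[kube-node]
ca-k8s-w01
ca-k8s-w02
ca-k8s-w03

[k8s-cluster:children]
kube-master
kube-node
```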

Kubectl Installation and Configuration

Kubectl is the primary tool for administering a Kubernetes cluster and deploying applications to it. This section covers installing and configuring the tool:

  1. On the bootstrap virtual machine (ca-k8s-boot), update all apt packages:
    sudo apt-get update
  2. Install snap:
    sudo apt-get install snapd
  3. Install kubectl:
    sudo snap install kubectl --classic
  4. Under the home directory of the Kubernetes administration user (I’ve gone with cadkin) create a directory to hold the kubectl config file:
    mkdir ~/.kube
  5. Log onto one of the master node virtual machines, ca-k8s-m01 for example, and change permissions on the Kubernetes admin.conf file as follows:
    sudo chmod 775 /etc/kubernetes/admin.conf
  6. Back on the bootstrap server, ca-k8s-boot, copy the admin.conf file to the .kube directory:
    sudo scp cadkin@ca-k8s-m01:/etc/kubernetes/admin.conf ~/.kube/config
  7. Check that kubectl has picked up the context of the cluster in the configuration file:
    kubectl config get-contexts

    the output from this command should look something like this:

  8. Let’s take a look at the system pods:
    kubectl get po --all-namespaces
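With a healthy cluster, every pod returned by the command above should be in the Running (or Completed) state. A quick way to spot anything unhealthy is to filter on the STATUS column; in the sketch below a heredoc stands in for real kubectl output, and the sample pod names are purely illustrative:

```shell
# In practice pipe the real listing in:
#   kubectl get po --all-namespaces --no-headers | awk '$4 != "Running" && $4 != "Completed"'
# The heredoc stands in for kubectl output so the filter can be seen working.
awk '$4 != "Running" && $4 != "Completed"' <<'EOF'
kube-system   coredns-58687784f9-h4pp2      1/1   Running            0    1d
kube-system   kube-proxy-zzvgh              1/1   Running            0    1d
kube-system   dns-autoscaler-79599df-xb2kf  0/1   CrashLoopBackOff   12   1d
EOF
# prints only the dns-autoscaler line, the one pod that is not healthy
```

An empty result from the filtered command means nothing in the cluster needs attention.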

Configuring a Storage Plugin

In order for our cluster to use persistent volumes, a storage plugin is required; the ultimate aim is that a Kubernetes storage class is created which can be plugged into the bdc.json configuration file for a big data cluster. Any storage plugin can be used that supports one of the following storage protocols: block (iSCSI or FC), NFS 4.2 or SMB 3.0. The vSphere Cloud Provider from VMware is also an option for anyone whose Kubernetes nodes are virtualized via VMware.
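For context, the storage class ultimately lands in the storage section of the big data cluster deployment configuration. The fragment below is a sketch of that section: the class name matches the one created later in this post, the sizes are purely illustrative, and the exact schema should be checked against the Microsoft documentation for your release:

```json
"storage": {
  "data": {
    "className": "pure-block",
    "accessMode": "ReadWriteOnce",
    "size": "15Gi"
  },
  "logs": {
    "className": "pure-block",
    "accessMode": "ReadWriteOnce",
    "size": "10Gi"
  }
}
```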

For the purposes of this blog post, I will cover the storage plugin for Pure Storage; this comes in the form of a Helm chart and is deployed as follows:

  1. Install the multipath-tools package on each worker node:
    sudo apt-get install multipath-tools
  2. Install Helm 3.0; note that unlike previous versions of Helm, this does not require Tiller (the server-side element of Helm):
    sudo snap install helm --classic
  3. Download the YAML file template for configuring the plugin:
    curl --output pso-values.yaml
  4. Edit the pso-values.yaml file using the text editor of your choice, uncomment lines 89 through 92 and enter the management IP address of a FlashArray along with an API token:
      - MgmtEndPoint: ""
        APIToken: "1426d275-fj34-ed1a-a072-65f07c1b390b"
  5. Syntax check the pso-values.yaml file. YAML can be particularly fussy when it comes to indentation, therefore it’s not a bad idea to run pso-values.yaml through something that validates its contents, such as a YAML lint tool.
  6. Add the Pure chart repository to Helm:
    helm repo add pure https://purestorage.github.io/helm-charts
    helm repo update
  7. Dry run the installation of the Pure storage plugin:
    helm install pure-storage-driver pure/pure-csi --namespace kube-public -f pso-values.yaml --dry-run --debug
  8. Assuming that the dry run of the install returns no errors, perform the actual install:
    helm install pure-storage-driver pure/pure-csi --namespace kube-public -f pso-values.yaml
  9. Sanity test the environment by creating a simple persistent volume claim. The storage plugin in this example creates a persistent volume automatically, therefore there is no need to create a persistent volume separately. First create a yaml file (call this test-pvc.yaml) with the following contents; as before, getting the indentation correct is critical:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: pure-block
  10. Create the persistent volume claim:
    kubectl apply -f test-pvc.yaml
  11. Check that the persistent volume claim exists and has a status of Bound:
    kubectl get pvc
  12. Clean-up by deleting the persistent volume claim:
    kubectl delete pvc test-pvc
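Before the clean-up in step 12, the claim can also be exercised end to end by mounting it in a pod; a minimal sketch, in which the pod name and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pvc-pod
spec:
  containers:
    - name: test
      image: busybox
      command: ["sh", "-c", "echo hello > /mnt/data/test.txt && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /mnt/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc
```

Apply it with kubectl apply -f, confirm the file is readable with kubectl exec, and delete the pod before deleting the claim.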

Coming Up In Part 3

At present, all communications between the cluster and the outside world, colloquially referred to as “north-south traffic”, are handled by something known as a NodePort service. By default we have to make our own provision for load balancing this traffic; luckily, there is an incredibly simple way to achieve this, which will be covered in the next post in this series.
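For reference, a NodePort service is declared along the lines of this minimal sketch (names and ports are illustrative); Kubernetes then opens the chosen port on every node in the cluster:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-nodeport
spec:
  type: NodePort
  selector:
    app: example
  ports:
    - port: 80          # cluster-internal service port
      targetPort: 8080  # container port
      nodePort: 30080   # port opened on every node (default range 30000-32767)
```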
