Building A Kubernetes Cluster For SQL Server 2019 Big Data Clusters, Part 1: Hyper-V Virtual Machine Creation

This blog post is the first in a series detailing how to build a Kubernetes cluster on which to deploy a SQL Server 2019 big data cluster. For the purposes of learning and on-boarding there are three options:

  • Minikube
    A single node cluster that runs on Windows, Linux or macOS
  • A full blown vanilla Kubernetes deployment
  • Kubernetes-as-a-service from a public cloud provider

Minikube is a good learning tool and Microsoft provides instructions for deploying a big data cluster to this platform. However, its single node nature and the fact that application pods run on the master node mean that it does not reflect a cluster that anyone would run in production. Kubernetes-as-a-service is probably the easiest option by far for spinning a cluster up, however it relies on an AWS, Azure or Google Cloud Platform account, hence there is a monetary cost associated with it. This leaves a vanilla deployment of Kubernetes on premises. Based on the assumption that most people will have access to Windows Server version 2008 or above, a relatively cheap way of deploying a Kubernetes cluster is via Linux virtual machines running on Hyper-V.

This blog post provides step by step instructions for creating the virtual machines that will act as the master and worker nodes in the cluster. The cluster we will build will consist of five nodes in total:

  • Two masters
  • Three worker nodes

etcd is a lightweight, highly resilient key-value store that originates from CoreOS. The CoreOS administration guide states that a minimum of 3 etcd members (instances) is required in order for the cluster to tolerate failures, as quoted verbatim from the documentation, viz:

[Image: “cluster size” excerpt from the etcd documentation]

We will therefore go with 3 etcd members.
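
The arithmetic behind that guidance can be sketched in a few lines of shell. This is an illustration rather than anything taken from the documentation quoted above: it assumes the usual majority-quorum rule (a cluster of N members needs N / 2 + 1 of them, rounded down, to be available), and the function names are made up for this example.

```shell
# Majority quorum for an etcd cluster of N members: floor(N / 2) + 1.
etcd_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while a quorum is still available.
etcd_fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for members in 1 2 3; do
    echo "members=$members quorum=$(etcd_quorum "$members") tolerated_failures=$(etcd_fault_tolerance "$members")"
done
```

Under that assumption, one or two members tolerate no failures at all, whereas three members keep a quorum after losing one.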

Addendum 10th January 2019: Virtual Machine Resource Requirements

Since writing this post the resource requirements for worker nodes in the cluster have been clarified by Microsoft thus:

  • The cluster requires a minimum of three worker nodes (one virtual machine per worker node = three in total).
  • Each worker node virtual machine requires a minimum of 32GB of memory. I’ve managed to stand up a big data cluster with 16GB of memory per worker node, however 32GB is the recommended minimum value from Microsoft.
  • Each worker node virtual machine requires a minimum of 8 logical processors.
  • The minimum root partition size for the Linux operating system on the worker nodes (or the docker local file system; for basic implementations the two are the same) is 100GB. You may however wish to make this larger in order to give yourself some extra breathing space.
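
As a rough sketch, the minimums listed above can be turned into a small check to run against the sizes you plan to give each worker. The `meets_minimum` function and the variable names are hypothetical and exist only for this example; the thresholds are the ones quoted in the list.

```shell
# Minimums taken from the list above: 32GB memory, 8 logical processors,
# 100GB root partition.
MIN_MEM_GB=32
MIN_CPUS=8
MIN_ROOT_GB=100

# meets_minimum MEM_GB CPUS ROOT_GB
# Prints each shortfall and returns non-zero if any minimum is missed.
meets_minimum() {
    status=0
    [ "$1" -ge "$MIN_MEM_GB" ]  || { echo "memory: ${1}GB < ${MIN_MEM_GB}GB"; status=1; }
    [ "$2" -ge "$MIN_CPUS" ]    || { echo "processors: $2 < $MIN_CPUS"; status=1; }
    [ "$3" -ge "$MIN_ROOT_GB" ] || { echo "root partition: ${3}GB < ${MIN_ROOT_GB}GB"; status=1; }
    return "$status"
}

# Example: the 16GB worker configuration mentioned above.
meets_minimum 16 8 100 || echo "below the recommended minimum"
```

On a running worker node the processor count can be read with `nproc` and the root partition size with `df -BG /`. Bear in mind that the memory a Linux guest reports is usually a little below the size configured in Hyper-V, so compare against the configured value rather than the reported one.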

Virtual Machine Creation Overview

This will be divided up into three steps:

  1. Creation of a base virtual machine.
  2. Use of Hyper-V export and import to replicate the base virtual machine, so that we end up with the six virtual machines we require:
    • Two virtual machines for the cluster’s master nodes
    • Three virtual machines to act as worker nodes
    • One virtual machine from which to bootstrap the cluster using Kubespray
    Note: there will be one etcd instance per master node and one instance on one of the worker nodes.
  3. Assigning a static IP address to each machine.

Base Virtual Machine Creation

1. From Hyper-V Manager select New -> Virtual Machine.

2. The virtual machine creation wizard should start. Select next and give the machine a location; in this example we will use D:\Virtual Machines. You can use any location you wish, however it’s probably best to avoid using the system drive:
3. Select “Generation 2” for the type of the machine and then hit next:
4. Select 2048 MB for the amount of memory the virtual machine will use and hit next:
5. Select the Intel Gigabit NIC as the adapter from the connection pull down list and hit next:
6. On the connect virtual hard disk screen, leave the options as they are and hit next:
7. In the installation options section, select the second radio button, which is to install the operating system from a bootable ISO, and then browse to the location of the ISO. For the purposes of this example Ubuntu 16.04 (server version) will be used. Hit next:
8. On the summary screen hit Finish:
9. Right click on the virtual machine (it should be in a state of Off), select settings -> Security and then un-check the “Enable secure boot” box. Hit apply and then OK:
10. Right click on the virtual machine, select connect and start the virtual machine. Hit the option to “Install Ubuntu Server”:
11. Hit the language option for “English – English”, or whatever language is appropriate for your region:
12. Select the appropriate region, I’ll use United Kingdom:
13. Go through the keyboard detection sequence; in this example it should detect that my keyboard has a GB layout:
14. The installer should spend a few seconds or so scanning the CD-ROM (your bootable ISO) and then it will determine an IPv6 address for the virtual machine via DHCP. At the hostname prompt give the virtual machine a name. You can go with whatever you wish, however it’s probably not a bad idea to give it the same name as the actual virtual machine; in this example I will use ca-K8s-base. Hit return to continue:
15. Enter a username for the user you wish to create. I will be incredibly imaginative and go with cadkin and hit return, however choose something that is meaningful to you:
16. Enter the password for your user, hit enter, then enter it again to confirm it and hit enter:
17. Select ‘No’ for the option to encrypt the home directory. Were this a production environment you might like to select ‘Yes’, but for the purposes of this exercise ‘No’ is fine. Hit enter:

18. The installer will now request that you confirm your timezone. Based on what I have entered in the example this should be “Europe/London”; you can change this if required. Hit enter:

19. Partition disks time. Select “Guided – use entire disk” (the first option) and hit enter:

20. Select the disk to partition, there should be just one, and hit enter:

21. Hit enter to “Finish partitioning and save changes to disk”:

22. Select Yes to write the changes to disk:
23. Installation of the operating system will now begin. Part way through the installation, at roughly 83%, you will be prompted for HTTP proxy information. Should you need a proxy to access the outside world, enter it; otherwise leave the text box blank and hit continue:

24. When prompted for the option of selecting automatic update, select “No automatic updates”:

25. On the Software selection screen select “OpenSSH server” and then hit continue:

Addendum 11th January 2019: supervisord container not staying up

Various people have reported that, despite having managed to deploy big data clusters successfully on both Hyper-V and Azure, they have encountered problems with Kubernetes clusters running on VMware. The specific problem is that the supervisord container will not stay up. There is an anecdotal belief that this relates to the use of loopback storage when using VMware. The fix is to upgrade Ubuntu 16.04 to the latest version of the kernel; the first command below (the apt-get line) helps to do exactly that.

Your machine will go through its boot sequence, after which you will be prompted to log in as the user that you created in steps 15 and 16. Once your machine is back up, issue the following commands:

sudo apt-get install --install-recommends linux-generic-hwe-16.04
sudo apt update
sudo apt upgrade
sudo reboot
sudo apt autoremove
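
After the reboot it can be useful to confirm that the machine is actually running a newer kernel. The following is a minimal sketch, assuming GNU `sort` with its `-V` (version sort) option is available, as it is on Ubuntu. The `version_ge` helper is hypothetical, and 4.4 is used as the baseline because that is the kernel series Ubuntu 16.04 originally shipped with.

```shell
# version_ge A B: succeeds when version A is greater than or equal to B.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

running="$(uname -r)"
if version_ge "$running" "4.5"; then
    echo "kernel $running is newer than the 4.4 series"
else
    echo "kernel $running is still 4.4 or older"
fi
```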

26. Addendum 27th February 2019: kubeadm ethtool dependency

The latest version of Kubespray, which the next post in this series features, leverages kubeadm. This has a dependency on the ethtool package; install ethtool as follows:

sudo apt update
sudo apt-get install ethtool
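
A quick way to confirm the package is in place before moving on is sketched below. The `have_cmd` helper is a hypothetical name for this example, and the check assumes the directory containing ethtool (typically /sbin) is on your PATH.

```shell
# have_cmd NAME: succeeds if NAME resolves to a command on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd ethtool; then
    echo "ethtool is installed"
else
    echo "ethtool is missing"
fi
```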

27. The base image is now complete.

28. Export the virtual machine: right click on the virtual machine that has been created in the preceding steps, select export, enter a path and hit export. I will use D:\templates\ubuntu16.04:

Cloning The Base Virtual Machine

1. Navigate to the directory containing the virtual machine that has just been exported, and note the full path of the vmcx file:
2. Run the following PowerShell script:

# Import a copy of the exported base machine for each node and rename the copy.
# Piping the imported machine into Rename-VM avoids also renaming any other
# machine that is still called ca-K8s-base.
Import-VM -Path 'D:\templates\ubuntu16.04\ca-K8s-base\Virtual Machines\<vmcx file>' -VhdDestinationPath 'D:\Virtual Machines\ca-K8s-m01' -Copy -GenerateNewId | Rename-VM -NewName ca-K8s-m01
Import-VM -Path 'D:\templates\ubuntu16.04\ca-K8s-base\Virtual Machines\<vmcx file>' -VhdDestinationPath 'D:\Virtual Machines\ca-K8s-m02' -Copy -GenerateNewId | Rename-VM -NewName ca-K8s-m02
Import-VM -Path 'D:\templates\ubuntu16.04\ca-K8s-base\Virtual Machines\<vmcx file>' -VhdDestinationPath 'D:\Virtual Machines\ca-K8s-w01' -Copy -GenerateNewId | Rename-VM -NewName ca-K8s-w01
Import-VM -Path 'D:\templates\ubuntu16.04\ca-K8s-base\Virtual Machines\<vmcx file>' -VhdDestinationPath 'D:\Virtual Machines\ca-K8s-w02' -Copy -GenerateNewId | Rename-VM -NewName ca-K8s-w02
Import-VM -Path 'D:\templates\ubuntu16.04\ca-K8s-base\Virtual Machines\<vmcx file>' -VhdDestinationPath 'D:\Virtual Machines\ca-K8s-w03' -Copy -GenerateNewId | Rename-VM -NewName ca-K8s-w03
Import-VM -Path 'D:\templates\ubuntu16.04\ca-K8s-base\Virtual Machines\<vmcx file>' -VhdDestinationPath 'D:\Virtual Machines\ca-K8s-boot' -Copy -GenerateNewId | Rename-VM -NewName ca-K8s-boot

3. The virtual machines suffixed with m01 and m02 will be the master nodes for our cluster, those with the w01, w02 and w03 suffixes will be the worker nodes, and finally the machine with boot in its name is the machine from which we will bootstrap the cluster via Kubespray. At present the memory for each of the worker virtual machines should be 2GB; change this to 16GB for each of them (or to 32GB, the minimum Microsoft recommends).

4. At present each virtual machine will use DHCP. The configuration for this is stored in the file /etc/network/interfaces and will look something like this:

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp

The contents of this file need to be altered to look like this:

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
	address XXX.XXX.XXX.XXX
	netmask XXX.XXX.XXX.XXX
	gateway XXX.XXX.XXX.XXX
	dns-nameservers XXX.XXX.XXX.XXX

If for any reason a DNS name server is not accessible or available, host name to IP address mappings can be defined in the /etc/hosts file. Refer to the Ubuntu hosts file manual page for specific information on the format of this file.

Note: the white space at the start of each line for address, netmask, gateway and dns-nameservers should be a single tab character.
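
Because the same stanza has to be written on six machines with a different address each time, it can help to generate it rather than type it. The sketch below is only an illustration: the `render_static_stanza` function is hypothetical, and the addresses come from the 192.0.2.0/24 documentation range, so substitute the values for your own network.

```shell
# render_static_stanza ADDRESS NETMASK GATEWAY DNS
# Prints an eth0 stanza for /etc/network/interfaces; each option line
# starts with a single tab character.
render_static_stanza() {
    printf 'auto eth0\n'
    printf 'iface eth0 inet static\n'
    printf '\taddress %s\n' "$1"
    printf '\tnetmask %s\n' "$2"
    printf '\tgateway %s\n' "$3"
    printf '\tdns-nameservers %s\n' "$4"
}

# Placeholder values from the 192.0.2.0/24 documentation range.
render_static_stanza 192.0.2.11 255.255.255.0 192.0.2.1 192.0.2.53
```

The output is meant to be reviewed and then pasted over the eth0 section of the file; the function does not modify /etc/network/interfaces itself.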

5. Change the hostname of the machine:

sudo hostnamectl set-hostname NEW_NAME_HERE

6. Reboot the machine for the static IP address change to take effect:

sudo reboot


7. In order that the host name of each virtual machine resolves back to its IP address, add a line to the /etc/hosts file. For example, for the first master virtual machine, assuming this is called ca-K8s-m01, the first two lines of the /etc/hosts file should look like:
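
As a hedged illustration of such an entry, the two lines could be generated as follows. The `hosts_lines` helper is hypothetical and the address is a placeholder from the documentation range, not the value used for this cluster; the layout assumes the standard localhost line comes first, followed by the line for the machine itself.

```shell
# hosts_lines IP HOSTNAME: prints the localhost line followed by a line
# mapping HOSTNAME to its static IP address.
hosts_lines() {
    printf '127.0.0.1\tlocalhost\n'
    printf '%s\t%s\n' "$1" "$2"
}

# 192.0.2.11 is a placeholder from the documentation address range.
hosts_lines 192.0.2.11 ca-K8s-m01
```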


Coming Up In Part 2

Part 2 of this series will walk through deploying the tools necessary to bootstrap the cluster onto the bootstrap virtual machine, and the process of cluster creation via Kubespray.

