Part 1 of this series covered building the virtualized infrastructure on which to create a Kubernetes cluster. There are a variety of tools for building clusters, including Kops, Kubespray and kubeadm; kubeadm is perhaps the most popular tool for bootstrapping clusters, followed by Kubespray.
Kubespray
In essence, Kubespray is a collection of Ansible playbooks: YAML files that specify what actions should take place against one or more machines listed in an inventory.ini file, which resides in what is known as an inventory. Of all the infrastructure-as-code tools available at the time of writing, Ansible is the most popular and has the greatest traction. Examples of playbooks produced by Microsoft can be found on GitHub for automating tasks in Azure and deploying SQL Server availability groups on Linux. The good news for anyone into PowerShell is that PowerShell modules can be installed and PowerShell commands executed via Ansible, and there are people already using PowerShell Desired State Configuration with Ansible. Ansible's popularity is down to the fact that it is easy to pick up and agent-less, because it relies on ssh; hence one of the steps in this post includes the creation of keys for ssh. This free tutorial is highly recommended for anyone wishing to pick up Ansible.
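To give a flavour of what a playbook looks like, here is a minimal sketch (not taken from Kubespray) that pings every host in an inventory and makes sure curl is installed on the worker nodes; the group name kube-node is an assumption based on the inventory layout shown later in this post:

- hosts: all
  tasks:
    - name: Check connectivity to every host in the inventory
      ping:

- hosts: kube-node
  become: yes
  tasks:
    - name: Ensure curl is present on the worker nodes
      apt:
        name: curl
        state: present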
Cluster Topology Recap
The cluster this blog post covers the creation of comprises the following nodes and etcd instances. Note that any alphabetic characters used have to be lower case; you can use whatever naming convention you like, but below is the naming convention I have elected to go with:
two master nodes:
ca-k8s-m01
ca-k8s-m02
three worker nodes:
ca-k8s-w01
ca-k8s-w02
ca-k8s-w03
three etcd instances, residing on:
ca-k8s-m01
ca-k8s-m02
ca-k8s-w01
The cluster will be deployed and administered from ca-k8s-boot.
Deploying Big Data Clusters On VMware
The process outlined in this blog post has been used to create Kubernetes clusters on VMware, ESXi 6.7 to be exact, which have then been used to successfully deploy big data clusters.
Cluster Creation
Disclaimer
Kubespray is open-source software and as such it is subject to change; the information in this blog post is correct according to the version of Kubespray that was available on 10th April 2020.
- Add entries for the two master and three worker nodes to the /etc/hosts file on the bootstrap virtual machine, as per the sketch below.
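As an illustration only, using the node names above and the placeholder addresses that appear later in the inventory file, the entries might look like this:

XXX.XXX.XXX.03 ca-k8s-m01
XXX.XXX.XXX.04 ca-k8s-m02
XXX.XXX.XXX.05 ca-k8s-w01
XXX.XXX.XXX.06 ca-k8s-w02
XXX.XXX.XXX.07 ca-k8s-w03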
- Check that each machine can be pinged from the bootstrap machine in order to verify that the networking setup is sane (a convenience sketch follows below).
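Assuming the node names above, a quick way to check every node in one go is a simple shell loop that pings each node once by name:

for host in ca-k8s-m01 ca-k8s-m02 ca-k8s-w01 ca-k8s-w02 ca-k8s-w03; do ping -c 1 $host; done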
- Create a key pair on the bootstrap machine, which I've called ca-k8s-boot in my example:
ssh-keygen
- This will result in the following output, hit enter to accept the default:
Generating public/private rsa key pair.
Enter file in which to save the key (/your_home/.ssh/id_rsa):
- Enter a passphrase and confirm it.
- We now need to copy the public key element of the key pair to each of the master and worker node virtual machines via ssh-copy-id:
ssh-copy-id <username>@<ip address>
After issuing this, a prompt will request confirmation that you actually want to copy the public key, followed by a prompt requesting the password of the user entered. Repeat this for each of the master and worker nodes, as sketched below.
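Assuming the user cadkin (the Kubernetes administration user used later in this post) and the node names above, the copies can be performed in one pass with a loop; treat this as a convenience sketch rather than a required step:

for host in ca-k8s-m01 ca-k8s-m02 ca-k8s-w01 ca-k8s-w02 ca-k8s-w03; do ssh-copy-id cadkin@$host; done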
- Cache the passphrase on the bootstrap machine by entering the following commands:
ssh-agent /bin/bash
ssh-add ~/.ssh/id_rsa
A prompt will appear requesting that the passphrase created when the key pair was generated be entered.
- Update the apt packages on each virtual machine. If errors are encountered stating that a repository cannot be resolved, for example:
Could not resolve 'security.ubuntu.com'
you are likely to have a name resolution issue; if this is the case, perform the following steps to rectify the problem:
sudo vi /etc/resolvconf/resolv.conf.d/base
add the following lines to the file:
nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver <ip address of a DNS name server on your network>
issue:
sudo reboot
then issue:
sudo apt-get update
sudo apt-get upgrade
- Download Kubespray:
git clone https://github.com/kubernetes-sigs/kubespray.git
- Install the package manager for Python 3:
sudo apt-get install python3-pip
- Addendum 1st January 2019
In the working directory that the git clone was performed from, change directory into the newly created kubespray directory and install the Python packages specified in the requirements.txt file. This pulls in all the Python packages and software required for the rest of the Kubernetes cluster creation process (including Ansible, which is used in the next step):
pip3 install -r requirements.txt
- Install the sshpass package:
sudo apt-get install sshpass
- With the kubespray directory as your working directory, create a copy of the sample inventory directory for your cluster:
cp -rfp inventory/sample inventory/<cluster_name>
- In the directory you created as part of the previous step under inventory, edit the inventory.ini file; for the cluster described above it should look like this:
[all]
ca-k8s-m01 ansible_host=XXX.XXX.XXX.03 ip=XXX.XXX.XXX.03 etcd_member_name=etcd1
ca-k8s-m02 ansible_host=XXX.XXX.XXX.04 ip=XXX.XXX.XXX.04 etcd_member_name=etcd2
ca-k8s-w01 ansible_host=XXX.XXX.XXX.05 ip=XXX.XXX.XXX.05 etcd_member_name=etcd3
ca-k8s-w02 ansible_host=XXX.XXX.XXX.06 ip=XXX.XXX.XXX.06
ca-k8s-w03 ansible_host=XXX.XXX.XXX.07 ip=XXX.XXX.XXX.07

[kube-master]
ca-k8s-m01
ca-k8s-m02

[etcd]
ca-k8s-m01
ca-k8s-m02
ca-k8s-w01

[kube-node]
ca-k8s-w01
ca-k8s-w02
ca-k8s-w03

[calico-rr]

[k8s-cluster:children]
kube-master
kube-node
calico-rr
- Test that Ansible can connect to each host by issuing the following command:
ansible -i inventory/<cluster_name>/inventory.ini all -m ping
providing ssh has been configured correctly, the Ansible ping command should result in output similar to this:
ca-k8s-w03 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
ca-k8s-m02 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
ca-k8s-w02 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
ca-k8s-w01 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
ca-k8s-m01 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
- Add the following line to the [defaults] section of the ansible.cfg file in the kubespray directory:
log_path = ./kubespray.log
- Run the playbook using the following command from within the kubespray directory:
ansible-playbook -i inventory/<cluster_name>/inventory.ini --become --become-user=root -K cluster.yml
- Open up the kubespray.log file in vi and perform the following key strokes and commands:
ctrl+[
:set nowrap
shift+G
Issuing these commands should disable line wrapping and display the end of the file. If the playbook has run successfully, the values associated with unreachable and failed in the play recap section of the log should all be zero, as per the sketch below:
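As an illustration only (the task counts will differ from run to run; the figures below are placeholders), a healthy play recap looks something like this:

PLAY RECAP *********************************************************************
ca-k8s-m01 : ok=500 changed=100 unreachable=0 failed=0
ca-k8s-m02 : ok=450 changed=90  unreachable=0 failed=0
ca-k8s-w01 : ok=400 changed=80  unreachable=0 failed=0
ca-k8s-w02 : ok=350 changed=70  unreachable=0 failed=0
ca-k8s-w03 : ok=350 changed=70  unreachable=0 failed=0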
Kubectl Installation and Configuration
Kubectl is the primary tool for administering a Kubernetes cluster and deploying applications to it. This section covers installing and configuring the tool:
- On the bootstrap virtual machine (ca-k8s-boot), update all apt packages:
sudo apt-get update
- Install snap:
sudo apt-get install snapd
- Install kubectl:
sudo snap install kubectl --classic
- Under the home directory of the Kubernetes administration user (I've gone with cadkin), create a directory to hold the kubectl config file:
mkdir ~/.kube
- Log onto one of the master node virtual machines, ca-k8s-m01 for example, and change permissions on the Kubernetes admin.conf file as follows:
sudo chmod 775 /etc/kubernetes/admin.conf
- On the boot server (ca-k8s-boot), copy the admin.conf file to the .kube directory:
sudo scp cadkin@ca-k8s-m01:/etc/kubernetes/admin.conf ~/.kube/config
- Check that kubectl has picked up the context of the cluster in the configuration file:
kubectl config get-contexts
the output from this command should look something like this:
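As a sketch only, assuming the default context name created by Kubespray (kubernetes-admin@cluster.local is an assumption and may differ on your cluster), the output resembles:

CURRENT   NAME                             CLUSTER         AUTHINFO           NAMESPACE
*         kubernetes-admin@cluster.local   cluster.local   kubernetes-admin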
- Let's take a look at the system pods:
kubectl get po --all-namespaces
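The exact pods depend on the Kubespray configuration, but the output should include entries along these lines (the names, suffixes and ages below are purely illustrative):

NAMESPACE     NAME                            READY   STATUS    RESTARTS   AGE
kube-system   calico-node-xxxxx               1/1     Running   0          5m
kube-system   coredns-xxxxxxxxxx-xxxxx        1/1     Running   0          5m
kube-system   kube-apiserver-ca-k8s-m01       1/1     Running   0          6m
kube-system   kube-proxy-xxxxx                1/1     Running   0          6m
kube-system   kube-scheduler-ca-k8s-m01       1/1     Running   0          6m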
Configuring a Storage Plugin
In order that our cluster can use persistent volumes, a storage plugin is required; the ultimate aim is that a Kubernetes storage class is created which can be plugged into the bdc.json configuration file for a big data cluster. Any storage plugin can be used that supports one of the following storage protocols: block (iSCSI or FC), NFS 4.2 or SMB 3.0. The vSphere cloud provider from VMware is also an option for anyone whose Kubernetes nodes are virtualized via VMware.
For the purposes of this blog post, I will cover the storage plugin for Pure Storage. This comes in the form of a Helm chart and is deployed as follows:
- Install the multipath-tools package on each worker node:
sudo apt-get install multipath-tools
- Install Helm 3.0, note that unlike previous versions of Helm, this does not require tiller (the server side element of Helm):
sudo snap install helm --classic
- Download the YAML file template for configuring the plugin:
curl --output pso-values.yaml https://raw.githubusercontent.com/purestorage/helm-charts/master/pure-csi/values.yaml
- Edit the pso-values.yaml file using the text editor of your choice, uncomment lines 89 through to 92 and enter the IP address of a FlashArray along with an API token:
arrays:
  FlashArrays:
    - MgmtEndPoint: "192.168.4.125"
      APIToken: "1426d275-fj34-ed1a-a072-65f07c1b390b"
- Syntax check the pso-values.yaml file. YAML can be particularly fussy when it comes to indentation, therefore it's not a bad idea to run the pso-values.yaml file through something that validates its contents, such as http://www.yamllint.com/.
- Add the Pure chart repository to Helm:
helm repo add pure https://purestorage.github.io/helm-charts
helm repo update
- Dry run the installation of the Pure storage plugin:
helm install pure-storage-driver pure/pure-csi --namespace kube-public -f pso-values.yaml --dry-run --debug
- Assuming that the dry run of the install returns no errors, perform the actual install:
helm install pure-storage-driver pure/pure-csi --namespace kube-public -f pso-values.yaml
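Before moving on, it can be worth checking that the storage class used later in this post (pure-block) now exists; a quick sanity check, with the caveat that the class names on your cluster should be verified against your own output:

kubectl get storageclass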
- Sanity test the environment by creating a simple persistent volume claim. The storage plugin in this example creates a persistent volume automatically, therefore there is no need to create a persistent volume separately. First create a YAML file (call this test-pvc.yaml) with the following contents; as before, getting the indentation correct is critical:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: pure-block
- Create the persistent volume claim:
kubectl apply -f test-pvc.yaml
- Check that the persistent volume claim exists and has a status of Bound, as per the sketch below:
kubectl get pvc
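The output should resemble the following; the volume name is generated by the plugin, so the value below is purely illustrative:

NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
test-pvc   Bound    pvc-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX   1Gi        RWO            pure-block     10s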
- Clean-up by deleting the persistent volume claim:
kubectl delete pvc test-pvc
Coming Up In Part 3
At present all communication between the cluster and the outside world, colloquially referred to as "north-south traffic", is handled by something known as a node port. By default we have to make our own provision for load balancing this traffic; luckily there is an incredibly simple way to achieve this, which will be covered in the next post in this series.