Building A Kubernetes Cluster For SQL Server 2019 Big Data Clusters, Part 2: Kubernetes Cluster Creation

Part 1 of this series covered the creation of the virtualized infrastructure on which to build a Kubernetes cluster. There are a variety of tools for building clusters, including Kops, Kubespray and kubeadm. Kubeadm is perhaps the most popular tool for bootstrapping clusters, followed by Kubespray.

Kubespray

In essence, Kubespray is a collection of Ansible playbooks: YAML files that specify what actions should take place against one or more machines listed in a hosts.ini file, which resides in what is known as an inventory. Of all the infrastructure-as-code tools available at the time of writing, Ansible is the most popular and has the greatest traction. Examples of playbooks produced by Microsoft can be found on GitHub for automating tasks in Azure and deploying SQL Server availability groups on Linux. The good news for anyone into PowerShell is that PowerShell modules can be installed and PowerShell commands executed via Ansible, and there are people already using PowerShell Desired State Configuration with Ansible. Ansible's popularity is down to the fact that it is easy to pick up and agent-less, because it relies on SSH; hence one of the steps in this post includes the creation of keys for SSH. This free tutorial is highly recommended for anyone wishing to pick up Ansible.
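To give a flavour of what this looks like in practice, below is a minimal sketch; it is not taken from Kubespray, and the playbook file name, inventory path and task are purely illustrative. A playbook saved as, say, ensure-python3.yml might contain:

    # ensure-python3.yml - hypothetical one-task playbook
    - hosts: all
      become: yes
      tasks:
        - name: Ensure python3 is installed
          apt:
            name: python3
            state: present

and would be applied to every host in an inventory, over plain SSH with no agent on the targets, via:

    ansible-playbook -i inventory/sample/hosts.ini ensure-python3.yml --ask-become-pass

Kubespray's cluster.yml, run later in this post, is the same idea scaled up to the hundreds of tasks needed to stand up a full cluster.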

Cluster Topology Recap

This blog post covers the creation of a cluster comprising the following nodes and etcd instances. Note that any alphabetic characters used have to be lower case. You can use whatever naming convention you like; below is the one I have elected to go with:

two master nodes:

ca-k8s-m01

ca-k8s-m02

three worker nodes:

ca-k8s-w01

ca-k8s-w02

ca-k8s-w03

three etcd instances, residing on:

ca-k8s-m01

ca-k8s-m02

ca-k8s-w01

The cluster will be deployed and administered from ca-k8s-boot.

Deploying Big Data Clusters On VMware

The process outlined in this blog post has been used to create Kubernetes clusters on VMware (ESXi 6.7, to be exact), which have then been used to deploy big data clusters with success.

Cluster Creation

  1. Add entries for the two master and three worker nodes to the /etc/hosts file on the bootstrap virtual machine.
  2. Check that each machine can be pinged from the bootstrap machine in order to verify that the networking setup is sane (a short sketch of this check appears after these steps).
  3. Create a key pair on the bootstrap machine, which I've called ca-k8s-boot in my example:
    ssh-keygen
  4. This will result in the following output; hit enter to accept the default:
    Generating public/private rsa key pair.
    Enter file in which to save the key (/your_home/.ssh/id_rsa)
  5. Enter a passphrase and confirm it; ssh-keygen will then display the new key's fingerprint and randomart image.
  6. We now need to copy the public key element of the key pair to the master and worker node virtual machines via ssh-copy-id:
    ssh-copy-id <username>@<ip address>

    after issuing this command, a prompt will request confirmation that you actually want to copy the public key, followed by a prompt for the password of the user specified.

  7. Cache the passphrase on the bootstrap machine by entering the following commands:
    ssh-agent /bin/bash
    ssh-add ~/.ssh/id_rsa

    a prompt will appear requesting that the passphrase created in step 5 be entered.

  8. Update the apt packages on each virtual machine. If errors are encountered stating that a repository cannot be resolved, for example:
    Could not resolve 'security.ubuntu.com'

    you are likely to have a name resolution issue. If this is the case, perform the following steps to rectify the problem:

    sudo vi /etc/resolvconf/resolv.conf.d/base

    add the following lines to the file:

    nameserver 8.8.8.8
    nameserver 8.8.4.4
    nameserver <ip address of a DNS name server on your network>

    issue:

    sudo reboot

    then issue:

    sudo apt-get update
  9. Download Kubespray:
    git clone https://github.com/kubernetes-incubator/kubespray.git
  10. Install the package manager for Python 3:
    sudo apt-get install python3-pip
  11. Addendum 1st January 2019
    In the working directory that the git clone was performed from, change directory to the kubespray directory and install the Python packages specified in the requirements.txt file. This pulls in all the Python packages and software required for the rest of the Kubernetes cluster creation process (including Ansible, which is used in the next step):

    pip3 install -r requirements.txt
  12. With the kubespray directory as your working directory, create a copy of the sample inventory directory for your cluster:
    cp -rfp inventory/sample inventory/<cluster_name>
  13. In the directory created under inventory as part of the previous step, edit the hosts.ini file; for this cluster it should look like this:
    [all]
    ca-k8s-m01 ansible_host=XXX.XXX.XXX.03 ip=XXX.XXX.XXX.03
    ca-k8s-m02 ansible_host=XXX.XXX.XXX.04 ip=XXX.XXX.XXX.04
    ca-k8s-w01 ansible_host=XXX.XXX.XXX.05 ip=XXX.XXX.XXX.05
    ca-k8s-w02 ansible_host=XXX.XXX.XXX.06 ip=XXX.XXX.XXX.06
    ca-k8s-w03 ansible_host=XXX.XXX.XXX.07 ip=XXX.XXX.XXX.07
    
    [kube-master]
    ca-k8s-m01
    ca-k8s-m02
    
    [etcd]
    ca-k8s-m01
    ca-k8s-m02
    ca-k8s-w01
    
    [kube-node]
    ca-k8s-w01
    ca-k8s-w02
    ca-k8s-w03
    
    [k8s-cluster:children]
    kube-master
    kube-node
    
    [calico-rr]
    
    [vault]
    ca-k8s-m01
    ca-k8s-m02
    ca-k8s-w01
    
    [all:vars]
    ansible_python_interpreter=/usr/bin/python3
  14. Test that Ansible can connect to each host by issuing the following command:
    ansible -i inventory/<cluster_name>/hosts.ini all -m ping

    providing SSH has been configured correctly, the Ansible ping command should result in output that looks something like this:

    ca-k8s-w03 | SUCCESS => {
        "changed": false,
        "ping": "pong"
    }
    ca-k8s-m02 | SUCCESS => {
        "changed": false,
        "ping": "pong"
    }
    ca-k8s-w02 | SUCCESS => {
        "changed": false,
        "ping": "pong"
    }
    ca-k8s-w01 | SUCCESS => {
        "changed": false,
        "ping": "pong"
    }
    ca-k8s-m01 | SUCCESS => {
        "changed": false,
        "ping": "pong"
    }
  15. Add the following line to the [defaults] section of the ansible.cfg file in the kubespray directory:
    log_path = ./kubespray.log
  16. Run the playbook using the following command from within the kubespray directory:
    ansible-playbook -i inventory/<cluster_name>/hosts.ini --become --become-user=root --ask-sudo-pass cluster.yml
  17. Open up the kubespray.log file in vi and perform the following:
    ctrl+[
    :set nowrap
    shift+G

    issuing these commands should disable line wrapping and display the end of the file. If the playbook has run successfully, the values associated with unreachable and failed in the play recap section of the log should all be zero.
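As promised in step 2, here is a short sketch of the connectivity check run from the bootstrap machine; it simply pings each node by the names added to /etc/hosts in step 1 (substitute your own node names if you have used a different naming convention):

    # Quick sanity check from ca-k8s-boot: every node added to /etc/hosts in
    # step 1 should resolve and answer a single ping within two seconds.
    for node in ca-k8s-m01 ca-k8s-m02 ca-k8s-w01 ca-k8s-w02 ca-k8s-w03; do
        if ping -c 1 -W 2 "$node" > /dev/null 2>&1; then
            echo "$node reachable"
        else
            echo "$node UNREACHABLE"
        fi
    done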

Kubectl Installation and Configuration

Kubectl is the primary tool for administering a Kubernetes cluster and deploying applications to it. This section covers installing and configuring the tool; a few optional verification commands follow the steps below:

  1. On the bootstrap virtual machine, ca-k8s-boot, update all apt packages:
    sudo apt-get update
  2. Install snap:
    sudo apt-get install snapd
  3. Install kubectl:
    sudo snap install kubectl --classic
  4. Under the home directory of the Kubernetes administration user (I’ve gone with cadkin) create a directory to hold the kubectl config file:
    mkdir ~/.kube
  5. Log onto one of the master node virtual machines, ca-k8s-m01 for example, and change permissions on the Kubernetes admin.conf file as follows:
    sudo chmod 775 /etc/kubernetes/admin.conf
  6. On the bootstrap server, ca-k8s-boot, copy the admin.conf file to the .kube directory:
    sudo scp cadkin@ca-k8s-m01:/etc/kubernetes/admin.conf ~/.kube/config
  7. Check that kubectl has picked up the context of the cluster in the configuration file:
    kubectl config get-contexts

    the output from this command should show a single context for the new cluster, marked with an asterisk as the current context.

  8. Let's take a look at the system pods:
    kubectl get po --all-namespaces

Coming Up In Part 3

At present, all communication between the cluster and the outside world, colloquially referred to as "north-south traffic", is handled by something known as a NodePort. By default we have to make our own provision for load balancing this traffic; luckily, there is an incredibly simple way to achieve this, which will be covered in the next post in this series.
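For context, a NodePort simply publishes a service on a static port (30000-32767 by default) on every node in the cluster. A hypothetical illustration, using a deployment name that has nothing to do with the big data cluster itself:

    # Expose a (hypothetical) deployment outside the cluster via a NodePort;
    # any node's IP address can then be used to reach it on the allocated port.
    kubectl expose deployment my-app --type=NodePort --port=80
    # The PORT(S) column shows the allocated node port, e.g. 80:3xxxx/TCP.
    kubectl get service my-app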
