Deploying Azure Data Services via Terraform Part 7: Deploying an Azure Arc enabled Data Services Controller

Part seven of this series focuses on deploying an Azure Arc enabled Data Services controller to a Kubernetes cluster. As per the closing comments of the last blog post, PX Backup will be covered in part 9 of the series, when we will have a big data cluster to back up and restore.

All the content in this series of blog posts relates to this GitHub repo; the content for this post relates to the azure_arc_ds_controller module specifically.

Where Were We ?

If you have been following this series, you should have:

  • a basic understanding of Terraform
  • a Kubernetes cluster that you can connect to using kubectl
  • a basic understanding of Kubernetes services
  • a working MetalLB load balancer
  • a basic understanding of how storage works in the world of Kubernetes
  • a Kubernetes storage solution in the form of PX Store; alternatively, you can use any solution (for the purposes of this series) that supports persistent volumes, however, to use the backup solution in part 9 of the series you will need to use something that supports CSI
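As a quick sanity check on the storage prerequisite, you can list the storage classes on the cluster and inspect their provisioners; the storage class name used below is just an example, yours will depend on how PX Store (or your chosen solution) was installed:

```shell
# List the storage classes available on the cluster; for the backup
# tooling in part 9 you want one backed by a CSI provisioner.
kubectl get storageclass

# Example name only - substitute the storage class from your own install.
STORAGE_CLASS=px-storage-class
kubectl describe storageclass "$STORAGE_CLASS"
```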

Azure Arc enabled Data Services – What Is It ?

Azure Arc is the linchpin of Microsoft’s hybrid and multi-cloud strategy and it encompasses:

  • Arc for servers
  • Arc for Kubernetes
  • Azure Arc enabled SQL Server
  • Arc enabled Data Services

The focus of this series of blog posts is on Azure Arc enabled Data Services. Simply put, at the time of writing this post it allows you to run Azure SQL managed instances and PostgreSQL Hyperscale wherever you can run Kubernetes, be that on-premises or in the public cloud. Microsoft’s ethos behind Azure Arc enabled Data Services is to bring Azure to people who, for whatever reason, cannot make the trip to Azure. I strongly suspect that more data platforms currently only found in Azure will make their way onto the Azure Arc enabled Data Services platform in the fullness of time.

Azure Arc enabled Data Services Architecture 101

The Azure Arc enabled Data Services stack comprises three fundamental layers:

  • A Kubernetes cluster
    This can be AKS, EKS, GKE, Red Hat OpenShift, VMware Tanzu or vanilla Kubernetes deployed via kubeadm . . . and you should know by now that Kubespray – Ansible playbooks that invoke kubeadm – is my preferred method for deploying vanilla Kubernetes. To quote Kelsey Hightower: “Kubernetes is a platform for building platforms”; in this specific instance the endgame platform is Azure Arc enabled Data Services.
  • The Data Controller
    The controller extends the Azure Resource Manager to the Kubernetes cluster(s) that underpin the Azure Arc enabled Data Services stack, and it sends telemetry – metrics and logging – to Azure. The Data Controller furnishes the following services and capabilities:
    • a controller service
    • an API endpoint
    • provisioning management
    • management and performance dashboards
    • metrics
    • logging
    • managed backup/restore
    • high-availability service coordination
  • Data Services
    This is the topmost layer of the stack; at the time of writing this post, PostgreSQL Hyperscale and Azure Arc managed SQL instances can be deployed to a data controller. Each data services instance is associated with exactly one controller, but a controller can be associated with multiple instances, i.e. the instance-to-controller relationship is M:1. For a multi-tenant database-as-a-service platform, each tenant – a group associated with a particular set of end users – would have a dedicated controller.

Drilling Down Into The Data Controller

The data controller can be deployed with one of two connectivity modes:

  • Indirect mode
    There is no direct connection between the data controller and Azure; any upload of data to Azure has to be instigated manually via azdata. The focal point for all management and deployment functions is the data controller itself. This deployment scenario is intended for “dark sites”: organizations such as banks and government agencies that tend to limit connectivity to the public cloud as much as possible.
  • Direct mode
    The data controller is constantly connected to Azure via the Azure Arc enabled Kubernetes agent and Azure itself becomes the control plane. Metrics and logs are constantly delivered to Azure and the full range of tools that can leverage the ARM API can be used. At the time of writing this post, direct mode is not supported in the public preview of Azure Arc enabled Data Services.
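In indirect mode, the round trip to Azure looks something like the sketch below. The azdata arc dc export/upload syntax shown is as I understand it from the public preview, so treat the exact flags as assumptions that may change:

```shell
# Indirect mode: nothing flows to Azure automatically, so metrics and
# logs are exported to a local file and then uploaded by hand.
EXPORT_PATH=metrics.json
azdata arc dc export --type metrics --path "$EXPORT_PATH"
azdata arc dc upload --path "$EXPORT_PATH"
```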

In order to deploy a data controller, the following are all required:

  • An Azure subscription
  • An Azure resource group
  • A Kubernetes cluster
  • Storage class that supports persistent volumes
  • An Azure region that the controller should talk to; at present only eastus supports Azure Arc enabled Data Services
  • The data controller Azure connectivity mode, direct or indirect, at present only indirect is supported
  • An Azure AD service principal and role assignments – the Terraform configuration will create these for you
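The Terraform configuration creates the service principal and role assignments for you, but for reference this is roughly what it does under the covers with the az CLI; the resource group and service principal names below are made up for illustration:

```shell
LOCATION=eastus            # the only region supported in the preview
RESOURCE_GROUP=arc-ds-rg   # example name only

# Resource group that the data controller will be associated with.
az group create --name "$RESOURCE_GROUP" --location "$LOCATION"

# Service principal the controller uses to upload metrics and logs;
# it needs the Monitoring Metrics Publisher role assignment.
APP_ID=$(az ad sp create-for-rbac --name arc-ds-sp --query appId --output tsv)
az role assignment create --assignee "$APP_ID" --role "Monitoring Metrics Publisher"
```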

Deploying and Testing The Module

This section assumes that the client used for the testing and deployment of the azure_arc_ds_controller module is Windows based. The Terraform configuration comprises:

  • resources that create the Azure service principal and role assignments
  • a null resource that installs azdata if it is not already present and wraps calls to azdata in order to create the actual data controller; once a Helm chart is available for deploying the data controller, the calls to azdata will be replaced with a Helm resource
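The azdata call that the null resource wraps looks roughly like this; the controller name, namespace and credentials are illustrative, and the flags are as per the public preview so may change:

```shell
# Credentials picked up by azdata at deployment time (example values,
# the username matches the one used in the walkthrough below).
export AZDATA_USERNAME=azuser
export AZDATA_PASSWORD='S0meStr0ngPassw0rd!'

SUBSCRIPTION_ID=00000000-0000-0000-0000-000000000000  # placeholder
RESOURCE_GROUP=arc-ds-rg                              # example name only

# Create the data controller: the kubeadm profile suits a vanilla
# cluster, and indirect is the only connectivity mode in the preview.
azdata arc dc create \
  --profile-name azure-arc-kubeadm \
  --namespace arc-ds-controller \
  --name arc-ds-controller \
  --subscription "$SUBSCRIPTION_ID" \
  --resource-group "$RESOURCE_GROUP" \
  --location eastus \
  --connectivity-mode indirect
```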

Full instructions for deploying the module – this essentially involves plugging values into the variables.tf file and executing terraform apply – can be found here. Once this has been done, the data controller can be tested as follows:
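The apply itself follows the usual Terraform workflow from the module directory, something along these lines:

```shell
MODULE_DIR=azure_arc_ds_controller   # module directory from the repo
cd "$MODULE_DIR"

terraform init    # download the providers the module uses
terraform plan    # review what will be created before committing
terraform apply   # create the service principal, roles and controller
```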

  1. Install kubectl for Windows, instructions for how to do this can be found here.
  2. Create a .kube directory under your Windows home directory.
  3. Copy the Kubernetes admin.conf file onto your Windows machine from a control plane node:

    scp <username>@<control plane node IP address>:/etc/kubernetes/admin.conf C:\Users\myuser\.kube\config
  4. Check cluster connectivity:

    kubectl cluster-info

    The output from this should look similar to this:

    Kubernetes control plane is running at https://192.168.123.88:6443

    To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

  5. Log into the controller from the same machine that you intend to run Azure Data Studio from – a Windows laptop in this instance; instructions for installing azdata can be found here:

    azdata login

    azdata will prompt for the namespace that the controller resides in, enter it here:

    Namespace: arc-ds-controller

    then enter the azdata username:

    Username: azuser

    and finally, enter the azdata user password:

    Password:

    all being well a message similar to this one should appear:

    Logged in successfully to https://192.168.123.94:30080 in namespace arc-ds-controller. Setting active context to arc-ds-controller.

    Note the URL that this command returns, as this is required in step 8.
  6. Start Azure Data Studio (the installer for which can be found here) and install the Azure Arc extension:

7. Click on Connect Controller in the bottom left hand corner of the screen:

8. Enter the connection details for the data controller in the pane on the right hand side of the screen; for the Controller URL enter the URL obtained in step 5 and then hit Connect in the bottom right hand corner:

9. The Azure Arc Enabled Data Services controller should appear in the bottom left hand corner of the screen:

10. Click on the three horizontal dots to the right of CONNECTIONS in the top left part of the screen, select New Deployment from the floating menu that appears, click on the radio button for Azure SQL managed instance and finally click on Select in the bottom right hand corner of the screen:

11. From the Resource Type pulldown menu select Azure SQL managed instance – Azure Arc (preview), click the check box for accepting Microsoft’s terms and conditions and click on Next in the bottom right hand corner:

12. Complete the details on the Deploy Azure SQL managed instance parameters screen; storage classes for data and logs can be selected from the pulldown lists of values. Hit Deploy after all the values have been entered:

13. Enter the azdata user password and hit OK:

14. A notebook containing the Python code for deploying the managed instance will appear; each cell in the notebook will execute automatically:

15. Finally, right click on the data controller and select Refresh; once the controller has refreshed, click on the down arrow to the left of it and the Azure Arc managed SQL instance should appear – sqlinstance1 in the bottom left hand corner of the screen in this example:
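As a final check from the command line, the deployed instance can also be inspected via azdata; the commands below are from the public preview, so treat the exact subcommands and flags as assumptions:

```shell
# Instance name from the walkthrough above (step 15).
INSTANCE_NAME=sqlinstance1

# List the Azure Arc managed SQL instances known to the controller,
# then show the detail (state, endpoints) for the one just deployed.
azdata arc sql mi list
azdata arc sql mi show --name "$INSTANCE_NAME"
```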

Coming Up In Part 8

The next blog post in this series will cover the deployment of a Big Data Cluster.
