One of the most significant changes to the landscape for Azure data professionals will be the general release of Azure Arc enabled Data Services. To provide a quick way of experiencing all that Azure Arc has to offer, Microsoft has come up with Jumpstart, a collection of GitHub repos for deploying Arc in different scenarios. Last Christmas I had a few vacation days and took the opportunity to try out Jumpstart for Azure Arc enabled data services on AWS. I chose AWS because it made sense to try out Azure managed SQL Server instances and Postgres Hyperscale on a cloud where they are not natively available. After all, the whole point of Azure Arc enabled Data Services is to bring Azure to you on your terms if, for any reason, you cannot use the Azure cloud.
In general the whole experience was a good one: I was able to stand things up and also try out Terraform for the first time. I then decided to increase the number of resources available to my EKS cluster, and then came the not so great bit: my AWS bill. What to do? I wanted to continue to use Azure Arc enabled Data Services, but in a more cost-efficient manner. It happens that I have access to a lab at work with a reasonable amount of compute and storage, plus VMware vSphere.
So what started out as a learning exercise culminated in something I hope allows anyone who has access to vSphere to stand up not only an Azure Arc enabled Data Services controller, but the entire technology stack. And because Terraform is virtually the de facto standard for provisioning cloud resources, and Jumpstart uses it, Terraform seemed like the way to go. The fruits of my labor can be found in the Arc-PX-VMware-Faststart repository on GitHub.
Everyone Likes Free, Right?
Providing you have an infrastructure that will support physical or virtual machines with Ubuntu as the guest OS, the entire stack is free to provision:
Everything is provided to support the creation of the virtual machines that underpin a Kubernetes cluster, all the way through to deploying Azure Arc enabled Data Services controllers and SQL Server 2019 Big Data Clusters. The Terraform configurations are divided into modules so that, for example, if you want to use Hyper-V or NUCs, you can deploy Kubernetes on what you already have.
In the main I have striven to automate things as much as possible. However, a modicum of work is required to create a virtual machine template, should you elect to stand things up on VMware.
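To give a feel for what this modular split allows, here is a minimal sketch of a root Terraform configuration. The module names, paths and the node_ips variable are hypothetical and chosen purely for illustration; they are not taken from the repository. The idea is that the virtual machine module is optional, while the Kubernetes module only needs a list of Ubuntu hosts that already exist.

```hcl
# Illustrative sketch only: module names, paths and variables below are
# assumptions, not the actual layout of the repository.

variable "node_ips" {
  description = "IP addresses of existing Ubuntu hosts (vSphere VMs, Hyper-V guests or NUCs)"
  type        = list(string)
}

# Optional step: only needed when vSphere should create the virtual machines.
# module "virtual_machines" {
#   source = "./modules/virtual_machine" # hypothetical path
# }

# Deploys Kubernetes onto whichever Ubuntu hosts are supplied.
module "kubernetes" {
  source   = "./modules/kubernetes" # hypothetical path
  node_ips = var.node_ips
}
```

With a layout along these lines, someone running Hyper-V or NUCs would skip the virtual machine module and simply supply the addresses of the machines they already have.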
Coming Up...
This first post is a taster of what is to come, which will include:
- An introduction to Terraform
- A blog-post-by-blog-post commentary for each Terraform module
- Posts on the roadmap for this work, including the deployment of Azure managed SQL Server instances and Postgres Hyperscale, and greater flexibility when deploying things such as big data clusters
My aim is not simply to walk people through what I have done, but also to explain why I have done things in a particular way, along with the Terraform, Kubernetes and Azure Data Services background behind each module.