Containerising Data Pipeline Components

The last post in this series covered some simple Python code that used the tweepy library for the Twitter API to obtain tweets based on a query, sentiment-score each tweet and then load these into an […]
In my last post I outlined a number of architectural options for solutions that could be implemented in light of Microsoft retiring SQL Server 2019 Big Data Clusters, one of which was data pipelines that […]
TL;DR This post presents some high-level architectural ideas for implementing Data Lakes using SQL Server 2022, specifically SQL Server 2022 S3 data virtualisation. Whilst SQL Server 2022 is under NDA, this post and subsequent posts […]
Part seven of this series focuses on deploying an Azure Arc enabled Data Services controller to a Kubernetes cluster. As per the closing comments of the last blog post, PX Backup will be covered in […]
Part six of this series will focus on deploying a storage solution to our Kubernetes cluster: Where Were We? If you have been following this blog post series you should have: a basic grasp […]
Our journey up the stack brings us to the installation of MetalLB – a software load balancer for Kubernetes: All the content in this series of blog posts relates to the Arc-PX-VMware-Faststart repo on GitHub, […]
In the last post, part 3 of this series, we started off at the bottom of the stack with the Terraform module for virtual machine creation. We continue our journey up the stack in […]
Part 3 of this series will begin the journey up the stack, starting with the deployment of the virtual machines that will host the Kubernetes cluster nodes: All the blog posts in this series relate […]
Before diving into what the various Terraform modules that make up the Arc-PX-VMware-Faststart repo do, I’m going to provide an introduction to Terraform in this blog post. Terraform comes from HashiCorp; it is a tool […]
One of the most significant things to change the landscape for Azure data professionals will be the general release of Azure Arc enabled Data Services. To provide an expedient means of experiencing all that Azure Arc […]