Containerising Data Pipeline Components

  • by datatake
  • Posted on April 11, 2022
  • Data Pipelines
  • Featured

The last post in this series covered some simple Python code that used the Tweepy library for the Twitter API to obtain tweets based on a query, sentiment score each tweet and then load these into an […]

Read More

Building S3 Data Pipelines with Python and Boto3

  • by datatake
  • Posted on March 30, 2022
  • Data Pipelines
  • Featured

In my last post I outlined a number of architectural options for solutions that could be implemented in light of Microsoft retiring SQL Server 2019 Big Data Clusters, one of which was data pipelines that […]

Read More

Microsoft’s Analytics Road Map, SQL Server 2022 and S3 Based Data Lakes

  • by datatake
  • Posted on February 28, 2022 (updated March 1, 2022)
  • Uncategorized
  • Featured

TL;DR This post presents some high-level architectural ideas for implementing Data Lakes using SQL Server 2022, specifically SQL Server 2022 S3 data virtualisation. Whilst SQL Server 2022 is under NDA, this post and subsequent posts […]

Read More

Kubernetes Is Not Enough

  • by datatake
  • Posted on April 22, 2021
  • kubernetes
  • Featured

Someone I know had worked at an organization that needed to scale out their OpenShift clusters/footprint; they were constrained by the speed of their procurement department and were wondering if they could get by with […]

Read More

Deploying Azure Data Services via Terraform Part 7: Deploying an Azure Arc enabled Data Services Controller

  • by datatake
  • Posted on April 5, 2021
  • Azure Arc enabled Data Services
  • Featured

Part seven of this series focuses on deploying an Azure Arc enabled Data Services controller to a Kubernetes cluster. As per the closing comments of the last blog post, PX Backup will be covered in […]

Read More

Deploying Azure Data Services via Terraform Part 6: Deploying a Storage Solution to The Kubernetes Cluster

  • by datatake
  • Posted on March 24, 2021 (updated March 25, 2021)
  • Azure Arc enabled Data Services
  • Featured

Part six of this series will focus on deploying a storage solution to our Kubernetes cluster. Where Were We? If you have been following this blog post series you should have: a basic grasp […]

Read More

Deploying Azure Data Services via Terraform Part 5: Deploying a Load Balancer to The Kubernetes Cluster

  • by datatake
  • Posted on March 19, 2021 (updated March 20, 2021)
  • Azure Arc enabled Data Services
  • Featured

Our journey up the stack brings us to the installation of MetalLB – a software load balancer for Kubernetes: All the content in this series of blog posts relates to the Arc-PX-VMware-Faststart repo on GitHub, […]

Read More

Deploying Azure Data Services via Terraform Part 4: Deploying a Kubernetes Cluster

  • by datatake
  • Posted on March 14, 2021 (updated March 15, 2021)
  • Azure Arc enabled Data Services
  • Featured

In the last post, part 3 of this series, we started off at the bottom of the stack with the Terraform module for virtual machine creation. We continue our journey up the stack in […]

Read More

Deploying Azure Data Services via Terraform Part 3: Deploying VMware Virtual Machines

  • by datatake
  • Posted on March 12, 2021 (updated March 14, 2021)
  • Uncategorized
  • Featured

Part 3 of this series will begin the journey up the stack, starting with the deployment of the virtual machines that will host the Kubernetes cluster nodes: All the blog posts in this series relate […]

Read More

Deploying Azure Data Services via Terraform Part 2: An Introduction to Terraform

  • by datatake
  • Posted on March 10, 2021 (updated March 12, 2021)
  • Azure Arc enabled Data Services
  • Featured

Before diving into what the various Terraform modules do that make up the Arc-PX-VMware-Faststart repo, I’m going to provide an introduction to Terraform in this blog post. Terraform comes from HashiCorp; it is a tool […]

Read More

Deploying Azure Data Services via Terraform Part 1: An Introduction

  • by datatake
  • Posted on March 9, 2021 (updated March 12, 2021)
  • Azure Arc enabled Data Services
  • Featured

One of the most significant things to change the landscape for Azure data professionals will be the general release of Azure Arc enabled Data Services. To provide an expedient means of experiencing all that Azure Arc […]

Read More

The Great Kubernetes Virtualization Debate

  • by datatake
  • Posted on July 2, 2020
  • kubernetes
  • Featured

A source of some interesting discussions at work is whether or not Kubernetes nodes should be virtualized. The thesis behind why this is not a good idea is that a virtualized layer adds […]

Read More

SQL Server 2019 Big Data Clusters CU5 – Why OpenShift Matters

  • by datatake
  • Posted on June 24, 2020
  • big data clusters
  • Featured

CU5 for SQL Server 2019 Big Data Clusters ushers in support for Red Hat OpenShift Container Platform. This is a big deal, but what exactly is OpenShift and, more saliently, why does it matter […]

Read More

Deploying Big Data Clusters Part 1: Planning

  • by datatake
  • Posted on May 14, 2020
  • big data clusters
  • Featured

This series of posts is essentially my presentation from the recent Data Weekender popup virtual conference – in blog form with some bonus additional content. This post focuses on the things you should consider before […]

Read More

Building A Sandbox Environment for A Big Data Cluster With microk8s

  • by datatake
  • Posted on April 5, 2020
  • Uncategorized
  • Featured

At work we are seeing a burgeoning demand for Kubernetes test and development environments; as such we have been looking at simple and rapid ways to provision clusters. I have already blogged about the use […]

Read More

Big Data Clusters: Bare Metal To Kubernetes

  • by datatake
  • Posted on March 31, 2020
  • Uncategorized
  • Featured

I have not blogged for a while; it was my hope to produce part 5 in the series on creating a Kubernetes cluster for production grade Big Data Clusters. However, there is a very good […]

Read More

Scaling Out CI Pipelines With Azure Devops, Docker and SQL Server

  • by datatake
  • Posted on September 2, 2019
  • Azure DevOps
  • Featured

The seed of the idea behind this blog post first germinated when I noticed the following YAML: Note the line that includes mergeTestResults; this got me thinking along the lines of running multiple tests in […]

Read More

Building A Kubernetes Cluster For SQL Server 2019 Big Data Clusters Part 4: Persistent Storage Concepts

  • by datatake
  • Posted on April 11, 2019 (updated April 19, 2019)
  • Uncategorized
  • Featured

I was originally going to cover storage in its entirety in a single blog post. However, as storage and Kubernetes is the cause of a tremendous amount of confusion in the Microsoft data platform community, […]

Read More

Building A Kubernetes Cluster For SQL Server 2019 Big Data Clusters, Part 3: Big Data Cluster Creation

  • by datatake
  • Posted on February 5, 2019
  • big data clusters
  • Featured

The previous post in this series covered Kubernetes cluster creation via Kubespray. It was my intention to cover off load balancing in this post, however at the time of writing when you create a SQL […]

Read More

Building A Kubernetes Cluster For SQL Server 2019 Big Data Clusters, Part 2: Kubernetes Cluster Creation

  • by datatake
  • Posted on December 29, 2018 (updated September 18, 2020)
  • big data clusters
  • Featured

Part 1 of this series covered the creation of the virtualized infrastructure for creating a Kubernetes cluster on. There are a variety of tools for building clusters, including Kops, Kubespray and Kubeadm. Kubeadm is perhaps […]

Read More

Building A Kubernetes Cluster For SQL Server 2019 Big Data Clusters, Part 1: Hyper-V Virtual Machine Creation

  • by datatake
  • Posted on December 18, 2018 (updated September 18, 2020)
  • SQL
  • Featured

This blog post is the first in a series detailing how to build a Kubernetes cluster to deploy a SQL Server 2019 big data cluster to. For the purposes of learning and on-boarding there are […]

Read More

Kubernetes For the Microsoft Data Platform Professional: Can I Run Windows Containers On Kubernetes ?

  • by datatake
  • Posted on November 25, 2018
  • Uncategorized
  • Featured

From a ‘Vanilla’ Kubernetes perspective, where all nodes in the cluster run on Linux, only containers based on Linux images can run. As of version 1.9 of Kubernetes, we are currently on 1.12 at the […]

Read More

Kubernetes For The Microsoft Data Platform Professional: Dipping Your Toe In The Water With Minikube

  • by datatake
  • Posted on November 19, 2018
  • SQL
  • Featured

In the previous post the scene was set for why container orchestration is required, what Kubernetes is and why the world of open source should be entered into with one’s eyes wide open. This post […]

Read More

Kubernetes For The Microsoft Data Platform Professional 101

  • by datatake
  • Posted on November 8, 2018 (updated November 9, 2018)
  • Uncategorized
  • Featured

With the announcement of SQL Server 2019 big data clusters at Ignite, Kubernetes (often abbreviated to K8s) now stands front and center as part of Microsoft’s data platform vision. The obvious inference being that this […]

Read More

My 2018 SQL Server, Docker and Jenkins Presentation In Blog Form, Part III: Adding tSQLt Testing To The Mix

  • by datatake
  • Posted on October 5, 2018
  • Uncategorized
  • Featured

Where Were We? In part I of this series I set the scene for why you would want to use Docker and Jenkins for SQL Server continuous integration pipelines. The first post also covered […]

Read More

The Kubernetes Storage Eco-System For SQL Server Professionals

  • by datatake
  • Posted on September 24, 2018
  • Uncategorized
  • Featured

There seems to be a great deal of interest in containers and Kubernetes at present, fueled by Microsoft hinting that Kubernetes has a big part to play in the future of the Microsoft data platform: Of […]

Read More

My 2018 SQL Server, Docker and Jenkins Presentation In Blog Form, Part II

  • by datatake
  • Posted on June 25, 2018 (updated July 19, 2018)
  • Uncategorized
  • Featured

In the first post in this series I covered why you might want to use Jenkins as a CI engine and how to deploy to SQL Server running in a container using the ‘Sidecar’ pattern. […]

Read More

My 2018 SQL Server, Docker and Jenkins Presentation In Blog Form, Part I

  • by datatake
  • Posted on June 22, 2018 (updated July 19, 2018)
  • Uncategorized
  • Featured

The mainstay of my presentation material this year has been my deck on continuous integration, Docker and Jenkins. For people who have not had the chance to see this presentation or have seen it and […]

Read More

Scaling Out Singleton Insert Workloads Using Containers: Part II

  • by datatake
  • Posted on March 6, 2018
  • Uncategorized
  • Featured

In the previous part of this blog post I discussed how containers could be used to scale out a singleton workload. Whereas my attempts to get my experiments to work ran into difficulties […]

Read More

Scaling Out Singleton Insert Workloads Using Containers: Part 1

  • by datatake
  • Posted on January 1, 2018 (updated January 4, 2018)
  • Uncategorized
  • Featured

I will forewarn readers of this blog post that this is ‘Conceptual’ in nature, due to the fact that in my tests I was spinning up containers which then fell over with core dumps. Nonetheless, I […]

Read More

Creating A Docker Containerised Environment For SQL Server and Continuous Integration

  • by datatake
  • Posted on October 19, 2017
  • Uncategorized
  • Featured

This post covers building a simple, fully containerised continuous integration environment using Jenkins and SQL Server Data Tools. There are two GitHub repositories associated with this post, the first contains the files for […]

Read More

Jenkins Hybrid Windows Linux Build Pipelines For Docker and SQL Server

  • by datatake
  • Posted on September 8, 2017
  • Uncategorized
  • Featured

Consider a scenario in which you wish to use DACPACs, but you want to spin up SQL Server in a container on Linux (say Ubuntu) because you wish to forgo the cost of having to […]

Read More

Jenkins Multi-Branch Pipeline Builds Using Docker Containers and SQL Server

  • by datatake
  • Posted on August 31, 2017
  • Uncategorized
  • Featured

In this post I am going to demonstrate how to use one of Jenkins’ more powerful features: its ability to create multi-branch build pipelines. Source Code Control and Branching 101 The very first step […]

Read More

Scale-able Windows Aggregate Functions With Row Store Objects

  • by datatake
  • Posted on July 24, 2017 (updated July 26, 2017)
  • Uncategorized
  • Featured

In this post I will demonstrate how a neat trick brought to my attention by Niko Neugebauer can turn the processing of windowing functions from “Anti-scale” to scale-ability. First of all we need to create […]

Read More

Creating Continuous Integration Build Pipelines With Jenkins, Docker and SQL Server

  • by datatake
  • Posted on July 18, 2017 (updated July 20, 2017)
  • Uncategorized
  • Featured

In the world of continuous integration and delivery, where we might want to perform numerous builds a day, Docker is ideally suited for spinning up environments and then tearing them down afterwards in use cases […]

Read More

In-Memory Engine DURABILITY = SCHEMA_ONLY And Transaction Rollback

  • by datatake
  • Posted on July 17, 2017
  • Uncategorized
  • Featured

I was fortunate enough to be selected to speak at SQL Saturday Dublin. The talk I gave was on leveraging the in-memory engine; the basic flow of the presentation is thus: I ask the audience […]

Read More

The Fundamentals Of Processing A SQL Server Workload In A Scale-able Manner

  • by datatake
  • Posted on July 16, 2017 (updated October 18, 2017)
  • Uncategorized
  • Featured

In this blog post I wanted to distill down the most fundamental points to consider when attempting to process a SQL Server workload in a scale-able manner. However, many of the principles I will outline […]

Read More

Ring Buffer Shock Absorb-er Pattern and SQL Server

  • by datatake
  • Posted on July 16, 2017 (updated July 17, 2017)
  • Uncategorized
  • Featured

This post continues my work on the LMax Disruptor pattern. I have covered the pattern itself already; what I have not covered is: spinlock profiling, wait statistic profiling, how the in-memory engine now behaves with SQL […]

Read More

Automated SQL Server Data Tools Build Pipelines Using Jenkins and GIT

  • by datatake
  • Posted on April 25, 2017 (updated June 24, 2017)
  • Uncategorized
  • Featured

The aim of this blog post is twofold; it is to explain how: A “Self building pipeline” for the deployment of a SQL Server Data Tools project can be implemented using open source tools A build pipeline […]

Read More

Is Storage A Solved Problem ?

  • by datatake
  • Posted on July 4, 2016
  • Uncategorized
  • Featured

Every so often a post appears on LinkedIn that serves as an oasis of information in a desert of spam. The article I speak of is “Non-Volatile Storage: Implications of The Data Center’s Shifting Center”. It’s a god […]

Read More

Super Scaling SQL Server with Virtualization

  • by datatake
  • Posted on May 15, 2016
  • SQL
  • Featured

A question I received following my pre-conference training day at SQL Bits and during my post-conference training day in Poland was how my material relates to SQL Server running in a virtualized environment. As there are figures […]

Read More

The SQL Server 2016 Scheduler and Trace Flag 8008

  • by datatake
  • Posted on May 13, 2016
  • Uncategorized
  • Featured

A lot of the work I have done over the last year has involved placing stress on the database engine via singleton inserts using a stored procedure that inserts rows in a loop under the […]

Read More

The In-Memory OLTP Engine and NUMA Foreign Memory Access: Part 1

  • by datatake
  • Posted on May 7, 2016 (updated May 8, 2016)
  • Uncategorized
  • Featured

In my Saturday community day session at SQL Bits I mentioned an article by Linchi Shea from the sqlblog site in which he demonstrates that the overhead of 100% foreign memory access does not incur the […]

Read More

I’m Speaking: JOIN! Conference Poland

  • by datatake
  • Posted on April 7, 2016
  • Uncategorized
  • Featured

I will be speaking at the JOIN! Conference in Poland. On the Tuesday (May 10th) I will be presenting a regular session on leveraging memory in SQL Server and on the Wednesday I will be delivering […]

Read More

SQL Server 2016 Multi Threaded Log Writer

  • by datatake
  • Posted on April 1, 2016 (updated July 24, 2018)
  • Uncategorized
  • Featured

This is probably one of the most unheralded new features of SQL Server 2016, which gets but a single bullet point in the CSS engineers’ “It Just Runs Faster” blog post series. Querying sysprocesses and […]

Read More

Super Scaling Queues Using the LMax Disruptor Pattern And The In-Memory OLTP Engine

  • by datatake
  • Posted on January 18, 2016 (updated February 16, 2016)
  • Uncategorized
  • Featured

The Story So Far: the graph below represents the throughput we managed to get (from a warm buffer cache) from the legacy database engine and the help of an in-memory table as a scale-able […]

Read More

SQL Bits XV: Advanced Techniques For Superscaling SQL Server

  • by datatake
  • Posted on January 4, 2016
  • SQL
  • Featured

It is with great honor that I have the privilege of announcing that my pre-conference submission for SQL Bits XV has been accepted! I will be putting on a day’s worth of training in […]

Read More

Super Scaling Queues Using the LMax Disruptor Pattern

  • by datatake
  • Posted on January 2, 2016 (updated February 11, 2016)
  • Uncategorized
  • Featured

One of the many things I hope to get around to blogging about, time permitting, are the challenges of building “Web scale” platforms using SQL Server. Of the many challenges this presents, one is coming […]

Read More

Digging Into Parallel Query Scalability With Windows Performance Toolkit

  • by datatake
  • Posted on November 26, 2015 (updated December 1, 2015)
  • Uncategorized
  • Featured

Following some feedback from my last blog post: ... what is inhibiting scalability? As the degree of parallelism is increased, CPU saturation should be achievable unless some contended resource is being waited […]

Read More

Anti Scale Patterns and The Repartition Streams Iterator In Row Mode

  • by datatake
  • Posted on October 22, 2015 (updated November 6, 2015)
  • SQL
  • Featured

There is a type of behavior in the database engine which undermines scalability: multiple threads contending for a single resource. Contention on the page free space bit map is the example that […]

Read More

Working With Kubernetes Contexts

  • by datatake
  • Posted on March 22, 2022
  • kubernetes

kubectl is the de facto command line tool for administering Kubernetes clusters. Connecting to a cluster via kubectl requires a Kubernetes config file, which in turn contains one or more contexts. A context is simply a […]

Read More
