Infrastructure as Code for Database Workloads

As a database administrator, you may be used to manually installing and managing your database infrastructure.

Perhaps you have heard of a concept called Infrastructure as Code.

Perhaps your organization demands it.

Today I talk about why Database Administrators and Data Guardians need to be familiar with Infrastructure as Code!

Video: Infrastructure as Code for Database Workloads

What is infrastructure as code?

Infrastructure as Code, or IaC, is pretty much exactly what it sounds like.

It’s defining your infrastructure in terms of configuration files and scripts.

Or, as our friend Wikipedia puts it:

Infrastructure as code (IaC) is the process of managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.

In a nutshell, these files can be used to create, modify or delete your infrastructure.
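To make "create, modify or delete" concrete, here is a minimal Python sketch of the idea behind a provisioning run: compare what the files say should exist with what actually exists, and list the actions needed to close the gap. The `Database` class, the `plan` function, and the server names are all made up for illustration; real tools do far more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Database:
    """Hypothetical definition of one database server."""
    name: str
    engine: str
    storage_gb: int


def plan(desired: dict[str, Database], current: dict[str, Database]) -> list[str]:
    """Compare the definitions on file with what exists and list the actions needed."""
    actions = []
    for name, db in sorted(desired.items()):
        if name not in current:
            actions.append(f"create {name}")
        elif current[name] != db:
            actions.append(f"modify {name}")
    for name in sorted(current):
        if name not in desired:
            actions.append(f"delete {name}")
    return actions


# What the definition files say should exist (illustrative values).
desired = {
    "orders-prod": Database("orders-prod", "mysql", 500),
    "orders-staging": Database("orders-staging", "mysql", 100),
}
# What is actually running right now (illustrative values).
current = {
    "orders-prod": Database("orders-prod", "mysql", 250),
    "orders-old": Database("orders-old", "mysql", 50),
}

for action in plan(desired, current):
    print(action)
```

With these sample values the sketch prints `modify orders-prod`, `create orders-staging`, and `delete orders-old`. The point is that you edit the definitions, and the tool works out what has to change.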

Why should database administrators care?

Maybe you have a single, simple production database.

Perhaps you are even using a managed cloud service like Cloud SQL to handle your operational tasks.

And all you have to do is click a few buttons in the web UI and you’re set.

If this describes you, then being a DBA probably isn’t your full time job.

But if you aspire to be a serious DBA in today’s industry, you will at least have separated your database out into pre-production and production.

Perhaps you even have some development environments.

And maybe one day you will need to work with sharded data environments.

As your data infrastructure grows, you will want to learn more about Infrastructure as Code.

Being able to define your infrastructure in files that are placed in a repository will help you standardize configurations and automate operations.

Repositories are managed by version control systems (VCS), which allow you to track historical changes to files.

If you are not familiar with Git or a Git-like tool, you need to learn one.

Running with infrastructure as code concepts from the beginning will save you a lot of time managing your data environment.
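As a rough illustration of what standardizing configurations can look like, the Python sketch below keeps one shared base definition and writes down only the differences for each environment. Every name and value here (`BASE`, `OVERRIDES`, `render`, the sizes and versions) is an assumption for the example, not a recommendation.

```python
# Settings every environment shares (all values are illustrative).
BASE = {
    "engine": "mysql",
    "version": "8.0",
    "storage_gb": 50,
    "backups_enabled": True,
}

# Only the differences are written down per environment.
OVERRIDES = {
    "development": {"backups_enabled": False},
    "pre-production": {"storage_gb": 200},
    "production": {"storage_gb": 500},
}


def render(environment: str) -> dict:
    """Return the full definition for one environment: base plus its overrides."""
    return {**BASE, **OVERRIDES[environment]}


for env in OVERRIDES:
    print(env, render(env))
```

Because every environment starts from the same base, a change such as a version upgrade is made in one place, reviewed in the repository, and picked up everywhere.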

Getting started with Infrastructure as Code

This post won’t get into the details of everything you need, but I will provide some basic ideas for getting started.

First of all, you will need a place to store files.

As mentioned earlier, this should be a VCS repository, hosted on a service like GitHub. Any Git-like service will do.

Once you have a place to store files, you’ll want to know what type of files to store.

If you are just getting started, I recommend two types of tools for IaC.

The first is a provisioning tool. Provisioning tools take your file definitions and create the infrastructure as you defined it. They also handle modifications and removal.

There are many such tools out there. You are likely working with virtual machines (VMs) or cloud services, so I recommend starting with Terraform from HashiCorp.

If you are not working with VMs or cloud services, let me know in the comments and I can provide you a different recommendation.

The second tool you will want is an orchestration tool.

These tools basically take your files and set up your newly provisioned infrastructure the way it needs to be.

Configuration sometimes requires executing tasks in a specific order across multiple machines.

For example, you need a MySQL primary set up and running before you can configure a replica.

That’s why it should be an orchestration tool.
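The ordering problem can be sketched in a few lines of Python using the standard library's `graphlib`. This is only an illustration of why order matters, not how any particular orchestration tool works internally, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# Task names are hypothetical.
depends_on = {
    "install mysql on primary": set(),
    "install mysql on replica": set(),
    "start primary": {"install mysql on primary"},
    "create replication user on primary": {"start primary"},
    "configure replica": {
        "install mysql on replica",
        "create replication user on primary",
    },
    "start replication": {"configure replica"},
}

# static_order() yields the tasks so that every dependency comes first.
order = list(TopologicalSorter(depends_on).static_order())
for step, task in enumerate(order, start=1):
    print(f"{step}. {task}")
```

Whatever order the two installs come out in, the primary is always started before the replica is configured. In an Ansible playbook you would normally express this simply by writing the tasks in the order they should run.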

These tools can also provision, but I find it’s best to use the right tool for the job.

And the orchestration tool I recommend is Ansible.

In the event that you are running containerized database workloads on something like Kubernetes, these tool recommendations won’t make much sense.

Conclusion

Unless they manage a very static and very simple data environment, Data Guardians will eventually need to learn infrastructure as code concepts and tools.

And the sooner the better, or you may be automated out of a job!

I covered three types of tools (a VCS repository, a provisioning tool, and an orchestration tool) and gave a recommendation for each type based on what I’ve experienced in the industry.

They are either open source or enable open source projects, and all have a thriving community.

There are other tools of course.

But the important thing is that Data Guardians need to define their data infrastructure as code.

Because even though automation is evil, you need to do it anyway!

Let me know in the comments if you’ve experienced this. I’d love to know more!
