Data on Kubernetes
How much do you know about the technology that powers your database?
I’m not talking about the database itself, though that is important.
I’m talking about the bits and bytes that allow the database to run.
Operating systems like Linux are usually trusted, and many even see virtualization as boring technology.
But now there’s a new kid on the block.
She’s been widely acclaimed in the nearby neighborhood of applications for a few years.
But now her sights are set on the walled community of the database.
Sometimes known as K8s (pronounced Kates), she is Kubernetes.
Today I talk about the increasing trend to trust your Data on Kubernetes.
Kubernetes overview
If you haven’t heard of Kubernetes, you haven’t been paying attention to application development over the past few years.
I don’t pretend to have a great understanding of it myself.
What I know is that Kubernetes is a framework to define how your containers are deployed across your compute resources.
Wait. What are containers?
That’s tough to answer in a quick post like this.
But containers are isolated processes that ‘contain’ everything needed to run something like an application or a database. This includes the operating system’s libraries and tools, though not the kernel, which is shared with the host server.
And they do it in a way that is lightweight and fast to deploy.
So containers are the next step beyond virtualization to allow you to efficiently scale your application on your servers.
And Kubernetes has become the default way to manage those containers across a fleet of servers.
Imagine you have 10 instances of your application and you are looking to provide high availability across multiple servers.
Kubernetes will help you schedule those application containers.
And if you need to quickly scale up to 20 instances for a spike in traffic, Kubernetes can do that too!
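To make that concrete, here is a toy sketch in Python of spreading instances across servers and then scaling from 10 to 20. It is only an illustration: the real Kubernetes scheduler weighs far more than a simple round-robin, and the `place_replicas` function and the node names are hypothetical names I made up for this example.

```python
from collections import Counter
from itertools import cycle


def place_replicas(replica_count, nodes):
    """Assign replicas to nodes round-robin so they are spread evenly.

    A deliberate simplification, not the actual Kubernetes scheduling logic.
    """
    node_cycle = cycle(nodes)
    return {f"app-{i}": next(node_cycle) for i in range(replica_count)}


nodes = ["node-a", "node-b", "node-c"]  # hypothetical server names

# 10 instances spread across three servers for availability
placement = place_replicas(10, nodes)
print(Counter(placement.values()))

# A traffic spike arrives: scale up to 20 instances
scaled = place_replicas(20, nodes)
print(Counter(scaled.values()))
```

The point is that you only state how many instances you want. Deciding which server runs each one is the orchestrator's job.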
Data on Kubernetes
Kubernetes works well because it can freely move containers across servers.
If it moves those containers to a different server, any data in memory will be lost.
And any data tied to the storage of the previous server will be lost, as far as the container is concerned.
That’s fine for applications. But it’s terrible for databases, or anything that requires saving of state.
I suspect your data is too critical to just allow it to disappear whenever Kubernetes decides to move your database container to a new server.
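Here is a small Python sketch of that problem. It is a toy model, not Kubernetes code: the `Container` class and the node names are hypothetical, and the `external_volume` dictionary is my own addition for contrast, standing in for storage that lives outside any single server.

```python
class Container:
    """A toy container whose files live on whichever node currently hosts it."""

    def __init__(self, name, node):
        self.name = name
        self.node = node
        self.local_files = {}  # stands in for storage tied to the current node

    def reschedule(self, new_node):
        # A fresh container starts on the new node; node-local data stays behind.
        self.node = new_node
        self.local_files = {}


external_volume = {}  # hypothetical storage that is not tied to one node

db = Container("db-0", "node-a")
db.local_files["orders"] = ["order-1"]
external_volume["orders"] = ["order-1"]

db.reschedule("node-b")

print(db.local_files.get("orders"))   # None: the node-local copy is gone
print(external_volume.get("orders"))  # ['order-1']: the external copy survives
```

As far as the container is concerned, anything written to the old server simply never existed.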
And that is exactly why running databases in containers on Kubernetes is not the norm.
Yet.
Stateful containers are the final frontier
PerconaLive is one of the premier conferences for open source database technology.
If you want to know how some of the top companies in the world are solving data problems with open source technology, this is an event you want to attend.
At this year’s PerconaLive, the main thing that stood out to me was how many sessions there were on running open source databases on Kubernetes.
There were talks on running Postgres or MongoDB or Vitess on Kubernetes.
And there were definitely talks on what challenges you need to be aware of for running these stateful applications on Kubernetes.
Many of these were done by the Data on Kubernetes community.
Overall, if you watch these sessions, you might get the sense that the final frontier of data on Kubernetes will soon be a solved problem.
Conclusion
Unless you work at a company that pushes the envelope on bleeding-edge technology, you probably are not close to running your data on Kubernetes.
But there are trailblazers paving the way for when you finally get there.
I predict that containerized database environments will become the status quo, like virtualization before them.
So I recommend adding Kubernetes to your learning list if you are investing in your learning.
If you are worried about wasting time chasing all of the latest fads, I wouldn’t worry.
Kubernetes was originally developed by Google and is now a Cloud Native Computing Foundation (CNCF) project.
And you know it’s the standard when competitors like Amazon Web Services (AWS) provide support for it!
Are you currently exploring Kubernetes for your data environment?
If so, leave a comment to let me know how you like it.