MySQL online schema change options
There are two truths that collide in this post:
- Schema changes are a fact of life.
- Many schema changes are blocking in MySQL, which reduces data availability.
Since Data Availability is a key role for today’s Data Guardian, what can we do about it?
Keep reading to find out!
Why schema changes are necessary
Whether you use a relational database or a “schemaless” database, you will eventually have to handle changing schema definition.
The reason for this is that requirements and features change over time.
For schemaless databases like MongoDB, changing the schema is likely limited to adding or removing indexes as query patterns evolve.
For relational databases, the structure itself changes by adding a new column, removing a column or changing the datatype.
And putting a little thought up front will prevent changes from happening too often.
Which is a good thing, because in MySQL, many of those changes are blocking.
Why online schema changes are necessary
If you have a table of any significant size and you want to modify the definition, chances are good that the ALTER TABLE operation in MySQL will block.
This means that query requests against that table will wait for the ALTER to complete.
This would result in your Availability crashing to the floor, if there weren’t ways to make the change in a non-blocking method.
It gets even worse if you use relational concepts such as foreign keys, or if your application relies on downstream replicas to serve fresh data for read traffic.
Online schema change options for MySQL
The good news is there are options for non-blocking alters in MySQL to help the Data Guardian keep their data available.
Online DDL
Recent versions of MySQL allow some non-blocking alters directly. This is called Online DDL.
There are many operations that can’t be done online still today, but these options are being improved with each major version
Online DDL still has significant limitations, though.
The main concerns are higher load on the database server that can’t be throttled, additional space used for a copy of the table, and blocking replication as the change gets applied on replica servers resulting in large replication lag.
pt-online-schema-change from Percona
The next option that I’ve used comes from the Percona Toolkit: pt-online-schema-change.
This is an external tool that will create a copy of your table, adjust the schema, and begin copying rows from the original to the copy until it has copied all rows.
To keep the table fresh from changes to rows that have already been copied, triggers are set up.
Once the copy is complete, then you can schedule the swap of the old table. This swap can cause issues, so try to schedule it during lower traffic periods.
This tool still has the limitation of double disk usage during the copy operation. But it reduces the issue of replication lag and high load that the Online DDL option has.
The changes made on the source server are replicated to the replicas fast because they are small batches of work.
The copy process can be throttled lower during high demand on the database server, which Online DDL can’t do.
But the performance impact can never be removed completely because of the triggers that exist on the old table.
gh-ost from Github
The final tool that I will mention is gh-ost, the online schema tool from Github.
Gh-ost also creates a copy of the original table with the new structure, called a ghost table.
It also copies rows from the original table to the ghost table.
Similar to pt-online-schema-change, it requires double the disk space and the events that are applied avoid large replication lag at the end of the process.
But instead of creating triggers on the master, gh-ost will connect to the binary log and modify the ghost table directly.
This allows the schema change to be throttled completely, removing any performance impact on the server. And the change can continue during low traffic periods from where it left off.
Gh-ost requires row-based replication, which can add significant disk usage to your binary logs if you aren’t using that currently.
One way to solve this on your source database is to run gh-ost as a replica to a replica.
This is actually the preferred architecture, according to github. And this way you can run your source database in MIXED or even Statement based replication to avoid issues with larger binary logs.
Conclusion
Schema changes are a fact of life for any database environment.
Even schemaless databases will require index modifications on any collection or table of significant size.
In MySQL, many of those changes are blocking.
And blocking of queries will destroy any Data Availability objective the Data Guardian has.
MySQL offers Online DDL for some operations, but the change can’t be throttled and can lead to large replication lag issues.
There are several external online schema change tools, but the two I’ve highlighted today are pt-online-schema-change and gh-ost, because I’ve used both.
While both options reduce load and most issues with replication lag, all options require double disk space for the table copy operation.
Which tool to use depends on your needs.
I’m curious what issues you have run into with schema changes? Leave a comment below!