High Availability MySQL on Ubuntu 8.04
In my previous post I showed how to implement a high availability web tier using Heartbeat and Apache. If you followed that you're probably pretty much sorted for serving static webpages, but what about dynamic webpages that are database driven? How do we make sure that the database is protected against the failure of one of our nodes?
Preparation
You'll need two boxes and three IP addresses. Again, I've used virtual machines from Xeriom Networks. I've firewalled them and opened the MySQL and Heartbeat ports so that the servers can communicate with each other but no one else can access them.
# On db-01
sudo iptables -I INPUT 3 -p tcp --dport mysql -s db-02.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport mysql -s db-02.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport 694 -s db-02.vm.xeriom.net -j ACCEPT
# On db-02
sudo iptables -I INPUT 3 -p tcp --dport mysql -s db-01.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport mysql -s db-01.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport 694 -s db-01.vm.xeriom.net -j ACCEPT
Your firewall rules should now look something like the listing below, the important lines being those ending in tcp dpt:mysql, udp dpt:mysql and dpt:694. The source for those lines should be the node that you're not checking the firewall rules on, eg db-01 should have rules opening ports for db-02, and db-02 should have rules opening ports for db-01.
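The listing below is just the standard iptables output; to check for yourself, run something like the following on each node (the exact rules you see will depend on what was already there):
# On either node
sudo iptables -L INPUT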
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT udp -- db-01 anywhere udp dpt:694
ACCEPT udp -- db-01 anywhere udp dpt:mysql
ACCEPT tcp -- db-01 anywhere tcp dpt:mysql
ACCEPT tcp -- anywhere anywhere tcp dpt:ssh
All being well, save your firewall rules so they can be restored at reboot.
sudo sh -c "iptables-save -c > /etc/iptables.rules"
For the purpose of this post, let's assume that the following IP addresses are available. The first two are assigned to the boxes shown in brackets; the third is left unassigned for Heartbeat to manage later.
- 193.219.108.241 - db-01 (db-01.vm.xeriom.net)
- 193.219.108.242 - db-02 (db-02.vm.xeriom.net)
- 193.219.108.243 - Not assigned
Start small
To begin with we'll install and configure MySQL for normal use on each of the boxes.
sudo apt-get install mysql-server --yes
Set a strong MySQL root password and wait for the packages to download and install, then edit /etc/mysql/my.cnf to make MySQL listen on all IP addresses.
bind-address = 0.0.0.0
Now restart MySQL and fire up the MySQL command-line client to check all is good.
sudo /etc/init.d/mysql restart
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> \q
If you got the mysql> prompt then MySQL is running. Try connecting to the other node across the network to see if the firewall is open and MySQL is listening on the network interface.
mysql -h db-02.vm.xeriom.net -u root -p
Enter password: [enter the MySQL root password you chose earlier]
ERROR 1130 (00000): Host 'db-01' is not allowed to connect to this MySQL server
If you got the above error then everything is working fine - MySQL connected and refused to authorise the client. We'll create some valid accounts for this later. If you got a different error (such as the one below), check MySQL is running on both boxes and that the firewall rules are allowing connections from the correct hosts.
Can't connect to MySQL server on 'db-02' (10061)
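If you're not sure whether MySQL is actually listening on the network interface, a quick extra check (not strictly part of this walkthrough) is to look for a listener on port 3306:
# On the box running MySQL
sudo netstat -ltnp | grep 3306
# You should see a LISTEN line for 0.0.0.0:3306 owned by mysqld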
One-way replication
The first thing we want to do is set up simple master-slave replication to see that it's possible to replicate data from one database host to the other. This requires a binary log, so tell MySQL on db-01 to keep one. Edit /etc/mysql/my.cnf and set the following values under the replication section.
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 100M
binlog_do_db = my_application
binlog_ignore_db = mysql
binlog_ignore_db = test
On db-01 grant replication slave rights to db-02. Change some_password to a real, strong password. Afterwards, make sure you restart MySQL.
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant replication slave on *.* to 'replication'@'db-02.vm.xeriom.net' identified by 'some_password';
mysql> \q
sudo /etc/init.d/mysql restart
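Before moving on you can sanity-check that db-01 is now writing a binary log - this step isn't required, just reassuring:
# On db-01
mysql -u root -p
mysql> show master status;
mysql> \q
# The output should name a mysql-bin file and a log position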
Jump on to db-02 and set it up to replicate data from db-01 by editing /etc/mysql/my.cnf, replacing the hostname, username and password with the values you used when setting up db-01.
server-id = 2
master-host = db-01.vm.xeriom.net
master-user = replication
master-password = some_password
master-port = 3306
One-way replication should now be set up. Restart MySQL and check the status of the slave on db-02. If the Slave_IO_State is "Waiting for master to send event" then you've been successful.
# Run this on db-02 only
sudo /etc/init.d/mysql restart
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 193.219.108.241
Master_User: replication
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 98
Relay_Log_File: mysqld-relay-bin.000004
Relay_Log_Pos: 235
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 98
Relay_Log_Space: 235
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
1 row in set (0.00 sec)
All being well it's time to test replication is working. We'll create the database we've configured replication for (my_application) on db-01 and watch as it appears on db-02 as well.
# On both nodes
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show databases;
There should be two - mysql and test.
# On db-01 only
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> create database my_application;
# On both nodes
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show databases;
The new database, my_application, should appear in the output on both nodes. Success! If it doesn't show on both nodes (it didn't for me the first time I set it up), here are some tips for finding out what's wrong.
Trouble-shooting one-way replication
If the slave status above doesn't show Slave_IO_State: Waiting for master to send event, Slave_IO_Running: Yes and Slave_SQL_Running: Yes then something is wrong. This happened a few times while I was setting up replication - here's how I debugged it.
Telnet is one of the best tools in the world for debugging connectivity issues. If you haven't already, install it now.
sudo apt-get install telnet
SSH to the node that you want to check connectivity from (db-02) and telnet to the other node (db-01) on the MySQL port (3306).
# on db-02
telnet db-01.vm.xeriom.net mysql
The problem I encountered was ERROR 1130 (00000): Host 'db-02' is not allowed to connect to this MySQL server. This happens when an incorrect hostname is used in the grant replication slave query above. In my case I had granted access to clients using the full hostname (db-02.vm.xeriom.net) but MySQL looked in /etc/hosts and found a short name (db-02). Run the grant replication slave query again using the hostname given in the error message.
# on db-01
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant replication slave on *.* to 'replication'@'db-02' identified by 'some_password';
mysql> \q
sudo /etc/init.d/mysql restart
Another problem I encountered was that the slave status remained "connecting to master" for a long time. If you can connect using telnet this is probably caused by the server-id being the same on both servers. Check in /etc/mysql/my.cnf and if necessary change the values and restart MySQL.
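A quick way to compare the two values (just a convenience command, not from the original setup):
# Run on each node - the numbers must differ
grep server-id /etc/mysql/my.cnf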
Master-master replication
The above setup will replicate data one way, but if you happen to write to the slave (db-02) then at best the data stored in the two databases will be inconsistent, and there's a good chance that replication will fail from that point onwards.
Setting up the master database so that it replicates data back from the slave would allow us to have a consistent data-set on both databases regardless of which we updated.
On db-02 edit /etc/mysql/my.cnf and configure it to keep a binary log of updates to the appropriate databases.
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 100M
binlog_do_db = my_application
binlog_ignore_db = mysql
binlog_ignore_db = test
Jump into MySQL on db-02 and grant replication slave privileges to the replication user on db-01.
# On db-02
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant replication slave on *.* to 'replication'@'db-01.vm.xeriom.net' identified by 'some_password';
Next, configure db-01 to replicate data using this account. Edit /etc/mysql/my.cnf and set the values of the new master on db-02.
master-host = db-02.vm.xeriom.net
master-user = replication
master-password = some_password
master-port = 3306
Restart MySQL on both boxes and check that the slaves are reading from the appropriate master (db-01 reads from db-02 and db-02 reads from db-01).
sudo /etc/init.d/mysql restart
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show slave status \G
If you don't get output that says Slave_IO_State: Waiting for master to send event, Slave_IO_Running: Yes and Slave_SQL_Running: Yes on both boxes then run through the troubleshooting section above.
If you've got this far your database is now running as a Master-Master cluster. Mmm, redundancy.
Heartbeat
The data is now replicated both ways across the network so our data is protected against one host going down, but at the moment we still need to configure our applications to use one host or the other: failover must be handled by the application.
I wrote previously about using Heartbeat to provide a high availability web tier. We'll use the same technique to provide a floating IP address for the database. Our applications will connect to this IP address, and Heartbeat will make sure it's pointing at a live database. Since the databases are replicating data between each other it doesn't matter which database node our applications end up connecting to.
Install and configure Heartbeat on both boxes.
sudo apt-get install heartbeat
Next we'll copy and customise the authkeys, ha.cf and haresources files from the sample documentation to the configuration directory.
sudo cp /usr/share/doc/heartbeat/authkeys /etc/ha.d/
sudo sh -c "zcat /usr/share/doc/heartbeat/ha.cf.gz > /etc/ha.d/ha.cf"
sudo sh -c "zcat /usr/share/doc/heartbeat/haresources.gz > /etc/ha.d/haresources"
The authkeys file should be readable only by root because it's going to contain a valuable password.
sudo chmod go-wrx /etc/ha.d/authkeys
Edit /etc/ha.d/authkeys and add a password of your choice so that it looks like below.
auth 2
2 sha1 your-password-here
Configure ha.cf according to your network. In this case the nodes are db-01.vm.xeriom.net and db-02.vm.xeriom.net. To figure out what your node names are, run uname -n on each of the database boxes. The values you use in the node directives in the configuration file must match the names reported by uname -n.
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
bcast eth0
udpport 694
auto_failback on
node db-01.vm.xeriom.net
node db-02.vm.xeriom.net
We need to tell Heartbeat which resource we want it to look after - in this case the floating IP address that will sit in front of MySQL. Edit haresources and make it look like the following - still on both machines.
db-01.vm.xeriom.net 193.219.108.243
This file must be identical on both nodes - even the hostname, which should be the output of uname -n on node 1. The IP address should be the unassigned IP address given above in the Preparation section.
Start heartbeat on db-01 then db-02.
sudo /etc/init.d/heartbeat start
This process takes quite a while to start up. Run tail -f /var/log/ha-log on both boxes to watch what's happening. After a while you should see db-01 say something about completing acquisition.
heartbeat[7734]: 2008/07/07_17:19:34 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[7739]: 2008/07/07_17:19:37 INFO: Running OK
heartbeat[7745]: 2008/07/07_17:19:37 info: Local Resource acquisition completed.
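If you want to confirm for yourself that db-01 has picked up the floating address (an extra check, not part of the original steps), look at the interface - Heartbeat adds the address as an alias on eth0:
# On db-01
ip addr show eth0 | grep 193.219.108.243
# or, with older tools
ifconfig eth0:0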
Testing it all works
Until now both boxes have been firewalled to allow MySQL connections only from each other. To prove that the database failover works we'll have to connect from another box, possibly your desktop or laptop. Find the public IP address of your chosen machine (here it's 193.214.108.10) and allow it through the firewall on both boxes, but only for connections to the Heartbeat-managed IP address.
# On both boxes
sudo iptables -I INPUT 3 -p tcp --dport mysql -s 193.214.108.10 -d 193.219.108.243 -j ACCEPT
Create a user which you can use to query the database and check the slave status, again on both boxes. The replication client privilege is global, so it's granted on *.* in a separate statement.
# on both boxes
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant all on my_application.* to 'some_user'@'193.214.108.10' identified by 'some_other_password';
mysql> grant replication client on *.* to 'some_user'@'193.214.108.10' identified by 'some_other_password';
mysql> \q
Now connect to the IP address Heartbeat is managing (193.219.108.243) from your test box and run a query to show the slave status.
mysql -u some_user -p -h 193.219.108.243 my_application
mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 193.219.108.242
[unimportant lines snipped]
Note that the master host is db-02. Stop Heartbeat on db-01 (or shut the box down) and run the query again. You should now see that the master host has changed to the IP address of the other node.
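Stopping Heartbeat is enough to trigger the failover:
# On db-01
sudo /etc/init.d/heartbeat stop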
Finally, bring Heartbeat back up on db-01 (or start the box if you stopped it) and run the query again. The master host should be the same as the first time.
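Bringing it back is the same init script:
# On db-01
sudo /etc/init.d/heartbeat start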
Auto increment offsets
To stop the two masters from generating clashing auto_increment values (which would break replication), check out avoiding auto_increment collision.
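The usual approach (a sketch of the standard MySQL variables, not spelled out in this post) is to give each master a different offset so they can never generate the same auto_increment value:
# /etc/mysql/my.cnf on db-01
auto_increment_increment = 2
auto_increment_offset    = 1
# /etc/mysql/my.cnf on db-02
auto_increment_increment = 2
auto_increment_offset    = 2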
Love me!
If you've found this article useful I'd appreciate beer and recommendations at Working With Rails.
curl -LO http://barkingiguana.com/2008/07/07/high-availability-mysql-on-ubuntu-804.html.orig
curl -LO http://barkingiguana.com/2008/07/07/high-availability-mysql-on-ubuntu-804.html.orig.asc
gpg --verify high-availability-mysql-on-ubuntu-804.html.orig{.asc,}
If you'd like to have a conversation about this post, email craig@barkingiguana.com. I don't bite.