MySQL MultiMaster Replication

You may know about the MySQL Cluster, which is a complex architecture to achieve high availability and performance. One of the advantages of MySQL Cluster is that each node is a peer to the others, whereas in a normal replicating system you have a master and many slaves, and applications must be careful to write only to the master.

The main disadvantages of MySQL Cluster are (as of MySQL 5.0):

  • The database is in memory only, thus requiring more resources than a normal MySQL database. (MySQL 5.1 introduces table spaces, with the capability of storing nonindexed data on disk.)
  • Some normal features are not available, such as full-text searches, referential integrity, and transaction isolation levels higher than READ COMMITTED.

There are some cases where the MySQL Cluster is the perfect solution, but for the vast majority, replication is still the best choice.

Replication, too, has its problems, though:

There is an annoying distinction between master and slaves. Your applications must be replication-aware, so that they write to the master and read from the slaves. It would be so nice to have a replication array where you could use all the nodes in the same way, and every node could be master and slave at the same time.

Fixing this misfeature is the purpose of this article. Using features introduced in MySQL 5.0 and 5.1, it is possible to build a replication system where all nodes act as master and slave at the same time.

Setting Up a Multimaster Replication System

For those of you not well acquainted with replication basics, I refer you to an earlier article, MySQL Database Replication HowTo.

Back to business. Consider the situation where you set up a replication system with more than one master. This has been a common scenario over the past several years.

One hard-to-solve problem in a multimaster replication is the conflict that can happen with self-generated keys. The AUTO_INCREMENT feature is quite convenient, but in a replication environment it will be disruptive. If node A and node B both insert an auto-incrementing key on the same table, conflicts arise immediately.

Rescue comes with recent versions. MySQL 5 introduces a couple of server variables for replicated auto-increment that address this specific problem and allow for the creation of an array of peer-to-peer nodes with MySQL replication.

Quoting from the manual:

  • auto_increment_increment controls the increment between successive AUTO_INCREMENT values.
  • auto_increment_offset determines the starting point for AUTO_INCREMENT column values.

By choosing non-conflicting values for these variables on different masters, servers in a multiple-master configuration will not use conflicting AUTO_INCREMENT values when inserting new rows into the same table. To set up N master servers, set the variables like this:

  • Set auto_increment_increment to N on each master.
  • Set each of the N masters to have a different auto_increment_offset, using the values 1, 2, … , N.

Using those two variables as described in the manual, you can ensure that all nodes in your replication array will use different sequences of auto-incrementing numbers. For example, using auto_increment_increment = 10 and auto_increment_offset=3, the numbers generated when inserting three records will be 3, 13, 23. Using 10, 7, you’ll get 7, 17, 27, and so on.
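The arithmetic behind these settings can be sketched in a few lines. This is an illustrative Python sketch, not MySQL internals: it assumes each generated key is simply offset plus a multiple of the increment, which is how the sequences in this article come out.

```python
def auto_increment_sequence(increment, offset, count):
    """Reproduce the ids generated under the given
    auto_increment_increment and auto_increment_offset."""
    return [offset + k * increment for k in range(count)]

# The two examples from the text:
print(auto_increment_sequence(10, 3, 3))  # [3, 13, 23]
print(auto_increment_sequence(10, 7, 3))  # [7, 17, 27]

# Four nodes sharing increment=10 with offsets 1..4
# can never generate the same id:
seqs = [set(auto_increment_sequence(10, o, 1000)) for o in range(1, 5)]
assert not set.intersection(*seqs)
```

As long as the increment is the same on every node and is at least as large as the number of nodes, the per-node sequences are disjoint.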

For my 4 node array, I set auto_increment_increment to 10 for each node, and auto_increment_offset to 1 in the first node, 2 in the second, and so on.

This is theoretically clear, but it may not yet be obvious how I transformed these servers into peer-to-peer nodes.

The answer is a circular replication, where each node is master of the following node and slave of the preceding one.

Circular replication with two nodes

In its simplest form, circular replication has two nodes, where each one is at the same time master and slave of the other.

Circular replication between two nodes

For this test, I used two servers in my company (jal and vayu; there will soon be two more, named agni and prithvi). Their basic configuration is:

# node A - (jal) setup
[mysqld]
server-id = 10
# auto_increment_increment = 10
# auto_increment_offset = 1
master-host = vayu.mycompany.com
master-user = nodeAuser
master-password = nodeApass

# node B - (vayu) setup
[mysqld]
server-id = 20
# auto_increment_increment = 10
# auto_increment_offset = 2
master-host = jal.mycompany.com
master-user = nodeBuser
master-password = nodeBpass

Notice the two magic variables in the configuration files. If you omit these variables (or comment them out, as in this example), something nasty may happen, and the unfortunate circumstances are easy to demonstrate. Remember that MySQL replication is asynchronous: the replication process on the slave can happen at a different time than the one taking place on the master. This makes replication more resilient and ensures that even if the connection between master and slave breaks down, replication will continue when the slave reconnects.

However, this feature has a nasty side effect when you deal with auto-incremented values. Assume that you have a table like this:

CREATE TABLE repadventure (
id int(11) NOT NULL AUTO_INCREMENT,
c char(10) DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Assume also that the connection between node A and node B breaks for a moment. Suppose you issue an INSERT statement in both servers, while the replication is not working and the auto_increment variables are not set:

[node A] insert into repadventure values (null, 'aaa'), (null, 'bbb'), (null, 'ccc');

[node B] insert into repadventure values (null, 'xxx'), (null, 'yyy'), (null, 'zzz');

When replication resumes, you get a blocking error in both nodes:

Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database:
'test'. Query: 'insert into repadventure values (null, 'aaa')'

The reason is easy to discover:

[node A] select * from repadventure;
+----+------+
| id | c    |
+----+------+
| 1  | aaa  |
| 2  | bbb  |
| 3  | ccc  |
+----+------+
[node B] select * from repadventure;
+----+------+
| id | c    |
+----+------+
| 1  | xxx  |
| 2  | yyy  |
| 3  | zzz  |
+----+------+

Both nodes have produced the same primary keys. Thus, when replication resumed, the DBMS rightly complained about the duplicates. Now activate those two variables to see what happens.

[node A] set auto_increment_increment = 10;
[node A] set auto_increment_offset = 1;

[node B] set auto_increment_increment = 10;
[node B] set auto_increment_offset = 2;

Clean the errors, delete all the records in the test table, and redo the insertion (after stopping the replication, to simulate a communication breakdown):

[node A] SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; start slave;
[node B] SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; start slave;
[node A] truncate repadventure;
[node A] stop slave ;
[node B] stop slave ;

[node A] insert into repadventure values (null, 'aaa'), (null, 'bbb'), (null, 'ccc');

[node B] insert into repadventure values (null, 'xxx'), (null, 'yyy'), (null, 'zzz');

Now the situation is different.

[node A] select * from repadventure;
+-----+------+
|  id | c    |
+-----+------+
|  1  | aaa  |
| 11  | bbb  |
| 21  | ccc  |
+-----+------+
[node B] select * from repadventure;
+-----+------+
|  id | c    |
+-----+------+
|  2  | xxx  |
| 12  | yyy  |
| 22  | zzz  |
+-----+------+

Thus, when replication resumes, there is no conflicting error. This proves it. Choosing appropriate values for the auto_increment_increment and auto_increment_offset server variables prevents conflicts between auto-generated keys in this circular replication setup. QED.

Adding more nodes

Having just two nodes could be what you need today, but as your application grows and you need to scale your database environment, you will need to add more nodes into the mix. It’s easy to extend this array to use four nodes.

Circular replication with four nodes

In this broader schema, jal is the master of vayu and slave of prithvi, which is slave of agni; and this last is in turn slave of vayu, thus completing the circle. The boxed numbers next to each server indicate the server ID, which must be different for each node; the auto_increment_increment, the same for all the nodes; and the auto_increment_offset, which guarantees the uniqueness of self-generated keys.
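The ring wiring just described can be expressed compactly. Here is a small illustrative Python sketch (node names taken from this article): each node's master is simply its predecessor in the list, with wraparound closing the circle.

```python
# Circular replication: each node is slave of the previous node
# in the ring and master of the next one.
nodes = ["jal", "vayu", "agni", "prithvi"]

def master_of(node):
    """Return the node this one replicates from (its ring predecessor)."""
    i = nodes.index(node)
    return nodes[(i - 1) % len(nodes)]

for n in nodes:
    print(f"{n} replicates from {master_of(n)}")
# jal replicates from prithvi, vayu from jal,
# agni from vayu, and prithvi from agni: the circle is closed.
```

This matches the master-host settings in the configuration files below: each node's master-host points at its predecessor.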

Here is the complete setup for all nodes:

# node A - jal
[mysqld]
server-id = 10
log-bin = mysql-bin
log-slave-updates
replicate-same-server-id = 0
auto_increment_increment = 10
auto_increment_offset = 1
master-host = prithvi.mycompany.com
master-user = nodeAuser
master-password = nodeApass
report-host = nodeA

# Node B - vayu
[mysqld]
server-id = 20
log-bin = mysql-bin
log-slave-updates
replicate-same-server-id = 0
auto_increment_increment = 10
auto_increment_offset = 2
master-host = jal.mycompany.com
master-user = nodeBuser
master-password = nodeBpass
report-host = nodeB

# Node C - agni
[mysqld]
server-id = 30
log-bin = mysql-bin
log-slave-updates
replicate-same-server-id = 0
auto_increment_increment = 10
auto_increment_offset = 3
master-host = vayu.mycompany.com
master-user = nodeCuser
master-password = nodeCpass
report-host = nodeC

# Node D - prithvi
[mysqld]
server-id = 40
log-bin = mysql-bin
log-slave-updates
replicate-same-server-id = 0
auto_increment_increment = 10
auto_increment_offset = 4
master-host = agni.mycompany.com
master-user = nodeDuser
master-password = nodeDpass
report-host = nodeD

A few variables are worth noting in these configuration files. The first is log-slave-updates. This option tells each server to write the changes it receives from its master through the relay log to its own binary log. Without it, cascaded replication doesn't work. The option replicate-same-server-id serves to avoid infinite replication loops, effectively telling each node to ignore any statement in its master's binary log that originated with its own server ID.
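The loop-prevention logic can be mimicked with a toy event filter. This is an illustrative sketch, not MySQL internals: it assumes each logged event carries the server ID where it originated, and that a node applies and forwards only events whose origin differs from its own ID.

```python
def apply_events(node_id, events):
    """Apply (and re-log for the next node) only events that did not
    originate here. Each event is an (origin_server_id, statement) tuple."""
    applied = []
    for origin, stmt in events:
        if origin == node_id:
            continue  # our own event came full circle: drop it
        applied.append((origin, stmt))
    return applied

# An event born on server 10 travels the ring: 10 -> 20 -> 30 -> 40 -> 10
relay = [(10, "INSERT ...")]
for node in (20, 30, 40, 10):
    relay = apply_events(node, relay)
print(relay)  # [] -- server 10 discarded its own event, ending the loop
```

Servers 20, 30, and 40 each apply and forward the event; when it arrives back at server 10, the matching server ID stops it.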

auto_increment_increment and auto_increment_offset have the appropriate values, as explained earlier. The rest is normal replication administration.

Here’s an example of independent usage:

[node A] stop slave;
[node B] stop slave;
[node C] stop slave;
[node D] stop slave;
[node A] insert into test.repadventure values (null, 'a'), (null, 'aa'), (null, 'aaa');
[node B] insert into test.repadventure values (null, 'b'), (null, 'bb'), (null, 'bbb');
[node C] insert into test.repadventure values (null, 'c'), (null, 'cc'), (null, 'ccc');
[node D] insert into test.repadventure values (null, 'd'), (null, 'dd'), (null, 'ddd');

With the replication stopped, enter in each node three records with independently generated keys. The result is a set of nonconflicting records.

[node A] select * from test.repadventure;
+----+------+
| id | c    |
+----+------+
| 1  | a    |
| 11 | aa   |
| 21 | aaa  |
+----+------+
[node B] select * from test.repadventure;
+----+------+
| id | c    |
+----+------+
| 2  | b    |
| 12 | bb   |
| 22 | bbb  |
+----+------+
[node C] select * from test.repadventure;
+----+------+
| id | c    |
+----+------+
| 3  | c    |
| 13 | cc   |
| 23 | ccc  |
+----+------+
[node D] select * from test.repadventure;
+----+------+
| id | c    |
+----+------+
| 4  | d    |
| 14 | dd   |
| 24 | ddd  |
+----+------+

When you resume the replication flow, the table contents replicate everywhere:

[node A] start slave;
[node B] start slave;
[node C] start slave;
[node D] start slave;
[node A] select * from test.repadventure;
+----+------+
| id | c    |
+----+------+
| 1  | a    |
| 11 | aa   |
| 21 | aaa  |
| 4  | d    |
| 14 | dd   |
| 24 | ddd  |
| 3  | c    |
| 13 | cc   |
| 23 | ccc  |
| 2  | b    |
| 12 | bb   |
| 22 | bbb  |
+----+------+
[node B] select count(*) from test.repadventure;
+----------+
| count(*) |
+----------+
| 12       |
+----------+
[node C] select count(*) from test.repadventure;
+----------+
| count(*) |
+----------+
| 12       |
+----------+
[node D] select count(*) from test.repadventure;
+----------+
| count(*) |
+----------+
| 12       |
+----------+

Of course, circular replication is as fragile as normal replication when it comes to conflicting keys. Inserting the same non-auto-generated primary or unique key on two different nodes will disrupt replication just as it does in normal master-slave replication. With asynchronous replication this can happen, although you need to be particularly unlucky for it to occur. Good programming practice will avoid most of the problems (short of communication failures, that is).

The two important issues here are that you can use circular replication almost seamlessly in any application that now uses a single database server, and that performance is adequate for our purposes.

Testing replication speed

The testing code is a Perl script.

#!/usr/bin/perl

use strict;
use warnings;

use English qw( -no_match_vars );
use DBI;
use Time::HiRes qw( usleep );

my @conf_file        = ();
my $max_config_index = 3;
my $current_config   = 0;

# one option file per node, A through D
for ( 'A' .. 'D' ) {
    push @conf_file, "$ENV{HOME}/circ_replica/my.node$_.cnf";
}

# cycle through the nodes round-robin, returning a handle to the next one
sub get_connection {
    my ($config_files) = @_;
    my $config_file = $config_files->[$current_config];
    $current_config++;
    if ( $current_config > $max_config_index ) {
        $current_config = 0;
    }

    my $dbh;
    eval {
        $dbh = DBI->connect(
            "dbi:mysql:test"
              . ";mysql_read_default_file=$config_file",
            undef,
            undef,
            { RaiseError => 1 }
        ) or die "Can't connect: $DBI::errstr\n";
    };
    if ($EVAL_ERROR) {
        print STDERR $EVAL_ERROR;
        return;
    }
    return $dbh;
}

my $dbh = get_connection( \@conf_file );
$dbh->do(qq{truncate repadventure});
my $bigtext = 'a' x 1000;

for my $loop ( 1 .. 10 ) {
    my $dbh1 = get_connection( \@conf_file );
    my $dbh2 = get_connection( \@conf_file );
    # q{} rather than qq{} keeps Perl from interpolating @@
    my ($server1) = $dbh1->selectrow_array(q{select @@server_id});
    my ($server2) = $dbh2->selectrow_array(q{select @@server_id});
    for ( 1 .. 3 ) {
        $dbh2->do(
            qq{insert into repadventure values (null, concat("server ", ?), ?)},
            undef, $server1, $bigtext
        );
    }
    usleep(1);
    my $count = $dbh1->selectrow_array(
        qq{select count(*) from repadventure});
    print "inserted a record from server $server2\n",
          "retrieved $count records from $server1\n";
}

After removing all records from the test table (which now has a third MEDIUMTEXT column), the code loops through the nodes, at each step getting a node ($dbh2) and its master ($dbh1). Immediately after inserting three records into the slave node, it queries the master for a record count, after a pause of one microsecond (usleep). A sample session follows:

inserted a record from server 30
retrieved 3 records from 20
inserted a record from server 10
retrieved 6 records from 40
inserted a record from server 30
retrieved 9 records from 20
inserted a record from server 10
retrieved 12 records from 40
inserted a record from server 30
retrieved 15 records from 20
inserted a record from server 10
retrieved 18 records from 40
inserted a record from server 30
retrieved 21 records from 20
inserted a record from server 10
retrieved 24 records from 40
inserted a record from server 30
retrieved 27 records from 20
inserted a record from server 10
retrieved 30 records from 40

With this simple demonstration, I believe I have shown that circular replication arrays in MySQL 5 are a viable alternative to expensive clusters. However, circular replication does not scale well. When the number of nodes grows to more than 10 nodes, the speed of replication may not be up to the expectations of normal business operations.

Keep in mind, though: if a small number of servers is enough for your business, an array of replication nodes could be just what you need.

Reference: [http://onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html]
