DRBD: For When it Absolutely, Positively, Has to be in Sync

LINBIT develops DRBD, which stands for Distributed Replicated Block Device. As the name implies, it replicates a block device between two servers. DRBD was designed for use in “High Availability” (HA) clusters and is conceptually similar to a RAID 1 (mirroring) setup, with the mirror running over the network.

In August 2009, Logicworks announced an enterprise solution, built on software-based services from LINBIT, that lowers the cost and boosts the performance and availability of managed cloud services. The full press release can be read here.

Let’s say you have two servers and a MySQL database that has to stay up, even if the server it is on crashes. Without some form of HA, if the server hosting MySQL goes away, so does the MySQL database. To provide HA, DRBD inserts itself into the I/O stack and proxies all block-level operations, writing the data to the local disk and to the disk on the second server at the same time. When the time comes to fail over to the standby server, a script moves the IP address over from the primary, mounts the DRBD filesystem, and starts MySQL. Since every write has already been replicated, no data is lost during the failover.
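To make the mirroring idea concrete, here is a toy model in Python. It is only a sketch: the MirroredDevice class and its two in-memory "disks" are made up for illustration and have nothing to do with DRBD's actual kernel code or network protocol. The point is just that a write is acknowledged only after both copies hold the same bytes.

```python
class MirroredDevice:
    """Toy model of a replicated block device (hypothetical, not DRBD code)."""

    def __init__(self, size: int) -> None:
        self.local = bytearray(size)  # stands in for the primary's disk
        self.peer = bytearray(size)   # stands in for the standby's disk

    def write(self, offset: int, data: bytes) -> bool:
        """Apply the write to both copies, then acknowledge it."""
        end = offset + len(data)
        if offset < 0 or end > len(self.local):
            raise ValueError("write outside device bounds")
        self.local[offset:end] = data
        # In a real setup this step crosses the network to the second server.
        self.peer[offset:end] = data
        return True


dev = MirroredDevice(size=16)
dev.write(0, b"mysql-row")
print(dev.local == dev.peer)
```

Because the standby copy is byte-for-byte identical after every acknowledged write, it can be mounted and used as-is when the primary disappears.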

There are a few important things that DRBD will not help with. If there is corruption in the filesystem, DRBD will happily replicate the corruption between nodes. This is because DRBD has no knowledge of what is happening farther up the I/O stack, and therefore has no way of detecting that such corruption is occurring. Further, DRBD cannot provide instant failover between nodes. Failover is fast, but it’s not on the same level as MySQL Cluster. If you are using cluster tools like Heartbeat with Mon, or Pacemaker, they must first detect that a failure has occurred, then add the IP address, take over the primary DRBD role, mount the filesystem, and start the service. Each of these steps takes time. It is not a lot, but it might be noticeable, depending on the sensitivity of your environment.
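The takeover sequence can be sketched as an ordered list of commands. This is a dry run that only prints what a failover script might execute. Every value here is an assumption for illustration: the address 192.0.2.10/24, the interface eth0, the resource name r0, the device /dev/drbd0, the mount point /var/lib/mysql, and the use of systemctl to start the service. Failure detection, which happens before any of this, is left out.

```python
import shlex


def failover_commands(vip, interface, resource, device, mountpoint, service):
    """Return the takeover steps in the order described above."""
    return [
        ["ip", "addr", "add", vip, "dev", interface],  # add the service IP
        ["drbdadm", "primary", resource],              # take over the primary role
        ["mount", device, mountpoint],                 # mount the replicated filesystem
        ["systemctl", "start", service],               # start the service (assumed init system)
    ]


# All values below are hypothetical placeholders.
steps = failover_commands(
    vip="192.0.2.10/24",
    interface="eth0",
    resource="r0",
    device="/dev/drbd0",
    mountpoint="/var/lib/mysql",
    service="mysql",
)
for step in steps:
    print(shlex.join(step))
```

The order matters: the filesystem cannot be mounted until the node holds the primary role, and MySQL cannot start until its data directory is mounted. Each step has to finish before the next begins, which is where the failover time comes from.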

Kyle Khultman, Senior Engineering Expert at Logicworks, commented on Ostatic.com. He had this to say:

While many people may want to incorporate some of the features and protection of Heartbeat, I like to use either ucarp or keepalived. Both of these really just incorporate the VRRP protocol to Linux, and ucarp is a port of BSD’s carp utility – for those familiar with that. I do disagree with you about failover time though. We do run clusters here that have sub-5-second failover times. This does take some extreme engineering precision to accomplish, but it is possible.

[Full Article available here.]


About noneil
Rapper turned Rockstar!
