Leo’s Ramblings

Mirroring our iSCSI SAN, continuing on…

We’re continuing on from this post (The sound of one Right Hand clapping… part 1), and we’re now going to mirror our iSCSI volumes between two iSCSI servers. It’s not fail-over yet, just mirroring for DR purposes; cluster round-robin and fail-over/fail-back will come a little later.

So, build two of the iSCSI servers we talked about in the previous article and give them identical numbers and sizes of disks to be used as LUNs.

In this case, we’ll use a one disk/one LUN configuration.
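Because DRBD mirrors whole block devices, it is worth confirming that the two disks really are the same size before going further. As far as I know, DRBD settles on the smaller of the two backing devices, so a mismatch quietly wastes space. Below is a minimal sketch, assuming bash and `blockdev --getsize64` from util-linux; the helper names `disks_match` and `disk_bytes` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: compare two disk sizes given in bytes.
# Prints a status line and returns 0 when they match, 1 otherwise.
disks_match() {
    local local_bytes=$1 peer_bytes=$2
    if [ "$local_bytes" -eq "$peer_bytes" ]; then
        echo "OK: both disks are $local_bytes bytes"
        return 0
    fi
    echo "MISMATCH: local=$local_bytes peer=$peer_bytes" >&2
    return 1
}

# Hypothetical helper: size of a local block device in bytes (needs root).
disk_bytes() {
    blockdev --getsize64 "$1"
}
```

Example usage, assuming root SSH access from iSCSI1 to iSCSI2: `disks_match "$(disk_bytes /dev/sdb)" "$(ssh iSCSI2.au.intranet blockdev --getsize64 /dev/sdb)"`.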

Do not configure the /etc/ietd.conf files on either server yet; we don’t want to move too fast.

Get your servers ready:

  1. Run: yum -y upgrade to upgrade the software on both servers
  2. Enable the extras repo in yum (edit /etc/yum.repos.d/CentOS-Base.repo and set enabled=1 in the [extras] definition)
  3. Run: yum -y install drbd82 heartbeat heartbeat-stonith kmod-drbd82-smp to install everything we need.
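The three preparation steps can be wrapped in a small script. This is a sketch only: it assumes bash and GNU sed, and it assumes the [extras] section already contains an `enabled=` line (the sed expression only rewrites that line inside [extras]; it does not add one). `prepare_node`, `enable_extras` and the `DRY_RUN` switch are hypothetical names for illustration.

```shell
#!/bin/bash
# Sketch of the "get your servers ready" steps for one node.
# Set DRY_RUN=1 to print the commands instead of running them.

REPO_FILE=${REPO_FILE:-/etc/yum.repos.d/CentOS-Base.repo}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Set enabled=1 only between the [extras] header and the next section header.
# Assumes the [extras] section already has an "enabled=" line.
enable_extras() {
    run sed -i '/^\[extras\]/,/^\[/ s/^enabled=.*/enabled=1/' "$1"
}

# Upgrade, enable extras, then install the packages named in the article.
prepare_node() {
    run yum -y upgrade
    enable_extras "$REPO_FILE"
    run yum -y install drbd82 heartbeat heartbeat-stonith kmod-drbd82-smp
}
```

Source the file, check the plan with `DRY_RUN=1 prepare_node`, then run `prepare_node` as root on each server.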

At this point it’s worth mentioning that there are in fact two generations of DRBD: the pre-8.0 series, which supports only single-primary mode, and the 8.0-and-later series, which adds dual-primary support.

This means that, when running a cluster file system that uses distributed locking, both cluster nodes can be active at the same time. VMFS 3.21/3.31 is such a file system.

So we’ll use DRBD 8.2 and have an active/active dual-primary replicated array.

Getting excited yet? (Because we haven’t even started)

Now let’s talk about replication. LeftHand Networks make a lot of the fact that their SAN offers a number of replication methods. DRBD has three:

Protocol A: Asynchronous replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has been placed in the local TCP send buffer. In the event of forced fail-over, data loss may occur. The data on the standby node is consistent after fail-over, however, the most recent updates performed prior to the crash could be lost.

Protocol B: Memory synchronous (semi-synchronous) replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has reached the peer node. Normally, no writes are lost in case of forced fail-over. However, in the event of simultaneous power failure on both nodes and concurrent, irreversible destruction of the primary’s data store, the most recent writes completed on the primary may be lost.

Protocol C: Synchronous replication protocol. Local write operations on the primary node are considered completed only after both the local and the remote disk write have been confirmed. As a result, loss of a single node is guaranteed not to lead to any data loss. Data loss is, of course, inevitable even with this replication protocol if both nodes (or their storage subsystems) are irreversibly destroyed at the same time.

By far, the most commonly used replication protocol in DRBD setups is protocol C.
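As a compact restatement of the three descriptions above, here is a tiny lookup of the point at which each protocol reports a write as complete. It adds nothing beyond the text; the function name `ack_point` is hypothetical.

```shell
#!/bin/bash
# When does each DRBD protocol consider a write on the primary complete?
# Restates the Protocol A/B/C descriptions; returns 1 for anything else.
ack_point() {
    case "$1" in
        A) echo "local disk write done + packet in local TCP send buffer" ;;
        B) echo "local disk write done + packet received by peer" ;;
        C) echo "local disk write done + peer disk write confirmed" ;;
        *) echo "unknown protocol: $1" >&2; return 1 ;;
    esac
}
```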

Configuring DRBD

  • Make sure each host has one dedicated GigE NIC for replication and another dedicated GigE NIC for iSCSI. For ease of use, we’ll call the servers iSCSI1.au.intranet and iSCSI2.au.intranet, with IPs 10.254.11.11 and 10.254.11.12 respectively on the replication NICs. The normal iSCSI NICs get 192.168.0.11 and 192.168.0.12 respectively.
  • Run: fdisk -l and note the disk you wish to share – in my case it’s /dev/sdb
  • Create a file (identical on both hosts) called /etc/drbd.conf with the following in it:

global {
    usage-count yes;
}
common {
    protocol C;
}
resource r0 {
    # Background resync rate: about 0.3 x usable link throughput in MB/s.
    # The M suffix matters: a bare number is read as KB/s.
    syncer {
        rate 33M;
    }
    # Enable dual-primary mode (active/active cluster)
    net {
        allow-two-primaries;
    }
    # Become primary on both nodes at startup
    startup {
        become-primary-on both;
    }
    device /dev/drbd1;     # Disk mirror device - use drbd1 for first, drbd2 for second
    disk /dev/sdb;         # Backing disk device node
    meta-disk internal;
    on iSCSI1.au.intranet {            # Hostname of first server (must match uname -n)
        address 10.254.11.11:7789;     # Replication IP of first server
    }
    on iSCSI2.au.intranet {            # Hostname of second server (must match uname -n)
        address 10.254.11.12:7789;     # Replication IP of second server
    }
}

You can download my copy of the file with proper formatting here.
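The rate value comes from a rule of thumb: roughly 30% of the throughput the replication link can actually sustain. Assuming a GigE link sustains about 110 MB/s in practice (measure your own), 0.3 × 110 = 33. As I understand drbd.conf(5) for DRBD 8, the default unit for `rate` is KB/s, so the sketch below always writes an explicit `M` suffix. `syncer_rate` and `syncer_stanza` are hypothetical helpers.

```shell
#!/bin/bash
# Rule of thumb: resync rate ~= 0.3 x sustainable link throughput (MB/s).
# Shell arithmetic is integer-only, so the result is rounded down.
syncer_rate() {
    local link_mb_s=$1
    echo $(( link_mb_s * 3 / 10 ))
}

# Emit a syncer stanza with an explicit M (megabytes per second) suffix.
syncer_stanza() {
    printf 'syncer {\n    rate %sM;\n}\n' "$(syncer_rate "$1")"
}
```

For example, `syncer_stanza 110` prints a block containing `rate 33M;`.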

  • Now, on both nodes, run: drbdadm create-md r0 ; drbdadm up r0 to create the DRBD metadata and bring up the replicated resource (r0)
  • Run: cat /proc/drbd and you should see Inconsistent/Inconsistent. This is because no sync has happened yet.
  • Run: drbdadm -- --overwrite-data-of-peer primary r0 on one of the servers to make it the sync source. This is needed for the initial sync because neither side holds valid data yet, so DRBD has to be told which copy to keep. It is also how, if one of your iSCSI servers dies at some point in the future, you can resync its replacement from the remaining server by running the command there.
  • Once the initial sync has finished, run drbdadm primary r0 on the other server as well, so that both nodes are Primary. Replication is now up: you can present the /dev/drbd1 disk as an iSCSI device on both nodes and attach it in ESX. Everything will replicate.
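The bring-up sequence above can be sketched as a set of bash functions, one per step, with a `DRY_RUN` switch so you can review the commands first. The function names and `DRY_RUN`/`RES` variables are hypothetical. The `promote_peer` step reflects my understanding that `become-primary-on` only takes effect when DRBD is started by its init script, so after a manual `drbdadm up` the second node stays Secondary until you promote it.

```shell
#!/bin/bash
# Sketch of the DRBD bring-up for one resource (r0 by default).
# DRY_RUN=1 prints the commands instead of executing them.
RES=${RES:-r0}

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Run on BOTH nodes: write DRBD metadata, then attach and connect.
init_node() {
    run drbdadm create-md "$RES"
    run drbdadm up "$RES"
}

# Run on ONE node only: overwrite the peer and become the sync source.
start_initial_sync() {
    run drbdadm -- --overwrite-data-of-peer primary "$RES"
}

# Run on the OTHER node once the sync has finished (dual-primary).
promote_peer() {
    run drbdadm primary "$RES"
}
```

Source the file and call `init_node` on both servers, `start_initial_sync` on one, and `promote_peer` on the other after `cat /proc/drbd` shows UpToDate/UpToDate.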

The iSCSI Component

Now we get to the fun part: enabling iSCSI. It’s done exactly the same way as in the first article (please follow the setup/install instructions there), except that the target disk has changed. The /etc/ietd.conf configuration is repeated below:

Target iqn.2000-12.com.blogsite.lraikhman:storage.lun1
    Lun 0 Path=/dev/drbd1,Type=fileio,ScsiSN=AMHDSK-061031-03
    Alias iDISK0

Note that the above configuration differs slightly from the one in the previous article: we’re exporting the serial number of the LUN (ScsiSN) so that clustering will not require a LUN rescan on ESX.
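Since both nodes export the same mirrored device, it seems safest to generate the target stanza from one set of values so the IQN and serial number cannot drift apart; as I understand the intent, that is what lets ESX treat the two exports as the same LUN. A minimal sketch; `ietd_target` is a hypothetical helper and only reproduces the three directives shown above.

```shell
#!/bin/bash
# Print an ietd.conf target stanza. Call it with the SAME arguments on
# both iSCSI nodes so the IQN and ScsiSN are identical everywhere.
ietd_target() {
    local iqn=$1 path=$2 sn=$3 lun_alias=$4
    printf 'Target %s\n' "$iqn"
    printf '    Lun 0 Path=%s,Type=fileio,ScsiSN=%s\n' "$path" "$sn"
    printf '    Alias %s\n' "$lun_alias"
}
```

For example, `ietd_target iqn.2000-12.com.blogsite.lraikhman:storage.lun1 /dev/drbd1 AMHDSK-061031-03 iDISK0 > /etc/ietd.conf` (this overwrites the file, so keep a backup).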

That’s it – you can now attach ESX servers to both nodes of the cluster (active/active synchronous replication), format with VMFS 3.x (Distributed Lock File System) and build a VM farm.
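Before pointing ESX at both nodes, it is worth confirming that the resource is connected, Primary on both sides and fully synced. Below is a sketch that parses `/proc/drbd` text with awk. The field labels (`cs:`, `ds:`, and the role field, which I recall as `st:` in DRBD 8.2 and `ro:` in later releases) are from my memory of the DRBD 8.x format, so check them against your own output. `dual_primary_ready` is a hypothetical name.

```shell
#!/bin/bash
# Succeeds only if DRBD minor $1 (read from stdin in /proc/drbd format) is
# Connected, Primary/Primary and UpToDate/UpToDate. Accepts st: or ro:.
dual_primary_ready() {
    local minor=$1
    awk -v m="$minor:" '
        $1 == m {
            found = 1
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")      # e.g. "cs:Connected" -> cs, Connected
                state[kv[1]] = kv[2]
            }
        }
        END {
            role = ("st" in state) ? state["st"] : state["ro"]
            ok = found && state["cs"] == "Connected" &&
                 role == "Primary/Primary" &&
                 state["ds"] == "UpToDate/UpToDate"
            exit (ok ? 0 : 1)
        }'
}
```

Usage on each node: `dual_primary_ready 1 < /proc/drbd && echo "drbd1 ready for ESX"`.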

The beauty of this solution is that the two iSCSI servers communicate and replicate over IP, so they could sit at different sites, each attached to completely different ESX servers, and still function. This gives a form of manual DR if one iSCSI node goes down: just re-register the VMs on the ESX servers attached to the other iSCSI node.

This works equally well for XenServer, Virtuozzo and any other solution (including Hyper-V, I think) that uses a shared storage model with a compatible file system.

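Obviously, if you don’t like synchronous copy you can change to Protocol A or B, as listed above. Bear in mind, though, that DRBD’s dual-primary mode requires Protocol C, so this only applies if you go back to a single-primary setup.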
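Switching protocols means editing the `protocol` line identically on both nodes. Below is a sketch of that edit which refuses anything other than C while `allow-two-primaries` is still active in the file, since dual-primary needs synchronous replication. It assumes GNU sed; `set_protocol` is a hypothetical helper. Afterwards you would apply the change with `drbdadm adjust r0` on both nodes; I have not checked whether DRBD 8.2 applies a protocol change without a reconnect, so test this before relying on it.

```shell
#!/bin/bash
# Rewrite "protocol X;" in a drbd.conf-style file to the given protocol.
# Refuses A/B while allow-two-primaries is still set (needs protocol C).
set_protocol() {
    local conf=$1 proto=$2
    case "$proto" in
        A|B|C) ;;
        *) echo "invalid protocol: $proto" >&2; return 1 ;;
    esac
    if [ "$proto" != "C" ] && grep -Eq '^[[:space:]]*allow-two-primaries' "$conf"; then
        echo "dual-primary needs protocol C; remove allow-two-primaries first" >&2
        return 1
    fi
    sed -i -E "s/^([[:space:]]*protocol[[:space:]]+)[ABC];/\1${proto};/" "$conf"
}
```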

Have fun :)

7 Comments

  1. Are you sure that your setup works? I mean, are you sure you don’t lose data? I ask because I was pretty sure too, but on the DRBD mailing list someone told me that the Linux iscsi-target is not built for shared disk access (perhaps due to write caching?), so data loss can happen.

    Thanks in advance for any reply.

  2. Leo says:

    It seemed to work for me.

    iscsi-target can be used for shared disk access without a problem: it only provides a protocol that any number of servers can connect to.

    Furthermore, there is no problem with providing shared disk to a number of servers as long as there is a Distributed Locking File System.

    In fact, here is DRBD’s own view on the matter: http://www.drbd.org/users-guide/s-dual-primary-mode.html

    In terms of iscsi-target, it only provides the protocol, I’ve attached test ESX servers to Linux iscsi-target implementation demos for years.

    Cheers,
    Leo

    :)

  3. Alan Wilson says:

    Hi,

    You have an error in your drbd.conf: you’re missing a “;” after “rate 33”, otherwise drbdadm grumbles greatly! :)

  4. Leo says:

    Fixed.

    Thanks :)

  5. Tony Corley says:

    Sorry, Leo, it doesn’t work. The kernel required by iscsitarget is more recent than the latest kernel supported by drbd82. drbd82 will work if I remove kernel-smp and install kmod-drbd82-smp, but that prevents me from installing iscsitarget, as it requires the updated kernel. This has caused me such grief that I would never think of putting it into production!

  6. Dmitry R says:

    Hi Leo,

    Thank you for your guide. It is easy to read and use. I used it to build a fault-tolerant datastore for two ESX 4.0 hosts. I used Openfiler for DRBD and the iSCSI target, as it has everything right out of the box. Both ESX hosts easily discovered the iSCSI disk and recognized it as having 2 paths. However, each ESX host assigns different states to these paths: one becomes (Active I/O) and the other is just (Active), and only the (Active I/O) path can be used at a time. Moreover, if the (Active I/O) path becomes unavailable, ESX doesn’t switch to the other path. It just marks the dead path as (dead) and stops working with the iSCSI disk until you restore the dead path. The only way to switch ESX to the other path is to rescan the iSCSI adapter on each ESX host, and even that doesn’t always help. If there is pending I/O to the dead path, only a reboot of ESX helps.

    Do you have any idea how to get ESX to work with both paths of iSCSI disk?
    Any help would be very much appreciated.

    Best Regards, Dmitry
