This post was originally published at quasardb.

On Amazon’s EC2, using EBS as the backend storage for your application has been the de facto standard. Using the local storage of an EC2 instance is risky: its data is lost when the instance is stopped or terminated, and it is not replicated. As such, EBS, which is Amazon’s equivalent of a SAN, should be the default choice.

However, if data loss is acceptable, there are a lot of valid reasons to prefer local instance storage over EBS:

performance is better: all other things being equal, local disks are faster than a SAN, no matter how big the fibre link to it is;

there is no single point of failure: several AWS outages have been EBS-related, and these could have been avoided by using instance storage;

it is cheaper: you don’t pay an additional monthly fee for EBS storage, and since the instance storage is included in the price you already pay for the instance, why not use it?

Whether this trade-off makes sense depends heavily on the customer’s use case: they can still choose not to use instance storage, simply by not assigning any instance store volumes to the instance.

Instance storage does come with a challenge when you want to prepare an AMI that can be deployed in any environment: you have to detect the environment on boot and prepare the filesystem automatically. For quasardb I wrote a boot script that does exactly that, and this post guides you through setting up such a script yourself.

Volume management

EC2 instances expose instance storage as block devices: larger instance types provide multiple instance store volumes, and ideally we want to access all of them under a single mount point.

There are two viable approaches to this:

mdraid, Linux’s software RAID implementation;

LVM, a system for mapping block devices to virtual volumes.
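For comparison, the mdraid route comes down to a single mdadm call that stripes the devices into one array. The sketch below only prints that command and does not run it. The helper name plan_mdraid, the array name /dev/md0, and the choice of RAID level 0 (pure striping, no redundancy) are my own assumptions for illustration and are not part of the quasardb script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the quasardb boot script): prints,
# but does not execute, the mdadm command that would stripe the given
# block devices into a single RAID 0 array. /dev/md0 is an assumed name.
function plan_mdraid {
    echo "mdadm --create /dev/md0 --level=0 --raid-devices=$# $*"
}
```

Called as plan_mdraid /dev/xvdb /dev/xvdc, it prints an mdadm command for a two-device array. You would then put a filesystem on /dev/md0, exactly as with the LVM volume later on.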

LVM

LVM supports striping, and its RAID modes (LVM RAID) actually use the kernel’s mdraid code under the hood. Because LVM abstracts mdraid away, it is less useful for the power user: you cannot use mdadm on it, rebuilding disks is a little more troublesome, and so on.

However, we use multiple devices to increase performance, not to recover data, so we do not need these features. For that purpose, LVM and mdraid offer roughly equal performance. Since LVM is more flexible and user-friendly, I will be using LVM.

LVM abstracts away raw block devices and presents them as virtual block devices (called logical volumes). You can grow or shrink a volume, add new disks, remove disks, and so on.
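To make that flexibility concrete, here is roughly what it takes to add a disk and grow the volume into it. This is again a dry-run sketch that prints commands and does not execute them. The helper name, the new device /dev/xvdd, and the logical volume path /dev/data/lvol0 are hypothetical. The volume group name data matches the one used later in this post, and lvol0 is the name lvcreate picks when none is given. The resize2fs step assumes an ext4 filesystem, which is what this post creates.

```shell
#!/usr/bin/env bash
# Illustrative dry run: prints the LVM commands that would add a new
# disk to the 'data' volume group, then grow the logical volume and
# its ext4 filesystem into the new space. Arguments are assumptions.
function plan_grow_volume {
    local new_device=$1
    local lv_path=$2
    echo "pvcreate ${new_device}"
    echo "vgextend data ${new_device}"
    echo "lvextend -l +100%FREE ${lv_path}"
    echo "resize2fs ${lv_path}"
}
```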

Detecting existing volumes

The first thing we need to automate is detecting any existing LVM volumes. If someone reboots an EC2 instance (for example, after a kernel upgrade), we do not want to lose any data. We will detect existing volumes using lvdisplay:

LVDISPLAY="/sbin/lvdisplay"

# Prints the paths of all LVM logical volumes, or nothing if there are
# none. Echoing the unquoted $(...) joins lvdisplay's one-per-line
# output with spaces.
function detect_volume {
    echo $(${LVDISPLAY} | grep 'LV Path' | awk '{print $3}')
}

# Similar to detect_volume, but fails if no volume is found.
function get_volume {
    local VOLUME=$(detect_volume)
    if [[ -z ${VOLUME} ]]
    then
        echo "Fatal error: LVM volume not found!" 1>&2
        exit 1
    fi
    echo ${VOLUME}
}

Together, detect_volume and get_volume give you the path to the virtual block device that LVM provides. If no volume exists, get_volume also prints an error and exits with a non-zero status. The astute reader will notice that this will in fact return multiple LVM volumes when more than one is available. In my specific use case, this is not an issue unless our customers start fiddling with LVM configurations themselves.
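Running lvdisplay requires root and real LVM metadata, so the parsing step is easiest to check in isolation. The sketch below is a hypothetical helper, parse_lv_paths. It applies the same filter as detect_volume, using a single awk pattern in place of the grep/awk pair, to lvdisplay-style text read from stdin. The sample text in the test is made up to mimic lvdisplay's layout. Unlike detect_volume, this helper keeps one path per line, which makes the multiple-volume case easy to spot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extracts the "LV Path" values from
# lvdisplay-style text on stdin, one path per line.
function parse_lv_paths {
    awk '/LV Path/ {print $3}'
}
```

If you prefer output designed for scripts, lvs --noheadings -o lv_path should list the same paths, although the output is indented with leading whitespace.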

Detecting block devices

Once we have determined that we do not have an LVM volume yet, we must know which block devices to consider. Our EC2 instances have /dev/xvda as their root device, which for our AMI is always an EBS volume. So we must find out how many other block devices with the same prefix are available. We can use bash’s -b test to determine whether a path is in fact a block device:

# Detects all local block devices present on the machine, skipping
# the first (which is assumed to be root).
function detect_devices {
    local PREFIX=$1
    local DEVICE
    for x in {b..z}
    do
        DEVICE="${PREFIX}${x}"
        if [[ -b ${DEVICE} ]]
        then
            echo "${DEVICE}"
        fi
    done
}

This will return a newline-separated string of all block devices we should use to create our LVM volume.
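One case is worth guarding against: an instance launched without any instance store volumes, which, as mentioned above, is the customer's choice. In that case detect_devices prints nothing, and create_volume would be called without arguments. Below is a minimal sketch that collects the output into an array and fails early when it is empty. It assumes bash 4 or later for mapfile. require_devices is a hypothetical name, and detect_devices is repeated here (with DEVICE made local) so that the block runs on its own.

```shell
#!/usr/bin/env bash
# detect_devices as above: prints every existing block device
# ${prefix}b .. ${prefix}z, one per line.
function detect_devices {
    local prefix=$1
    local x device
    for x in {b..z}; do
        device="${prefix}${x}"
        if [[ -b ${device} ]]; then
            echo "${device}"
        fi
    done
}

# Hypothetical wrapper: reads the device list into an array and
# returns non-zero (with a message on stderr) when it is empty.
function require_devices {
    local -a devices
    mapfile -t devices < <(detect_devices "$1")
    if (( ${#devices[@]} == 0 )); then
        echo "No instance storage devices found for prefix $1" 1>&2
        return 1
    fi
    printf '%s\n' "${devices[@]}"
}
```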

Creating an LVM volume

This step performs the most important grunt work. It bundles the block devices into a volume group, creates a logical volume that uses all the space in that volume group, and creates a filesystem on top of it:

PVCREATE="/sbin/pvcreate"
VGCREATE="/sbin/vgcreate"
LVCREATE="/sbin/lvcreate"

MKFS="/sbin/mkfs -t ext4"

# Creates a new LVM volume. Accepts the block devices to use as
# physical storage as arguments.
function create_volume {
    for device in "$@"
    do
        ${PVCREATE} "${device}"
    done

    # Creates a new volume group called 'data' which pools all
    # available block devices.
    ${VGCREATE} data "$@"

    # Create a logical volume with all the available storage space
    # assigned to it.
    ${LVCREATE} -l 100%FREE data

    # Create a filesystem on the new logical volume so we can mount it.
    # get_volume runs in a subshell here, so check its status explicitly.
    local VOLUME
    VOLUME=$(get_volume) || return 1
    ${MKFS} "${VOLUME}"
}
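A caveat about the lvcreate call above: without further options, lvcreate allocates a linear volume, filling one physical volume before moving on to the next. This pools the capacity of all disks, but a single sequential stream mostly hits one disk at a time. If the aim is the performance gain discussed earlier, you can instead stripe the logical volume across all physical volumes with -i. Below is a sketch of that variant. It assumes equally sized instance store disks, which is normally the case within one instance type. create_striped_lv and the DRY_RUN switch are my own illustrative additions, and by default the command is only printed.

```shell
#!/usr/bin/env bash
# Hypothetical striped variant of the lvcreate step in create_volume.
# -i sets the number of stripes: one per physical volume passed in.
# By default the command is only printed; run with DRY_RUN= (empty)
# to execute it for real, which needs root and an existing 'data' VG.
LVCREATE="/sbin/lvcreate"
DRY_RUN="${DRY_RUN-echo}"

function create_striped_lv {
    local stripes=$#
    ${DRY_RUN} ${LVCREATE} -i "${stripes}" -l 100%FREE data "$@"
}
```

Called as create_striped_lv /dev/xvdb /dev/xvdc, it prints /sbin/lvcreate -i 2 -l 100%FREE data /dev/xvdb /dev/xvdc.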

At this point, we have an LVM volume with a filesystem created on top of it, and we are able to mount it:

MOUNTPOINT="/mnt/data"

function mount_volume {
    echo "mounting: $1 => ${MOUNTPOINT}"
    mount "$1" "${MOUNTPOINT}"
}
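On a freshly booted AMI the mount point may not exist yet, and if the script runs more than once, the volume may already be mounted. If you want to handle both cases explicitly, a slightly more defensive variant could look like the sketch below. mount_volume_once and is_mounted are hypothetical names. The check scans /proc/mounts, which is Linux-specific and escapes spaces in paths; that is fine for a path like /mnt/data.

```shell
#!/usr/bin/env bash
MOUNTPOINT="/mnt/data"

# Succeeds if the given path is currently a mount point, according to
# the second field of /proc/mounts (Linux-specific).
function is_mounted {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' /proc/mounts
}

# Hypothetical defensive variant of mount_volume: skips the mount when
# something is already mounted there, and creates the mount point first.
function mount_volume_once {
    if is_mounted "${MOUNTPOINT}"; then
        echo "already mounted: ${MOUNTPOINT}"
        return 0
    fi
    mkdir -p "${MOUNTPOINT}"
    echo "mounting: $1 => ${MOUNTPOINT}"
    mount "$1" "${MOUNTPOINT}"
}
```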

Gluing it all together

Now, let’s make all this logic work together and combine it in a single script:

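# Prefix shared by the instance's block devices (/dev/xvda is root).
DEVICE_PREFIX="/dev/xvd"

# Detect existing LVM volume
VOLUME=$(detect_volume)

# And create a brand new LVM volume if none were found
if [[ -z ${VOLUME} ]]
then
    create_volume $(detect_devices ${DEVICE_PREFIX})
fi

# get_volume's exit only ends the $(...) subshell, so check it here.
VOLUME=$(get_volume) || exit 1
mount_volume "${VOLUME}"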

Once again, the astute reader will notice that this mounts the LVM volume regardless of whether it is already mounted. This is not a problem: in that case, mount simply reports that the volume is already mounted and leaves the existing mount untouched.
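A related subtlety concerns get_volume, which calls exit. When a function is invoked through command substitution, as in mount_volume $(get_volume), it runs in a subshell. The exit therefore only ends that subshell, and the calling script carries on with an empty value. The self-contained demonstration below uses fail_like_get_volume as a stand-in for get_volume. unchecked_lookup and checked_lookup are hypothetical callers that show the problem and the usual fix, which is to check the substitution's exit status.

```shell
#!/usr/bin/env bash
# Stand-in for get_volume: prints an error and exits with status 1.
function fail_like_get_volume {
    echo "Fatal error: LVM volume not found!" 1>&2
    exit 1
}

# Unchecked caller: the `exit` above only ends the $(...) subshell,
# so this function keeps going with an empty value.
function unchecked_lookup {
    local volume
    volume=$(fail_like_get_volume)
    echo "would mount '${volume}'"
}

# Checked caller: propagates the failure instead of continuing.
function checked_lookup {
    local volume
    # Assign separately from `local`; `local volume=$(...)` would
    # return local's own (successful) status and hide the failure.
    volume=$(fail_like_get_volume) || return 1
    echo "would mount '${volume}'"
}
```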

For completeness’ sake, here is the full script: