Summary

Risks of using LVM:

Vulnerable to write caching issues with SSD or VM hypervisor

Harder to recover data due to more complex on-disk structures

Harder to resize filesystems correctly

Snapshots are hard to use, slow and buggy

Requires some skill to configure correctly given these issues

The first two LVM issues combine: if write caching isn't working correctly and you have a power loss (e.g. PSU or UPS fails), you may well have to recover from backup, meaning significant downtime. A key reason for using LVM is higher uptime (when adding disks, resizing filesystems, etc), but it's important to get the write caching setup correct to avoid LVM actually reducing uptime.

-- Updated Dec 2019: minor update on btrfs and ZFS as alternatives to LVM snapshots

Mitigating the risks

LVM can still work well if you:

Get your write caching setup right in hypervisor, kernel, and SSDs

Avoid LVM snapshots

Use recent LVM versions to resize filesystems

Have good backups

Details

I've researched this quite a bit in the past having experienced some data loss associated with LVM. The main LVM risks and issues I'm aware of are:

Vulnerable to write caching problems caused by VM hypervisors, disk write caches or old Linux kernels, and harder to recover data from because of more complex on-disk structures - see below for details. I have seen complete LVM setups on several disks get corrupted without any chance of recovery, and LVM plus hard disk write caching is a dangerous combination.

Keeping write caching enabled for performance (and coping with lying drives)

A more complex but performant option is to keep SSD / hard drive write caching enabled and rely on kernel write barriers working with LVM on kernel 2.6.33+ (double-check by looking for "barrier" messages in the logs).
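The double-check mentioned above could be sketched roughly as follows. This is only an illustration: check_write_barriers is a made-up helper name, /dev/sda is a placeholder, and the exact wording of barrier messages varies by kernel and filesystem.

```shell
# Sketch: report a drive's write-cache setting and any kernel barrier messages.
# check_write_barriers is a hypothetical helper name; run as root.
check_write_barriers() {
    dev=$1
    # hdparm -W with no value only queries the drive's write-caching setting
    hdparm -W "$dev"
    # grep exits non-zero when nothing matches, so both outcomes are reported
    if dmesg | grep -i 'barrier'; then
        echo "barrier messages found: check none of them say barriers were disabled"
    else
        echo "no barrier messages in the kernel log buffer"
    fi
}
```

Usage would be something like check_write_barriers /dev/sda. Finding no messages proves nothing by itself, since the kernel log buffer may have rotated since boot.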

You should also ensure that the RAID setup, VM hypervisor setup and filesystem all use write barriers (i.e. require the drive to flush pending writes before and after key metadata/journal writes). XFS does use barriers by default, but ext3 does not, so with ext3 you should use barrier=1 in the mount options, and still use data=ordered or data=journal as above.
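As a rough illustration of the ext3 options above, here is a small check for a mount-options string such as the fourth field of an /etc/fstab line. ext3_opts_ok is a hypothetical helper, the device and mount point in the comment are placeholders, and it deliberately only recognises options that are written out explicitly.

```shell
# Sketch: check a comma-separated mount-options string for the ext3 settings
# discussed above. ext3_opts_ok is a hypothetical helper name.
# Example fstab line (device and mount point are placeholders):
#   /dev/myVGname/myLVname  /srv/data  ext3  defaults,barrier=1,data=ordered  0  2
ext3_opts_ok() (
    opts=",$1,"
    case $opts in
        *,barrier=1,*) ;;
        *) echo "missing barrier=1"; exit 1 ;;
    esac
    case $opts in
        *,data=ordered,*|*,data=journal,*) echo "ok" ;;
        *) echo "missing data=ordered or data=journal"; exit 1 ;;
    esac
)
```

For example, ext3_opts_ok defaults,barrier=1,data=ordered prints ok, while a string without barrier=1 is flagged.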

SSDs are problematic because the use of write cache is critical to the lifetime of the SSD. It's best to use an SSD that has a supercapacitor (to enable cache flushing on power failure, and hence enable cache to be write-back not write-through).

Most enterprise SSDs should be OK on write cache control, and some include supercapacitors.

Some cheaper SSDs have issues that can't be fixed with write-cache configuration - the PostgreSQL project's mailing list and Reliable Writes wiki page are good sources of information. Consumer SSDs can have major write caching problems that will cause data loss, and don't include supercapacitors so are vulnerable to power failures causing corruption.

Advanced Format drive setup - write caching, alignment, RAID, GPT

With newer Advanced Format drives that use 4 KiB physical sectors, it may be important to keep drive write caching enabled, since most such drives currently emulate 512 byte logical sectors ("512 emulation"), and some even claim to have 512-byte physical sectors while really using 4 KiB.

Turning off the write cache of an Advanced Format drive may cause a very large performance impact if the application/kernel is doing 512 byte writes, as such drives rely on the cache to accumulate 8 x 512-byte writes before doing a single 4 KiB physical write. Testing is recommended to confirm any impact if you disable the cache.
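To see what sector sizes the kernel believes a drive has, something like the sketch below may help. sector_report is a made-up name, and the optional second argument (an alternative sysfs root) is there only so the function can be tried against a fake directory tree. Bear in mind that, as noted above, some drives misreport their physical sector size, so treat the result as a hint.

```shell
# Sketch: print the logical and physical sector sizes the kernel reports for a disk.
# sector_report is a hypothetical helper; $1 is a disk name such as sda.
sector_report() (
    dev=$1
    queue="${2:-/sys}/block/$dev/queue"
    logical=$(cat "$queue/logical_block_size") || exit 1
    physical=$(cat "$queue/physical_block_size") || exit 1
    # Number of logical-sector writes needed to fill one physical sector
    echo "logical=$logical physical=$physical writes_per_physical_sector=$((physical / logical))"
)
```

On a 512-emulation drive that reports honestly, sector_report sda would show a ratio of 8, matching the 8 x 512-byte writes described above.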

Aligning the LVs on a 4 KiB boundary is important for performance but should happen automatically as long as the underlying partitions for the PVs are aligned, since LVM Physical Extents (PEs) are 4 MiB by default. RAID must be considered here - this LVM and software RAID setup page suggests putting the RAID superblock at the end of the volume and (if necessary) using an option on pvcreate to align the PVs. This LVM email list thread points to the work done in kernels during 2011 and the issue of partial block writes when mixing disks with 512 byte and 4 KiB sectors in a single LV.

GPT partitioning with Advanced Format needs care, especially for boot+root disks, to ensure the first LVM partition (PV) starts on a 4 KiB boundary.
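The 4 KiB boundary check is simple arithmetic on the partition's start sector. A minimal sketch, assuming the start is given in 512-byte sectors (as in /sys/block/sda/sda1/start, where sda and sda1 are placeholders); is_4k_aligned is a hypothetical name:

```shell
# Sketch: is a partition start (in sectors) on a 4 KiB boundary?
# is_4k_aligned is a hypothetical helper; sector size defaults to 512 bytes.
is_4k_aligned() (
    start_sector=$1
    sector_bytes=${2:-512}
    if [ $(( start_sector * sector_bytes % 4096 )) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"; exit 1
    fi
)
```

For example, is_4k_aligned "$(cat /sys/block/sda/sda1/start)". A start sector of 2048 (1 MiB) is aligned, whereas the old DOS-style start at sector 63 is not.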

Harder to recover data due to more complex on-disk structures:

Any recovery of LVM data required after a hard crash or power loss (due to incorrect write caching) is a manual process at best, because there are apparently no suitable tools. LVM is good at backing up its metadata under /etc/lvm, which can help restore the basic structure of LVs, VGs and PVs, but will not help with lost filesystem metadata.

Hence a full restore from backup is likely to be required. This involves a lot more downtime than a quick journal-based fsck when not using LVM, and data written since the last backup will be lost.
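Restoring the basic LV/VG/PV structure from LVM's own metadata backups might look roughly like this. restore_vg_metadata is a made-up helper, and the backup file path is whatever LVM has saved for your VG (typically under /etc/lvm/backup or /etc/lvm/archive). This only puts the LVM layout back and does nothing for damaged filesystem metadata.

```shell
# Sketch: restore LVM metadata for a VG from one of LVM's own backup files.
# restore_vg_metadata is a hypothetical helper; run as root.
restore_vg_metadata() {
    vg=$1
    backup_file=$2
    # Show the metadata backups LVM knows about for this VG
    vgcfgrestore --list "$vg" || return 1
    # Write the chosen metadata back to the PVs, then activate the LVs
    vgcfgrestore -f "$backup_file" "$vg" || return 1
    vgchange -ay "$vg"
}
```

For example: restore_vg_metadata myVGname /etc/lvm/backup/myVGname. In practice you would review the --list output and pick the file by hand before running the restore step.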

TestDisk, ext3grep, ext3undel and other tools can recover partitions and files from non-LVM disks but they don't directly support LVM data recovery. TestDisk can discover that a lost physical partition contains an LVM PV, but none of these tools understand LVM logical volumes. File carving tools such as PhotoRec and many others would work as they bypass the filesystem to re-assemble files from data blocks, but this is a last-resort, low-level approach for valuable data, and works less well with fragmented files.

Manual LVM recovery is possible in some cases, but is complex and time consuming - see this example and this, this, and this for how to recover.

Harder to resize filesystems correctly - easy filesystem resizing is often given as a benefit of LVM, but you need to run half a dozen shell commands to resize an LVM based FS - this can be done with the whole server still up, and in some cases with the FS mounted, but I would never risk the latter without up to date backups and using commands pre-tested on an equivalent server (e.g. disaster recovery clone of production server).

Update: More recent versions of lvextend support the -r (--resizefs) option - if this is available, it's a safer and quicker way to resize the LV and the filesystem, particularly if you are shrinking the FS, and you can mostly skip this section.
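With -r available, the resize collapses to a single command in each direction. A sketch, with hypothetical wrapper names and placeholder LV paths and sizes; test on a non-production clone first, as recommended above:

```shell
# Sketch: resize an LV and its filesystem together using -r (--resizefs).
# Both wrapper names are hypothetical; run as root.
grow_lv_and_fs() {
    # e.g. grow_lv_and_fs /dev/myVGname/myLVname 10G   (adds 10 GiB)
    lvextend -r -L "+$2" "$1"
}

shrink_lv_and_fs() {
    # e.g. shrink_lv_and_fs /dev/myVGname/myLVname 20G (new total size 20 GiB)
    lvreduce -r -L "$2" "$1"
}
```

Because LVM drives the filesystem resize itself here, you avoid having to pass matching sizes to two different tools.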

Most guides to resizing LVM-based FSs don't take account of the fact that the FS must be somewhat smaller than the size of the LV: detailed explanation here. When shrinking a filesystem, you will need to specify the new size to the FS resize tool, e.g. resize2fs for ext3, and to lvextend or lvreduce. Without great care, the sizes may be slightly different due to the difference between 1 GB (10^9) and 1 GiB (2^30), or the way the various tools round sizes up or down.
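To get a feel for how large the GB vs GiB difference is, here is the arithmetic as a tiny helper (gib_minus_gb is a made-up name; it assumes a shell with 64-bit arithmetic, which is the norm on current systems):

```shell
# Sketch: bytes by which N GiB (2^30 each) exceeds N GB (10^9 each).
# gib_minus_gb is a hypothetical helper name.
gib_minus_gb() (
    n=$1
    echo $(( n * 1024 * 1024 * 1024 - n * 1000 * 1000 * 1000 ))
)
```

gib_minus_gb 10 prints 737418240, i.e. about 703 MiB, so mixing the two units on a 10 "gigabyte" volume is far more than a rounding error.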

If you don't do the calculations exactly right (or use some extra steps beyond the most obvious ones), you may end up with an FS that is too large for the LV. Everything will seem fine for months or years, until you completely fill the FS, at which point you will get serious corruption - and unless you are aware of this issue it's hard to find out why, as you may also have real disk errors by then that cloud the situation. (It's possible this issue only affects reducing the size of filesystems - however, it's clear that resizing filesystems in either direction does increase the risk of data loss, possibly due to user error.)

It seems that the LV size should be larger than the FS size by 2 x the LVM physical extent (PE) size - but check the link above for details as the source for this is not authoritative. Often allowing 8 MiB is enough, but it may be better to allow more, e.g. 100 MiB or 1 GiB, just to be safe. To check the PE size, and your logical volume+FS sizes, using 4 KiB = 4096 byte blocks:

Shows PE size in KiB:

vgdisplay --units k myVGname | grep "PE Size"



Size of all LVs:

lvs --units 4096b



Size of (ext3) FS, assumes 4 KiB FS blocksize:

tune2fs -l /dev/myVGname/myLVname | grep 'Block count'
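Once you have the three numbers from the commands above, the margin check itself is just arithmetic. A sketch using the 2 x PE rule of thumb (which, as noted, is not authoritative); lv_fs_margin is a hypothetical helper and the numbers have to be read off the command output by hand:

```shell
# Sketch: check the LV-to-FS margin using the 2 x PE rule of thumb.
# lv_fs_margin is a hypothetical helper. Inputs:
#   $1 = LV size in 4 KiB blocks   (from: lvs --units 4096b)
#   $2 = FS block count            (from: tune2fs -l ... | grep 'Block count')
#   $3 = PE size in KiB            (from: vgdisplay --units k ... | grep "PE Size")
lv_fs_margin() (
    lv_blocks=$1
    fs_blocks=$2
    pe_kib=$3
    margin=$(( lv_blocks - fs_blocks ))
    required=$(( 2 * pe_kib / 4 ))   # 2 PEs expressed in 4 KiB blocks
    if [ "$margin" -ge "$required" ]; then
        echo "ok: margin $margin blocks, need $required"
    else
        echo "too small: margin $margin blocks, need $required"; exit 1
    fi
)
```

For a 10 GiB LV (2621440 blocks) with the default 4 MiB PE (4096 KiB), the FS should be no larger than 2619392 blocks by this rule; a larger safety margin does no harm.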

By contrast, a non-LVM setup makes resizing the FS very reliable and easy - run Gparted and resize the FSs required, then it will do everything for you. On servers, you can use parted from the shell.

It's often best to use the Gparted Live CD or Parted Magic, as these have a recent and often more bug-free Gparted & kernel than the distro version - I once lost a whole FS due to the distro's Gparted not updating partitions properly in the running kernel. If using the distro's Gparted, be sure to reboot right after changing partitions so the kernel's view is correct.

Snapshots are hard to use, slow and buggy - if a snapshot runs out of pre-allocated space it is automatically dropped. Each snapshot of a given LV is a delta against that LV (not against previous snapshots) which can require a lot of space when snapshotting filesystems with significant write activity (every snapshot is larger than the previous one). It is safe to create a snapshot LV that's the same size as the original LV, as the snapshot will then never run out of free space.
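A snapshot sized to cover the whole origin could be created along these lines. full_size_snapshot is a made-up helper and the VG/LV names are placeholders; the VG needs enough free extents for this to succeed.

```shell
# Sketch: create a snapshot with room for the whole origin, then show its usage.
# full_size_snapshot is a hypothetical helper; run as root.
full_size_snapshot() {
    vg=$1
    lv=$2
    snap=$3
    # 100%ORIGIN sizes the snapshot's copy-on-write area to cover the whole origin LV
    lvcreate -s -l 100%ORIGIN -n "$snap" "/dev/$vg/$lv" || return 1
    # The Data% column shows how full each snapshot is
    lvs "$vg"
}
```

For example: full_size_snapshot myVGname myLVname mysnap. If you allocate less space than this, keep an eye on the Data% column, since a snapshot that fills up is dropped.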

Snapshots can also be very slow (meaning 3 to 6 times slower than without LVM for these MySQL tests) - see this answer covering various snapshot problems. The slowness is partly because snapshots require many synchronous writes.

Snapshots have had some significant bugs, e.g. in some cases they can make boot very slow, or cause boot to fail completely (because the kernel can time out waiting for the root FS when it's an LVM snapshot [fixed in Debian initramfs-tools update, Mar 2015]).

However, many snapshot race condition bugs were apparently fixed by 2015.

LVM without snapshots generally seems quite well debugged, perhaps because snapshots aren't used as much as the core features.

Snapshot alternatives - filesystems and VM hypervisors

VM/cloud snapshots:

If you are using a VM hypervisor or an IaaS cloud provider (e.g. VMware, VirtualBox or Amazon EC2/EBS), their snapshots are often a much better alternative to LVM snapshots. You can quite easily take a snapshot for backup purposes (but consider freezing the FS before you do).
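Freezing the FS around a hypervisor or cloud snapshot can be sketched with fsfreeze as below. with_frozen_fs is a hypothetical wrapper, and the snapshot command itself is whatever your platform provides (not shown here). Don't freeze a filesystem that the snapshot command needs to write to, or it will hang.

```shell
# Sketch: freeze a filesystem, run a snapshot command, and always thaw afterwards.
# Usage: with_frozen_fs MOUNTPOINT COMMAND [ARGS...]
# with_frozen_fs is a hypothetical helper; run as root.
with_frozen_fs() {
    mnt=$1
    shift
    rc=0
    fsfreeze -f "$mnt" || return 1
    # Run the caller's snapshot command while writes are blocked
    "$@" || rc=$?
    # Always thaw, even if the snapshot command failed
    fsfreeze -u "$mnt"
    return $rc
}
```

The wrapper returns the snapshot command's exit status, so a failed snapshot is still reported after the thaw.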

Filesystem snapshots:

Filesystem-level snapshots with ZFS or btrfs are easy to use and generally better than LVM snapshots if you are on bare metal (ZFS seems a lot more mature, just more hassle to install):

ZFS: there is now a kernel ZFS implementation, which has been in use for some years, and ZFS seems to be gaining adoption. Ubuntu now has ZFS as an 'out of the box' option, including experimental ZFS on root in 19.10.

btrfs: still not ready for production use (even on openSUSE, which ships it by default and has a team dedicated to btrfs, whereas RHEL has stopped supporting it). btrfs now has an fsck tool (FAQ), but the FAQ recommends that you consult a developer if you need to fsck a broken filesystem.
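For comparison, taking a filesystem-level snapshot is a one-liner on either filesystem. A sketch with hypothetical wrapper names; the dataset, subvolume and destination names in the comments are placeholders:

```shell
# Sketch: take filesystem-level snapshots. Wrapper names are hypothetical.
zfs_snapshot() {
    dataset=$1      # e.g. tank/home
    label=$2        # e.g. before-upgrade
    zfs snapshot "${dataset}@${label}" || return 1
    # List existing snapshots to confirm it was created
    zfs list -t snapshot
}

btrfs_snapshot() {
    subvol=$1       # e.g. /home
    dest=$2         # e.g. /home/.snapshots/before-upgrade
    # -r makes the snapshot read-only
    btrfs subvolume snapshot -r "$subvol" "$dest"
}
```

Neither needs space pre-allocated up front, which removes the "snapshot ran out of space and was dropped" failure mode described above for LVM.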

Snapshots for online backups and fsck

Snapshots can be used to provide a consistent source for backups, as long as you are careful with space allocated (ideally the snapshot is the same size as the LV being backed up). The excellent rsnapshot (since 1.3.1) even manages the LVM snapshot creation/deletion for you - see this HOWTO on rsnapshot using LVM. However, note the general issues with snapshots and that a snapshot should not be considered a backup in itself.
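If you are not using rsnapshot, the manual equivalent is roughly: snapshot, mount read-only, copy, unmount, remove. The sketch below assumes rsync as the copy tool and uses a made-up function name and placeholder paths; some filesystems may need extra mount options for a snapshot, so test it before relying on it.

```shell
# Sketch: back up a consistent view of an LV via a temporary snapshot.
# backup_from_snapshot is a hypothetical helper; run as root.
backup_from_snapshot() {
    vg=$1
    lv=$2
    mnt=$3      # empty directory to mount the snapshot on
    dest=$4     # backup destination directory
    snap="${lv}-backup"
    rc=0
    # Snapshot sized to cover the whole origin, so it cannot fill up mid-backup
    lvcreate -s -l 100%ORIGIN -n "$snap" "/dev/$vg/$lv" || return 1
    if mount -o ro "/dev/$vg/$snap" "$mnt"; then
        rsync -a "$mnt/" "$dest/" || rc=$?
        umount "$mnt"
    else
        rc=1
    fi
    # Remove the snapshot promptly: it slows writes to the origin while it exists
    lvremove -f "/dev/$vg/$snap"
    return $rc
}
```

For example: backup_from_snapshot myVGname myLVname /mnt/snap /backup/myLVname. The copy in /backup is the backup; the snapshot itself is thrown away.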

You can also use LVM snapshots to do an online fsck: snapshot the LV and fsck the snapshot, while still using the main non-snapshot FS - described here - however, it's not entirely straightforward so it's best to use e2croncheck as described by Ted Ts'o, maintainer of ext3.
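The basic idea of the online fsck can be sketched as below, though e2croncheck handles more cases and is the better choice for real use. fsck_via_snapshot is a hypothetical helper, the snapshot size is something you pick to cover writes during the check, and this assumes an ext2/3/4 filesystem.

```shell
# Sketch: fsck a temporary snapshot while the origin FS stays mounted and in use.
# fsck_via_snapshot is a hypothetical helper; run as root.
fsck_via_snapshot() {
    vg=$1
    lv=$2
    size=$3     # snapshot size, e.g. 1G
    snap="${lv}-fsck"
    rc=0
    lvcreate -s -L "$size" -n "$snap" "/dev/$vg/$lv" || return 1
    # -f forces a full check, -n answers "no" to every prompt so nothing is modified
    e2fsck -fn "/dev/$vg/$snap" || rc=$?
    # Drop the snapshot whatever the result
    lvremove -f "/dev/$vg/$snap"
    return $rc
}
```

A non-zero return means e2fsck found something on the snapshot, in which case you would schedule a proper offline fsck of the real filesystem.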

You should "freeze" the filesystem temporarily while taking the snapshot - some filesystems such as ext3 and XFS will do this automatically when LVM creates the snapshot.

Conclusions

Despite all this, I do still use LVM on some systems, but for a desktop setup I prefer raw partitions. The main benefit I can see from LVM is the flexibility of moving and resizing FSs when you must have high uptime on a server - if you don't need that, gparted is easier and has less risk of data loss.

LVM requires great care on write caching setup due to VM hypervisors, hard drive / SSD write caching, and so on - but the same applies to using Linux as a DB server. The lack of support from most tools (gparted, including the critical size calculations, and testdisk etc.) makes it harder to use than it should be.

If using LVM, take great care with snapshots: use VM/cloud snapshots if possible, or investigate ZFS/btrfs to avoid LVM completely - you may find ZFS or btrfs is sufficiently mature compared to LVM with snapshots.

Bottom line: If you don't know about the issues listed above and how to address them, it's best not to use LVM.