ZFS Basics – An introduction to understanding ZFS

Intro

If you work with storage applications or storage hardware there’s a good chance you’ve heard of ZFS. ZFS is essentially a software implementation of RAID, but in my experience it’s the most reliable software RAID I’ve worked with.

Comparison to standard RAID

Over the years I’ve worked with several implementations of hardware RAID and for the most part they are pretty equal. However, most hardware RAID implementations I’ve seen — mine included — aren’t really done well. Before I move on to ZFS RAID I’m going to cover the basic problems I’ve come across with hardware RAID setups, which contributed to my switch to ZFS. In the list below, RAID = “hardware RAID”.

- RAID controllers are typically more expensive than HBAs
- Many RAID users do not properly set their cache settings, on top of the fact that most cards do not come with a BBU. Lots of admins get frustrated with throughput and force write-back without a BBU
- RAID controllers rarely keep up with drive capacity
- Sometimes the implementation is proprietary, which can make your setup less scalable (limited RAID sets, inability to mix/match nested RAID, or difficulty expanding existing sets)
- Most user interfaces I have worked with for hardware RAID were poor; i.e. option ROMs on the card that can’t see full disk names, or OS-specific utilities that are buggy or available only on select OS installs
- I’ve yet to see a RAID card that allows you to perform a scan for errors like the ZFS scrub. I’m not saying they don’t exist, I just haven’t seen them

My personal history with ZFS

I first discovered ZFS, or “RaidZ”, in 2011 when I was deciding on a storage setup for our virtual disk images (then in VMware). We were always running out of space because the hardware RAID controllers we had at the time only supported small disks, so I decided to do some digging. My first attempt at ZFS was using OpenIndiana, which is now deprecated, so if you want to go the Solaris route I recommend using OmniOS. I was familiar with Linux at the time, but ZFS was designed for Solaris, which felt close to Linux yet different enough that there was a learning curve.

I used OpenIndiana until it stopped being updated and then switched to OmniOS, but Solaris for one reason or another — partially due to a different CLI — kept pissing me off. However, the main catalyst for looking into ZoL (ZFS on Linux) was my dream of a unified compute and storage node. To wrap it all up, I’ve run ZoL on CentOS, Ubuntu & Debian for about 2 years without any serious or “mystery” incidents, both at work and at home. The purpose of this history was to cover the questions you might be having in your head:

- Why don’t I use hardware RAID? The six reasons above.
- Why don’t I use Solaris for ZFS? Basically, I like Linux better, so it’s a personal choice, with the added benefit of getting native Linux packages sooner and without the need to port.

Setting up ZFS

In this section I’m going to assume you know nothing about ZFS so that everyone can follow along. However, I will break it into parts so you can skip around if you already know something. While most of the stuff I cover will work in Solaris, keep in mind these steps were performed on Linux, so it’s possible some techniques may not transfer to Solaris.

My demo setup

To make sure I don’t miss anything I’m going to do a live setup in a VM, so if you follow along with what I’m doing exactly I _shouldn’t_ miss anything.

Variable  Details
OS        Ubuntu 14.04 LTS
ZFS       ZoL (http://zfsonlinux.org/) version 0.6.3 for Ubuntu
CPU       4 VM cores from an i7-2600 3.4GHz
MEM       4 GB of VM RAM from DDR3 host memory
NIC       1x 1GbE
Disks     1x 30GB OS disk and 8x 1GB disks for ZFS examples

Step 1: Install Ubuntu — the same way you normally would

Now that the install is done I’m switching to SSH. For a variety of reasons I still use Windows for my main work machine, and I highly recommend Cygwin over the traditional PuTTY.

Step 2: Update and reboot

Log in via SSH — or locally if you prefer — with the username you created. To get started you’ll want to become root and update by typing

sudo -s
apt-get update
apt-get upgrade
shutdown -r 0

We start with sudo -s because Ubuntu doesn’t actually set a password for the “root” user. You want to reboot because there is a chance a new kernel was installed during the upgrade command, and ZFS needs to be rebuilt at each kernel update — we’ll cover more on that later. Once you’ve rebooted, log back in and sudo -s.

Step 3: Install ZFS

Next you’ll want to download and install ZFS. Instructions for ZoL installation may change over time — it’s presently August 2014 — so you might want to visit the ZoL page to see if the Ubuntu package PPA has changed:

http://zfsonlinux.org/

Otherwise continue with the following

apt-add-repository ppa:zfs-native/stable
apt-get update
apt-get install ubuntu-zfs

The last step — installing ubuntu-zfs — might take up to 10 minutes depending on your system and internet speed.

Step 4: Basic test

Once the install finishes you’ll want to try out a few commands to make sure things “work”. If you’ve followed along with me this far you should see basically this:

root@zfs-demo:~# zfs list
no datasets available
root@zfs-demo:~# zpool status
no pools available

Step 5: Basic ZFS concepts

Before we go any further with software setup it’s important to acknowledge that while I’m only working in a VM for demo purposes a real setup will be on bare metal. The setup of a ZFS system on bare metal means that every setup decision you make has physical consequences; so let’s first discuss the logical setup which will in turn drive your hardware needs. The logical parts of ZFS are fairly simple to understand:

Pools

In ZFS, pools are essentially a collection of physical disks. However, an important concept to understand is that they are not simply pools of “single” disks — though they can be — but rather pools of virtual devices, hereafter known as “vdevs”. A vdev will be very familiar to anyone who has worked with RAID: any RAID configuration you can think of can be represented in ZFS by a vdev. The types of vdevs correlate to RAID levels you’d be familiar with:

Single (min 1 disk*): zpool create pool disk1
  Just a single disk; if it fails you lose everything.
  Pros: cheap; full use of the disk’s space
  Cons: only one disk’s speed; if it dies, you lose everything

Mirror (min 2 disks*): zpool create pool mirror disk1 disk2
  This is RAID-1 and can be as many disks as you want (2/3/4/5-way mirrors).
  Pros: great data redundancy; fast read speed
  Cons: poor space efficiency; write speed of 1 disk

RaidZ1 (min 2 disks*): zpool create pool raidz1 disk1 disk2
  Just like RAID 5; you’re allowed to lose one disk without penalty.
  Pros: great use of space; great read performance
  Cons: write penalty (x4); expensive parity checks

RaidZ2 (min 3 disks*): zpool create pool raidz2 disk1 disk2 disk3
  Just like RAID 6; you’re allowed to lose two disks.
  Pros: good use of space; great read performance
  Cons: write penalty (x6); slower than RaidZ1

RaidZ3 (min 4 disks*): zpool create pool raidz3 disk1 disk2 disk3 disk4
  Comparable to the fictional RAID 7; you can lose three disks.
  Pros: superior data protection; great read performance
  Cons: write penalty (x8); most expensive RAIDZ

* This is not my recommended number of disks but the absolute minimum. For more on why, see http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/
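The space and redundancy trade-offs above can be sketched with a little arithmetic. This is a rough illustrative model of my own (it ignores ZFS metadata, padding and slop space), not an exact sizing tool:

```python
# Rough usable-capacity and fault-tolerance model for a single vdev.
# Ignores ZFS metadata overhead, allocation padding and pool slop space.

def vdev_summary(kind, n_disks, disk_tb):
    parity = {"single": 0, "mirror": None, "raidz1": 1, "raidz2": 2, "raidz3": 3}[kind]
    if kind == "mirror":
        usable = disk_tb            # every disk holds the same data
        can_lose = n_disks - 1      # survives all but one disk
    elif kind == "single":
        usable = disk_tb
        can_lose = 0
    else:
        usable = (n_disks - parity) * disk_tb
        can_lose = parity
    return usable, can_lose

# 8x 4TB disks as one raidz2 vdev: 24TB usable, survives any 2 failures
print(vdev_summary("raidz2", 8, 4))   # (24, 2)
```

Running this for a 3-way mirror of 4TB disks gives (4, 2): lots of safety, little space, which is exactly the trade-off the table describes.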



However, pools can be made of multiple vdevs, which means you can stripe RaidZ2 vdevs to construct something similar to hardware RAID 60. There isn’t really a limit to how you can combine them, and creating a pool is as simple as:

zpool create poolname disk0 disk1 disk2 disk3

The above would create a pool of 4 disks striped together (JBOD-style). If you’re using disks for the first time that have no partitions, or have an old setup you want to clear, you need to use “-f”.

Datasets

Datasets are essentially groups of data, or ZFS file systems, stored on the raw data area that is a pool. Datasets are mounted just like any other FS (you can put them in your fstab), but by default they’ll be mounted at /pool/dataset off your root.

ZVOLs

ZVOLs are raw block devices created on top of your pool. Essentially this is a new /dev/sdX that you can format however you like (ext4, xfs, even ZFS!) and it is backed by the integrity of the pool. A ZVOL is the closest thing to hardware RAID you’ll get out of ZFS.

L2ARC and SLOG

Something that is very powerful about ZFS is the ability to add fast drives (like SSDs or RAM drives) to pools of otherwise slow mechanical HDDs. These fast drives supplement your pool in hard times of read or synchronous-write stress. The L2ARC is a read cache dynamically populated with the read data you’re most likely to need (based on history), and the SLOG is a safe place writes can go so an fsync can be returned before the data is flushed from RAM to HDD. I will likely have separate articles about these at a later date.

Some very useful ZFS concepts to understand

ZFS is a copy-on-write filesystem with snapshot capability. The reason this is important is that it gives you the ability to perform fully writable dataset/zvol clones in real time with no performance hit and no space taken up (except what has actually changed). In my work this means if I need a clone of 20 virtual machines it takes less than 1 second, and it also means I can perform and keep 15-minute backups for 2 years with only 20% more space used. Some other useful properties:

- ZFS supports the real-time compression modes lzjb, gzip, zle & lz4. The ZFS manual currently recommends lz4 for a balance between performance and compression.
- ZFS supports deduplication, which means that if someone has 100 copies of the same movie we will only store that data once.
- ZFS supports sending entire datasets/zvols, even pools, to another ZFS system while online (even if the receiving pool is a different config).
- All of these settings are hierarchical and tunable down to each dataset/zvol; you can have compression on one and not the other.
- ZFS can perform real-time scrubs.
- All changes can be made while the pool is online.
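To see why snapshots are nearly free, it helps to model copy-on-write: a snapshot just pins the block versions that existed at that moment, and new space is consumed only for blocks rewritten afterwards. Here is a toy model of that idea (the class and names are mine, not ZFS internals):

```python
# Toy copy-on-write model: a "dataset" is a map of block-id -> version tag.
# A snapshot is a frozen copy of that map; by itself it costs no data blocks.

class CowDataset:
    def __init__(self, blocks):
        self.live = dict(blocks)        # block-id -> version tag
        self.snapshots = []

    def snapshot(self):
        # Instant: records references to existing versions, copies no data.
        self.snapshots.append(dict(self.live))

    def write(self, block_id, version):
        # Copy-on-write: the old version stays around if a snapshot pins it.
        self.live[block_id] = version

    def unique_versions(self):
        # Proxy for space actually consumed across live data + snapshots.
        versions = set(self.live.values())
        for snap in self.snapshots:
            versions |= set(snap.values())
        return len(versions)

ds = CowDataset({i: f"v0-{i}" for i in range(1000)})
ds.snapshot()                # instant, no extra data blocks
ds.write(0, "v1-0")          # only the 1 changed block costs new space
print(ds.unique_versions())  # 1001: 1000 original blocks + 1 rewritten
```

The snapshot of 1000 blocks adds nothing until a block changes, and then only the changed block is stored twice. Scale that up and you get the “2 years of 15-minute backups for 20% more space” behavior.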

Step 6: Choosing your hardware

When building a storage system it’s important to choose the right hardware. There are really only a few basic requirements for running a decent ZFS system:

- Make sure the software can see your drives natively (you don’t want HW RAID in the way): JBOD mode, IT firmware, or just an HBA.
- Do not use drive bay expanders. While it’s supposed to be “ok” with SAS drives, it definitely isn’t with SATA. Each SFF-8087 gets a max of 4 disks, or if you’re just using SATA channels it’s 1x1, not multipliers.
- If you’re using cheaper disks, plan your pool to accommodate that: meaning plan for a drive failure during the rebuild of another.
- Lots of RAM (you want a minimum of 16GB).
- Plan to use SSDs. You might not right away, depending on how picky you are about your data (or how much you trust your UPS), but once you start tweaking you’ll want them, so leave some channels/bays open for that. Personally I recommend something like this for your SSDs.
- If you’re using compression, get some CPU; if deduplication, also add more RAM. These operations can be somewhat expensive.

Step 7: Let’s build

So let’s assume you’ve got your hardware together, you’ve installed Ubuntu 14.04 (or any other ZoL-compatible OS), and ZFS is up and running. As I explained at the beginning, I’ve got 8 1GB disks to play with for demo purposes. So let’s have some fun!

Disk Setup

First off, let’s take a look at my disks

root@zfs-demo:/# zpool status
no pools available
root@zfs-demo:/# parted
GNU Parted 2.3
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print devices
/dev/sda (1074MB)
/dev/sdb (1074MB)
/dev/sdc (1074MB)
/dev/sdd (1074MB)
/dev/sde (1074MB)
/dev/sdf (1074MB)
/dev/sdg (1074MB)
/dev/sdh (1074MB)
/dev/vda (10.7GB)
(parted)

You can see above that I have 8 1024MB disks. However, they are dangerously represented only by their /dev/sdX names, which can change — and corrupt your pool — if someone starts hot-adding disks, or even between power cycles. There are two solutions to this.

First solution: consistent device IDs via export/import


root@zfs-demo:/# zpool status
no pools available
root@zfs-demo:/# zpool create -f neo raidz3 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
root@zfs-demo:/# zpool status
  pool: neo
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        neo         ONLINE       0     0     0
          raidz3-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0

errors: No known data errors
root@zfs-demo:/# zpool export neo
root@zfs-demo:/# zpool import neo -d /dev/disk/by-id
root@zfs-demo:/# zpool status
  pool: neo
 state: ONLINE
  scan: none requested
config:

        NAME                           STATE     READ WRITE CKSUM
        neo                            ONLINE       0     0     0
          raidz3-0                     ONLINE       0     0     0
            ata-QEMU_HARDDISK_QM00004  ONLINE       0     0     0
            ata-QEMU_HARDDISK_QM00005  ONLINE       0     0     0
            ata-QEMU_HARDDISK_QM00007  ONLINE       0     0     0
            ata-QEMU_HARDDISK_QM00009  ONLINE       0     0     0
            ata-QEMU_HARDDISK_QM00011  ONLINE       0     0     0
            ata-QEMU_HARDDISK_QM00013  ONLINE       0     0     0
            ata-QEMU_HARDDISK_QM00015  ONLINE       0     0     0
            ata-QEMU_HARDDISK_QM00017  ONLINE       0     0     0

errors: No known data errors
root@zfs-demo:/#

Now we have a set of disks whose identity will never change, and we also have a way to label the trays. Hmm, but those names are kinda ugly and long; there must be a better way. You may have noticed that in the above command I used “-f”. This is required when the disks have no partitions on them, because ZFS is trying to protect you from including non-ZFS disks, so you need -f to confirm it.

Second solution: consistent device IDs via the vdev_id.conf file


First, take a look at how we imported with -d /dev/disk/by-id. This is a directory you can inspect:

root@zfs-demo:/# ls -al /dev/disk/by-id
total 0
drwxr-xr-x 2 root root 540 Aug 18 23:12 .
drwxr-xr-x 6 root root 120 Aug 18 23:11 ..
lrwxrwxrwx 1 root root   9 Aug 18 23:11 ata-QEMU_DVD-ROM_QM00003 -> ../../sr0
lrwxrwxrwx 1 root root   9 Aug 18 23:12 ata-QEMU_HARDDISK_QM00004 -> ../../sda
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00004-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00004-part9 -> ../../sda9
lrwxrwxrwx 1 root root   9 Aug 18 23:12 ata-QEMU_HARDDISK_QM00005 -> ../../sdb
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00005-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00005-part9 -> ../../sdb9
lrwxrwxrwx 1 root root   9 Aug 18 23:12 ata-QEMU_HARDDISK_QM00007 -> ../../sdc
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00007-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00007-part9 -> ../../sdc9
lrwxrwxrwx 1 root root   9 Aug 18 23:12 ata-QEMU_HARDDISK_QM00009 -> ../../sdd
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00009-part1 -> ../../sdd1
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00009-part9 -> ../../sdd9
lrwxrwxrwx 1 root root   9 Aug 18 23:12 ata-QEMU_HARDDISK_QM00011 -> ../../sde
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00011-part1 -> ../../sde1
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00011-part9 -> ../../sde9
lrwxrwxrwx 1 root root   9 Aug 18 23:12 ata-QEMU_HARDDISK_QM00013 -> ../../sdf
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00013-part1 -> ../../sdf1
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00013-part9 -> ../../sdf9
lrwxrwxrwx 1 root root   9 Aug 18 23:12 ata-QEMU_HARDDISK_QM00015 -> ../../sdg
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00015-part1 -> ../../sdg1
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00015-part9 -> ../../sdg9
lrwxrwxrwx 1 root root   9 Aug 18 23:12 ata-QEMU_HARDDISK_QM00017 -> ../../sdh
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00017-part1 -> ../../sdh1
lrwxrwxrwx 1 root root  10 Aug 18 23:12 ata-QEMU_HARDDISK_QM00017-part9 -> ../../sdh9

The above shows the “by-id” tags for each /dev/sdX. We can take these and place them into /etc/zfs/vdev_id.conf:

root@zfs-demo:/# cat /etc/zfs/vdev_id.conf
# make sure to run "udevadm trigger" to update the /dev/disk/by-vdev/ list each time you change this file
alias 01 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00004
alias 02 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00005
alias 03 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00007
alias 04 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00009
alias 05 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00011
alias 06 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00013
alias 07 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00015
alias 08 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00017
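If you have a lot of bays, writing vdev_id.conf by hand gets tedious. Here’s a small helper of my own that turns a list of by-id device names into numbered alias lines like the file above (the QEMU names are from my demo; substitute your own):

```python
# Generate /etc/zfs/vdev_id.conf alias lines from a list of by-id device names.
# Bay numbers are zero-padded so they sort (and label) nicely: 01, 02, ...

def vdev_aliases(by_id_names, width=2):
    lines = []
    for i, name in enumerate(sorted(by_id_names), start=1):
        lines.append(f"alias {i:0{width}d} /dev/disk/by-id/{name}")
    return "\n".join(lines)

disks = ["ata-QEMU_HARDDISK_QM00004", "ata-QEMU_HARDDISK_QM00005"]
print(vdev_aliases(disks))
# alias 01 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00004
# alias 02 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00005
```

Sorting the names first means the bay numbering is stable no matter what order you collected them in.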

Once this file exists (matching your ids) you can run “udevadm trigger”

root@zfs-demo:/dev/disk# ls -al
total 0
drwxr-xr-x  6 root root  120 Aug 18 23:11 .
drwxr-xr-x 16 root root 4740 Aug 18 23:12 ..
drwxr-xr-x  2 root root  540 Aug 18 23:12 by-id
drwxr-xr-x  2 root root   60 Aug 18 23:12 by-partlabel
drwxr-xr-x  2 root root  360 Aug 18 23:12 by-partuuid
drwxr-xr-x  2 root root   80 Aug 18 23:11 by-uuid
root@zfs-demo:/dev/disk# udevadm trigger
root@zfs-demo:/dev/disk# ls -al
total 0
drwxr-xr-x  7 root root  140 Aug 18 23:22 .
drwxr-xr-x 16 root root 4740 Aug 18 23:12 ..
drwxr-xr-x  2 root root  540 Aug 18 23:12 by-id
drwxr-xr-x  2 root root   60 Aug 18 23:22 by-partlabel
drwxr-xr-x  2 root root  360 Aug 18 23:12 by-partuuid
drwxr-xr-x  2 root root   80 Aug 18 23:11 by-uuid
drwxr-xr-x  2 root root  460 Aug 18 23:22 by-vdev
root@zfs-demo:/dev/disk# ls -al ./by-vdev
total 0
drwxr-xr-x 2 root root 460 Aug 18 23:22 .
drwxr-xr-x 7 root root 140 Aug 18 23:22 ..
lrwxrwxrwx 1 root root  9 Aug 18 23:22 01 -> ../../sda
lrwxrwxrwx 1 root root 10 Aug 18 23:22 01-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Aug 18 23:22 01-part9 -> ../../sda9
lrwxrwxrwx 1 root root  9 Aug 18 23:22 02 -> ../../sdb
lrwxrwxrwx 1 root root 10 Aug 18 23:22 02-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Aug 18 23:22 02-part9 -> ../../sdb9
lrwxrwxrwx 1 root root  9 Aug 18 23:22 03 -> ../../sdc
lrwxrwxrwx 1 root root 10 Aug 18 23:22 03-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Aug 18 23:22 03-part9 -> ../../sdc9
lrwxrwxrwx 1 root root  9 Aug 18 23:22 04 -> ../../sdd
lrwxrwxrwx 1 root root 10 Aug 18 23:22 04-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Aug 18 23:22 04-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 Aug 18 23:22 05 -> ../../sde
lrwxrwxrwx 1 root root 10 Aug 18 23:22 05-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Aug 18 23:22 05-part9 -> ../../sde9
lrwxrwxrwx 1 root root  9 Aug 18 23:22 06 -> ../../sdf
lrwxrwxrwx 1 root root 10 Aug 18 23:22 06-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 Aug 18 23:22 06-part9 -> ../../sdf9
lrwxrwxrwx 1 root root  9 Aug 18 23:28 07 -> ../../sdg
lrwxrwxrwx 1 root root 10 Aug 18 23:28 07-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 Aug 18 23:28 07-part9 -> ../../sdg9
lrwxrwxrwx 1 root root  9 Aug 18 23:22 08 -> ../../sdh
lrwxrwxrwx 1 root root 10 Aug 18 23:22 08-part1 -> ../../sdh1
lrwxrwxrwx 1 root root 10 Aug 18 23:22 08-part9 -> ../../sdh9
root@zfs-demo:/dev/disk#

Now that these links exist I can always use them with ZFS (or anything in Linux). So now I can do cool and easy stuff like re-create my pool (let’s try a mirror setup this time) with far less hassle:

root@zfs-demo:/# zpool destroy neo
root@zfs-demo:/# zpool create supershredder mirror 01 02 mirror 03 04 mirror 05 06 mirror 07 08
root@zfs-demo:/# zpool status
  pool: supershredder
 state: ONLINE
  scan: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        supershredder  ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            01         ONLINE       0     0     0
            02         ONLINE       0     0     0
          mirror-1     ONLINE       0     0     0
            03         ONLINE       0     0     0
            04         ONLINE       0     0     0
          mirror-2     ONLINE       0     0     0
            05         ONLINE       0     0     0
            06         ONLINE       0     0     0
          mirror-3     ONLINE       0     0     0
            07         ONLINE       0     0     0
            08         ONLINE       0     0     0

errors: No known data errors
root@zfs-demo:/#

And, as a bonus, it’ll be super easy to put 01-08 stickers on the front of my chassis (yes, I agree we should have started at 00). On a side note, this pool will perform much better overall, but it’s only guaranteed to survive one disk failure (at best one per mirror) and you get half the space.

4k “Advanced format” drives


These days most 4k drives are “honest” about their sector size. However, if you know you have 4k drives you can run “zdb” to check whether your pool has the property “ashift=12”. ashift=9 means aligned for 512b sectors instead of the 4096b sectors you’d find in most WD drives today. Additionally, when creating your pool you can always add “-o ashift=12” to ensure that’s done properly. I have not honestly done side-by-side comparison tests, but I’ve been told there are significant performance losses from having a 4096b drive aligned to 512b sectors.
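The ashift value is just the base-2 logarithm of the sector size ZFS assumes, so working out the right value for a drive is one line of arithmetic:

```python
# ashift = log2(physical sector size in bytes)
def ashift_for(sector_bytes):
    a = sector_bytes.bit_length() - 1
    if 1 << a != sector_bytes:
        raise ValueError("sector size must be a power of two")
    return a

print(ashift_for(512))    # 9  -> legacy 512b-sector drives
print(ashift_for(4096))   # 12 -> 4k "advanced format" drives
```

So a 4k drive wants ashift=12, and an ashift=9 pool on such a drive forces it to emulate 512b sectors, which is where the reported performance losses come from.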

Step 8: Let’s make our datasets, finally!

So we’ve got a mirror pool setup we’re happy with, and now we want a place to put the data, version the data, etc. The good news is all the hard work is done and ZFS will now be super easy to work with. Here’s an example:

root@zfs-demo:/# zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
supershredder   116K  3.88G    30K  /supershredder
root@zfs-demo:/# zfs create supershredder/lamedeath
root@zfs-demo:/# zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
supershredder             153K  3.88G    30K  /supershredder
supershredder/lamedeath    30K  3.88G    30K  /supershredder/lamedeath
root@zfs-demo:/# cd /supershredder/lamedeath
root@zfs-demo:/supershredder/lamedeath# ls -al
total 3
drwxr-xr-x 2 root root 2 Aug 18 23:42 .
drwxr-xr-x 3 root root 3 Aug 18 23:42 ..
root@zfs-demo:/supershredder/lamedeath# mount
/dev/vda1 on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/cgroup type tmpfs (rw)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
none on /sys/fs/pstore type pstore (rw)
systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)
supershredder on /supershredder type zfs (rw,xattr)                       <- here is our pool mounted
supershredder/lamedeath on /supershredder/lamedeath type zfs (rw,xattr)   <- here is our dataset mounted
root@zfs-demo:/supershredder/lamedeath#

Above you can see that we now have a mounted ZFS filesystem called “lamedeath” (mounted simply by creating it) with 3.88GB free. Changing something like compression is as simple as:

root@zfs-demo:/# zfs get compression
NAME                     PROPERTY     VALUE     SOURCE
supershredder            compression  off       default
supershredder/lamedeath  compression  off       default
root@zfs-demo:/# zfs set compression=lz4 supershredder/lamedeath
root@zfs-demo:/# zfs get compression
NAME                     PROPERTY     VALUE     SOURCE
supershredder            compression  off       default
supershredder/lamedeath  compression  lz4       local
root@zfs-demo:/#

Now all files written to “lamedeath” will be compressed with lz4, and don’t worry, it’s very fast and saves you a TON of space. At this point you have total control of functional local storage. However, it will take some time to get used to the commands, so as with any Linux command, man is your friend:
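To get a feel for what transparent compression buys you, you can compress some repetitive data yourself. Python’s standard library doesn’t ship lz4 (the algorithm ZFS uses here), so this sketch uses zlib purely as a stand-in to illustrate how compressible typical storage workloads are:

```python
import zlib

# Log files and VM images are often highly repetitive, which is why
# enabling compression on a dataset can save substantial space.
data = b"GET /index.html 200 OK\n" * 10_000
compressed = zlib.compress(data)

print(len(data), len(compressed))  # the compressed copy is a tiny fraction
```

The ratio you see on real datasets depends entirely on the data; already-compressed media barely shrinks, while text and sparse disk images shrink dramatically.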

man zpool
man zfs

You’ll want to read those over to some degree before you consider seriously using ZFS in production.

Overview and future post plans

This tutorial will give you the skills you need to build and use ZFS, and most of it will work on every implementation, including OpenSolaris derivatives like OmniOS, SmartOS & OpenIndiana. A setup like the one above will be about as good as any HW RAID setup, with a bit more flexibility, but there are still a few major topics I need to cover in the future, such as:

- Snapshots, clones and zfs send/receive
- Data integrity (failed drives, maintenance, planning, and what to do when things go wrong)
- Tuning (basic and advanced): understanding the infrastructure, finding your own bottlenecks and fixing them

In the meantime here are some links you may find useful

- Info on the read/write penalty of pool types: https://sites.google.com/site/eonstorage/zpool_notes
- When and how to use SSDs (and why they make sense in ZFS): https://blogs.oracle.com/brendan/entry/test
- A tuning guide (from a Solaris perspective, but it mostly applies to ZoL): http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
- Speed comparisons between different zpool setups: https://calomel.org/zfs_raid_speed_capacity.html

Thanks