UPDATED: Added 500 million file test to this to show that file count doesn’t matter. 🙂

NetApp announced ONTAP 9.7 at Insight 2019 in Las Vegas, which included a number of new features. Mainly, though, ONTAP 9.7 focuses on making storage management simpler.

One of the new features that will help make things easier is the new FlexGroup conversion feature, which allows in-place conversion of a FlexVol to a FlexGroup volume without the need to do a file copy.

Best of all, this conversion takes a matter of seconds without needing to remount clients!

I know it sounds too good to be true, but what would you rather do: spend days copying terabytes of data over the network, or run a single command that converts the volume in place without touching the data?

As you can imagine, a lot of people are pretty stoked about being able to convert volumes without copying data, so I wanted to write up something to point people to as the questions inevitably start rolling in. This blog will cover how it works and what caveats there are. The blog will be a bit long, but I wanted to cover all the bases. Look for this information to be included in TR-4571 soon, as well as a new FlexGroup conversion podcast in the coming weeks.

Why would I want to convert a volume to a FlexGroup?

FlexGroup volumes offer a few advantages over FlexVol volumes, such as:

Ability to expand beyond 100TB and 2 billion files in a single volume

Ability to scale out capacity or performance non-disruptively

Multi-threaded performance for high ingest workloads

Simplification of volume management and deployment

For example, perhaps you have a workload that is growing rapidly and you don’t want to have to migrate the data, but still want to provide more capacity. Or perhaps a workload’s performance just isn’t cutting it on a FlexVol, so you want to provide better performance handling with a FlexGroup. Converting can help here.

When would I not want to convert a FlexVol?

Converting a FlexVol to a FlexGroup might not always be the best option. If you require features on the FlexVol that aren’t yet available for FlexGroup volumes, then you should hold off. For example, SVM-DR and cascading SnapMirror relationships aren’t supported for FlexGroup volumes in ONTAP 9.7, so if you need those, you should stay with FlexVols.

Also, if you have a FlexVol that’s already very large (80-100TB) and very full (80-90%), you might want to copy the data rather than convert it. The converted FlexGroup would start with a very large, very full member volume, which could create performance issues and doesn’t really resolve your capacity issues – particularly if that dataset contains files that grow over time.

For example, say you have a FlexVol that is 100TB in capacity with 90TB used.

If you were to convert this 90% full volume to a FlexGroup, then you’d have a 90% full member volume. Any member volumes you add after that would be 100TB each and 0% full, meaning they’d take on the majority of new workloads. The data would not rebalance, and if the original files grew over time, you could still run out of space with nowhere to go (since 100TB is the maximum member volume size).
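To make that concrete, here’s a rough sketch of what the member layout might look like after converting and then expanding such a volume (hypothetical volume names and numbers, in the style of the df output shown later in this post):

cluster::*> df -h bigvol*
Filesystem           total   used   avail  capacity  Vserver
/vol/bigvol/         300TB   90TB   210TB       30%  DEMO
/vol/bigvol__0001/   100TB   90TB    10TB       90%  DEMO   (converted member: still 90% full)
/vol/bigvol__0002/   100TB    0TB   100TB        0%  DEMO   (new member: takes most new files)
/vol/bigvol__0003/   100TB    0TB   100TB        0%  DEMO   (new member: takes most new files)

The original member stays the hot spot for its existing files, which is why copying into a fresh FlexGroup can be the better call for volumes that are already nearly full.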

Things that would block a conversion

ONTAP will block conversion of a FlexVol for the following reasons:

The ONTAP version isn’t 9.7 on all nodes

ONTAP upgrade issues preventing conversion

A FlexVol volume was transitioned from 7-Mode using 7MTT

Something is enabled on the volume that isn’t supported with FlexGroups yet (SAN LUNs, Windows NFS, SMB1, part of a fan-out/cascade snapmirror, SVM-DR, Snapshot naming/autodelete, vmalign set, SnapLock, space SLO, logical space enforcement/reporting, etc.)

FlexClones are present (The volume being converted can’t be a parent nor a clone)

The volume is a FlexCache origin volume

Snapshots with snap-ids greater than 255

Storage efficiencies are enabled (can be re-enabled after)

The volume is a source of a snapmirror and the destination has not been converted yet

The volume is part of an active (not quiesced) snapmirror

Quotas enabled (must be disabled first, then re-enabled after)

Volume names longer than 197 characters

Running ONTAP processes (mirrors, jobs, wafliron, NDMP backup, inode conversion in progress, etc.)

SVM root volume

Volume is too full

You can check for upgrade issues with:

cluster::*> upgrade-revert show
cluster::*> system node image show-update-progress -node *

You can check for transitioned volumes with:

cluster::*> volume show -is-transitioned true
There are no entries matching your query.

You can check for snapshots with snap-ids >255 with:

cluster::*> volume snapshot show -vserver DEMO -volume testvol -logical-snap-id >255 -fields logical-snap-id

How it works

To convert a FlexVol volume to a FlexGroup volume in ONTAP 9.7, you run a single, simple command in advanced privilege:

cluster::*> volume conversion start ?
   -vserver <vserver name>     *Vserver Name
  [-volume] <volume name>      *Volume Name
  [ -check-only [true] ]       *Validate the Conversion Only
  [ -foreground [true] ]       *Foreground Process (default: true)

When you run this command, it will take a single FlexVol and convert it into a FlexGroup volume with one member. You can even run a validation of the conversion before you do the real thing!

The process is 1:1, so you can’t currently convert multiple FlexVols into a single FlexGroup. Once the conversion is done, you will have a single member FlexGroup volume, which you can then add more member volumes of the same size to increase capacity and performance.
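Putting that together, the end-to-end flow is only a few commands. This is a minimal sketch with placeholder SVM/volume/aggregate names; each command is shown with real output later in this post. First validate (-check-only true), then convert, then optionally expand:

cluster::*> set advanced
cluster::*> volume conversion start -vserver SVM1 -volume myvol -check-only true
cluster::*> volume conversion start -vserver SVM1 -volume myvol
cluster::*> volume expand -vserver SVM1 -volume myvol -aggr-list aggr1,aggr2 -aggr-list-multiplier 2

Keep in mind the expand step is the point of no return mentioned below – after expansion, there’s no converting back.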

Other considerations/caveats

While the actual conversion process is simple, there are some considerations to think of before converting. Most of these considerations will go away in each ONTAP release as support is added for features, but it’s still prudent to call them out here.

Once the initial conversion is done, ONTAP will unmount the volume internally and remount it to get the new FlexGroup information into the appropriate places. Clients won’t have to remount/reconnect, but they will see a disruption that lasts less than a minute while this takes place. The data doesn’t change at all – filehandles all stay the same.

FabricPool doesn’t need anything. It just works. No need to rehydrate data on-prem.

Snapshot copies will remain available for clients to access data from, but you won’t be able to restore the volume using them via snaprestore commands. Those snapshots get marked as “pre-conversion.”

SnapMirrors will pick up where they left off without needing to rebaseline, provided the source and destination volumes have both been converted. But there are no SnapMirror restores of the volume; just file retrieval from clients. SnapMirror destinations need to be converted first.

FlexClones will need to be deleted or split from the volume to be converted.

Storage efficiencies will need to be disabled during the conversion, but your space savings will be preserved afterward.

FlexCache instances with an origin volume being converted will need to be deleted.

Space guarantees can impact how large a FlexGroup volume can get if they’re set to volume guarantee. New member volumes will need to be the same size as the existing members, so you’d need adequate space to honor those.

Quotas are supported in FlexGroup volumes, but are handled a bit differently than in FlexVol volumes. So, while the conversion is being done, quotas have to be disabled (quota off) and then re-enabled later (quota on) – see the sketch after this list.
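For the quota step specifically, the sequence looks something like this (a minimal sketch with placeholder names; the quota off command is the same one the conversion pre-check suggests, and quota on is its standard counterpart):

cluster::*> volume quota off -vserver SVM1 -volume myvol
cluster::*> volume conversion start -vserver SVM1 -volume myvol
cluster::*> volume quota on -vserver SVM1 -volume myvol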

Also, conversion to a FlexGroup volume is a one-way street once you expand it, so be sure you’re ready to make the jump. If anything goes wrong during the conversion process, there is a “rescue” method that support can help you use to get out of the pickle, so your data will be safe.

When you expand the FlexGroup to add new member volumes, they will be the same size as the converted member volume, so be sure there is adequate space available. Additionally, the existing data that resides in the original volume will remain in that member volume. Data does not re-distribute. Instead, the FlexGroup will favor newly added member volumes for new files.
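Since new members inherit the converted member’s size, it’s worth checking free space before you expand. Something like this would do it (a quick sketch, assuming the standard aggregate fields):

cluster::*> storage aggregate show -fields availsize,percent-used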

Nervous about convert?

Well, ONTAP has features for that.

If you don’t feel comfortable about converting your production FlexVol to a FlexGroup right away, you have options.

First of all, remember that we have the ability to run a check on the convert command with -check-only true. That tells us which prerequisites we’re missing.

cluster::*> volume conversion start -vserver DEMO -volume flexvol -foreground true -check-only true

Error: command failed: Cannot convert volume "flexvol" in Vserver "DEMO" to a FlexGroup. Correct the following issues and retry the command:
* The volume has Snapshot copies with IDs greater than 255. Use the (privilege: advanced) "volume snapshot show -vserver DEMO -volume flexvol -logical-snap-id >255 -fields logical-snap-id" command to list the Snapshot copies with IDs greater than 255 then delete them using the "snapshot delete -vserver DEMO -volume flexvol" command.
* Quotas are enabled. Use the 'volume quota off -vserver DEMO -volume flexvol' command to disable quotas.
* Cannot convert because the source "flexvol" of a SnapMirror relationship is source to more than one SnapMirror relationship. Delete other SnapMirror relationships, and then try the conversion of the source "flexvol" volume.
* Only volumes with logical space reporting disabled can be converted. Use the 'volume modify -vserver DEMO -volume flexvol -is-space-reporting-logical false' command to disable logical space reporting.

Also, remember that ONTAP can host multiple storage virtual machines (SVMs), which can be fenced off from network access. This can be used to test things such as volume conversion. The only trick is getting a copy of that data over… but it’s really not that tricky.

Option 1: SnapMirror

You can create a SnapMirror of your “to be converted” volume to the same SVM or a new SVM. Then, break the mirror and delete the relationship. Now you have a sandbox copy of your volume, complete with snapshots, to test out conversion, expansion and performance.
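A rough sketch of that sandbox flow, with placeholder names (this assumes a SnapMirror license and that the destination DP volume gets created first):

cluster::*> volume create -vserver TESTSVM -volume myvol_sandbox -aggregate aggr1 -size 10t -type DP
cluster::*> snapmirror create -source-path DEMO:myvol -destination-path TESTSVM:myvol_sandbox -type XDP
cluster::*> snapmirror initialize -destination-path TESTSVM:myvol_sandbox
cluster::*> snapmirror break -destination-path TESTSVM:myvol_sandbox
cluster::*> snapmirror delete -destination-path TESTSVM:myvol_sandbox
cluster::*> volume conversion start -vserver TESTSVM -volume myvol_sandbox -check-only true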

Option 2: FlexClone/volume rehost

If you don’t have SnapMirror or want to try a method that is less taxing on your network, you can use a combination of FlexClone (instant copy of your volume backed by a snapshot) and volume rehost (instant move of the volume from one SVM to another). Keep in mind that FlexClones themselves can’t be rehosted, but you can split the clone and then rehost.

Essentially, the process is:

1. FlexClone create
2. FlexClone split
3. Volume rehost to the new SVM (or convert on the existing SVM)
4. Profit!
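In command form, that might look like the following (placeholder names; volume rehost moves a volume between SVMs, which is why the clone has to be split first):

cluster::*> volume clone create -vserver DEMO -flexclone myvol_test -parent-volume myvol
cluster::*> volume clone split start -vserver DEMO -flexclone myvol_test -foreground true
cluster::*> volume rehost -vserver DEMO -volume myvol_test -destination-vserver TESTSVM
cluster::*> volume conversion start -vserver TESTSVM -volume myvol_test -check-only true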

Sample conversion

Before I converted a volume, I added around 300,000 files to help determine how long the process might take with a lot of files present.
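My file-creation script isn’t shown here, but a crude bash stand-in would just be a nested loop – something like this (hypothetical paths, mirroring the topdir_N layout you’ll see in the snapshot listing later):

#!/bin/bash
# Create 100 top-level directories with 3,000 small files each (~300k files total).
for d in $(seq 0 99); do
  mkdir -p /lotsafiles/topdir_${d}
  for f in $(seq 1 3000); do
    touch /lotsafiles/topdir_${d}/file_${f}
  done
done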

cluster::*> df -i lotsafiles
Filesystem               iused      ifree  %iused  Mounted on   Vserver
/vol/lotsafiles/        330197   20920929      1%  /lotsafiles  DEMO

cluster::*> volume show lots*
Vserver   Volume       Aggregate    State      Type       Size  Available  Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
DEMO      lotsafiles   aggr1_node1  online     RW         10TB     7.33TB    0%

First, let’s try out the validation:

cluster::*> volume conversion start -vserver DEMO -volume lotsafiles -foreground true -check-only true

Error: command failed: Cannot convert volume "lotsafiles" in Vserver "DEMO" to a FlexGroup. Correct the following issues and retry the command:
* SMB1 is enabled on Vserver "DEMO". Use the 'vserver cifs options modify -smb1-enabled false -vserver DEMO' command to disable SMB1.
* The volume contains LUNs. Use the "lun delete -vserver DEMO -volume lotsafiles -lun *" command to remove the LUNs, or use the "lun move start" command to relocate the LUNs to other FlexVols.
* NFSv3 MS-DOS client support is enabled on Vserver "DEMO". Use the "vserver nfs modify -vserver DEMO -v3-ms-dos-client disabled" command to disable NFSv3 MS-DOS client support on the Vserver. Note that disabling this support will disable access for all NFSv3 MS-DOS clients connected to Vserver "DEMO".

As you can see, we have some blockers, such as SMB1 and the LUN I created (to intentionally break conversion). So I clear those using the recommended commands, run the check again, and now see some of the caveats covered earlier:

cluster::*> volume conversion start -vserver DEMO -volume lotsafiles -foreground true -check-only true

Conversion of volume "lotsafiles" in Vserver "DEMO" to a FlexGroup can proceed with the following warnings:
* After the volume is converted to a FlexGroup, it will not be possible to change it back to a flexible volume.
* Converting flexible volume "lotsafiles" in Vserver "DEMO" to a FlexGroup will cause the state of all Snapshot copies from the volume to be set to "pre-conversion". Pre-conversion Snapshot copies cannot be restored.

Now, let’s convert. But, first, I’ll start a script that takes a while to complete, while also monitoring performance during the conversion using Active IQ Performance Manager.

The conversion of the volume took less than 1 minute, and the only disruption I saw was a slight drop in IOPS:

cluster::*> volume conversion start -vserver DEMO -volume lotsafiles -foreground true

Warning: After the volume is converted to a FlexGroup, it will not be possible to change it back to a flexible volume.
Do you want to continue? {y|n}: y

Warning: Converting flexible volume "lotsafiles" in Vserver "DEMO" to a FlexGroup will cause the state of all Snapshot copies from the volume to be set to "pre-conversion". Pre-conversion Snapshot copies cannot be restored.
Do you want to continue? {y|n}: y

[Job 23671] Job succeeded: success

cluster::*> statistics show-periodic
 cpu  cpu    total                    fcache    total    total     data     data data  cluster  cluster cluster     disk     disk     pkts     pkts
 avg busy      ops  nfs-ops cifs-ops     ops spin-ops     recv     sent busy     recv     sent    busy     recv     sent     read    write     recv     sent
 ---- ---- -------- -------- -------- -------- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- -------- -------- --------
 34%  44%    14978    14968       10        0    14978   14.7MB   15.4MB   0%   3.21MB   3.84MB      0%   11.5MB   11.6MB   4.43MB   1.50MB    49208    55026
 40%  45%    14929    14929        0        0    14929   15.2MB   15.7MB   0%   3.21MB   3.84MB      0%   12.0MB   11.9MB   3.93MB    641KB    49983    55712
 36%  44%    15020    15020        0        0    15019   14.8MB   15.4MB   0%   3.24MB   3.87MB      0%   11.5MB   11.5MB   3.91MB   23.9KB    49838    55806
 30%  39%    15704    15694       10        0    15704   15.0MB   15.7MB   0%   3.29MB   3.95MB      0%   11.8MB   11.8MB   2.12MB   4.99MB    50936    57112
 32%  43%    14352    14352        0        0    14352   14.7MB   15.3MB   0%   3.33MB   3.97MB      0%   11.3MB   11.3MB   4.19MB   27.3MB    49736    55707
 37%  44%    14807    14797       10        0    14807   14.5MB   15.0MB   0%   3.09MB   3.68MB      0%   11.4MB   11.4MB   4.34MB   2.79MB    48352    53616
 39%  43%    15075    15075        0        0    15076   14.9MB   15.6MB   0%   3.24MB   3.86MB      0%   11.7MB   11.7MB   3.48MB    696KB    50124    55971
 32%  42%    14998    14998        0        0    14997   15.1MB   15.8MB   0%   3.23MB   3.87MB      0%   11.9MB   11.9MB   3.68MB    815KB    49606    55692
 38%  43%    15038    15025       13        0    15036   14.7MB   15.2MB   0%   3.27MB   3.92MB      0%   11.4MB   11.3MB   3.46MB   15.8KB    50256    56150
 43%  44%    15132    15132        0        0    15133   15.0MB   15.7MB   0%   3.22MB   3.87MB      0%   11.8MB   11.8MB   1.93MB   15.9KB    50030    55938
 34%  42%    15828    15817       10        0    15827   15.8MB   16.5MB   0%   3.39MB   4.10MB      0%   12.4MB   12.3MB   4.02MB   21.6MB    52142    58771
 28%  39%    11807    11807        0        0    11807   12.3MB   13.1MB   0%   2.55MB   3.07MB      0%   9.80MB   9.99MB   6.76MB   27.9MB    38752    43748
 33%  42%    15108    15108        0        0    15107   15.1MB   15.5MB   0%   3.32MB   3.91MB      0%   11.7MB   11.6MB   3.50MB   1.17MB    50903    56143
 32%  42%    16143    16133       10        0    16143   15.1MB   15.8MB   0%   3.28MB   3.95MB      0%   11.8MB   11.8MB   3.78MB   9.00MB    50922    57403
 24%  34%     8843     8843        0        0     8861   14.2MB   14.9MB   0%   3.70MB   4.44MB      0%   10.5MB   10.5MB   8.46MB   10.7MB    46174    53157
 27%  37%    10949    10949        0        0    11177   9.91MB   10.2MB   0%   2.45MB   2.84MB      0%   7.46MB   7.40MB   5.55MB   1.67MB    31764    35032
 28%  38%    12580    12567       13        0    12579   13.3MB   13.8MB   0%   2.76MB   3.26MB      0%   10.5MB   10.6MB   3.92MB   19.9KB    44119    48488
 30%  40%    14300    14300        0        0    14298   14.2MB   14.7MB   0%   3.09MB   3.68MB      0%   11.1MB   11.1MB   2.66MB    600KB    47282    52789
 31%  41%    14514    14503       10        0    14514   14.3MB   14.9MB   0%   3.15MB   3.75MB      0%   11.2MB   11.2MB   3.65MB    728KB    48093    53532
 31%  42%    14626    14626        0        0    14626   14.3MB   14.9MB   0%   3.16MB   3.77MB      0%   11.1MB   11.1MB   4.84MB   1.14MB    47936    53645
ontap9-tme-8040: cluster.cluster: 11/13/2019 22:44:39
 cpu  cpu    total                    fcache    total    total     data     data data  cluster  cluster cluster     disk     disk     pkts     pkts
 avg busy      ops  nfs-ops cifs-ops     ops spin-ops     recv     sent busy     recv     sent    busy     recv     sent     read    write     recv     sent
 ---- ---- -------- -------- -------- -------- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- -------- -------- --------
 30%  39%    15356    15349        7        0    15370   15.3MB   15.8MB   0%   3.29MB   3.94MB      0%   12.0MB   11.8MB   3.18MB   6.90MB    50493    56425
 32%  42%    14156    14146       10        0    14156   14.6MB   15.3MB   0%   3.09MB   3.68MB      0%   11.5MB   11.7MB   5.49MB   16.3MB    48159    53678


And now, we have a single member FlexGroup volume:

cluster::*> volume show lots*
Vserver   Volume           Aggregate    State      Type       Size  Available  Used%
--------- ------------     ------------ ---------- ---- ---------- ---------- -----
DEMO      lotsafiles       -            online     RW         10TB     7.33TB    0%
DEMO      lotsafiles__0001 aggr1_node1  online     RW         10TB     7.33TB    0%
2 entries were displayed.

And our snapshots are still there, but are marked as “pre-conversion”:

cluster::> set diag
cluster::*> snapshot show -vserver DEMO -volume lotsafiles -fields is-convert-recovery,state
vserver volume     snapshot                        state          is-convert-recovery
------- ---------- ------------------------------- -------------- -------------------
DEMO    lotsafiles base                            pre-conversion false
DEMO    lotsafiles hourly.2019-11-13_1705          pre-conversion false
DEMO    lotsafiles hourly.2019-11-13_1805          pre-conversion false
DEMO    lotsafiles hourly.2019-11-13_1905          pre-conversion false
DEMO    lotsafiles hourly.2019-11-13_2005          pre-conversion false
DEMO    lotsafiles hourly.2019-11-13_2105          pre-conversion false
DEMO    lotsafiles hourly.2019-11-13_2205          pre-conversion false
DEMO    lotsafiles clone_clone.2019-11-13_223144.0 pre-conversion false
DEMO    lotsafiles convert.2019-11-13_224411       pre-conversion true
9 entries were displayed.

Snap restores will fail:

cluster::*> snapshot restore -vserver DEMO -volume lotsafiles -snapshot convert.2019-11-13_224411

Error: command failed: Promoting a pre-conversion Snapshot copy is not supported.

But we can still grab files from the client:

[root@centos7 scripts]# cd /lotsafiles/.snapshot/convert.2019-11-13_224411/pre-convert/
[root@centos7 pre-convert]# ls
topdir_0   topdir_14  topdir_2   topdir_25  topdir_30  topdir_36  topdir_41  topdir_47  topdir_52  topdir_58  topdir_63  topdir_69  topdir_74  topdir_8   topdir_85  topdir_90  topdir_96
topdir_1   topdir_15  topdir_20  topdir_26  topdir_31  topdir_37  topdir_42  topdir_48  topdir_53  topdir_59  topdir_64  topdir_7   topdir_75  topdir_80  topdir_86  topdir_91  topdir_97
topdir_10  topdir_16  topdir_21  topdir_27  topdir_32  topdir_38  topdir_43  topdir_49  topdir_54  topdir_6   topdir_65  topdir_70  topdir_76  topdir_81  topdir_87  topdir_92  topdir_98
topdir_11  topdir_17  topdir_22  topdir_28  topdir_33  topdir_39  topdir_44  topdir_5   topdir_55  topdir_60  topdir_66  topdir_71  topdir_77  topdir_82  topdir_88  topdir_93  topdir_99
topdir_12  topdir_18  topdir_23  topdir_29  topdir_34  topdir_4   topdir_45  topdir_50  topdir_56  topdir_61  topdir_67  topdir_72  topdir_78  topdir_83  topdir_89  topdir_94
topdir_13  topdir_19  topdir_24  topdir_3   topdir_35  topdir_40  topdir_46  topdir_51  topdir_57  topdir_62  topdir_68  topdir_73  topdir_79  topdir_84  topdir_9   topdir_95

Now, I can add more member volumes using “volume expand”:

cluster::*> volume expand -vserver DEMO -volume lotsafiles -aggr-list aggr1_node1,aggr1_node2 -aggr-list-multiplier 2

Warning: The following number of constituents of size 10TB will be added to FlexGroup "lotsafiles": 4. Expanding the FlexGroup will cause the state of all Snapshot copies to be set to "partial". Partial Snapshot copies cannot be restored.
Do you want to continue? {y|n}: y

Warning: FlexGroup "lotsafiles" is a converted flexible volume. If this volume is expanded, it will no longer be able to be converted back to being a flexible volume.
Do you want to continue? {y|n}: y

[Job 23676] Job succeeded: Successful

But remember, the data doesn’t redistribute. The original member volume will keep the files in place:

cluster::*> df -i lots*
Filesystem                iused      ifree  %iused  Mounted on   Vserver
/vol/lotsafiles/        3630682  102624948      3%  /lotsafiles  DEMO
/vol/lotsafiles__0001/  3630298   17620828     17%  /lotsafiles  DEMO
/vol/lotsafiles__0002/       96   21251030      0%  ---          DEMO
/vol/lotsafiles__0003/       96   21251030      0%  ---          DEMO
/vol/lotsafiles__0004/       96   21251030      0%  ---          DEMO
/vol/lotsafiles__0005/       96   21251030      0%  ---          DEMO
6 entries were displayed.

cluster::*> df -h lots*
Filesystem                         total    used    avail  capacity  Mounted on             Vserver
/vol/lotsafiles/                    47TB  2735MB     14TB        0%  /lotsafiles            DEMO
/vol/lotsafiles/.snapshot         2560GB    49MB   2559GB        0%  /lotsafiles/.snapshot  DEMO
/vol/lotsafiles__0001/            9728GB  2505MB   7505GB        0%  /lotsafiles            DEMO
/vol/lotsafiles__0001/.snapshot    512GB    49MB    511GB        0%  /lotsafiles/.snapshot  DEMO
/vol/lotsafiles__0002/            9728GB    57MB   7505GB        0%  ---                    DEMO
/vol/lotsafiles__0002/.snapshot    512GB      0B    512GB        0%  ---                    DEMO
/vol/lotsafiles__0003/            9728GB    57MB   7766GB        0%  ---                    DEMO
/vol/lotsafiles__0003/.snapshot    512GB      0B    512GB        0%  ---                    DEMO
/vol/lotsafiles__0004/            9728GB    57MB   7505GB        0%  ---                    DEMO
/vol/lotsafiles__0004/.snapshot    512GB      0B    512GB        0%  ---                    DEMO
/vol/lotsafiles__0005/            9728GB    57MB   7766GB        0%  ---                    DEMO
/vol/lotsafiles__0005/.snapshot    512GB      0B    512GB        0%  ---                    DEMO
12 entries were displayed.

Converting a FlexVol in a SnapMirror relationship

Now, let’s take a look at a volume that is in a SnapMirror.

cluster::*> snapmirror show -destination-path data_dst -fields state
source-path destination-path state
----------- ---------------- ------------
DEMO:data   DEMO:data_dst    Snapmirrored

If I try to convert the source, I get an error:

cluster::*> vol conversion start -vserver DEMO -volume data -check-only true

Error: command failed: Cannot convert volume "data" in Vserver "DEMO" to a FlexGroup. Correct the following issues and retry the command:
* Cannot convert source volume "data" because destination volume "data_dst" of the SnapMirror relationship with "data" as the source is not converted. First check if the source can be converted to a FlexGroup volume using "vol conversion start -volume data -convert-to flexgroup -check-only true". If the conversion of the source can proceed then first convert the destination and then convert the source.

So, I’d need to convert the destination first. To do that, I need to quiesce the snapmirror:

cluster::*> vol conversion start -vserver DEMO -volume data_dst -check-only true

Error: command failed: Cannot convert volume "data_dst" in Vserver "DEMO" to a FlexGroup. Correct the following issues and retry the command:
* The relationship was not quiesced. Quiesce SnapMirror relationship using "snapmirror quiesce -destination-path data_dst" and then try the conversion.

Here we go…

cluster::*> snapmirror quiesce -destination-path DEMO:data_dst
Operation succeeded: snapmirror quiesce for destination "DEMO:data_dst".

cluster::*> vol conversion start -vserver DEMO -volume data_dst -check-only true

Conversion of volume "data_dst" in Vserver "DEMO" to a FlexGroup can proceed with the following warnings:
* After the volume is converted to a FlexGroup, it will not be possible to change it back to a flexible volume.
* Converting flexible volume "data_dst" in Vserver "DEMO" to a FlexGroup will cause the state of all Snapshot copies from the volume to be set to "pre-conversion". Pre-conversion Snapshot copies cannot be restored.

When I convert the volume, it lets me know my next steps:

cluster::*> vol conversion start -vserver DEMO -volume data_dst

Warning: After the volume is converted to a FlexGroup, it will not be possible to change it back to a flexible volume.
Do you want to continue? {y|n}: y

Warning: Converting flexible volume "data_dst" in Vserver "DEMO" to a FlexGroup will cause the state of all Snapshot copies from the volume to be set to "pre-conversion". Pre-conversion Snapshot copies cannot be restored.
Do you want to continue? {y|n}: y

[Job 23710] Job succeeded: SnapMirror destination volume "data_dst" has been successfully converted to a FlexGroup volume. You must now convert the relationship's source volume, "DEMO:data", to a FlexGroup. Then, re-establish the SnapMirror relationship using the "snapmirror resync" command.

Now I convert the source volume…

cluster::*> vol conversion start -vserver DEMO -volume data

Warning: After the volume is converted to a FlexGroup, it will not be possible to change it back to a flexible volume.
Do you want to continue? {y|n}: y

Warning: Converting flexible volume "data" in Vserver "DEMO" to a FlexGroup will cause the state of all Snapshot copies from the volume to be set to "pre-conversion". Pre-conversion Snapshot copies cannot be restored.
Do you want to continue? {y|n}: y

[Job 23712] Job succeeded: success

And resync the mirror:

cluster::*> snapmirror resync -destination-path DEMO:data_dst
Operation is queued: snapmirror resync to destination "DEMO:data_dst".

cluster::*> snapmirror show -destination-path DEMO:data_dst -fields state
source-path destination-path state
----------- ---------------- ------------
DEMO:data   DEMO:data_dst    Snapmirrored

While that’s fine and all, the most important part of a snapmirror is the restore. So let’s see if I can access files from the destination volume’s snapshot.

First, I mount the source and destination and compare ls output:

# mount -o nfsvers=3 DEMO:/data_dst /dst
# mount -o nfsvers=3 DEMO:/data /data

# ls -lah /data
total 14G
drwxrwxrwx   6 root     root      4.0K Nov 14 11:57 .
dr-xr-xr-x. 54 root     root      4.0K Nov 15 10:08 ..
drwxrwxrwx   2 root     root      4.0K Sep 14  2018 cifslink
drwxr-xr-x  12 root     root      4.0K Nov 16  2018 nas
-rwxrwxrwx   1 prof1    ProfGroup    0 Oct  3 14:32 newfile
drwxrwxrwx   5 root     root      4.0K Nov 15 10:06 .snapshot
lrwxrwxrwx   1 root     root        23 Sep 14  2018 symlink -> /shared/unix/linkedfile
drwxrwxrwx   2 root     bin       4.0K Jan 31  2019 test
drwxrwxrwx   3 root     root      4.0K Sep 14  2018 unix
-rwxrwxrwx   1 newuser1 ProfGroup    0 Jan 14  2019 userfile
-rwxrwxrwx   1 root     root      6.7G Nov 14 11:58 Windows2.iso
-rwxrwxrwx   1 root     root      6.7G Nov 14 11:37 Windows.iso

# ls -lah /dst
total 14G
drwxrwxrwx   6 root     root      4.0K Nov 14 11:57 .
dr-xr-xr-x. 54 root     root      4.0K Nov 15 10:08 ..
drwxrwxrwx   2 root     root      4.0K Sep 14  2018 cifslink
dr-xr-xr-x   2 root     root         0 Nov 15  2018 nas
-rwxrwxrwx   1 prof1    ProfGroup    0 Oct  3 14:32 newfile
drwxrwxrwx   4 root     root      4.0K Nov 15 10:05 .snapshot
lrwxrwxrwx   1 root     root        23 Sep 14  2018 symlink -> /shared/unix/linkedfile
drwxrwxrwx   2 root     bin       4.0K Jan 31  2019 test
drwxrwxrwx   3 root     root      4.0K Sep 14  2018 unix
-rwxrwxrwx   1 newuser1 ProfGroup    0 Jan 14  2019 userfile
-rwxrwxrwx   1 root     root      6.7G Nov 14 11:58 Windows2.iso
-rwxrwxrwx   1 root     root      6.7G Nov 14 11:37 Windows.iso

And if I ls to the snapshot in the destination volume…

# ls -lah /dst/.snapshot/snapmirror.7e3cc08e-d9b3-11e6-85e2-00a0986b1210_2163227795.2019-11-15_100555/
total 14G
drwxrwxrwx 6 root     root      4.0K Nov 14 11:57 .
drwxrwxrwx 4 root     root      4.0K Nov 15 10:05 ..
drwxrwxrwx 2 root     root      4.0K Sep 14  2018 cifslink
dr-xr-xr-x 2 root     root         0 Nov 15  2018 nas
-rwxrwxrwx 1 prof1    ProfGroup    0 Oct  3 14:32 newfile
lrwxrwxrwx 1 root     root        23 Sep 14  2018 symlink -> /shared/unix/linkedfile
drwxrwxrwx 2 root     bin       4.0K Jan 31  2019 test
drwxrwxrwx 3 root     root      4.0K Sep 14  2018 unix
-rwxrwxrwx 1 newuser1 ProfGroup    0 Jan 14  2019 userfile
-rwxrwxrwx 1 root     root      6.7G Nov 14 11:58 Windows2.iso
-rwxrwxrwx 1 root     root      6.7G Nov 14 11:37 Windows.iso

Everything is there!

Now, I expand the FlexGroup source to give us more capacity:

cluster::*> volume expand -vserver DEMO -volume data -aggr-list aggr1_node1,aggr1_node2 -aggr-list-multiplier 2

Warning: The following number of constituents of size 30TB will be added to FlexGroup "data": 4. Expanding the FlexGroup will cause the state of all Snapshot copies to be set to "partial". Partial Snapshot copies cannot be restored.
Do you want to continue? {y|n}: y

[Job 23720] Job succeeded: Successful

If you notice, my source volume now has 5 member volumes. My destination volume… only has one:

cluster::*> vol show -vserver DEMO -volume data*
Vserver   Volume         Aggregate    State      Type       Size  Available  Used%
--------- ------------   ------------ ---------- ---- ---------- ---------- -----
DEMO      data           -            online     RW        150TB    14.89TB    0%
DEMO      data__0001     aggr1_node2  online     RW         30TB     7.57TB    0%
DEMO      data__0002     aggr1_node1  online     RW         30TB     7.32TB    0%
DEMO      data__0003     aggr1_node2  online     RW         30TB     7.57TB    0%
DEMO      data__0004     aggr1_node1  online     RW         30TB     7.32TB    0%
DEMO      data__0005     aggr1_node2  online     RW         30TB     7.57TB    0%
DEMO      data_dst       -            online     DP         30TB     7.32TB    0%
DEMO      data_dst__0001 aggr1_node1  online     DP         30TB     7.32TB    0%
8 entries were displayed.

No worries! Just update the mirror and ONTAP will fix it for you.

cluster::*> snapmirror update -destination-path DEMO:data_dst
Operation is queued: snapmirror update of destination "DEMO:data_dst".

The update will initially fail with the following:

Last Transfer Error: A SnapMirror transfer for the relationship with destination FlexGroup "DEMO:data_dst" was aborted because the source FlexGroup was expanded. A SnapMirror AutoExpand job with id "23727" was created to expand the destination FlexGroup and to trigger a SnapMirror transfer for the SnapMirror relationship. After the SnapMirror transfer is successful, the "healthy" field of the SnapMirror relationship will be set to "true". The job can be monitored using either the "job show -id 23727" or "job history show -id 23727" commands.

The job will expand the volume and then we can update again:

cluster::*> job show -id 23727
                            Owning
Job ID Name                 Vserver    Node           State
------ -------------------- ---------- -------------- ----------
23727  Snapmirror Expand    cluster    node1          Success
       Description: SnapMirror FG Expand data_dst

cluster::*> snapmirror show -destination-path DEMO:data_dst -fields state
source-path destination-path state
----------- ---------------- ------------
DEMO:data   DEMO:data_dst    Snapmirrored

Now both FlexGroup volumes have the same number of members:

cluster::*> vol show -vserver DEMO -volume data*
Vserver   Volume         Aggregate    State      Type       Size  Available  Used%
--------- ------------   ------------ ---------- ---- ---------- ---------- -----
DEMO      data           -            online     RW        150TB    14.88TB    0%
DEMO      data__0001     aggr1_node2  online     RW         30TB     7.57TB    0%
DEMO      data__0002     aggr1_node1  online     RW         30TB     7.32TB    0%
DEMO      data__0003     aggr1_node2  online     RW         30TB     7.57TB    0%
DEMO      data__0004     aggr1_node1  online     RW         30TB     7.32TB    0%
DEMO      data__0005     aggr1_node2  online     RW         30TB     7.57TB    0%
DEMO      data_dst       -            online     DP        150TB    14.88TB    0%
DEMO      data_dst__0001 aggr1_node1  online     DP         30TB     7.32TB    0%
DEMO      data_dst__0002 aggr1_node1  online     DP         30TB     7.32TB    0%
DEMO      data_dst__0003 aggr1_node2  online     DP         30TB     7.57TB    0%
DEMO      data_dst__0004 aggr1_node1  online     DP         30TB     7.32TB    0%
DEMO      data_dst__0005 aggr1_node2  online     DP         30TB     7.57TB    0%
12 entries were displayed.

So, there you have it… a quick and easy way to move from FlexVol volumes to FlexGroups!

Addendum: Does a High File Count Impact the Convert Process?

Short answer: No!

In a comment a few weeks ago, someone pointed out, rightly, that my 300k file volume wasn’t a *true* high file count. So I set out to create 500 million files and convert that volume. The hardest part was creating 500 million files, but I finally got there:

cluster::*> vol show -vserver DEMO -volume fvconvert -fields files,files-used,is-flexgroup
vserver volume    files      files-used is-flexgroup
------- --------- ---------- ---------- ------------
DEMO    fvconvert 2040109451 502631608  false

Since it took me so long to create that many files, I went ahead and created a FlexClone volume of it and split it, so I could keep the origin volume intact – because who doesn’t need 500 million files lying around?

Fun fact: That process *did* take a while – about 30 minutes:

cluster::*> vol clone split start -vserver DEMO -flexclone fvconvert -foreground true

Warning: Are you sure you want to split clone volume fvconvert in Vserver DEMO ? {y|n}: y
[Job 24230] 0% inodes processed.

cluster::*> job history show -id 24230 -fields starttime,endtime
node   record  vserver endtime        starttime
------ ------- ------- -------------- --------------
node1  2832338 cluster 12/09 10:27:08 12/09 09:58:16

After the clone split, I ran the conversion check to see if we were good to go. I did have to run a “volume clone sharing-by-split undo” to get rid of shared FlexClone blocks, which also took a while.
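I didn’t capture that command output, and the exact syntax below is my assumption based on the command family name, so verify it against your release’s documentation:

cluster::*> volume clone sharing-by-split undo start -vserver DEMO -volume fvconvert
cluster::*> volume clone sharing-by-split undo show

After that, the check came back clean: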

cluster::*> volume conversion start -vserver DEMO -volume fvconvert -foreground true -check-only true

Conversion of volume "fvconvert" in Vserver "DEMO" to a FlexGroup can proceed with the following warnings:
* After the volume is converted to a FlexGroup, it will not be possible to change it back to a flexible volume.

I went ahead and ran the same script I was running earlier to generate load and watched the statistics on the cluster to see if we hit any outage. Again, the convert took seconds (with 500 million files) and there was just a small blip.

cluster::*> volume conversion start -vserver DEMO -volume fvconvert -foreground true

Warning: After the volume is converted to a FlexGroup, it will not be possible to change it back to a flexible volume.
Do you want to continue? {y|n}: y

[Job 24259] Job succeeded: success

Then, as the job was running, I added new member volumes to the FlexGroup volume – again, no disruption.

cluster::*> volume expand -vserver DEMO -volume fvconvert -aggr-list aggr1_node1 -aggr-list-multiplier 3 -foreground true

Info: Unable to get information for Snapshot copies of volume "fvconvert" on Vserver "DEMO". Reason: No such snapshot.

Warning: The following number of constituents of size 40TB will be added to FlexGroup "fvconvert": 3.
Do you want to continue? {y|n}: y

[Job 24261] Job succeeded: Successful

Then 4 more member volumes:

cluster::*> volume expand -vserver DEMO -volume fvconvert -aggr-list aggr1_node2 -aggr-list-multiplier 4

Info: Unable to get information for Snapshot copies of volume "fvconvert" on Vserver "DEMO". Reason: No such snapshot.

Warning: The following number of constituents of size 40TB will be added to FlexGroup "fvconvert": 4.
Do you want to continue? {y|n}: y

[Job 24264] Job succeeded: Successful

Plus, I started to see more IOPS for the workload, and the job itself took much less time overall than when I ran it on a FlexVol.


This was the job on the FlexVol:

# python file-create.py /fvconvert/files
Starting overall work: 2019-12-09 10:32:21.966337
End overall work: 2019-12-09 12:11:15.990707
total time: 5934.024611

This is how long it took on the FlexVol converted to a FlexGroup (with added member volumes):

# python file-create.py /fvconvert/files2
Starting overall work: 2019-12-10 11:02:28.621532
End overall work: 2019-12-10 12:22:48.523772
total time: 4819.95753193
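That works out to roughly a 19% reduction in runtime (5934 seconds down to about 4820) with no changes to the client or the script – just the conversion and the added member volumes.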

This was the file distribution:

cluster::*> volume show -vserver DEMO -volume fvconvert* -fields files,files-used
vserver volume          files      files-used
------- --------------- ---------- ----------
DEMO    fvconvert       8160437804 502886230
DEMO    fvconvert__0001 2040109451 502848737
DEMO    fvconvert__0002 2040109451 12747
DEMO    fvconvert__0003 2040109451 12749
DEMO    fvconvert__0004 2040109451 12751

At the end of the job:

cluster::*> volume show -vserver DEMO -volume fvconvert* -fields files,files-used
vserver volume          files       files-used
------- --------------- ----------- ----------
DEMO    fvconvert       16320875608 530132794
DEMO    fvconvert__0001 2040109451  506770209
DEMO    fvconvert__0002 2040109451  3345330
DEMO    fvconvert__0003 2040109451  3345330
DEMO    fvconvert__0004 2040109451  3345319
DEMO    fvconvert__0005 2040109451  3331657
DEMO    fvconvert__0006 2040109451  3331635
DEMO    fvconvert__0007 2040109451  3331657
DEMO    fvconvert__0008 2040109451  3331657

And, for fun, I kicked it off again on the new FlexGroup. This time, I wanted to see how much faster the job ran, as well as how the files distributed across the emptier members.

Remember, we started out with the newer member volumes all at less than 1% of files used (3.3 million of 2 billion possible files). The member volume that was converted from a FlexVol was using 25% of the total files (500 million of 2 billion).

After the job ran, we saw a delta of ~3.2 million files on the original member volume (509,958,440 - 506,770,209) and roughly ~3.5 million on each of the other members (for example, 6,808,792 - 3,345,330 on fvconvert__0002), which means we’re still balancing across all member volumes, but favoring the less full ones.

cluster::*> volume show -vserver DEMO -volume fvconvert* -fields files,files-used
vserver volume          files       files-used
------- --------------- ----------- ----------
DEMO    fvconvert       16320875608 557633288
DEMO    fvconvert__0001 2040109451  509958440
DEMO    fvconvert__0002 2040109451  6808792
DEMO    fvconvert__0003 2040109451  6809225
DEMO    fvconvert__0004 2040109451  6806843
DEMO    fvconvert__0005 2040109451  6798959
DEMO    fvconvert__0006 2040109451  6800054
DEMO    fvconvert__0007 2040109451  6849375
DEMO    fvconvert__0008 2040109451  6801600

With the new FlexGroup, converted from a FlexVol, our job time dropped from roughly 5900 seconds to 4656 seconds, and we were able to push 2x the IOPS:

# python file-create.py /fvconvert/files3
Starting overall work: 2019-12-10 13:14:26.816860
End overall work: 2019-12-10 14:32:03.565705
total time: 4656.76723099

As you can see, there’s an imbalance of files and data across these member volumes (way more in the original, converted member), but performance still blows away the previous FlexVol performance because we’re doing more efficient work across multiple nodes.

Not too shabby!