* [PATCH v3 0/4] RAID1 with 3- and 4- copies
@ 2019-10-31 15:13 UTC, David Sterba
To: linux-btrfs; Cc: David Sterba

Here it goes again, RAID1 with 3 and 4 copies. I found the bug that
stopped it from inclusion last time; it was in the test itself, so the
kernel code is effectively unchanged. So, with 1 or 2 missing devices,
replace by device id works.

There's one annoying thing, but it's not new: when replacing a missing
device, some extra single/dup block groups are created during the
replace process. An example is below. This can happen on plain raid1
with a degraded read-write mount as well.

Now, what's the merge target? The patches almost made it to 5.3, and the
changes build on existing code, so the actual addition of the new
profiles consists mainly of new definitions and additional cases. It
should therefore be safe. I'm for adding it to the 5.5 queue, though
we're at rc5 and this can be seen as late for a feature.

The user benefits are noticeable: raid1c3 can replace raid6 for
metadata, which is the most problematic part and much more complicated
to fix otherwise (a write-ahead journal or something like that). The
feedback regarding plain 3-copy as a replacement was positive, on IRC,
and there are mails about that too.

Further information can be found in the 5.3-time submission:

https://lore.kernel.org/linux-btrfs/cover.1559917235.git.dsterba@suse.com/

--

Example of 2 devices gone missing and replaced
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- mkfs -d raid1c3 -m raid1c3 /dev/sda10 /dev/sda11 /dev/sda12
- delete devices 2 and 3 from the system

                Data      Metadata  System
  Id Path       RAID1C3   RAID1C3   RAID1C3  Unallocated
  -- ---------- --------- --------- -------- -----------
   1 /dev/sda10   1.00GiB 256.00MiB  8.00MiB     8.74GiB
   2 missing      1.00GiB 256.00MiB  8.00MiB    -1.26GiB
   3 missing      1.00GiB 256.00MiB  8.00MiB    -1.26GiB
  -- ---------- --------- --------- -------- -----------
     Total        1.00GiB 256.00MiB  8.00MiB     6.23GiB
     Used       200.31MiB 320.00KiB 16.00KiB

- mount -o degraded
- btrfs replace 2 /dev/sda13

                Data      Metadata  Metadata  System   System
  Id Path       RAID1C3   single    RAID1C3   single   RAID1C3 Unallocated
  -- ---------- --------- --------- --------- -------- ------- -----------
   1 /dev/sda10   1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB     8.46GiB
   2 /dev/sda13   1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB
   3 missing      1.00GiB         - 256.00MiB        - 8.00MiB    -1.26GiB
  -- ---------- --------- --------- --------- -------- ------- -----------
     Total        1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB    15.95GiB
     Used       200.31MiB     0.00B 320.00KiB 16.00KiB   0.00B

- btrfs replace 3 /dev/sda14

                Data      Metadata  Metadata  System   System
  Id Path       RAID1C3   single    RAID1C3   single   RAID1C3 Unallocated
  -- ---------- --------- --------- --------- -------- ------- -----------
   1 /dev/sda10   1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB     8.46GiB
   2 /dev/sda13   1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB
   3 /dev/sda14   1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB
  -- ---------- --------- --------- --------- -------- ------- -----------
     Total        1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB    25.95GiB
     Used       200.31MiB     0.00B 320.00KiB 16.00KiB   0.00B

There you can see the metadata/single and system/single chunks, which
are otherwise unused if no other writes happen during the replace.
Running 'balance start -mconvert=raid1c3,profiles=single' should get
rid of them.
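For concreteness, here is the whole recovery sequence spelled out as a
minimal sketch. The shorthand above is expanded to full btrfs-progs
syntax; the mount point /mnt is an assumption, and the device names are
taken from the example:

  # mount read-write with devices 2 and 3 missing
  mount -o degraded /dev/sda10 /mnt

  # replace the missing devices by device id, waiting for completion
  btrfs replace start -B 2 /dev/sda13 /mnt
  btrfs replace start -B 3 /dev/sda14 /mnt

  # convert the leftover single metadata chunks back to raid1c3;
  # if a single system chunk remains, an additional
  # -sconvert=raid1c3,profiles=single filter (which needs -f) may be
  # required, depending on the btrfs-progs version
  btrfs balance start -mconvert=raid1c3,profiles=single /mnt

  # verify that no single chunks remain
  btrfs filesystem usage -T /mnt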
The leftover chunks are an annoyance; we have a plan to avoid creating
them, but it needs a change of behaviour for degraded read-write
mounts.

Implementation details: the new profiles are reduced from the expected
ones (raid1 -> single or dup) to allow writes without breaking the raid
constraints. To relax that condition, allowing writes to "half" of the
raid with a missing device would skip creating the extra block groups.
This is similar to MD-RAID, which allows writing to just one of the
RAID1 devices and then syncs to the other when it's available again.
With btrfs-style raid1 we can do better when there are enough other
devices to satisfy the raid1 constraint (even with a missing device).

--

David Sterba (4):
  btrfs: add support for 3-copy replication (raid1c3)
  btrfs: add support for 4-copy replication (raid1c4)
  btrfs: add incompat for raid1 with 3, 4 copies
  btrfs: drop incompat bit for raid1c34 after last block group is gone

 fs/btrfs/block-group.c          | 27 ++++++++++++++--------
 fs/btrfs/ctree.h                |  7 +++---
 fs/btrfs/super.c                |  4 ++++
 fs/btrfs/sysfs.c                |  2 ++
 fs/btrfs/volumes.c              | 40 +++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h              |  4 ++++
 include/uapi/linux/btrfs.h      |  5 ++++-
 include/uapi/linux/btrfs_tree.h | 10 ++++++++-
 8 files changed, 83 insertions(+), 16 deletions(-)

--
2.23.0
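Patches 3 and 4 in the list above add and later drop an incompat bit
for the new profiles. A quick way to observe this from userspace, as a
hedged sketch: the sysfs feature file name raid1c34 and the decoded
flag name RAID1C34 are assumptions based on the patch titles, and
reading the super block requires a btrfs-progs recent enough to know
the new profiles:

  # a kernel with the patches should expose the feature in sysfs
  [ -e /sys/fs/btrfs/features/raid1c34 ] && echo "raid1c34 supported"

  # creating a filesystem with 3-copy metadata sets the incompat bit
  mkfs.btrfs -m raid1c3 -d raid1 /dev/sda10 /dev/sda11 /dev/sda12
  btrfs inspect-internal dump-super /dev/sda10 | grep -A4 incompat_flags

  # converting all metadata and system chunks away from raid1c3 ...
  mount /dev/sda10 /mnt
  btrfs balance start -f -mconvert=raid1 -sconvert=raid1 /mnt
  umount /mnt

  # ... lets the kernel drop the bit once the last raid1c3/raid1c4
  # block group is gone (the behaviour added by patch 4)
  btrfs inspect-internal dump-super /dev/sda10 | grep -A4 incompat_flags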

* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
@ 2019-11-15 10:28 UTC, David Sterba
To: Zygo Blaxell; Cc: dsterba, Neal Gompa, David Sterba, Btrfs BTRFS

On Thu, Nov 14, 2019 at 12:13:24AM -0500, Zygo Blaxell wrote:
> On Fri, Nov 01, 2019 at 04:09:08PM +0100, David Sterba wrote:
> > The raid1c34 patches are not intrusive and could be backported on
> > top of 5.3 because all the preparatory work has been merged already.
>
> Indeed, that's how I ended up testing them. I couldn't get the 5.4-rc
> kernels to run long enough to do meaningful testing before they locked
> up. I tested with 5.3.8 + patches.
>
> I left out the last patch that removes the raid1c3 incompat flag
> because 5.3 didn't have the block group tree code to apply it to.
>
> I ran my raid1 and raid56 corruption recovery tests modified for
> raid1c3. The first test is roughly:
>
> 	mkfs.btrfs -draid1c3 -mraid1c3 /dev/vd[bcdef]
> 	mount /dev/vdb /test
> 	cp -a 9GB_data /test
> 	sync
> 	sysctl vm.drop_caches=3
> 	diff -r 9GB_data /test
> 	head -c 9g /dev/urandom > /dev/vdb
> 	head -c 9g /dev/urandom > /dev/vdc
> 	sync
> 	sysctl vm.drop_caches=3
> 	diff -r 9GB_data /test
> 	btrfs scrub start -Bd /test
> 	sysctl vm.drop_caches=3
> 	diff -r 9GB_data /test
> 	btrfs scrub start -Bd /test
> 	sysctl vm.drop_caches=3
> 	diff -r 9GB_data /test
>
> First scrub reported a lot of corruption on /dev/vdb and /dev/vdc.
> Second scrub reported no errors. diff (all instances) reported no
> differences.
>
> Second test is:
>
> 	mkfs.btrfs -draid6 -mraid1c3 /dev/vd[bcdef]
> 	# rest as above...
>
> Similar results: first scrub reported many errors as expected.
> Second scrub reported no errors. No diffs.

Thanks for the tests.
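The same recipe extends naturally to the 4-copy profile. A minimal
sketch under the same assumptions as the script above (five scratch
devices /dev/vd[bcdef], a 9GB_data set, a mount point at /test); with
four copies, corrupting three of the five devices should still leave
at least one intact copy of every block:

	mkfs.btrfs -draid1c4 -mraid1c4 /dev/vd[bcdef]
	mount /dev/vdb /test
	cp -a 9GB_data /test
	sync
	sysctl vm.drop_caches=3
	diff -r 9GB_data /test
	# clobber three of the five devices; one good copy must survive
	head -c 9G /dev/urandom > /dev/vdb
	head -c 9G /dev/urandom > /dev/vdc
	head -c 9G /dev/urandom > /dev/vdd
	sync
	sysctl vm.drop_caches=3
	diff -r 9GB_data /test
	# first scrub repairs from the surviving copies, the second one
	# should come back clean
	btrfs scrub start -Bd /test
	sysctl vm.drop_caches=3
	diff -r 9GB_data /test
	btrfs scrub start -Bd /test
	sysctl vm.drop_caches=3
	diff -r 9GB_data /test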
