16 Hard Drive RAID 50 Data Recovery Case Study

RAID array initial information provided from customer:

16 Seagate ST3750640AS hard drives within a server, believed to be grouped into two RAID 50 arrays of 8 drives each. The customer describes that one drive failed and was replaced; however, after the controller began rebuilding the array, it went offline. The controller currently shows the array as unavailable and will not allow any configuration changes or maintenance operations. The customer believes that the RAID card has failed; the card is an AMCC 9650SE/16ML.

Initial Diagnostics:

Tuesday Afternoon – A technician arrived on site and removed the drives from the RAID array, carefully noting the drive order as connected to the controller card. The server has four backplanes, each with 4 SATA ports, connected to ports 0-15 of the controller from bottom to top in the physical arrangement. The working assumption is that the drives are grouped as two rows on top and two on the bottom.

The drives are brought to the data recovery laboratory for evaluation and imaging. Upon initial testing, all drives are found to identify and have sector access; however, five of the drives have S.M.A.R.T. warnings, and one has a S.M.A.R.T. failure due to an excessive reallocated sector count. The other S.M.A.R.T. warnings relate to reallocated sectors and excessive heat statistics.

An MBR partition table is found on two drives (one a parity copy) and is determined to show two user partitions of 4.77TB and 4.78TB on the array. This is inconsistent with the customer's description of two RAID 50 arrays, but is consistent with a single 16-drive RAID 50 containing two logical volumes.
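The partition sizes line up with the single-array interpretation. A quick sanity check, assuming the geometry described here (16 x 750GB drives, two 8-drive RAID 5 sets striped together, one parity drive's worth of capacity lost per set):

```python
# Sanity-check the partition sizes against the assumed geometry:
# a single 16-drive RAID 50 built from two 8-drive RAID 5 sets.
# Capacities are nominal manufacturer figures.

DRIVE_BYTES = 750_000_000_000   # ST3750640AS nominal capacity
DRIVES = 16
PARITY_DRIVES = 2               # one per RAID 5 set

usable_bytes = (DRIVES - PARITY_DRIVES) * DRIVE_BYTES
usable_tib = usable_bytes / 2**40   # binary terabytes, as most tools report

print(f"usable: {usable_tib:.2f} TiB")   # ~9.55 TiB
```

Roughly 9.55TB of usable space, matching the 4.77TB + 4.78TB partitions found, and far more than two separate 8-drive RAID 50 arrays would present as a single partition table.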

The customer is quoted $5,650 (non-emergency service) for the RAID 50 data recovery service. The quote is approved.

Data Recovery Procedure:

Tuesday Night – A sampling of data is extracted from each drive for use in determining the drive sets. The initial assumption that the drives are grouped top and bottom by physical arrangement fails XOR consistency testing. After speaking with the customer, he describes that the drive on channel 10 may have belonged to the first set and been out of arrangement. Using this assumption, the groupings again fail XOR testing. Brute-force XOR testing is employed to try to find the correct two sets of 8 drives, and all candidates fail. It is determined that the replaced drive, despite having data on it, is not part of the array at all, and the data was never rebuilt onto it. Meanwhile, RAID card metadata is found on two of the drives, confirming that the new drive is not part of the array.
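The XOR consistency test exploits the defining property of RAID 5: within each stripe row, the data blocks XORed with the parity block equal zero. A minimal sketch of the idea (the sample data here is hypothetical, and real testing samples many offsets per drive):

```python
from functools import reduce

def xor_consistent(samples):
    """Check whether the byte-wise XOR of samples taken from the same
    LBA on every member of a candidate RAID 5 set is zero.
    Data XOR parity == 0 holds no matter which drive holds the parity
    block in that row, so this works before drive order is known.
    `samples` is a list of equal-length byte strings, one per drive."""
    combined = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), samples)
    return all(byte == 0 for byte in combined)

# Hypothetical illustration: three "data" samples plus their parity.
data = [bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9])]
parity = bytes(a ^ b ^ c for a, b, c in zip(*data))

print(xor_consistent(data + [parity]))       # True for a correct grouping
print(xor_consistent(data + [bytes(3)]))     # False when a member is wrong
```

A drive that never had the array's data rebuilt onto it, like the replacement drive here, causes exactly this kind of wholesale failure for every grouping that includes it.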

Wednesday – The original drive which had failed is checked and found to still have read/write functionality; it too has a S.M.A.R.T. failure relating to reallocated sectors. Brute-force XOR testing is employed again, first using the assumption that only one drive is out of physical arrangement. After a full day of testing, the two groups of 8 are determined, and both sets pass XOR testing, confirming the groupings. Research begins into the RAID card's default and configurable settings and parameters. The card is determined to have a default stripe size of 64KB and is also capable of 16KB and 256KB sizes. The manufacturer's tech support confirmed that no parity delay exists and that the parity rotation is right synchronous. They are, however, unable to provide information on the RAID 0 stripe size over the RAID 5 sets. A functional RAID card is ordered for testing to help determine the settings.
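The brute-force search space for the groupings is modest, which is what makes a full day of XOR testing feasible. A sketch of the enumeration, assuming 16 candidate drives split into two sets of 8:

```python
from itertools import combinations

drives = list(range(16))

# Fixing drive 0 in the first set halves the search, since swapping
# the two sets yields the same grouping. Each candidate first set
# determines the second set as its complement.
candidates = [{0} | set(c) for c in combinations(drives[1:], 7)]

print(len(candidates))   # 6435 distinct ways to split 16 drives into two sets of 8
```

Each of the 6,435 candidate splits is then XOR-tested on sampled data; only the correct split passes on both sets.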

Thursday – Meanwhile, work begins to determine the drive order and RAID 0 stripe size before the card's arrival. The arrangement of four drives in set 1 is easily determined based on partition tables and NTFS structure locations. The RAID 0 striping is confirmed to be larger than 128KB, based on JPEG image files found to cross at least three members of the same drive set continuously. It is assumed that the striping may be 448KB (64KB × the number of drives in a set, less one for parity). Using this assumption, drive order determination is attempted. Using other JPEG files and references in the MFT, the drive order in set 1 is quickly determined, leaving only set 2 to be arranged. Two drives of set 2 are found to have definite locations based on file references and image files which rotate onto these drives from set 1, leaving only 6 drives to determine by brute-force methods.
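With the per-drive stripe confirmed at 64KB and the rotation reported as right synchronous, a logical data block within one RAID 5 set can be mapped to a physical drive. The sketch below uses one common convention for right-synchronous rotation (parity starting on the last drive and shifting by one drive per row, with data blocks starting immediately after the parity drive and wrapping); actual controller conventions vary, which is exactly why a reference card was ordered:

```python
STRIPE = 64 * 1024   # confirmed per-drive stripe size
N = 8                # drives per RAID 5 set

def block_location(data_block, n=N):
    """Map a logical data-block index within one RAID 5 set to
    (row, drive), assuming right-synchronous rotation as described
    above. Each row holds n-1 data blocks plus one parity block."""
    row, k = divmod(data_block, n - 1)      # which row, position in row
    parity_drive = (n - 1 - row) % n        # parity rotates one drive per row
    drive = (parity_drive + 1 + k) % n      # data starts just after parity
    return row, drive

# Row 0: parity on drive 7, the 7 data blocks fall on drives 0..6 -
# so 7 x 64KB = 448KB of contiguous set data per row, matching the
# assumed RAID 0 stripe of 448KB over each RAID 5 set.
print([block_location(b) for b in range(7)])
```

Under this layout a large JPEG naturally crosses several members of the same set in sequence, which is what made image files such a useful drive-ordering tool here.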

Friday – Despite brute-force testing of all possible arrangements of the remaining 6 drives, the results are inconclusive. Analysis of the raw data structures makes it apparent that the parity rotation is possibly non-standard and seemingly inconsistent. This may also have been caused by the failed RAID rebuild the customer described.
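Notably, the arrangement search itself is tiny; the inconclusive results point at the layout model, not the search. A sketch of the count, with hypothetical drive labels:

```python
from itertools import permutations

# Hypothetical labels for the six set-2 drives without fixed positions.
remaining = ["D3", "D4", "D5", "D6", "D7", "D8"]

arrangements = list(permutations(remaining))
print(len(arrangements))   # only 720 orders to test exhaustively
```

When all 720 orders fail to produce consistent structures, the fault must lie in an assumption outside the search, here, the parity rotation disturbed by the failed rebuild.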

Monday – Further testing is employed to try to determine the parity rotation manually from existing data; however, the process is slow and the results inconsistent. The replacement RAID card arrives via FedEx. S.M.A.R.T. settings are reset on the failed drives, and the original drives are connected to the controller to force the array online and read out its settings. The drive grouping and order are confirmed by the RAID card utility information; however, the BIOS utility shows that the array is degraded and requires a consistency scan. This possibly explains the anomalies relating to the parity rotation. The RAID metadata also shows a second RAID grouping (likely using the replaced drive), which is offline and appears to have never rebuilt beyond a few sectors. This is consistent with the bad sectors found on a few drives near the lower LBA regions, in the MFT region. The rebuild likely affected only the lowest LBA regions; the higher sector regions should be consistent with the original settings.

Using the newly confirmed array settings, the virtual array is built and scanned. As expected, the higher LBA ranges are unaffected by the RAID rebuild. Logical partition 2 is recovered 100%, with no errors, using settings pulled from the RAID card metadata. Partition 1 contains some corruption of the MFT due to bad sectors and the failed rebuild. This partition is imaged onto a single large HDD for further analysis using logical recovery software. Its data also appears to be 100% recovered; however, due to the MFT corruption, this is impossible to confirm definitively. The customer had described that the two logical partitions were copies of the same data set and that only one or the other would be needed.

Customer is contacted and notified that the RAID 50 data recovery is completed.

Tuesday Morning – Customer arrives and reviews recovered data. All necessary data is found to be intact and fully recovered.

Total billed to customer: $5650

Standard service turnaround time: 1 Week

Have a RAID 50 Needing Recovery?

Contact us to see how we can help. In some cases we're even able to do the work remotely, although we prefer to have the drives in hand for complete testing. Our pricing for RAID arrays follows a standard structure: a base price, plus an additional charge based on the number of drives, plus a fee for any drives needing clean room work in our lab. Call us today to speak immediately with a data recovery engineer at (401) 400-2425 or toll free at 1 (844) 4-My-DATA, and see how we can help with your RAID 50 data recovery project.