My employer has been using Unitrends for quite a while. We were using an older 2U model against Exchange 2007 without problems. In the last year we upgraded to a Recovery-943, and I completed an Exchange 2013 upgrade/migration. We now have a 4-node DAG, using JBOD storage with auto-reseed and 4 copies per database. Because we are using auto-reseed, all disks are mounted in a Microsoft-specified way, and there are NOT any drive letters involved beyond the C: drive. I have 12 different disks on each server, hosting a total of 48 databases (4 DBs per disk, one active, 3 passives).

Unitrends is configured to back up a passive copy of each DB from the various nodes (all nodes are active, but we grab the passive on the node with the lowest activation preference, so it's "most likely to be passive" when backed up).

The problem I'm having is that, as best as I can tell, Unitrends will randomly fail to back up various databases correctly. I have a ticket open with their support team, which has been effectively worthless so far (they have yet to even acknowledge that I'm experiencing a failure and keep telling me the product is working as designed).

Some clarification is in order here. One of the things support has asked me to do is to edit master.ini and explicitly disable the standard verification/checksum performed on Exchange backups. They claim this is required because a passive copy can't be checksummed (untrue), and that a checksum isn't needed anyway because Exchange itself is already verifying the data while it's being replicated (this may be true, but it's irrelevant).



The problem here is that disabling the checksum means that backups will always "Succeed" in the console, without giving the end user any sort of validation that the backup is actually legitimately "clean."



If the product was working as I expect it should, this wouldn't concern me, but bear with me. :)

Another part of the troubleshooting here has been to up the log level in master.ini for "wbps" (the client-side backup reader that runs under Windows) to 3, and eventually to 5. Level 3 is sufficient to expose the bug; 5 just gives you better proof, though ProcMon.exe (from MS/Sysinternals) is far better still.

With WBPS at log level 3, take a backup of a passive database copy. Read through the wbps_*.log for that backup, and if you're hitting the same bug I am, you'll see this message: "Snapshot volume was not found". I can see VSS create a snapshot for the target backup EDB at some point in the log, but when WBPS later goes to actually perform the backup, it can't find the snapshot that was just created and marked for it.
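For anyone who would rather not read the logs by hand, here is a rough Python sketch of that check. The only things taken from my environment are the wbps_*.log file name pattern and the "Snapshot volume was not found" message; the function name and the idea of passing in the log directory are just my own illustration, so point it at wherever your agent actually writes its logs.

```python
import fnmatch
import os

# Message text as it appears in my wbps logs at log level 3.
MESSAGE = "Snapshot volume was not found"


def find_snapshot_failures(log_dir):
    """Return (file, line number, line text) for every matching log line.

    log_dir is an assumption: pass the folder where your WBPS logs live.
    File names and message text are matched case-insensitively.
    """
    hits = []
    for name in sorted(os.listdir(log_dir)):
        if not fnmatch.fnmatch(name.lower(), "wbps_*.log"):
            continue
        path = os.path.join(log_dir, name)
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, "r", errors="replace") as log:
            for line_no, line in enumerate(log, start=1):
                if MESSAGE.lower() in line.lower():
                    hits.append((path, line_no, line.rstrip()))
    return hits
```

An empty list from find_snapshot_failures() only means the message wasn't logged; it doesn't prove the backup read from the shadow copy, so ProcMon is still the better evidence.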

What does WBPS do? It tries to back up the live files on the disk, rather than targeting the shadow copy. I have proof of this happening via ProcMon logs. Of course a backup will fail checksumming if the files are actively being changed by the Exchange replication process... so Unitrends' suggestion to disable Exchange verification is a bad idea at best, and causes a false sense of security at worst. Their own product logs show that their own product is unable to find a shadow, and rather than simply failing at that point (which I think would be preferable), it instead continues to run the backup and read live files off of the disk. Reading a live Exchange database is never a valid backup strategy unless all Exchange processes are shut down first.

This is semi-sporadic. Most of the time I've been able to reproduce it on the first try, but today I was NOT able to reproduce the bug, and actually got a clean, checksummed/verified backup from a passive copy, since WBPS correctly located the VSS snap and used it. (Again, this is after their support has tried to tell me that a passive copy can't be checksummed/verified.) I have been playing around with the checksumming option because I strongly believe that the whole reason support is telling people not to checksum Exchange passives is because their product is broken, rather than because it can't be done. I finally got proof of that today.

Additionally, backing up an Active DB copy seems to work correctly every time. I've yet to see an Active copy not find the VSS faux-volume to back up from; only a Passive backup seems to do this. Microsoft and Unitrends both advise against backing up from an Active copy, though; passive is preferred.

I would like to ask anyone else who is using E2013, a DAG, and Unitrends to turn up WBPS logging to level 3 and verify that your Exchange backups are actually occurring via VSS/shadow copy rather than reading the "live" passive files from disk. I'm sick of getting the same answer repeatedly from Unitrends support, so I'm looking for more proof from other parties. I'm about 99.5% sure that there is actually a problem here, but they keep telling me it's fine and disputing my various arguments about how VSS backups work. I suspect some of the problem may be the strange disk layout required by auto-reseed, but I have no way to verify this short of finding someone else with a similar environment and asking them to test. Search your WBPS_*.log files for "Snapshot volume was not found" followed by a full volume path that will probably be incorrect. In our failures, it spits out a path for the C: drive rather than one of the Exchange volumes that are mounted at multiple folder mount points below C:.
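To show what I suspect is going on (and this is only my guess, not anything I know about how WBPS is written), here is a small Python sketch of two ways to decide which volume a file lives on. The folder names are made-up examples of an auto-reseed style layout, and both function names are hypothetical. Resolving by drive letter always lands on C:, which matches the wrong path in our failed logs; resolving by the deepest matching mount point lands on the Exchange volume.

```python
from pathlib import PureWindowsPath


def volume_by_drive_letter(file_path):
    """Naive lookup: assume the volume is whatever the drive root is."""
    return PureWindowsPath(file_path).anchor


def volume_by_mount_point(file_path, mount_points):
    """Pick the deepest mount point that contains the file.

    mount_points is a list of folders where volumes are mounted.
    Falls back to the drive root if none of them contain the file.
    """
    path = PureWindowsPath(file_path)
    best = PureWindowsPath(path.anchor)
    for mount in mount_points:
        mount_path = PureWindowsPath(mount)
        contains = path == mount_path or mount_path in path.parents
        if contains and len(mount_path.parts) > len(best.parts):
            best = mount_path
    return str(best)


# Example layout (hypothetical names): volumes mounted in folders under C:.
mounts = [r"C:\ExchangeVolumes\ExVol1", r"C:\ExchangeDatabases\DB1"]
edb = r"C:\ExchangeDatabases\DB1\DB1.db\DB1.edb"

print(volume_by_drive_letter(edb))         # the C: root, i.e. the wrong volume
print(volume_by_mount_point(edb, mounts))  # the folder mount point for DB1
```

If a snapshot lookup works the first way, it would go looking for a shadow copy of C: instead of the mounted Exchange volume, which would explain the message. Again, that is a hypothesis I can't confirm from the outside.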

If anyone wants to see the proof, I can provide ProcMon dumps and WBPS log files; I'll have to throw them on Dropbox or something. This has not been enough to convince Unitrends support (or development, since supposedly they have looked at this case) that anything is wrong. I'm at my wit's end.

Edited Mar 3, 2015 at 07:57 UTC