As I mentioned in my last Trail Log, the latch on the drive bay holding my OS mirror broke, and a drive popped out. The mirror worked as expected and things kept running.
I was able to replace the cage and rebuild the mirror, but it raised the question: what if the controller had failed, or the OS had become corrupt? I had yet to test a server restore. It's a long weekend, and now is as good a time as any to test. If I need to reinstall and manually recreate users and shares, I have the time. Recreating shares and users would never be fun, but at least it would happen at a time of my choosing.
No sense tempting fate: I did check that the most recent backups completed without error, a luxury an unexpected failure won't allow. To simulate the failure, I deleted the mirror and recreated it from scratch, including an initialization. While it's the same hardware, that's as bad as starting with a new controller.
This is how I have the server backup configured:
The backup goes to an external USB drive and runs at noon and 11 PM. The backup has been running reliably without reporting any errors, so I was ready to check how reliable those reports actually were.
The restore wasn’t straightforward, but it wasn’t too complicated either, and it did work. After the restore I had a working server with all my shares and users.
Some tips from my experience:
- I needed to recreate my install configuration. When I installed, I had only the system drive connected. When I tried the restore with all drives connected (excluding all but the system mirror as a restore location), the restore process stopped, saying there were no suitable drives for the restore. Removing power from all but the OS drives solved the problem. I suspect that with all the drives connected, the OS mirror wasn’t seen as the primary drive (just as during an install with all the drives connected).
- The repair process didn’t always find the system backup on the external drive on the first scan. Sometimes I had to force a rescan before it was found. If the scan finished quickly, I knew it hadn’t looked hard enough and told it to look again. Annoying, but it just took persistence.
- I still needed to load the drivers for the OS RAID controller before it could be found. So any drivers needed for the original install will also be needed for the restore, although once the restore is done, whatever drivers were installed on the server will be used.
- The restore itself was quick, taking less than 15 minutes once the extra drives were disconnected.
- The first reboot after the restore failed with a bootmgr-not-found error, but it’s been fine ever since.
- The times displayed for the image backups (indicating when each backup was made) were GMT −8, which doesn’t match my server’s zone (GMT −5), so the times looked a bit off until I noticed the offset and realized why. (Redmond-centric, I guess.)
- The restore takes you back to when the backup was made, so any data that changed since then (such as data for add-ins) is lost and has to be recreated.
So, for that last bullet point: Cloudberry saves its information to the C: drive, so after the restore I did a repository sync to make sure it was all up to date. The restore itself worked fine; all my backup plans and repositories were still configured.
I don’t have any other add-ins, but any that maintain data on drive C: would need their data refreshed as well.
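The timezone quirk from the tips above is easy to reproduce with a short Python sketch. This isn’t from the restore tooling itself, just an illustration of why a GMT −8 display time looks off on a GMT −5 server; the timestamp is hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical backup time as displayed by the restore UI, in GMT -8
displayed = datetime(2012, 1, 14, 23, 0, tzinfo=timezone(timedelta(hours=-8)))

# The same instant expressed in the server's zone, GMT -5:
# three hours later on the clock, rolling into the next day here
server_local = displayed.astimezone(timezone(timedelta(hours=-5)))

print(displayed.strftime("%Y-%m-%d %H:%M"))     # 2012-01-14 23:00
print(server_local.strftime("%Y-%m-%d %H:%M"))  # 2012-01-15 02:00
```

Same backup, two clock readings three hours apart, which is exactly why the listed times seemed wrong until the offset clicked.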
The bottom line: I’m happy to know the server backup actually works. Not that I doubted Microsoft (no, really), but it’s nice to know it works with my hardware and my configuration.