Re: [Savannah-users] Savannah outage v2

From: Sylvain Beucler
Subject: Re: [Savannah-users] Savannah outage v2
Date: Sun, 31 May 2009 14:57:40 +0200
User-agent: Mutt/1.5.18 (2008-05-17)

Hi,

A bit of news on our current status.

History
-------

- On Thursday/Friday night, a disk failed and the RAID started
  misbehaving; for safety, Savannah was shut down.

- On Friday, after getting physical access to the (distant)
  colocation and preparing a backup, we replaced the faulty disk and,
  after performing some checks, the system appeared fine. Ward
  mentioned he had already seen a single disk make a whole RAID fail.

- On Friday night the RAID misbehaved again and we shut down Savannah
  again.

- On Saturday, a new expedition to the colocation found that the
  filesystem was corrupted. Attempts to recover it failed, to the
  point that we now need to reinstall everything. The cause of this
  corruption is still unknown. Feel free to suggest possible causes.

- The current disks were put aside for further recovery attempts.
  We've now reinstalled the base system on 2 new disks, and are
  reinstalling a partial service.

Data
----

Now this is getting gory.

The last backup was performed while the RAID was buggy, and lots of
files were reported missing, in particular for CVS/SVN/Git/Hg. Hence
the last backup is incomplete.

And our last full backup from tape is from the end of April. Normally
tape backups are more recent, but there were independent backup
issues. We've not discussed this in detail yet, as we're focusing on
recovering the data asap.

So, while the base of the system and data is there, we're partially
missing May.

Current status
--------------

We're reinstalling a partial service.

The frontend can be restored from its state on the 29th, 02:00 GMT.
It will probably be available today.

sftp-based services should be OK too, but will probably come later.

The missing data is essentially CVS/SVN/Git/Hg.

For Git/Hg: we plan to install an empty service (maybe today), where
you'll be able to import the last state of your project with a
classic 'push' command. We'll also make available the data from the
April backup (not before tomorrow). You can prepare by having a look
at how 'push' works, for example the '--all' option in Git.
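To illustrate the Git case, here is a minimal sketch of re-importing a project with 'push'. It is only a local rehearsal under assumptions: the remote name "savannah", the temporary directories, the branch "topic" and the tag "v1.0" are all made up for the example, and a throwaway bare repository stands in for the empty service. Once the real service is back, you would point the remote at your project's actual URL instead.

```shell
#!/bin/sh
set -eu

# Scratch area; a bare repository here stands in for the empty
# repository on the reinstalled server (hypothetical location).
work=$(mktemp -d)
git init --quiet --bare "$work/server.git"

# Stand-in for your local clone that still holds the project history.
git init --quiet "$work/local"
cd "$work/local"
git -c user.name=demo -c user.email=demo@example.org \
    commit --quiet --allow-empty -m "initial commit"
git branch topic   # a second branch, to show that --all covers it
git tag v1.0       # a tag; tags are not included in --all

# Register the empty repository as a remote (the name is an assumption).
git remote add savannah "$work/server.git"

# Push every local branch, then every tag, in two separate steps.
git push --quiet --all savannah
git push --quiet --tags savannah

# Count what the stand-in server now holds.
heads=$(git ls-remote --heads savannah | wc -l | tr -d ' ')
tags=$(git ls-remote --tags savannah | wc -l | tr -d ' ')
echo "branches pushed: $heads, tags pushed: $tags"
```

Note that '--all' only covers branches, which is why the tags are pushed in a second step.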
For CVS/SVN: since you probably don't have a backup of the repo, this
is more difficult. When we get the April backup tomorrow, we'll make
it available, so you can check it and agree to reimport it. Meanwhile
we're trying to see if we can recover May from the corrupted disks.

In parallel, we're investigating DRBD to have better protection next
time.

--
The Savannah Hackers