Hi lplatypus,
Thanks for your question. Each server has 10 drives in two RAID arrays. One array holds the customer data and consists of 8 hard drives in a RAID 10 configuration. These drives are hot-swappable, so when one fails we replace it and the array usually rebuilds itself. The other array holds the OS and consists of 2 drives in RAID 1, and it was one of these drives that failed. Once the drive was replaced the array started to rebuild, but the Xend service could not be restarted because the array was in 'read only mode', so unfortunately we can't restart the virtual machines until the rebuild is complete.
We do try to implement fail-safes, as you can see, but unfortunately there are some issues that we can't prevent; we can only mitigate the damage they cause.
I should also mention that we are looking at a VMware solution that will incorporate failover to spare nodes as well as live migration. As long as all the testing goes well, we'll have it out mid next year.
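
On the point about waiting for the rebuild before restarting anything: the idea can be sketched roughly as below. This is only an illustration and assumes Linux software RAID (md), where rebuild status is readable from /proc/mdstat; the servers described above may well use a hardware controller instead. The sample text, array names and function names are all made up for the example.

```python
import re

# Hypothetical sample in the style of /proc/mdstat (not taken from a real server).
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid10]
md0 : active raid1 sdb1[2] sda1[0]
      104320 blocks [2/1] [U_]
      [=>...................]  recovery =  8.5% (8896/104320) finish=0.1min speed=8896K/sec

md1 : active raid10 sdc1[0] sdd1[1] sde1[2] sdf1[3]
      209584128 blocks 64K chunks 2 near-copies [4/4] [UUUU]

unused devices: <none>
"""


def rebuild_progress(mdstat_text):
    """Return {array_name: percent_done} for arrays that report a rebuild."""
    progress = {}
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md0 : active raid1 ...".
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # A rebuilding array has a line such as "recovery =  8.5% (...)".
        status = re.search(r"(recovery|resync)\s*=\s*([\d.]+)%", line)
        if status and current:
            progress[current] = float(status.group(2))
    return progress


def safe_to_restart(mdstat_text):
    """True only when no array reports a rebuild in progress."""
    return not rebuild_progress(mdstat_text)
```

On a machine that does use md, the text could be read with open("/proc/mdstat").read() and passed to safe_to_restart() before attempting to start services again.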