|
|||
|
Hi,
I've been a customer for about 6 months now. There were minor glitches in that period; I know things can't be 100% perfect - I have a strong technical background myself. However, today, Sunday the 27th of March, my node is down and I can't power it up. This comes after it worked only intermittently during the morning, being available for a few seconds per minute. Something similar happened on Friday. I called technical support and they said something was going on at the node that hosted my VPS, and that they would investigate. The problem came back today, but it seems nobody is available to address it. I'm desperate. I host a number of web sites and I'm being humiliated in front of my many customers. I submitted 2 tickets, marked them as "emergency" and "critical", and left my mobile number - nobody's there. This is a MAJOR setback for me and my business, and I wonder where Crucial Paradigm's people are now. VERY STRANGE FOR SOMEONE WHO GUARANTEES 99.9% AVAILABILITY in their SLA on the home page. |
|
||||
|
Hi Milan,
Thanks for posting some feedback. Feedback of any kind is always appreciated. From what you have detailed, I believe you were affected by a recent VPS node outage, as detailed here: Linux Server Node s492.au.crucialx.net. I believe the issue first sprang up on Friday, causing some Customers' VPS Services to be inaccessible. Looking back at our Tech Team's investigation notes from Friday, they went through all the usual checks and the issue was resolved. It appears that later on Sunday evening a similar issue recurred, resulting in your VPS Service being inaccessible.
Both times our Tech Team went through the usual checks, which include checking the logs for the RAID card and the Operating System logs. The symptoms indicated there was an issue with the RAID Array which held your VPS Data. However, the RAID was reporting as Healthy / Optimal, as was each of the Drives in the RAID Array. In an ideal situation, when there is a hardware fault, something simply fails and we then replace it, whether it be a RAID card or a failed Hard Drive. This case, however, was not the usual circumstance: no fault messages were being received, and according to all logs and diagnostics no hardware had failed. This makes it very difficult to identify the exact root cause of the issue.
As a result, we got a Senior Sys Admin to use his experience and knowledge to help further identify the issue. It was decided that we should replace the RAID Card regardless of whether or not we could isolate it as the cause. Our Senior Sys Admin felt this was the best choice to address the issue. A Technician was then sent to the DataCenter to perform this task; the RAID Card in the node running your VPS was replaced and the symptoms appear to have subsided. However, we are still monitoring nonetheless.
The thing to note here is that yes, sometimes hardware does have issues, or does fail. Sometimes it is not a black-and-white, clear-cut issue and we have to dive into it with our Team to get it sorted. I completely understand that any downtime is not ideal for anyone, and that is why we deployed a Technician to the DataCenter to replace a suspected faulty part. This was not a usual failure situation, so some further time was taken to investigate the issue.
During outages, queue lengths often increase quite dramatically, leaving the on-duty team having to work as fast as possible on the issue at hand, as well as respond to as many tickets as possible. Our Team's priority is always the resolution of the problem first. Just to clarify, we do not provide phone support or a phone call-back service during Weekend hours. We only provide phone Support during Weekday Business Hours. If you feel any of the Tickets you submitted were not responded to in an appropriate time frame, I am more than happy to investigate that further for you. If you would like me to take a look, please email your details and all relevant ticket IDs to feedback at crucial.com.au and I will personally follow up with you on that.
__________________
Cheers, Ross Crucial Paradigm Last edited by ross; 28-03-2011 at 11:04 AM. |
|
|||
|
Dear Ross,
Thank you for your reply. I perfectly understand the justification you presented and that hardware occasionally fails. What I don't understand is why your people haven't moved VPSes to a different node, at least to see if that would rectify the situation. On this page, Crucial Paradigm Web Solutions, your company claims: "This means we will be able to provide a higher level of uptime on our entire product range. The new setup is designed with no single point of failure in mind across all aspects of the setup. This means if server goes offline or fails it will be automatically started on a new node with minimal downtime (seconds/minutes)" THIS IS THE KEY CLAIM THAT DREW ME TO YOUR SERVICE. I was counting on reliable hosting and your claimed ability to quickly respond to hardware failures. It rings hollow now, as the type of failure you had wouldn't fall into the category of a "catastrophic" event; it was just a simple card failing. If you DO HAVE the claimed ability to restart the server on a new node, it's a mystery to me why you held all of us offline for hours (from about 2pm until just after midnight), instead of moving everyone to a new node and then investigating what's wrong with the original node without any pressure on anyone, including your own support staff. You clearly failed to follow your own advice/promise. I also tried to find out more about your 99.9% promise of uptime on the same page; however, I was unable to find out what it actually meant. In particular, I couldn't find anything that said what would happen in case you failed to keep that promise. Yesterday, I literally BEGGED your support staff to move my VPS to a different, working node. That didn't happen. As it turns out, that would've solved my issue. The whole episode leaves a bitter aftertaste. I now have to deal with the aftershocks this caused for my reputation and my relationship with my clients. Last edited by milan; 28-03-2011 at 11:39 AM. |
|
||||
|
Hi Milan,
Thanks for the update. I believe there is some confusion about what solution / service you are on. Currently you are on this Service >> VPS & Virtual Dedicated Servers for Linux, which is NOT a High Availability solution. In fact, we are yet to release our High Availability solution, which I believe is what the quote you provided relates to. Feel free to email feedback at crucial.com.au so we can get your details and clear up your concerns about which Service you are on.
__________________
Cheers, Ross Crucial Paradigm |
|
|||
|
Hi Ross,
If the 99.9% availability promise is for SOME SERVICES only, I think it would be fair to indicate this on your web page, along with the list of services for which the claim is true (or not true). I rechecked the page, and I couldn't figure out what this promise applies to, if it's not universally applicable across the range of your services. Using the word "SOME" would at least hint that it doesn't apply to ALL. |
|
|||
|
Dear Ross
I waited to get the final report about the incident before posting my final comment. It's commendable that you described the history of the failure in detail. However, I'd like to add a few more things from a customer's point of view:
1) SALES PITCH. Reliability and redundancy are of the utmost importance to business customers like me. When I inquired, in detail, what protection you had in case of various failures (network, power, nodes), your representative created the impression that nothing short of a direct nuclear strike could make the server go down. There was no mention that you had no spare capacity in terms of nodes.
2) EXCUSES. You pointed out that the VPS solution never promised any kind of redundancy. TRUE. I checked the web page. However, NOT EVEN YOUR BUSINESS SOLUTIONS PROMISE ANYTHING LIKE THAT. I suppose, should any of your business customers find themselves in the situation I was in, you could use the same excuse - "nothing like that was promised". However, it would be MUCH BETTER if you were very open and, EVEN if a customer did not explicitly ask, at least mentioned that you are running at full capacity, with no spare nodes, only spare parts. I noticed that even your customers who were paying extra for the backup service were stuck and helpless - I wonder how secure they feel now?
3) TIMELY TRANSPARENCY. When the problems started, I asked your technical support to move my VPS to another node immediately, because a very big event was being reported live on one of my sites. I had no response. That was rude, but I hoped the solution might be just around the corner. However, what makes me feel VERY DISAPPOINTED AND CHEATED is that your support HID THE FACT THAT THERE WAS NO SPARE NODE. Had I known this, I would have rerouted the whole operation to MY OWN SPARE SERVER in a different location immediately and reduced the downtime. I didn't do that because I wasn't sure whether you'd fix the problem before I completed the switchover. However, the correspondence with you on this forum helped me realize that I can expect only excuses from Crucial and no real help, so I did the switchover despite several messages of the "all seems to be running ok now" variety.
Anyway, now that it's clear that the emperor has no clothes, I'll be moving to a different provider. All is working fine now, but clearly, it's a time bomb. Should this node fail too, every customer of yours on that node would be stuck in another week of downtime. I can't risk that. I find it very amusing that the incident report mentions how you learned that it would be good to have a spare node. After 8 years in business? Amazing. |
|
||||
|
Hi Milan,
You are most definitely entitled to your opinion of the situation and we appreciate you taking the time to note down your thoughts. As the Linux VPS solution is NOT a redundant solution, there is no way we can 100% guarantee your Service will remain up during a hardware fault. It is just not possible with the solution available. We do claim reliability, as we use reliable and standardized hardware which has a proven track record. Considering this is the first major incident we have had in a long time, our reliability and uptime have been excellent with the hardware / solution we offer. As I detailed in the Report, we have already taken on board the fact that we had no spare capacity to absorb all VPS Services affected by this outage; in fact, we already have a spare node ready to go at the DC. So this area has been rectified. I believe I have already explained to you in another thread why you were not migrated upon request. We did not hide the fact that we did not have a spare; we indicated quite openly on the forum that we were in the middle of preparing a new node. During outages / incidents, when queue loads are high, our Techs cannot stop and provide detailed reports / reasons for each step of the way; their focus is and always will be addressing the cause of the issue. Reasoning and details need to be saved for the report at the end. As I said in the report, we did everything we could to address the issue; there were some delays due to a spare node not being ready, and this mistake has already been rectified. We are sorry to lose your business if you still feel we have not provided what you expected. However, I believe we were as up front and transparent as we could be, considering the situation and how many Customers were affected.
__________________
Cheers, Ross Crucial Paradigm |
All times are GMT +11. The time now is 12:15 AM.
Powered by vBulletin® Version 3.8.6
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.5.2