Wednesday, August 5, 2009 started out as a normal day at HostGator’s Houston headquarters. Around 4:00 PM CT, a transformer near our office blew up, and the resulting power surge made the day anything but ordinary.
Lights flickered, battery backups beeped, fire alarms went off, and Internet connections died almost immediately. People waited for the building’s $200,000 hurricane-ready generator to start up, but it didn’t.
In the meantime, one of the three major “legs” of power that feed the building was out because of the exploded transformer. The building was underpowered, and the higher-voltage motors and equipment started burning out from the heat and stress of running without adequate power. Expensive equipment continued to get damaged.
A compressor on the air conditioning burnt out (cost: $35,000), air handlers got destroyed (cost: $5,000), an elevator motor got fried (cost: $10,000) and lots of other equipment in the building’s mechanical room still isn’t working correctly (cost: unknown). The total cost of the damages is expected to be upwards of $60,000.
As the building’s systems started to go down and the people in charge of HostGator’s office began calling in electricians, power companies, and repairmen, the rest of the management team began going into what we refer to internally as “hurricane mode.”
- Twitter updates started to go out informing customers of a power problem in the building and possible service delays.
- Employees were rallied and sent to other employees’ homes.
- Our phone number was redirected (our VOIP system is housed in our office) and the message on our phone system was updated to inform customers of the outage.
- Our support site was updated with an emergency notice.
- A forum post was made with additional details.
As the makeshift offices were being set up in our managers’ homes, chats were being taken, servers were being monitored, and updates were being provided. Within an hour of the surge, HostGator’s support operations, with the exception of phone support, were almost fully functional, albeit delayed.
By 11:30 PM, employees were starting to work at the office again. The phones were turned on shortly afterwards and average email response times went back down to 45 minutes or less.
Much of this expensive and inconvenient damage would have been prevented had the building’s generator worked as planned. If it had, the building would have lost power for only a minute or so instead of multiple hours. The cause was that the generator maintenance performed less than a week earlier by an outsourced company was done improperly. The company put the wrong fuel filter on the generator, which caused it to fail immediately on startup.
The outage obviously could have been much worse. No customer servers or accounts were affected in any way (we don’t house any customer servers in our office building) and we were able to get back up and running relatively quickly.
Regardless of the relative severity of the event, though, HostGator did learn a lot.
- Most notably, the event reaffirmed that immediate communication is essential. We first learned about the importance of immediate communication during a datacenter outage at The Planet. In this situation, a Twitter update went out less than 15 minutes after the power surge occurred. Updates continued to be provided across Twitter, the forums, and our support site until the situation was completely resolved. We were even lucky enough to get comments from customers praising us for our handling of the situation.
- We also learned that it’s critical to have systems tested and maintained by companies we know are getting the job done properly. We are obviously looking into a new generator maintenance company and looking at our other vendors to ensure they’re prepared to deal with issues if they occur.
During the entire occurrence, our customers were patient and understanding and we sincerely appreciate that. Stanley Marcus of Neiman Marcus fame is credited with saying “The road to success is paved with well handled mistakes” and we couldn’t agree more.
Things happen (the web hosting business and the act of running a business are never dull), and Wednesday’s events were just one example of the many things that no one could have predicted.