Who, Me? “Expect the unexpected” is a cliché that often comes up during disaster planning. But how far should those plans go? Welcome to an episode of Who, Me? in which a reader discovered a whole new failure mode.
Today’s story comes from “Brian” (not his real name) and is set during a time when California, USA, was facing rolling blackouts.
Our reader was working for a struggling hardware supplier in the state, a once-mighty force reduced to just 1,400 employees thanks to that old favorite of HR slashers: “restructuring.”
Brian worked as a Unix/Linux system administrator in the data center, while the only remaining facilities engineer was based at another site, not far away up the highway.
“We were warned that California was going to start ‘rolling blackouts,’” he told us, “but we were assured that the diesel generators were fully fueled and could provide power for a day or two, and that we had battery backup for the building’s data center section for at least 30 minutes (just in case).”
What could go wrong?
On the day of the first outage, the lights in Brian’s cubicle farm went out and the desktops died as expected. While the big generators got ready to start, staff moved over to the machines running on the UPS.
Sure enough, the diesels kicked in. However, the power did not flow. The building stayed dark. Brian ran around the back of the data center to double-check and, yes, the generators were definitely running. But for some reason the lights stayed off.
He was unable to get into the generator enclosure for further troubleshooting because it was, wisely, securely locked. And the key? With the facilities engineer. Who was at the other site.
It was 4:30 p.m., and anyone familiar with traffic on I-280 between 4 p.m. and 7 p.m. knows that the chances of the engineer making it through that crawl within 30 minutes were virtually nil.
“So our facilities person was rushing toward the data center at a walking pace,” Brian said.
To make matters worse, the air conditioning in the data center ran on mains power, not the UPS. After all, it should only have been a brief blip before the generators took over. Since that wasn’t the case, things were heating up.
The team started desperately shutting things down: development kits, test hardware, even redundant production systems. Anything that was drawing precious juice from the UPS, giving off heat, and wasn’t strictly necessary couldn’t escape the flick of the power button.
“By the time the facilities person arrived at the data center, about an hour later, the UPS had been drained,” Brian recalled, “even though we had cut the servers and networking equipment to a minimum and had all the doors open in a vain attempt to keep things cool.”
But what had happened? The generators were running, yet the changeover never occurred. With the aid of the enclosure key, the facilities engineer investigated and then debriefed the team.
“It turned out that everything had gone according to plan except for the switch in the generator enclosure that was supposed to switch the building over to generator power.
“It had become a favorite perch for the local birds, and it was so covered in droppings that it wouldn’t actually switch over.”
At least, that’s how the engineer explained it.
“Ah, there’s always something unexpected, eh?” Brian said.
We’ve never seen a “switch full of poo” in any of our disaster recovery plans, but maybe we should. Or perhaps the facilities engineer used the antics of his feathered friends to cover up shenanigans of his own. Let us know what you think in the comments, and email your own tech failures to Who, Me? ®