Wasn't there a backup?
I thought about this during the evening of Sept. 19 as I sat in a motel room near the Phoenix airport. I was the unintended guest of American Airlines (as presumably were hundreds of others) because of a "ground stop" at my intended destination, Dallas-Fort Worth (DFW).
Turns out 100,000 passengers were stranded and 400 flights canceled that day because a construction crew severed a crucial fiber-optic connection.
The first news report quoted a Federal Aviation Administration (FAA) spokesman saying the failure was "not FAA equipment," which of course is another way of saying "it's not our fault." But the equipment was under the FAA's domain and responsibility. It was as much their fault as the contractor's, whom the FAA apparently couldn't even reach in the first couple of hours. Meanwhile, at DFW, 40 planes and perhaps 5,000 souls were stranded on the hot tarmac with no clearance to depart and no available gate at which to deplane.
Wasn't there a wireless alternative? Apparently not.
Was there a backup cable? Yes. Laid in the same trench. So the backhoe crushed both of them.
You can't make this up!
If something so tragically comical can befall the nation's third-largest airport, there's a lesson here for the rest of us.
We should take some time to think through our own vulnerabilities and plan our backup systems to account for real-world failures.
- A winter ice storm could strike any region of the country and take down the power grid for days. Is the backup generator tested regularly? Is there enough fuel on hand?
- In many locales the internet lives on utility-pole cables 14 feet off the ground; the same storm would sever those, too. Is there a wireless alternative (preferably with a different provider)? What happens if an outage strikes in the same hour that payroll is running 700 checks? Is there a workaround?
- Even the intermittent electrical outages we've learned to endure seem to be lasting longer. How good are the uninterruptible-power-supply (UPS) batteries? Will they give out at 90 minutes, or will they last through the day?
- What happens when the hosting service (AWS, Azure, Google Cloud) goes dark? Can you fire up that old server in the back room to take over for a few hours?
- And what about a virus or ransomware attack that takes your system offline? How current is your backup, and how quickly can you bring a replacement system online? Don't just take the IT staff's word for it; test it (see the sketch after this list).
- Speaking of ransomware, it usually starts with stolen credentials or a user who clicks on something they shouldn't. Is there an enforced password-change protocol and an ongoing audit of user login accounts?
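
Even a small script can keep that "test it" promise honest. Here is a minimal sketch, not anyone's real setup: the backup directory, the nightly payroll-*.db naming, and the employees-table check are all assumptions you would swap for your own. The idea is simply to restore the newest backup somewhere harmless and prove it still contains data.

```python
#!/usr/bin/env python3
"""Minimal backup-restore drill (a sketch, not a turnkey tool).

Assumptions: nightly backups land in BACKUP_DIR as SQLite files named
payroll-YYYY-MM-DD.db, and a healthy copy contains an 'employees' table
with at least one row. Adjust paths, naming, and checks to your own setup.
"""

import shutil
import sqlite3
import tempfile
import time
from pathlib import Path

BACKUP_DIR = Path("/mnt/backups/payroll")   # hypothetical backup location
MAX_AGE_DAYS = 2                            # how stale is too stale?


def latest_backup() -> Path:
    """Return the most recently modified backup file, or raise if none exist."""
    candidates = sorted(BACKUP_DIR.glob("payroll-*.db"),
                        key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise RuntimeError(f"No backups found in {BACKUP_DIR}")
    return candidates[-1]


def restore_and_check(backup: Path) -> None:
    """Copy the backup to scratch space, open it, and run a basic sanity query."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / backup.name
        shutil.copy2(backup, restored)       # the "restore" step in this sketch
        con = sqlite3.connect(restored)
        try:
            (rows,) = con.execute("SELECT COUNT(*) FROM employees").fetchone()
        finally:
            con.close()
        if rows == 0:
            raise RuntimeError(f"{backup.name} restored but has no employee rows")
        print(f"OK: {backup.name} restored; employees table has {rows} rows")


if __name__ == "__main__":
    newest = latest_backup()
    age_days = (time.time() - newest.stat().st_mtime) / 86400
    if age_days > MAX_AGE_DAYS:
        raise SystemExit(f"Newest backup {newest.name} is {age_days:.1f} days old")
    restore_and_check(newest)
```

Run something like this on a schedule and make sure a failure actually alerts somebody. A backup you've never restored is a hope, not a plan.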
A backup system doesn't need to be a complete replacement, merely one that gets you and your customers through the outage. It may require extra hardware and software, but that expense is cheap compared with the cost of being unprepared when disaster strikes.
It pays to be prepared.
Getting back to that Sept. 19 DFW FAA fiasco, two positives did emerge: 1) most assuredly, every FAA facility in the country immediately checked its own systems to find, and hopefully fix, similar vulnerabilities; and 2) I was so impressed by the professionalism the American Airlines staff demonstrated on that most difficult day that I'm now a regular AA flier.
