I don’t know why it always happens around Easter week. Perhaps it’s a 12-month duty cycle, or perhaps it’s spiritual warfare leading up to the week, but for the last 3 years we’ve had something significant break on Palm Sunday or the weekend before. This year, I thought we had escaped. Until Palm Sunday morning.
I was mixing when my lighting guy caught my attention and said, “Uh, Mike, the Hog keeps freezing.” I told him to restart the computer. He replied that he had already done that twice. Uh oh. I moved over there to take a look. He reported that the console would work for about 5 minutes, then freeze up. It became completely unresponsive and we couldn’t do anything. By now, it was about 8:30 and we had service in 30 minutes. Ugh.
Over the next few weeks, I’m going to write a series of posts that will detail some of our backup processes. Before I do that, I will tell you that we’re still in progress and our system is far from perfect. But we spend a lot of time on it and are moving toward a place where I feel like we can handle about anything. I will also tell you that our system was not working on Palm Sunday this year. That’s something else we learned. But before we get into all of that, I want to share something else I’ve learned over the years, and it’s probably more important than any of this.
The way in which you respond to a system failure is more important than getting the system back online. That is to say, if you start freaking out, yelling and running around like a madman, while you might get the system back online, you will have a lot more damage to clean up afterwards. If you stay calm, cool and focused on the big picture, not only will you likely recover from the disaster, you will have also gained a lot of respect from your team and leadership.
Now I’ll be the first to admit that I don’t always do this well. Sometimes I can be completely cool and collected, working through a problem with no real issue. Other times, I tend to grumble under my breath, complaining about this “crappy technology that never works…” I’ve been doing this long enough that I rarely go into full-on meltdown mode, but I can become a grumbler while I’m fixing something. It’s something I know about and am working on. Here are a few things that I think are key to recovering from a system failure.
Stay solutions focused
The moments after the lighting console craps out (again) are not when we should be complaining that they should not have cut the new one out of the budget (again), researching new options or blaming the previous guy for buying that one. What we need to do is fix the problem; in that case, get the lights back on.
We first thought we had a corrupt show file, so we tried opening a backup. That’s when we discovered our backup system hadn’t been working. We eventually found another backup show file, but the problem persisted. With time and options running out, we managed to program 3 quick looks (walk in/out, music, teaching) into the Paradigm controller before the console froze again. That got us through the weekend (albeit with the simplest lighting anyone’s ever seen there…), and we had time to really assess the situation.
Put the team first
That weekend, my lighting operator was a high school student who does great work, but has limited experience with the console. So rather than throw him into the middle of troubleshooting, I worked with him to try to come up with a solution. While the situation was a bit tense, I tried to make light of it and remind him that it was not the end of the world if we only had 3 lighting looks that weekend.
We joked that I would give him the cue when it was time to change from “music” to “teaching” so he wouldn’t miss it. Instead of getting upset, we just moved forward and tried to make it fun.
Find something that works and build from there
We discovered that we could get the console up for a few minutes before it froze up. That gave us enough time to program those three scenes into the Paradigm (ETC’s architectural control system), which got us through the weekend. As we had less than 30 minutes to figure it out before the service started, I didn’t have time to go into full-blown troubleshooting mode on the console. We just needed something to work.
A year or so ago, I had a power supply go bad in my stage rack, and somehow it took an input card with it. What was happening didn’t make any sense; I had simply lost 8 input channels. I spent a few minutes figuring out what had happened, then re-patched those inputs to a card that seemed to be OK and updated the patch on the console. We made it through the weekend, which gave me time to figure out what was going on in the relative calm of a Sunday afternoon.
I think it’s important not to be surprised by calamitous events. If we’re not running systems that have grown to be complex and highly interconnected, we’re running systems that are old and frail. It’s no surprise that things fail; all equipment, no matter how good or expensive, has a finite lifespan. We simply need to prepare for it and remain calm when it all goes wrong. In upcoming posts, I’ll detail some of our process for surviving system failures.