
If all IT systems fail

Warning: this post takes 20-30 minutes to read…

On June 27th, 2017, I received a phone call around 15:00 CET telling me to immediately shut down my computer, as a virus was spreading through TNT’s IT network. I simply finished the workshop I was delivering, as I had no dependency on my computer. I hate using PowerPoint in workshops anyway.

I was facilitating a change and crisis management workshop in a hotel for TNT/FedEx change managers, internally known as the “Integrators”, who would soon be working in the field to support local management teams with the integration of TNT and FedEx operations in Europe. I did not realize at that moment how much change and crisis would be part of my own life only a few hours later.

Crisis
That evening I received another phone call, from TNT’s European Road Network. They requested my immediate help with managing a quickly growing crisis. They had just received a phone call from the local police, demanding that the entry gate of the road hub be opened. Trucks were lining up on both sides of the highway, and another 100 trucks were forming a chain from the entry gate to the highway. By now trucks were parking on the highway itself, a very unsafe situation. Of course the gate was opened to avoid any further danger and all trucks were allowed onto the yard, resulting in total operational chaos. Having worked there as European Road Network Director for 12 years, I of course did not hesitate a second.

Scale
TNT’s European Road Network consists of 13 large road hubs and 7 medium-size hubs. Together they form the backbone of the largest time- & day-definite road network in Europe, connecting 48 countries and approx. 600 depots, and moving some 40,000,000 kg that week. There is also TNT’s European air hub in Liège, Belgium, where we operated just over 50 aircraft. This hub connects the same 600 depots via the 70 airport facilities TNT operates from. TNT has in total some 1,250 locations worldwide, but 80% of its global volume is in Europe. Every day some 40,000 vans and small trucks operate from those 1,250 locations, moving the road and air shipments to more than 1,000,000 customers. This huge network fully depends on technology, data, systems, etc. Now nothing worked: every scanner, laptop and sort machine, anything that needed a computer or data, became useless in less than 2 hours. No e-mail, no intranet; we were facing a total system blackout. We were blind and facing the biggest challenge (we called it a “catachallenge”) ever. But we are TNT.

Millennium Bug
As the virus spread globally, we were facing a challenge we had prepared for 18 years earlier. In 1999 we readied ourselves for the possible worst-case scenario on 1-1-2000: the impact of the millennium bug. I was the millennium project director for all of TNT’s networks in those days. I checked our archives for the contingency plans developed in 1999, but soon learned they had been removed in 2015. We had only kept the digital version…

NotPetya
Security experts believe the attack originated from an update of a Ukrainian tax accounting package called MeDoc. Experts agreed that the malware was masquerading as the Petya ransomware, while it was actually designed to cause maximum damage, with Ukraine being the main target. The virus was therefore named “NotPetya”. TNT, the market leader in Ukraine and a user of the MeDoc application, was infected, and the virus was able to spread throughout TNT’s IT network. All systems worldwide failed, central systems as well as local systems.

First 24 hours
Despite having no access to the digital version of our contingency plans, and all of TNT’s facilities having parcels stacked up to the ceilings, we decided to stay open for our existing customers in the first 24 hours of the crisis. This was decided through a cascading approach, consulting all operations managers at all levels, worldwide, in just 2 hours. It revealed great confidence that the organisation could handle this crisis with the outlook of an acceptable service level for our customers, considering the circumstances.

First 3 days
The first 3 days were critical; they would prove whether our confidence was justified or was creating an even bigger disaster. We designed new processes (old hands like myself still knew the processes of the ‘old’ days, which happened to be really useful). We bought new stand-alone laptops, walkie-talkies, phones and other small communication equipment, signage material, and much more for physically organising a fully manual operational process. The one big thing that also helped us survive was our routing labels. Every piece moving in our network carries a label, and people could use the sequence of sort codes (location codes) printed on it. The codes showed most of the routing to the piece’s destination. On top of this, 90% of our shipments were already labeled by our customers.
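The post doesn’t show an actual label, but the principle is easy to sketch. A minimal illustration, assuming a hypothetical label carrying hub codes in routing order (the codes and function below are invented for explanation, not TNT’s real label format):

```python
# Minimal sketch of routing by label, with invented codes: each label
# carries the sequence of sort codes (location codes) towards its destination.

def next_sort_code(label_codes: list[str], current_hub: str) -> str | None:
    """Return the next location to sort a piece to from current_hub,
    using only the code sequence printed on the label."""
    if current_hub not in label_codes:
        return None  # off-route piece: needs a manual routing decision
    position = label_codes.index(current_hub)
    if position == len(label_codes) - 1:
        return None  # current hub is the final sort code: deliver locally
    return label_codes[position + 1]

# Hypothetical label: origin depot -> road hub -> air hub -> destination depot
label = ["AMS", "DUI", "LGG", "MAD"]
print(next_sort_code(label, "DUI"))  # -> "LGG"
```

With nothing but the printed sequence and people who could read it, every piece could keep moving without a single system being online.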
We survived the first 3 days, but it took working around the clock for everybody. We redesigned processes every 2 hours by bringing all team leaders and managers together, holding an open feedback session, focusing on what we had learned, and deciding what we would change in the next 2 hours.
Not all volumes were moved; the road network also became a contingency for the air network, and customs clearance was not possible in the first 3 days. However, the software of our sort machines was adjusted so they were now able to sort in a stand-alone mode, based on sort tables we created (sketched below). Our subcontractors were fabulous; they did whatever was needed, no questions asked. Staff worked in 8-hour shifts, but when we asked for more, they put their hands up, without any exception. New team leaders were born in this moment because they were natural leaders. Old problems, complaints, issues between staff, etc. no longer played any role in working together. We all got into survival mode, more serious than ever before. We all knew this was all or nothing, recovery or bankruptcy. We were so lucky to be part of the FedEx family, and they showed confidence in the TNT way of thinking. It was really amazing to see how this all came together in just 3 days.
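The post gives no detail on those sort tables, but conceptually a stand-alone sort table is just a fixed mapping from the sort code read off a label to a physical outlet of the machine, with a fallback for anything unreadable. A rough sketch, with invented codes and chute numbers:

```python
# Rough sketch of a stand-alone sort table, with invented codes and chutes:
# a fixed mapping from label sort code to a physical machine outlet,
# requiring no connection to any central system.
SORT_TABLE = {
    "DUI": 1,  # chute 1: trailers towards the Duiven road hub (hypothetical)
    "ARN": 2,  # chute 2: Arnhem depot (hypothetical)
    "LGG": 3,  # chute 3: linehaul to the Liège air hub (hypothetical)
}
EXCEPTION_CHUTE = 99  # unknown or unreadable codes go to manual handling

def chute_for(sort_code: str) -> int:
    return SORT_TABLE.get(sort_code, EXCEPTION_CHUTE)

print(chute_for("LGG"))  # -> 3
print(chute_for("???"))  # -> 99, manual handling
```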

Crisis Management
After the first 3 days, you know what the outcome is going to be. It is a very strange sensation, but I clearly felt it, and many colleagues said the same. We all said: “We will survive this one, and get stronger afterwards”.
We also knew this was not going to be a quick fix. Very quickly we understood that the IT systems would not be up and running again within a couple of weeks.
We were also very clear on staffing levels: we needed more hands. Our usual suppliers of temp staff scaled up, and Head Office staff voluntarily started working in our warehouses. We were not only transporting parcels and freight; we now also created schedules for buses, taking staff to the places where extra hands were needed.
Communication is vital from day one. After the initial shock and 3 days of finding our survival mode, communication steadily changed, focusing on new requirements, press, customers, staff, suppliers, even families of staff, but also local government, customs and other formal bodies related to transport by air and road.
We needed a communication rhythm, a fast (and short) cascade process, because sharing progress information or explaining new decisions is critical, and you want to do it right the first time whenever possible.
Systems need to be replaced where possible. WhatsApp became the core communication channel: for teams, departments, facilities, leadership, but also for suppliers. We even built a database connected to phones, where WhatsApp messages were analysed and data extracted to build reports, allowing us to understand where the trucks were in our network (approx. 2,500 at any moment in 24 hours) and how, when and where deliveries were done. There were also reports for our suppliers so they could send invoices and get paid, and we had many customers providing us with information so we could create invoices for them. Cash flow, although not critical in the first weeks because we were now part of FedEx, soon becomes a hot topic when you usually spend 1xx million weekly and your revenues are just a bit higher. Salaries need to be paid monthly as well, especially now! But what if you lost all payroll data? We managed to pay all salaries on time.
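The post doesn’t describe how that WhatsApp database actually worked. As a rough sketch of the idea, assume drivers sent short, semi-structured status messages and a script extracted the fields into records for reporting; the message convention, pattern and names below are all illustrative assumptions:

```python
import re
from collections import Counter

# Hypothetical message convention, e.g. "TRUCK 4711 ARR DUI 14:35";
# the real format and tooling are not described in the post.
MSG_PATTERN = re.compile(
    r"TRUCK (?P<truck>\d+) (?P<event>ARR|DEP|DLV) (?P<location>\w+) (?P<time>\d{2}:\d{2})"
)

def extract_records(messages: list[str]) -> list[dict]:
    """Turn raw WhatsApp texts into structured records; skip free text."""
    records = []
    for msg in messages:
        match = MSG_PATTERN.search(msg)
        if match:
            records.append(match.groupdict())
    return records

messages = [
    "TRUCK 4711 ARR DUI 14:35",
    "TRUCK 0815 DEP ARN 15:10",
    "road closed near exit 12, rerouting",  # free text is simply ignored here
]
records = extract_records(messages)
# A simple network report: trucks currently arrived, counted per location.
print(Counter(r["location"] for r in records if r["event"] == "ARR"))
```

In practice, records like these would feed the truck-location reports and the supplier and customer invoicing described above.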

People are flexible, thus so is the organisation
If we learned one thing, it is how flexible people are, and how responsive they are to new requirements, willing to “just try it” and not afraid to fail. The 180-degree turn in behaviour and mindset we saw between before and after the crisis is really mind-blowing.
Here are some observations:
– Purpose is extremely clear
– Focus and staying focused is natural
– Clear and concise communication is obvious
– Roles & responsibilities become clearer at all levels and between levels
– Alignment between people and activities happens on the job and when needed
– Workload is self-managed by teams based on critical output needed
– Continuous improvement becomes everybody’s responsibility
– Managers become Leaders, and if not, someone else stands up in the team to support and make it work
– People don’t judge, they choose what works (together) and go with this flow
– Teams are suddenly steering themselves, far less need for management involvement
– Learning from experience is accelerated, as well as sharing this widely
– Customer focus doesn’t shift, in fact it becomes stronger
All benefits that, under normal circumstances, would take years to reach the levels we reached in the first weeks of the crisis. And we were able to maintain a lot of them after the crisis.

Conclusions
Pre-crisis. If we had saved the millennium procedure manuals, we would have had the storyboard for managing this crisis. This also includes maintaining the tools needed to sustain minimum operations and communication. A crisis procedure is also critical for every management team in the organisation, with clear roles and responsibilities, as well as agreements on the place/time/intervals of meetings and who sits in which team (e.g. communication, operations, planning, legal, IT, Finance, Customer Service, etc.).
Crisis-detection. Detection is everybody’s responsibility. Being able to act on the first signals can have a significant impact, and taking bold decisions early enough prevents major disasters from escalating afterwards. Detection often also depends on people’s awareness and willingness to recognise that whatever they observed could be turning into a crisis. Having one place people can report to is also very helpful for a quick and accurate response.
Crisis-containment. One thing we learned is that acknowledging what you don’t know (yet) is very important. Furthermore, setting fairly rigid information priorities is key; so is expressing sympathy (surprise), and having (and feeling) the liberty to express concern without opening the company up to liability issues.
Crisis-recovery. Following up on information requests is crucial, as is communicating with all stakeholders. At the same time, informing people about corrective actions and mentioning the financial implications, whilst continually expressing compassion, maintains clear purpose, energy and momentum. The crisis team should consistently track issues, risks, etc.
Post-crisis. Continue the follow-up process on issues. Collect crisis records, stakeholder feedback and media coverage. Conduct interviews with key personnel. Remember to shape memories for internal and external audiences: what did we learn? Assess effectiveness by examining the records. Look back, in hindsight, at the phases of the crisis and determine what to change.

Post-Mortem.
Overall, there is so much to learn from a crisis. We managed not only to survive; we now know how to survive the next one, and also a different one. TNT/FedEx have created contingency plans that are unprecedented, for situations never imagined before. On top of that, we have trained, experienced leadership and staff. Most important, we know what went wrong, and not just the virus impact: we learned how the people in our organisation respond to a crisis, and we know how to do better next time, which hopefully never comes. But we all know Murphy’s law, and he was an optimist, they say!

What’s important in a crisis is to stay in control of communication. These four Cs of communication can help when communicating bad news:

Concerns – focus attention on the needs and concerns of the audience. Don’t make the message focused on you or on damage control. Where appropriate, acknowledge the concerns of the people and deal with them directly.

Clarity – where possible, leave no room for improper assumptions or best guesses. The clearer your message is, the more people will believe you are disclosing everything they need to know. When communication is vague it implies that you are hiding something or only revealing partial truths.

Control – remain in control of what is being said. When you lose control of the message there is no stopping the flow of inaccurate information. Your whole communication plan needs to center on remaining in control.

Confidence – your message and delivery must assure your listeners that the actions you are taking are in everyone’s best interests. It’s one thing to deliver bad news openly, and it’s another to effectively convey that you are doing everything you can to minimize the negative impact. Speak with confidence but don’t lose sight of your humanity: acknowledge that you can’t make everything OK, but make sure people know you’re doing your best.
