Your Playbook to a better Incident Response Plan

In 2023, 1271 incidents were reported to European Authorities via EIDAS, NISD, and EECC, a 20% increase compared to the previous year. With more regulations entering into force in the next few years (such as NIS2 and CRA), more organisations will fall within the scope of mandatory reporting, which will most likely push this number even higher. Maybe you will be among them and are wondering how to prepare yourself.

In the previous entry in this series, we introduced the 3 pillars for building resilience: Respond, Sustain, and Recover. In this blogpost, we will explore the first pillar: Respond, the dimension that is crucial for mitigating the impact of an incident, particularly in the case of a ransomware attack.

Why exactly should we prepare for a potential incident? Why not wait until after our first incident, once we have more experience to learn from and can therefore spend less time on theory and research? While there is merit in learning and adapting your response based on previous incidents, proactive preparation can significantly mitigate the impact of such an incident in the first place. The benefits of preparing include:

Minimising the downtime of your core business by giving your Incident Response Team appropriate tools and measures to tackle the incident as soon as possible,
Defining clear communication guidelines and plans for both internal and external stakeholders to help safeguard your company’s reputation,
Ensuring regulatory compliance with an ever-evolving set of acts, regulations, and directives requiring very specific reporting at very specific intervals,
Improving response times and reducing the likelihood of an incident by finding areas where automation or improvements can be made.

How and where can you start your incident preparedness journey? In this blogpost, we will cover some of the most important aspects you can work on before facing an incident.

Build your foundation and scope

As described earlier in this series of blogposts, multiple teams operate during a crisis or incident, each with very different but important objectives. Depending on the incident and its impact on the Business, one, multiple, or all teams may be involved. Each company has a different structure, different governance, and different standards and regulations to comply with. This inevitably has an impact on how incidents are handled. Therefore, you need to define the scope of your (Cybersecurity) Incident Response Process:

Do you have an overall incident management team handling all types of incidents (e.g., your Incident Management Team handles Cybersecurity incidents as well as Physical, IT, Business, and other types of incidents)?
Do you handle cyber incidents separately, or as part of the Major Incident Management Process (e.g., your Major Incident Management Team oversees teams handling all incident types, including your Cybersecurity Incident Management Team)?
Do you have external stakeholders playing a role in your incident response (e.g., a retainer with an external Incident Response provider)?
Do you provide your incident response capability to other teams internally or externally (e.g., a subsidiary or remote branch)?

All of these points will drastically change the scope of your incident management process, and defining it correctly is the first step to ensure you do not interfere with another team’s process.

With this scope in mind, you can start determining and drafting the documents that you need. Among these potential documents, we recommend two in particular: the Incident Response Plan, and Incident Response Playbooks.

The Incident Response Plan will be your main governance document, defining your process and the relevant information surrounding it: incident taxonomy and classification, people involved, and steps of the overall process, to name only a few.
Your Incident Response Playbook(s) will go further than your Plan, and will highlight the steps to be taken for particular incidents. Some of these steps will be similar across incident types, but others will differ drastically (managing a Denial of Service incident requires different steps than managing a Data Exfiltration incident). These Playbooks will be particularly helpful for defining a common response process for your different Incident Handlers, but also for training and briefing new members of your team.

Determine how to trigger your Incident Response process

Now that you have defined the scope of your Incident Response Process, we need to figure out how to trigger it. This also raises the question of defining “What is an Incident?”. This seems easy, but has a lot of components to it:

What is the threshold for an event to become an incident?
When does the incident become part of something bigger, such as a crisis?
Who triggers the Incident Response process?
How can someone report a potential incident?
Where, how, and to whom do we communicate during the Incident handling process?

You may also want to have different types of Incident severity or priority based on specific criteria, such as the type of Incident (this is where an Incident Taxonomy can be useful, such as the ENISA RSIT), the scale of the attack, the criticality of the systems impacted, or even the type of user impacted. By defining these criteria, you can start forming a table or matrix defining types of incidents:

Incident Severity	Potential criteria	Examples
Low	– Only one user or device impacted – Very low impact to Business – Pertains to an incident type typically considered as a low impact for your company (e.g. phishing) – Incident can be handled by 1 or 2 incident handlers	– A malicious executable was detected and blocked, but some analysis should be done – A user provided company credentials through a phishing email, and their account needs to be reset
Medium	– Impacts multiple devices, potentially servers – Low to medium impact to Business – Pertains to an incident type typically considered as a medium impact for your company (e.g. malware, DDoS) – Might require notification to relevant authorities depending on applicable regulations (e.g. NIS2, DORA) – Incident requires collaboration between a few teams to resolve the incident, prompting the involvement of the Incident Response Team	– Compromise on a single server leads to a downtime of low-criticality applications and requires restoration from backups – A Denial of Service Attack affects the availability of our public-facing services for a few hours
High	– Impacts multiple departments, crown jewel assets (e.g., Active Directory, backup servers), or even the company as a whole – Significant impact to Business – Pertains to an incident type typically considered as a high impact for your company (e.g. ransomware) – Must notify relevant authorities depending on applicable regulations (e.g. NIS2, DORA) – Incident requires strong collaboration with multiple teams, and prompts the escalation to the Crisis Management Team	– Large scale ransomware outbreak – Large data breach impacting sensitive data about customers and/or intellectual property – Running out of coffee beans in the morning?
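
To make the matrix easier to apply consistently, you could translate some of its criteria into a small decision function. The sketch below is only an illustration in Python: the field names, the thresholds, and the order of the checks are our assumptions, and you would replace them with the criteria from your own Incident Response Plan.

```python
from dataclasses import dataclass


@dataclass
class IncidentFacts:
    """Hypothetical summary of what is known about an incident so far."""
    devices_impacted: int
    servers_impacted: bool
    crown_jewels_impacted: bool  # e.g. Active Directory, backup servers
    business_impact: str         # "very low", "low", "medium" or "significant"


def classify_severity(facts: IncidentFacts) -> str:
    """Map incident facts to Low / Medium / High, checking the worst case first."""
    if facts.crown_jewels_impacted or facts.business_impact == "significant":
        return "High"
    if (
        facts.devices_impacted > 1
        or facts.servers_impacted
        or facts.business_impact in ("low", "medium")
    ):
        return "Medium"
    return "Low"


# A single user who fell for a phishing email, with very low business impact
phishing = IncidentFacts(1, False, False, "very low")
print(classify_severity(phishing))
```

Checking the highest severity first means that a single strong criterion (such as a crown jewel asset being impacted) is enough to escalate, which matches the way the matrix reads.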

Define and assign your roles & responsibilities

As part of your Incident Response efforts for large incidents, multiple people will need to play a role, whether it’s coordinating teams and communications, analysing the systems, implementing measures, or making decisions. Roles you may want to define include:

Your Incident Response Team Leader, steering the discussion and the incident handling process. They are typically the owner of the process and are the point of contact with the Crisis Management Team for critical incidents.
Your Event Logger, helping keep track of the discussion by taking notes and keeping an eye on the communication platforms and the Plans & Playbooks available. They typically summarise each Incident Response meeting and make sure the actions discussed are assigned to owners. While in large and mature organisations this role might be filled by a specific person, most organisations will give this role to one of the Incident Handlers, potentially your Incident Response Team Leader or Incident Manager.
Your Senior Incident Handler(s), following the Incident Response Process you defined, suggesting measures to respond to the situation, and helping to drive the incident to its resolution.
The initial Incident Responder, typically responsible for the initial analysis of the security incident, and who triggered the Incident Response Team meeting if applicable.
Depending on the specificities of the incident, you may want to include any representatives of other teams who have actions to take in this incident. This typically involves more and more people as the company structure scales up and silos start forming. These can include: your infrastructure team, your identity team, your networking team, your security compliance team, and others.

For each of these roles, ensure you define at least two stakeholders who can assume the responsibilities linked to the role. This will ensure that, if the primary role owner is unavailable, someone can fill in for them and fulfill their duties.
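
If you keep your role assignments in a structured form, this "at least two people per role" rule is easy to verify automatically. Here is a minimal Python sketch, assuming a simple mapping of role names to people; the names and the data structure are made up for the example.

```python
def roles_missing_backup(assignments: dict[str, list[str]]) -> list[str]:
    """Return the roles that have fewer than two distinct people assigned."""
    return sorted(
        role for role, people in assignments.items() if len(set(people)) < 2
    )


# Hypothetical assignments: primary owner first, then backups
assignments = {
    "Incident Response Team Leader": ["alice", "bob"],
    "Event Logger": ["carol"],
    "Senior Incident Handler": ["dave", "erin"],
}

print(roles_missing_backup(assignments))  # prints ['Event Logger']
```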

Give a mandate to your Incident Response Team

Dealing with an incident often requires quick actions to contain the incident as fast as possible and avoid further impact to the Business. However, some of these actions may have unintended consequences on the Business and add even more unnecessary impact to the situation (e.g. cutting the internet uplink to stop a data exfiltration, which also halts the Business and affects income-generating processes). In such cases, it may be better to verify with the responsible teams or owners to ensure the actions can be taken safely. At the same time, we do not want to bombard top management with approval requests every time we need to erase a phishing email from mailboxes or reset the password of a compromised account…

This is where having a clearly defined mandate for your Incident Response Team is important: what actions can we take without needing any prior approval, and under which circumstances? This can be a very simple “do whatever it takes to protect the business”, or a very specific and highly defined list of actions and criteria. Important actions to consider, in particular for ransomware, include: can we isolate a workstation or server? Can we isolate a network segment or the entire network? Can we disable a user account? Can we pre-emptively isolate backup systems?
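
A highly defined mandate can be captured as a simple lookup of pre-approved actions per incident severity. The Python sketch below is one possible shape for it: the action names and the severities at which each action is pre-approved are purely illustrative assumptions, and anything not listed is treated as requiring approval.

```python
# Hypothetical mandate: for each action, the severities at which the
# Incident Response Team may act without prior approval.
MANDATE = {
    "isolate_workstation": {"Low", "Medium", "High"},
    "disable_user_account": {"Low", "Medium", "High"},
    "isolate_server": {"Medium", "High"},
    "isolate_backup_systems": {"High"},
    "isolate_network_segment": {"High"},
}


def needs_approval(action: str, severity: str) -> bool:
    """True when the action is not pre-approved at this severity.

    Unknown actions are never pre-approved, so they always need approval.
    """
    return severity not in MANDATE.get(action, set())


print(needs_approval("isolate_workstation", "Low"))
print(needs_approval("isolate_network_segment", "Medium"))
```

Defaulting to "approval needed" for unlisted actions keeps the mandate safe when a situation comes up that nobody thought of in advance.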

Create summaries, checklists, or flowcharts

If you followed all the previous steps in the blogpost and documented your process thoroughly, your Incident Response Plan is probably becoming a sizeable document, with multiple sections, subsections, graphs, matrices, and appendices. You also created a beautiful and complete Ransomware Playbook with every single measure you can take to respond to such an incident. In fact, it’s so well prepared that you now have 100+ pages in this Playbook, and appendices are dangerously reaching double-letter territory. You couldn’t be better prepared even if you tried… But would you be able to navigate such documents during an actual incident or crisis? Where’s that prioritisation matrix again? Wait, what was the step after the initial containment action again?

These long documents are perfect for properly defining the complete process but can become cumbersome to use when dealing with an actual incident. This is where you can introduce short summaries, checklists, flowcharts, and other types of infographics that condense the information in an easy and digestible way. The goal with these one- or two-pagers is to guide your response and briefly remind you of steps and actions to discuss. Your experience does the rest (and if you hit a hole in your memory, the full document is still there to help you out if needed).

Test your newly created processes

Just as Monopoly prepares you to lose your entire fortune to your relatives and friends, Dungeons & Dragons prepares you to face a horde of goblins and dragons, and Escape Rooms prepare you to free yourself from the basement of a maniacal scientist, tabletop and simulation exercises prepare you to face potential incidents and crises.

Such an exercise typically lasts a few hours and places you in a fictitious situation, in this case an incident scenario. The goal is to test your reaction when faced with these situations and determine if there are any lessons to be learned from this test. For ransomware in particular, such tabletops allow you to test the entire process, from the initial detection to the recovery, while also covering the analysis of the incident, protection of backups and containment of infected systems, ensuring sustainability of the operations with the Business Continuity Team, recovering from the incident via backups in collaboration with the Disaster Recovery Team, and reporting to authorities and other stakeholders through the Crisis Management Team.

Exercises come in all shapes and sizes, from a casual read-through of the process (also sometimes called tabletops), to simulations or even full interruptions (e.g., with the use of a “Chaos Monkey”) that will disrupt some part of your business whenever it wishes to. A Simulation Exercise is a good middle-ground: it allows you to discuss your processes and puts you in a situation that challenges your response, but it doesn’t impact your Business and production environment directly.

In the end, the best approach might be a combination of all types of exercises, where you can test multiple scenarios over time in different types of exercises. This is where your Exercise Test Plan comes into play. With this document, your goal is to define the types of exercises you want to organise, at which frequency, and for which types of impacts or scenarios. This allows you to plan multiple years in advance and see your progress over time.

Whichever method you choose, make sure you use it as an opportunity to determine areas of improvement for your processes. Review your documents regularly, test them in a fictitious setting via a mix of tests and exercises, and learn from your potential mistakes. This ongoing cycle of review and refinement will help you build stronger resilience over time.

Now what?

In this blogpost, we discussed the importance of defining our Incident Response Process through Plans and Playbooks:

Defining the scope of the process and documentation
Determining how to trigger the Incident Response Process
Assigning roles and responsibilities
Giving a mandate to the relevant stakeholders
Creating summaries and checklists
Testing these processes.

There are of course other actions you can take to further improve your readiness to face incidents, such as creating procedures for specific tasks (e.g. how to recover the Active Directory from scratch), automating complex tasks (e.g., creating a script that will lock every account that you provide in a CSV file), and training your teams on new techniques.
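
As an illustration of the account-locking automation mentioned above, here is a minimal Python sketch. It assumes a CSV file with a "username" column, and it deliberately leaves the actual locking to a function you pass in (called lock_account here, a hypothetical placeholder), since that call depends entirely on your identity platform. It runs in dry-run mode by default so that nothing is locked by accident.

```python
import csv
import io
from typing import Callable, Iterable


def read_usernames(csv_file: Iterable[str], column: str = "username") -> list[str]:
    """Read unique, non-empty usernames from a CSV, keeping their order."""
    seen: set[str] = set()
    usernames: list[str] = []
    for row in csv.DictReader(csv_file):
        name = (row.get(column) or "").strip()
        if name and name not in seen:
            seen.add(name)
            usernames.append(name)
    return usernames


def lock_accounts(
    usernames: Iterable[str],
    lock_account: Callable[[str], None],
    dry_run: bool = True,
) -> dict[str, str]:
    """Lock each account and record the outcome per username."""
    results: dict[str, str] = {}
    for name in usernames:
        if dry_run:
            results[name] = "would lock"
            continue
        try:
            lock_account(name)  # placeholder for your identity platform's call
            results[name] = "locked"
        except Exception as exc:  # keep going so one failure does not stop the rest
            results[name] = f"failed: {exc}"
    return results


# Example with in-memory CSV content and a stand-in for the real locking call
sample = io.StringIO("username\nalice\nbob\nalice\n")
names = read_usernames(sample)
print(lock_accounts(names, lock_account=print))
```

With a real file, you would open it with open("accounts.csv", newline="") and pass the file object to read_usernames. Recording a result per account gives your Event Logger a ready-made trace of what was done during the incident.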

Now that we explored the “Respond” pillar, stay tuned for the next entry in this series of blogposts, where we will explore the “Sustain” pillar.

Damien Gremes

Damien is a Cyber Strategy & Architecture consultant within NVISO, with a main focus on Incident Readiness. Through creating, reviewing, and testing multiple incident response plans, playbooks, and exercises for companies of different industries and sizes, he has gained insights into how to run efficient and effective incident management operations.

