A playbook for modernizing security operations

The security community is continuously changing, growing, and learning from each other to better position the world against cyber threats. In the latest post from our new Voice of the Community blog series, Microsoft Product Marketing Manager Natalia Godyla talks with Dave Kennedy, Founder and Chief Technology Officer at Binary Defense. Dave shares his insights on security operations—what these teams need to work effectively, best practices for maturing the security operations center (SOC), as well as the biggest security challenges in the years to come.

Natalia: What are the standard tools, roles, frameworks, and services for a security operations team? What are the basic elements a SecOps team needs to succeed?

Dave: Your security operations team must have visibility into your infrastructure, both on and off-premises. Visibility is key because many of these attacks start with one compromised asset or one compromised credential. They spread across the network and in many cases, they wreak a lot of damage. Your endpoints, network infrastructure, and cloud environments are where a lot of these issues happen. I recommend starting with high-risk areas like your endpoints.

Then, you need somewhere to ingest that data, such as security information and event management systems like Microsoft Azure Sentinel, and to go through log analysis and determine if anything has been compromised.

Also, frameworks like the MITRE ATT&CK framework are a great baseline of saying, well, here are specific attacks that we’ve seen in the wild that are mapped to specific adversaries that are in our industry vertical. That can help you prioritize those, get better at detection, and make sure you have the right logs coming into your environment to build detections.

Natalia: How can a team operationalize the MITRE ATT&CK framework?

Dave: When people first look at the MITRE ATT&CK framework, they freak out because it’s so big, but it’s a treasure trove of information. Everybody was focused on a castle mentality of being able to protect everything but what happens when an attacker is in your environment? Protection is still very important and you want to have protective mechanisms in place, but protection takes time and requires cultural changes in many cases. If you’re doing something like multifactor authentication, you have to communicate that to users.

The MITRE ATT&CK framework tells you what happens when attackers have gotten around your preventive controls. What happens when they execute code onto a system and take other actions that allow them to either extract additional information or move to different systems through lateral movement or post-exploitation scenarios and get access to the data? The MITRE ATT&CK framework is a way to conceptualize exactly what’s happening from an attacker’s standpoint and to build detections around those attack patterns.

With the damage we see, it’s usually several hours, days, or months that an attacker has had access to an environment. If we can shave that time down and detect them in the first few minutes or the first few hours of an attack and shut them down, we’ve saved our company a substantial amount of damage. It’s a framework to help you understand what’s happening in your environment and when unusual activities are occurring so you can respond much more effectively.

Natalia: How much of the MITRE ATT&CK framework should a security team build into their detections? How much should they rely on existing tools to map the framework?

Dave: Many tools today have already done a lot of mapping to things like the MITRE ATT&CK framework, but it’s not comprehensive. If you have an endpoint detection and response product, it may cover only 20 percent of the MITRE ATT&CK framework. Mapping your existing tools and technology to the MITRE ATT&CK framework is a very common practice. For instance, you may have an email gateway that uses sandboxing virtualization techniques that detonate potential malware to see whether it’s effective. That’s one component of your technology stack that can help cover certain components of the MITRE ATT&CK framework. You might have web content filtering that covers a different component of the framework, and then you have endpoint detection and responses (EDRs) that cover a percentage of the endpoint detection pieces.

Technology products can help you shave away the amount of effort that goes into the MITRE ATT&CK framework. It’s really important, though, that organizations map those out to understand where they have gaps and weaknesses. Maybe they need additional technology for better visibility into their environment. I’m a huge fan of the Windows systems service, System Monitor (Sysmon). If you talk to any incident responder, they’ll tell you that if they have access to Sysmon data logs, that’s a treasure trove of information from a threat hunting and incident response perspective.

It’s also important to look at it from an adversary perspective. Not every single adversary in the world wants to target your organization or business. If you’re in manufacturing, for instance, you’re not going to be a target of all adversaries. Look at what the adversaries do and what type of industry vertical they’re targeting so you don’t have to do everything in the MITRE ATT&CK framework. You can whittle the framework down to what’s important for you and build your detections based on which adversaries are most likely to target your organization.

Natalia: If a team has all the basics down and wants to mature their SecOps practices, what do you suggest?

Dave: Most security operations centers are very reactive. Mature organizations are moving toward more proactive hunting or threat hunting. A good example is if you’re sending all of your logs through Azure Sentinel, you can do things like Kusto Query Language and queries in analysis and data sets to look for unusual activity. These organizations go through command line arguments, service creations, parent-child process relationships, or Markov chaining, where you can look at unusual deviations of parent-child process relationships or unusual network activity.

It’s a continual progression starting off with the basics and becoming more advanced over time as you run through new emulation criteria or simulation criteria through either red teaming or automation tools. They can help you get good baselines of your environment and look for unusual traffic that may indicate a potential compromise. Adversary emulations are where you’re imitating a specific adversary attacker through known techniques discovered through data breaches. For example, we look at what happened with the SolarWinds supply chain attack—and kudos to Microsoft for all the research out there—and we say, here are the techniques these specific actors were using, and let’s build detections off of those so they can’t use them again.

More mature organizations already have that in place, and they’re moving toward what we call adversary simulation, where you take a look at an organization’s threat models and you build your attacks and techniques off of how those adversaries would operate. You don’t do it by using the same type of techniques that have previously been discovered. You’re trying to simulate what an attacker would do in an environment and can a blue team identify those.

Natalia: What are best practices for threat hunting?

Dave: Threat hunting varies based on timing and resources. It doesn’t mean you have to have dedicated resources. Threat hunting can be an exercise you conduct once a week, once a month, or once a quarter. It involves going through your data and looking for unusual activity. Look at all service creations. Look at all your command line arguments that are being passed. A large percentage of the MITRE ATT&CK framework can be covered just by parent-child process relationships and command line auditing in the environment. Look at East to West traffic, not just North to South. Look at all your audit logs. Go through Domain Name System (DNS traffic).

For instance, a user was using Outlook and then clicked on an email that opened an Excel document that triggered a macro that then called PowerShell or CMD EXE. That’s an unusual activity that you wouldn’t expect to see from a normal user so let’s hone in on that and figure out what occurred.

You can also conduct more purple teaming engagements, where you have a red team launch attacks and detection teams look through the logs at the same time to build better detections or see where you might have gaps in visibility. Companies that have threat hunting teams make it very difficult for red teamers to get around the different landmines that they’ve laid across the network.

Natalia: What should an incident response workflow look like?

Dave: An alert or unusual activity during a threat hunting exercise is usually raised to somebody to do an analysis. A SOC analyst typically has between 30 seconds and four minutes per alarm to determine whether the alarm is a false positive or something they need to analyze. Obviously, what stands out are things like obfuscation techniques, such as where you have PowerShell with a bunch of code that looks very unusual and obfuscation to try to evade endpoint protection products. Some of the more confusing ones are things like living off the land, which are attacks that leverage legitimate applications that are code signed by the operating system to download files and execute in the future.

A research phase kicks off to see what’s actually going on. If it’s determined that there is malicious activity, usually that’s when incident response kicks in. How bad is it? Have they moved to other systems? Let’s get this machine off the network and figure out everything that’s happening. Let’s do memory analysis. Let’s figure out who the actual attacker was. Can we combine this with red intelligence and determine the specific adversary? What are their capabilities? You start to build the timeline to ensure that you have all the right data and to determine if it’s a major breach or self-contained to one individual system.

We ran several incident response scenarios for customers that were impacted by the supply chain attacks on SolarWinds and the biggest challenge for the customers was their logs didn’t go back that far so it was very difficult for them to say definitively with evidence, that they know what happened.

Natalia: What does an incident responder need to succeed?

Dave: I’d strongly recommend doing an incident response readiness assessment for your organization. I also recommend centralized logging—whether that’s a security information and event management (SIEM) or a data analytics tool or a data lake—that you can comb through. I’m a huge advocate of Sysmon. You can do power execution, command line auditing, DNS traffic, process injection, and parent-child process relationships. I’d also suggest network logs. If you can do full packet captures, which not a lot of organizations can do, that’s also great. If you can pull data packets coming from a secure sockets layer (SSL) or transport layer security (TLS) and do remote memory acquisition, that’s also really important. Can we retrieve artifacts from systems in a very consistent way?

Tabletop exercises can also get executives and IT on the same page about how to handle incidents and work together. Running through very specific types of scenarios can help you figure out where you have gaps or weaknesses. When I was the Chief Security Officer at Diebold, we would run through three to four tabletop exercises a year and include our senior leadership, like our CEO and CFO, twice a year. It was eye-opening for them because they never really understood what goes into incident response and what can happen from a cyber perspective. We’d run through actual simulations and scenarios of very specific attacks and see how they would respond. Those types of scenarios really help build your team’s understanding and determine where you may need better communication, better tooling, or better ways to respond.

Natalia: What other strategies can security operators implement to try to avoid attacks?

Dave: When you look at layered defense, always improving protection is key. You don’t want to just focus on detection because you’re going to be in firefighting mode all the time. The basics really are a big deal: things like multifactor authentication, patch management, and security architecture.

Reducing the attack surface is important, such as with application control and allowed application lists. Application control is probably one of the most effective ways of shutting down most attacks out there today because you have a good baseline of your organization. That applies very consistently to things like the Zero Trust model. Become more of a service provider for your organization versus providing everything for your organization. Reducing your attack surface will eliminate the noise that incident responders or SOC analysts must deal with and allow them to focus on a lot of the high-fidelity type things that we want to see.

One of the things that I see continuously going into a lot of organizations is that they’re just always in firefighting mode, 90 percent of their alarms are false positives, and they’re in alarm fatigue. Their security operations center isn’t improving on detections. You really need somebody on the strategy side to come in and say: Can we lock our users down in a way that doesn’t hinder the business, but also lowers the attack surface?

Natalia: How does vulnerability assessment strategy fit into a SOC strategy?

Dave: Program vulnerabilities and exposures are key opportunities that attackers will use. When we look at historic data breaches, those that use direct exploitation and not phishing were using common vulnerabilities and exposures (CVE) typically of six months or older that allowed them access to a specific system. That makes it really important to reduce attack surfaces and understand where vulnerabilities are so we can make it a lot more difficult for attackers to get in.

It’s not a zero-day attack that’s hitting companies today. It’s out-of-date systems. It’s not patching appropriately. A lot of companies will do well on the operating system side. They’ll patch their Windows machines, their Linux machines, and Apple. But they fail really hard with the third-party applications and especially the web application tier of the house—middleware, microservices. In almost every case, it comes down to ownership of the application. A lot of times, IT will own the operating system platforms and the infrastructure that it’s on, but business owners typically sponsor those applications and so ownership becomes a very murky area. Is it the business owners that own the updates of the applications or does IT? Make sure you have clear owners in charge of making sure patches go out regularly.

If you’re not going through regular vulnerability assessments and looking for the vulnerabilities in your environment, you’re very predisposed to a data breach that attackers would leverage based on missing patches or missing specific security fixes. The first few stages of an attack are the most critical because that’s where most organizations have built their defenses. In the latter phases of post-exploitation, especially as you get to the exfiltration components, most organizations don’t have good detection capabilities. It’s really important to have those detection mechanisms in place ahead of time and ensure those systems are patched.

Natalia: We often discuss the challenges facing security today. Let’s take a different approach. What gives you hope?

Dave: What gives me hope is the shift in security. Ten years ago, we would go into organizations from a penetration testing perspective and just destroy these companies. And then the next year, we’d go in and we’d destroy these companies again. Their focus was always on the technical vulnerabilities and not on what happens after attackers are in your castle. The industry has really shifted toward the mindset of we have to get better at looking for deviations of patterns of behavior to be able to respond much more effectively. The industry is definitely tracking in the right direction, and that really gives me hope.

Learn how Microsoft Security solutions can help modernize Security Operations.

To learn more about Microsoft Security solutions visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.