As a security analyst in a growing company, it is often easy to get into the “set it and forget it” mentality. You create one alert after another. Then another. And another. With each alert comes a certain amount of work for an analyst. Analyst time costs money, and some alerts consume more time than others. If most of the alerts result in false positives, a large amount of resources are being spent unnecessarily. This underscores the importance of continuous tuning of alerts instead of iterating through useless noise.
But how can we know what to prioritize in order to free up our analysts’ valuable time? You should have a mechanism and strategy for collecting metrics describing the outcome of your security events. Whether you track alert triage activity in a ticketing system or not, you can start measuring today which events were true positive and actionable, or false positive and a waste of time.
True positive, or positively a waste of time?
What is a true positive anyway? Ideally, you are setting alerts tied to actionable use cases driven by a business objective. For example, your organization likely wants to prevent unauthorized access to sensitive internal resources. You can set alert and enrichment logic in your monitoring instrumentation to notify you when such activity occurs. You might even have a triage playbook that is enacted if the alert fires. An analyst picks up the ticket, looks at the data, and validates that the activity did in fact occur and then proceeds to triage the event according to the playbook.
A false positive is when the analyst determines that the use case the playbook was written for didn’t actually occur. As previously mentioned, you want to tune these as they occur instead of continuing to waste time on them.
This seems like a long-winded way to distinguish between true and false positives, but it is necessary to illustrate how valuable time can be spent.
Where is the valuable time being spent?
How many of your SOC analyst tickets are true positives? 100%? 50%? 10%? Or even less? I’d bet money that 10% or less of your company’s security incident tickets are true positive. That leaves around 90% or so false positives. If you don’t have a false positives rate on hand, I suggest adding a field within your ticketing software to include a drop down for false positives and true positives. Depending on how far you go back in the ticket, it could take some time to start this process. As a starting place, I’d go back at least one month from the current date. You can see in the diagram below that the first iteration of your new metric can be simple and high level.
Even with this high level view you can see there might be some alerts that need to be tuned. But do you know which ones?
Now, we get to the fun part, breaking false positives down by alert. Getting to this point is how you can make an impact in your organization.
Once you have it broken down, it might look something like the table above, with one alert firing 150 times. Assuming this alert takes about 30 minutes to properly triage, that’s 75 hours an analyst was unable to better use to help the organization.
Now what do I do with all these metrics?
Now that we have some actionable metrics, what do we do with this data? We pick out what is our biggest contributor to the false positives. In our example above, it’s a malicious website visit alert. What can we do with it? There are a few things we can do, starting off: take a look at the alert in your SIEM. What logs are coming in for this particular alert? Is there a pattern? In the case of the malicious website visit alert the logs being returned have a field “action= deny”. With this action it’s saying that these have been denied, so now we know what we can change.
Once we have the cause behind the false positives, we can tune the alert. After tuning this alert to look for successful connections and a threshold for denies. Immediately we see the impact of daily ticket numbers going down.
Conclusion
Starting a cycle where you are looking at the overall classification of tickets is extremely important to any SOC. With doing this on a monthly basis, you can highlight: pain points for the team, show why you need to start using automation, and get into a mentality of “set it and improve it”