Drowning in logs from firewalls, Windows systems, network switches, Wi-Fi routers and the like? Security information and event management (SIEM) suites are supposed to swallow such data and refine it into manageable alerts.
However, despite the increasing use of machine learning, security teams still aren’t getting the most out of their SIEMs.
At this month’s annual SecTor conference in Toronto, Julian Pileggi, a senior incident response consultant at the Canadian branch of FireEye’s Mandiant consulting services, gave infosec pros some tips on supercharging their SIEMs.
First, the bad news: SIEMs still need dedicated attention. “A key [thing] that a lot of organizations are realizing is you need to have a dedicated team to manage the content development [rules and use cases] and help with automating workflows within your blue team.”
Most organizations start with staffers who have Level 1 and Level 2 experience, he said, then add those with Level 3 experience (subject matter experts such as forensic analysts, experienced incident responders and others). One main goal is to automate workflows as much as possible, Pileggi said. Ultimately, those with threat intelligence and threat hunting skills can be added to the team.
He advised administrators to have the most recent version of Windows PowerShell for creating scripts. The usual logs will be poured in, including from intrusion detection and prevention devices, but if possible also include Windows Sysmon logs.
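To illustrate the Sysmon point, here is a minimal PowerShell sketch (not from the talk) that confirms events are actually landing in the Sysmon channel before a forwarder ships them to the SIEM:

```powershell
# Sanity check: confirm Sysmon is writing to its operational channel
# before forwarding it to the SIEM (assumes Sysmon is already installed).
Get-WinEvent -LogName "Microsoft-Windows-Sysmon/Operational" -MaxEvents 5 |
    Select-Object TimeCreated, Id, ProviderName |
    Format-Table -AutoSize
```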
In addition to having primary and secondary redundancy, ensure the SIEM’s test environment uses production data. Otherwise, when testing the logic of rules being drafted, the measured performance impact won’t reflect the operational system.
The good news is that SIEMs can and should be tuned for zippy performance. After all, Pileggi pointed out, what use is a query of historical data if it takes a week to get an answer? Remember, he said, the median dwell time of an attacker can be as long as 99 days.
And, he added, the performance of a SIEM should be measured by its speed, not by CPUs, RAM or hard drive size. “You should be able to do a query on an atomic indicator (for example, one IP address for a 14-day period) in 60 seconds or less. It is totally possible.”
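As a rough sketch of that benchmark (the search endpoint and query syntax below are hypothetical placeholders, not any particular SIEM’s API), PowerShell’s Measure-Command can time the round trip:

```powershell
# Time an atomic-indicator query against a SIEM search API.
# The URL, body format and query syntax are assumptions; substitute
# your product's real interface.
$elapsed = Measure-Command {
    Invoke-RestMethod -Uri "https://siem.example.com/api/search" -Method Post `
        -Body (@{ query = 'src_ip="203.0.113.7"'; range = "14d" } | ConvertTo-Json) `
        -ContentType "application/json"
}
if ($elapsed.TotalSeconds -le 60) {
    "Query returned in {0:N1} seconds: within target" -f $elapsed.TotalSeconds
} else {
    "Query took {0:N1} seconds: look at indexing and storage tiers" -f $elapsed.TotalSeconds
}
```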
For management he suggested creating a use case library of incidents that will trigger alerts, as well as an alert playbook for incident response. There’s no sense having rules in the SIEM if the security team doesn’t know what to do when an alarm goes off, he reasoned. The playbook can be tied to a threat intelligence database so a staffer has context on what the alert means.
Use cases are built around the attack techniques the security team wants to detect (the download of an EXE file, malware on a critical server and so on).
For better organization, each use case should be numbered, named and fleshed out with several fields: Case Name (for example, UC001 EXE Downloading from Uncategorized Web Site or UC002 Detect PoisonIvy RAT); Category (typically linked to an attack lifecycle or kill chain phase); Priority; Description; Log Source Requirements; Associated Reference Lists; Rule Pseudo-code; Rule Logic; Other Notes; and Test Script.
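Expressed as data, a single entry in such a library might look like the following PowerShell hashtable. The field names follow Pileggi’s list; the values are illustrative, not Mandiant’s actual content:

```powershell
# Illustrative use case library entry; values are examples only.
$UC001 = [ordered]@{
    CaseName       = "UC001 EXE Downloading from Uncategorized Web Site"
    Category       = "Delivery"          # kill chain phase
    Priority       = "Medium"
    Description    = "Executable downloaded from a site the proxy cannot categorize"
    LogSources     = @("Web proxy")
    ReferenceLists = @("Allowlisted software-distribution domains")
    RulePseudoCode = 'url_category == "Uncategorized" AND file_type == "EXE"'
    RuleLogic      = "<the vendor-specific rule as deployed in the SIEM>"
    OtherNotes     = "Tune against the allowlist to reduce false positives"
    TestScript     = "Invoke-UC001Test.ps1" # hypothetical script name
}
```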
Rules are first written in pseudo-code, a quasi-programming notation that should accompany each rule in the SIEM, so the team has a clear picture of the rule’s logical structure. The rule itself has to include the actual logic so an analyst looking into an event can understand what it is for. If something triggers a particular alert and no one knows why, Pileggi said, it becomes hard to investigate.
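As a sketch of what that means in practice, the pseudo-code for UC001 in the example above can be mirrored as a small, testable PowerShell function (the field names on the proxy record are assumptions):

```powershell
# Executable version of UC001's pseudo-code; UrlCategory and FileType
# are assumed field names on a normalized proxy log record.
function Test-UC001 {
    param([psobject]$ProxyEvent)
    return ($ProxyEvent.UrlCategory -eq "Uncategorized" -and
            $ProxyEvent.FileType   -eq "EXE")
}

# Example: this record should trigger the rule.
Test-UC001 ([pscustomobject]@{ UrlCategory = "Uncategorized"; FileType = "EXE" })  # True
```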
A test script to trigger each use case should be included for a simple reason: many use cases may fire rarely or never, he pointed out, but they still have to be built and kept working. The test script verifies that the rule still works and that staff respond appropriately.
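Such a script can be very short. This hypothetical UC001 test pulls a harmless executable through the corporate proxy so the rule should fire; the URL is a placeholder, and it should point at an internal test host, never live malware:

```powershell
# Hypothetical UC001 test: download a benign EXE from an uncategorized
# test site, then confirm the SIEM alerted and the playbook was followed.
$testUrl = "http://uncategorized-test.example.com/testfile.exe"
Invoke-WebRequest -Uri $testUrl -OutFile "$env:TEMP\uc001_test.exe"
Write-Output "UC001 test download complete; verify the alert fired and the response matched the playbook"
```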
Finally, the playbook lists a series of numbered procedures to follow when an alert triggers, so the team’s response is repeatable and consistent. The playbook for the UC001 EXE case above, for example, might list the Priority as Medium along with several numbered Actions to Take (AT), such as AT4 Submit File to Automated Sandbox.
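In the same spirit, a playbook entry for UC001 might be captured like this (the action text is illustrative; the AT numbering follows the convention above):

```powershell
# Illustrative playbook entry for UC001; actions are examples only.
$Playbook_UC001 = [ordered]@{
    UseCase  = "UC001 EXE Downloading from Uncategorized Web Site"
    Priority = "Medium"
    Actions  = @(
        "AT1 Pull the proxy log entry; identify the source host and user"
        "AT2 Check the destination domain against threat intelligence"
        "AT3 Retrieve the downloaded file from the endpoint or proxy cache"
        "AT4 Submit File to Automated Sandbox"
        "AT5 Escalate to a Level 2 analyst if the sandbox verdict is malicious"
    )
}
```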
In his presentation Pileggi suggested 10 basic rules or use cases (among them a successful remote authentication from a Tor exit node; detection of credential theft tools; a potential brute force attack; download of a suspicious file; and service installation on a critical server).
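To take one of those, a brute force check boils down to counting failed logons per account within a window. The sketch below runs that logic directly against a Windows Security log; in production it would live in the SIEM’s rule engine, and the 10-per-hour threshold is illustrative:

```powershell
# Flag accounts with 10+ failed logons (event ID 4625) in the past hour.
$failed = Get-WinEvent -FilterHashtable @{
    LogName   = 'Security'
    Id        = 4625                      # failed logon
    StartTime = (Get-Date).AddHours(-1)
} -ErrorAction SilentlyContinue

$failed |
    Group-Object { $_.Properties[5].Value } |  # index 5 = TargetUserName for 4625
    Where-Object { $_.Count -ge 10 } |
    Select-Object Name, Count
```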
He also suggested another 16 rules for “tricky” situations (for example, named pipe impersonation, a form of privilege escalation; a database acting unusually; deletion of Windows Volume Shadow Copies; a local account created on a server; and others).
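Shadow copy deletion, for instance, usually shows up as a vssadmin invocation in process creation logs. With the Sysmon source mentioned earlier in place, a rough check looks like this (the regex is an illustrative pattern, not a complete detection):

```powershell
# Rough check for Volume Shadow Copy deletion via vssadmin in Sysmon
# process-creation events (ID 1).
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Windows-Sysmon/Operational'
    Id      = 1
} -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -match 'vssadmin(\.exe)?\s+delete\s+shadows' } |
    Select-Object TimeCreated, Message
```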
Finally, he suggested creating use cases for a number of known threat actors. These will vary according to an organization’s or industry’s experience.
In an interview Pileggi agreed many organizations aren’t getting the most out of their SIEMs. “A lot of what we see is organizations get a SIEM, implement it, do an initial configuration and then expect that it’s going to work for a long time. But really you need to have constant development and consistent tuning.”
“It’s a combination of getting the right information into the SIEM, making sure that it’s available and accessible as you collect more data, and also putting the right rules and logic in place to bubble up or highlight the pieces of data that are most relevant to at least initiate an investigation.
“You want to use a SIEM for two things: Alerting to suspicious activity, but also for investigating historical activity. Balancing those two can sometimes be difficult.”