Shomiron DAS GUPTA Jul 27, 2022 12:59:00 AM 6 min read

Opinion - Short-term log retention has a problem

In my opinion, log event retention has not been discussed enough. In my interactions with customers and prospects, I have noticed that they do not spend enough time challenging this parameter and settle for the lowest retention that providers offer. It is becoming the norm to think of a SIEM (Security Information and Event Management) as a threat sifter that pops out threats by processing raw events, with no further use for those events once they have been churned.

It makes sense to challenge that position with real scenarios that require access to raw events weeks or months after they were generated.

Threat hunting

Novel threat campaigns are discovered every week. These are attacks targeting specific companies, industries, countries or sometimes individuals. Researchers around the world spend time uncovering these attacks and publish their key attributes, known as IOCs (Indicators of Compromise). IOCs can be anything from IP addresses and domain names to filenames and email addresses.

IOCs are captured and reported on forums and researcher websites so that security practitioners around the globe can look for them in their own environments and determine whether they were under attack or have been compromised. This process is essentially a look-back search (historical sweep) over events from the relevant time period to find possible matches, which can then be investigated further.

Because there is a significant delay before these campaigns are documented and released to the public, a look-back search will not yield results if events from the reported time frame are no longer available.
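To make the retention dependency concrete, here is a minimal Python sketch of an IOC look-back sweep. The `Event` record, the `retained` and `sweep` helpers, the field names and the example addresses are all hypothetical illustrations, not any SIEM's API. The sketch simply assumes that events older than the retention window have already been purged, so a sweep over an older time frame has nothing left to match.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set


@dataclass
class Event:
    """Hypothetical, simplified email-server log event."""
    timestamp: datetime
    sender: str
    attachment: Optional[str] = None  # attachment filename, if any


def retained(events: Iterable[Event], now: datetime, retention_days: int) -> List[Event]:
    """Return only the events still inside the retention window.

    Assumption: anything older than `retention_days` has already been purged.
    """
    cutoff = now - timedelta(days=retention_days)
    return [e for e in events if e.timestamp >= cutoff]


def sweep(events: Iterable[Event], ioc_senders: Set[str],
          start: datetime, end: datetime) -> List[Event]:
    """Look-back search: events in [start, end] whose sender is a published IOC."""
    return [e for e in events
            if start <= e.timestamp <= end and e.sender in ioc_senders]


if __name__ == "__main__":
    now = datetime(2022, 7, 27)
    events = [
        # Malicious email received roughly eight months before the report.
        Event(datetime(2021, 11, 27), "attacker@example.com", "invoice.doc"),
        Event(datetime(2022, 7, 20), "colleague@example.org"),
    ]
    iocs = {"attacker@example.com"}
    window = (datetime(2021, 11, 1), datetime(2021, 12, 31))

    for days in (90, 365):
        hits = sweep(retained(events, now, days), iocs, *window)
        print(f"{days}-day retention: {len(hits)} match(es)")
```

With these example dates, the 90-day window has already discarded the eight-month-old email, so the sweep reports 0 matches; the 365-day window still holds it and reports 1. The IOC is equally valid in both runs; only the retention setting decides whether it can be found.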

For example, you hear of a campaign that targeted your industry eight months ago and compromised key assets in your space. The threat report lists the email addresses used to send out a malicious .doc file. Your first reaction will be to search your email server events from eight months ago to see whether any of your colleagues received an email from those addresses and whether a .doc file was indeed attached.

Once your nightmare is confirmed, you will want to drill down and explore what impact that .doc file had, or whether your EDR caught it and stopped the threat. At this point you have more questions, and unless you have events from your email server from eight months ago, you will never know.

Intrusion analysis and forensics

One Wednesday afternoon you receive a call from your dark web monitoring service: "A hacker is reportedly selling three million of our customer records on hackforums." Ouch. Apart from dealing with the seller and negotiating the ransom, you need to activate your customer response and brief management about the incident. Soon the first question from the boardroom will be: "Do we know how this happened?"

Overnight, you will need to assemble a team of experts and hire external consultants to figure out how you got here. Every exercise from this point on will require you to produce raw or enriched data from your key devices and applications. That dataset will then need to be searched for signs of compromise that could have led to this outcome.

A successful exfiltration like the one in this scenario takes not weeks but months to unfold, and unless you retain events for that long, it may be difficult, if not impossible, to find the evidence you are looking for.

Compliance

Every organization that is governed by a regulator, or that needs to meet a certain benchmark or customer expectation, will need to hold on to event data for longer than a few days. Most compliance standards are quite prescriptive about logging requirements. In some cases you could use logger applications to hold on to log events, but those events will not be easily searchable when you need them.

Conclusion

Traditional providers cannot scale without proportionately scaling their infrastructure footprint. As a result, storing events long term is extremely expensive and often prohibitive for you as a customer. To work around this, some vendors have adopted a tiering approach in which events are placed into hot, warm and cold buckets, with the warm and cold buckets holding older data. You may still be able to retrieve events from a warm or cold bucket, but those queries run longer: the time penalty is the price of the cheaper storage.
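A minimal Python sketch of how such tiering might decide which buckets a query has to touch. The age boundaries (30 days for hot, 180 days for warm) and the helper names `tier_for` and `tiers_for_range` are hypothetical; real products choose their own thresholds and storage layouts. The point it illustrates is that a look-back spanning several months inevitably reaches the slowest tier.

```python
from datetime import datetime, timedelta
from typing import List

# Hypothetical tier boundaries by event age; vendors set their own values.
HOT_DAYS = 30
WARM_DAYS = 180
TIER_ORDER = ["hot", "warm", "cold"]  # fastest/most expensive to slowest/cheapest


def tier_for(event_time: datetime, now: datetime) -> str:
    """Bucket an event by its age."""
    age = now - event_time
    if age <= timedelta(days=HOT_DAYS):
        return "hot"
    if age <= timedelta(days=WARM_DAYS):
        return "warm"
    return "cold"


def tiers_for_range(start: datetime, end: datetime, now: datetime) -> List[str]:
    """Tiers a query over [start, end] must read; the slowest one dominates latency.

    Assumes tiers cover contiguous age ranges, so every tier between the
    newest and the oldest point of the query is touched as well.
    """
    if start > end:
        raise ValueError("start must not be after end")
    newest = TIER_ORDER.index(tier_for(end, now))    # end is the most recent point
    oldest = TIER_ORDER.index(tier_for(start, now))  # start is the oldest point
    return TIER_ORDER[newest:oldest + 1]


if __name__ == "__main__":
    now = datetime(2022, 7, 27)
    print(tiers_for_range(now - timedelta(days=7), now, now))    # ['hot']
    print(tiers_for_range(now - timedelta(days=240), now, now))  # ['hot', 'warm', 'cold']
```

Under these assumed boundaries, last week's data is served entirely from hot storage, while the eight-month look-back from the threat hunting scenario has to read the cold bucket too, which is where the query time penalty comes from.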

Some vendors also suggest a parallel datastore that is more or less a dumping ground for all log events. These logs are unparsed and difficult to search; on the brighter side, you still have the events when you need them.

Finally, the remaining vendors are out there arguing against long-term event log retention. I thought it would be worthwhile to counter that argument with real scenarios that may convince you otherwise.