Shomiron DAS GUPTA Sep 7, 2022 6:00:00 AM 6 min read

Why cost is a barrier to good detection

In Q3 of 2018 we were running surveys across the globe, talking to analysts and partners to figure out the key challenges the SIEM industry was facing. A big part of that process was crystal-ball gazing within the product group to work out what the market would expect from us over the next three years. We spent a lot of time debating the technologies in use back then and the bottlenecks we were likely to hit in the years ahead.

Most of the feedback we received was about product features that would enhance threat detection, behavior analytics and the use of machine learning. CISOs were keen on reducing their dependence on the workforce and bringing consistency to their decision-making process. The practitioners we spoke to were worried about missing critical signs of compromise and wanted enhanced automation capabilities.

Future thought on the roadmap

In this process we were trying to find the edge at which the current technology stack would break and cause us to miss our objectives or fall short of expectations. The number one concern that emerged from this exercise was not SIEM use cases or untested machine learning models. It was the infrastructure, and the ability of the platform to scale rapidly and without a practical limit as datasets grow.

When we analyzed the results, two items came up as blockers:

  • Ability of the platform to scale to future workloads
  • Cost of scaling - will the platform be affordable at scale

Scaling challenges

When you look around, all the product options available either still use an RDBMS or use horizontally scaling, index-based software as their backend. Over the last decade of working with data lakes, we gained a lot of experience with RDBMSs, key-value data stores and index-based platforms. The RDBMS and the key-value data store had clearly demonstrated their inability to scale beyond a certain size. We called it the glass ceiling.

Previous versions of DNIF were built on Apache Lucene, so the challenges of reaching scale and operating at scale were very well known to us. From our experience it was clear that Lucene and its derivatives would either be extremely expensive at scale or would not scale at all.

We would therefore have to look beyond the options on the table, and we were soon sure that solving these challenges would require us to invent a novel approach.

Impossible cost of scaling

All major SIEM platforms are built on index-based search platforms, which offer horizontal scaling. This scaling strategy lets you keep growing the capacity of your data backend by adding compute and storage resources to the cluster. In theory, horizontally scaling platforms can scale infinitely. In practice, making that work is hard.

Apache Lucene and its derivatives (including Elasticsearch) are great index-based search platforms and are therefore extremely popular choices. However, our use case requires both high-speed ingestion and performant queries. Doing both at once on an index-based platform is extremely demanding on the compute and storage infrastructure.

Index-based search platforms are also sensitive to network and system fluctuations once scaled out, which presents a challenge for cluster administrators and adds significantly to the total cost of ownership. In effect, index-based backends show an exponential rise in cost and effort in high-volume scenarios, making them unviable for customers with large datasets.

Cost of ownership directly limits visibility, and reduced visibility in turn hurts threat detection and campaign discovery.

DNIF HYPERCLOUD with the distributed filer

We use filers to compress our streams and are able to shrink the dataset by 98% of its original size. Filer blocks use hyper stores that deliver extremely high IO, which also helps reduce the compute footprint to a fraction. As a result, HYPERCLOUD is able to provide long-term retention (12 months) by default, at the same performance benchmarks and cost.
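To put the compression figure in perspective, here is a rough back-of-the-envelope sketch of what a 98% reduction would mean for 12 months of retention. It reads the 98% figure as a reduction in stored size, which is an assumption, and the 500 GB/day ingest volume and the function name are hypothetical, chosen only for illustration. It is not a description of how the filer works internally.

```python
def retained_storage_tb(daily_raw_gb: float, reduction: float, retention_days: int) -> float:
    """Storage footprint in TB for a retention window, after compression.

    reduction is the fraction of the raw size removed by compression
    (0.98 means the stored data is 2% of the raw size). Illustrative only.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    stored_gb = daily_raw_gb * (1.0 - reduction) * retention_days
    return stored_gb / 1000.0  # decimal TB


# Hypothetical workload: 500 GB of raw logs per day, kept for 12 months.
raw_tb = retained_storage_tb(500, 0.0, 365)
compressed_tb = retained_storage_tb(500, 0.98, 365)

print(f"uncompressed: {raw_tb:.2f} TB")
print(f"compressed:   {compressed_tb:.2f} TB")
```

Under these assumed numbers, a year of logs drops from roughly 182.5 TB to about 3.65 TB, which is the kind of difference that makes a 12 month default retention period affordable.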

Conclusion

In effect, if the platform is not able to scale, you are likely to reduce your workload or distribute it across multiple platforms, and either choice leads to a partial, if not total, loss of visibility. SaaS platforms are known to cover this weakness by offering a reduced retention period; see our opinion piece, "Short term log retention has a problem".

Costly scale is equally damaging. If you have a large dataset you will always struggle to cope with the escalating cost, no matter what resources you have access to. Everyone is forced to categorize event sources by value, and that almost always damages visibility and the scope of detection.

Shomiron DAS GUPTA

Founder / CEO