On-Premises SecOps Platform. PHOTO: Cybercrime Magazine.

Bring Your Own Data Lake: Do It The Right Way

Modern SIEM solutions offer a key value proposition: seamlessly adding an analytics layer into existing data lakes.

Christophe Briguet, Product Management, AI/ML, Stellar Cyber

San Jose, Calif. – Oct. 2, 2024

Having spent a significant amount of time in the SIEM industry, I’ve seen patterns and evolutions that define the landscape. One of the most notable changes has been the shift from traditional, monolithic SIEM deployments to more flexible, scalable solutions that allow organizations to adapt and grow without significant overhauls.

The Evolution of SIEM Storage

Historically, SIEM solutions like ArcSight required a dedicated Oracle Database to function. I recall the days when a large Sun server running Oracle was solely dedicated to storing logs and security events. Vertical scaling of this kind was the only way to manage increasing data loads. However, as data volumes grew, the market saw the advent of purpose-built log management solutions that enabled horizontal scaling.

Splunk, LogLogic, and ArcSight Logger were among the pioneers, creating the first data lake layers for storage. These solutions centralized data storage, allowing SIEM platforms to focus on correlation and analytics rather than the complexities of data management.

Enter the Era of Multi-Data Platform SIEM

Fast forward 15 years, and we are now in the era of multi-data platform SIEM. These solutions recognize the force of data gravity — a metaphorical concept where data attracts other data and applications towards itself, similar to how a massive object in space attracts others with its gravitational pull.

Modern SIEM solutions embrace the concept of data gravity to avoid the complexity and expense of rip-and-replace processes. Instead, they offer a key value proposition: seamlessly adding an analytics layer into existing data lakes. This approach ensures optimal performance, reduced storage/retention costs, and simplified data management by keeping data and applications close to their origin.

Applications and Services are attracted towards the Data Lake for optimal performance and cost efficiency.

Bring Your Own Data Lake (BYODL)

Stellar Cyber’s recent announcement of “Bring Your Own Data Lake” (BYODL) support marks a significant milestone in this evolution. Organizations that have standardized their data storage on platforms like Splunk, Snowflake, Elastic, or AWS can now seamlessly integrate Stellar Cyber’s AI-driven Open XDR platform with this data without rip-and-replace. Building on an existing data lake makes optimized data ingestion and pre-processing, such as normalization and enrichment, all the more important, because these steps must happen before data is used for automated threat detection through machine learning or for contextualized alert investigation.

This structured approach offers five clear advantages over traditional methods:

1. Optimized Ingest and Turnkey Integration

Stellar Cyber’s decoupled deployment starts with optimized data collection and filtering. This ensures that only security-relevant, high-quality data enters the system, improving the signal-to-noise ratio. The immediate benefits include:

  • Improved Performance: By filtering out irrelevant data early in the process, the system can operate more efficiently, reducing the load on downstream processes.
  • Enhanced Data Quality: Ensuring that only clean, relevant data is ingested reduces the chances of false positives and improves the accuracy of analytics.
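
To make the idea of filtering at ingest concrete, here is a minimal Python sketch. It is not Stellar Cyber's implementation: the event fields, the relevance policy in SECURITY_RELEVANT_TYPES, and the filter_events helper are all hypothetical names chosen for illustration.

```python
from typing import Iterable, Iterator

# Hypothetical policy: event types treated as security-relevant.
SECURITY_RELEVANT_TYPES = {"auth_failure", "process_start", "dns_query", "firewall_deny"}


def filter_events(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only well-formed, security-relevant events."""
    for event in events:
        # Drop malformed records that lack the fields later stages rely on.
        if not event.get("timestamp") or not event.get("type"):
            continue
        # Drop event types outside the (assumed) relevance policy.
        if event["type"] not in SECURITY_RELEVANT_TYPES:
            continue
        yield event


raw_events = [
    {"timestamp": "2024-10-02T10:00:00Z", "type": "auth_failure", "user": "alice"},
    {"timestamp": "2024-10-02T10:00:01Z", "type": "debug_heartbeat"},
    {"type": "dns_query", "query": "example.com"},  # missing timestamp
    {"timestamp": "2024-10-02T10:00:02Z", "type": "firewall_deny", "src_ip": "203.0.113.7"},
]

kept = list(filter_events(raw_events))
print(f"kept {len(kept)} of {len(raw_events)} events")
```

In this toy input the heartbeat record and the record without a timestamp are dropped before they reach any downstream stage, which is where the performance and data-quality benefits would come from.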

2. Normalization and Enrichment

Once the data is collected, the Stellar Cyber platform normalizes and enriches it, adding valuable context such as threat intelligence, geolocation, user information, and vulnerability details. This step is essential for several reasons:

  • Contextualized Data: Enriched data provides a richer context for security events, making it easier to correlate and analyze potential threats.
  • Streamlined Analysis: Normalized data allows for consistent and accurate querying, enabling security analysts to perform more effective investigations. It also allows the same machine-learning algorithms to be applied to many data sources with different original formats.
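
As a rough illustration of normalization followed by enrichment, the Python sketch below maps two differently shaped records onto one schema and then attaches context. The field mappings, the lookup tables, and the normalize and enrich helpers are assumptions made for this example, not the platform's actual schema or API.

```python
# Hypothetical per-source field mappings onto one common schema.
FIELD_MAPS = {
    "firewall": {"src": "src_ip", "dst": "dst_ip", "act": "action"},
    "vpn": {"client_addr": "src_ip", "username": "user", "result": "action"},
}

# Hypothetical stand-ins for threat intelligence and geolocation lookups.
THREAT_INTEL = {"203.0.113.7": "known-scanner"}
GEO = {"203.0.113.7": "NL", "198.51.100.4": "US"}


def normalize(source: str, raw: dict) -> dict:
    """Rename source-specific fields to the common schema."""
    mapping = FIELD_MAPS[source]
    event = {mapping.get(key, key): value for key, value in raw.items()}
    event["source"] = source
    return event


def enrich(event: dict) -> dict:
    """Return a copy of the event with context attached to its source IP."""
    ip = event.get("src_ip")
    enriched = dict(event)
    enriched["src_geo"] = GEO.get(ip)
    enriched["src_threat_tag"] = THREAT_INTEL.get(ip)
    return enriched


fw = enrich(normalize("firewall", {"src": "203.0.113.7", "dst": "10.0.0.5", "act": "deny"}))
vpn = enrich(normalize("vpn", {"client_addr": "198.51.100.4", "username": "alice", "result": "allow"}))
print(fw)
print(vpn)
```

After these two steps both records expose the same src_ip and action fields, so a single query or model can treat them alike regardless of their original format.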

3. Detection & Analytics

Stellar Cyber’s approach maximizes the use of clean, enriched data for detection and analytics tools. This offers:

  • Out-of-the-Box Analytics: Ready-to-use analytics tools powered by machine learning can quickly retrieve and analyze structured data, enabling rapid threat detection and response.
  • Reduced Complexity: By having a standardized data format, the integration between the data lake and analytical tools becomes straightforward, reducing the need for custom integrations and ad-hoc solutions.
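
The value of a standardized format can be sketched with a single detection that runs unchanged over events from several sources. The example below uses a simple threshold rule as a stand-in for the machine-learning analytics described above. The event fields and the flag_noisy_sources helper are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def flag_noisy_sources(events: Iterable[dict], threshold: int = 3) -> List[str]:
    """Return source IPs with at least `threshold` denied actions, sorted."""
    denied = Counter(e["src_ip"] for e in events if e.get("action") == "deny")
    return sorted(ip for ip, count in denied.items() if count >= threshold)


# Events from two different sources, already in the same (assumed) schema.
events = [
    {"source": "firewall", "src_ip": "203.0.113.7", "action": "deny"},
    {"source": "vpn", "src_ip": "203.0.113.7", "action": "deny"},
    {"source": "firewall", "src_ip": "203.0.113.7", "action": "deny"},
    {"source": "vpn", "src_ip": "198.51.100.4", "action": "allow"},
    {"source": "firewall", "src_ip": "198.51.100.4", "action": "deny"},
]

suspects = flag_noisy_sources(events)
print(suspects)
```

Because the rule only reads normalized fields, it needs no per-source logic, and denials seen by the firewall and the VPN are counted together.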

4. Flexible Data Management

Stellar Cyber’s flexible data management approach allows organizations to decide whether to send only alerts or all normalized and enriched events to a third-party data lake. This flexibility is essential for optimizing the consumption of third-party data lakes, particularly those with high costs, like Splunk. The key benefits include:

  • Cost Efficiency: By selectively storing only high-quality and useful data, organizations can significantly reduce unnecessary data storage costs. Optimizing these storage investments avoids expenses associated with maintaining vast amounts of irrelevant data.
  • Enhanced Data Quality: Storing only normalized and enriched data ensures that the data lake contains high-integrity, valuable information. This improves the efficiency of querying and data retrieval, making it easier to extract meaningful insights and enhancing overall data analytics capabilities.
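
The choice between sending only alerts and sending all normalized and enriched events can be pictured as a small forwarding policy. The sketch below is an assumption-laden illustration: ForwardMode, select_for_data_lake, and the is_alert flag are hypothetical names, not product configuration options.

```python
from enum import Enum
from typing import Iterable, List


class ForwardMode(Enum):
    ALERTS_ONLY = "alerts_only"
    ALL_EVENTS = "all_events"


def select_for_data_lake(events: Iterable[dict], mode: ForwardMode) -> List[dict]:
    """Pick the events to forward to the third-party data lake."""
    if mode is ForwardMode.ALL_EVENTS:
        return list(events)
    # Alerts-only mode: forward just the events flagged by detection.
    return [e for e in events if e.get("is_alert")]


processed = [
    {"id": 1, "is_alert": False},
    {"id": 2, "is_alert": True},
    {"id": 3, "is_alert": False},
]

alerts_only = select_for_data_lake(processed, ForwardMode.ALERTS_ONLY)
everything = select_for_data_lake(processed, ForwardMode.ALL_EVENTS)
print(len(alerts_only), len(everything))
```

With a data lake priced on ingested volume, alerts-only mode in this toy case forwards one record instead of three. The real savings depend on the alert rate and the vendor's pricing model.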

5. Enhanced Custom Applications

Structured and enriched data in the data lake also benefits custom applications that may require access to security data. Key advantages include:

  • Optimized Threat Hunting: High-quality, standardized data with context simplifies the process of querying and retrieving relevant information.
  • Better Reporting: Feeding clean, enriched data to custom applications such as reporting tools improves their performance and accuracy, leading to better overall security outcomes.

Challenges with Traditional Methods

The other way to look at this evolution is to contrast it with traditional hybrid SIEM deployments. These methods often bring significant challenges:

  • Ad-Hoc Integration: Integrating raw data with detection and analytics tools often requires custom, ad-hoc solutions, increasing complexity and operational overhead.
  • Bespoke Detections: Without normalized and enriched data, creating effective detection rules and analytics through machine learning becomes more challenging, requiring specialized, bespoke solutions.
  • Raw Data Issues: Directly integrating raw data lakes with detection tools can lead to inefficiencies and inaccuracies, as the data lacks the necessary context and normalization.

Using BYODL to process and analyze data before consumption and storage offers clear advantages in performance, accuracy, and operational efficiency. This approach can significantly enhance an organization’s security posture and streamline its SIEM operations with consolidated data storage, because data is cleaned, normalized, and enriched before it is stored, whether storage happens ahead of or after detection and analytics via machine learning. It also reduces complexity and cost and maximizes the value derived from security data, providing a robust foundation for effective threat detection and response.

Adopting such a structured approach can be a game-changer for organizations looking to optimize their security operations and leverage the full potential of their data lakes.

Christophe Briguet is head of product management, AI/ML, at Stellar Cyber.


About Stellar Cyber

Stellar Cyber’s Open XDR Platform delivers comprehensive, unified security without complexity, empowering lean security teams of any skill level to secure their environments successfully. With Stellar Cyber, organizations reduce risk with early and precise identification and remediation of threats while slashing costs, retaining investments in existing tools, and improving analyst productivity, delivering an 8X improvement in MTTD and a 20X improvement in MTTR. The company is based in Silicon Valley. For more information, visit https://stellarcyber.ai.