🚀 A powerful open-source SIEM and security data pipeline management toolkit
StrIEM is an open-source SIEM system that leverages modern data engineering tools and open standards to provide security monitoring capabilities, data processing / normalization, and data routing.
StrIEM combines and builds on:
- 🔄 Vector-powered Data Pipelines: Uses Vector by Datadog for robust log collection and processing
- 🛡️ Sigma Rules Integration: Detection engine using industry-standard Sigma rules
- 📋 OCSF Normalization: Events are transformed and normalized to OCSF for consistent analysis, simplified querying, easier correlation across sources, and reduced storage complexity
- 💾 Enterprise Storage Options: Store security events in Parquet format, with support for local storage, Snowflake, AWS Security Lake, and various data lake solutions. Search, analyze & investigate with DuckDB, Apache Arrow, Snowflake SQL, AWS Athena & more
- 🔌 Integrations:
  - AWS CloudTrail
  - Google Cloud / Google Workspace
  - GitHub Enterprise
  - Okta
  - ...and anything else supported by Vector
- Install the configuration utility:
  pip install striem-configure
  (or, from this repository, pip install .)
- Generate your configuration, and follow the prompts:
  striem-configure
- Launch StrIEM:
  docker-compose up -d
The configuration utility will help you set up:
- Data sources and authentication
- Detection rules and alerts
- Storage configuration
The utility creates a directory containing docker-config.yaml and several subdirectories:
- assets/schema: OCSF Parquet schema, generated from crowdalert/ocsf-parquet
- assets/detections: Sigma detection rules. You will be asked whether to download the open source rules from SigmaHQ, and you can add your own
- assets/vrl: VRL transforms for normalizing data into OCSF. Retrieved from crowdalert/ocsf-vrl
- config/striem.yaml: Configuration for StrIEM Store, if non-Vector sources have been configured
- config/vector: Directory containing Vector config files
- config/vector/static: Contains Vector configuration specific to StrIEM
- data: The output directory for post-processed & normalized data, Hive partitioned by date. This is where the Parquet database lives.
StrIEM consists of two major pieces:
- Vector: Handles log ingestion, transformation, and routing
- StrIEM State: A helper for the SIEM functions that Vector does not currently support or that fall outside its conceptual model: detections, correlations, enrichments, trigger actions (SOAR playbooks), and database generation (i.e., Parquet). It also ingests data from sources not currently in Vector (e.g., Okta)
striem-configure (this repository) generates a set of configuration files creating a security data pipeline with Vector. Each step of the pipeline follows a naming schema so you can add your own sources, transforms and sinks:
- source-<source type>-<source id>: The initial ingest point for data
- logsource-<source type>-<source id>: Events will have a %logsource field added to metadata corresponding to the Sigma Log Source. StrIEM uses the category, product and service fields as filters if they are present, ignoring Sigma rules that do not apply to this log source

  Transforms should also add a %source_id metadata field equal to the source id for identification by downstream consumers matching on wildcards (i.e., Vector components configured with inputs: [logsource-*])
- ocsf-<source type>-<source id>: Events will be normalized to valid OCSF

  Events from ocsf-* are then sent to StrIEM State to be written to Parquet files
- action-<action type>: Events from this data stream are OCSF normalized data filtered by type, indicating some action

  For instance, a Vector sink configuration can consume action-alert as its inputs parameter to send all detection matches to its target. A Vector configuration for writing alerts to the console might look like the following:

      sinks:
        console-alerts:
          type: console
          encoding:
            codec: json
          inputs: ["action-alert"]
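The first two stages of the naming schema can be sketched for a hand-written source as shown below. This is a minimal illustration, not output from striem-configure: the source type (nginx), the source id (web01), the log path, and the console-debug sink are hypothetical names chosen for the example, and the %logsource values assume the Sigma "webserver" category applies to these logs. How the generated configuration wires a custom logsource-* component into the ocsf-* stage is not covered here.

```yaml
# Hypothetical custom source following the source-<source type>-<source id> pattern.
sources:
  source-nginx-web01:
    type: file
    include:
      - /var/log/nginx/access.log   # assumed path, adjust to your host

transforms:
  # logsource-<source type>-<source id>: tag events with Sigma log source metadata.
  logsource-nginx-web01:
    type: remap
    inputs: ["source-nginx-web01"]
    source: |
      # Metadata fields (the % prefix) described above; values are assumptions.
      %logsource = { "category": "webserver", "product": "nginx" }
      %source_id = "web01"

sinks:
  # Optional debugging sink: matches every logsource-* component by wildcard.
  console-debug:
    type: console
    encoding:
      codec: json
    inputs: ["logsource-*"]
```

Because the sink matches on a wildcard, any additional component named logsource-<source type>-<source id> would be picked up without editing the sink, which is the reason for keeping to the naming pattern.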
We welcome contributions! Submit your PRs, issues, suggestions, or enhancements!
Licensed under MPL-2.0. See LICENSE file for details.
Built with ❤️ by CrowdAlert, Inc.