A hands-on cybersecurity project demonstrating SIEM operations — from Splunk installation and log ingestion to writing SPL detection queries, building security dashboards, creating alert rules, and investigating a real multi-stage web attack scenario across correlated log sources.
Setup & Ingestion · SPL Fundamentals · Threat Detection · Dashboards · Alerts · Investigation
A SIEM (Security Information and Event Management) platform is the nerve center of a Security Operations Center. It collects logs from across the environment, correlates events, and enables analysts to detect and investigate threats. This project demonstrates practical SIEM skills using Splunk, a widely used SIEM platform, working with 33.4 million real security events from the Boss of the SOC dataset to detect web attacks and privilege changes and to reconstruct an attacker's full activity timeline.
| Section | Skill Demonstrated | Tools Used |
|---|---|---|
| Setup & Ingestion | Splunk installation, data inputs, index management | Splunk Enterprise, inputs.conf |
| SPL Fundamentals | Search Processing Language queries and data exploration | SPL, stats, table, timechart |
| Threat Detection | Writing detection queries for real attack patterns | SPL, where, eval, search |
| Dashboards | Building operational security monitoring dashboards | Splunk Classic Dashboards |
| Alert Rules | Creating automated detection with alert actions | Scheduled searches, triggers |
| Investigation | Correlating events to reconstruct an attack timeline | Cross-sourcetype correlation, timechart |
The lab runs Splunk Enterprise on Ubuntu 24.04 inside VirtualBox, analyzing over 33 million security events across 26 different log sources from the Boss of the SOC (BOTS) v1 dataset.
+----------------------------------------------------------------+
|                        Splunk SIEM Lab                         |
|                                                                |
|   +----------------------+       +-------------------------+   |
|   | BOTS v1 Dataset      |       | Splunk Enterprise       |   |
|   | (33.4M events)       |       | (Ubuntu 24.04 VM)       |   |
|   |                      |       |                         |   |
|   | - Windows Security   | ----> | Index: botsv1           |   |
|   | - Fortinet Firewall  | ----> |                         |   |
|   | - Suricata IDS       | ----> | Source Types: 26        |   |
|   | - Stream HTTP/TCP    | ----> |                         |   |
|   | - Sysmon             | ----> | Time Range: Aug 2016    |   |
|   | - Stream DNS         | ----> |                         |   |
|   +----------------------+       +-------------------------+   |
+----------------------------------------------------------------+
| Source Type | Event Count | Purpose |
|---|---|---|
| WinEventLog:Security | 14,131,490 | Windows authentication, process creation, privilege events |
| fgt_traffic | 7,675,023 | Fortinet firewall traffic logs |
| suricata | 5,078,376 | IDS/IPS alerts and network detections |
| stream:tcp | 1,754,601 | TCP connection metadata |
| stream:ip | 1,435,025 | IP-layer packet metadata |
| stream:dns | 1,369,998 | DNS query/response records |
| XmlWinEventLog:Microsoft-Windows-Sysmon/Operational | 559,792 | Detailed process/network Sysmon telemetry |
| stream:smb | 448,008 | SMB file-sharing activity |
| fgt_utm | 257,477 | Fortinet UTM security events |
| stream:http | 39,010 | HTTP request/response stream data |
Data source: This project uses the Boss of the SOC (BOTS) v1 dataset — a realistic, labeled attack dataset created by Splunk for security training. It contains a complete web attack scenario targeting a corporate environment, with data captured from August 2016.
I install Splunk Enterprise on an Ubuntu 24.04 VM:
# Download the .deb package from splunk.com (requires free account)
sudo dpkg -i splunk.deb
# Start Splunk for the first time and accept the license
sudo /opt/splunk/bin/splunk start --accept-license --answer-yes --run-as-root
During the initial start, Splunk prompts for an admin username and password. After starting, the web interface is available at http://localhost:8000.
Splunk Enterprise 10.2.2 home screen showing available apps, bookmarks, and common tasks — the starting point for all SIEM operations
I install the BOTS v1 dataset as a Splunk app by extracting it into /opt/splunk/etc/apps/, then restart Splunk to load the index. After restart, I verify the index is loaded and contains data:
| eventcount summarize=false index=botsv1 | table index, count
I explore the variety of log sources in the dataset:
index=botsv1 | stats count by sourcetype | sort -count
26 distinct source types ingested, from Windows Security events (14.1M) and Fortinet firewall logs (7.7M) to Suricata IDS alerts (5.1M) and stream data across multiple protocols
Before hunting for threats, I explore the distribution of Windows Security event codes to understand what activity the dataset captured:
index=botsv1 sourcetype="WinEventLog:Security"
| stats count by EventCode
| sort -count
| head 20
Top Windows EventCodes — 4703 (token rights adjusted), 4689 (process exited), 4688 (process created), and 4624 (successful logon) dominate the dataset, providing rich telemetry for behavioral detections
Key EventCodes identified in the dataset:
| EventCode | Description | Count | Detection Use |
|---|---|---|---|
| 4703 | Token rights adjusted | 3,034,865 | Privilege manipulation |
| 4689 | Process has exited | 2,577,818 | Process execution tracking |
| 4688 | New process created | 2,575,010 | Execution-based threat hunting |
| 4624 | Account successfully logged on | 407,843 | Authentication monitoring |
| 4634 | Account logged off | 407,595 | Session tracking |
| 4672 | Special privileges assigned | 378,789 | Privilege escalation detection |
| 4656 | Object handle requested | 306,618 | File/registry access monitoring |
| Command | Purpose | Example |
|---|---|---|
| `stats` | Aggregate data | `stats count by src_ip` |
| `table` | Display specific fields | `table _time, user, src_ip` |
| `timechart` | Time-based aggregation | `timechart span=1h count` |
| `where` | Filter with expressions | `where count > 100` |
| `eval` | Create calculated fields | `eval source_type=sourcetype` |
| `search` | Filter with search terms | `search uri_path="*passwd*"` |
| `sort` | Order results | `sort -count` |
| `head` | Limit results | `head 10` |
Process creation events (EventCode 4688) provide one of the richest sources for threat hunting. I search for the most frequently executed processes to establish a baseline and identify outliers:
index=botsv1 sourcetype="WinEventLog:Security" EventCode=4688
| stats count AS executions by New_Process_Name
| where executions > 100
| sort -executions
| head 20
Process execution baseline — Splunk Universal Forwarder components dominate (expected), while spikes in wmiprvse.exe (45,429), dllhost.exe (9,866), and conhost.exe (9,313) warrant investigation as these are commonly abused by attackers for lateral movement and command execution
Why this matters: In real threat hunting, attackers often use legitimate Windows binaries ("living-off-the-land") to evade detection. Establishing execution baselines lets analysts spot anomalous spikes that indicate attacker activity — for example, unusually high PowerShell, WMI, or cmd.exe execution rates.
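The baseline query above is a count, filter, sort, and truncate pipeline. The following is a minimal Python sketch of the same logic, assuming the events have already been exported as dictionaries with a `New_Process_Name` key. The function name, the sample events, and the lowered threshold are hypothetical and chosen only for illustration; they are not values from the dataset.

```python
from collections import Counter


def execution_baseline(events, min_executions=100, top_n=20):
    """Rough equivalent of:
    stats count AS executions by New_Process_Name
    | where executions > min_executions | sort -executions | head top_n
    """
    counts = Counter(
        e["New_Process_Name"] for e in events if "New_Process_Name" in e
    )
    # Keep only processes above the threshold, most frequent first.
    frequent = [(name, n) for name, n in counts.items() if n > min_executions]
    frequent.sort(key=lambda item: item[1], reverse=True)
    return frequent[:top_n]


# Hypothetical sample data, not taken from BOTS v1.
sample_events = (
    [{"New_Process_Name": r"C:\Windows\System32\wbem\WmiPrvSE.exe"}] * 5
    + [{"New_Process_Name": r"C:\Windows\System32\conhost.exe"}] * 3
    + [{"New_Process_Name": r"C:\Windows\System32\notepad.exe"}] * 1
)
baseline = execution_baseline(sample_events, min_executions=2)
for name, executions in baseline:
    print(f"{executions:>6}  {name}")
```

With the sample data, only the two processes that ran more than twice survive the threshold, which is the same effect `where executions > 100` has on the full dataset.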
I search the HTTP stream data for common web attack patterns, covering SQL injection keywords (SELECT, UNION), path traversal, local file inclusion, and remote command execution attempts:
index=botsv1 sourcetype="stream:http"
| search uri_path="*SELECT*" OR uri_path="*UNION*" OR uri_path="*../*" OR uri_path="*passwd*"
| stats count by src_ip, uri_path
| sort -count
Web attack detection revealing a single attacker (40.80.148.42) attempting path traversal, local file inclusion (/etc/passwd, /.htpasswd), and Windows command execution via cgi-bin — using UTF-8 overlong encoding bypass techniques (%C0%AF, %E0%80%AF) to evade web application filters
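To make the wildcard matching in the query concrete, here is a small Python sketch that applies the same four patterns to individual URI paths. It assumes that SPL `search` compares field values case-insensitively, which is why both sides are lowercased. The helper name `matched_patterns` and the benign `/index.php` path are hypothetical.

```python
from fnmatch import fnmatchcase

# The same wildcard patterns as the SPL search, lowercased for comparison.
SUSPICIOUS_PATTERNS = ["*select*", "*union*", "*../*", "*passwd*"]


def matched_patterns(uri_path):
    """Return every suspicious pattern that the URI path matches."""
    lowered = uri_path.lower()
    return [p for p in SUSPICIOUS_PATTERNS if fnmatchcase(lowered, p)]


# Two payloads of the kind seen in the results, plus one hypothetical benign path.
checks = {
    path: matched_patterns(path)
    for path in (
        "/etc/passwd",
        "/cgi-bin/../../winnt/system32/cmd.exe",
        "/index.php",
    )
}
for path, hits in checks.items():
    print(path, "->", hits or "no match")
```

Simple substring patterns like these are cheap to run but can match benign URLs that happen to contain words such as "union", so the per-IP aggregation that follows in the query does much of the work of separating a scanner from normal traffic.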
Attack techniques identified from a single source IP (40.80.148.42):
| Attack Category | Example Payload | Technique |
|---|---|---|
| Local File Inclusion | `/etc/passwd`, `/etc/passwd%00` | Null-byte injection for path bypass |
| Credential File Access | `/.htpasswd`, `/.passwd` | Sensitive file enumeration |
| Remote Command Execution | `/cgi-bin/../../winnt/system32/cmd.exe` | Classic IIS directory traversal |
| Encoding Bypass | `%C0%AF`, `%E0%80%AF` | UTF-8 overlong encoding |
| Application Targeting | `/vti_bin/`, `/samples/`, `/scripts/` | Known-vulnerable path probing |
This single scan identified 40.80.148.42 as the primary attacker — an IP that becomes the focus of the Part 6 investigation.
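The encoding bypass works because `%C0%AF` and `%E0%80%AF` are overlong UTF-8 encodings of `/`: a decoder that only applies the bit masks, without checking for the shortest form, turns them back into a slash after a filter has already looked for the literal character. The sketch below illustrates this in Python. The two `naive_decode_*` helpers are hypothetical stand-ins for a lenient decoder, not code from any real web server.

```python
from urllib.parse import unquote_to_bytes


def naive_decode_2byte(raw):
    """Decode a 2-byte UTF-8 sequence WITHOUT rejecting overlong forms."""
    return ((raw[0] & 0x1F) << 6) | (raw[1] & 0x3F)


def naive_decode_3byte(raw):
    """Decode a 3-byte UTF-8 sequence WITHOUT rejecting overlong forms."""
    return ((raw[0] & 0x0F) << 12) | ((raw[1] & 0x3F) << 6) | (raw[2] & 0x3F)


def strict_decode_fails(raw):
    """Python's built-in UTF-8 decoder rejects overlong sequences."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


two_byte = unquote_to_bytes("%C0%AF")       # bytes C0 AF
three_byte = unquote_to_bytes("%E0%80%AF")  # bytes E0 80 AF

slash_from_two = chr(naive_decode_2byte(two_byte))
slash_from_three = chr(naive_decode_3byte(three_byte))

print(repr(slash_from_two), repr(slash_from_three))
print(strict_decode_fails(two_byte), strict_decode_fails(three_byte))
```

Both lenient decodes produce the code point 0x2F, while the strict decoder raises an error for each sequence. That gap between lenient and strict handling is what the scanner is probing for.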
I build a four-panel security monitoring dashboard that gives an analyst immediate visibility into key security indicators:
SOC Security Overview dashboard showing process creation trends, top web attackers (with 40.80.148.42 dominating at ~17K requests), most-executed processes, and top accounts by activity — combining multiple data sources into a single analyst view
Dashboard panels:
Panel 1 — Process Creations Over Time (Line Chart):
index=botsv1 sourcetype="WinEventLog:Security" EventCode=4688
| timechart span=1h count AS "Process Creations"
Panel 2 — Top Source IPs Hitting Web Server (Bar Chart):
index=botsv1 sourcetype="stream:http"
| stats count by src_ip
| sort -count
| head 10
Panel 3 — Top Processes Executed (Bar Chart):
index=botsv1 sourcetype="WinEventLog:Security" EventCode=4688
| stats count by New_Process_Name
| sort -count
| head 15
Panel 4 — Top Accounts by Activity (Bar Chart):
index=botsv1 sourcetype="WinEventLog:Security" EventCode=4688
| stats count by Account_Name
| sort -count
| head 10
I add a geographic analysis panel showing the source countries and cities of web requests hitting the server:
index=botsv1 sourcetype="stream:http"
| iplocation src_ip
| where isnotnull(Country)
| stats count by Country, City
| sort -count
| head 20
Geolocation analysis revealing Washington, D.C. as the top source of web traffic (17,547 requests) — traced to the attacker IP 40.80.148.42 — followed by Ashburn, Oakland, and other U.S. cities
Why dashboards matter: In a real SOC, analysts monitor dashboards continuously during their shifts. A well-designed dashboard surfaces anomalies immediately — the dominance of a single IP in the "Top Source IPs" panel (40.80.148.42 at ~17K requests vs ~1,500 for the next highest) is the kind of pattern that jumps out visually and triggers investigation.
I configure a scheduled alert that automatically checks for path traversal and local file inclusion attempts every hour:
Automated web attack alert configuration — detects path traversal, LFI, and command execution attempts via SPL pattern matching, scheduled to run every hour
Alert configuration:
- Title: Web Attack Detected - Path Traversal or LFI
- Search: `index=botsv1 sourcetype="stream:http" | search uri_path="*passwd*" OR uri_path="*../*" OR uri_path="*cmd.exe*" OR uri_path="*%C0%AF*" | stats count by src_ip | where count > 5`
- Schedule: Every hour
- Trigger: Number of results > 0
- Severity: High
Configured security alerts providing layered automated detection — web attacks (High), suspicious process execution (Medium), and new account creation (High) — all scheduled and enabled
| Alert Name | Condition | Severity | Schedule |
|---|---|---|---|
| Web Attack Detected - Path Traversal or LFI | >5 requests matching malicious URI patterns from a single IP | High | Every hour |
| Suspicious Process Execution | Unusual cmd.exe/powershell.exe by same account >10 times | Medium | Every hour |
| New Account Created | EventCode 4720 detected | High | Every hour |
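The web attack alert combines two thresholds: the search keeps only source IPs with more than five matching requests, and the trigger fires when that search returns at least one row. A minimal Python sketch of that two-stage logic follows, assuming events are dictionaries with `src_ip` and `uri_path` keys. The function name and the sample events (which use documentation-range IP addresses) are hypothetical.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Patterns from the alert search, lowercased for case-insensitive matching.
ALERT_PATTERNS = ["*passwd*", "*../*", "*cmd.exe*", "*%c0%af*"]


def evaluate_alert(events, per_ip_threshold=5):
    """Return {src_ip: count} for IPs with more than per_ip_threshold hits."""
    hits = Counter()
    for event in events:
        path = event.get("uri_path", "").lower()
        if any(fnmatchcase(path, p) for p in ALERT_PATTERNS):
            hits[event.get("src_ip", "unknown")] += 1
    return {ip: n for ip, n in hits.items() if n > per_ip_threshold}


# Hypothetical events: one noisy scanner, one client with only two hits.
events = (
    [{"src_ip": "203.0.113.10", "uri_path": "/etc/passwd"}] * 6
    + [{"src_ip": "198.51.100.7",
        "uri_path": "/scripts/..%C0%AF../winnt/system32/cmd.exe"}] * 2
    + [{"src_ip": "198.51.100.7", "uri_path": "/index.html"}] * 10
)

results = evaluate_alert(events)
triggered = len(results) > 0  # mirrors "Trigger: Number of results > 0"
print(results, triggered)
```

Keeping the per-IP threshold inside the search and the trigger condition at "more than zero results" means the alert fires once per scheduled run, with one result row per offending IP.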
Using the attacker IP identified in Part 3 (40.80.148.42), I correlate their activity across all log sources to reconstruct the full attack timeline:
index=botsv1 (src_ip="40.80.148.42" OR src="40.80.148.42")
| eval source_type=sourcetype
| timechart span=5m count by source_type
Cross-source correlation of attacker 40.80.148.42 — revealing 35,732 total events spanning HTTP stream data, IP/TCP connections, and Suricata IDS alerts, all clustered into a 45-minute attack window starting at 21:35 on August 10, 2016
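Conceptually, `timechart span=5m count by source_type` floors each event timestamp to the start of its 5-minute bucket and counts events per bucket and source type. The Python sketch below shows that bucketing under the assumption that the span divides evenly into an hour. The helper names and the four sample events are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_start(ts, span_minutes=5):
    """Floor a timestamp to the start of its bucket (span must divide 60)."""
    return ts - timedelta(
        minutes=ts.minute % span_minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def timechart_counts(events, span_minutes=5):
    """Count events per (bucket start, sourcetype) pair."""
    counts = Counter()
    for ts, sourcetype in events:
        counts[(bucket_start(ts, span_minutes), sourcetype)] += 1
    return counts


# Hypothetical events, not taken from the dataset.
events = [
    (datetime(2016, 8, 10, 21, 36, 12), "stream:http"),
    (datetime(2016, 8, 10, 21, 38, 59), "suricata"),
    (datetime(2016, 8, 10, 21, 39, 59), "stream:http"),
    (datetime(2016, 8, 10, 21, 40, 0), "stream:http"),
]
counts = timechart_counts(events)
for (bucket, sourcetype), n in sorted(counts.items()):
    print(bucket.strftime("%H:%M"), sourcetype, n)
```

An event at 21:39:59 lands in the 21:35 bucket while one at 21:40:00 starts the next, so a row labelled 21:35 in the timeline covers activity from 21:35:00 up to, but not including, 21:40:00.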
Correlating events across stream:http, stream:ip, stream:tcp, and suricata reveals the attacker's activity pattern:
| Time (UTC) | Phase | Primary Evidence | Activity |
|---|---|---|---|
| 21:35 | Reconnaissance | stream:http (2,512 events) + suricata (3,003 alerts) | Initial web scanning — IDS immediately detects attack patterns |
| 21:40 | Active Exploitation | stream:http (1,713) + suricata (2,880) | Path traversal + LFI payloads sent |
| 21:45 | Exploitation | stream:http (729) + suricata (1,653) | Attack continues, exploring attack surface |
| 21:50 | Peak Activity | stream:http (2,340) + suricata (4,237 alerts) | Attack intensifies — highest IDS alert volume |
| 21:55 | Exploitation | stream:http (2,207) + suricata (3,797) | Continued aggressive scanning |
| 22:00-22:10 | Persistence Attempts | stream:http (~1,500-1,800/5min) | TCP/IP layer activity stops, only HTTP continues |
| 22:15-22:20 | Winding Down | stream:http (~947-1,594) | Attack activity decreases |
Key investigative findings:
- Clear attack signature: The attacker generated more than 10,000 Suricata IDS alerts in the first 20 minutes (11,773 across the four 5-minute buckets from 21:35 to 21:50), an overwhelming volume that any SOC would catch
- Multi-layer detection: The same malicious activity appears across HTTP stream, TCP/IP stream, and IDS logs simultaneously — demonstrating defense-in-depth
- Attack duration: The entire attack spanned approximately 45 minutes, typical for automated scanning tools
- Attack scope: 35,732 total events from a single source IP — high signal-to-noise ratio for detection
- Attacker technique signature: Heavy use of UTF-8 overlong encoding (`%C0%AF`, `%E0%80%AF`) suggests an automated scanner or custom tooling rather than manual exploitation
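The alert volume can be checked directly against the timeline table. This short Python sketch tallies the Suricata counts listed for the five buckets where they are reported and keeps a running total. The counts are copied from the table; the variable names are my own.

```python
from itertools import accumulate

# Suricata alert counts per 5-minute bucket, copied from the timeline table.
suricata_by_bucket = {
    "21:35": 3003,
    "21:40": 2880,
    "21:45": 1653,
    "21:50": 4237,
    "21:55": 3797,
}

cumulative = list(accumulate(suricata_by_bucket.values()))
peak_bucket = max(suricata_by_bucket, key=suricata_by_bucket.get)

for (bucket, count), total in zip(suricata_by_bucket.items(), cumulative):
    print(f"{bucket}  alerts={count:>5}  running_total={total:>6}")
print("peak bucket:", peak_bucket)
```

The running total passes 10,000 in the 21:50 bucket, which is also the peak, and reaches 15,570 by the end of the 21:55 bucket.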
Why this matters: In real incident response, the ability to correlate events across multiple log sources and reconstruct an attack timeline is one of the most valuable skills an analyst can have. This investigation demonstrates the complete workflow: identify the attacker via anomaly detection (Part 3), correlate their activity across all available log sources, establish the timeline, and document the scope of the incident. Going from initial detection to full timeline reconstruction took fewer than 10 SPL queries.
A quick reference of all detection queries used throughout this project:
| Query Purpose | Key SPL |
|---|---|
| Event count by index | `\| eventcount summarize=false index=botsv1 \| table index, count` |
| Sourcetype distribution | `index=botsv1 \| stats count by sourcetype \| sort -count` |
| EventCode frequency | `index=botsv1 sourcetype="WinEventLog:Security" \| stats count by EventCode \| sort -count` |
| Process execution baseline | `index=botsv1 EventCode=4688 \| stats count by New_Process_Name \| sort -count` |
| Web attack detection | `index=botsv1 sourcetype="stream:http" \| search uri_path="*../*" OR uri_path="*passwd*"` |
| Top web source IPs | `index=botsv1 sourcetype="stream:http" \| stats count by src_ip \| sort -count` |
| Geolocation analysis | `index=botsv1 sourcetype="stream:http" \| iplocation src_ip \| stats count by Country, City` |
| Cross-source IP correlation | `index=botsv1 (src_ip="X" OR src="X") \| eval source_type=sourcetype \| timechart span=5m count by source_type` |
| Component | Version | Purpose |
|---|---|---|
| Ubuntu | 24.04 LTS | Guest operating system running Splunk (VirtualBox VM) |
| Splunk Enterprise | 10.2.2 | SIEM platform |
| BOTS v1 Dataset | 1.0 | Realistic attack scenario data (33.4M events) |
| VirtualBox | Latest | VM hypervisor |
This project demonstrates practical SIEM operations skills through six progressive exercises:
- Setup & Ingestion — Installed Splunk Enterprise 10.2.2 on Ubuntu 24.04, loaded the BOTS v1 dataset (33.4 million events across 26 source types), and verified successful ingestion
- SPL Fundamentals — Explored data structure using SPL queries, identifying key Windows EventCodes (4688, 4703, 4624) and understanding the dataset's composition
- Threat Detection — Wrote detection queries for process execution anomalies and web application attacks, identifying a single attacker IP (40.80.148.42) performing path traversal, LFI, and RCE attempts with encoding bypass techniques
- Security Dashboards — Built a four-panel SOC Overview dashboard plus geolocation analysis, visualizing the attacker's activity against normal baseline traffic
- Alert Rules — Configured three automated alerts (web attacks, suspicious process execution, new account creation) with severity-based triage and hourly scheduling
- Attack Investigation — Correlated 35,732 events from the attacker across HTTP stream, TCP/IP stream, and Suricata IDS logs, reconstructing a 45-minute attack timeline with 10,000+ IDS alerts demonstrating defense-in-depth detection
SIEM Operations · Splunk Administration · SPL Queries · Threat Detection · Log Analysis · Security Dashboards · Alert Engineering · Incident Investigation · Log Correlation · Attack Timeline Reconstruction
