A small, production-style data engineering pipeline written in Scala 3. The aim is to demonstrate clear functional design, separation of concerns, testing discipline, and general engineering practices.
This project provides:
- A modern Cats / Cats Effect codebase
- A streaming pipeline built with fs2
- Clean domain modelling and algebras
- Configurable and testable application structure
- Unit and end-to-end tests
- A VS Code dev container for consistent tooling
The pipeline ingests raw CSV transactions, validates the input, aggregates the results, and writes daily metrics to an output file in JSONL format. Everything is built from pure, testable components with a clear separation of concerns.
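The parse and validate stages can be sketched in plain Scala 3. The type and function names (`RawTransaction`, `Transaction`, `parseRow`, `validateRow`) follow the diagram, but the three-column schema (`customerId,date,amount`) is an assumption standing in for the project's actual layout:

```scala
import java.time.LocalDate
import scala.util.Try

// Assumed schema: customerId,date,amount — illustrative, not the real columns.
final case class RawTransaction(customerId: String, date: String, amount: String)
final case class Transaction(customerId: String, date: LocalDate, amount: BigDecimal)

// CSV line -> RawTransaction; rows with the wrong number of columns are rejected.
def parseRow(line: String): Option[RawTransaction] =
  line.split(",", -1).map(_.trim) match
    case Array(id, date, amount) => Some(RawTransaction(id, date, amount))
    case _                       => None

// RawTransaction -> Transaction; rows failing validation are dropped.
def validateRow(raw: RawTransaction): Option[Transaction] =
  for
    date   <- Try(LocalDate.parse(raw.date)).toOption
    amount <- Try(BigDecimal(raw.amount)).toOption
    if raw.customerId.nonEmpty
  yield Transaction(raw.customerId, date, amount)
```

In the real codebase these functions run inside an fs2 stream (and validation failures are logged rather than silently discarded); `Option` keeps the sketch dependency-free.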
```
     ┌──────────────────────────┐
     │     CSV File (input)     │
     └────────────┬─────────────┘
                  │
                  v
┌────────────────────────────────────┐
│       FileTransactionSource        │
│     (stream file line by line)     │
└─────────────────┬──────────────────┘
                  │
                  v
┌────────────────────────────────────┐
│              parseRow              │
│        CSV → RawTransaction        │
└─────────────────┬──────────────────┘
                  │
                  v
┌────────────────────────────────────┐
│            validateRow             │
│    RawTransaction → Transaction    │
│ (logs errors, drops invalid rows)  │
└─────────────────┬──────────────────┘
                  │
                  v
┌────────────────────────────────────┐
│        Aggregator.aggregate        │
│        group, sum, count by        │
│          (customer, date)          │
└─────────────────┬──────────────────┘
                  │
                  v
┌────────────────────────────────────┐
│          FileMetricsSink           │
│    write daily metrics as JSONL    │
└─────────────────┬──────────────────┘
                  │
                  v
     ┌──────────────────────────┐
     │   Output File (JSONL)    │
     │   daily_metrics.jsonl    │
     └──────────────────────────┘
```
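The aggregation and sink steps above can be sketched library-free. In the real codebase they run inside an fs2 stream; the `DailyMetrics` fields and the JSONL key names below are assumptions, not the project's actual output shape:

```scala
import java.time.LocalDate

final case class Transaction(customerId: String, date: LocalDate, amount: BigDecimal)

// One output row per (customer, date): total amount and transaction count.
final case class DailyMetrics(customerId: String, date: LocalDate, total: BigDecimal, count: Long)

// Group by (customer, date), then sum amounts and count rows per group.
def aggregate(txs: List[Transaction]): List[DailyMetrics] =
  txs
    .groupBy(t => (t.customerId, t.date))
    .map { case ((customer, date), group) =>
      DailyMetrics(customer, date, group.map(_.amount).sum, group.size.toLong)
    }
    .toList
    .sortBy(m => (m.customerId, m.date.toString))

// Render one metrics row as a JSONL line (hypothetical field names).
def toJsonl(m: DailyMetrics): String =
  s"""{"customerId":"${m.customerId}","date":"${m.date}","total":${m.total},"count":${m.count}}"""
```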
This project uses a VS Code Dev Container to ensure a consistent, reproducible development environment for all contributors. The container provides:
- a pre-configured Docker-based image
- the correct versions of Java, sbt, Scala, and Metals
- a predictable build environment regardless of host OS
- isolation from local machine configuration issues

This means every developer runs the same toolchain, avoiding the usual “works on my machine” problems.
The project includes:
- Unit tests for:
  - validation logic
  - aggregator business rules
  - parsing behaviour
  - file source and sink
- Pipeline tests using in-memory stubs
- End-to-end tests running the real file-based pipeline
- A dev container for reproducible builds
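The in-memory-stub approach can be sketched effect-free. The names below (`TransactionSource`, `MetricsSink`, `InMemorySource`, `InMemorySink`) are illustrative stand-ins for the project's actual algebras, with `List` and a buffer standing in for fs2 streams and `IO`:

```scala
// Effect-free stand-ins for the source and sink algebras, so pipeline wiring
// can be exercised without touching the filesystem.
trait TransactionSource:
  def lines: List[String]

trait MetricsSink:
  def write(line: String): Unit

final class InMemorySource(input: List[String]) extends TransactionSource:
  def lines: List[String] = input

final class InMemorySink extends MetricsSink:
  val written = scala.collection.mutable.ListBuffer.empty[String]
  def write(line: String): Unit = written += line

// A toy pipeline: keep well-formed 3-column rows, drop the rest
// (a hypothetical validation rule, just to show the wiring).
def runPipeline(source: TransactionSource, sink: MetricsSink): Unit =
  source.lines
    .filter(_.split(",", -1).length == 3)
    .foreach(sink.write)
```

A test then swaps in the stubs and asserts on the captured output, with no temporary files or real I/O involved.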
Run the pipeline with:

```
sbt run
```

or inside the VS Code dev container:

```
sbt test
sbt run
```
The output metrics are written to the configured JSONL output file.
- Generalise the pipeline from `IO` to any `F[_]: Async` to show deeper FP abstraction.
- Align naming between the `interfaces` directory and the `algebra` package for clarity.
- Add summary metrics (processed rows, validation failures, dropped lines) for better observability.
- Make malformed-row handling configurable (drop, halt, or send to a separate sink).
- Add property-based tests for the `Aggregator` to demonstrate stronger correctness guarantees.
- Extend end-to-end tests to cover multiple customers, multiple days, and mixed valid/invalid data.
- Add a second output sink (for example a snapshot file) to demonstrate fan-out.
- Include a simple CI pipeline (formatting, tests) for completeness.
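For the property-based testing item, one key invariant is easy to state: aggregation must preserve the grand total. A hand-rolled, seeded check sketches the idea (in practice ScalaCheck would generate the cases; all names and the schema here are hypothetical):

```scala
import java.time.LocalDate
import scala.util.Random

// Minimal self-contained model of the aggregator's grouping step.
final case class Transaction(customerId: String, date: LocalDate, amount: BigDecimal)

def totalsByKey(txs: List[Transaction]): Map[(String, LocalDate), BigDecimal] =
  txs.groupBy(t => (t.customerId, t.date))
    .view.mapValues(_.map(_.amount).sum).toMap

// Property: summing the per-key totals recovers the sum of all inputs,
// for randomly generated transaction lists (including the empty list).
def sumPreserved(seed: Long, rounds: Int = 100): Boolean =
  val rnd = new Random(seed)
  (1 to rounds).forall { _ =>
    val txs = List.fill(rnd.nextInt(50)) {
      Transaction(
        customerId = s"c${rnd.nextInt(5)}",
        date       = LocalDate.ofEpochDay(19700 + rnd.nextInt(10)),
        amount     = BigDecimal(rnd.nextInt(1000)) / 100
      )
    }
    totalsByKey(txs).values.sum == txs.map(_.amount).sum
  }
```

ScalaCheck would express the same invariant as a `forAll` property with shrinking on failure, which is what makes the "stronger correctness guarantees" claim concrete.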