
Multi Scale Time Series Clustering

Time-series data is a collection of data points recorded over time, each associated with a specific timestamp. This form of data is prevalent in fields such as finance, economics, meteorology, healthcare, energy, telecommunications, and transportation. Current algorithms typically assume that all time series share the same scaling (temporal resolution), but real-world time series often come at different scalings, e.g. hourly, daily, or weekly weather forecasts. This project focuses on developing a clustering algorithm that can handle time series with different scalings.
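
To make the scaling problem concrete, the sketch below brings an hourly and a daily series onto a common daily resolution by averaging. It is only an illustration using the standard library: the function name `aggregate_to_daily` and the sample data are assumptions, not code from this repository, and mean aggregation is just one possible way to align scalings.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def aggregate_to_daily(series):
    """Average all observations that fall on the same calendar day.

    `series` is a list of (datetime, float) pairs at any resolution.
    Returns a list of (date, mean_value) pairs sorted by date.
    """
    buckets = defaultdict(list)
    for timestamp, value in series:
        buckets[timestamp.date()].append(value)
    return [(day, mean(values)) for day, values in sorted(buckets.items())]


# Hypothetical hourly series: 48 points covering two days.
start = datetime(2024, 1, 1)
hourly = [(start + timedelta(hours=h), float(h)) for h in range(48)]

# Hypothetical daily series covering the same two days.
daily = [(start + timedelta(days=d), 10.0 * d) for d in range(2)]

# After aggregation both series have one value per day and can be compared.
hourly_as_daily = aggregate_to_daily(hourly)
daily_as_daily = aggregate_to_daily(daily)
```

Once both series share the same resolution, any standard distance measure for equally sampled series can be applied to them.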

Overview

This repository contains code and documentation for my bachelor thesis on clustering time-series data with different scalings. The project focuses on developing clustering algorithms that are robust to the varying temporal resolutions found in real-world data (e.g., hourly, daily, weekly).

Objectives

Develop clustering algorithms: Create/apply methods that effectively cluster time-series data with different scalings.
Evaluate performance: Test and validate the algorithms on various datasets.
(Optional Extension): Generate graphs from time-series data, apply clustering algorithms to the graphs, and compare the results with the time series clustering using similarity measures.

Project Structure

.
├── data/                               # Sample datasets or links to data sources
│   ├── ts_demo_data_clean.csv          # Synthetic demo data (prototype mode)
│   ├── ts_demo_data_corrupted.csv      # Modified, faulty data (prototype mode)
│   └── restored/                       # Data restored by various means (e.g., interpolation)
│       ├── ts_demo_data_<method>.csv   # Data restored with the given interpolation method
│       └── ...
├── docs/                               # Documentation and thesis drafts
├── notebooks/                          # Jupyter notebooks for exploratory analysis
├── src/                                # Source code (algorithms, utility functions)
│   ├── config.py                       # Stores essential parameters and constants
│   ├── data_corruption.py              # Module for synthetic dataset corruption
│   ├── data_generation.py              # Module for synthetic dataset generation
│   ├── data_restoration.py             # Module for restoring data by various means
│   ├── main.py                         # Main script with mode selection
│   └── project_utilities.py            # Helper utilities for the project
├── experiments/                        # Scripts and logs from experimental runs
│   ├── distance_matrices/              # Exported dissimilarity/distance matrices used for clustering
│   ├── logs/                           # Log files from various experimental operations
│   │   └── interpolations/
│   └── plots/                          # Plots from various experimental operations
│       ├── clustering/
│       └── interpolations/
└── README.md                           # Project overview and instructions

Installation/Usage

Clone the repository:

git clone https://github.com/QuirkyCroissant/Multi-Scale-Time-Series-Clustering

Create and activate a virtual environment:

python -m venv env
source env/bin/activate

Install dependencies:
```
pip install -r requirements.txt
```
Run the project by specifying the mode and any optional flags:
```
# For demo (prototype) mode with new synthetic data and various plots:
python src/main.py --mode demo --new_data --comp_img

# For production mode:
python src/main.py --mode prod
```
Available Command-Line Flags:
- --mode: Required. Choose demo for synthetic dataset generation, corruption, restoration, and clustering; or prod for processing a pre-specified dataset.
- --new_data: Optional (demo mode only). Generates new synthetic data (clean and corrupted). Cannot be used with production mode.
- --comp_img: Optional. Saves comparison plots of the time series at various pipeline stages (e.g., clean vs. corrupted, and clean vs. interpolated).
- --restore: Optional. Aggregates, interpolates, and saves the faulty input data that will be used for clustering (saved in data/restored).
- --dist: Optional. Computes and saves the dissimilarity measure (saved in experiments/distance_matrices).
- --normalized: Optional. Runs the application in normalized or non-normalized mode, depending on the user's clustering use case (shape-based or scale-based clustering).
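
For orientation, the flags above could be wired up with `argparse` roughly as follows. This is a hypothetical sketch, not the actual contents of `src/main.py`: it assumes every optional flag is a plain boolean switch, and the helper names `build_parser` and `parse_args` are made up for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented above."""
    parser = argparse.ArgumentParser(
        description="Multi-scale time series clustering"
    )
    parser.add_argument("--mode", required=True, choices=["demo", "prod"])
    # Assumption: all optional flags are simple on/off switches.
    parser.add_argument("--new_data", action="store_true")
    parser.add_argument("--comp_img", action="store_true")
    parser.add_argument("--restore", action="store_true")
    parser.add_argument("--dist", action="store_true")
    parser.add_argument("--normalized", action="store_true")
    return parser


def parse_args(argv=None):
    """Parse the arguments and enforce the documented demo-only rule."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # --new_data is documented as demo-only, so reject it in production mode.
    if args.new_data and args.mode == "prod":
        parser.error("--new_data cannot be used with --mode prod")
    return args


# Equivalent to: python src/main.py --mode demo --new_data --comp_img
args = parse_args(["--mode", "demo", "--new_data", "--comp_img"])
```

With this layout, combining `--mode prod` with `--new_data` exits with a usage error instead of silently ignoring the flag.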

Application Runtime Workflow

The flowchart below summarizes the main pipelines of the project:

Figure: Overall Project Pipeline Flowchart
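
The restoration and distance stages of such a pipeline can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: it uses linear interpolation for gaps and the Euclidean distance as the dissimilarity measure, neither of which is confirmed to be what the repository uses, and the names `interpolate_linear` and `distance_matrix` are hypothetical.

```python
import math


def interpolate_linear(values):
    """Fill None gaps by linear interpolation between the nearest known points.

    Leading or trailing gaps are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    restored = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            restored[i] = values[right]
        elif right is None:
            restored[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            restored[i] = values[left] + weight * (values[right] - values[left])
    return restored


def distance_matrix(series_list):
    """Pairwise Euclidean distances between equally long series (assumed measure)."""
    return [[math.dist(a, b) for b in series_list] for a in series_list]


# Hypothetical corrupted input: None marks a missing observation.
corrupted = [
    [0.0, None, 2.0, 3.0],
    [0.0, 1.0, None, None],
    [5.0, 5.0, 5.0, None],
]
restored = [interpolate_linear(s) for s in corrupted]
matrix = distance_matrix(restored)
```

The resulting symmetric matrix is the kind of input a distance-based clustering method would consume.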

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact & Acknowledgements

Supervisor: Ass.-Prof. Dott.ssa Dott.ssa.mag. Yllka Velaj, PhD
Student: Florian Hajek

Thank you to everyone who contributed to this project!
