RWTH-HPC/io-transfer-learning

I/O Bandwidth Prediction using Transfer Learning

Directory structure:

+-- sourcecode
|   +-- blue_waters
|   |   +-- cross-validation
|   |   |   +-- Cross_Validation_Baseline_Random_Model_One_Fold.py
|   |   |   +-- cross-validation-baseline-random-model.sh
|   |   |   +-- Cross_Validation_Filtered_by_NProcs.py
|   |   |   +-- cross-validation-one-fold-per-job-filtered-nprocs.sh
|   |   |   +-- cross-validation-one-fold-per-job.sh
|   |   |   +-- Cross_Validation_Run_One_Fold_Full_Dataset.py
|   |   |   +-- Split_Cross-Val_Folds_by_Seed.py
|   |   +-- darshan_files_parsing
|   |   |   +-- Blue_Waters_CSV_Merging_POSIX.py
|   |   |   +-- blue-waters-darshan-files-parsing-home-posix-batch.sh
|   |   |   +-- blue-waters-darshan-files-parsing-thes-work-posix-batch.sh
|   |   |   +-- blue-waters-darshan-files-parsing-user-work-posix-batch.sh
|   |   |   +-- Blue_Waters_Data_Files_Counter.py
|   |   |   +-- Blue_Waters_Data_Parsing_all_dirs_at_once_POSIX.py
|   |   |   +-- blue-waters-merge-csv.sh
|   |   |   +-- parallel_unzip.sh
|   |   |   +-- Unzip_Blue_Water_Files.py
|   |   +-- SmoothL1Loss_Bigger_Batch_Size_Filtered_by_NProcs.py
|   |   +-- SmoothL1Loss_Bigger_Batch_Size_Full_Dataset.py
|   +-- claix
|   |   +-- augmented_data
|   |   |   +-- Claix_Fine-tune_Pre-trained_Model_Augmented_Data.ipynb
|   |   |   +-- Claix_Generate_Augmented_Data.ipynb
|   |   +-- Claix_Fine-tune_Pre-trained_Model_Filtered_NProcs.ipynb
|   |   +-- Claix_Fine-tune_Pre-trained_Model_Full.ipynb
|   |   +-- cross-validation
|   |   |   +-- Claix_Cross_Validate_Baseline_Random_Model.py
|   |   |   +-- Claix_Cross_Validate_Fine-tuning_of_Pre-trained_Model_Filtered_NProcs.py
|   |   |   +-- Claix_Cross_Validate_Fine-tuning_of_Pre-trained_Model_Full_Dataset.py
|   |   |   +-- Claix_Cross_Validate_Training_of_Model_From_Scratch.py
|   |   |   +-- nn-cross-validation-c16g-filtered-nprocs.sh
|   |   |   +-- nn-cross-validation-c16g-full-dataset.sh
|   |   |   +-- nn-from-scratch-cross-validation.sh
|   |   +-- darshan_files_parsing
|   |   |   +-- Claix_CSV_Merging_POSIX.py
|   |   |   +-- Claix_Darshan_Data_Files_Counter.py
|   |   |   +-- Claix_Data_Parsing_all_dirs_at_once_POSIX.py
|   |   +-- feature_importance_attribution
|   |   |   +-- Claix_Evaluate_Pre-trained_Model_Filtered_Nprocs_all_attribution_algorithms.ipynb
|   |   |   +-- Claix_Evaluate_Pre-trained_Model_Full_all_attribution_algorithms.ipynb
|   +-- visualization
|   |   +-- data
|   |   +-- paper_visualization.ipynb

General instructions

  • None of the Darshan logs are stored in this repository. To reproduce this work, users need to download the Blue Waters Darshan logs and generate their own Darshan logs on the target cluster.
  • The scripts are divided into two groups: those that build the base model from the Blue Waters dataset and those that adapt it to the target cluster (CLAIX) dataset. Cross-validation scripts are provided for both the base model and the transfer learning model.
  • The directory paths in the scripts need to be adjusted to point to the user's own directories.
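Because the paths are set inside the individual scripts, one way to keep these adjustments in a single place is to read them from environment variables with fallback defaults. The following is a minimal sketch under that assumption and is not part of the repository; the variable names (`BLUE_WATERS_LOGS`, `CLAIX_LOGS`, `OUTPUT_DIR`) and default locations are hypothetical placeholders.

```python
import os
from pathlib import Path

# Hypothetical variable names and default locations; the repository's
# scripts define their own paths, which still have to be edited by hand.
DEFAULTS = {
    "BLUE_WATERS_LOGS": "~/data/blue_waters/darshan_logs",
    "CLAIX_LOGS": "~/data/claix/darshan_logs",
    "OUTPUT_DIR": "~/io-transfer-learning/output",
}


def resolve_dirs(env=None):
    """Return absolute Paths, preferring environment overrides over defaults."""
    env = os.environ if env is None else env
    return {
        name: Path(env.get(name, default)).expanduser().resolve()
        for name, default in DEFAULTS.items()
    }


if __name__ == "__main__":
    # Print the directories a script would use on this machine.
    for name, path in resolve_dirs().items():
        print(f"{name}: {path}")
```

For example, running `CLAIX_LOGS=/scratch/me/darshan python script.py` would override only the CLAIX log directory while the other two fall back to their defaults.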

Reproducing the work

The work is described in detail in the following Master's thesis:

D. Povaliaiev, “Transfer learning workflow for I/O bandwidth prediction,” M.S. thesis, RWTH Aachen University, Aachen, 2023. https://publications.rwth-aachen.de/record/958007.