This project analyzes wearable physiological sensor data using 6 feature extraction techniques and 3 feature selection methods. The reduced data is then evaluated for classification performance using 4 machine learning algorithms: KNN, Decision Trees, SVC, and Random Forest. The results show compression rates of up to 99.25% with an accuracy loss of 6.7%. The goal is to demonstrate that these techniques can achieve high data compression while preserving most of the classification accuracy.
Check out the video presentation here!
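As a rough illustration of the reduce-then-classify workflow described above, here is a minimal standard-library sketch. It is not the project's code: variance ranking stands in for the feature selection methods (which are not named here), a 1-nearest-neighbour classifier stands in for KNN, and the data, labels, and function names are all hypothetical.

```python
from statistics import pvariance


def select_top_variance(rows, k):
    """Return the indices of the k columns with the highest variance."""
    n_cols = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def project(rows, cols):
    """Keep only the selected columns of every row."""
    return [[row[j] for j in cols] for row in rows]


def predict_1nn(train_x, train_y, x):
    """Predict the label of the closest training sample (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train_x]
    return train_y[dists.index(min(dists))]


def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Hypothetical toy data: only the first feature separates the two classes.
train_x = [
    [0.0, 1.0, 0.5, 0.2],
    [0.2, 1.1, 0.5, 0.2],
    [5.0, 1.0, 0.4, 0.2],
    [5.2, 1.1, 0.5, 0.3],
]
train_y = ["rest", "rest", "active", "active"]
test_x = [[0.1, 1.0, 0.5, 0.2], [5.1, 1.1, 0.4, 0.3]]
test_y = ["rest", "active"]

cols = select_top_variance(train_x, k=1)
full_acc = accuracy(train_x, train_y, test_x, test_y)
reduced_acc = accuracy(project(train_x, cols), train_y, project(test_x, cols), test_y)

print(f"kept columns: {cols}")
print(f"accuracy loss: {full_acc - reduced_acc:.3f}")
```

On this toy data the reduced set keeps one of four columns with no accuracy loss. Real sensor data will generally trade some accuracy for compression, as the figures reported above indicate.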
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── readme-assets      <- Resources used in README.md.
├── data
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   ├── figures        <- Generated graphics and figures to be used in reporting
│   └── presentation   <- Presentation for reporting experimental findings
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
└── src                <- Source code for use in this project.
    ├── __init__.py    <- Makes src a Python module
    │
    ├── data           <- Scripts to download or generate data
    │   └── make_dataset.py
    │
    ├── calculations   <- Scripts to calculate statistics
    │   └── calculate.py
    │
    ├── features       <- Scripts to turn raw data into features for modeling
    │   └── build_features.py
    │
    ├── models         <- Scripts to train and evaluate models
    │   └── train_and_evaluate_model.py
    │
    └── visualization  <- Scripts to create exploratory and results-oriented visualizations
        └── visualize.py
Before you begin, ensure you have met the following requirements:
- You have a `Linux/Mac/Windows` machine.
- You have installed a `python` distribution.
- You have installed `pip`.
- You have installed `make`.
- Clone the repository.
git clone https://github.com/himalayasharma/data-compression-using-dimensionality-reduction.git
- Change into the project directory.
- Create virtual environment.
make create_environment
- Activate virtual environment.
- Download and install all required packages.
make requirements
- Download and process physiological sensor dataset.
make data
- Build new set of features after dimensionality reduction.
make build_features
- Calculate the required statistics (compression ratio, space saving, etc.).
make calculate
- Train and evaluate models.
make train_and_evaluate
- Generate plots.
make plot
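For orientation, the two statistics named in the calculation step can be sketched as follows. This uses the standard definitions from the Wikipedia article listed in the references and is not taken from `calculate.py`. The function names and the feature counts are hypothetical, and counting features is used here as a stand-in for data size.

```python
def compression_ratio(uncompressed_size, compressed_size):
    """Compression ratio = uncompressed size / compressed size."""
    return uncompressed_size / compressed_size


def space_saving(uncompressed_size, compressed_size):
    """Space saving = 1 - compressed size / uncompressed size."""
    return 1 - compressed_size / uncompressed_size


# Hypothetical example: 400 original features reduced to 3.
original_features = 400
reduced_features = 3

ratio = compression_ratio(original_features, reduced_features)
saving = space_saving(original_features, reduced_features)

print(f"compression ratio: {ratio:.2f}")  # 133.33
print(f"space saving: {saving:.2%}")      # 99.25%
```

Note that a space saving of 99.25% corresponds to a compression ratio of roughly 133:1, so the two numbers describe the same reduction on different scales.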
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. If you have a suggestion that would make this better, please fork the repo and create a pull request. Don't forget to give the project a star! Thanks again!
- Fork this repository.
- Create a branch: `git checkout -b <branch_name>`.
- Make your changes and commit them: `git commit -m '<commit_message>'`
- Push to the original branch: `git push origin <project_name>/<location>`
- Create the pull request.
Alternatively see the GitHub documentation on creating a pull request.
Distributed under the MIT License. See `LICENSE` for more information.
- Mohino-Herranz I, Gil-Pita R, Rosa-Zurera M, Seoane F. Activity Recognition Using Wearable Physiological Measurements: Selection of Features from a Comprehensive Literature Study. Sensors (Basel). 2019 Dec 13;19(24):5524. doi: 10.3390/s19245524. PMID: 31847261; PMCID: PMC6960825.
- Compression ratio. (2022, June 2). In Wikipedia. https://en.wikipedia.org/wiki/Compression_ratio