Collaborative Data Analysis for All
ColDA is an open-source project that provides distributed machine learning tools for collaborative data analysis, based on Assisted Learning.
- Algorithm
- Frontend
- Backend
- Package
The project uses Gradient Assisted Learning as the fundamental algorithm for collaboratively training distributed models.
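Roughly speaking, the gradient-assisted-learning loop works like gradient boosting across organizations: in each round the sponsor broadcasts pseudo-residuals, every participant fits a local model on its own features, and the sponsor aggregates the local fits into the shared predictor. The following is a minimal sketch for squared loss, not ColDA's actual implementation; the synthetic data, model choice, and equal-weight aggregation are illustrative assumptions.
# Minimal sketch of gradient assisted learning for squared loss (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
sponsor_X = rng.normal(size=(n, 3))   # features held by the sponsor
assistor_X = rng.normal(size=(n, 3))  # features held by an assistor
y = sponsor_X @ [1.0, -2.0, 0.5] + assistor_X @ [0.5, 1.0, -1.0] + rng.normal(scale=0.1, size=n)

prediction = np.full(n, y.mean())     # start from a constant predictor
learning_rate = 1.0
for round_idx in range(5):
    residual = y - prediction                      # pseudo-residuals for squared loss
    local_fits = []
    for X in (sponsor_X, assistor_X):              # each organization fits only its own features
        model = LinearRegression().fit(X, residual)
        local_fits.append(model.predict(X))
    # the sponsor aggregates the local fits and takes a boosting-style step
    prediction = prediction + learning_rate * np.mean(local_fits, axis=0)
    print(f"round {round_idx}: MSE = {np.mean((y - prediction) ** 2):.4f}")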
- Use `data/make_dataset.py` to split CSV files
- Use the command in `run_[dataset]_[number_of_sponsor]s_[number of assistor]a.sh` to run experiments
- Files ending with `_exe.py` are local operations
- `baseline.py` produces baseline results on joint datasets
- `make_train_local.py` produces baseline results on joint datasets
- `make_hash.py` uses `sha256` to encode identification for alignment (see the sketch after this list)
- `save_match_id.py` saves hash results
- `make_match_idx.py` matches identification with hash results
- `make_residual.py` computes residuals
- `save_residual.py` saves residuals
- `make_train.py` locally fits the residuals
- `save_output.py` saves outputs of trained models
- `make_result.py` produces aggregated results
- `make_test.py` produces inference results
- `make_eval.py` evaluates inference results
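As a rough illustration of the alignment step, hashing record identifiers with `sha256` lets parties match shared rows without exchanging raw IDs. The snippet below is a minimal sketch, not the code in `make_hash.py`; the example identifiers are made up.
# Minimal sketch of sha256-based ID alignment (illustrative, not make_hash.py itself).
import hashlib

def hash_ids(ids):
    # map sha256(identifier) -> row index, so only hashes need to be shared
    return {hashlib.sha256(str(i).encode("utf-8")).hexdigest(): idx for idx, i in enumerate(ids)}

sponsor = hash_ids(["alice", "bob", "carol"])
assistor = hash_ids(["bob", "carol", "dave"])

# row indices of the shared records on each side, matched through the hashes
shared = sorted((sponsor[h], assistor[h]) for h in sponsor.keys() & assistor.keys())
print(shared)  # [(1, 0), (2, 1)]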
conda create --name myenv python --no-default-packages
conda activate myenv
pip install pyinstaller
pip install numpy
pip install -U scikit-learn
cd algorithm
pyinstaller run.spec # To one folder
pyinstaller -F run.py # To one file
Run the following commands to launch the software for the first time:
sudo apt install npm
# update node
sudo npm cache clean -f
sudo npm install -g n
sudo n stable
PATH="$PATH"
sudo snap install vue
npm install
npm run electron:serve
./node_modules/.bin/electron-rebuild # If there is a bug on Windows, use: .\node_modules\.bin\electron-rebuild
Run the following commands to launch the software after the first time:
npm install
npm run electron:serve
Run the following commands to package the software:
npm install
npm run electron:build
Run the following command to run unit tests:
npm run test
- `Navbar.vue` presents the software navigation bar; the communication between the software and the backend is mainly completed by the functions in this file
- The `assets` folder contains image, font, and CSS resources used in the software
- The `components` folder contains reusable interface components
- The `network` folder contains request sending and interception configuration
- The `router` folder contains routing configuration files
- The `store` folder is used for storing some local information
- The `Notifications` folder contains functions that handle notifications and history
- The `Auth` folder contains functions that handle user registration and login
- The `Settings` folder contains functions that handle user-customized settings
- The `tests` folder contains unit test functions
- Launch procedures (a minimal `application.py` sketch follows this list):
  - export FLASK_APP=application.py (only needed the first time you clone the repository)
  - pipenv install
  - pipenv shell
  - flask run
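Here `application.py` is the Flask entry point that `FLASK_APP` points to. The sketch below is hypothetical and only illustrates the shape of such a file; the actual ColDA backend defines its own routes.
# Hypothetical minimal application.py; the /health route is illustrative only.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # simple liveness check
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run()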
- Unittest:
  - flask test (tests all files; use this command at the top level of the repository)
  - Note: you could switch the test framework to pytest, which is more convenient (a pytest sketch follows this list)
  - Note: tests/test_unread_test_output.py contains most of the logic for your reference
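If you do switch to pytest, a route test might look like the sketch below. It assumes the hypothetical `/health` endpoint from the `application.py` sketch above rather than an actual ColDA route.
# Hypothetical pytest example for a Flask route (assumes the /health sketch above).
import pytest
from application import app

@pytest.fixture
def client():
    app.config["TESTING"] = True
    with app.test_client() as test_client:
        yield test_client

def test_health(client):
    response = client.get("/health")
    assert response.status_code == 200
    assert response.get_json() == {"status": "ok"}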
- Deploy:
  - Install some dependencies first, following this
  - heroku login (use the username and password in the Google Drive key file)
  - git add .
  - git commit -m 'Commit_Name'
  - git push
  - git push heroku Current_branch_name
  - heroku open (view our app)
- Examples and instructions can be found in `examples/`
- The basic package structure can be found in the GitHub repository
- Compared to the basic package structure, `docs/` will contain different elements. But at this point, you can follow the template
- `py-pkg` is the main part of the package; you can add more modules (each with an `__init__.py`) in this part (a short sketch follows this list). For example, if you add a `temp` module, you can import it by (a hyphen is not valid in a Python import name, so the importable package name uses an underscore):
from py_pkg import temp
- This package structure can be improved by learning from the PyTorch package structure.
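For instance, a hypothetical `temp` module could be added and exposed like this; the file contents are placeholders and only the layout follows the template.
# py_pkg/temp.py -- a hypothetical new module
def greet(name):
    """Return a greeting string (placeholder functionality)."""
    return f"Hello, {name}!"

# py_pkg/__init__.py -- re-export the module so `from py_pkg import temp` works
from . import temp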
- Basic Structure:
py-package-tempate/
|-- docs/
|-- |-- build_html/
|-- |-- build_latex/
|-- |-- source/
|-- py-pkg/
|-- |-- __init__.py
|-- |-- __version__.py
|-- |-- curves.py
|-- |-- entry_points.py
|-- tests/
|-- |-- test_data/
|-- |-- |-- supply_demand_data.json
|-- |-- __init__.py
|-- |-- conftest.py
|-- |-- test_curves.py
|-- .env
|-- .gitignore
|-- Pipfile
|-- Pipfile.lock
|-- README.md
|-- setup.py
`pipenv` is used to manage packages. You can install `pipenv` by:
pip3 install pipenv
- Use `pipenv` to install packages. The first command installs the package for development; the second installs it for production.
pipenv install --dev
pipenv install
- Use `pipenv` to uninstall a package:
pipenv uninstall
- Enter a Pipenv-managed shell. Remember to do this every time before running the project:
cd py-package-tempate
pipenv install
pipenv shell
ColDA is licensed under the Apache 2.0 License.
Please review and adhere to the Code of Conduct when contributing to ColDA.
Please use the following reference:
@article{diao2022gal,
title={GAL: Gradient Assisted Learning for Decentralized Multi-Organization Collaborations},
author={Diao, Enmao and Ding, Jie and Tarokh, Vahid},
journal={Advances in Neural Information Processing Systems},
volume={35},
pages={11854--11868},
year={2022}
}