Collaborative Data Analysis for All
ColDA is an open source project that provides distributed machine learning tools for collaborative data analysis, based on Assisted Learning.
- Algorithm
- Frontend
- Backend
- Package
The project uses Gradient Assisted Learning as the fundamental algorithm for collaboratively training distributed models.
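To make the idea concrete, here is a minimal sketch of one way a gradient-assisted training loop can look. It is an illustration under simplifying assumptions, not the project's implementation: it uses squared-error loss, gives each organization a single feature column, averages the returned outputs uniformly, and uses a fixed step size. The names `fit_line`, `gal_sketch`, and `mse` are hypothetical.

```python
from statistics import mean


def fit_line(x, r):
    """Least-squares fit r ~ a * x + b using one feature column."""
    mx, mr = mean(x), mean(r)
    var = sum((xi - mx) ** 2 for xi in x)
    a = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / var if var else 0.0
    return a, mr - a * mx


def gal_sketch(columns, y, rounds=20, step=0.5):
    """columns: one feature list per organization; y: the sponsor's labels."""
    pred = [mean(y)] * len(y)  # start from a constant prediction
    for _ in range(rounds):
        # Sponsor: pseudo-residuals, i.e. the negative gradient of 0.5 * squared error.
        residual = [yi - pi for yi, pi in zip(y, pred)]
        # Each organization fits the residuals using only its own column.
        fits = [fit_line(x, residual) for x in columns]
        outputs = [[a * xi + b for xi in x] for (a, b), x in zip(fits, columns)]
        # Sponsor: aggregate the returned outputs (uniform average, an assumption).
        aggregated = [mean(sample) for sample in zip(*outputs)]
        pred = [pi + step * gi for pi, gi in zip(pred, aggregated)]
    return pred


def mse(y, pred):
    """Mean squared error between labels and predictions."""
    return mean((yi - pi) ** 2 for yi, pi in zip(y, pred))


# Toy data: two organizations, one column each; labels held by the sponsor.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
y = [2 * a - b + 1 for a, b in zip(x1, x2)]
pred = gal_sketch([x1, x2], y)
baseline = [mean(y)] * len(y)
```

Only residuals and fitted outputs cross organizational boundaries in this sketch; the feature columns stay local to each organization.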
- Use data/make_dataset.py to split CSV files
- Use the commands in run_[dataset]_[number_of_sponsor]s_[number of assistor]a.sh to run experiments
- Files ending with _exe.py are local operations
- baseline.py produces baseline results on joint datasets
- make_train_local.py produces baseline results on joint datasets
- make_hash.py uses sha256 to encode identification for alignment
- save_match_id.py saves hash results
- make_match_idx.py matches identification with hash results
- make_residual.py computes residuals
- save_residual.py saves residuals
- make_train.py locally fits the residuals
- save_output.py saves outputs of trained models
- make_result.py produces aggregated results
- make_test.py produces inference results
- make_eval.py evaluates inference results
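The hashing and matching steps listed above can be sketched as follows. This is a minimal illustration, not the code in make_hash.py or make_match_idx.py: the function names `hash_ids` and `match_indices` and the sample identifiers are hypothetical, and a real deployment would need to decide how identifiers are normalized before hashing.

```python
import hashlib


def hash_ids(ids):
    """Encode each identifier with SHA-256 so raw IDs are not exchanged."""
    return [hashlib.sha256(str(i).encode("utf-8")).hexdigest() for i in ids]


def match_indices(own_hashes, other_hashes):
    """Return (own_index, other_index) pairs for hashes present on both sides."""
    other_pos = {h: j for j, h in enumerate(other_hashes)}
    return [(i, other_pos[h]) for i, h in enumerate(own_hashes) if h in other_pos]


# Hypothetical identifiers held by a sponsor and an assistor.
sponsor_hashes = hash_ids(["a1", "b2", "c3", "d4"])
assistor_hashes = hash_ids(["c3", "x9", "a1"])
pairs = match_indices(sponsor_hashes, assistor_hashes)
```

Each pair gives the row positions of the same record on the two sides, which is what the later residual and training steps need in order to operate on aligned rows.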
conda create --name myenv python --no-default-packages
conda activate myenv
pip install pyinstaller
pip install numpy
pip install -U scikit-learn
cd algorithm
pyinstaller run.spec # To one folder
pyinstaller -F run.py # To one file
Run the following command to launch the software for the first time:
sudo apt install npm
# update node
sudo npm cache clean -f
sudo npm install -g n
sudo n stable
PATH="$PATH"
sudo snap install vue
npm install
npm run electron:serve
./node_modules/.bin/electron-rebuild # Run if there is a bug; on Windows use: .\node_modules\.bin\electron-rebuild
Run the following commands to launch the software after the first time:
npm install
npm run electron:serve
Run the following command to package the software:
npm install
npm run electron:build
Run the following command to run the unit tests:
npm run test
- Navbar.vue presents the software navigation bar; communication between the software and the backend is mainly handled by the functions in this file
- assets folder contains image, font, and css resources used in the software
- components folder contains reusable interface components
- network folder contains request sending and interception configuration
- router folder contains the routing configuration file
- store folder is used for storing some local information
- Notifications folder contains functions that handle notifications and history
- Auth folder contains functions that handle user registration and login
- Settings folder contains functions that handle user-customized settings
- tests folder contains unit test functions
- Launch procedures:
  - export FLASK_APP=application.py (the first time you clone the GitHub repository)
  - pipenv install
  - pipenv shell
  - flask run
- Unittest:
  - flask test (tests all files; run this command at the top level of the repository)
  - Note: you could switch the test framework to pytest, which is more convenient
  - Note: tests/test_unread_test_output.py contains most of the logic for your reference
- Deploy:
  - Install the required dependencies first
  - heroku login (use the username and password in the Google Drive key file)
  - git add .
  - git commit -m 'Commit_Name'
  - git push
  - git push heroku Current_branch_name
  - heroku open (view our app)
- Examples and instructions can be found in examples/
- The basic package structure can be found in the Github repository
- Compared to the basic package structure, docs/ will contain different elements. But at this point, you can follow the template
- py-pkg is the main part of the package; you can add more modules (with __init__.py) in this part. For example, if you add a temp module, you can import it by:
  from py-pkg import temp
  Here py-pkg stands for the package's import name; a real import name cannot contain a hyphen.
- This package structure can be improved by learning from the PyTorch package structure.
- Basic Structure:
py-package-tempate/
|-- docs/
|-- |-- build_html/
|-- |-- build_latex/
|-- |-- source/
|-- py-pkg/
|-- |-- __init__.py
|-- |-- __version__.py
|-- |-- curves.py
|-- |-- entry_points.py
|-- tests/
|-- |-- test_data/
|-- |-- |-- supply_demand_data.json
|-- |-- __init__.py
|-- |-- conftest.py
|-- |-- test_curves.py
|-- .env
|-- .gitignore
|-- Pipfile
|-- Pipfile.lock
|-- README.md
|-- setup.py
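As a sketch of how a module added to the main package directory becomes importable, the snippet below builds a throwaway package in a temporary directory and imports a module from it. The package name `py_pkg`, the module `temp`, and its `hello` function are assumptions made for illustration; the underscore is used because a hyphenated name is not a valid Python identifier.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root, package_name="py_pkg"):
    """Create <root>/<package_name>/ with __init__.py and a temp.py module."""
    pkg = Path(root) / package_name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "temp.py").write_text("def hello():\n    return 'hello from temp'\n")
    return pkg


with tempfile.TemporaryDirectory() as root:
    build_demo_package(root)
    sys.path.insert(0, root)  # make the parent directory importable
    try:
        importlib.invalidate_caches()
        temp = importlib.import_module("py_pkg.temp")  # same as: from py_pkg import temp
        message = temp.hello()
    finally:
        sys.path.remove(root)

# A hyphenated directory name cannot be used directly in an import statement.
hyphen_ok = "py-pkg".isidentifier()
underscore_ok = "py_pkg".isidentifier()
```

In an installed package the `sys.path` manipulation is unnecessary; it is only here so the sketch runs without installing anything.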
pipenv is used to manage packages. You can install pipenv by:
pip3 install pipenv
- Use pipenv to install packages. The first command installs the package for development; the second installs it for production.
pipenv install --dev
pipenv install
- Use pipenv to uninstall a package:
pipenv uninstall
- Enter a Pipenv-managed shell. Remember to do this every time before running the project.
cd py-package-tempate
pipenv install
pipenv shell
ColDA is licensed under the Apache 2.0 License.
Please review and adhere to the Code of Conduct when contributing to ColDA.
Please cite the following reference:
@article{diao2022gal,
title={GAL: Gradient Assisted Learning for Decentralized Multi-Organization Collaborations},
author={Diao, Enmao and Ding, Jie and Tarokh, Vahid},
journal={Advances in Neural Information Processing Systems},
volume={35},
pages={11854--11868},
year={2022}
}