FedMat

CSCE 585 - Machine Learning Systems Project
Title: Federated Learning for Materials Property Prediction

Team member 1:
- Name: Sadman Sadeed Omee
- Major: Computer Science (Ph.D. Candidate)
- Role: Machine learning researcher
- Email: [email protected]

Team member 2:
- Name: Md. Hasibul Amin
- Major: Computer Engineering (Ph.D. Candidate)
- Role: Machine learning researcher
- Email: [email protected]

Key Features

Federated Learning Framework: Implements federated learning (FL) with support for scalability experiments (varying number of clients) and privacy preservation.
Advanced GNN Architectures: Includes state-of-the-art models like DeeperGATGNN, SchNet, and MPNN, developed for materials property prediction.
Benchmark Datasets: Supports training and evaluation on standard materials datasets, such as those for bandgap, formation energy, and dielectric properties.
Out-of-Distribution (OOD) Generalization: Evaluates the models' ability to generalize to unseen distributions, highlighting challenges in FL for materials science.
Scalability Experiments: Tests the framework's performance with varying client numbers, from small-scale to large-scale federated setups.

Project Highlights

Privacy-Aware Training: FL enables collaboration across multiple institutions without sharing sensitive or proprietary data. Companies and institutions can share their models without explicitly sharing the names of the special materials they synthesized.
Detailed Performance Analysis: Performance comparison on in-distribution (ID) and OOD datasets, along with MAE vs. communication round plots.
Scalability Insights: Provides analysis on the trade-offs between client scalability, model performance, and memory usage.
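
To make the MAE vs. communication round metric concrete, here is a minimal sketch of how a mean absolute error could be computed from a list of predictions. The function name `mean_absolute_error` and the plain-list inputs are assumptions for illustration; the repository's own evaluation code may be organized differently.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Return the mean absolute error between targets and predictions.

    Hypothetical helper for illustration; not taken from the FedMat code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    # Average of the absolute differences, sample by sample.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Evaluating this on the test set after every communication round and storing the values in a list gives the data for an MAE vs. communication round curve.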

A general pipeline and the basic federated learning algorithm are shown below:
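
As a rough illustration of the server-side aggregation step, the sketch below assumes the standard federated averaging (FedAvg) rule, in which each client's parameters are weighted by its number of training samples. This README does not spell out the aggregation rule, so treat this as an assumption. The function `fedavg` and the dictionary-of-lists weight format are hypothetical and chosen only to keep the example self-contained.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Sample-size-weighted average of client parameters.

    Hypothetical sketch of FedAvg aggregation; not the FedMat implementation.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    aggregated: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Each parameter entry is averaged across clients, weighted by
        # the share of training samples each client holds.
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated
```

In one communication round, the server would send the current global weights to the clients, each client would train locally, and a function like this would merge the returned weights into the next global model.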

Datasets

Materials datasets are large. We provide links for downloading the datasets, together with download instructions. We also provide a small, already downloaded test dataset so that the models can be run for a quick test.

To access the datasets, go to the following folder and follow the instructions there:

cd data/datasets/

How to Run

The default configurations for training federated DeeperGATGNN, federated SchNet, and federated MPNN are specified in the config.yml file. Use the following command to run federated training for a specific model (DeeperGATGNN shown here):

python main.py -dataset mat -model deepergatgnn -fedmid avg -part_alpha 0.1 -numClient 4

This command trains the model in a federated way, evaluates it on the test data, and generates and stores the necessary figures. You can change the dataset and model names to get results for different models on different datasets. You can also change the number of clients for scalability experiments.
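
The command above passes `-part_alpha 0.1` and `-numClient 4`. This README does not define `-part_alpha`. A common convention in federated learning code is that such a value is the concentration parameter of a Dirichlet distribution used to split the data unevenly across clients, with smaller values giving more imbalanced splits. The sketch below illustrates that idea under this assumption only. The functions `dirichlet_proportions` and `partition_indices` are hypothetical and are not taken from the repository.

```python
import random
from typing import List


def dirichlet_proportions(num_clients: int, alpha: float, rng: random.Random) -> List[float]:
    """Draw client shares from a symmetric Dirichlet(alpha) distribution."""
    # Normalized independent Gamma(alpha, 1) draws follow a Dirichlet distribution.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
    total = sum(draws)
    if total == 0.0:
        # Extremely unlikely numerical corner case: fall back to equal shares.
        return [1.0 / num_clients] * num_clients
    return [d / total for d in draws]


def partition_indices(num_samples: int, num_clients: int, alpha: float, seed: int = 0) -> List[List[int]]:
    """Split sample indices across clients with Dirichlet-distributed sizes."""
    rng = random.Random(seed)
    proportions = dirichlet_proportions(num_clients, alpha, rng)
    indices = list(range(num_samples))
    rng.shuffle(indices)

    parts: List[List[int]] = []
    start = 0
    cumulative = 0.0
    for k, share in enumerate(proportions):
        cumulative += share
        # The last client takes everything that remains, so no index is lost.
        end = num_samples if k == num_clients - 1 else round(cumulative * num_samples)
        end = min(max(end, start), num_samples)
        parts.append(indices[start:end])
        start = end
    return parts
```

For example, `partition_indices(1000, 4, 0.1)` returns four index lists that together cover all 1000 samples, typically with very unequal sizes. A larger `alpha` tends to give more balanced clients.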

Make sure to unzip the data in FedMat/data/datasets/test_data/ before running the command, which uses the already downloaded test data. You may download other datasets by following the instructions provided in the dataset folders and test the models on them. The command line options for the dataset are mat, band, 2d, alloy, formation, pt, dielectric, gvrh, and perovskites. The command line options for the model are deepergatgnn, schnet, and mpnn.

The command to run the code for the SchNet model for the band-gap dataset with 8 clients:

python main.py -dataset band -model schnet -fedmid avg -part_alpha 0.1 -numClient 8

The command to run the code for the MPNN model for the perovskites (OOD) dataset with 5 clients:

python main.py -dataset perovskites -model mpnn -fedmid avg -part_alpha 0.1 -numClient 5
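
For scalability experiments it can be convenient to generate the commands for several client counts at once. The following sketch builds command lines using only the flags and option values listed above. The helper names `build_command` and `sweep_commands` are hypothetical, and launching the commands is left commented out because it requires the repository and its data to be in place.

```python
import shlex
from typing import Iterable, List

# Option values as listed in this README.
DATASETS = {"mat", "band", "2d", "alloy", "formation", "pt",
            "dielectric", "gvrh", "perovskites"}
MODELS = {"deepergatgnn", "schnet", "mpnn"}


def build_command(dataset: str, model: str, num_clients: int,
                  part_alpha: float = 0.1) -> List[str]:
    """Build the argument list for one federated training run."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    return ["python", "main.py",
            "-dataset", dataset,
            "-model", model,
            "-fedmid", "avg",
            "-part_alpha", str(part_alpha),
            "-numClient", str(num_clients)]


def sweep_commands(dataset: str, model: str,
                   client_counts: Iterable[int]) -> List[List[str]]:
    """Build one command per client count for a scalability sweep."""
    return [build_command(dataset, model, n) for n in client_counts]


if __name__ == "__main__":
    for cmd in sweep_commands("mat", "deepergatgnn", [2, 4, 8]):
        print(shlex.join(cmd))
        # To launch each run from the repository root, one could use:
        # import subprocess; subprocess.run(cmd, check=True)
```

Validating the dataset and model names before launching avoids starting a long sweep with a mistyped option.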

Project Presentation Video

The project presentation video can be found by clicking here.

Acknowledgement

Our code is based on the FedChem algorithm's repository, which has a well-developed federated learning pipeline for molecular data. We adapted this code to make it suitable for materials datasets.
