ClusterFuG: Clustering Fully connected Graphs by Multicut (arxiv)
Solve multicut problem on complete graph alleviating need for graph structure design. Similarity between a pair of nodes is computed by inner product of input node features.
We use GCC 10
. Other combinations might also work but not tested. CMake
is required for compilation.
git clone [email protected]:aabbas90/cluster-fug.git
cd cluster-fug
git submodule update --init --recursive
mkdir build
cd build
cmake ..
make -j 4
We also provide python bindings using pybind. Simply run the following command:
python -m pip install git+https://github.com/aabbas90/cluster-fug.git
We require multicut instance stored in a (.txt) file in the following format:
FACTORIZED COMPLETE MULTICUT
a
n d
feature_node_1_dim_1 ... feature_node_1_dim_d
...
feature_node_n_dim_1 ... feature_node_n_dim_d
which corresponds to a graph with n
nodes, and d
-dimensional node features. The value of a
controls affinity strength where increasing the value helps in creating more clusters and vice versa.
Afterwards run (for our DAppLAEC algorithm):
./src/dense_multicut_text_input <PATH_TO_PROBLEM_INSTANCE_TXT> HNSW laec_bf_later
Other algorithms mentioned in the paper can be run as:
./src/dense_multicut_text_input <PATH_TO_PROBLEM_INSTANCE_TXT> <INDEX_TYPE> <CONTRACTION_TYPE>
where INDEX_TYPE
and CONTRACTION_TYPE
can be choosen as:
Algorithm | INDEX_TYPE | CONTRACTION_TYPE |
---|---|---|
GAEC | brute_force | gaec |
DGAEC | faiss_brute_force | dense_gaec |
DGAECInc | faiss_brute_force | dense_gaec_inc |
DLAEC | faiss_brute_force | dense_laec |
DAppLAEC | HNSW | dense_laec_bf_later |
For example to run GAEC:
bash ./src/dense_multicut_text_input <PATH_TO_PROBLEM_INSTANCE_TXT> brute_force gaec
For more information run:
./src/dense_multicut_text_input --help
```
### Python solver:
An example to compute multicut on a set of features:
```python
import dense_multicut_py
import numpy as np
num_nodes = 100
dim = 16
affinity_strength = 5.0
k = 5
k_cap = k
# DAppLAEC:
INDEX_TYPE, CONTRACTION_TYPE = "HNSW", "dense_laec_bf_later"
features = np.random.rand(num_nodes, dim).astype(np.float32)
node_labels = dense_multicut_py.dense_multicut(features.flatten(), num_nodes, dim, affinity_strength, k, INDEX_TYPE, CONTRACTION_TYPE, k_cap)
Instances used in the paper can be obtained from structured-prediction-prob-archive.
If you use this work please cite as
@article{abbas2023clusterfug,
title={ClusterFuG: Clustering Fully connected Graphs by Multicut},
author={Abbas, Ahmed and Swoboda, Paul},
journal={arXiv preprint arXiv:2301.12159},
year={2023}
}