The code in C++ code/, main.py, vector_analysis.py, and data_rep.py runs independent cascade simulations to generate vectors, analyzes those vectors, and visualizes the results.
Run the code:
- to run an experiment, type the following: python3 main.py config_files/testing.ini, where testing.ini is the config file corresponding to your experiment
Directory Structure:
- Make sure you have an output directory whose path matches the one in the config file variable [FILES][outputDir]. This is where your results will go.
- For organizational purposes, you should have two directories above this repository named "data" and "results". These should hold any needed input data (such as the files referenced in the config variables [FILES][inEdgesFile], [FILES][inNodesFile], [FILES][inHoldEdgesFile], [FILES][inHoldNodesFile], and [FILES][inAnalysisDir]).
- When writing directory paths in the config file, always include the trailing slash (i.e., use .../Foo/Bar/ NOT .../Foo/Bar).
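A minimal sketch of the layout above, assuming this repository sits next to the "data" and "results" directories. The local "output" directory name here is a placeholder; match it to whatever your [FILES][outputDir] actually points to.

```shell
# Create the sibling input directories and a local output directory.
# "output" is a placeholder; match it to [FILES][outputDir] in your config.
mkdir -p ../data ../results output
```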
Config Files:
- find config files in the config_files folder
- see EXAMPLE.ini for a guide to using config files
- give each config file a unique [GENERAL][experimentName]
- NOTE: config files from previous experiments will not always work when rerun; as the pipeline grows, fields get added to the config format. So always check the format of the most recent config file (EXAMPLE.ini) before running.
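As a rough illustration of the format (check EXAMPLE.ini for the authoritative, current layout), a config file might look like the following. Only the key names mentioned above come from this README; the values are placeholders.

```ini
; Illustrative only -- check EXAMPLE.ini for the current format.
[GENERAL]
experimentName = testing

[FILES]
; directory paths must end with a trailing slash
outputDir = ../results/testing/
inEdgesFile = ../data/edges.txt
inNodesFile = ../data/nodes.txt
inAnalysisDir = ../data/analysis/
```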
When adding an analysis method, make sure to add:
- a variable to the config file
- a global variable to main.py
- a clause to main.run_analysis()
- an analysis function in vector_analysis.py
- a clause to main.run_datarep()
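A hedged sketch of that dispatch pattern: the analysis name degree_stats, the flag RUN_DEGREE_STATS, and the toy computation are all illustrative, not taken from this repo.

```python
# Hypothetical example of the steps above; names are illustrative.

# vector_analysis.py: the new analysis function operates on simulation vectors.
def degree_stats(vectors):
    """Toy analysis: mean value per vector."""
    return [sum(v) / len(v) for v in vectors]

# main.py: global flag read from a config variable, e.g. [ANALYSIS][degreeStats].
RUN_DEGREE_STATS = True

def run_analysis(vectors):
    results = {}
    # the clause added to main.run_analysis() for the new method
    if RUN_DEGREE_STATS:
        results["degree_stats"] = degree_stats(vectors)
    return results

print(run_analysis([[1, 2, 3], [4, 5, 6]]))  # {'degree_stats': [2.0, 5.0]}
```

A matching clause in main.run_datarep() would then pick up results["degree_stats"] for visualization.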
This repository consists of code that runs the full Information Access Clustering pipeline:
- Reconstructing graphs and edgelists for independent cascade simulations.
- Performing simulations that generate vector files, given alpha values.
- Tuning the hyperparameter K, the number of clusters for information access clustering, through Gap Statistic, Silhouette Analysis, and Elbow Method.
- Running the Information Access Clustering and relevant statistical analyses.
- Clustering the graph with existing methods for deeper analysis.
Scripts:
- run.sh: bash script for running the "build_*" scripts, simulations, and the after_vectors pipeline.
- run_k.sh: for finding the K hyperparameter.
Please edit the bash scripts to select the specific methods you'd like to run, along with the hyperparameters those methods use in main_pipelines (specified inside the scripts).
Tuning K:
Clustering:
Hypothesis Testing:
Additional Methods:
When running regression experiments, make sure to add the heatmap function in data_rep.py.
TO DO:
- make one heatmap function and pass in the analysis name
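One way the TODO above could be sketched: collapse the per-analysis heatmap functions in data_rep.py into a single entry point keyed by analysis name. The names, label table, and return shape below are hypothetical, not from the repo; the real version would hand the result to the plotting code.

```python
# Hypothetical sketch: one heatmap entry point keyed by analysis name.
# The analysis names and axis labels below are illustrative placeholders.
HEATMAP_LABELS = {
    "regression": ("alpha", "coefficient"),
    "clustering": ("cluster", "feature"),
}

def heatmap(analysis_name, matrix):
    """Return the axis labels and data a single generic plotting call would use."""
    x_label, y_label = HEATMAP_LABELS[analysis_name]
    return {"x": x_label, "y": y_label, "data": matrix}

print(heatmap("regression", [[0.1, 0.9], [0.5, 0.2]])["x"])  # alpha
```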