KOIOS is an efficient and exact filter-verification framework for finding the top-k sets with the maximum bipartite matching to a query set. Here we use KOIOS for semantic overlap search, where the semantic overlap is the maximum bipartite matching score between the tokens of the query set and the tokens of the candidate set.
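To make the definition concrete, here is a minimal sketch of semantic overlap as a maximum-weight bipartite matching, followed by a naive top-k scan. It is not the KOIOS algorithm and does not use this repository's code: the matching is found by exhaustive search with memoization (only practical for small sets), the toy similarity table stands in for fastText embedding similarity, and treating pairs below the similarity threshold as non-matching is an assumption. The names semantic_overlap, top_k, toy_sim and TOY_SIM are hypothetical.

```python
import heapq
from functools import lru_cache


def semantic_overlap(query, candidate, sim, threshold):
    """Maximum-weight bipartite matching score between two token sets.

    Exhaustive search with memoization, exponential in the candidate size.
    Assumption: pairs with similarity below `threshold` cannot be matched.
    """
    q = sorted(set(query))
    c = sorted(set(candidate))

    def weight(a, b):
        s = 1.0 if a == b else sim(a, b)
        return s if s >= threshold else 0.0

    @lru_cache(maxsize=None)
    def best(i, used):
        # Best score for q[i:], where `used` is a bitmask of taken candidate tokens.
        if i == len(q):
            return 0.0
        score = best(i + 1, used)  # option: leave q[i] unmatched
        for j, tok in enumerate(c):
            if not used & (1 << j):
                w = weight(q[i], tok)
                if w > 0.0:
                    score = max(score, w + best(i + 1, used | (1 << j)))
        return score

    return best(0, 0)


def top_k(query, data_lake, sim, threshold, k):
    """Naive top-k: score every set in the data lake and keep the k best."""
    scored = (
        (semantic_overlap(query, tokens, sim, threshold), name)
        for name, tokens in data_lake.items()
    )
    return heapq.nlargest(k, scored)


# Hypothetical similarity table standing in for embedding similarity.
TOY_SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("car", "truck")): 0.6,
    frozenset(("road", "street")): 0.8,
}


def toy_sim(a, b):
    return TOY_SIM.get(frozenset((a, b)), 0.0)


query = {"car", "road", "blue"}
data_lake = {
    "A": {"automobile", "street", "blue", "truck"},
    "B": {"blue", "green"},
    "C": {"truck"},
}

for score, name in top_k(query, data_lake, toy_sim, threshold=0.7, k=2):
    print(name, round(score, 2))
```

In this toy data, set A matches car with automobile, road with street, and blue with itself, while car and truck fall below the threshold and contribute nothing. A scan like this scores every set in full, which is the cost a filter-verification approach aims to avoid.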
- Clone the repository onto your local machine.
- Download the fasttext-database from here, and save it in the root folder.
- Make sure all paths are correct in the Makefile.
- Run the following commands to initialize the environment and Intel oneAPI:
source bashrc
. /opt/intel/oneapi/setvars.sh --config=intel.config
For Syntactic Overlap Search
make koios-semantic
./build/koios-semantic <data-lake-path> <query> <result-location> <sim-threshold> <k> <number-of-partitions> 1
For Semantic Overlap Search using KOIOS
make koios-semantic
./build/koios-semantic <data-lake-path> <query> <result-location> <sim-threshold> <k> <number-of-partitions> 0
For Semantic Overlap Search using Baseline
make baseline-semantic
./build/baseline-semantic <data-lake-path> <query> <result-location> <sim-threshold> <k>
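The koios-semantic invocations above differ only in the final argument (1 for syntactic overlap, 0 for semantic overlap). The following sketch shows one way to assemble that command line from a script. The helper build_koios_command, its syntactic parameter, and the example paths are hypothetical and not part of this repository; the block only builds and prints the command and does not run the binary.

```python
import shlex


def build_koios_command(data_lake, query, result_dir, sim_threshold, k,
                        partitions, syntactic=False):
    """Assemble the argument list for ./build/koios-semantic.

    The last argument follows the usage lines above:
    "1" for syntactic overlap search, "0" for semantic overlap search.
    """
    return [
        "./build/koios-semantic",
        str(data_lake),
        str(query),
        str(result_dir),
        str(sim_threshold),
        str(k),
        str(partitions),
        "1" if syntactic else "0",
    ]


# Placeholder paths and parameter values, for illustration only.
cmd = build_koios_command("data/lake", "data/query.txt", "results/", 0.8, 10, 4)
print(shlex.join(cmd))

# To actually run it after `make koios-semantic`:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.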
CMake version 3.18 (the version is important):
If an older version is installed, remove it first:
apt remove --purge --auto-remove cmake
Faiss index by Facebook:
- Refer to INSTALL.md for details.
- If you encounter a CUDA error, pass -DFAISS_ENABLE_GPU=OFF when compiling Faiss.
- Step 3 (sudo make install) is not optional.
Sqlite3:
apt-get install sqlite3 libsqlite3-dev
FastText:
- Use the following API to generate the FastTextDB: https://github.com/ekzhu/go-fasttext
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.