Skip to content

Commit

Permalink
Add wheels for Windows and MacOS (#11)
Browse files Browse the repository at this point in the history
* change sphinx theme and add a download button

* update benchmark result for ScaNN index

* add encoding==utf-8 to unify the action on all platform;

* update GithubActions to build windows/macos wheels.

* bump version string
  • Loading branch information
ZhuochengZhang98 authored Jan 8, 2025
1 parent a70acb8 commit 78fe189
Show file tree
Hide file tree
Showing 16 changed files with 67 additions and 40 deletions.
8 changes: 6 additions & 2 deletions .github/workflows/publish.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,11 @@ on:

jobs:
build_and_publish:
runs-on: ubuntu-latest
name: Build wheels on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, windows-latest, macos-latest]

permissions:
contents: read
Expand All @@ -26,7 +30,7 @@ jobs:
run: pip install setuptools wheel twine cibuildwheel

- name: Build wheels
run: cibuildwheel --platform linux --output-dir wheelhouse
run: python -m cibuildwheel --output-dir wheelhouse

- name: Publish to PyPI
env:
Expand Down
2 changes: 2 additions & 0 deletions README-zh.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,8 @@ FlexRAG 是一个灵活的高性能框架,专为检索增强生成 (RAG) 任
- **轻量化**: FlexRAG 采用最少的开销设计,高效且易于集成到您的项目中。

# 📢 最新消息
- **2025-01-08**: FlexRAG 现已支持 Windows 和 MacOS 系统,您可以直接通过 `pip install flexrag` 来安装。
- **2025-01-08**: FlexRAG 在单跳QA数据集上的基准测试现已公开,详情请参考 [benchmarks](benchmarks/README.md) 页面。
- **2025-01-05**: FlexRAG 的[文档](https://flexrag.readthedocs.io/en/latest/)现已上线。

# 🚀 框架入门
Expand Down
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ https://github.com/user-attachments/assets/4dfc0ec9-686b-40e2-b1f0-daa2b918e093
- **Lightweight**: Designed with minimal overhead, FlexRAG is efficient and easy to integrate into your project.

# 📢 News
- **2025-01-08**: We provide wheels on Windows & MacOS for FlexRAG. You can install FlexRAG via pip on Windows & MacOS now.
- **2025-01-08**: The benchmark of FlexRAG on Single-hop QA tasks is now available. Check out the [benchmarks](benchmarks/README.md) for more details.
- **2025-01-05**: Documentation for FlexRAG is now available. Check out the [documentation](https://flexrag.readthedocs.io/en/latest/) for more details.

# 🚀 Getting Started
Expand Down
9 changes: 6 additions & 3 deletions benchmarks/singlehop_qa.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,9 +58,12 @@ We recommend using facebook/contriever-msmarco or E5 for academic usage as it is
| Faiss Auto(nprobe=128) | 59.91 | 54.97 | 76.20 | 49.05 | 38.53 | 77.23 | 70.14 | 62.31 | 79.49 | 59.70 | 51.94 | 77.64 |
| Faiss Auto(nprobe=512) | 64.14 | 59.04 | 81.42 | 49.62 | 39.11 | 77.87 | 70.48 | 62.57 | 79.80 | 61.41 | 53.57 | 79.70 |
| Faiss Refine | 64.11 | 58.90 | 81.27 | 48.91 | 38.34 | 77.81 | 70.24 | 62.43 | 79.89 | 61.09 | 53.22 | 79.66 |
| ScaNN | 63.26 | 58.11 | 82.13 | | | | | | | | | |
| Annoy(40000) | | | | | | | | | | | | |
| Annoy(400000) | | | | | | | | | | | | |
| ScaNN | 63.26 | 58.11 | 82.13 | 49.31 | 39.25 | 77.76 | 70.50 | 62.64 | 79.93 | 61.02 | 53.33 | 79.94 |


Observations:
- Faiss provides a good balance between performance and efficiency.
- ScaNN offers high retrieval speed and accuracy, but it consumes a large amount of memory, making it suitable for use on platforms with ample memory.


## Reranker Benchmarks
Expand Down
3 changes: 2 additions & 1 deletion docs/requirements.docs.txt
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
sphinx==8.1.3
myst-parser==4.0.0
piccolo_theme==0.24.0
sphinx-book-theme==1.1.3
sphinx-copybutton==0.5.2
21 changes: 17 additions & 4 deletions docs/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,18 +5,30 @@

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
import re
import pathlib


def get_version() -> str:
version_string_path = pathlib.Path(__file__).parents[2] / "src/flexrag/__init__.py"
with open(version_string_path, encoding="utf-8") as f:
version = re.search(r"__VERSION__ = \"(.*?)\"", f.read()).group(1)
return version


project = "FlexRAG Documentation"
html_short_title = "FlexRAG Documentation"
copyright = "2025, ZhuochengZhang"
author = "ZhuochengZhang"
release = "0.1.2"
release = get_version()

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.napoleon",
"sphinx_copybutton",
"myst_parser",
]

Expand All @@ -27,9 +39,10 @@
# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = "piccolo_theme"
html_theme = "sphinx_book_theme"
html_static_path = ["_static", "../../assets"]
html_theme_options = {
"source_url": "https://github.com/ictnlp/flexrag",
"source_icon": "github",
"path_to_docs": "docs/source",
"repository_url": "https://github.com/ictnlp/flexrag",
"use_repository_button": True,
}
10 changes: 5 additions & 5 deletions docs/source/index.rst
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
.. SphinxTest documentation master file, created by
sphinx-quickstart on Thu Jan 2 10:05:21 2025.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
|
.. image:: ../../assets/flexrag-wide.png
:alt: FlexRAG
:align: center

FlexRAG documentation
|
Welecome to FlexRAG Documentation
=====================

FlexRAG is a flexible and high-performance framework designed for Retrieval-Augmented Generation (RAG) tasks, offering support for multimodal data, seamless configuration management, and out-of-the-box performance for both research and prototyping.
Expand Down
2 changes: 1 addition & 1 deletion src/flexrag/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
from .models import GENERATORS, ENCODERS


__VERSION__ = "0.1.4"
__VERSION__ = "0.1.5"


__all__ = [
Expand Down
8 changes: 5 additions & 3 deletions src/flexrag/data/line_delimited_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ def __init__(
self,
file_paths: list[str] | str,
data_ranges: Optional[list[list[int, int]] | list[int, int]] = None,
encoding: str = "utf-8",
):
# for single file path
if isinstance(file_paths, str):
Expand All @@ -37,6 +38,7 @@ def __init__(

self.file_paths = file_paths
self.data_ranges = data_ranges
self.encoding = encoding
return

def __iter__(self) -> Iterator[dict]:
Expand All @@ -48,7 +50,7 @@ def __iter__(self) -> Iterator[dict]:
if end_point > 0:
assert end_point > start_point, f"Invalid data range: {data_range}"
if file_path.endswith(".jsonl"):
with open(file_path, "r") as f:
with open(file_path, "r", encoding=self.encoding) as f:
for i, line in enumerate(f):
if i < start_point:
continue
Expand All @@ -57,7 +59,7 @@ def __iter__(self) -> Iterator[dict]:
yield json.loads(line)
elif file_path.endswith(".tsv"):
title = []
with open(file_path, "r") as f:
with open(file_path, "r", encoding=self.encoding) as f:
for i, row in enumerate(csv_reader(f, delimiter="\t")):
if i == 0:
title = row
Expand All @@ -69,7 +71,7 @@ def __iter__(self) -> Iterator[dict]:
yield dict(zip(title, row))
elif file_path.endswith(".csv"):
title = []
with open(file_path, "r") as f:
with open(file_path, "r", encoding=self.encoding) as f:
for i, row in enumerate(csv_reader(f)):
if i == 0:
title = row
Expand Down
2 changes: 1 addition & 1 deletion src/flexrag/entrypoints/cache.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ def main(config: Config):
case "clear":
RETRIEVAL_CACHE.clear()
case "export":
with open(config.export_path, "w") as f:
with open(config.export_path, "w", encoding="utf-8") as f:
for key in RETRIEVAL_CACHE:
data = json.loads(key)
data["retrieved_contexts"] = RETRIEVAL_CACHE[key]
Expand Down
6 changes: 3 additions & 3 deletions src/flexrag/entrypoints/combine_outputs.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,10 +47,10 @@ def main(cfg: Config):
golden_contexts = []
responses = []
contexts = []
with open(output_details_path, "w") as f:
with open(output_details_path, "w", encoding="utf-8") as f:
for result_path in cfg.result_paths:
details_path = os.path.join(result_path, "details.jsonl")
for line in open(details_path, "r"):
for line in open(details_path, "r", encoding="utf-8"):
f.write(line)
data = json.loads(line)
questions.append(data["question"])
Expand All @@ -69,7 +69,7 @@ def main(cfg: Config):
golden_contexts=golden_contexts,
log=True,
)
with open(output_eval_score_path, "w") as f:
with open(output_eval_score_path, "w", encoding="utf-8") as f:
json.dump(
{
"eval_scores": resp_score,
Expand Down
2 changes: 1 addition & 1 deletion src/flexrag/entrypoints/evaluate.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ def main(config: Config):
log=True,
)
if config.output_path is not None:
with open(config.output_path, "w") as f:
with open(config.output_path, "w", encoding="utf-8") as f:
json.dump(
{
"eval_scores": resp_score,
Expand Down
14 changes: 7 additions & 7 deletions src/flexrag/entrypoints/run_assistant.py
Original file line number Diff line number Diff line change
Expand Up @@ -68,13 +68,13 @@ def main(config: Config):
config_path = os.path.join(config.output_path, "config.yaml")
log_path = os.path.join(config.output_path, "log.txt")
else:
details_path = "/dev/null"
eval_score_path = "/dev/null"
config_path = "/dev/null"
log_path = "/dev/null"
details_path = os.devnull
eval_score_path = os.devnull
config_path = os.devnull
log_path = os.devnull

# save config and set logger
with open(config_path, "w") as f:
with open(config_path, "w", encoding="utf-8") as f:
OmegaConf.save(config, f)
handler = logging.FileHandler(log_path)
LOGGER_MANAGER.add_handler(handler)
Expand All @@ -87,7 +87,7 @@ def main(config: Config):
golden_contexts = []
responses = []
contexts: list[list[RetrievedContext]] = []
with open(details_path, "w") as f:
with open(details_path, "w", encoding="utf-8") as f:
for item in testset:
questions.append(item.question)
golden_answers.append(item.golden_answers)
Expand Down Expand Up @@ -121,7 +121,7 @@ def main(config: Config):
golden_contexts=golden_contexts,
log=True,
)
with open(eval_score_path, "w") as f:
with open(eval_score_path, "w", encoding="utf-8") as f:
json.dump(
{
"eval_scores": resp_score,
Expand Down
12 changes: 6 additions & 6 deletions src/flexrag/prompt/prompt_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,7 @@ def to_json(self, path: str | PathLike):
data["history"].append(turn.to_dict())
for demo in self.demonstrations:
data["demonstrations"].append([turn.to_dict() for turn in demo])
with open(path, "w") as f:
with open(path, "w", encoding="utf-8") as f:
json.dump(data, f, indent=4, ensure_ascii=False)
return

Expand All @@ -139,7 +139,7 @@ def from_list(cls, prompt: list[dict[str, str]]) -> "ChatPrompt":

@classmethod
def from_json(cls, path: str | PathLike) -> "ChatPrompt":
with open(path, "r") as f:
with open(path, "r", encoding="utf-8") as f:
data = json.load(f)
if isinstance(data, list):
return cls.from_list(data)
Expand All @@ -153,7 +153,7 @@ def from_json(cls, path: str | PathLike) -> "ChatPrompt":
)

def load_demonstrations(self, demo_path: str | PathLike):
with open(demo_path, "r") as f:
with open(demo_path, "r", encoding="utf-8") as f:
data = json.load(f)
self.demonstrations = [
[ChatTurn.from_dict(turn) for turn in demo] for demo in data
Expand Down Expand Up @@ -249,7 +249,7 @@ def to_json(self, path: str | PathLike):
data["history"].append(turn.to_dict())
for demo in self.demonstrations:
data["demonstrations"].append([turn.to_dict() for turn in demo])
with open(path, "w") as f:
with open(path, "w", encoding="utf-8") as f:
json.dump(data, f, indent=4, ensure_ascii=False)
return

Expand All @@ -264,7 +264,7 @@ def from_list(cls, prompt: list[dict[str, str]]) -> "ChatPrompt":

@classmethod
def from_json(cls, path: str | PathLike) -> "ChatPrompt":
with open(path, "r") as f:
with open(path, "r", encoding="utf-8") as f:
data = json.load(f)
if isinstance(data, list):
return cls.from_list(data)
Expand All @@ -288,7 +288,7 @@ def images(self) -> list[Image]:
return images

def load_demonstrations(self, demo_path: str | PathLike):
with open(demo_path, "r") as f:
with open(demo_path, "r", encoding="utf-8") as f:
data = json.load(f)
self.demonstrations = [
[MultiModelChatTurn.from_dict(turn) for turn in demo] for demo in data
Expand Down
4 changes: 2 additions & 2 deletions src/flexrag/retriever/index/annoy_index.py
Original file line number Diff line number Diff line change
Expand Up @@ -118,14 +118,14 @@ def serialize(self) -> None:
if not os.path.exists(os.path.dirname(self.index_path)):
os.makedirs(os.path.dirname(self.index_path))
self.index.save(self.index_path)
with open(f"{self.index_path}.meta", "w") as f:
with open(f"{self.index_path}.meta", "w", encoding="utf-8") as f:
f.write(f"distance_function: {self.distance_function}\n")
f.write(f"embedding_size: {self.embedding_size}\n")
return

def deserialize(self) -> None:
logger.info(f"Loading index from {self.index_path}")
with open(f"{self.index_path}.meta", "r") as f:
with open(f"{self.index_path}.meta", "r", encoding="utf-8") as f:
self.distance_function = f.readline()[len("distance_function: ") :].strip()
embedding_size = int(f.readline()[len("embedding_size: ") :].strip())
match self.distance_function:
Expand Down
2 changes: 1 addition & 1 deletion src/flexrag/retriever/web_retrievers/web_retriever.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ def _save_error_state(retry_state: RetryCallState) -> Exception:
"args": retry_state.args,
"kwargs": retry_state.kwargs,
}
with open("web_retriever_error_state.json", "w") as f:
with open("web_retriever_error_state.json", "w", encoding="utf-8") as f:
json.dump(args, f)
raise retry_state.outcome.exception()

Expand Down

0 comments on commit 78fe189

Please sign in to comment.