
MemoryWithRag is given a local LLM model, but it still raises AssertionError: DASHSCOPE_API_KEY should be set in environ. #520

Open
yebanliuying opened this issue Jul 5, 2024 · 4 comments

Comments

@yebanliuying

Initial Checks

  • I have searched GitHub for a duplicate issue and I'm sure this is something new
  • I have read and followed the docs & demos and still think this is a bug
  • I am confident that the issue is with modelscope-agent (not my code, or another library in the ecosystem)

What happened + What you expected to happen

[Screenshot] Error: AssertionError: DASHSCOPE_API_KEY should be set in environ. My environment has no internet access, so I can only connect to a local Ollama instance.

Versions / Dependencies

Latest version

Reproduction script

from modelscope_agent.memory import MemoryWithRag

# LLM served by a local Ollama instance instead of DashScope
llm_config = {
    'model': 'qwen2',
    'model_server': 'ollama',
}
function_list = []
memory = MemoryWithRag(urls=['tests/samples/常见QA.pdf'], function_list=function_list,
                       llm=llm_config, use_knowledge_cache=False)

Issue Severity

High: It blocks me from completing my task.

@yebanliuying yebanliuying added the bug Something isn't working label Jul 5, 2024
@zzhangpurdue zzhangpurdue removed their assignment Jul 5, 2024
@suluyana
Collaborator

suluyana commented Jul 9, 2024

MemoryWithRag's default embedding model also calls the DashScope API, which is what triggers this error. Downloading and using a local embedding model should in principle work directly, but we have not tested it yet; we will provide it once it is verified. We call the DashScope API here because locally served models only handle low concurrency, and we previously ran into slow responses and timeouts with them.

The other two MemoryWithXxx classes download and use open-source embedding models.
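For anyone needing a stopgap before official support lands, here is a minimal sketch of overriding the embedding model, assuming MemoryWithRag is built on llama-index and honors the global Settings.embed_model (not verified against modelscope-agent; the model name is only an example):

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Assumption: MemoryWithRag picks up llama-index's global embedding setting.
# 'BAAI/bge-small-zh-v1.5' is just an example of a locally cached model.
Settings.embed_model = HuggingFaceEmbedding(model_name='BAAI/bge-small-zh-v1.5')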

@yebanliuying
Author

(quoting @suluyana's reply above)

[Screenshot]
I'm now getting an error when calling MemoryWithRetrievalKnowledge with a local model; is it failing for the reason described above? I see that modelscope_hub.py defaults to model_id: str = "damo/nlp_corom_sentence-embedding_english-base". The error is: ImportError: Could not import some python packages. Please install it with pip install modelscope.
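As the message suggests, that ImportError comes from the modelscope Python SDK not being installed in the local environment. Once it is installed, a quick way to check that the default embedding model loads locally is a sketch like the following (the task name and input format are the standard modelscope sentence-embedding ones, but treat this as an untested illustration):

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Load the same embedding model that modelscope_hub.py references by default.
embed = pipeline(Tasks.sentence_embedding, model='damo/nlp_corom_sentence-embedding_english-base')
print(embed(input={'source_sentence': ['hello world']}))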

@RyanOvO

RyanOvO commented Aug 14, 2024

(quoting @suluyana's reply above)

When will local embedding models be supported? Honestly, I understand your intent to lean on ModelScope as much as possible, but in many cases the environment and its constraints only allow an on-premises deployment with no access to external networks.

@bg4xsd

bg4xsd commented Aug 15, 2024

Following the langchain examples, the embedding model currently ends up in ModelScope's cache directory after download. It is not re-downloaded, but migrating it is a bit awkward; in practice you can manage this by pointing the cache and download directories at a location of your choice via environment variables, so downloading and managing a local embedding model is not a big problem. This is especially true for the llamaindex example, where Settings accepts a 'local' prefix to explicitly use a local model.

The problem I'm facing now is that in the MemoryWithRetrievalKnowledge series of examples there is a dependency between the LLM driving the agent and the embedding model. In my tests: qwen-max + damo/nlp_gte_sentence-embedding_chinese-base works; qwen-max + Xorbits/bge-large-zh-v1.5 errors; siliconflow's qwen-7b + damo/nlp_gte_sentence-embedding_chinese-base errors. From a quick try with ollama swapped in, my guess is that MemoryWithRetrievalKnowledge is rather tightly coupled internally.

By comparison, the llmaindex_rag example passes these combinations easily. If the maintainers have time, I would appreciate a pointer on which angle to approach this from.
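For concreteness, a small sketch of the two mechanisms mentioned above; MODELSCOPE_CACHE is the standard ModelScope cache variable, while the cache path and embedding model name are placeholders to adjust for your own setup:

import os

# Point ModelScope's model cache/download directory somewhere easy to migrate.
os.environ['MODELSCOPE_CACHE'] = '/data/modelscope_cache'  # example path

# In a llama-index based pipeline, pin the embedding to a local model via the
# 'local:' prefix on Settings.embed_model, as in the llamaindex_rag example.
from llama_index.core import Settings
Settings.embed_model = 'local:BAAI/bge-large-zh-v1.5'  # example model id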
