[Bug]: ds.JSONDatasource #539

Open
3 tasks done
ariexBear opened this issue Jan 10, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@ariexBear

Before Reporting

  • I have pulled the latest code of the main branch and run it again, and the bug still exists.

  • I have read the README carefully and no error occurred during the installation process. (Otherwise, we recommend that you ask a question using the Question template.)

Search before reporting

  • I have searched the Data-Juicer issues and found no similar bugs.

OS

Ubuntu

Installation Method

Official image: FROM datajuicer/data-juicer:v1.0.3

Data-Juicer Version

1.0.3

Python Version

3.10.12

Describe the bug

Data-Juicer is incompatible with Ray 2.40.0.

To Reproduce

After starting Ray locally with ray start --head, launching any job fails at runtime with the following error:

2025-01-10 08:26:02 | ERROR | __main__:<module>:33 - An error has been caught in function '<module>', process 'MainProcess' (135719), thread 'MainThread' (139973282997376):
Traceback (most recent call last):

File "/usr/local/bin/dj-process", line 33, in <module>
sys.exit(load_entry_point('py-data-juicer', 'console_scripts', 'dj-process')())
│ │ └ <function importlib_load_entry_point at 0x7f4e11c03d90>
│ └
└ <module 'sys' (built-in)>

File "/data-juicer/tools/process_data.py", line 13, in main
from data_juicer.core.ray_executor import RayExecutor

File "/data-juicer/data_juicer/core/ray_executor.py", line 8, in <module>
from data_juicer.core.ray_data import RayDataset

File "/data-juicer/data_juicer/core/ray_data.py", line 197, in <module>
class JSONStreamDatasource(ds.JSONDatasource):
└ <module 'ray.data.datasource' from '/usr/local/lib/python3.10/dist-packages/ray/data/datasource/__init__.py'>

File "/data-juicer/data_juicer/utils/lazy_loader.py", line 65, in __getattr__
return getattr(module, item)
│ └ 'JSONDatasource'
└ <module 'ray.data.datasource' from '/usr/local/lib/python3.10/dist-packages/ray/data/datasource/__init__.py'>

AttributeError: module 'ray.data.datasource' has no attribute 'JSONDatasource'. Did you mean: 'Datasource'?
Traceback (most recent call last):
File "/usr/local/bin/dj-process", line 33, in <module>
sys.exit(load_entry_point('py-data-juicer', 'console_scripts', 'dj-process')())
File "/usr/local/lib/python3.10/dist-packages/loguru/_logger.py", line 1297, in catch_wrapper
return function(*args, **kwargs)
File "/data-juicer/tools/process_data.py", line 13, in main
from data_juicer.core.ray_executor import RayExecutor
File "/data-juicer/data_juicer/core/ray_executor.py", line 8, in <module>
from data_juicer.core.ray_data import RayDataset
File "/data-juicer/data_juicer/core/ray_data.py", line 197, in <module>
class JSONStreamDatasource(ds.JSONDatasource):
File "/data-juicer/data_juicer/utils/lazy_loader.py", line 65, in __getattr__
return getattr(module, item)
AttributeError: module 'ray.data.datasource' has no attribute 'JSONDatasource'. Did you mean: 'Datasource'?


In the file
/data-juicer/data_juicer/core/ray_data.py, changing line 210 from:
class JSONStreamDatasource(ds.JSONDatasource):
to:
class JSONStreamDatasource(ds.Datasource):
resolves the problem.
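
The traceback shows the class lookup failing at import time because `ray.data.datasource` does not expose `JSONDatasource` in the installed Ray. One way to make such a lookup tolerant of relocated classes is to try several locations in order. The sketch below is only an illustration: `resolve_first` is a hypothetical helper, not part of Data-Juicer or Ray, and the internal Ray module path shown in the comment is an assumption that should be checked against the installed Ray version.

```python
import importlib


def resolve_first(candidates):
    """Return the first attribute found among (module_name, attr_name) pairs.

    Hypothetical helper: modules that cannot be imported, or that lack the
    attribute, are skipped. ImportError is raised if nothing matches.
    """
    tried = []
    for module_name, attr_name in candidates:
        tried.append(f"{module_name}.{attr_name}")
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module missing in this environment, try the next one
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    raise ImportError("none of these could be resolved: " + ", ".join(tried))


# Possible use for the class in this issue. The second path is an ASSUMPTION
# about where newer Ray versions keep the class; verify it before relying on it.
#
# JSONBase = resolve_first([
#     ("ray.data.datasource", "JSONDatasource"),
#     ("ray.data._internal.datasource.json_datasource", "JSONDatasource"),
# ])
# class JSONStreamDatasource(JSONBase): ...
```

Swapping the base class to `ds.Datasource` removes the import error, but whether the subclass still reads JSON correctly depends on what it inherits, which this thread does not confirm.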

Configs

No response

Logs

No response

Screenshots

No response

Additional

No response

@ariexBear ariexBear added the bug Something isn't working label Jan 10, 2025
@pan-x-c
Collaborator

pan-x-c commented Jan 10, 2025

Please check your Ray version. The Data-Juicer release currently on PyPI requires Ray to be exactly 2.31.0; other versions may have compatibility problems.
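
A quick way to check which version is installed, without importing the package itself, is to query the distribution metadata. This is a minimal sketch using only the standard library; `check_pinned` is a hypothetical helper name, and the `"ray"` / `"2.31.0"` values in the usage comment are taken from the comment above.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pinned(package, required):
    """Return (matches, installed_version) for an exact version pin.

    installed_version is None when the package is not installed.
    """
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False, None
    return installed == required, installed


# Example use inside the container:
# ok, installed = check_pinned("ray", "2.31.0")
# if not ok:
#     print(f"expected ray==2.31.0, found {installed}")
```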

@ariexBear
Author

ariexBear commented Jan 10, 2025

@pan-x-c That is exactly the problem: the datajuicer/data-juicer:v1.0.3 image ships with Ray 2.40.0.

https://hub.docker.com/layers/datajuicer/data-juicer/v1.0.3/images/sha256-e33f463ee21dd55c3d1a0b6c861e11a45bd260d1d253977c84575c6f8d07ab0f

@pan-x-c
Collaborator

pan-x-c commented Jan 10, 2025

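You can try installing the latest version from the main branch of this repository (pip install -e .). The Ray dependency on the latest main branch has already been updated to 2.40.0, and we expect to publish a new release on PyPI next week.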

@ariexBear
Author

@pan-x-c The main branch appears to work for now, thanks.
