How to get started? #76
What OS are you trying to run this on? Also, it looks like you do not have CUDA installed properly, which will make it difficult to train quickly:
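As a quick sanity check for the CUDA runtime (a generic sketch, not part of this repo — the function name here is made up for illustration), you can try to dlopen the exact library that the TensorFlow warning later in this thread complains about:

```python
import ctypes

def cuda_runtime_available(libname="libcudart.so.11.0"):
    """Return True if the CUDA runtime shared library can be loaded."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        # Library not found or not loadable -> CUDA runtime is not set up.
        return False

if __name__ == "__main__":
    if cuda_runtime_available():
        print("CUDA runtime found")
    else:
        print("CUDA runtime missing; check your CUDA 11 install and LD_LIBRARY_PATH")
```

If this prints "missing", GPU training will silently fall back to CPU (or fail), so it is worth fixing before debugging anything else.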
Hi @ncoop57, can you help me a little more? I am trying to load the data. I downloaded the dataset from the-eye.eu, but I am not able to correctly pass it to training. Please help.
Do you have this file stored here?
If not, I believe this file is the same one: https://github.com/CodedotAl/gpt-code-clippy/blob/camera-ready/data_processing/code_clippy_filter.py
I tried the other way. Is it possible that the Hugging Face method is not working anymore since the data page's format has changed? I will try the download method now. Error details:
If there is a specific command you use, could you share it? @ncoop57
After some "don't give up" self-talk while watching the last season of TBBT, I am finally able to at least get the data download to start. Here are the steps:
Here is the command that I ran:

```
python3 run_clm_apps.py --output_dir /data/opengpt/output/ --cache_dir /data/opengpt/cache --dataset_name CodedotAI/code_clippy
```
Hello Pankaj. Are you trying to fine-tune a model with the dataset? If so, my suggestion would be the following:
Hi Reshinth, I am trying to run it on my new GPU and see how good it can get, if possible. I am new to using transformers. So you are suggesting to download the run_clm.py file, run it, and pass the code_clippy Python file as a parameter. Let me try that.
Hi @reshinthadithyan and @ncoop57, the data download often breaks with read timeouts. Is there a way to handle it?
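One generic way to cope with flaky downloads is to wrap the download call in an exponential-backoff retry loop. This is only a sketch — the function and parameter names below are made up for illustration, not part of this repo or of `datasets` (though, if I recall correctly, `datasets` also exposes a `DownloadConfig` with a `max_retries` option that can be passed to `load_dataset`):

```python
import time

def with_retries(fn, attempts=5, base_delay=1.0,
                 retry_on=(TimeoutError, ConnectionError, OSError)):
    """Call fn(); on a transient error, wait and retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))

# Example usage (hypothetical download function):
#   data = with_retries(lambda: download_shard(url), attempts=5)
```

Each retry doubles the wait, so brief network hiccups get absorbed while persistent failures still surface after the final attempt.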
I think I could download the data, but the command gives an error at the end, before loading the data (would 64 GB of memory suffice?):
Hey @pankajkumar229, that should be enough as long as you read it with streaming mode enabled; otherwise it will not work. However, the error you are showing seems to be different, and I'm unsure why it is happening. Could you share the
Is there an easier way to get started?
I tried to set up a machine and install all the requirements. I would try tomorrow to go further, but maybe I am doing something wrong:
The error I am at currently is:
"""
2021-11-05 22:23:59.523515: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "run_clm_apps.py", line 800, in
main()
File "run_clm_apps.py", line 342, in main
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
File "/home/pankaj/.local/lib/python3.8/site-packages/transformers/hf_argparser.py", line 191, in parse_args_into_dataclasses
obj = dtype(**inputs)
File "", line 14, in init
File "run_clm_apps.py", line 174, in post_init
raise ValueError("Need either a dataset name or a training/validation file.")
ValueError: Need either a dataset name or a training/validation file.
"""
Also, getting the requirements to install was quite difficult on my machine. I am wondering if I am doing something wrong.