KeyError: 'scores' in judgelm_preprocess.py #22

Open
ilyes-c opened this issue Jun 2, 2024 · 0 comments

Running the judgelm_preprocess.py script fails with KeyError: 'scores'. The script appears to expect a scores field in every record of the answer JSONL files, but the documentation does not mention that this field is required. Here are the details:

Steps to Reproduce:

  1. Clone the repository: git clone https://github.com/baaivision/JudgeLM
  2. Navigate to the directory: cd JudgeLM
  3. Create and activate a conda environment.
  4. Install the required dependencies: pip install -r requirements.txt
  5. Run the preprocessing script with the following command:
    python C:/Users/mliki/JudgeLM/judgelm/data/JudgeLM/judgelm_preprocess.py --ans1_file_path C:/Users/mliki/JudgeLM/judgelm/data/JudgeLM/answers/gpt-4o_judgelm_val.jsonl --ans2_file_path C:/Users/mliki/JudgeLM/judgelm/data/JudgeLM/answers/gpt-4_judgelm_val.jsonl
    

Error Message:

Traceback (most recent call last):
  File "C:/Users/mliki/JudgeLM/judgelm/data/JudgeLM/judgelm_preprocess.py", line 95, in <module>
    combine_judgelm_val_judge_samples(args.ans1_file_path, args.ans2_file_path, args.ansmore_file_paths)
  File "C:/Users/mliki/JudgeLM/judgelm/data/JudgeLM/judgelm_preprocess.py", line 18, in combine_judgelm_val_judge_samples
    ans1_dict_list = extract_jsonl(ans1_file_path)
  File "C:/Users/mliki/JudgeLM/judgelm/utils.py", line 26, in extract_jsonl
    data = json.loads(line)
  File "C:/Users/mliki/anaconda3/envs/judgelm/lib/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:/Users/mliki/anaconda3/envs/judgelm/lib/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:/Users/mliki/anaconda3/envs/judgelm/lib/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/mliki/JudgeLM/judgelm/data/JudgeLM/judgelm_preprocess.py", line 36, in combine_judgelm_val_judge_samples
    'score': [ans1_dict['scores'], ans2_dict['scores']],
KeyError: 'scores'
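
Note that the first traceback is a JSONDecodeError, raised before the KeyError. The position "line 2 column 1 (char 2)" is consistent with json.loads being handed a line that contains only an opening brace, which may mean one of the answer files is pretty-printed JSON rather than one object per line. A minimal sketch for checking an answer file before running the preprocessing script follows. The function name check_jsonl is hypothetical and not part of JudgeLM; the required keys (text, model, scores) are assumed from the fields the script accesses in the snippet further down.

```python
import json


def check_jsonl(path, required_keys=("text", "model", "scores")):
    """Return a list of (line_number, problem) tuples for one .jsonl file.

    Hypothetical helper, not part of JudgeLM. The default required_keys
    are an assumption based on the fields read in judgelm_preprocess.py.
    """
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                # The line is not a complete JSON document on its own.
                problems.append((line_no, "invalid JSON: {}".format(exc.msg)))
                continue
            if not isinstance(record, dict):
                problems.append((line_no, "not a JSON object"))
                continue
            missing = [k for k in required_keys if k not in record]
            if missing:
                problems.append((line_no, "missing keys: {}".format(missing)))
    return problems
```

Calling check_jsonl on each of the two answer files and printing the result should show whether the failure comes from the file layout, from a missing scores field, or from both.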

Suggested Fix:

  • Either update the documentation to state that the scores field is required in every record of the answer JSONL files,
  • or modify the script to handle records where the scores field is missing.

As a temporary workaround, I modified the script to fall back to default values when the scores field is missing. Here is the updated sample_dict construction:

    sample_dict = {
        'question_id': i,
        'score': [ans1_dict.get('scores', []), ans2_dict.get('scores', [])],
        'question_body': question_body,
        'answer1_body': ans1_dict['text'],
        'answer2_body': ans2_dict['text'],
        'answer1_model_id': ans1_dict['model'],
        'answer2_model_id': ans2_dict['model'],
        'answer1_metadata': {
            'decoding_method': ans1_dict.get('decoding_method', ''),
        },
        'answer2_metadata': {
            'decoding_method': ans2_dict.get('decoding_method', ''),
        }
    }
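
For anyone who wants to try the workaround outside the script, here is a self-contained sketch of the same idea. The function name build_sample_dict and the require_scores flag are hypothetical additions for illustration, not part of JudgeLM. With require_scores=True the function raises a descriptive error instead of a bare KeyError, which corresponds to the "document it as required" option.

```python
def build_sample_dict(i, question_body, ans1_dict, ans2_dict, require_scores=False):
    """Build one combined sample from two answer records.

    Hypothetical wrapper around the dict shown above. Missing 'scores'
    fields default to an empty list unless require_scores is True.
    """
    if require_scores:
        for name, ans in (("ans1", ans1_dict), ("ans2", ans2_dict)):
            if "scores" not in ans:
                raise ValueError(
                    "{} record for question {} has no 'scores' field".format(name, i)
                )
    return {
        'question_id': i,
        # An empty list stands in for a missing 'scores' field.
        'score': [ans1_dict.get('scores', []), ans2_dict.get('scores', [])],
        'question_body': question_body,
        'answer1_body': ans1_dict['text'],
        'answer2_body': ans2_dict['text'],
        'answer1_model_id': ans1_dict['model'],
        'answer2_model_id': ans2_dict['model'],
        'answer1_metadata': {
            'decoding_method': ans1_dict.get('decoding_method', ''),
        },
        'answer2_metadata': {
            'decoding_method': ans2_dict.get('decoding_method', ''),
        },
    }


# Example with made-up records: the first answer has no 'scores' field.
sample = build_sample_dict(
    0,
    "What is 2 + 2?",
    {'text': '4', 'model': 'model-a'},
    {'text': 'Four', 'model': 'model-b', 'scores': [7, 8]},
)
print(sample['score'])
```

One caveat with the empty-list default: downstream code that indexes into the score entries may still fail later, so the stricter require_scores=True path may be preferable when scores are actually needed.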

Please let me know if there are any other suggestions or if I should make further modifications. Thank you!
