Releases · kakao/FunctionChat-Bench · GitHub

04 Dec 05:50

gannim

v1.2.0 Latest

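Developed the 'common' evaluation option for compatible formats
Applied exact match pass logic to the 'dialog' evaluation option
Changed the working path configuration
Added the new evaluation set FunctionChat-CallDecision
Added details to the call rubric
Strengthened the pass criteria in the slot rubric
Strengthened the fail criteria in the slot rubric
Added details about slot questions to the system prompt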

23 Oct 07:36

gannim

v1.1.1

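Fixed an error where Exact Match was skipped when evaluating the Tool Call type
Added previously missed 'pass' counting cases to the final score calculation
Added source citation information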

24 Sep 01:43

gannim

v1.1.0

Fixed acceptable_arguments to use a valid format

Added acceptable_arguments entries (e.g. 홈베이킹 도구 + 홈베이킹, 기영이 결혼식 + 기영이, 결혼식, 170 + 170.0, etc.)

Fixed typos

Fixed an exact match logic error (cases that were not an exact match were treated as an exact match pass and therefore skipped the rubric)

Improved the rubric for the LLM judge model gpt-4-0125-preview
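
The interaction between acceptable_arguments and the exact match pass described above can be illustrated with a minimal sketch. This is not the repository's actual code: the function names (`canonical`, `is_exact_match`, `evaluate`), the dictionary layout of `acceptable_arguments`, and the `"rubric"` return value are all assumptions made for illustration. The sketch compares values strictly, so 170 and 170.0 only match when 170.0 is listed as an acceptable alternative, and anything short of an exact match falls through to the rubric (LLM judge) path.

```python
import json


def canonical(value):
    # Hypothetical helper: strict, type-sensitive serialization,
    # so 170 ("170") and 170.0 ("170.0") are NOT considered equal.
    return json.dumps(value, ensure_ascii=False, sort_keys=True)


def is_exact_match(predicted, ground_truth, acceptable_arguments=None):
    # Assumed layout: acceptable_arguments maps an argument name
    # to a list of alternative values that also count as correct.
    acceptable_arguments = acceptable_arguments or {}
    if set(predicted) != set(ground_truth):
        return False
    for name, expected in ground_truth.items():
        allowed = [expected] + list(acceptable_arguments.get(name, []))
        allowed_forms = {canonical(v) for v in allowed}
        if canonical(predicted[name]) not in allowed_forms:
            return False
    return True


def evaluate(predicted, ground_truth, acceptable_arguments=None):
    # Only a genuine exact match short-circuits to "pass";
    # every other case must still go through the rubric.
    if is_exact_match(predicted, ground_truth, acceptable_arguments):
        return "pass"
    return "rubric"


ground_truth = {"height": 170, "keyword": "홈베이킹 도구"}
acceptable = {"height": [170.0], "keyword": ["홈베이킹"]}

print(evaluate({"height": 170.0, "keyword": "홈베이킹"}, ground_truth, acceptable))
print(evaluate({"height": 171, "keyword": "홈베이킹"}, ground_truth, acceptable))
```

Under these assumptions, the first call is accepted through the listed alternatives, while the second is handed to the rubric instead of being counted as an exact match pass.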

Full Changelog: v1.0.0...v1.1.0
