NLP0006: Monlam learners dictionary example sentence #1

Open
4 tasks done
kaldan007 opened this issue Jan 7, 2025 · 6 comments

@kaldan007

kaldan007 commented Jan 7, 2025

Description:
We have a learners' dictionary, but many of its entries either have no example sentences or have examples that are not appropriate to the entry's CEFR level. We would therefore like to generate AI-suggested examples and let our dictionary editors select or edit them. Since we cannot build a dedicated editor right away, we would like to load all the data into Google Sheets.

NOTE: We only want to process A0, A1, and A2 level words.

Input:

{
    "word_id": "3915",
    "lemma": "ཉམས་",
    "meanings": {
        "273": {
            "meaning": "བྱ་ཚིག་གཟུགས་མི་འགྱུར་བ།༡ ནུས་པ་སོགས་ཞན་པར་གྱུར་པ།",
            "examples": {
                "1": "ལུས་སྟོབས་ཉམས།"
            },
            "pos": "མིང་ཚིག"
        },
        "274": {
            "meaning": "ལུས་ངག་གི་བཟོ་ལྟའམ་རྣམ་འགྱུར།",
            "examples": {
                "1": "ཁོ་ལ་མཁས་ཉམས་འདུག"
            },
            "pos": "མིང་ཚིག"
        }
    },
    "level": "A2"
}

Expected output:

  • We would like a Google Sheet with these columns:
    • word id
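    • མ་ཚིག (lemma)
    • meaning id
    • འགྲེལ་བ། (definition)
    • དཔེར་བརྗོད་ཚིག་སྒྲུབ། (example sentence)
    • རིག་ནུས་དཔེར་བརྗོད་ཚིག་སྒྲུབ། (AI example sentence)
    • བརྡ་སྤྲོད་ཀྱི་དབྱེ་བ། (part of speech)
    • གནས་ཚད། (level)
    • ཞུ་དག་པ། (editor)
    • གཏན་འབེབས། (finalized)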
  • We would like to ship 200 words per Google Sheet.
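The 200-words-per-sheet requirement could be handled by batching the word list before export. This is only a minimal sketch: the `batches` helper and the placeholder word ids are hypothetical, and the upload to Google Sheets itself is not shown.

```python
from itertools import islice


def batches(words, size=200):
    """Yield successive lists of at most `size` words (hypothetical helper)."""
    iterator = iter(words)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Placeholder word ids, for illustration only.
word_ids = [str(i) for i in range(450)]
sheet_sizes = [len(batch) for batch in batches(word_ids)]
print(sheet_sizes)
```

Each batch would then become one sheet; the last sheet simply holds whatever words remain.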

Implementation

Image

Subtasks

  • Write a script to extract data from the JSON files and store it in a CSV file.
  • Use the sense parser code and generate the རིག་ནུས་དཔེར་བརྗོད་ཚིག་སྒྲུབ། column using Claude.
  • Store the results in the CSV file.
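The first subtask might look roughly like the sketch below, which flattens one entry of the input format into one CSV row per meaning. The mapping from JSON fields to columns, the `entry_to_rows` name, and joining multiple examples with a space are all assumptions; the AI example column and the two editor columns are left empty here.

```python
import csv
import io

COLUMNS = [
    "word id", "མ་ཚིག", "meaning id", "འགྲེལ་བ།", "དཔེར་བརྗོད་ཚིག་སྒྲུབ།",
    "རིག་ནུས་དཔེར་བརྗོད་ཚིག་སྒྲུབ།", "བརྡ་སྤྲོད་ཀྱི་དབྱེ་བ།", "གནས་ཚད།",
    "ཞུ་དག་པ།", "གཏན་འབེབས།",
]
ALLOWED_LEVELS = {"A0", "A1", "A2"}  # only these levels are processed


def entry_to_rows(entry):
    """Return one row per meaning, or no rows if the level is out of scope."""
    if entry.get("level") not in ALLOWED_LEVELS:
        return []
    rows = []
    for meaning_id, sense in entry.get("meanings", {}).items():
        # Assumption: several existing examples are joined with a space.
        examples = " ".join(sense.get("examples", {}).values())
        rows.append([
            entry["word_id"], entry["lemma"], meaning_id,
            sense.get("meaning", ""), examples,
            "",                      # AI example, filled in by a later step
            sense.get("pos", ""), entry["level"],
            "", "",                  # editor columns, filled in by hand
        ])
    return rows


# The sample entry from the issue description.
entry = {
    "word_id": "3915",
    "lemma": "ཉམས་",
    "meanings": {
        "273": {
            "meaning": "བྱ་ཚིག་གཟུགས་མི་འགྱུར་བ།༡ ནུས་པ་སོགས་ཞན་པར་གྱུར་པ།",
            "examples": {"1": "ལུས་སྟོབས་ཉམས།"},
            "pos": "མིང་ཚིག",
        },
        "274": {
            "meaning": "ལུས་ངག་གི་བཟོ་ལྟའམ་རྣམ་འགྱུར།",
            "examples": {"1": "ཁོ་ལ་མཁས་ཉམས་འདུག"},
            "pos": "མིང་ཚིག",
        },
    },
    "level": "A2",
}

rows = entry_to_rows(entry)
buffer = io.StringIO()  # stands in for a real CSV file
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerows(rows)
print(buffer.getvalue())
```

Writing to a real file would only require replacing the `io.StringIO` buffer with `open(path, "w", newline="", encoding="utf-8")`.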

### Reviewer: @kaldan007

  • Reviewed
@kaldan007

kaldan007 commented Jan 13, 2025

You can use this prompt:
གཤམ་གྱི་མ་ཚིག་དང་ནང་དོན་གཞིར་བཟུང་གཞིར་བཟུང་ནས་མ་ཚིག་དེར་དཔེ་མཚོན་བརྗོད་པ་<{number_of_sentence}>སྤྲད་རོགས། བརྗོད་པའི་སྐད་ཡིག་ཚད་གཞི་དེ་ <{level}>ཡིན་དགོས།
མ་ཚིག: <{lemma}>
ནང་དོན།: <{meaning}>
དཔེར་བརྗོད་རྣམས Python list format ཁོ་ནའི་ནང་སྤྲད་རོགས།
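Because the placeholders use the `<{name}>` form, the template can be filled with Python's `str.format`, which replaces the braces and leaves the angle brackets in place. A minimal sketch, where `build_prompt` is a hypothetical helper name and the call to the model is not shown:

```python
PROMPT_TEMPLATE = (
    "གཤམ་གྱི་མ་ཚིག་དང་ནང་དོན་གཞིར་བཟུང་གཞིར་བཟུང་ནས་མ་ཚིག་དེར་དཔེ་མཚོན་བརྗོད་པ་"
    "<{number_of_sentence}>སྤྲད་རོགས། བརྗོད་པའི་སྐད་ཡིག་ཚད་གཞི་དེ་ <{level}>ཡིན་དགོས།\n"
    "མ་ཚིག: <{lemma}>\n"
    "ནང་དོན།: <{meaning}>\n"
    "དཔེར་བརྗོད་རྣམས Python list format ཁོ་ནའི་ནང་སྤྲད་རོགས།"
)


def build_prompt(lemma, meaning, level, number_of_sentence):
    """Fill the template for one meaning of one word (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        number_of_sentence=number_of_sentence,
        level=level,
        lemma=lemma,
        meaning=meaning,
    )


prompt = build_prompt(
    lemma="ཉམས་",
    meaning="ལུས་ངག་གི་བཟོ་ལྟའམ་རྣམ་འགྱུར།",
    level="A2",
    number_of_sentence=2,
)
print(prompt)
```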

@dhakar66

If the default example in the JSON object is empty, at least one AI-generated example must be provided. Otherwise, split the existing example on spaces, store the pieces in a list, check the length of that list, and generate AI examples accordingly.
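One possible reading of this rule is sketched below. The comment does not say how the list length maps to the number of AI examples, so requesting as many AI examples as there are existing pieces (and at least one) is purely an assumption, as is the `plan_examples` name.

```python
def plan_examples(examples):
    """Return (existing_pieces, number_of_ai_examples_to_request).

    `examples` is the "examples" object of one meaning, e.g. {"1": "..."}.
    Assumption: request one AI example per existing piece, and at least one.
    """
    pieces = []
    for text in examples.values():
        pieces.extend(text.split())  # split on whitespace
    return pieces, max(1, len(pieces))


print(plan_examples({}))
print(plan_examples({"1": "ཁོ་ལ་མཁས་ཉམས་འདུག"}))
```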

@dhakar66

dhakar66 commented Jan 16, 2025

UPDATED PROMPT 1 ->
གཤམ་གྱི་མ་ཚིག་དང་ནང་དོན་གཞིར་བཟུང་གཞིར་བཟུང་ནས་མ་ཚིག་དེར་དཔེ་མཚོན་བརྗོད་པ་<{number_of_sentences}>སྤྲད་རོགས།
བརྗོད་པའི་སྐད་ཡིག་ཚད་གཞི་དེ་ <{level}>ཡིན་དགོས།
མ་ཚིག: <{lemma}>
ནང་དོན།: <{meaning}>
དཔེར་བརྗོད་རྣམས Python list format ཁོ་ནའི་ནང་སྤྲད་རོགས། give only the exact amount of number of sentences that is written up there and give only in python list format. nothing else extra

@dhakar66

UPDATED PROMPT 2 ->
You are a person with CEFR level <{level}> in Tibetan language.
Task:
Write example sentences using a Tibetan word and its meaning according to your language level.
Input:
number of sentences: {number of sentence}
word:<{word}>
meaning: <{meaning}>
Output:
Return only the example sentences without any explanation or additional text in python list format

@dhakar66

UPDATED PROMPT 3 ->
ཁྱེད་རང་ནི་བོད་སྐད་ཀྱི་ཤེས་ཚད་ CEFR རིམ་པ་{}ཅན་གྱི་སློབ་མ་ཞིག་རེད།
གཤམ་གྱི་མ་ཚིག་དང་ནང་དོན་གཞིར་བཟུང་ནས་མ་ཚིག་དེའི་དཔེ་མཚོན་བརྗོད་པ་<{number_of_sentences}>སྤྲོད་རོགས།
མ་ཚིག: <{lemma}>
ནང་དོན།: <{meaning}>
དཔེར་བརྗོད་རྣམས Python list format ཁོ་ནའི་ནང་སྤྲོད་རོགས།
Give only the exact amount of number of sentences that is written up there and give only in python list format. nothing else extra
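Since every prompt version asks for a Python list only, the reply can be parsed with `ast.literal_eval` rather than `eval`. The sketch below is a hypothetical helper: it tolerates stray text around the list and optionally checks the sentence count. The reply string is made up for illustration and is not real model output.

```python
import ast


def parse_sentence_list(reply, expected=None):
    """Extract a list of strings from a model reply (hypothetical helper)."""
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no list found in reply")
    # literal_eval only accepts Python literals, so it cannot run arbitrary code.
    value = ast.literal_eval(reply[start:end + 1])
    if not isinstance(value, list) or not all(isinstance(s, str) for s in value):
        raise ValueError("reply is not a list of strings")
    if expected is not None and len(value) != expected:
        raise ValueError(f"expected {expected} sentences, got {len(value)}")
    return value


# Made-up reply with extra text around the list.
reply = 'Here you go: ["sentence one", "sentence two"]'
sentences = parse_sentence_list(reply, expected=2)
print(sentences)
```

A reply that fails these checks could be retried or flagged for the editor instead of being written to the sheet.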

@kaldan007

kaldan007 commented Jan 17, 2025

@dhakar66 After evaluating the 100 sentences above, 56% are good, 15% are average, and 29% are poor quality or not usable at all. I guess we will go ahead and run the script. I will update the dictionary team later about our recommendation system.
Here is the sheet.
