
MEDSQ: Towards Personalized Medical Education via Multi-Modal Interaction Guidance

Dataset Sources

Our EduDiag and ScqTest datasets are constructed from two resources: MIMIC-CXR and Chest ImaGenome.

Statistics

Abnormal zones

The proportion of each abnormal zone across the datasets.

Responses

The average number of words per response in each round for every template in the bilingual dataset EduDiag.

Options

The distribution of options for single-choice questions in the bilingual dataset ScqTest.

Dataset Cases

EduDiag

English

Chinese

ScqTest

English

Chinese

Dataset Details

Each record in EduDiag contains five fields: image, report_en, qa_en, report_zh, and qa_zh. The image field holds the metadata for the patient's chest X-ray: image_path is the path to the image file, reason_for_exam gives the patient's medical history and the purpose of the examination, and bbox lists every anatomical location with an abnormality, with focuses naming the specific abnormalities found there. The remaining fields in image are taken directly from Chest ImaGenome. report_en and report_zh are the cleaned English and Chinese medical reports, respectively. qa_en and qa_zh contain multi-round question-and-answer exchanges following the bilingual templates.

{
  "image": {
    "image_id": "19d2573b-bbbb5192-d992c5a2-7b72f28b-b6182646",
    "image_path": "img/19d2573b-bbbb5192-d992c5a2-7b72f28b-b6182646.jpg",
    "viewpoint": "AP",
    "patient_id": 19422157,
    "study_id": 53040876,
    "gender": "F",
    "patient_age": "40-50",
    "reason_for_exam": "A woman with severe upper abdominal pain s/p endoscopy.  // evaluate for free air.",
    "bbox": [
      {
        "bbox_name": "left lower lung zone",
        "original_x": 1364,
        "original_y": 1882,
        "original_width": 777,
        "original_height": 723,
        "x": 119,
        "y": 138,
        "width": 57,
        "height": 53,
        "focuses": [
          "atelectasis",
          ...
        ]
      },
      ...
    ]
  },
  "report_en": "...",
  "qa_en": [
    [
      {
        "Question": "...",
        "Answer": "..."
      },
      ...
    ],
    ...
  ],
  "report_zh": "...",
  "qa_zh": [...]
}
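As a minimal sketch of how the bbox and focuses fields in the record above might be traversed, the snippet below counts how often each (zone, abnormality) pair occurs. The helper name count_focuses_by_zone and the hand-made sample record are illustrative assumptions and are not part of the repository.

```python
from collections import Counter


def count_focuses_by_zone(records):
    """Count occurrences of each (bbox_name, focus) pair across records."""
    counts = Counter()
    for record in records:
        for box in record["image"]["bbox"]:
            for focus in box["focuses"]:
                counts[(box["bbox_name"], focus)] += 1
    return counts


# Hand-made record following the schema above; the values are illustrative only.
sample_records = [
    {
        "image": {
            "bbox": [
                {"bbox_name": "left lower lung zone", "focuses": ["atelectasis"]},
                {"bbox_name": "right lung", "focuses": ["atelectasis", "lung opacity"]},
            ]
        }
    }
]

zone_counts = count_focuses_by_zone(sample_records)
for (zone, focus), n in sorted(zone_counts.items()):
    print(f"{zone}: {focus} x{n}")
```

With the real data, the list returned by read_json could presumably be passed in place of sample_records, assuming the file holds a list of such records.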

Each record in ScqTest contains image, report_en, test_en, report_zh, and test_zh. The image field has the same structure as in EduDiag. test_en and test_zh are the English and Chinese single-choice question banks.

{
  "image": {...},
  "report_en": "...",
  "test_en": [
    [
      {
          "Question": "...",
          "A": "...",
          "B": "...",
          "C": "...",
          "D": "...",
          "GT": "D"
      },
      ...
    ],
    ...
  ],
  "report_zh": "...",
  "test_zh": [...]
}
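One way the GT field could be used is to score a model's chosen option letters against the question bank. The following is a rough sketch under that assumption; score_answers and the sample questions are hypothetical and not taken from the repository's evaluation code.

```python
def score_answers(test_rounds, predictions):
    """Return the fraction of predicted option letters that match "GT".

    test_rounds: list of lists of question dicts, shaped like test_en.
    predictions: list of lists of option letters, aligned with test_rounds.
    """
    correct = 0
    total = 0
    for questions, predicted in zip(test_rounds, predictions):
        for question, letter in zip(questions, predicted):
            total += 1
            # Normalize the prediction so " d " and "D" compare equal.
            if letter.strip().upper() == question["GT"]:
                correct += 1
    return correct / total if total else 0.0


# Illustrative question bank with placeholder text.
sample_test = [
    [
        {"Question": "q1", "A": "a", "B": "b", "C": "c", "D": "d", "GT": "D"},
        {"Question": "q2", "A": "a", "B": "b", "C": "c", "D": "d", "GT": "A"},
    ]
]
sample_predictions = [["d", "B"]]

accuracy = score_answers(sample_test, sample_predictions)
print(f"accuracy = {accuracy:.2f}")
```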

Access

Our datasets are available in the data directory. Both EduDiag and ScqTest are stored in JSON format. They can be loaded and converted as follows:

# divide_data and to_trained_data are assumed to live in the same module as the other helpers
from models.utils.convert_data import seed, read_json, divide_data, convert_for_gen, to_trained_data

# Set random seed
seed(42)

# Load data
data = read_json('data/EduDiag.json')

# Convert the original data into training format
train_dataset, _, _ = divide_data('data/EduDiag.json')
train_data = convert_for_gen(train_dataset, 'en')
to_trained_data(train_data, 'data/train_data.json')
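The internals of divide_data are not shown here. As a rough illustration of what a seeded three-way split can look like, the sketch below shuffles records with a fixed seed and slices them into train, validation, and test sets. The function split_records and the 80/10/10 ratios are assumptions for illustration, not the repository's actual behavior.

```python
import random


def split_records(records, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle records reproducibly and split them into three lists."""
    rng = random.Random(seed)  # local generator, leaves global state untouched
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


train, val, test = split_records(range(10))
print(len(train), len(val), len(test))
```

Using a local random.Random instance keeps the split reproducible regardless of what other code does with the global random state.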

Fine-tuning

Before running the code, you need to install the following dependencies:

pip install -r requirements.txt

Fine-tune using our dataset:

cd MEDSQ
bash models/scripts/finetune_lora.sh

Inference

Use the following code for inference, or run inference.py.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


instruction = 'FINDINGS:The lungs are clear of consolidation. Linear left basilar opacity is most likely atelectasis versus scarring. The cardiomediastinal silhouette is within normal limits. Median sternotomy wires are again noted. There is no free air below the diaphragm.IMPRESSION:No acute cardiopulmonary process. No free intraperitoneal air.\nBased on the above information, answer the question.\nQuestion: Please provide detailed and comprehensive diagnostic results.'
saved_model_path = 'MEDSQ'
tokenizer = AutoTokenizer.from_pretrained(saved_model_path)
model = AutoModelForCausalLM.from_pretrained(saved_model_path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)
response = model.chat(tokenizer, instruction, history=None, eos_token_id=2, pad_token_id=2, temperature=0.3, top_p=0.8, max_length=None, max_new_tokens=512)[0]
print(response)
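The instruction string above appears to follow a fixed pattern: the report text, a bridging sentence, then the question. A small helper can assemble such prompts from other reports and questions. The name build_instruction is hypothetical, and the template is inferred from the single example above rather than confirmed by the repository.

```python
# Template inferred from the example instruction; treat it as an assumption.
PROMPT_BRIDGE = "\nBased on the above information, answer the question.\nQuestion: "


def build_instruction(report, question):
    """Join a report and a question into one instruction string."""
    return f"{report}{PROMPT_BRIDGE}{question}"


example = build_instruction(
    "FINDINGS:The lungs are clear of consolidation.",
    "Please provide detailed and comprehensive diagnostic results.",
)
print(example)
```

The resulting string can be passed as the instruction argument in the inference call shown above.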

Comparison

Ablation study of filtering operations in abnormal area localization scenario.

Assessment of medical students' satisfaction with individual model responses.

Evaluation of the relevance of generated text answers to the Ground Truth.
