
Issue with Trainer Training - AttributeError: 'NoneType' object has no attribute 'shape' #36054

Closed
Md-Nasif03 opened this issue Feb 5, 2025 · 7 comments

@Md-Nasif03

Description:

I am working on fine-tuning an image-text model using the Hugging Face AutoModelForImageTextToText and LlavaProcessor. While attempting to train the model using the SFTTrainer, I encountered an error related to a NoneType object during the training loop. The error occurs specifically in the _merge_input_ids_with_image_features method in the modeling_llava.py file.

Note:

I have loaded the data (JSON) from my Google Drive.

Error Details:

AttributeError: 'NoneType' object has no attribute 'shape'

(Screenshot of the full traceback attached.)

Error Occurrence:

The error occurs after calling trainer.train(): during training, the image_features value passed into the _merge_input_ids_with_image_features function is None, which causes the AttributeError when the code tries to access its .shape.

Code Snippet Leading to the Error:

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    args=sft_config,
)

# Train model
trainer.train()

Relevant Model Function:

The error occurs within the following function in modeling_llava.py:

def _merge_input_ids_with_image_features(self, image_features, inputs_embeds, input_ids, attention_mask, labels):
    num_images, num_image_patches, embed_dim = image_features.shape  # Error here
    batch_size, sequence_length = input_ids.shape
    # Further processing...

Potential Causes:

  • image_features might not be properly processed or passed to the model.
  • The image preprocessing function might not return the correct features, or the dataset might not have the expected structure.

Request:

  1. Could you help me troubleshoot this issue and suggest how to fix the NoneType error?
  2. What might cause the image_features variable to be None?
  3. How can I ensure that image_features is properly populated and passed to the model? (A quick sanity check is sketched right after this list.)
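
A minimal sanity check (a sketch, not part of my original script, assuming the processed train_dataset produced by the code below) is to confirm that each mapped example still carries pixel_values, since the model builds image_features from them:

import torch

# Hypothetical check: verify that a processed training example still contains
# pixel_values; if it does not, the model ends up with image_features=None.
sample = train_dataset[0]
print(sample.keys())  # expected: input_ids, attention_mask, pixel_values, labels
assert sample.get("pixel_values") is not None, "pixel_values missing from the processed example"
print(torch.as_tensor(sample["pixel_values"]).shape)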

Code:

Load the base model

model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1

Load LLaMA tokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training
processor = LlavaProcessor.from_pretrained(model_name)

Prompt Template

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "is there any fracture"},
            {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

Reload the Dataset from Google Drive

import json
with open("/content/drive/MyDrive/fineTune model1/sub_datset1.json", "r", encoding="utf-8") as f:
reloaded_dataset1 = json.load(f)
with open("/content/drive/MyDrive/fineTune model1/sub_datset2.json", "r", encoding="utf-8") as f:
reloaded_dataset2 = json.load(f)
with open("/content/drive/MyDrive/fineTune model1/sub_datset3.json", "r", encoding="utf-8") as f:
reloaded_dataset3 = json.load(f)

Converting to Hugging Face Dataset

from datasets import Dataset

# Convert the reformatted data into a Hugging Face Dataset
hf_dataset1 = Dataset.from_dict({
    "image": [item["image"] for item in reloaded_dataset1],
    "question": [item["question"] for item in reloaded_dataset1],
    "answer": [item["answer"] for item in reloaded_dataset1]
})

# Convert the reformatted data into a Hugging Face Dataset
hf_dataset2 = Dataset.from_dict({
    "image": [item["image"] for item in reloaded_dataset2],
    "question": [item["question"] for item in reloaded_dataset2],
    "answer": [item["answer"] for item in reloaded_dataset2]
})

# Convert the reformatted data into a Hugging Face Dataset
hf_dataset3 = Dataset.from_dict({
    "image": [item["image"] for item in reloaded_dataset3],
    "question": [item["question"] for item in reloaded_dataset3],
    "answer": [item["answer"] for item in reloaded_dataset3]
})

Merge the dataset

from datasets import concatenate_datasets

# Concatenate the datasets
merged_dataset1 = concatenate_datasets([hf_dataset1, hf_dataset2, hf_dataset3])

# Print the size of the merged dataset
print(len(merged_dataset1))

Split the Dataset into Train, Validation, and Test

# Import the required function from the sklearn.model_selection module
from sklearn.model_selection import train_test_split
from datasets import DatasetDict

# Convert Hugging Face Dataset to Pandas DataFrame for splitting
df = merged_dataset1.to_pandas()

# Split into train (80%) and temp (20%)
train_df, temp_df = train_test_split(df, test_size=0.2, random_state=42)

# Split temp into validation (10%) and test (10%)
val_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)

print(f"Train size: {len(train_df)}, Validation size: {len(val_df)}, Test size: {len(test_df)}")

Convert back to Hugging Face Dataset

train_dataset = Dataset.from_pandas(train_df)
val_dataset = Dataset.from_pandas(val_df)
test_dataset = Dataset.from_pandas(test_df)

Create DatasetDict

final_dataset = DatasetDict({
    "train": train_dataset,
    "validation": val_dataset,
    "test": test_dataset
})

Preprocessing Function

from PIL import Image
import base64
from io import BytesIO
import torch

# Define your preprocess function
def preprocess_function(samples):
    # # Debugging: Print the type and first image entry in the batch
    # print(f"Type of samples['image']: {type(samples['image'])}")
    # print(f"First image entry (base64): {samples['image'][0]}")

    # Initialize an empty list for images
    images = []

    # Decode and process each image
    for img_data in samples["image"]:
        if isinstance(img_data, str):  # Assuming base64 encoding
            try:
                # Decode the image from base64 and convert to RGB
                img = Image.open(BytesIO(base64.b64decode(img_data))).convert("RGB")
            except Exception as e:
                print(f"Error loading base64 image: {e}")
                img = None
        elif isinstance(img_data, Image.Image):  # If it's already a PIL Image object
            img = img_data.convert("RGB")
        else:
            print(f"Unsupported image type: {type(img_data)}")
            img = None

        if img is not None:
            images.append(img)
        else:
            print("Image could not be processed or is None.")

    # Now, process the question and images using your processor
    inputs = processor(
        text=samples["question"],
        images=images,
        return_tensors="pt",
        padding="max_length",
        truncation=True,
        max_length=512
    )

    # Ensure the processor tokenizes the answer correctly
    labels = processor.tokenizer(
        text=samples["answer"],
        return_tensors="pt",
        padding="max_length",
        truncation=True,
        max_length=512
    )["input_ids"]

    # Add labels to the input dictionary
    inputs["labels"] = torch.tensor(labels)

    # Debugging: Check if pixel_values is present and has the correct shape
    print(f"Inputs dictionary: {inputs.keys()}")
    if "pixel_values" in inputs:
        print(f"Shape of pixel_values: {inputs['pixel_values'].shape}")
    else:
        print("pixel_values not found in inputs.")
    return inputs

Data preprocessing

train_dataset = final_dataset['train']
test_dataset = final_dataset['test']
eval_dataset=final_dataset["validation"]
print(train_dataset.column_names)

#output: ['image', 'question', 'answer', 'index_level_0']

Apply preprocessing function

train_dataset = train_dataset.map(preprocess_function)

# Now remove the unnecessary columns
train_dataset = train_dataset.remove_columns(["image", "question", "answer", "index_level_0"])

# Set the format to PyTorch tensors
train_dataset.set_format(type="torch")

test_dataset = test_dataset.map(preprocess_function, remove_columns=["image", "question", "answer"])
test_dataset.set_format(type="torch")

eval_dataset = eval_dataset.map(preprocess_function, remove_columns=["image", "question", "answer"])
eval_dataset.set_format(type="torch")
print(train_dataset.column_names)

#output: ['input_ids', 'attention_mask', 'pixel_values', 'labels']

Prepare for fine-tuning

from trl import SFTConfig
from trl.trainer.utils import ConstantLengthDataset

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"]
)

sft_config = SFTConfig(
    # SFT-specific settings
    max_seq_length=max_seq_length,
    dataset_text_field="text",
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=False,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
)

tokenizer.chat_template = "default"

def formatting_func(example):
    if isinstance(example["input_ids"], torch.Tensor):
        return example["input_ids"].squeeze().tolist()
    elif isinstance(example["input_ids"], list):  # Check if it's already a list
        return example["input_ids"]  # Return as is
    elif isinstance(example["input_ids"], dict):  # Check if it's a dictionary
        return example["input_ids"].get("input_ids", [])  # Attempt to extract input_ids if it's a dictionary
    else:
        return []  # Return an empty list in other cases

train_dataset = ConstantLengthDataset(
    tokenizer,
    train_dataset,
    formatting_func=formatting_func,
    seq_length=128,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    args=sft_config,
)

# Train model
trainer.train()
@zucchini-nlp
Member

@Md-Nasif03 hey! I believe you are using an old version of transformers. The method you mentioned for merging has been deprecated for ~5 releases and is already removed in v4.48. Can you please try to update transformers with pip install -U transformers?
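
A quick way to confirm the upgrade took effect (a generic check, not specific to this project):

import transformers

# After `pip install -U transformers`, this should print 4.48.0 or newer,
# where the deprecated merging method mentioned above is already removed.
print(transformers.__version__)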

@Md-Nasif03
Author

@zucchini-nlp Thank you, ma'am.

@Md-Nasif03
Author

ValueError: Image features and image tokens do not match: tokens: 575, features 576

After fine-tuning, I saved my model to Google Drive and loaded it for testing, where I got this error:
ValueError: Image features and image tokens do not match: tokens: 575, features 576

Code:

user_question = "What is the situation shown in the picture?"
# Define the Chat Prompt
processor1.chat_template = f"USER: {user_question} <image> ASSISTANT:"
conversation = [
    {

      "role": "user",
      "content": [
          {"type": "text", "text": user_question},
          {"type": "image"},
        ],
    },
]
prompt = processor1.apply_chat_template(conversation, add_generation_prompt=True)

Get Patch Size from Model Config

patch_size = new_model_v1.config.vision_config.patch_size

# Resize image while keeping aspect ratio
shortest_edge = processor1.image_processor.size.get("shortest_edge", 336)
original_width, original_height = raw_image.size
scale_factor = shortest_edge / min(original_width, original_height)
new_width = int(original_width * scale_factor)
new_height = int(original_height * scale_factor)

# width & height are multiples of `patch_size`
new_width = (new_width // patch_size) * patch_size
new_height = (new_height // patch_size) * patch_size

# Resize the image
raw_image = raw_image.resize((new_width, new_height))
print(f"Resized Image to: {new_width}x{new_height}")

output: Resized Image to: 406x336

Process Inputs

inputs = processor1(images=raw_image, text=prompt, return_tensors="pt")
inputs = {k: v.to("cuda:0") for k, v in inputs.items()}

output = new_model_v1.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=200,
    do_sample=False
)

Error:
ValueError: Image features and image tokens do not match: tokens: 575, features 576

(Screenshot of the error attached.)

@zucchini-nlp
Member

Can you share your model files on the hub? I believe something is wrong in the config, probably you need to add processor.num_additional_tokens = 1 or change vision_feature_select_strategy
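
For intuition, the rough arithmetic behind the off-by-one (a sketch based on how recent LlavaProcessor releases count image tokens; the 336-pixel crop and patch size 14 are the values reported later in this thread, and the attribute name follows the comment above):

# The vision tower sees a 336x336 crop split into 14x14 patches.
patches_per_side = 336 // 14             # 24
num_features = patches_per_side ** 2     # 576 image features produced by the model

# Processor side: image tokens = patches + num_additional_tokens, minus one when
# vision_feature_select_strategy == "default" (the CLS token is dropped).
num_additional_tokens = 0                                      # value currently in the config
num_image_tokens = num_features + num_additional_tokens - 1    # 575 -> "tokens: 575, features 576"
# Setting num_additional_tokens = 1 makes both sides equal 576.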

@Md-Nasif03
Author

Md-Nasif03 commented Feb 6, 2025

Good morning @zucchini-nlp ma'am, I have just uploaded my model to the Hugging Face Hub.

  • Issue: ValueError: Image features and image tokens do not match: tokens: 575, features 576
  • model name: mdnasif/LLaVA-med-MAKAUT
  • model link: LLaVA-med-MAKAUT Model

You can check it there; please see what is going wrong.

Also, I have included the details of how I saved the model to Google Drive after fine-tuning and the list of model files saved on the Hub.

Saving the fine-tuned model to Google Drive

import os
save_path = "/content/drive/MyDrive/fineTune model1/LLaVA-med-MAKAUT_v1"
os.makedirs(save_path, exist_ok=True)

trainer.model.save_pretrained(save_path)
trainer.tokenizer.save_pretrained(save_path)
processor.image_processor.save_pretrained(save_path)

List files in the model directory

for file in os.listdir(save_path):
    print(file)

Output:

README.md
adapter_model.safetensors
adapter_config.json
tokenizer_config.json
special_tokens_map.json
added_tokens.json
tokenizer.model
tokenizer.json
preprocessor_config.json

Copy the necessary file to our fine-tuned model directory

!pip install huggingface_hub

from huggingface_hub import hf_hub_download
import os

# Define the base model and the files you need
base_model_name = "llava-hf/llava-1.5-7b-hf"
files_to_copy = ["config.json"]

# Define the destination directory (your fine-tuned model directory)
destination_dir = "/content/drive/MyDrive/fineTune model1/LLaVA-med-MAKAUT_v1"

# Create the destination directory if it doesn't exist
os.makedirs(destination_dir, exist_ok=True)

# Download and copy each file
for filename in files_to_copy:
    # Download the file from the base model
    downloaded_file = hf_hub_download(repo_id=base_model_name, filename=filename)

    # Copy the file to the destination directory
    !cp "{downloaded_file}" "{destination_dir}/"

Check if "pytorch_model.bin" exists in the main directory

model_path = "/content/drive/MyDrive/fineTune model1/LLaVA-med-MAKAUT_v1"
weights_path = os.path.join(model_path, "pytorch_model.bin")
if not os.path.exists(weights_path):
    # If not, search for it in subdirectories
    for root, _, files in os.walk(model_path):
        if "pytorch_model.bin" in files:
            weights_path = os.path.join(root, "pytorch_model.bin")
            break


List files in the model directory

for file in os.listdir(save_path):
    print(file)

Output:

README.md
adapter_model.safetensors
adapter_config.json
tokenizer_config.json
special_tokens_map.json
added_tokens.json
tokenizer.model
tokenizer.json
preprocessor_config.json
config.json

Fine-tuned model

import torch
from PIL import Image
from transformers import LlavaProcessor, LlavaForConditionalGeneration, CLIPImageProcessor
model_path = "/content/drive/MyDrive/fineTune model1/LLaVA-med-MAKAUT_v1"

Load the LlavaProcessor with the correct image processor

processor1 = LlavaProcessor.from_pretrained(model_path)

# Move model to GPU
new_model_v1 = LlavaForConditionalGeneration.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map=device_map
).to("cuda:0")
print(new_model_v1.config.vision_config.patch_size)
#output: 14
print(processor1.patch_size)
#output: None

Manually update missing attributes

processor1.patch_size = new_model_v1.config.vision_config.patch_size
processor1.vision_feature_select_strategy = new_model_v1.config.vision_feature_select_strategy
processor1.num_additional_tokens = 1

Save the updated processor back to the model directory

processor1.save_pretrained(model_path)

Reload the processor

processor1 = LlavaProcessor.from_pretrained(model_path)

print(processor1.patch_size)
print("Processor vision_feature_select_strategy:", processor1.vision_feature_select_strategy)

Output:

patch size: 14
Processor vision_feature_select_strategy: Default

Prompt

# Define User Question
user_question = "What is the situation shown in the picture?"
# Define the Chat Prompt
processor1.chat_template = f"USER: {user_question} <image> ASSISTANT:"
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": user_question},
            {"type": "image"},
        ],
    },
]
prompt = processor1.apply_chat_template(conversation, add_generation_prompt=True)

Resize the image

import requests
from PIL import Image
image_file="/content/drive/MyDrive/fineTune model1/image-1d100e9.jpg"
raw_image=Image.open(image_file).convert("RGB")
# Get Patch Size from Model Config
patch_size = new_model_v1.config.vision_config.patch_size
# Resize image while keeping aspect ratio
shortest_edge = processor1.image_processor.size.get("shortest_edge", 336)
original_width, original_height = raw_image.size
scale_factor = shortest_edge / min(original_width, original_height)
new_width = int(original_width * scale_factor)
new_height = int(original_height * scale_factor)

# width & height are multiples of `patch_size`
new_width = (new_width // patch_size) * patch_size
new_height = (new_height // patch_size) * patch_size

# Resize the image
raw_image = raw_image.resize((new_width, new_height))
print(f"Resized Image to: {new_width}x{new_height}")

Output: Resized Image to: 406x336

Process Inputs

inputs = processor1(images=raw_image, text=prompt, return_tensors="pt")
inputs = {k: v.to("cuda:0") for k, v in inputs.items()}

Generate Response from the Model

# output = new_model_v1.generate(**inputs, max_new_tokens=200, do_sample=False)
# Generate Response Using Correct Input Format
output = new_model_v1.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=200,
    do_sample=False
)
(Screenshot of the resulting error attached.)

@zucchini-nlp
Member

Your processor_config.json on the hub has num_additional_tokens set to 0, but it should be 1. Make sure to change that and save again; that should solve the problem.
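
A minimal sketch of that fix (assuming the attribute name matches the key in processor_config.json and that you have write access to the repo):

from transformers import LlavaProcessor

processor1 = LlavaProcessor.from_pretrained("mdnasif/LLaVA-med-MAKAUT")
processor1.num_additional_tokens = 1                # was 0 in processor_config.json
processor1.push_to_hub("mdnasif/LLaVA-med-MAKAUT")  # or save_pretrained(...) locally and re-upload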

@Md-Nasif03
Author

@zucchini-nlp thank you, ma'am. It works.
