Starcoder2-15B model - AttributeError: 'TensorParallelColumnLinear' object has no attribute 'base_layer' #2881

Open

ashwincv0112 opened this issue Jan 6, 2025 · 6 comments

@ashwincv0112

System Info

Using the below TGI version:
ghcr.io/huggingface/text-generation-inference:3.0.1

Running on an AWS g5.12xlarge instance (which has 4 GPUs)

model used: bigcode/starcoder2-15b-instruct-v0.1

Deployment: Using docker

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

We are trying to deploy the Starcoder2-15B instruct model with custom fine-tuned LoRA adapters using TGI's multi-LoRA capability.
We are using an AWS g5.12xlarge instance for this.
We have our base model and LoRA adapters saved in the data directory. We then ran the docker command below.

docker run -it \
  --gpus all \
  --shm-size 1g \
  -v /home/ubuntu/data:/data \
  -p 8080:8080 \
  ghcr.io/huggingface/text-generation-inference:3.0.1 \
  --model-id=/data/starcoder2-15b-instruct-v0.1 \
  --lora-adapters=adapter=/data/starcoder2-15b-lora-adapter \
  --dtype bfloat16 \
  --num-shard 4
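
Once the server is up, we intend to select the adapter per request via the adapter_id parameter (the name "adapter" comes from the --lora-adapters flag above). For reference, a request would look roughly like the sketch below, assuming the server is reachable on localhost:8080:

curl 127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": "def fibonacci(n):",
    "parameters": {"max_new_tokens": 64, "adapter_id": "adapter"}
  }'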

Requirement:
Base Model: bigcode/starcoder2-15b-instruct-v0.1
Custom LoRA adapters.
AWS g5.12xlarge instance.

When running the above docker command, we get the following error:

[rank3]: /opt/conda/lib/python3.11/site-packages/text_generation_server/adapters/lora.py:209 in prepare_weights
[rank3]:
[rank3]:   206         for layer_id in range(nlayers):
[rank3]:   207             key = (layer_id, layer_type)
[rank3]:   208             weight_name, layer = target_to_layer[key]
[rank3]: ❱ 209             base_weight = layer.base_layer.linear.weight
[rank3]:   210             base_device = base_weight.device
[rank3]:   211
[rank3]:   212             if weight_name not in module_map:
[rank3]:
[rank3]: (locals at the failing frame, truncated)
[rank3]:   config = LoraConfig(
[rank3]:       base_model_name_or_path='bigcode/starcoder2-15b',
[rank3]:       r=8,
[rank3]:       target_modules={
[rank3]:           'o_proj',
[rank3]:           'up_proj',
[rank3]:           'k_proj',
[rank3]:           'v_proj',
[rank3]:           'gate_proj',
[rank3]:           'q_proj',
[rank3]:           'down_proj'
[rank3]:       },
[rank3]:       fan_in_fan_out=False,
[rank3]:       lora_alpha=8,
[rank3]:       use_rslora=False
[rank3]:   )
[rank3]:   dtype = torch.bfloat16
[rank3]:   key = (0, 'q_proj')
[rank3]:   layer = TensorParallelColumnLinear(
[rank3]:     (linear): FastLinear()
[rank3]:   )
[rank3]:   layer_id = 0
[rank3]:   layer_type = 'q_proj'
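
From the locals above, prepare_weights assumes each target layer keeps the original linear reachable under base_layer, but for this model it receives a bare TensorParallelColumnLinear whose weight lives directly at layer.linear.weight. A minimal sketch of that attribute mismatch (hypothetical stand-in classes, not the actual TGI implementation):

# Hypothetical stand-ins illustrating the attribute layout implied by the
# traceback; these are not the real TGI classes.

class FastLinear:
    def __init__(self):
        self.weight = object()  # stand-in for the weight tensor


class TensorParallelColumnLinear:
    def __init__(self):
        self.linear = FastLinear()  # weight lives at layer.linear.weight


class LoraAwareLayer:
    """The shape prepare_weights expects: the original layer kept under
    `base_layer`, so `layer.base_layer.linear.weight` resolves."""
    def __init__(self, base_layer):
        self.base_layer = base_layer


wrapped = LoraAwareLayer(TensorParallelColumnLinear())
_ = wrapped.base_layer.linear.weight  # OK: matches the access at lora.py:209

bare = TensorParallelColumnLinear()
try:
    _ = bare.base_layer.linear.weight  # what happens for this model
except AttributeError as e:
    print(e)  # 'TensorParallelColumnLinear' object has no attribute 'base_layer'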

Also, one observation: in the file below we can see the Starcoder2-15B instruct model mentioned, and it was our understanding that the model is supported for the multi-LoRA functionality in TGI.

https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/models/__init__.py
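
(For a quick check from a local clone of the repo:)

git clone https://github.com/huggingface/text-generation-inference
grep -n -i "starcoder2" text-generation-inference/server/text_generation_server/models/__init__.py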

Please let us know if there are any gaps in our understanding.

If the Starcoder2-15B model is supported, could you help resolve the above issue?

Thanks.

Expected behavior

The model should be deployed along with the multi-lora TGI functionality.

@drbh (Collaborator) commented Jan 7, 2025

Hi @ashwincv0112, thank you for opening this issue. It appears that the Starcoder2 modeling code has not been updated to handle multi-LoRA correctly. I've started a PR with changes to enable multi-LoRA; please see the PR for details: #2883.

Thank you!

@ashwincv0112 (Author) commented:

Hi @drbh, thank you for the quick fix. Appreciate it.
We will wait for the changes to be reviewed and merged to main.
Also, just confirming: would the changes be reflected in version 3.0.1 automatically if I install it again?

@vsoesanto commented:

Hi @ashwincv0112, @drbh, I'm getting the same error using Mixtral-8x7B-v0.1. I am also using custom LoRA adapters loaded from local storage.

Is Mixtral-8x7B supported for multi-LoRA serving? What can I do to make this work?

@drbh (Collaborator) commented Jan 13, 2025

Hi @ashwincv0112, the PR should be merged soon - just working on adding some tests today. Regarding the versioning once the changes are merged: the changes will not be contained in 3.0.1, as we do not change past versions. However, the latest source on main is always available as a docker container here: https://github.com/huggingface/text-generation-inference/pkgs/container/text-generation-inference. Note it's important to specify the sha tag and not latest, to ensure the correct image is pulled.
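
For example (the tag below is a placeholder; the exact tag for a given commit can be found on that registry page):

# Pull a specific main-branch build by its commit tag instead of :latest
docker pull ghcr.io/huggingface/text-generation-inference:sha-1234567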

@vsoesanto thank you for reporting this, mixtral should support lora - would you be able to share an example of the error message? Thank you!

@vsoesanto commented Jan 13, 2025

The error message is identical to what @ashwincv0112 got above: AttributeError: 'TensorParallelColumnLinear' object has no attribute 'base_layer'. I also received the same error trace as in the two code blocks that OP shared.

My base model is Mixtral-8x7B-v0.1. I am also using custom LoRA adapters loaded from local storage, and a docker run command similar to the one above, with tgi==3.0.1.

Looking at the file OP linked, I see Mixtral-8x22B-Instruct-v0.1 but not Mixtral-8x7B-v0.1.

Looking at the files changed in your PR above, I don't see the changes applied to Starcoder2 also applied to the file for Mixtral. But maybe I'm not looking at the right file.

Any insight would be very helpful.

@mehravehs commented:

Have there been any updates on this? @drbh
