Unknown quantization type, got fp8 #35471

Open
2 of 4 tasks
ruidazeng opened this issue Dec 31, 2024 · 19 comments · May be fixed by #35963

@ruidazeng commented Dec 31, 2024

System Info

  • transformers version: 4.47.1
  • Platform: macOS-15.1.1-arm64-arm-64bit
  • Python version: 3.10.16
  • Huggingface_hub version: 0.27.0
  • Safetensors version: 0.4.5
  • Accelerate version: 1.2.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed

Who can help?

@SunMarc @MekkCyber

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The issue arises when calling AutoModelForCausalLM.from_pretrained().

The model used is "deepseek-ai/DeepSeek-V3"

File "/Users/ruidazeng/Demo/chatbot.py", line 13, in init
self.model = AutoModelForCausalLM.from_pretrained(
File "/opt/anaconda3/envs/gaming-bot/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
return model_class.from_pretrained(
File "/opt/anaconda3/envs/gaming-bot/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3659, in from_pretrained
config.quantization_config = AutoHfQuantizer.merge_quantization_configs(
File "/opt/anaconda3/envs/gaming-bot/lib/python3.10/site-packages/transformers/quantizers/auto.py", line 173, in merge_quantization_configs
quantization_config = AutoQuantizationConfig.from_dict(quantization_config)
File "/opt/anaconda3/envs/gaming-bot/lib/python3.10/site-packages/transformers/quantizers/auto.py", line 97, in from_dict
raise ValueError(
ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet']

Expected behavior

To be able to run DeepSeek-R1.

@ruidazeng ruidazeng added the bug label Dec 31, 2024
@ruidazeng ruidazeng changed the title Backwards compatibility Unknown quantization type, got fp8 Dec 31, 2024
@ani1797 commented Jan 7, 2025

I got the same issue today when trying this. 👎🏼 Did you find any fix?

@ruidazeng (Author)

> I got the same issue today when trying this. 👎🏼 Did you find any fix?

No, I was hoping one of the maintainers could help.

@SunMarc (Member) commented Jan 8, 2025

DeepSeek is not supported directly in transformers, only through custom code. The issue here is that they added a quantization_config to config.json, which triggers our quantization checks, and we don't support their fp8 method yet (it has to be used with vLLM). One thing you can try is removing that attribute from config.json! Also, I'm a bit surprised that it works with 4.37.2, can you double-check?
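
For reference, a minimal sketch of that workaround from Python, without editing the file by hand (an illustration only, not an official fix; as reported further down, loading will then complain about unused `weight_scale_inv` weights):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the config, strip the fp8 quantization block that transformers
# doesn't recognize, then pass the cleaned config back to from_pretrained.
config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)
if hasattr(config, "quantization_config"):
    del config.quantization_config

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V3",
    config=config,
    trust_remote_code=True,
)
```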

@ruidazeng (Author)

> DeepSeek is not supported directly in transformers, only through custom code. [...] One thing you can try is removing that attribute from config.json!

4.37.2 did not work either. I got further along with it, but it still failed in the end.

Are there any potential fixes I can try besides removing that attribute from config.json? I will see if I can work out a compatibility fix and open a PR.

@SunMarc (Member) commented Jan 9, 2025

We can potentially just skip the quantization step and emit a warning saying that this specific quantization backend is not supported directly in transformers, and that users can open an issue to request compatibility.
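
A rough sketch of what that fallback could look like (hypothetical code, not the actual transformers implementation; the supported-methods list is abbreviated from the ValueError above):

```python
import logging

logger = logging.getLogger(__name__)

# Abbreviated; the full list appears in the ValueError in the original report.
SUPPORTED_QUANT_METHODS = {
    "awq", "bitsandbytes_4bit", "bitsandbytes_8bit", "gptq", "aqlm", "quanto",
    "eetq", "hqq", "compressed-tensors", "fbgemm_fp8", "torchao", "bitnet",
}

def quantization_config_from_dict(quantization_config: dict):
    """Hypothetical fallback: warn and ignore unknown quant methods instead of raising."""
    quant_method = quantization_config.get("quant_method")
    if quant_method not in SUPPORTED_QUANT_METHODS:
        logger.warning(
            "Unknown quantization type %r; ignoring quantization_config. "
            "Open an issue on transformers to request support for this backend.",
            quant_method,
        )
        return None  # loading proceeds as if the checkpoint were unquantized
    ...  # otherwise dispatch to the matching quantization config class
```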

@Nanayali

Hi, I removed that attribute from config.json, and then I get this error: Some weights of the model checkpoint at /root/DeepSeek-V3 were not used when initializing DeepseekV3ForCausalLM: ['model.layers.0.mlp.down_proj.weight_scale_inv', 'model.layers.0.mlp.gate_proj.weight_scale_inv', ...

@ruidazeng (Author)

> Hi, I removed that attribute from config.json, and then I get this error: Some weights of the model checkpoint at /root/DeepSeek-V3 were not used when initializing DeepseekV3ForCausalLM: [...]

@SunMarc I got the same error when I tried this too.

@SunMarc (Member) commented Jan 10, 2025

Since this is custom code, you will have a better chance of fixing this issue by reaching out to the authors in the community section of their model page.

@AbyssGaze commented Jan 22, 2025

I edited the config.json file by removing or adjusting the parameters related to quantization and custom weight scaling, which allowed the DeepSeek model to load correctly. You can try editing the config.json file directly to resolve the issue, or use a fallback loader like the one below:

import logging
from typing import Any, Dict, Optional, Tuple, Union

from transformers import (
    AutoConfig,
    AutoModel,
    AutoTokenizer,
    PreTrainedModel,
    PreTrainedTokenizer,
)

logger = logging.getLogger(__name__)


def load_model_with_quantization_fallback(
    model_name: str = "deepseek-ai/DeepSeek-R1",
    trust_remote_code: bool = True,
    device_map: Optional[Union[str, Dict[str, Any]]] = "auto",
    **kwargs,
) -> Tuple[PreTrainedModel, PreTrainedTokenizer]:
    try:
        model = AutoModel.from_pretrained(
            model_name,
            trust_remote_code=trust_remote_code,
            device_map=device_map,
            **kwargs,
        )
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        logger.info("Model loaded successfully with original configuration")
        return model, tokenizer
    except ValueError as e:
        if "Unknown quantization type" in str(e):
            logger.warning(
                "Quantization type not supported directly. "
                "Attempting to load without quantization..."
            )
            # Reload the config and strip the unsupported quantization block
            config = AutoConfig.from_pretrained(
                model_name, trust_remote_code=trust_remote_code
            )
            if hasattr(config, "quantization_config"):
                delattr(config, "quantization_config")

            try:
                model = AutoModel.from_pretrained(
                    model_name,
                    config=config,
                    trust_remote_code=trust_remote_code,
                    device_map=device_map,
                    **kwargs,
                )
                tokenizer = AutoTokenizer.from_pretrained(
                    model_name, trust_remote_code=trust_remote_code
                )
                logger.info("Model loaded successfully without quantization")
                return model, tokenizer
            except Exception as inner_e:
                logger.error(f"Failed to load model without quantization: {inner_e}")
                raise
        else:
            logger.error(f"Unexpected error during model loading: {e}")
            raise

@ruidazeng (Author)

> I edited the config.json file by removing or adjusting the parameters related to quantization and custom weight scaling, which allowed the DeepSeek model to load correctly. [...]

Is this something we can PR to the main branch?

@HBMTech commented Jan 29, 2025

I spent the whole day trying to make it work, even going as far as replacing this parameter in config.json:
"quant_method": "bitsandbytes_4bit"
Between the versions of CUDA, transformers, and torch, it's impossible to pinpoint where the real problem is coming from.
Thanks to @SunMarc

@ArthurZucker (Collaborator)

#35926 should add support for this soon!

@ruidazeng (Author) commented Jan 29, 2025

> #35926 should add support for this soon!

Will it support DeepSeek-R1?

@Jiadalee commented Jan 29, 2025

> I edited the config.json file by removing or adjusting the parameters related to quantization and custom weight scaling, which allowed the DeepSeek model to load correctly. [...]

@AbyssGaze how did you modify the config.json file to fix this issue? The code you posted does not seem to be for the config.json file. I would appreciate it if you could clarify your fix.

@gtyellow

> #35926 should add support for this soon!
>
> Will it support DeepSeek-R1?

I'm also having the same issue with DeepSeek-R1.

@SunMarc (Member) commented Jan 29, 2025

> @AbyssGaze how did you modify the config.json file to fix this issue? [...]

#35963 should do that. Instead of raising an error, we will just ignore the quantization config. LMK if this helps.

@zzj0402 commented Jan 29, 2025

> @AbyssGaze how did you modify the config.json file to fix this issue? [...]
>
> #35963 should do that. Instead of raising an error, we will just ignore the quantization config. LMK if this helps.

/.conda/bin/python /home/zing/Projects/inference_ds.py
	  No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).  
	  Using a pipeline without specifying a model name and revision in production is not recommended.  
	  Device set to use cuda  
	  Train data size: 72  
	  Test data size: 37  
/home/zing/Projects/xcwe/icl/inference_ds.py:147: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  descriptions.append(row[0])  # First cell to descriptions
/home/zing/Projects/xcwe/icl/inference_ds.py:148: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  references.append(row[1])    # Second cell to references
/home/zing/Projects/xcwe/icl/inference_ds.py:159: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  "input format: <CVE := Description>": row[0],
/home/zing/Projects/xcwe/icl/inference_ds.py:160: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  "output format: <CWE := Explanation | Highlight>": row[1]
	  Token indices sequence length is longer than the specified maximum sequence length for this model (14574 > 1024). Running this sequence through the model will result in indexing errors  
	  Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.  
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [160,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [160,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
[... the same assertion repeated for many more blocks and threads ...]
	  Traceback (most recent call last):  
	    File "/home/zing/Projects/xcwe/icl/inference_ds.py", line 165, in <module>  
	      prediction = get_cwe(d, demonstrations)  
	                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/icl/inference_ds.py", line 97, in get_cwe
	      outputs = pipeline(
	                ^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/pipelines/text*generation.py", line 285, in __call*_
	      return super().**call**(text_inputs, **kwargs)
	             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1362, in **call**
	      return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
	             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1369, in run_single
	      model_outputs = self.forward(model_inputs, **forward_params)
	                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1269, in forward
	      model_outputs = self._forward(model_inputs, **forward_params)
	                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/pipelines/text_generation.py", line 383, in _forward
	      generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
	                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
	      return func(*args, **kwargs)
	             ^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/generation/utils.py", line 2255, in generate
	      result = self._sample(
	               ^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/generation/utils.py", line 3254, in _sample
	      outputs = self(**model_inputs, return_dict=True)
	                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
	      return self._call_impl(*args, **kwargs)
	             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
	      return forward_call(*args, **kwargs)
	             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1061, in forward
	      transformer_outputs = self.transformer(
	                            ^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
	      return self._call_impl(*args, **kwargs)
	             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
	      return forward_call(*args, **kwargs)
	             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 829, in forward
	      attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
	                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py", line 378, in _prepare_4d_causal_attention_mask_for_sdpa
	      ignore_causal_mask = AttentionMaskConverter._ignore_causal_mask_sdpa(
	                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py", line 288, in _ignore_causal_mask_sdpa
	      elif not is_tracing and torch.all(attention_mask == 1):
	                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  
	  RuntimeError: CUDA error: device-side assert triggered  
	  CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.  
	  For debugging consider passing CUDA_LAUNCH_BLOCKING=1  
	  Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

@amankumarhal commented Jan 30, 2025

I think this is an issue with specific transformers versions. I tried 4.44 and 4.48 and neither worked. Then I tried 4.46.1, which works without any issues, and I can download deepseek-ai/DeepSeek-R1.
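
For anyone who wants to try the same pin (assuming a pip-managed environment):

```
pip install transformers==4.46.1
```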

@Jiadalee

> @AbyssGaze how did you modify the config.json file to fix this issue? [...]
>
> #35963 should do that. Instead of raising an error, we will just ignore the quantization config. LMK if this helps.

@SunMarc I simply removed the quantization section from the config.json file, and then it works! The tricky thing is that downloading the R1 model is quite slow; I expect it will take a couple of hours.
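
If you would rather do that edit programmatically, here is a minimal sketch over a local snapshot (the path is a placeholder):

```python
import json
from pathlib import Path

# Placeholder path to a local download of the model.
config_path = Path("DeepSeek-R1/config.json")

config = json.loads(config_path.read_text())
config.pop("quantization_config", None)  # drop the fp8 block that transformers rejects
config_path.write_text(json.dumps(config, indent=2))
```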
