Unknown quantization type, got fp8 #35471

Open
2 of 4 tasks
ruidazeng opened this issue Dec 31, 2024 · 19 comments · May be fixed by #35963

@ruidazeng commented Dec 31, 2024

System Info

  • transformers version: 4.47.1
  • Platform: macOS-15.1.1-arm64-arm-64bit
  • Python version: 3.10.16
  • Huggingface_hub version: 0.27.0
  • Safetensors version: 0.4.5
  • Accelerate version: 1.2.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed

Who can help?

@SunMarc @MekkCyber

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The issue arises when calling AutoModelForCausalLM.from_pretrained().

The model used is "deepseek-ai/DeepSeek-V3"

File "/Users/ruidazeng/Demo/chatbot.py", line 13, in init
self.model = AutoModelForCausalLM.from_pretrained(
File "/opt/anaconda3/envs/gaming-bot/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
return model_class.from_pretrained(
File "/opt/anaconda3/envs/gaming-bot/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3659, in from_pretrained
config.quantization_config = AutoHfQuantizer.merge_quantization_configs(
File "/opt/anaconda3/envs/gaming-bot/lib/python3.10/site-packages/transformers/quantizers/auto.py", line 173, in merge_quantization_configs
quantization_config = AutoQuantizationConfig.from_dict(quantization_config)
File "/opt/anaconda3/envs/gaming-bot/lib/python3.10/site-packages/transformers/quantizers/auto.py", line 97, in from_dict
raise ValueError(
ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet']

Expected behavior

To be able to run DeepSeek-R1.

@ruidazeng ruidazeng added the bug label Dec 31, 2024
@ruidazeng ruidazeng changed the title Backwards compatibility Unknown quantization type, got fp8 Dec 31, 2024
@ani1797 commented Jan 7, 2025

I got the same issue today when trying this. 👎🏼 Did you find any fix?

@ruidazeng (Author)

> I got the same issue today when trying this. 👎🏼 Did you find any fix?

No, I was hoping one of the maintainers could help.

@SunMarc (Member) commented Jan 8, 2025

DeepSeek is not supported directly in transformers, only through custom code. The issue here is that they added a quantization_config to config.json, which triggers our quantization checks, and we don't support their fp8 method yet (it has to be used with vLLM). One thing you can try is removing that attribute from config.json! Also, I'm a bit surprised that it works with 4.37.2, can you double-check?
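
For reference, a minimal sketch of that workaround from Python, without editing the file by hand (an illustration only, not an official fix; as reported further down, loading will then complain about unused `weight_scale_inv` weights):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the config, strip the fp8 quantization block that transformers
# doesn't recognize, then pass the cleaned config back to from_pretrained.
config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)
if hasattr(config, "quantization_config"):
    del config.quantization_config

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V3",
    config=config,
    trust_remote_code=True,
)
```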

@ruidazeng (Author)

> DeepSeek is not supported directly in transformers, only through custom code. [...] One thing you can try is removing that attribute from config.json!

4.37.2 did not work either. I got further along with it, but it still failed in the end.

Are there any potential fixes I can try besides removing that attribute from config.json? I will see if I can work out a compatibility fix and open a PR.

@SunMarc (Member) commented Jan 9, 2025

We can potentially just skip the quantization step and emit a warning saying that this specific quantization backend is not supported directly in transformers, and that users can open an issue to request compatibility.
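
A rough sketch of what that fallback could look like (hypothetical code, not the actual transformers implementation; the supported-methods list is abbreviated from the ValueError above):

```python
import logging

logger = logging.getLogger(__name__)

# Abbreviated; the full list appears in the ValueError in the original report.
SUPPORTED_QUANT_METHODS = {
    "awq", "bitsandbytes_4bit", "bitsandbytes_8bit", "gptq", "aqlm", "quanto",
    "eetq", "hqq", "compressed-tensors", "fbgemm_fp8", "torchao", "bitnet",
}

def quantization_config_from_dict(quantization_config: dict):
    """Hypothetical fallback: warn and ignore unknown quant methods instead of raising."""
    quant_method = quantization_config.get("quant_method")
    if quant_method not in SUPPORTED_QUANT_METHODS:
        logger.warning(
            "Unknown quantization type %r; ignoring quantization_config. "
            "Open an issue on transformers to request support for this backend.",
            quant_method,
        )
        return None  # loading proceeds as if the checkpoint were unquantized
    ...  # otherwise dispatch to the matching quantization config class
```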

@Nanayali

Hi, I removed that attribute from config.json, and then I get this error: Some weights of the model checkpoint at /root/DeepSeek-V3 were not used when initializing DeepseekV3ForCausalLM: ['model.layers.0.mlp.down_proj.weight_scale_inv', 'model.layers.0.mlp.gate_proj.weight_scale_inv', ...

@ruidazeng (Author)

> Hi, I removed that attribute from config.json, and then I get this error: Some weights of the model checkpoint at /root/DeepSeek-V3 were not used when initializing DeepseekV3ForCausalLM: [...]

@SunMarc I got the same error when I tried this too.

@SunMarc (Member) commented Jan 10, 2025

Since this is custom code, you will have a better chance of fixing this issue by reaching out to the authors in the community section of their model page.

@AbyssGaze commented Jan 22, 2025

I edited the config.json file by removing or adjusting the parameters related to quantization and custom weight scaling, which allowed the DeepSeek model to load correctly. You can try editing the config.json file directly to resolve the issue, or use a fallback loader like the one below:

import logging
from typing import Any, Dict, Optional, Tuple, Union

from transformers import (
    AutoConfig,
    AutoModel,
    AutoTokenizer,
    PreTrainedModel,
    PreTrainedTokenizer,
)

logger = logging.getLogger(__name__)


def load_model_with_quantization_fallback(
    model_name: str = "deepseek-ai/DeepSeek-R1",
    trust_remote_code: bool = True,
    device_map: Optional[Union[str, Dict[str, Any]]] = "auto",
    **kwargs,
) -> Tuple[PreTrainedModel, PreTrainedTokenizer]:
    try:
        model = AutoModel.from_pretrained(
            model_name,
            trust_remote_code=trust_remote_code,
            device_map=device_map,
            **kwargs,
        )
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        logger.info("Model loaded successfully with original configuration")
        return model, tokenizer
    except ValueError as e:
        if "Unknown quantization type" in str(e):
            logger.warning(
                "Quantization type not supported directly. "
                "Attempting to load without quantization..."
            )
            # Reload the config and strip the unsupported quantization block
            config = AutoConfig.from_pretrained(
                model_name, trust_remote_code=trust_remote_code
            )
            if hasattr(config, "quantization_config"):
                delattr(config, "quantization_config")

            try:
                model = AutoModel.from_pretrained(
                    model_name,
                    config=config,
                    trust_remote_code=trust_remote_code,
                    device_map=device_map,
                    **kwargs,
                )
                tokenizer = AutoTokenizer.from_pretrained(
                    model_name, trust_remote_code=trust_remote_code
                )
                logger.info("Model loaded successfully without quantization")
                return model, tokenizer
            except Exception as inner_e:
                logger.error(f"Failed to load model without quantization: {inner_e}")
                raise
        else:
            logger.error(f"Unexpected error during model loading: {e}")
            raise

@ruidazeng (Author)

> I edited the config.json file by removing or adjusting the parameters related to quantization and custom weight scaling, which allowed the DeepSeek model to load correctly. [...]

Is this something we can PR to the main branch?

@HBMTech commented Jan 29, 2025

I spent the whole day trying to make it work, even going as far as replacing this parameter in config.json:
"quant_method": "bitsandbytes_4bit"
Between the versions of CUDA, transformers, and torch, it's impossible to pinpoint where the real problem is coming from.
Thanks to @SunMarc

@ArthurZucker (Collaborator)

#35926 should add support for this soon!

@ruidazeng (Author) commented Jan 29, 2025

> #35926 should add support for this soon!

Will it support DeepSeek-R1?

@Jiadalee commented Jan 29, 2025

> I edited the config.json file by removing or adjusting the parameters related to quantization and custom weight scaling, which allowed the DeepSeek model to load correctly. [...]

@AbyssGaze how did you modify the config.json file to fix this issue? The code you posted does not seem to be for the config.json file. I would appreciate it if you could clarify your fix.

@gtyellow

> #35926 should add support for this soon!
>
> Will it support DeepSeek-R1?

I'm also having the same issue with DeepSeek-R1.

@SunMarc (Member) commented Jan 29, 2025

> @AbyssGaze how did you modify the config.json file to fix this issue? [...]

#35963 should do that. Instead of raising an error, we will just ignore the quantization config. LMK if this helps.

@zzj0402 commented Jan 29, 2025

> @AbyssGaze how did you modify the config.json file to fix this issue? [...]
>
> #35963 should do that. Instead of raising an error, we will just ignore the quantization config. LMK if this helps.

/.conda/bin/python /home/zing/Projects/inference_ds.py
	  No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).  
	  Using a pipeline without specifying a model name and revision in production is not recommended.  
	  Device set to use cuda  
	  Train data size: 72  
	  Test data size: 37  
/home/zing/Projects/xcwe/icl/inference_ds.py:147: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  descriptions.append(row[0])  # First cell to descriptions
/home/zing/Projects/xcwe/icl/inference_ds.py:148: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  references.append(row[1])    # Second cell to references
/home/zing/Projects/xcwe/icl/inference_ds.py:159: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  "input format: <CVE := Description>": row[0],
/home/zing/Projects/xcwe/icl/inference_ds.py:160: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  "output format: <CWE := Explanation | Highlight>": row[1]
	  Token indices sequence length is longer than the specified maximum sequence length for this model (14574 > 1024). Running this sequence through the model will result in indexing errors  
	  Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.  
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [160,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [160,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
[... the same assertion repeated for many more blocks and threads ...]
	  Traceback (most recent call last):  
	    File "/home/zing/Projects/xcwe/icl/inference_ds.py", line 165, in <module>  
	      prediction = get_cwe(d, demonstrations)  
	                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/icl/inference_ds.py", line 97, in get_cwe
	      outputs = pipeline(
	                ^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/pipelines/text*generation.py", line 285, in __call*_
	      return super().**call**(text_inputs, **kwargs)
	             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1362, in **call**
	      return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
	             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1369, in run_single
	      model_outputs = self.forward(model_inputs, **forward_params)
	                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1269, in forward
	      model_outputs = self._forward(model_inputs, **forward_params)
	                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/pipelines/text_generation.py", line 383, in _forward
	      generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
	                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
	      return func(*args, **kwargs)
	             ^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/generation/utils.py", line 2255, in generate
	      result = self._sample(
	               ^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/generation/utils.py", line 3254, in _sample
	      outputs = self(**model_inputs, return_dict=True)
	                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
	      return self._call_impl(*args, **kwargs)
	             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
	      return forward_call(*args, **kwargs)
	             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1061, in forward
	      transformer_outputs = self.transformer(
	                            ^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
	      return self._call_impl(*args, **kwargs)
	             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
	      return forward_call(*args, **kwargs)
	             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 829, in forward
	      attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
	                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py", line 378, in _prepare_4d_causal_attention_mask_for_sdpa
	      ignore_causal_mask = AttentionMaskConverter._ignore_causal_mask_sdpa(
	                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    File "/home/zing/Projects/xcwe/.conda/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py", line 288, in _ignore_causal_mask_sdpa
	      elif not is_tracing and torch.all(attention_mask == 1):
	                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  
	  RuntimeError: CUDA error: device-side assert triggered  
	  CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.  
	  For debugging consider passing CUDA_LAUNCH_BLOCKING=1  
	  Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

@amankumarhal commented Jan 30, 2025

I think this is an issue with specific transformers versions. I tried 4.44 and 4.48 and neither worked. Then I tried 4.46.1, which works without any issues, and I can download deepseek-ai/DeepSeek-R1.
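
For anyone who wants to try the same pin (assuming a pip-managed environment):

```
pip install transformers==4.46.1
```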

@Jiadalee

> @AbyssGaze how did you modify the config.json file to fix this issue? [...]
>
> #35963 should do that. Instead of raising an error, we will just ignore the quantization config. LMK if this helps.

@SunMarc I simply removed the quantization section from the config.json file, and then it works! The tricky thing is that downloading the R1 model is quite slow; I expect it will take a couple of hours.
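
If you would rather do that edit programmatically, here is a minimal sketch over a local snapshot (the path is a placeholder):

```python
import json
from pathlib import Path

# Placeholder path to a local download of the model.
config_path = Path("DeepSeek-R1/config.json")

config = json.loads(config_path.read_text())
config.pop("quantization_config", None)  # drop the fp8 block that transformers rejects
config_path.write_text(json.dumps(config, indent=2))
```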
