[Build] 0.18.0 release breaks Hummingbird build pipeline #20715

Open
ksaur opened this issue May 17, 2024 · 5 comments
Labels
build, ep:CUDA, ep:MIGraphX, ep:ROCm, ep:TensorRT, stale

ksaur commented May 17, 2024

Describe the issue

With the release of 0.18.0, we are having issues with the Transpose op:

>           sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
E           onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (/_operators.0/Transpose) Op (Transpose) [TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {

Could you please point us toward the changes that might have broken our build? Thank you!

Please see microsoft/hummingbird#770

Urgency

This is blocking the Microsoft Hummingbird runners.

Target platform

all

Build script

This is part of the Hummingbird build, which depends on onnxruntime. Could you please point us to the relevant changes in your 0.18.0 build?

Error / output

self = <onnxruntime.capi.onnxruntime_inference_collection.InferenceSession object at 0x7fb91dde3e90>
providers = [], provider_options = [], disabled_optimizers = None

    def _create_inference_session(self, providers, provider_options, disabled_optimizers=None):
        available_providers = C.get_available_providers()
    
        # Tensorrt can fall back to CUDA if it's explicitly assigned. All others fall back to CPU.
        if "TensorrtExecutionProvider" in available_providers:
            if providers and any(
                provider == "CUDAExecutionProvider"
                or (isinstance(provider, tuple) and provider[0] == "CUDAExecutionProvider")
                for provider in providers
            ):
                self._fallback_providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
            else:
                self._fallback_providers = ["CPUExecutionProvider"]
        # MIGraphX can fall back to ROCM if it's explicitly assigned. All others fall back to CPU.
        elif "MIGraphXExecutionProvider" in available_providers:
            if providers and any(
                provider == "ROCMExecutionProvider"
                or (isinstance(provider, tuple) and provider[0] == "ROCMExecutionProvider")
                for provider in providers
            ):
                self._fallback_providers = ["ROCMExecutionProvider", "CPUExecutionProvider"]
            else:
                self._fallback_providers = ["CPUExecutionProvider"]
        else:
            self._fallback_providers = ["CPUExecutionProvider"]
    
        # validate providers and provider_options before other initialization
        providers, provider_options = check_and_normalize_provider_args(
            providers, provider_options, available_providers
        )
    
        session_options = self._sess_options if self._sess_options else C.get_default_session_options()
    
        self._register_ep_custom_ops(session_options, providers, provider_options, available_providers)
    
        if self._model_path:
            sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
        else:
>           sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
E           onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (/_operators.0/Transpose) Op (Transpose) [TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {

Visual Studio Version

No response

GCC / Compiler Version

No response

ksaur added the build label on May 17, 2024.
github-actions bot added the ep:CUDA, ep:MIGraphX, ep:ROCm, and ep:TensorRT labels on May 17, 2024.
edgchen1 (Contributor)

sophies927 (Contributor)

@snnn @yufenglee @jywu-msft @pranavsharma for visibility

jywu-msft (Member) commented May 18, 2024

This looks like it is due to an update to the Transpose opset 21 spec.
See https://onnx.ai/onnx/operators/text_diff_Transpose_13_21.html for the differences between Transpose opset 13 and opset 21.
The following sentence was added to the description of the perm attribute:
"Its length must be equal to the rank of the input."
It looks like that is now being enforced (see @edgchen1's link above).
From the main error message,
"[TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {"
the input shape seems to be missing. I guess the Transpose nodes in the model don't conform to the new spec.
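The rule quoted above can be sketched as a small standalone check. This is a hypothetical illustration in plain Python, not the actual ONNX or onnxruntime shape-inference code; the function names `check_transpose_perm` and `default_perm`, the use of strings for dynamic (symbolic) axes, and the convention that `None` means "rank unknown" are all assumptions made for this sketch.

```python
from typing import List, Optional, Sequence, Union

# Hypothetical dimension type for this sketch:
#   int -> static size, str -> symbolic (dynamic) axis such as "batch"
Dim = Union[int, str]


def check_transpose_perm(perm: Sequence[int], input_shape: Optional[Sequence[Dim]]) -> None:
    """Raise ValueError if `perm` breaks the opset-21 rule
    "Its length must be equal to the rank of the input."

    `input_shape=None` models an input whose rank is unknown; in that case this
    sketch skips the check (an assumption, not documented ONNX behavior).
    """
    if input_shape is None:
        return
    rank = len(input_shape)  # symbolic dims still count toward the rank
    if len(perm) != rank:
        raise ValueError(
            f"Invalid attribute perm {list(perm)}: length {len(perm)} != input rank {rank}"
        )


def default_perm(rank: int) -> List[int]:
    """Transpose's behavior when `perm` is omitted: reverse the dimensions."""
    return list(reversed(range(rank)))


if __name__ == "__main__":
    check_transpose_perm([1, 0], ("batch", 4))  # rank 2 with a dynamic axis: OK
    print(default_perm(3))                       # [2, 1, 0]
    try:
        check_transpose_perm([1, 0], ())         # empty shape (rank 0): rejected
    except ValueError as err:
        print(err)
```

Under these assumptions, a dynamic batch axis alone does not trip the check, because the input is still rank 2. An empty shape does trip it, which is consistent with (but does not prove) the reading that the Transpose input reached shape inference without its shape.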

ksaur (Author) commented May 20, 2024

Thanks so much for the response and for looking into it! :)

Digging a bit more, I see some warnings about "[ShapeInferenceError] Inference error(s)". Did anything change in the way dynamic axes work? (I put some debug notes here.) Thanks!!

ksaur added a commit to microsoft/hummingbird that referenced this issue May 20, 2024
See #770 and microsoft/onnxruntime#20715  

We need to investigate what's going on with the dynamic args
Contributor

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label on Jun 19, 2024.