a small bug with trocr example and large model #1645

Closed
fsa3z opened this issue Feb 1, 2024 · 3 comments

fsa3z commented Feb 1, 2024

Hi,

  • Problem: The example works well with the base model, but fails with the large model.
cargo run --example trocr --release --  --which large --cpu --image candle-examples/examples/trocr/assets/trocr.png
model: "/huggingface/hub/models--microsoft--trocr-large-handwritten/snapshots/f07eb3a73a9b06a73141dba2ae1f1671c5c346af/model.safetensors"
Error: shape mismatch for encoder.embeddings.cls_token, expected: [1, 1, 768], got: [1, 1, 1024]

The problem comes from:

let encoder_config = match args.which {
    Which::Base => candle_transformers::models::vit::Config::microsoft_trocr_base_handwritten(),
    Which::Large => {
        candle_transformers::models::vit::Config::microsoft_trocr_base_handwritten()
    }
};

Which::Large is built with the same config as Which::Base, so the encoder expects 768-wide tensors while the large checkpoint provides 1024-wide ones.
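To illustrate the kind of fix this points to, here is a minimal sketch in which each variant gets its own config. It does not use the candle API: `Which`, `EncoderConfig`, `encoder_config`, and `cls_token_shape` are hypothetical stand-ins, and the only values taken from this report are the two hidden sizes in the error message (768 and 1024).

```rust
/// Hypothetical stand-in for the example's `--which` argument.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Which {
    Base,
    Large,
}

/// Hypothetical, trimmed-down encoder config; a real ViT config has more fields.
#[derive(Debug, PartialEq)]
pub struct EncoderConfig {
    pub hidden_size: usize,
}

/// Each variant returns its own config instead of both arms returning the base one.
pub fn encoder_config(which: Which) -> EncoderConfig {
    match which {
        // 768 is the width the loader expected in the error above.
        Which::Base => EncoderConfig { hidden_size: 768 },
        // 1024 is the width actually found in the large checkpoint.
        Which::Large => EncoderConfig { hidden_size: 1024 },
    }
}

/// Shape a loader would expect for `encoder.embeddings.cls_token`.
pub fn cls_token_shape(cfg: &EncoderConfig) -> [usize; 3] {
    [1, 1, cfg.hidden_size]
}
```

With this sketch, `cls_token_shape(&encoder_config(Which::Large))` matches the `[1, 1, 1024]` tensor reported for the large checkpoint, which is the condition the buggy match could never satisfy.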

katopz added a commit to katopz/candle that referenced this issue Feb 10, 2024

katopz commented Feb 10, 2024

Thanks for the heads-up. I've already made a PR that also adds support for the printed models.

Working:

cargo run --example trocr --release --  --which base-hand-written --cpu --image candle-examples/examples/trocr/assets/trocr.png
cargo run --example trocr --release --  --which large-hand-written --cpu --image candle-examples/examples/trocr/assets/trocr.png
cargo run --example trocr --release --  --which base-printed --cpu --image candle-examples/examples/trocr/assets/printed-number.jpg

Remaining bug:

cargo run --example trocr --release --  --which large-printed --cpu --image candle-examples/examples/trocr/assets/printed-number.jpg

which fails with:

Error: cannot find tensor decoder.model.decoder.embed_positions.weight

Any idea on this one?

LaurentMazare commented

I've just merged #1689, which gets the config from the HF hub instead of using a hardcoded one. This should make it easier to support more models in the future if compatible architectures appear.
For the large-printed model, the tricky part is that the position embeddings are not learnt but hardcoded in the model. I've made the error message more specific about this and will look at adding support for it.
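As a rough illustration of what "hardcoded" position embeddings can look like, the sketch below computes a sinusoidal table from a formula rather than loading it as a weight tensor, which would be consistent with the checkpoint having no `embed_positions.weight` to find. This assumes the standard sinusoidal scheme from the original Transformer paper; the exact layout used by the large-printed checkpoint may differ, and `sinusoidal_positions` is a hypothetical helper, not a candle function.

```rust
/// Build a `[num_positions][dim]` table of sinusoidal position embeddings.
/// Assumption: even columns hold sin, odd columns hold cos, and each
/// (2k, 2k + 1) pair shares the frequency 1 / 10000^(2k / dim).
pub fn sinusoidal_positions(num_positions: usize, dim: usize) -> Vec<Vec<f32>> {
    (0..num_positions)
        .map(|pos| {
            (0..dim)
                .map(|i| {
                    // Round the column index down to its even partner.
                    let exponent = (2 * (i / 2)) as f32 / dim as f32;
                    let angle = pos as f32 / 10000f32.powf(exponent);
                    if i % 2 == 0 {
                        angle.sin()
                    } else {
                        angle.cos()
                    }
                })
                .collect()
        })
        .collect()
}
```

Because the table depends only on the number of positions and the embedding width, a loader could build it at model construction time instead of reading it from the safetensors file.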

LaurentMazare commented

Closing this now as hopefully it's all good; feel free to re-open if you run into further issues.
