Questions about SkipBERT #1

Open
jonsaadfalcon opened this issue Aug 3, 2022 · 2 comments

@jonsaadfalcon

Hello, I was interested in experimenting with the SkipBERT architecture. I was wondering if you could help with a few questions about implementation:

  • In the example in the README, it seems like an input needs to be run at inference and then run again before we see the speed-up. Assuming the model checkpoint is a SkipBERT model that has already been fine-tuned on the training set, could we still see the same improvement in inference time without running that specific input twice?
  • Is there any script or notebook for training our own SkipBERT models?
  • The second model checkpoint on the GitHub page, SkipBERT6+4, does not seem to be working. I was wondering if the checkpoint was available for use.

Thank you for the help!

@LorrinWWW
Owner

Thanks for your interest in our work!

> In the example in the README, it seems like an input needs to be run at inference and then run again before we see the speed-up. Assuming the model checkpoint is a SkipBERT model that has already been fine-tuned on the training set, could we still see the same improvement in inference time without running that specific input twice?

A: The input tri-grams need to be cached before you see the speed-up when config.plot_mode = 'plot_passive'.
If we allow some tri-grams to be OOV, we can set config.plot_mode = 'plot_only', in which case we see a constant speed-up (though accuracy will suffer if the OOV rate is too high).

Regarding 'plot_mode':

  • force_compute: computes tri-grams on demand (usually used for training).
  • update_all: computes tri-grams, bi-grams, and uni-grams and writes them to PLOT (usually used to update PLOT).
  • plot_passive: uses PLOT if there is no OOV; otherwise it uses the GPU to compute tri-grams and writes them to PLOT.
  • plot_only: uses PLOT only. Lookup order: trigram -> bigram -> unigram -> 0 (see the sketch below).
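To make the lookup order concrete, here is a minimal sketch; the dictionary-based plot store, the bi-gram/uni-gram key layout, and the hidden size 768 are assumptions for illustration, not the repo's actual PLOT implementation:

```python
import torch

HIDDEN_SIZE = 768  # assumed hidden size, for illustration only

def lookup_plot_only(plot: dict, left: int, center: int, right: int) -> torch.Tensor:
    """Fallback lookup for 'plot_only' mode: trigram -> bigram -> unigram -> 0.

    `plot` is a hypothetical store mapping n-gram token-id tuples to
    precomputed hidden states; the repo's actual PLOT format may differ.
    """
    for key in (
        (left, center, right),  # full tri-gram
        (center, right),        # bi-gram fallback (assumed key layout)
        (center,),              # uni-gram fallback
    ):
        if key in plot:
            return plot[key]
    return torch.zeros(HIDDEN_SIZE)  # OOV at every level: fall back to zeros
```

In 'plot_passive' mode, the final fallback would instead compute the missing tri-gram on the GPU and write the result back to PLOT.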

> Is there any script or notebook for training our own SkipBERT models?

A: The code for training SkipBERT is under general_distillation.

We use distillation to train SkipBERT, but it should also be feasible to train with an MLM objective or another pretraining scheme.
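For reference, a generic two-part distillation objective (MSE on hidden states plus temperature-scaled KL on logits, in the TinyBERT style) looks roughly like the sketch below; this is an illustrative assumption, not necessarily the exact loss used in general_distillation:

```python
import torch.nn.functional as F

def distill_loss(student_hiddens, teacher_hiddens,
                 student_logits, teacher_logits, T: float = 1.0):
    """Generic distillation loss sketch: hidden-state MSE plus
    temperature-scaled soft cross-entropy on the logits.

    Assumes the student/teacher hidden states are already
    dimension-matched (e.g. via a learned projection, omitted here).
    """
    hidden_loss = sum(F.mse_loss(s, t)
                      for s, t in zip(student_hiddens, teacher_hiddens))
    logit_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction='batchmean',
    ) * (T * T)
    return hidden_loss + logit_loss
```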

> The second model checkpoint on the GitHub page, SkipBERT6+4, does not seem to be working. I was wondering if the checkpoint was available for use.

A: We are sorry for the mistake. It was supposed to have been released earlier. We will upload it again soon.

@jonsaadfalcon
Author

Thank you for the information! Regarding training our own SkipBERT models, I'm assuming we are supposed to use the run_train.sh script. I tried configuring it to run training, but I'm having some issues with the dependencies and setup on a single-GPU system. I was wondering if there are any example instructions for getting started.

Additionally, is it possible to calculate the OOV rate at inference and see what percentage of the tri-grams encountered are OOV?
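For concreteness, I'm imagining a counter along these lines, where plot_keys is a hypothetical handle to the set of cached tri-gram keys:

```python
def trigram_oov_rate(token_ids: list, plot_keys: set) -> float:
    """Fraction of input tri-grams missing from the PLOT cache.

    `plot_keys` is a hypothetical set of cached tri-gram id tuples;
    I don't know how the cache keys are actually exposed in the repo.
    """
    trigrams = [tuple(token_ids[i:i + 3]) for i in range(len(token_ids) - 2)]
    if not trigrams:
        return 0.0
    misses = sum(1 for tri in trigrams if tri not in plot_keys)
    return misses / len(trigrams)
```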
