Add files via upload #497
Conversation
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Looks promising at a glance! cc @MoritzLaurer |
Would recommend versioning (pinning) the key libraries to avoid issues with breaking changes in the future.
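A minimal sketch of what pinning could look like in the notebook's first cell; the version numbers below are illustrative placeholders, not the versions the notebook was actually tested with:

```python
# Pin key libraries so future breaking changes don't silently break the notebook.
# NOTE: these versions are illustrative; record the versions you actually tested with.
%pip install "setfit==1.0.*" "datasets==2.18.*" "scikit-learn==1.4.*" "transformers==4.39.*"
```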
Provide a bit more context on where this figure comes from and what it represents, i.e. zero-shot results for the 3 generative LLMs and a fine-tuned RoBERTa trained on the zero-shot synthetic data from Mixtral (CoT + SC), ~1800 data rows/texts.
Use consistent terminology: pseudo labels or synthetic data (or better: explain that pseudo labels are synthetic data).
Do you mean "download" instead of "upload"?
I would also slightly reformulate to make it clear that by "skip the training step" you mean skipping the code two cells further down. (You could even add an if/else to let people choose, as in the sketch below.)
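A minimal sketch of that if/else, assuming a SetFit model; the flag name and checkpoint repo id are hypothetical placeholders:

```python
from setfit import SetFitModel

# Hypothetical flag: set True to run the training cells further down instead of loading a checkpoint.
TRAIN_FROM_SCRATCH = False

if TRAIN_FROM_SCRATCH:
    pass  # fall through to the training code two cells further down
else:
    # Load the already fine-tuned model; this repo id is a placeholder.
    model = SetFitModel.from_pretrained("your-username/setfit-fsa-checkpoint")
```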
Interesting, I didn't know the WOS approach/metric. I suppose that word order is only one thing that transformers take into account; another aspect would be the semantic (dis)similarity of different strings. With a CountVectorizer you only capture the exact words that are in the training corpus, so it can't capture semantically similar words that fall outside the training data distribution.
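A small illustration of that limitation; the example sentences are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer

# The vectorizer only knows tokens seen during fit.
vec = CountVectorizer().fit(["profits rose sharply", "revenue fell slightly"])

# A semantically similar sentence with no token overlap maps to an all-zero vector.
print(vec.transform(["earnings increased strongly"]).toarray())  # [[0 0 0 0 0 0]]
```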
Line #11: "in average" should be "on average": `print('The WOS implies that on average {:0.1f}% of the sentences in the financial sentiment analysis (FSA) dataset are rather simple.\n'.format(100 - 100 * WOS))`
typo "prbabilities".
I would maybe also add a note somewhere that this is less likely to work on more complex reasoning tasks (I'd assume e.g. that countvectorizers just can't represent complex semantics / classes well enough)
Interesting! (Worth noting that you are also increasing the size of the MLP here, in addition to adding more training data; maybe make that (small) increase in size explicit, e.g. as in the sketch below.)
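A minimal sketch of making the capacity change explicit, assuming a scikit-learn MLP head; the layer sizes are illustrative placeholders, not the notebook's actual values:

```python
from sklearn.neural_network import MLPClassifier

# Baseline head trained on the smaller dataset (size is illustrative).
mlp_base = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)

# Slightly larger head used once more training data is added; state this change explicitly.
mlp_large = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0)
```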
Looks interesting and good to me. I would assume that this works less well for more complex tasks, and that the additional step of distilling from the SetFit model takes more developer time, but overall it's a cool approach for further compressing the model and making inference much more efficient.
Thanks @MoritzLaurer for the comments. We updated the notebook accordingly.
Great! Don't have bandwidth for a joint blog atm unfortunately. |
Hi @tomaarsen I think we are good to go and merge into main. |
Hi @tomaarsen Could you merge into main? |
@MosheWasserb My apologies for the radio silence here, I was very busy with https://github.com/UKPLab/sentence-transformers/releases/tag/v2.6.0 |
Hi @tomaarsen Sure, no problem.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# model after fine-tuning
X_train = model.encode(x_train)

# PCA down to 2D vectors (fit_transform added so the snippet actually runs)
estimator = PCA(n_components=2)
X_train_em = estimator.fit_transform(X_train)

# Logistic regression, 2nd phase
sgd = LogisticRegression()
```
Thank you! PCA remains strong indeed, especially for classification. It doesn't work very well for retrieval, however; there I've had more luck with 1. Matryoshka models and 2. quantization to speed up the comparisons.
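A minimal sketch of the embedding quantization mentioned above, assuming the `quantize_embeddings` helper shipped in sentence-transformers v2.6.0; the model name is only an example:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

# Any Sentence Transformer works here; this model name is just an example.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["Profits rose sharply.", "Revenue fell slightly."])

# Binary quantization: much smaller embeddings, compared via fast Hamming distance.
binary_embeddings = quantize_embeddings(embeddings, precision="binary")
print(binary_embeddings.shape, binary_embeddings.dtype)
```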
Adding a new notebook that demonstrates zero-cost, zero-time, zero-shot Financial Sentiment Analysis:
from GPT4/Mixtral to MLP128K with SetFit.
@tomaarsen Could you also send it to Moritz Laurer for review?