fully support float16 and bfloat16 for embeddings (or at least float16) #3155
Correct me if I'm mistaken, but it seems that the only way to get embeddings in bfloat16 or float16 would be through the complex process of specifying None for the "precision" parameter and so on, according to the visualization below. I'd be interested in hearing from you, for better or for worse, whether you'll explicitly support outputting float16 and/or bfloat16 embeddings, not merely being able to specify the "dtype" of the embedding model. Thanks. Mermaid graph as a courtesy:

```mermaid
graph TD
A[The embeddings will initially have the same datatype as the embedding model, but this can be controlled with the 'dtype' parameter] --> B{Initial Datatype<br>e.g. float32/float16/bfloat16<br><br>Initially a PyTorch tensor}
B -->|Initially float32| C{convert_to_numpy parameter?}
B -->|Initially float16| D{convert_to_numpy parameter?}
B -->|Initially bfloat16| E{convert_to_numpy parameter?}
C -->|True| F[Convert to float32 numpy array]
C -->|False| G[Keep as float32 PyTorch tensor]
C -->|Not Specified| G
D -->|True| H[Convert to float32 numpy array]
D -->|False| I[Keep as float16 PyTorch tensor]
D -->|Not Specified| I
E -->|True| J[Convert to float32 numpy array]
E -->|False| K[Keep as bfloat16 PyTorch tensor]
E -->|Not Specified| K
F --> L{Precision parameter within the 'encode' method?}
G --> L
H --> L
I --> L
J --> L
K --> L
L -->|'None' Value Used| M[Keep current datatype and format]--> S
L -->|Parameter not used| N[Originally float16 embedding converted to float16 numpy array.<br><br>Originally float32 or bfloat16 embeddings converted to float32 numpy array]--> S
L -->|Explicit value used| O{Precision parameter accepts 'float32,' 'int8,' 'uint8,' 'binary,' and 'ubinary'}
O -->|float32| P[Converted to float32 numpy array]--> S
O -->|int8/uint8| Q[Converted to float32 numpy array then Linear Quantization to 8-bit integers]--> S
O -->|binary/ubinary| R[Converted to float32 numpy array then Binary Quantization packed bits]--> S
S{convert_to_tensor parameter?}
S -->|True| T[Converted to a single stacked PyTorch tensor<br><br>Overrides any 'convert_to_numpy' setting]
S -->|False| U{convert_to_numpy parameter?}
S -->|Not Specified| U
U -->|True| V[Convert to a single numpy array]
U -->|False| W[Remain a list of PyTorch tensors]
U -->|Not Specified| W
style A fill:#2C3E50,stroke:#fff,color:#fff
style B fill:#34495E,stroke:#fff,color:#fff
style C fill:#34495E,stroke:#fff,color:#fff
style D fill:#34495E,stroke:#fff,color:#fff
style E fill:#34495E,stroke:#fff,color:#fff
style F fill:#2980B9,stroke:#fff,color:#fff
style G fill:#2980B9,stroke:#fff,color:#fff
style H fill:#2980B9,stroke:#fff,color:#fff
style I fill:#2980B9,stroke:#fff,color:#fff
style J fill:#2980B9,stroke:#fff,color:#fff
style K fill:#2980B9,stroke:#fff,color:#fff
style L fill:#34495E,stroke:#fff,color:#fff
style M fill:#16A085,stroke:#fff,color:#fff
style N fill:#16A085,stroke:#fff,color:#fff
style O fill:#34495E,stroke:#fff,color:#fff
style P fill:#16A085,stroke:#fff,color:#fff
style Q fill:#16A085,stroke:#fff,color:#fff
style R fill:#16A085,stroke:#fff,color:#fff
style S fill:#34495E,stroke:#fff,color:#fff
style T fill:#27AE60,stroke:#fff,color:#fff
style U fill:#34495E,stroke:#fff,color:#fff
style V fill:#27AE60,stroke:#fff,color:#fff
style W fill:#27AE60,stroke:#fff,color:#fff
```
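For anyone who wants to check the flow above locally, here is a minimal sketch. It assumes model.half() (inherited from torch.nn.Module) is how the model gets cast to float16, as in the snippet later in this thread; the printed dtypes aren't asserted here, the point is just to inspect what encode() returns under each setting. Note that float16 inference on CPU may not work on older torch versions.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
model.half()  # cast the underlying torch modules to float16

# Default path: convert_to_numpy=True, no explicit precision.
emb_default = model.encode("Hello!")
print(type(emb_default), emb_default.dtype)

# Skip the numpy conversion and keep the raw torch tensor.
emb_torch = model.encode("Hello!", convert_to_numpy=False)
print(type(emb_torch), emb_torch.dtype)

# Explicit precision: only float32, int8, uint8, binary, ubinary are accepted today.
emb_f32 = model.encode("Hello!", precision="float32")
print(type(emb_f32), emb_f32.dtype)
```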
Since bfloat16 support is a little more difficult and might require TensorFlow, can we at least get direct support for saving embeddings as float16? That would halve the storage required without having to rely on quantized embeddings, for those of us who want a middle ground. See, e.g., this pymilvus example for the TensorFlow issue: https://github.com/milvus-io/pymilvus/blob/master/examples/datatypes/bfloat16_example.py
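As a stopgap, the float32 output can be cast down to float16 before saving, which does halve the storage, although the embeddings are still computed and returned in float32 first, so it is not the same as native float16 support. A rough sketch (the 384-dimension and byte counts assume all-MiniLM-L6-v2):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["first sentence", "second sentence"]

emb32 = model.encode(sentences)       # float32 numpy array, shape (2, 384)
emb16 = emb32.astype(np.float16)      # same values, half the bytes

print(emb32.nbytes, emb16.nbytes)     # 3072 vs. 1536 for these two sentences
np.save("embeddings_fp16.npy", emb16)
```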
Any thoughts on this?
Apologies for the delay, I've been a bit distracted with my Static Embeddings blogpost. I'm a little unsure about what you'd like to change. These are the outputs:
The value marked with a * is the only one that's perhaps a bit unexpected. In short, I believe that this is not true:
Snippets:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
# model.half()
model.bfloat16()

output = model.encode("Hello!", convert_to_tensor=True)
print(type(output))
print(output.dtype)
```
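For comparison, here is the same experiment sketched with the default numpy path instead of convert_to_tensor=True. NumPy has no bfloat16 dtype, so the array can't stay in bfloat16 regardless of what the model was cast to; the prints simply show what it ends up as:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
model.bfloat16()

# convert_to_numpy defaults to True; numpy cannot represent bfloat16,
# so the returned array necessarily has some other dtype.
output = model.encode("Hello!")
print(type(output))
print(output.dtype)
```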
I'll review this and may or may not follow up... but coincidentally, I was actually just reading your blog post about the new kinds of embedding models.
Currently, sentence-transformers' encode method only supports 'float32' and select quantized formats (int8, uint8, binary, ubinary) when returning embeddings. However, many embedding models can create embeddings in float16 and bfloat16, and in fact Sentence Transformers supports loading models with those dtypes.
Further, I understand that float16 is supported by NumPy but bfloat16 is not.
Therefore, even if a model is loaded into Sentence Transformers with a dtype of float16 or bfloat16 and initially produces embeddings in that dtype, the embeddings themselves end up converted to float32 or one of the quantized formats. There is currently no way to keep them in float16 or bfloat16 throughout the entire process, even though many vector databases can handle these datatypes.
NumPy apparently already supports float16, and the ml_dtypes library (https://github.com/jax-ml/ml_dtypes) could in theory be used to also support bfloat16.
Can we please at least get "float16" accepted as a "precision" parameter argument, since NumPy already supports it? It would be nice to support bfloat16 as well, though.
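To illustrate the ml_dtypes point, here is a small sketch of what the library adds on the NumPy side; this only shows how bfloat16 can be represented in numpy arrays, not anything sentence-transformers does today:

```python
import numpy as np
import ml_dtypes  # https://github.com/jax-ml/ml_dtypes

# ml_dtypes provides a numpy-compatible bfloat16 dtype.
x = np.array([0.1, 0.2, 0.3], dtype=np.float32)
x_bf16 = x.astype(ml_dtypes.bfloat16)

print(x_bf16.dtype)               # bfloat16
print(x_bf16.itemsize)            # 2 bytes per element, vs. 4 for float32
print(x_bf16.astype(np.float32))  # cast back up, with the usual rounding loss
```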