Performance issue in IndexFlatIP #4121
Hi Museum7432! I don't have access to the colab link so can't check your setup.
Hi, @pankajsingh88. Weird, I can access that notebook normally even in a private tab.
You're right, the issue is mainly that Faiss does not scale well with the number of threads. After reading the Faiss code, the cause seems to be that IndexFlatIP (and similar flat indexes) processes each query separately in its own thread. That means each core has to read the whole index from RAM into its cache (each core has its own cache in a multicore system), which doesn't scale well with the number of threads.
I suspect it is the memory bandwidth bottleneck that slows Faiss down on large indexes with more than 2 threads. I tested my implementation in the VBS competition (~3 million vectors/keyframes with 1024 dimensions), and the average latency scaled linearly with the number of threads: ~1.7 s for 50 queries with 8 threads. On the same dataset, Faiss latency stayed at a constant 7.5 s regardless of the number of threads. So I don't think Faiss is limited by OpenMP's performance, but rather by the memory bandwidth between the CPU and RAM, and my implementation simply requires less bandwidth. I also saw a similar performance degradation when I switched to processing each query in a separate thread, as Faiss does, so that is most likely the cause. Here is my search function in case you can't access the notebook.
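(The actual function isn't captured in this extract. Below is a minimal sketch, assuming NumPy, of the chunk-the-index approach described above; the function name, chunking scheme, and thread pool are illustrative, not the author's code.)

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def chunked_ip_search(xb, xq, topk=5, n_threads=8):
    """Inner-product top-k search that splits the INDEX across threads,
    so each segment of xb is streamed through a core's cache once for
    all queries, instead of once per query per core.
    xb: (ntotal, d) index vectors, xq: (nq, d) queries."""
    bounds = np.linspace(0, xb.shape[0], n_threads + 1, dtype=int)

    def scan(lo, hi):
        # Inner products of all queries against this index segment only.
        sims = xq @ xb[lo:hi].T                      # (nq, hi - lo)
        k = min(topk, hi - lo)
        part = np.argpartition(-sims, k - 1, axis=1)[:, :k]
        return part + lo, np.take_along_axis(sims, part, axis=1)

    # NumPy's matmul releases the GIL, so a thread pool gives real parallelism.
    with ThreadPoolExecutor(n_threads) as pool:
        results = list(pool.map(lambda b: scan(*b), zip(bounds[:-1], bounds[1:])))

    # Merge the per-segment candidates into a global top-k per query.
    ids = np.concatenate([r[0] for r in results], axis=1)
    sims = np.concatenate([r[1] for r in results], axis=1)
    order = np.argsort(-sims, axis=1)[:, :topk]
    return np.take_along_axis(sims, order, axis=1), np.take_along_axis(ids, order, axis=1)
```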
Thanks for the code pointer link! Can you point me to the location in the Faiss code where you see the queries being iterated over? While I follow the index-copy argument in theory, I am not able to locate it in Faiss.
Sorry for the late reply. As it turns out, I was looking at the memory bandwidth usage of Faiss (in GB/s) with 19 queries; the number of threads used increases over time, and the first segment is the initialization of the index (starting at the 6th second): The same benchmark but with 20 queries: With 19 queries, Faiss seems to use the whole memory bandwidth and scales with the number of threads. But weirdly, with 20 queries, it only uses a constant 5 GB/s. For reference, here is the same benchmark with my custom implementation (which uses single-threaded
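A minimal sketch, assuming faiss-cpu and NumPy, of how the 19-query vs. 20-query comparison above could be timed (the sizes and thread counts are illustrative, not the setup used for the bandwidth measurements):

```python
import time
import numpy as np
import faiss

d, ntotal, topk = 512, 1_000_000, 5                # illustrative sizes
xb = np.random.default_rng(0).random((ntotal, d), dtype=np.float32)
index = faiss.IndexFlatIP(d)
index.add(xb)

for n_threads in (1, 2, 4, 8):
    faiss.omp_set_num_threads(n_threads)
    for nq in (19, 20):                            # the two cases compared above
        xq = np.random.default_rng(nq).random((nq, d), dtype=np.float32)
        t0 = time.perf_counter()
        index.search(xq, topk)
        print(f"threads={n_threads} nq={nq}: {time.perf_counter() - t0:.3f} s")
```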
This issue is stale because it has been open for 7 days with no activity.
Summary
IndexFlatIP doesn't seem to parallelize effectively: in a benchmark I wrote to compare the two, my custom flat index is somehow 2x to 3x faster than IndexFlatIP (depending on the number of threads). I expected mine to be slower than Faiss, since it does the same thing under the hood with fewer optimizations, not faster like this.
Platform
OS: Linux
Faiss version: 1.9.0
Installed from: faiss-cpu from anaconda and pip
Running on:
Interface:
Reproduction instructions
Here is the colab notebook for reproducing the problem.
It could be caused by the memory access pattern in IndexFlatIP (for example, why does the function fvec_inner_products_by_idx parallelize over the queries and not the ids? I can't read the code for dispatch_knn_ResultHandler, though, so I can't say it is caused by that). A possible explanation is that processing each query in a separate thread means reading the whole index into the cache of every core on each search, which creates a bottleneck. My implementation instead processes all the queries against a segment of the index in each thread, and so does not require as much memory bandwidth.
This issue becomes really noticeable on machines with a high core count.
Here is how IndexFlatIP scales with the number of threads (ntotal=1000000, d=512, n_queries=100, topk=5):
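A minimal reproduction sketch with the parameters above, assuming faiss-cpu and NumPy (this is not the original colab notebook; timings will depend on hardware):

```python
import time
import numpy as np
import faiss

d, ntotal, nq, topk = 512, 1_000_000, 100, 5
rng = np.random.default_rng(0)
xb = rng.random((ntotal, d), dtype=np.float32)     # index vectors
xq = rng.random((nq, d), dtype=np.float32)         # queries

index = faiss.IndexFlatIP(d)
index.add(xb)

for n_threads in (1, 2, 4, 8, 16):
    faiss.omp_set_num_threads(n_threads)
    t0 = time.perf_counter()
    index.search(xq, topk)
    print(f"{n_threads} threads: {time.perf_counter() - t0:.3f} s")
```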