Local LLM performance issue #59
-
What Stanford Spezi module is your challenge related to?
SpeziML

Description
Hi, my team is interested in your framework, and I am currently exploring it, especially the LLM features. I have uploaded a sample project to GitHub; please find the repository at the link below or via my profile.

Reproduction
https://github.com/dg6546/spezillm_demo

Expected behavior
Is faster generation possible?

Additional context
N/A
-
@dg6546 Thank you for sharing this challenge! llama.cpp might not be ideal for a lot of on-device use cases, and we don't do a great job of optimizing for this in the current state; it would be great to explore the new improvements that Apple shipped with iOS 18 and use Core ML and other Apple technologies for local LLM execution. You can find documentation on how to load a model file from anywhere within your app and without a UI using the […]. Feel free to create issues in the SpeziLLM repo for anything you encounter. PRs and contributions to support and extend SpeziLLM are always more than welcome!
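As a rough illustration of the "load a model without a UI" path mentioned above, here is a minimal sketch: it registers the local LLM platform in the Spezi app delegate and then drives a session purely from code. The type and initializer names (`LLMRunner`, `LLMLocalPlatform`, `LLMLocalSchema`, `LLMLocalSession`, and especially the `modelPath`-based schema initializer from the llama.cpp era discussed in this thread) are assumptions on my side and may differ between SpeziLLM releases, so please check them against the current SpeziLLM documentation.

```swift
import Foundation
import Spezi
import SpeziLLM
import SpeziLLMLocal

// 1. Register the local LLM execution platform with the Spezi LLM runner.
class LLMDemoAppDelegate: SpeziAppDelegate {
    override var configuration: Configuration {
        Configuration {
            LLMRunner {
                LLMLocalPlatform()
            }
        }
    }
}

// 2. Drive a session from code, without any SwiftUI chat view.
//    The runner turns a schema describing the model into an executable session.
func runLocalPrompt(runner: LLMRunner) async throws -> String {
    // NOTE: the modelPath-based initializer shown here reflects the llama.cpp-era
    // API referenced in this thread; newer releases may expect a model identifier.
    let session: LLMLocalSession = runner(
        with: LLMLocalSchema(
            modelPath: URL.applicationSupportDirectory.appending(path: "llm.gguf")
        )
    )

    // Append the prompt to the session's chat context.
    await MainActor.run {
        session.context.append(userInput: "Hello, how fast can you answer?")
    }

    var output = ""
    // generate() streams tokens asynchronously as they are produced.
    for try await token in try await session.generate() {
        output += token
    }
    return output
}
```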
We want to point out that SpeziLLM now uses a much more efficient LLM inference backend built on top of MLX; it should resolve most of the performance constraints we have seen before.
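For completeness, a rough sketch of how model selection might look with the MLX-based backend; the model identifier below is purely illustrative, and the exact `LLMLocalSchema` API for your SpeziLLM version should be taken from its documentation.

```swift
import SpeziLLM
import SpeziLLMLocal

// With the MLX-based backend, the local schema typically references a model by
// identifier (downloaded and managed by SpeziLLM) instead of a manually
// provided GGUF file path.
// NOTE: `.llama3_8B_4bit` is a hypothetical identifier used only for illustration.
let mlxSchema = LLMLocalSchema(model: .llama3_8B_4bit)
```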