Running llama.cpp directly on iOS devices #4423
Replies: 3 comments
-
For anyone who would like to try this out, I made a fork that uses Phi3 3.8B as the default model. It is much smaller (2.2 GB) and hence much faster for inference on weaker devices, but powerful enough to test most use cases. I'm making a simple nutrition-counting app, and it works just fine. @philippzagar, in case you still support this project, I think that Phi3 or another smaller model would be a better default option.
-
@philippzagar this is a fantastic project: you've nailed both the simplicity for developers integrating with llama.cpp and the UX for users chatting with an LLM. I was curious whether you plan to continue maintaining this (i.e., keeping https://github.com/StanfordBDHG/llama.cpp synced with upstream).
-
Thank you for introducing me to the SpeziLLM repository. I have a question: is it possible to use LLMRunner without relying on the @Environment property wrapper (@Environment(LLMRunner.self) var runner)? I'm integrating LLMRunner into my existing app, which follows the VIPER architecture, and I would prefer to keep the LLM logic in the Interactor layer rather than the View layer.
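One pattern that might work (a minimal sketch, not an official SpeziLLM recipe; ChatInteractor and ChatEntryView are hypothetical names): since Spezi injects the runner into the SwiftUI environment, a thin entry view can read it there once and hand it to the Interactor via initializer injection, keeping the LLM logic itself out of the View layer.

```swift
import SpeziLLM
import SwiftUI

// Hypothetical VIPER interactor that owns the LLM logic outside the View layer.
final class ChatInteractor {
    private let runner: LLMRunner

    init(runner: LLMRunner) {
        self.runner = runner
    }

    // Session setup and inference calls would live here instead of in the View.
}

// A thin SwiftUI shim still reads the runner from the environment
// (where Spezi places it) and hands it to the interactor once.
struct ChatEntryView: View {
    @Environment(LLMRunner.self) private var runner
    @State private var interactor: ChatInteractor?

    var body: some View {
        Text("Chat")
            .onAppear {
                interactor = ChatInteractor(runner: runner)
            }
    }
}
```

Whether LLMRunner can be constructed entirely outside the Spezi/SwiftUI lifecycle is a question for the SpeziLLM maintainers; the sketch above only moves the logic, not the dependency setup.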
-
For my Master's thesis in the digital health field, I developed a Swift package that encapsulates llama.cpp, offering a streamlined and easy-to-use Swift API for developers. The SpeziLLM package, entirely open-source, is accessible within the Stanford Spezi ecosystem: StanfordSpezi/SpeziLLM (specifically, the SpeziLLMLocal target).

Internally, SpeziLLM leverages a precompiled XCFramework version of llama.cpp. We chose this approach because using llama.cpp via the Package.swift file provided in the repo requires the use of unsafeFlags(_:), which prevents semantic versioning via SPM, as discussed in the Swift community forum and on StackOverflow. By compiling llama.cpp into an XCFramework and exposing it as a binaryTarget(_:) in SPM, we enable proper semantic versioning of the package. You can explore the complete source code and the respective GitHub Actions here: StanfordBDHG/llama.cpp.

I welcome any feedback on the implementation, particularly concerning the llama.cpp inference (take a closer look at this source file).
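For illustration, a binaryTarget(_:) declaration in a consumer's Package.swift looks roughly like the sketch below; the URL, checksum, and target names are placeholders, not the actual StanfordBDHG/llama.cpp artifact coordinates:

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyLLMApp",
    platforms: [.iOS(.v16)],
    targets: [
        // Precompiled llama.cpp distributed as an XCFramework. SPM verifies the
        // downloaded archive against the checksum, so the dependency can be
        // versioned like any other binary package.
        .binaryTarget(
            name: "llama",
            url: "https://example.com/llama.xcframework.zip",   // placeholder URL
            checksum: "<checksum of the zipped XCFramework>"    // placeholder checksum
        ),
        // Code that links against the precompiled llama.cpp binary.
        .target(
            name: "MyLLMTarget",
            dependencies: ["llama"]
        )
    ]
)
```

Because the binary is pinned by its checksum rather than built from source with unsafeFlags(_:), SPM can resolve the package with ordinary semantic version requirements.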
An example workflow using the Llama 2 7B model running on an iPhone 15 Pro with 6 GB of main memory looks like this (the SpeziLLM repo includes this example as a UI test application):

SpeziLLM.mp4