Running llama.cpp directly on iOS devices #4423
Replies: 3 comments
-
For anyone who would like to try this out, I made a fork that uses Phi3 3.8B as the default model. It is much smaller (2.2 GB) and hence much faster for inference on weaker devices, but powerful enough to test most use cases. I'm making a simple nutrition-counting app, and it works just fine. @philippzagar, in case you still support this project, I think that Phi3 or another smaller model would be a better default option.
-
@philippzagar this is a fantastic project: you've nailed both the simplicity for developers integrating with llama.cpp and the UX for users chatting with an LLM. I was curious whether you plan to continue maintaining this (i.e., keeping https://github.com/StanfordBDHG/llama.cpp synced with upstream).
-
Thank you for introducing me to the SpeziLLM repository. I have a question: is it possible to use LLMRunner without relying on the @Environment property wrapper (@Environment(LLMRunner.self) var runner)? I'm integrating LLMRunner into my existing app, which follows the VIPER architecture, and I would prefer to keep the LLM logic in the Interactor layer rather than the View layer.
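One pattern that might work (a minimal sketch, not an official SpeziLLM recipe; ChatInteractor and ChatEntryView are hypothetical names): since Spezi injects the runner into the SwiftUI environment, a thin entry view can read it there once and hand it to the Interactor via initializer injection, keeping the LLM logic itself out of the View layer.

```swift
import SpeziLLM
import SwiftUI

// Hypothetical VIPER interactor that owns the LLM logic outside the View layer.
final class ChatInteractor {
    private let runner: LLMRunner

    init(runner: LLMRunner) {
        self.runner = runner
    }

    // Session setup and inference calls would live here instead of in the View.
}

// A thin SwiftUI shim still reads the runner from the environment
// (where Spezi places it) and hands it to the interactor once.
struct ChatEntryView: View {
    @Environment(LLMRunner.self) private var runner
    @State private var interactor: ChatInteractor?

    var body: some View {
        Text("Chat")
            .onAppear {
                interactor = ChatInteractor(runner: runner)
            }
    }
}
```

Whether LLMRunner can be constructed entirely outside the Spezi/SwiftUI lifecycle is a question for the SpeziLLM maintainers; the sketch above only moves the logic, not the dependency setup.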
-
For my Master's thesis in the digital health field, I developed a Swift package that encapsulates llama.cpp, offering a streamlined and easy-to-use Swift API for developers. The SpeziLLM package, entirely open-source, is accessible within the Stanford Spezi ecosystem: StanfordSpezi/SpeziLLM (specifically, the SpeziLLMLocal target).

Internally, SpeziLLM leverages a precompiled XCFramework version of llama.cpp. We chose this approach because using llama.cpp via the Package.swift file provided in the repo requires the use of unsafeFlags(_:), which prevents semantic versioning via SPM, as discussed in the Swift community forum and on StackOverflow. By compiling llama.cpp into an XCFramework and exposing it as a binaryTarget(_:) in SPM, we enable proper semantic versioning of the package. You can explore the complete source code and the respective GitHub Actions here: StanfordBDHG/llama.cpp.

I welcome any feedback on the implementation, particularly concerning the llama.cpp inference (take a closer look at this source file).
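For illustration, a binaryTarget(_:) declaration in a consumer's Package.swift looks roughly like the sketch below; the URL, checksum, and target names are placeholders, not the actual StanfordBDHG/llama.cpp artifact coordinates:

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyLLMApp",
    platforms: [.iOS(.v16)],
    targets: [
        // Precompiled llama.cpp distributed as an XCFramework. SPM verifies the
        // downloaded archive against the checksum, so the dependency can be
        // versioned like any other binary package.
        .binaryTarget(
            name: "llama",
            url: "https://example.com/llama.xcframework.zip",   // placeholder URL
            checksum: "<checksum of the zipped XCFramework>"    // placeholder checksum
        ),
        // Code that links against the precompiled llama.cpp binary.
        .target(
            name: "MyLLMTarget",
            dependencies: ["llama"]
        )
    ]
)
```

Because the binary is pinned by its checksum rather than built from source with unsafeFlags(_:), SPM can resolve the package with ordinary semantic version requirements.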
An example workflow using the Llama 2 7B model running on an iPhone 15 Pro with 6 GB of main memory looks like this (the SpeziLLM repo includes this example as a UI test application):

SpeziLLM.mp4