🎙️ Meet Arty, your Voice-Powered Mobile Assistant

Badges: TestFlight · Ask DeepWiki · Snyk · OSSF Scorecard · CodeRabbit Pull Request Reviews

An open-source, privacy-first voice assistant for mobile with real-time API integration. Think "Ollama for mobile + realtime voice."

Connects to your Google Drive and GitHub accounts, Hacker News, and the web.

It's currently a thin wrapper around the OpenAI Realtime speech API; however, the long-term vision is to make it extensible and pluggable, with a fully open-source stack.

If this sounds interesting, ⭐️ the project on GitHub to help it grow.

🎤 Demo audio - browsing Hacker News (1 min 30 sec)

hackernews-arty-demo.mp4
What's in the demo
  • "What are the top stories on Hacker News?"
  • "What are the comments about the Montana law story?"
  • "Summarize the new Montana law"

🎥 Demo Reel (80 seconds)

Arty_Demo.mp4

or view the full resolution version

📱 Screenshots

Voice chat (home screen)
Text chat
Configure connectors

🎯 Why This Project Was Created

Voice AI is now incredibly powerful when connected to your data, yet current solutions are closed source, compromise your privacy, and are headed toward ads and lock-in.

This project offers a fully open alternative: local execution, no data monetization, and complete control over where your data goes.

▶️ Install it via TestFlight

Join the TestFlight beta

Test Flight Installation

Security note: TestFlight builds are compiled binaries; do not assume they exactly match this source code. If you require verifiability, build from source and review the code before installing.

Getting Started Instructions
  1. Create a new OpenAI API key. Grant the minimum Realtime permissions shown below (Models: read, Model capabilities: write).
    OpenAI key scopes step 1
  2. Grant access to Responses API.
    OpenAI key scopes step 2
  3. Paste the key into the onboarding wizard and tap Next.
    Onboarding wizard OpenAI key entry
  4. Connect Google Drive so Arty can see your files. OAuth tokens stay on-device. See Security + Privacy for details.
    Google Drive permission prompt
  5. Choose the Google account you want to use.
    Google account selection
  6. Tap “Hide Advanced” and then “Go to vibemachine (unsafe).”
    Google Drive advanced warning
  7. Review the OAuth scopes that Arty is requesting.
    Google Drive scopes
  8. Confirm the connection. You should see a success screen when Drive is linked.
    Google Drive connected confirmation
  9. Optional: Provide your own Google Drive Client ID for extra control.
    Custom Google Drive client ID
  10. Finish the onboarding wizard.
    Onboarding completion screen
  11. Start chatting with Arty.
    Voice chat home screen
How to get the most out of it
  • Connect your GitHub account: open the Hamburger Menu → Configure Connectors → GitHub and add a Personal Access Token. When creating the PAT, the recommended scopes are gist, read:org, and repo (see the sketch after this list for how a connector might use the token).
  • Personalize Arty: adjust the system prompt, voice, VAD mode, and tool configuration from the Advanced settings sheets to match your workflow.
  • Try text chat mode when you can't use voice: enable it under Settings. Note that streaming tokens aren't supported yet, so responses feel slow.
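
As a rough illustration of how a connector can use that PAT, here is a minimal TypeScript sketch that lists a few of your repositories via the GitHub REST API. The function and variable names are hypothetical and not taken from this repo; only the endpoint and headers follow GitHub's documented API.

// Minimal sketch: call the GitHub REST API with a Personal Access Token.
// listRecentRepos and pat are illustrative names, not part of this repo.
async function listRecentRepos(pat: string): Promise<string[]> {
  const res = await fetch("https://api.github.com/user/repos?per_page=5&sort=updated", {
    headers: {
      Authorization: `Bearer ${pat}`,
      Accept: "application/vnd.github+json",
      "X-GitHub-Api-Version": "2022-11-28",
    },
  });
  if (!res.ok) {
    throw new Error(`GitHub API error: ${res.status}`);
  }
  const repos: Array<{ full_name: string }> = await res.json();
  return repos.map((repo) => repo.full_name);
}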

✨ Features

  1. Connectors: Google Drive, GitHub, Hacker News, and web search - Summarize content in Google Drive, interact with GitHub, browse Hacker News, and search the web by voice
  2. Extensible - Adding connectors is fairly easy. File an issue to request the connector you'd like to see.
  3. Customizable prompts - Edit system and tool prompts directly from the UI
  4. Multi-mode audio - Works with speaker, handset, or Bluetooth headphones
  5. Background noise handling - Mute yourself in loud environments
  6. Session recording - Optional conversation recording and sharing
  7. Voice and text modes - Switch between input methods seamlessly
  8. Observability - Optional Logfire integration for debugging (disabled by default)
  9. Privacy-first - No server except connected services—your data stays yours

🚧 Limitations

  1. Cost - High OpenAI API costs due to poor context window management and fallback strategies
  2. Text mode is limited - Streaming tokens are not supported yet, and the UX is very basic.
  3. Platform - iOS only; no Android support yet because the app uses a native iOS WebRTC implementation, even though the UI is React Native via Expo.
  4. Performance - Codegen is slow and unreliable; most functionality should be moved to static tools.
  5. UX - No progress indicators during operations
  6. Security - Dynamic codegen poses risks. Mitigation: use read-only access for connected services
  7. Recording - The optional call recording isn't very reliable because it regenerates the conversation from a text transcript.

🔐 Security + Privacy

Important note: Although tokens never leave the device, some user prompts and connector content are transmitted to the OpenAI Realtime API by design. If you require strict local-only execution, do not use this app. Watch for future updates that support fully self-contained usage or privately hosted models instead.

From a security perspective, the main risks are credential leakage or abuse:

  1. OpenAI API Key
  2. Google Drive Auth Token
  3. GitHub PAT

Mitigation: All credentials remain on-device, stored only in memory or secure storage (iOS Keychain). Audit the source code to verify that no credentials are transmitted externally.

Security + privacy: storage, scopes, and network flow recap
  • All token storage in memory and secure storage happens in lib/secure-storage.ts

  • The actual saving/retrieval of tokens is delegated to the Expo library expo-secure-store

  • Transport security: All outbound requests to OpenAI, Google, GitHub, and Logfire use HTTPS with TLS handled by each provider. This project does not introduce custom proxies or MITM layers.

  • Prompt-injection and mis-issuance: The app does not currently detect or prevent malicious model output from executing unexpected write actions. Use read-only scopes wherever possible.

  • OAuth tokens and API keys are stored via expo-secure-store, which maps to the iOS Keychain using the kSecAttrAccessibleAfterFirstUnlockThisDeviceOnly accessibility level. Tokens are never written to plaintext disk (see the sketch after this list).

  • Recording is off by default, and conversation transcripts are not saved. Optional recordings remain on-device and rely on standard iOS filesystem encryption.

  • No third-party endpoints beyond OpenAI, Google, GitHub, and optional Logfire are contacted at runtime. The app does not embed analytics, crash reporting SDKs, or ad networks.

  • The Google Drive OAuth scope used by the default Client ID in the TestFlight build is limited to files the app itself manages: it can create or edit files that the app created, but cannot edit or delete files that originated elsewhere. For tighter control, register your own Google Drive app, supply its Client ID, and grant the permissions you deem appropriate.

  • When creating a GitHub Personal Access Token, choose scopes based on your comfort level. Enable write scopes (for example, issue creation) explicitly—they are not required for basic usage.

  • Assume that connector operations which retrieve file contents may send that content to the LLM for summarization unless you have deliberately disabled that behavior.
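
For concreteness, here is a minimal sketch of what token storage with expo-secure-store and that Keychain accessibility level can look like. It is not a copy of lib/secure-storage.ts; the key name and function names are illustrative.

import * as SecureStore from "expo-secure-store";

// Illustrative key name; the real keys live in lib/secure-storage.ts.
const OPENAI_KEY = "openai-api-key";

// Persist a token in the iOS Keychain, readable only after first unlock
// and never migrated to another device.
export async function saveOpenAiKey(value: string): Promise<void> {
  await SecureStore.setItemAsync(OPENAI_KEY, value, {
    keychainAccessible: SecureStore.AFTER_FIRST_UNLOCK_THIS_DEVICE_ONLY,
  });
}

// Returns null if no key has been stored yet.
export async function loadOpenAiKey(): Promise<string | null> {
  return SecureStore.getItemAsync(OPENAI_KEY);
}

export async function clearOpenAiKey(): Promise<void> {
  await SecureStore.deleteItemAsync(OPENAI_KEY);
}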

Observability logs are disabled by default. Note that these should be automatically scrubbed of API tokens by Logfire itself. Only enable Logfire after you have audited the code and feel comfortable—this is mainly a developer feature and not recommended for casual usage or testing.

Out of scope: This project does not currently defend against (1) on-device compromise, (2) malicious LLM responses executing actions against connected services using delegated tokens, or (3) interception of API traffic by the model provider.

🛠️ Building from source

Installation steps

Clone project and install dependencies

git clone https://github.com/vibemachine-labs/arty.git
cd arty
curl -fsSL https://bun.sh/install | bash
bun install

Create a Google Drive Client ID

When building from source, you will need to provide your own Google Drive Client ID. You can decide the permissions you want to give it, as well as whether you want to go through the verification process.

Google API Instructions

For testing, the following OAuth scopes are suggested:

  1. See and download your Google Drive files (included by default)
  2. See, edit, create, and delete only the specific Google Drive files you use with this app
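
Those two consent strings correspond to Google's drive.readonly and drive.file scopes. The snippet below is a rough sketch assuming the expo-auth-session Google provider; the actual wiring in this repo may differ, and the client ID shown is a placeholder.

import * as Google from "expo-auth-session/providers/google";

// Google OAuth scope identifiers matching the two consent prompts above.
const DRIVE_SCOPES = [
  "https://www.googleapis.com/auth/drive.readonly",
  "https://www.googleapis.com/auth/drive.file",
];

// Hypothetical hook wrapping the Google sign-in request for Drive access.
export function useDriveAuth() {
  const [request, response, promptAsync] = Google.useAuthRequest({
    iosClientId: "YOUR_CLIENT_ID.apps.googleusercontent.com", // placeholder
    scopes: DRIVE_SCOPES,
  });
  return { request, response, promptAsync };
}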

Run the app

To run in the iOS simulator:

bunx expo run:ios

⚠️ Audio is flaky on the iOS Simulator. Using a real device is highly recommended.

To run on a physical device:

bunx expo run:ios --device
Editing Swift code in Xcode

Open Xcode project

To open the project in Xcode:

xed ios

In Xcode, the native Swift code will be under Pods / Development Pods

Misc Dev Notes

Disable onboarding wizard (optional)

For certain testing scenarios, disable the onboarding wizard by editing app/index.tsx and commenting out the useEffect block that evaluates onboarding status:

useEffect(() => {
  let isActive = true;

  const evaluateOnboardingStatus = async () => {
    try {
      const storedKey = await getApiKey();
      const hasStoredKey = typeof storedKey === "string" && storedKey.trim().length > 0;
      if (!isActive) {
        return;
      }
      setOnboardingVisible(!hasStoredKey);
    } catch (error) {
      if (!isActive) {
        return;
      }
      log.warn("Unable to determine onboarding status from secure storage", error);
      setOnboardingVisible(true);
    }
  };

  if (!apiKeyConfigVisible) {
    void evaluateOnboardingStatus();
  }

  return () => {
    isActive = false;
  };
}, [apiKeyConfigVisible, onboardingCheckToken]);

Development notes

  • Project bootstrapped with bunx create-expo-app@latest .
  • Refresh dependencies after pulling new changes: bunx expo install
  • Install new dependencies: bunx expo install <package-name>
  • Allow LAN access once: bunx expo start --lan

Run on iOS device via ad hoc distribution

  1. Register device: eas device:create
  2. Scan the generated QR code on the device and install the provisioning profile via Settings.
  3. Configure build: bunx eas build:configure
  4. Build: eas build --platform ios --profile dev_self_contained

Clean build

If pods misbehave, rebuild from scratch:

bunx expo prebuild --clean --platform ios
bunx expo run:ios

⚙️ Technical Details

Architecture overview

Native Swift WebRTC Client

React Native WebRTC libraries did not reliably support speakerphone mode during prototyping. The native Swift implementation resolves this issue but adds complexity and delays Android support.

Codegen vs Static Tools

Dynamic code generation currently powers some connector operations (Google Drive, GitHub), enabling rapid prototyping. However, the Hacker News tool demonstrates the preferred approach: statically defined tools that don't rely on codegen.

Migration in progress: Google Drive and GitHub tools will be converted from the codegen approach to static tools, improving reliability and performance. Long-term, codegen will remain available as a fallback option for rapid prototyping of new connectors.
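
To make the contrast concrete, here is a minimal sketch of what a statically defined tool can look like, using the flat function-tool shape the OpenAI Realtime API expects and the public Hacker News Firebase API. Names, parameters, and output formatting are illustrative and not taken from the repo's actual Hacker News tool.

// Static tool definition: no codegen, just a fixed schema the model can call.
export const hnTopStoriesTool = {
  type: "function" as const,
  name: "hn_top_stories",
  description: "Fetch the current top Hacker News stories with titles and links.",
  parameters: {
    type: "object",
    properties: {
      limit: { type: "number", description: "How many stories to return (default 10)" },
    },
    required: [],
  },
};

// Handler the app runs when the model invokes the tool.
export async function runHnTopStories(args: { limit?: number }): Promise<string> {
  const limit = args.limit ?? 10;
  const idsRes = await fetch("https://hacker-news.firebaseio.com/v0/topstories.json");
  const ids: number[] = await idsRes.json();
  const stories = await Promise.all(
    ids.slice(0, limit).map(async (id) => {
      const itemRes = await fetch(`https://hacker-news.firebaseio.com/v0/item/${id}.json`);
      const item: { title: string; url?: string } = await itemRes.json();
      return `${item.title} (${item.url ?? `https://news.ycombinator.com/item?id=${id}`})`;
    }),
  );
  return stories.join("\n");
}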

MCP Support

Not yet implemented since all tools are currently local. Future versions will add MCP server support via cloud or local tunnel connections.

Web Search

GPT-4 web search serves as a temporary solution. The roadmap includes integrating a dedicated search API (e.g., Brave Search) using user-provided API tokens.
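
As a sketch of that roadmap item (not current behavior), a user-supplied Brave Search token could be used roughly like this; the function name is illustrative, and the endpoint and header follow Brave's documented Web Search API.

// Illustrative web-search helper using Brave's Web Search API.
export async function braveWebSearch(query: string, token: string): Promise<unknown> {
  const url = new URL("https://api.search.brave.com/res/v1/web/search");
  url.searchParams.set("q", query);
  const res = await fetch(url.toString(), {
    headers: {
      Accept: "application/json",
      "X-Subscription-Token": token, // user-provided API token
    },
  });
  if (!res.ok) {
    throw new Error(`Brave Search error: ${res.status}`);
  }
  return res.json();
}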

Voice / Text LLM backend

OpenAI is currently the only supported backend. Adding support for multiple providers and self-hosted backends is on the roadmap.
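
Purely as an illustration of what a pluggable backend could mean, a provider abstraction might look like the hypothetical interface below; nothing like this exists in the code today.

// Hypothetical provider abstraction; not present in the current codebase.
export interface VoiceBackend {
  /** Human-readable provider id, e.g. "openai-realtime" or "self-hosted". */
  readonly id: string;
  /** Open a realtime voice session and return a handle for sending audio/text. */
  connect(config: { apiKey?: string; baseUrl?: string }): Promise<VoiceSession>;
}

export interface VoiceSession {
  sendText(text: string): Promise<void>;
  close(): Promise<void>;
}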

🗺️ Roadmap

  1. Address limitations listed above
  2. Improve text mode support
  3. Investigate async voice processing to reduce cost
  4. Add support for alternative voice providers (Unmute.sh, Speaches.ai, self-hosted)
  5. Remote MCP integration
  6. TypeScript MCP plugin support

💼 Business Model

The app itself will remain completely open source, with no restrictions or limitations.

Business model TBD. The most likely direction is a managed backend service.

🤝 How You Can Help

📬 Contact & Feedback

  • Email/Twitter: Email or Twitter/X via my GitHub profile.
  • Issues, Ideas: Submit bugs, feature requests, or connector suggestions on GitHub Issues.
  • Discord: A server will be launched if there’s enough interest.
  • Responsible disclosure: Report security-relevant issues privately via email using the address listed on my GitHub profile before any public disclosure.
