GlobalNews.one

Run LLMs locally in Flutter with <200ms latency

February 17, 2026
Sponsored

A managed on-device AI runtime for Flutter — text, vision, speech, and RAG running sustainably on real phones under real constraints. Private by default.

~22,700 LOC | 50 C API functions | 32 Dart SDK files | 0 cloud dependencies

Modern on-device AI demos break instantly in real usage.

Edge-Veda exists to make on-device AI predictable, observable, and sustainable — not just runnable.

Edge-Veda is a supervised on-device AI runtime.

Edge-Veda is designed for behavior over time, not benchmark bursts.

Key design constraint: Dart FFI is synchronous — calling llama.cpp directly would freeze the UI. All inference runs in background isolates. Native pointers never cross isolate boundaries. Workers maintain persistent contexts so models load once and stay in memory across the entire session.
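The isolate pattern above can be sketched in Python (the same language as the repo's offline tooling), with a worker thread standing in for a Dart isolate: the model handle is created once inside the worker and never crosses the boundary, and only plain messages do. Function and message names here are illustrative, not the actual Edge-Veda API.

```python
import queue
import threading

def inference_worker(requests: queue.Queue, replies: queue.Queue) -> None:
    # Load once; the handle stays resident for the whole session and never
    # leaves this worker (mirroring "native pointers never cross isolates").
    model = object()  # hypothetical stand-in for a loaded llama.cpp context
    while True:
        prompt = requests.get()
        if prompt is None:  # shutdown sentinel
            break
        # Stand-in for generation against the resident model context.
        replies.put(f"reply to: {prompt}")

# "UI" side: exchange plain messages, never the model handle itself.
requests, replies = queue.Queue(), queue.Queue()
threading.Thread(target=inference_worker, args=(requests, replies), daemon=True).start()
requests.put("hello")
print(replies.get(timeout=5))  # prints "reply to: hello"
requests.put(None)  # clean shutdown
```

In the real SDK the same shape holds, with the added property that Dart isolates share no memory at all, which is why only serializable messages ever cross.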

Edge-Veda continuously monitors:

Based on these signals, it dynamically adjusts:

Escalation is immediate: thermal spikes are dangerous, so the runtime responds to them without delay.

Restoration requires cooldown (60s per level) and happens one level at a time. Full recovery from paused to full takes 3 minutes. This prevents oscillation where the system rapidly alternates between high and low quality.
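The escalation/restoration policy can be sketched as a small state machine. This is a hedged illustration, not the SDK's code: "paused", "full", and the 60 s per-level cooldown come from the text, while the intermediate level names ("reduced", "balanced") are assumptions.

```python
LEVELS = ["paused", "reduced", "balanced", "full"]  # low -> high quality
COOLDOWN_S = 60.0  # per-level restoration cooldown, as described above

class QualityGovernor:
    def __init__(self, now: float = 0.0):
        self.level = len(LEVELS) - 1  # start at "full"
        self.calm_since = now         # last time we downgraded or upgraded

    def on_signal(self, stressed: bool, now: float) -> str:
        if stressed:
            # Escalation is immediate: drop a level the moment stress appears.
            self.level = max(self.level - 1, 0)
            self.calm_since = now
        elif self.level < len(LEVELS) - 1 and now - self.calm_since >= COOLDOWN_S:
            # Restoration: one level per cooldown window, never a jump,
            # so paused -> full takes 3 x 60 s = 3 minutes.
            self.level += 1
            self.calm_since = now
        return LEVELS[self.level]
```

Because `calm_since` resets on every downgrade, a fresh spike during recovery restarts the clock, which is exactly the anti-oscillation behavior described.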

Declare runtime guarantees. The Scheduler enforces them.
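As a sketch of what declared guarantees plus scheduler-side enforcement could look like (the field names, such as max_memory_mb and target_first_token_ms, are invented for illustration and are not the real Edge-Veda API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RuntimeGuarantees:
    """Caller-declared limits the scheduler is responsible for upholding."""
    max_memory_mb: int
    target_first_token_ms: int
    allow_background: bool = False

def admits(g: RuntimeGuarantees, est_memory_mb: int, est_first_token_ms: int) -> bool:
    """Admission check: reject work whose estimates would break a guarantee."""
    return (est_memory_mb <= g.max_memory_mb
            and est_first_token_ms <= g.target_first_token_ms)
```

The point of the declarative shape is that the caller states constraints once, and every scheduling decision is checked against them rather than left to ad-hoc call-site logic.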

Adaptive profiles resolve against measured device performance after warm-up:

All numbers measured on a physical iPhone (A16 Bionic, 6GB RAM, iOS 26.2.1) with Metal GPU. See BENCHMARKS.md for full details.
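Profile resolution against measured performance might reduce to something like the following sketch, where the tokens-per-second thresholds and profile names are placeholder values, not Edge-Veda's:

```python
# Ordered richest-first: after warm-up, pick the richest profile the
# measured decode speed can sustain. Thresholds are illustrative only.
PROFILES = [
    (30.0, "quality"),   # needs >= 30 tok/s
    (15.0, "balanced"),  # needs >= 15 tok/s
    (0.0,  "lite"),      # fallback for anything slower
]

def resolve_profile(measured_tok_s: float) -> str:
    for min_rate, name in PROFILES:
        if measured_tok_s >= min_rate:
            return name
    return "lite"  # defensive default for negative/NaN-free inputs
```

Resolving after warm-up rather than from a static device table means the same phone can land on different profiles depending on its current thermal and memory headroom.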

Built-in performance flight recorder writes per-frame JSONL traces:

Traces are analyzed offline using tools/analyze_trace.py (p50/p95/p99 stats, throughput charts, thermal overlays).
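The real analysis lives in tools/analyze_trace.py; a minimal version of its p50/p95/p99 computation over a per-frame JSONL trace could look like this (the frame_ms field name is an assumed example, not the actual trace schema):

```python
import json
from statistics import quantiles

def frame_latency_percentiles(jsonl_text: str, key: str = "frame_ms") -> dict:
    # One sample per trace line; skip blanks and records missing the key.
    samples = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if key in rec:
            samples.append(rec[key])
    # quantiles(..., n=100) returns the 99 percentile cut points: cuts[k-1] = p_k.
    cuts = quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

JSONL is a convenient flight-recorder format precisely because it can be appended one frame at a time and parsed line by line offline.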

Pre-configured in ModelRegistry with download URLs and SHA-256 checksums:

Any GGUF model compatible with llama.cpp can be loaded by file path.
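Checksum verification on download is straightforward to sketch. This version streams the file so multi-gigabyte GGUF weights never have to fit in memory at once; the function name and signature are illustrative, not the ModelRegistry API:

```python
import hashlib
from pathlib import Path

def verify_model(path: Path, expected_sha256: str, chunk_size: int = 1 << 20) -> bool:
    """Compare a downloaded file's SHA-256 against the registry's checksum.

    Reads in 1 MiB chunks so large model files are never fully buffered.
    """
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

Verifying against a pinned checksum, rather than trusting the download URL alone, is what makes a registry entry reproducible across devices and mirrors.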

The iOS build compiles llama.cpp, whisper.cpp, and the Edge-Veda C code for device (arm64) and simulator (arm64), then merges the static libraries into a single XCFramework.

The demo app includes Chat (multi-turn with tool calling), Vision (continuous camera scanning), STT (live microphone transcription), and Settings (model management, device info).

Edge-Veda is designed for teams building:

Contributions are welcome. Here's how to get started:

Apache 2.0

Built on llama.cpp and whisper.cpp by Georgi Gerganov and contributors.

Alex Chen

Senior Tech Editor

Covering the latest in consumer electronics and software updates. Obsessed with clean code and cleaner desks.