On-device AI has been a talking point for years, but Google’s latest move makes it harder to dismiss. Gemma 4, Google’s open model family, now runs directly on iPhones: full local inference, fully offline. It’s a meaningful step, and it signals that edge AI deployment is no longer a future priority; it’s happening right now.
So, where does Gemma 4 stand against the competition? Early benchmarks position the 31B variant alongside Qwen 3.5’s 27B model, a reasonably close matchup, with Gemma carrying roughly 4 billion additional parameters. Both models involve trade-offs, and neither is a clear sweep across every task.
The more compelling story, though, isn’t the flagship size; it’s the smaller variants. The E2B and E4B models are clearly engineered for mobile deployment, prioritizing efficiency over raw capability. Google’s own app nudges users toward E2B, and that makes sense: it’s faster, lighter, and better suited to real-world on-device conditions, where memory and thermal limits matter.
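The memory constraint is concrete enough to sketch. Below is a toy, self-contained heuristic for choosing between the two small variants by available RAM; the per-variant footprints and the headroom value are hypothetical placeholders for illustration, not published figures.

```python
# Illustrative only: a toy heuristic for picking an on-device model
# variant by memory budget. Footprints are hypothetical placeholders.
VARIANT_FOOTPRINT_GB = {"E2B": 1.5, "E4B": 3.0}  # assumed values

def pick_variant(available_ram_gb: float, headroom_gb: float = 0.5) -> str:
    """Return the largest variant that fits the RAM budget, keeping
    `headroom_gb` free for the OS and other apps."""
    budget = available_ram_gb - headroom_gb
    fitting = [v for v, gb in VARIANT_FOOTPRINT_GB.items() if gb <= budget]
    if not fitting:
        # Nothing fits comfortably: fall back to the smallest variant.
        return min(VARIANT_FOOTPRINT_GB, key=VARIANT_FOOTPRINT_GB.get)
    return max(fitting, key=VARIANT_FOOTPRINT_GB.get)

print(pick_variant(6.0))  # E4B
print(pick_variant(2.5))  # E2B
```

The point of the sketch is the shape of the decision, not the numbers: on a phone, the selector should err toward the smaller model, which is exactly the bias Google’s app bakes in by defaulting to E2B.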
Getting started requires nothing more than downloading the Google AI Edge Gallery from the App Store. From there, users select their preferred model variant and start running inference directly on their device. No API calls. No cloud dependency.
Google AI Edge Gallery isn’t just a text interface. It bundles image recognition, voice interaction, and an extensible Skills framework, positioning it less like a demo and more like a platform for on-device AI experimentation. That framing matters; it suggests Google wants developers and power users to treat this as a foundation, not a feature.
Under the hood, Gemma 4 routes inference through the iPhone’s GPU. In practice, responses arrive with notably low latency, a strong indicator that consumer hardware is now capable of sustaining this class of workload without visible degradation. That’s not a minor footnote; it’s the whole argument for why local AI deployment is becoming commercially viable.
Offline capability, in particular, changes the calculus for enterprise use cases — field applications, healthcare settings, and scenarios where data privacy rules out cloud processing entirely.
When all’s said and done, Gemma 4 on iPhone isn’t just a technical proof-of-concept. It’s a signal that the on-device AI era has arrived — and for Google, the Gemma is definitely out of the bottle.