Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference - Comments

Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference

andsoitis

Is there a comparison of it running on iPhone vs. Android phones?

lrvick

You can run Android on just about anything, so it boils down to Linux GPU benchmarks.

jeroenhd

Running Gemma-4-E2B-it on an iPhone 15 (can't go higher than that due to RAM limitations) versus a Pixel 9 Pro, I don't really notice much of a difference between the two. The Pixel is a bit faster, but also a year more recent.

The model itself works absolutely fine, though the iPhone thermally throttles at some point, which really reduces the token generation speed. When I asked it to write me a business plan for a fish farm in the Nevada desert, it slowed down after a couple thousand tokens, whereas the Pixel seems to just keep going.

mistic92

It runs on Android too, with AICore or even with llama.cpp.
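
For anyone curious what the llama.cpp route looks like in practice, here is a minimal sketch using the llama-cpp-python bindings. The GGUF file name is a placeholder, not a real release artifact; point it at whatever quantized export you actually have.

    # Minimal sketch: chat with a local GGUF model via llama-cpp-python.
    # The model file name below is hypothetical; substitute your own.
    from llama_cpp import Llama

    llm = Llama(
        model_path="gemma-4-e2b-it.Q4_K_M.gguf",  # placeholder file name
        n_ctx=4096,        # context window size
        n_gpu_layers=-1,   # offload all layers to the GPU if one is available
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Rewrite this email more professionally: ..."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])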

bossyTeacher

Is the output coherent, though? I have yet to see a local model running on consumer-grade hardware that is actually useful.

lrvick

I run Qwen3.5 122B on a Framework Desktop at 35 t/s as a daily driver for security, OS/systems, and software engineering work.

Never paid an LLM provider and I have no reason to ever start.

a_paddy

I can try it for you

jfoster

It can write (some) code that works. Roughly guessing from my own use, I think of it as being a bit like ChatGPT circa 2024 in terms of capability and speed.

Disappointing if you compare it to anything else from 2026, but fairly impressive for something that can run locally at an OK speed.

fsiefken

Qwen3.5-9B and Qwen3.5-27B are pretty coherent on my 24 GB Android phone.

logicallee

It's highly coherent (see my other comment for an example of its text output) and yes, it's useful. I am starting to use Gemma 4:e4b as my daily driver for simple commands it definitely knows, things that are too simple to use ChatGPT for. It can also work through moderately difficult coding tasks. If you want to see it in action, I posted a video about it here[1] (the 10 GB model appears at the 2-minute mark and the 20 GB one says hello at 5 minutes 45 seconds in). You can see its speed and output on consumer-grade hardware, in this case a Mac Mini M4 with 24 GB of RAM.

[1] https://youtube.com/live/G5OVcKO70ns
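
(The "4:e4b" tag looks like Ollama naming. If so, scripting those quick one-off questions takes only a few lines with the ollama Python client; the model tag below is an assumption, so use whatever 'ollama list' shows on your machine.)

    # Sketch of a quick one-off query against a local Ollama server,
    # the kind of "too simple for ChatGPT" task described above.
    # The model tag is an assumption; check your local `ollama list`.
    import ollama

    resp = ollama.chat(
        model="gemma4:e4b",  # assumed tag
        messages=[{"role": "user", "content": "One-liner to extract a .tar.zst?"}],
    )
    print(resp["message"]["content"])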

jeroenhd

Google's models work quite well on my Android phone. I haven't found a use case beyond generating shitposts, but the model does its job pretty well. It's not exactly ChatGPT, but minor things like "alter the tone of this email to make it more professional" work like a charm.

You need a relatively beefy phone to run this stuff on large amounts of text, though, and you can't have every app run it because your battery wouldn't last more than an hour.

I think the real use case for apps is going to be tiny, purpose-trained models, like the 270M models Google wants people to train and use: https://developers.googleblog.com/on-device-function-calling... With these, you can set up somewhat intelligent situational automation without having to work out logic trees and edge cases beforehand.
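
To make that concrete, here is an illustrative sketch of the pattern: the app exposes a few functions, the small model maps a natural-language request to a structured call, and the app dispatches it. The function names and JSON format are hypothetical, not taken from Google's post.

    # Illustrative function-calling loop: a tiny on-device model turns a
    # request into a structured call, which the app dispatches. The schema,
    # function names, and stubbed model output are all hypothetical.
    import json

    def set_alarm(time: str) -> str:
        return f"Alarm set for {time}"

    def send_text(to: str, body: str) -> str:
        return f"Sent {body!r} to {to}"

    TOOLS = {"set_alarm": set_alarm, "send_text": send_text}

    # In a real app this JSON would come from the model; it is stubbed
    # here so the sketch runs standalone.
    model_output = '{"name": "set_alarm", "args": {"time": "07:30"}}'

    call = json.loads(model_output)
    result = TOOLS[call["name"]](**call["args"])
    print(result)  # Alarm set for 07:30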

the_pwner224

I have a 128 GB Strix Halo tablet (the same chip as the other commenter's Framework Desktop). I'm using the larger Gemma 4 26B-A4B model (only 28 GB @ Q8); it's been working great and runs very fast.

It's a 100% replacement for free ChatGPT/Gemini.

Compared to the paid pro/thinking models... Gemma does have reasoning, and I have used the reasoning mode recently for some tax and legal/accounting advice, as well as other misc problems. It's worked well for that, but I haven't tried any really difficult tasks. From what I've heard re: agentic coding, the open-weight models are ~18-24 months behind Anthropic's and Google's SOTA.

Qwen 3.5 122B-A10B should just fit into 128 GB at Q4/Q5 and may be a bit smarter. There's apparently also a similar-sized Gemma 4 model, but they haven't released it yet; the 26B was the largest released.
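
As a back-of-the-envelope check on what fits, treating weights as the dominant cost: bytes are roughly parameters times bits-per-weight divided by 8. The effective bit-widths below are typical values for each quant, assumed rather than measured.

    # Rough weight-memory estimate: params * bits / 8. Real GGUF files run
    # slightly larger (some tensors stay at higher precision), so treat
    # these as approximate lower bounds. Bit-widths are typical, not exact.
    def weight_gb(params_billions: float, bits_per_weight: float) -> float:
        return params_billions * bits_per_weight / 8  # 1e9 factors cancel

    print(f"122B @ ~4.5 bits (Q4): {weight_gb(122, 4.5):.0f} GB")  # ~69 GB
    print(f"122B @ ~5.5 bits (Q5): {weight_gb(122, 5.5):.0f} GB")  # ~84 GB
    print(f" 26B @ ~8.5 bits (Q8): {weight_gb(26, 8.5):.0f} GB")   # ~28 GB

So Q4 leaves roughly 59 GB of a 128 GB machine for KV cache and the OS, while Q5 leaves about 44 GB, which lines up with "should just fit."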

camillomiller

Can we please ban content that is CLEARLY written by AI?

dax_

That bugged me too, so I started looking at other articles - they all look AI-generated to me. The whole website should be banned.

stingraycharles

I find it fascinating that after all this time reporters still don’t even bother to proofread content for obvious AI tells. I guess nobody really cares anymore?

pabs3

> edge AI deployment

Isn't the "edge" meant to be computing near the user, but not on their devices?

pgt

Your device is the ultimate edge. The next frontier would be running models on your wetware.

hhh

It depends, because edge is a meaningless term and people define it however they want. In 2022, we set up a call with a vendor for ‘edge’ AI. Their edge meant something like 5 kW; our edge was, in the best case, a single Raspberry Pi.

stingraycharles

No, it does not. This is about as “edge” as AI gets.

In a general sense, edge just means moving the computation toward the user rather than keeping it in a central cloud (although the two aren’t mutually exclusive, e.g. Cloudflare Workers).

codybontecou

Unfortunately, Apple appears to be blocking the use of these LLMs within apps on their App Store. I've been trying to ship an app that bundles local LLMs and have hit a brick wall with guideline 2.5.2.

CubsFan1060

Though of course Apple's rules aren't always consistent, I currently have two separate apps on my phone that can and do run this (Google's Edge Gallery and Locally AI).

saagarjha

Use of the LLMs to do what?

Gareth321

I think Apple will become increasingly draconian about LLMs. Very soon people won't need to buy many of their apps. They can just make them. This threatens Apple's entire business model.

karimf

Related: Gemma 4 on iPhone (254 comments) - https://news.ycombinator.com/item?id=47652561

redbell

Another related submission, from 22 days ago: iPhone 17 Pro Demonstrated Running a 400B LLM (700+ points, 300+ comments): https://news.ycombinator.com/item?id=47490070

logicallee

For those who would like an example of its output: I'm currently creating a small, free (CC0, public domain) encyclopedia (just a couple of thousand entries) of core concepts in Biology and Health Sciences, Physical Sciences, and Technology. Each entry is written entirely by Gemma 4:e4b (the 10 GB model). I believe this may be slightly larger than the model that runs locally on phones, so perhaps this one is slightly better, but the output is similar. Here is an example entry:

https://pastebin.com/ZfSKmfWp

Seems pretty good to me!
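
For anyone wanting to reproduce the workflow, a batch run over entry titles is only a few lines. This is a hedged sketch: the model tag, prompt, and output layout are assumptions, not details from the actual project.

    # Sketch of batch-generating encyclopedia entries with a local model
    # via the ollama client. Model tag, prompt, and file names are assumed.
    import ollama

    topics = ["Photosynthesis", "Plate tectonics", "Public-key cryptography"]

    for topic in topics:
        resp = ollama.chat(
            model="gemma4:e4b",  # assumed tag
            messages=[{
                "role": "user",
                "content": f"Write a concise, factual encyclopedia entry on: {topic}",
            }],
        )
        with open(f"{topic.replace(' ', '_')}.md", "w") as f:
            f.write(resp["message"]["content"])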

everyday7732

What's your goal? Do you have a project you want the encyclopedia for?

usmanshaikh06

ESET is blocking this site with the following warning:

"Threat found: This web page may contain dangerous content that can provide remote access to an infected device, leak sensitive data from the device, or harm the targeted device. Threat: JS/Agent.RDW trojan"

ValleZ

There are many apps to run local LLMs on both iOS & Android.
