Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference - Comments

Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference

andsoitis

Is there a comparison of it running on iPhone vs. Android phones?

lrvick

You can run Android on just about anything, so it boils down to Linux GPU benchmarks.

jeroenhd

Running Gemma-4-E2B-it on an iPhone 15 (can't go higher than that due to RAM limitations) versus a Pixel 9 Pro, I don't really notice much of a difference between the two. The Pixel is a bit faster, but also a year more recent.

The model itself works absolutely fine, though the iPhone thermally throttles at some point, which really reduces the token generation speed. When I asked it to write me a business plan for a fish farm in the Nevada desert, it slowed down after a couple thousand tokens, whereas the Pixel seems to just keep going.

mistic92

It runs on Android too, with AICore or even with llama.cpp.
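
For anyone curious what the llama.cpp route looks like in practice, here is a minimal sketch using the llama-cpp-python bindings. The GGUF file name is a placeholder, not a real release artifact; point it at whatever quantized export you actually have.

    # Minimal sketch: chat with a local GGUF model via llama-cpp-python.
    # The model file name below is hypothetical; substitute your own.
    from llama_cpp import Llama

    llm = Llama(
        model_path="gemma-4-e2b-it.Q4_K_M.gguf",  # placeholder file name
        n_ctx=4096,        # context window size
        n_gpu_layers=-1,   # offload all layers to the GPU if one is available
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Rewrite this email more professionally: ..."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])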

bossyTeacher

Is the output coherent, though? I have yet to see a local model running on consumer-grade hardware that is actually useful.

lrvick

I run Qwen3.5 122B on a Framework Desktop at 35 t/s as a daily driver for security, OS/systems, and software engineering work.

Never paid an LLM provider and I have no reason to ever start.

a_paddy

I can try it for you

jfoster

It can write (some) code that works. Roughly guessing from my own use, I think of it as being a bit like ChatGPT circa 2024 in terms of capability and speed.

Disappointing if you compare it to anything else from 2026, but fairly impressive for something that can run locally at an OK speed.

fsiefken

Qwen3.5-9B and Qwen3.5-27B are pretty coherent on my 24 GB Android phone.

logicallee

It's highly coherent (see my other comment for an example of its text output) and yes, it's useful. I am starting to use Gemma 4:e4b as my daily driver for simple commands it definitely knows, things that are too simple to use ChatGPT for. It can also work through moderately difficult coding tasks. If you want to see it in action, I posted a video about it here[1] (the 10 GB model appears at the 2-minute mark and the 20 GB one says hello at 5 minutes 45 seconds in). You can see its speed and output on consumer-grade hardware, in this case a Mac Mini M4 with 24 GB of RAM.

[1] https://youtube.com/live/G5OVcKO70ns
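
(The "4:e4b" tag looks like Ollama naming. If so, scripting those quick one-off questions takes only a few lines with the ollama Python client; the model tag below is an assumption, so use whatever 'ollama list' shows on your machine.)

    # Sketch of a quick one-off query against a local Ollama server,
    # the kind of "too simple for ChatGPT" task described above.
    # The model tag is an assumption; check your local `ollama list`.
    import ollama

    resp = ollama.chat(
        model="gemma4:e4b",  # assumed tag
        messages=[{"role": "user", "content": "One-liner to extract a .tar.zst?"}],
    )
    print(resp["message"]["content"])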

jeroenhd

Google's models work quite well on my Android phone. I haven't found a use case beyond generating shitposts, but the model does its job pretty well. It's not exactly ChatGPT, but minor things like "alter the tone of this email to make it more professional" work like a charm.

You need a relatively beefy phone to run this stuff on large amounts of text, though, and you can't have every app run it because your battery wouldn't last more than an hour.

I think the real use case for apps is going to be tiny, purpose-trained models, like the 270M models Google wants people to train and use: https://developers.googleblog.com/on-device-function-calling... With these, you can set up somewhat intelligent situational automation without having to work out logic trees and edge cases beforehand.
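
To make that concrete, here is an illustrative sketch of the pattern: the app exposes a few functions, the small model maps a natural-language request to a structured call, and the app dispatches it. The function names and JSON format are hypothetical, not taken from Google's post.

    # Illustrative function-calling loop: a tiny on-device model turns a
    # request into a structured call, which the app dispatches. The schema,
    # function names, and stubbed model output are all hypothetical.
    import json

    def set_alarm(time: str) -> str:
        return f"Alarm set for {time}"

    def send_text(to: str, body: str) -> str:
        return f"Sent {body!r} to {to}"

    TOOLS = {"set_alarm": set_alarm, "send_text": send_text}

    # In a real app this JSON would come from the model; it is stubbed
    # here so the sketch runs standalone.
    model_output = '{"name": "set_alarm", "args": {"time": "07:30"}}'

    call = json.loads(model_output)
    result = TOOLS[call["name"]](**call["args"])
    print(result)  # Alarm set for 07:30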

the_pwner224

I have a 128 GB Strix Halo tablet (the same chip as the other commenter's Framework Desktop). I'm using the larger Gemma 4 26B-A4B model (only 28 GB @ Q8); it's been working great and runs very fast.

It's a 100% replacement for free ChatGPT/Gemini.

Compared to the paid pro/thinking models... Gemma does have reasoning, and I have used the reasoning mode recently for some tax and legal/accounting advice, as well as other misc problems. It's worked well for that, but I haven't tried any really difficult tasks. From what I've heard re: agentic coding, the open-weight models are ~18-24 months behind Anthropic's and Google's SOTA.

Qwen 3.5 122B-A10B should just fit into 128 GB at Q4/Q5 and may be a bit smarter. There's apparently also a similar-sized Gemma 4 model, but they haven't released it yet; the 26B was the largest released.
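
As a back-of-the-envelope check on what fits, treating weights as the dominant cost: bytes are roughly parameters times bits-per-weight divided by 8. The effective bit-widths below are typical values for each quant, assumed rather than measured.

    # Rough weight-memory estimate: params * bits / 8. Real GGUF files run
    # slightly larger (some tensors stay at higher precision), so treat
    # these as approximate lower bounds. Bit-widths are typical, not exact.
    def weight_gb(params_billions: float, bits_per_weight: float) -> float:
        return params_billions * bits_per_weight / 8  # 1e9 factors cancel

    print(f"122B @ ~4.5 bits (Q4): {weight_gb(122, 4.5):.0f} GB")  # ~69 GB
    print(f"122B @ ~5.5 bits (Q5): {weight_gb(122, 5.5):.0f} GB")  # ~84 GB
    print(f" 26B @ ~8.5 bits (Q8): {weight_gb(26, 8.5):.0f} GB")   # ~28 GB

So Q4 leaves roughly 59 GB of a 128 GB machine for KV cache and the OS, while Q5 leaves about 44 GB, which lines up with "should just fit."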

camillomiller

Can we please ban content that is CLEARLY written by AI?

dax_

That bugged me too, so I started looking at other articles - they all look AI-generated to me. The whole website should be banned.

stingraycharles

I find it fascinating that after all this time reporters still don’t even bother to proofread content for obvious AI tells. I guess nobody really cares anymore?

pabs3

> edge AI deployment

Isn't the "edge" meant to be computing near the user, but not on their devices?

pgt

Your device is the ultimate edge. The next frontier would be running models on your wetware.

hhh

It depends, because edge is a meaningless term and people define it however they want. In 2022, we set up a call with a vendor for ‘edge’ AI. Their edge meant something like 5 kW; our edge was, in the best case, a single Raspberry Pi.

stingraycharles

No, it does not. This is about as “edge” as AI gets.

In a general sense, edge just means moving the computation toward the user rather than keeping it in a central cloud (although the two aren’t mutually exclusive, e.g. Cloudflare Workers).

codybontecou

Unfortunately, Apple appears to be blocking the use of these LLMs within apps on their App Store. I've been trying to ship an app that bundles local LLMs and have hit a brick wall with guideline 2.5.2.

CubsFan1060

Though of course Apple's rules aren't always consistent, I currently have two separate apps on my phone that can and do run this (Google's Edge Gallery and Locally AI).

saagarjha

Use of the LLMs to do what?

Gareth321

I think Apple will become increasingly draconian about LLMs. Very soon people won't need to buy many of their apps. They can just make them. This threatens Apple's entire business model.

karimf

Related: Gemma 4 on iPhone (254 comments) - https://news.ycombinator.com/item?id=47652561

redbell

Another related submission, from 22 days ago: iPhone 17 Pro Demonstrated Running a 400B LLM (700+ points, 300+ comments): https://news.ycombinator.com/item?id=47490070

logicallee

For those who would like an example of its output: I'm currently creating a small, free (CC0, public domain) encyclopedia (just a couple of thousand entries) of core concepts in Biology and Health Sciences, Physical Sciences, and Technology. Each entry is written entirely by Gemma 4:e4b (the 10 GB model). I believe this may be slightly larger than the model that runs locally on phones, so perhaps this one is slightly better, but the output is similar. Here is an example entry:

https://pastebin.com/ZfSKmfWp

Seems pretty good to me!
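
For anyone wanting to reproduce the workflow, a batch run over entry titles is only a few lines. This is a hedged sketch: the model tag, prompt, and output layout are assumptions, not details from the actual project.

    # Sketch of batch-generating encyclopedia entries with a local model
    # via the ollama client. Model tag, prompt, and file names are assumed.
    import ollama

    topics = ["Photosynthesis", "Plate tectonics", "Public-key cryptography"]

    for topic in topics:
        resp = ollama.chat(
            model="gemma4:e4b",  # assumed tag
            messages=[{
                "role": "user",
                "content": f"Write a concise, factual encyclopedia entry on: {topic}",
            }],
        )
        with open(f"{topic.replace(' ', '_')}.md", "w") as f:
            f.write(resp["message"]["content"])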

everyday7732

What's your goal? Do you have a project you want the encyclopedia for?

usmanshaikh06

ESET is blocking this site with the following warning:

"Threat found: This web page may contain dangerous content that can provide remote access to an infected device, leak sensitive data from the device, or harm the targeted device. Threat: JS/Agent.RDW trojan"

ValleZ

There are many apps to run local LLMs on both iOS & Android.
