Notes.app handles big notebooks without choking on storage?
Remi_Etien
[dead]
p1anecrazy
Really like demo cli tools description. Are they limited by the context window as well? What’s your experience with log file sizes?
franze
the 2 hard limits of Appel Intelligence Foundation Model and therefor apfel is the 4k token context window and the super hard guardrails (the model prefers to tell you nothing before it tells you something wrong ie ask it to describe a color)
parsing logfiles line by line, sure
parsing a whole logfile, well it must be tiny, logfile hardly ever are
khalic
AFM models are very impressive, but they’re not made for conversation, so keep your expectations down in chat mode.
elcritch
Any know if these only installed on Tahoe? I'm running Sequoia still and get an error about model not found.
HelloUsername
> Apple Silicon Mac, macOS 26 Tahoe or newer, Apple Intelligence enabled
swiftcoder
Anyone tried using this as a sub-agent for a more capable model like Claude/Codex?
LatencyKills
The combined (input/output) context window length is 4K. Claude would blow through that even when trying to read and summarize a small file.
franze
project started with
trying to run openclaw with it in ultra token saving mode, did totally not work.
great for shell scripts though (my major use case now)
khalic
If you’re looking into small models for tiny local tasks, you should try Qwen coder 0,5B. It’s more of an experiment, but it can output decent functions given the right context instructions.
coredog64
I was thinking about the other way: Could you use this in front of Claude to summarize inputs and so reduce your token counts?
gigatexal
It’s a very small model but I’ve been playing with it for some time now I’m impressed. Have we been sleeping on Apple’s models?
Imagine they baked Qwen 3.5 level stuff into the OS. Wow that’d be cool.
thenthenthen
The vision models and OCR are SUPER
bombcar
Apparently the Overcast guy build a beowulf cluster of Mac minis to use the Apple transcription service.
I’ve seen several projects like this that offer a network server with access to these Apple models. The danger is when they expose that, even on a loop port, to every other application on your system, including the browser. Random webpages are now shipping with JavaScript that will post to that port. Same-origin restrictions will stop data flow back to the webpage, but that doesn’t stop them from issuing commands to make changes.
Some such projects use CORS to allow read back as well. I haven’t read Apfel’s code yet, but I’m registering the experiment before performing it.
stingraycharles
I don’t think many browsers will allow posting to 127.0.0.1 from a random website. What’s the threat model here?
brians
They offer it as an option but default it to false! This is still a --footgun option but it’s the least unsafe version I’ve seen yet! Well done, Apfel authors.
robotswantdata
Keep seeing similar mistakes with vibe coded AI & MCP projects. Even experienced engineers seem oblivious to this attack vector
snarkyturtle
Noting that there's an option to require a Bearer token to the API
Oras
I like the idea and the clarity to explain the usage, my question would be: what kind of tasks it would be useful for?
khalic
Making a sentence out of a json
convexly
I like the approach of running everything locally. I'm strongly of the opinion that the privacy angle for local models is going to keep getting stronger and more relevant. The amount of articles that come out about accidents happening because of people handing too much context to cloud models the more self reinforcing this will become.
aswanson
That's the way things have to go. Business risk is too high having everything ran over exposed networks.
lukewarm707
local is best for privacy, but i personally think you don't need to go local.
anthropic, google, openai etc, decided that their consumer ai plans would not be private. partly to collect training data, the other half to employ moderators to review user activity for safety.
we trust that human moderators will not review and flag our icloud docs, onedrive or gmail, or aggregate such documents into training data for llms. it became the norm that an llm is somehow not private. it became a norm that you can't opt out of training, even on paid plans (see meta and google); or if you can opt out of training, you can't opt out of moderation.
cloud models with a zero retention privacy policy are private enough for almost everyone, the subscriptions, google search, ai search engines are either 'buying' your digital life or covering themselves for legal reasons.
you can and should have private cloud services, and if legal agreement is not enough, cryptographic attestation is already used in compute, with AWS nitro enclaves and other providers.
ge96
The other thing, is encrypted inferencing a thing/service currently? I want to run my own models locally just because if I'm going to be chatting to it about my day to day life why send it to a server in plaintext.
cousin_it
It's only half of the solution though. If the models are trained in a closed way, they can prioritize values encoded during training even if that's not what you want (example: ask the open Chinese models about Tiananmen). It's not beyond imagining that these models would e.g. try to send your data to authorities or advertisers when their training says so, even if you run them locally.
So the full solution would be models trained in an open verifiable way and running locally.
hombre_fatal
Another angle is when you're passing untrusted content to the AI service, e.g. anything from using it to crawl websites to spam-detection on new forum user posts.
You can trigger the the service's ToS violation or worse, get tipped off to law enforcement for something you didn't even write.
arendtio
For those who don't know, 'Apfel' is the German word for Apple.
gherkinnn
And for those who did know that and want to know more, the shift from apple - apfel and water -> wasser happened during the High German consonant shift.
Just a small thing about the website: your examples shift all the elements below it on mobile when changing, making it jump randomly when trying to read.
gherkinnn
Now this is a development I like.
With the Claude bug, or so it is known, burning through tokens at record speed, I gave alternative models a try and they're mostly ... interchangeable. I don't know how easy switching and low brand loyalty and fast markets will play out. I hope that local LLMs will become very viable very soon.
naravara
Yeah I don’t think the models are meaningfully differentiated outside of very specific edge cases. I suspect this was the thinking behind OpenAI and Facebook and all trying to lean hard into presenting their chatbots as friends and romantic partners. If they can’t maintain a technical moat they can try to cultivate an emotional one.
m-s-y
A serious project would do the work to be delivered via the native homebrew repository, not a “selfhosted” one.
brtkwr
Isn't the whole idea of "home brew" to enable hackers and enthusiasts to easily share what they built?
post-it
Is this you signing up as a packager or
nose-wuzzy-pad
Does the local LLM have access to personal information from the Apple account associated with the logged-in user? Maybe through a RAG pipeline or similar? Just curious if there are any risks associated with exposing this in a way that could be exploited via CORS or through another rogue app querying it locally.
franze
no. the on device foundationmodels framework that apfel uses does not have access to personal information from the apple account. the model is a bare language model with no built in personal data access.
apple does have an on device rag pipeline called the semantic index that feeds personal data like contacts emails calendar and photos into the model context but this is only available to apples own first party features like siri and system summaries.
it is not exposed through the foundationmodels api.
phplovesong
This is pretty cool. My bet is that we have more LLMs running locally when its possible, either thru "better hardware as default" or some new tech that can run the models on commodity hardware (like apple silicon / equivalent PC setup).
brtkwr
[dead]
alwinaugustin
Read Austria as Australia and thought this as an April fool
nottorp
> Starting with macOS 26 (Tahoe), every Apple Silicon Mac includes a language model as part of Apple Intelligence.
So you have to put up with the low contrast buggy UI to use that.
mattkevan
As an experiment I built a prototype chatbot app that uses the built-in LLM. It’s got a small context window, but is surprisingly capable and has tool-calling support. Without too much effort I was able to get it to fetch weather data, fetch and summarise emails, read and write reminders and calendar events.
joriskok1
How much storage does it take up?
franze
4mb download, after install about 15mb, model is already on your mac with mac os x tahoe
volume_tech
[dead]
Barbing
Just discovered iOS shortcuts has a native action called “use model” that lets you use local, Apple cloud, or ChatGPT— before that I would have agreed with the author about being locked behind Siri (natively)
api
BoltAI also does this, but a CLI tool is nice.
It’s a nice LLM because it seems fairly decent and it loads instantly and uses the CPU neural engine. The GPU is faster but when I run bigger LLMs on the GPU the normally very cool M series Mac becomes a lap roaster.
It’s a small LLM though. Seems decent but it’s also been safety trained to a somewhat comical degree. It will balk over safety at requests that are in fact quite banal.
pbronez
Digging into this, found Apple’s release notes for the Foundation Model Service
Is this for Tahoe only? I’m still clutching onto Sequoia
linsomniac
Yes, it says on that page that it uses Apple Intelligence from Tahoe. I'm also hanging onto Sequoia, though I'm ready to make the leap any time here.
anentropic
Yeah seems to need Tahoe (I'm on Sequoia):
dyld[71398]: Library not loaded: /System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels
Referenced from: <32818E2F-CB45-3506-A35B-AAF8BDDFFFCE> /opt/homebrew/Cellar/apfel/0.6.25/bin/apfel (built for macOS 26.0 which is newer than running OS)
Reason: tried: '/System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels' (no such file), '/System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels' (no such file, not in dyld cache)
kangraemin
4,096 token context window is pretty limiting. That's roughly 3,000 words — fine for "summarize this paragraph" but not enough for anything that needs real context. Still, zero cost and fully local is hard to beat for quick throwaway tasks. Does it handle streaming or is it request-response only?
xandrius
Try it and see
rbbydotdev
Would really love to see a web api standard for on device llms. This could get us closer. Some in-browser language model usage could be very powerful. In the interim maybe a little protocol spec + a discovery protocol used with browser plugins, web apps could detect and interface with on-device llms making it universally available.
Unfortunately, I found the small context window makes the utility pretty limited.
troyvit
Yeah I think you hit on the head a good way to use it though. I'm not on MacOS but KDE has a little tool called krunner[1] that lets you perform simple tasks from a small pop-up on your desktop. It would be cool if I could do slightly agentic things from there with a local model like ask what the capital of Austria is, or what's the current exchange rate between two currencies.
I have been using Apple’s built-in system LLM model for the last 7 or 8 months. I like the feature that if it needs to, it occasionally uses a more powerful secure private cloud model. I also write my own app to wrap it.
donmb
Local AIs are the future in times of limited resources. This could be the beginning of something big. I like that Apple opens up like this. Hopefully more to come.
enjoyitasus
completely agree.
Phemist
Nice! The example should imo say
apfel -o json "Translate to German: apple" | jq .content
Multiplayer
Started using this earlier this week. I built a backtesting benchmark tool to compare a mix of frontier and open-source models on a fairly heavy data analysis workflow I’d been running in the cloud.
The task is basically predicting pricing and costs.
Apple’s model came out on top—best accuracy in 6 out of 10 cases in the backtest. That surprised me.
It also looks like it might be fast enough to take over the whole job. If I ran this on Sonnet, we’re talking thousands per month. With DeepSeek, it’s more like hundreds.
So far, the other local models I’ve tried on my 64GB M4 Max Studio haven’t been viable - either far too slow or not accurate enough. That said, I haven’t tested a huge range yet.
animanoir
[dead]
sys_64738
Tahoe+ only
EddieLomax
This is similar to something I was playing around with last month-- basically just a CLI for accessing the foundational models.
It's really handy for quick things like "what's the capital of country x" but for coding, I feel that it is severely limited. With such a small context it's (currently) not great for complicated things.
aiiaro
[dead]
divan
What's the easiest way to use it with on-device voice model for voice chat?
How does this model compare against other local models like Qwen run through LMStudio?
frontsideair
> Apple locked it behind Siri. apfel sets it free
This doesn't feel truthful, it sounds like this tool is a hack that unlocks something. If I understand it correctly, it's using the same FoundationModels framework that powers Apple Intelligence, but for CLI and OpenAI compatible REST endpoint. Which is fine, just the marketing goes hard a bit.
> Runs on Neural Engine
Also unsure if this runs on ANE, when I tried Apple Intelligence I saw that it ran on the GPU (Metal).
reaperducer
This doesn't feel…
Also unsure…
Thank you for sharing your feelings and uncertainty.
Perhaps resist the urge to post until you have something to contribute.
reaperducer
[dead]
xp84
Real experience I've had:
"Text Carol bring me a glass of water please"
"I'm sorry, I don't see a 'Carol Bring' in your contacts"
furyofantares
Looks like a nice wrapper around the APIs. Extremely oversold landing page, very marketing heavy for what it is. You can actually make nice looking landing pages that are about 10% the size of this and more straightforward, rather than some mimicry of a SaaS that's trying desperately to sell you something. Makes it easier for you to review the content for factuality too, and heck you couldn't even take ownership of some of the voice.
Hard to know what to do with this. I'm interested in the project and know others who would be, but I feel like shit after being slopped on by a landing page and I don't wish to slop on my friends by sharing it with them. I suppose the github link is indeed significantly better, I'll share that.
xandrius
It's absolutely free and open source, no need to bash it like this.
devcraft_ai
[dead]
lewisjoe
Tempted to write a grammarly-like underline engine that flags writing mistakes across all apps and browser. Fully private grammarly alternative without even bundling an LLM!
malshe
That's a great idea. I would be very interested in using it of someone builds it.
gurjeet
Thank you for making it open source!
Submitted a PR to prevent its installation on macos versions older than Tahoe(26), since I was able to install it on my older macos 15, but it aborted on execution.
I'm a Linux user who wanted exactly this but for Linux — so I ended up building it myself. It's called TalkType, it runs Whisper locally for offline speech-to-text. The privacy angle was a big reason I went local from the start — I didn't want my voice being sent to anyone's server. Nice to see the same idea getting traction on Mac.
yalogin
This is great. A few questions come to mind, I need to go look up. Is the model an OpenAI one or home grown for Apple. And can I still use it if Siri is disabled?
contingencies
On a similar bent, I recently discovered Handy (cross-platform) which is very well implemented local voice input: https://handy.computer/ ... serious finger saver and ideal for LLM conversations
rgbrgb
love the simple website and typography. AI design or you? tasteful and fast animations. nice work and thanks for sharing!
deadfox
This is cool!
karimf
The big question is whether Apple can keep shipping new models constantly.
AFAIK the current model is on par with with Qwen-3-4B, which is from a year ago [0]. There's a big leap going from last year Qwen-3-4B to Qwen-3.5-4B or to Gemma 4.
Apple model is nice since you don't need to download anything else, but I'd rather use the latest model than to use a model from a year ago.
> Referenced from: <32818E2F-CB45-3506-A35B-AAF8BDDFFFCE> /opt/homebrew/Cellar/apfel/0.6.25/bin/apfel (built for macOS 26.0 which is newer than running OS)
This actually looks really neat. I'll have to bookmark this for whenever I'm dragged kicking and screaming into the abomination that is "Tahoe."
witnessme
Interesting. How does this foundational model compares with other LLMs?
contingencies
1. Hugely non-deterministic: repeat queries give vastly different responses. 2. Often returns incorrect and inconsistent results even for mathematical queries. 3. Often the responses include unwanted highlighting or presentation markup. 4. Defaults to German decimal notation.
Notes.app handles big notebooks without choking on storage?
Remi_Etien
[dead]
p1anecrazy
Really like demo cli tools description. Are they limited by the context window as well? What’s your experience with log file sizes?
franze
the 2 hard limits of Appel Intelligence Foundation Model and therefor apfel is the 4k token context window and the super hard guardrails (the model prefers to tell you nothing before it tells you something wrong ie ask it to describe a color)
parsing logfiles line by line, sure
parsing a whole logfile, well it must be tiny, logfile hardly ever are
khalic
AFM models are very impressive, but they’re not made for conversation, so keep your expectations down in chat mode.
elcritch
Any know if these only installed on Tahoe? I'm running Sequoia still and get an error about model not found.
HelloUsername
> Apple Silicon Mac, macOS 26 Tahoe or newer, Apple Intelligence enabled
swiftcoder
Anyone tried using this as a sub-agent for a more capable model like Claude/Codex?
LatencyKills
The combined (input/output) context window length is 4K. Claude would blow through that even when trying to read and summarize a small file.
franze
project started with
trying to run openclaw with it in ultra token saving mode, did totally not work.
great for shell scripts though (my major use case now)
khalic
If you’re looking into small models for tiny local tasks, you should try Qwen coder 0,5B. It’s more of an experiment, but it can output decent functions given the right context instructions.
coredog64
I was thinking about the other way: Could you use this in front of Claude to summarize inputs and so reduce your token counts?
gigatexal
It’s a very small model but I’ve been playing with it for some time now I’m impressed. Have we been sleeping on Apple’s models?
Imagine they baked Qwen 3.5 level stuff into the OS. Wow that’d be cool.
thenthenthen
The vision models and OCR are SUPER
bombcar
Apparently the Overcast guy build a beowulf cluster of Mac minis to use the Apple transcription service.
I’ve seen several projects like this that offer a network server with access to these Apple models. The danger is when they expose that, even on a loop port, to every other application on your system, including the browser. Random webpages are now shipping with JavaScript that will post to that port. Same-origin restrictions will stop data flow back to the webpage, but that doesn’t stop them from issuing commands to make changes.
Some such projects use CORS to allow read back as well. I haven’t read Apfel’s code yet, but I’m registering the experiment before performing it.
stingraycharles
I don’t think many browsers will allow posting to 127.0.0.1 from a random website. What’s the threat model here?
brians
They offer it as an option but default it to false! This is still a --footgun option but it’s the least unsafe version I’ve seen yet! Well done, Apfel authors.
robotswantdata
Keep seeing similar mistakes with vibe coded AI & MCP projects. Even experienced engineers seem oblivious to this attack vector
snarkyturtle
Noting that there's an option to require a Bearer token to the API
Oras
I like the idea and the clarity to explain the usage, my question would be: what kind of tasks it would be useful for?
khalic
Making a sentence out of a json
convexly
I like the approach of running everything locally. I'm strongly of the opinion that the privacy angle for local models is going to keep getting stronger and more relevant. The amount of articles that come out about accidents happening because of people handing too much context to cloud models the more self reinforcing this will become.
aswanson
That's the way things have to go. Business risk is too high having everything ran over exposed networks.
lukewarm707
local is best for privacy, but i personally think you don't need to go local.
anthropic, google, openai etc, decided that their consumer ai plans would not be private. partly to collect training data, the other half to employ moderators to review user activity for safety.
we trust that human moderators will not review and flag our icloud docs, onedrive or gmail, or aggregate such documents into training data for llms. it became the norm that an llm is somehow not private. it became a norm that you can't opt out of training, even on paid plans (see meta and google); or if you can opt out of training, you can't opt out of moderation.
cloud models with a zero retention privacy policy are private enough for almost everyone, the subscriptions, google search, ai search engines are either 'buying' your digital life or covering themselves for legal reasons.
you can and should have private cloud services, and if legal agreement is not enough, cryptographic attestation is already used in compute, with AWS nitro enclaves and other providers.
ge96
The other thing, is encrypted inferencing a thing/service currently? I want to run my own models locally just because if I'm going to be chatting to it about my day to day life why send it to a server in plaintext.
cousin_it
It's only half of the solution though. If the models are trained in a closed way, they can prioritize values encoded during training even if that's not what you want (example: ask the open Chinese models about Tiananmen). It's not beyond imagining that these models would e.g. try to send your data to authorities or advertisers when their training says so, even if you run them locally.
So the full solution would be models trained in an open verifiable way and running locally.
hombre_fatal
Another angle is when you're passing untrusted content to the AI service, e.g. anything from using it to crawl websites to spam-detection on new forum user posts.
You can trigger the the service's ToS violation or worse, get tipped off to law enforcement for something you didn't even write.
arendtio
For those who don't know, 'Apfel' is the German word for Apple.
gherkinnn
And for those who did know that and want to know more, the shift from apple - apfel and water -> wasser happened during the High German consonant shift.
Just a small thing about the website: your examples shift all the elements below it on mobile when changing, making it jump randomly when trying to read.
gherkinnn
Now this is a development I like.
With the Claude bug, or so it is known, burning through tokens at record speed, I gave alternative models a try and they're mostly ... interchangeable. I don't know how easy switching and low brand loyalty and fast markets will play out. I hope that local LLMs will become very viable very soon.
naravara
Yeah I don’t think the models are meaningfully differentiated outside of very specific edge cases. I suspect this was the thinking behind OpenAI and Facebook and all trying to lean hard into presenting their chatbots as friends and romantic partners. If they can’t maintain a technical moat they can try to cultivate an emotional one.
m-s-y
A serious project would do the work to be delivered via the native homebrew repository, not a “selfhosted” one.
brtkwr
Isn't the whole idea of "home brew" to enable hackers and enthusiasts to easily share what they built?
post-it
Is this you signing up as a packager or
nose-wuzzy-pad
Does the local LLM have access to personal information from the Apple account associated with the logged-in user? Maybe through a RAG pipeline or similar? Just curious if there are any risks associated with exposing this in a way that could be exploited via CORS or through another rogue app querying it locally.
franze
no. the on device foundationmodels framework that apfel uses does not have access to personal information from the apple account. the model is a bare language model with no built in personal data access.
apple does have an on device rag pipeline called the semantic index that feeds personal data like contacts emails calendar and photos into the model context but this is only available to apples own first party features like siri and system summaries.
it is not exposed through the foundationmodels api.
phplovesong
This is pretty cool. My bet is that we have more LLMs running locally when its possible, either thru "better hardware as default" or some new tech that can run the models on commodity hardware (like apple silicon / equivalent PC setup).
brtkwr
[dead]
alwinaugustin
Read Austria as Australia and thought this as an April fool
nottorp
> Starting with macOS 26 (Tahoe), every Apple Silicon Mac includes a language model as part of Apple Intelligence.
So you have to put up with the low contrast buggy UI to use that.
mattkevan
As an experiment I built a prototype chatbot app that uses the built-in LLM. It’s got a small context window, but is surprisingly capable and has tool-calling support. Without too much effort I was able to get it to fetch weather data, fetch and summarise emails, read and write reminders and calendar events.
joriskok1
How much storage does it take up?
franze
4mb download, after install about 15mb, model is already on your mac with mac os x tahoe
volume_tech
[dead]
Barbing
Just discovered iOS shortcuts has a native action called “use model” that lets you use local, Apple cloud, or ChatGPT— before that I would have agreed with the author about being locked behind Siri (natively)
api
BoltAI also does this, but a CLI tool is nice.
It’s a nice LLM because it seems fairly decent and it loads instantly and uses the CPU neural engine. The GPU is faster but when I run bigger LLMs on the GPU the normally very cool M series Mac becomes a lap roaster.
It’s a small LLM though. Seems decent but it’s also been safety trained to a somewhat comical degree. It will balk over safety at requests that are in fact quite banal.
pbronez
Digging into this, found Apple’s release notes for the Foundation Model Service
Is this for Tahoe only? I’m still clutching onto Sequoia
linsomniac
Yes, it says on that page that it uses Apple Intelligence from Tahoe. I'm also hanging onto Sequoia, though I'm ready to make the leap any time here.
anentropic
Yeah seems to need Tahoe (I'm on Sequoia):
dyld[71398]: Library not loaded: /System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels
Referenced from: <32818E2F-CB45-3506-A35B-AAF8BDDFFFCE> /opt/homebrew/Cellar/apfel/0.6.25/bin/apfel (built for macOS 26.0 which is newer than running OS)
Reason: tried: '/System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels' (no such file), '/System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels' (no such file, not in dyld cache)
kangraemin
4,096 token context window is pretty limiting. That's roughly 3,000 words — fine for "summarize this paragraph" but not enough for anything that needs real context. Still, zero cost and fully local is hard to beat for quick throwaway tasks. Does it handle streaming or is it request-response only?
xandrius
Try it and see
rbbydotdev
Would really love to see a web api standard for on device llms. This could get us closer. Some in-browser language model usage could be very powerful. In the interim maybe a little protocol spec + a discovery protocol used with browser plugins, web apps could detect and interface with on-device llms making it universally available.
Unfortunately, I found the small context window makes the utility pretty limited.
troyvit
Yeah I think you hit on the head a good way to use it though. I'm not on MacOS but KDE has a little tool called krunner[1] that lets you perform simple tasks from a small pop-up on your desktop. It would be cool if I could do slightly agentic things from there with a local model like ask what the capital of Austria is, or what's the current exchange rate between two currencies.
I have been using Apple’s built-in system LLM model for the last 7 or 8 months. I like the feature that if it needs to, it occasionally uses a more powerful secure private cloud model. I also write my own app to wrap it.
donmb
Local AIs are the future in times of limited resources. This could be the beginning of something big. I like that Apple opens up like this. Hopefully more to come.
enjoyitasus
completely agree.
Phemist
Nice! The example should imo say
apfel -o json "Translate to German: apple" | jq .content
Multiplayer
Started using this earlier this week. I built a backtesting benchmark tool to compare a mix of frontier and open-source models on a fairly heavy data analysis workflow I’d been running in the cloud.
The task is basically predicting pricing and costs.
Apple’s model came out on top—best accuracy in 6 out of 10 cases in the backtest. That surprised me.
It also looks like it might be fast enough to take over the whole job. If I ran this on Sonnet, we’re talking thousands per month. With DeepSeek, it’s more like hundreds.
So far, the other local models I’ve tried on my 64GB M4 Max Studio haven’t been viable - either far too slow or not accurate enough. That said, I haven’t tested a huge range yet.
animanoir
[dead]
sys_64738
Tahoe+ only
EddieLomax
This is similar to something I was playing around with last month-- basically just a CLI for accessing the foundational models.
It's really handy for quick things like "what's the capital of country x" but for coding, I feel that it is severely limited. With such a small context it's (currently) not great for complicated things.
aiiaro
[dead]
divan
What's the easiest way to use it with on-device voice model for voice chat?
How does this model compare against other local models like Qwen run through LMStudio?
frontsideair
> Apple locked it behind Siri. apfel sets it free
This doesn't feel truthful, it sounds like this tool is a hack that unlocks something. If I understand it correctly, it's using the same FoundationModels framework that powers Apple Intelligence, but for CLI and OpenAI compatible REST endpoint. Which is fine, just the marketing goes hard a bit.
> Runs on Neural Engine
Also unsure if this runs on ANE, when I tried Apple Intelligence I saw that it ran on the GPU (Metal).
reaperducer
This doesn't feel…
Also unsure…
Thank you for sharing your feelings and uncertainty.
Perhaps resist the urge to post until you have something to contribute.
reaperducer
[dead]
xp84
Real experience I've had:
"Text Carol bring me a glass of water please"
"I'm sorry, I don't see a 'Carol Bring' in your contacts"
furyofantares
Looks like a nice wrapper around the APIs. Extremely oversold landing page, very marketing heavy for what it is. You can actually make nice looking landing pages that are about 10% the size of this and more straightforward, rather than some mimicry of a SaaS that's trying desperately to sell you something. Makes it easier for you to review the content for factuality too, and heck you couldn't even take ownership of some of the voice.
Hard to know what to do with this. I'm interested in the project and know others who would be, but I feel like shit after being slopped on by a landing page and I don't wish to slop on my friends by sharing it with them. I suppose the github link is indeed significantly better, I'll share that.
xandrius
It's absolutely free and open source, no need to bash it like this.
devcraft_ai
[dead]
lewisjoe
Tempted to write a grammarly-like underline engine that flags writing mistakes across all apps and browser. Fully private grammarly alternative without even bundling an LLM!
malshe
That's a great idea. I would be very interested in using it of someone builds it.
gurjeet
Thank you for making it open source!
Submitted a PR to prevent its installation on macos versions older than Tahoe(26), since I was able to install it on my older macos 15, but it aborted on execution.
I'm a Linux user who wanted exactly this but for Linux — so I ended up building it myself. It's called TalkType, it runs Whisper locally for offline speech-to-text. The privacy angle was a big reason I went local from the start — I didn't want my voice being sent to anyone's server. Nice to see the same idea getting traction on Mac.
yalogin
This is great. A few questions come to mind, I need to go look up. Is the model an OpenAI one or home grown for Apple. And can I still use it if Siri is disabled?
contingencies
On a similar bent, I recently discovered Handy (cross-platform) which is very well implemented local voice input: https://handy.computer/ ... serious finger saver and ideal for LLM conversations
rgbrgb
love the simple website and typography. AI design or you? tasteful and fast animations. nice work and thanks for sharing!
deadfox
This is cool!
karimf
The big question is whether Apple can keep shipping new models constantly.
AFAIK the current model is on par with with Qwen-3-4B, which is from a year ago [0]. There's a big leap going from last year Qwen-3-4B to Qwen-3.5-4B or to Gemma 4.
Apple model is nice since you don't need to download anything else, but I'd rather use the latest model than to use a model from a year ago.
> Referenced from: <32818E2F-CB45-3506-A35B-AAF8BDDFFFCE> /opt/homebrew/Cellar/apfel/0.6.25/bin/apfel (built for macOS 26.0 which is newer than running OS)
This actually looks really neat. I'll have to bookmark this for whenever I'm dragged kicking and screaming into the abomination that is "Tahoe."
witnessme
Interesting. How does this foundational model compares with other LLMs?
contingencies
1. Hugely non-deterministic: repeat queries give vastly different responses. 2. Often returns incorrect and inconsistent results even for mathematical queries. 3. Often the responses include unwanted highlighting or presentation markup. 4. Defaults to German decimal notation.
Github: https://github.com/Arthur-Ficial/apfel
Notes.app handles big notebooks without choking on storage?
[dead]
Really like demo cli tools description. Are they limited by the context window as well? What’s your experience with log file sizes?
the 2 hard limits of Appel Intelligence Foundation Model and therefor apfel is the 4k token context window and the super hard guardrails (the model prefers to tell you nothing before it tells you something wrong ie ask it to describe a color)
parsing logfiles line by line, sure
parsing a whole logfile, well it must be tiny, logfile hardly ever are
AFM models are very impressive, but they’re not made for conversation, so keep your expectations down in chat mode.
Any know if these only installed on Tahoe? I'm running Sequoia still and get an error about model not found.
> Apple Silicon Mac, macOS 26 Tahoe or newer, Apple Intelligence enabled
Anyone tried using this as a sub-agent for a more capable model like Claude/Codex?
The combined (input/output) context window length is 4K. Claude would blow through that even when trying to read and summarize a small file.
project started with
trying to run openclaw with it in ultra token saving mode, did totally not work.
great for shell scripts though (my major use case now)
If you’re looking into small models for tiny local tasks, you should try Qwen coder 0,5B. It’s more of an experiment, but it can output decent functions given the right context instructions.
I was thinking about the other way: Could you use this in front of Claude to summarize inputs and so reduce your token counts?
It’s a very small model but I’ve been playing with it for some time now I’m impressed. Have we been sleeping on Apple’s models?
Imagine they baked Qwen 3.5 level stuff into the OS. Wow that’d be cool.
The vision models and OCR are SUPER
Apparently the Overcast guy build a beowulf cluster of Mac minis to use the Apple transcription service.
https://www.linkedin.com/posts/nathangathright_marco-arment-...
For small tasks this seems perfect. However it being limited to English from what I can tell is quite a downsite for me.
Cool tool but I don't get why these websites make idiotic claims
> $0 cost
No kidding.
Why not just link the GH Github: https://github.com/Arthur-Ficial/apfel
He did?
https://news.ycombinator.com/item?id=47624647
I’ve seen several projects like this that offer a network server with access to these Apple models. The danger is when they expose that, even on a loop port, to every other application on your system, including the browser. Random webpages are now shipping with JavaScript that will post to that port. Same-origin restrictions will stop data flow back to the webpage, but that doesn’t stop them from issuing commands to make changes.
Some such projects use CORS to allow read back as well. I haven’t read Apfel’s code yet, but I’m registering the experiment before performing it.
I don’t think many browsers will allow posting to 127.0.0.1 from a random website. What’s the threat model here?
They offer it as an option but default it to false! This is still a --footgun option but it’s the least unsafe version I’ve seen yet! Well done, Apfel authors.
Keep seeing similar mistakes with vibe coded AI & MCP projects. Even experienced engineers seem oblivious to this attack vector
Noting that there's an option to require a Bearer token to the API
I like the idea and the clarity to explain the usage, my question would be: what kind of tasks it would be useful for?
Making a sentence out of a json
I like the approach of running everything locally. I'm strongly of the opinion that the privacy angle for local models is going to keep getting stronger and more relevant. The amount of articles that come out about accidents happening because of people handing too much context to cloud models the more self reinforcing this will become.
That's the way things have to go. Business risk is too high having everything ran over exposed networks.
local is best for privacy, but i personally think you don't need to go local.
anthropic, google, openai etc, decided that their consumer ai plans would not be private. partly to collect training data, the other half to employ moderators to review user activity for safety.
we trust that human moderators will not review and flag our icloud docs, onedrive or gmail, or aggregate such documents into training data for llms. it became the norm that an llm is somehow not private. it became a norm that you can't opt out of training, even on paid plans (see meta and google); or if you can opt out of training, you can't opt out of moderation.
cloud models with a zero retention privacy policy are private enough for almost everyone, the subscriptions, google search, ai search engines are either 'buying' your digital life or covering themselves for legal reasons.
you can and should have private cloud services, and if legal agreement is not enough, cryptographic attestation is already used in compute, with AWS nitro enclaves and other providers.
The other thing, is encrypted inferencing a thing/service currently? I want to run my own models locally just because if I'm going to be chatting to it about my day to day life why send it to a server in plaintext.
It's only half of the solution though. If the models are trained in a closed way, they can prioritize values encoded during training even if that's not what you want (example: ask the open Chinese models about Tiananmen). It's not beyond imagining that these models would e.g. try to send your data to authorities or advertisers when their training says so, even if you run them locally.
So the full solution would be models trained in an open verifiable way and running locally.
Another angle is when you're passing untrusted content to the AI service, e.g. anything from using it to crawl websites to spam-detection on new forum user posts.
You can trigger the the service's ToS violation or worse, get tipped off to law enforcement for something you didn't even write.
For those who don't know, 'Apfel' is the German word for Apple.
And for those who did know that and want to know more, the shift from apple - apfel and water -> wasser happened during the High German consonant shift.
https://en.wikipedia.org/wiki/High_German_consonant_shift
[dead]
Just a small thing about the website: your examples shift all the elements below it on mobile when changing, making it jump randomly when trying to read.
Now this is a development I like.
With the Claude bug, or so it is known, burning through tokens at record speed, I gave alternative models a try and they're mostly ... interchangeable. I don't know how easy switching and low brand loyalty and fast markets will play out. I hope that local LLMs will become very viable very soon.
Yeah I don’t think the models are meaningfully differentiated outside of very specific edge cases. I suspect this was the thinking behind OpenAI and Facebook and all trying to lean hard into presenting their chatbots as friends and romantic partners. If they can’t maintain a technical moat they can try to cultivate an emotional one.
A serious project would do the work to be delivered via the native homebrew repository, not a “selfhosted” one.
Isn't the whole idea of "home brew" to enable hackers and enthusiasts to easily share what they built?
Is this you signing up as a packager or
Does the local LLM have access to personal information from the Apple account associated with the logged-in user? Maybe through a RAG pipeline or similar? Just curious if there are any risks associated with exposing this in a way that could be exploited via CORS or through another rogue app querying it locally.
no. the on device foundationmodels framework that apfel uses does not have access to personal information from the apple account. the model is a bare language model with no built in personal data access.
apple does have an on device rag pipeline called the semantic index that feeds personal data like contacts emails calendar and photos into the model context but this is only available to apples own first party features like siri and system summaries.
it is not exposed through the foundationmodels api.
This is pretty cool. My bet is that we have more LLMs running locally when its possible, either thru "better hardware as default" or some new tech that can run the models on commodity hardware (like apple silicon / equivalent PC setup).
[dead]
Read Austria as Australia and thought this as an April fool
> Starting with macOS 26 (Tahoe), every Apple Silicon Mac includes a language model as part of Apple Intelligence.
So you have to put up with the low contrast buggy UI to use that.
As an experiment I built a prototype chatbot app that uses the built-in LLM. It’s got a small context window, but is surprisingly capable and has tool-calling support. Without too much effort I was able to get it to fetch weather data, fetch and summarise emails, read and write reminders and calendar events.
How much storage does it take up?
4mb download, after install about 15mb, model is already on your mac with mac os x tahoe
[dead]
Just discovered iOS shortcuts has a native action called “use model” that lets you use local, Apple cloud, or ChatGPT— before that I would have agreed with the author about being locked behind Siri (natively)
BoltAI also does this, but a CLI tool is nice.
It’s a nice LLM because it seems fairly decent and it loads instantly and uses the CPU neural engine. The GPU is faster but when I run bigger LLMs on the GPU the normally very cool M series Mac becomes a lap roaster.
It’s a small LLM though. Seems decent but it’s also been safety trained to a somewhat comical degree. It will balk over safety at requests that are in fact quite banal.
Digging into this, found Apple’s release notes for the Foundation Model Service
https://developer.apple.com/documentation/Updates/Foundation...
They released an official python SDK in March 2026:
https://github.com/apple/python-apple-fm-sdk
[flagged]
Is this for Tahoe only? I’m still clutching onto Sequoia
Yes, it says on that page that it uses Apple Intelligence from Tahoe. I'm also hanging onto Sequoia, though I'm ready to make the leap any time here.
Yeah seems to need Tahoe (I'm on Sequoia):
4,096 token context window is pretty limiting. That's roughly 3,000 words — fine for "summarize this paragraph" but not enough for anything that needs real context. Still, zero cost and fully local is hard to beat for quick throwaway tasks. Does it handle streaming or is it request-response only?
Try it and see
Would really love to see a web api standard for on device llms. This could get us closer. Some in-browser language model usage could be very powerful. In the interim maybe a little protocol spec + a discovery protocol used with browser plugins, web apps could detect and interface with on-device llms making it universally available.
https://webmachinelearning.github.io/prompt-api/
Already in Chrome as an origin trial: https://developer.chrome.com/docs/ai/prompt-api
You have to enable Apple Intelligence so that's a hard no from me. I'll stick to LM Studio and gpt-oss/qwen. Very cool project though.
I hacked this together last fall to let you use Apple Foundation Models with llm: https://github.com/btucker/llm-apple . To enable that I built python bindings with Claude Code: https://github.com/btucker/apple-foundation-models-py
Unfortunately, I found the small context window makes the utility pretty limited.
Yeah I think you hit on the head a good way to use it though. I'm not on MacOS but KDE has a little tool called krunner[1] that lets you perform simple tasks from a small pop-up on your desktop. It would be cool if I could do slightly agentic things from there with a local model like ask what the capital of Austria is, or what's the current exchange rate between two currencies.
Then save the heavy lifting for the big boys.
[1] https://userbase.kde.org/Plasma/Krunner
I have been using Apple’s built-in system LLM model for the last 7 or 8 months. I like the feature that if it needs to, it occasionally uses a more powerful secure private cloud model. I also write my own app to wrap it.
Local AIs are the future in times of limited resources. This could be the beginning of something big. I like that Apple opens up like this. Hopefully more to come.
completely agree.
Nice! The example should imo say
apfel -o json "Translate to German: apple" | jq .content
Started using this earlier this week. I built a backtesting benchmark tool to compare a mix of frontier and open-source models on a fairly heavy data analysis workflow I’d been running in the cloud.
The task is basically predicting pricing and costs.
Apple’s model came out on top—best accuracy in 6 out of 10 cases in the backtest. That surprised me.
It also looks like it might be fast enough to take over the whole job. If I ran this on Sonnet, we’re talking thousands per month. With DeepSeek, it’s more like hundreds.
So far, the other local models I’ve tried on my 64GB M4 Max Studio haven’t been viable - either far too slow or not accurate enough. That said, I haven’t tested a huge range yet.
[dead]
Tahoe+ only
This is similar to something I was playing around with last month-- basically just a CLI for accessing the foundational models.
https://github.com/ehamiter/afm
It's really handy for quick things like "what's the capital of country x" but for coding, I feel that it is severely limited. With such a small context it's (currently) not great for complicated things.
[dead]
What's the easiest way to use it with on-device voice model for voice chat?
https://github.com/Arthur-Ficial/apfel-gui uses on-device speech-to-text and text-to-speech
https://handy.computer
How does this model compare against other local models like Qwen run through LMStudio?
> Apple locked it behind Siri. apfel sets it free
This doesn't feel truthful, it sounds like this tool is a hack that unlocks something. If I understand it correctly, it's using the same FoundationModels framework that powers Apple Intelligence, but for CLI and OpenAI compatible REST endpoint. Which is fine, just the marketing goes hard a bit.
> Runs on Neural Engine
Also unsure if this runs on ANE, when I tried Apple Intelligence I saw that it ran on the GPU (Metal).
This doesn't feel…
Also unsure…
Thank you for sharing your feelings and uncertainty.
Perhaps resist the urge to post until you have something to contribute.
[dead]
Real experience I've had:
"Text Carol bring me a glass of water please"
"I'm sorry, I don't see a 'Carol Bring' in your contacts"
Looks like a nice wrapper around the APIs. Extremely oversold landing page, very marketing heavy for what it is. You can actually make nice looking landing pages that are about 10% the size of this and more straightforward, rather than some mimicry of a SaaS that's trying desperately to sell you something. Makes it easier for you to review the content for factuality too, and heck you couldn't even take ownership of some of the voice.
Hard to know what to do with this. I'm interested in the project and know others who would be, but I feel like shit after being slopped on by a landing page and I don't wish to slop on my friends by sharing it with them. I suppose the github link is indeed significantly better, I'll share that.
It's absolutely free and open source, no need to bash it like this.
[dead]
Tempted to write a grammarly-like underline engine that flags writing mistakes across all apps and browser. Fully private grammarly alternative without even bundling an LLM!
That's a great idea. I would be very interested in using it of someone builds it.
Thank you for making it open source!
Submitted a PR to prevent its installation on macos versions older than Tahoe(26), since I was able to install it on my older macos 15, but it aborted on execution.
https://github.com/Arthur-Ficial/homebrew-tap/pull/1
[dead]
I'm a Linux user who wanted exactly this but for Linux — so I ended up building it myself. It's called TalkType, it runs Whisper locally for offline speech-to-text. The privacy angle was a big reason I went local from the start — I didn't want my voice being sent to anyone's server. Nice to see the same idea getting traction on Mac.
This is great. A few questions come to mind, I need to go look up. Is the model an OpenAI one or home grown for Apple. And can I still use it if Siri is disabled?
On a similar bent, I recently discovered Handy (cross-platform) which is very well implemented local voice input: https://handy.computer/ ... serious finger saver and ideal for LLM conversations
love the simple website and typography. AI design or you? tasteful and fast animations. nice work and thanks for sharing!
This is cool!
The big question is whether Apple can keep shipping new models constantly.
AFAIK the current model is on par with with Qwen-3-4B, which is from a year ago [0]. There's a big leap going from last year Qwen-3-4B to Qwen-3.5-4B or to Gemma 4.
Apple model is nice since you don't need to download anything else, but I'd rather use the latest model than to use a model from a year ago.
https://machinelearning.apple.com/research/apple-foundation-...
> Referenced from: <32818E2F-CB45-3506-A35B-AAF8BDDFFFCE> /opt/homebrew/Cellar/apfel/0.6.25/bin/apfel (built for macOS 26.0 which is newer than running OS)
This actually looks really neat. I'll have to bookmark this for whenever I'm dragged kicking and screaming into the abomination that is "Tahoe."
Interesting. How does this foundational model compares with other LLMs?
1. Hugely non-deterministic: repeat queries give vastly different responses. 2. Often returns incorrect and inconsistent results even for mathematical queries. 3. Often the responses include unwanted highlighting or presentation markup. 4. Defaults to German decimal notation.
Github: https://github.com/Arthur-Ficial/apfel
Notes.app handles big notebooks without choking on storage?
[dead]
Really like demo cli tools description. Are they limited by the context window as well? What’s your experience with log file sizes?
the 2 hard limits of Appel Intelligence Foundation Model and therefor apfel is the 4k token context window and the super hard guardrails (the model prefers to tell you nothing before it tells you something wrong ie ask it to describe a color)
parsing logfiles line by line, sure
parsing a whole logfile, well it must be tiny, logfile hardly ever are
AFM models are very impressive, but they’re not made for conversation, so keep your expectations down in chat mode.
Any know if these only installed on Tahoe? I'm running Sequoia still and get an error about model not found.
> Apple Silicon Mac, macOS 26 Tahoe or newer, Apple Intelligence enabled
Anyone tried using this as a sub-agent for a more capable model like Claude/Codex?
The combined (input/output) context window length is 4K. Claude would blow through that even when trying to read and summarize a small file.
project started with
trying to run openclaw with it in ultra token saving mode, did totally not work.
great for shell scripts though (my major use case now)
If you’re looking into small models for tiny local tasks, you should try Qwen coder 0,5B. It’s more of an experiment, but it can output decent functions given the right context instructions.
I was thinking about the other way: Could you use this in front of Claude to summarize inputs and so reduce your token counts?
It’s a very small model but I’ve been playing with it for some time now I’m impressed. Have we been sleeping on Apple’s models?
Imagine they baked Qwen 3.5 level stuff into the OS. Wow that’d be cool.
The vision models and OCR are SUPER
Apparently the Overcast guy build a beowulf cluster of Mac minis to use the Apple transcription service.
https://www.linkedin.com/posts/nathangathright_marco-arment-...
For small tasks this seems perfect. However it being limited to English from what I can tell is quite a downsite for me.
Cool tool but I don't get why these websites make idiotic claims
> $0 cost
No kidding.
Why not just link the GH Github: https://github.com/Arthur-Ficial/apfel
He did?
https://news.ycombinator.com/item?id=47624647
I’ve seen several projects like this that offer a network server with access to these Apple models. The danger is when they expose that, even on a loop port, to every other application on your system, including the browser. Random webpages are now shipping with JavaScript that will post to that port. Same-origin restrictions will stop data flow back to the webpage, but that doesn’t stop them from issuing commands to make changes.
Some such projects use CORS to allow read back as well. I haven’t read Apfel’s code yet, but I’m registering the experiment before performing it.
I don’t think many browsers will allow posting to 127.0.0.1 from a random website. What’s the threat model here?
They offer it as an option but default it to false! This is still a --footgun option but it’s the least unsafe version I’ve seen yet! Well done, Apfel authors.
Keep seeing similar mistakes with vibe coded AI & MCP projects. Even experienced engineers seem oblivious to this attack vector
Noting that there's an option to require a Bearer token to the API
I like the idea and the clarity to explain the usage, my question would be: what kind of tasks it would be useful for?
Making a sentence out of a json
I like the approach of running everything locally. I'm strongly of the opinion that the privacy angle for local models is going to keep getting stronger and more relevant. The amount of articles that come out about accidents happening because of people handing too much context to cloud models the more self reinforcing this will become.
That's the way things have to go. Business risk is too high having everything ran over exposed networks.
local is best for privacy, but i personally think you don't need to go local.
anthropic, google, openai etc, decided that their consumer ai plans would not be private. partly to collect training data, the other half to employ moderators to review user activity for safety.
we trust that human moderators will not review and flag our icloud docs, onedrive or gmail, or aggregate such documents into training data for llms. it became the norm that an llm is somehow not private. it became a norm that you can't opt out of training, even on paid plans (see meta and google); or if you can opt out of training, you can't opt out of moderation.
cloud models with a zero retention privacy policy are private enough for almost everyone, the subscriptions, google search, ai search engines are either 'buying' your digital life or covering themselves for legal reasons.
you can and should have private cloud services, and if legal agreement is not enough, cryptographic attestation is already used in compute, with AWS nitro enclaves and other providers.
The other thing, is encrypted inferencing a thing/service currently? I want to run my own models locally just because if I'm going to be chatting to it about my day to day life why send it to a server in plaintext.
It's only half of the solution though. If the models are trained in a closed way, they can prioritize values encoded during training even if that's not what you want (example: ask the open Chinese models about Tiananmen). It's not beyond imagining that these models would e.g. try to send your data to authorities or advertisers when their training says so, even if you run them locally.
So the full solution would be models trained in an open verifiable way and running locally.
Another angle is when you're passing untrusted content to the AI service, e.g. anything from using it to crawl websites to spam-detection on new forum user posts.
You can trigger the the service's ToS violation or worse, get tipped off to law enforcement for something you didn't even write.
For those who don't know, 'Apfel' is the German word for Apple.
And for those who did know that and want to know more, the shift from apple - apfel and water -> wasser happened during the High German consonant shift.
https://en.wikipedia.org/wiki/High_German_consonant_shift
[dead]
Just a small thing about the website: your examples shift all the elements below it on mobile when changing, making it jump randomly when trying to read.
Now this is a development I like.
With the Claude bug, or so it is known, burning through tokens at record speed, I gave alternative models a try and they're mostly ... interchangeable. I don't know how easy switching and low brand loyalty and fast markets will play out. I hope that local LLMs will become very viable very soon.
Yeah I don’t think the models are meaningfully differentiated outside of very specific edge cases. I suspect this was the thinking behind OpenAI and Facebook and all trying to lean hard into presenting their chatbots as friends and romantic partners. If they can’t maintain a technical moat they can try to cultivate an emotional one.
A serious project would do the work to be delivered via the native homebrew repository, not a “selfhosted” one.
Isn't the whole idea of "home brew" to enable hackers and enthusiasts to easily share what they built?
Is this you signing up as a packager or
Does the local LLM have access to personal information from the Apple account associated with the logged-in user? Maybe through a RAG pipeline or similar? Just curious if there are any risks associated with exposing this in a way that could be exploited via CORS or through another rogue app querying it locally.
no. the on device foundationmodels framework that apfel uses does not have access to personal information from the apple account. the model is a bare language model with no built in personal data access.
apple does have an on device rag pipeline called the semantic index that feeds personal data like contacts emails calendar and photos into the model context but this is only available to apples own first party features like siri and system summaries.
it is not exposed through the foundationmodels api.
This is pretty cool. My bet is that we have more LLMs running locally when its possible, either thru "better hardware as default" or some new tech that can run the models on commodity hardware (like apple silicon / equivalent PC setup).
[dead]
Read Austria as Australia and thought this as an April fool
> Starting with macOS 26 (Tahoe), every Apple Silicon Mac includes a language model as part of Apple Intelligence.
So you have to put up with the low contrast buggy UI to use that.
As an experiment I built a prototype chatbot app that uses the built-in LLM. It’s got a small context window, but is surprisingly capable and has tool-calling support. Without too much effort I was able to get it to fetch weather data, fetch and summarise emails, read and write reminders and calendar events.
How much storage does it take up?
4mb download, after install about 15mb, model is already on your mac with mac os x tahoe
[dead]
Just discovered iOS shortcuts has a native action called “use model” that lets you use local, Apple cloud, or ChatGPT— before that I would have agreed with the author about being locked behind Siri (natively)
BoltAI also does this, but a CLI tool is nice.
It’s a nice LLM because it seems fairly decent and it loads instantly and uses the CPU neural engine. The GPU is faster but when I run bigger LLMs on the GPU the normally very cool M series Mac becomes a lap roaster.
It’s a small LLM though. Seems decent but it’s also been safety trained to a somewhat comical degree. It will balk over safety at requests that are in fact quite banal.
Digging into this, found Apple’s release notes for the Foundation Model Service
https://developer.apple.com/documentation/Updates/Foundation...
They released an official python SDK in March 2026:
https://github.com/apple/python-apple-fm-sdk
[flagged]
Is this for Tahoe only? I’m still clutching onto Sequoia
Yes, it says on that page that it uses Apple Intelligence from Tahoe. I'm also hanging onto Sequoia, though I'm ready to make the leap any time here.
Yeah seems to need Tahoe (I'm on Sequoia):
4,096 token context window is pretty limiting. That's roughly 3,000 words — fine for "summarize this paragraph" but not enough for anything that needs real context. Still, zero cost and fully local is hard to beat for quick throwaway tasks. Does it handle streaming or is it request-response only?
Try it and see
Would really love to see a web api standard for on device llms. This could get us closer. Some in-browser language model usage could be very powerful. In the interim maybe a little protocol spec + a discovery protocol used with browser plugins, web apps could detect and interface with on-device llms making it universally available.
https://webmachinelearning.github.io/prompt-api/
Already in Chrome as an origin trial: https://developer.chrome.com/docs/ai/prompt-api
You have to enable Apple Intelligence so that's a hard no from me. I'll stick to LM Studio and gpt-oss/qwen. Very cool project though.
I hacked this together last fall to let you use Apple Foundation Models with llm: https://github.com/btucker/llm-apple . To enable that I built python bindings with Claude Code: https://github.com/btucker/apple-foundation-models-py
Unfortunately, I found the small context window makes the utility pretty limited.
Yeah I think you hit on the head a good way to use it though. I'm not on MacOS but KDE has a little tool called krunner[1] that lets you perform simple tasks from a small pop-up on your desktop. It would be cool if I could do slightly agentic things from there with a local model like ask what the capital of Austria is, or what's the current exchange rate between two currencies.
Then save the heavy lifting for the big boys.
[1] https://userbase.kde.org/Plasma/Krunner
I have been using Apple’s built-in system LLM model for the last 7 or 8 months. I like the feature that if it needs to, it occasionally uses a more powerful secure private cloud model. I also write my own app to wrap it.
Local AIs are the future in times of limited resources. This could be the beginning of something big. I like that Apple opens up like this. Hopefully more to come.
completely agree.
Nice! The example should imo say
apfel -o json "Translate to German: apple" | jq .content
Started using this earlier this week. I built a backtesting benchmark tool to compare a mix of frontier and open-source models on a fairly heavy data analysis workflow I’d been running in the cloud.
The task is basically predicting pricing and costs.
Apple’s model came out on top—best accuracy in 6 out of 10 cases in the backtest. That surprised me.
It also looks like it might be fast enough to take over the whole job. If I ran this on Sonnet, we’re talking thousands per month. With DeepSeek, it’s more like hundreds.
So far, the other local models I’ve tried on my 64GB M4 Max Studio haven’t been viable - either far too slow or not accurate enough. That said, I haven’t tested a huge range yet.
[dead]
Tahoe+ only
This is similar to something I was playing around with last month-- basically just a CLI for accessing the foundational models.
https://github.com/ehamiter/afm
It's really handy for quick things like "what's the capital of country x" but for coding, I feel that it is severely limited. With such a small context it's (currently) not great for complicated things.
[dead]
What's the easiest way to use it with on-device voice model for voice chat?
https://github.com/Arthur-Ficial/apfel-gui uses on-device speech-to-text and text-to-speech
https://handy.computer
How does this model compare against other local models like Qwen run through LMStudio?
> Apple locked it behind Siri. apfel sets it free
This doesn't feel truthful, it sounds like this tool is a hack that unlocks something. If I understand it correctly, it's using the same FoundationModels framework that powers Apple Intelligence, but for CLI and OpenAI compatible REST endpoint. Which is fine, just the marketing goes hard a bit.
> Runs on Neural Engine
Also unsure if this runs on ANE, when I tried Apple Intelligence I saw that it ran on the GPU (Metal).
This doesn't feel…
Also unsure…
Thank you for sharing your feelings and uncertainty.
Perhaps resist the urge to post until you have something to contribute.
[dead]
Real experience I've had:
"Text Carol bring me a glass of water please"
"I'm sorry, I don't see a 'Carol Bring' in your contacts"
Looks like a nice wrapper around the APIs. Extremely oversold landing page, very marketing heavy for what it is. You can actually make nice looking landing pages that are about 10% the size of this and more straightforward, rather than some mimicry of a SaaS that's trying desperately to sell you something. Makes it easier for you to review the content for factuality too, and heck you couldn't even take ownership of some of the voice.
Hard to know what to do with this. I'm interested in the project and know others who would be, but I feel like shit after being slopped on by a landing page and I don't wish to slop on my friends by sharing it with them. I suppose the github link is indeed significantly better, I'll share that.
It's absolutely free and open source, no need to bash it like this.
[dead]
Tempted to write a grammarly-like underline engine that flags writing mistakes across all apps and browser. Fully private grammarly alternative without even bundling an LLM!
That's a great idea. I would be very interested in using it of someone builds it.
Thank you for making it open source!
Submitted a PR to prevent its installation on macos versions older than Tahoe(26), since I was able to install it on my older macos 15, but it aborted on execution.
https://github.com/Arthur-Ficial/homebrew-tap/pull/1
[dead]
I'm a Linux user who wanted exactly this but for Linux — so I ended up building it myself. It's called TalkType, it runs Whisper locally for offline speech-to-text. The privacy angle was a big reason I went local from the start — I didn't want my voice being sent to anyone's server. Nice to see the same idea getting traction on Mac.
This is great. A few questions come to mind, I need to go look up. Is the model an OpenAI one or home grown for Apple. And can I still use it if Siri is disabled?
On a similar bent, I recently discovered Handy (cross-platform) which is very well implemented local voice input: https://handy.computer/ ... serious finger saver and ideal for LLM conversations
love the simple website and typography. AI design or you? tasteful and fast animations. nice work and thanks for sharing!
This is cool!
The big question is whether Apple can keep shipping new models constantly.
AFAIK the current model is on par with with Qwen-3-4B, which is from a year ago [0]. There's a big leap going from last year Qwen-3-4B to Qwen-3.5-4B or to Gemma 4.
Apple model is nice since you don't need to download anything else, but I'd rather use the latest model than to use a model from a year ago.
https://machinelearning.apple.com/research/apple-foundation-...
> Referenced from: <32818E2F-CB45-3506-A35B-AAF8BDDFFFCE> /opt/homebrew/Cellar/apfel/0.6.25/bin/apfel (built for macOS 26.0 which is newer than running OS)
This actually looks really neat. I'll have to bookmark this for whenever I'm dragged kicking and screaming into the abomination that is "Tahoe."
Interesting. How does this foundational model compares with other LLMs?
1. Hugely non-deterministic: repeat queries give vastly different responses. 2. Often returns incorrect and inconsistent results even for mathematical queries. 3. Often the responses include unwanted highlighting or presentation markup. 4. Defaults to German decimal notation.