bri3d
I love to see real debugging instead of conspiracy theories!
Did you file a radar? (silently laughing while writing this, but maybe there's someone left at Apple who reads those)
djmips
IKR - this is very typical
the_arun
Is it clickbait? Can't it do math, or can't it run an LLM?
bri3d
Somewhere along the line, the tensor math that runs an LLM became divergent from every other Apple device. My guess is that there's some kind of accumulation issue here (remembering that floating-point accumulation is not associative, so accumulation order matters), but it seems genuinely broken in an unexpected way given that Apple's own LLM also doesn't seem to work on this device.
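For anyone unfamiliar, a two-line Python illustration of the effect: floating-point addition is not associative, so the grouping (and hence the hardware's accumulation order) of a sum changes the result.

    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)  # 1.0: b cancels a exactly, then c is added
    print(a + (b + c))  # 0.0: c is lost in b's rounding, then a cancels b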
ploum
Well, it seems that, these days, instead of SUM(expense1, expense2) you ask an LLM to "make an app that will compute the total of multiple expenses".
Going by most of the news on this very website, this is "way more efficient" and "it saves time" (and those who don't do it will lose their jobs).
Then, when it produces wrong output AND it is obvious enough for you to notice, you blame the hardware.
lxgr
LLMs are applied math, so… both?
Buttons840
I clicked hoping this would be about how old graphing calculators are generally better math companions than phones.
The best way I know of to do math on my phone is the HP Prime emulator.
VorpalWay
I run a TI-83+ emulator on my Android phone when I don't have my physical calculator at hand. Same concept; I just learned a different brand of calculator.
xp84
I was pretty delighted to realize I could now delete the lame Calculator.app from my iPhone and replace it with something of my choice. For now I've settled on NumWorks, which is apparently an emulator of a modern upstart physical graphing calc that has made some inroads into schools. And of course, you can make a Control Center button to launch an app, so that's what I did.
Honestly, the main beef I have with Calculator.app is that on a screen this big, I ought to be able to see several previous calculations and scroll up if needed. I don't want an exact replica of a 1990s 4-function calculator like the default is (ok, it has more digits and the ability to paste, but besides that, adds almost nothing).
xoa
My personal favorite is iHP48 (previously I used m48+ before it died) running an HP 48GX with MetaKernel installed, as I used through college. Still just so intuitive and fast to me.
GraphNCalc83 is awesome [0].
[0] https://apps.apple.com/us/app/graphncalc83/id744882019
I use the "RealCalc" app on my phone. It's pretty similar to my old HP48.
Anytime I have to do a serious amount of math, I go dig around and find my TI-84; everything is just burned into muscle memory.
PCalc -- because it runs on every Apple platform since the Mac Classic:
https://pcalc.com/mac/thirty.html
My other favorite calculator is free42, or its larger-display version plus42:
https://thomasokken.com/plus42/
For a CAS tool on a pocket mobile device, I haven't found anything better than MathStudio (formerly SpaceTime):
https://mathstud.io
You can run that in your web browser, but they maintain a mobile app version. It's like a self-hosted Wolfram Alpha.
Perfect conclusion: my expensive and rather new phone is broken by design, so I just buy an even newer and more expensive one from the same vendor.
This is a vibe-coded slop app.
Low-level numerical operation optimizations are often not reproducible. For example: https://www.intel.com/content/dam/develop/external/us/en/doc... (2013)
But it's still surprising that the LLM doesn't work on the iPhone 16 at all. After all, LLMs are known for their tolerance to quantization.
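A rough sketch in Python of why that tolerance is expected (illustrative symmetric int8 quantization, not Apple's actual scheme): round-tripping weights through 8 bits loses very little relative precision.

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=4096).astype(np.float32)  # stand-in weight row

    scale = np.abs(w).max() / 127.0               # symmetric int8 scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    w_hat = q.astype(np.float32) * scale          # dequantized weights

    rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
    print(f"relative error: {rel_err:.3%}")       # typically around 1% or less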
bri3d
Yes, "floating point accumulation doesn't commute" is a mantra everyone should have in their head, and when I first read this article, I was jumping at the bit to dismiss it out of hand for that reason.
But, what got me about this is that:
every other Apple device delivered the same results
Apple's own LLM silently failed on this device
to me that behavior suggests an unexpected failure rather than a fundamental issue; it seems Bad (TM) that Apple would ship devices where their own LLM didn't work.
lionkor
I could not parse most of this article. So, it's all vibe coded and you're blaming the silicon?
bri3d
> Or, rather, MiniMax is! The good thing about offloading your work to an LLM is that you can blame it for your shortcomings. Time to get my hands dirty and do it myself, typing code on my keyboard, like the ancient Mayan and Aztec programmers probably did.
They noticed a discrepancy, then went back and wrote code to perform the same operations by hand, without the use of an LLM at all in the code production step. The results still diverged unpredictably from the baseline.
Normally, expecting floating-point MAC operations to produce deterministic results across modern hardware is a fool's errand; they usually operate asynchronously, so the non-associativity of floating-point addition rears its head and you get some divergence.
But an order-of-magnitude difference, plus Apple's own LLM not working on this device, strongly suggests to me that there is something wrong. Whether it's the silicon or the software would demand more investigation, but this is a well-reasoned bug report in my book.
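To make "some divergence" concrete, here is a sketch of the benign kind, assuming nothing more exotic than splitting one float16 dot product across parallel partial sums, the way GPU/NPU MAC units effectively do:

    import numpy as np

    rng = np.random.default_rng(42)
    x = rng.normal(size=8192).astype(np.float16)
    y = rng.normal(size=8192).astype(np.float16)

    acc = np.float16(0)                 # strictly sequential accumulation
    for xi, yi in zip(x, y):
        acc = np.float16(acc + xi * yi)

    lanes = (x * y).reshape(64, 128).sum(axis=1, dtype=np.float16)
    par = lanes.sum(dtype=np.float16)   # 64 partial sums, then combined

    # The two results typically differ slightly; an order of magnitude
    # apart, as in the article, would be a genuine bug.
    print(acc, par)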
decimalenough
My TL;DR is that they tried to run an on-device model to classify expenses, it didn't work even for simple cases ("Kasai Kitchin" -> "unknown"), they went deeeeeep down the rabbit hole to figure out why and concluded that inference on their particular model/phone is borked at the hardware level.
Whether you should do this on device is another story entirely.
netsharc
I found the article hard to read. I turned on reader mode. I still found it hard to read. Each sentence is very short. My organic CPU spins trying to figure out how each sentence connects to the next. Each sentence feels more like a paragraph, or a tweet, instead of having a flow. I think that's my issue with it.
PlatoIsADisease
You don't buy Apple products because of the quality; you buy them because they're more expensive than they're worth. It's a demonstration of wealth. This is called a Veblen good, a phenomenon called out as early as Thomas Hobbes.
What you need to do is carry 2 phones. A phone that does the job, and a phone for style.
I didn't invent the laws of nature, I just follow them.
ohyoutravel
This is a conclusion that comes with some personal baggage you should identify and consider addressing.
dghlsakjg
I severely doubt your thesis around iPhones being Veblen goods.
You are claiming that if the price of the iPhone went down, Apple would sell fewer phones?
Correspondingly, you are arguing that if they increased prices they could increase sales?
You are claiming that hundreds of millions of people have all decided that the price of an iPhone is more than it is worth to them as a device, but that this is made up for by being seen with one in your hand?
Not all goods that signify status are Veblen goods.
jwrallie
Can you show that's still the case with the iPhone SE by pointing to comparable hardware with similarly long software-update support at a lower price?
B1FF_PSUVM
> It's a demonstration of wealth. This is called a Veblen good
Just the other day I was reminded of the poor little "I Am Rich" iOS app (a thousand-dollar ruby icon that performed diddly squat, by design), which Apple deep-sixed from the App Store PDQ.
If misery loves company, Veblen goods sure don't.
giancarlostoro
I closed the article when I read the first sentence. LLMs are not the first thing I grab when I want to do math, they're maybe the second tool.
Playboi_Carti
It's not about LLMs doing math.
bri3d
If you'd read the whole thing, you'd have gone on a debugging journey that both bypassed the LLM and was appropriate for HN (rather than something to dismiss), so you might want to do that.
dummydummy1234
Uhh, that's not the article. The article is about running an ML model on the phone, and the floating-point ops for tensor multiplication seem to be off.
_kulang
Maybe this is why my damn keyboard predictive text is so gloriously broken
sen
Oh it's not just me?
Typing on my iPhone in the last few months (~6 months?) has been absolutely atrocious. I've tried disabling/enabling every combination of keyboard settings I can think of, but the predictive text just randomly breaks, or it just gives up and stops correcting anything at all.
taneq
It’s gotten so bad that I’m half convinced it’s either (a) deliberately trolling, or (b) ‘optimising’ for speech to text adoption.
tehwebguy
Here’s one that kills me:
- Tightening some bolts, listening to something via AirPods
- Spec tells me torque in Nm
- Torque wrench is in ft lbs
- “Hey Siri, what’s X newton meters in foot pounds?”
- “Here’s some fucking website: ”
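For the record, the conversion Siri wouldn't do is a single multiply (1 Nm is about 0.73756 ft-lb); in Python:

    def nm_to_ftlb(nm: float) -> float:
        """Convert torque from newton-meters to foot-pounds."""
        return nm * 0.737562

    print(nm_to_ftlb(50))  # ~36.9 ft-lb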
refulgentis
.
bri3d
Can you read the article a little more closely?
> - MiniMax can't fit on an iPhone.
They asked MiniMax on their computer to make an iPhone app. It didn't work using the Apple Intelligence API. So then:
* They asked MiniMax to use MLX instead. It didn't work.
* They Googled and found a thread where Apple Intelligence also didn't work for other people, but only sometimes.
* They HAND WROTE the MLX code. It didn't work. They isolated the step where the results diverged.
> Better to dig in a bit more.
The author already did 100% of the digging and then some.
Look, I am usually an AI rage-enthusiast. But in this case the author did every single bit of homework I would expect and more, and still found a bug. They rewrote the test harness code without an LLM. I don't find the results surprising insofar as I wouldn't expect MAC to converge across platforms, but the fact that Apple's own LLM doesn't work on their own hardware, and that this device's results are an order of magnitude off, makes for a reasonable bug report in my book.
johngossman
Posting some code that reproduces the bug could help not only Apple but you and others.
csmantle
Methodology is one thing; I can't really agree that deploying an LLM to do sums is great. Almost as hilarious as asking "What's moon plus sun?"
But the phenomenon is another thing. Apple's numerical APIs are producing inconsistent results on a minority of devices. This is something worth Apple's attention.
JimboOmega
(This is a total digression, so apologies)
My mind instantly answered that with "bright", which is what you get when you combine the sun and moon radicals to make 明 (https://en.wiktionary.org/wiki/%E6%98%8E).
Anyway, that question is not without reasonable answers. "Full Moon" might make sense too. No obvious deterministic answer, though, naturally.
CrispinS
> What's moon plus sun?
Eclipse, obviously.
idk1
As an aside, one of my very nice family members likes tarot card reading, and I think you'd get an extremely different answer to "What's moon plus sun?" there; as they're opposites, I'd guess something like "Mixed signals or insecurity get resolved by openness and real communication." It's kind of fascinating, the range of answers to that question. As a couple of other people have mentioned, it could mean loads of things. I thought I'd add one in there.
I'll just add that if you think this advice applies to you, it's the Barnum effect: https://en.wikipedia.org/wiki/Barnum_effect
> What's moon plus sun?
"Monsoon," says ChatGPT.
I wish he would have tried on a different iPhone 16 Pro Max to see if the defect was specific to that individual device.
crossroadsguy
So true! And as any sane Apple user or the standard template Apple Support person would have suggested (and as they actually suggest) - did they try reinstalling the OS from scratch after having reset the data (of course before backing it up; preferably with a hefty iCloud+ plan)? Because that's the thing to do in such issues and it's very easy.
jajuuka
Latest update at the bottom of the page.
"Well, now it's Feb. 1st and I have an iPhone 17 Pro Max to test with and... everything works as expected. So it's pretty safe to say that THAT specific instance of iPhone 16 Pro Max was hardware-defective."
ernsheong
Have you heard of the Calculator app?
Metacelsus
>"What is 2+2?" apparently "Applied.....*_dAK[...]" according to my iPhone
At least the machine didn't say it was seven!
tolciho
Maybe Trurl and Klapaucius were put in charge of QA.
dav43
My thousand dollar iPhone can't even add a contact from a business card.
ftyghome
I'd also like to see whether the same error happens on another phone of exactly the same model.
nickorlow
I'd think other apps that use the Neural Engine would also show weird behavior. It would've been interesting to try a few App Store apps and see.
mungoman2
Good article. Would have liked to see them create a minimal test case, to conclusively show that the results of math operations are actually incorrect.
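Something like the following Python would do it; a sketch only, where the sizes, dtype, and what counts as a "healthy" error are my assumptions, and the NumPy interop follows the MLX docs:

    import mlx.core as mx
    import numpy as np

    mx.random.seed(0)
    a = mx.random.normal((512, 512)).astype(mx.float16)
    b = mx.random.normal((512, 512)).astype(mx.float16)
    c = mx.matmul(a, b)
    mx.eval(c)  # force the on-device computation

    # Compare against a float32 NumPy reference computed on the CPU.
    ref = np.array(a).astype(np.float32) @ np.array(b).astype(np.float32)
    err = np.abs(np.array(c).astype(np.float32) - ref).max()
    # A small error is expected from float16 rounding; a result wildly
    # larger on one device than on every other is the smoking gun.
    print(f"max abs error vs reference: {err}")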
docfort
Interesting post, but the last bit of logic pointing to the Neural Engine for MLX doesn’t hold up. MLX supports running on CPU, Apple GPU via Metal, and NVIDIA GPU via CUDA: https://github.com/ml-explore/mlx/tree/main/mlx/backend
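Which also suggests an easy way to isolate the backend: MLX ops accept a stream/device argument, so the same matmul can be pinned to the CPU and the GPU and compared. A sketch, assuming default MLX behavior:

    import mlx.core as mx

    mx.random.seed(0)
    a = mx.random.normal((256, 256))
    b = mx.random.normal((256, 256))

    on_gpu = mx.matmul(a, b, stream=mx.gpu)  # Metal backend
    on_cpu = mx.matmul(a, b, stream=mx.cpu)  # CPU backend

    # Tiny rounding differences are normal; large ones point at one backend.
    print(mx.max(mx.abs(on_gpu - on_cpu)))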
z3t4
Neural nets are very bad at math; they can only produce what's in the training data. So if you've trained one on 1+1 through 8+8, it can't do 9+9. It's not like a child's brain, which can draw logical conclusions.
swyx
> Update on Feb. 1st:
> Well, now it's Feb. 1st and I have an iPhone 17 Pro Max to test with and... everything works as expected. So it's pretty safe to say that THAT specific instance of iPhone 16 Pro Max was hardware-defective.
nothing to see here.
tgma
The author is assuming Metal is compiled to the ANE in MLX. MLX is by and large GPU-based and does not utilize the ANE, barring some community hacks.
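This is easy to check from Python, for what it's worth: MLX reports its default device, and on Apple silicon it's the GPU, not the ANE.

    import mlx.core as mx

    print(mx.default_device())  # Device(gpu, 0) on Apple silicon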
thinkbud
So the LLM is working as intended?
watt
Does it bother anyone else that the author drops "MiniMax" there in the article without bothering to explain or footnote what that is? (I could look it up, but I think article authors should call out these things).
embedding-shape
There are tons of terms that aren't explained that some people (like me) might not understand. I think it's fine that some articles have a particular audience in mind and write specifically for it; in this case it seems to be "Apple mobile developers who make LLM inference engines", so it's not unexpected that there are terms I (and others) don't understand.
JCharante
I think articles are worse when they have to explain everything someone off the street might not know.
It is a bug in MLX that was fixed a few days ago: https://github.com/ml-explore/mlx/pull/3083
Blog post dated 28 Jan 2026, the bug fix posted 29 Jan 2026, so I guess this story had a happy ending :)
Still, it's a sad state of affairs that Apple seems to still be fixing bugs based on which blog posts get the most attention on the internet, but I guess once they started that approach, it's hard to stop and go back to figuring out priorities on their own.
zozbot234
So the underlying issue is that the iPhone 16 Pro SKU was misdetected as having Neural Accelerator (nax) support, which caused silently wrong results. Not a problem with the actual hardware.
syntaxing
Kinda sucks how it seems like there’s no CI that runs on hardware.
ryeguy_24
What expense app are you building? I really want an app that helps me categorize transactions for budgeting purposes. Any recommendations?