AI models trying to fix bugs do better on Apple's mobile platform than on Google's
Even Google's own Gemini model performed worse on Android, per a new study from a software company called Instabug.
- AI models fix app crashes better on iOS than Android, a new study finds.
- Even Google's own Gemini model performed worse on Android.
- Android's fragmented ecosystem and language variability may hinder AI model performance.
When your mobile app crashes, there's often a mad scramble to track down the software bug and fix it fast.
Now there's AI for that. But the technology works a lot better with Apple's iOS platform than Google's Android, according to a study released on Thursday.
A software company called Instabug built a tool called SmartResolve that uses leading AI models to automate the process of spotting app crashes, diagnosing what's wrong, and generating usable software code fixes.
Instabug tested models from OpenAI, Anthropic, Google, and Meta against a dataset of real-world app crashes. Each fix was scored on correctness, similarity to human fixes, depth of root-cause analysis, relevance, and overall coherence.
A big takeaway: AI models consistently perform better on iOS than on Android. Instabug found that on Apple's platform, crash fixes were more accurate, coherent, and well-structured across nearly every model tested.
Even Google's AI model did worse on Android
OpenAI's models, for example, delivered significantly better results on iOS. GPT-4o scored 60% on iOS versus 49% on Android. With OpenAI's o1 model, the difference was even more dramatic: It hit 62% on iOS but dropped to 26% on Android, often failing to respond at all in Android tests.
Other models followed a similar pattern. Anthropic's Claude Sonnet 3.5 V1 scored 58% on iOS and 56% on Android — a smaller gap, but still an iOS lead.
Even Google's own Gemini 1.5 Pro performed worse on Android (51%) than on iOS (59%). Instabug found that it also faced more hallucination issues when using its larger context window.
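To put the gaps side by side, here is a minimal Python sketch that subtracts each model's Android score from its iOS score, using only the percentages quoted above. The `scores` table and the `platform_gaps` helper are illustrative names, not part of Instabug's study or its SmartResolve tool.

```python
# Scores in percent as quoted in the article: (iOS, Android).
# The dictionary and helper names are illustrative assumptions,
# not anything published by Instabug.
scores = {
    "GPT-4o": (60, 49),
    "o1": (62, 26),
    "Claude Sonnet 3.5 V1": (58, 56),
    "Gemini 1.5 Pro": (59, 51),
}


def platform_gaps(scores):
    """Return {model: iOS score minus Android score}, in percentage points."""
    return {model: ios - android for model, (ios, android) in scores.items()}


gaps = platform_gaps(scores)

# List models from the widest iOS lead to the narrowest.
for model, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: iOS leads by {gap} points")
```

On these figures, o1 shows the widest gap at 36 points and Claude Sonnet 3.5 V1 the narrowest at 2 points.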
Why does Android lag behind?
The discrepancy may come from Android's fragmented ecosystem. Compared to iOS, which offers a more uniform environment, Android's broader range of devices and crash types can make it harder for AI models to generalize fixes.
"The stronger performance on iOS is partially due to the structure of iOS native languages like Swift and Objective-C," Kenny Johnston, Instabug's chief product officer, said. "Their syntax is more predictable and strongly typed, which makes it easier for LLMs to generate accurate fixes."
Johnston said Android's languages, Java and Kotlin, combined with variability in crash formats, make fixes more complex.
Apple and Google did not respond to Business Insider's requests for comment.