I recently ran a quick, casual test of several AI models on my Samsung Tab S9 to see how practical local AI inference on mobile devices really is, focusing in particular on Gemma 3’s new performance improvements. I followed Google’s official guide for setting up Gemma 3 with the MediaPipe LLM Inference API.
Test Setup#
- Device: Samsung Tab S9 (8GB RAM, Snapdragon® 8 Gen 2)
- Environment: MediaPipe LLM Inference
- Models:
  - Gemma 3 (1B) CPU & GPU
  - Phi-4-mini-instruct (CPU)
  - DeepSeek-R1-Distill-Qwen-1.5B (CPU)
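For reference, the setup from Google's guide boils down to a few calls: point the options at the converted .task model file on the device, create the inference engine, and generate. Below is a minimal Kotlin sketch of that flow; the model path, filename, and token limit are placeholders for whatever you push to your device, and exact option names can vary slightly between MediaPipe releases.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch of the MediaPipe LLM Inference flow used for these tests.
// The model path is a placeholder; point it at the converted Gemma 3 1B
// .task file you pushed to the device (e.g. with adb push).
fun runLocalLlm(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma3-1b-it-int4.task") // placeholder path
        .setMaxTokens(512) // combined input + output token budget for this sketch
        .build()

    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt) // blocking call; a streaming variant also exists
}
```

Swapping models is then mostly a matter of pointing setModelPath at a different converted .task file.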
Performance Results#
Tokens Per Second#
- Gemma 3 GPU: 52.4 tokens/sec
- Gemma 3 CPU: 42.8 tokens/sec
- DeepSeek-R1 CPU: 15.7 tokens/sec
- Phi-4-mini CPU: 4.5 tokens/sec
Time to Respond to a 150-Word Prompt#
- Gemma 3 GPU: 3.3 sec
- Gemma 3 CPU: 5.4 sec
- DeepSeek-R1 CPU: 34.3 sec
- Phi-4-mini CPU: 45.8 sec
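These figures come from casual timing rather than a rigorous benchmark harness. If you want to reproduce the ballpark numbers, a rough approach is to time a blocking generateResponse call and divide by an estimated output token count; the words-to-tokens ratio below is a crude assumption, not a real tokenizer count.

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Rough throughput estimate: time a single generation and divide by an
// approximate output token count. A sketch of the idea, not the exact
// harness behind the numbers above.
fun roughTokensPerSecond(llm: LlmInference, prompt: String): Double {
    val start = System.nanoTime()
    val response = llm.generateResponse(prompt)
    val elapsedSec = (System.nanoTime() - start) / 1e9

    // Crude assumption: roughly 1.3 tokens per whitespace-separated word.
    val approxTokens = response.trim().split(Regex("\\s+")).size * 1.3
    return approxTokens / elapsedSec
}
```

Time-to-first-token and decode throughput are different things; the streaming API would let you separate them, but for a casual comparison a single wall-clock number is enough.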
Reasoning Test: “All but Three”#
To briefly evaluate logical reasoning, I asked the models:
Solve the following problem and explain your reasoning step by step: A farmer has 10 sheep. All but three run away. How many sheep does he have left?
The correct answer is 3, but surprisingly, every model answered incorrectly:
- Gemma 3 (CPU & GPU): answered 7
- Phi-4-mini and DeepSeek-R1: answered 7
Although these are small models, it’s still amusing to see all of them fall for the linguistic trick and compute 10 - 3 = 7 instead of noticing that “all but three run away” means three sheep remain.
Final Thoughts#
Testing these local AI models on a tablet highlights just how fast Gemma 3 1B is, especially with GPU acceleration, where it generated text several times faster than the other models I tried. Local inference on mobile is already practical, but this casual experiment also shows that small models still have clear limitations, particularly with nuanced logical reasoning.