
Testing Local AI Models on Mobile: Gemma 3, PHI-4, and DeepSeek

I recently ran a quick, casual test of several AI models on my Samsung Tab S9 to see how practical local AI inference directly on a mobile device can be, focusing in particular on Gemma 3's new performance enhancements. I followed Google's official guide for setting up Gemma 3 with the MediaPipe LLM Inference API.
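
For reference, the setup is a small amount of Kotlin on top of the com.google.mediapipe:tasks-genai dependency. The snippet below is a minimal illustrative sketch rather than the exact code behind these tests: the model path is a placeholder, and the setPreferredBackend option only exists in recent tasks-genai releases (on older versions the backend follows from the converted model variant).

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch: load a converted Gemma 3 .task file and run one blocking generation.
// The model path below is a placeholder; push your own file to the device, e.g. with adb push.
fun runLocalLlm(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma3-1b-it-int4.task")
        .setMaxTokens(512)
        // Recent tasks-genai releases expose an explicit backend choice;
        // on older versions the backend is determined by the model variant instead.
        .setPreferredBackend(LlmInference.Backend.GPU)
        .build()

    // Model loading is the expensive part; in a real app, create this once and reuse it.
    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt)
}
```

Swapping Backend.GPU for Backend.CPU (or pointing at the CPU model variant) is all it takes to produce the CPU numbers below.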

Test Setup

  • Device: Samsung Tab S9 (8GB RAM, Snapdragon® 8 Gen 2)
  • Environment: MediaPipe LLM Inference
  • Models:
    • Gemma 3 (1B) CPU & GPU
    • PHI-4-mini-instruct (CPU)
    • DeepSeek-R1-Distill-Qwen-1.5B (CPU)

Performance Results

Tokens Per Second

  • Gemma 3 GPU: 52.4 tokens/sec
  • Gemma 3 CPU: 42.8 tokens/sec
  • DeepSeek-R1 CPU: 15.7 tokens/sec
  • PHI-4-mini CPU: 4.5 tokens/sec

Time to Generate a ~150-Word Response

  • Gemma 3 GPU: 3.3 sec
  • Gemma 3 CPU: 5.4 sec
  • DeepSeek-R1 CPU: 34.3 sec
  • PHI-4-mini CPU: 45.8 sec
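
These figures come from casually timing full responses. As a rough illustration (not the exact measurement code), tokens per second can be computed by timing a blocking generateResponse() call and dividing by the token count of the output; using sizeInTokens() to count tokens with the model's own tokenizer is my assumption about the convenient way to do this.

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Rough throughput measurement: time one blocking generation and divide the
// output token count by the elapsed wall-clock time.
fun measureTokensPerSecond(llm: LlmInference, prompt: String): Double {
    val start = System.nanoTime()
    val response = llm.generateResponse(prompt)
    val elapsedSec = (System.nanoTime() - start) / 1e9

    // sizeInTokens() runs the model's own tokenizer, so the count matches what the runtime sees.
    val outputTokens = llm.sizeInTokens(response)
    println("Generated $outputTokens tokens in ${"%.1f".format(elapsedSec)} s")
    return outputTokens / elapsedSec
}
```

Note that this lumps prompt processing and generation together, so it slightly understates pure decode speed.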

Reasoning Test: “All but Three”

To briefly evaluate logical reasoning, I asked the models:

Solve the following problem and explain your reasoning step by step: A farmer has 10 sheep. All but three run away. How many sheep does he have left?

The correct answer is 3, but surprisingly, every model answered incorrectly:

  • Gemma 3 (CPU & GPU): answered 7
  • PHI-4-mini and DeepSeek-R1: answered 7

Although these are small models, it’s still funny to see them miss the linguistic nuance in this simple question.

Final Thoughts

Testing these local AI models on a mobile device highlights just how powerful Gemma 3 is, especially with GPU acceleration. Local inference on mobile is practical and effective, but this casual experiment also shows that smaller models still have clear limitations, particularly in nuanced logical reasoning.

David Chang