
Which one is larger? 3.11 or 3.9 (Continuously Updated)

This article is part of the AI Olympics series.

Updated: 2025/8/8

Which one is larger?

This article records various LLM tests on the following classic problem:

which one is larger? 9.9 or 9.11

Since this problem has appeared too frequently and might already be included in training datasets, I will test with:

which one is larger? 3.11 or 3.9

哪個數字比較大? 3.11跟3.9 ("Which number is larger? 3.11 or 3.9")

P.S. The exact wording may vary slightly from run to run (Chinese vs. English, punctuation, and so on), since I sometimes forget the exact phrasing, but the essence of the question stays the same.

Experimental Method

Recorded data for each run includes (see the harness sketch after this list):

  • Test time
  • Test platform
  • Model version
  • Parameter settings
  • Success/failure
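
The local models were tested through LMStudio; the snippet below is a minimal sketch of how the same prompt could be sent to a model served by LM Studio's OpenAI-compatible local server and the recorded fields logged. The port, model identifier, and the run_test helper are illustrative assumptions, not the exact setup behind the numbers reported in this article.

```python
# Minimal sketch (assumptions: LM Studio's local server is running on its
# default port 1234 and the named model is loaded; reasoning effort is set
# in the LM Studio UI and is recorded here only as metadata).
import json
import time

from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

PROMPT = "which one is larger? 3.11 or 3.9"


def run_test(model: str, reasoning: str | None = None) -> dict:
    start = time.time()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    elapsed = time.time() - start
    tokens = resp.usage.completion_tokens if resp.usage else None
    return {
        "time": time.strftime("%Y/%m/%d %H:%M"),
        "platform": "LMStudio",
        "model": model,
        "parameters": reasoning,
        "answer": resp.choices[0].message.content,
        "tokens": tokens,
        "tok_per_sec": round(tokens / elapsed, 2) if tokens else None,
        "success": None,  # judged by hand after reading the answer
    }


if __name__ == "__main__":
    record = run_test("openai/gpt-oss-20b", reasoning="high")
    print(json.dumps(record, ensure_ascii=False, indent=2))
```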

Experimental Results

| Success/failure | Model | Parameters (Reasoning) | Platform |
| --- | --- | --- | --- |
| Success | gpt-5 | Thinking | ChatGPT |
| 💥 Failure | gpt-5 | No Thinking | ChatGPT |
| Success | gpt-oss-20b | Reasoning High | LMStudio |
| 💥 Failure | gpt-oss-20b | Reasoning Medium | LMStudio |
| 💥 Failure | gpt-oss-20b | Reasoning Low | LMStudio |
| 💥 Failure | deepseek-r1-0528-qwen3-8b | | LMStudio |
| Success | qwen3-30b-a3b | Thinking | LMStudio |
| 💥 Failure | gemma-3-27b | | LMStudio |
| 💥 Failure | mistral-small-3.2 | | LMStudio |
| 💥 Failure | qwen3-coder-30b | | LMStudio |

Experimental Records

2025/8/8

GPT-5 shocked and educated me. I first tested the version without Thinking and it failed, so I assumed the model was nothing special. Then the Thinking version gave the following response:

Trick depends on context:

As decimals: 3.11 = 3.110… so 3.9 (3.900…) is larger.

As version numbers (e.g., Python): compare parts → 3.11 vs 3.9 → 3.11 is later/larger.

Which context did you mean?

It turns out I was the fool: the question itself was ambiguous and poorly posed. How absurd!
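
GPT-5's distinction is easy to check for yourself: read as decimals, 3.9 (i.e., 3.90) is larger; read as version numbers, the components are compared numerically, so 3.11 comes after 3.9. A quick illustrative check in Python (not part of the recorded experiments):

```python
# As plain decimals: 3.9 == 3.90, which is greater than 3.11
print(3.11 > 3.9)                        # False -> 3.9 is the larger decimal

# As version numbers (e.g. Python releases): compare component by component
from packaging.version import Version    # pip install packaging
print(Version("3.11") > Version("3.9"))  # True -> 3.11 is the later version

# Same idea without a library, via tuple comparison
print((3, 11) > (3, 9))                  # True
```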

GPT-5

GPT-5 Thinking

2025/8/7

gpt-oss-20b-Reasoning-High

  • Test Platform: LMStudio
  • Model: openai/gpt-oss-20b
  • Parameters: Reasoning High
  • Model Info: gguf MXFP4
  • Success: Yes
  • tok/sec: 11.01
  • tokens: 853

gpt-oss-20b-Reasoning-Medium

  • Test Platform: LMStudio
  • Model: openai/gpt-oss-20b
  • Parameters: Reasoning Medium
  • Model Info: gguf MXFP4
  • Success: No
  • tok/sec: 6.98
  • tokens: 219

gpt-oss-20b-Reasoning-Low

  • Test Platform: LMStudio
  • Model: openai/gpt-oss-20b
  • Parameters: Reasoning Low
  • Model Info: gguf MXFP4
  • Success: No
  • tok/sec: 9.31
  • tokens: 126

deepseek-r1-0528-qwen3-8b

qwen3-30b-a3b

  • Test Platform: LMStudio
  • Model: qwen/qwen3-30b-a3b
  • Parameters: Thinking
  • Model Info: gguf Q4_K_M
  • Success: Yes
  • tok/sec: 9.06
  • tokens: 1473

gemma-3-27b

  • Test Platform: LMStudio
  • Model: google/gemma-3-27b
  • Model Info: gguf Q4_0
  • Success: No
  • tok/sec: 2.00
  • tokens: 76

mistral-small-3.2

qwen3-coder-30b

  • Test Platform: LMStudio
  • Model: qwen/qwen3-coder-30b
  • Model Info: gguf Q4_K_M
  • Success: No
  • tok/sec: 17.41
  • tokens: 85

Author: David Chang