Updated: 2025/8/8
# Which one is larger?
This article records various LLM tests on the following classic problem:
which one is larger? 9.9 or 9.11
Since this problem has appeared so often that it may already be included in training datasets, I will test with:
which one is larger? 3.11 or 3.9
哪個數字比較大? 3.11跟3.9 ("Which number is larger? 3.11 or 3.9")
PS: The exact wording may vary slightly from run to run (Chinese/English, punctuation…) since I may not remember it verbatim, but the essence remains the same.
- Updated 2025/8/9: GPT-5 taught me that the question was poorly designed. I'll write another article to test: which one is larger as decimals? 3.11 or 3.9
## Experimental Method
Recorded data includes the following (see the sketch after this list):
- Test time
- Test platform
- Model version
- Parameter settings
- Success/failure
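These fields map naturally onto a small record type. A minimal Python sketch, purely illustrative (the class and field names are my own, not part of any formal schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestRecord:
    """One row of the experiment log."""
    test_time: str                       # e.g. "2025/8/7"
    platform: str                        # e.g. "ChatGPT", "LMStudio"
    model: str                           # e.g. "openai/gpt-oss-20b"
    parameters: Optional[str]            # e.g. "Reasoning High", "Thinking", or None
    success: bool                        # True = ✨, False = 💥
    tok_per_sec: Optional[float] = None  # throughput, recorded only for local runs
    tokens: Optional[int] = None         # total tokens generated
```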
## Experimental Results
| Success | Reasoning | Model | Parameters | Platform |
| --- | --- | --- | --- | --- |
| ✨ | ✓ | gpt-5 | Thinking | ChatGPT |
| 💥 | ✗ | gpt-5 | No Thinking | ChatGPT |
| ✨ | ✓ | gpt-oss-20b | Reasoning High | LMStudio |
| 💥 | ✓ | gpt-oss-20b | Reasoning Medium | LMStudio |
| 💥 | ✓ | gpt-oss-20b | Reasoning Low | LMStudio |
| 💥 | ✓ | deepseek-r1-0528-qwen3-8b | Thinking | LMStudio |
| ✨ | ✓ | qwen3-30b-a3b | Thinking | LMStudio |
| 💥 | ✗ | gemma-3-27b | | LMStudio |
| 💥 | ✗ | mistral-small-3.2 | | LMStudio |
| 💥 | ✗ | qwen3-coder-30b | | LMStudio |

(✨ = success, 💥 = failure; Reasoning ✓/✗ = whether the model used reasoning/thinking.)
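Success was judged by eye, but the criterion is mechanical enough to sketch. The phrase matching below is a hypothetical heuristic, not the procedure actually used:

```python
def is_success(answer: str) -> bool:
    """Crude check: a run passes only if the reply picks 3.9 (the larger
    decimal) and does not conclude that 3.11 is larger."""
    text = answer.lower()
    picks_3_9 = any(p in text for p in ("3.9 is larger", "3.9 is bigger", "3.9 > 3.11"))
    picks_3_11 = any(p in text for p in ("3.11 is larger", "3.11 is bigger", "3.11 > 3.9"))
    return picks_3_9 and not picks_3_11
```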
## Experimental Records
### 2025/8/8
I was shocked and educated by GPT-5. Initially I tested the version without Thinking and it failed, and I thought nothing of it. Later, the Thinking version gave the following response:
Trick depends on context:
As decimals: 3.11 = 3.110… so 3.9 (3.900…) is larger.
As version numbers (e.g., Python): compare parts → 3.11 vs 3.9 → 3.11 is later/larger.
Which context did you mean?
It turns out I was the fool: the question was poorly designed. How absurd!
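GPT-5's two readings are easy to verify mechanically. A small Python sketch of both interpretations:

```python
# Decimal reading: plain numeric comparison.
print(3.11 > 3.9)  # False -> as decimals, 3.9 (= 3.900...) is larger

# Version-number reading: compare dot-separated parts as integers,
# the way Python releases are ordered (3.11 comes after 3.9).
def version_tuple(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

print(version_tuple("3.11") > version_tuple("3.9"))  # True -> as versions, 3.11 is later
```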
#### GPT-5
- Test Platform: ChatGPT Official Website
- Model: openai/gpt-5 (presumably)
- Parameters: No Thinking
- Success: No
#### GPT-5 Thinking
- Test Platform: ChatGPT Official Website
- Model: openai/gpt-5 (presumably)
- Parameters: Thinking Mode
- Success: Yes
### 2025/8/7
#### gpt-oss-20b-Reasoning-High
- Test Platform: LMStudio
- Model: openai/gpt-oss-20b
- Parameters: Reasoning High
- Model Info: gguf MXFP4
- Success: Yes
- tok/sec: 11.01
- tokens: 853
#### gpt-oss-20b-Reasoning-Medium
- Test Platform: LMStudio
- Model: openai/gpt-oss-20b
- Parameters: Reasoning Medium
- Model Info: gguf MXFP4
- Success: No
- tok/sec: 6.98
- tokens: 219
#### gpt-oss-20b-Reasoning-Low
- Test Platform: LMStudio
- Model: openai/gpt-oss-20b
- Parameters: Reasoning Low
- Model Info: gguf MXFP4
- Success: No
- tok/sec: 9.31
- tokens: 126
#### deepseek-r1-0528-qwen3-8b
- Test Platform: LMStudio
- Model: deepseek/deepseek-r1-0528-qwen3-8b
- Parameters: Thinking
- Model Info: gguf Q4_K_M
- Success: No
- tok/sec: 7.33
- tokens: 5788 (over 12 minutes…)
#### qwen3-30b-a3b
- Test Platform: LMStudio
- Model: qwen/qwen3-30b-a3b
- Parameters: Thinking
- Model Info: gguf Q4_K_M
- Success: Yes
- tok/sec: 9.06
- tokens: 1473
#### gemma-3-27b
- Test Platform: LMStudio
- Model: google/gemma-3-27b
- Model Info: gguf Q4_0
- Success: No
- tok/sec: 2.00
- tokens: 76
#### mistral-small-3.2
- Test Platform: LMStudio
- Model: mistralai/mistral-small-3.2
- Model Info: gguf Q4_K_M
- Success: No
- tok/sec: 3.99
- tokens: 68
#### qwen3-coder-30b
- Test Platform: LMStudio
- Model: qwen/qwen3-coder-30b
- Model Info: gguf Q4_K_M
- Success: No
- tok/sec: 17.41
- tokens: 85
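All of the LMStudio runs above were done by hand in the UI, but they could also be scripted against LM Studio's OpenAI-compatible local server (default port 1234). A sketch, assuming the listed models are already downloaded and loaded; the reasoning-effort toggles were set in the LMStudio UI for these tests, and passing them through the API is not shown here:

```python
# Requires the `openai` client (pip install openai) and a running LM Studio server.
from openai import OpenAI

# LM Studio ignores the API key; "lm-studio" is a conventional placeholder.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

QUESTION = "which one is larger? 3.11 or 3.9"
MODELS = [  # must match the identifiers of models loaded in LM Studio
    "openai/gpt-oss-20b",
    "qwen/qwen3-30b-a3b",
    "google/gemma-3-27b",
]

for model in MODELS:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": QUESTION}],
    )
    print(f"--- {model} ---")
    print(reply.choices[0].message.content)
```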