Artificial Intelligence is Not Evenly Distributed

By Josh Bobrowsky

Jul 23, 2025

Last week an OpenAI model scored a gold medal on one of the most prestigious math exams in the world.


This score puts it above the 99.999th percentile of scores on a competitive high school math exam, roughly a 1-in-2-million result, or an IQ of about 170.
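Converting between "how rare" and "what IQ" is simple arithmetic on a bell curve. The sketch below assumes the common convention that IQ is normally distributed with a mean of 100 and a standard deviation of 15 (some tests use a different spread, which shifts the results). The function names are illustrative only.

```python
import math
from statistics import NormalDist

# Assumption: IQ is normally distributed with mean 100 and standard deviation 15.
IQ_MEAN = 100.0
IQ_SD = 15.0


def rarity_of_iq(iq: float) -> float:
    """Return N such that roughly 1 in N people score at or above `iq`."""
    z = (iq - IQ_MEAN) / IQ_SD
    # Upper-tail probability; erfc stays accurate far out in the tail.
    tail = 0.5 * math.erfc(z / math.sqrt(2))
    return 1.0 / tail


def iq_for_rarity(one_in: float) -> float:
    """Return the IQ that only about 1 in `one_in` people reach or exceed."""
    tail = 1.0 / one_in
    z = NormalDist().inv_cdf(1.0 - tail)  # standard normal quantile
    return IQ_MEAN + IQ_SD * z


if __name__ == "__main__":
    print(f"IQ 170 is roughly 1 in {rarity_of_iq(170):,.0f}")
    print(f"1 in 2 million corresponds to IQ ~{iq_for_rarity(2_000_000):.0f}")
    print(f"1 in 100,000 corresponds to IQ ~{iq_for_rarity(100_000):.0f}")
```

Under that assumption, IQ 170 works out to roughly 1 in 650,000, a 1-in-2-million result to an IQ of roughly 173, and the 99.999th percentile (1 in 100,000) to roughly 164, so the figures above land in the same ballpark.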

At the same time, AI struggles to write a cold email, and its output is often riddled with errors on something as simple as a basic legal contract.

I think most people find it hard to understand how AI can have an IQ of 170 in one area yet struggle with tasks that someone with an IQ of 100 could complete.

Here are my thoughts…

Disclaimer: For the purposes of this article, I am using IQ as a general way to measure and quantify intelligence. That said, I don’t think IQ does a perfect, or even a great, job of measuring intelligence.

AI & Humans Don’t Learn The Same Ways

With humans, someone who is exceptionally smart in one area usually displays other markers of general intelligence as well.

For example, if someone is exceptionally good at chess, it would not be surprising if they were also good at business, investing, or other games like poker.

However, AI does not “think” or learn in the same way humans do. This creates an odd situation where AI can be incredibly smart in one area, and appear nearly incompetent in other areas.

In 1997, IBM’s Deep Blue beat the world chess champion, Garry Kasparov, becoming a better chess player than any human alive, yet at the time AI struggled with tasks as simple as identifying an image of a cat.

A lot has changed in the 28 years since Deep Blue beat Kasparov, but AI still has areas where it is incredibly smart and areas where it struggles.

IQ of AI by Category Today

Below are my rough estimates of AI’s IQ and error rate in each domain.

Error rates matter because AI can be smart and still be prone to mistakes. Areas like coding and law require close human supervision with current models. The error rates below include making things up (hallucinations).

Chess - IQ ~250+ Superhuman (error rate <0.1% - nearly perfect play)
Go - IQ ~250+ Superhuman (error rate <0.1% - superhuman consistency)
Numerical Computation - IQ ~170+ (error rate ~0.1% - near-perfect arithmetic)
Facial Recognition - IQ ~140-160 (error rate 1-2% - high accuracy but concerning false positives)
Image Recognition - IQ ~130 (error rate 2% - good but fails on edge cases)
Protein Folding - IQ ~170+ (error rate 5% - revolutionary but not perfect)
Medical Imaging - IQ ~125-140 (error rate 5-15% - matches radiologists but inconsistent)
Statistics - IQ ~130-150 (error rate 5-10% - excellent pattern detection)
Poker - IQ ~180+ near superhuman (error rate 5-10% - strong but exploitable patterns)
Pure Mathematics - IQ ~140-160 (error rate 10-20% - theorem proving limited by logical gaps)
Applied Mathematics - IQ ~140 (error rate 10-20% - good at standard problems)
StarCraft II - IQ ~180+ near superhuman (error rate 5% - strong but predictable strategies, very fast)
Language Translation - IQ ~140+ (error rate 10-20% - fluent but context errors)
Medical Diagnosis - IQ ~115-130 (error rate 20-35% - helpful but requires human oversight)
Legal Research - IQ ~115-130 (error rate 15-25% - good at finding precedents, prone to hallucinations)
Climate Modeling - IQ ~120-140 (error rate 20-40% - complex systems are inherently uncertain)
Financial Trading - IQ ~110-125 (error rate 25-40% - market unpredictability)
Contextual Humor - IQ ~70-80 (error rate 50% - poor at creating funny jokes)
Technical Documentation - IQ ~110-125 (error rate 10-20% - clear but may miss nuances)
Drug Discovery - IQ ~130-150 (error rate 20-40% - high failure rate in real trials)
Legal Writing - IQ ~110-125 (error rate 20-30% - competent but may miss legal subtleties, not always truthful)
Astronomical Analysis - IQ ~115-130 (error rate 15-25% - good pattern recognition)
Software Engineering - IQ ~120-130 (error rate 15-25% - functional code but bugs common)
Sentiment Analysis - IQ ~110-125 (error rate 20-35% - misses sarcasm, context)
Conversation - IQ ~105-120 (error rate 25-40% - coherent but factual errors)
Behavioral Prediction - IQ ~105-120 (error rate 30-50% - human behavior is complex)
Creative Writing - IQ ~95-110 (error rate 30-50% - coherent but lacks originality)
Autonomous Driving - IQ ~90-105 (error rate 20-40% - improving but critical safety issues)
Visual Art Generation - IQ ~95-110 (error rate 40-60% - impressive but often uncanny)
Music Composition - IQ ~90-105 (error rate 40-60% - technically sound but emotionally flat)
Game Design - IQ ~85-100 (error rate 50-70% - functional but often unbalanced)

Everything above is a rough estimate. Additionally, there is no single “AI” but rather many models, and these numbers change by the week. As of July 23, 2025, these are my best estimates; I hope to revisit them in a year.

Key takeaways

AI performs best in areas with clear right and wrong answers, such as chess, where there are no external variables. AI is weakest at deeply human, culturally specific tasks such as humor.

Today, error rates and dishonesty are probably the two biggest obstacles to AI being widely adopted in many fields. I believe this is where we will see some of the biggest changes in the near future: lower error rates, along with models that are more honest about their mistakes. Meanwhile, general skill levels will continue to increase.

How to best use AI today

The breakdown above is a fair picture of where AI stands today. The goal is to use AI for the things it is great at, such as math, research assistance, and coding, but not to let it do all the work.

As AI improves, it is important to keep re-evaluating where it can maximize your output and work product.

Conclusion

It was interesting to look back at the history of chess and how, over the years, AI went from an average chess player, to a grandmaster, to better than any grandmaster in the world.

AI is getting far better in many fields, and yet it still has a long way to go before it is commercially viable across the board. It is important to realize AI can be great at task A but not so great at task B.

Understanding where AI sits today tells you how it can help you now, as well as in the future.

Cheers

Josh Bobrowsky

Bobrowsky Newsletter: AI, Tech, Finance & Life
