Can Machines Really Be Intelligent?
In 2023, researchers administered a standard Wechsler-style verbal IQ test to GPT-4 and reported a score equivalent to roughly 155 -- placing it in the 99.9th percentile of the human population. Headlines declared that AI had surpassed most humans in intelligence. But had it?
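IQ tests are normed so that scores follow an approximately normal distribution with mean 100 and standard deviation 15, which makes converting a score to a population percentile a one-line calculation (a sketch of the standard norming convention, not a claim about how the GPT-4 study computed its figure):

```python
from statistics import NormalDist

# IQ scores are conventionally normed to mean 100, standard deviation 15
iq = NormalDist(mu=100, sigma=15)

score = 155
percentile = iq.cdf(score) * 100  # fraction of the population scoring below 155
print(f"IQ {score} is roughly the {percentile:.2f}th percentile")
```

Under this convention a score of 155 sits above the 99.98th percentile, so the "99.9th percentile" framing, if anything, understates it.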
The question "Can AI ever have an IQ?" sits at the intersection of computer science, philosophy, and psychology. It forces us to confront what intelligence actually is, whether a number can capture it, and whether a machine that produces correct answers is genuinely thinking or merely performing an extraordinarily sophisticated pattern match.
"The question of whether machines can think is about as relevant as the question of whether submarines can swim."
-- Edsger Dijkstra, computer science pioneer
This article examines how modern AI systems perform on human IQ tests, why those scores are both impressive and deeply misleading, and what the pursuit of artificial general intelligence (AGI) means for our understanding of the mind.
How AI Models Actually Score on IQ Tests
Several research teams have tested large language models (LLMs) on standardized IQ assessments. The results are striking -- and complicated.
AI Performance on IQ-Style Benchmarks
| AI Model | Test Type | Reported Score / Performance | Year |
|---|---|---|---|
| GPT-4 | Verbal IQ (WAIS-style) | ~155 (est. verbal IQ) | 2023 |
| GPT-4 | Raven's Progressive Matrices | Solved ~80% of items | 2023 |
| Claude 3 Opus | Graduate-level reasoning (GPQA) | ~60% accuracy | 2024 |
| GPT-3.5 | Verbal IQ estimate | ~110-120 range | 2023 |
| Google Gemini Ultra | MMLU benchmark (broad knowledge) | 90.0% | 2024 |
| AlphaGo | Strategic reasoning (Go) | Surpassed all human players | 2016 |
These numbers raise an obvious question: if GPT-4 scores 155 on a verbal IQ test, is it a genius? The answer requires understanding what IQ tests actually measure and how AI arrives at its answers.
"Artificial intelligence is growing up fast, as are robots whose facial expressions can elicit empathy and make your mirror neurons quiver."
-- Diane Ackerman, author and naturalist
Why the Scores Are Misleading
AI models achieve high scores on verbal IQ tasks because they have been trained on billions of text examples that include the very types of reasoning, vocabulary, and analogies that IQ tests assess. Consider the difference:
- A human solving an analogy like "tree : forest :: star : ?" draws on embodied experience, visual memory, and abstract categorization built over a lifetime
- An LLM solves the same analogy by identifying statistical patterns in its training data where these word relationships appeared thousands of times
The AI is not reasoning in the way the test designers intended. It is performing sophisticated retrieval and interpolation from a massive dataset. This distinction matters enormously for whether we can meaningfully assign an IQ to a machine.
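The statistical-pattern view can be made concrete with word embeddings. In a toy vector space (the vectors below are invented for illustration, not taken from any real model), an analogy like "tree : forest :: star : ?" reduces to vector arithmetic plus a nearest-neighbor search:

```python
import numpy as np

# Toy 3-D embeddings, invented for illustration -- real models learn
# vectors with hundreds of dimensions from billions of tokens.
vocab = {
    "tree":   np.array([1.0, 0.2, 0.0]),
    "forest": np.array([1.0, 0.9, 0.0]),  # tree shifted along a "collection of" direction
    "star":   np.array([0.0, 0.2, 1.0]),
    "galaxy": np.array([0.0, 0.9, 1.0]),  # star shifted along the same direction
    "cup":    np.array([0.3, 0.1, 0.2]),
}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# tree : forest :: star : ?   ->   forest - tree + star
target = vocab["forest"] - vocab["tree"] + vocab["star"]
answer = max((w for w in vocab if w != "star"),
             key=lambda w: cosine(vocab[w], target))
print(answer)  # -> galaxy
```

No categorization or lived experience is involved: the "answer" falls out of geometric regularities that training distilled from co-occurrence statistics.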
The Turing Test: A Starting Point, Not a Solution
Alan Turing proposed his famous test in 1950: if a machine can converse with a human evaluator who cannot reliably distinguish it from a human, the machine should be considered intelligent. Modern LLMs like GPT-4 and Claude can pass casual versions of the Turing test with ease, fooling many evaluators in short conversations.
But the Turing test has well-known limitations:
- It tests imitation, not understanding -- a machine can produce human-like responses without comprehending their meaning
- It is vulnerable to tricks -- systems like ELIZA (1966) fooled some users with simple pattern matching
- It conflates fluency with intelligence -- a model can be articulate yet lack common sense
- Cultural and linguistic biases shape evaluator expectations
"I propose to consider the question, 'Can machines think?'"
-- Alan Turing, opening line of his 1950 paper "Computing Machinery and Intelligence"
Beyond Turing: Modern AI Benchmarks
Researchers have developed more rigorous frameworks for evaluating machine intelligence:
| Benchmark | What It Measures | Limitation |
|---|---|---|
| Turing Test | Conversational mimicry | Tests imitation, not understanding |
| ARC (Abstraction and Reasoning Corpus) | Novel pattern generalization | AI still struggles significantly |
| MMLU (Massive Multitask Language Understanding) | Broad factual knowledge | Memorization can inflate scores |
| BIG-Bench | Diverse cognitive tasks | Many tasks are still language-dependent |
| Raven's Progressive Matrices | Nonverbal abstract reasoning | AI solves via image recognition heuristics |
| Winograd Schema Challenge | Common-sense reasoning | LLMs have largely "solved" it through scale |
François Chollet, creator of the ARC benchmark, argues that true intelligence is the ability to generalize to novel situations, not to perform well on tasks similar to training data. On ARC tasks requiring genuine abstraction, even the most advanced AI models score far below average humans.
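Chollet's distinction is easiest to see with a toy ARC-style task (the grid puzzle below is invented for illustration): from a single example pair, infer the transformation rule, then apply it to a new input. Here the hidden rule is a pure color substitution, which a short program can recover by aligning the example pair cell by cell:

```python
# A toy ARC-style task: grids are lists of rows, cells are color codes.
train_input  = [[1, 1, 0],
                [0, 2, 0]]
train_output = [[3, 3, 0],
                [0, 4, 0]]

def infer_color_map(inp, out):
    """Recover a cell-wise color substitution from one example pair."""
    mapping = {}
    for row_in, row_out in zip(inp, out):
        for a, b in zip(row_in, row_out):
            if a in mapping and mapping[a] != b:
                raise ValueError("rule is not a pure color substitution")
            mapping[a] = b
    return mapping

def apply_map(grid, mapping):
    """Apply the inferred substitution to a new grid."""
    return [[mapping.get(c, c) for c in row] for row in grid]

rule = infer_color_map(train_input, train_output)  # {1: 3, 0: 0, 2: 4}
test_input = [[2, 0, 1]]
print(apply_map(test_input, rule))                 # [[4, 0, 3]]
```

Real ARC tasks hide far richer rules (symmetry, counting, object persistence) that cannot be recovered by a fixed template like this one; that is exactly the gap between interpolation and the on-the-fly abstraction Chollet is pointing at.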
The Chinese Room: Does AI Understand Anything?
Philosopher John Searle's 1980 "Chinese Room" thought experiment remains one of the most powerful arguments against AI having genuine intelligence:
Imagine a person locked in a room who receives Chinese characters through a slot. They have a rulebook that tells them which Chinese characters to send back. To an outside observer, the room appears to understand Chinese perfectly. But the person inside understands nothing -- they are just following rules.
"Minds are not just programs. The brain is a machine, but it is a very special kind of machine."
-- John Searle, philosopher, creator of the Chinese Room argument
Modern LLMs are, in a sense, very sophisticated Chinese Rooms. They manipulate symbols (tokens) according to learned patterns without any internal experience of meaning. This is why:
- GPT-4 can write a moving poem about grief but has never felt grief
- Claude can explain quantum physics but does not understand physics in the way a physicist does
- AI can ace IQ test questions about spatial reasoning without having any spatial experience
The Consciousness Gap
| Dimension | Human Intelligence | Current AI |
|---|---|---|
| Subjective experience | Present -- "what it is like" to think | Absent |
| Intentionality | Thoughts are about something | No genuine "aboutness" |
| Embodied cognition | Grounded in sensory experience | No body, no senses |
| Self-awareness | Can reflect on own mental states | No inner mental life |
| Motivation | Driven by goals, desires, emotions | Responds only to prompts |
| Learning from single examples | Highly capable | Very limited |
Artificial General Intelligence: Would AGI Have an IQ?
Artificial General Intelligence (AGI) refers to a hypothetical AI system that can perform any intellectual task that a human can, with comparable flexibility and learning speed. Unlike today's narrow AI, AGI would:
- Transfer knowledge seamlessly across domains
- Learn new skills from minimal examples
- Reason about novel situations without prior training
- Potentially possess something analogous to common sense
If AGI is achieved, the question of assigning it an IQ becomes more meaningful -- but also more complex. An AGI system might:
- Score extremely high on all subtests of a standard IQ battery
- Process information millions of times faster than any human brain
- Access far more stored knowledge than any individual could accumulate in a lifetime
- Yet still lack consciousness, subjective experience, and emotional understanding
"The development of full artificial intelligence could spell the end of the human race. It would take off on its own, and redesign itself at an ever-increasing rate."
-- Stephen Hawking, theoretical physicist
Timeline Estimates for AGI
| Source | Predicted AGI Timeline | Key Assumption |
|---|---|---|
| Ray Kurzweil | By 2029 | Exponential computing growth |
| DeepMind (Demis Hassabis) | 2030s | Fundamental breakthroughs needed |
| Metaculus (crowd forecast) | ~2040 | Median community estimate |
| Gary Marcus (skeptic) | "Not in our lifetime" | Current approaches insufficient |
| Yann LeCun (Meta AI) | Decades away | Need new paradigms beyond LLMs |
The wide disagreement among experts highlights how uncertain this field remains. What is clear is that current AI, however impressive, is not AGI.
Real-World Examples: When AI Amazes and When It Fails
Impressive AI Achievements
- Chess and Go: IBM's Deep Blue defeated Garry Kasparov (1997); AlphaGo defeated Lee Sedol (2016) with moves no human had played in some 3,000 years of recorded Go
- Medical diagnosis: AI systems now match or exceed radiologists in detecting certain cancers from imaging scans
- Scientific discovery: AlphaFold predicted the 3D structure of over 200 million proteins, a problem that had stumped biologists for decades
- Standardized tests: GPT-4 reportedly scored around the 90th percentile on a simulated bar exam
Embarrassing AI Failures
- Common sense: AI models have confidently stated that a horse has six legs or that you can fit a basketball inside a coffee cup
- Cause and effect: Models struggle with questions like "If I drop a glass on a pillow, will it break?" because they lack physical intuition
- Novel reasoning: When presented with truly new puzzles (not variants of training data), AI performance drops dramatically
- Counting and basic math: LLMs frequently fail at counting letters in words or performing multi-step arithmetic
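The letter-counting failure has a mechanical explanation: LLMs operate on subword tokens, not characters, so a word's individual letters are never directly visible to the model. A few lines contrast what the question asks with what the model actually receives (the token split shown is hypothetical, for illustration):

```python
word = "strawberry"

# What the question asks: a character-level count.
r_count = word.count("r")
print(r_count)  # -> 3

# What a subword tokenizer might actually hand the model (hypothetical split):
tokens = ["straw", "berry"]
# The model sees opaque token IDs for these chunks; the letters inside them,
# and hence their counts, are not explicitly represented in its input.
```

A one-line string operation is trivial for conventional software, yet the same question is genuinely awkward for a system that never sees characters, which is why scale alone has not fully fixed it.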
"The real risk with AI isn't malice but competence. A superintelligent AI will be extremely good at accomplishing its goals, and if those goals aren't aligned with ours, we're in trouble."
-- Stuart Russell, AI researcher, UC Berkeley
These examples illustrate why raw IQ scores for AI are misleading. A system that passes the bar exam but cannot reliably count the letters in "strawberry" does not have intelligence in the way humans understand it.
What Would a Meaningful "Machine IQ" Look Like?
If we wanted to create a fair intelligence metric for AI, it would need to measure capabilities that IQ tests measure in humans -- but adapted for the unique architecture of machines:
Proposed Dimensions for Machine Intelligence Assessment
- Generalization ability: Can the system solve problems it was never trained on?
- Sample efficiency: How much data does it need to learn a new concept?
- Transfer learning: Can it apply knowledge from one domain to another?
- Robustness: Does performance hold up when inputs are slightly altered?
- Calibration: Does the system know what it does and does not know?
- Compositional reasoning: Can it combine known concepts in novel ways?
| Metric | What It Captures | Current AI Performance |
|---|---|---|
| ARC score | Novel abstraction | Well below average human |
| Few-shot learning accuracy | Sample efficiency | Moderate, improving |
| Distribution shift robustness | Generalization | Poor to moderate |
| Calibration error | Self-knowledge | Improving with RLHF |
| Compositional generalization | Creative combination | Limited |
A true "machine IQ" would likely be multidimensional, not reducible to a single number, and would need regular updating as AI capabilities evolve.
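A multidimensional report of this kind is straightforward to represent in code. The sketch below (the axis names follow the table above; the example scores are invented, not measurements of any real model) deliberately keeps each axis separate instead of collapsing them into one number:

```python
from dataclasses import dataclass, asdict

@dataclass
class MachineIQProfile:
    """A multidimensional capability profile -- deliberately NOT a single score.

    All values are normalized to [0, 1]. The example numbers below are
    invented for illustration, not measurements of any real system.
    """
    generalization: float      # e.g. ARC-style novel abstraction
    sample_efficiency: float   # few-shot learning accuracy
    robustness: float          # performance under distribution shift
    calibration: float         # 1 - expected calibration error
    compositionality: float    # compositional generalization

    def report(self):
        # Report every axis separately; refusing to average is the point.
        return {name: round(value, 2) for name, value in asdict(self).items()}

profile = MachineIQProfile(
    generalization=0.25,
    sample_efficiency=0.55,
    robustness=0.40,
    calibration=0.60,
    compositionality=0.30,
)
print(profile.report())
```

Presenting the profile as a vector rather than a scalar avoids exactly the distortion that a single "machine IQ" number would introduce.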
The Philosophical Stakes: Why This Question Matters
The question of whether AI can have an IQ is not merely academic. It has profound implications for:
- Legal rights: If an AI is declared "intelligent," does it deserve legal protections?
- Moral responsibility: If an AI system causes harm, is it "responsible" or is its creator?
- Employment: If AI scores higher than humans on cognitive benchmarks, what does this mean for knowledge work?
- Education: Should AI tutors be assessed for their "intelligence" in teaching, or only for student outcomes?
- Human identity: If machines can match us on our most valued cognitive tests, what makes human intelligence unique?
"We are called to be architects of the future, not its victims."
-- R. Buckminster Fuller, inventor and futurist
These questions will become increasingly urgent as AI systems grow more capable. Understanding the distinction between performing well on IQ tests and being intelligent is essential preparation for the world ahead.
Conclusion: Impressive Performance Is Not Intelligence
AI systems can achieve remarkable scores on IQ tests and cognitive benchmarks. GPT-4's reported verbal IQ of 155 is genuinely impressive as a technical achievement. But scoring well on a test designed for human minds does not mean the machine has a mind.
Intelligence -- as humans experience it -- involves consciousness, subjective experience, embodied understanding, emotional depth, and the ability to navigate genuinely novel situations. Current AI has none of these qualities. It has performance without understanding, fluency without meaning, and answers without comprehension.
Whether AGI will change this picture remains one of the great open questions of our time. What we can say with confidence is that today's AI, however dazzling, does not have an IQ in any meaningful sense. It has something else entirely -- and we do not yet have the right word for it.
To explore what human IQ tests actually measure and how your own mind performs, you can take our full IQ test, try a quick IQ assessment, or warm up with a practice test. These assessments are designed for human cognition -- the kind of intelligence that, for now, remains uniquely ours.
Frequently Asked Questions
How does GPT-4 actually score on IQ tests?
GPT-4 has been reported to score in the range of **120-155** on verbal IQ components, depending on the specific test and administration method. It performs exceptionally well on vocabulary, analogies, and verbal reasoning tasks because these overlap heavily with its text-based training data. However, it performs much worse on tasks requiring *genuine novel reasoning*, spatial manipulation, or common-sense physical understanding. Researchers like François Chollet have argued that these scores reflect **memorization and interpolation**, not the fluid intelligence that IQ tests are designed to measure in humans.
Can current AI pass the Turing test?
Modern large language models can pass **casual versions** of the Turing test, fooling many evaluators in short conversations. In a 2024 study, GPT-4 fooled evaluators about 50% of the time in two-minute conversations. However, extended or adversarial conversations typically reveal the AI's limitations -- including confabulation (making up facts), inconsistent reasoning, and inability to draw on genuine personal experience. The Turing test itself is widely considered an **insufficient measure of intelligence** by contemporary AI researchers.
What is the difference between narrow AI and AGI?
**Narrow AI** (also called weak AI) excels at specific tasks -- playing chess, translating languages, recognizing images -- but cannot transfer its abilities to new domains. **Artificial General Intelligence (AGI)** would possess human-like cognitive flexibility, learning new tasks as efficiently as humans and transferring knowledge across domains. All current AI systems, including GPT-4 and Claude, are forms of narrow AI, despite their impressive breadth. The gap between today's AI and true AGI is considered by most researchers to be **substantial**, requiring fundamental breakthroughs, not just more computing power.
Does the Chinese Room argument disprove AI intelligence?
John Searle's Chinese Room argument (1980) does not *disprove* AI intelligence, but it powerfully illustrates that **symbol manipulation is not the same as understanding**. The argument shows that a system can produce perfectly correct outputs without any internal comprehension. Critics of Searle, such as Daniel Dennett, counter that understanding might *emerge* from sufficiently complex information processing. This debate remains unresolved and is central to philosophy of mind. For practical purposes, it reminds us that an AI acing an IQ test does not necessarily *understand* the questions.
Will AI eventually become smarter than all humans?
AI already surpasses humans in many narrow domains -- calculation speed, data analysis, pattern recognition in large datasets, and certain game-playing abilities. Whether AI will achieve **superhuman general intelligence** depends on breakthroughs in AGI research. Experts are deeply divided: in a 2023 survey of thousands of AI researchers (Grace et al., 2024), the median respondent put roughly a **5% chance** on advanced AI leading to "extremely bad" outcomes, while timelines for AGI range from 2029 to "never." The question is not only technical but philosophical -- it depends on what we mean by "smarter" and whether consciousness matters for intelligence.
How should we think about AI "intelligence" going forward?
The most productive approach is to treat AI capabilities as a ***different kind of cognitive tool***, not as a competitor to human intelligence. AI excels at processing speed, pattern recognition, and knowledge retrieval. Humans excel at creativity, emotional understanding, moral reasoning, and navigating novel situations. Rather than asking "Is AI smarter than humans?", a better question is "How can human and AI intelligence complement each other?" This framing avoids the misleading IQ comparison and focuses on practical value -- which is ultimately what matters.
Can AI develop emotional intelligence?
Current AI can *simulate* emotional responses -- recognizing sentiment in text, generating empathetic-sounding replies, identifying facial expressions in images. However, it does not *experience* emotions. Emotional intelligence in humans involves subjective feelings, physiological responses (elevated heart rate, tears), and learned social navigation built over years of embodied experience. AI lacks all of these. The field of **affective computing** is working to make AI more emotionally responsive, but there is a fundamental difference between *recognizing patterns associated with emotions* and *having emotions*. This distinction is critical when considering whether AI can ever possess true intelligence.
What ethical issues arise from claiming AI has an IQ?
Assigning IQ scores to AI creates several risks: it can **overstate AI capabilities**, leading to premature trust in AI decision-making; it can **devalue human intelligence** by suggesting machines are "smarter"; and it can create **misleading marketing claims** by AI companies. The American Psychological Association cautions against applying human cognitive metrics to non-human systems without rigorous validation. Responsible AI development requires honest communication about what AI can and cannot do, rather than headline-grabbing IQ comparisons.
References
- Turing, A. M. (1950). Computing Machinery and Intelligence. *Mind*, 59(236), 433-460.
- Searle, J. R. (1980). Minds, Brains, and Programs. *Behavioral and Brain Sciences*, 3(3), 417-424.
- Chollet, F. (2019). On the Measure of Intelligence. *arXiv preprint*, arXiv:1911.01547.
- Gardner, H. (1983). *Frames of Mind: The Theory of Multiple Intelligences*. Basic Books.
- Silver, D., et al. (2016). Mastering the game of Go with deep neural networks and tree search. *Nature*, 529(7587), 484-489.
- Bubeck, S., et al. (2023). Sparks of Artificial General Intelligence: Early Experiments with GPT-4. *arXiv preprint*, arXiv:2303.12712.
- Grace, K., et al. (2024). Thousands of AI Authors on the Future of AI. *arXiv preprint*, arXiv:2401.02843.
- Webb, T., et al. (2023). Emergent Analogical Reasoning in Large Language Models. *Nature Human Behaviour*, 7, 1526-1541.
- American Psychological Association. (2024). Intelligence. https://www.apa.org/topics/intelligence