Can Machines Really Be Intelligent?

In 2023, a psychologist administered Wechsler-style verbal IQ questions to GPT-4 and estimated its verbal IQ at roughly 155 -- a score above the 99.9th percentile of the human population. Headlines declared that AI had surpassed most humans in intelligence. But had it?

The question "Can AI ever have an IQ?" sits at the intersection of computer science, philosophy, and psychology. It forces us to confront what intelligence actually is, whether a number can capture it, and whether a machine that produces correct answers is genuinely thinking or merely performing an extraordinarily sophisticated pattern match.

"The question of whether machines can think is about as relevant as the question of whether submarines can swim."
-- Edsger Dijkstra, computer science pioneer

This article examines how modern AI systems perform on human IQ tests, why those scores are both impressive and deeply misleading, and what the pursuit of artificial general intelligence (AGI) means for our understanding of the mind.


How AI Models Actually Score on IQ Tests

Several research teams have tested large language models (LLMs) on standardized IQ assessments. The results are striking -- and complicated.

AI Performance on IQ-Style Benchmarks

AI Model | Test Type | Reported Score / Performance | Year
GPT-4 | Verbal IQ (WAIS-style) | ~155 (estimated verbal IQ) | 2023
GPT-4 | Raven's Progressive Matrices | Solved ~80% of items | 2023
Claude 3 Opus | Graduate-level reasoning (GPQA) | ~60% accuracy | 2024
GPT-3.5 | Verbal IQ estimate | ~110-120 | 2023
Google Gemini Ultra | MMLU benchmark (broad knowledge) | 90.0% | 2024
AlphaGo | Strategic reasoning (Go) | Surpassed all human players | 2016

These numbers raise an obvious question: if GPT-4 scores 155 on a verbal IQ test, is it a genius? The answer requires understanding what IQ tests actually measure and how AI arrives at its answers.

"Artificial intelligence is growing up fast, as are robots whose facial expressions can elicit empathy and make your mirror neurons quiver."
-- Diane Ackerman, author and naturalist

Why the Scores Are Misleading

AI models achieve high scores on verbal IQ tasks because they have been trained on billions of text examples that include the very types of reasoning, vocabulary, and analogies that IQ tests assess. Consider the difference:

  • A human solving an analogy like "tree : forest :: star : ?" draws on embodied experience, visual memory, and abstract categorization built over a lifetime
  • An LLM solves the same analogy by identifying statistical patterns in its training data where these word relationships appeared thousands of times

The AI is not reasoning in the way the test designers intended. It is performing sophisticated retrieval and interpolation from a massive dataset. This distinction matters enormously for whether we can meaningfully assign an IQ to a machine.
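The vector-offset trick behind this kind of statistical analogy solving can be sketched with toy word embeddings. The numbers below are invented for illustration and are not real model weights; real systems use vectors learned from billions of co-occurrence statistics:

```python
import math

# Toy word vectors (illustrative values only) in which both word pairs
# share a rough "member -> collection" direction along the second axis.
vectors = {
    "tree":   [1.0, 0.0, 0.2],
    "forest": [1.0, 1.0, 0.2],
    "star":   [0.0, 0.0, 1.0],
    "galaxy": [0.0, 1.0, 1.0],
    "rock":   [0.5, 0.1, 0.1],
}

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Classic vector-offset analogy: forest - tree + star ~= ?
query = [f - t + s for f, t, s in
         zip(vectors["forest"], vectors["tree"], vectors["star"])]
best = max((w for w in vectors if w not in {"tree", "forest", "star"}),
           key=lambda w: cosine(vectors[w], query))
print(best)  # galaxy
```

The answer falls out of vector arithmetic over learned statistics; at no point does anything in the system "know" what a star or a galaxy is.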


The Turing Test: A Starting Point, Not a Solution

Alan Turing proposed his famous test in 1950: if a machine can converse with a human evaluator who cannot reliably distinguish it from a human, the machine should be considered intelligent. Modern LLMs like GPT-4 and Claude can pass casual versions of the Turing test with ease, fooling many evaluators in short conversations.

But the Turing test has well-known limitations:

  1. It tests imitation, not understanding -- a machine can produce human-like responses without comprehending their meaning
  2. It is vulnerable to tricks -- systems like ELIZA (1966) fooled some users with simple pattern matching
  3. It conflates fluency with intelligence -- a model can be articulate yet lack common sense
  4. Cultural and linguistic biases shape evaluator expectations
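Point 2 is easy to demonstrate: a pattern matcher in the ELIZA spirit takes only a few lines. The rules below are illustrative stand-ins, not ELIZA's actual 1966 script:

```python
import re

# A minimal ELIZA-style responder: keyword patterns mapped to canned
# reflections. There is no understanding here, only string rewriting --
# which is how the original fooled some users into sensing a mind.
RULES = [
    (re.compile(r"\bI am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.*)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # fallback when nothing matches

print(respond("I feel anxious"))  # How long have you felt anxious?
```

A handful of regular expressions suffices to sustain a short "conversation" -- a reminder that conversational plausibility is a low bar.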

"I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted."
-- Alan Turing, in his 1950 paper "Computing Machinery and Intelligence"

Beyond Turing: Modern AI Benchmarks

Researchers have developed more rigorous frameworks for evaluating machine intelligence:

Benchmark | What It Measures | Limitation
Turing Test | Conversational mimicry | Tests imitation, not understanding
ARC (Abstraction and Reasoning Corpus) | Novel pattern generalization | AI still struggles significantly
MMLU (Massive Multitask Language Understanding) | Broad factual knowledge | Memorization can inflate scores
BIG-Bench | Diverse cognitive tasks | Many tasks are still language-dependent
Raven's Progressive Matrices | Nonverbal abstract reasoning | AI solves via image recognition heuristics
Winograd Schema Challenge | Common-sense reasoning | LLMs have largely "solved" it through scale

François Chollet, creator of the ARC benchmark, argues that true intelligence is the ability to generalize to novel situations, not to perform well on tasks similar to training data. On ARC tasks requiring genuine abstraction, even the most advanced AI models score far below average humans.


The Chinese Room: Does AI Understand Anything?

Philosopher John Searle's 1980 "Chinese Room" thought experiment remains one of the most powerful arguments against AI having genuine intelligence:

Imagine a person locked in a room who receives Chinese characters through a slot. They have a rulebook that tells them which Chinese characters to send back. To an outside observer, the room appears to understand Chinese perfectly. But the person inside understands nothing -- they are just following rules.
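The rulebook can be sketched as a plain lookup table. The phrase pairs below are arbitrary examples chosen for the illustration; the point is that correct-looking replies emerge from pure symbol matching:

```python
# A toy "Chinese Room": the operator maps incoming symbols to outgoing
# symbols by rulebook lookup alone. The operator never interprets the
# symbols -- only matches their shapes.
RULEBOOK = {
    "你好": "你好！",          # a greeting produces a greeting back
    "你会中文吗": "会一点。",    # "do you speak Chinese?" -> "a little"
}

def room(symbols: str) -> str:
    # Default reply ("please say that again") when no rule matches.
    return RULEBOOK.get(symbols, "请再说一遍。")

print(room("你好"))  # 你好！
```

Scale this table up by many orders of magnitude and replace exact lookup with learned statistical interpolation, and you have a crude caricature of an LLM: the outputs improve enormously, but nothing about the mechanism adds understanding.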

"Minds are not just programs. The brain is a machine, but it is a very special kind of machine."
-- John Searle, philosopher, creator of the Chinese Room argument

Modern LLMs are, in a sense, very sophisticated Chinese Rooms. They manipulate symbols (tokens) according to learned patterns without any internal experience of meaning. This is why:

  • GPT-4 can write a moving poem about grief but has never felt grief
  • Claude can explain quantum physics but does not understand physics in the way a physicist does
  • AI can ace IQ test questions about spatial reasoning without having any spatial experience

The Consciousness Gap

Dimension | Human Intelligence | Current AI
Subjective experience | Present -- "what it is like" to think | Absent
Intentionality | Thoughts are about something | No genuine "aboutness"
Embodied cognition | Grounded in sensory experience | No body, no senses
Self-awareness | Can reflect on own mental states | No inner mental life
Motivation | Driven by goals, desires, emotions | Responds only to prompts
Learning from single examples | Highly capable | Very limited

Artificial General Intelligence: Would AGI Have an IQ?

Artificial General Intelligence (AGI) refers to a hypothetical AI system that can perform any intellectual task that a human can, with comparable flexibility and learning speed. Unlike today's narrow AI, AGI would:

  • Transfer knowledge seamlessly across domains
  • Learn new skills from minimal examples
  • Reason about novel situations without prior training
  • Potentially possess something analogous to common sense

If AGI is achieved, the question of assigning it an IQ becomes more meaningful -- but also more complex. An AGI system might:

  1. Score extremely high on all subtests of a standard IQ battery
  2. Process information millions of times faster than any human brain
  3. Access far more stored knowledge than any individual could accumulate in a lifetime
  4. Yet still lack consciousness, subjective experience, and emotional understanding

"The development of full artificial intelligence could spell the end of the human race. It would take off on its own, and redesign itself at an ever-increasing rate."
-- Stephen Hawking, theoretical physicist

Timeline Estimates for AGI

Source | Predicted AGI Timeline | Key Assumption
Ray Kurzweil | By 2029 | Exponential computing growth
DeepMind (Demis Hassabis) | 2030s | Fundamental breakthroughs needed
Metaculus (crowd forecast) | ~2040 | Median community estimate
Gary Marcus (skeptic) | "Not in our lifetime" | Current approaches insufficient
Yann LeCun (Meta AI) | Decades away | Need new paradigms beyond LLMs

The wide disagreement among experts highlights how uncertain this field remains. What is clear is that current AI, however impressive, is not AGI.


Real-World Examples: When AI Amazes and When It Fails

Impressive AI Achievements

  • Chess and Go: Deep Blue defeated Kasparov (1997); AlphaGo defeated Lee Sedol (2016), playing moves that departed from millennia of accumulated human strategy
  • Medical diagnosis: AI systems now match or exceed radiologists in detecting certain cancers from imaging scans
  • Scientific discovery: AlphaFold predicted the 3D structure of over 200 million proteins, a problem that had stumped biologists for decades
  • Standardized tests: OpenAI reported that GPT-4 scored around the 90th percentile on a simulated bar exam (a figure independent researchers have since contested)

Embarrassing AI Failures

  • Common sense: AI models have confidently stated that a horse has six legs or that you can fit a basketball inside a coffee cup
  • Cause and effect: Models struggle with questions like "If I drop a glass on a pillow, will it break?" because they lack physical intuition
  • Novel reasoning: When presented with truly new puzzles (not variants of training data), AI performance drops dramatically
  • Counting and basic math: LLMs frequently fail at counting letters in words or performing multi-step arithmetic

"The real risk with AI isn't malice but competence. A superintelligent AI will be extremely good at accomplishing its goals, and if those goals aren't aligned with ours, we're in trouble."
-- Stephen Hawking, 2015 Reddit "Ask Me Anything"

These examples illustrate why raw IQ scores for AI are misleading. A system that passes the bar exam but cannot reliably count the letters in "strawberry" does not have intelligence in the way humans understand it.
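The contrast is easy to demonstrate: the counting task that trips up token-based LLMs is trivial for a few lines of deterministic code.

```python
def letter_count(word: str, letter: str) -> int:
    # Exact, character-level counting -- no subword tokenization involved,
    # which is precisely why a plain program cannot get this wrong.
    return word.count(letter)

print(letter_count("strawberry", "r"))  # 3
```

LLMs stumble here not because the task is hard but because they process text as subword tokens rather than individual characters -- another way their failure profile differs fundamentally from a human's.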


What Would a Meaningful "Machine IQ" Look Like?

If we wanted to create a fair intelligence metric for AI, it would need to measure capabilities that IQ tests measure in humans -- but adapted for the unique architecture of machines:

Proposed Dimensions for Machine Intelligence Assessment

  1. Generalization ability: Can the system solve problems it was never trained on?
  2. Sample efficiency: How much data does it need to learn a new concept?
  3. Transfer learning: Can it apply knowledge from one domain to another?
  4. Robustness: Does performance hold up when inputs are slightly altered?
  5. Calibration: Does the system know what it does and does not know?
  6. Compositional reasoning: Can it combine known concepts in novel ways?

Metric | What It Captures | Current AI Performance
ARC score | Novel abstraction | Well below average human
Few-shot learning accuracy | Sample efficiency | Moderate, improving
Distribution shift robustness | Generalization | Poor to moderate
Calibration error | Self-knowledge | Improving with RLHF
Compositional generalization | Creative combination | Limited

A true "machine IQ" would likely be multidimensional, not reducible to a single number, and would need regular updating as AI capabilities evolve.
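One of these dimensions, calibration, already has a standard metric: expected calibration error (ECE), the gap between a system's stated confidence and its actual accuracy. A minimal sketch, using invented confidence/outcome data:

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """ECE: average |confidence - accuracy| over equal-width confidence
    bins, weighted by bin size. Zero means perfect calibration."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece

# Toy data: ten answers given at 90% confidence, nine of them correct --
# stated confidence matches the hit rate, so ECE is (near) zero.
confs = [0.9] * 10
hits = [True] * 9 + [False]
print(round(expected_calibration_error(confs, hits), 3))  # 0.0
```

A well-calibrated but mediocre system and an overconfident genius would score very differently on this axis -- one illustration of why a single scalar cannot summarize machine ability.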


The Philosophical Stakes: Why This Question Matters

The question of whether AI can have an IQ is not merely academic. It has profound implications for:

  • Legal rights: If an AI is declared "intelligent," does it deserve legal protections?
  • Moral responsibility: If an AI system causes harm, is it "responsible" or is its creator?
  • Employment: If AI scores higher than humans on cognitive benchmarks, what does this mean for knowledge work?
  • Education: Should AI tutors be assessed for their "intelligence" in teaching, or only for student outcomes?
  • Human identity: If machines can match us on our most valued cognitive tests, what makes human intelligence unique?

"We are called to be architects of the future, not its victims."
-- R. Buckminster Fuller, inventor and futurist

These questions will become increasingly urgent as AI systems grow more capable. Understanding the distinction between performing well on IQ tests and being intelligent is essential preparation for the world ahead.


Conclusion: Impressive Performance Is Not Intelligence

AI systems can achieve remarkable scores on IQ tests and cognitive benchmarks. GPT-4's reported verbal IQ of 155 is genuinely impressive as a technical achievement. But scoring well on a test designed for human minds does not mean the machine has a mind.

Intelligence -- as humans experience it -- involves consciousness, subjective experience, embodied understanding, emotional depth, and the ability to navigate genuinely novel situations. Current AI has none of these qualities. It has performance without understanding, fluency without meaning, and answers without comprehension.

Whether AGI will change this picture remains one of the great open questions of our time. What we can say with confidence is that today's AI, however dazzling, does not have an IQ in any meaningful sense. It has something else entirely -- and we do not yet have the right word for it.

To explore what human IQ tests actually measure and how your own mind performs, you can take our full IQ test, try a quick IQ assessment, or warm up with a practice test. These assessments are designed for human cognition -- the kind of intelligence that, for now, remains uniquely ours.