Why the Same Person Can Get Different IQ Scores on Different Tests

Here is a fact that surprises most people: if you took the WAIS-III (normed in 1995) and the WAIS-IV (normed in 2008) on the same day, you would likely score 2-5 points higher on the older test. Your brain did not change between tests. What changed was the norm -- the reference population your performance was compared against.

This phenomenon sits at the heart of one of the most important but least understood aspects of IQ testing: norming. Every IQ score is fundamentally a comparison. It tells you where you stand relative to a specific group of people tested at a specific point in time. Change that reference group, and your score changes too.

Understanding how norming works is essential for anyone who wants to interpret their IQ score accurately -- whether from a clinical assessment, a school placement test, or an online evaluation like our full IQ test.

"An IQ score is not a fixed property of a person. It is a comparison score, and its meaning depends entirely on the quality and recency of the norms against which it is calibrated."
-- Alan Kaufman, clinical psychologist and IQ test developer


What Is Norming? The Foundation of Every IQ Score

At its simplest, norming is the process of determining what "average" looks like for a given test. Test developers administer the assessment to a large, carefully selected group of people -- the normative sample -- and use the results to create a scoring system.

How Raw Scores Become IQ Scores

The conversion process works in stages:

  1. Raw score collection -- Count correct answers on each subtest
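  2. Scaled score conversion -- Convert each subtest raw score to an age-adjusted scaled score by comparing it with same-age peers in the normative sample (on Wechsler tests, scaled scores have a mean of 10 and a standard deviation of 3)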
  3. Composite scoring -- Combine scaled scores into domain composites (e.g., Verbal Comprehension Index)
  4. Standardization -- Map composite scores onto a normal distribution with a mean of 100 and a standard deviation of 15

Raw Score | Percentile in Norm Sample | Standardized IQ Score | Classification
Top 2% of norm sample | 98th percentile | 130+ | Very Superior
Top 16% | 84th percentile | 115 | High Average
Middle 50% | 25th-75th percentile | 90-110 | Average
Bottom 16% | 16th percentile | 85 | Low Average
Bottom 2% | 2nd percentile | Below 70 | Extremely Low

The critical point: your IQ score is not a measure of absolute ability. It is a rank within the normative sample. Score 100, and you performed exactly at the average of the people in the norming study. Score 115, and you outperformed 84% of them.
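
The link between rank and score can be sketched in a few lines of Python. This is a minimal illustration that assumes scores follow an exact normal distribution with mean 100 and standard deviation 15; the function names are made up for this example and do not come from any test publisher.

```python
from statistics import NormalDist

# Assumption: IQ scores are exactly normal with mean 100 and SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)

def percentile_from_iq(iq: float) -> float:
    """Percent of the normative sample expected to score below this IQ."""
    return 100 * IQ_SCALE.cdf(iq)

def iq_from_percentile(percentile: float) -> float:
    """IQ score that sits at the given percentile of the normative sample."""
    return IQ_SCALE.inv_cdf(percentile / 100)

print(round(percentile_from_iq(115), 1))  # 84.1
print(round(iq_from_percentile(98), 1))   # 130.8
```

The same two functions reproduce the table rows: an IQ of 85 lands near the 16th percentile, and the 98th percentile sits just above 130.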

"The IQ score has no meaning except in relation to the normative group. Change the normative group, and you change the meaning of the score."
-- John Carroll, psychometrician and author of Human Cognitive Abilities


Building the Normative Sample: Harder Than You Think

Creating a normative sample is one of the most expensive and labor-intensive parts of test development. The sample must be representative of the population the test is designed for -- typically the entire country, stratified by key demographic variables.

What Goes Into a Major IQ Test Norming Study

Variable | How It Is Controlled | Why It Matters
Age | Sample includes all age groups in proportion to census data | Raw performance varies by age; norms must be age-specific
Sex | Equal or proportional representation | Some cognitive subtests show sex differences
Race/Ethnicity | Proportional to national demographics | Ensures no group is over- or under-represented
Education level | Stratified by educational attainment | Education correlates with test performance
Geographic region | Includes urban, suburban, and rural areas | Regional differences in educational quality
Socioeconomic status | Represented across income levels | SES correlates with cognitive test scores
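
To make proportional stratification concrete, here is a rough sketch of how recruitment quotas could be derived from population shares. The shares below are invented for illustration and are not census figures, and the largest-remainder rounding is just one reasonable way to make the quotas add up to the target sample size.

```python
def allocate_quotas(population_shares, total_n):
    """Split total_n across strata in proportion to their population shares.

    Uses largest-remainder rounding so the quotas sum exactly to total_n.
    """
    exact = {name: share * total_n for name, share in population_shares.items()}
    quotas = {name: int(value) for name, value in exact.items()}
    leftover = total_n - sum(quotas.values())
    # Hand the remaining seats to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - quotas[name], reverse=True)
    for name in by_remainder[:leftover]:
        quotas[name] += 1
    return quotas

# Hypothetical education-level shares (illustration only, not census data).
shares = {
    "less than high school": 0.10,
    "high school": 0.28,
    "some college": 0.29,
    "bachelor's or higher": 0.33,
}
quotas = allocate_quotas(shares, 2200)
print(quotas)
```

A real norming study stratifies on several variables at once (for example, education within each age band), so the same allocation would be repeated inside every cell of the design.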

Normative Sample Sizes for Major IQ Tests

Test | Norming Year | Sample Size | Age Range | Country
WAIS-IV | 2008 | 2,200 | 16-90 | United States
WAIS-V | 2024 | 2,170 | 16-90 | United States
WISC-V | 2014 | 2,200 | 6-16 | United States
Stanford-Binet 5 | 2003 | 4,800 | 2-85+ | United States
Raven's Progressive Matrices | Various | Varies by edition | 5-65+ | Multiple countries
Cattell Culture Fair III | 1960s/updated | ~3,000 | 15+ | United States/International

A sample of 2,200 might seem small for representing an entire country of 330 million people. But when properly stratified, this size produces remarkably stable norms -- the standard error of the mean IQ score in such samples is typically less than 1 point.
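
The arithmetic behind that claim is the standard error of the mean, SD / sqrt(n). The sketch below assumes simple random sampling, which is a simplification of a stratified design, and the age band of 200 people is a hypothetical figure used only to show how precision drops in smaller subgroups.

```python
import math

def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of a sample mean under simple random sampling."""
    return sd / math.sqrt(n)

# Whole normative sample of 2,200 people on the IQ scale (SD = 15).
print(round(standard_error_of_mean(15, 2200), 2))  # 0.32

# Hypothetical single age band of 200 people within that sample.
print(round(standard_error_of_mean(15, 200), 2))   # 1.06
```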

"The art of norming is not just collecting a large sample. It is collecting the right sample -- one that mirrors the population with precision."
-- David Wechsler, creator of the Wechsler intelligence scales

However, even well-designed samples have limitations. People with severe cognitive disabilities, those in institutions, and non-English speakers are often underrepresented. This is why test manuals include detailed descriptions of the norming sample, so clinicians can judge whether the norms are appropriate for a given individual.


The Flynn Effect: Why IQ Scores Keep Rising

The most dramatic reason IQ tests need renorming is the Flynn effect -- the well-documented rise in average IQ test performance across generations. It is named after political scientist James Flynn, who systematically documented it in the 1980s. Averaged across tests and countries, performance has risen by the equivalent of roughly 3 IQ points per decade.

Flynn Effect Magnitude by Test Type

Cognitive Domain | Average Gain Per Decade | Implications
Fluid reasoning (Raven's matrices) | 5-6 points | Largest gains; abstract reasoning improving fastest
Full-scale IQ (Wechsler tests) | 3 points | Consistent across editions
Verbal comprehension | 2-3 points | Moderate gains
Processing speed | 1-2 points | Smallest gains
Crystallized knowledge | 2-3 points | Reflects better education access

The Flynn effect means that someone scoring 100 on the WAIS-III (normed 1995) would score roughly 105 on the WAIS-R (normed 1978): 17 years of drift at about 0.3 points per year comes to about 5 points. The older norms represent a lower-scoring population, so the person has not gotten smarter -- they are simply being compared to a reference group that performed worse on the test.
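
As a rough sketch, the cross-edition comparison is just the gap between norming years multiplied by the rate of gain. The function below assumes a constant 0.3 points per year (the 3-points-per-decade average); actual differences between editions vary by test and by ability level, so treat the output as an estimate only. The function name is hypothetical.

```python
def score_on_other_norms(score, norm_year_taken, norm_year_target, points_per_year=0.3):
    """Estimate what the same performance would score against another norm set.

    Older target norms give a higher score; newer target norms give a lower one.
    Assumes a constant Flynn-effect rate, which is a simplification.
    """
    return score + points_per_year * (norm_year_taken - norm_year_target)

# A 100 on 1995 norms, re-expressed on 1978 norms and on 2008 norms.
print(round(score_on_other_norms(100, 1995, 1978), 1))  # 105.1
print(round(score_on_other_norms(100, 1995, 2008), 1))  # 96.1
```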

What Causes the Flynn Effect?

Researchers have proposed multiple explanations, and the consensus is that no single factor accounts for it:

  • Improved nutrition -- Better prenatal and early childhood nutrition supports brain development
  • Expanded education -- More years of schooling and higher-quality instruction
  • Environmental complexity -- Modern life demands more abstract thinking (technology, media, bureaucracy)
  • Reduced disease burden -- Fewer childhood infections that affect cognitive development
  • Smaller family sizes -- More parental resources per child
  • Test-taking familiarity -- Greater exposure to standardized testing formats

"The Flynn effect tells us that the average person in 1900 would score about 70 on today's IQ tests. That does not mean our great-grandparents were intellectually disabled -- it means the world they lived in made different cognitive demands."
-- James Flynn, political scientist and discoverer of the Flynn effect

Is the Flynn Effect Slowing Down?

Recent evidence from Nordic countries (Norway, Denmark, Finland) suggests the Flynn effect may be reversing in some developed nations. A study by Bratsberg and Rogeberg (2018) found that Norwegian IQ scores peaked for cohorts born around 1975 and have declined by about 0.3 points per year since then.

Country | Flynn Effect Status | Estimated Trend
Norway | Reversing | -0.3 points/year since mid-1970s birth cohorts
Denmark | Reversing | -0.2 points/year since 1990s birth cohorts
Finland | Plateauing | Minimal change in recent cohorts
United States | Slowing | Gains reduced to ~1 point/decade
Developing nations | Continuing | 3-5 points/decade (similar to historical Western pattern)

This potential reversal has significant implications for renorming. If scores are declining, tests normed during the peak of the Flynn effect may actually underestimate current test-takers' relative standing.


Renorming Cycles: When and Why Tests Get Updated

Renorming is the process of collecting a new normative sample and recalibrating the scoring system. It is essential because norms become outdated as the population changes.

Renorming Timeline for Major IQ Tests

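Test Family | Edition | Publication Year | Years Since Previous Edition
Wechsler Adult (WAIS) | WAIS | 1955 | --
Wechsler Adult (WAIS) | WAIS-R | 1981 | 26 years
Wechsler Adult (WAIS) | WAIS-III | 1997 | 16 years
Wechsler Adult (WAIS) | WAIS-IV | 2008 | 11 years
Wechsler Adult (WAIS) | WAIS-V | 2024 | 16 years
Wechsler Child (WISC) | WISC | 1949 | --
Wechsler Child (WISC) | WISC-R | 1974 | 25 years
Wechsler Child (WISC) | WISC-III | 1991 | 17 years
Wechsler Child (WISC) | WISC-IV | 2003 | 12 years
Wechsler Child (WISC) | WISC-V | 2014 | 11 years
Stanford-Binet | SB-4 | 1986 | --
Stanford-Binet | SB-5 | 2003 | 17 years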

The general recommendation is that IQ tests should be renormed every 15-20 years, though some experts argue for more frequent updates given the pace of societal change.

What Happens During Renorming

  1. New sample recruitment -- Test publishers recruit a new demographically representative sample
  2. Test administration -- The new (or updated) test is administered under standardized conditions
  3. Statistical analysis -- Raw score distributions are analyzed and new conversion tables are created
  4. Equating -- If test content has changed, statistical methods ensure scores remain comparable across editions
  5. Publication -- New norms are published in the test manual along with detailed sample demographics
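
Step 3 can be illustrated with a deliberately simplified sketch: take the raw scores from a normative sample, compute their mean and standard deviation, and map every possible raw score onto the IQ scale. Real publishers work with age-band tables and smoothed distributions rather than a single linear transform, and the tiny samples below are invented purely to show the mechanics.

```python
from statistics import mean, pstdev

def build_conversion_table(sample_raw_scores, max_raw):
    """Map each possible raw score to an IQ using the sample's mean and SD."""
    mu = mean(sample_raw_scores)
    sigma = pstdev(sample_raw_scores)
    return {raw: round(100 + 15 * (raw - mu) / sigma) for raw in range(max_raw + 1)}

# Invented samples: the newer cohort answers two more items correctly on average.
old_sample = [20, 25, 30, 35, 40]
new_sample = [22, 27, 32, 37, 42]

old_table = build_conversion_table(old_sample, max_raw=50)
new_table = build_conversion_table(new_sample, max_raw=50)

# The same raw score of 30 earns a lower IQ once the norms are refreshed.
print(old_table[30], new_table[30])  # 100 96
```

Holding the raw score fixed while swapping the table is exactly what renorming does to an individual's result.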

The cost of a major renorming study can run to $5-10 million or more once sample recruitment, examiner training, administration, and data analysis are included. This is why only well-funded test publishers can maintain current norms.


Score Drift: The Silent Distortion

Score drift is the gradual inflation (or deflation) of IQ scores that occurs when test norms become outdated. It is the practical consequence of failing to renorm often enough.

How Score Drift Works

Due to the Flynn effect, average raw scores increase over time. If the norms stay fixed, the average test-taker will score above 100 -- not because they are above average, but because they are being compared to a population from the past.

Years Since Norming | Estimated Score Inflation (due to Flynn effect) | Actual Meaning of "IQ 110"
0 years (freshly normed) | 0 points | Truly above average (75th percentile)
5 years | ~1.5 points | Slightly inflated
10 years | ~3 points | Appears above average but may be average
15 years | ~4.5 points | Notably inflated
20 years | ~6 points | Score of 110 may actually represent ~104 on current norms
30 years | ~9 points | Severely inflated; clinical decisions may be compromised
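
One way to read the table is to ask what percentile an observed score of 110 really corresponds to once drift is removed. The sketch below assumes a constant inflation of 0.3 points per year and a normal distribution with mean 100 and SD 15; both are simplifications, and the function name is hypothetical.

```python
from statistics import NormalDist

def percentile_on_current_norms(observed_iq, years_since_norming, points_per_year=0.3):
    """Percentile of an observed score after subtracting estimated drift."""
    corrected = observed_iq - points_per_year * years_since_norming
    return 100 * NormalDist(mu=100, sigma=15).cdf(corrected)

for years in (0, 10, 20, 30):
    pct = percentile_on_current_norms(110, years)
    print(f"{years:>2} years since norming: about percentile {pct:.0f}")
```

Under these assumptions an observed 110 starts near the 75th percentile on fresh norms and falls to roughly the 60th percentile after 20 years.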

Real-World Consequences of Score Drift

Score drift is not just an academic concern. It has serious practical consequences:

  • Special education eligibility -- A child might qualify (or fail to qualify) for services based on an inflated score from outdated norms. The cutoff for intellectual disability diagnosis is typically IQ below 70. On a test normed 20 years ago, a child with a "true" IQ of 68 might score 74, missing the threshold for needed services.
  • Gifted program placement -- Similarly, on the same 20-year-old norms, students may appear to qualify for gifted programs (typically IQ 130+) when their true score on current norms would be closer to 124.
  • Forensic evaluations -- In the United States, defendants with intellectual disability cannot be executed (Atkins v. Virginia, 2002), and an IQ score of about 70 or below is central evidence for that diagnosis. A few points of score drift from outdated norms can literally be a matter of life and death.
  • Disability benefits -- IQ thresholds determine eligibility for certain social services and accommodations.
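
A small sketch shows how drift can flip a threshold decision. It assumes the same constant 0.3 points per year used elsewhere in this article and treats the cutoff as a hard line, which real clinical and legal practice does not; the helper names are invented for this example.

```python
def drift_corrected(observed_iq, years_since_norming, points_per_year=0.3):
    """Observed score minus the estimated inflation from outdated norms."""
    return observed_iq - points_per_year * years_since_norming

def crosses_cutoff(observed_iq, years_since_norming, cutoff, points_per_year=0.3):
    """True when observed and corrected scores fall on opposite sides of the cutoff."""
    corrected = drift_corrected(observed_iq, years_since_norming, points_per_year)
    return (observed_iq >= cutoff) != (corrected >= cutoff)

# Observed 74 on 20-year-old norms corrects to about 68, below a cutoff of 70.
print(crosses_cutoff(74, 20, cutoff=70))    # True
# Observed 130 on 20-year-old norms corrects to about 124, below a cutoff of 130.
print(crosses_cutoff(130, 20, cutoff=130))  # True
# Observed 110 is nowhere near the gifted cutoff either way.
print(crosses_cutoff(110, 20, cutoff=130))  # False
```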

"Score drift is not a minor technical issue. In clinical and forensic settings, a 5-point difference can change a diagnosis, alter a life trajectory, or determine a legal outcome."
-- Kevin McGrew, psychometric researcher and IQ test expert


How Online IQ Tests Handle Norming

Online IQ tests, including our assessments at whats-your-iq.com, face unique norming challenges compared to clinical instruments like the WAIS.

Clinical vs. Online Norming

Factor | Clinical IQ Tests (WAIS, WISC) | Online IQ Tests
Normative sample | Carefully stratified, demographically matched | Often larger but self-selected
Administration | Standardized, one-on-one with trained examiner | Self-administered, uncontrolled environment
Sample size | 2,000-5,000 | Can be tens of thousands or more
Demographic control | Census-matched | May skew toward certain demographics
Renorming frequency | Every 11-20 years | Can be updated continuously
Cost per norming | $5-10 million | Significantly lower

The advantage of online testing is the ability to collect massive amounts of data continuously, enabling more frequent norm updates. The disadvantage is that the sample is self-selected -- people who choose to take online IQ tests may not represent the general population.

Our tests address this by using statistical corrections and regularly updating our scoring algorithms to align with established IQ distributions. For the most accurate online assessment, try our full IQ test or our timed IQ test for processing speed evaluation.
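
One generic correction for a self-selected sample is post-stratification: reweight each demographic group so it counts in proportion to its share of the population rather than its share of the sample. The sketch below illustrates that general technique only; it is not a description of any particular site's scoring pipeline, and the groups, scores, and population shares are invented.

```python
def poststratified_mean(scores_by_group, population_shares):
    """Mean score with each group weighted by its population share."""
    return sum(
        population_shares[group] * (sum(scores) / len(scores))
        for group, scores in scores_by_group.items()
    )

# Hypothetical self-selected sample that over-represents college graduates.
scores_by_group = {
    "college": [110, 114, 108, 112, 106, 110],
    "no_college": [96, 100],
}
# Hypothetical population shares (illustration only).
population_shares = {"college": 0.4, "no_college": 0.6}

all_scores = [s for scores in scores_by_group.values() for s in scores]
raw_mean = sum(all_scores) / len(all_scores)
weighted_mean = poststratified_mean(scores_by_group, population_shares)

print(round(raw_mean, 1), round(weighted_mean, 1))  # 107.0 102.8
```

The unweighted mean overstates the population average because the over-represented group scores higher; reweighting pulls the estimate back toward what a representative sample would show.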


How to Interpret Your IQ Score Knowing About Norming

Armed with knowledge about norming, here are practical guidelines for interpreting any IQ score:

Questions to Ask About Any IQ Test Result

  1. When was the test normed? If the norms are more than 15 years old, your score may be inflated by 3-5 points due to score drift.
  2. What population was the normative sample drawn from? If you differ significantly from the norming population (age, language, culture), the norms may not accurately represent your standing.
  3. What is the test's standard error of measurement (SEM)? Most IQ tests have an SEM of about 3-5 points. With an SEM of 3, a score of 112 corresponds to a 90% confidence interval of roughly 107 to 117.
  4. Was the test administered under standardized conditions? Distractions, fatigue, or anxiety can depress scores by 5-10 points regardless of norming quality.
  5. How does this score compare across different tests? If you have taken multiple tests, look for consistency rather than fixating on any single score.
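
The SEM question can be turned into a concrete range with a short calculation. This sketch centers the interval on the observed score and assumes normally distributed measurement error; test manuals often center the interval on an estimated true score instead, so published intervals may differ slightly.

```python
from statistics import NormalDist

def confidence_interval(observed_iq, sem, confidence=0.95):
    """Symmetric confidence interval around an observed score."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sem
    return observed_iq - margin, observed_iq + margin

low, high = confidence_interval(112, sem=3, confidence=0.90)
print(round(low), round(high))  # 107 117

low95, high95 = confidence_interval(112, sem=3, confidence=0.95)
print(round(low95), round(high95))  # 106 118
```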

"A single IQ score is best understood as a range, not a point. The confidence interval is where the science lives."
-- Cecil Reynolds, neuropsychologist and psychometrics expert

For practice and familiarization with cognitive testing formats, try our practice IQ test or take the quick IQ assessment for a faster evaluation.


References

  1. Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101(2), 171-191.
  2. Flynn, J. R. (2007). What Is Intelligence? Beyond the Flynn Effect. Cambridge University Press.
  3. Bratsberg, B., & Rogeberg, O. (2018). Flynn effect and its reversal are both environmentally caused. Proceedings of the National Academy of Sciences, 115(26), 6674-6678.
  4. Kaufman, A. S. (2009). IQ Testing 101. Springer Publishing.
  5. Wechsler, D. (2008). Wechsler Adult Intelligence Scale -- Fourth Edition (WAIS-IV): Technical and interpretive manual. Pearson.
  6. Trahan, L. H., Stuebing, K. K., Fletcher, J. M., & Hiscock, M. (2014). The Flynn effect: A meta-analysis. Psychological Bulletin, 140(5), 1332-1360.
  7. Reynolds, C. R., & Niland, J. (1980). Time and score distributions in IQ testing: The problem of bias. Journal of School Psychology, 18(4), 341-348.
  8. Carroll, J. B. (1993). Human Cognitive Abilities: A Survey of Factor-Analytic Studies. Cambridge University Press.
  9. McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37(1), 1-10.