Why the Same Person Can Get Different IQ Scores on Different Tests
Here is a fact that surprises most people: if you took the WAIS-III (normed in 1995) and the WAIS-IV (normed in 2008) on the same day, you would likely score 2-5 points higher on the older test. Your brain did not change between tests. What changed was the norm -- the reference population your performance was compared against.
This phenomenon sits at the heart of one of the most important but least understood aspects of IQ testing: norming. Every IQ score is fundamentally a comparison. It tells you where you stand relative to a specific group of people tested at a specific point in time. Change that reference group, and your score changes too.
Understanding how norming works is essential for anyone who wants to interpret their IQ score accurately -- whether from a clinical assessment, a school placement test, or an online evaluation like our full IQ test.
"An IQ score is not a fixed property of a person. It is a comparison score, and its meaning depends entirely on the quality and recency of the norms against which it is calibrated."
-- Alan Kaufman, clinical psychologist and IQ test developer
What Is Norming? The Foundation of Every IQ Score
At its simplest, norming is the process of determining what "average" looks like for a given test. Test developers administer the assessment to a large, carefully selected group of people -- the normative sample -- and use the results to create a scoring system.
How Raw Scores Become IQ Scores
The conversion process works in stages:
- Raw score collection -- Count correct answers on each subtest
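- Scaled score conversion -- Convert each subtest's raw score to a scaled score by comparing it with same-age peers in the normative sample (on the Wechsler scales, scaled scores have a mean of 10 and a standard deviation of 3)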
- Composite scoring -- Combine scaled scores into domain composites (e.g., Verbal Comprehension Index)
- Standardization -- Map composite scores onto a normal distribution with a mean of 100 and a standard deviation of 15
| Position in Norm Sample | Percentile in Norm Sample | Standardized IQ Score | Classification |
|---|---|---|---|
| Top 2% of norm sample | 98th percentile | 130+ | Very Superior |
| Top 16% | 84th percentile | 115 | High Average |
| Middle 50% | 25th-75th percentile | 90-110 | Average |
| Bottom 16% | 16th percentile | 85 | Low Average |
| Bottom 2% | 2nd percentile | Below 70 | Extremely Low |
The critical point: your IQ score is not a measure of absolute ability. It is a rank within the normative sample. Score 100, and you performed exactly at the average of the people in the norming study. Score 115, and you outperformed 84% of them.
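To make the conversion concrete, here is a minimal Python sketch of the final standardization step. It is a simplification under stated assumptions: real tests use age-banded lookup tables built from the normative sample rather than a single z-score formula, and the function names and the tiny `norm_sample` list are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist, mean, stdev

def to_iq(composite, norm_sample, iq_mean=100.0, iq_sd=15.0):
    """Place a composite score on the IQ scale relative to a norm sample."""
    z = (composite - mean(norm_sample)) / stdev(norm_sample)
    return iq_mean + iq_sd * z

def iq_percentile(iq, iq_mean=100.0, iq_sd=15.0):
    """Percent of the norm group expected to score below this IQ."""
    return 100.0 * NormalDist(iq_mean, iq_sd).cdf(iq)

# Hypothetical composite scores from a tiny norm sample (illustration only).
norm_sample = [38, 42, 45, 47, 50, 50, 53, 55, 58, 62]

# The same performance scored against a stronger reference group:
# every member of the hypothetical sample scores 2 points higher.
stronger_sample = [x + 2 for x in norm_sample]

print(to_iq(50, norm_sample))       # 100.0: exactly the sample average
print(to_iq(50, stronger_sample))   # below 100: same performance, tougher norms
print(iq_percentile(115))           # roughly 84
```

The second call is the whole point of norming in miniature: the composite score of 50 did not change, only the group it was compared against.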
"The IQ score has no meaning except in relation to the normative group. Change the normative group, and you change the meaning of the score."
-- John Carroll, psychometrician and author of Human Cognitive Abilities
Building the Normative Sample: Harder Than You Think
Creating a normative sample is one of the most expensive and labor-intensive parts of test development. The sample must be representative of the population the test is designed for -- typically the entire country, stratified by key demographic variables.
What Goes Into a Major IQ Test Norming Study
| Variable | How It Is Controlled | Why It Matters |
|---|---|---|
| Age | Sample is divided into age bands, each with its own norms | Raw performance varies by age; norms must be age-specific |
| Sex | Equal or proportional representation | Some cognitive subtests show sex differences |
| Race/Ethnicity | Proportional to national demographics | Ensures no group is over- or under-represented |
| Education level | Stratified by educational attainment | Education correlates with test performance |
| Geographic region | Includes urban, suburban, and rural areas | Regional differences in educational quality |
| Socioeconomic status | Represented across income levels | SES correlates with cognitive test scores |
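As a rough illustration of what "stratified" means in practice, the sketch below splits a target sample size across strata in proportion to population shares. The shares are made up for the example and are not census figures, and `allocate_quota` is a hypothetical helper; real norming studies cross several variables at once rather than one at a time.

```python
def allocate_quota(total, proportions):
    """Split a target sample size across strata in proportion to population shares."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    exact = {k: total * p for k, p in proportions.items()}
    quota = {k: int(v) for k, v in exact.items()}  # round down first
    # Largest-remainder rounding: hand out the leftover seats to the strata
    # that lost the most when rounding down, so quotas sum to exactly `total`.
    leftover = total - sum(quota.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - quota[k], reverse=True)
    for k in by_remainder[:leftover]:
        quota[k] += 1
    return quota

# Made-up education shares for illustration only; NOT census figures.
education_shares = {
    "no diploma": 0.12,
    "high school": 0.28,
    "some college": 0.29,
    "bachelor's or higher": 0.31,
}
quota = allocate_quota(2200, education_shares)
print(quota)
```

Recruiters would then fill each quota separately, which is what keeps any one group from being over- or under-represented in the final norms.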
Normative Sample Sizes for Major IQ Tests
| Test | Publication Year | Sample Size | Age Range | Country |
|---|---|---|---|---|
| WAIS-IV | 2008 | 2,200 | 16-90 | United States |
| WAIS-V | 2024 | 2,170 | 16-90 | United States |
| WISC-V | 2014 | 2,200 | 6-16 | United States |
| Stanford-Binet 5 | 2003 | 4,800 | 2-85+ | United States |
| Raven's Progressive Matrices | Various | Varies by edition | 5-65+ | Multiple countries |
| Cattell Culture Fair III | 1960s/updated | ~3,000 | 15+ | United States/International |
A sample of 2,200 might seem small for representing an entire country of 330 million people. But when properly stratified, this size produces remarkably stable norms -- the standard error of the mean IQ score in such samples is typically less than 1 point.
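The "less than 1 point" figure follows from the standard formula for the standard error of a mean. The short sketch below works it through, assuming a simple random sample and the usual IQ standard deviation of 15; the function name is illustrative.

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

se = standard_error_of_mean(15, 2200)
margin_95 = 1.96 * se  # half-width of an approximate 95% interval for the mean

print(f"standard error: {se:.2f} IQ points")   # about 0.32
print(f"95% margin:     {margin_95:.2f} IQ points")
```

Note that the size of the country does not appear in the formula at all: precision depends on the sample size and on how well the sample is drawn, not on the 330 million people it stands in for.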
"The art of norming is not just collecting a large sample. It is collecting the right sample -- one that mirrors the population with precision."
-- David Wechsler, creator of the Wechsler intelligence scales
However, even well-designed samples have limitations. People with severe cognitive disabilities, those in institutions, and non-English speakers are often underrepresented. This is why test manuals include detailed descriptions of the norming sample, so clinicians can judge whether the norms are appropriate for a given individual.
The Flynn Effect: Why IQ Scores Keep Rising
The most dramatic reason IQ tests need renorming is the Flynn effect -- the well-documented rise in average IQ test performance across generations. Named after political scientist James Flynn, who first systematically documented it in the 1980s, the effect amounts to a gain of roughly 3 IQ points per decade on average when each generation is scored against the same norms.
Flynn Effect Magnitude by Test Type
| Cognitive Domain | Average Gain Per Decade | Implications |
|---|---|---|
| Fluid reasoning (Raven's matrices) | 5-6 points | Largest gains; abstract reasoning improving fastest |
| Full-scale IQ (Wechsler tests) | 3 points | Consistent across editions |
| Verbal comprehension | 2-3 points | Moderate gains |
| Processing speed | 1-2 points | Smallest gains |
| Crystallized knowledge | 2-3 points | Reflects better education access |
The Flynn effect means that someone scoring 100 on the WAIS-III (normed 1995) would score roughly 105 on the WAIS-R (normed 1978) at the typical rate of 3 points per decade, because the older norms represent a lower-scoring population. The person has not gotten smarter -- they are simply being compared to a lower-scoring reference group.
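The arithmetic behind that comparison can be sketched in a few lines. This assumes a constant, linear drift of 0.3 points per year (3 points per decade), which is an average and not a law; actual differences between specific editions vary. The function name is hypothetical.

```python
def convert_between_norms(score, from_norm_year, to_norm_year, points_per_year=0.3):
    """Estimate the score the same performance would earn against norms
    collected in a different year, assuming linear Flynn drift."""
    # Moving to OLDER norms (to_norm_year < from_norm_year) raises the score,
    # because the older reference group scored lower on average.
    return score + points_per_year * (from_norm_year - to_norm_year)

older = convert_between_norms(100, from_norm_year=1995, to_norm_year=1978)
print(f"{older:.1f}")  # about 105.1

# Converting back to the newer norms returns the original score.
print(f"{convert_between_norms(older, 1978, 1995):.1f}")
```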
What Causes the Flynn Effect?
Researchers have proposed multiple explanations, and the consensus is that no single factor accounts for it:
- Improved nutrition -- Better prenatal and early childhood nutrition supports brain development
- Expanded education -- More years of schooling and higher-quality instruction
- Environmental complexity -- Modern life demands more abstract thinking (technology, media, bureaucracy)
- Reduced disease burden -- Fewer childhood infections that affect cognitive development
- Smaller family sizes -- More parental resources per child
- Test-taking familiarity -- Greater exposure to standardized testing formats
"The Flynn effect tells us that the average person in 1900 would score about 70 on today's IQ tests. That does not mean our great-grandparents were intellectually disabled -- it means the world they lived in made different cognitive demands."
-- James Flynn, political scientist and discoverer of the Flynn effect
Is the Flynn Effect Slowing Down?
Recent evidence from Nordic countries (Norway, Denmark, Finland) suggests the Flynn effect may be reversing in some developed nations. Bratsberg and Rogeberg (2018) found that Norwegian IQ scores peaked for cohorts born around 1975 and have declined by about 0.3 points per year in later cohorts.
| Country | Flynn Effect Status | Estimated Trend |
|---|---|---|
| Norway | Reversing | -0.3 points/year since mid-1970s birth cohorts |
| Denmark | Reversing | -0.2 points/year since 1990s birth cohorts |
| Finland | Plateauing | Minimal change in recent cohorts |
| United States | Slowing | Gains reduced to ~1 point/decade |
| Developing nations | Continuing | 3-5 points/decade (similar to historical Western pattern) |
This potential reversal has significant implications for renorming. If scores are declining, tests normed during the peak of the Flynn effect may actually underestimate current test-takers' relative standing.
Renorming Cycles: When and Why Tests Get Updated
Renorming is the process of collecting a new normative sample and recalibrating the scoring system. It is essential because norms become outdated as the population changes.
Renorming Timeline for Major IQ Tests
| Test Family | Edition | Year Published | Years Between Editions |
|---|---|---|---|
| Wechsler Adult (WAIS) | WAIS | 1955 | -- |
| Wechsler Adult (WAIS) | WAIS-R | 1981 | 26 years |
| Wechsler Adult (WAIS) | WAIS-III | 1997 | 16 years |
| Wechsler Adult (WAIS) | WAIS-IV | 2008 | 11 years |
| Wechsler Adult (WAIS) | WAIS-V | 2024 | 16 years |
| Wechsler Child (WISC) | WISC | 1949 | -- |
| Wechsler Child (WISC) | WISC-R | 1974 | 25 years |
| Wechsler Child (WISC) | WISC-III | 1991 | 17 years |
| Wechsler Child (WISC) | WISC-IV | 2003 | 12 years |
| Wechsler Child (WISC) | WISC-V | 2014 | 11 years |
| Stanford-Binet | SB-4 | 1986 | -- |
| Stanford-Binet | SB-5 | 2003 | 17 years |
The general recommendation is that IQ tests should be renormed every 15-20 years, though some experts argue for more frequent updates given the pace of societal change.
What Happens During Renorming
- New sample recruitment -- Test publishers recruit a new demographically representative sample
- Test administration -- The new (or updated) test is administered under standardized conditions
- Statistical analysis -- Raw score distributions are analyzed and new conversion tables are created
- Equating -- If test content has changed, statistical methods ensure scores remain comparable across editions
- Publication -- New norms are published in the test manual along with detailed sample demographics
A major renorming study can cost $5-10 million once sample recruitment, examiner training, administration, and data analysis are included. This is why only well-funded test publishers can maintain current norms.
Score Drift: The Silent Distortion
Score drift is the gradual inflation (or deflation) of IQ scores that occurs when test norms become outdated. It is the practical consequence of failing to renorm often enough.
How Score Drift Works
Due to the Flynn effect, average raw scores increase over time. If the norms stay fixed, the average test-taker will score above 100 -- not because they are above average, but because they are being compared to a population from the past.
| Years Since Norming | Estimated Score Inflation (due to Flynn effect) | Actual Meaning of "IQ 110" |
|---|---|---|
| 0 years (freshly normed) | 0 points | Truly above average (75th percentile) |
| 5 years | ~1.5 points | Slightly inflated |
| 10 years | ~3 points | Appears above average but may be average |
| 15 years | ~4.5 points | Notably inflated |
| 20 years | ~6 points | Score of 110 may actually represent ~104 on current norms |
| 30 years | ~9 points | Severely inflated; clinical decisions may be compromised |
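The table's last column can be made more concrete by asking what percentile an observed 110 really corresponds to once the drift is subtracted. The sketch below assumes linear drift of 0.3 points per year and a normal IQ distribution with mean 100 and standard deviation 15; the function names are illustrative.

```python
from statistics import NormalDist

IQ_SCALE = NormalDist(mu=100, sigma=15)

def current_norm_equivalent(observed, years_since_norming, points_per_year=0.3):
    """Remove the estimated Flynn inflation from a score on outdated norms."""
    return observed - points_per_year * years_since_norming

def percentile(iq):
    """Percent of the current population expected to score below this IQ."""
    return 100 * IQ_SCALE.cdf(iq)

for years in (0, 5, 10, 15, 20, 30):
    adjusted = current_norm_equivalent(110, years)
    print(f"{years:>2} years: observed 110 is about {adjusted:.1f} on current norms "
          f"(percentile about {percentile(adjusted):.1f})")
```

Run as written, the loop shows the apparent "75th percentile" score sliding steadily toward the middle of the distribution as the norms age.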
Real-World Consequences of Score Drift
Score drift is not just an academic concern. It has serious practical consequences:
- Special education eligibility -- A child might qualify (or fail to qualify) for services based on an inflated score from outdated norms. The cutoff for intellectual disability diagnosis is typically IQ below 70. On a test normed 20 years ago, a child whose score on current norms would be 68 might score 74, missing the threshold for needed services.
- Gifted program placement -- Conversely, students may appear to qualify for gifted programs (typically IQ 130+) on 20-year-old norms when their score on current norms would be closer to 124.
- Forensic evaluations -- In capital punishment cases in the United States, an IQ score below 70 can exempt a defendant from the death penalty (per Atkins v. Virginia, 2002). Score drift from outdated norms can literally be a matter of life and death.
- Disability benefits -- IQ thresholds determine eligibility for certain social services and accommodations.
"Score drift is not a minor technical issue. In clinical and forensic settings, a 5-point difference can change a diagnosis, alter a life trajectory, or determine a legal outcome."
-- Kevin McGrew, psychometric researcher and IQ test expert
How Online IQ Tests Handle Norming
Online IQ tests, including our assessments at whats-your-iq.com, face unique norming challenges compared to clinical instruments like the WAIS.
Clinical vs. Online Norming
| Factor | Clinical IQ Tests (WAIS, WISC) | Online IQ Tests |
|---|---|---|
| Normative sample | Carefully stratified, demographically matched | Often larger but self-selected |
| Administration | Standardized, one-on-one with trained examiner | Self-administered, uncontrolled environment |
| Sample size | 2,000-5,000 | Can be tens of thousands or more |
| Demographic control | Census-matched | May skew toward certain demographics |
| Renorming frequency | Every 11-20 years | Can be updated continuously |
| Cost per norming | $5-10 million | Significantly lower |
The advantage of online testing is the ability to collect massive amounts of data continuously, enabling more frequent norm updates. The disadvantage is that the sample is self-selected -- people who choose to take online IQ tests may not represent the general population.
Our tests address this by using statistical corrections and regularly updating our scoring algorithms to align with established IQ distributions. For the most accurate online assessment, try our full IQ test or our timed IQ test for processing speed evaluation.
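The specific corrections are not detailed here, but one widely used technique for self-selected samples is post-stratification: each group's results are weighted by its share of the target population rather than its share of the sample. The sketch below is a generic illustration of that idea, not a description of any particular site's scoring pipeline; the groups, scores, and population shares are all hypothetical.

```python
def poststratified_mean(scores_by_group, population_share):
    """Mean in which each group counts by its population share,
    not by how many of its members happened to take the test."""
    total = 0.0
    for group, scores in scores_by_group.items():
        group_mean = sum(scores) / len(scores)
        total += population_share[group] * group_mean
    return total

# Hypothetical raw scores from a self-selected online sample (illustration only):
# degree holders make up 6 of 8 test-takers but only 35% of the made-up population.
scores_by_group = {
    "degree": [30, 32, 34, 36, 34, 32],
    "no degree": [24, 28],
}
population_share = {"degree": 0.35, "no degree": 0.65}

all_scores = [s for scores in scores_by_group.values() for s in scores]
naive_mean = sum(all_scores) / len(all_scores)
weighted_mean = poststratified_mean(scores_by_group, population_share)

print(f"naive mean:    {naive_mean:.2f}")     # 31.25
print(f"weighted mean: {weighted_mean:.2f}")  # 28.45
```

The naive average overstates what "average" looks like because the over-represented group scores higher; norms built on it would make typical test-takers look below average.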
How to Interpret Your IQ Score Knowing About Norming
Armed with knowledge about norming, here are practical guidelines for interpreting any IQ score:
Questions to Ask About Any IQ Test Result
- When was the test normed? If the norms are more than 15 years old, your score may be inflated by roughly 4-5 points or more due to score drift.
- What population was the normative sample? If you differ significantly from the norming population (age, language, culture), the norms may not accurately represent your standing.
- What is the test's standard error of measurement? Most IQ tests have an SEM of about 3-5 points. With an SEM of 3, a score of 112 corresponds to a 90% confidence band of roughly 107 to 117.
- Was the test administered under standardized conditions? Distractions, fatigue, or anxiety can depress scores by 5-10 points regardless of norming quality.
- How does this score compare across different tests? If you have taken multiple tests, look for consistency rather than fixating on any single score.
"A single IQ score is best understood as a range, not a point. The confidence interval is where the science lives."
-- Cecil Reynolds, neuropsychologist and psychometrics expert
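A confidence band of this kind can be computed directly from the standard error of measurement (SEM). The sketch below is a simplification: it centers the interval on the observed score, whereas clinical manuals often center it on an estimated true score. The function name and the SEM of 3 are assumptions chosen for illustration.

```python
from statistics import NormalDist

def confidence_interval(score, sem, level=0.90):
    """Symmetric confidence interval around an observed score."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% interval
    return score - z * sem, score + z * sem

low, high = confidence_interval(112, sem=3, level=0.90)
print(f"90% interval: {low:.1f} to {high:.1f}")  # 107.1 to 116.9
```

Raising `level` to 0.95 or using a larger SEM widens the band, which is why two scores a few points apart on different tests are often statistically indistinguishable.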
For practice and familiarization with cognitive testing formats, try our practice IQ test or take the quick IQ assessment for a faster evaluation.
References
- Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101(2), 171-191.
- Flynn, J. R. (2007). What Is Intelligence? Beyond the Flynn Effect. Cambridge University Press.
- Bratsberg, B., & Rogeberg, O. (2018). Flynn effect and its reversal are both environmentally caused. Proceedings of the National Academy of Sciences, 115(26), 6674-6678.
- Kaufman, A. S. (2009). IQ Testing 101. Springer Publishing.
- Wechsler, D. (2008). Wechsler Adult Intelligence Scale -- Fourth Edition (WAIS-IV): Technical and interpretive manual. Pearson.
- Trahan, L. H., Stuebing, K. K., Fletcher, J. M., & Hiscock, M. (2014). The Flynn effect: A meta-analysis. Psychological Bulletin, 140(5), 1332-1360.
- Reynolds, C. R., & Niland, J. (1980). Time and score distributions in IQ testing: The problem of bias. Journal of School Psychology, 18(4), 341-348.
- Carroll, J. B. (1993). Human Cognitive Abilities: A Survey of Factor-Analytic Studies. Cambridge University Press.
- McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37(1), 1-10.
Frequently Asked Questions
How often should IQ tests be renormed to maintain accuracy?
Major IQ tests like the WAIS and WISC are typically renormed every **11-20 years**, with the recommended interval being approximately 15 years. The Flynn effect causes scores to drift upward by about **3 points per decade**, so after 15 years, average scores on outdated norms will be inflated by roughly 4-5 points. The American Psychological Association recommends that clinicians use the most recent edition of any test, and some forensic guidelines explicitly require corrections when older norms are used. Online tests can update norms more frequently due to continuous data collection.
Can score drift affect the validity of IQ comparisons over time?
Absolutely. Score drift makes it ***impossible to directly compare*** IQ scores obtained at different times on tests with different norms unless adjustments are applied. For example, at the typical Flynn rate, an IQ score of 100 on the WAIS-R (published 1981) corresponds to roughly **92** on the WAIS-IV (published 2008), a gap of 27 years. In forensic settings, clinicians routinely apply Flynn effect corrections (typically subtracting 0.3 points per year since norming) to ensure fair comparisons. Without such corrections, score drift can lead to misdiagnosis, inappropriate educational placements, and flawed legal decisions.
What are the risks of using a non-representative sample in IQ norming?
A non-representative sample produces ***systematically biased norms***, meaning certain groups will be scored unfairly. For example, if the norming sample includes disproportionately educated individuals, the average raw score will be inflated, causing less-educated test-takers to receive artificially low IQ scores. Historical cases include early IQ tests normed primarily on White, middle-class populations, which produced biased results for minority groups. Modern test publishers like Pearson and PAR use census-matched stratified sampling to minimize this risk, but no sample is perfectly representative.
How does the Flynn effect influence the need for renorming IQ tests?
The Flynn effect is the ***primary driver*** of renorming schedules. Because average cognitive performance improves by about 3 points per decade, norms become outdated predictably. Without renorming, a test normed in 2000 would produce mean scores of approximately 103 by 2010 and 106 by 2020 -- making "average" people appear above average. This inflates IQ scores systematically across the population, undermining the test's fundamental purpose of providing an accurate relative ranking.
Are norm-referenced IQ tests culturally biased despite norming?
Even well-normed tests can contain ***cultural bias*** in two forms: (1) **content bias**, where specific items assume knowledge or experiences more common in certain cultures, and (2) **structural bias**, where the relationship between test scores and the underlying cognitive abilities differs across groups. Modern tests like the WAIS-V undergo extensive bias analysis, including differential item functioning (DIF) studies that flag items performing differently across demographic groups. Culture-fair tests like Raven's Progressive Matrices attempt to minimize cultural content, but no test is entirely culture-free. Interpreting scores within cultural context remains essential.
Can an individual's IQ score change due to renorming?
Yes. When a test is renormed, the scoring tables change. A raw score that translated to an IQ of 112 on the old norms might correspond to an IQ of **108** on the new norms. The individual's actual cognitive ability has not changed -- what changed is the reference population. This is why psychologists always report which test edition and which norms were used when documenting IQ scores. If you took a test years ago, your score on current norms may be several points lower than the number you were originally given.
How can I prepare for an IQ test to get an accurate norm-referenced score?
The most effective preparation involves **familiarizing yourself with test formats** without memorizing specific answers. Practice with our [practice IQ test](/en/practice-iq-test) to reduce test anxiety and improve your test-taking efficiency. Beyond that: get adequate sleep the night before (sleep deprivation can reduce scores by 5-8 points), eat a proper meal, minimize distractions during testing, and approach the test in a calm, focused state. Remember that practice effects diminish after 1-2 exposures to the format -- repeated test-taking does not indefinitely inflate your score.
Why do some IQ tests offer timed sections, and how does this relate to norming?
Timed sections measure **processing speed** -- how quickly and accurately you can perform cognitive operations under time pressure. This domain is normed separately because processing speed varies significantly with age (peaking in the mid-20s and declining thereafter). The norms for timed sections must account for this age-related variation, which is why age-stratified normative samples are essential. On our [timed IQ test](/en/iq-test), your performance is compared against age-appropriate norms to provide an accurate assessment of your processing speed relative to peers.