The Fairness Problem in IQ Testing: Why It Matters

Intelligence testing has been one of psychology's most powerful -- and most controversial -- tools since Alfred Binet developed the first practical IQ test in 1905. The core question remains: do IQ tests measure innate cognitive ability, or do they partly measure cultural knowledge, socioeconomic advantage, and familiarity with testing conventions?

The stakes are enormous. IQ test scores influence gifted program admissions, special education placements, college readiness assessments, military assignments, employment decisions, and even forensic determinations in criminal cases (where an IQ score of about 70 or below can bear on whether a death sentence is permitted under Atkins v. Virginia, 2002). When these tests contain systematic biases, the consequences ripple across millions of lives.

"The question is not whether intelligence can be measured, but whether it can be measured fairly across all populations. That is the great unresolved challenge of psychometrics."
-- Dr. Robert Sternberg, Cornell University, former president of the American Psychological Association

This article examines the evidence for bias in IQ testing, the specific mechanisms through which bias operates, and the solutions that researchers and test developers are implementing to build more equitable assessments.


Defining Test Bias: What Psychometricians Actually Mean

In everyday language, "bias" suggests unfairness or prejudice. In psychometrics, test bias has a precise technical definition: a test is biased if it systematically over- or under-predicts the criterion performance of a particular group. This is a crucial distinction because group differences in average scores alone do not constitute bias.
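
The over- or under-prediction definition can be made concrete with a small sketch: fit one regression line to everyone, then check whether the prediction errors average out to zero within each group. The data, group labels, and function names below are invented for illustration and do not come from any real test.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def mean_residual_by_group(records):
    """records: list of (group, test_score, criterion). Returns {group: mean residual}."""
    intercept, slope = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = {}
    for group, score, criterion in records:
        predicted = intercept + slope * score
        residuals.setdefault(group, []).append(criterion - predicted)
    return {group: mean(values) for group, values in residuals.items()}

# Invented data: (group, test score, later criterion such as grade point average)
records = [
    ("A", 90, 2.6), ("A", 100, 3.0), ("A", 110, 3.4),
    ("B", 90, 2.9), ("B", 100, 3.3), ("B", 110, 3.7),
]
bias = mean_residual_by_group(records)
print(bias)
```

In this toy data set the common line over-predicts group A (negative mean residual) and under-predicts group B (positive mean residual) by the same amount, which is the pattern the technical definition calls predictive bias. A real analysis would also test whether slopes differ and whether the residual differences are statistically reliable.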

Three Types of Psychometric Bias

| Type of Bias | Technical Definition | Example |
|---|---|---|
| Construct bias | The test measures different psychological constructs in different groups | A verbal reasoning test measuring vocabulary knowledge in one group but reading comprehension in another |
| Method bias | Differences arising from the testing method rather than the trait being measured | A computer-based test disadvantaging groups with less technology access |
| Item bias (DIF) | Specific items function differently across groups of equal ability | A math word problem about baseball statistics disadvantaging children unfamiliar with the sport |

"A test that is biased is one that yields systematically different meanings for scores earned by members of different groups. It is not merely one that yields different scores."
-- Dr. Cecil Reynolds, Texas A&M University, co-author of Bias in Psychological Assessment

Understanding this distinction matters because much of the public debate conflates group differences with test bias. They are related but distinct phenomena. A test can show group differences and still be unbiased (if the differences reflect genuine variation in the measured trait), or it can show equal group averages while containing biased items (if biases in different directions cancel out).
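
The cancellation case is easy to miss, so here is a minimal numeric sketch. The item labels and proportions are invented; they stand for two groups already matched on overall ability.

```python
# Each tuple: (item label, proportion correct in group A, proportion correct in group B)
items = [
    ("item_1", 0.70, 0.70),
    ("item_2", 0.80, 0.60),   # easier for group A
    ("item_3", 0.50, 0.70),   # easier for group B
    ("item_4", 0.60, 0.60),
]

# Gap per item, and the gap in expected total score across all four items
item_gaps = {label: round(a - b, 2) for label, a, b in items}
total_gap = round(sum(a - b for _, a, b in items), 2)

print(item_gaps)
print(total_gap)
```

The expected total scores are identical, yet two of the four items behave differently for the two groups. This is why item-level analysis is needed even when group averages match.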

The Magnitude of Observed Group Differences

Research has documented persistent group differences in average IQ test scores across several demographic dimensions:

| Factor | Observed Score Gap | Trend Over Time | Key Contributing Factors |
|---|---|---|---|
| Racial/ethnic (Black-White gap, US) | ~15 points (1 SD) historically; narrowing to ~10 points | Closing steadily since 1970s | Educational access, socioeconomic disparities, stereotype threat |
| Socioeconomic (top vs. bottom quintile) | ~12-18 points | Relatively stable | Nutrition, cognitive stimulation, educational quality, stress exposure |
| National (cross-country comparisons) | Up to 30+ points | Converging as countries develop | Education systems, health infrastructure, urbanization |
| Gender (overall IQ) | ~0-2 points (negligible) | Stable | Greater male variability in distribution tails |
| Urban-rural | ~5-10 points | Narrowing | Access to education and resources |

The Black-White IQ gap has narrowed by approximately one-third since the 1970s, according to Dickens and Flynn (2006), coinciding with improvements in educational equity, desegregation, and reduced childhood poverty. This narrowing strongly suggests that the gap reflects environmental factors rather than fixed biological differences.
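
As a quick arithmetic check, the approximate figures quoted in this section (a gap of about 15 points narrowing to about 10) can be expressed in standard-deviation units and as a proportional change. The sketch assumes the conventional IQ standard deviation of 15; the function name is illustrative.

```python
SD = 15  # conventional standard deviation of IQ scores

def gap_in_sd_units(points, sd=SD):
    """Convert a gap in IQ points to standard-deviation units."""
    return points / sd

historical_gap, recent_gap = 15, 10   # approximate figures quoted in this section
narrowing = (historical_gap - recent_gap) / historical_gap

print(gap_in_sd_units(historical_gap))
print(round(gap_in_sd_units(recent_gap), 2))
print(round(narrowing, 2))
```

A drop from roughly 1 SD to roughly two-thirds of an SD is the "approximately one-third" narrowing described above.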

"The narrowing of the Black-White IQ gap over the past 30 years is one of the most important findings in the history of intelligence research. It tells us these differences are not immutable."
-- Dr. James Flynn, University of Otago, discoverer of the Flynn Effect


Cultural Bias: How Test Content Reflects Its Creators

Cultural bias occurs when test items require knowledge, language patterns, or problem-solving approaches specific to a particular culture, thereby disadvantaging test-takers from different cultural backgrounds.

Classic Examples of Cultural Bias

The Chitling Test (1968): Sociologist Adrian Dove created this satirical IQ test using African American cultural knowledge to demonstrate how culture-dependent "intelligence" tests really are. Questions about soul food, jazz musicians, and street slang were deliberately opaque to white test-takers -- just as standard IQ test questions about classical music, sailing terminology, or country club etiquette could disadvantage test-takers from different backgrounds.

The Australian Aboriginal Example: Psychologist Judith Kearins (1981) found that Aboriginal Australian children outperformed white Australian children on spatial memory tasks involving the arrangement of objects in space -- a cognitive ability critical to navigating the Australian bush. Standard Western IQ tests, which emphasize verbal reasoning and abstract logic, completely missed this cognitive strength.

Where Cultural Bias Hides in Modern Tests

| Test Component | How Cultural Bias Enters | Example |
|---|---|---|
| Vocabulary items | Words more common in certain cultural contexts | "Regatta" or "sonata" vs. more universally known concepts |
| Analogies | Relationships dependent on culturally specific knowledge | "Cup is to saucer as..." requires familiarity with formal table settings |
| General knowledge | Questions reflecting dominant-culture education | "Who wrote Hamlet?" assumes a Western literary canon |
| Picture completion | Images reflecting specific cultural environments | Identifying what is missing from a picture of a suburban kitchen |
| Processing expectations | Timed tests favoring cultures that value speed over deliberation | Many Indigenous cultures prioritize careful, deliberate thought over rapid responses |
| Test-taking behavior | Assumptions about appropriate behavior during testing | Some cultures consider it inappropriate to display knowledge assertively or guess when uncertain |

Real-world example: When the Wechsler Intelligence Scale for Children (WISC) was first administered in Puerto Rico, children scored significantly lower than mainland US children. Upon investigation, researchers found that many items contained cultural references unfamiliar to Puerto Rican children. When the test was adapted with culturally appropriate content, the gap narrowed substantially (Herrans & Rodriguez, 1989).

"Every test is a sample of behavior. If the sample is drawn from a narrow cultural pool, it will reflect that pool's knowledge and values, not universal human intelligence."
-- Dr. Patricia Greenfield, UCLA, researcher on culture and cognitive development


Socioeconomic Bias: The Poverty-IQ Connection

Socioeconomic status (SES) is one of the strongest correlates of IQ test scores, and the relationship operates through multiple concrete mechanisms that have nothing to do with innate cognitive potential.

How Poverty Affects IQ Scores

| Mechanism | Effect on IQ | Research Evidence |
|---|---|---|
| Chronic stress (cortisol) | Impairs prefrontal cortex development, reducing working memory and executive function | Children in poverty have cortisol levels 40% higher than affluent peers (Evans & Kim, 2013) |
| Nutrition deficits | Iron, iodine, and omega-3 deficiencies impair brain development | Iodine supplementation alone can raise IQ by 8-15 points in deficient populations (Qian et al., 2005) |
| Lead exposure | Each 10 µg/dL increase in blood lead reduces IQ by 2-5 points | Flint, Michigan water crisis estimated to have affected 6,000-12,000 children |
| Reduced cognitive stimulation | Fewer books, educational toys, and enrichment activities | Hart & Risley (1995): children in poverty hear 30 million fewer words by age 3 |
| Educational quality | Underfunded schools, larger class sizes, fewer qualified teachers | Students in high-poverty schools score 15-20 points lower on standardized tests (NAEP data) |
| Housing instability | Frequent moves disrupt education and increase stress | Each school change associated with 0.5-1 point IQ reduction (Mehana & Reynolds, 2004) |
| Healthcare access | Untreated vision, hearing, or cognitive conditions affect test performance | ~25% of children in poverty have untreated vision problems affecting reading and test-taking |

The Adoption Studies: Nature vs. Nurture in Action

Some of the most compelling evidence that SES affects IQ comes from adoption studies:

  • The French Adoption Study (Duyme et al., 1999): Children with low pre-adoption IQs (below 86) who were adopted between ages 4 and 6 showed substantial IQ gains by adolescence, and the gains were larger in higher-SES adoptive families (mean adolescent IQ of about 98) than in lower-SES ones (about 85). This is one of the clearest demonstrations that IQ is substantially influenced by environment.
  • The Minnesota Transracial Adoption Study (Scarr & Weinberg, 1976): Black children adopted by white middle-class families scored above the national average for both Black and white children on IQ tests, suggesting that environmental enrichment significantly influences scores.
  • The Romanian Orphan Studies (Rutter et al., 2007): Children rescued from severely deprived Romanian orphanages and adopted by British families showed IQ gains of 15-20 points over those who remained institutionalized, with earlier adoption producing larger gains.

"The data are now overwhelming: socioeconomic environment has a causal effect on IQ test scores. The question is not whether environment matters, but how much and through what mechanisms."
-- Dr. Eric Turkheimer, University of Virginia, behavioral geneticist


Stereotype Threat: How Expectations Become Self-Fulfilling

One of the most extensively researched mechanisms of test bias is stereotype threat -- the phenomenon where awareness of a negative stereotype about one's group impairs performance on the stereotyped task.

Key Stereotype Threat Findings

Claude Steele and Joshua Aronson's landmark 1995 study demonstrated that when Black college students were told a test measured intellectual ability, they scored significantly lower than white students. When the same test was described as a "laboratory problem-solving task" with no reference to ability measurement, the gap virtually disappeared once scores were adjusted for prior SAT performance.

| Study Condition | Result |
|---|---|
| Test described as measuring intellectual ability | Significant racial gap in scores |
| Test described as a problem-solving exercise | Gap virtually eliminated |
| Test with demographic questions asked before testing | Larger gap than control |
| Test with demographic questions asked after testing | Smaller gap |
| Test with explicit statement that the test is "fair" | Gap reduced |

Stereotype threat affects any group for which a negative intellectual stereotype exists:

  • Women in mathematics (Spencer, Steele, & Quinn, 1999)
  • Low-SES individuals on "intelligence" tests (Croizet & Claire, 1998)
  • Elderly adults on memory tests (Levy, 1996)
  • White men compared to Asian men on math tests (Aronson et al., 1999)

The mechanism operates through anxiety-induced cognitive load: the mental resources spent worrying about confirming a stereotype are the same resources needed for complex problem-solving, effectively reducing available working memory capacity.

"Stereotype threat shows us that test performance is not just about what you know or how smart you are. It is also about what you think others think about you."
-- Dr. Claude Steele, Stanford University, author of Whistling Vivaldi


Measuring and Detecting Bias: The Tools of Modern Psychometrics

Test developers use sophisticated statistical methods to identify and remove biased items. Understanding these tools helps evaluate claims about test fairness.

Differential Item Functioning (DIF) Analysis

DIF analysis is the primary method for detecting item-level bias. It works by comparing how individuals of equal ability but from different demographic groups perform on each test item. If an item is harder for one group after controlling for overall ability, it shows DIF and is flagged for review.

| DIF Method | How It Works | Strengths | Limitations |
|---|---|---|---|
| Mantel-Haenszel | Compares item performance across ability-matched groups using chi-square statistics | Simple, widely used, well-understood | Limited to two groups; dichotomous items only |
| Logistic regression | Models the probability of correct response as a function of ability, group, and their interaction | Handles continuous matching; detects non-uniform DIF | Requires larger samples |
| Item Response Theory (IRT) | Compares item characteristic curves across groups | Most precise; handles complex item types | Requires very large samples; computationally intensive |
| SIBTEST | Evaluates DIF at the test level (bundles of items) | Detects cumulative bias across multiple items | More complex to interpret |
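
To show what the Mantel-Haenszel approach computes, here is a minimal sketch of the common odds ratio for a single item, pooled across ability bands. The counts are invented, and operational programs add a chi-square significance test and purification steps that are omitted here.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """strata: list of (ref_correct, ref_wrong, focal_correct, focal_wrong),
    one tuple per matched ability level (for example, a total-score band)."""
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n      # reference correct x focal wrong
        denominator += b * c / n    # reference wrong x focal correct
    return numerator / denominator

# Invented counts for one item across three ability bands
strata = [
    (30, 20, 20, 30),   # low scorers
    (40, 10, 30, 20),   # middle scorers
    (45, 5, 40, 10),    # high scorers
]
alpha = mantel_haenszel_odds_ratio(strata)

# ETS delta scale: negative values mean the item is harder for the focal group
mh_d_dif = -2.35 * math.log(alpha)
print(round(alpha, 2), round(mh_d_dif, 2))
```

An odds ratio above 1 means that, among test-takers in the same ability band, the reference group has higher odds of answering correctly. On the ETS delta scale, absolute values of about 1.5 or more are conventionally treated as large DIF when they are also statistically significant.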

Real-world application: When the SAT was analyzed for DIF in the 1990s, researchers found that certain math items about sports statistics showed significant DIF against female test-takers -- not because women were worse at math, but because the content context was more familiar to male test-takers. These items were revised or removed, improving test fairness.

Modern IQ tests typically undergo multiple rounds of DIF analysis during development:

  1. Item tryout with diverse pilot samples
  2. DIF screening to flag problematic items
  3. Expert review of flagged items by diverse panels
  4. Revision or removal of items showing significant bias
  5. Re-testing with new samples to verify improvements

Culture-Fair and Culture-Reduced Testing: Solutions and Limitations

In response to cultural bias concerns, psychometricians have developed tests specifically designed to minimize cultural content. These "culture-fair" tests rely primarily on nonverbal, figural reasoning tasks.

Major Culture-Fair IQ Tests

| Test | Developer | Format | What It Measures | Strengths | Limitations |
|---|---|---|---|---|---|
| Raven's Progressive Matrices | John C. Raven (1938) | Pattern completion with abstract figures | Nonverbal fluid reasoning | Minimal language; widely used cross-culturally | Still shows group differences; ceiling effects for gifted |
| Cattell Culture Fair Intelligence Test | Raymond Cattell (1940s) | Series, classifications, matrices, conditions | Fluid intelligence | Explicitly designed for cross-cultural use | Less predictive of academic outcomes than verbal tests |
| Naglieri Nonverbal Ability Test (NNAT) | Jack Naglieri | Pattern completion, analogies, serial reasoning | Nonverbal reasoning | Reduces racial/ethnic gaps by ~50% | Does not capture verbal or crystallized intelligence |
| Universal Nonverbal Intelligence Test (UNIT) | Bracken & McCallum | Entirely nonverbal administration and response | Memory, reasoning, symbolic/nonsymbolic | Can be administered without any spoken language | Time-intensive to administer individually |
| Leiter International Performance Scale | Russell Leiter | Nonverbal tasks with minimal instructions | Fluid reasoning, attention, memory | Designed for hearing-impaired and non-English speakers | Norming samples smaller than WISC or Stanford-Binet |

The Culture-Fair Paradox

Despite their design intent, culture-fair tests have a fundamental limitation: they still show group differences, though typically smaller ones. This has led researchers to recognize that no test can be fully culture-free because:

  1. Test-taking itself is a cultural activity -- sitting in a quiet room, working individually, competing against a clock, and selecting from predetermined answers are all culturally specific behaviors.
  2. Nonverbal reasoning is also culturally influenced -- exposure to puzzles, pattern recognition games, and spatial activities varies across cultures and socioeconomic groups.
  3. Motivation and familiarity with testing contexts differ across populations.

"The idea of a culture-free test is a contradiction in terms. All behavior -- including test behavior -- occurs in a cultural context. The best we can aspire to is culture-reduced testing."
-- Dr. Anne Anastasi, Fordham University, author of Psychological Testing (7th edition)

Our online assessments at whats-your-iq.com are designed with these principles in mind. The full IQ test uses pattern-based reasoning tasks that minimize cultural and linguistic content, while the practice test helps users become familiar with test formats, reducing the anxiety and unfamiliarity effects that can depress scores.


The Flynn Effect: Evidence That IQ Is Not Fixed

Perhaps the strongest evidence against a purely biological interpretation of IQ scores is the Flynn Effect -- the well-documented phenomenon of rising IQ scores across generations.

Flynn Effect Data Across Countries

| Country | Period Studied | Average IQ Gain Per Decade | Total Estimated Gain |
|---|---|---|---|
| United States | 1932-2000 | ~3 points | ~20 points |
| United Kingdom | 1942-2008 | ~3.1 points | ~20 points |
| Netherlands | 1952-1982 | ~7 points | ~21 points |
| Japan | 1950-1990 | ~3.3 points | ~13 points |
| Kenya | 1984-1998 | ~11 points | ~15 points |
| Denmark | 1958-1998 | ~3.6 points | ~14 points |
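
The per-decade and total columns are linked by simple arithmetic: total gain is roughly the per-decade rate multiplied by the length of the period in decades. The short sketch below recomputes the totals from the table's own figures; the function name is illustrative.

```python
# (country, start year, end year, approximate gain per decade, reported total gain)
rows = [
    ("United States", 1932, 2000, 3.0, 20),
    ("United Kingdom", 1942, 2008, 3.1, 20),
    ("Netherlands", 1952, 1982, 7.0, 21),
    ("Japan", 1950, 1990, 3.3, 13),
    ("Kenya", 1984, 1998, 11.0, 15),
    ("Denmark", 1958, 1998, 3.6, 14),
]

def implied_total(start, end, per_decade):
    """Total gain implied by a constant per-decade rate over the period."""
    return per_decade * (end - start) / 10

for country, start, end, rate, reported in rows:
    implied = implied_total(start, end, rate)
    print(f"{country}: implied {implied:.1f} points, reported ~{reported}")
```

Every implied total lands within one point of the reported figure, so the two columns are internally consistent. Kenya's high per-decade rate reflects a short 14-year window rather than a larger total gain.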

The gains have been largest on culture-fair tests like Raven's Progressive Matrices -- precisely the tests designed to measure "pure" intelligence. This paradox suggests that even "culture-fair" measures are sensitive to environmental changes.

What caused the Flynn Effect? Researchers point to:

  • Improved nutrition (especially in early childhood)
  • Better education (more years of schooling, improved pedagogical methods)
  • Reduced family sizes (more parental investment per child)
  • Greater cognitive complexity in daily life (technology, media, urbanization)
  • Improved healthcare (fewer childhood illnesses affecting brain development)

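Important note: Recent research suggests the Flynn Effect has plateaued or reversed in some developed nations (Dutton & Lynn, 2013; Bratsberg & Rogeberg, 2018). Proposed explanations include changing education systems, shifts in population composition, and measurement artifacts, although Bratsberg and Rogeberg's within-family data point to environmental causes rather than compositional change. This is an active area of research.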


Practical Solutions for Fairer Testing

Based on decades of research, here are the concrete steps being taken -- and that should be taken -- to improve IQ test fairness.

For Test Developers

  1. Diverse norming samples -- Modern tests like the WISC-V use norming samples that closely match national demographics by race, ethnicity, SES, geographic region, and language background
  2. Rigorous DIF analysis -- Every item analyzed for differential functioning across demographic groups before inclusion
  3. Expert review panels -- Diverse panels review items for cultural content, linguistic complexity, and potential bias
  4. Multiple score reporting -- Report subtest scores (not just composites) so that specific strengths and weaknesses are visible, which helps identify masking effects in twice-exceptional (2e) and culturally diverse populations
  5. Adaptive testing -- Computer-adaptive tests that adjust difficulty in real time reduce floor/ceiling effects and measurement error
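
The adaptive-testing idea in the last item can be sketched in a few lines. This is a simplified illustration under stated assumptions: items follow a Rasch model, the item bank and the simulated examinee are hypothetical, and the ability update is a crude halving-step rule rather than the maximum-likelihood or Bayesian estimation that operational tests use.

```python
import math

def p_correct(theta, b):
    """Rasch model: probability of a correct answer at ability theta, item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def information(theta, b):
    """Fisher information of a Rasch item; largest when b is close to theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def run_cat(item_bank, answer_fn, n_items=5):
    """Administer n_items adaptively; answer_fn(b) returns True for a correct answer."""
    theta, step = 0.0, 1.0
    remaining = list(item_bank)
    administered = []
    for _ in range(n_items):
        # Pick the unused item that is most informative at the current estimate
        b = max(remaining, key=lambda d: information(theta, d))
        remaining.remove(b)
        correct = answer_fn(b)
        theta += step if correct else -step   # crude update, not an MLE
        step /= 2                              # smaller corrections as evidence accumulates
        administered.append((b, correct))
    return theta, administered

# Hypothetical difficulties, and a deterministic examinee who solves items easier than 1.0
bank = [-2.0, -1.2, -0.6, 0.0, 0.7, 0.9, 1.4, 2.0, 2.6]
theta_hat, log = run_cat(bank, answer_fn=lambda b: b < 1.0)
print(theta_hat, log)
```

The test quickly skips the easy items and concentrates on difficulties near the examinee's level, which is how adaptive tests avoid the floor and ceiling effects of fixed forms.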

For Test Administrators

  1. Standardized conditions -- Consistent environment, instructions, and timing across all test-takers
  2. Rapport building -- Establishing trust and comfort before testing begins, especially with children and culturally diverse populations
  3. Language considerations -- Testing in the examinee's dominant language or providing bilingual assessments when available
  4. Reducing stereotype threat -- Framing tests as "problem-solving activities" rather than "intelligence measures"; not collecting demographic information before testing
  5. Multiple assessment methods -- Supplementing IQ tests with portfolio assessments, dynamic assessment, and behavioral observations

For Score Interpreters

  1. Contextual interpretation -- Always consider SES, educational history, language background, and testing conditions when interpreting scores
  2. Confidence intervals -- Report scores as ranges (e.g., "IQ: 112-120") rather than single numbers to acknowledge measurement error
  3. Avoid over-interpretation -- A single IQ score should never be the sole basis for high-stakes decisions
  4. Pattern analysis -- Look at subtest variability, not just composite scores, to identify specific strengths and areas of challenge

"Intelligence testing should be a tool for empowerment, not a mechanism of exclusion. When used properly and interpreted fairly, these tests can open doors. When misused, they can slam them shut."
-- Dr. Howard Gardner, Harvard University, creator of the theory of Multiple Intelligences


Legal and Ethical Dimensions of Test Fairness

IQ testing fairness is not just a scientific question -- it is also a legal and ethical one. Several landmark legal cases have shaped how IQ tests can be used.

| Case | Year | Ruling | Impact |
|---|---|---|---|
| Larry P. v. Riles | 1979 | Banned IQ tests for placing Black students in special education in California | Established that culturally biased tests cannot be used for educational placement |
| PASE v. Hannon | 1980 | Ruled that IQ tests were not culturally biased (contradicting Larry P.) | Created conflicting legal precedents; highlighted complexity of the issue |
| Griggs v. Duke Power Co. | 1971 | Ruled that employment tests must be job-related and consistent with business necessity | Required validation evidence for all employment testing |
| Atkins v. Virginia | 2002 | Banned execution of individuals with intellectual disability (commonly assessed in part by an IQ of about 70 or below) | Made IQ score accuracy and fairness a life-or-death legal issue |

Ethical Standards

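The Standards for Educational and Psychological Testing (2014), published jointly by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education, require: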

  • Tests must be validated for the populations and purposes for which they are used
  • Test users must be qualified and trained in proper administration and interpretation
  • Scores must be interpreted in light of relevant contextual factors
  • Test bias must be evaluated and minimized through empirical methods
  • Test-takers have the right to understand their scores and how they will be used

What Individuals Can Do: Navigating Testing Fairly

If you or your child are facing an IQ assessment, here are evidence-based steps to ensure the fairest possible evaluation.

Before Testing

  • Build familiarity with test formats by practicing with assessments like our practice IQ test or quick IQ test -- research shows that test familiarity can improve scores by 5-10 points (Hausknecht et al., 2007)
  • Ensure adequate sleep (7-9 hours for adults; 9-12 for children) -- sleep deprivation impairs the cognitive functions IQ tests measure most
  • Eat a balanced meal before testing -- glucose is the brain's primary fuel, and low blood sugar impairs working memory
  • Reduce anxiety through breathing exercises or mindfulness -- test anxiety can depress scores by 12+ points

During Testing

  • Ask for clarification if instructions are unclear -- this is always allowed
  • Use the full time allotted -- rushing through items leads to careless errors
  • Skip and return to difficult items if the test format allows
  • Stay hydrated -- even mild dehydration (1-2%) impairs cognitive performance

After Testing

  • Request a full report including subtest scores, not just the composite
  • Ask about confidence intervals -- a score of 115 with a 95% confidence interval of 110-120 is more meaningful than a single number
  • Consider retesting if conditions were not optimal (illness, anxiety, unfamiliar setting)
  • Get a second opinion if scores will be used for high-stakes decisions
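
For readers wondering where a confidence interval like 110-120 comes from, one common approach builds it from the standard error of measurement (SEM). The sketch below assumes a standard deviation of 15 and a reliability of 0.97; that reliability value is an illustrative assumption, not a figure for any specific test, and some publishers use slightly different methods (for example, centering the interval on an estimated true score).

```python
import math

def score_interval(observed, reliability, sd=15.0, z=1.96):
    """Approximate 95% band around an observed score using the standard error of measurement."""
    sem = sd * math.sqrt(1.0 - reliability)
    return observed - z * sem, observed + z * sem

low, high = score_interval(115, reliability=0.97)
print(round(low), round(high))
```

Under these assumptions a score of 115 carries a band of roughly 110 to 120. Lower reliability widens the band, which is one reason a single number overstates the precision of a test result.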

For a reliable, accessible starting point, our full IQ test provides a comprehensive cognitive assessment, while the timed IQ test offers a structured evaluation of reasoning speed and accuracy.


Frequently Asked Questions

How do cultural differences specifically affect IQ test results?

Cultural differences affect IQ scores through multiple concrete mechanisms. First, linguistic bias: vocabulary items like "sonata," "regatta," or "peninsula" are more familiar to individuals from Western, educated backgrounds, as documented by Helms (1992) in American Psychologist. Second, response style differences: some cultures value deliberation over speed, penalizing test-takers on timed assessments even when their reasoning is equally sophisticated. Third, content familiarity: story problems set in suburban American contexts (mowing lawns, baseball statistics) disadvantage children from different cultural environments. Fourth, test-taking expectations: in many cultures, guessing when unsure is considered inappropriate, while Western test-taking strategy specifically encourages guessing on multiple-choice items. Research by Greenfield (1997) found that when test content was adapted to use culturally familiar materials, group differences narrowed by 25-50% in many cases.

What methods are used to detect bias in IQ test questions?

The primary method is Differential Item Functioning (DIF) analysis, which statistically compares how individuals of equal measured ability but from different demographic groups perform on each test item. If an item is significantly harder for one group after controlling for ability, it shows DIF and is flagged for review. The Mantel-Haenszel procedure and Item Response Theory (IRT) methods are the most widely used approaches. Modern test development also employs sensitivity review panels composed of experts from diverse backgrounds who evaluate items for cultural content, linguistic complexity, and potential stereotyping. The WISC-V development process, for example, involved over 2,200 pilot participants across demographic groups and multiple rounds of DIF analysis before finalizing items. Items showing significant DIF are either revised to remove the biasing element or eliminated from the final test form entirely.
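
As a rough illustration of the IRT approach, the sketch below compares item characteristic curves for one item calibrated separately in two groups. The two-parameter logistic form is standard, but the parameter values are hypothetical and chosen only to show uniform DIF (same discrimination, different difficulty).

```python
import math

def icc(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters for the same item in two groups:
# equal discrimination (a), higher difficulty (b) in the focal group
reference = {"a": 1.2, "b": 0.0}
focal = {"a": 1.2, "b": 0.5}

for theta in (-1.0, 0.0, 1.0):
    p_ref = icc(theta, **reference)
    p_foc = icc(theta, **focal)
    print(f"theta={theta:+.1f}  reference={p_ref:.2f}  focal={p_foc:.2f}  gap={p_ref - p_foc:.2f}")
```

At every ability level the focal group has a lower probability of answering correctly, even though ability is held equal. That persistent gap between curves is what an IRT-based DIF analysis flags for review.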

Can IQ test scores be considered reliable across different socioeconomic groups?

IQ tests show adequate reliability (test-retest correlations of 0.85-0.95) across SES groups, meaning they consistently measure something. However, validity -- whether that something is truly "intelligence" versus accumulated advantage -- is the critical question. Research by Turkheimer et al. (2003) in a landmark twin study found that in low-SES families, shared environment accounted for 60% of IQ variance (and genetics only 10%), while in high-SES families, genetics accounted for 60-80% of variance. This means that IQ tests in low-SES populations are substantially measuring environmental deprivation rather than cognitive potential. Practical implications: IQ scores from low-SES individuals should be interpreted with greater caution and supplemented with dynamic assessment methods that measure learning potential rather than accumulated knowledge.
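
To see how twin correlations translate into variance shares like those above, here is a sketch using Falconer's approximation. This is a textbook simplification, not the model Turkheimer et al. fit, and the twin correlations are hypothetical values chosen to reproduce the rough pattern described in this answer.

```python
def falconer(r_mz, r_dz):
    """Falconer's approximation from identical (MZ) and fraternal (DZ) twin correlations."""
    h2 = 2 * (r_mz - r_dz)   # heritability
    c2 = 2 * r_dz - r_mz     # shared environment
    e2 = 1 - r_mz            # non-shared environment plus measurement error
    return h2, c2, e2

# Hypothetical correlations, not values reported in the study
low_ses = falconer(r_mz=0.70, r_dz=0.65)
high_ses = falconer(r_mz=0.80, r_dz=0.45)

print([round(x, 2) for x in low_ses])
print([round(x, 2) for x in high_ses])
```

When fraternal twins resemble each other almost as much as identical twins do, as in the hypothetical low-SES case, most of the similarity is attributed to shared environment. When identical twins are far more alike than fraternal twins, the estimate shifts toward genetic variance.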

Are nonverbal IQ tests completely free from cultural bias?

No. While nonverbal tests like Raven's Progressive Matrices and the NNAT reduce linguistic and content bias, they are not culture-free. Research by Rosselli and Ardila (2003) demonstrated that performance on nonverbal reasoning tasks is significantly influenced by formal education, exposure to two-dimensional representation (pictures, diagrams), and familiarity with multiple-choice formats. Te Nijenhuis et al. (2015) found that group differences on nonverbal tests, while smaller than on verbal tests, still persist at approximately 60-70% of the magnitude of verbal test gaps. The remaining bias comes from differential familiarity with abstract figural patterns, culturally variable attitudes toward timed performance, and SES-related differences in exposure to puzzles, games, and spatial reasoning activities. Nonverbal tests are better described as "culture-reduced" rather than "culture-fair" -- an important distinction acknowledged by most modern test developers.

How can individuals prepare to reduce the impact of bias when taking IQ tests?

The most effective preparation strategy is building test familiarity. A meta-analysis by Hausknecht et al. (2007) found that practice effects improve IQ scores by 5-10 points on average, with the largest gains for individuals with the least prior test-taking experience -- precisely those most affected by bias. Specific steps include: (1) Practice with diverse question types using resources like our practice IQ test, (2) Learn test-taking strategies such as process of elimination, strategic guessing, and time management, (3) Reduce test anxiety through relaxation techniques -- deep breathing before testing has been shown to reduce anxiety-related score depression by up to 5 points (Hembree, 1988), (4) Ensure optimal physical conditions (sleep, nutrition, hydration), and (5) Request accommodations if you have a documented disability or are a non-native speaker of the test language.

What role does technology play in improving fairness in IQ testing?

Technology is advancing fair testing through several mechanisms. Computer-adaptive testing (CAT) adjusts item difficulty in real time based on responses, reducing floor and ceiling effects and providing more precise measurement at all ability levels. Automated DIF detection using machine learning algorithms can identify biased items faster and across more demographic intersections than traditional methods. Standardized digital administration eliminates examiner variability -- a significant source of method bias documented by Glutting et al. (1987). Multilingual platforms can present items in a test-taker's preferred language while maintaining psychometric equivalence. Looking ahead, AI-generated test items could theoretically produce culture-neutral items at scale, though this technology is still in early development and raises its own fairness concerns (algorithmic bias in training data). Our online assessments, including the full IQ test and timed IQ test, leverage standardized digital administration to provide consistent testing conditions for all users.


References

  1. Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69(5), 797-811.
  2. Flynn, J. R. (2007). What Is Intelligence? Beyond the Flynn Effect. Cambridge University Press.
  3. Dickens, W. T., & Flynn, J. R. (2006). Black Americans reduce the racial IQ gap. Psychological Science, 17(10), 913-920.
  4. Turkheimer, E., Haley, A., Waldron, M., D'Onofrio, B., & Gottesman, I. I. (2003). Socioeconomic status modifies heritability of IQ in young children. Psychological Science, 14(6), 623-628.
  5. Reynolds, C. R., & Suzuki, L. A. (2013). Bias in Psychological Assessment: An Empirical Review and Recommendations. In I. B. Weiner (Ed.), Handbook of Psychology.
  6. Anastasi, A., & Urbina, S. (1997). Psychological Testing (7th ed.). Prentice Hall.
  7. Sternberg, R. J. (2004). Culture and intelligence. American Psychologist, 59(5), 325-338.
  8. Greenfield, P. M. (1997). You can't take it with you: Why ability assessments don't cross cultures. American Psychologist, 52(10), 1115-1124.
  9. Helms, J. E. (1992). Why is there no study of cultural equivalence in standardized cognitive ability testing? American Psychologist, 47(9), 1083-1101.
  10. Hausknecht, J. P., Halpert, J. A., Di Paolo, N. T., & Moriarty Gerrard, M. O. (2007). Retesting in selection: A meta-analysis of coaching and practice effects. Journal of Applied Psychology, 92(2), 373-385.
  11. Duyme, M., Dumaret, A. C., & Tomkiewicz, S. (1999). How can we boost IQs of "dull children"? A late adoption study. Proceedings of the National Academy of Sciences, 96(15), 8790-8794.
  12. Evans, G. W., & Kim, P. (2013). Childhood poverty, chronic stress, self-regulation, and coping. Child Development Perspectives, 7(1), 43-48.
  13. Bratsberg, B., & Rogeberg, O. (2018). Flynn effect and its reversal are both environmentally caused. Proceedings of the National Academy of Sciences, 115(26), 6674-6678.
  14. American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for Educational and Psychological Testing. AERA.