Introduction

Subquery: query within query. Nested: an uncorrelated inner query is logically evaluated first; a correlated one depends on the current outer row. Results: feed outer query. Flexibility: complex logic. Performance: potential issues (depends on optimizer). Alternative to joins: sometimes better, sometimes worse.

Types: scalar (single value), IN/EXISTS (membership), correlated (references outer), derived table (FROM subquery).

"Subqueries enable complex queries through nesting. Readability: sometimes better than joins. Performance: must verify (optimizer dependent). Use judiciously: measure actual cost." -- Query design

Scalar Subqueries

Definition

Returns single value: one row, one column. Used: wherever an expression is allowed (SELECT list, WHERE, HAVING). Syntax: (SELECT column FROM table WHERE condition).

Example

SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Execution

Inner query first: compute average salary. Outer query: filter using value. Result: employees above average.
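The two-step reading above can be checked directly. The following is a minimal sketch using Python's built-in sqlite3 module; the sample rows are invented for illustration, and only the table and column names come from the example.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary REAL, dept_id INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 50000, 10), ('Ben', 70000, 10), ('Cy', 90000, 20);
""")

# Single statement with a scalar subquery.
nested = conn.execute("""
    SELECT name, salary FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# The same logic as two explicit steps: compute the value, then filter with it.
(avg_salary,) = conn.execute("SELECT AVG(salary) FROM employees").fetchone()
two_step = conn.execute(
    "SELECT name, salary FROM employees WHERE salary > ? ORDER BY name",
    (avg_salary,),
).fetchall()

print(avg_salary, nested, two_step)
```

Both forms return the same rows here, which is what "inner query first" means for an uncorrelated scalar subquery.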

Multiple Scalars

SELECT
 name,
 salary,
 (SELECT AVG(salary) FROM employees) AS company_avg,
 (SELECT MAX(salary) FROM employees) AS max_salary
FROM employees;

Risk

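Multiple rows: error in most DBMSs (some, such as SQLite, silently use the first row). Zero rows: value is NULL, so the comparison is never true. Always verify: subquery returns at most one row.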
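One way to verify the single-row assumption before relying on it is to count the rows the subquery produces. This is a sketch only: is_single_row is a hypothetical helper, the data is invented, and SQLite is used for convenience even though it does not raise the multi-row error that many other engines do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary REAL, dept_id INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 50000, 10), ('Ben', 70000, 10), ('Cy', 90000, 20);
""")

def is_single_row(conn, subquery_sql):
    """Return True if the subquery yields exactly one row (hypothetical helper)."""
    (n,) = conn.execute(f"SELECT COUNT(*) FROM ({subquery_sql})").fetchone()
    return n == 1

# An aggregate without GROUP BY always yields one row.
safe = is_single_row(conn, "SELECT AVG(salary) FROM employees")
# A plain filter can match several rows, so it is not a safe scalar subquery.
unsafe = is_single_row(conn, "SELECT salary FROM employees WHERE dept_id = 10")

# An empty scalar subquery evaluates to NULL, so no outer row passes the filter.
rows = conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT salary FROM employees WHERE dept_id = 99)
""").fetchall()

print(safe, unsafe, rows)
```

The helper interpolates SQL text, so it is only suitable for trusted, developer-written subqueries during testing.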

IN Subqueries

Definition

Returns list: one column, any number of rows. Checks: value in list. Syntax: WHERE column IN (SELECT ...).

Example

SELECT name
FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments WHERE region='East');

Semantics

Equivalent: WHERE dept_id = 10 OR dept_id = 20 OR dept_id = 30 (if subquery returns 10, 20, 30).

NOT IN

WHERE dept_id NOT IN (SELECT dept_id FROM departments WHERE region='West');

Null Handling

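NOT IN with NULLs: if the subquery returns even one NULL, no row qualifies (a comparison with NULL is UNKNOWN, never TRUE). Safer: use NOT EXISTS instead, or filter NULLs inside the subquery.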
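The NULL pitfall is easy to reproduce. The sketch below uses sqlite3 with an invented closed_depts table (not part of the examples above) that contains one NULL, and compares NOT IN with NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    CREATE TABLE closed_depts (dept_id INTEGER);  -- hypothetical table
    INSERT INTO employees VALUES ('Ana', 10), ('Ben', 20), ('Cy', 30);
    INSERT INTO closed_depts VALUES (20), (NULL);
""")

# NOT IN: the NULL in the list makes every non-matching comparison UNKNOWN.
not_in = conn.execute("""
    SELECT name FROM employees
    WHERE dept_id NOT IN (SELECT dept_id FROM closed_depts)
    ORDER BY name
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so it has no effect.
not_exists = conn.execute("""
    SELECT name FROM employees e
    WHERE NOT EXISTS (
        SELECT 1 FROM closed_depts c WHERE c.dept_id = e.dept_id
    )
    ORDER BY name
""").fetchall()

print(not_in)      # no rows at all
print(not_exists)  # employees outside department 20
```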

EXISTS Subqueries

Definition

Tests: existence (true/false). Doesn't return values. Efficient: stops after finding one match.

Example

SELECT dept_name
FROM departments d
WHERE EXISTS (SELECT 1 FROM employees WHERE dept_id = d.dept_id);

Semantics

For each department: check if any employee in that department. True: include department. False: exclude.

NOT EXISTS

WHERE NOT EXISTS (SELECT 1 FROM employees WHERE dept_id = d.dept_id);

Efficiency

Stops early: can return true as soon as one match is found. Better: than COUNT(*) > 0 (which may count every matching row). Semantic: clear intent.
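As a sketch of the comparison above, the following sqlite3 snippet expresses "departments with employees" with EXISTS and with a COUNT-based test, plus the NOT EXISTS complement. The sample rows are invented; whether EXISTS is actually faster depends on the engine and is not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (10, 'Sales'), (20, 'Ops'), (30, 'Legal');
    INSERT INTO employees VALUES ('Ana', 10), ('Ben', 10), ('Cy', 20);
""")

with_exists = conn.execute("""
    SELECT dept_name FROM departments d
    WHERE EXISTS (SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id)
    ORDER BY dept_name
""").fetchall()

# Same answer, but the intent (existence) is hidden behind a count.
with_count = conn.execute("""
    SELECT dept_name FROM departments d
    WHERE (SELECT COUNT(*) FROM employees e WHERE e.dept_id = d.dept_id) > 0
    ORDER BY dept_name
""").fetchall()

without_staff = conn.execute("""
    SELECT dept_name FROM departments d
    WHERE NOT EXISTS (SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id)
""").fetchall()

print(with_exists, with_count, without_staff)
```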

Correlated Subqueries

Definition

References outer query: subquery uses outer column. Executed: conceptually once per outer row (the optimizer may decorrelate it). Expensive: potentially O(n*m) for n outer rows and m inner rows.

Example

SELECT name, salary
FROM employees e
WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept_id = e.dept_id);

Execution

For each employee: run subquery (compute department average). Compare: salary against average. Expensive: subquery repeats per row.

Performance Issue

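N employees: subquery runs up to N times, each run scanning employees, so O(N^2) row reads without an index (worst case). Index on dept_id: each run reads only one department's rows. Alternative: JOIN against a per-department aggregate (one grouping pass plus a join). Avoid on large tables unless the plan shows the optimizer has decorrelated it.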
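The join rewrite mentioned above can be sketched as follows, assuming invented sample rows. The correlated form and the join against a pre-aggregated derived table (aliased ds here, a name chosen for illustration) should return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary REAL, dept_id INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 50000, 10), ('Ben', 70000, 10),
        ('Cy', 90000, 20), ('Dee', 60000, 20);
""")

# Correlated form: the department average is recomputed for each outer row.
correlated = conn.execute("""
    SELECT name, salary FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept_id = e.dept_id)
    ORDER BY name
""").fetchall()

# Join form: averages are computed once per department, then joined.
joined = conn.execute("""
    SELECT e.name, e.salary
    FROM employees e
    JOIN (SELECT dept_id, AVG(salary) AS avg_sal
          FROM employees GROUP BY dept_id) ds
      ON e.dept_id = ds.dept_id
    WHERE e.salary > ds.avg_sal
    ORDER BY e.name
""").fetchall()

print(correlated, joined)
```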

When Useful

Readable: clearer logic (sometimes). Unavoidable: complex relationships. Acceptable: small tables (cost negligible).

Derived Tables (FROM Subqueries)

Definition

Subquery in FROM clause: treated as table. Alias required. Enables: multi-step queries, complex joins.

Example

SELECT dept, avg_sal FROM (
 SELECT dept_id AS dept, AVG(salary) AS avg_sal
 FROM employees
 GROUP BY dept_id
) AS dept_stats
WHERE avg_sal > 60000;
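A runnable sketch of the derived-table example, using sqlite3 and invented rows. For this particular query the outer filter is equivalent to a HAVING clause, which the snippet checks; the derived-table form becomes more useful when the outer query joins or aggregates further.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary REAL, dept_id INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 50000, 10), ('Ben', 70000, 10),
        ('Cy', 90000, 20), ('Dee', 60000, 20);
""")

# Step 1 (inner): per-department averages. Step 2 (outer): filter them.
derived = conn.execute("""
    SELECT dept, avg_sal FROM (
        SELECT dept_id AS dept, AVG(salary) AS avg_sal
        FROM employees
        GROUP BY dept_id
    ) AS dept_stats
    WHERE avg_sal > 60000
""").fetchall()

# Equivalent single-level form for this simple case.
having = conn.execute("""
    SELECT dept_id, AVG(salary) FROM employees
    GROUP BY dept_id
    HAVING AVG(salary) > 60000
""").fetchall()

print(derived, having)
```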

Advantage

Readability: breaks complex query. Multiple steps: logical separation. Maintainability: easier understanding.

Materialization

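Materialized: subquery executed once, result stored (temp table). Cost: memory or disk overhead before the outer query runs. Merged: the optimizer may instead fold the subquery into the outer query. Optimizer: decides materialization strategy.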

Performance Impact

Optimizer Behavior

Convert: subquery to JOIN (if beneficial). Inlining: move subquery into main query. Depends: DBMS, query structure. Unpredictable: test empirically.

Correlated Cost

Per-row execution: expensive. Repetition: subquery runs repeatedly. Avoidable: use JOIN instead (set operation, often faster).

IN vs. EXISTS

IN: set operation (efficient). NOT IN with NULLs: problematic. EXISTS: efficient (stops early). Choose: based on logic and performance.

EXPLAIN Analysis

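EXPLAIN SELECT ... FROM ... WHERE EXISTS ...;
Check: subquery executed once or per row? (output format is DBMS-specific)
Correlated/dependent subquery node: indicates per-row execution (expensive on large inputs)
Nested loop: a join algorithm; cheap with an index on the inner side, expensive without one
Semi-join: indicates optimization applied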
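As a concrete instance of the check above, SQLite exposes its plan through EXPLAIN QUERY PLAN. The sketch below only collects and prints the plan text; the exact wording varies by SQLite version and differs entirely in other engines, so no particular output is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
""")

query = """
    SELECT dept_name FROM departments d
    WHERE EXISTS (SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id)
"""

# Each plan row has four columns; the last one is the readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
details = [row[3] for row in plan]

for line in details:
    print(line)  # look for wording that marks a correlated subquery
```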

Subqueries vs. Joins

Equivalence

Many subqueries: expressible as JOINs. JOINs: often more efficient. Optimizer: may convert automatically. Semantics: same result only if the join does not duplicate rows (unique join key) and NULLs are handled the same way.

Example Comparison

Subquery:
SELECT name FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments WHERE region='East');

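Join:
SELECT e.name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE d.region = 'East';
-- No DISTINCT: departments.dept_id is assumed unique, so the join adds no duplicates,
-- and DISTINCT e.name would merge different employees who share a name.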
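Whether the two forms return the same rows depends on key uniqueness and on DISTINCT. The sketch below assumes departments.dept_id is a primary key and uses invented rows in which two different employees share a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (10, 'East'), (20, 'East'), (30, 'West');
    -- Two different people named Ana, in two East departments.
    INSERT INTO employees VALUES ('Ana', 10), ('Ana', 20), ('Ben', 30);
""")

in_rows = conn.execute("""
    SELECT name FROM employees
    WHERE dept_id IN (SELECT dept_id FROM departments WHERE region = 'East')
""").fetchall()

join_rows = conn.execute("""
    SELECT e.name FROM employees e
    JOIN departments d ON e.dept_id = d.dept_id
    WHERE d.region = 'East'
""").fetchall()

# DISTINCT on the name collapses the two Anas into one row.
distinct_rows = conn.execute("""
    SELECT DISTINCT e.name FROM employees e
    JOIN departments d ON e.dept_id = d.dept_id
    WHERE d.region = 'East'
""").fetchall()

print(in_rows, join_rows, distinct_rows)
```

With a unique join key, the plain join matches the IN form row for row, while the DISTINCT variant answers a different question (which names occur, not which employees).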

Performance Implications

JOIN: often as fast or faster (set operations optimized). Subquery: frequently rewritten to the same plan by modern optimizers; slower when it is not. Test: benchmark both.

Readability

Subquery: sometimes clearer (nested logic). JOIN: sometimes clearer (explicit relationships). Choose: based on understandability.

Recommendation

Prefer JOINs: generally better performance. Use subqueries: when necessary or for readability. Measure: verify actual performance.

Query Optimization

Rewriting Strategies

Subquery to JOIN: if equivalent. Derived table: simplify (remove unnecessary columns). Remove: unused subqueries.

Index Usage

Subquery filter: can use index (on filtered column). Correlated: index on joining column (speeds lookup). Analyze: execution plan.
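The effect of an index on the correlation column can be observed in the plan. This is a sketch for SQLite only: idx_emp_dept is a hypothetical index name, plan_details is a hypothetical helper, and other engines report index use differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary REAL, dept_id INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 50000, 10), ('Ben', 70000, 10), ('Cy', 90000, 20);
""")

QUERY = """
    SELECT name FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept_id = e.dept_id)
"""

def plan_details(conn, sql):
    """Return the readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan_details(conn, QUERY)

# idx_emp_dept is a hypothetical index name on the correlation column.
conn.execute("CREATE INDEX idx_emp_dept ON employees (dept_id)")
after = plan_details(conn, QUERY)

print("before:", before)
print("after: ", after)
```

If the index is picked up, its name appears in the plan lines for the inner lookup; the printed text itself varies by SQLite version.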

Simplification

Complex nested subquery: break into CTE (more readable)
Multiple subqueries: consolidate where possible
Redundant: eliminate duplicate logic

Common Table Expressions (CTEs)

Definition

Named subquery: reusable within the same statement. WITH clause: define before main query. Improves readability: named intermediate results.

Example

WITH dept_stats AS (
 SELECT dept_id, AVG(salary) AS avg_sal
 FROM employees
 GROUP BY dept_id
)
SELECT e.name, e.salary
FROM employees e
JOIN dept_stats ds ON e.dept_id = ds.dept_id
WHERE e.salary > ds.avg_sal;

Advantages

Readability: named intermediate steps. Reusability: used multiple times. Clarity: complex queries simplified.
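The reusability point can be illustrated by referencing one CTE twice in the same statement. The sketch uses sqlite3 and invented rows; top_avg is a column alias introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary REAL, dept_id INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 50000, 10), ('Ben', 70000, 10),
        ('Cy', 90000, 20), ('Dee', 60000, 20);
""")

rows = conn.execute("""
    WITH dept_stats AS (
        SELECT dept_id, AVG(salary) AS avg_sal
        FROM employees
        GROUP BY dept_id
    )
    SELECT e.name,
           ds.avg_sal,                                     -- first use: join
           (SELECT MAX(avg_sal) FROM dept_stats) AS top_avg  -- second use
    FROM employees e
    JOIN dept_stats ds ON e.dept_id = ds.dept_id
    WHERE e.salary > ds.avg_sal
    ORDER BY e.name
""").fetchall()

print(rows)
```

Without the CTE, the grouping query would have to be written out twice as two separate subqueries.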

Recursive CTEs

Advanced: hierarchical data (trees). Complex: requires separate study.
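For orientation only, a recursive CTE over a small tree might look like the sketch below. The staff table, its columns, and the chain CTE name are all hypothetical and do not appear elsewhere in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical hierarchy: each row points at its manager.
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES
        (1, 'Root', NULL), (2, 'Ana', 1), (3, 'Ben', 1), (4, 'Cy', 2);
""")

rows = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        -- Anchor member: the top of the tree.
        SELECT id, name, 0 FROM staff WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: direct reports of rows found so far.
        SELECT s.id, s.name, c.depth + 1
        FROM staff s JOIN chain c ON s.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
""").fetchall()

print(rows)
```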

Practical Examples

Find High Earners

SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Employees in Specific Regions

SELECT name
FROM employees
WHERE dept_id IN (
 SELECT dept_id FROM departments WHERE region IN ('East', 'West')
);

Departments with Employees

SELECT dept_name
FROM departments
WHERE EXISTS (SELECT 1 FROM employees WHERE dept_id = departments.dept_id);

Complex Multi-Step

WITH sales_summary AS (
 SELECT product_id, SUM(amount) AS total FROM sales GROUP BY product_id
)
SELECT p.name, s.total
FROM products p
JOIN sales_summary s ON p.id = s.product_id
WHERE s.total > 100000;

References

  • Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
  • ISO/IEC 9075-1:2016 Information Technology - Database Languages - SQL.