Introduction
Aggregate functions reduce many rows to a single summary value: COUNT counts rows, SUM totals amounts, AVG averages values. They are the core of reporting and dashboards. GROUP BY applies aggregates per group, and window functions (an advanced feature) compute aggregates without collapsing rows.
Aggregation is a fundamental query operation. Nearly every reporting query uses it, so understanding it is an essential skill.
"Aggregations transform raw data into insights. Count, sum, average: simple yet powerful. Mastery: foundation of reporting and analytics." -- SQL fundamentals
Basic Aggregate Functions
Function Overview
COUNT: number of rows or non-NULL values. SUM: total of values. AVG: average (arithmetic mean). MIN: smallest value. MAX: largest value. Each one reduces a set of rows to a single value.
Syntax
SELECT
COUNT(*) AS row_count,
COUNT(manager_id) AS non_null_count, -- rows where manager_id is not NULL
SUM(salary) AS total,
AVG(salary) AS average,
MIN(salary) AS minimum,
MAX(salary) AS maximum
FROM employees;
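A minimal sketch for trying the statement above without a database server, assuming Python's built-in sqlite3 module and a small hypothetical employees table (columns and values invented for illustration; employee 1 has no manager):

```python
import sqlite3

# Hypothetical sample data: emp 1 has a NULL manager_id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, manager_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, None, 90000), (2, 1, 60000), (3, 1, 50000), (4, 2, 70000)],
)

row = conn.execute(
    """
    SELECT
        COUNT(*)          AS row_count,
        COUNT(manager_id) AS non_null_count,
        SUM(salary)       AS total,
        AVG(salary)       AS average,
        MIN(salary)       AS minimum,
        MAX(salary)       AS maximum
    FROM employees
    """
).fetchone()  # no GROUP BY, so exactly one row comes back

print(row)  # (4, 3, 270000, 67500.0, 50000, 90000)
```

The whole table collapses into one tuple; COUNT(manager_id) is 3 because one manager_id is NULL.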
Scalar Result
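Result: a single row, because the aggregates run over the whole table. Without GROUP BY, an aggregate query returns exactly one row (even on an empty table). COUNT, SUM, and AVG return numbers; MIN and MAX return a value of the input column's type and work on any orderable type.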
COUNT Function
Variations
COUNT(*): all rows, whether or not they contain NULLs. COUNT(column): rows where column is not NULL. COUNT(DISTINCT column): number of distinct non-NULL values.
Examples
SELECT COUNT(*) FROM employees; -- Total rows
SELECT COUNT(manager_id) FROM employees; -- Non-NULL managers
SELECT COUNT(DISTINCT dept_id) FROM employees; -- Unique departments
Usage
COUNT(*) gives the row count. COUNT(column) counts non-NULL values, so COUNT(*) - COUNT(column) is the number of missing values. COUNT(DISTINCT column) measures cardinality (how many unique values).
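The three COUNT variants can be compared side by side in a short sketch, assuming Python's sqlite3 module and hypothetical data with one NULL manager_id and one NULL dept_id:

```python
import sqlite3

# Hypothetical data: employee 1 has no manager, employee 4 has no department.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, manager_id INTEGER, dept_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, None, 10), (2, 1, 10), (3, 1, 20), (4, 2, None), (5, 2, 20)],
)

total, with_manager, distinct_depts = conn.execute(
    "SELECT COUNT(*), COUNT(manager_id), COUNT(DISTINCT dept_id) FROM employees"
).fetchone()

missing_managers = total - with_manager  # rows whose manager_id is NULL
print(total, with_manager, distinct_depts, missing_managers)  # 5 4 2 1
```

COUNT(DISTINCT dept_id) is 2 (departments 10 and 20); the NULL dept_id is not counted as a value.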
SUM and AVG Functions
SUM
Adds values: numeric only. NULLs ignored. Example: SELECT SUM(salary) FROM employees (total payroll).
AVG
Average: SUM / COUNT (non-NULL). NULLs excluded. Example: SELECT AVG(salary) FROM employees.
Distinction
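AVG(x) ≠ SUM(x) / COUNT(*) when x contains NULLs: COUNT(*) counts every row, but AVG divides by COUNT(x), the number of non-NULL values. Also beware integer division: in several engines (e.g., PostgreSQL, SQL Server, SQLite) dividing one integer by another truncates the result.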
Example
Values: [100, 200, NULL, 300]
SUM: 600
COUNT(*): 4
COUNT(salary): 3
AVG: 600/3 = 200 (NOT 600/4 = 150)
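The arithmetic above can be checked directly. A minimal sketch, assuming Python's sqlite3 module and a hypothetical one-column payroll table holding the four values from the example:

```python
import sqlite3

# The four example values, stored in a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payroll (salary INTEGER)")
conn.executemany("INSERT INTO payroll VALUES (?)", [(100,), (200,), (None,), (300,)])

result = conn.execute(
    """
    SELECT
        SUM(salary),
        COUNT(*),
        COUNT(salary),
        AVG(salary),                   -- divides by COUNT(salary) = 3
        SUM(salary) * 1.0 / COUNT(*)   -- divides by all 4 rows; * 1.0 avoids integer division
    FROM payroll
    """
).fetchone()
print(result)  # (600, 4, 3, 200.0, 150.0)
```

Multiplying by 1.0 forces floating-point division; without it, SQLite would divide two integers and truncate.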
MIN and MAX Functions
Definition
MIN: smallest value. MAX: largest value. Works: numeric, string, date. Ignores: NULLs.
Examples
SELECT
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM employees;
String Comparison
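Strings compare according to the column's collation, so 'Alice' < 'Bob' alphabetically (case and accent handling depend on the collation). Dates compare chronologically. Useful for finding the range of values: earliest and latest, first and last alphabetically.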
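A small sketch of MIN and MAX on text and dates, assuming Python's sqlite3 module and a hypothetical staff table. SQLite has no native DATE type, so hire dates are stored as ISO-8601 text, which sorts chronologically; SQLite's default BINARY collation compares bytes, so other engines or collations may order mixed-case strings differently:

```python
import sqlite3

# Hypothetical table; ISO-8601 date strings sort in chronological order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, hired TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("Carol", "2021-03-15"), ("Alice", "2019-07-01"), ("Bob", "2023-01-20"), ("Dave", None)],
)

first_name, last_name, earliest, latest = conn.execute(
    "SELECT MIN(name), MAX(name), MIN(hired), MAX(hired) FROM staff"
).fetchone()
print(first_name, last_name, earliest, latest)
# Alice Dave 2019-07-01 2023-01-20  (Dave's NULL hire date is ignored)
```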
NULL Handling in Aggregates
General Rule
Aggregates ignore NULLs (COUNT(*) is the exception: it counts rows, not values). SUM over [1, 2, NULL, 4] = 7, not NULL. If every input value is NULL, SUM/AVG/MIN/MAX return NULL and COUNT(column) returns 0.
Empty Set
No rows: COUNT(*) = 0. SUM/AVG/MIN/MAX = NULL (no data). Important: distinguish zero from no data.
COALESCE Usage
SELECT COALESCE(AVG(salary), 0) AS average_salary
FROM employees
WHERE dept = 'unknown';
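Result: 0 instead of NULL when no employees match. Use this only when 0 is a meaningful default, since it hides the difference between "no data" and zero.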
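Both NULL cases (no rows at all, and rows whose values are all NULL) can be observed in a short sketch, assuming Python's sqlite3 module, where SQL NULL arrives as None, and a hypothetical employees table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("sales", 50000), ("sales", 70000), ("ops", None)],
)

# No matching rows: COUNT(*) is 0, the other aggregates are NULL (None in Python).
empty = conn.execute(
    "SELECT COUNT(*), SUM(salary), AVG(salary), COALESCE(AVG(salary), 0) "
    "FROM employees WHERE dept = 'unknown'"
).fetchone()

# One matching row whose salary is NULL: COUNT(salary) is 0, SUM is NULL.
all_null = conn.execute(
    "SELECT COUNT(*), COUNT(salary), SUM(salary) FROM employees WHERE dept = 'ops'"
).fetchone()

print(empty)     # (0, None, None, 0)
print(all_null)  # (1, 0, None)
```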
GROUP BY Clause
Purpose
Partition rows: groups by column. Aggregate per group. Result: one row per group value.
Example
SELECT dept_id, COUNT(*) AS emp_count, AVG(salary) AS avg_sal
FROM employees
GROUP BY dept_id;
Result
One row per department: rows are grouped by dept_id, and COUNT and AVG are computed separately for each group. Rows with a NULL dept_id form one group of their own.
Multiple Columns
GROUP BY dept_id, salary_level
(one group per distinct combination of dept_id and salary_level)
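A sketch of grouping on one column and then on two, assuming Python's sqlite3 module and a hypothetical employees table with a salary_level column. ORDER BY is added because the order of groups is otherwise unspecified:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (dept_id INTEGER, salary_level TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (10, "junior", 50000),
        (10, "senior", 90000),
        (10, "senior", 70000),
        (20, "junior", 40000),
        (20, "junior", 60000),
    ],
)

# One row per department.
per_dept = conn.execute(
    "SELECT dept_id, COUNT(*), AVG(salary) FROM employees GROUP BY dept_id ORDER BY dept_id"
).fetchall()

# One row per (dept_id, salary_level) combination that actually occurs.
per_combo = conn.execute(
    "SELECT dept_id, salary_level, COUNT(*) FROM employees "
    "GROUP BY dept_id, salary_level ORDER BY dept_id, salary_level"
).fetchall()

print(per_dept)   # [(10, 3, 70000.0), (20, 2, 50000.0)]
print(per_combo)  # [(10, 'junior', 1), (10, 'senior', 2), (20, 'junior', 2)]
```

No (20, 'senior') row appears: GROUP BY produces groups only for combinations present in the data.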
Restrictions
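SELECT list: grouped columns or aggregates only. A non-grouped column is ambiguous (which row's value should the group show?). The SQL standard rejects it unless it is functionally dependent on the grouping columns. MySQL enforces this when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since 5.7); SQLite accepts such "bare" columns and takes the value from an arbitrary row of the group.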
HAVING Clause
Purpose
Filter groups: WHERE filters rows, HAVING filters groups. Applied: after aggregation.
Example
SELECT dept_id, COUNT(*) AS emp_count
FROM employees
GROUP BY dept_id
HAVING COUNT(*) > 5; -- repeat the aggregate: a SELECT alias in HAVING works only in some engines (e.g., MySQL)
Difference
WHERE: row-level (before GROUP BY). HAVING: group-level (after GROUP BY). Put conditions on individual rows in WHERE so fewer rows reach the grouping step; use HAVING for conditions on aggregates.
Conditions
HAVING AVG(salary) > 60000 AND COUNT(*) >= 3
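The difference in timing can be seen in a sketch, assuming Python's sqlite3 module and hypothetical salaries chosen so that each filter removes something different:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (dept_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        (10, 65000), (10, 70000), (10, 55000),               # avg 63333.33, 3 people
        (20, 80000), (20, 90000),                            # avg 85000, only 2 people
        (30, 40000), (30, 45000), (30, 50000), (30, 42000),  # avg 44250
    ],
)

# HAVING filters whole groups after aggregation.
group_filtered = conn.execute(
    "SELECT dept_id, COUNT(*) FROM employees GROUP BY dept_id "
    "HAVING AVG(salary) > 60000 AND COUNT(*) >= 3 ORDER BY dept_id"
).fetchall()

# WHERE filters individual rows before grouping; dept 30 loses every row and vanishes.
row_filtered = conn.execute(
    "SELECT dept_id, COUNT(*) FROM employees WHERE salary > 50000 "
    "GROUP BY dept_id ORDER BY dept_id"
).fetchall()

print(group_filtered)  # [(10, 3)]
print(row_filtered)    # [(10, 3), (20, 2)]
```

Department 20 passes the salary test but fails COUNT(*) >= 3, so only HAVING can exclude it.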
Window Functions
Concept
Aggregates without reducing rows. Each row: includes aggregate (e.g., running total). Advanced: SUM(...) OVER (ORDER BY date).
Example
SELECT emp_id, salary, SUM(salary) OVER (ORDER BY emp_id) AS running_total
FROM employees;
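A sketch of the running-total query, assuming Python's sqlite3 module linked against SQLite 3.25.0 or newer (the first release with window functions) and a hypothetical three-row table. A second window with an empty OVER () shows a whole-table total repeated on each row:

```python
import sqlite3

# Assumes the SQLite library behind Python is 3.25.0+ (window function support).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, 100), (2, 200), (3, 300)])

rows = conn.execute(
    """
    SELECT emp_id,
           salary,
           SUM(salary) OVER (ORDER BY emp_id) AS running_total,  -- rows up to this one
           SUM(salary) OVER ()                AS grand_total     -- whole table, repeated
    FROM employees
    ORDER BY emp_id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 100, 100, 600)
# (2, 200, 300, 600)
# (3, 300, 600, 600)
```

All three input rows survive, unlike a plain SUM(salary), which would return one row.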
Advantages
Ranking (ROW_NUMBER, RANK), running totals, and comparisons with neighboring rows (LAG, LEAD), all in one query while keeping each row's detail.
Complex
Advanced topic: requires separate study. Beyond scope: basic aggregations sufficient for now.
Performance Considerations
Full Table Scan
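Aggregations usually read every qualifying row, so they can be slow on large tables. Some cases avoid a full scan: MIN/MAX on an indexed column can often be read from one end of the index, and COUNT(*) may scan a smaller index instead of the table, depending on the optimizer.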
Index Usage
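An index on the GROUP BY columns lets the engine read rows already in group order, avoiding a separate sort or hash step. A covering index (one containing every referenced column) avoids reading the table itself. Ordering by an aggregate result generally cannot use an index, since the values are computed at query time. The optimizer decides whether an index is used.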
Materialized Views
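Pre-compute aggregations and store the results. The stored results must be refreshed (periodically or on demand) to reflect new data, so reads can be stale between refreshes. Trade-off: cheaper queries in exchange for storage, refresh cost, and staleness. Support varies by engine: PostgreSQL and Oracle provide materialized views; MySQL has no native equivalent.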
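SQLite has no materialized views either, so the sketch below emulates one with an ordinary summary table rebuilt by a hypothetical refresh_summary helper (table and function names are illustrative, not a standard API). It shows the key property: the cached aggregate stays stale until it is refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 500), (1, 700), (2, 300)])
# Hypothetical summary table standing in for a materialized view.
conn.execute("CREATE TABLE sales_summary (product_id INTEGER PRIMARY KEY, revenue INTEGER)")


def refresh_summary(conn):
    """Full refresh: recompute the cached aggregate from the base table."""
    with conn:  # commit the delete and re-insert together as one transaction
        conn.execute("DELETE FROM sales_summary")
        conn.execute(
            "INSERT INTO sales_summary "
            "SELECT product_id, SUM(amount) FROM sales GROUP BY product_id"
        )


def read_summary(conn):
    """Cheap read: no aggregation at query time."""
    return conn.execute(
        "SELECT product_id, revenue FROM sales_summary ORDER BY product_id"
    ).fetchall()


refresh_summary(conn)
before = read_summary(conn)  # [(1, 1200), (2, 300)]

conn.execute("INSERT INTO sales VALUES (2, 400)")  # new base data arrives
stale = read_summary(conn)   # still [(1, 1200), (2, 300)] until the next refresh

refresh_summary(conn)
after = read_summary(conn)   # [(1, 1200), (2, 700)]
```

A full rebuild is the simplest refresh strategy; engines with native support may offer other options.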
Optimization
Avoid: unnecessary aggregations. Selective: filter before (WHERE clause). Index: GROUP BY columns (helpful).
Practical Examples
Sales Summary
SELECT
product_id,
COUNT(*) AS orders,
SUM(amount) AS revenue,
AVG(amount) AS avg_order
FROM sales
GROUP BY product_id
HAVING SUM(amount) > 10000
ORDER BY revenue DESC;
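The sales summary can be run end to end in a sketch, assuming Python's sqlite3 module and a hypothetical sales table with three products, one of which falls below the HAVING threshold:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 6000), (1, 5000), (2, 3000), (2, 2000), (3, 12000)],
)

report = conn.execute(
    """
    SELECT
        product_id,
        COUNT(*)    AS orders,
        SUM(amount) AS revenue,
        AVG(amount) AS avg_order
    FROM sales
    GROUP BY product_id
    HAVING SUM(amount) > 10000      -- product 2 (5000 total) is dropped here
    ORDER BY revenue DESC
    """
).fetchall()
print(report)  # [(3, 1, 12000, 12000.0), (1, 2, 11000, 5500.0)]
```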
Department Statistics
SELECT
d.dept_name,
COUNT(*) AS emp_count,
AVG(e.salary) AS avg_salary,
MIN(e.salary) AS min_salary,
MAX(e.salary) AS max_salary
FROM employees AS e
JOIN departments AS d ON e.dept_id = d.dept_id
GROUP BY d.dept_id, d.dept_name; -- qualify dept_id: both tables have that column
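A sketch of the department report, assuming Python's sqlite3 module and hypothetical departments and employees tables. The query qualifies dept_id with table aliases, because both tables have a dept_id column and an unqualified reference is ambiguous. Because this is an inner join, a department with no employees (Legal here) does not appear:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(10, "Engineering"), (20, "Sales"), (30, "Legal")],
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 10, 90000), (2, 10, 70000), (3, 20, 50000)],
)

stats = conn.execute(
    """
    SELECT
        d.dept_name,
        COUNT(*)      AS emp_count,
        AVG(e.salary) AS avg_salary,
        MIN(e.salary) AS min_salary,
        MAX(e.salary) AS max_salary
    FROM employees AS e
    JOIN departments AS d ON e.dept_id = d.dept_id
    GROUP BY d.dept_id, d.dept_name
    ORDER BY d.dept_name
    """
).fetchall()
print(stats)
# [('Engineering', 2, 80000.0, 70000, 90000), ('Sales', 1, 50000.0, 50000, 50000)]
```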
Time-Based Aggregation
SELECT
DATE(order_date) AS order_day, -- DATE() exists in MySQL and SQLite; elsewhere use CAST(order_date AS DATE)
COUNT(*) AS daily_orders,
SUM(total) AS daily_revenue
FROM orders
GROUP BY DATE(order_date)
ORDER BY order_day DESC;
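A sketch of the daily rollup, assuming Python's sqlite3 module, where timestamps are stored as ISO-8601 text and SQLite's DATE() keeps only the calendar-date part (the orders data is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, total INTEGER)")  # timestamps as ISO text
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2024-03-01 09:15:00", 40), ("2024-03-01 17:30:00", 60), ("2024-03-02 11:00:00", 25)],
)

daily = conn.execute(
    """
    SELECT
        DATE(order_date) AS order_day,   -- drops the time-of-day part
        COUNT(*)         AS daily_orders,
        SUM(total)       AS daily_revenue
    FROM orders
    GROUP BY DATE(order_date)
    ORDER BY order_day DESC
    """
).fetchall()
print(daily)  # [('2024-03-02', 1, 25), ('2024-03-01', 2, 100)]
```

The two March 1 orders fall into one group once the time of day is removed.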