Introduction

Aggregations: reduce multiple rows to a single value. Summarize: COUNT rows, SUM amounts, AVG values. Essential for analytics: reporting, dashboards. GROUP BY: one aggregate result per group. Window functions: advanced (aggregates computed without collapsing rows).

Foundation: fundamental query operation. Most reporting queries: rely on aggregation. Understanding: essential skill.

"Aggregations transform raw data into insights. Count, sum, average: simple yet powerful. Mastery: foundation of reporting and analytics." -- SQL fundamentals

Basic Aggregate Functions

Function Overview

COUNT: number of rows/values. SUM: total of values. AVG: average value. MIN: smallest value. MAX: largest value. All: reduce multiple input values to a single result.

Syntax

SELECT
 COUNT(*) AS row_count,
 COUNT(column) AS non_null_count,
 SUM(salary) AS total,
 AVG(salary) AS average,
 MIN(salary) AS minimum,
 MAX(salary) AS maximum
FROM employees;

Scalar Result

Result: single row (aggregates over the whole table) unless GROUP BY is used. Numeric result: COUNT, SUM, AVG. MIN, MAX: return a value of the input's type (any comparable type).
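A minimal sketch of the single-row result, assuming Python's built-in sqlite3 module and an in-memory database. The sample rows are hypothetical; the table and column names follow the query above.

```python
import sqlite3

# Hypothetical sample data; one salary is NULL on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees (emp_id, salary) VALUES (?, ?)",
    [(1, 40000), (2, 50000), (3, 60000), (4, None)],
)

rows = conn.execute(
    """
    SELECT
        COUNT(*)      AS row_count,
        COUNT(salary) AS non_null_count,
        SUM(salary)   AS total,
        AVG(salary)   AS average,
        MIN(salary)   AS minimum,
        MAX(salary)   AS maximum
    FROM employees
    """
).fetchall()

# Without GROUP BY the whole table collapses into exactly one row.
print(len(rows))  # 1
print(rows[0])    # (4, 3, 150000, 50000.0, 40000, 60000)
conn.close()
```

The NULL salary is counted by COUNT(*) but skipped by every other aggregate in the row.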

COUNT Function

Variations

COUNT(*): all rows (including rows that contain NULLs). COUNT(column): non-NULL values only. COUNT(DISTINCT column): unique non-NULL values.

Examples

SELECT COUNT(*) FROM employees; -- Total rows
SELECT COUNT(manager_id) FROM employees; -- Non-NULL managers
SELECT COUNT(DISTINCT dept_id) FROM employees; -- Unique departments

Usage

Count all: row count. Count specific: non-NULL values (compare with COUNT(*) to detect missing values). Count distinct: unique values (cardinality).
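The three COUNT forms can be compared side by side. This is a small sketch using Python's sqlite3 with hypothetical rows; the difference COUNT(*) - COUNT(manager_id) is one way to count missing values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, manager_id INTEGER, dept_id INTEGER)"
)
# Hypothetical rows: two employees have no manager, one has no department.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, None, 10), (2, 1, 10), (3, 1, 20), (4, 2, 20), (5, None, None)],
)

total, with_manager, departments = conn.execute(
    """
    SELECT COUNT(*), COUNT(manager_id), COUNT(DISTINCT dept_id)
    FROM employees
    """
).fetchone()

missing_managers = total - with_manager  # rows where manager_id IS NULL

print(total, with_manager, departments)  # 5 3 2
print(missing_managers)                  # 2
conn.close()
```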

SUM and AVG Functions

SUM

Adds values: numeric only. NULLs ignored. Example: SELECT SUM(salary) FROM employees (total payroll).

AVG

Average: SUM / COUNT (non-NULL). NULLs excluded. Example: SELECT AVG(salary) FROM employees.

Distinction

AVG(column) ≠ SUM(column) / COUNT(*) when the column contains NULLs: COUNT(*) counts every row, AVG divides by the non-NULL count only. Careful: different results.

Example

Values: [100, 200, NULL, 300]
SUM: 600
COUNT(*): 4
COUNT(salary): 3
AVG: 600/3 = 200 (NOT 600/4 = 150)
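The same four values can be checked against a real engine. The sketch below assumes Python's sqlite3 and a hypothetical one-column table; the `* 1.0` forces floating-point division, since SQLite divides integers as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (salary INTEGER)")
conn.executemany(
    "INSERT INTO employees (salary) VALUES (?)",
    [(100,), (200,), (None,), (300,)],
)

result = conn.execute(
    """
    SELECT
        SUM(salary),                   -- NULL is skipped
        COUNT(*),                      -- counts all four rows
        COUNT(salary),                 -- counts the three non-NULL values
        AVG(salary),                   -- 600 / 3
        SUM(salary) * 1.0 / COUNT(*)   -- 600 / 4, a different number
    FROM employees
    """
).fetchone()

print(result)  # (600, 4, 3, 200.0, 150.0)
conn.close()
```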

MIN and MAX Functions

Definition

MIN: smallest value. MAX: largest value. Works: numeric, string, date. Ignores: NULLs.

Examples

SELECT
 MIN(salary) AS min_salary,
 MAX(salary) AS max_salary
FROM employees;

String Comparison

String: lexicographic, per the column's collation ('Alice' < 'Bob'); case sensitivity depends on the collation. Date: chronological (MIN = earliest, MAX = latest). Useful: range detection.
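A short sketch of MIN and MAX on non-numeric columns, assuming Python's sqlite3. SQLite has no dedicated date type, so the hypothetical hire_date column stores ISO 8601 text, which sorts chronologically; other engines would use a native DATE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, hire_date TEXT)")
# Hypothetical rows; Carol's hire date is unknown (NULL).
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Bob", "2021-03-15"), ("Alice", "2019-07-01"), ("Carol", None)],
)

result = conn.execute(
    """
    SELECT MIN(name), MAX(name), MIN(hire_date), MAX(hire_date)
    FROM employees
    """
).fetchone()

# Names compare alphabetically, dates chronologically; the NULL date is ignored.
print(result)  # ('Alice', 'Carol', '2019-07-01', '2021-03-15')
conn.close()
```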

NULL Handling in Aggregates

General Rule

Aggregates ignore NULLs (except COUNT(*), which counts rows). SUM over [1, 2, NULL, 4] = 7 (not NULL). SUM/AVG/MIN/MAX: still NULL if all values are NULL (or the set is empty); COUNT(column) returns 0 in that case.

Empty Set

No rows: COUNT(*) = 0. SUM/AVG/MIN/MAX = NULL (no data). Important: distinguish zero from no data.

COALESCE Usage

SELECT COALESCE(AVG(salary), 0) AS average_salary
FROM employees
WHERE dept = 'unknown';

Result: 0 if no employees match (NULL converted to 0).
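The empty-set behavior can be observed directly. This sketch assumes Python's sqlite3, where SQL NULL is returned as Python's None; the single sample row is hypothetical and deliberately does not match the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (dept TEXT, salary INTEGER)")
conn.execute("INSERT INTO employees VALUES ('sales', 50000)")

result = conn.execute(
    """
    SELECT
        COUNT(*),                  -- 0: counting an empty set is well defined
        SUM(salary),               -- NULL: nothing to add
        AVG(salary),               -- NULL: nothing to average
        COALESCE(AVG(salary), 0)   -- NULL replaced by a default
    FROM employees
    WHERE dept = 'unknown'
    """
).fetchone()

print(result)  # (0, None, None, 0)
conn.close()
```

Whether 0 is a sensible default depends on the report: it hides the difference between "average is zero" and "no data".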

GROUP BY Clause

Purpose

Partition rows: into groups by column value. Aggregate: computed per group. Result: one row per distinct group value (NULLs form a single group).

Example

SELECT dept_id, COUNT(*) AS emp_count, AVG(salary) AS avg_sal
FROM employees
GROUP BY dept_id;

Result

One row per department. COUNT and AVG: computed separately over each department's rows.
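A runnable sketch of the per-department result, assuming Python's sqlite3 and hypothetical rows. An ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (dept_id INTEGER, salary INTEGER)")
# Hypothetical rows: department 20 contains one NULL salary.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(10, 50000), (10, 70000), (20, 40000), (20, None)],
)

rows = conn.execute(
    """
    SELECT dept_id, COUNT(*) AS emp_count, AVG(salary) AS avg_sal
    FROM employees
    GROUP BY dept_id
    ORDER BY dept_id
    """
).fetchall()

# Two input departments, so two output rows.
print(rows)  # [(10, 2, 60000.0), (20, 2, 40000.0)]
conn.close()
```

Department 20 has two rows but its average uses only the one non-NULL salary.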

Multiple Columns

GROUP BY dept_id, salary_level
(groups on both columns: combinations)

Restrictions

SELECT list: grouped columns or aggregates only. Non-grouped column: ambiguous (which row's value?). Enforced: by most databases (e.g., PostgreSQL; MySQL when the ONLY_FULL_GROUP_BY SQL mode is enabled, the default since 5.7.5).

HAVING Clause

Purpose

Filter groups: WHERE filters rows, HAVING filters groups. Applied: after aggregation.

Example

SELECT dept_id, COUNT(*) AS emp_count
FROM employees
GROUP BY dept_id
HAVING COUNT(*) > 5; -- repeat the aggregate: referencing a SELECT alias in HAVING is not portable

Difference

WHERE: row-level (before GROUP BY). HAVING: group-level (after GROUP BY). Different purposes: use appropriately.

Conditions

HAVING AVG(salary) > 60000 AND COUNT(*) >= 3
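To show WHERE and HAVING acting at different stages, here is a small sketch assuming Python's sqlite3 and hypothetical salaries. The thresholds are chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (dept_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        (10, 50000), (10, 70000), (10, 20000),
        (20, 40000), (20, 25000),
        (30, 90000), (30, 80000),
    ],
)

rows = conn.execute(
    """
    SELECT dept_id, COUNT(*) AS emp_count, AVG(salary) AS avg_sal
    FROM employees
    WHERE salary >= 30000                          -- row filter, before grouping
    GROUP BY dept_id
    HAVING COUNT(*) >= 2 AND AVG(salary) > 60000   -- group filter, after aggregation
    ORDER BY dept_id
    """
).fetchall()

# Dept 10: two rows survive WHERE, average exactly 60000, rejected by HAVING.
# Dept 20: one row survives WHERE, rejected by HAVING.
# Dept 30: two rows, average 85000, kept.
print(rows)  # [(30, 2, 85000.0)]
conn.close()
```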

Window Functions

Concept

Aggregates without reducing rows. Each row: includes aggregate (e.g., running total). Advanced: SUM(...) OVER (ORDER BY date).

Example

SELECT emp_id, salary, SUM(salary) OVER (ORDER BY emp_id) AS running_total
FROM employees;
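A sketch of the running total, assuming Python's sqlite3 linked against SQLite 3.25 or newer (the first release with window functions). The rows are hypothetical; the extra grand_total column uses an empty OVER () to show an aggregate repeated on every row.

```python
import sqlite3

# Assumption: the bundled SQLite supports window functions (3.25+).
assert sqlite3.sqlite_version_info >= (3, 25, 0), "SQLite too old for window functions"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 100), (2, 200), (3, 300)],
)

rows = conn.execute(
    """
    SELECT
        emp_id,
        salary,
        SUM(salary) OVER (ORDER BY emp_id) AS running_total,
        SUM(salary) OVER ()                AS grand_total
    FROM employees
    ORDER BY emp_id
    """
).fetchall()

# Three input rows stay three output rows; each carries its aggregates.
print(rows)  # [(1, 100, 100, 600), (2, 200, 300, 600), (3, 300, 600, 600)]
conn.close()
```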

Advantages

Ranking, running totals, lead/lag: sophisticated analysis. Powerful: detailed insights.

Complex

Advanced topic: requires separate study. Beyond scope: basic aggregations sufficient for now.

Performance Considerations

Full Table Scan

Aggregations over a whole table typically: read every row (or every entry of a covering index). Exception: MIN/MAX on an indexed column can often be answered from the ends of the index. Large tables: slow.

Index Usage

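Index on GROUP BY column: rows can be read in group order, avoiding a separate sort or hash step. ORDER BY on an aggregate result: usually still needs a sort (the value is computed per group). Optimizer decides.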

Materialized Views

Pre-compute aggregations: cache results. Refresh: periodic or on demand (results are stale until refreshed). Reduces: query cost (trades storage and refresh work for read speed).
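SQLite has no materialized views, so the sketch below emulates one with a summary table that is rebuilt on demand. It assumes Python's sqlite3; the table name sales_summary and the helper refresh_summary are hypothetical. Engines with native support (for example PostgreSQL's CREATE MATERIALIZED VIEW) replace the helper with a built-in refresh command.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 100), (1, 250), (2, 400)])


def refresh_summary(connection):
    """Rebuild the pre-computed aggregate table from the base table."""
    connection.execute("DROP TABLE IF EXISTS sales_summary")
    connection.execute(
        """
        CREATE TABLE sales_summary AS
        SELECT product_id, COUNT(*) AS orders, SUM(amount) AS revenue
        FROM sales
        GROUP BY product_id
        """
    )


def revenue_for(connection, product_id):
    """Read the cached revenue without touching the base table."""
    return connection.execute(
        "SELECT revenue FROM sales_summary WHERE product_id = ?", (product_id,)
    ).fetchone()[0]


refresh_summary(conn)
conn.execute("INSERT INTO sales VALUES (2, 600)")  # new sale after the refresh

stale = revenue_for(conn, 2)   # summary has not seen the new sale yet
refresh_summary(conn)
fresh = revenue_for(conn, 2)   # summary rebuilt, new sale included

print(stale, fresh)  # 400 1000
conn.close()
```

The gap between the two numbers is the staleness trade-off: reads are cheap, but only as current as the last refresh.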

Optimization

Avoid: unnecessary aggregations. Selective: filter rows before aggregating (WHERE clause). Index: GROUP BY columns (helpful).

Practical Examples

Sales Summary

SELECT
 product_id,
 COUNT(*) AS orders,
 SUM(amount) AS revenue,
 AVG(amount) AS avg_order
FROM sales
GROUP BY product_id
HAVING SUM(amount) > 10000
ORDER BY revenue DESC;

Department Statistics

SELECT
 dept_name,
 COUNT(*) AS emp_count,
 AVG(salary) AS avg_salary,
 MIN(salary) AS min_salary,
 MAX(salary) AS max_salary
FROM employees
JOIN departments ON employees.dept_id = departments.dept_id
GROUP BY departments.dept_id, departments.dept_name; -- dept_id exists in both tables, so it must be qualified

Time-Based Aggregation

SELECT
 DATE(order_date) AS order_day,
 COUNT(*) AS daily_orders,
 SUM(total) AS daily_revenue
FROM orders
GROUP BY DATE(order_date)
ORDER BY order_day DESC;
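The daily rollup can be tried with a few timestamps. This sketch assumes Python's sqlite3, where DATE() truncates an ISO 8601 timestamp string to its day; the order rows are hypothetical, and other engines may spell the truncation differently (for example CAST(order_date AS DATE)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2024-01-15 09:00:00", 20),
        ("2024-01-15 17:30:00", 30),
        ("2024-01-16 12:00:00", 50),
    ],
)

rows = conn.execute(
    """
    SELECT
        DATE(order_date) AS order_day,
        COUNT(*)         AS daily_orders,
        SUM(total)       AS daily_revenue
    FROM orders
    GROUP BY DATE(order_date)
    ORDER BY order_day DESC
    """
).fetchall()

# Two orders on the 15th collapse into one row; newest day comes first.
print(rows)  # [('2024-01-16', 1, 50), ('2024-01-15', 2, 50)]
conn.close()
```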
