Introduction
Arrays are fundamental data structures: contiguous block of memory holding fixed-size elements of uniform type. Simplest abstract data type: sequence of elements indexed 0, 1, 2, ..., n-1. Fast random access (constant time) distinguishes arrays from linked structures.
Arrays ubiquitous: every programming language provides them. Implementation choices: fixed-size (memory allocated at creation) vs. dynamic (grows as needed). Most languages hide complexity: "arrays" are dynamic (vector in C++, ArrayList in Java, list in Python).
Conceptual foundation: arrays enable algorithms. Searching, sorting, dynamic programming all assume array-like structures. Understanding array mechanics essential for efficient programming.
"Arrays are the most basic and important data structure. Mastery of arrays and their operations is prerequisite for understanding all higher-level structures." -- Algorithms textbook fundamentals
Memory Layout and Indexing
Contiguous Allocation
Array elements stored sequentially in memory. If array starts at address 1000 and element size is 4 bytes: element 0 at 1000, element 1 at 1004, element 2 at 1008, etc. Address of element i: base_address + i * element_size.
Memory layout (4-byte integers):

```
Address:  1000  1004  1008  1012  1016
Value:       5    12    -3     8    14
Index:       0     1     2     3     4
```

Access arr[2]: compute 1000 + 2*4 = 1008, fetch value -3.
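The address formula can be observed from Python with the stdlib `array` and `ctypes` modules. A minimal sketch (CPython-specific in that `buffer_info()` exposes the raw buffer address):

```python
from array import array
import ctypes

a = array('i', [5, 12, -3, 8, 14])   # contiguous buffer of C ints
base, count = a.buffer_info()        # (raw buffer address, element count)

# Address of element i: base + i * element_size
addr = base + 2 * a.itemsize
value = ctypes.c_int.from_address(addr).value
print(value)                         # -3, same as a[2]
```

Reading memory through raw addresses is for illustration only; normal Python code should just index the array.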
Zero-Based Indexing
Most languages use 0-based indexing (first element is index 0). Some older languages (e.g., Fortran) default to 1-based. 0-based is natural given the memory layout: offset 0 = first element.
Random Access
Array enables O(1) access to any element. Address calculation constant time (one multiplication, one addition). No traversal needed unlike linked lists.
Multi-Dimensional Arrays
2D array [rows][cols] stored in memory in two ways:
Row-major (C, Python): [0][0], [0][1], ..., [0][cols-1], [1][0], [1][1], ...
Column-major (Fortran, MATLAB): [0][0], [1][0], ..., [rows-1][0], [0][1], [1][1], ...
Element [i][j] address (row-major): base + (i*cols + j)*element_size
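The same formula drives row-major flattening of a 2D grid into a 1D buffer. A sketch (the `at` helper is illustrative, not a standard API):

```python
rows, cols = 2, 3
grid = [[1, 2, 3],
        [4, 5, 6]]

# Row-major flattening: element [i][j] lands at flat index i*cols + j
flat = [grid[i][j] for i in range(rows) for j in range(cols)]

def at(buf, i, j, cols):
    """Row-major lookup into a flat buffer."""
    return buf[i * cols + j]

print(flat)                  # [1, 2, 3, 4, 5, 6]
print(at(flat, 1, 2, cols))  # 6, same as grid[1][2]
```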
Fixed-Size vs. Dynamic Arrays
Fixed-Size Arrays
Size determined at creation. Memory allocated once, can't grow. Example: C arrays. Efficient (no resizing overhead), inflexible (if you need more elements, must allocate new array and copy).
Dynamic Arrays
Grow as elements added. Underlying implementation: initially allocate larger-than-needed capacity. When full, allocate bigger array, copy elements, deallocate old. Trade-off: space overhead (extra capacity) for growth capability.
Initial capacity: 4
[a, b, c, d] (4 slots, all used: full)
Add 5th element: allocate 8-slot array, copy, insert
[a, b, c, d, e, _, _, _]
Add 9th element (8 slots full again): allocate 16-slot array, copy, insert
[a, b, c, d, e, f, g, h, i, _, _, _, _, _, _, _]
Add 10th element: [a, b, c, d, e, f, g, h, i, j, _, _, _, _, _, _]
Growth Strategies
| Strategy | Growth Rule | Space Overhead | Amortized Append |
|---|---|---|---|
| Linear growth | capacity += k | ≤ k unused slots | O(n) |
| Doubling | capacity *= 2 | ≤ 2x elements in use | O(1) |
| Golden ratio | capacity *= 1.618 | ≤ ~1.618x elements in use | O(1) |
Doubling Analysis
With doubling: amortized O(1) append. Why? n appends cost: 1 + 2 + 4 + 8 + ... + n ≈ 2n (geometric series). Total cost O(n) for n operations = O(1) per operation.
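A minimal dynamic-array sketch that counts element copies, making the geometric series above observable (the `DynArray` class and its `copies` counter are illustrative, not a library API):

```python
class DynArray:
    """Minimal dynamic array sketch: doubling growth, append only."""
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity
        self.copies = 0                     # total elements copied during resizes

    def append(self, x):
        if self.size == self.capacity:      # full: double capacity and copy over
            new = [None] * (2 * self.capacity)
            for k in range(self.size):
                new[k] = self.data[k]
            self.copies += self.size
            self.data, self.capacity = new, 2 * self.capacity
        self.data[self.size] = x
        self.size += 1

arr = DynArray()
n = 1000
for v in range(n):
    arr.append(v)
# Copies form the geometric series 1 + 2 + 4 + ... < 2n: O(1) amortized
print(arr.copies, arr.capacity)   # 1023 1024
```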
Basic Operations: Access, Insert, Delete
Access
arr[i]: O(1) time, 0 space
Compute address: base + i*element_size, fetch value
One CPU operation (plus memory load)
Insert
Insert element at position i:
1. Shift elements [i, n-1] to [i+1, n]
2. Place element at i
Time: O(n) (worst case: shift all)
Space: O(1) (in-place, if capacity available)
Delete
Delete element at position i:
1. Remove element at i
2. Shift elements [i+1, n-1] to [i, n-2]
Time: O(n) (shift remaining)
Space: O(1)
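The shifting steps above can be sketched over a plain Python list used as a fixed-capacity buffer (`insert_at` and `delete_at` are illustrative helpers):

```python
def insert_at(buf, size, i, x):
    """Insert x at index i in buf[:size]; buf must have a spare slot.
    Shifts buf[i:size] right by one. Returns the new size."""
    assert size < len(buf)
    for k in range(size, i, -1):     # shift right, back to front
        buf[k] = buf[k - 1]
    buf[i] = x
    return size + 1

def delete_at(buf, size, i):
    """Delete index i from buf[:size] by shifting buf[i+1:size] left."""
    for k in range(i, size - 1):
        buf[k] = buf[k + 1]
    buf[size - 1] = None             # clear the vacated slot
    return size - 1

buf = [10, 20, 30, None]             # capacity 4, size 3
size = insert_at(buf, 3, 1, 15)      # [10, 15, 20, 30]
size = delete_at(buf, size, 2)       # [10, 15, 30]
print(buf[:size])
```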
Search
Linear search: O(n) time, O(1) space
Binary search (sorted array): O(log n) time
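A standard iterative binary search sketch (Python's stdlib `bisect` module provides production-quality variants):

```python
def binary_search(arr, target):
    """Return an index of target in sorted arr, or -1 if absent. O(log n)."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # halve the search range each step
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

data = [2, 3, 5, 8, 13, 21]
print(binary_search(data, 8))    # 3
print(binary_search(data, 7))    # -1
```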
Time and Space Complexity Analysis
| Operation | Time | Space | Notes |
|---|---|---|---|
| Access arr[i] | O(1) | O(1) | Direct address computation |
| Insert at end | O(1) amortized | O(1) | Dynamic array doubling |
| Insert at middle | O(n) | O(1) | Shift elements |
| Delete from end | O(1) | O(1) | No shift needed |
| Delete from middle | O(n) | O(1) | Shift remaining |
| Linear search | O(n) | O(1) | No assumptions |
| Binary search | O(log n) | O(1) or O(log n) | Requires sorted array |
| Sort | O(n log n) | O(log n) or O(n) | Merge sort, quicksort |
Space Complexity
Array storage: O(n) for n elements. Additional space: operations like sorting in-place use O(log n) recursion stack or O(n) for merge sort temp arrays. Searching uses O(1) (no extra).
Multidimensional Arrays
2D Arrays (Matrices)
Represented as array of arrays. Logically m x n grid. Physically: contiguous block (row-major or column-major).
2x3 matrix (row-major):
[1, 2, 3]
[4, 5, 6]
Memory: [1, 2, 3, 4, 5, 6] (contiguous)
Address [i][j]: base + (i*3 + j)*element_size
3D and Higher
Extension to 3D and higher. For a d1 x d2 x d3 array stored row-major, element [i][j][k] sits at: base + (i*d2*d3 + j*d3 + k) * element_size. Complexity increases: cache locality suffers when access patterns don't follow storage order.
Jagged Arrays
Rows of different lengths. Array of arrays where each row is independent. Memory not contiguous across rows. Flexibility (vary row sizes) vs. cache inefficiency (non-contiguous).
Jagged array:
[1, 2, 3]
[4, 5]
[6, 7, 8, 9]
Physically: three separate arrays
Space overhead: pointers to each row
Dynamic Array Resizing Strategies
Resize Triggers
Resize when: capacity reached (insert) or too much wasted space (delete). Insert trigger: full. Delete trigger: typically size < capacity/4 (shrink to avoid memory waste).
Copy-On-Grow
Allocate new larger array. Copy all elements O(n). Old array deallocated. Amortized O(1) per insert with doubling (cost 1 + 2 + 4 + ... + n ≈ 2n total for n insertions).
Comparison: Doubling vs. Linear Growth
Doubling: Capacity grows exponentially (1, 2, 4, 8, ...). Fewer resizes (O(log n) total). Space overhead at most 2x. Amortized O(1) insert.
Linear growth (capacity += k): More frequent resizes (O(n/k) total). Total copy cost over n inserts: k + 2k + 3k + ... + n ≈ O(n^2 / k). Expensive! Amortized O(n/k) insert, i.e. linear in n for constant k.
Doubling Disadvantage: Memory Waste
Worst case: just after a resize, capacity = 2n but size = n, so 50% of slots are wasted. Mitigation: a smaller growth factor such as the golden ratio (1.618) caps waste at roughly 38% of slots just after a resize. Trade-off: more frequent resizes vs. space efficiency.
Array Variants and Extensions
Circular Arrays
Treat array as circular: next element after last is first. Useful for queues. Wrap-around using modulo: next_index = (current_index + 1) % capacity.
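A fixed-capacity FIFO queue on a circular array, sketched in Python (`RingQueue` is an illustrative name, not a library class):

```python
class RingQueue:
    """Fixed-capacity FIFO queue backed by a circular array."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0                  # index of the oldest element
        self.size = 0

    def enqueue(self, x):
        if self.size == len(self.buf):
            raise OverflowError("queue full")
        tail = (self.head + self.size) % len(self.buf)   # wrap-around
        self.buf[tail] = x
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("queue empty")
        x = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)      # wrap-around
        self.size -= 1
        return x

q = RingQueue(3)
for v in (1, 2, 3):
    q.enqueue(v)
first = q.dequeue()                       # 1
q.enqueue(4)                              # reuses the slot vacated by 1
rest = [q.dequeue() for _ in range(3)]    # [2, 3, 4]
print(first, rest)
```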
Bit Arrays
Compact storage of booleans. Each bit represents true/false. 8 bits per byte vs. 8 bytes per boolean. Space-efficient, used in set representations, bloom filters.
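A bit-array sketch packing 8 flags per byte into a `bytearray` (`BitArray` is illustrative; real code might use a dedicated library):

```python
class BitArray:
    """Boolean array packed 8 flags per byte."""
    def __init__(self, n):
        self.bytes = bytearray((n + 7) // 8)   # ceil(n / 8) bytes

    def set(self, i, value=True):
        if value:
            self.bytes[i // 8] |= 1 << (i % 8)
        else:
            self.bytes[i // 8] &= ~(1 << (i % 8))

    def get(self, i):
        return bool(self.bytes[i // 8] & (1 << (i % 8)))

bits = BitArray(100)     # 13 bytes instead of 100 separate booleans
bits.set(3)
bits.set(64)
print(bits.get(3), bits.get(4), bits.get(64))   # True False True
```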
Associative Arrays (Hash Tables)
Arrays indexed by arbitrary keys (not just integers). Implemented via hashing: hash the key to an index, use that index. Linear probing or chaining handles collisions. Average O(1) access.
Sparse Arrays
Most elements zero/null. Dense representation wasteful. Sparse: store only non-zero. Coordinate list, compressed row storage (CSR). Trade-off: space vs. access speed.
String Arrays
Arrays of strings (char arrays in C, String[] in Java). Each element is variable-length. Often: array of pointers to strings (strings stored separately).
Common Algorithms on Arrays
Searching
Linear search O(n); binary search O(log n), requires a sorted array. Two-pointer techniques: find a pair summing to a target.
Sorting
Quicksort, merge sort O(n log n). Counting sort O(n + k) if values bounded. In-place methods preferred (minimize space).
Array Rotation
Rotate array left/right by k positions. Naive O(n*k) by shifting. Efficient O(n) using reversal: reverse first k, reverse rest, reverse all.
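The three-reversal trick sketched in Python (in-place, O(n) time, O(1) extra space):

```python
def rotate_left(arr, k):
    """Rotate arr left by k positions in place via three reversals."""
    n = len(arr)
    if n == 0:
        return
    k %= n

    def reverse(lo, hi):               # reverse arr[lo..hi] in place
        while lo < hi:
            arr[lo], arr[hi] = arr[hi], arr[lo]
            lo, hi = lo + 1, hi - 1

    reverse(0, k - 1)                  # reverse first k elements
    reverse(k, n - 1)                  # reverse the rest
    reverse(0, n - 1)                  # reverse everything

data = [1, 2, 3, 4, 5]
rotate_left(data, 2)
print(data)   # [3, 4, 5, 1, 2]
```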
Prefix Sums
Array where prefix[i] = sum of arr[0...i]. Computed O(n) in one pass. Enables range sum queries in O(1) after preprocessing.
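A common variant stores one extra leading zero so range queries need no edge case at the left end; here prefix[i] holds the sum of the first i elements:

```python
def prefix_sums(arr):
    """prefix[i] = arr[0] + ... + arr[i-1], with prefix[0] = 0."""
    prefix = [0] * (len(arr) + 1)
    for i, x in enumerate(arr):
        prefix[i + 1] = prefix[i] + x
    return prefix

def range_sum(prefix, l, r):
    """Sum of arr[l..r] inclusive, O(1) after O(n) preprocessing."""
    return prefix[r + 1] - prefix[l]

arr = [3, 1, 4, 1, 5, 9]
p = prefix_sums(arr)
print(range_sum(p, 1, 4))   # 1 + 4 + 1 + 5 = 11
```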
Two Pointers and Sliding Window
Two pointers traverse array: find pair, partition. Sliding window: maintain window size k, slide across array. Both O(n) total.
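Both patterns sketched in Python (the helper names are illustrative):

```python
def pair_sum(sorted_arr, target):
    """Two pointers on a sorted array: indices of a pair summing to target."""
    lo, hi = 0, len(sorted_arr) - 1
    while lo < hi:
        s = sorted_arr[lo] + sorted_arr[hi]
        if s == target:
            return lo, hi
        elif s < target:
            lo += 1        # need a larger sum
        else:
            hi -= 1        # need a smaller sum
    return None

def max_window_sum(arr, k):
    """Sliding window of size k: max sum of any k consecutive elements."""
    window = sum(arr[:k])
    best = window
    for i in range(k, len(arr)):
        window += arr[i] - arr[i - k]   # slide: add new element, drop old
        best = max(best, window)
    return best

print(pair_sum([1, 3, 4, 7, 11], 14))         # (1, 4): 3 + 11
print(max_window_sum([2, 1, 5, 1, 3, 2], 3))  # 9: window [5, 1, 3]
```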
Cache Performance and Locality
Cache Lines
Modern CPUs fetch memory in fixed-size cache lines, typically 64 bytes. Accessing arr[0] loads arr[0...7] into cache (assuming 8-byte elements). Sequential access is optimal: each cache miss brings in 8 elements.
Spatial Locality
Programs that access one address tend to access nearby addresses soon after. Arrays have excellent spatial locality (contiguous). Linked lists poor (nodes scattered through memory).
Row-Major vs. Column-Major
Row-major (C, Python): iterating with the column index varying fastest (inner loop over j) is efficient. Column-major (Fortran): iterate with the row index varying fastest. Mismatching loop order and storage order causes many cache misses.

Row-major traversal (efficient):

```python
total = 0
for i in range(rows):
    for j in range(cols):
        total += arr[i][j]      # walks memory sequentially
```

Column-major traversal (inefficient with row-major storage):

```python
total = 0
for j in range(cols):
    for i in range(rows):
        total += arr[i][j]      # strides cols elements per access: cache misses
```
Practical Implications
Array algorithms often run faster than asymptotic analysis predicts (caches help). Linked-list algorithms often run slower. Constants matter in practice: a cache-friendly O(n) linear scan over a small array can beat an O(log n) BST lookup.
Applications
Dynamic Programming
DP tables: arrays storing subproblem solutions. Fibonacci: arr[i] = arr[i-1] + arr[i-2]. Enables reuse of computations.
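The Fibonacci recurrence as a bottom-up DP table, a minimal sketch:

```python
def fib(n):
    """Bottom-up DP: table[i] stores the i-th Fibonacci number."""
    if n < 2:
        return n
    table = [0] * (n + 1)
    table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]   # reuse stored subproblems
    return table[n]

print(fib(10))   # 55
```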
Sorting and Searching
Most algorithms assume arrays. QuickSort partitions in-place. Binary search requires array access pattern.
Graphics and Image Processing
Pixel arrays (2D). Matrix operations (3D rotations). Efficient array manipulation critical for performance.
Database Indexing
B-trees and hash tables store keys in arrays at the node/bucket level. Fast data retrieval depends on array properties.
Mathematical Computing
NumPy, MATLAB use arrays (numpy.array, matrix). Linear algebra: matrix multiplication, decomposition. High performance via vectorization.
References
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. "Introduction to Algorithms." MIT Press, 3rd edition, 2009.
- Knuth, D. E. "The Art of Computer Programming, Volume 1: Fundamental Algorithms." Addison-Wesley, 3rd edition, 1997.
- Sedgewick, R., and Wayne, K. "Algorithms, Part I." Addison-Wesley, 4th edition, 2011.
- Stroustrup, B. "The C++ Programming Language." Addison-Wesley, 4th edition, 2013.
- Gorelick, M., and Ozsvald, I. "High Performance Python." O'Reilly, 2nd edition, 2020.