Introduction

Arrays are fundamental data structures: contiguous blocks of memory holding fixed-size elements of uniform type. Among the simplest abstract data types: a sequence of elements indexed 0, 1, 2, ..., n-1. Fast random access (constant time) distinguishes arrays from linked structures.

Arrays are ubiquitous: every mainstream programming language provides them. Implementation choices: fixed-size (memory allocated once at creation) vs. dynamic (grows as needed). Most languages hide the complexity: their default "arrays" are dynamic (vector in C++, ArrayList in Java, list in Python).

Conceptual foundation: arrays enable algorithms. Searching, sorting, dynamic programming all assume array-like structures. Understanding array mechanics essential for efficient programming.

"Arrays are the most basic and important data structure. Mastery of arrays and their operations is prerequisite for understanding all higher-level structures." -- Algorithms textbook fundamentals

Memory Layout and Indexing

Contiguous Allocation

Array elements stored sequentially in memory. If array starts at address 1000 and element size is 4 bytes: element 0 at 1000, element 1 at 1004, element 2 at 1008, etc. Address of element i: base_address + i * element_size.

Memory layout (4-byte integers):
Address:  1000  1004  1008  1012  1016
Value:       5    12    -3     8    14
Index:       0     1     2     3     4

Access arr[2]: compute 1000 + 2*4 = 1008, fetch value -3
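The address calculation above can be sketched in Python (a hypothetical helper; real address arithmetic happens in hardware):

```python
def element_address(base, index, element_size):
    # Address of arr[index] = base + index * element_size
    return base + index * element_size

# Reproduce the layout above: 4-byte integers starting at address 1000
addresses = [element_address(1000, i, 4) for i in range(5)]
print(addresses)  # [1000, 1004, 1008, 1012, 1016]
```

One multiplication plus one addition, regardless of the index: this is why array access is O(1).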

Zero-Based Indexing

Most languages use 0-based indexing (first element is index 0). Some older languages (FORTRAN) use 1-based. 0-based is natural from the memory layout: offset 0 = first element.

Random Access

Array enables O(1) access to any element. Address calculation constant time (one multiplication, one addition). No traversal needed unlike linked lists.

Multi-Dimensional Arrays

2D array [rows][cols] stored in memory in two ways:

Row-major (C, NumPy default): [0][0], [0][1], ..., [0][cols-1], [1][0], [1][1], ...
Column-major (Fortran, MATLAB): [0][0], [1][0], ..., [rows-1][0], [0][1], [1][1], ...

Element [i][j] address (row-major): base + (i*cols + j)*element_size
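The two flattening formulas can be checked with small helper functions (illustrative names, not a library API):

```python
def row_major_index(i, j, cols):
    # Flat offset of element [i][j] when rows are stored consecutively
    return i * cols + j

def column_major_index(i, j, rows):
    # Flat offset of element [i][j] when columns are stored consecutively
    return j * rows + i

# A 2x3 matrix visited row by row yields consecutive row-major offsets
rows, cols = 2, 3
flat = [row_major_index(i, j, cols) for i in range(rows) for j in range(cols)]
print(flat)  # [0, 1, 2, 3, 4, 5]
```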

Fixed-Size vs. Dynamic Arrays

Fixed-Size Arrays

Size determined at creation. Memory allocated once, can't grow. Example: C arrays. Efficient (no resizing overhead), inflexible (if you need more elements, must allocate new array and copy).

Dynamic Arrays

Grow as elements added. Underlying implementation: initially allocate larger-than-needed capacity. When full, allocate bigger array, copy elements, deallocate old. Trade-off: space overhead (extra capacity) for growth capability.

Initial capacity: 8
[a, b, c, d, _, _, _, _] (8 slots, 4 used)

Add 5th element:
[a, b, c, d, e, _, _, _]

Add 9th element (all 8 slots full):
Allocate 16-slot array
Copy: [a, b, c, d, e, f, g, h, _, _, _, _, _, _, _, _]
Add element: [a, b, c, d, e, f, g, h, i, _, _, _, _, _, _, _]
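The grow-by-doubling behavior can be sketched as a minimal class (illustrative only; Python's built-in list already does this internally with a different growth factor):

```python
class DynamicArray:
    """Minimal dynamic array with capacity doubling (sketch)."""

    def __init__(self, capacity=8):
        self._data = [None] * capacity  # backing storage
        self._size = 0                  # elements actually in use

    def append(self, value):
        if self._size == len(self._data):            # full: grow
            bigger = [None] * (2 * len(self._data))  # allocate 2x capacity
            for k in range(self._size):              # copy old elements, O(n)
                bigger[k] = self._data[k]
            self._data = bigger                      # old array discarded
        self._data[self._size] = value
        self._size += 1

    def __len__(self):
        return self._size

    def capacity(self):
        return len(self._data)

arr = DynamicArray()
for ch in "abcdefghij":  # 10 appends; the 9th triggers one resize (8 -> 16)
    arr.append(ch)
print(len(arr), arr.capacity())  # 10 16
```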

Growth Strategies

Strategy        Growth rule          Space overhead     Amortized append
Linear growth   capacity += k        <= k extra slots   O(n)
Doubling        capacity *= 2        <= 2x actual size  O(1)
Golden ratio    capacity *= 1.618    ~1.618x actual     O(1)

Doubling Analysis

With doubling: amortized O(1) append. Why? n appends cost: 1 + 2 + 4 + 8 + ... + n ≈ 2n (geometric series). Total cost O(n) for n operations = O(1) per operation.

Basic Operations: Access, Insert, Delete

Access

arr[i]: O(1) time, O(1) extra space
Compute address: base + i*element_size, fetch value
One address computation plus a memory load

Insert

Insert element at position i:
1. Shift elements [i, n-1] to [i+1, n]
2. Place element at i
Time: O(n) (worst case: shift all)
Space: O(1) (in-place, if capacity available)

Delete

Delete element at position i:
1. Shift elements [i+1, n-1] to [i, n-2]
2. Decrement size
Time: O(n) (shift remaining)
Space: O(1)
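Both shift-based operations can be sketched over a fixed-capacity Python list standing in for raw storage (helper names are illustrative):

```python
def insert_at(arr, n, i, value):
    """Insert value at index i in arr (size n, capacity > n). Returns new size."""
    for k in range(n, i, -1):   # shift [i, n-1] right into [i+1, n]
        arr[k] = arr[k - 1]
    arr[i] = value
    return n + 1

def delete_at(arr, n, i):
    """Delete element at index i (size n). Returns new size."""
    for k in range(i, n - 1):   # shift [i+1, n-1] left into [i, n-2]
        arr[k] = arr[k + 1]
    arr[n - 1] = None           # clear the vacated slot
    return n - 1

buf = [5, 12, -3, 8, None]      # capacity 5, size 4
n = insert_at(buf, 4, 1, 99)
print(buf)        # [5, 99, 12, -3, 8]
n = delete_at(buf, n, 2)
print(buf[:n])    # [5, 99, -3, 8]
```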

Search

Linear search: O(n) time, O(1) space
Binary search (sorted array): O(log n) time
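A standard iterative binary search, shown here for concreteness (Python's bisect module provides an equivalent):

```python
def binary_search(arr, target):
    """Return index of target in sorted arr, or -1 if absent. O(log n)."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1    # target is in the right half
        else:
            hi = mid - 1    # target is in the left half
    return -1

print(binary_search([-3, 5, 8, 12, 14], 8))   # 2
print(binary_search([-3, 5, 8, 12, 14], 7))   # -1
```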

Time and Space Complexity Analysis

Operation            Time             Space                Notes
Access arr[i]        O(1)             O(1)                 Direct address computation
Insert at end        O(1) amortized   O(1)                 Dynamic array doubling
Insert at middle     O(n)             O(1)                 Shift elements
Delete from end      O(1)             O(1)                 No shift needed
Delete from middle   O(n)             O(1)                 Shift remaining
Linear search        O(n)             O(1)                 No assumptions
Binary search        O(log n)         O(1) or O(log n)     Requires sorted array
Sort                 O(n log n)       O(log n) or O(n)     Quicksort stack; merge sort buffer

Space Complexity

Array storage: O(n) for n elements. Additional space: operations like sorting in-place use O(log n) recursion stack or O(n) for merge sort temp arrays. Searching uses O(1) (no extra).

Multidimensional Arrays

2D Arrays (Matrices)

Represented as array of arrays. Logically m x n grid. Physically: contiguous block (row-major or column-major).

2x3 matrix (row-major):
[1, 2, 3]
[4, 5, 6]

Memory: [1, 2, 3, 4, 5, 6] (contiguous)
Address [i][j]: base + (i*3 + j)*element_size

3D and Higher

Extension to 3D and higher. For a d1 x d2 x d3 array, element [i][j][k] (row-major): base + ((i*d2 + j)*d3 + k) * element_size. Complexity increases: cache locality suffers when access patterns don't follow the storage order.

Jagged Arrays

Rows of different lengths. Array of arrays where each row is independent. Memory not contiguous across rows. Flexibility (vary row sizes) vs. cache inefficiency (non-contiguous).

Jagged array:
[1, 2, 3]
[4, 5]
[6, 7, 8, 9]

Physically: three separate arrays
Space overhead: pointers to each row

Dynamic Array Resizing Strategies

Resize Triggers

Resize when: capacity reached (insert) or too much wasted space (delete). Insert trigger: full. Delete trigger: typically size < capacity/4 (shrink to avoid memory waste).

Copy-On-Grow

Allocate new larger array. Copy all elements O(n). Old array deallocated. Amortized O(1) per insert with doubling (cost 1 + 2 + 4 + ... + n ≈ 2n total for n insertions).

Comparison: Doubling vs. Linear Growth

Doubling: Capacity grows exponentially (1, 2, 4, 8, ...). Fewer resizes (O(log n) total). Space overhead at most 2x. Amortized O(1) insert.

Linear growth (capacity += k): More frequent resizes (O(n/k) total). Total copy cost: k + 2k + 3k + ... + n ≈ O(n^2 / k). Expensive! Amortized O(n/k) per insert, i.e. O(n) for constant k.
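The copy-cost gap between the two policies can be measured with a small simulation (counting element copies, not timing; the threshold of 64 for the linear policy is arbitrary):

```python
def total_copies(n, grow):
    """Count element copies incurred over n appends for a growth policy."""
    size, capacity, copies = 0, 1, 0
    for _ in range(n):
        if size == capacity:
            copies += size          # resizing copies every current element
            capacity = grow(capacity)
        size += 1
    return copies

n = 100_000
doubling = total_copies(n, lambda c: 2 * c)     # geometric growth
linear = total_copies(n, lambda c: c + 64)      # arithmetic growth, k = 64
print(doubling, linear)  # doubling stays under 2n; linear is ~n^2/(2k)
```

For n = 100,000 the doubling policy copies fewer than 2n elements in total, while the linear policy copies tens of millions.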

Doubling Disadvantage: Memory Waste

Worst case: just after a resize, capacity = 2n but size = n, so 50% is wasted. Mitigation: golden-ratio growth (*1.618) wastes at most ~38% and allows previously freed blocks to be reused by later allocations. Trade-off: more frequent resizes vs. space efficiency.

Array Variants and Extensions

Circular Arrays

Treat array as circular: next element after last is first. Useful for queues. Wrap-around using modulo: next_index = (current_index + 1) % capacity.
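The wrap-around idea yields a fixed-capacity FIFO queue; a minimal sketch (Python's collections.deque is the practical choice):

```python
class CircularQueue:
    """Fixed-capacity FIFO queue on a circular array (sketch)."""

    def __init__(self, capacity):
        self._data = [None] * capacity
        self._head = 0      # index of oldest element
        self._size = 0

    def enqueue(self, value):
        if self._size == len(self._data):
            raise OverflowError("queue full")
        tail = (self._head + self._size) % len(self._data)  # wrap-around
        self._data[tail] = value
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise IndexError("queue empty")
        value = self._data[self._head]
        self._head = (self._head + 1) % len(self._data)     # wrap-around
        self._size -= 1
        return value

q = CircularQueue(3)
for x in (1, 2, 3):
    q.enqueue(x)
print(q.dequeue(), q.dequeue())  # 1 2
q.enqueue(4)                     # wraps into the slot freed by dequeue
print(q.dequeue(), q.dequeue())  # 3 4
```

Because both ends move via modulo arithmetic, enqueue and dequeue are O(1) with no shifting.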

Bit Arrays

Compact storage of booleans. Each bit represents true/false. 8 bits per byte vs. 8 bytes per boolean. Space-efficient, used in set representations, bloom filters.
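A bit array can be sketched over a bytearray, packing 8 flags per byte (class and method names are illustrative):

```python
class BitArray:
    """n boolean flags packed 8 per byte (sketch)."""

    def __init__(self, n):
        self._bytes = bytearray((n + 7) // 8)   # ceil(n / 8) bytes

    def set(self, i):
        self._bytes[i // 8] |= 1 << (i % 8)     # turn bit i on

    def clear(self, i):
        self._bytes[i // 8] &= ~(1 << (i % 8))  # turn bit i off

    def get(self, i):
        return (self._bytes[i // 8] >> (i % 8)) & 1

bits = BitArray(100)    # 100 flags in 13 bytes instead of 100+ bytes
bits.set(42)
print(bits.get(42), bits.get(43))  # 1 0
```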

Associative Arrays (Hash Tables)

Arrays indexed by arbitrary keys (not just integers). Implemented via hashing: hash the key to an index, use that index. Linear probing or chaining handles collisions. Average O(1) access.

Sparse Arrays

Most elements zero/null. Dense representation wasteful. Sparse: store only non-zero. Coordinate list, compressed row storage (CSR). Trade-off: space vs. access speed.

String Arrays

Arrays of strings (char arrays in C, String[] in Java). Each element is variable-length. Often: array of pointers to strings (strings stored separately).

Common Algorithms on Arrays

Searching

Linear search O(n); binary search O(log n) (requires sorted array). Two-pointer techniques: e.g., find a pair summing to a target.

Sorting

Quicksort, merge sort O(n log n). Counting sort O(n + k) if values bounded. In-place methods preferred (minimize space).

Array Rotation

Rotate array left/right by k positions. Naive O(n*k) by shifting. Efficient O(n) using reversal: reverse first k, reverse rest, reverse all.
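The three-reversal trick can be sketched as follows (in place, O(n) time, O(1) extra space):

```python
def rotate_left(arr, k):
    """Rotate arr left by k positions in place using three reversals."""
    n = len(arr)
    k %= n

    def reverse(lo, hi):                # reverse arr[lo..hi] in place
        while lo < hi:
            arr[lo], arr[hi] = arr[hi], arr[lo]
            lo, hi = lo + 1, hi - 1

    reverse(0, k - 1)                   # reverse the first k elements
    reverse(k, n - 1)                   # reverse the remaining n-k
    reverse(0, n - 1)                   # reverse the whole array
    return arr

print(rotate_left([1, 2, 3, 4, 5], 2))  # [3, 4, 5, 1, 2]
```

Tracing [1,2,3,4,5] with k=2: reverse first k gives [2,1,3,4,5]; reverse the rest gives [2,1,5,4,3]; reversing everything yields [3,4,5,1,2].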

Prefix Sums

Array where prefix[i] = sum of arr[0...i]. Computed O(n) in one pass. Enables range sum queries in O(1) after preprocessing.
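A common variant stores one extra leading zero so every range query is a single subtraction (a sketch; the off-by-one convention differs from the inclusive definition above):

```python
def build_prefix(arr):
    """prefix[i] = sum of arr[0..i-1], with prefix[0] = 0. O(n) one pass."""
    prefix = [0] * (len(arr) + 1)
    for i, x in enumerate(arr):
        prefix[i + 1] = prefix[i] + x
    return prefix

def range_sum(prefix, lo, hi):
    """Sum of arr[lo..hi] inclusive, O(1) per query after preprocessing."""
    return prefix[hi + 1] - prefix[lo]

p = build_prefix([5, 12, -3, 8, 14])
print(range_sum(p, 1, 3))  # 12 + (-3) + 8 = 17
```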

Two Pointers and Sliding Window

Two pointers traverse array: find pair, partition. Sliding window: maintain window size k, slide across array. Both O(n) total.
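The classic two-pointer example, pair-with-target-sum on a sorted array, can be sketched as:

```python
def pair_with_sum(arr, target):
    """Indices of two elements of sorted arr summing to target, or None. O(n)."""
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        s = arr[lo] + arr[hi]
        if s == target:
            return lo, hi
        elif s < target:
            lo += 1     # need a larger sum: advance the left pointer
        else:
            hi -= 1     # need a smaller sum: retreat the right pointer
    return None

print(pair_with_sum([-3, 5, 8, 12, 14], 13))  # (1, 2): 5 + 8
```

Each pointer moves at most n times in total, hence O(n) overall, versus O(n^2) for checking all pairs.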

Cache Performance and Locality

Cache Lines

Modern CPUs cache: adjacent memory. Cache line typically 64 bytes. Accessing arr[0] loads arr[0...7] (assuming 8-byte elements). Sequential access: optimal (each cache miss loads 8 elements).

Spatial Locality

Accessing nearby memory likely. Arrays have excellent spatial locality (contiguous). Linked lists poor (pointers scattered).

Row-Major vs. Column-Major

Row-major (C, NumPy): iterating rows in the outer loop (inner loop over j) is efficient. Column-major (Fortran): iterate columns first. Mismatched traversal order: many cache misses.

Row-major traversal (efficient):
total = 0
for i in range(rows):
    for j in range(cols):
        total += arr[i][j]

Column-major traversal (inefficient on a row-major layout):
total = 0
for j in range(cols):
    for i in range(rows):
        total += arr[i][j]  # strided access: cache misses

Practical Implications

Array algorithms often faster than theory predicts (cache helps). Linked list algorithms often slower. Constants matter in practice: O(n) array access beats O(log n) BST if cache-friendly.

Applications

Dynamic Programming

DP tables: arrays storing subproblem solutions. Fibonacci: arr[i] = arr[i-1] + arr[i-2]. Enables reuse of computations.
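A bottom-up Fibonacci table makes the reuse concrete:

```python
def fib(n):
    """Bottom-up DP: table[i] holds Fibonacci(i), reusing earlier entries."""
    if n < 2:
        return n
    table = [0] * (n + 1)
    table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]  # arr[i] = arr[i-1] + arr[i-2]
    return table[n]

print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Each subproblem is computed once and read in O(1), turning exponential naive recursion into an O(n) pass over the table.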

Sorting and Searching

Most algorithms assume arrays. QuickSort partitions in-place. Binary search requires array access pattern.

Graphics and Image Processing

Pixel arrays (2D). Matrix operations (3D rotations). Efficient array manipulation critical for performance.

Database Indexing

B-trees and hash tables use arrays at the leaf/bucket level. Fast data retrieval depends on array properties.

Mathematical Computing

NumPy, MATLAB build on arrays (numpy.ndarray, MATLAB matrices). Linear algebra: matrix multiplication, decompositions. High performance via vectorization.

References

  • Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. "Introduction to Algorithms." MIT Press, 3rd edition, 2009.
  • Knuth, D. E. "The Art of Computer Programming, Volume 1: Fundamental Algorithms." Addison-Wesley, 3rd edition, 1997.
  • Sedgewick, R., and Wayne, K. "Algorithms, Part I." Addison-Wesley, 4th edition, 2011.
  • Stroustrup, B. "The C++ Programming Language." Addison-Wesley, 4th edition, 2013.
  • Gorelick, M., and Ozsvald, I. "High Performance Python." O'Reilly, 2nd edition, 2020.