Introduction
Transmission Control Protocol (TCP): core transport layer protocol in Internet protocol suite. Provides reliable, ordered, and error-checked delivery of data between applications running on hosts across IP networks. Enables connection-oriented communication with flow and congestion control mechanisms. Essential for web browsing, email, file transfers, and many other network services.
"TCP is the backbone of reliable communication on the Internet, ensuring data integrity and delivery despite the unreliable nature of underlying networks." -- W. Richard Stevens
TCP Overview
Protocol Characteristics
Connection-oriented: establishes virtual circuit before data transfer. Reliable: guarantees data delivery, order, and integrity. Full-duplex: simultaneous bidirectional communication. Stream-oriented: data viewed as continuous byte stream, not discrete messages. Flow control: prevents sender from overwhelming receiver. Congestion control: prevents network overload. Multiplexing: uses ports to support multiple applications.
Role in Transport Layer
Provides logical communication between processes on hosts. Works atop IP (Internet Protocol), which provides best-effort datagram service. Adds reliability, sequencing, error detection, and retransmission. Differs from UDP (User Datagram Protocol) in providing reliability and connection orientation.
Applications Using TCP
HTTP/HTTPS: web browsing. FTP: file transfers. SMTP/POP3/IMAP: email services. Telnet/SSH: remote terminal access. Database communication. Any application requiring reliable data delivery.
TCP Header Structure
Header Fields Overview
Minimum 20-byte fixed header, plus up to 40 bytes of options for a maximum header length of 60 bytes. Contains source/destination ports, sequence and acknowledgment numbers, flags, window size, checksum, urgent pointer, options.
Key Header Fields
Source Port (16 bits): identifies sender application. Destination Port (16 bits): identifies receiver application. Sequence Number (32 bits): position of first data byte in segment. Acknowledgment Number (32 bits): next byte expected from the peer. Data Offset (4 bits): header length in 32-bit words. Reserved (6 bits): reserved for future use. Flags (6 bits): control bits (URG, ACK, PSH, RST, SYN, FIN); later specifications take two of the reserved bits for the ECN flags CWR and ECE. Window Size (16 bits): receive buffer space currently available for flow control. Checksum (16 bits): error-checking of header and data, computed together with a pseudo-header of IP fields. Urgent Pointer (16 bits): indicates urgent data offset.
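The checksum field uses the standard Internet checksum: the 16-bit one's-complement of the one's-complement sum of all 16-bit words. A minimal sketch follows; the function name `internet_checksum` is illustrative, and building the pseudo-header that real TCP prepends to the input is left out.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

A receiver can verify a segment by running the same sum over the data with the checksum field included; an undamaged segment yields 0.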
Options Field
Variable length. Common options: Maximum Segment Size (MSS), Window Scale, Timestamp, Selective Acknowledgment (SACK). Enhances performance, reliability, and scalability.
| Field | Size (bits) | Description |
|---|---|---|
| Source Port | 16 | Sender application identifier |
| Destination Port | 16 | Receiver application identifier |
| Sequence Number | 32 | First data byte number |
| Acknowledgment Number | 32 | Next expected byte number |
| Flags | 6 | Control bits (SYN, ACK, FIN, etc.) |
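As an illustration of the field layout, the fixed 20-byte part of the header can be unpacked with Python's `struct` module. This is a sketch only: the names `TcpHeader`, `parse_tcp_header`, and `flag_names` are hypothetical, options are ignored, and only the six classic flag bits are decoded.

```python
import struct
from typing import NamedTuple


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    data_offset: int  # header length in 32-bit words
    flags: int        # low 6 bits: URG, ACK, PSH, RST, SYN, FIN
    window: int
    checksum: int
    urgent: int


FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 upward


def parse_tcp_header(raw: bytes) -> TcpHeader:
    """Unpack the fixed 20-byte part of a TCP header (network byte order)."""
    if len(raw) < 20:
        raise ValueError("TCP header needs at least 20 bytes")
    src, dst, seq, ack, off_flags, window, checksum, urgent = struct.unpack(
        "!HHIIHHHH", raw[:20]
    )
    # Top 4 bits of the 16-bit word are the data offset; low 6 bits are flags.
    return TcpHeader(src, dst, seq, ack, off_flags >> 12, off_flags & 0x3F,
                     window, checksum, urgent)


def flag_names(flags: int) -> list[str]:
    """List the names of the flag bits that are set."""
    return [name for bit, name in enumerate(FLAG_NAMES) if flags & (1 << bit)]
```

A data offset of 5 means a 20-byte header with no options; any larger value means options follow the fixed part.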
Connection Establishment
Three-Way Handshake
Purpose: synchronize sequence numbers, establish connection parameters. Steps: 1. SYN from client with initial sequence number (ISN). 2. SYN-ACK from server with its ISN, acknowledging client's SYN. 3. ACK from client acknowledging server's SYN.
Sequence Number Initialization
Each side selects ISN pseudo-randomly. Prevents segments from an old connection being mistaken for part of the new one and makes sequence numbers harder to guess. Ensures ordered byte stream.
State Transitions
Client: CLOSED → SYN-SENT → ESTABLISHED. Server: LISTEN → SYN-RECEIVED → ESTABLISHED. Connection ready for data transfer after handshake.
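The handshake steps and state transitions can be traced with a small simulation. This is a sketch under simplifying assumptions (no loss, no options, no real sockets); `Endpoint` and `three_way_handshake` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    state: str
    isn: int           # initial sequence number chosen by this side
    rcv_next: int = 0  # next sequence number expected from the peer


def three_way_handshake(client_isn: int, server_isn: int):
    """Return both endpoints and a trace of (segment, seq, ack) tuples."""
    client = Endpoint("client", "CLOSED", client_isn)
    server = Endpoint("server", "LISTEN", server_isn)
    trace = []

    # 1. Client sends SYN carrying its ISN.
    client.state = "SYN-SENT"
    trace.append(("SYN", client.isn, None))

    # 2. Server notes the client's ISN and replies with SYN-ACK.
    server.rcv_next = (client.isn + 1) % 2**32  # a SYN consumes one number
    server.state = "SYN-RECEIVED"
    trace.append(("SYN-ACK", server.isn, server.rcv_next))

    # 3. Client acknowledges the server's SYN.
    client.rcv_next = (server.isn + 1) % 2**32
    client.state = "ESTABLISHED"
    trace.append(("ACK", (client.isn + 1) % 2**32, client.rcv_next))

    server.state = "ESTABLISHED"  # server completes on receiving the ACK
    return client, server, trace
```

With ISNs x and y the trace shows ack=x+1 in the SYN-ACK and ack=y+1 in the final ACK, because each SYN occupies one sequence number.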

    Client                      Server
      |  SYN (seq=x)              |
      |-------------------------->|
      |  SYN-ACK (seq=y, ack=x+1) |
      |<--------------------------|
      |  ACK (ack=y+1)            |
      |-------------------------->|
    Connection established

Connection Termination
Four-Way Handshake
Initiated by one side sending FIN to signal end of data. Other side acknowledges FIN, sends its own FIN. Initiator acknowledges FIN. Half-close state allows one-way data flow. Full close after both FINs acknowledged.
States Involved
FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT, CLOSED. TIME-WAIT ensures delayed packets discarded.
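The teardown states can be viewed as a transition table. The sketch below is simplified (it omits, for example, the case where a FIN and the ACK of one's own FIN arrive in the same segment); the event labels and the helper `follow` are illustrative names, not part of any specification.

```python
# (state, event) -> next state for the connection-teardown states.
CLOSE_TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN-WAIT-1",   # active close
    ("ESTABLISHED", "recv FIN"): "CLOSE-WAIT",   # passive close
    ("FIN-WAIT-1", "recv ACK"): "FIN-WAIT-2",
    ("FIN-WAIT-1", "recv FIN"): "CLOSING",       # simultaneous close
    ("FIN-WAIT-2", "recv FIN"): "TIME-WAIT",
    ("CLOSING", "recv ACK"): "TIME-WAIT",
    ("CLOSE-WAIT", "send FIN"): "LAST-ACK",
    ("LAST-ACK", "recv ACK"): "CLOSED",
    ("TIME-WAIT", "2MSL timeout"): "CLOSED",
}


def follow(state: str, events: list[str]) -> list[str]:
    """Return the sequence of states visited while applying the events."""
    path = [state]
    for event in events:
        state = CLOSE_TRANSITIONS[(state, event)]
        path.append(state)
    return path
```

Only the side that closes first passes through TIME-WAIT; the passive side goes from LAST-ACK straight to CLOSED.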
TIME-WAIT Importance
Duration: 2 * Maximum Segment Lifetime (MSL). Prevents confusion from delayed segments of old connections. Ensures reliable connection release.

    Side A                      Side B
      |  FIN                      |
      |-------------------------->|
      |  ACK                      |
      |<--------------------------|
      |  FIN                      |
      |<--------------------------|
      |  ACK                      |
      |-------------------------->|
    Connection closed

Flow Control
Purpose
Prevents sender from overwhelming receiver buffer. Ensures receiver can process incoming data without loss.
Window Size
Advertised by receiver in TCP header. Indicates available buffer space. Sender limits data sent according to window size.
Sliding Window Mechanism
Dynamic window adjusts as data acknowledged and buffer space freed. Sender advances window base on acknowledgments. Enables efficient, continuous transmission.
| Term | Description |
|---|---|
| Window Size | Receiver buffer space currently available, in bytes |
| Advertised Window | Value sent in ACK to control sender |
| Sliding Window | Dynamic window shifting as ACKs received |
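The sender side of the sliding window can be sketched as bookkeeping over two pointers. The class below is a hypothetical illustration that ignores loss, retransmission, and sequence-number wraparound; `base`, `next_seq`, and the method names are assumptions made for the example.

```python
class SlidingWindowSender:
    """Tracks how many bytes may be sent under the advertised window."""

    def __init__(self, data_len: int, window: int):
        self.base = 0          # oldest unacknowledged byte
        self.next_seq = 0      # next byte to be sent
        self.window = window   # most recent advertised window
        self.data_len = data_len

    def sendable(self) -> int:
        """Bytes that may be sent right now without exceeding the window."""
        in_flight = self.next_seq - self.base
        remaining = self.data_len - self.next_seq
        return max(0, min(self.window - in_flight, remaining))

    def send(self, n: int) -> int:
        """Send up to n bytes; return how many were actually sent."""
        n = min(n, self.sendable())
        self.next_seq += n
        return n

    def on_ack(self, ack: int, window: int) -> None:
        """A cumulative ACK slides the window base and updates its size."""
        if self.base < ack <= self.next_seq:
            self.base = ack
        self.window = window
```

An ACK that advertises a window of 0 stops the sender until a later ACK reopens the window.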
Congestion Control
Objective
Prevent network congestion collapse. Adjust sender rate based on network conditions to maintain fairness and efficiency.
Key Algorithms
Slow Start: exponential increase of congestion window (cwnd) initially. Congestion Avoidance: linear increase after threshold. Fast Retransmit: detect loss via duplicate ACKs. Fast Recovery: avoid slow start after loss detection.
Congestion Window (cwnd)
Sender-side variable limiting bytes in flight. Interacts with receiver advertised window. Effective window = min(cwnd, advertised window).
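A round-by-round simulation makes the growth pattern of cwnd visible. The sketch below counts cwnd in MSS units and treats every loss as a timeout that restarts slow start; the cap at ssthresh during doubling and the floor of 2 MSS on ssthresh are assumptions of this example, and `simulate_cwnd` and `effective_window` are hypothetical names.

```python
def effective_window(cwnd: int, advertised: int) -> int:
    """The sender may have at most min(cwnd, advertised window) in flight."""
    return min(cwnd, advertised)


def simulate_cwnd(rounds: int, ssthresh: int, loss_rounds=frozenset()) -> list[int]:
    """Return cwnd (in MSS) at the start of each round-trip time."""
    cwnd = 1
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            ssthresh = max(cwnd // 2, 2)  # halve the threshold on loss
            cwnd = 1                      # restart slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per RTT
        else:
            cwnd += 1                     # congestion avoidance: +1 MSS per RTT
    return history
```

For example, `simulate_cwnd(8, 8, {5})` grows exponentially to 8, linearly to 10, then drops back to 1 after the loss in round 5.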
Algorithm:

    Initialize cwnd = 1 MSS
    While cwnd < ssthresh and no loss: cwnd = cwnd * 2 per RTT (Slow Start)
    Once cwnd >= ssthresh: cwnd = cwnd + 1 MSS per RTT (Congestion Avoidance)
    On loss detected by timeout: ssthresh = cwnd / 2, cwnd = 1 MSS (Slow Start restart)

Reliable Data Transfer
Mechanisms
Sequence numbers: track byte ordering. Acknowledgments: confirm receipt, cumulative ACKs. Retransmissions: resend lost or corrupted segments. Checksums: detect errors in header and data. Timers: trigger retransmission on timeout.
Duplicate Acknowledgments
Indicate missing segment. Trigger fast retransmission without waiting for timeout.
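Duplicate-ACK detection amounts to counting repeats of the same cumulative ACK number. The sketch below uses the conventional threshold of three duplicates; `fast_retransmit_points` is a hypothetical helper that only inspects ACK numbers and ignores window updates and payload.

```python
def fast_retransmit_points(acks: list[int], threshold: int = 3) -> list[int]:
    """Return the ACK numbers at which a fast retransmit would be triggered."""
    triggers = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                triggers.append(ack)  # resend the segment starting at `ack`
        else:
            last_ack, dup_count = ack, 0  # new data acknowledged
    return triggers
```

The returned number is the sequence number of the missing segment, since a cumulative ACK always names the next byte the receiver is waiting for.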
Selective Acknowledgment (SACK)
Enhancement allowing receiver to inform sender about non-contiguous blocks it has received beyond the cumulative ACK. Improves efficiency in case of multiple losses.
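To illustrate what a receiver reports, the following sketch derives the cumulative ACK and the SACK blocks from a set of received byte ranges. It is a simplification: `ack_and_sack` is a hypothetical helper, ranges are half-open `(start, end)` pairs, and the limit on the number of blocks per option and their ordering on the wire are ignored.

```python
def ack_and_sack(received: list[tuple[int, int]], rcv_next: int = 0):
    """Return (cumulative_ack, sack_blocks) for received [start, end) ranges."""
    # Merge overlapping or touching ranges.
    merged: list[list[int]] = []
    for start, end in sorted(received):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    sack = []
    for start, end in merged:
        if start <= rcv_next:
            rcv_next = max(rcv_next, end)  # contiguous: advance cumulative ACK
        else:
            sack.append((start, end))      # hole before this block: report it
    return rcv_next, sack
```

The gaps between the cumulative ACK and the reported blocks tell the sender exactly which ranges to retransmit.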
TCP Segmentation and Reassembly
Segmentation
Breaks large application data into segments no larger than the Maximum Segment Size (MSS). Each side announces its MSS in the SYN during the handshake. Keeps segments within the underlying network MTU so that IP fragmentation is avoided.
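Segmentation can be sketched as slicing a byte string into MSS-sized pieces, each tagged with the sequence number of its first byte. The function name `segment` and the parameter `first_seq` are illustrative; 1460 bytes is a common MSS on Ethernet, used here only as an example value.

```python
def segment(data: bytes, first_seq: int, mss: int) -> list[tuple[int, bytes]]:
    """Split data into (sequence_number, payload) pairs of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [
        ((first_seq + offset) % 2**32, data[offset:offset + mss])
        for offset in range(0, len(data), mss)
    ]
```

For 3000 bytes and an MSS of 1460 this yields three segments of 1460, 1460, and 80 bytes.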
Reassembly
Receiver reorders segments according to sequence numbers. Buffers out-of-order segments. Delivers data to application as continuous byte stream.
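Reassembly can be sketched as a buffer keyed by sequence number that releases data only when the next expected byte has arrived. The `Reassembler` class is hypothetical and assumes segments never partially overlap and sequence numbers do not wrap.

```python
class Reassembler:
    """Buffers out-of-order segments and releases data in sequence order."""

    def __init__(self, rcv_next: int):
        self.rcv_next = rcv_next            # next byte expected in order
        self.pending: dict[int, bytes] = {}  # seq -> payload held back

    def receive(self, seq: int, payload: bytes) -> bytes:
        """Accept one segment; return any bytes now deliverable in order."""
        if seq >= self.rcv_next:
            self.pending.setdefault(seq, payload)  # ignore old duplicates
        out = bytearray()
        while self.rcv_next in self.pending:
            chunk = self.pending.pop(self.rcv_next)
            out += chunk
            self.rcv_next += len(chunk)
        return bytes(out)
```

The application sees nothing while a gap exists, then receives the buffered bytes in one piece once the gap is filled.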
Fragmentation vs Segmentation
Fragmentation: IP layer splits packets if exceeding MTU. Segmentation: TCP divides data before IP layer. TCP aware of segment boundaries, IP not.
TCP Timers and Retransmission
Retransmission Timer
Starts when segment sent. Timeout triggers retransmission. Timer value adaptive based on Round Trip Time (RTT) estimation.
RTT Estimation
Uses exponential weighted moving average (EWMA) to smooth measurements. Prevents premature timeouts or delayed retransmissions.
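The smoothing can be sketched as a small estimator that follows the formulas given in this section, updating the estimate first and then the deviation. Seeding the deviation with half of the first sample is an assumption of this example, and `RttEstimator` is a hypothetical name.

```python
class RttEstimator:
    """EWMA-based RTT estimate and retransmission timeout, in seconds."""

    def __init__(self, first_sample: float, alpha: float = 1 / 8, beta: float = 1 / 4):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2  # assumed initial deviation

    def update(self, sample_rtt: float) -> float:
        """Fold in a new measurement and return the new timeout."""
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * abs(
            sample_rtt - self.estimated_rtt
        )
        return self.timeout()

    def timeout(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

A single slow sample moves the estimate only slightly but raises the deviation term, so the timeout widens quickly when RTT becomes unstable.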
Other Timers
Delayed ACK timer: reduces ACK traffic by waiting briefly before sending ACK. Persist timer: probes zero-window to avoid deadlock. Keepalive timer: checks if connection is alive.
RTT estimation formulas:

    SampleRTT = measured RTT
    EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
    DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
    TimeoutInterval = EstimatedRTT + 4 * DevRTT
    Where α = 1/8, β = 1/4

Performance Optimization Techniques
Window Scaling
Extends 16-bit window field to support windows larger than 65,535 bytes. Uses scale factor negotiated in options.
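The scale factor is a left shift applied to the 16-bit window field. A sketch of the arithmetic follows; the function names are hypothetical, and the maximum shift of 14 comes from the window scale option's definition rather than from this text.

```python
MAX_SHIFT = 14  # largest shift count the window scale option permits


def scaled_window(window_field: int, shift: int) -> int:
    """Actual window in bytes for a 16-bit field and a negotiated shift."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift must be between 0 and 14")
    return window_field << shift


def shift_for(target_bytes: int) -> int:
    """Smallest shift whose maximum window covers target_bytes (capped at 14)."""
    shift = 0
    while (0xFFFF << shift) < target_bytes and shift < MAX_SHIFT:
        shift += 1
    return shift
```

A path with a bandwidth-delay product of about 1 MB, for instance, needs a shift of 4, since 65,535 << 4 is 1,048,560 bytes.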
Selective Acknowledgment (SACK)
Allows acknowledgment of non-contiguous blocks. Improves retransmission efficiency in lossy networks.
Timestamps
Enhances RTT measurement accuracy. Prevents sequence number wrapping ambiguity. Used in PAWS (Protection Against Wrapped Sequences).
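Two small calculations show why wrapping matters and how a timestamp comparison resolves it. This is a sketch: `seq_wrap_seconds` and `paws_accept` are hypothetical names, and the check models only the core comparison of a segment's timestamp against the most recent one seen, using 32-bit modular arithmetic.

```python
def seq_wrap_seconds(bits_per_second: float) -> float:
    """Time to send 2**32 bytes, i.e. to wrap the sequence space once."""
    return 2**32 * 8 / bits_per_second


def paws_accept(ts_val: int, ts_recent: int) -> bool:
    """True if ts_val is not older than ts_recent in 32-bit modular order."""
    return (ts_val - ts_recent) % 2**32 < 2**31
```

At 1 Gbit/s the sequence space wraps in roughly 34 seconds, so an old duplicate can carry a sequence number that looks valid; its stale timestamp lets the receiver reject it.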
Delayed ACK
Waits short interval before sending ACK. Reduces overhead by combining ACKs with data segments.
TCP Variants and Extensions
TCP Reno
Classic congestion control with fast retransmit and recovery. After loss signaled by duplicate ACKs, halves cwnd instead of restarting slow start; handles multiple losses in one window poorly.
TCP NewReno
Improves fast recovery to handle multiple packet losses better.
TCP Vegas
Proactive congestion avoidance based on RTT variation. Adjusts sending rate before losses.
TCP Cubic
Default in Linux. Uses cubic function for congestion window growth. Better scalability on high-bandwidth networks.
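The cubic growth curve can be sketched directly from its usual form, W(t) = C(t - K)^3 + W_max, where W_max is the window at the last loss and K is the time needed to return to it. The constants C = 0.4 and beta = 0.7 are the values commonly given in the CUBIC specification and are assumptions here, not taken from this text; the function names are illustrative.

```python
def cubic_k(w_max: float, c: float = 0.4, beta: float = 0.7) -> float:
    """Seconds after a loss at which the window returns to w_max."""
    return (w_max * (1 - beta) / c) ** (1 / 3)


def cubic_window(t: float, w_max: float, c: float = 0.4, beta: float = 0.7) -> float:
    """Congestion window (in MSS) t seconds after the last loss event."""
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max
```

The window starts at beta * W_max right after a loss, flattens as it approaches W_max, then accelerates again beyond it to probe for more bandwidth.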
Explicit Congestion Notification (ECN)
Allows routers to signal congestion without packet loss. TCP reacts by reducing sending rate.
References
- W. Richard Stevens, "TCP/IP Illustrated, Volume 1: The Protocols," Addison-Wesley, 1994, pp. 523-610.
- J. Postel, "Transmission Control Protocol," RFC 793, IETF, 1981, pp. 1-85.
- V. Jacobson, "Congestion Avoidance and Control," SIGCOMM '88, ACM, 1988, pp. 314-329.
- Mathis et al., "TCP Selective Acknowledgment Options," RFC 2018, IETF, 1996, pp. 1-13.
- Allman et al., "TCP Congestion Control," RFC 5681, IETF, 2009, pp. 1-38.