What is HTML Encoding?
HTML encoding (also called HTML escaping) converts characters that have special meaning in HTML into safe equivalents called HTML entities. For example, the less-than sign < has special meaning in HTML because it begins a tag. If you want to display a literal < character on a web page, you must write it as &lt; in your HTML.
Without HTML encoding, any user-supplied content containing HTML characters can accidentally (or maliciously) break your page structure or execute scripts, a serious security vulnerability known as Cross-Site Scripting (XSS).
The Five Essential HTML Characters to Encode
These five characters must always be encoded when inserting dynamic content into HTML:
| Character | Named Entity | Numeric (Decimal) | Why It Matters |
|---|---|---|---|
| < | &lt; | &#60; | Opens HTML tags; can inject arbitrary markup |
| > | &gt; | &#62; | Closes HTML tags; breaks structure |
| & | &amp; | &#38; | Starts entity references; causes rendering errors |
| " | &quot; | &#34; | Breaks attribute values; can inject attributes |
| ' | &apos; | &#39; | Breaks single-quoted attribute values |
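The table above can be turned into a small escaping helper. What follows is a minimal sketch rather than a definitive implementation: the names `HTML_ESCAPES` and `escapeHtml` are illustrative, and the apostrophe is written as the numeric reference &#39; on the assumption that older parsers may not recognize the named form.

```javascript
// Hypothetical lookup table: each of the five special characters and its entity.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Hypothetical helper: escapes raw text for use in HTML content or quoted attributes.
function escapeHtml(text) {
  // One pass over the input, so the "&" of an emitted entity is never re-encoded.
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('<a href="x">Tom & Jerry\'s</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```

Doing the replacement in a single pass matters: chaining separate replace calls in the wrong order would encode the ampersands of entities that were just produced.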
HTML Encoding vs URL Encoding vs Base64
These are three different encoding schemes used in different contexts:
HTML encoding is for inserting text safely into HTML documents. It replaces characters that would be interpreted as HTML markup. Use it when putting user data into HTML.
URL encoding (percent encoding) is for encoding characters in URLs. Spaces become %20, & becomes %26. Use it when building query strings or URL parameters.
Base64 encoding is for representing binary data as ASCII text. It makes data safe to transmit in text-based protocols. Use it for embedding images in HTML/CSS, encoding file attachments in emails, or transmitting binary data in JSON.
A common mistake is applying the wrong encoding for the context, for example HTML-encoding a URL parameter instead of URL-encoding it, which breaks the URL.
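To make the difference concrete, the sketch below runs one sample string through all three schemes. It assumes a Node.js environment (for `Buffer`); `escapeHtml` is a hypothetical helper defined inline, and `example.com` is a placeholder host.

```javascript
const text = 'Tom & Jerry <3';

// HTML encoding (hypothetical helper): for text placed inside an HTML document.
const escapeHtml = (s) =>
  s.replace(/[&<>"']/g, (ch) =>
    ({ '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[ch]));

const htmlSafe = escapeHtml(text);
console.log(htmlSafe); // Tom &amp; Jerry &lt;3

// URL (percent) encoding: for a value placed inside a query string.
const urlSafe = encodeURIComponent(text);
console.log(urlSafe); // Tom%20%26%20Jerry%20%3C3

// Base64: for carrying bytes through a text-only channel (Node.js Buffer assumed).
const base64 = Buffer.from(text, 'utf8').toString('base64');
console.log(Buffer.from(base64, 'base64').toString('utf8') === text); // true

// Layering: URL-encode the parameter first, then HTML-encode the whole URL
// when it is written into an attribute.
const url = 'https://example.com/search?q=' + urlSafe + '&page=2';
const link = '<a href="' + escapeHtml(url) + '">results</a>';
console.log(link);
```

Each encoding answers a different question (markup, URL syntax, binary transport), so they are applied per context rather than interchangeably.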
XSS Prevention: Why HTML Encoding Matters for Security
Cross-Site Scripting (XSS) has long featured in the OWASP Top 10 list of web vulnerabilities, as its own category through the 2017 edition and as part of the Injection category since 2021. It occurs when attacker-controlled data is inserted into a web page without proper encoding, allowing malicious scripts to execute in a victim's browser.
Example of a vulnerable code pattern:
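document.getElementById('greeting').innerHTML = 'Hello, ' + userInput;
// If userInput = '<img src=x onerror="stealCookies()">'
// the onerror handler runs in the browser of every visitor who views the page.
// (A <script> tag assigned through innerHTML is not executed, but event-handler
// attributes such as onerror are, so the injection still succeeds.)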
The safe pattern: Always HTML-encode user-supplied data before inserting it into HTML. Most server-side frameworks (Django, Rails, Laravel, etc.) do this automatically by default, but custom string concatenation, innerHTML assignments, and template literals bypass this protection.
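As a rough illustration of the safe pattern, the sketch below escapes the untrusted value before concatenating it into markup. Both `escapeHtml` and `renderGreeting` are hypothetical names introduced for this example.

```javascript
// Hypothetical helper: encodes the five HTML-significant characters.
function escapeHtml(text) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical render function: only the untrusted part is escaped;
// the surrounding markup is written by the developer and stays as-is.
function renderGreeting(userInput) {
  return '<p>Hello, ' + escapeHtml(userInput) + '</p>';
}

console.log(renderGreeting('<script>stealCookies()</script>'));
// <p>Hello, &lt;script&gt;stealCookies()&lt;/script&gt;</p>
```

The browser now shows the payload as visible text instead of parsing it as markup. In browser code, assigning the raw value to an element's textContent achieves the same result without any manual escaping.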
Common HTML Entities Reference
| Character | Entity | Character | Entity |
|---|---|---|---|
| © | &copy; | ® | &reg; |
| ™ | &trade; | € | &euro; |
| £ | &pound; | ¥ | &yen; |
| → | &rarr; | ← | &larr; |
| — | &mdash; | – | &ndash; |
| (non-break space) | &nbsp; | ° | &deg; |
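If these symbols need to be emitted as named entities programmatically, a lookup map is one straightforward approach. The sketch below covers only the twelve characters in the reference table; `NAMED_ENTITIES` and `encodeNamed` are hypothetical names, and the keys are written as \u escapes so the code does not depend on the file's own encoding.

```javascript
// Hypothetical map of the characters in the reference table to their named entities.
const NAMED_ENTITIES = new Map([
  ['\u00A9', '&copy;'],  // copyright sign
  ['\u00AE', '&reg;'],   // registered sign
  ['\u2122', '&trade;'], // trade mark sign
  ['\u20AC', '&euro;'],  // euro sign
  ['\u00A3', '&pound;'], // pound sign
  ['\u00A5', '&yen;'],   // yen sign
  ['\u2192', '&rarr;'],  // rightwards arrow
  ['\u2190', '&larr;'],  // leftwards arrow
  ['\u2014', '&mdash;'], // em dash
  ['\u2013', '&ndash;'], // en dash
  ['\u00A0', '&nbsp;'],  // non-breaking space
  ['\u00B0', '&deg;'],   // degree sign
]);

// Replaces characters found in the map and leaves everything else untouched.
function encodeNamed(text) {
  return Array.from(text, (ch) => (NAMED_ENTITIES.has(ch) ? NAMED_ENTITIES.get(ch) : ch)).join('');
}

console.log(encodeNamed('\u00A9 2024 Acme\u2122, 20\u00B0C'));
// &copy; 2024 Acme&trade;, 20&deg;C
```

On a page served as UTF-8 these characters can usually be written directly; entities are mainly useful when the source must stay ASCII or when an invisible character such as the non-breaking space should be obvious to whoever reads the markup.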
Frequently Asked Questions
Do I need to HTML-encode inside JavaScript strings?
No. HTML encoding is for HTML context only. If you are building a JavaScript string (inside a script tag or .js file), you need JavaScript string escaping (backslash escapes like \n, \", \'). However, if you are inserting a JavaScript variable into HTML via DOM manipulation, always HTML-encode the value when using innerHTML. Better yet, use textContent, which never interprets HTML at all.
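For the JavaScript-string case, one common approach is to let JSON.stringify produce the backslash escapes. The sketch below assumes the value will be written into an inline script block, so it also rewrites < as \u003c; `toInlineScriptLiteral` is a hypothetical name.

```javascript
// Hypothetical helper: serializes a value as a JavaScript literal intended
// for use inside an inline <script> block.
function toInlineScriptLiteral(value) {
  // JSON.stringify adds the surrounding quotes and backslash escapes (\" \\ \n ...).
  // Rewriting "<" as \u003c keeps a "</script>" inside the data from ending
  // the script element early; the string's value is unchanged.
  return JSON.stringify(value).replace(/</g, '\\u003c');
}

const userName = 'Ann "The </script> Breaker" O\'Neil\n';
const literal = toInlineScriptLiteral(userName);

console.log(literal);
console.log(JSON.parse(literal) === userName); // true: the value survives intact
```

HTML entities would be the wrong tool here: inside a script block the text &quot; is not decoded, so it would appear literally in the string instead of acting as a quote.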
Will HTML encoding affect how my page looks to users?
No. Browsers decode HTML entities before rendering, so users see the original characters. A page containing &lt; in its source displays as < to the reader. HTML encoding affects the source code only, not the visible output.
Do modern frameworks handle HTML encoding automatically?
Most do, yes. React, Vue, Angular, Django templates, Rails ERB, and Laravel Blade all HTML-encode output by default. The dangerous scenario is opting out of this protection (React's dangerouslySetInnerHTML, Django's |safe filter, Rails' html_safe) without careful validation. When you bypass automatic encoding, you must manually ensure the data is safe.
What is double encoding and why is it a problem?
Double encoding happens when you HTML-encode a string that is already HTML-encoded. The ampersand in &lt; gets encoded again, producing &amp;lt;, which displays as the literal text "&lt;" instead of "<". Always encode raw text exactly once. If you receive HTML from an API or CMS, check whether it is already encoded before encoding it again.
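The effect is easy to reproduce. In the sketch below, `escapeHtml` and `decodeHtml` are hypothetical helpers; `decodeHtml` handles only the five basic entities and stands in for the single decoding step a browser performs when rendering.

```javascript
// Hypothetical helper: encodes the five HTML-significant characters.
const escapeHtml = (s) =>
  s.replace(/[&<>"']/g, (ch) =>
    ({ '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[ch]));

// Hypothetical helper: decodes those five entities once, in a single pass.
const decodeHtml = (s) =>
  s.replace(/&(amp|lt|gt|quot|#39);/g, (match, name) =>
    ({ amp: '&', lt: '<', gt: '>', quot: '"', '#39': "'" }[name]));

const raw = '1 < 2';
const once = escapeHtml(raw);   // '1 &lt; 2'
const twice = escapeHtml(once); // '1 &amp;lt; 2'

console.log(decodeHtml(once));  // 1 < 2      (what the reader should see)
console.log(decodeHtml(twice)); // 1 &lt; 2   (entity text leaks onto the page)
```

Because decoding happens only once, the second layer of encoding is never undone, which is why the entity itself ends up visible to the reader.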