What types of entity references does this decode?
Three kinds. Named entities like `&amp;`, `&nbsp;`, `&copy;`, `&mdash;`. Decimal numeric references such as `&#8364;` for the euro sign. Hexadecimal numeric references such as `&#x20AC;` for the same character. All three forms decode to ordinary Unicode characters, and the two euro examples produce the identical character, U+20AC.
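As an illustration of the two numeric forms, here is a minimal sketch rather than the tool's actual source. The function name `decodeNumericReferences` is hypothetical, and leaving out-of-range values and lone surrogates untouched is an assumption made for this example.

```javascript
// Hypothetical helper: decodes only numeric character references.
function decodeNumericReferences(input) {
  // Group 1 captures hex digits (&#x...;), group 2 captures decimal digits (&#...;).
  return input.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Assumption for this sketch: invalid code points are left as written.
    const valid = codePoint <= 0x10ffff && !(codePoint >= 0xd800 && codePoint <= 0xdfff);
    return valid ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericReferences("&#8364; and &#x20AC;")); // "€ and €"
```

Both spellings resolve to code point 8364 (hex 20AC), which is why the output shows the same character twice.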
Why does the order of unescaping matter?
You must decode `&amp;` last. If you decoded it first, the encoded sequence `&amp;lt;` would become `&lt;`, and a later step would then decode that to a literal `<`, so the original ampersand is lost. Decoding `&lt;`, `&gt;`, the other named entities, and numeric references first, then `&amp;`, prevents this double-decode bug.
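The difference is easy to reproduce with chained replacements. This sketch assumes a decoder built from sequential `String.prototype.replace` calls; both function names are hypothetical, and only three entities are covered to keep the contrast visible.

```javascript
// Wrong order: &amp; is decoded first, so its output gets decoded again.
function decodeAmpFirst(s) {
  return s.replace(/&amp;/g, "&").replace(/&lt;/g, "<").replace(/&gt;/g, ">");
}

// Right order: &amp; is decoded last, so nothing re-reads its output.
function decodeAmpLast(s) {
  return s.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&amp;/g, "&");
}

const encoded = "Write &amp;lt; to show a less-than sign";
console.log(decodeAmpFirst(encoded)); // "Write < to show a less-than sign" (over-decoded)
console.log(decodeAmpLast(encoded));  // "Write &lt; to show a less-than sign"
```

A decoder that handles every reference in a single regular-expression pass avoids the ordering question altogether, because each match is decoded exactly once.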
Will the decoded output be safe to inject into innerHTML?
No. After decoding you have raw `<`, `>`, `&` characters that the browser would parse as markup. Treat the output as plain text — assign it to `el.textContent`, not `el.innerHTML`. If you do need HTML, run the result through a sanitiser like DOMPurify.
Can it handle double-encoded strings?
Yes — run it twice. A double-encoded string like `&amp;lt;` becomes `&lt;` after the first pass, then `<` after the second. Double-encoding usually indicates a bug upstream where data was escaped twice; fixing the source is preferable to compensating with multiple decodes.
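A sketch of the "run it twice" approach, assuming a single-pass decoder as the building block. `decodeOnce` and `decodeRepeatedly` are hypothetical names, and the entity table is cut down to five entries for brevity.

```javascript
// Minimal stand-in decoder covering five entities (an assumption of this sketch).
const CORE_ENTITIES = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'], ["apos", "'"],
]);

function decodeOnce(s) {
  return s.replace(/&([a-z]+);/g, (match, name) =>
    CORE_ENTITIES.has(name) ? CORE_ENTITIES.get(name) : match);
}

// Runs the decoder up to maxPasses times, stopping early once nothing changes.
function decodeRepeatedly(s, maxPasses = 2) {
  let current = s;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = decodeOnce(current);
    if (next === current) break;
    current = next;
  }
  return current;
}

console.log(decodeOnce("&amp;lt;"));       // "&lt;"
console.log(decodeRepeatedly("&amp;lt;")); // "<"
```

Apply the extra pass only to data you know was escaped twice. On correctly encoded input, a second pass would over-decode text that was meant to display an entity literally.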
What happens to unknown named entities?
The tool decodes the most common HTML named entities — the core five plus nbsp, copy, reg, trade, mdash, ndash, hellip — and falls back to numeric reference decoding (`&#NNN;` and `&#xHH;`) which covers every Unicode codepoint. Unrecognised named entities are left untouched in the output.
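The lookup-with-fallthrough behaviour can be sketched as a table plus a single replace. This is an illustration, not the tool's code: `NAMED_ENTITIES` and `decodeNamed` are hypothetical names, the "core five" are assumed to be `amp`, `lt`, `gt`, `quot`, and `apos`, and numeric references are out of scope here.

```javascript
// The twelve names listed above, mapped to their characters.
const NAMED_ENTITIES = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'], ["apos", "'"],
  ["nbsp", "\u00A0"], ["copy", "\u00A9"], ["reg", "\u00AE"], ["trade", "\u2122"],
  ["mdash", "\u2014"], ["ndash", "\u2013"], ["hellip", "\u2026"],
]);

function decodeNamed(input) {
  // Names missing from the table are returned exactly as they appeared.
  return input.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (match, name) =>
    NAMED_ENTITIES.has(name) ? NAMED_ENTITIES.get(name) : match);
}

console.log(decodeNamed("&copy; 2024 &bogus; &amp; more")); // "© 2024 &bogus; & more"
```

Returning the original match for unknown names means the output never loses information: a later, more complete decoder can still handle what this one skipped.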
Is this the same as DOMParser-based decoding?
Functionally similar but safer. `new DOMParser().parseFromString(s, "text/html").documentElement.textContent` would also decode entities, but it parses your input as a full HTML document and builds a real DOM tree from it. Any tags in the input are treated as markup and dropped from the result, and the parse can be slow on large strings. This tool is a pure string operation.
How are emoji and supplementary-plane characters decoded?
Numeric references for codepoints above U+FFFF (such as `&#128512;` for 😀) are decoded using `String.fromCodePoint`, which correctly emits the surrogate pair for JavaScript strings. The output is a single visual character even though it occupies two UTF-16 code units.
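The contrast with the older `String.fromCharCode` shows why the choice of function matters. The variable names in this short sketch are illustrative only.

```javascript
const codePoint = parseInt("1F600", 16); // 128512, the grinning-face emoji

const viaCodePoint = String.fromCodePoint(codePoint);
// fromCharCode keeps only the low 16 bits, so it yields U+F600 instead.
const viaCharCode = String.fromCharCode(codePoint);

console.log(viaCodePoint.length);              // 2 (two UTF-16 code units)
console.log([...viaCodePoint].length);         // 1 (one code point)
console.log(viaCodePoint === "\uD83D\uDE00");  // true (the surrogate pair)
console.log(viaCharCode === "\uF600");         // true (wrong character)
```

Because `length` counts code units, use iteration or the spread operator when you need to count user-visible code points in decoded output.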
Should I run this on data fetched from a JSON API?
Only if the API explicitly HTML-encoded its values. Some legacy CMS APIs do, returning `&amp;` instead of `&`. Modern JSON APIs send raw Unicode and JSON-escape only the JSON-required characters. Decoding raw JSON values as HTML can corrupt strings that legitimately contain `&` followed by letters.
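To see how a needless decode corrupts data, consider a hypothetical API response whose value is already raw text. The payload and the one-entity `decodeLt` stand-in below are invented for illustration.

```javascript
// Hypothetical response body: the value is raw text, not HTML-encoded.
const body = '{"tip":"Type &lt; to show a less-than sign"}';
const { tip } = JSON.parse(body);

// Stand-in for any HTML entity decoder; only &lt; is handled here.
const decodeLt = (s) => s.replace(/&lt;/g, "<");

console.log(tip);           // "Type &lt; to show a less-than sign"
console.log(decodeLt(tip)); // "Type < to show a less-than sign" (the instruction is now wrong)
```

`JSON.parse` has already undone every escape that JSON itself requires, so the parsed string is the intended value. Decode it as HTML only when the API's documentation or its observed output confirms that the values are entity-encoded.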