What are the five XML predefined entities?
The XML 1.0 specification defines exactly five named entities: `&lt;` (<), `&gt;` (>), `&amp;` (&), `&quot;` ("), and `&apos;` ('). Unlike HTML, XML does not include `&nbsp;`, `&copy;`, or any other named entities; those must be declared in a DTD or replaced with numeric character references.
How does XML escaping differ from HTML escaping?
Two ways. First, XML uses `&apos;` for the apostrophe; HTML4 does not define `&apos;`, so HTML tools typically emit `&#39;` instead. Second, XML has only the five predefined entities: every other named entity from HTML (`nbsp`, `copy`, `mdash`, etc.) is invalid in XML unless declared. Use numeric references like `&#160;` instead.
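A minimal sketch of that second point, assuming Python and its standard `html.entities.name2codepoint` table (which covers the HTML4 entity names). The function name `html_named_to_numeric` is hypothetical. It rewrites HTML-only named entities as numeric references and leaves the five XML predefined entities alone:

```python
import re
from html.entities import name2codepoint

# The five names XML predefines; these may stay as named references.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

# Matches a named reference such as &nbsp; or &copy;
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_named_to_numeric(text: str) -> str:
    """Replace HTML-only named entities with decimal character references."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)  # already valid in XML
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            return match.group(0)  # unknown name: leave it for the caller
        return f"&#{codepoint};"

    return _NAMED_REF.sub(replace, text)


converted = html_named_to_numeric("A&nbsp;B &copy; 2024 &mdash; &amp; &lt;")
print(converted)
```

Unknown names are left untouched rather than dropped, so a later validation step can still flag them.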
Why must I escape `&` first?
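If `<` were escaped before `&`, the resulting `&lt;` would itself contain `&`, and the later `&` pass would re-escape it into `&amp;lt;`. Escaping `&` first guarantees that no entity reference produced by a later pass is corrupted. Every serialiser that escapes in successive replacement passes has to follow this rule; one that maps each character in a single pass avoids the problem altogether.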
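The ordering problem is easy to reproduce. The sketch below assumes a simple multi-pass escaper built from chained `str.replace` calls; both function names are hypothetical and only the first is meant for real use:

```python
def escape_xml(text: str) -> str:
    """Escape the five XML special characters, ampersand first."""
    return (
        text.replace("&", "&amp;")  # must run before any pass that emits '&'
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&apos;")
    )


def escape_xml_wrong_order(text: str) -> str:
    """Deliberately broken: '<' is escaped before '&'."""
    return text.replace("<", "&lt;").replace("&", "&amp;")


sample = "a < b & c"
print(escape_xml(sample))              # a &lt; b &amp; c
print(escape_xml_wrong_order(sample))  # a &amp;lt; b &amp; c
```

In the broken version the `&` introduced by `&lt;` is escaped a second time, so a parser would decode the text as the literal string `&lt;` rather than `<`.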
When should I use a CDATA section instead of escaping?
CDATA (`<![CDATA[ ... ]]>`) is best for large blocks of literal text — embedded scripts, code snippets, or pre-formatted markup — where escaping every `<` and `&` would be noisy. For short, dynamic values inside an attribute or text node, entity escaping is shorter and works in attributes (CDATA does not).
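One caveat when wrapping text in CDATA: the content cannot contain the terminator `]]>`. A common workaround is to split the section at that point. A minimal sketch, assuming a hypothetical helper named `to_cdata` and a hypothetical wrapper element `r` used only to check the result:

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any embedded ']]>'."""
    # ']]>' would end the section early, so close it after ']]' and
    # reopen a new section that starts with '>'.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


snippet = "if (a < b && c) { x = arr[i[0]]>0; }"
wrapped = to_cdata(snippet)

# Round-trip through a parser to confirm the text survives unchanged.
parsed = ET.fromstring("<r>" + wrapped + "</r>").text
print(parsed == snippet)
```

The `<` and `&&` in the snippet need no escaping inside the section; only the `]]>` sequence gets special handling.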
Do attributes need different escaping than text nodes?
Yes, slightly. Inside double-quoted attribute values you must escape `&`, `<`, and `"`. Inside single-quoted attribute values you must escape `&`, `<`, and `'`. In text nodes only `&` and `<` are strictly required, plus `>` where it would otherwise complete the sequence `]]>`. Escaping all five characters is safe in every context, which is why most serialisers do it unconditionally.
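As an illustration of the per-context rules, here is a sketch using `escape` and `quoteattr` from Python's standard `xml.sax.saxutils` module. The three wrapper function names are hypothetical. `escape` always handles `&`, `<`, and `>`, and accepts a dictionary of extra replacements that are applied after the ampersand pass:

```python
from xml.sax.saxutils import escape, quoteattr


def escape_text(value: str) -> str:
    """Text node: '&', '<' and '>' are escaped."""
    return escape(value)


def escape_double_quoted(value: str) -> str:
    """Value destined for attr="...": also escape the double quote."""
    return escape(value, {'"': "&quot;"})


def escape_single_quoted(value: str) -> str:
    """Value destined for attr='...': also escape the apostrophe."""
    return escape(value, {"'": "&apos;"})


print(escape_text("a < b & c"))
print(escape_double_quoted('say "hi" & go'))
print(escape_single_quoted("it's <ok>"))

# quoteattr escapes the value and adds the surrounding quotes itself,
# choosing the quote character that needs the least escaping.
print(quoteattr("it's"))
print(quoteattr('say "hi"'))
```

`quoteattr` is convenient when the quote style does not matter; the explicit helpers are useful when a template already fixes it.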
Will XML escaping affect Unicode characters?
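No. A UTF-8 XML document can carry every Unicode character that XML 1.0 permits; the only exclusions are most C0 control characters, surrogates, and the noncharacters U+FFFE and U+FFFF. Only the five ASCII characters with XML syntactic meaning need escaping. Emoji, CJK ideographs, and accented Latin characters pass through unchanged and remain valid XML.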
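A quick way to check this is a round trip through a parser. The sketch below assumes Python's standard `xml.etree.ElementTree`; the element name `msg` is hypothetical:

```python
import xml.etree.ElementTree as ET

original = "café 日本語 😀 & <tag>"

element = ET.Element("msg")
element.text = original

# encoding="unicode" returns a str, so non-ASCII characters are kept as-is.
serialised = ET.tostring(element, encoding="unicode")
print(serialised)

# Parsing the result recovers the exact original text.
round_tripped = ET.fromstring(serialised).text
print(round_tripped == original)
```

Only the `&` and `<`/`>` characters change in the serialised form; the accented letter, the ideographs, and the emoji are written out literally.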
Is the apostrophe escape `&apos;` always required?
Only inside single-quoted attribute values (`attr='it&apos;s'`). In text nodes and double-quoted attributes the literal `'` is fine. Most serialisers escape it unconditionally for consistency: it costs five extra bytes per apostrophe and removes a category of bug.
Can I copy the escaped output into a SOAP envelope?
Yes — XML escape output is exactly what SOAP envelopes, RSS/Atom feeds, OOXML/DOCX content, and SVG documents require for textual payloads. Drop the escaped string between your start and end tags and the surrounding XML stays well-formed.
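As a rough illustration, the sketch below drops an escaped string into a SOAP 1.1 envelope and parses the result to confirm it is well-formed. It assumes Python's standard library; the function name `build_envelope` and the payload element `Note` are hypothetical, not part of any real service contract:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace


def build_envelope(message: str) -> str:
    """Place an escaped text payload inside a minimal SOAP 1.1 envelope."""
    return (
        f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
        "<soap:Body>"
        f"<Note>{escape(message)}</Note>"
        "</soap:Body>"
        "</soap:Envelope>"
    )


payload = 'Tom & Jerry <"classic"> it\'s'
envelope = build_envelope(payload)

# Parsing succeeds only if the envelope is well-formed, and the parser
# hands back the original, unescaped text.
root = ET.fromstring(envelope)
print(root.find(".//Note").text == payload)
```

Because the payload sits in a text node, escaping `&`, `<`, and `>` is enough here; the quotes and apostrophe can stay literal.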