Rich Text, Poor Text: The Hidden Pain of Character Encoding

2025-04-05

This article examines how font styles (bold, italics, etc.) are stored in rich text editing. The author argues that these styles are not mere 'decorations' but an integral part of written expression, much like punctuation. Early character encoding standards such as ASCII had no way to represent styling, so editors fell back on markup embedded in the text itself. That markup 'pollutes' the text data and hurts the efficiency and consistency of text processing. To solve this, the author proposes a wider character encoding scheme that stores style information directly in each character.
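
To make the contrast concrete, here is a minimal Python sketch. It is not the author's actual scheme: the bit layout, the `BOLD` and `ITALIC` flags, and the `encode` and `plain` helpers are all assumptions made up for illustration. It shows how embedded tags distort length and search, and how a hypothetical wider code unit could carry style next to each character instead.

```python
# Hypothetical layout (an assumption, not from the article):
# low 21 bits hold the Unicode code point, higher bits hold style flags.
CODEPOINT_MASK = (1 << 21) - 1  # Unicode code points fit in 21 bits
BOLD = 1 << 21
ITALIC = 1 << 22


def encode(text, style=0):
    """Pack each character's code point and style flags into one integer."""
    return [ord(ch) | style for ch in text]


def plain(units):
    """Recover the unstyled text by masking off the style bits."""
    return "".join(chr(u & CODEPOINT_MASK) for u in units)


# Embedded markup: the tags become part of the text data.
marked_up = "Rich <b>te</b>xt"
print(len(marked_up))         # counts the tag characters too
print("text" in marked_up)    # the tag splits the word, so the search fails

# Wide units: the style travels with each character.
wide = encode("Rich ") + encode("te", BOLD) + encode("xt")
print(len(wide))              # one unit per visible character
print("text" in plain(wide))  # search works on the unstyled view
```

In this sketch, length and substring search behave as expected on the wide form because no extra characters are interleaved with the content. The cost is that every consumer of the text has to understand the wider unit.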