Mysterious Squares in Windows Filenames: A UTF-16 Surrogate Pair Adventure

2025-02-26

This article describes a curious phenomenon in Windows: many small executables whose names contain strange squares show up in Task Manager. The files are not malicious; the squares come from UTF-16 surrogate pairs in the filenames. UTF-16 represents each character beyond the Basic Multilingual Plane (code points above U+FFFF) as a surrogate pair, two 16-bit code units that are only meaningful together. When string manipulation splits a pair and leaves an isolated surrogate behind, the resulting filename cannot be rendered. The article explains surrogate pairs and provides a Python script that generates files with unrenderable filenames, reproducing the phenomenon.
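The surrogate-pair mechanics can be sketched in a few lines of Python. This is not the article's script, only a minimal illustration: the helper names (`to_surrogate_pair`, `utf16_units`, `truncate_units`) and the sample filename are hypothetical, and the truncation step merely assumes that some code cuts a name at a fixed number of UTF-16 code units. No files are created.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def utf16_units(text):
    """Return the 16-bit code units of text, tolerating lone surrogates."""
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def truncate_units(text, max_units):
    """Naively cut text after max_units UTF-16 code units (assumed failure mode)."""
    data = text.encode("utf-16-le", "surrogatepass")[: 2 * max_units]
    return data.decode("utf-16-le", "surrogatepass")


# Hypothetical filename containing U+1F600, which lies outside the BMP.
name = "app\U0001F600.exe"
high, low = to_surrogate_pair(0x1F600)   # (0xD83D, 0xDE00)

# Cutting after four code units keeps the high surrogate but drops its partner.
broken = truncate_units(name, 4)
print([hex(u) for u in utf16_units(broken)])
```

The truncated string ends in a lone high surrogate, which is not a valid character on its own; a strict `broken.encode("utf-8")` raises `UnicodeEncodeError`, and a renderer has nothing meaningful to draw for it.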
