
Text is More Complicated Than You Think: Comparing and Sorting Unicode


Few people realize just how complicated text can be. Did you know that sorting and even case folding can depend on a user's locale? That different sequences of characters can be semantically equivalent? That there are over a thousand Latin letters?
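To make the equivalence point concrete, here is a minimal sketch using only Python's standard `unicodedata` module. The helper names `canonical_equal` and `caseless_equal` are made up for illustration and are not taken from the talk. It shows two spellings of "café" that differ code point by code point yet compare equal after normalization, and a caseless comparison where `lower()` is not enough.

```python
import unicodedata

# Two ways to write "café": a precomposed é (U+00E9), or a plain "e"
# followed by U+0301 COMBINING ACUTE ACCENT.
composed = "caf\u00e9"
decomposed = "cafe\u0301"


def canonical_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def caseless_equal(a: str, b: str) -> bool:
    """Caseless comparison: normalize, case-fold, normalize again (hypothetical helper)."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return fold(a) == fold(b)


if __name__ == "__main__":
    print(composed == decomposed)                # raw code points differ
    print(canonical_equal(composed, decomposed)) # equal after normalization
    print("Straße".lower() == "strasse")         # lower() keeps the ß
    print(caseless_equal("Straße", "STRASSE"))   # casefold() maps ß to "ss"
```

Note that Python's built-in string methods are locale-independent, so this sketch does not cover locale-specific rules such as Turkish dotted and dotless i, nor locale-aware sort order; those generally need a collation library.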

Legacy text encodings like ASCII made a lot of simplifying assumptions about how written languages work, and we all put up with them because it was cool to even have computers in the first place. Unicode removes many of those assumptions and provides the tools we need to write software that can just do the right thing regardless of what text users throw at it. Even if you don't translate your UI, getting the details of string comparison, sorting, and searching right can eliminate annoying surprises for you and your users.

