Status of Unicode in Python 3


Summary

Introduced in Python 2.0, Unicode became the default string type in Python 3.0. The switch to Unicode took 8 years, and many bugs have been fixed since Python 3.0. The switch also raised many questions, such as: Should Python support both bytes and characters for filenames? What should be done with undecodable bytes?
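To make the undecodable-bytes question concrete, here is a minimal sketch of the answer PEP 383 gives: the "surrogateescape" error handler maps each byte that cannot be decoded to a lone surrogate (U+DC80 to U+DCFF), and encoding with the same handler restores the original byte. The filename below is a made-up example, not taken from the talk.

```python
# A made-up filename encoded in Latin-1: byte 0xE9 ("é") is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# Strict decoding fails on the undecodable byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("strict decoding failed")

# With surrogateescape (PEP 383), byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))  # 'caf\udce9.txt'

# Encoding with the same handler turns the surrogate back into byte 0xE9.
restored = text.encode("utf-8", errors="surrogateescape")
print(restored == raw)  # True
```

The design choice is that no information is lost: the `str` can be passed around like any other string and converted back to the exact original bytes when it reaches the operating system again.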

Description

The talk will focus on the recent issues fixed in Python 3.1 and 3.2:

Use PEP 383 (the surrogateescape error handler, which stores undecodable bytes) everywhere
Encoding of command line arguments: UTF-8 on Mac OS X, the locale encoding on UNIX/BSD, Unicode on Windows
Environment variables: addition of os.environb
Filenames: extensive work to support PEP 383 everywhere; addition of os.fsencode() and os.fsdecode()
Python source code encoding: use tokenize.detect_encoding() instead of the locale encoding
Some standard library examples: email, ftp, ...
etc.
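Several of the APIs listed above can be tried directly. The following is a minimal sketch, assuming Python 3.2 or later; the helper names `roundtrip_filename` and `source_encoding` are hypothetical and exist only for illustration. The undecodable-filename case is only exercised on POSIX, where the filesystem error handler is surrogateescape.

```python
import io
import os
import tokenize


def roundtrip_filename(raw: bytes) -> bytes:
    """Hypothetical helper: bytes -> str with os.fsdecode(), then back with os.fsencode()."""
    return os.fsencode(os.fsdecode(raw))


def source_encoding(source: bytes) -> str:
    """Hypothetical helper: the encoding tokenize.detect_encoding() finds for source bytes."""
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# A plain ASCII filename round-trips on every platform.
print(roundtrip_filename(b"report.txt") == b"report.txt")  # True

# On POSIX the filesystem error handler is surrogateescape (PEP 383),
# so even undecodable bytes survive the bytes -> str -> bytes round trip.
if os.name == "posix":
    print(roundtrip_filename(b"caf\xe9.txt") == b"caf\xe9.txt")  # True

# os.environb only exists where the OS stores environment variables as bytes;
# os.supports_bytes_environ reports whether that is the case.
if os.supports_bytes_environ:
    print(all(isinstance(key, bytes) for key in os.environb))  # True

# Without a coding cookie or BOM, Python source defaults to UTF-8;
# a cookie overrides it (the name is normalized, "latin-1" -> "iso-8859-1").
print(source_encoding(b"print('hi')\n"))  # utf-8
print(source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))  # iso-8859-1
```

Reading the declared encoding from the source bytes, rather than guessing from the locale, means the same file is decoded identically on every machine.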

The talk will present changes not only in Python but also in the C API.
