Broken UTF-8
Any copyright to this file is dedicated to the Public Domain. https://creativecommons.org/publicdomain/zero/1.0/
Five-byte and six-byte sequences were defined in RFC 2279 but are no longer part of the UTF-8 definition (RFC 3629).
Non-shortest forms for lowest single-byte (U+0000)
- Two-byte sequence (C0 80)
 
- ��
 
- Three-byte sequence (E0 80 80)
 
- ���
 
- Four-byte sequence (F0 80 80 80)
 
- ����
 
- Five-byte sequence (F8 80 80 80 80)
 
- �����
 
- Six-byte sequence (FC 80 80 80 80 80)
 
- ������
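
The non-shortest forms in this file all pad a code point with leading zero bits so that it fills a longer sequence than it needs. As a minimal sketch, the Python below rebuilds such sequences and checks them against Python's strict decoder. The names `encode_with_length` and `is_valid_utf8` are hypothetical helpers for illustration, not part of this file.

```python
# Lead-byte marker bits for each sequence length (2 to 6 bytes).
LEAD_MARKERS = {2: 0xC0, 3: 0xE0, 4: 0xF0, 5: 0xF8, 6: 0xFC}


def encode_with_length(code_point: int, length: int) -> bytes:
    """Encode code_point in exactly `length` bytes, even if a shorter form exists.

    Hypothetical helper for illustration; the result may be invalid UTF-8.
    """
    if length not in LEAD_MARKERS:
        raise ValueError("length must be between 2 and 6")
    # The lead byte carries (7 - length) payload bits; each trail byte carries 6.
    payload_bits = (7 - length) + 6 * (length - 1)
    if not 0 <= code_point < (1 << payload_bits):
        raise ValueError("code point does not fit in this many bytes")
    trails = []
    for _ in range(length - 1):
        trails.append(0x80 | (code_point & 0x3F))  # trail byte: 10xxxxxx
        code_point >>= 6
    lead = LEAD_MARKERS[length] | code_point
    return bytes([lead] + trails[::-1])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Print each non-shortest form of U+0000 and whether it decodes.
    for length in range(2, 7):
        encoded = encode_with_length(0x0000, length)
        print(" ".join(f"{b:02X}" for b in encoded), is_valid_utf8(encoded))
```

A conforming decoder should reject every sequence this loop prints, because U+0000 already fits in one byte.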
 
Non-shortest forms for highest single-byte (U+007F)
- Two-byte sequence (C1 BF)
 
- ��
 
- Three-byte sequence (E0 81 BF)
 
- ���
 
- Four-byte sequence (F0 80 81 BF)
 
- ����
 
- Five-byte sequence (F8 80 80 81 BF)
 
- �����
 
- Six-byte sequence (FC 80 80 80 81 BF)
 
- ������
 
Non-shortest forms for lowest two-byte (U+0080)
- Three-byte sequence (E0 82 80)
 
- ���
 
- Four-byte sequence (F0 80 82 80)
 
- ����
 
- Five-byte sequence (F8 80 80 82 80)
 
- �����
 
- Six-byte sequence (FC 80 80 80 82 80)
 
- ������
 
Non-shortest forms for highest two-byte (U+07FF)
- Three-byte sequence (E0 9F BF)
 
- ���
 
- Four-byte sequence (F0 80 9F BF)
 
- ����
 
- Five-byte sequence (F8 80 80 9F BF)
 
- �����
 
- Six-byte sequence (FC 80 80 80 9F BF)
 
- ������
 
Non-shortest forms for lowest three-byte (U+0800)
- Four-byte sequence (F0 80 A0 80)
 
- ����
 
- Five-byte sequence (F8 80 80 A0 80)
 
- �����
 
- Six-byte sequence (FC 80 80 80 A0 80)
 
- ������
 
Non-shortest forms for highest three-byte (U+FFFF)
- Four-byte sequence (F0 8F BF BF)
 
- ����
 
- Five-byte sequence (F8 80 8F BF BF)
 
- �����
 
- Six-byte sequence (FC 80 80 8F BF BF)
 
- ������
 
Non-shortest forms for lowest four-byte (U+10000)
- Five-byte sequence (F8 80 90 80 80)
 
- �����
 
- Six-byte sequence (FC 80 80 90 80 80)
 
- ������
 
Non-shortest forms for last Unicode code point (U+10FFFF)
- Five-byte sequence (F8 84 8F BF BF)
 
- �����
 
- Six-byte sequence (FC 80 84 8F BF BF)
 
- ������
 
Out of range
- One past Unicode (F4 90 80 80)
 
- ����
 
- Longest five-byte sequence (FB BF BF BF BF)
 
- �����
 
- Longest six-byte sequence (FD BF BF BF BF BF)
 
- ������
 
- First surrogate (ED A0 80)
 
- ���
 
- Last surrogate (ED BF BF)
 
- ���
 
- CESU-8 surrogate pair (ED A0 BD ED B2 A9)
 
- ������
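
The out-of-range cases above come down to two checks: the code point must not exceed U+10FFFF, and it must not be a surrogate (U+D800 to U+DFFF). The sketch below shows both checks, and also shows which code point the CESU-8 pair stands for. The function names are hypothetical; `surrogatepass` is Python's built-in error handler that lets encoded surrogates through.

```python
def is_scalar_value(code_point: int) -> bool:
    """True if code_point may appear in UTF-8: in range and not a surrogate."""
    return 0 <= code_point <= 0x10FFFF and not 0xD800 <= code_point <= 0xDFFF


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into the code point it stands for."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cesu8 = bytes.fromhex("ED A0 BD ED B2 A9")
    # "surrogatepass" decodes each three-byte group to a lone surrogate.
    high, low = (ord(ch) for ch in cesu8.decode("utf-8", errors="surrogatepass"))
    print(hex(high), hex(low), hex(combine_surrogates(high, low)))
```

The pair combines to U+1F4A9, whose well-formed UTF-8 encoding is the four-byte sequence F0 9F 92 A9.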
 
Out of range and non-shortest
- One past Unicode as five-byte sequence (F8 84 90 80 80)
 
- �����
 
- One past Unicode as six-byte sequence (FC 80 84 90 80 80)
 
- ������
 
- First surrogate as four-byte sequence (F0 8D A0 80)
 
- ����
 
- Last surrogate as four-byte sequence (F0 8D BF BF)
 
- ����
 
- CESU-8 surrogate pair as two four-byte overlongs (F0 8D A0 BD F0 8D B2 A9)
 
- ��������
 
Lone trails
- One (80)
 
- �
 
- Two (80 80)
 
- ��
 
- Three (80 80 80)
 
- ���
 
- Four (80 80 80 80)
 
- ����
 
- Five (80 80 80 80 80)
 
- �����
 
- Six (80 80 80 80 80 80)
 
- ������
 
- Seven (80 80 80 80 80 80 80)
 
- �������
 
- After valid two-byte (C2 B6 80)
 
- ¶�
 
- After valid three-byte (E2 98 83 80)
 
- ☃�
 
- After valid four-byte (F0 9F 92 A9 80)
 
- 💩�
 
- After five-byte (FB BF BF BF BF 80)
 
- ������
 
- After six-byte (FD BF BF BF BF BF 80)
 
- �������
 
Truncated sequences
- Two-byte lead (C2)
 
- �
 
- Three-byte lead (E2)
 
- �
 
- Three-byte lead and one trail (E2 98)
 
- �
 
- Four-byte lead (F0)
 
- �
 
- Four-byte lead and one trail (F0 9F)
 
- �
 
- Four-byte lead and two trails (F0 9F 92)
 
- �
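
A truncated sequence is only an error once the input is known to have ended; in the middle of a stream, the missing trail bytes may still arrive. As a sketch of that distinction, the following uses Python's standard incremental decoder. The wrapper `decode_chunks` is a hypothetical name introduced here.

```python
import codecs


def decode_chunks(chunks):
    """Decode an iterable of byte chunks as one UTF-8 stream, strictly.

    Hypothetical helper: a sequence split across chunks is fine, but a
    sequence still incomplete at the end of the last chunk raises
    UnicodeDecodeError.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    chunks = list(chunks)
    pieces = []
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        # final=True tells the decoder that no more bytes will follow.
        pieces.append(decoder.decode(chunk, final=is_last))
    return "".join(pieces)


if __name__ == "__main__":
    # E2 98 followed later by 83 completes the three-byte sequence.
    print(decode_chunks([bytes.fromhex("E2 98"), bytes.fromhex("83")]))
    try:
        decode_chunks([bytes.fromhex("E2 98")])
    except UnicodeDecodeError as error:
        print("truncated:", error.reason)
```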
 
Leftovers
- FE (FE)
 
- �
 
- FE and trail (FE 80)
 
- ��
 
- FF (FF)
 
- �
 
- FF and trail (FF 80)
 
- ��
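
Many of the cases in this file can be recognized from a single byte, before any sequence is assembled. As a rough sketch (the function `byte_role` and its return labels are assumptions made for illustration), each byte value can be sorted into a role like this:

```python
def byte_role(byte: int) -> str:
    """Classify one byte by the role it can play in well-formed UTF-8."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    if byte <= 0x7F:
        return "ascii"    # single-byte sequence
    if byte <= 0xBF:
        return "trail"    # 10xxxxxx, only valid after a lead byte
    if byte <= 0xC1:
        return "invalid"  # C0 and C1 could only start non-shortest forms
    if byte <= 0xDF:
        return "lead2"
    if byte <= 0xEF:
        return "lead3"
    if byte <= 0xF4:
        return "lead4"
    # F5..F7 would encode values past U+10FFFF, F8..FD were lead bytes
    # only in the obsolete five-byte and six-byte forms, and FE and FF
    # were never lead bytes in any form.
    return "invalid"


if __name__ == "__main__":
    for value in (0x41, 0x80, 0xC0, 0xC2, 0xE2, 0xF0, 0xF8, 0xFE, 0xFF):
        print(f"{value:02X}", byte_role(value))
```

A per-byte check like this is not sufficient on its own: sequences such as E0 80 80, ED A0 80 and F4 90 80 80 start with an acceptable lead byte and are rejected only because of the second byte.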