BusyBox Bug and Patch Tracking
BusyBox
  

Viewing Issue Simple Details
ID: 0000687
Category: [uClibc] Internationalization / Localization
Severity: minor
Reproducibility: always
Date Submitted: 02-06-06 00:59
Last Update: 10-11-08 23:50
Reporter: rfelker
View Status: public
Assigned To: carmelo73
Priority: normal
Resolution: open
Status: assigned
Product Version: (not specified)
Summary: 0000687: utf-8 mbrtowc accepts invalid bytes
Description: According to section 3.9 of the Unicode Standard, UTF-8 is a mapping between byte sequences and "Unicode scalar values", which are integers in one of the ranges 0-0xd7ff or 0xe000-0x10ffff. The standard is clear that UTF-8 sequences are one to four bytes long. uClibc, however, accepts the illegal lead bytes 0xf5-0xfd: 0xf5-0xf7 start 4-byte sequences for values above 0x10ffff, and 0xf8-0xfd start 5- and 6-byte sequences, giving code points up to 0x7fffffff.

Although the two standards disagreed in the past, my understanding is that ISO/IEC 10646 now agrees that UCS code points go only up to 0x10ffff and that UTF-8 is a 1-4 byte encoding, not a 1-6 byte one.
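To illustrate the rule the report relies on, here is a minimal C sketch of a strict single-sequence decoder. `strict_utf8_decode` is a hypothetical helper, not uClibc's mbrtowc. For simplicity it reports truncated input as invalid (-1) rather than returning (size_t)-2 as mbrtowc does for an incomplete sequence. It accepts only lead bytes 0x00-0x7f, 0xc2-0xdf, 0xe0-0xef and 0xf0-0xf4, and it rejects overlong forms, surrogates and values above 0x10ffff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not uClibc's mbrtowc): decode one UTF-8 sequence
 * from s[0..n). On success, store the scalar value in *out and return the
 * sequence length (1-4). Return -1 if the bytes are not well-formed UTF-8
 * or the sequence is truncated. */
int strict_utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    uint32_t cp;
    int len;

    if (n == 0)
        return -1;

    if (s[0] < 0x80) {                         /* 1 byte: U+0000..U+007F */
        *out = s[0];
        return 1;
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) { /* 2 bytes; 0xC0/0xC1 are always overlong */
        cp = s[0] & 0x1F;
        len = 2;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) { /* 3 bytes */
        cp = s[0] & 0x0F;
        len = 3;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) { /* 4 bytes; highest valid lead byte is 0xF4 */
        cp = s[0] & 0x07;
        len = 4;
    } else {
        /* Stray continuation byte (0x80..0xBF) or illegal lead byte
         * 0xF5..0xFF, the range the report says uClibc accepts. */
        return -1;
    }

    if (n < (size_t)len)
        return -1;                             /* truncated sequence */

    for (int i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)             /* must be 10xxxxxx */
            return -1;
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong 3- and 4-byte forms (the 2-byte case is handled
     * by the 0xC2 lower bound above). */
    if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000))
        return -1;
    /* Surrogates are not Unicode scalar values. */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return -1;
    /* Scalar values stop at 0x10FFFF. */
    if (cp > 0x10FFFF)
        return -1;

    *out = cp;
    return len;
}
```

For example, F4 8F BF BF decodes to U+10FFFF with length 4. In contrast, F5 80 80 80 and the old six-byte form FD BF BF BF BF BF both return -1. An RFC 2279-style decoder would read the latter as 0x7fffffff.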

There are no notes attached to this issue.

Issue History
Date Modified    Username   Field        Change
02-06-06 00:59   rfelker    New Issue
02-06-06 00:59   rfelker    Status       new => assigned
02-06-06 00:59   rfelker    Assigned To  => uClibc
02-18-06 11:49   vapier     Category     Standards Compliance => Internationalization / Localization
10-11-08 23:50   carmelo73  Assigned To  uClibc => carmelo73

