BusyBox Bug and Patch Tracking

ID: 0000713
Category: [uClibc] Internationalization / Localization
Severity: minor
Reproducibility: always
Date Submitted: 02-09-06 23:16
Last Update: 10-11-08 23:50
Reporter: rfelker
View Status: public
Assigned To: carmelo73
Priority: normal
Resolution: open
Status: assigned
Summary: 0000713: UTF-8 encoder/decoder reject U+FFFE, U+FFFF
Description: These code points are noncharacters, but their UTF-8 encodings are not invalid sequences. UTF-8 encodes 'Unicode scalar values', which can be any integer in the range 0-0xd7ff or 0xe000-0x10ffff. See:

http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
Section 3.9, pages 73-74 (PDF pages 20-21)

Read the last bullet point under D29:

To ensure that the mapping for a Unicode encoding form is one-to-one, all Unicode scalar values, including those corresponding to noncharacter code points and unassigned code points, must be mapped to unique code unit sequences. Note that this requirement does not extend to high-surrogate and low-surrogate code points, which are excluded by definition from the set of Unicode scalar values.
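
For illustration, the range check described above can be sketched in C as follows. This is only a minimal sketch under the assumption that an encoder should accept every Unicode scalar value; the names is_scalar_value and utf8_encode are hypothetical and are not uClibc functions.

```c
#include <stddef.h>
#include <stdint.h>

/* A Unicode scalar value is any code point outside the surrogate
 * range U+D800..U+DFFF. Noncharacters such as U+FFFE and U+FFFF
 * are scalar values. */
static int is_scalar_value(uint32_t cp)
{
    return cp <= 0xD7FF || (cp >= 0xE000 && cp <= 0x10FFFF);
}

/* Hypothetical helper, not uClibc's code: writes the UTF-8 form of
 * cp into out (which must hold at least 4 bytes) and returns the
 * number of bytes written, or 0 if cp is not a scalar value. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (!is_scalar_value(cp))
        return 0;                       /* surrogate or above U+10FFFF */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* includes U+FFFE and U+FFFF */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

With this sketch, U+FFFE and U+FFFF encode to the three-byte sequences EF BF BE and EF BF BF, while a surrogate such as U+D800 is rejected.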

- Notes
(0001086)
rfelker
02-12-06 18:14

ISO-10646 seems to allow, and perhaps recommend, rejecting these code points, which conflicts with Unicode. I would say either behavior is acceptable and uClibc's may be preferable, so perhaps this bug should be closed.
 

- Issue History
Date Modified    Username    Field                  Change
02-09-06 23:16   rfelker     New Issue
02-09-06 23:16   rfelker     Status                 new => assigned
02-09-06 23:16   rfelker     Assigned To            => uClibc
02-12-06 18:14   rfelker     Note Added: 0001086
10-11-08 23:50   carmelo73   Assigned To            uClibc => carmelo73

