0000713: UTF-8 encoder/decoder reject U+FFFE, U+FFFF - BusyBox Bug and Patch Tracking

ID	0000713
Category	[uClibc] Internationalization / Localization
Severity	minor
Reproducibility	always
Date Submitted	02-09-06 23:16
Last Update	10-11-08 23:50
Reporter	rfelker
View Status	public
Assigned To	carmelo73
Priority	normal
Resolution	open
Status	assigned
Product Version	

Summary

0000713: UTF-8 encoder/decoder reject U+FFFE, U+FFFF

Description

U+FFFE and U+FFFF are noncharacters, but their UTF-8 encodings are not invalid sequences. UTF-8 encodes 'Unicode scalar values', which can be any integer in the range 0-0xd7ff or 0xe000-0x10ffff. See:

http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
Section 3.9, pages 73-74 (PDF pages 20-21)

Read the last bullet point under D29:

To ensure that the mapping for a Unicode encoding form is one-to-one, all Unicode scalar values, including those corresponding to noncharacter code points and unassigned code points, must be mapped to unique code unit sequences. Note that this requirement does not extend to high-surrogate and low-surrogate code points, which are excluded by definition from the set of Unicode scalar values.
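
For illustration, the behavior requested above can be sketched in C as follows. This is not uClibc's code: is_scalar_value() and utf8_encode() are hypothetical names made up for this example, and the sketch only assumes the scalar value ranges quoted in the description. It rejects surrogates and values above 0x10ffff, but encodes noncharacters such as U+FFFE and U+FFFF like any other scalar value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a uClibc function): nonzero if cp is a
 * Unicode scalar value, i.e. in 0..0xD7FF or 0xE000..0x10FFFF.
 * Surrogates (0xD800..0xDFFF) are excluded; noncharacters are not. */
static int is_scalar_value(uint32_t cp)
{
    return cp <= 0xD7FF || (cp >= 0xE000 && cp <= 0x10FFFF);
}

/* Hypothetical encoder (not a uClibc function): writes the UTF-8 form
 * of cp to out, which must have room for 4 bytes, and returns the
 * number of bytes written, or 0 if cp is not a scalar value. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (!is_scalar_value(cp))
        return 0;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

With this sketch, utf8_encode(0xFFFE, buf) yields the three bytes EF BF BE and utf8_encode(0xFFFF, buf) yields EF BF BF, while utf8_encode(0xD800, buf) returns 0 because surrogates are not scalar values.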

Additional Information

Attached Files

Relationships

Notes
(0001086) rfelker 02-12-06 18:14	ISO-10646 seems to allow and maybe recommend this behavior, conflicting with Unicode. I would say either is acceptable and uClibc's behavior may be preferable, so perhaps this bug should be closed.

Issue History
Date Modified	Username	Field	Change
02-09-06 23:16	rfelker	New Issue
02-09-06 23:16	rfelker	Status	new => assigned
02-09-06 23:16	rfelker	Assigned To	=> uClibc
02-12-06 18:14	rfelker	Note Added: 0001086
10-11-08 23:50	carmelo73	Assigned To	uClibc => carmelo73