0000687: utf-8 mbrtowc accepts invalid bytes - BusyBox Bug and Patch Tracking

ID	0000687
Category	[uClibc] Internationalization / Localization
Severity	minor
Reproducibility	always
Date Submitted	02-06-06 00:59
Last Update	10-11-08 23:50
Reporter	rfelker
View Status	public
Assigned To	carmelo73
Priority	normal
Resolution	open
Status	assigned
Product Version

Summary

0000687: utf-8 mbrtowc accepts invalid bytes

Description

According to section 3.9 of the Unicode Standard, UTF-8 is a mapping between byte sequences and "Unicode scalar values", which are integers in one of the ranges 0-0xd7ff or 0xe000-0x10ffff. The standard is clear that UTF-8 sequences are one to four bytes in length. uClibc's mbrtowc accepts the illegal lead bytes 0xf5-0xfd, which allows 4-byte sequences beyond 0x10ffff as well as 5- and 6-byte sequences, for code points up to 0x7fffffff.

Although the two standards conflicted in the past, my understanding is that ISO/IEC 10646 now agrees that UCS code points go only up to 0x10ffff and that UTF-8 is a 1-4 byte encoding, not 1-6 byte.
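
For illustration, here is a minimal sketch of the strict behaviour described above. It is not uClibc code and not a proposed patch: strict_utf8_decode is a hypothetical standalone helper, and its return convention (bytes consumed, or 0 on any error) is an assumption made for brevity. It rejects lead bytes 0xf5 and above, overlong forms, surrogates, and anything past 0x10ffff.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of uClibc): decode one UTF-8 sequence
 * from s[0..n) into *out. Returns the number of bytes consumed (1-4),
 * or 0 if the sequence is invalid or truncated.
 */
size_t strict_utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    uint32_t cp, min;
    size_t len, i;

    if (n == 0)
        return 0;

    if (s[0] < 0x80) {                          /* 1 byte: ASCII */
        *out = s[0];
        return 1;
    } else if (s[0] >= 0xc2 && s[0] <= 0xdf) {  /* 2-byte lead */
        len = 2; cp = s[0] & 0x1f; min = 0x80;
    } else if (s[0] >= 0xe0 && s[0] <= 0xef) {  /* 3-byte lead */
        len = 3; cp = s[0] & 0x0f; min = 0x800;
    } else if (s[0] >= 0xf0 && s[0] <= 0xf4) {  /* 4-byte lead */
        len = 4; cp = s[0] & 0x07; min = 0x10000;
    } else {
        /* 0x80-0xc1 (stray continuation or overlong lead) and
         * 0xf5-0xff (would start sequences beyond 0x10ffff) */
        return 0;
    }

    if (n < len)
        return 0;                               /* truncated input */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80)
            return 0;                           /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3f);
    }

    if (cp < min)
        return 0;                               /* overlong encoding */
    if (cp >= 0xd800 && cp <= 0xdfff)
        return 0;                               /* surrogate, not a scalar value */
    if (cp > 0x10ffff)
        return 0;                               /* e.g. 0xf4 0x90 0x80 0x80 */

    *out = cp;
    return len;
}
```

Note that a real mbrtowc has to distinguish an incomplete sequence (return value (size_t)-2) from an invalid one ((size_t)-1, with errno set to EILSEQ) and keep state across calls; the sketch collapses both cases into 0 to keep the range checks easy to read.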

There are no notes attached to this issue.

Issue History
Date Modified	Username	Field	Change
02-06-06 00:59	rfelker	New Issue
02-06-06 00:59	rfelker	Status	new => assigned
02-06-06 00:59	rfelker	Assigned To	=> uClibc
02-18-06 11:49	vapier	Category	Standards Compliance => Internationalization / Localization
10-11-08 23:50	carmelo73	Assigned To	uClibc => carmelo73