| ID | Category | Severity | Reproducibility | Date Submitted | Last Update |
| 0000687 | [uClibc] Internationalization / Localization | minor | always | 02-06-06 00:59 | 10-11-08 23:50 |
| Reporter | rfelker | View Status | public |
| Assigned To | carmelo73 |
| Priority | normal | Resolution | open |
| Status | assigned | Product Version | |
| Summary | 0000687: utf-8 mbrtowc accepts invalid bytes |
| Description |
According to section 3.9 of the Unicode Standard, UTF-8 is a mapping between byte sequences and "Unicode scalar values", which are integers in one of the ranges 0-0xd7ff or 0xe000-0x10ffff. The standard is clear that UTF-8 sequences are one to four bytes long. uClibc's mbrtowc accepts the illegal lead bytes 0xf5-0xfd, which yields out-of-range 4-byte sequences as well as 5- and 6-byte sequences for code points up to 0x7fffffff. Although the standards conflicted in the past, my understanding is that ISO/IEC 10646 now agrees that UCS codes go only up through 0x10ffff and that UTF-8 is a 1-4 byte encoding, not a 1-6 byte one.
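To make the expected behaviour concrete, here is a minimal sketch of a strict decoder that follows the rules described above. It is not uClibc's code and not a proposed patch; the name `utf8_decode_strict` and its return convention (bytes consumed, or 0 on invalid or truncated input) are assumptions chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of uClibc): decode one UTF-8 sequence
 * from s[0..n). On success, store the scalar value in *out and return
 * the number of bytes consumed (1-4). Return 0 if the input is invalid
 * or truncated. */
size_t utf8_decode_strict(const unsigned char *s, size_t n, uint32_t *out)
{
    uint32_t cp, min;
    size_t len, i;

    if (n == 0)
        return 0;

    if (s[0] < 0x80) {                          /* ASCII */
        *out = s[0];
        return 1;
    } else if (s[0] >= 0xc2 && s[0] <= 0xdf) {  /* 2-byte lead */
        len = 2; cp = s[0] & 0x1f; min = 0x80;
    } else if (s[0] >= 0xe0 && s[0] <= 0xef) {  /* 3-byte lead */
        len = 3; cp = s[0] & 0x0f; min = 0x800;
    } else if (s[0] >= 0xf0 && s[0] <= 0xf4) {  /* 4-byte lead */
        len = 4; cp = s[0] & 0x07; min = 0x10000;
    } else {
        /* 0x80-0xc1 and 0xf5-0xff can never start a valid sequence. */
        return 0;
    }

    if (n < len)
        return 0;                               /* truncated input */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80)
            return 0;                           /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3f);
    }

    if (cp < min)
        return 0;                               /* overlong encoding */
    if (cp >= 0xd800 && cp <= 0xdfff)
        return 0;                               /* surrogate, not a scalar value */
    if (cp > 0x10ffff)
        return 0;                               /* beyond the Unicode range */

    *out = cp;
    return len;
}
```

With this sketch, the bytes f4 8f bf bf decode to 0x10ffff, while f4 90 80 80 and anything starting with 0xf5-0xfd are rejected, which is the behaviour this report asks of mbrtowc in a UTF-8 locale.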
| There are no notes attached to this issue. |