BusyBox Bug and Patch Tracking

ID: 0000719
Category: [uClibc] Internationalization / Localization
Severity: major
Reproducibility: always
Date Submitted: 02-12-06 23:09
Last Update: 02-12-06 23:09
Reporter: rfelker
View Status: public
Assigned To: uClibc
Priority: normal
Resolution: open
Status: assigned
Product Version:
Summary: 0000719: Many non-European letters are classified as non-alphabetic
Description: uClibc inherits this bug from glibc, which derives the alphabetic property incorrectly. Besides the letter general categories (L*) and Nl, Unicode's Alphabetic property also includes every character listed as "Other_Alphabetic" in http://www.unicode.org/Public/UNIDATA/PropList.txt. These are mostly combining marks (Mn and Mc), including the vowel signs of South Asian scripts, which are certainly letters. Arguably all combining marks should be in class alpha (otherwise decomposed alphabetic strings with accents or diacritics will be classified non-alphabetic), but the ones in Other_Alphabetic MUST be included.

Because most words in most South Asian scripts contain at least one such vowel sign, this bug causes those words to be classified non-alphabetic; I therefore consider it major.
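A minimal C sketch of the derivation described above, assuming a hypothetical `enum gen_cat` and a hand-copied excerpt of Other_Alphabetic covering only U+093E..U+094C (the Devanagari vowel signs AA through AU). A real implementation would load the full PropList.txt data. `alpha_letters_only` illustrates a letters-only rule; it is not glibc's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of Unicode general categories. */
enum gen_cat { GC_Lu, GC_Ll, GC_Lt, GC_Lm, GC_Lo, GC_Nl, GC_Mn, GC_Mc, GC_Other };

struct cp_range { uint32_t first, last; };

/* Illustrative excerpt only: U+093E..U+094C (Devanagari vowel signs AA..AU)
 * are Other_Alphabetic. A real table must be generated from PropList.txt. */
static const struct cp_range other_alphabetic_excerpt[] = {
    { 0x093E, 0x094C },
};

/* Linear scan; fine for a tiny excerpt. */
static bool in_ranges(uint32_t cp, const struct cp_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cp >= r[i].first && cp <= r[i].last)
            return true;
    return false;
}

/* Letters-only rule: L* and Nl, ignoring Other_Alphabetic. */
static bool alpha_letters_only(enum gen_cat gc)
{
    switch (gc) {
    case GC_Lu: case GC_Ll: case GC_Lt: case GC_Lm: case GC_Lo: case GC_Nl:
        return true;
    default:
        return false;
    }
}

/* Unicode's Alphabetic property: L*, Nl, plus Other_Alphabetic. */
static bool alpha_unicode(uint32_t cp, enum gen_cat gc)
{
    return alpha_letters_only(gc) ||
           in_ranges(cp, other_alphabetic_excerpt,
                     sizeof other_alphabetic_excerpt / sizeof other_alphabetic_excerpt[0]);
}
```

For example, `alpha_unicode(0x093E, GC_Mc)` is true while `alpha_letters_only(GC_Mc)` is false, which is the gap this report describes.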
Additional Information: DerivedCoreProperties.txt from Unicode contains the full list of characters that Unicode considers Alphabetic. IMO that list is insufficient, but it is the minimum that must be included.
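Data lines in DerivedCoreProperties.txt and PropList.txt have the form `0041..005A    ; Alphabetic # L&  [26] ...`, where a single code point may appear instead of a range and everything after `#` is a comment. The function below is a hypothetical sketch of a parser for one such line; its name and interface are assumptions, not part of uClibc.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parses one line of a UCD property file. If the line's property field
 * equals `want`, stores the code point range in *first/*last and returns
 * true. Comment lines, blank lines, and other properties return false. */
static bool parse_ucd_line(const char *line, const char *want,
                           uint32_t *first, uint32_t *last)
{
    char *end;
    unsigned long lo, hi;

    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '\0' || *line == '#' || *line == '\n')
        return false;

    /* First code point (hex), optionally followed by "..last". */
    lo = strtoul(line, &end, 16);
    if (end == line)
        return false;
    hi = lo;
    if (end[0] == '.' && end[1] == '.') {
        const char *p = end + 2;
        hi = strtoul(p, &end, 16);
        if (end == p)
            return false;
    }

    /* Expect ';' then the property name. */
    while (*end == ' ' || *end == '\t')
        end++;
    if (*end != ';')
        return false;
    end++;
    while (*end == ' ' || *end == '\t')
        end++;

    /* The name runs until whitespace, '#', or end of line; compare exactly
     * so that "Other_Alphabetic" does not match "Alphabetic". */
    size_t len = strcspn(end, " \t#\r\n");
    if (len != strlen(want) || strncmp(end, want, len) != 0)
        return false;

    *first = (uint32_t)lo;
    *last = (uint32_t)hi;
    return true;
}
```

Feeding every line of DerivedCoreProperties.txt through this with `want = "Alphabetic"` yields the minimal set of ranges the report says must be alphabetic.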

Unless glibc also fixes its bug, this cannot easily be fixed in uClibc without processing the Unicode data directly instead of mirroring glibc.
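Processing the Unicode data directly ends with a lookup table. A minimal sketch, with hypothetical names and no relation to uClibc's real locale table format: sort the extracted ranges, merge overlapping or adjacent ones, then answer "is this code point alphabetic?" by binary search.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct ucd_range { uint32_t first, last; };

/* qsort comparator: order ranges by their first code point. */
static int cmp_range(const void *a, const void *b)
{
    const struct ucd_range *x = a, *y = b;
    return (x->first > y->first) - (x->first < y->first);
}

/* Sorts ranges and merges overlapping or adjacent ones in place.
 * Returns the number of disjoint ranges left at the front of r. */
static size_t normalize_ranges(struct ucd_range *r, size_t n)
{
    if (n == 0)
        return 0;
    qsort(r, n, sizeof r[0], cmp_range);
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        if (r[i].first <= r[out].last + 1) {
            /* Overlaps or touches the current range: extend it. */
            if (r[i].last > r[out].last)
                r[out].last = r[i].last;
        } else {
            r[++out] = r[i];
        }
    }
    return out + 1;
}

/* Binary search over sorted, disjoint ranges. */
static bool range_lookup(uint32_t cp, const struct ucd_range *r, size_t n)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < r[mid].first)
            hi = mid;
        else if (cp > r[mid].last)
            lo = mid + 1;
        else
            return true;
    }
    return false;
}
```

Merging first keeps the table small and guarantees the disjoint, sorted order that the binary search relies on; for example, ranges 0041..005A and 005B..005B collapse into 0041..005B.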

There are no notes attached to this issue.

Issue History
Date Modified   Username  Field        Change
02-12-06 23:09  rfelker   New Issue
02-12-06 23:09  rfelker   Status       new => assigned
02-12-06 23:09  rfelker   Assigned To  => uClibc

