This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Broken utf-8 decomposed display

wouter
Registered Member
Posts
4
Karma
0

Broken utf-8 decomposed display

Thu Oct 21, 2010 8:30 pm
The Konsole shipped with SUSE 10 (a few years old; I don't know the version) displays UTF-8 characters correctly.
The new one (Konsole 2.5, KDE 4.5.1, Ubuntu 10.10) does not display accents if they are decomposed, i.e. written as a separate base character (say o, e or a) followed by a combining accent, which in UTF-8 usually takes two more bytes. Characters that include the accent (the precomposed form, as used in the Latin-1 range of Unicode) are displayed correctly.
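To illustrate the difference, here is a small sketch using Python's standard `unicodedata` module (Python 3.8+ for `bytes.hex` with a separator). It shows the same letter é once precomposed and once decomposed, together with the resulting UTF-8 bytes. The decomposed bytes match the "Délire" line in the examples further down.

```python
import unicodedata

composed = "\u00e9"                                   # é as one precomposed code point
decomposed = unicodedata.normalize("NFD", composed)   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(composed.encode("utf-8").hex(" "))    # c3 a9
print(decomposed.encode("utf-8").hex(" "))  # 65 cc 81

# NFC turns the decomposed form back into the precomposed one.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

Both forms are valid UTF-8 for the same text, so a terminal has to handle the decomposed one as well.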

Another problem: if a character is separated from its combining accent by a combining grapheme joiner, the accent is drawn on the next character instead of the previous one.
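The expected behaviour can be sketched with a simplified model. Assume that each base character takes one terminal cell and that every nonspacing mark (Unicode category Mn, which includes U+034F COMBINING GRAPHEME JOINER) or enclosing mark (Me) is drawn in the cell of the character before it. The function name `layout_cells` is hypothetical; this is not Konsole's code and it ignores wide characters and spacing marks.

```python
import unicodedata
from typing import List


def layout_cells(text: str) -> List[str]:
    """Hypothetical sketch: group text into terminal cells.

    Nonspacing (Mn) and enclosing (Me) marks are attached to the cell of
    the preceding character instead of opening a new cell.
    """
    cells: List[str] = []
    for ch in text:
        if cells and unicodedata.category(ch) in ("Mn", "Me"):
            cells[-1] += ch   # attach the mark to the previous cell
        else:
            cells.append(ch)  # a base character starts a new cell
    return cells


# The joiner and the diaeresis both end up in the cell of the last 'e',
# not in the cell of the following 'n'.
print(layout_cells("Monografiee\u034f\u0308n")[-2:])
```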

Note that this is not caused by the editor or the environment settings: with the same environment and the same editor (vim) but a different terminal (gnome-terminal or PuTTY), the display is correct.
bcooksley
Administrator
Posts
19765
Karma
87

Re: Broken utf-8 decomposed display

Fri Oct 22, 2010 3:10 am
Can you please post some decomposed letters so I can test this?


KDE Sysadmin
wouter
Registered Member
Posts
4
Karma
0

Re: Broken utf-8 decomposed display

Fri Oct 29, 2010 8:52 am
Hi

Examples below, each followed by its UTF-8 encoding in hex.

Note that there is also a problem with "CD 8F", the combining grapheme joiner, as used in Monografieen below. The combining diaeresis that follows the "CD 8F" should be placed on the preceding 'e', but Konsole draws it above the following 'n' instead.

(Konsole is not the only terminal that makes this mistake.)
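Why the joiner is there at all: U+034F COMBINING GRAPHEME JOINER has canonical combining class 0, so Unicode normalization will not merge the 'e' and the diaeresis across it. It is nevertheless a nonspacing mark itself, so the 'e', the joiner and the diaeresis still form one grapheme cluster, i.e. one character on screen. A minimal check with Python's standard `unicodedata` module (variable names are illustrative only):

```python
import unicodedata

plain = "e\u0308"          # e + COMBINING DIAERESIS
joined = "e\u034f\u0308"   # e + COMBINING GRAPHEME JOINER (CD 8F) + COMBINING DIAERESIS

# NFC composes the plain pair into the single code point U+00EB (ë) ...
print(unicodedata.normalize("NFC", plain) == "\u00eb")   # True
# ... but the joiner (canonical combining class 0) blocks that composition,
print(unicodedata.normalize("NFC", joined) == joined)    # True
# while still being a nonspacing mark (category Mn) rather than a base character.
print(unicodedata.category("\u034f"), unicodedata.combining("\u034f"))  # Mn 0
```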




B♭ HEX:42 E2 99 AD 20
Kamchatskai︠a︡ oblastʹ HEX:4B 61 6D 63 68 61 74 73 6B 61 69 EF B8 A0 61 EF B8 A1 20 6F 62 6C 61 73 74 CA B9 20
Systèmes HEX:53 79 73 74 65 CC 80 6D 65 73 20
Délire, HEX:44 65 CC 81 6C 69 72 65
Enquêtes HEX:45 6E 71 75 65 CC 82 74 65 73 20
Niños HEX:4E 69 6E CC 83 6F 73
Sagarmāthā HEX:53 61 67 61 72 6D 61 CC 84 74 68 61 CC 84 20
Chăm HEX:43 68 61 CC 86 6D 20
İznik HEX:49 CC 87 7A 6E 69 6B 20
Südafrika HEX:53 75 CC 88 64 61 66 72 69 6B 61 20
Ahiṃsā, HEX:41 68 69 6D CC A3 73 61 CC 84
Prons̤ticos, HEX:50 72 6F 6E 73 CC A4 74 69 63 6F 73
Chișinău HEX:43 68 69 73 CC A6 69 6E 61 CC 86 75 20
français HEX:66 72 61 6E 63 CC A7 61 69 73
przestępczość HEX:70 72 7A 65 73 74 65 CC A8 70 63 7A 6F 73 CC 81 63 CC 81
Điện Biên Phủ HEX: C4 90 69 65 CC A3 CC 82 6E 20 42 69 65 CC 82 6E 20 50 68 75 CC 89
Compaį́as consolidadas, HEX: 43 6F 6D 70 61 69 CC A8 CC 81 61 73 20 63 6F 6E 73 6F 6C 69 64 61 64 61 73
Phra Nnakhō̜n Sī ʻAyutthayā, HEX:50 68 72 61 20 4E 6E 61 6B 68 6F CC 9C CC 84 6E 20 53 69 CC 84 20 CA BB 41 79 75 74 74 68 61 79 61 CC 84
Monografiee͏̈n HEX:4D 6F 6E 6F 67 72 61 66 69 65 65 CD 8F CC 88 6E
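To double-check byte listings like the ones above, a small Python sketch can decode a space-separated hex string as UTF-8 and print the Unicode name of every code point. The helper name `describe` is hypothetical; `bytes.fromhex` and `unicodedata.name` are standard library calls.

```python
import unicodedata
from typing import List


def describe(hex_bytes: str) -> List[str]:
    """Decode space-separated UTF-8 hex bytes and name every code point."""
    text = bytes.fromhex(hex_bytes).decode("utf-8")
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# "Prons̤ticos": the CC A4 pair is the mark under the 's'.
for line in describe("50 72 6F 6E 73 CC A4 74 69 63 6F 73"):
    print(line)
```

Run on "CD 8F" or "EF B8 A0", it names the combining grapheme joiner and the left half of the combining ligature tie, respectively.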
bcooksley
Administrator
Posts
19765
Karma
87

Re: Broken utf-8 decomposed display

Fri Oct 29, 2010 8:11 pm
Is the following correct? http://imagebin.ca/view/xLtXA2W.html


wouter
Registered Member
Posts
4
Karma
0

Re: Broken utf-8 decomposed display

Fri Oct 29, 2010 8:24 pm
Most of it is correct; in fact most of the combining accents look OK now, but some are still wrong. (I don't know these languages either, so I can only go by the pictures.)


Some accents are missing in the first line (Kamchatskai︠a︡): these are the three-byte sequences EF B8 A0 and EF B8 A1.
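One difference between these two marks and the others is the following. The combining marks in the examples that did render all lie in the Combining Diacritical Marks block (U+0300 to U+036F), which takes two bytes in UTF-8. U+FE20 and U+FE21 belong to the Combining Half Marks block (U+FE20 to U+FE2F) and take three bytes. Byte length cannot be the whole story, though, since CC A4 (U+0324) is also two bytes and is missing too. Whether Konsole's handling depends on this is not established in this thread. The sketch below computes the UTF-8 length of a code point following the ranges in RFC 3629. The function name `utf8_length` is hypothetical.

```python
def utf8_length(cp: int) -> int:
    """Number of bytes UTF-8 needs for code point cp (ranges per RFC 3629)."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


# Combining Diacritical Marks (U+0300..U+036F) need two bytes ...
print(utf8_length(0x0301), chr(0x0301).encode("utf-8").hex(" "))  # 2 cc 81
# ... Combining Half Marks (U+FE20..U+FE2F) need three.
print(utf8_length(0xFE20), chr(0xFE20).encode("utf-8").hex(" "))  # 3 ef b8 a0
```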


The accent CC A4 is missing in Pronsticos (a kind of diaeresis below the 's').

The word Monografieen should have a diaeresis above the last 'e'.

Thanks so far...
bcooksley
Administrator
Posts
19765
Karma
87
OS

Re: Broken utf-8 decomposed display

Tue Nov 02, 2010 5:13 am
I suggest filing a bug report. Please include characters like you did here, and a zoomed screenshot of how it does display and how it should.


wouter
Registered Member
Posts
4
Karma
0

Re: Broken utf-8 decomposed display

Tue Nov 16, 2010 8:39 pm
A bug report has been filed on bugs.kde.org:

Bug 255862 - Broken utf8-decompsed display

