
[SOLVED] Transfer to media device messes up unicode utf-8 tags

nobody
I have a Unicode (UTF-8) tag problem, and I'm not yet sure where it originates.

My procedure:
Rip the CD to MP3 with KAudioCreator or K3b; both use lame for encoding and tag writing. Tag entries come from CDDB or freedb.
Use Amarok (1.4.8) to send the files to my Creative Zen over the MTP protocol (libmtp 0.2.4).

Problem:
The transfer queue garbles titles that contain non-ASCII characters. They are displayed as if they were ISO-8859-1, both in the transfer queue and on the Zen.
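
To illustrate the symptom, here is a minimal Python sketch of what happens when UTF-8 bytes are interpreted as ISO-8859-1. The function name and the sample title are made up for illustration; nothing here is taken from Amarok or libmtp.

```python
def misread_as_latin1(title: str) -> str:
    """Encode a title as UTF-8, then decode those bytes as ISO-8859-1.

    This mimics a reader that trusts an ISO-8859-1 label on UTF-8 data.
    """
    return title.encode("utf-8").decode("iso-8859-1")


# Hypothetical title: each non-ASCII character turns into two characters.
print(misread_as_latin1("Sinä"))  # prints "SinÃ¤"
```

Plain ASCII titles pass through unchanged, which matches the observation that only titles with non-ASCII characters are affected.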

System:
Kubuntu Feisty (UTF-8 locales)

My research so far:
- I don't know which version of ID3v2 lame writes. Only 2.4 supports UTF-8 (http://en.wikipedia.org/wiki/Id3_tag#ID3v2), and I'm fairly sure, though not 100% certain, that lame uses 2.4. I pass --id3v2-only in the lame options.
- id3v2 (the command-line utility) shows the titles correctly as long as my Konsole uses UTF-8 encoding. So far so good.
- The Zen should be able to display UTF-8, but I'm not 100% sure about that either.
- mp3unicode (another command-line utility) claims to convert tags to Unicode, but so far I haven't been able to use it successfully; it just deletes all tags.
- EasyTAG (GUI) has all kinds of Unicode options for tags, but I only managed to mess up the titles with it.
- The Perl script (http://amarok.kde.org/wiki/FAQ#Amarok_i ... roperly.21) didn't help; it either garbled the tag or failed.
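
On the open question of which ID3v2 version a file carries: an ID3v2 tag begins with a 10-byte header whose first three bytes are "ID3", followed by a major version byte and a revision byte. A rough Python sketch for reading them follows; the function names are my own, and the file path is whatever MP3 you want to inspect.

```python
def id3v2_version(header: bytes):
    """Return (major, revision) from the first 10 bytes of a file.

    Returns None when the bytes do not start with an ID3v2 header.
    """
    if len(header) < 10 or header[:3] != b"ID3":
        return None
    return header[3], header[4]


def read_id3v2_version(path):
    """Read the first 10 bytes of the file at `path` and parse them."""
    with open(path, "rb") as f:
        return id3v2_version(f.read(10))
```

A result of (3, 0) would mean ID3v2.3 and (4, 0) would mean ID3v2.4.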

Goal:
I have lots of CDs I want to transfer to the Zen. I don't want any extra hassle, such as Perl magic or other utilities. I just want to rip and then use Amarok to transfer.

Possible solutions:
- Force lame to use ISO-8859-1. I'm not quite sure where the UTF-8 titles originate, though. From CDDB, I guess, so I would have to modify lame to convert the command-line arguments to ISO-8859-1 before writing the tags.
- Just bite the bullet and find some way to convert a batch of MP3 files to ISO-8859-1 tags (which is enough for the titles in my CD collection).
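
Either option only works if every title can actually be represented in ISO-8859-1. A small Python sketch for checking that is below; the helper name and sample titles are hypothetical.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of `text` exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical titles: the first fits, the second does not (ř is outside ISO-8859-1).
for title in ["Sinä", "Dvořák"]:
    print(title, fits_latin1(title))
```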


Thanks for listening!

Last edited by nobody on Sun Jan 20, 2008 6:38 am, edited 1 time in total.
nobody
The problem was with lame. It marks the tag encoding as ISO-8859-1 even though the strings it writes are UTF-8, so I patched it to mark them as UTF-8, and now everything works.

Patch attached for libmp3lame/id3tag.c, if anyone wishes to do the same.

NOTE! The proper fix would add a command-line option that lets the user specify the encoding, or perhaps some logic to detect the encoding of a given string, but for my purposes this was enough.

From id3.org, here is what the v2.4 spec says about the encoding byte:

    $00  ISO-8859-1 [ISO-8859-1]. Terminated with $00.
    $01  UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
          strings in the same frame SHALL have the same byteorder.
          Terminated with $00 00.
    $02  UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
          Terminated with $00 00.
    $03  UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.
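
To make the table concrete, here is a rough Python sketch of the body of a text frame: one encoding byte, then the text, then the terminator from the table. It is only an illustration and not LAME's code. The function names are my own, and the Python codec names are my mapping of the four values (Python's "utf-16" codec writes a BOM when encoding).

```python
# Encoding byte -> (Python codec, terminator), following the table above.
ENCODINGS = {
    0: ("iso-8859-1", b"\x00"),
    1: ("utf-16", b"\x00\x00"),      # with BOM
    2: ("utf-16-be", b"\x00\x00"),   # without BOM
    3: ("utf-8", b"\x00"),
}


def text_frame_body(text: str, encoding_byte: int) -> bytes:
    """Build encoding byte + encoded text + terminator."""
    codec, terminator = ENCODINGS[encoding_byte]
    return bytes([encoding_byte]) + text.encode(codec) + terminator


def decode_text_frame_body(body: bytes) -> str:
    """Decode a body the way a reader that trusts the encoding byte would."""
    codec, terminator = ENCODINGS[body[0]]
    payload = body[1:]
    if payload.endswith(terminator):
        payload = payload[: -len(terminator)]
    return payload.decode(codec)


# UTF-8 bytes labelled with encoding byte 0: the mismatch described above.
mislabelled = b"\x00" + "ä".encode("utf-8") + b"\x00"
print(decode_text_frame_body(mislabelled))              # prints "Ã¤"
print(decode_text_frame_body(text_frame_body("ä", 3)))  # prints "ä"
```

The same UTF-8 bytes decode correctly once the first byte is 3 instead of 0, which is all the patch changes.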
iwakun
Could you explain how to apply this patch? I've never applied one before.
nobody
Well, the easiest way is just to edit the id3tag.c file and find the following lines:

        /* clear 1 encoding descriptor byte to indicate ISO-8859-1 format */
        *frame++ = 0;

and change *frame++ = 0; to

        *frame++ = 3;

I should have created the patch from the root folder of libmp3lame to make it easier to apply. Maybe I'll do that someday and give full instructions on how to apply it.

P.S. The patch also changes the comment, but I omitted that here.
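
If you would rather script the manual edit, here is a rough Python sketch. It assumes id3tag.c contains the comment and the assignment exactly as quoted above, on adjacent lines. The helper name is made up, and this is not the attached patch.

```python
import re

# The comment line quoted above; the assignment is expected on the next line.
COMMENT = "/* clear 1 encoding descriptor byte to indicate ISO-8859-1 format */"


def patch_encoding_byte(source: str) -> str:
    """Change '*frame++ = 0;' to '*frame++ = 3;' on the line after COMMENT.

    Other lines, including other assignments of 0, are left untouched.
    """
    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines[:-1]):
        following = lines[i + 1]
        if COMMENT in line and re.fullmatch(r"\s*\*frame\+\+\s*=\s*0;\s*", following):
            lines[i + 1] = following.replace("0;", "3;", 1)
    return "".join(lines)
```

To use it, read libmp3lame/id3tag.c into a string, pass it through patch_encoding_byte, write the result back, and rebuild lame the way you normally do.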

