
Converting tags to unicode

dangle_wtf
Moderator
Posts
1252
Karma
0

Converting tags to unicode

Fri Jun 30, 2006 11:32 pm
Since Amarok switched to using Unicode exclusively for track tags, there has been a constant stream of queries about various codepages and how to convert the tags.

I noticed this little app on kde-apps this morning, which should help quite a few people.

MP3Unicode
MP3Unicode is a command line utility to convert ID3 tags in mp3 files between different encodings. For example,

mp3unicode --source-encoding cp1251 --id3v1-encoding none --id3v2-encoding unicode file.mp3

will read the ID3v2 tag (or the ID3v1 tag if there is no ID3v2 tag) from the file, convert the text fields in the tag from cp1251 to Unicode, and write the ID3v2 tag back, stripping away the ID3v1 tag.
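
To illustrate what such a conversion amounts to, here is a minimal Python sketch. It is not mp3unicode's code. It assumes the tag bytes were written as cp1251 but displayed as Latin-1 (the single-byte default in ID3 tags), and the function name recode is made up for this example.

```python
def recode(misread: str, assumed: str = "latin-1", actual: str = "cp1251") -> str:
    """Undo a wrong decode: recover the raw bytes, then decode them properly."""
    raw = misread.encode(assumed)  # back to the bytes stored in the tag
    return raw.decode(actual)      # decode with the encoding they were written in


original = "Кино"
raw = original.encode("cp1251")    # bytes as a cp1251-era tagger would store them
garbled = raw.decode("latin-1")    # what a player assuming Latin-1 displays
print(garbled)
print(recode(garbled))
```

The repair only works if the source encoding is named correctly, which is why the tool asks for --source-encoding.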


Please note I'm not the dev of this app, nor is it an Amarok project, so if you have any questions or problems, the best thing would be to either post on the kde-apps page, or contact the author.

And remember: when trying anything new, always work on a COPY of your music first, NEVER your originals.

Hope this helps a few of you!


"There are two theories to arguing with women. Neither one works."
.
If men could get pregnant, we'd learn the true meaning of "screaming nancyboy wuss"
fortezza
Registered Member
Posts
3
Karma
0

Re: Converting tags to unicode

Sat Jul 29, 2006 7:39 pm
Nice idea, but I wonder how the end user is supposed to know what encoding they are coming from? It would be better if the tool could handle that part and read the command as "convert from whatever to unicode". Otherwise, I do not see how it would be useful.
dangle_wtf
Moderator
Posts
1252
Karma
0

Re: Converting tags to unicode

Sun Jul 30, 2006 12:52 am
This would be an issue best addressed on the kde-apps page for the utility, as mentioned in the original message. Amarok has nothing to do with this project. It's merely listed here as a resource.

If anyone has other utilities they've used that are useful, perhaps they could also post them.


eean
KDE Developer
Posts
1016
Karma
0

Re: Converting tags to unicode

Wed Aug 02, 2006 8:59 pm
fortezza wrote:Nice idea, but I wonder how the end user is supposed to know what encoding they are coming from? It would be better if the tool could handle that part and read the command as "convert from whatever to unicode". Otherwise, I do not see how it would be useful.
The most fun part of encodings is that it's impossible to reliably know which one is in use (which is why web browsers often still have that huge list of possible encodings).
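
A small Python sketch of why this is so: the same tag bytes decode without error under several single-byte encodings, so a successful decode proves nothing. The candidate list and the function name valid_decodings are assumptions chosen for the example.

```python
CANDIDATES = ("utf-8", "cp1251", "koi8_r", "latin-1", "cp1252")


def valid_decodings(raw: bytes, candidates=CANDIDATES) -> dict:
    """Return every candidate encoding that decodes raw without an error."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass  # only strict formats such as UTF-8 ever reject the bytes
    return results


raw = "Кино".encode("cp1251")
for enc, text in valid_decodings(raw).items():
    print(enc, text)
```

Only UTF-8 is rejected here. The four single-byte encodings all produce some text, and only a human (or a language-aware heuristic) can tell which one is meaningful.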


Amarok Developer
Gleb Litvjak
Registered Member
Posts
61
Karma
0

Re: Converting tags to unicode

Thu Sep 28, 2006 7:39 am
Well, there IS a way to auto-guess the encoding. I'm using a patched taglib from the rus-xmms project (yes, the project was originally supposed to allow XMMS to recode cp1251 tags, hence the name, but it has since become something more). See http://rusxmms.sourceforge.net/ for details.
nobody
Registered Member
Posts
9
Karma
0

Re: Converting tags to unicode

Sun Jan 20, 2008 6:39 am
If you want to know how to patch lame to produce utf-8 tags, see http://amarok.kde.org/forum/index.php/t ... 972.0.html
areskz
Registered Member
Posts
5
Karma
0

Re: Converting tags to unicode

Sun Jan 27, 2008 12:14 am
Also you can try this:

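Code:
#!/bin/bash
# Convert ID3 tags from CP1251 to Unicode for every .mp3 file under the directory given as $1.
# Quoting "$1" keeps directory names containing spaces intact.
find "$1" -name "*.mp3" -print0 | xargs -0 mid3iconv -e CP1251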


(It requires mutagen, as far as I remember).
news1234
Registered Member
Posts
6
Karma
0

Re: Converting tags to unicode

Tue Nov 11, 2008 9:11 pm
I have many Ukrainian/Russian songs, almost none of which have UTF-8 encoded tags.

For these songs I at least know that the language is one of the two.

I thought about the following semi-automatic approach.

Try several encodings:
- automatically reject all that result in decoding errors
- run the output of each remaining encoding against a Ukrainian/Russian spell checker (I normally know the language up front)
- if there are multiple candidates (several without spelling errors, or none without spelling errors), prompt for the one to choose (ordered by fewest spelling mistakes)
- when a song has been accepted, add all words of the song and the band name to the spell checker

It would probably be faster to just display a list of candidate encodings for each file and select which decoding to use.

Assuming there's no tool doing exactly what I need, I'll probably try to write something small and not user-friendly in Python:
- character recoding is built into Python (the function unicode() and the string method encode())
- the ID3 library can be used to read and modify ID3 tags
- the enchant library could be used to communicate with a spell checker
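
The approach above can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain set of known words stands in for the enchant spell checker, the candidate list is a guess at common Cyrillic encodings, and the function names are invented. Reading and writing the ID3 tags themselves is left out.

```python
CANDIDATES = ("utf-8", "cp1251", "koi8_u", "cp866", "iso8859_5")


def decode_candidates(raw: bytes) -> dict:
    """Step 1: keep only the encodings that decode without errors."""
    results = {}
    for enc in CANDIDATES:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return results


def misspelled(text: str, known_words: set) -> int:
    """Step 2 stand-in: count words the 'spell checker' does not know."""
    return sum(1 for word in text.lower().split() if word not in known_words)


def rank(raw: bytes, known_words: set) -> list:
    """Step 3: order the surviving decodings by fewest spelling mistakes."""
    decoded = decode_candidates(raw)
    return sorted(decoded.items(), key=lambda item: misspelled(item[1], known_words))


def accept(text: str, known_words: set) -> None:
    """Step 4: teach the checker the words of an accepted tag."""
    known_words.update(text.lower().split())


known = {"группа", "крови"}
raw = "Группа крови".encode("koi8_u")
for enc, text in rank(raw, known):
    print(enc, text, misspelled(text, known))
```

A real version would prompt the user whenever the best score is shared by several candidates or is not zero, as described in the list.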

bye

N
markey
KDE Developer
Posts
2286
Karma
3

Re: Converting tags to unicode

Wed Nov 12, 2008 7:04 am
Amarok 2 now has an encoding detector built in (borrowed from Firefox) which is pretty accurate.


--
Mark Kretschmann - Amarok Developer
flying_stranger
Registered Member
Posts
26
Karma
0

Re: Converting tags to unicode

Sat Nov 22, 2008 3:05 am
+1 to news1234.

Amarok 2 does not detect East European encodings in tags
(KOI8-R, KOI8-U, CP1251...)
http://www.picatom.com/r/capture1-6.html


flying_stranger, proud to be a member of KDE forums since 2008-Oct.
donga
Registered Member
Posts
1
Karma
0

Re: Converting tags to unicode

Fri Jan 30, 2009 7:59 pm
Mozilla's charset detector does not detect Thai correctly either. (Many Thai songs are TIS-620 encoded.)

The Amarok team removed my beloved "manual charset selection" feature from Amarok 1.4 (or 1.3? not sure), and I think Mozilla's charset detector was not ready for use.
So I tried to bring it back, but by guessing from the locale. I now have patches for Amarok 1.4.10 and 2.0.x.
These are my patches:
For Amarok 1.4.10: http://linux.thai.net/websvn/wsvn/softw ... tring.diff
For Amarok 2.0.x: http://linux.thai.net/websvn/wsvn/softw ... ocale.diff

They work pretty well for me (for the locale I am using).

Sure, I don't expect the Amarok team to accept my patches, but please kindly consider using another method for detecting the charset instead of Mozilla's charset detector.
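
For readers wondering what "guessing from the locale" can look like, here is a minimal Python sketch. It is not the patch linked above: the language-to-encoding table is a small illustrative assumption, and the function names are invented.

```python
# Hypothetical table: locale language code -> likely legacy tag encoding.
LEGACY_BY_LANGUAGE = {
    "th": "tis_620",
    "ru": "cp1251",
    "uk": "cp1251",
}


def guess_legacy_encoding(locale_name: str, default: str = "latin-1") -> str:
    """Map a locale name such as 'th_TH.UTF-8' to a legacy encoding."""
    language = locale_name.split("_")[0].split(".")[0].lower()
    return LEGACY_BY_LANGUAGE.get(language, default)


def decode_tag(raw: bytes, locale_name: str) -> str:
    """Try UTF-8 first, then fall back to the locale's legacy encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(guess_legacy_encoding(locale_name), errors="replace")


raw = "เพลง".encode("tis_620")
print(decode_tag(raw, "th_TH.UTF-8"))
```

The trade-off is that this works only for tags in the user's own language, which is the case described here.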

Regards,
donga.
markey
KDE Developer
Posts
2286
Karma
3

Re: Converting tags to unicode

Fri Jan 30, 2009 10:45 pm
donga wrote:Mozilla's charset detector does not detect Thai correctly either. (Many Thai songs are TIS-620 encoded.)

The Amarok team removed my beloved "manual charset selection" feature from Amarok 1.4 (or 1.3? not sure), and I think Mozilla's charset detector was not ready for use.
So I tried to bring it back, but by guessing from the locale. I now have patches for Amarok 1.4.10 and 2.0.x.
These are my patches:
For Amarok 1.4.10: http://linux.thai.net/websvn/wsvn/softw ... tring.diff
For Amarok 2.0.x: http://linux.thai.net/websvn/wsvn/softw ... ocale.diff

They work pretty well for me (for the locale I am using).

Sure, I don't expect the Amarok team to accept my patches, but please kindly consider using another method for detecting the charset instead of Mozilla's charset detector.


If you could please send your patch for 2.0.x to amarok-devel@kde.org, we will be happy to review it. 1.4.x is no longer maintained, so we would not patch it.


--
Mark Kretschmann - Amarok Developer
eean
KDE Developer
Posts
1016
Karma
0

Re: Converting tags to unicode

Sat Jan 31, 2009 1:31 am
donga wrote:Mozilla's charset detector does not detect Thai correctly either. (Many Thai songs are TIS-620 encoded.)

The Amarok team removed my beloved "manual charset selection" feature from Amarok 1.4 (or 1.3? not sure), and I think Mozilla's charset detector was not ready for use.
So I tried to bring it back, but by guessing from the locale. I now have patches for Amarok 1.4.10 and 2.0.x.
These are my patches:
For Amarok 1.4.10: http://linux.thai.net/websvn/wsvn/softw ... tring.diff
For Amarok 2.0.x: http://linux.thai.net/websvn/wsvn/softw ... ocale.diff

They work pretty well for me (for the locale I am using).

Sure, I don't expect the Amarok team to accept my patches,

What's wrong with your patch?
but please kindly consider using another method for detecting the charset instead of Mozilla's charset detector.


Are there other methods of charset detection?


Amarok Developer

