This forum has been archived. All content is frozen. Please use KDE Discuss instead.

How to save disk space by eliminating duplication of audio data?

wangyuanzju
I have a really large collection (10K tracks) that takes up a lot of disk space and repeatedly pushes my poor notebook to its limit. I want it to occupy less disk space, but I don't want to lose any music.

After some research, I found that my collection contains a lot of duplicated audio data. That is, the same track can occur several times, in several albums. For example, Bob Dylan's song "The Times They Are A-Changin'" appears on both the album "The Times They Are A-Changin'" and "The Best of Bob Dylan". The two files have different tags but identical audio data. So in theory we only need to keep one copy of the audio (note that we still have to keep two copies of the metadata to record that the song appears on two albums).
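
To illustrate "different tags, same audio": for MP3 files, the tags usually sit in an ID3v2 block at the start of the file and an optional 128-byte ID3v1 block at the end, so one can hash only the bytes in between. The following is a minimal sketch under that assumption. The function names are hypothetical, and it does not handle other containers (FLAC, Ogg, MP4) or other tag types such as APE.

```python
import hashlib

def audio_payload(data: bytes) -> bytes:
    """Return the file bytes with a leading ID3v2 tag and a trailing ID3v1 tag removed."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte synchsafe size.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)  # 7 useful bits per byte
        start = 10 + size
        if data[5] & 0x10:  # footer flag: the tag carries an extra 10-byte footer
            start += 10
    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]

def audio_digest(data: bytes) -> str:
    """MD5 of the audio frames only, ignoring ID3 tags."""
    return hashlib.md5(audio_payload(data)).hexdigest()
```

Two files whose `audio_digest` values match carry byte-identical audio frames even when a digest of the whole file differs because of the tags.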

I then wrote a simple Java program to count these duplicates. The result is striking: 30% of the disk space can be saved if I eliminate them.
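
The saving can be estimated by grouping files by the digest of their audio and counting every copy beyond the first as reclaimable. This is a minimal sketch, not the Java program mentioned above. It assumes each track has already been reduced to a `(path, digest, size)` triple, and the function name and input shape are hypothetical.

```python
from collections import defaultdict

def reclaimable_bytes(tracks):
    """tracks: iterable of (path, audio_digest, size_in_bytes).

    Returns (bytes_saved, fraction_of_total) if only one file per
    digest group is kept. The largest file of each group is the one
    kept, so the estimate is conservative.
    """
    groups = defaultdict(list)
    total = 0
    for path, digest, size in tracks:
        groups[digest].append(size)
        total += size
    saved = 0
    for sizes in groups.values():
        if len(sizes) > 1:
            saved += sum(sizes) - max(sizes)
    fraction = saved / total if total else 0.0
    return saved, fraction
```

The per-album metadata still has to be stored somewhere, so the real saving is slightly smaller than this estimate.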

To achieve this, there are two steps. First, we need a program that finds the duplicates. We cannot simply compare MD5 digests of the audio, because different codecs, or different versions of the same codec, produce different results, not to mention that the tracks may have different bitrates. However, I think I can write a program to do this.
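
When the encodings differ, one option is to decode both tracks to PCM and compare a coarse summary of the signal rather than its bytes. The sketch below compares per-window RMS energy envelopes. It is only an illustration: it assumes the samples are already decoded to mono floats at the same sample rate and that both tracks start at the same point, and all names are hypothetical. Proper acoustic fingerprinting (for example Chromaprint) is much more robust and is not shown here.

```python
import math

def energy_envelope(samples, window=1024):
    """RMS energy of each non-overlapping window of `window` samples."""
    env = []
    for i in range(0, len(samples) - window + 1, window):
        chunk = samples[i:i + window]
        env.append(math.sqrt(sum(x * x for x in chunk) / window))
    return env

def envelope_similarity(a, b, window=1024):
    """Correlation of the two mean-removed envelopes, in [-1, 1]."""
    ea, eb = energy_envelope(a, window), energy_envelope(b, window)
    n = min(len(ea), len(eb))
    if n == 0:
        return 0.0
    ea, eb = ea[:n], eb[:n]
    mean_a, mean_b = sum(ea) / n, sum(eb) / n
    da = [x - mean_a for x in ea]
    db = [y - mean_b for y in eb]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0
```

A value close to 1 suggests that the two tracks are the same recording. The threshold for calling them duplicates would have to be tuned on real data.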

My trouble is the second step: playing this music in Amarok. Since we have to split the metadata and the audio of a song into two files, I think I would need to introduce a new music format (is there an existing format that can do this?). How can I get Amarok to support this new format?
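
To make the idea of splitting metadata from audio concrete, here is a hypothetical layout: each distinct audio payload is stored once under its digest, and each album entry is a small JSON sidecar that points to that digest. This is a made-up illustration, not an existing format and not something Amarok is known to read. The directory names, JSON keys, and function names are all assumptions.

```python
import hashlib
import json
from pathlib import Path

def store_track(library: Path, audio: bytes, tags: dict) -> Path:
    """Write the audio once (keyed by its SHA-256) and a JSON sidecar with the tags."""
    digest = hashlib.sha256(audio).hexdigest()
    blob = library / "audio" / digest
    blob.parent.mkdir(parents=True, exist_ok=True)
    if not blob.exists():  # a second album entry reuses the existing blob
        blob.write_bytes(audio)
    meta_dir = library / "tracks"
    meta_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme; assumes album and title contain no "/".
    sidecar = meta_dir / f"{tags['album']} - {tags['title']}.json"
    sidecar.write_text(json.dumps({"audio": digest, "tags": tags}, indent=2))
    return sidecar

def load_track(library: Path, sidecar: Path):
    """Return (tags, audio_bytes) for one sidecar file."""
    entry = json.loads(sidecar.read_text())
    audio = (library / "audio" / entry["audio"]).read_bytes()
    return entry["tags"], audio
```

Storing the same song under two albums then produces two sidecar files but only one audio blob. A player would still need a plugin or a virtual file system layer to present each sidecar as a playable track.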
wangyuanzju
No answers, huh?

Maybe my question was too lengthy. What I really want to know is this: is there any music format that keeps the metadata and the audio data separate?