This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Slow building collection using MySQL

Tags: None
(comma "," separated)
larand54
Registered Member
Posts
4
Karma
0
I've seen many question like this but never seen any solution or explanation. Anyone who can spread som light over this mess?

I have more than 60000 files so that's one point but should it take 3-5 hours to finish? And even now it seems to stop at 95%. It has an cpu load of 99.5%(mysqld) in this case. Now problem with MySql - I can use it as normal even with this load.

I suspect it can have something with utf8/latin1 and local characters (Sweden äåö).

From the beginning I used Latin1 but have now tried with utf8 but no success. I have the files stored on an USB-drive formatted in ntfs. I uses ntfs-3g for the moment but it was the same problem with the standard drivers.

Hope for some explanation/solution....
qurk
Karma
0
Hmm it's maybe sortof hit and miss.  You are starting from scratch?  I know I had some pain in converting my database from sqlite to mysql, the Japanese letters gave me a lot of grief.  For 60,000 songs?  I've got about half those and waited a long time at least once, so 3 hours maybe, 5 hours would be stretching, imo.  One thing I've learned on my own filesystem is that non-ascii letters (mostly anything but normal letters and some symbols, and that includes "normal" letters with accents and such) are sortof a show stopper.  Hmm I guess I just am here to offer you encouragement, and to suggest not to get angry if all your waiting is for nothing.  There's gotta be a way :)
User avatar
dangle_wtf
Moderator
Posts
1252
Karma
0
Part of it could be the external device, as well as ntfs. When the collection is building, what sort of load is the rest of your system under? It's possible that the actual reading of data is the bottleneck, rather than mysql. My collection of 15k tracks on a nfs network share only take about 10-15 mins to build with mysql.
To be honest, all my tracks are ascii filenames simply to make life easier all round, so it is still possible that other characters might be causing the slowdown. To compare, you could do some experiments with a bunch of tracks that don't have extended (or other encodings) characters in the filenames, compared with the same number of tracks that do.


"There are two theories to arguing with women. Neither one works."
.
If men could get pregnant, we'd learn the true meaning of "screaming nancyboy wuss"
larand54
Registered Member
Posts
4
Karma
0
I finally discovered at least one of the reasons why the Collection handling is so slow.

The collection got a lot of indexed fields and that should be ok if they not used the "lower"-function as in:
Code:
# Query_time: 1  Lock_time: 0  Rows_sent: 0  Rows_examined: 151212
SELECT DISTINCT album.name FROM tags, album, artist WHERE tags.album = album.id AND tags.artist = artist.id AND lower(artist.name) = lower('Jeff Buckley')  AND tags.deviceid IN (25,27,28,29,30,31,-1) ORDER BY lower( album.name );


This usage disables the index and MySQL is forced to search sequentially through the database. This causes a lot of work if you have over 70000 tracks.

My suggestion is to add a duplicate field for those indexed fields. One field that contain the original field value ond one that is normalized like "lower".
This normalized field should only be used in the search functions and not for display.

My collection contains about 73000 tracks, and tested and it took 0.76sec for one search and 0.04sec when I removed the "lower"-function.
I also tested running MySQL query browser on another computer over the network. This gave my a more accurate reading.
0.0011sec against 0.8252sec !! More than 700 times faster!!

I have another example here:
This is recorded in the MySQL logfile  while Amarok playing som music.
Code:
[color=blue]# Query_time: 3  Lock_time: 0  Rows_sent: 0  Rows_examined: 151212
SELECT DISTINCT album.name FROM tags, album, artist WHERE tags.album = album.id AND tags.artist = artist.id AND lower(artist.name) = lower('Nekobeddo')  AND tags.deviceid IN (25,27,28,29,30,31,-1) ORDER BY lower( album.name );
[/color]

I hope that the Amarok developers can look at this. Do they read this forum?

I'm interested in what all users think of this.
Note the time was 3 sec and it has examined 151212 records and no result delivered! A waste of CPU time i would call it ;-)
[size=100][/size]
User avatar
Dr.Diesel
Registered Member
Posts
44
Karma
0
My collection of 110k songs takes about 6-8 hours on a 3Ghz P4 with dedicated RAID 5 hardware, mysqld sucking up 100% CPU.  Each search query from within Amarok takes about 5 seconds each.  Not too bad but I would like to find ways of improving this.  These durations are the same on NFS mounted clients over a gigE network.

I actually ended up removing the "Enter" key function in collectionbroswer.cpp cause it always locked up Amarok attempting to add the entire search query results into the playlist!  I believe there should be an option for removing this functionality.
ankle
Registered Member
Posts
6
Karma
0
I was just looking at this exact same query (used for seeing if any of the Related Artists are in your collection) and came to the same conclusion. In fact, it doesn't make any difference if the lower() is there at all - SELECT queries in MySQL are case-insensitive.

Removing the lower() from the query results in a 100x speed increase: 0.7s to 0.007s (with MySQL's cache disabled).

Filed a bug here: http://bugs.kde.org/show_bug.cgi?id=152749

Last edited by ankle on Thu Nov 22, 2007 11:31 pm, edited 1 time in total.


Bookmarks



Who is online

Registered users: Bing [Bot], Google [Bot], Yahoo [Bot]