This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Collection Scanner cpu load issue

joethefox
Registered Member
Sun Dec 19, 2010 11:16 am
Every minute, for about seven or eight seconds, the CPU load goes to nearly 100% (Intel Core 2 Duo P8400) during the collection update process. My collection has 11,200 tracks in 115 GB. No folders are being updated; my tracks stay the same while this happens.
Before filing a bug report, I want to ask whether this is normal or whether something isn't working well in my configuration.

Amarok
Version 2.4-GIT
Using KDE 4.5.85 (4.6 Beta2)
build date Dec 13 2010


joethefox, proud to be a member of KDE forums since 2008-Oct.
google01103
Manager
Sun Dec 19, 2010 12:08 pm
I can confirm the spikes, though my timings are different (about 4 seconds per spike, and I think more than 1 minute between spikes, but I haven't timed that).

I checked and didn't see a bug report on this issue.

Using Git, compiled this morning.


OpenSuse Leap 42.1 x64, Plasma 5.x

Sentynel
KDE Developer
Sun Dec 19, 2010 12:21 pm
You can turn off watching the collection for changes if this is causing problems.

Is this a new issue? If so, can you work out which commit introduced it?


Dieter Schroeder
Registered Member
The collection scanner has been running in a loop since the rewrite. The old scanner only ran on new entries, so the issue has existed since the 2.4 Git/beta builds. Playback even stops for a few seconds, by the way.


If men could get pregnant, abortion would be a sacrament.
rengels
Registered Member
Hi all,
the new collection scanner is definitely not running in a loop.
It runs every 30 seconds, like before.

The scanner needs to determine the change times of all directories to see whether any tracks were added or removed. With my small collection of 6000 tracks this is quite fast, but nevertheless very visible in the processor load.
I can imagine what more than 10,000 tracks would mean.
You still shouldn't get any audio skipping, as this is done in a different process. Linux should handle the concurrent file access.
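A rough sketch of the kind of check described here, assuming a simple cache that maps each directory path to the modification time recorded at the previous scan. The function and cache names are hypothetical, and this is not Amarok's actual (C++) code. It shows why the cost grows with the collection even when nothing has changed: every directory is visited and stat()ed on every pass.

```python
import os
from typing import Dict, List


def changed_directories(root: str, known_mtimes: Dict[str, float]) -> List[str]:
    """Return directories under root that are new, changed or gone since the
    previous scan. known_mtimes is a hypothetical cache mapping
    directory path -> mtime from the last scan; it is updated in place."""
    changed = []
    seen = set()
    for dirpath, _dirnames, _filenames in os.walk(root):
        # One stat() per directory on every pass: this is the part whose
        # cost grows with the size of the collection.
        mtime = os.stat(dirpath).st_mtime
        seen.add(dirpath)
        if known_mtimes.get(dirpath) != mtime:
            changed.append(dirpath)
            known_mtimes[dirpath] = mtime
    # Directories that disappeared since the last scan also count as changes.
    for gone in set(known_mtimes) - seen:
        changed.append(gone)
        del known_mtimes[gone]
    return changed
```

On an unchanged collection a second call returns an empty list, but it still had to walk every directory to find that out.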

However, since you are the second person to complain about increased processor load, I switched back to an older Amarok version.
The old version really is much faster, and the load is barely visible there.
That means we have a regression, and I will need to figure out what causes the delays; it could very well be the many debug messages that we currently have.
Dieter Schroeder
Registered Member
I thought we had already figured out this issue. The old scanner definitely ran only on filesystem changes. I have 65,000 tracks in my collection, which means what...? Exactly: the scanner is always running. As a side effect, the new scanner wrecks the collection by filing different artists who have an album of the same name under Various Artists. (It's Git, so not a big deal. I gave up hope of keeping my collection for longer than 3 months back in the old days of 1.4.)
So we have a scanner with sorting issues that is slower and needs more resources than the older, working one.
Can someone explain the advantages of the new scanner to a dumb guy like me, please?

m0nk


google01103
Manager
I always assumed functionality like this would be based on inotify.


Dieter Schroeder
Registered Member
Yep, me too. Periodic scanning is almost as bad as no scanning at all, like in iTunes.


rengels
Registered Member
Mon Dec 20, 2010 10:39 pm
OK. I just checked the new collection scanner timings.
When I run an update scan without any changes, I get the following timings for my nearly 10,000 tracks:

commit: 24 ms
complete scan: 45 ms

The time is split up roughly as follows:
10 ms: starting the collection scanner
10 ms: waiting for the scanning results
10 ms: parsing the XML results
10 ms: checking that every directory was found

Can you start Amarok with --debug and post the last lines of output after an incremental scan?


Advantages of the new scanner:
Besides having much cleaner code, it has the following benefits:
1. Album covers are assigned to only one album, not to every album that has an image nearby.
2. The scanner will read a lot more meta information and use AlbumArtist, rating, score, playcount, ... if available.
3. The scanner should do a better job of identifying compilations. That is more of an art than a science, but:
4. The scanner will use the compilation tag when identifying compilations, and once you have set "various artists" it will never forget it. The same goes for a non-compilation.
5. The scanner and Amarok itself will now keep the collection browser in sync. You can see this during scanning, when your collection is updated. The days when you moved a track to another album and it was not updated until you restarted Amarok are gone.

Just to counter some myths:
- It is only slightly slower than the old one. Tests have shown it running 13:30 minutes instead of 13:00 minutes on larger collections.
- It does not use more resources. Actually it probably uses less RAM, and the source code is quite a lot shorter and much easier to read.
- The new scanner passes all the auto tests the old one did, and quite a lot more.
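Putting the quoted test timings into numbers (plain arithmetic on the two figures given, with 13:30 min = 810 s and 13:00 min = 780 s), the new scanner was about 4% slower on those runs:

```latex
\frac{810\,\mathrm{s} - 780\,\mathrm{s}}{780\,\mathrm{s}} = \frac{30}{780} \approx 3.8\,\%
```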

If you have albums that the new scanner lumps together (and the old one didn't), or if you have other complaints, just post them here.
But I would like to have numbers:
how many files, how many seconds, and the filenames of the tracks it lumps together.
google01103
Manager
Mon Dec 20, 2010 10:57 pm
24127 tracks, 2156 albums

After a full rescan I had issues with lots of fully tagged albums (30 to 50?) being placed under Various Artists, an issue I hadn't seen in earlier versions, nor in this one until the full rescan.


amarok: [ScanManager] ScannerJob: run: 2042 current path "../MyMedia/xxxxxxxx/yyyyyy/zzzzzz - vvvvvvvvvvv"
amarok: BEGIN: virtual void SqlScanResultProcessor::commit()
amarok: END__: virtual void SqlScanResultProcessor::commit() [Took: 2.9s]
amarok: [ScanManager] ScannerJob finished
amarok: END__: virtual void ScannerJob::run() [Took: 3.9s]
amarok: BEGIN: virtual ScannerJob::~ScannerJob()
amarok: END__: virtual ScannerJob::~ScannerJob() [Took: 0s]
amarok: BEGIN: void StatusBar::hideProgress()
amarok: END__: void StatusBar::hideProgress() [Took: 0.001s]
amarok: BEGIN: void SqlRegistry::emptyCache()
amarok: [SqlRegistry] Cache unchanged
amarok: END__: void SqlRegistry::emptyCache() [Took: 0.002s]
amarok: BEGIN: void SqlRegistry::emptyCache()
amarok: [SqlRegistry] Cache unchanged
amarok: END__: void SqlRegistry::emptyCache() [Took: 0.001s]



valoriez
KDE CWG
One entangled album I've noticed recently in VA is actually two albums, both called "The Singles": one by The Clash and one by The Pretenders. A few of the filenames:

/home/valorie/Music/The Pretenders/The Singles/The Pretenders - 1 - Stop Your Sobbing.ogg
/home/valorie/Music/The Clash/The Singles/The Clash - 01 - White Riot.ogg
/home/valorie/Music/The Clash/The Singles/The Clash - 02 - Remote Control.ogg
/home/valorie/Music/The Pretenders/The Singles/The Pretenders - 2 - Kid.ogg

I've noticed that although I checked my tags with Picard recently, the Album Artist tag wasn't set, so I'll set it and I'm sure that will solve the problem. Sucks that it will have to be track by track, but oh, well. Such is the cost of progress.

However, another difficulty I'm seeing is with some albums NOT in VA where I've set the Album Artist to a choir, an orchestra or something similar (these are mostly Christmas or classical albums): the tracks don't show up under the Album Artist in the collection, but under the track artist. This can't be right, can it? I especially hate the ones showing up under [anonymous] or [traditional]. I would expect to see the album listed under the Album Artist.
Dieter Schroeder
Registered Member
Eve from Alan Parsons Project and Ufomammut
III from Led Zeppelin, Stinking Lizaveta and Heliotropes
Medusa from Annie Lennox and Stake-Off The Witch
Infinity from Journey and Jesu
Sehnsucht from Rammstein and Lacrimosa
II from Krux, Led Zeppelin, Sahg and Dreaming.
All the "Best Of" albums
...
I just noticed that although the years are different, they are filed under one year.
The structure is /media/Mucke/<initial>/<artist>/<album>/<artist>-<track>.<suffix>
I never set the Album tag before, and it should not be needed. I have nearly 6000 CDs here, so...
I just erased the collection and did a full rescan (overnight, by the way), but without success.
One of the few reasons why I preferred Amarok was its ability to handle various artists.

m0nk


joethefox
Registered Member
Tue Dec 21, 2010 10:20 am
@rengels, last eight lines:

Code:
$ amarok --debug
amarok:   BEGIN: virtual void SqlScanResultProcessor::commit()
amarok:   END__: virtual void SqlScanResultProcessor::commit() [Took: 2.5s]
amarok:   [ScanManager] ScannerJob finished
amarok: END__: virtual void ScannerJob::run() [DELAY Took (quite long) 7.1s]
amarok: BEGIN: virtual ScannerJob::~ScannerJob()
amarok: END__: virtual ScannerJob::~ScannerJob() [Took: 0s]
amarok: BEGIN: void StatusBar::hideProgress()
amarok: END__: void StatusBar::hideProgress() [Took: 0s]


My folder structure is similar to Dieter's: /media/xhdd/Music/<initial>/<artist>/<album>/<track no.>-<track>.<suffix>, with the Music folder containing 2295 subfolders. I'm available for further tests.


EDIT: to remove doubts about disk performance:
Code:
# mount
/dev/sda3 on /media/xhdd

# hdparm -t /dev/sda3

/dev/sda3:
 Timing buffered disk reads:  260 MB in  3.00 seconds =  86.60 MB/sec

# hdparm -T /dev/sda3

/dev/sda3:
 Timing cached reads:   3586 MB in  2.00 seconds = 1794.80 MB/sec


In case it's useful: 60 GB of the 115 GB total are lossless tracks ripped from CDs.


rengels
Registered Member
Wed Dec 22, 2010 12:33 am
Those are quite long times.
Not surprising on such a large collection, especially if the directory entries are not cached.

How long did the old Amarok version take for incremental scanning?

How long does a "find . -type d >/dev/null" take?
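For anyone who prefers a script, a rough Python equivalent of that find command, which also counts the directories and times the walk, could look like this. It is a sketch, not part of Amarok; like find's default behaviour, os.walk does not follow symlinked directories.

```python
import os
import time
from typing import Tuple


def time_directory_walk(root: str) -> Tuple[int, float]:
    """Visit every directory under root (like `find ROOT -type d`) and
    return (number_of_directories, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        count += 1
    return count, time.perf_counter() - start
```

For example, `time_directory_walk("/media/xhdd/Music")` returns the directory count and the elapsed seconds. Running it twice in a row usually shows the caching effect mentioned above: the second run is typically much faster once the directory entries are cached.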




Regarding the albums, it's very easy to see from your directory structure what is going wrong:
/media/Mucke/<initial>/<artist>/<album>/<artist>-<track>.<suffix>

That is the structure that iTunes and Amarok 1.4 create. I don't like it, but that's not the topic here.

Now think about a compilation. What would it look like...?

Right. Exactly the same. From the directory structure alone you can't tell an album from a compilation.
What we are doing now (and what Amarok 2.3 should also do) is to consider an album a compilation if we find different artists for the same album name.

If the old Amarok did not do this, then that was a bug. This behaviour is clearly described in the auto tests, and it is also logical.

So, how can we determine that it is probably not a compilation?
One case could be an album called "Best Of" or "Live".
At least that's the only thing we could think of. All the other ideas we had would fail in one case or another.

Please propose something. The benefit of the new collection scanner code is that it's quite easy to change.

In the meantime, I would suggest setting "don't show under various artists". The scanner will respect the "compilation 0" tag.
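A minimal sketch of the rule described in this post, assuming tracks are grouped by album name only and that a compilation tag, when present, overrides the guess. The `Track` type and `classify_albums` function are hypothetical illustrations, not Amarok's code. With the filenames valoriez posted, the two different albums called "The Singles" end up flagged as one compilation, which is the collision reported in this thread.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Track:
    path: str
    artist: str
    album: str
    # None = no compilation tag; False corresponds to "compilation 0".
    compilation: Optional[bool] = None


def classify_albums(tracks: List[Track]) -> Dict[str, bool]:
    """Map album name -> is_compilation, grouping by album name only."""
    artists_by_album = defaultdict(set)
    explicit: Dict[str, bool] = {}
    for t in tracks:
        artists_by_album[t.album].add(t.artist)
        if t.compilation is not None:
            explicit[t.album] = t.compilation  # an explicit tag wins
    return {
        album: explicit.get(album, len(artists) > 1)  # else: >1 artist => compilation
        for album, artists in artists_by_album.items()
    }


tracks = [
    Track("/home/valorie/Music/The Clash/The Singles/The Clash - 01 - White Riot.ogg",
          "The Clash", "The Singles"),
    Track("/home/valorie/Music/The Pretenders/The Singles/The Pretenders - 1 - Stop Your Sobbing.ogg",
          "The Pretenders", "The Singles"),
]
print(classify_albums(tracks))  # {'The Singles': True}
```

Setting the compilation tag to 0 on those tracks (`compilation=False` here) flips the result, which is the workaround suggested above.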
rengels
Registered Member
Wed Dec 22, 2010 12:42 am
Oops. I just realized that I may indeed have made a mistake by not fully respecting the album artist tag.

That might help.

Do all your albums have the album artist set?
An album should only be a compilation if the album artist is empty or "various artists" or something similar.
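The album-artist rule proposed here could be sketched as follows. This is an illustration under assumptions, not the scanner's actual logic: the function name and the set of spellings treated as "various artists" are hypothetical, and an explicit compilation tag is still given priority, as described earlier in the thread.

```python
from typing import Iterable, Optional

# Hypothetical spellings treated as "various artists"; the real list is not
# given in the thread.
VARIOUS_ARTISTS_NAMES = {"various artists", "various", "va"}


def is_compilation(album_artist: Optional[str],
                   track_artists: Iterable[str],
                   compilation_tag: Optional[bool] = None) -> bool:
    # 1. An explicit compilation tag always wins.
    if compilation_tag is not None:
        return compilation_tag
    # 2. If an album artist is set, it alone decides: only a
    #    "various artists"-style value makes this a compilation.
    if album_artist:
        return album_artist.strip().lower() in VARIOUS_ARTISTS_NAMES
    # 3. No album artist: fall back to "more than one track artist".
    return len(set(track_artists)) > 1
```

With this rule, an album whose Album Artist is a choir stays a normal album even if its tracks are credited to [anonymous] or [traditional], while an album with no album artist and several track artists is still treated as a compilation.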

