This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Multicore utilization is very low

joebauer
Registered Member
Posts
12
Karma
0


Fri Oct 16, 2015 7:09 pm
Hi there,

I've recently done some renderings with kdenlive, and rendering takes awfully long because it doesn't seem to distribute work across cores properly. I've specified 12 threads in the rendering dialog, and melt is called with those 12 threads as an argument:
Code: Select all
/home/joe/kdenlive/20150905/./bin/melt /tmp/kde-joe/kdenliveN31673.tmp.mlt -profile atsc_1080p_25 -consumer avformat:/home/joe/kdenlive/myvid.mp4 progress=1 properties=x264-medium vb=8000k ab=160k threads=12 real_time=-1


and there are PLENTY of threads actually created:

Code: Select all
$ ps -eLf | grep kdenlive | grep melt | wc -l
124


However, only one to two of these threads seem to actually perform any work:

Code: Select all
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND       
 3210 joe       20   0 9322708 1,832g  32104 S 144,6  5,9 128:04.13 melt                                                                                                     
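
To see how many of those threads are really busy, a per-thread listing can be filtered by CPU usage. This is only a sketch: the function name count_busy_threads and the 5% threshold are made up for illustration, and it assumes the procps ps found on Linux. Note that ps reports CPU averaged over each thread's lifetime, whereas top -H shows current usage.

```shell
# Count threads whose %CPU exceeds a threshold (default 5).
# Reads "TID %CPU" pairs on stdin, as printed by:
#   ps -L -o tid=,pcpu= -p PID
count_busy_threads() {
    threshold="${1:-5}"
    awk -v t="$threshold" '$2 + 0 > t { n++ } END { print n + 0 }'
}

# Example (assumes exactly one melt process is running):
#   ps -L -o tid=,pcpu= -p "$(pgrep -o melt)" | count_busy_threads 5
```

A result of 1 or 2 next to a total of 124 threads would match the top output above.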


What could the reason for this be? How can I fix it?

Thanks for your help! :-)
Joe
TheDiveO
Registered Member
Posts
595
Karma
3
OS
https://forum.kde.org/viewtopic.php?f=265&t=122140
joebauer
Registered Member
Posts
12
Karma
0
TheDiveO wrote:https://forum.kde.org/viewtopic.php?f=265&t=122140


Oh no way, so there are really thread-unsafe modules? That sucks :-(

Even with those, it still should be possible to fork twelve processes and have them render different chunks of the video, should it not? Then they could even be thread-unsafe, but still multicores could be used.

That's a real shame. It took almost 3 hours to render the video on my beast of a computer, where the main "effect" was to rotate something 180°. That effect, I'm guessing, is not threadsafe.

*sigh*

Edit: Is there a way to find out which modules are non-threadsafe? I'd really like to avoid those in the future if there're alternatives.
ttguy
Moderator
Posts
1152
Karma
6
OS
joebauer wrote:
Edit: Is there a way to find out which modules are non-threadsafe? I'd really like to avoid those in the future if there're alternatives.


https://github.com/mltframework/mlt/blo ... d_safe.txt
joebauer
Registered Member
Posts
12
Karma
0
ttguy wrote:
joebauer wrote:
Edit: Is there a way to find out which modules are non-threadsafe? I'd really like to avoid those in the future if there're alternatives.


https://github.com/mltframework/mlt/blo ... d_safe.txt


Hmmm, if this list is correct though, the reason for the low utilization has to be something else. I built my kdenlive on 2015-09-05, i.e. with frei0r v0.3+. Therefore, only these modules should be affected:

Code: Select all
cat ./share/mlt/frei0r/not_thread_safe.txt  |grep -v '0.2' | grep -v '0.3'
# plugin name = lowest version that is thread safe or empty if not yet thread safe
alpha0ps
baltan
cartoon
cluster
delay0r
delaygrab
equaliz0r
facebl0r
facedetect
glow
hqdn3d
keyspillm0pup
lightgraffiti
mask0mate
nervous
plasma
rgbparade
scale0tilt
select0r
sharpness
squareblur
tehRoxx0r
vectorscope
vertigo


The ones I used heavily in the project though are

Code: Select all
$ cat myproj.kdenlive |grep kdenlive_id | sort| uniq -c | sort -n
      1    <property name="kdenlive_id">affinerotate</property>
      1     <property name="kdenlive_id">fade_from_black</property>
      1     <property name="kdenlive_id">fade_to_black</property>
      1    <property name="kdenlive_id">normalise</property>
      1    <property name="kdenlive_id">stereocopy</property>
      2    <property name="kdenlive_id">composite</property>
      2     <property name="kdenlive_id">fadein</property>
      2     <property name="kdenlive_id">fadeout</property>
      2     <property name="kdenlive_id">normalise</property>
      2     <property name="kdenlive_id">stereocopy</property>
     22     <property name="kdenlive_id">pan_zoom</property>
     45     <property name="kdenlive_id">crop</property>


i.e. "crop" and "pan_zoom", neither of which is on that list... So is there any other reason for the bad performance?
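
The manual comparison above could be automated roughly like this. It is a sketch under assumptions: the function name unsafe_effects_in_project is made up, the list is expected to be pre-filtered for your frei0r version (as done with grep -v above), and it assumes a kdenlive_id matches a frei0r plugin name once an optional "frei0r." prefix is stripped, which may not hold for every effect.

```shell
# Print the effect ids used in a project that also appear in a
# not_thread_safe.txt-style list.
# $1 = list file (one plugin name per line, "#" starts a comment)
# $2 = .kdenlive project file
unsafe_effects_in_project() {
    list="$1"; project="$2"
    # Plugin names: drop comment lines, keep the text before any space or "=",
    # and drop lines that end up empty.
    names=$(grep -v '^[[:space:]]*#' "$list" |
        sed -e 's/[[:space:]=].*$//' -e '/^$/d')
    # Effect ids from the project XML, with an optional "frei0r." prefix removed
    # (assumption), matched as whole lines against the plugin names.
    sed -n 's/.*<property name="kdenlive_id">\([^<]*\)<\/property>.*/\1/p' "$project" |
        sed 's/^frei0r\.//' |
        sort -u |
        grep -Fx -e "$names"
}
```

No output (and a non-zero exit status from grep) means none of the project's effects were found on the list.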
CorrosiveTruths
Registered Member
Posts
87
Karma
0
OS

Re: Multicore utilization is very low

Thu Jan 28, 2016 10:15 am
Just 'cause this is on the front page today, the solution was to set real_time=-n where n is the number of parallel mlt threads you wish to allow to speed things up.

It was in the thread linked to by Dive-o.
TheDiveO
Registered Member
Posts
595
Karma
3
OS
Afaik, on KF5-based Kdenlives, this parameter is exposed in the "Rendering" dialog as the number of "Encoder threads".
CorrosiveTruths
Registered Member
Posts
87
Karma
0
OS
TheDiveO wrote:Afaik, On KF5-based Kdenlives, this parameter is exposed in the "Rendering" dialog as the number of "Encoder threads".

Nah, that's the threads=n (normal cpu threads) bit, you can control real_time=-n (mlt threads) by changing the processing threads in the configure -> mlt environment bit.

I should really report it as a bug, since setting processing threads (mlt threads) to higher than 1 causes issues in kdenlive (to be fair, it does say that >1 is experimental), but you still want to render with more mlt threads to take better advantage of multiple cores.

Setting threads (normal cpu threads) from the render screen is fairly useless, both in terms of the default being 1, and not being able to set it higher than number of cores (not sure why). It invariably leads to people setting it to as high as they can and then scratching their heads when mlt bottlenecks at an affine and they only get 100% cpu usage.

In fact, what would be better in my humble opinion (since rendering from the render dialog does one job at a time anyway) would be to completely omit threads and let the encoder handle that on its own, and then be able to set real_time=-n to whatever from there. That's pretty much what I end up doing with generated scripts anyhow. Obviously with the usual warning that >1 mlt threads can cause glitches depending on the filters.

In the meantime, generate script, change real_time=-1 to real_time=-3 and you'll get your multicore use at the expense of not being able to use certain mlt effects.
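
A minimal sketch of that script edit, assuming the generated script contains a melt command line with real_time=-1 and threads=N like the one in the first post. The function names and the file names in the usage line are hypothetical.

```shell
# Replace the value of any real_time=-N parameter with -$1.
# Reads the script text on stdin, writes the result to stdout.
set_real_time() {
    sed "s/real_time=-[0-9][0-9]*/real_time=-$1/g"
}

# Remove any " threads=N" parameter so the encoder picks its own default.
drop_threads() {
    sed 's/ threads=[0-9][0-9]*//g'
}

# Usage (file names are placeholders):
#   set_real_time 3 < render.sh | drop_threads > render-mt.sh
```

Both filters leave the rest of the command line untouched, so the original script stays available for comparison.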
TheDiveO
Registered Member
Posts
595
Karma
3
OS
What you are referring to is "Configure" > "Environment", then "Processing threads". I'm referring to "Encoding threads". Unless you are describing something else, a third parameter.

I happen to know because on my Core i7 I can't set processing threads to anything other than 1. Even so, MLT seems to render using multiple threads just fine: I watch my eight virtual CPUs reach only about 30% overall, but they don't stick at 1/8th as your explanation of the parameter would imply. When I set processing threads to 8, Kdenlive chokes, yet rendering still crawls on a single core. I may still be wrong; if so, please point out exactly what encoding threads is used for. Thank you.
CorrosiveTruths
Registered Member
Posts
87
Karma
0
OS
TheDiveO wrote:What you are referring to is "Configure" > "Environment", then "Processing threads". I'm referring to "Encoding threads". Unless you are describing something else, a third parameter.

I happen to know because on my Core i7 I can't set processing threads to anything other than 1. Even so, MLT seems to render using multiple threads just fine: I watch my eight virtual CPUs reach only about 30% overall, but they don't stick at 1/8th as your explanation of the parameter would imply. When I set processing threads to 8, Kdenlive chokes, yet rendering still crawls on a single core. I may still be wrong; if so, please point out exactly what encoding threads is used for. Thank you.

Encoding threads in the rendering dialog sets the threads=n value in mlt, "Configure" > "Environment", then "Processing threads" sets the real_time=-n value in mlt.

From the mlt docs:

Does MLT take advantage of multiple cores? Or, how do I enable parallel processing?
Some of the FFmpeg decoders and encoders (namely, MPEG-2, MPEG-4, H.264, and VP8) are multi-threaded. Set the threads property to the desired number of threads on the producer or consumer. I think the gains are most noticeable on H.264 and VP8 encoding. Next, by default, MLT uses a separate thread for audio/video preparation (including reading, decoding, and all processing) and the output whether that be for display or encoding. Those two capabilities already go a long way. Finally, versions greater than 0.6.2 (currently, that means git master) can run multiple threads for the video preparation! It works using the real_time consumer property:
0 = no parallelism
> 0 = number of processing threads with frame-dropping
< 0 = number of processing threads without frame-dropping
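
Read as a shell sketch, the three cases above come out as follows. The function name describe_real_time is made up for illustration; only the sign convention comes from the MLT text quoted above.

```shell
# Describe what a given real_time value means, per the MLT FAQ excerpt.
describe_real_time() {
    n="$1"
    if [ "$n" -eq 0 ]; then
        echo "no parallelism"
    elif [ "$n" -gt 0 ]; then
        echo "$n processing threads, frame-dropping enabled"
    else
        # ${n#-} strips the leading minus sign to get the thread count.
        echo "${n#-} processing threads, no frame-dropping"
    fi
}
```

For file rendering the negative form is the interesting one, since dropped frames would be missing from the output.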


Ergo, if you generate a script instead of render from the render dialog and edit the script to use, say, real_time=-3 and remove threads=8 you'll get better cpu usage.
TheDiveO
Registered Member
Posts
595
Karma
3
OS
So, threads>1 would still provide better usage when combined with real_time=-3, as the former helps decoding/encoding H.264 which I heavily rely on. And real_time speeds up MLT somehow. So it's not either/or, but both. Time for another feature request...?
TheDiveO
Registered Member
Posts
595
Karma
3
OS
I stand corrected on the real_time parameter. Some more data to validate, from my Core i7 3.7GHz with a recent project; these are rough peak-stable numbers over certain parts of the project:

threads=8 real_time=-1 --> ~30% (25%) CPU <-- my Kdenlive configuration when directly rendering inside Kdenlive.
threads=1 real_time=-3 --> ~40% (35%) CPU <-- script; as suggested ... better, not stellar, so decoding/encoding actually seems to be relevant, not only MLT.
threads=5 real_time=-3 --> ~70% (50%) CPU <-- significantly better
threads=8 real_time=-3 --> similar threads=5

These figures vary over the whole project, as some parts are computationally more intensive in the effect pipeline, while other parts are not. Thus, I've given here the consistent peak for several seconds, as well as the level during a significant part of the project.

So, as I suspected and asked, it's not "either/or", it is obviously "and" -- at least with the projects and footage I'm working with.

Looks like it is really time to file a bug/feature report on this.
CorrosiveTruths
Registered Member
Posts
87
Karma
0
OS
TheDiveO wrote:I stand corrected on the real_time parameter. Some more data to validate, from my Core i7 3.7GHz with a recent project; these are rough peak-stable numbers over certain parts of the project:

threads=8 real_time=-1 --> ~30% (25%) CPU <-- my Kdenlive configuration when directly rendering inside Kdenlive.
threads=1 real_time=-3 --> ~40% (35%) CPU <-- script; as suggested ... better, not stellar, so decoding/encoding actually seems to be relevant, not only MLT.
threads=5 real_time=-3 --> ~70% (50%) CPU <-- significantly better
threads=8 real_time=-3 --> similar threads=5

These figures vary over the whole project, as some parts are computationally more intensive in the effect pipeline, while other parts are not. Thus, I've given here the consistent peak for several seconds, as well as the level during a significant part of the project.

So, as I suspected and asked, it's not "either/or", it is obviously "and" -- at least with the projects and footage I'm working with.

Looks like it is really time to file a bug/feature report on this.


I actually suggested omitting threads=n entirely.

cores*1.5 is default for threads in x264 encoding.
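
As a rough illustration of that default, assuming integer arithmetic and that nproc (GNU coreutils) reports the number of logical cores x264 would see. The function name is hypothetical and this only mirrors the rule of thumb stated above, not x264's actual code path.

```shell
# Estimate the thread count x264 picks on its own: 1.5 x logical cores.
# $1 = number of cores (defaults to what nproc reports)
x264_auto_threads() {
    cores="${1:-$(nproc)}"
    echo $(( $cores * 3 / 2 ))
}
```

On an eight-thread Core i7 this gives 12, so leaving threads unset should already ask the encoder for more threads than threads=8 does.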
TheDiveO
Registered Member
Posts
595
Karma
3
OS
It's quicker for me to just adjust the real_time parameter.

Filed a feature request here: https://bugs.kde.org/show_bug.cgi?id=358695
TheDiveO
Registered Member
Posts
595
Karma
3
OS
Another chapter in the ongoing multicore utilisation saga.

For a recent project I worked on, I let Kdenlive create a rendering script, then manually set real_time=-3 and threads=8. To my surprise, this quickly brought CPU usage on my Core i7 up to 100%, but memory consumption also kept growing and growing. Something was leaking badly, and I had to kill the script before the system started swap thrashing. Because the project rendered fine from within Kdenlive, one or both of the parameters must be involved.

So I played around with these parameters: removing threads= and keeping real_time=-3 got me around 40% CPU usage, without any visible memory leak. This therefore seems to point towards a problem in ffmpeg, but not in MLT, as the leak as well as CPU load disappeared when removing threads=8.

For reference: rendering in Kdenlive (thus with real_time=-1) gives me typically around 25% CPU usage, whereas real_time=-3 causes around 40% CPU load. So yes, enabling multithreading within MLT itself improves the situation. Albeit not stellar... :z
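
Memory growth like the one described above can be caught early by sampling the resident set size of the melt process while the script runs. A sketch only: rss_kb and watch_rss are made-up names, the 5 second interval is arbitrary, and it assumes a ps that supports -o rss= (procps on Linux does).

```shell
# Print the resident set size of a process in KiB.
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

# Print a timestamped RSS sample every INTERVAL seconds (default 5)
# until the process exits.
watch_rss() {
    pid="$1"; interval="${2:-5}"
    while kill -0 "$pid" 2>/dev/null; do
        printf '%s %s KiB\n' "$(date +%T)" "$(rss_kb "$pid")"
        sleep "$interval"
    done
}

# Usage (assumes exactly one melt process is running):
#   watch_rss "$(pgrep -o melt)" 10
```

A value that keeps climbing for the whole render, rather than levelling off after the first minutes, would point to the kind of leak seen with threads=8.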

