
unfa says "hi!" - thinking about video rendering performance

unfa
Registered Member
Posts
34
Karma
0
Hi!

TL;DR: How fast can we make Kdenlive render video, and how do we do that?

I'm unfa, I'm a Linux electronic music producer and vocalist. I've just released a new album produced entirely with Linux (Linux Mint 18.2, KDE 5, LMMS, Ardour - more details in the album description on Bandcamp if you're interested): https://unfa.bandcamp.com/album/suppressed

I'm also running a YouTube channel where I teach open-source and Linux-based electronic music production.
As of November 2017 I have around 1,500 subscribers on YouTube: https://www.youtube.com/unfa000
Also a few Patreon supporters. http://patreon.com/unfa

So far I've been using Blender VSE for all my video editing needs, but I'm constantly looking for other open-source video NLEs to replace it, because I have many problems with it. (I made a video about it: https://www.youtube.com/watch?v=Y6PD6Zsh1bo)

The development of Blender VSE has been completely stalled for many years now. Still, for a decade I wasn't able to find anything else that would do the job for me. I'm now trying to edit one video with Kdenlive to give it a solid go and maybe switch to it for good. I really hope I can make that switch.

What I do is capture a 3840x1080 60 FPS MKV video that I then split into two images: my screen (one half) and my webcam (the other half). Then I composite one on top of the other with a custom mask and add an animated overlay in the corner. I also do some animation, zooming in and out to different parts of my screen to make it easier for the viewers to see what I'm talking about. Sometimes I draw custom masks to highlight some elements on the screen too. Blender has a very neat tool for that.

I also sometimes cut/fade to my webcam in fullscreen for parts where I talk and don't do anything on the screen.

I also use proxy editing in Blender all the time, because even with a Ryzen 7 1700 processor (8 cores, 16 threads) and an Nvidia GTX 1060 GPU I can't get anything approaching fluent editing unless I use proxies, and even with that it is pretty poor. Also, Blender generates proxies in a queue with a single thread, so for hours-long HD footage it can take ages. And then it often just removes the generated proxies for no apparent reason, so I have to redo that. So it's a pain to work with on big projects.

To render videos with Blender I used a PHP script called Pulverize. It basically runs multiple Blender instances, sets the frame ranges so that each instance renders an equally sized chunk of the whole project, and then concatenates the resulting videos with ffmpeg into a single one.
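A rough sketch of that chunking idea, in Python rather than PHP. This is not Pulverize's actual code: the function names and the chunk file names are made up for illustration. It only computes the inclusive frame range for each instance and builds the text of a list file in the `file '...'` format that ffmpeg's concat demuxer reads.

```python
def split_frame_ranges(first, last, workers):
    """Split the inclusive range [first, last] into near-equal contiguous chunks."""
    total = last - first + 1
    if total <= 0 or workers <= 0:
        raise ValueError("need at least one frame and one worker")
    workers = min(workers, total)          # never create empty chunks
    base, extra = divmod(total, workers)   # the first `extra` chunks get one more frame
    ranges = []
    start = first
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def concat_list(chunk_paths):
    """Build the contents of a list file for ffmpeg's concat demuxer."""
    return "".join(f"file '{path}'\n" for path in chunk_paths)


# Hypothetical example: 100 frames rendered by 4 instances.
ranges = split_frame_ranges(1, 100, 4)
# Hypothetical chunk file names, one per rendered range.
paths = [f"chunk_{a:06d}_{b:06d}.mkv" for a, b in ranges]
listing = concat_list(paths)
```

Each instance would then render only its own range, and the list file would be handed to something like `ffmpeg -f concat -safe 0 -i list.txt -c copy out.mkv` to join the chunks without re-encoding.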

The result is a huge RAM overhead, because something that could have been achieved within Blender with shared memory buffers is done with completely separate processes. On my Ryzen 7 1700 machine the disk seems to be the bottleneck there, but I can't tell for sure. 12 threads work well for small projects (10 minutes of simple video); for bigger ones (1 hour of composited video) 6 threads seem to be the optimal spot for HDD <-> CPU load balancing.

MLT is "advertised" to feature multi-threaded CPU and GPU rendering (source: https://www.mltframework.org/features/), but I have no idea how well that works in practice. I heard that the GPU rendering is highly experimental (and it has crashed Kdenlive for me a lot so far).

I had some experience with Kdenlive in the past, but there was obviously something wrong with MLT on my machine back then, because the rendering was so slow it was practically stalled even for simple projects, even though the live editing preview worked fine. That made me abandon Kdenlive for a long time, because I couldn't get it to work for me.

Anyway - back to the rendering performance.

I wonder if a similar "timeline splitting" approach from Pulverize could be used to improve rendering speed in Kdenlive? Would that make sense?

With Blender and Pulverize the problem is that each process reads the same input files, but they have to read them in different places, so the HDD heads have to jump around like crazy just to read the input, which I think is a big bottleneck.

Natron, on the other hand, renders frames in order using multiple threads. I guess the system cache can save a lot of extra disk reads, and possible memory sharing between threads probably speeds things up even more. It's not crazy fast, but it's fast, and it manages to occupy 100% of my 16 CPU threads all the time when rendering, which I like.

I guess having multiple threads each rendering every n-th frame would make the disk reads of the input go much smoother, but then all the frames would need to be concatenated into a video stream later on. Otherwise one would have to render out to a PNG sequence and encode that to video later, which would add a huge storage, time and processing overhead of its own. Probably some read-ahead and buffering for the input files could help with disk performance even further.
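To make the difference between the two schemes concrete, here is a small Python sketch. It is only a toy model under my own assumptions (4 workers, a 1 hour 60 FPS timeline, all function names made up), not anything MLT or Blender actually does. It compares how far apart in the input the workers are reading at the same moment.

```python
def chunked_start_frames(total_frames, workers):
    """First frame each worker reads when the timeline is cut into contiguous chunks."""
    chunk = total_frames // workers
    return [w * chunk for w in range(workers)]


def strided_frames(worker, workers, total_frames):
    """Frames handled by one worker when every n-th frame goes to the same worker."""
    return range(worker, total_frames, workers)


def read_spread(frames_in_flight):
    """Distance between the earliest and latest frame being read at the same time."""
    return max(frames_in_flight) - min(frames_in_flight)


total = 60 * 60 * 60   # 1 hour at 60 FPS = 216000 frames
workers = 4            # assumed worker count, for illustration only

chunked_now = chunked_start_frames(total, workers)
strided_now = [strided_frames(w, workers, total)[0] for w in range(workers)]

print(read_spread(chunked_now))   # workers are reading 162000 frames apart
print(read_spread(strided_now))   # workers are reading 3 frames apart
```

With contiguous chunks the readers sit in four distant parts of the file, while with the every n-th frame scheme they all stay within a few frames of each other, which is the case where read-ahead and the system cache should help.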

I'm thinking about this because I usually make 1 hour+ long videos in 1920x1080 at 60 FPS, with at least 2 Full HD 60 FPS videos as inputs and some compositing going on, so it can easily take a whole night to render something like this in a single thread. I would like to harness my CPU power and improve that, without having to delay everything by 24 hours while my machine is on and almost idling the whole time, which is what it looked like when I didn't use Pulverize for Blender VSE.

I'm by no means an expert, I'm just trying to figure stuff out and enable the software to fully utilize my hardware for video production, so I can make more, better videos faster, share my knowledge with people, and empower the open-source music making community.

I'd love to talk about video rendering performance and see what can be done to improve that in MLT and Kdenlive.

I have captured a video with my "first impressions" of trying to do my thing in Kdenlive. Maybe it'll be helpful to highlight some problems occurring for newcomers.

Anyway - that's my strange "welcome"!

Here's a video captured with SSR and not edited at all:
https://youtu.be/4cwfuNF2myw
bartoloni
Moderator
Posts
1510
Karma
4
Hi Unfa! Performance is a good thing, but for now I hope that the two minds behind Kdenlive (Jean-Baptiste Mardelle and Vincent Pinon), who are two real superheroes, can make this great software more stable before spending time on optimization and performance.

Looking at the 473 open bugs on the bug tracker, I think that next year (2018) will be spent fixing transitions/effects, MLT XML parsing (maybe), and copy/paste (and use of the library) of project sections.

I also hope that more developers can join Kdenlive... and maybe more money (Kickstarter? Patreon?)
vpinon
KDE Developer
Posts
708
Karma
6
Hello,
Thanks bartoloni for calling me a hero, but I would rather give the title to alcinos, who has really been coding for 1 year, while I'm just packaging and fixing obvious bugs ;)
Back to topic: the Kdenlive team is focused on UI stability and effectiveness (and a big code rewrite, under way for one year, should bring a significant step forward soon), while backend operations are handled by MLT.
MLT itself can multithread its operations (I believe by splitting frames into slices, not sure), however most effects and transitions come from 3rd party libraries (frei0r, avfilter, Movit...), and then the threading is sometimes not well handled (either in the lib, or in the interface) and can cause some render bugs (frames skipped, image corruption) or significant slowdowns!
Note that depending on the output codec, encoding can be about 3 to 10 times slower than processing (e.g. simple effects vs h264/265 or vp8/vp9), meaning that it's more efficient to allocate more threads to encoding than to processing.
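As a back-of-the-envelope illustration of that last point (a simplified model, not how MLT or any encoder actually schedules work): if encoding one frame costs r times as much as processing it, and both stages scale linearly with threads, the pipeline is balanced when encoding gets about r times as many threads as processing. The function name and the 16-thread figure below are assumptions for the example.

```python
def balance_threads(total_threads, encode_cost_ratio):
    """Split threads between processing and encoding in a simple two-stage model.

    Assumes encoding a frame costs `encode_cost_ratio` times as much as
    processing it and that both stages scale linearly with thread count.
    Returns (processing_threads, encoding_threads).
    """
    if total_threads < 2:
        raise ValueError("need at least one thread per stage")
    if encode_cost_ratio <= 0:
        raise ValueError("cost ratio must be positive")
    processing = round(total_threads / (1 + encode_cost_ratio))
    # Keep at least one thread on each side of the pipeline.
    processing = max(1, min(processing, total_threads - 1))
    return processing, total_threads - processing


# Hypothetical 16-thread machine, with encoding 3x and 10x slower than processing.
print(balance_threads(16, 3))    # (4, 12)
print(balance_threads(16, 10))   # (1, 15)
```

In this model almost all threads end up on the encoder side once the codec is heavy, which matches the rule of thumb above.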
GPU processing and encoding also seem an interesting track, however it still proves slower (most of the time), as the whole chain is not 100% on GPU but requires CPU<->GPU copies (e.g. for effects that are not coded for GPU), and even GPU encoders are often slower and less efficient (quality vs size) than CPU ones... Commercial products have claimed to have good solutions for years, but (almost) nobody seems to be working in this direction in the MLT/FFmpeg worlds...
unfa
Registered Member
Posts
34
Karma
0
Thank you guys for the input - and thank you for working on Kdenlive!

I guess I have to agree that reliability probably should have priority above performance (which usually is not bad).
I can't wait to see the rewrite complete and see the amazing stuff you're cooking there :)

