
GPU accelerated painting - is this possible?

Double Dee
Registered Member
The biggest issue people have with Krita is painting speed. I've seen opinions about Krita in various places, and people often complain about the speed.
I understand that the dev team is small and optimization is insanely hard, but wouldn't it be possible to find a new way to speed things up? In my opinion this is more important than the vector overhaul.

The latest version of Paintstorm Studio has GPU-accelerated brushes, which makes painting a lot faster. The GPU seems like a better tool for this than the CPU.
Would it be possible to implement GPU acceleration for painting in Krita? How hard would it be? What are the possible barriers?
halla
KDE Developer
First off: people will _always_ complain that Krita is too slow, no matter what we do. We have already spent a lot of effort optimizing things, and honestly, if I compare our smudge brush with Photoshop's smudge brush on the same hardware, with the same image size and the same brush size, both applications are equally responsive.

The problem with moving painting to the GPU is that you then also need to have your image data on the GPU. It is too slow to move image data from main memory to GPU memory for painting and then back. Keeping everything on the GPU would make it faster, but GPU memory is limited compared to main memory, and that limits the maximum image size and the number of layers.
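
To put rough numbers on that (a back-of-the-envelope illustration with made-up but plausible figures, not data from the thread):

# Back-of-the-envelope memory estimate for keeping a layered image on the GPU.
# Hypothetical example values; real projects and graphics cards vary widely.

width, height = 8000, 6000          # a fairly large print-resolution canvas
bytes_per_pixel = 8                 # 16-bit RGBA (4 channels x 2 bytes each)
layers = 30                         # paint layers, masks, groups, ...

layer_bytes = width * height * bytes_per_pixel
image_bytes = layer_bytes * layers

print(f"one layer  : {layer_bytes / 2**20:8.1f} MiB")   # ~366 MiB
print(f"whole image: {image_bytes / 2**30:8.1f} GiB")   # ~10.7 GiB

# A typical consumer GPU has on the order of 1-4 GiB of VRAM, shared with the
# compositor and the canvas textures, while main memory plus swap can grow much
# larger -- which is the image-size / layer-count limit described above.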

That said, some people have told me they wanted to investigate and try things, but nobody has ever shown me any code. Since Krita is open source, that's the only way things can move forward: people have to take action.
Double Dee
Registered Member
I see. I appreciate the effort taken to optimize it, and the Instant Preview mode. Still, it can sometimes get slow, for example with a bristle brush with low spacing and a texture applied. But my CPU is also a bit old; I know I should upgrade it. I'll see how fast it is after upgrading. I can't compare to Photoshop because I don't own it.
halla
KDE Developer
" Still sometimes it can get slow, for example a bristle brush with low spacing and texture applied."

Yes... That brush engine needs to do calculations for _every_ bristle. In general, we don't want to prevent people from doing outrageous stuff with Krita; all the options are always there, and that means it will always be possible to create brush presets that just don't work fast enough on a given machine. Krita is about possibilities, in one sense, and one of those possibilities is shooting oneself in the foot.
Double Dee
Registered Member
Oh, by bristles I meant the default Pixel brush with a bristle tip, not the engine. Sorry, I had almost forgotten about the "Bristle" engine.
radian
Registered Member
As someone who has tried both Krita and Paintstorm, I can say Krita has better "raw performance". By "raw performance" I mean how fast brushes are without any features that speed up brush rendering (honestly, I can't say for Paintstorm because I don't see a way to disable it).

If you need more speed there is always "Instant Preview". It can interfere a bit with small brushes, but overall it's a pretty powerful and stable mechanism; it just needs a little polish.

By the way, Paintstorm's GPU acceleration sometimes works weirdly or doesn't work at all. And in Paintstorm, if you make a slow brush it will be slow at any size.
Quiralta
Registered Member
Double Dee wrote:...GPU seems like a better tool than CPU...


This is not necessarily the case. A GPU can only outperform the CPU IF the former is a high-end model, and I would dare to say that less than ten percent of users have such hardware; most pre-built computers don't have that capability, and no integrated graphics (the most common type of GPU in modern PCs) has that kind of power.

This would only benefit a few people. For example, using CUDA on a low-end GPU (and I personally put this to the test) actually bottlenecks the system, even in highly optimized apps like Blender; as for ImageMagick, well, it made me run away from CUDA :D

In the end, I think people would benefit most from learning how to tweak their workflow rather than waiting for a future optimization of the program. Maybe we could write some "guidelines" on what is risky and slow, like "heavy" brushes and some filters/effects (driver bugs are a different matter; once those are fixed on their side, things should be normal again). Many people are tempted to use the heaviest combination of features right out of the box, which results in an overloaded, slow Krita and thus a bad experience.


matiasngoldberg
Registered Member
boudewijn wrote: The problem with moving painting to the GPU is that you then also need to have your image data on the GPU. It is too slow to move image data from main memory to GPU memory for painting and then back. Keeping everything on the GPU would make it faster, but GPU memory is limited compared to main memory, and that limits the maximum image size and the number of layers.

I just registered to tell you you're wrong! ;D Nah, just kidding.

I'm a graphics dev; I tried Krita for fun and was very disappointed by the performance of the brush tool at large sizes. I don't know how Krita compares to Photoshop, and I don't care, because I am certain brush painting can be much faster than this.

You are correct that moving data between the GPU and CPU would complicate the code and would certainly be very slow. I too think that would be unfeasible and crazy.
However, I think you're looking at this the wrong way. Rather than trying to accelerate the processing of the brush tool, you need to reduce the perceived latency of the brush tool.

A simple solution would be the following algorithm (a rough code sketch follows the list). When the user switches to layer 'i':
  • On a background thread, merge layers [0; i) into a single layer and upload it to a GPU texture (let's call this the "background layer").
  • On a background thread, merge layers [i+1; N) into a single layer and upload it to a GPU texture (the "foreground layer").
  • On a background thread, upload layer i to another GPU texture (the active layer).
  • Merging is probably what will take the most time. Until the layers have been uploaded to the GPU, the current code is used, so that the UI and Krita aren't blocked every time the user switches layers.
  • Switching between layers can be further sped up by keeping RAM caches of merged blocks of layers, e.g. caches of [0; 4), [4; 8) and [8; 12). If the user switches to layer 2, you merge layers [0; 2) for the background, and merge layer 3 with the caches of [4; 8) and [8; 12), which means merging 3 layers to generate the foreground layer instead of 8.
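
To make the structure concrete, here is a minimal plain-Python sketch of the three-texture split and the block caches; merge(), upload_texture() and the block size are illustrative stand-ins (not Krita API), and blend modes are deliberately glossed over:

# Minimal sketch of the background / active / foreground split with block caches.
# merge() and upload_texture() are stand-ins, not Krita code.

BLOCK = 4  # cache merged results for blocks of 4 layers: [0;4), [4;8), [8;12), ...

def merge(layers):
    """Stand-in for compositing a list of layers into one; here just a label."""
    return "+".join(layers) if layers else "<empty>"

def upload_texture(name, data):
    """Stand-in for uploading a merged layer to a GPU texture."""
    print(f"{name:10s} -> {data}")

def block_caches(layers):
    """Pre-merged caches of whole blocks, kept in RAM."""
    return {b: merge(layers[b:b + BLOCK]) for b in range(0, len(layers), BLOCK)}

def switch_to_layer(layers, i, caches):
    # Background: everything below the active layer, merged from scratch.
    upload_texture("background", merge(layers[:i]))
    # Active layer: uploaded on its own, so brush strokes only touch this texture.
    upload_texture("active", layers[i])
    # Foreground: layers above the active one. Reuse whole-block caches where
    # possible and only merge the leftover layers of the partially covered block.
    first_full_block = ((i + 1 + BLOCK - 1) // BLOCK) * BLOCK
    leftovers = layers[i + 1:first_full_block]
    cached = [caches[b] for b in range(first_full_block, len(layers), BLOCK)]
    upload_texture("foreground", merge(leftovers + cached))

layers = [f"L{n}" for n in range(12)]
caches = block_caches(layers)
switch_to_layer(layers, 2, caches)   # merges L3 plus the [4;8) and [8;12) caches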

When the user starts the brush tool action (i.e. left-clicks, uses their pen, etc.), two paths run at once (a small sketch follows the list):
  • Run a compute shader on the active layer to process each brush dab, and display this to the user.
  • At the same time, the CPU performs the current code to produce a high-quality final version of the brush stroke. Once it's done, it uploads the result to the GPU. If the brush was used again before the upload finished, that result is invalidated (I doubt this will happen often, because uploading a single 4096x4096x32bpp texture is very fast!). Alternatively, if the compute shader is already high quality, the data could be transferred back to the CPU.
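
A minimal sketch of that two-path idea, with a thread standing in for the slow CPU render and a generation counter handling the invalidation; all names here (paint_dab, _render_final, etc.) are hypothetical, not Krita code:

# Latency-hiding sketch: a cheap preview is "shown" immediately, while the
# high-quality result is computed in the background and swapped in later.

import threading
import time

class LatencyHidingStroke:
    def __init__(self):
        self.preview_dabs = []      # what the compute-shader path would show instantly
        self.final_result = None    # what the slow CPU path eventually produces
        self.generation = 0         # bumped on every new dab to invalidate stale results
        self.lock = threading.Lock()

    def paint_dab(self, dab):
        """Called per input event: update the cheap preview right away."""
        with self.lock:
            self.generation += 1
            gen = self.generation
            self.preview_dabs.append(dab)
        # Kick off the expensive high-quality render for the current stroke state.
        threading.Thread(target=self._render_final, args=(gen,), daemon=True).start()

    def _render_final(self, gen):
        time.sleep(0.05)            # stand-in for the slow, high-quality CPU rendering
        with self.lock:
            if gen != self.generation:
                return              # a newer dab arrived meanwhile: drop this stale result
            self.final_result = f"high-quality stroke of {len(self.preview_dabs)} dabs"

stroke = LatencyHidingStroke()
for pos in [(10, 10), (12, 11), (15, 13)]:
    stroke.paint_dab(pos)           # the preview updates immediately for every dab
time.sleep(0.2)                     # give the background render time to finish
print(stroke.final_result)          # only the result for the latest state survives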

Additionally, users with monster GPUs (i.e. 8 GB of GPU RAM) could use a different codepath where each layer is uploaded separately instead of being merged and then uploaded.

The goals are:
  • Hide latency rather than accelerate the final processing. This means performing redundant work instead of constantly copying data back and forth.
  • GPU acceleration is used for preview acceleration. The final result is done on the CPU.
  • All transfers are CPU -> GPU.
  • No GPU -> CPU transfers (unless testing shows there are tangible gains from copying the active layer i from GPU to CPU, obviously).
  • Avoid high memory consumption (a problem with 1 GB consumer cards) by only keeping 3 layers on the GPU (bonus: optionally more).

Unfortunately I am already involved with enough open source projects to also get involved in Krita. But I wanted to speak up because I don't think this algorithm (or something similar) is crazy, and it has a reasonable chance of actually improving perceived latency by a very large margin. Sometimes all you need is to approach the problem from a different perspective. In a way, it's similar to Instant Preview.

Cheers
My 2 cents.
dkazakov
Registered Member
matiasngoldberg wrote:I just registered to tell you you're wrong! ;D Nah, just kidding.


What you describe is theoretically possible, with quite a few limitations (see below), but there are at least two major problems I have no idea how to overcome:

  • We would need to write two copies of the whole pipeline code: one for the CPU and one for the GPU. Both versions would have to produce pixel-to-pixel exact results; otherwise we would still be bottlenecked by the CPU implementation.
  • We would have to maintain these two exact copies of the same code.

Rather than trying to accelerate the processing of the brush tool, you need to reduce the perceived latency of the brush tool.


We already have the Instant Preview feature, which does exactly what you suggest ("perceived latency") without even involving the GPU. The problem is that "painting with a brush" is not the only thing painters do. So you either have to port *all* of Krita's functionality to that "perceived latency" engine, or the user will have to wait hours on barriers before every non-optimized action. Or the "perceived latency" engine has to produce pixel-to-pixel exact results, which leads to the problems I listed above.

On a background thread, merge layers [0; i) into a single layer and upload it to a GPU texture (let's call this the "background layer").


Yes, the idea is fun, and I have proposed it as a research project for GSoC several times already. There is one small thing that shatters all our dreams: all the blending modes except "Normal" are not associative. Even for "Normal" it is not always true (the alpha-channel saturation case, which has to be handled separately).

That is, for most of the blending modes:
(layer1 + layer2) + layer3 != layer1 + (layer2 + layer3)
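
To make that concrete (an illustration added here, not from the original post): a tiny Python check with the standard Overlay blend formula on fully opaque grey values shows the two groupings disagree, so a pre-merged "foreground layer" cannot reproduce bottom-up compositing.

# Overlay blend on single grey values in [0, 1]; layers are fully opaque and
# layer1 is the bottom layer. Illustrative only, not Krita's implementation.

def overlay(backdrop, source):
    if backdrop < 0.5:
        return 2 * backdrop * source
    return 1 - 2 * (1 - backdrop) * (1 - source)

layer1, layer2, layer3 = 0.2, 0.6, 0.8

bottom_up  = overlay(overlay(layer1, layer2), layer3)   # (layer1 + layer2) + layer3
pre_merged = overlay(layer1, overlay(layer2, layer3))   # layer1 + (layer2 + layer3)

print(bottom_up, pre_merged)   # ~0.384 vs ~0.336 -- the groupings do not agree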


At the same time, the CPU performs the current code to produce a high-quality final version of the brush stroke.


That is exactly what "Instant Preview" does now. The problem is not the perceived latency, which is already addressed, but the barriers that are needed to sync with the high-quality version.

Unfortunately I am already involved with enough open source projects to also get involved in Krita. But I wanted to speak up because I don't think this algorithm (or something similar) is crazy, and it has a reasonable chance of actually improving perceived latency by a very large margin. Sometimes all you need is to approach the problem from a different perspective. In a way, it's similar to Instant Preview.


I hope you now understand that we have already spent quite a lot of time "approaching the problem from a different perspective" :) The problem just needs human-hours of further research into extremely weird effects (associativity) and coding.

