KWin performance very slow compared to Windows Aero

soumyadeep
@OP: I have a fairly old CPU (AMD Athlon X2 4400+) and GPU (8600 GT). With the ondemand CPU governor, which scales the CPU frequency down when idle (much like your NVIDIA card's clock is scaled down), I see noticeable lag in KWin. So I lowered the governor's threshold so that it raises the CPU frequency at lower CPU usage: the default is 95%, and I changed it to 40%. Now my CPU frequency scales up to higher values quickly.
This has done wonders for me when it comes to KWin performance. Try it and see if it helps, and also check whether the NVIDIA driver supports a similar setting for the GPU. Oh, and my GPU doesn't support scaling down frequencies.
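For reference, the threshold change described above could look roughly like this. It is only a sketch: it assumes the ondemand tunable is exposed as up_threshold under /sys/devices/system/cpu, either globally or per CPU depending on the kernel version, and the function name and the optional second argument are made up for illustration.

```shell
#!/bin/sh
# Set the ondemand governor's up_threshold: the CPU load (in percent) above
# which the kernel raises the frequency. A lower value clocks up sooner.
# ASSUMPTION: the tunable lives in one of the two sysfs locations tried
# below; which one exists depends on the kernel version.
set_up_threshold() {
    value=$1
    sysfs=${2:-/sys/devices/system/cpu}   # second argument is only for testing
    found=0
    for f in "$sysfs"/cpufreq/ondemand/up_threshold \
             "$sysfs"/cpu*/cpufreq/ondemand/up_threshold; do
        # An unmatched glob stays literal and simply fails the -w test.
        if [ -w "$f" ]; then
            echo "$value" > "$f"
            found=1
        fi
    done
    # Succeed only if at least one tunable was written.
    [ "$found" -eq 1 ]
}
```

Run it as root, e.g. "set_up_threshold 40". The value is written to sysfs only, so it does not survive a reboot.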
RealNC
toad wrote:
> From the nvidia site it appears you are using the wrong driver - the 270 version is the correct one for your video card.

I'm sure you're mistaken. 270 is a one-year-old driver version that supports the same hardware as the newer version.

> Also I'd suggest using the nouveau driver rather than the nvidia one - it plays much nicer with xrandr.

Nouveau is very slow with Wine and VMware. Also, I'm not using xrandr; I use nvidia-settings.
luebking
a) How's performance on the XRender backend (kcmshell4 kwincompositing, 3rd tab)?
b) How's performance w/o vsync?
c) How's performance w/o compositing at all?
d) Tried flushing the pixmap cache? [1] (should have an impact within less than a second, w/o further action)
e) Tried other pixmap placement strategies? [2] (requires a restart with "kwin --replace &". BE AWARE THAT THIS CAN CAUSE THE DESKTOP TO BECOME UNRESPONSIVE FOR SOME VALUES. Make sure you can switch to VT1, "export DISPLAY=:0", change the value and restart kwin from there.)

Windows simply introduced Direct2D, whereas compositing on X11 requires a 3-step indirection (and, so far, an additional step for the window decoration, for other reasons ...), which is rather cumbersome and makes memory throughput a bottleneck.
A modern approach to the problem is Wayland, where applications render directly into the buffer the compositor uses for on-screen display.

[1] "nvidia-settings -a PixmapCache=0; nvidia-settings -a PixmapCache=1"
[2] "nvidia-settings -a InitialPixmapPlacement=n" | n = [0,4]
RealNC
luebking wrote:
> a) how's performance on the XRender backend (kcmshell4 kwincompositing, 3rd tab)

Just as slow. Perhaps a bit slower, even.

> b) how's performance w/o v'syncing

Moving windows around is fluid, but everything else is still slow.

> c) How's performance w/o compositing at all

Blazingly fast.

> d) tried to flush the pixmap cache? [1] (should have an impact w/o further action within less than a second)

No changes in performance.

> e) tried other pixmap placement strategies? [2]

Also no changes, except that some seem to be even slower.
toad
http://www.nvidia.com/object/linux-disp ... river.html

You'll find your card listed for the 270 driver. Just because you've got an nvidia card doesn't mean the latest driver version is the correct one for you.


Debian testing
RealNC
toad wrote:
> http://www.nvidia.com/object/linux-display-ia32-270.41.06-driver.html
>
> You'll find your card listed for the 270 driver. Just because you've got an nvidia card doesn't mean the latest driver version is the correct one for you.

My card is listed in a lot of versions, including the latest one:

http://www.nvidia.com/object/linux-disp ... river.html

I don't think I should be using outdated drivers.
luebking
> c) How's performance w/o compositing at all
>
> Blazingly fast.

If, e.g., moving windows w/o compositing is faster than w/ compositing for you, something on your system is terribly screwed, since the backbuffer isn't in place for the nvidia driver by default, and even with it the exposure roundtrip and client repaints would be FAR more expensive.

Is this a multiscreen setup? What resolution are we talking about anyway (and the GPU is a GTX560 in a PCIe x16 slot)?

Btw., are you aware that you can configure memory clock and GPU clock independently? Because of the indirections, the memory clock will likely *have* to be at least twice as high as on Windows - or MS is stupid ;-)
RealNC
luebking wrote:
> > c) How's performance w/o compositing at all
> > Blazingly fast.
>
> If eg. moving windows w/o compositing is faster than w/ compositing for you, something on your system is terribly screwed, since the backbuffer isn't in place for the nvidia driver by default and even with the exposure roundtrip and client repaints would be FAR more expensive.

I've no idea what you just said :P How would I find out what's wrong with my setup?

But I should mention that moving windows around with compositing enabled and VSync disabled is just as fast as without compositing. But *only* moving windows around. Of course, when compositing is disabled you don't have any effects, so it's impossible to compare anything other than just moving windows around.

> Is this a multiscreen setup? What resolution are we talking about anyway (and the GPU is a GTX560, PCIe16 slot

No, it's a single monitor, connected through DVI-D. 1920x1080.

> You are btw. aware that you can configure memory clock and GPU clock independently? Because of the indirections, the memory clock will likely *have* to be at least twice as high as on windows - or MS is stupid ;-)

Sadly, manual clocking does not seem to be possible on Fermi hardware. You get three pre-configured clock profiles to choose from, but can't modify them.
luebking
RealNC wrote:
> I've no idea what you just said :P How would I find out what's wrong with my setup?

Sorry, happens ;-)
X11 is "flat", i.e. w/o a compositor, whenever a part of a window becomes visible (which happens one px column and row at a time while another window is moved above it), the window gets a message "please repaint yourself for this area". Depending on the window content, that can be extremely costly, is nearly always more expensive than just painting a pixmap, and causes a roundtrip: the server sends a message to the client (the window), the client handles that event and sends another message to the server, which then does the actual on-screen update.
While this was nice in the late 80s and early 90s (when your GPU had access to up to 256kB of VRAM), it became really nasty later on, so the X11 drivers introduced the "BackingStore", which, when enabled, made use of the growing memory on graphics cards to save parts of the entire screen or of the windows in some sort of layered cache.
Since compositors do exactly the same (among other things), the system is usually not in use nowadays, and it has not been enabled by default in the nvidia driver for years (I don't even know whether it's still supported at all).

> But I should mention that moving windows around with compositing enabled and VSync disabled is just as fast as without compositing. But *only* moving windows around.

You probably lose frames in the sync because the system cannot keep up with the refresh rate, which makes things choppy.
What in particular remains slow? "Fullscreen" effects like Cover Switch or Present Windows? They cause an entire screen repaint for each update, which gets you into the memory trap.

> No, it's a single monitor, connected through DVI-D. 1920x1080.

It's a real GTX 560, not some capped **** with a similar name or so?
75-100MHz on a 256-bit bus *should* be sufficient. 50MHz won't work.
(For 128-bit you need to double that number; for 64-bit you go and slam whoever sold you that card ;-)
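To put rough numbers on the memory clock figures above, here is a back-of-the-envelope sketch. It assumes 32-bit pixels, one bus transfer per clock and no other memory traffic; none of these assumptions come from the thread, real memory usually moves more than one transfer per clock, and the function name is made up for illustration.

```shell
#!/bin/sh
# Estimate how many full-screen copies per displayed frame the raw memory
# bus can move at a given memory clock. Prints the result in tenths of a
# pass so that plain integer arithmetic is enough.
# ASSUMPTIONS: 4 bytes per pixel, one transfer per clock, nothing else
# competing for the bus.
passes_per_frame_x10() {
    width=$1 height=$2 fps=$3 bus_bits=$4 clock_mhz=$5
    frame_bytes=$((width * height * 4))
    bus_bytes_per_sec=$((bus_bits / 8 * clock_mhz * 1000000))
    echo $((bus_bytes_per_sec * 10 / (frame_bytes * fps)))
}
```

"passes_per_frame_x10 1920 1080 60 256 50" prints 32, i.e. about 3.2 full-screen passes per frame at 50MHz, and 64 (about 6.4 passes) at 100MHz. With several copies per frame needed for the indirections, that leaves very little headroom at 50MHz under these assumptions.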

> Sadly, manual clocking does not seem to be possible on Fermi hardware. You get three pre-configured clock profiles to choose from, but can't modify them.

Humm what?
Have you enabled Coolbits in xorg.conf?
Section "Device"
Option "Coolbits" "1"

next do
nvidia-settings -a GPUOverclockingState=1 # enable "over"clocking
nvidia-settings -a GPU3DClockFreqs=50,100 # 50MHz on the GPU, 100MHz on the RAM
nvidia-settings -a GPU2DClockFreqs=50,100 # Same but for 2D engine, it's likely linked anyway

You should also have a fancy UI for this in "nvidia-settings", but this way you don't have to open it and can change the clocking with a plasmoid or so.

Btw., have you played around in nvidia-settings otherwise, esp. enabled any of the global overrides for VSync, antialiasing or anisotropic filtering?

Flipping should be enabled, and you can vsync both xv adaptors, but texture sharpening does NOT come for free - use image sharpening in the monitor settings if you want that.
On-demand VBlank interrupts are something to test.

After changing anything but the clock rates, restart with "kwin --replace &"
bvitnik
mgraesslin wrote:
> Concerning improving the performance, I recommend following the advice on Desktop Effects Performance.
>
> But in general most has already been said: we are not in control of the drivers and are not in a position to have the vendors get a certification to be "KWin compliant" which the vendor can then stick on their GPU.
>
> We are working hard on improving the performance, and lots of work has gone into making KWin always stay in the lowest level on NVIDIA GPUs.


I have an NVIDIA GeForce GTS 450 GPU with similar adaptive GPU clocking. There are three "Performance Levels", as NVIDIA calls them:

- Performance Level 0 = GPU @ 50MHz
- Performance Level 1 = GPU @ 405MHz
- Performance Level 2 = GPU @ 830MHz

I recently did some experimenting with the Windows 7 compositor, KWin and Compiz, mostly regarding moving windows around and the minimize/restore effect ("Minimize Animation" in KWin; the "Zoom" effect of the "Animations" plugin in Compiz). I limited my experiments to these use cases because the Windows 7 compositor has only a limited number of effects. I'm using Kubuntu 12.04, KDE 4.8.2 and the NVIDIA 295.40 binary blob. Here are my observations:

1) Windows 7 compositor
- GPU is always @ 50MHz. Moving or minimizing/restoring a window does not affect the GPU clock. Only if I use the Aero Flip 3D effect (Win key+Tab) does the GPU go to 830MHz.
- GPU @ 50MHz. Moving a window, regardless of its size, and minimizing and restoring are smooth, presumably at 60 fps, and there is no tearing.
- GPU utilization is less than 20% and rarely goes up to 40% @ 50MHz.

2) KWin
- GPU @ 50MHz, "Translucency" effect is on, Oxygen decoration shadow and animations are on, VSync is on. Moving a window, regardless of its size, IS NOT smooth and there is occasional tearing. GPU stays @ 50MHz all the time. It seems that moving a window doesn't push the GPU into any higher performance level. "Show FPS" shows 60 when everything is still and 30 when I'm moving a window. In effect, the GPU is locked to 50MHz and KWin is locked to 30 fps. Besides that, there is a small "hold up" when I start moving a window. It stops when the Oxygen decoration shadow finishes its animation. I can get the GPU to work at a higher performance level only if I do something like minimizing/restoring or resizing a window.
- GPU @ 50MHz, "Translucency" effect is on, Oxygen decoration shadow and animations are OFF, VSync is on. Same as before, but without the "hold up" caused by the shadow animation.
- GPU @ 50MHz, "Translucency" effect is OFF, Oxygen decoration shadow and animations are OFF, VSync is on. Same as before. Turning "Translucency" OFF has no noticeable effect.
- GPU @ 50MHz, "Translucency" effect is OFF, Oxygen decoration shadow and animations are OFF, VSync is OFF. As soon as I start moving the window, the GPU goes to 830MHz and motion is smooth, but there is tearing. The same thing happens if I disable compositing completely.
- GPU @ 50MHz, "Translucency" effect is OFF, Oxygen decoration shadow and animations are OFF, VSync is on. At first, the window minimize animation IS NOT smooth for a fraction of a second. As soon as the GPU goes to 830MHz, minimize and restore animations are smooth.
- GPU @ 405MHz = same as GPU @ 50MHz. Still not enough power?
- GPU @ 830MHz. Moving a window and minimize and restore animations are smooth, 60fps and no noticeable tearing.

3) Compiz
- GPU @ 50MHz, window is opaque, decoration shadow is on, VSync is on. Moving a window, regardless of its size, IS smooth, presumably at 60fps, and there is no tearing. GPU stays @ 50MHz all the time. Surprised?
- GPU @ 50MHz, window has about 70% opacity ("Opacity, Brightness and Saturation" plugin, simulating the KWin "Translucency" effect), decoration shadow is on, VSync is on. Moving a small window IS smooth, presumably at 60fps, and there is no tearing. Moving a large window immediately gets the GPU to 830MHz and, as expected, motion IS smooth, presumably at 60fps, and there is no tearing.
- GPU @ 50MHz, window is opaque, decoration shadow is on, VSync is on. The animation of minimizing and restoring a window IS NOT smooth. GPU stays at 50MHz all the time.

I can conclude that window translucency raises GPU utilization in Compiz. Disabling translucency in Compiz has a positive effect on GPU utilization. On the other hand, disabling the "Translucency" effect in KWin has NO positive effect on GPU utilization. Maybe something regarding that in KWin is suboptimal? Even with "bad" drivers, the X server being cr*p etc., Compiz can run some simple "things" smoothly with the GPU @ 50MHz. Why can't KWin? At least moving an opaque window should be smooth with the GPU @ 50MHz.

Last edited by bvitnik on Tue May 01, 2012 4:45 pm, edited 1 time in total.
luebking
Sounds like your sync is locked to 30 FPS. Check with
kreadconfig --file kwinrc --group Compositing --key MaxFPS
and
kwriteconfig --file kwinrc --group Compositing --key MaxFPS 100
will raise it. Ideally, try kwin from git master and apply https://git.reviewboard.kde.org/r/103058/
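If kreadconfig prints nothing, the key is simply not set. As a cross-check, the config file can also be inspected directly. The following is a minimal sketch of such a lookup in plain shell; the function name is made up for illustration, it assumes simple "Key=value" lines under "[Group]" headers, and the file location (~/.kde/share/config/kwinrc is the one mentioned in this thread) depends on the distribution.

```shell
#!/bin/sh
# Print the value of KEY inside [GROUP] of a KDE-style INI file.
# Prints nothing and returns 1 if the key is not set in that group.
read_kconfig_key() {
    file=$1 group=$2 key=$3
    current=
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "["*"]")
                # Group header: remember which group we are in.
                current=${line#"["}
                current=${current%"]"}
                ;;
            "$key="*)
                if [ "$current" = "$group" ]; then
                    printf '%s\n' "${line#*=}"
                    return 0
                fi
                ;;
        esac
    done < "$file"
    return 1
}
```

For example: "read_kconfig_key ~/.kde/share/config/kwinrc Compositing MaxFPS || echo 'MaxFPS not set'".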
mgraesslin
KDE Developer
You also nicely discovered that the Oxygen window decoration is slow and slows down the KWin animations. We are aware of that problem, and I plan to work on it in the next release cycle so that it no longer has an impact on KWin.
bvitnik
OK. I did some more testing.

'kreadconfig --file kwinrc --group Compositing --key MaxFPS' returned nothing. There was no 'MaxFPS' key in '~/.kde/share/config/kwinrc' so I did 'kwriteconfig --file kwinrc --group Compositing --key MaxFPS 100' and this is what happened:

- GPU @ 50MHz, VSync is on. When moving a window slowly, fps was going from 60 to 45. If I start moving the window faster, the GPU immediately goes to 830MHz, fps goes to 60 and motion becomes smooth. No tearing. I could not reproduce the case where the GPU stays @ 50MHz all the time and fps locks at 30.

After that, I removed 'MaxFPS' from kwinrc, restarted the system (just in case), and now KWin behaves erratically. I can't reproduce that 50MHz/30fps lockup as often as before. VSync is still on. Earlier, I could basically reproduce it every time. Even if the lockup happens, there is no tearing any more. Most of the time now, the GPU goes to 830MHz and, of course, everything is smooth. Earlier I was experimenting with Compiz and KWin, switching from one to the other on the fly, testing different effects etc. Maybe something in Compiz caused KWin to do 50MHz/30fps lockups all the time. On the other hand, even Windows could influence KWin. I have some weird bug: when I boot Kubuntu, then reboot to Windows, then reboot again, I get some weird graphical corruption in POST. If I press Ctrl+Alt+Del in POST, the corruption is gone at the next POST. Inconsistency between Windows and Linux drivers? Who knows.

In the end, 50MHz is not enough for Kwin to smoothly move windows even without translucency and shadows and no effects, which is a shame.

Sadly, I never had enough time to familiarise myself with git and other version control systems, especially when it comes to KDE projects, so building KWin from the master branch, applying patches etc. is still an area of mystery for me. If I get some free time in the near future, I'll try building KWin from git and do some testing.

As far as the Oxygen decoration being slow goes, my testing shows that the 'raster' Qt graphics system backend has a considerable (negative) impact on the performance of the Oxygen decoration. It can't easily be seen on modern computers, but on older computers, and probably netbooks and other less powerful machines, it can. First of all, when compositing is turned on, the negative impact on performance can't be seen so easily on any machine. Maybe it can be seen only when resizing windows. That's expected, since the decoration isn't redrawn as often as when compositing is turned off. When compositing is turned off, the "slowness" can easily be seen when moving windows too. For example, take two konsole windows, one big and one small. If you move the small window inside the bigger one, motion is smooth. As soon as you touch the window decoration of the bigger window, motion starts to lag. Basically, whenever you cover a part of the decoration and then reveal it, a redraw has to be done, and for some weird reason it's very slow. There is also a noticeable black trail left on the window decoration behind a moving window. This all happens if the Qt graphics system backend is 'raster' (which is the default as of Qt 4.8). Raster usually has a positive effect on performance when it comes to drawing window content, even on low-powered machines, but as far as the Oxygen deco is concerned, it's totally the opposite. If you run KWin with the 'native' Qt graphics system backend (e.g. kwin --replace --graphicssystem native), it runs way smoother.

P.S. Resizing of windows is way smoother if you disable the window deco (Alt+F3->Advanced->No Border, Alt+Right Mouse Click to resize), with or without compositing ;).
luebking
> Earlier I was experimenting with Compiz and Kwin, switching one with the other on the fly
Esp. when doing so ensure that the compiz decorator (emerald?) isn't running anymore

There's possibly also misdetection of the screen vertical frequency (for nvidia) which should be covered in git master by another commit.

If you have the kdelibs-dev and kde-workspace-dev packages installed, all you need to do is
git clone git://anongit.kde.org/kde-workspace.git
cd kde-workspace
git checkout master
[download the patch]
patch -p1 < patch.diff
(note that it may be -p2, -p3 or whatever; the patch should just apply without asking which file to patch. If it asks, it's the wrong offset.)
mkdir build; cd build
cmake -DCMAKE_INSTALL_PREFIX=$PREFIX ..
XOR
cmake -DCMAKE_INSTALL_PREFIX=$PREFIX -DCMAKE_BUILD_TYPE=Release ..
(builds & strips and stuff)
ccmake ..
[adjust build configuration, mainly you can disable everything BUILD_* but BUILD_kwin]
make && sudo make install
[wait, enter password for sudo]
kwin --replace &
-> profit
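The -pN remark in the steps above can be checked before running patch. This is a small sketch, not part of the original instructions: it reads the first "+++" header of a unified diff and counts how many leading path components must be dropped until the path names an existing file relative to the current directory. The function name is made up for illustration.

```shell
#!/bin/sh
# Guess the -pN strip level for a unified diff, relative to the current
# directory. Prints N and returns 0, or returns 1 if no level matches.
guess_strip_level() {
    patch_file=$1
    # Path from the first "+++ path" header, without any trailing timestamp.
    target=$(sed -n 's/^+++ \([^[:space:]]*\).*/\1/p' "$patch_file" | head -n 1)
    [ -n "$target" ] || return 1
    level=0
    while :; do
        if [ -f "$target" ]; then
            echo "$level"
            return 0
        fi
        case $target in
            */*)
                # Drop one leading path component and try again.
                target=${target#*/}
                level=$((level + 1))
                ;;
            *)
                return 1
                ;;
        esac
    done
}
```

Run it from the top of the kde-workspace checkout, e.g. "patch -p$(guess_strip_level patch.diff) < patch.diff". It only looks at the first file in the patch, so treat the result as a hint.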
bvitnik
luebking wrote:...


Will try. Thanks 8) .

