This forum has been archived. All content is frozen. Please use KDE Discuss instead.

cpu spinning in mutex lock / unlock

Tags: None
(comma "," separated)
wolf550e
Registered Member
Posts
3
Karma
0

cpu spinning in mutex lock / unlock

Thu Oct 21, 2010 12:54 pm
KTorrent 4.0.4 with libktorrent 1.0.4 on Gentoo Linux ~amd64 in system tray with uTP enabled uses ~15% cpu on an Intel Core 2 2Ghz machine. With less torrents it's closer to 10% but not lower than that.

perf (a sampling profiler that is part of the linux kernel source) shows this for the process:
Code: Select all
12.97%  ktorrent  libpthread-2.12.1.so        [.] pthread_mutex_lock
7.74%  ktorrent  libpthread-2.12.1.so        [.] __pthread_mutex_unlock_usercnt
3.63%  ktorrent  libglib-2.0.so.0.2400.2     [.] IA__g_main_context_prepare
3.26%  ktorrent  libglib-2.0.so.0.2400.2     [.] IA__g_main_context_check
2.63%  ktorrent  [kernel.kallsyms]           [k] fput
2.54%  ktorrent  libglib-2.0.so.0.2400.2     [.] g_main_context_iterate.clone.5
2.51%  ktorrent  [kernel.kallsyms]           [k] do_sys_poll
1.85%  ktorrent  [kernel.kallsyms]           [k] fget_light
1.75%  ktorrent  libQtCore.so.4.6.3          [.] QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>)

etc.

I wanted to see why it spends so much time locking and unlocking mutexes.
I've instrumented qtcore and libktorrent using gcc's -finstrument-functions and a short session showed these locations making the calls to QMutex::lock(), sorted by accumulated time spent in that function.

Two million clock ticks (first column) is a millisecond, last column is offset of call instruction in function (useless to you):
Code: Select all
965393352       utp::PollPipe::readyToWakeUp(utp::Connection*) const    /usr/lib64/libktorrent.so.2.0.3         490
197053472       locking_function        /usr/lib64/qt4/libQtNetwork.so.4.6.3    530
190385580       utp::Connection::isWriteable() const    /usr/lib64/libktorrent.so.2.0.3         450
115364711       QApplication::font()    /usr/lib64/qt4/libQtGui.so.4.6.3        156
95687606        locking_function        /usr/lib64/qt4/libQtNetwork.so.4.6.3    340
89771017        net::BufferedSocket::sendOutputBuffer(unsigned int, unsigned long long)         /usr/lib64/libktorrent.so.2.0.3         164
80664031        QFontPrivate::engineForScript(int) const        /usr/lib64/qt4/libQtGui.so.4.6.3        212
69754524        bt::PacketWriter::hasBytesToWrite() const       /usr/lib64/libktorrent.so.2.0.3         434
68140296        utp::UTPServer::wakeUpPollPipes()       /usr/lib64/libktorrent.so.2.0.3         1682
52987102        utp::UTPServer::preparePolling(net::Poll*, net::Poll::Mode, utp::Connection*)   /usr/lib64/libktorrent.so.2.0.3         2049
45446520        utp::PollPipe::prepare(net::Poll*, unsigned short, QSharedPointer<utp::PollPipe>)       /usr/lib64/libktorrent.so.2.0.3         825
45186169        bt::PacketWriter::onReadyToWrite(unsigned char*, unsigned int)  /usr/lib64/libktorrent.so.2.0.3         1683
25280136        net::BufferedSocket::getUploadRate() const      /usr/lib64/libktorrent.so.2.0.3         38
23712150        net::BufferedSocket::getDownloadRate() const    /usr/lib64/libktorrent.so.2.0.3         38
20086038        QMetaObject::addGuard(QObject**)        /usr/lib64/qt4/libQtCore.so.4.6.3       986
16842938        QMetaObject::removeGuard(QObject**)     /usr/lib64/qt4/libQtCore.so.4.6.3       1050
13428226        bt::PacketReader::dequeuePacket()       /usr/lib64/libktorrent.so.2.0.3         810
12862073        bt::PacketWriter::getUploadedDataBytes() const  /usr/lib64/libktorrent.so.2.0.3         366
12650190        net::BufferedSocket::updateSpeeds(unsigned long long)   /usr/lib64/libktorrent.so.2.0.3         54
12498934        QVariantAnimationPrivate::getInterpolator(int)  /usr/lib64/qt4/libQtCore.so.4.6.3       390
12073782        QObject::~QObject()     /usr/lib64/qt4/libQtCore.so.4.6.3       1303
11882339        QObject::~QObject()     /usr/lib64/qt4/libQtCore.so.4.6.3       1752
11573429        QObject::~QObject()     /usr/lib64/qt4/libQtCore.so.4.6.3       1760
8260917         bt::PacketWriter::getNumDataPacketsToWrite() const      /usr/lib64/libktorrent.so.2.0.3         742
7065597         QFactoryLoader::instance(QString const&) const  /usr/lib64/qt4/libQtCore.so.4.6.3       1865
6941256         q_resolveOpenSslSymbols()       /usr/lib64/qt4/libQtNetwork.so.4.6.3    419
6518638         QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*)  /usr/lib64/qt4/libQtCore.so.4.6.3       1977
6482952         QSslSocketPrivate::ensureInitialized()  /usr/lib64/qt4/libQtNetwork.so.4.6.3    593
5496491         QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*)  /usr/lib64/qt4/libQtCore.so.4.6.3       2689
5212109         net::SocketMonitor::remove(net::BufferedSocket*)        /usr/lib64/libktorrent.so.2.0.3         683
4880540         QNetworkProxy::QNetworkProxy()  /usr/lib64/qt4/libQtNetwork.so.4.6.3    418
4822279         utp::Connection::close()        /usr/lib64/libktorrent.so.2.0.3         338
3163335         utp::UTPServer::connectTo(net::Address const&)  /usr/lib64/libktorrent.so.2.0.3         1070
3139402         utp::UTPServer::attach(utp::UTPSocket*, utp::Connection*)       /usr/lib64/libktorrent.so.2.0.3         595
3019856         utp::PollPipe::reset()  /usr/lib64/libktorrent.so.2.0.3         811
2751336         KLocalizedStringPrivate::toString(KLocale const*, QString const&) const         /usr/lib64/libkdecore.so.5.5.0  1512
2669961         QMetaObjectPrivate::disconnect(QObject const*, int, QObject const*, int, QMetaObjectPrivate::DisconnectType)    /usr/lib64/qt4/libQtCore.so.4.6.3       210
2527791         QObject::moveToThread(QThread*)         /usr/lib64/qt4/libQtCore.so.4.6.3       525
2496935         QGlobalNetworkProxy::proxyForQuery(QNetworkProxyQuery const&)   /usr/lib64/qt4/libQtNetwork.so.4.6.3    892
2443658         QObject::moveToThread(QThread*)         /usr/lib64/qt4/libQtCore.so.4.6.3       540
2324532         QObjectPrivate::clearGuards(QObject*)   /usr/lib64/qt4/libQtCore.so.4.6.3       1065
2264619         utp::UTPServer::detach(utp::UTPSocket*, utp::Connection*)       /usr/lib64/libktorrent.so.2.0.3         603
2139473         net::SocketMonitor::lock()      /usr/lib64/libktorrent.so.2.0.3         46
1990450         utp::Connection::handleTimeout()        /usr/lib64/libktorrent.so.2.0.3         451
1847713         QMetaObjectPrivate::disconnect(QObject const*, int, QObject const*, int, QMetaObjectPrivate::DisconnectType)    /usr/lib64/qt4/libQtCore.so.4.6.3       195
1621116         invalidateEngine(QRegExpPrivate*)       /usr/lib64/qt4/libQtCore.so.4.6.3       674
1600809         prepareEngine_helper(QRegExpPrivate*)   /usr/lib64/qt4/libQtCore.so.4.6.3       586
1503012         dht::RPCServerThread::run()     /usr/lib64/libktorrent.so.2.0.3         1204
1354829         bt::CacheFile::map(bt::MMappeable*, unsigned long long, unsigned int, bt::CacheFile::Mode)      /usr/lib64/libktorrent.so.2.0.3         932
1311345         QMetaObject::changeGuard(QObject**, QObject*)   /usr/lib64/qt4/libQtCore.so.4.6.3       1539
1299704         bt::PacketWriter::queuePacket(bt::Packet*)      /usr/lib64/libktorrent.so.2.0.3         1074
1281609         net::WakeUpPipe::wakeUp()       /usr/lib64/libktorrent.so.2.0.3         490
1119265         dht::RPCServer::handlePackets()         /usr/lib64/libktorrent.so.2.0.3         2883
1064609         QDBusConnectionPrivate::doDispatch()    /usr/lib64/qt4/libQtDBus.so.4.6.3       34
1002701         QObject::sender() const         /usr/lib64/qt4/libQtCore.so.4.6.3       422
1002309         QReadWriteLock::lockForWrite()  /usr/lib64/qt4/libQtCore.so.4.6.3       386

I have made a single simple patch to the code that used to be at the top of the list (STL list::size() is O(n) while list::empty() is O(1), also if you convert "(n1+n2)>0" to "n1||n2" you can short circuit).
Code: Select all

--- src/peer/packetwriter.h.old
+++ src/peer/packetwriter.h
@@ -143,9 +143,6 @@
       /// Send an extended protocol message
       void sendExtProtMsg(Uint8 id,const QByteArray & data);
 
-      /// Get the number of packets which need to be written
-      Uint32 getNumPacketsToWrite() const;
-      
       /// Get the number of data packets to write
       Uint32 getNumDataPacketsToWrite() const;
       
--- src/peer/packetwriter.cpp.old
+++ src/peer/packetwriter.cpp
@@ -283,7 +283,8 @@
    
    bool PacketWriter::hasBytesToWrite() const
    {
-      return getNumPacketsToWrite() > 0;
+      QMutexLocker locker(&mutex);
+      return !data_packets.empty() || !control_packets.empty();
    }
    
    Uint32 PacketWriter::getUploadedDataBytes() const
@@ -302,12 +303,6 @@
       return ret;
    }
    
-   Uint32 PacketWriter::getNumPacketsToWrite() const
-   {
-      QMutexLocker locker(&mutex);
-      return data_packets.size() + control_packets.size();
-   }
-   
    Uint32 PacketWriter::getNumDataPacketsToWrite() const
    {
       QMutexLocker locker(&mutex);



This contribution is small, as a cursory look has not revealed to me other such low hanging fruit. I've probably missed a few as I have not actually studied the code.

Could the developers please ponder this cpu usage issue? I'm sure it can be reduced in half without removing features. Please use a profiler or instrument the code yourselves. I am willing to run test, test patches, etc.
George
Moderator
Posts
5421
Karma
1
Quite surprising that they don't cache the amount of elements in std::list, it's trivial to do.

You should try trunk, the µTP polling code has been improved somewhat, eliminating a timer and much locking which happened every time the timer fired.
wolf550e
Registered Member
Posts
3
Karma
0
Quite surprising that they don't cache the amount of elements in std::list, it's trivial to do.

From the horse's mouth:
size_type size() const
Returns the size of the list. Note: you should not assume that this function is constant time. It is permitted to be O(N), where N is the number of elements in the list. If you wish to test whether a list is empty, you should write L.empty() rather than L.size() == 0.

The rationale is that if you require a kept count, you can very easily wrap the list yourself but you can't easily remove the space and time overhead from the provided implementation.

The GNU libstdc++ doxygen says:
Returns the number of elements in the list.

Definition at line 833 of file stl_list.h.

References std::begin(), __gnu_cxx::distance(), and std::end().

Without reading the code you can see it's an O(n) design.

Anyway, I've tried to checkout trunk from svn and anonsvn.kde.org is too slow and timed out a number of times. Is there a better mirror? Possibly using a better SCM? Or tips on how/when to get better performance from the kde svn repo?

EDIT: Got a checkout. I'll measure the improvement and report here.

EDIT2: It is significantly better. About twice lower cpu usage, in fact. Good job!
George
Moderator
Posts
5421
Karma
1
I'm planning another optimization round for 4.1. Hoping to shave of some more CPU usage.


Bookmarks



Who is online

Registered users: Bing [Bot], daret, Google [Bot], sandyvee, Sogou [Bot]