This forum has been archived. All content is frozen. Please use KDE Discuss instead.

oprofile results (1 week)

Tags: None
(comma "," separated)
clayne
Registered Member
Posts
16
Karma
0

oprofile results (1 week)

Sun May 20, 2007 11:00 pm
Notes:
ktorrent-svn, 2048 kbps down, 1600 kbps up (mostly up used), KDE 3.5.6 (svn, haven't updated in a couple of months but should be irrelevant). qt-copy from 3.5.6 branch.

Code: Select all
CFLAGS="-Os -fomit-frame-pointer -funit-at-a-time -march=prescott -mtune=prescott -mfpmath=sse -msse -msse2 -msse3 -mmmx -m3dnow -pipe -g"
CXXFLAGS="${CFLAGS}"

# gcc --version
gcc-stable (GCC) 4.2.0 20070213 (prerelease)

# opcontrol --status
Daemon running: pid 4778
Event 0: GLOBAL_POWER_EVENTS:50000:1:1:1
Separate options: none
vmlinux file: /usr/src/kernels/linux-2.6.21.1/vmlinux
Image filter: /usr/local/kde/bin/ktorrent
Call-graph depth: 0

[root@ns1 ~]# opreport /usr/local/kde/bin/ktorrent
CPU: P4 / Xeon, speed 2143.87 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 50000
GLOBAL_POWER_E...|
  samples|      %|
------------------
7167614622 100.000 ktorrent
        GLOBAL_POWER_E...|
          samples|      %|
        ------------------
        3552984049 49.5700 libqt-mt.so.3.3.8
        1311145289 18.2926 vmlinux
        654859107  9.1364 libc-2.3.4.so
        637062144  8.8881 libktorrent-2.2dev.so
        160683202  2.2418 libkdeui.so.4.2.0
        142278423  1.9850 libX11.so.6.2.0
        130296845  1.8179 libpthread-2.3.4.so
        122332284  1.7067 libkdecore.so.4.2.0
        112040028  1.5631 ktorrent
        110317090  1.5391 libstdc++.so.6.0.9
         80247851  1.1196 libXft.so.2.1.2
         40424380  0.5640 libXrender.so.1.3.0
         25626146  0.3575 libkdefx.so.4.2.0
         14947149  0.2085 libm-2.3.4.so
         11095425  0.1548 libkio.so.4.2.0
         10385406  0.1449 plastik.so
          9441509  0.1317 libgcc_s.so.1
          7803294  0.1089 anon (tgid:25258 range:0xb7fbb000-0xb7fbc000)
          7586997  0.1059 ip_tables
          7239298  0.1010 sch_htb
          5925866  0.0827 anon (tgid:26164 range:0xb7faf000-0xb7fb0000)
          2677200  0.0374 sch_sfq
          2226714  0.0311 8139too
          2151729  0.0300 libkparts.so.2.1.0
          1737086  0.0242 ktinfowidgetplugin.so
          1408965  0.0197 anon (tgid:12636 range:0xb7ff8000-0xb7ff9000)
           605601  0.0084 xt_tcpudp
           557251  0.0078 xt_length
           549006  0.0077 iptable_mangle
           349514  0.0049 iptable_filter
           275649  0.0038 ktlogviewerplugin.so
           219475  0.0031 xt_CLASSIFY
            46940 6.5e-04 ld-2.3.4.so
            42418 5.9e-04 libz.so.1.2.3
            30005 4.2e-04 libpng12.so.0.16.0
             8830 1.2e-04 oprofile
             2717 3.8e-05 libfontconfig.so.1.1.0
             1947 2.7e-05 libDCOP.so.4.2.0
             1120 1.6e-05 libXcursor.so.1.0.2
              335 4.7e-06 libfreetype.so.6.3.8
              159 2.2e-06 libexpat.so.1.5.0
               48 6.7e-07 ktscanfolderplugin.so
               40 5.6e-07 libnss_db.so.2.0.0
               12 1.7e-07 libkhtml.so.4.2.0
                8 1.1e-07 libdl-2.3.4.so
                8 1.1e-07 ktpartfileimportplugin.so
                7 9.8e-08 libXrandr.so.2.0.0
                6 8.4e-08 libICE.so.6.3.0
                6 8.4e-08 ktsearchplugin.so
                5 7.0e-08 ktrssfeedplugin.so
                5 7.0e-08 libkjs.so.1.2.0
                4 5.6e-08 ISO8859-1.so
                4 5.6e-08 ktschedulerplugin.so
                4 5.6e-08 ktwebinterfaceplugin.so
                3 4.2e-08 ktipfilterplugin.so
                3 4.2e-08 libkdeprint.so.4.2.0
                3 4.2e-08 libkdesu.so.4.2.0
                2 2.8e-08 af_packet
                2 2.8e-08 libXt.so.6.0.0
                2 2.8e-08 ktupnpplugin.so
                1 1.4e-08 libnss_files-2.3.4.so
                1 1.4e-08 libSM.so.6.0.0
                1 1.4e-08 libXext.so.6.4.0
                1 1.4e-08 libXinerama.so.1.0.0
                1 1.4e-08 libkutils.so.1.2.0
                1 1.4e-08 libkwalletclient.so.1.0.1
                1 1.4e-08 libpcreposix.so.0.0.0

# opreport -g -e .plt -t 1 -l /usr/local/kde/lib/libktorrent-2.2dev.so
CPU: P4 / Xeon, speed 2143.87 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 50000
warning: some functions compiled without debug information may have incorrect source line attributions
samples  %        linenr info                 image name               symbol name
55913361  9.5351  qvaluelist.h:264            libktorrent-2.2dev.so    QValueListPrivate<QPair<unsigned int, unsigned long long> >::QValueListPrivate(QValueListPrivate<QPair<unsigned int, unsigned long long> > const&)
40461718  6.9001  qvaluelist.h:562            libktorrent-2.2dev.so    QValueList<QPair<unsigned int, unsigned long long> >::detach()
29510375  5.0325  uploadthread.cpp:42         libktorrent-2.2dev.so    net::UploadThread::update()
28945727  4.9362  qvaluelist.h:299            libktorrent-2.2dev.so    QValueListPrivate<QPair<unsigned int, unsigned long long> >::remove(QValueListIterator<QPair<unsigned int, unsigned long long> >)
19609315  3.3440  qvaluelist.h:474            libktorrent-2.2dev.so    QValueList<QPair<unsigned int, unsigned long long> >::begin()
16589627  2.8291  rc4encryptor.cpp:71         libktorrent-2.2dev.so    mse::RC4Encryptor::RC4Encryptor(bt::SHA1Hash const&, bt::SHA1Hash const&)
16020669  2.7321  downloadthread.cpp:84       libktorrent-2.2dev.so    net::DownloadThread::fillPollVector()
15936464  2.7177  sha1hashgen.cpp:134         libktorrent-2.2dev.so    bt::SHA1HashGen::processChunk(unsigned char const*)
15385198  2.6237  (no location information)   libktorrent-2.2dev.so    net::UploadThread::~UploadThread()
14581486  2.4866  (no location information)   libktorrent-2.2dev.so    net::DownloadThread::~DownloadThread()
14444592  2.4633  bufferedsocket.h:126        libktorrent-2.2dev.so    net::BufferedSocket::bytesReadyToWrite() const
14391051  2.4541  streamsocket.cpp:281        libktorrent-2.2dev.so    mse::StreamSocket::hasBytesToWrite() const
11563843  1.9720  qcstring.h:57               libktorrent-2.2dev.so    qstrlen(char const*)
10874879  1.8545  bitset.h:125                libktorrent-2.2dev.so    bt::BitSet::get(unsigned int) const
9503439   1.6207  (no location information)   libktorrent-2.2dev.so    bt::BEncoderBufferOutput::~BEncoderBufferOutput()
8260433   1.4087  downloadthread.cpp:43       libktorrent-2.2dev.so    net::DownloadThread::update()
8025917   1.3687  multidatachecker.cpp:48     libktorrent-2.2dev.so    bt::MultiDataChecker::check(QString const&, bt::Torrent const&, QString const&)
7415277   1.2645  speed.cpp:44                libktorrent-2.2dev.so    net::Speed::update(unsigned long long)
6792405   1.1583  qmap.h:187                  libktorrent-2.2dev.so    QMapIterator<unsigned int, unsigned long long>::dec()
6207957   1.0587  exitoperation.cpp:0         libktorrent-2.2dev.so    __i686.get_pc_thunk.bx

# opreport -x -g -e .plt -t 1 -l /usr/local/kde/bin/ktorrent
CPU: P4 / Xeon, speed 2143.87 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 50000
warning: some functions compiled without debug information may have incorrect source line attributions
samples  %        linenr info                 symbol name
9752057  10.7326  qcstring.h:65               qstrcmp(char const*, char const*)
7568026   8.3289  ktorrentviewitem.cpp:311    KTorrentViewItem::paintCell(QPainter*, QColorGroup const&, int, int, int)
6955143   7.6544  button.cpp:103              Ideal::Button::drawButton(QPainter*)
5890406   6.4826  addpeerwidget.moc:86        ManualPeerSource::qt_invoke(int, QUObject*)
4106225   4.5191  ktorrentviewitem.cpp:170    KTorrentViewItem::update()
3804301   4.1868  ktorrentcore.cpp:688        KTorrentCore::update()
3782774   4.1631  (no location information)   QString::~QString()
3617813   3.9816  addpeerwidget.h:80          AddPeerWidget::metaObject() const
3232588   3.5576  qmutex.h:93                 QMutexLocker::QMutexLocker(QMutex*)
2723254   2.9971  qstring.h:838               QString::QString()
2050211   2.2563  (no location information)   QValueListPrivate<QString>::~QValueListPrivate()
1911106   2.1033  scandialog.cpp:76           ScanDialog::execute(kt::TorrentInterface*, bool)
1695066   1.8655  trayhoverpopup.cpp:28       TrayHoverPopup::TrayHoverPopup(QPixmap const&, QWidget*, char const*)
1569971   1.7278  qvaluelist.h:474            QValueList<QString>::begin()
1550063   1.7059  qstring.h:777               QCharRef::operator QChar() const
1499566   1.6503  qvaluelist.h:562            QValueList<QString>::detach()
1370383   1.5082  qrect.h:167                 QRect::QRect(int, int, int, int)
1250557   1.3763  ktorrentview.cpp:454        KTorrentView::update()
1241230   1.3660  qvaluelist.h:477            QValueList<QString>::end()
1211822   1.3337  qglist.h:150                QGList::count() const
1183325   1.3023  qgarray.h:74                QGArray::detach()
1174560   1.2927  qvaluelist.h:63             QValueListNode<QString>::QValueListNode()
1160984   1.2777  qstring.h:637               QString::at(unsigned int) const
1114294   1.2263  qvaluelist.h:526            QValueList<QString>::append(QString const&)
1083807   1.1928  trayicon.cpp:39             TrayIcon::TrayIcon(KTorrentCore*, QWidget*, char const*)
921501    1.0142  qvaluelist.h:485            QValueList<QString>::operator<<(QString const&)


ktorrent easily takes a minimum 15% on my box the entire time it's running - and I don't think it should be. It should be mostly IO bound with occasional cpu time spent on hash verification (which really shouldn't take much cpu in the grand scheme of things). I've tried sliding the CPU bar from slow to fast to middle, etc. Doesn't seem to make a difference. Also have tried GUI updates of 5 seconds and that also seems to make little difference.

As you can see from the top graph - the majority of time is spent within QT itself.
clayne
Registered Member
Posts
16
Karma
0

Mon May 21, 2007 12:03 am
Simple top output:

Code: Select all
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
26164 clayne    16   0  167m 103m  13m R 16.2 20.8 595:35.42 ktorrent


Footprint seems quite large for what the app should typically be caring about. Even if one were to take the 50 seeding tasks I have going, apply a generalized 64k chunk size, w/ an average of 1GB per torrent, utilize a 32 byte table entry struct for each chunk w/ 16 bytes of malloc overhead:

1024*1024*1024/65536 * 50 * (32+16)
39321600

Would be 40MB for a generalized table of structs tracking chunk checksum and chunk location. Now I don't know if ktorrent is even doing this presently (or why it would have to in the first place - once a torrent has been leeched it shouldn't change - but I'm presuming that rechecking is happening per the option in preferences). That leaves another 60MB of footprint for other things (in my particular case) if I were seeding 50GB of torrents w/ 64k chunksize (which I'm not - but it's a nice worse case).

Just restarted entirely:

Code: Select all
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25510 clayne    16   0  137m  85m  19m S  7.7 17.2   0:08.19 ktorrent


I know not to care much about virt - particularly because as far as I can tell kt is using mmap (presumably to map files for seeding/leeching a given chunk area):

$ nm -A /usr/local/kde/bin/ktorrent /usr/local/kde/lib/libktorrent-2.2dev.so | egrep 'm(un)*map'
/usr/local/kde/lib/libktorrent-2.2dev.so: U mmap64@@GLIBC_2.1
/usr/local/kde/lib/libktorrent-2.2dev.so: U munmap@@GLIBC_2.0

BTW:

I notice that there are two renditions of mmap class wrappers in the util and torrent dirs, one for standard purposes and one for cachefiles. Also, the cachefile variant seems to be using mmap for some purposes and read/write syscalls for others. I'm sure some optimization can be done there.

Also ::growFile should be using fruncate() and not a raw loop of 1k writes(). fruncate() already allocates and extends (with zero padding) or shrinks a file to a provided size in _one_ call.
clayne
Registered Member
Posts
16
Karma
0

Mon May 21, 2007 12:36 am
Going through some of the libktorrent code now. Many critical places to reduce allocation footprint - particularly in the handling of chunks. I'll test some changes and provide patches when suitable.
lucke
Registered Member
Posts
205
Karma
0

Mon May 21, 2007 11:29 am
I'd suggest using exmap for memory usage evaluation.
George
Moderator
Posts
5421
Karma
1

Mon May 21, 2007 6:43 pm
I was planning on ditching the chunk based approach and only use pieces (16 KB each), but that is something for the KDE 4 port.

The problem with chunks is that you keep an entire chunk in memory, while most of it can already be placed on disk. For example during a chunk download, if you downloaded the first 10 pieces, you can calculate the hash up to that point and save those 10 to disk and remove them from memory.

But these are rather big changes, and we are not gonna do these anymore. The plan is to release the 2.2 version in the middle of june and then fully focus on the KDE 4 port.

Btw, ftruncate fails on FAT. And growFile should not be used much. (disk preallocation takes care of that)

Also assuming 64 KB as the average chunk size, is in my opinion a rather big underestimation.
George
Moderator
Posts
5421
Karma
1

Mon May 21, 2007 6:53 pm
The CPU usage for libktorrent seems to be pretty normal, considering the fact that you are mostly uploading.

The Qt one is a bit weird, I mean ManualPeerSource::qt_invoke, that is just part of a dialog to add peers, assuming you didn't constantly had this open, I would not expect that many samples.
clayne
Registered Member
Posts
16
Karma
0

Mon May 21, 2007 8:15 pm
George wrote:The CPU usage for libktorrent seems to be pretty normal, considering the fact that you are mostly uploading.

The Qt one is a bit weird, I mean ManualPeerSource::qt_invoke, that is just part of a dialog to add peers, assuming you didn't constantly had this open, I would not expect that many samples.


Well I'd contest that normal is defined by how things presently are - but I am of the opinion that there is definitely room for optimization.

Also the op results are a bit hazy - after doing some additional profiling with cachegrind - which btw shows a big fat SHA1Generate block (further pointing back to caching of this data as a win). I didn't run the cachegrind for a significant period of time (just about 10 minutes) but I'll be restarting it later.

About the only thing that's different with my setup vs normal is that mine is using NX as the X server rather than a straight X server. But the dialogs haven't been open - just the main screen and minimized 90% of the time as well. That's what I also found odd.
George
Moderator
Posts
5421
Karma
1

Tue May 22, 2007 6:08 pm
clayne wrote:Well I'd contest that normal is defined by how things presently are - but I am of the opinion that there is definitely room for optimization.


There is certainly room for optimization. But the results where as I would expect in a mostly upload situation.

Also the op results are a bit hazy - after doing some additional profiling with cachegrind - which btw shows a big fat SHA1Generate block (further pointing back to caching of this data as a win). I didn't run the cachegrind for a significant period of time (just about 10 minutes) but I'll be restarting it later.


We know, SHA1 hash generation can take up quite a bit of CPU time, certainly when the chunks are big. That is why we added the incremental calculation mode for bigger chunks.

Big chunks resulted in a drop in download speed and a CPU usage spike when a chunk was ready.

About the only thing that's different with my setup vs normal is that mine is using NX as the X server rather than a straight X server. But the dialogs haven't been open - just the main screen and minimized 90% of the time as well. That's what I also found odd.


Something I need to look in, but first we are gonna release a 2.2 beta (probably tomorrow)
clayne
Registered Member
Posts
16
Karma
0

BTW George...

Thu May 24, 2007 12:43 pm
You're gonna hate me - but I've worked out some of the ktorrent load.

I had been using my Intel PIIX IDE/ATA drivers as a module - which specifically limited using DMA on my drives - i.e. *none*.

After ironing that out, it went from 5M/sec to 55M/sec HD throughput.

While ktorrent still hovers around in general, it's not nearly as invasive - and I'm sure you could see why having my HD access method in the trashcan would surely hurt things. ;)
George
Moderator
Posts
5421
Karma
1

Thu May 24, 2007 5:24 pm
OK, that's solved then.


Bookmarks



Who is online

Registered users: Bing [Bot], Google [Bot]