Registered Member
|
Due to some memory issues when running many torrents at the same time, I made the following adjustment.
In libktorrent/util/mmapfile.cpp, line 91 (just after the switch statement), I added flag |= O_DIRECT; to remove the load on the VM subsystem that you get when seeding or downloading lots of torrents. It might be better to have this as a checkbox in the config instead of hardcoding it, but I'm not familiar with the source yet, so I leave that up to the GUI programmers. One related thing: it could be a good idea to open downloading torrents with the direct-io flag, while seeding torrents should be opened without it, since they usually demand a bit more, and you usually don't have enough torrents downloading at once to cause such a big load on the VM. Just for your information, I'm on a 100Mbit connection, and that's probably why I notice performance problems when running ktorrent. And yes, I'm running the latest and greatest SVN, of course. |
Registered Member
|
|
Moderator
|
mmapfile is not used for data files; cachefile.cpp is what you are looking for.
Looking at the man page of open, it seems that O_DIRECT would make the I/O synchronous. I don't think we want this for downloads: every time we copy something into a mmapped region, it would immediately be written to disk. The point of using mmap is to allow Linux to decide when it is best to write something to disk. |
Registered Member
|
Oh well, I'm not too familiar with the source yet, and I have not tried this workaround in any big way yet. I'm going to have a closer look at cachefile.cpp.
But back to O_DIRECT. Leaving the decision to the kernel is a really bad way of managing this, since the current VM implementation prefers keeping the buffer cache in memory over keeping applications, so using mmap in this way causes applications to be moved to swap during high torrent load. The main problem here is that tons of data get loaded into the buffer cache when seeding, and that data is hardly used more than once before being flushed out of memory. This lowers the total system hit rate on the buffer cache and thereby the total system performance. I'm not 100% sure about the low-level interactions for the disk writes here, but that should be simple enough to work around by throttling the disk writes. An example would be an application cache that does something like this (pseudocode):
    while() {
        time = time()
        datablock = getblock(FIFO-buffer)
        write(datablock, destfile)
        time2 = time()
        sleep(time2 - time)
    }
The getblock() function would be just a plain FIFO where writes can be buffered before they get written to disk, and if there are no free blocks available, no new data is fetched from other clients either. This would cause writing to disk to slow down as the disk becomes more and more loaded, since every write would then take more and more time. Just have a look at the source for Azureus or rtorrent to get some ideas about how they do this without causing conflicts with the rest of the system. But one thing you would probably agree to is to use O_DIRECT at least for seeding-only torrents to get rid of the swapping issue. PS: a bit tired, but I think I understood what I wrote myself. |
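The pseudocode above could be sketched in C++ roughly as follows. This is only an illustration of the idea, not KTorrent code: `BlockFifo` and `writeOneThrottled` are hypothetical names, blocks are plain `std::string` objects, and the actual disk write is passed in as a callable so the sketch stays independent of any file API. The timer here covers only the write itself, which is a small simplification of the pseudocode.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <string>
#include <thread>
#include <utility>

// Hypothetical bounded FIFO of data blocks (not a KTorrent class).
class BlockFifo {
public:
    explicit BlockFifo(std::size_t capacity) : capacity_(capacity) {}

    // Returns false when no free slot is left. The caller is then expected
    // to stop requesting data from peers until a block has been written.
    bool tryPush(std::string block) {
        if (blocks_.size() >= capacity_) return false;
        blocks_.push_back(std::move(block));
        return true;
    }

    bool empty() const { return blocks_.empty(); }

    // Removes and returns the oldest block.
    std::string pop() {
        assert(!blocks_.empty());
        std::string block = std::move(blocks_.front());
        blocks_.pop_front();
        return block;
    }

private:
    std::size_t capacity_;
    std::deque<std::string> blocks_;
};

// Writes the oldest block through writeBlock, then sleeps for as long as
// the write took, so a slow disk automatically slows the writer down.
// Returns false if there was nothing to write.
template <typename WriteFn>
bool writeOneThrottled(BlockFifo& fifo, WriteFn&& writeBlock) {
    if (fifo.empty()) return false;
    const auto start = std::chrono::steady_clock::now();
    writeBlock(fifo.pop());
    const auto elapsed = std::chrono::steady_clock::now() - start;
    std::this_thread::sleep_for(elapsed);
    return true;
}
```

The back-pressure comes from `tryPush` failing: as writes get slower, the FIFO stays full longer and fewer new blocks are accepted from the network.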
Registered Member
|
Posting a reply to my own post...
I was talking to a friend and found out about another way to keep the page cache clean without using O_DIRECT.

NAME
    madvise - give advice about use of memory

SYNOPSIS
    #include <sys/mman.h>
    int madvise(void *start, size_t length, int advice);

DESCRIPTION
    The madvise() system call advises the kernel about how to handle paging input/output in the address range beginning at address start and with size length bytes. It allows an application to tell the kernel how it expects to use some mapped or shared memory areas, so that the kernel can choose appropriate read-ahead and caching techniques. This call does not influence the semantics of the application (except in the case of MADV_DONTNEED), but may influence its performance. The kernel is free to ignore the advice. The advice is indicated in the advice parameter which can be

    MADV_NORMAL
        No special treatment. This is the default.
    MADV_RANDOM
        Expect page references in random order. (Hence, read ahead may be less useful than normally.)
    MADV_SEQUENTIAL
        Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
    MADV_WILLNEED
        Expect access in the near future. (Hence, it might be a good idea to read some pages ahead.)
    MADV_DONTNEED
        Do not expect access in the near future. (For the time being, the application is finished with the given range, so the kernel can free resources associated with it.) Subsequent accesses of pages in this range will succeed, but will result either in re-loading of the memory contents from the underlying mapped file (see mmap()) or zero-fill-on-demand pages for mappings without an underlying file.
    MADV_REMOVE (Since Linux 2.6.16)
        Free up a given range of pages and its associated backing store. Currently, only shmfs/tmpfs supports this; other filesystems return -ENOSYS.
    MADV_DONTFORK (Since Linux 2.6.16)
        Do not make the pages in this range available to the child after a fork(2). This is useful to prevent copy-on-write semantics from changing the physical location of a page(s) if the parent writes to it after a fork(2). (Such page relocations cause problems for hardware that DMAs into the page(s).)
    MADV_DOFORK (Since Linux 2.6.16)
        Undo the effect of MADV_DONTFORK, restoring the default behaviour, whereby a mapping is inherited across fork(2).

RETURN VALUE
    On success madvise() returns zero. On error, it returns -1 and errno is set appropriately.

So we have some stuff to play with here: MADV_DONTNEED, MADV_SEQUENTIAL, MADV_RANDOM. Random and sequential probably need some more investigating, but if done correctly, random would be the perfect way to go. By "correctly" I mean copying the whole data block that is being sent in one single operation, which should cause the kernel to read that whole block in one operation while still skipping the readahead. I'm going to have a closer look at this tomorrow. Just one question: is my assumption correct that ktorrent allocates the complete file that it's going to download, or is it done in increments? |
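A minimal sketch of the MADV_RANDOM idea from the post above, assuming a region that has already been mapped with mmap(). The function name `copyBlockWithoutReadahead` and its parameters are made up for illustration and are not taken from the KTorrent sources. Since madvise() is only advice, whether readahead is really skipped is up to the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>

// Hypothetical helper (not part of KTorrent).
// base   : page-aligned address returned by mmap()
// mapLen : length of that mapping in bytes
// Copies len bytes starting at offset into out in one operation, after
// telling the kernel to expect random access on the whole mapping.
// Returns false if the range is outside the mapping or madvise() fails.
bool copyBlockWithoutReadahead(void* base, std::size_t mapLen,
                               std::size_t offset, std::size_t len,
                               void* out) {
    assert(base != nullptr && out != nullptr);
    if (offset > mapLen || len > mapLen - offset) return false;

    // Advice only: the kernel is free to ignore it.
    if (madvise(base, mapLen, MADV_RANDOM) != 0) return false;

    // One single copy of the whole block, as suggested in the post.
    std::memcpy(out, static_cast<const char*>(base) + offset, len);
    return true;
}
```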
Moderator
|
Interesting. During the download of a chunk, once we have copied a piece of the chunk into the mmapped buffer, it will not be needed anymore until we do the hash check, so we can tell the kernel MADV_DONTNEED for that piece of the chunk.
Before the hash check we advise sequential, seeing that the hash check accesses the data sequentially. Upload is a different matter. We can probably assume that once a specific piece is uploaded, it will not be needed anymore, provided that only one peer is requesting pieces from the chunk. There is a good chance that, most of the time, pieces are requested sequentially, so going sequential could be a good idea. I'm going to take a deeper look at this over the weekend. |
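The plan above could look roughly like this. It is a sketch under assumptions: the two function names are hypothetical, `chunk` is assumed to be a page-aligned address inside a MAP_SHARED, file-backed mapping, and the calls are meant to be placed around the existing copy and hash-check code. On a private mapping, MADV_DONTNEED would discard unsaved modifications, so this should not be copied to such mappings as-is.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical hook: call just before hashing a chunk, since the hash
// check reads the data from start to end.
bool adviseBeforeHashCheck(void* chunk, std::size_t len) {
    assert(chunk != nullptr);
    return madvise(chunk, len, MADV_SEQUENTIAL) == 0;
}

// Hypothetical hook: call once a piece has been copied into the mapping
// (download) or sent to a peer (upload) and is not expected to be
// touched again soon.
bool adviseChunkDone(void* chunk, std::size_t len) {
    assert(chunk != nullptr);
    return madvise(chunk, len, MADV_DONTNEED) == 0;
}
```

Both functions return false when madvise() reports an error, for example when `chunk` is not page-aligned.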
Registered Member
|
Got some time to test this today, but I'm going to have a closer look at it tomorrow. =)
Setting a global MADV_DONTNEED seems to have improved the memory management, but I have to reassess my assumption that the data gets thrown out of memory as soon as it has been used. What it seems to do is mark these pages as invalid, and they get thrown out before anything else is done, like moving applications to swap. I agree that a sequential setting would be good when hashing, but I did not experience any big performance issues here with just DONTNEED. Maybe the hash was a bit slower, but my impression was that the hashing did not interfere as much with the rest of the system (i.e. I did a manual import of a torrent and ran a hash check). As for seeding, I think it would be better to use DONTNEED. My current ktorrent session has around 20 torrents seeding, and every torrent has around 60-70 files (some have even more). With readahead here, the VM would try to read ahead for every file and then end up throwing away loads of that data. Just for some stats with everything set to DONTNEED: with about 20 torrents seeding (all above 20KB/s) and 2 torrents downloading, I had a total speed of about 9MB/s down and 1.4MB/s up. I have a 100Mbit downlink and an uplink that's shaped to around 10Mbit, so overall I think this will not affect the performance of ktorrent while lowering the pressure on the VM. I hope I find some more stuff to work with tomorrow, and maybe get to do some coding too. PS: have a look at the new splice()/tee() syscalls. They look quite interesting, since the data would not have to be copied to userspace to send it to a pipe (implemented in 2.6.17). |
Moderator
|
It sounds reasonable to implement something like this.
Are you sure that the pages were not in memory? If they are in memory, there is no penalty, but if they are on disk, they will have to be loaded. If you set DONTNEED and you have plenty of free memory, this will not have any impact at all.
We would only put a small part of the file in sequential mode.
Your situation might not be applicable to somebody else. Anyway, this is something for the weekend.
I don't think we will start using a new feature of a very recent kernel when a lot of people are using older kernels. |
Registered Member
|
Well, some pages were probably there, but a lot fewer than before. The point of this is not really to make ktorrent use less buffer cache, but to make ktorrent behave a bit better towards other applications on a system where applications fill more than 60% of the RAM. The default kernel tuning (see mapped or swappiness in /proc/sys/vm) then causes applications to be moved to swap to make room for more buffer cache that has a very small chance of being reused.
Of course, why read more than needed? Just remember to run a DONTNEED on that part after you have accessed it. And remember that you must specify the size as a multiple of PAGESIZE. I forgot this the first time, and it caused a bit of confusion about why it did not work as expected.
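To illustrate the page-size point: as far as I know, on Linux it is the start address passed to madvise() that must be page-aligned, while the length is rounded up to a whole page. A piece that does not start or end on a page boundary therefore needs its range adjusted first. The sketch below shrinks a range inward to whole pages so that a later DONTNEED never touches bytes belonging to a neighbouring piece. `PageRange` and `innerPageRange` are made-up names, and the page size would normally come from sysconf(_SC_PAGESIZE).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type: a page-aligned start address and a length
// that is a whole number of pages (possibly zero).
struct PageRange {
    std::uintptr_t begin;
    std::size_t length;
};

// Returns the largest page-aligned range that lies completely inside
// [start, start + len). page must be a power of two.
PageRange innerPageRange(std::uintptr_t start, std::size_t len,
                         std::size_t page) {
    assert(page != 0 && (page & (page - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(page) - 1;
    const std::uintptr_t begin = (start + mask) & ~mask;  // round start up
    const std::uintptr_t end = (start + len) & ~mask;     // round end down
    const std::size_t length =
        end > begin ? static_cast<std::size_t>(end - begin) : 0;
    return {begin, length};
}
```

A piece smaller than one page yields a length of zero, in which case there is nothing that can safely be dropped.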
Of course, testing is always needed to see how the whole application behaves, but it's still a good indication.
Hehe, well, it was only for your information. It's always nice to have new functionality that reduces the load on the system. And here are a few more things related to the memory-advice functions:
    posix_fadvise - predeclare an access pattern for file data
    posix_madvise - memory advisory information and alignment control
So just do a fadvise DONTNEED on new files that get opened, and then only change the advice for specific pages later on, if needed. I think this would allow for a simpler implementation. |
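A small sketch of the posix_fadvise() variant, with the caveat that, as far as I understand the Linux behaviour, POSIX_FADV_DONTNEED acts on the pages that are cached at the time of the call and is not a sticky per-file setting. The sketch therefore assumes it is called after some I/O has been done on the file, not once at open time. `dropCachedPages` is a hypothetical name, and the fdatasync() call is there because dirty pages are not dropped.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not part of KTorrent): asks the kernel to drop the
// cached pages of the whole file behind fd. Returns true on success.
bool dropCachedPages(int fd) {
    assert(fd >= 0);

    // Dirty pages cannot be dropped, so write them back first.
    if (fdatasync(fd) != 0) return false;

    // offset 0 with len 0 means "from the start to the end of the file".
    // posix_fadvise() returns the error number directly; it does not set errno.
    const int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0) {
        std::fprintf(stderr, "posix_fadvise: %s\n", std::strerror(rc));
        return false;
    }
    return true;
}
```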
Registered Member
|
And this is exactly where you should be changing things on your system, i.e. /proc/sys/vm/swappiness. O_DIRECT is a hack. The other methods are just advice. If you feel that too much cache memory is being used at the expense of paging, then change your swappiness factor. That's what it's there for. |
Registered Member
|
I have problems with ktorrent when it has been running for a long time. The PC responds very slowly over ssh and VNC. I have a 300 GB share and 512MB of memory. It makes no sense for the data to be cached.
First I found this page: http://www.stellingwerff.com/?page_id=13 Then I decided to post the O_DIRECT idea to this forum, but you had already discussed it. It looks like Linux still caches files. Maybe I have to set swap to zero or something else. Do you have any idea what may stop Fedora from swapping? |
Moderator
|
|
Registered Member
|
I do not know, but ktorrent is the only program I run on that PC. It is located on a different floor, and I access it through VNC and ssh.
If I set ktorrent to start seeding, then it is hard to access the server through VNC or ssh. I guess the Linux kernel caches I/O requests to disk. But it is senseless to cache a 300GB share in 512MB of memory when the data is accessed randomly. I started searching for how to stop Linux from caching these files and found only one article, saying that source code modification is required. Here is my top:

top - 12:58:51 up 9 days, 20:20, 3 users, load average: 1.10, 1.06, 1.01
Tasks: 132 total, 1 running, 131 sleeping, 0 stopped, 0 zombie
Cpu(s): 30.9%us, 13.4%sy, 0.3%ni, 38.1%id, 13.6%wa, 0.6%hi, 3.0%si, 0.0%st
Mem: 515076k total, 508708k used, 6368k free, 2112k buffers
Swap: 1052216k total, 343536k used, 708680k free, 252044k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5295 max7      20   0  393m 207m  10m S 43.5 41.3  1756:18 ktorrent
10920 root      20   0  2268  932  696 R  1.9  0.2   0:00.02 top
    1 root      20   0  2112  292  268 S  0.0  0.1   0:11.28 init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.08 kthreadd
    3 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
    4 root      15  -5     0    0    0 S  0.0  0.0   0:04.22 ksoftirqd/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    6 root      15  -5     0    0    0 S  0.0  0.0   0:14.29 events/0
    7 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
   59 root      15  -5     0    0    0 S  0.0  0.0   1:58.78 kblockd/0
   62 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
   63 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kacpi_notify
  131 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 cqueue/0
  133 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 ksuspend_usbd
  138 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 khubd
  141 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  176 root      15  -5     0    0    0 S  0.0  0.0  30:19.81 kswapd0
  216 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  364 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  390 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 ata/0
  391 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 ata_aux
  396 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_0
  397 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_1
  407 root      15  -5     0    0    0 S  0.0  0.0   0:15.97 kjournald
  435 root      15  -5     0    0    0 S  0.0  0.0   0:00.01 kauditd
  467 root      16  -4  2976  196  196 S  0.0  0.0   0:00.37 udevd
  975 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kgameportd
 1160 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kmpathd/0
 1186 root      15  -5     0    0    0 S  0.0  0.0   0:23.02 kjournald
 1187 root      15  -5     0    0    0 S  0.0  0.0   1:18.48 kjournald
 1353 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 ib_addr
 1361 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 ib_mcast
 1364 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 iw_cm_wq
 1367 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 ib_cm/0
 1370 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 rdma_cm
 1376 root      20   0  1924  348  332 S  0.0  0.1   0:12.61 iscsid
 1377 root      10 -10  1972 1972 1704 S  0.0  0.4   1:22.91 iscsid
 1677 rpc       20   0  2256  468  432 S  0.0  0.1   0:00.98 rpcbind
 1697 rpcuser   20   0  1952  564  564 S  0.0  0.1   0:00.05 rpc.statd
 1725 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 rpciod/0
 1734 root      20   0  5340  264  240 S  0.0  0.1   0:00.33 rpc.idmapd
 1790 root      20   0 12824  496  480 S  0.0  0.1   0:21.18 pcscd
 1800 root      20   0 13584  772  672 S  0.0  0.1   0:02.01 rsyslogd
 1804 root      20   0  1760  284  240 S  0.0  0.1   0:00.24 rklogd
 1818 root      16  -4 12248  568  480 S  0.0  0.1   0:01.08 auditd

The only solution I see now is disabling swap. |
Moderator
|
|