From cloude at instructables.com Wed Apr 1 23:11:47 2009 From: cloude at instructables.com (Cloude Porteus) Date: Wed, 1 Apr 2009 16:11:47 -0700 Subject: Debugging / nuked objects spike Message-ID: <4a05e1020904011611h6800c49xed759f3d4146a39c@mail.gmail.com> We've been running varnish in production for about a week and I've been noticing that things aren't quite right, but it's been hard to figure out what. Most of the time Varnish is running with a 98% hit ratio and all is fine. We have been running for a few days with about 250k documents in the cache. I just noticed that the number of documents in the cache dropped from ~140k -> ~30k and the LRU Nuked Objects increased by 100k. I assume we're hitting our storage limit, which is currently set to 10gb. We had it set at 50gb before, but we were still having similar problems. I noticed last night there was a couple of hours where it looked like the hit ratio was close to zero, but then it went back to normal. Any ideas what would cause Varnish to nuke ~100k objects all at once? I've gone over all the performance tuning info and we've tried to implement most of the suggestions. I'm just not sure which direction to start tuning further. Thanks for any suggestions. 
Here's our current default.vcl: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Configuration file for varnish NFILES=131072 MEMLOCK=82000 VARNISH_VCL_CONF=/etc/varnish/instructables.vcl VARNISH_LISTEN_ADDRESS= VARNISH_LISTEN_PORT=80 VARNISH_ADMIN_LISTEN_ADDRESS=127.0.0.1 VARNISH_ADMIN_LISTEN_PORT=82 VARNISH_MIN_THREADS=400 VARNISH_MAX_THREADS=1000 VARNISH_THREAD_TIMEOUT=60 VARNISH_STORAGE_FILE=/var/lib/varnish/varnish_storage.bin VARNISH_STORAGE_SIZE=10G VARNISH_STORAGE="file,${VARNISH_STORAGE_FILE},${VARNISH_STORAGE_SIZE}" VARNISH_TTL=1800 DAEMON_OPTS="-a ${VARNISH_LISTEN_ADDRESS}:${VARNISH_LISTEN_PORT} \ -f ${VARNISH_VCL_CONF} \ -T ${VARNISH_ADMIN_LISTEN_ADDRESS}:${VARNISH_ADMIN_LISTEN_PORT} \ -t ${VARNISH_TTL} \ -w ${VARNISH_MIN_THREADS},${VARNISH_MAX_THREADS},${VARNISH_THREAD_TIMEOUT} \ -u varnish -g varnish \ -s ${VARNISH_STORAGE} \ -p obj_workspace=4096 \ -p sess_workspace=262144 \ -p lru_interval=3600 -p listen_depth=8192 \ -p log_hashstring=off \ -p sess_timeout=10 \ -p shm_workspace=32768 \ -p ping_interval=1 \ -p thread_pools=4 \ -p thread_pool_min=100 \ -p srcaddr_ttl=0 \ -p esi_syntax=1 " thanks, cloude -- VP of Product Development Instructables.com http://www.instructables.com/member/lebowski From jna at twitter.com Wed Apr 1 23:23:13 2009 From: jna at twitter.com (John Adams) Date: Wed, 1 Apr 2009 16:23:13 -0700 Subject: Debugging / nuked objects spike In-Reply-To: <4a05e1020904011611h6800c49xed759f3d4146a39c@mail.gmail.com> References: <4a05e1020904011611h6800c49xed759f3d4146a39c@mail.gmail.com> Message-ID: <0EB3DE15-CDDA-4AF6-8FB6-233716A68683@twitter.com> Are you sure it's the same child process running? If the child dies randomly (SEGV, etc -- check your syslogs) it might be restarting on you. If you're using lots of regexps you may have to increase sess_workspace. Look for those errors in the logs. 
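A concrete way to check for the child restarts John mentions (a sketch only: the syslog lines below are fabricated samples standing in for /var/log/messages, and the grep pattern is illustrative, not copied from a real varnishd):

```shell
# Fabricated sample syslog lines; count child deaths, since a
# restarting child empties the cache and would explain the object
# count collapsing from ~140k to ~30k.
restarts=$(printf '%s\n' \
  'Apr  1 16:05:12 cache varnishd[1234]: Child (5678) died signal=11' \
  'Apr  1 16:05:13 cache varnishd[1234]: Child (9012) started' |
  grep -c 'Child .* died')
echo "child restarts seen: $restarts"
```

If restarts do show up, sess_workspace can also be raised from the management port without editing the start script (assuming the 2.0 CLI's param.set command, e.g. `varnishadm -T 127.0.0.1:82 param.set sess_workspace 524288`).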
-j On Apr 1, 2009, at 4:11 PM, Cloude Porteus wrote: > We've been running varnish in production for about a week and I've > been noticing that things aren't quite right, but it's been hard to > figure out what. Most of the time Varnish is running with a 98% hit > ratio and all is fine. We have been running for a few days with about > 250k documents in the cache. > > I just noticed that the number of documents in the cache dropped from > ~140k -> ~30k and the LRU Nuked Objects increased by 100k. I assume > we're hitting our storage limit, which is currently set to 10gb. We > had it set at 50gb before, but we were still having similar problems. > I noticed last night there was a couple of hours where it looked like > the hit ratio was close to zero, but then it went back to normal. > > Any ideas what would cause Varnish to nuke ~100k objects all at once? > I've gone over all the performance tuning info and we've tried to > implement most of the suggestions. I'm just not sure which direction > to start tuning further. > > Thanks for any suggestions. 
Here's our current default.vcl: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > # Configuration file for varnish > NFILES=131072 > MEMLOCK=82000 > VARNISH_VCL_CONF=/etc/varnish/instructables.vcl > VARNISH_LISTEN_ADDRESS= > VARNISH_LISTEN_PORT=80 > VARNISH_ADMIN_LISTEN_ADDRESS=127.0.0.1 > VARNISH_ADMIN_LISTEN_PORT=82 > VARNISH_MIN_THREADS=400 > VARNISH_MAX_THREADS=1000 > VARNISH_THREAD_TIMEOUT=60 > VARNISH_STORAGE_FILE=/var/lib/varnish/varnish_storage.bin > VARNISH_STORAGE_SIZE=10G > VARNISH_STORAGE="file,${VARNISH_STORAGE_FILE},${VARNISH_STORAGE_SIZE}" > VARNISH_TTL=1800 > > DAEMON_OPTS="-a ${VARNISH_LISTEN_ADDRESS}:${VARNISH_LISTEN_PORT} \ > -f ${VARNISH_VCL_CONF} \ > -T ${VARNISH_ADMIN_LISTEN_ADDRESS}:$ > {VARNISH_ADMIN_LISTEN_PORT} \ > -t ${VARNISH_TTL} \ > -w > ${VARNISH_MIN_THREADS},${VARNISH_MAX_THREADS},$ > {VARNISH_THREAD_TIMEOUT} > \ > -u varnish -g varnish \ > -s ${VARNISH_STORAGE} \ > -p obj_workspace=4096 \ > -p sess_workspace=262144 \ > -p lru_interval=3600 > -p listen_depth=8192 \ > -p log_hashstring=off \ > -p sess_timeout=10 \ > -p shm_workspace=32768 \ > -p ping_interval=1 \ > -p thread_pools=4 \ > -p thread_pool_min=100 \ > -p srcaddr_ttl=0 \ > -p esi_syntax=1 " > > thanks, > cloude > -- > VP of Product Development > Instructables.com > > http://www.instructables.com/member/lebowski > _______________________________________________ > varnish-dev mailing list > varnish-dev at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-dev --- John Adams Twitter Operations jna at twitter.com http://twitter.com/netik From des at des.no Thu Apr 2 00:02:29 2009 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Thu, 02 Apr 2009 02:02:29 +0200 Subject: Debugging / nuked objects spike In-Reply-To: <4a05e1020904011611h6800c49xed759f3d4146a39c@mail.gmail.com> (Cloude Porteus's message of "Wed, 1 Apr 2009 16:11:47 -0700") References: <4a05e1020904011611h6800c49xed759f3d4146a39c@mail.gmail.com> Message-ID: <86myazapnu.fsf@ds4.des.no> 
Cloude Porteus writes: > Any ideas what would cause Varnish to nuke ~100k objects all at once? Just a guess: these objects are your hot set, so they're all loaded within the first seconds of operation, and they all have the same expiry time, so they all expire within a few seconds of each other. DES -- Dag-Erling Smørgrav - des at des.no From sky at crucially.net Thu Apr 2 00:51:31 2009 From: sky at crucially.net (Artur Bergman) Date: Wed, 1 Apr 2009 17:51:31 -0700 Subject: Debugging / nuked objects spike In-Reply-To: <4a05e1020904011611h6800c49xed759f3d4146a39c@mail.gmail.com> References: <4a05e1020904011611h6800c49xed759f3d4146a39c@mail.gmail.com> Message-ID: <9F5D9B7B-A6D7-4E99-9A70-58CA866FB1C0@crucially.net> On Apr 1, 2009, at 4:11 PM, Cloude Porteus wrote: > > I just noticed that the number of documents in the cache dropped from > ~140k -> ~30k and the LRU Nuked Objects increased by 100k. I assume > we're hitting our storage limit, which is currently set to 10gb. We > had it set at 50gb before, but we were still having similar problems. > I noticed last night there was a couple of hours where it looked like > the hit ratio was close to zero, but then it went back to normal. > > Any ideas what would cause Varnish to nuke ~100k objects all at once? > I've gone over all the performance tuning info and we've tried to > implement most of the suggestions. I'm just not sure which direction > to start tuning further. There are problems with the fragmentation of the store. Try using malloc and see if the problem goes away. (We see this regularly) Artur -------------- next part -------------- An HTML attachment was scrubbed...
URL: From sky at crucially.net Thu Apr 2 00:51:46 2009 From: sky at crucially.net (Artur Bergman) Date: Wed, 1 Apr 2009 17:51:46 -0700 Subject: Debugging / nuked objects spike In-Reply-To: <86myazapnu.fsf@ds4.des.no> References: <4a05e1020904011611h6800c49xed759f3d4146a39c@mail.gmail.com> <86myazapnu.fsf@ds4.des.no> Message-ID: <7157980A-735A-4674-923C-8DCB6F27F369@crucially.net> They would expire, not nuke then :) Cheers Artur On Apr 1, 2009, at 5:02 PM, Dag-Erling Smørgrav wrote: > Cloude Porteus writes: >> Any ideas what would cause Varnish to nuke ~100k objects all at once? > > Just a guess: these objects are your hot set, so they're all loaded > within the first seconds of operation, and they all have the same > expiry > time, so they all expire within a few seconds of each other. > > DES > -- > Dag-Erling Smørgrav - des at des.no > _______________________________________________ > varnish-dev mailing list > varnish-dev at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-dev From on at cs.ait.ac.th Thu Apr 2 04:51:48 2009 From: on at cs.ait.ac.th (Olivier Nicole) Date: Thu, 2 Apr 2009 11:51:48 +0700 (ICT) Subject: ./configure error for sys/mount.h Message-ID: <200904020451.n324pmxt058307@banyan.cs.ait.ac.th> Hi, ./configure reports the following error: checking sys/mount.h usability... no checking sys/mount.h presence... yes configure: WARNING: sys/mount.h: present but cannot be compiled configure: WARNING: sys/mount.h: check for missing prerequisite headers?
configure: WARNING: sys/mount.h: see the Autoconf documentation configure: WARNING: sys/mount.h: section "Present But Cannot Be Compiled" configure: WARNING: sys/mount.h: proceeding with the preprocessor's result configure: WARNING: sys/mount.h: in the future, the compiler will take precedence configure: WARNING: ## --------------------------------------------- ## configure: WARNING: ## Report this to varnish-dev at projects.linpro.no ## configure: WARNING: ## --------------------------------------------- ## As requested, I report. This happens on 2.0.3 and 2.0.4, I think I tracked it down to sys/param.h missing in conftest.c. I am not sure it has an impact on the compilation/execution of Varnish. Best regards, Olivier From cloude at instructables.com Thu Apr 2 18:06:00 2009 From: cloude at instructables.com (Cloude Porteus) Date: Thu, 2 Apr 2009 11:06:00 -0700 Subject: Debugging / nuked objects spike In-Reply-To: <9F5D9B7B-A6D7-4E99-9A70-58CA866FB1C0@crucially.net> References: <4a05e1020904011611h6800c49xed759f3d4146a39c@mail.gmail.com> <9F5D9B7B-A6D7-4E99-9A70-58CA866FB1C0@crucially.net> Message-ID: <4a05e1020904021106m18ee75ffjc8e27d10b1649d59@mail.gmail.com> Artur, So far so good switching to malloc. The system load is also down to .03 from an average of .45 when I was using the file storage option. Thanks for the help! -cloude On Wed, Apr 1, 2009 at 5:51 PM, Artur Bergman wrote: > > On Apr 1, 2009, at 4:11 PM, Cloude Porteus wrote: > > > I just noticed that the number of documents in the cache dropped from > ~140k -> ~30k and the LRU Nuked Objects increased by 100k. I assume > we're hitting our storage limit, which is currently set to 10gb. We > had it set at 50gb before, but we were still having similar problems. > I noticed last night there was a couple of hours where it looked like > the hit ratio was close to zero, but then it went back to normal. > > Any ideas what would cause Varnish to nuke ~100k objects all at once? 
> I've gone over all the performance tuning info and we've tried to > implement most of the suggestions. I'm just not sure which direction > to start tuning further. > > > There are problems with the fragmentation of the store. Try using malloc > and see if the problem goes away. (We see this regularly) > > Artur > > -- VP of Product Development Instructables.com http://www.instructables.com/member/lebowski -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.halff at gmail.com Fri Apr 3 12:49:09 2009 From: rob.halff at gmail.com (Rob Halff) Date: Fri, 3 Apr 2009 14:49:09 +0200 Subject: Virtualhost logging for varnishncsa Message-ID: Hi, I've changed the varnishncsa source code to support virtualhost logging. I know in the TODO of varnishncsa there is a future wish for "Log in any format one wants", but I can imagine that would need a total rewrite and takes some time. So in the meantime I have a request to add the virtualhost logging as a commandline option. I've added a -v flag enabling virtualhost style logging. In this case the logformat looks like: $ varnishncsa -v www.test.nl 111.222.333.44 - - [03/Apr/2009:11:41:57 +0200] "GET http://www.test.nl/favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB5)" Notice the 'www.test.nl' being logged. This is the equivalent of this kind of apache logging: http://httpd.apache.org/docs/2.0/vhosts/mass.html#simple : LogFormat "%V %h %l %u %t \"%r\" %s %b" vcommon http://httpd.apache.org/docs/2.0/vhosts/mass.html#simple.rewrite : LogFormat "%{Host}i %h %l %u %t \"%r\" %s %b" vcommon So this adds the Host part to the normal kind of logging ncsa is doing. Given that this is a very common way to log virtual hosts I would say the -v option is not just some hack to suit my own needs, I think it is useful for others also. Without this I am not able to use awstats the way we used it when we were not using varnish.
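With the host in the first field (apache's vcommon layout), splitting or summarizing the log per virtual host for awstats becomes a one-liner. A minimal sketch, with invented sample lines inlined in place of a real `varnishncsa -v` log:

```shell
# Invented sample lines in the vcommon layout produced by the -v patch;
# count requests per virtual host ($1) with awk.
per_host=$(printf '%s\n' \
  'www.test.nl 111.222.333.44 - - [03/Apr/2009:11:41:57 +0200] "GET /favicon.ico HTTP/1.1" 404 209' \
  'www.test.nl 111.222.333.44 - - [03/Apr/2009:11:42:01 +0200] "GET / HTTP/1.1" 200 5120' \
  'www.other.nl 10.0.0.1 - - [03/Apr/2009:11:42:02 +0200] "GET / HTTP/1.1" 200 1024' |
  awk '{ n[$1]++ } END { for (h in n) print h, n[h] }' | sort)
echo "$per_host"
```

The same awk, with `print > ($1 ".log")` in place of the counter, writes one file per host the way Apache's split-logfile does.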
Attached you will find the diff against the current trunk. Greetings, Rob Halff -------------- next part -------------- A non-text attachment was scrubbed... Name: varnishncsa_virtual_host_patch.diff Type: application/octet-stream Size: 1120 bytes Desc: not available URL: From des at des.no Fri Apr 3 21:42:25 2009 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Fri, 03 Apr 2009 23:42:25 +0200 Subject: ./configure error for sys/mount.h In-Reply-To: <200904020451.n324pmxt058307@banyan.cs.ait.ac.th> (Olivier Nicole's message of "Thu, 2 Apr 2009 11:51:48 +0700 (ICT)") References: <200904020451.n324pmxt058307@banyan.cs.ait.ac.th> Message-ID: <861vs9ifcu.fsf@ds4.des.no> Olivier Nicole writes: > configure: WARNING: sys/mount.h: present but cannot be compiled Fixed, thanks. DES -- Dag-Erling Smørgrav - des at des.no From dome at tel.co.th Tue Apr 7 08:36:42 2009 From: dome at tel.co.th (Dome Charoenyost) Date: Tue, 7 Apr 2009 15:36:42 +0700 Subject: How to check maximum Request /s Message-ID: <8ccbff060904070136v445f2affuc01f70a96ebe14f@mail.gmail.com> Dear All, I try varnishstat to check maximum request / s but not found. Where to get it? Best regards. Dome C. From rob.halff at gmail.com Tue Apr 7 08:50:34 2009 From: rob.halff at gmail.com (Rob Halff) Date: Tue, 7 Apr 2009 10:50:34 +0200 Subject: Virtualhost logging for varnishncsa Message-ID: Here is a new diff, the http:// part also needs to be omitted to do the correct kind of logging. Can I draw the conclusion, from the overwhelming response, that nobody is really interested in this patch? :-) -------------- next part -------------- A non-text attachment was scrubbed... Name: varnishncsa_virtual_host_patch.diff Type: application/octet-stream Size: 1617 bytes Desc: not available URL: From tical.net at gmail.com Tue Apr 7 14:14:16 2009 From: tical.net at gmail.com (Ray Barnes) Date: Tue, 7 Apr 2009 10:14:16 -0400 Subject: Varnish 2.0.3 stops accepting connections - fixed?
Performance revisited Message-ID: Hi all. I have already seen Eden Li's patch, apparently committed to 2.0.4, which fixes the problem of varnish not re-checking to see if file descriptors are available again to service connections - at least that's my extremely lay understanding, being mostly a non-programmer. Further to Eden Li's post to this list which says "We're getting around it now by setting the max open file limit and listen_depth appropriately so that varnish never gets to this point, but it'd be nice if this was fixed in case we ever accidentally get here again." - I'm wondering if someone can critique my config. I've observed several instances where Varnish would do exactly what Eden describes - stop listening to requests on port 80. A 'telnet ip 80' would simply freeze indefinitely and not connect. The child process was running, etc. So I'm going to assume for the moment that the bugfix will be the fix for this issue, as I have not been able to duplicate it under lab testing, but only under live load conditions. Here is the way we call varnishd: #!/bin/bash ulimit -n 131072 ulimit -l 82000 /usr/sbin/varnishd -a x.x.x.x:80 -b x.x.x.x:80 -T x.x.x.x:6083 \ -t 60 -w150,2000,60 -u varnish -g varnish -p obj_workspace=4096 -p sess_workspace=262144 -p listen_depth=8192 \ -p shm_workspace=29000 -p thread_pools=24 -p thread_pool_min=8 -p ping_interval=1 -p srcaddr_ttl=0 -s malloc,60M This configuration was a hack between John Adams' config from a post from February with the subject "Is anyone using ESI with a lot of traffic?", and the Fedora startup script for varnish in /etc/init.d - platform is Linux 2.6 (Fedora 10 and RHEL). cat /proc/sys/fs/file-max says 65535 - I set this value on the fly without rebooting yet.
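One thing that stands out in the numbers above (a sketch; the two values are copied from this message, not measured live): the start script asks for `ulimit -n 131072` per process, while /proc/sys/fs/file-max reports 65535 for the whole kernel, so the system-wide ceiling is below what varnishd is being told it may use:

```shell
# Values quoted from the post: kernel-wide descriptor ceiling vs. the
# per-process limit the start script requests.
file_max=65535      # cat /proc/sys/fs/file-max, as reported
ulimit_n=131072     # ulimit -n in the start script
if [ "$ulimit_n" -gt "$file_max" ]; then
  status="fs.file-max ($file_max) is below ulimit -n ($ulimit_n): raise fs.file-max"
else
  status="ok"
fi
echo "$status"
```

Raising the kernel ceiling (e.g. `sysctl -w fs.file-max=200000`, or the equivalent echo into /proc) removes one way for accepts to stall even before the 2.0.4 fix lands.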
This behavior I described, where port 80 will stop taking connections, is also present when I call varnishd using simply 'varnishd -a x.x.x.x:80 -b x.x.x.x:80 -T x.x.x.x:6083' with no ulimit commands, no additional arguments, and as far as I can remember, default FDs in /proc/sys/fs/file-max. Before I "chase my tail" any further, can anyone recommend any improvements to the above config? Also, is there any particular reason, given the above config, why 'ab' (the apache benchmarking utility) would fail intermittently to connect 1000 concurrent sessions to varnish? I found that when I used John Adams' default of 400 initial minimum threads, the daemon would do unpredictable things and not let me run 'ab' consistently without refusing the connections - any idea why? Thanks in advance, both for any replies, and to everyone who has contributed to the project. -Ray From vijayaraghavan.subramaniam at wipro.com Thu Apr 9 15:33:15 2009 From: vijayaraghavan.subramaniam at wipro.com (vijayaraghavan.subramaniam at wipro.com) Date: Thu, 9 Apr 2009 21:03:15 +0530 Subject: Logging and Statistics Message-ID: Hi, I'm using varnish server. I want to write varnishncsa log information into database, what is best practices & Is there any varnish API available to write into database? Thanks, --Raghavan. Please do not print this email unless it is absolutely necessary. The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. 
The company accepts no liability for any damage caused by any virus transmitted by this email. www.wipro.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From cloude at instructables.com Thu Apr 9 19:02:25 2009 From: cloude at instructables.com (Cloude Porteus) Date: Thu, 9 Apr 2009 12:02:25 -0700 Subject: High Server Load Averages? Message-ID: <4a05e1020904091202u1ad23e4fx1abaa4b5b59317e@mail.gmail.com> Has anyone experienced very high server load averages? We're running varnish on a dual core with 8gb of ram. It runs okay for a day or two and then I start seeing load averages in 6-10 range for an hour or so, drops down to 2-3, then goes back up. This starts to happen once we have more items in the cache than our physical memory. Maybe increasing our lru_interval will help? It's currently set to 3600. Right now we're running with a 50gb file storage option. There are 270k objects in the cache, 70gb virtual memory, 6.2gb of res memory used, 11gb of data on disk in the file storage. We have a 98% hit ratio. We followed Artur's advice about setting a tmpfs and creating an ext2 partition for our file storage. I also tried running with malloc as our storage type, but I had to set it at a little less than half of our physical ram in order for it to work well after the cache got full. I don't understand why the virtual memory is double when I am running in malloc mode. I was running it with 5gb and the virtual memory was about 10-12gb and once it got full it started using the swap memory. Thanks for any help/insight. best, cloude -- VP of Product Development Instructables.com http://www.instructables.com/member/lebowski -------------- next part -------------- An HTML attachment was scrubbed... URL: From sky at crucially.net Thu Apr 9 19:18:22 2009 From: sky at crucially.net (Artur Bergman) Date: Thu, 9 Apr 2009 12:18:22 -0700 Subject: High Server Load Averages? 
In-Reply-To: <4a05e1020904091202u1ad23e4fx1abaa4b5b59317e@mail.gmail.com> References: <4a05e1020904091202u1ad23e4fx1abaa4b5b59317e@mail.gmail.com> Message-ID: <871677B9-7E39-4246-BE5E-422E4E593151@crucially.net> For the file storage or for the shmlog? When do you start nuking/expiring from disk? I suspect the load goes up when you run out of storage space? Cheers Artur On Apr 9, 2009, at 12:02 PM, Cloude Porteus wrote: > Has anyone experienced very high server load averages? We're running > varnish on a dual core with 8gb of ram. It runs okay for a day or > two and then I start seeing load averages in 6-10 range for an hour > or so, drops down to 2-3, then goes back up. > > This starts to happen once we have more items in the cache than our > physical memory. Maybe increasing our lru_interval will help? It's > currently set to 3600. > > Right now we're running with a 50gb file storage option. There are > 270k objects in the cache, 70gb virtual memory, 6.2gb of res memory > used, 11gb of data on disk in the file storage. We have a 98% hit > ratio. > > We followed Artur's advice about setting a tmpfs and creating an > ext2 partition for our file storage. > > I also tried running with malloc as our storage type, but I had to > set it at a little less than half of our physical ram in order for > it to work well after the cache got full. I don't understand why the > virtual memory is double when I am running in malloc mode. I was > running it with 5gb and the virtual memory was about 10-12gb and > once it got full it started using the swap memory. > > Thanks for any help/insight. > > best, > cloude > -- > VP of Product Development > Instructables.com > > http://www.instructables.com/member/lebowski > _______________________________________________ > varnish-dev mailing list > varnish-dev at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-dev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cloude at instructables.com Thu Apr 9 19:27:55 2009 From: cloude at instructables.com (Cloude Porteus) Date: Thu, 9 Apr 2009 12:27:55 -0700 Subject: High Server Load Averages? In-Reply-To: <871677B9-7E39-4246-BE5E-422E4E593151@crucially.net> References: <4a05e1020904091202u1ad23e4fx1abaa4b5b59317e@mail.gmail.com> <871677B9-7E39-4246-BE5E-422E4E593151@crucially.net> Message-ID: <4a05e1020904091227x519affe6r1a9540b5a6dde1b9@mail.gmail.com> Varnishstat doesn't list any nuked objects and file storage and shmlog look like they have plenty of space: df -h ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Filesystem Size Used Avail Use% Mounted on tmpfs 150M 81M 70M 54% /usr/local/var/varnish /dev/sdc1 74G 11G 61G 16% /var/lib/varnish top ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ top - 12:26:33 up 164 days, 22:21, 1 user, load average: 2.60, 3.26, 3.75 Tasks: 67 total, 1 running, 66 sleeping, 0 stopped, 0 zombie Cpu(s): 0.7%us, 0.3%sy, 0.0%ni, 97.0%id, 0.7%wa, 0.3%hi, 1.0%si, 0.0%st Mem: 8183492k total, 7763100k used, 420392k free, 13424k buffers Swap: 3148720k total, 56636k used, 3092084k free, 7317692k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 7441 varnish 15 0 70.0g 6.4g 6.1g S 2 82.5 56:33.31 varnishd Varnishstat: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Hitrate ratio: 8 8 8 Hitrate avg: 0.9782 0.9782 0.9782 36494404 219.98 160.57 Client connections accepted 36494486 220.98 160.57 Client requests received 35028477 212.98 154.12 Cache hits 474091 4.00 2.09 Cache hits for pass 988013 6.00 4.35 Cache misses 1465955 10.00 6.45 Backend connections success 9 0.00 0.00 Backend connections failures 994 . . N struct sess_mem 11 . . N struct sess 274047 . . N struct object 252063 . . N struct objecthead 609018 . . N struct smf 28720 . . N small free smf 2 . . N large free smf 2 . . N struct vbe_conn 901 . . N struct bereq 2000 . . N worker threads 2000 0.00 0.01 N worker threads created 143 0.00 0.00 N overflowed work requests 1 . . N backends 672670 . . N expired objects 3514467 . . 
N LRU moved objects 49 0.00 0.00 HTTP header overflows 32124238 206.98 141.34 Objects sent with write 36494396 224.98 160.57 Total Sessions 36494484 224.98 160.57 Total Requests 783 0.00 0.00 Total pipe 518770 4.00 2.28 Total pass 1464570 10.00 6.44 Total fetch 14559014884 93563.69 64058.18 Total header bytes 168823109304 489874.04 742804.45 Total body bytes 36494387 224.98 160.57 Session Closed 203 0.00 0.00 Session herd 1736767745 10880.80 7641.60 SHM records 148079555 908.90 651.53 SHM writes 15088 0.00 0.07 SHM flushes due to overflow 10494 0.00 0.05 SHM MTX contention 687 0.00 0.00 SHM cycles through buffer 2988576 21.00 13.15 allocator requests 580296 . . outstanding allocations 8916353024 . . bytes allocated 44770738176 . . bytes free 656 0.00 0.00 SMS allocator requests 303864 . . SMS bytes allocated 303864 . . SMS bytes freed 1465172 10.00 6.45 Backend requests made On Thu, Apr 9, 2009 at 12:18 PM, Artur Bergman wrote: > For the file storage or for the shmlog? > > When do you start nuking/expiring from disk? I suspect the load goes up > when you run out of storage space? > > Cheers > Artur > > > On Apr 9, 2009, at 12:02 PM, Cloude Porteus wrote: > > Has anyone experienced very high server load averages? We're running > varnish on a dual core with 8gb of ram. It runs okay for a day or two and > then I start seeing load averages in 6-10 range for an hour or so, drops > down to 2-3, then goes back up. > > This starts to happen once we have more items in the cache than our > physical memory. Maybe increasing our lru_interval will help? It's currently > set to 3600. > > Right now we're running with a 50gb file storage option. There are 270k > objects in the cache, 70gb virtual memory, 6.2gb of res memory used, 11gb of > data on disk in the file storage. We have a 98% hit ratio. > > We followed Artur's advice about setting a tmpfs and creating an ext2 > partition for our file storage. 
> > I also tried running with malloc as our storage type, but I had to set it > at a little less than half of our physical ram in order for it to work well > after the cache got full. I don't understand why the virtual memory is > double when I am running in malloc mode. I was running it with 5gb and the > virtual memory was about 10-12gb and once it got full it started using the > swap memory. > > Thanks for any help/insight. > > best, > cloude > -- > VP of Product Development > Instructables.com > > http://www.instructables.com/member/lebowski > _______________________________________________ > varnish-dev mailing list > varnish-dev at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-dev > > > -- VP of Product Development Instructables.com http://www.instructables.com/member/lebowski -------------- next part -------------- An HTML attachment was scrubbed... URL: From sky at crucially.net Thu Apr 9 20:43:13 2009 From: sky at crucially.net (Artur Bergman) Date: Thu, 9 Apr 2009 13:43:13 -0700 Subject: High Server Load Averages? In-Reply-To: <4a05e1020904091227x519affe6r1a9540b5a6dde1b9@mail.gmail.com> References: <4a05e1020904091202u1ad23e4fx1abaa4b5b59317e@mail.gmail.com> <871677B9-7E39-4246-BE5E-422E4E593151@crucially.net> <4a05e1020904091227x519affe6r1a9540b5a6dde1b9@mail.gmail.com> Message-ID: <477CEE15-7F81-4E12-8110-55D1534D3C4B@crucially.net> What is your iopressure? 
iostat -k -x 5 or something like that artur On Apr 9, 2009, at 12:27 PM, Cloude Porteus wrote: > Varnishstat doesn't list any nuked objects and file storage and > shmlog look like they have plenty of space: > > df -h > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Filesystem Size Used Avail Use% Mounted on > tmpfs 150M 81M 70M 54% /usr/local/var/varnish > /dev/sdc1 74G 11G 61G 16% /var/lib/varnish > > top > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > top - 12:26:33 up 164 days, 22:21, 1 user, load average: 2.60, > 3.26, 3.75 > Tasks: 67 total, 1 running, 66 sleeping, 0 stopped, 0 zombie > Cpu(s): 0.7%us, 0.3%sy, 0.0%ni, 97.0%id, 0.7%wa, 0.3%hi, > 1.0%si, 0.0%st > Mem: 8183492k total, 7763100k used, 420392k free, 13424k > buffers > Swap: 3148720k total, 56636k used, 3092084k free, 7317692k > cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 7441 varnish 15 0 70.0g 6.4g 6.1g S 2 82.5 56:33.31 varnishd > > > Varnishstat: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Hitrate ratio: 8 8 8 > Hitrate avg: 0.9782 0.9782 0.9782 > > 36494404 219.98 160.57 Client connections accepted > 36494486 220.98 160.57 Client requests received > 35028477 212.98 154.12 Cache hits > 474091 4.00 2.09 Cache hits for pass > 988013 6.00 4.35 Cache misses > 1465955 10.00 6.45 Backend connections success > 9 0.00 0.00 Backend connections failures > 994 . . N struct sess_mem > 11 . . N struct sess > 274047 . . N struct object > 252063 . . N struct objecthead > 609018 . . N struct smf > 28720 . . N small free smf > 2 . . N large free smf > 2 . . N struct vbe_conn > 901 . . N struct bereq > 2000 . . N worker threads > 2000 0.00 0.01 N worker threads created > 143 0.00 0.00 N overflowed work requests > 1 . . N backends > 672670 . . N expired objects > 3514467 . . 
N LRU moved objects > 49 0.00 0.00 HTTP header overflows > 32124238 206.98 141.34 Objects sent with write > 36494396 224.98 160.57 Total Sessions > 36494484 224.98 160.57 Total Requests > 783 0.00 0.00 Total pipe > 518770 4.00 2.28 Total pass > 1464570 10.00 6.44 Total fetch > 14559014884 93563.69 64058.18 Total header bytes > 168823109304 489874.04 742804.45 Total body bytes > 36494387 224.98 160.57 Session Closed > 203 0.00 0.00 Session herd > 1736767745 10880.80 7641.60 SHM records > 148079555 908.90 651.53 SHM writes > 15088 0.00 0.07 SHM flushes due to overflow > 10494 0.00 0.05 SHM MTX contention > 687 0.00 0.00 SHM cycles through buffer > 2988576 21.00 13.15 allocator requests > 580296 . . outstanding allocations > 8916353024 . . bytes allocated > 44770738176 . . bytes free > 656 0.00 0.00 SMS allocator requests > 303864 . . SMS bytes allocated > 303864 . . SMS bytes freed > 1465172 10.00 6.45 Backend requests made > > > > On Thu, Apr 9, 2009 at 12:18 PM, Artur Bergman > wrote: > For the file storage or for the shmlog? > > When do you start nuking/expiring from disk? I suspect the load goes > up when you run out of storage space? > > Cheers > Artur > > > On Apr 9, 2009, at 12:02 PM, Cloude Porteus wrote: > >> Has anyone experienced very high server load averages? We're >> running varnish on a dual core with 8gb of ram. It runs okay for a >> day or two and then I start seeing load averages in 6-10 range for >> an hour or so, drops down to 2-3, then goes back up. >> >> This starts to happen once we have more items in the cache than our >> physical memory. Maybe increasing our lru_interval will help? It's >> currently set to 3600. >> >> Right now we're running with a 50gb file storage option. There are >> 270k objects in the cache, 70gb virtual memory, 6.2gb of res memory >> used, 11gb of data on disk in the file storage. We have a 98% hit >> ratio. >> >> We followed Artur's advice about setting a tmpfs and creating an >> ext2 partition for our file storage. 
>> >> I also tried running with malloc as our storage type, but I had to >> set it at a little less than half of our physical ram in order for >> it to work well after the cache got full. I don't understand why >> the virtual memory is double when I am running in malloc mode. I >> was running it with 5gb and the virtual memory was about 10-12gb >> and once it got full it started using the swap memory. >> >> Thanks for any help/insight. >> >> best, >> cloude >> -- >> VP of Product Development >> Instructables.com >> >> http://www.instructables.com/member/lebowski >> _______________________________________________ >> varnish-dev mailing list >> varnish-dev at projects.linpro.no >> http://projects.linpro.no/mailman/listinfo/varnish-dev > > > > > -- > VP of Product Development > Instructables.com > > http://www.instructables.com/member/lebowski -------------- next part -------------- An HTML attachment was scrubbed... URL: From cloude at instructables.com Thu Apr 9 21:46:08 2009 From: cloude at instructables.com (Cloude Porteus) Date: Thu, 9 Apr 2009 14:46:08 -0700 Subject: High Server Load Averages? In-Reply-To: <477CEE15-7F81-4E12-8110-55D1534D3C4B@crucially.net> References: <4a05e1020904091202u1ad23e4fx1abaa4b5b59317e@mail.gmail.com> <871677B9-7E39-4246-BE5E-422E4E593151@crucially.net> <4a05e1020904091227x519affe6r1a9540b5a6dde1b9@mail.gmail.com> <477CEE15-7F81-4E12-8110-55D1534D3C4B@crucially.net> Message-ID: <4a05e1020904091446h7f8be7e5l1b33652e1c323767@mail.gmail.com> The current load is just above 2. I'll check this again when I see a load spike. 
[cloude at squid03 ~]$ iostat -k -x 5 Linux 2.6.18-53.1.19.el5.centos.plus (squid03.instructables.com) 04/09/2009 avg-cpu: %user %nice %system %iowait %steal %idle 1.19 0.00 0.95 2.14 0.00 95.73 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.07 9.64 0.15 1.65 10.08 45.68 61.80 0.13 70.32 3.96 0.72 sdb 0.07 9.63 0.15 1.66 10.14 45.68 61.75 0.02 10.03 3.76 0.68 sdc 0.03 16.47 1.21 14.69 13.99 128.81 17.96 0.08 4.81 4.31 6.85 sdd 0.03 16.45 1.17 13.24 13.29 119.96 18.49 0.24 16.52 4.06 5.86 md1 0.00 0.00 0.43 11.13 20.19 44.52 11.19 0.00 0.00 0.00 0.00 md2 0.00 0.00 2.41 29.40 26.58 117.61 9.07 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 3.15 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.90 0.00 2.40 46.70 0.00 50.00 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 0.40 6.00 238.40 74.40 974.40 8.58 132.88 515.03 4.09 100.02 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.90 0.00 1.80 67.67 0.00 29.63 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 1.60 13.40 141.80 188.80 1053.60 16.01 138.62 934.04 6.44 100.02 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.50 0.00 1.80 61.40 0.00 36.30 Device: rrqm/s wrqm/s r/s w/s 
rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.00 0.00 0.40 0.00 2.40 12.00 0.00 9.00 9.00 0.36 sdb 0.00 0.00 0.00 0.40 0.00 2.40 12.00 0.00 9.50 9.50 0.38 sdc 0.00 1.60 6.40 257.00 132.00 2195.20 17.67 107.40 450.21 3.68 96.82 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md1 0.00 0.00 0.00 0.20 0.00 0.80 8.00 0.00 0.00 0.00 0.00 md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.60 0.00 1.60 47.80 0.00 50.00 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.00 0.00 0.20 0.00 1.60 16.00 0.00 11.00 11.00 0.22 sdb 0.00 0.00 0.00 0.20 0.00 1.60 16.00 0.00 13.00 13.00 0.26 sdc 0.00 0.80 0.20 301.80 8.80 1270.40 8.47 119.40 373.98 3.31 100.04 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.60 0.00 1.70 47.80 0.00 49.90 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 1.20 2.40 245.31 43.11 1538.52 12.77 101.41 419.12 4.03 99.80 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.60 0.00 1.50 3.20 0.00 94.69 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.20 0.00 0.40 0.00 2.40 0.00 12.00 0.01 14.00 7.00 0.28 sdc 0.00 0.00 6.60 11.00 174.40 192.80 41.73 1.26 421.34 3.73 6.56 
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.70 0.00 1.60 29.50 0.00 68.20 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 0.00 5.60 208.60 110.40 857.60 9.04 70.18 301.18 2.90 62.06 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.50 0.00 1.50 48.05 0.00 49.95 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.20 0.00 0.80 0.00 5.60 14.00 0.01 8.75 8.75 0.70 sdb 0.00 0.20 0.00 0.80 0.00 5.60 14.00 0.01 9.50 9.50 0.76 sdc 0.00 1.00 6.80 232.40 91.20 1180.80 10.64 110.32 475.49 4.18 100.02 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md1 0.00 0.00 0.00 0.60 0.00 2.40 8.00 0.00 0.00 0.00 0.00 md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 On Thu, Apr 9, 2009 at 1:43 PM, Artur Bergman wrote: > What is your iopressure? 
> iostat -k -x 5 > > or something like that > > artur > > On Apr 9, 2009, at 12:27 PM, Cloude Porteus wrote: > > Varnishstat doesn't list any nuked objects and file storage and shmlog look > like they have plenty of space: > > df -h > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Filesystem Size Used Avail Use% Mounted on > tmpfs 150M 81M 70M 54% /usr/local/var/varnish > /dev/sdc1 74G 11G 61G 16% /var/lib/varnish > > top > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > top - 12:26:33 up 164 days, 22:21, 1 user, load average: 2.60, 3.26, 3.75 > Tasks: 67 total, 1 running, 66 sleeping, 0 stopped, 0 zombie > Cpu(s): 0.7%us, 0.3%sy, 0.0%ni, 97.0%id, 0.7%wa, 0.3%hi, 1.0%si, > 0.0%st > Mem: 8183492k total, 7763100k used, 420392k free, 13424k buffers > Swap: 3148720k total, 56636k used, 3092084k free, 7317692k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 7441 varnish 15 0 70.0g 6.4g 6.1g S 2 82.5 56:33.31 varnishd > > > Varnishstat: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Hitrate ratio: 8 8 8 > Hitrate avg: 0.9782 0.9782 0.9782 > > 36494404 219.98 160.57 Client connections accepted > 36494486 220.98 160.57 Client requests received > 35028477 212.98 154.12 Cache hits > 474091 4.00 2.09 Cache hits for pass > 988013 6.00 4.35 Cache misses > 1465955 10.00 6.45 Backend connections success > 9 0.00 0.00 Backend connections failures > 994 . . N struct sess_mem > 11 . . N struct sess > 274047 . . N struct object > 252063 . . N struct objecthead > 609018 . . N struct smf > 28720 . . N small free smf > 2 . . N large free smf > 2 . . N struct vbe_conn > 901 . . N struct bereq > 2000 . . N worker threads > 2000 0.00 0.01 N worker threads created > 143 0.00 0.00 N overflowed work requests > 1 . . N backends > 672670 . . N expired objects > 3514467 . . 
N LRU moved objects > 49 0.00 0.00 HTTP header overflows > 32124238 206.98 141.34 Objects sent with write > 36494396 224.98 160.57 Total Sessions > 36494484 224.98 160.57 Total Requests > 783 0.00 0.00 Total pipe > 518770 4.00 2.28 Total pass > 1464570 10.00 6.44 Total fetch > 14559014884 93563.69 64058.18 Total header bytes > 168823109304 489874.04 742804.45 Total body bytes > 36494387 224.98 160.57 Session Closed > 203 0.00 0.00 Session herd > 1736767745 10880.80 7641.60 SHM records > 148079555 908.90 651.53 SHM writes > 15088 0.00 0.07 SHM flushes due to overflow > 10494 0.00 0.05 SHM MTX contention > 687 0.00 0.00 SHM cycles through buffer > 2988576 21.00 13.15 allocator requests > 580296 . . outstanding allocations > 8916353024 . . bytes allocated > 44770738176 . . bytes free > 656 0.00 0.00 SMS allocator requests > 303864 . . SMS bytes allocated > 303864 . . SMS bytes freed > 1465172 10.00 6.45 Backend requests made > > > > On Thu, Apr 9, 2009 at 12:18 PM, Artur Bergman wrote: > >> For the file storage or for the shmlog? >> >> When do you start nuking/expiring from disk? I suspect the load goes up >> when you run out of storage space? >> >> Cheers >> Artur >> >> >> On Apr 9, 2009, at 12:02 PM, Cloude Porteus wrote: >> >> Has anyone experienced very high server load averages? We're running >> varnish on a dual core with 8gb of ram. It runs okay for a day or two and >> then I start seeing load averages in 6-10 range for an hour or so, drops >> down to 2-3, then goes back up. >> >> This starts to happen once we have more items in the cache than our >> physical memory. Maybe increasing our lru_interval will help? It's currently >> set to 3600. >> >> Right now we're running with a 50gb file storage option. There are 270k >> objects in the cache, 70gb virtual memory, 6.2gb of res memory used, 11gb of >> data on disk in the file storage. We have a 98% hit ratio. 
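As a cross-check, the quoted hit ratio can be recomputed from the varnishstat counters above (Cache hits and Cache misses). A sketch using hits/(hits+misses) — one common definition; it will differ slightly from varnishstat's rolling "Hitrate avg", which only covers a recent window:

```shell
# Cumulative hit ratio = hits / (hits + misses), using the counter
# values from the varnishstat dump quoted above.
hits=35028477
misses=988013
awk -v h="$hits" -v m="$misses" 'BEGIN { printf "hit ratio: %.4f\n", h / (h + m) }'
# -> hit ratio: 0.9726
```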
>> [...]
>>
>> best,
>> cloude

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sky at crucially.net Thu Apr 9 21:49:14 2009
From: sky at crucially.net (Artur Bergman)
Date: Thu, 9 Apr 2009 14:49:14 -0700
Subject: High Server Load Averages?
In-Reply-To: <4a05e1020904091446h7f8be7e5l1b33652e1c323767@mail.gmail.com>
References: <4a05e1020904091202u1ad23e4fx1abaa4b5b59317e@mail.gmail.com> <871677B9-7E39-4246-BE5E-422E4E593151@crucially.net> <4a05e1020904091227x519affe6r1a9540b5a6dde1b9@mail.gmail.com> <477CEE15-7F81-4E12-8110-55D1534D3C4B@crucially.net> <4a05e1020904091446h7f8be7e5l1b33652e1c323767@mail.gmail.com>
Message-ID: <9F79E807-0D08-4489-A66F-7EC834565C75@crucially.net>

Your SDC is overloaded. Is your filesystem mounted noatime?

Artur

On Apr 9, 2009, at 2:46 PM, Cloude Porteus wrote:
> The current load is just above 2. I'll check this again when I see a load spike.
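The noatime question can be answered mechanically. A sketch — the mount point comes from the `df` output quoted earlier in the thread, but the fstab line here is invented for illustration, not taken from the poster's system:

```shell
# Does the mount entry for the cache partition carry "noatime"?
# (Without it, every read of a cache file also writes an access
# timestamp, adding write load to an already busy device.)
fstab_line='/dev/sdc1  /var/lib/varnish  ext2  defaults,noatime  1 2'
case "$fstab_line" in
    *noatime*) echo "noatime set" ;;
    *)         echo "noatime missing" ;;
esac
# -> noatime set

# On the live box, check the options actually in effect with:
#   grep /var/lib/varnish /proc/mounts
# and apply without downtime via:
#   mount -o remount,noatime /var/lib/varnish
```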
> [...]
>
> On Thu, Apr 9, 2009 at 1:43 PM, Artur Bergman wrote:
> What is your iopressure?
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From p.millar at physics.gla.ac.uk Fri Apr 10 21:47:18 2009
From: p.millar at physics.gla.ac.uk (Paul Millar)
Date: Fri, 10 Apr 2009 23:47:18 +0200
Subject: Logging and Statistics
In-Reply-To: 
References: 
Message-ID: <200904102347.19049.p.millar@physics.gla.ac.uk>

Hi Raghavan,

On Thursday 09 April 2009 17:33:15 vijayaraghavan.subramaniam at wipro.com wrote:
> I'm using varnish server. I want to write varnishncsa log information into
> database, what is best practices & Is there any varnish API available to
> write into database?

At the risk of promoting my own project, have you looked at MonAMI?

http://monami.sourceforge.net

There's a monitoring plugin for taking log information from varnish and a reporting plugin for logging data into MySQL.
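On the original question, one API-free approach is to transform each access-log line into a SQL statement and pipe it to whatever database client you already use. A sketch — the table name, column, and sample log line are invented for illustration:

```shell
# Turn log lines into SQL INSERT statements that any client can
# consume (mysql, psql, sqlite3, ...). Single quotes are doubled
# for SQL escaping before the line is wrapped in the statement.
to_sql() {
    sed "s/'/''/g" |
    awk '{ print "INSERT INTO access_log (line) VALUES ('\''" $0 "'\'');" }'
}

# Demo on a made-up NCSA-style line:
echo '10.0.0.1 - - [10/Apr/2009:12:00:00] "GET / HTTP/1.1" 200 512' | to_sql

# Live use (hypothetical database name):
#   varnishncsa | to_sql | mysql varnish_logs
```

Batching inserts (or using the client's bulk-load mode) would be kinder to the database than one statement per request, but the pipeline shape is the same.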
The tutorial culminates with logging monitoring data into a database (albeit from a different monitoring plugin) and the userguide gives a detailed information on how to configure MonAMI. HTH, Paul. From tical.net at gmail.com Fri Apr 10 22:12:48 2009 From: tical.net at gmail.com (Ray Barnes) Date: Fri, 10 Apr 2009 18:12:48 -0400 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking Message-ID: Hi all. Note that everything herein is based only on a very lay knowledge of varnish, without being familiar with the internals of the code. In my quest to eek more performance out of Varnish, I've been testing under 2.0.4. I have not seen much improvement over 2.0.3 in the way it acts after receiving a bunch of hits all at one time. I am invoking varnish like this: ulimit -n 131072 ulimit -l 82000 /usr/local/sbin/varnishd -a 98.124.141.3:80 -b 67.212.179.98:80 -T 98.124.141.3:6083 \ -t 60 -w1440,3000,60 -u apache -g apache -p obj_workspace=16000 -p sess_workspace=262144 -p listen_depth=4096 \ -p shm_workspace=64000 -p thread_pools=8 -p thread_pool_min=180 -p ping_interval=1 -p srcaddr_ttl=0 -s malloc,80M As best I can tell, the problem I'm seeing is that it will not create the number of worker threads that I'm telling it to, as evidenced by the 'status' output within the CLI immediately after launch: 270 N worker threads 285 N worker threads created So if I launch 'ab' with 700 connections against varnish, it will not work right from the beginning, like so: [root at mia ~]# ab -n 20000 -c 700 http://98.124.141.3/ This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Copyright 2006 The Apache Software Foundation, http://www.apache.org/ Benchmarking 98.124.141.3 (be patient) apr_socket_recv: Connection refused (111) [root at mia ~]# ab -n 20000 -c 700 http://98.124.141.3/ This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0 Copyright 
1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Copyright 2006 The Apache Software Foundation, http://www.apache.org/ Benchmarking 98.124.141.3 (be patient) apr_poll: The timeout specified has expired (70007) Total of 147 requests completed [root at mia ~]# telnet 98.124.141.3 80 Trying 98.124.141.3... Connected to 98.124.141.3 (98.124.141.3). Escape character is '^]'. GET / HTTP/1.0 ^] telnet> quit Connection closed. The above telnet command simply hung, presumably because there are still 700 sessions in CLOSE_WAIT state within the kernel, although that should not matter if varnish opened the number of worker threads it was supposed to. Based on what I've seen, it would seem that varnish has some problem when you launch it with "too many" initial worker threads (although I'm having a hard time understanding why 1400ish is too many). It seems to go crazy if you specify too many threads initially. Again, that number should not be a problem for the machine in theory, as it's a multicore Xeon. Platform is Linux 2.6 RHEL. Any idea what's happening here? -Ray -------------- next part -------------- An HTML attachment was scrubbed... URL: From jna at twitter.com Fri Apr 10 22:35:14 2009 From: jna at twitter.com (John Adams) Date: Fri, 10 Apr 2009 15:35:14 -0700 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: References: Message-ID: It takes time to spawn threads. If you start the server with hundreds of threads, they won't be ready for ~30-90 seconds. Maybe that's causing this issue? -j On Apr 10, 2009, at 3:12 PM, Ray Barnes wrote: > Hi all. Note that everything herein is based only on a very lay > knowledge of varnish, without being familiar with the internals of > the code. > > In my quest to eek more performance out of Varnish, I've been > testing under 2.0.4. I have not seen much improvement over 2.0.3 in > the way it acts after receiving a bunch of hits all at one time. 
> [...]
>
> -Ray

---
John Adams
Twitter Operations
jna at twitter.com
http://twitter.com/netik

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tical.net at gmail.com Fri Apr 10 22:58:01 2009
From: tical.net at gmail.com (Ray Barnes)
Date: Fri, 10 Apr 2009 18:58:01 -0400
Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking
In-Reply-To: 
References: 
Message-ID: 

John,

Thanks for the reply; as you can see my config is largely based on the one you posted to this list in February (thanks!).

I went back as you suggested and waited 90 seconds, while starting it the same way. Before running any tests, I went into the CLI and viewed stats on the threads:

364 N worker threads
364 N worker threads created
782 N worker threads not created

When this happens (started threads do not match the number specified), varnish does really unpredictable things, i.e. it won't take 300 connections from 'ab' and times out with the following message:

Benchmarking 98.124.141.3 (be patient)
apr_poll: The timeout specified has expired (70007)
Total of 52 requests completed

I think the crux of my problem is figuring out why it won't start more threads. Being not-so-familiar with the internals of varnish, I can't tell whether that's an OS problem or a varnish problem. Hope that helps.
-Ray

On Fri, Apr 10, 2009 at 6:35 PM, John Adams wrote:
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jna at twitter.com Fri Apr 10 23:30:27 2009
From: jna at twitter.com (John Adams)
Date: Fri, 10 Apr 2009 16:30:27 -0700
Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking
In-Reply-To: 
References: 
Message-ID: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com>

Something's very wrong here - we've never experienced this before. Are you starting the server as root or as another user? Any ulimit or restrictions on # of file descriptors?
-j

--- John Adams Twitter Operations jna at twitter.com http://twitter.com/netik

From tical.net at gmail.com Sat Apr 11 19:46:21 2009 From: tical.net at gmail.com (Ray Barnes) Date: Sat, 11 Apr 2009 15:46:21 -0400 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> References: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> Message-ID: Thanks for the reply, answers inline: On Fri, Apr 10, 2009 at 7:30 PM, John Adams wrote: > Something's very wrong here - we've never experienced this before. > Are you stating the server as root or as another user? > As root. > Any ulimit or restrictions on # of file descriptors?
> I'm manually setting ulimit in the script that starts varnishd, like so: ulimit -n 131072 ulimit -l 82000 Other than that, and having manually set /proc/sys/fs/file-max to 65535, all other settings are default according to RHEL 5 and Linux 2.6.18 with Xen patches and backports maintained by xen.org (the aforementioned results were all obtained by running varnish under domain 0). I tried this on another box which is RHEL 4.6 with 2.6.9-55.0.2.ELsmp (so no Xen in this case) and /proc/sys/fs/file-max set to 49984 (using the same script to launch varnish as aforementioned); the result was relatively the same:

290 N worker threads
290 N worker threads created
8705 N worker threads not created
409188 N worker threads limited

HTH.

-Ray

From sky at crucially.net Sat Apr 11 21:48:04 2009 From: sky at crucially.net (Artur Bergman) Date: Sat, 11 Apr 2009 14:48:04 -0700 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: References: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> Message-ID: I've never seen it do worker threads not created. Are there any limits on number of threads? Can you get rid of -w1440,3000,60 and rely on the -p settings instead? Artur On Apr 11, 2009, at 12:46 PM, Ray Barnes wrote: > Thanks for the reply, answers inline: > > On Fri, Apr 10, 2009 at 7:30 PM, John Adams wrote: > Something's very wrong here - we've never experienced this before. > > Are you stating the server as root or as another user? > > As root. > > Any ulimit or restrictions on # of file descriptors?
> > I'm manually setting ulimit in the script that starts varnishd, like > so: > > ulimit -n 131072 > ulimit -l 82000 > Other than that, and having manually set /proc/sys/fs/file-max to > 65535, all other settings are default according to RHEL 5 and Linux > 2.6.18 with Xen patches and backports maintained by xen.org (the > aforementioned results were all obtained by running varnish under > domain 0). I tried this on another box which is RHEL 4.6 with > 2.6.9-55.0.2.ELsmp (so no Xen in this case), and /proc/sys/fs/file- > max set to 49984 (using the same script to launch varnish as > aforementioned), the result was relatively the same: > > 290 N worker threads > 290 N worker threads created > 8705 N worker threads not created > 409188 N worker threads limited > HTH. > > -Ray > > _______________________________________________ > varnish-dev mailing list > varnish-dev at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From tical.net at gmail.com Mon Apr 13 05:57:59 2009 From: tical.net at gmail.com (Ray Barnes) Date: Mon, 13 Apr 2009 01:57:59 -0400 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: References: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> Message-ID: On Sat, Apr 11, 2009 at 5:48 PM, Artur Bergman wrote: > I've never seen it do worker threads not created. > Are there any limits on number of threads? > Apparently there are; thanks for pointing me in the right direction. I found a C program that attempts to spawn threads and lets you know at what point it hits an error - http://people.redhat.com/alikins/tuning_utils/thread-limit.c - it reports that I can't open more than 383 threads. The question is why. 
Here's what I've done thus far:

1) Recompiled glibc per http://people.redhat.com/alikins/system_tuning.html#threads - the definition of PTHREAD_THREADS_MAX is tied to the value in /usr/include/linux/limits.h, so I adjusted that value, installed the source RPM, rebuilt all glibc RPMs, and installed using 'rpm -Uvh --force' to overcome pre/post installation errors within the RPM (hopefully that did what it was supposed to).

2) Set /proc/sys/kernel/threads-max to 65535 (was 3000ish before); no change.

3) Set /etc/security/limits.conf to "* soft nofile 1024" and "* hard nofile 10240" and added "session required /lib/security/pam_limits.so" to /etc/pam.d/login, per the advice at http://www.mail-archive.com/java-linux at java.blackdown.org/msg15247.html where the poster indicates he did not have to recompile glibc to do this; no change.

I've tried the same C program above on a few other Linux boxes and they all seem to allow somewhere between 200 and 383 threads. The first obvious solution would be to dump Linux and use FBSD - a direction I'll look into in the future. But for now we're stuck on Linux. Any ideas?

-Ray

From rafael.umann at terra.com.br Mon Apr 13 17:53:11 2009 From: rafael.umann at terra.com.br (Rafael Umann) Date: Mon, 13 Apr 2009 14:53:11 -0300 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: References: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> Message-ID: Take a look at your FDs: (linux)

# cat /proc/sys/fs/file-nr
11730 0 5049800

Varnish works with a limit of 65k file descriptors. Anything above that will be a problem.

http://varnish.projects.linpro.no/changeset/3631

If you are hitting 65k FDs, that's the same problem we hit.

Another tip: if you are running on a 32-bit system, that's your problem!
[]s,

From tical.net at gmail.com Mon Apr 13 18:52:45 2009 From: tical.net at gmail.com (Ray Barnes) Date: Mon, 13 Apr 2009 14:52:45 -0400 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: References: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> Message-ID: On Mon, Apr 13, 2009 at 1:53 PM, Rafael Umann wrote: > Take a look at you FDs: (linux) > # cat /proc/sys/fs/file-nr > 11730 0 5049800 > > Varnish works with a limit of 65k file descriptors. Anything above that > will be a problem. > > http://varnish.projects.linpro.no/changeset/3631 > > If you are getting 65k FD`S we hit the same problem. > Thanks for the reply. I'm barely reaching 1300:

[root at vpsbox-mia ~]# cat /proc/sys/fs/file-nr
1344 0 106235
[root at vpsbox-mia ~]#

> Another tip: if you are running on a 32bits system, thats your problem! How does a 32-bit architecture restrict me from creating more than 380 threads per process?

-Ray

From rafael.umann at terra.com.br Mon Apr 13 19:16:15 2009 From: rafael.umann at terra.com.br (Rafael Umann) Date: Mon, 13 Apr 2009 16:16:15 -0300 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: References: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> Message-ID: 32 bits restricts you from using more than ~2.5gb of ram.
Try decreasing your cache size to see if you can open more threads (allocate memory for threads instead of using it all for cache) and also set the stack size smaller:

# vi /etc/security/limits.conf

* soft stack 512
* hard stack 512

(use 512kb of mem per thread) or

* soft stack 1024
* hard stack 1024

(use 1mb of mem per thread)

[]s, Rafael Umann

From tical.net at gmail.com Mon Apr 13 19:32:41 2009 From: tical.net at gmail.com (Ray Barnes) Date: Mon, 13 Apr 2009 15:32:41 -0400 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: References: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> Message-ID: On Mon, Apr 13, 2009 at 3:16 PM, Rafael Umann wrote: > 32 bits restrict you to use more than ~2.5gb of ram.
> I'm sure you know about PAE kernels, so I'm assuming there is some other artificial limit at 2.5GB, like SHM space maybe?

> > Try decreasing your cache size to see if you can open more threads > (allocate memory for threads instead of using it all for cache) and also set > the stack size smaller: > > # vi /etc/security/limits.conf > > * soft stack 512 > * hard stack 512 > > (use 512kb of mem per thread) > or > > * soft stack 1024 > * hard stack 1024 > > (use 1mb of mem per thread)

> I tried 512kb, then logging off and back on, then starting varnish with 800 threads requested. Same result:

356 N worker threads
356 N worker threads created
181 N worker threads not created

Note that 356 + 181 is not 800. It actually did not do this initially; it said 201 worker threads and 840 created (it always does strange things like this when I try creating more threads than the box can handle). And the program that spawns threads still tells me 383 is the max it can make.

-Ray

From rafael.umann at terra.com.br Tue Apr 14 18:19:29 2009 From: rafael.umann at terra.com.br (Rafael Umann) Date: Tue, 14 Apr 2009 15:19:29 -0300 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: References: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> Message-ID: <8DF33129-E6ED-4C0F-8CBB-13809421F811@terra.com.br> What about the cache size? Have you decreased it?

Try running varnish with:

# varnishd -f /etc/varnish/default.vcl \
-a 0.0.0.0:80 \
-s file,/var/lib/varnish/varnish_storage.bin,50M \
-T 0.0.0.0:6082 \
-u varnish \
-g varnish \
-w 500,500,120 \
-p lru_interval=900 \
-p thread_pools=1 \
-P /var/run/varnish/varnish.pid \
-F

[]s,

On Apr 13, 2009, at 4:32 PM, Ray Barnes wrote: > On Mon, Apr 13, 2009 at 3:16 PM, Rafael Umann > wrote: > 32 bits restrict you to use more than ~2.5gb of ram.
From tical.net at gmail.com Tue Apr 14 18:31:31 2009 From: tical.net at gmail.com (Ray Barnes) Date: Tue, 14 Apr 2009 14:31:31 -0400 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: <8DF33129-E6ED-4C0F-8CBB-13809421F811@terra.com.br> References: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> <8DF33129-E6ED-4C0F-8CBB-13809421F811@terra.com.br> Message-ID: Thanks for the reply.
With those settings, same result:

192 N worker threads
284 N worker threads created
21 N worker threads not created

Again, the issue is apparently that the _operating system_ does not let me create more than 300ish threads.

-Ray

On Tue, Apr 14, 2009 at 2:19 PM, Rafael Umann wrote: > > What about the cache size? have you decresead it? > > Try running varnish with: > > # varnishd -f /etc/varnish/default.vcl \ > -a 0.0.0.0:80 \ > -s file,/var/lib/varnish/varnish_storage.bin,50M \ > -T 0.0.0.0:6082 \ > -u varnish \ > -g varnish \ > -w 500,500,120 \ > -p lru_interval=900 \ > -p thread_pools=1 \ > -P /var/run/varnish/varnish.pid \ > -F" > > []s,

From tical.net at gmail.com Tue Apr 14 19:10:48 2009 From: tical.net at gmail.com (Ray Barnes) Date: Tue, 14 Apr 2009 15:10:48 -0400 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: <20090414185748.GQ87733@iwin.com> References: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> <8DF33129-E6ED-4C0F-8CBB-13809421F811@terra.com.br> <20090414185748.GQ87733@iwin.com> Message-ID: Bret, Thanks for the reply - that appears to put us (perhaps) a little closer. I'm assuming for the moment that when you said "init script" you meant the script I use to bring up varnish, not the script that boots the box per /etc/inittab. When I specify "ulimit -s 1024" it did not change the net result of varnish's inability to create threads:

192 N worker threads
424 N worker threads created
62 N worker threads not created

However, the C program I posted previously in this discussion was able to create 3055 threads. Hope that helps.

-Ray

On Tue, Apr 14, 2009 at 2:57 PM, Bret A. Barker wrote: > Try a "ulimit -s 1024" in your init script. Definitely sounds like a > problem with thread stack size defaulting to 8192. > > -bret > > On Tue, Apr 14, 2009 at 02:31:31PM -0400, Ray Barnes wrote: > > Thanks for the reply. With those settings, same result: > > > > 192 N worker threads > > 284 N worker threads created > > 21 N worker threads not created > > Again, the issue is apparently that the _operating system_ does not let me > > create more than 300ish threads.
> > -Ray

From bret at iwin.com Tue Apr 14 18:57:49 2009 From: bret at iwin.com (Bret A. Barker) Date: Tue, 14 Apr 2009 14:57:49 -0400 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: References: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> <8DF33129-E6ED-4C0F-8CBB-13809421F811@terra.com.br> Message-ID: <20090414185748.GQ87733@iwin.com> Try a "ulimit -s 1024" in your init script. Definitely sounds like a problem with thread stack size defaulting to 8192. -bret

From des at des.no Wed Apr 15 01:28:19 2009 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Wed, 15 Apr 2009 03:28:19 +0200 Subject: Bug? Barage of hits leads to failure creating worker threads / stats tracking In-Reply-To: (Ray Barnes's message of "Mon, 13 Apr 2009 15:32:41 -0400") References: <26961DB5-45E9-4D34-B958-D08A5B070E9B@twitter.com> Message-ID: <86fxgad7t8.fsf@ds4.des.no> Ray Barnes writes: > Rafael Umann writes: > > 32 bits restrict you to use more than ~2.5gb of ram. > I'm sure you know about PAE kernels, so I'm assuming there is some other artificial limit at 2.5GB, like SHM space maybe? PAE is irrelevant. The address space of each process is still limited to somewhere between 2 and 3 GB, depending on the OS. DES -- Dag-Erling Smørgrav - des at des.no

From emil.isberg at gmail.com Mon Apr 20 12:04:21 2009 From: emil.isberg at gmail.com (Emil Isberg) Date: Mon, 20 Apr 2009 14:04:21 +0200 Subject: Patch adding limited Apache LogFormat support to varnishncsa Message-ID: Hi, The future planning notes "Specification of custom formats (like apache's % notation)?", and I needed something along those lines: a format adaptable to my specific Apache configuration. So I added limited support for the parts I needed most. Basically I have just added a string that is parsed for each logged row. I didn't find a way to submit it to trac. Please check the attached diff. Best regards Emil -------------- next part -------------- A non-text attachment was scrubbed...
Name: varnishncsa-logformat.diff Type: application/octet-stream Size: 7385 bytes Desc: not available URL: From yang at knownsec.com Sat Apr 25 05:00:41 2009 From: yang at knownsec.com (jilong yang) Date: Sat, 25 Apr 2009 13:00:41 +0800 Subject: how can I debug varnishd ? In-Reply-To: References: Message-ID: When I use gdb to debug varnishd, after I set a break or hbreak, varnishd receives SIGQUIT and then exits. That means I can't debug it. Why? ubuntu at ubuntu:~$ gdb /opt/varnish/sbin/varnishd 7469 GNU gdb 6.8-debian Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i486-linux-gnu"... Attaching to program: /opt/varnish/sbin/varnishd, process 7469 Error while mapping shared library sections: ./vcl.1P9zoqAU.so: No such file or directory. Reading symbols from /opt/varnish/lib/libvarnish.so.1...done. Loaded symbols for /opt/varnish/lib/libvarnish.so.1 Reading symbols from /lib/tls/i686/cmov/librt.so.1...done. Loaded symbols for /lib/tls/i686/cmov/librt.so.1 Reading symbols from /opt/varnish/lib/libvarnishcompat.so.1...done. Loaded symbols for /opt/varnish/lib/libvarnishcompat.so.1 Reading symbols from /opt/varnish/lib/libvcl.so.1...done. Loaded symbols for /opt/varnish/lib/libvcl.so.1 Reading symbols from /lib/tls/i686/cmov/libdl.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libdl.so.2 Reading symbols from /lib/tls/i686/cmov/libpthread.so.0...done.
[Thread debugging using libthread_db enabled] [New Thread 0xb7d3c6b0 (LWP 7469)] [New Thread 0x2f8fab90 (LWP 7476)] [New Thread 0x300fbb90 (LWP 7475)] [New Thread 0x308fcb90 (LWP 7474)] [New Thread 0xb13ebb90 (LWP 7473)] [New Thread 0xb1becb90 (LWP 7472)] [New Thread 0xb23edb90 (LWP 7471)] [New Thread 0xb2beeb90 (LWP 7470)] Loaded symbols for /lib/tls/i686/cmov/libpthread.so.0 Reading symbols from /lib/tls/i686/cmov/libnsl.so.1...done. Loaded symbols for /lib/tls/i686/cmov/libnsl.so.1 Reading symbols from /lib/tls/i686/cmov/libm.so.6...done. Loaded symbols for /lib/tls/i686/cmov/libm.so.6 Reading symbols from /lib/tls/i686/cmov/libc.so.6...done. Loaded symbols for /lib/tls/i686/cmov/libc.so.6 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/tls/i686/cmov/libnss_compat.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libnss_compat.so.2 Reading symbols from /lib/tls/i686/cmov/libnss_nis.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libnss_nis.so.2 Reading symbols from /lib/tls/i686/cmov/libnss_files.so.2...done. Loaded symbols for /lib/tls/i686/cmov/libnss_files.so.2 Symbol file not found for ./vcl.1P9zoqAU.so 0xb7f24410 in __kernel_vsyscall () (gdb) break printf <----------here I set break Breakpoint 1 at 0xb7d84334 (gdb) c <--------- I continue Continuing. Program received signal SIGQUIT, Quit. <-----------then the sig quit [Switching to Thread 0xb2beeb90 (LWP 7470)] 0xb7f24410 in __kernel_vsyscall () (gdb) c Continuing. Program terminated with signal SIGQUIT, Quit. The program no longer exists. (gdb) c The program is not being run. (gdb) -------------- next part -------------- An HTML attachment was scrubbed... URL: From yang at knownsec.com Sat Apr 25 07:38:36 2009 From: yang at knownsec.com (jilong yang) Date: Sat, 25 Apr 2009 15:38:36 +0800 Subject: how can I debug varnishd ? 
In-Reply-To: <43685D7B-2493-4501-A974-12E77038AB91@mosso.com>
References: <43685D7B-2493-4501-A974-12E77038AB91@mosso.com>
Message-ID: 

Thanks very much!

I want to check the query string, and for GET requests I can do that in
VCL. I would like to check POST data as well. How can I do that? I see
that the POST data follows the HTTP headers in sp->http0->ws, and that
it is also in htc->rxbuf->b. Can I do the POST data check in HTC_Rx?

2009/4/25 Adrian Otto 

> Jilong Yang,
>
> Set ping_interval to 0 to disable it, or set it to a really high number.
> When you stop the child process, the parent process no longer gets ping
> responses from it (it checks by default every 3 seconds), so it tries to
> help by killing off the child and making a new one. It does not know you
> have it in a debugger.
>
> Regards,
>
> Adrian
>
> On Apr 24, 2009, at 10:00 PM, jilong yang wrote:
>
> > When I use gdb to debug varnishd, after I set a break or hbreak,
> > varnishd gets SIGQUIT and then exits, so I can't debug it. Why?
> >
> > [...]
>
> _______________________________________________
> varnish-dev mailing list
> varnish-dev at projects.linpro.no
> http://projects.linpro.no/mailman/listinfo/varnish-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
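
[Editor's note: the workaround Adrian describes can be sketched as a short
shell session. This is an illustrative sketch only: the install prefix,
VCL path, listen and admin addresses, and storage settings are
placeholders, not values taken from this thread; ping_interval is the
parameter Adrian names, and raising it to a large value works as an
alternative to disabling it.]

```
# Start varnishd with the management process's CLI ping disabled, so it
# will not kill a child that is sitting at a breakpoint under gdb.
# (All paths, addresses, and sizes below are placeholders.)
/opt/varnish/sbin/varnishd \
    -f /etc/varnish/default.vcl \
    -a :8080 \
    -T 127.0.0.1:6082 \
    -s file,/var/lib/varnish/storage.bin,1G \
    -p ping_interval=0

# varnishd runs as a management (parent) and a worker (child) process.
# Request handling happens in the child, so attach gdb to the child's
# PID -- the varnishd whose parent PID is the other varnishd:
ps -o pid,ppid,cmd -C varnishd
gdb /opt/varnish/sbin/varnishd <child-pid>
```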