From phk at phk.freebsd.dk Mon Aug 3 20:06:03 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Mon, 03 Aug 2009 20:06:03 +0000 Subject: Welcome back from vacation! Message-ID: <74492.1249329963@critter.freebsd.dk> I'm packing up in the beach-house and will be headed back to civilization tomorrow, after which normality will hopefully materialize over a matter of days. Some things you may or may not want to know: First: my inbox needs a vacation, and I will try to do something more sensible than rm -rf to it, but I make no promises. Once it gets above 200-300 emails, I get pretty ruthless with the 'd' button. If you are waiting for my reply, and have not received it by next week, send the question again; don't just send "are you going to reply to my email". Second: My main varnish priority right now is to get -spersistence done, so we can release the first release with it. It should be workable in -trunk, please help test. Third: A ticket sweep will be done as soon as we find a timeslot; expect feedback, questions and, if you draw the lucky number, a fix or two. Fourth: Nice People are trying to arrange a "Varnish User Group" meeting, in London in September. Details will be announced when we have them. The intent is simply to gather some users and the developers and talk about varnish. If we feel it is a success, we will try to cajole somebody into arranging the next meeting. Fifth: I am pondering a plan for raising some (more) money for varnish development. Right now, I am limited to spending 30% of my time on varnish because of weird tax/work-treaties between the Nordic countries. If your company makes a fortune using varnish, be prepared to be delicately mugged for a small contribution. More on that later. I guess that is it... Poul-Henning -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From des at des.no Mon Aug 3 20:39:10 2009 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Mon, 03 Aug 2009 22:39:10 +0200 Subject: Welcome back from vacation! In-Reply-To: <74492.1249329963@critter.freebsd.dk> (Poul-Henning Kamp's message of "Mon, 03 Aug 2009 20:06:03 +0000") References: <74492.1249329963@critter.freebsd.dk> Message-ID: <861vns639t.fsf@ds4.des.no> Poul-Henning Kamp writes: > Fourth: Nice People are trying to arrange a "Varnish User Group" > meeting, in London in September. Hmm, UK, September... Why not Cambridge? DES -- Dag-Erling Smørgrav - des at des.no From phk at phk.freebsd.dk Mon Aug 3 20:41:46 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Mon, 03 Aug 2009 20:41:46 +0000 Subject: Welcome back from vacation! In-Reply-To: Your message of "Mon, 03 Aug 2009 22:39:10 +0200." <861vns639t.fsf@ds4.des.no> Message-ID: <74722.1249332106@critter.freebsd.dk> In message <861vns639t.fsf at ds4.des.no>, =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= writes: >Poul-Henning Kamp writes: >> Fourth: Nice People are trying to arrange a "Varnish User Group" >> meeting, in London in September. > >Hmm, UK, September... Why not Cambridge? Because London is where the bloke who stuck his hand up thought he could do it. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
From des at des.no Mon Aug 3 20:44:03 2009 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Mon, 03 Aug 2009 22:44:03 +0200 Subject: Welcome back from vacation! In-Reply-To: <74722.1249332106@critter.freebsd.dk> (Poul-Henning Kamp's message of "Mon, 03 Aug 2009 20:41:46 +0000") References: <74722.1249332106@critter.freebsd.dk> Message-ID: <86ws5k4oh8.fsf@ds4.des.no> "Poul-Henning Kamp" writes: > "Dag-Erling Smørgrav" writes: > > Hmm, UK, September... Why not Cambridge? > Because London is where the bloke who stuck his hand up thought he > could do it. How about putting him in touch with rwatson@ and seeing if something can be arranged? DES -- Dag-Erling Smørgrav - des at des.no From rtshilston at gmail.com Tue Aug 4 07:39:58 2009 From: rtshilston at gmail.com (Rob S) Date: Tue, 04 Aug 2009 08:39:58 +0100 Subject: Welcome back from vacation! In-Reply-To: <86ws5k4oh8.fsf@ds4.des.no> References: <74722.1249332106@critter.freebsd.dk> <86ws5k4oh8.fsf@ds4.des.no> Message-ID: <4A77E5CE.9080004@gmail.com> Dag-Erling Smørgrav wrote: > "Poul-Henning Kamp" writes: > >> "Dag-Erling Smørgrav" writes: >> >>> Hmm, UK, September... Why not Cambridge? >>> >> Because London is where the bloke who stuck his hand up thought he >> could do it. >> > > How about putting him in touch with rwatson@ and seeing if something can be > arranged? > > DES > Morning all. I'm the brave person who raised my hand. Could anyone who thinks they'd like to assist (no matter how small) with the organisation send me an email over the next day or two to introduce yourself? I think the general flow of the organisation should be as follows: 1) Find people who might like to help 2) Write a quick questionnaire to go to the mailing lists asking people what they want for the conference 3) Review the answers, and get organising. To introduce myself: We selected varnish to provide load balancing and back-end-failure-tolerance on a hosting platform we run for a newspaper's website. Whilst Varnish operates well, there are a number of little problems we run into which I'm sure everyone else has encountered. A user-group meeting could help exchange experiences and answers between people. My ideas for the first user-group meeting - I think we could discuss any of the following: * General networking and chatting with other users over a drink or two. * The basics: An introduction to varnish * VCL tips and tricks: What cunning things have people done? * What have people achieved using inline C? * Consider spinning off a quick session to discuss re-structuring the wiki to make things easier to find * A "Get problems off your chest" opportunity. Are there things that people have small issues with but which they've not raised tickets for, because they think they're minor? * An opportunity to thank phk in person for all the hard work he's put in. But, obviously, these are my ideas. What do other people want to do or discuss? Finally, I'm away from 29th Aug through 28th September. So, realistically, it'll have to be mid-late October. Rob From tfheen at redpill-linpro.com Tue Aug 4 11:45:28 2009 From: tfheen at redpill-linpro.com (Tollef Fog Heen) Date: Tue, 04 Aug 2009 13:45:28 +0200 Subject: Welcome back from vacation!
In-Reply-To: <86ws5k4oh8.fsf@ds4.des.no> ("Dag-Erling =?utf-8?Q?Sm=C3=B8rg?= =?utf-8?Q?rav=22's?= message of "Mon, 03 Aug 2009 22:44:03 +0200") References: <74722.1249332106@critter.freebsd.dk> <86ws5k4oh8.fsf@ds4.des.no> Message-ID: <87bpmviyzr.fsf@qurzaw.linpro.no> ]] Dag-Erling Smørgrav | "Poul-Henning Kamp" writes: | > "Dag-Erling Smørgrav" writes: | > > Hmm, UK, September... Why not Cambridge? | > Because London is where the bloke who stuck his hand up thought he | > could do it. | | How about putting him in touch with rwatson@ and seeing if something can be | arranged? We already have a place to host it in London, and getting to London is a bit easier than getting to Cambridge. Maybe next time. :-) -- Tollef Fog Heen Redpill Linpro -- Changing the game! t: +47 21 54 41 73 From des at des.no Tue Aug 4 17:23:55 2009 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Tue, 04 Aug 2009 19:23:55 +0200 Subject: Welcome back from vacation! In-Reply-To: <87bpmviyzr.fsf@qurzaw.linpro.no> (Tollef Fog Heen's message of "Tue, 04 Aug 2009 13:45:28 +0200") References: <74722.1249332106@critter.freebsd.dk> <86ws5k4oh8.fsf@ds4.des.no> <87bpmviyzr.fsf@qurzaw.linpro.no> Message-ID: <86d47b1oic.fsf@ds4.des.no> Tollef Fog Heen writes: > We already have a place to host it in London, and getting to London is a > bit easier than getting to Cambridge. Is it? Cambridge is 40 minutes by direct train from Stansted. > Maybe next time. :-) You've completely missed the point - there is a BSD conference in Cambridge this September where phk, myself, and several other Varnish users will be present. DES -- Dag-Erling Smørgrav - des at des.no From phk at phk.freebsd.dk Tue Aug 4 17:43:29 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Tue, 04 Aug 2009 17:43:29 +0000 Subject: Welcome back from vacation! In-Reply-To: Your message of "Tue, 04 Aug 2009 19:23:55 +0200." <86d47b1oic.fsf@ds4.des.no> Message-ID: <2828.1249407809@critter.freebsd.dk> DES, that point is well made, but I still think that it is a better idea to do it separately, but time-wise next to the EuroBSDcon. In particular this first time, where things are likely to be quite ad-hoc and random. Right now, it looks like it will be Sep 21-22, so anybody at EuroBSDcon should have minimal trouble getting there. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From tfheen at redpill-linpro.com Fri Aug 7 10:08:38 2009 From: tfheen at redpill-linpro.com (Tollef Fog Heen) Date: Fri, 07 Aug 2009 12:08:38 +0200 Subject: Varnish User Group Meeting 2009-09 Message-ID: <87bpms7x7d.fsf@qurzaw.linpro.no> Hi all, On September 21st and 22nd, the first Varnish User Group meeting will be held in Canonical Ltd's offices in Millbank Tower, London, UK. Please see http://varnish.projects.linpro.no/wiki/200909UserGroupMeeting for more information and for signing up. Please note that we're short on space, so only sign up if you'll actually be coming, but please also be quick to sign up. -- Tollef Fog Heen Redpill Linpro -- Changing the game!
t: +47 21 54 41 73 From l at lrowe.co.uk Wed Aug 12 14:41:01 2009 From: l at lrowe.co.uk (Laurence Rowe) Date: Wed, 12 Aug 2009 15:41:01 +0100 Subject: How many simultaneous users In-Reply-To: References: <20090715101402.GB9071@kjeks.getinternet.no> Message-ID: Remember to up the ulimit -n; with HTTP keep-alive you can quickly run into the Linux default of 1024 open files/sockets. Laurence 2009/7/16 Lazy : >>>> assuming that each user loads 40 files in 1 minute we get >>>> 12000*60/40=18 000 users per minute >>>> >>>> Is it possible to get half of that 18k users per minute in the real world, >>>> ignoring the amount of traffic it will generate? >>> >>> I'd say so, but it depends on how big the data set is. If you can store it >>> in memory, varnish is ridiculously fast. I also wouldn't recommend relying >> I think it will fit into RAM; it will be a single site >> >>> on a single Varnish for more than a few thousand requests per second. If >>> something goes wrong (suddenly getting lots of misses for instance), it >>> will quickly spread. >>> >>> For comparison, I'm looking at a box with roughly 0.4 load serving >>> 2000req/s as we speak, and that's on 2xDual Core Opteron 2212. Going by >>> those numbers, it should theoretically be able to handle almost ten times >>> as many requests if Varnish scaled as a straight line. >>> >>> That'd give you roughly 18000 req/s at peak (give or take a little...) Now >>> you're talking about 8 cores, that should be 36k req/s. That's _not_ >>> unrealistic, from what we've seen in synthetic tests. If each client >>> requires 40 items, that means roughly 900 clients _per second_. Or 54k in a >>> minute. This math is all rough estimates, but the foundation is production >>> sites and real traffic patterns. >>> >>> The problem is that getting your Varnish to deal with 36k req/s is rather >>> difficult, and you quickly run into network issues and similar. And at 36k >>> req/s you can hardly take any amount of backend traffic or delays before it >>> all falls over. > > Today I did some ad-hoc tests with a 45-byte body. > > When I enable keep-alive I'm getting 39k req/s with 100 concurrent > gets and over 40k with 300 concurrent connections (max CPU load was > under 2 cores for varnish). > > Without keep-alive I'm stuck with 12k req/s; that might be the end of ab's > performance in making new connections, or the > kernel. I tried the performance tips from the wiki, but it didn't make > a significant difference in this test. > Later I will try to use a benchmark running on another machine. > > 12k req/s is more than enough for me already, so I'm happy with that > > -- > Michal Grzedzicki > _______________________________________________ > varnish-misc mailing list > varnish-misc at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-misc > From jeremy at hinegardner.org Wed Aug 12 15:59:53 2009 From: jeremy at hinegardner.org (Jeremy Hinegardner) Date: Wed, 12 Aug 2009 09:59:53 -0600 Subject: abnormally high load? Message-ID: <20090812155953.GU17093@hinegardner.org> Hi all, I'm trying to figure out if this is a normal situation or not. We have a varnish instance in front of 12 Tokyo Tyrant instances with some inline C in the VCL to determine which backend to talk to. Sporadically during the day the load on the dedicated varnish machine will spike up into the 150-200 range and the response times from varnish degrade by an order of magnitude.
This is an 8CPU CentOS 5.3 machine with 8GB of RAM and the varnish commandline is: varnishd -a :6081 \ -T localhost:6082 \ -f /etc/varnish/default.vcl \ -w 8,512 \ -u varnish -g varnish \ -h classic,1000003 \ -s file,/var/lib/varnish/varnish_storage.bin,4G I could use any help on attempting to figure out why we get these spikes and ways to mitigate them. Anecdotally, these spikes appear to happen when the cache hit ratio is low. Generally we have between 30 and 250 clients hitting the varnish instance. The high load definitely happens when there are large numbers of clients, but it is not consistent. At other times when there are large numbers of clients, the load will be quite low. I've read through many of the past threads on tuning varnish for high load and attempted many of the configurations, yet none have really helped. The client activity we are dealing with is, on the high end, around 400 client programs hitting varnish with an average rate of, say, 30 requests/sec each. We have large numbers of backend objects, and the need for caching is a few hours tops. If there are other questions that I should be asking, or other information that anyone would like, just let me know. And I'm okay with being told that I'm using varnish in a way it wasn't meant to be used. enjoy, -jeremy -- ======================================================================== Jeremy Hinegardner jeremy at hinegardner.org From rtshilston at gmail.com Wed Aug 12 16:41:52 2009 From: rtshilston at gmail.com (Rob S) Date: Wed, 12 Aug 2009 17:41:52 +0100 Subject: abnormally high load? In-Reply-To: <20090812155953.GU17093@hinegardner.org> References: <20090812155953.GU17093@hinegardner.org> Message-ID: <4A82F0D0.5030503@gmail.com> Jeremy Hinegardner wrote: > Hi all, > > I'm trying to figure out if this is a normal situation or not. We have a > varnish instance in front of 12 Tokyo Tyrant instances with some inline C > in the VCL to determine which backend to talk to. > > If you restart varnish during one of these spikes, does it instantly disappear? I've seen this happen (though only spiking to about 12), and this is when Varnish has munched through far more memory than we've allocated to it. This problem is one I've been looking into with Ken Brownfield, and touches on http://projects.linpro.no/pipermail/varnish-misc/2009-April/002743.html and http://projects.linpro.no/pipermail/varnish-misc/2009-June/002840.html Do any of these tie up with your experience? Rob > From kristian at redpill-linpro.com Wed Aug 12 17:34:57 2009 From: kristian at redpill-linpro.com (Kristian Lyngstol) Date: Wed, 12 Aug 2009 19:34:57 +0200 Subject: abnormally high load? In-Reply-To: <20090812155953.GU17093@hinegardner.org> References: <20090812155953.GU17093@hinegardner.org> Message-ID: <20090812173457.GB4896@kjeks.linpro.no> On Wed, Aug 12, 2009 at 09:59:53AM -0600, Jeremy Hinegardner wrote: > I'm trying to figure out if this is a normal situation or not. We have a > varnish instance in front of 12 Tokyo Tyrant instances with some inline C > in the VCL to determine which backend to talk to. How does your VCL look? > Sporadically during the day the load on the dedicated varnish machine will spike > up into the 150-200 range and the response times from varnish degrade by an order > of magnitude.
> This is an 8CPU CentOS 5.3 machine > and the varnish commandline is: > > varnishd -a :6081 \ > -T localhost:6082 \ > -f /etc/varnish/default.vcl \ > -w 8,512 \ > -u varnish -g varnish \ > -h classic,1000003 \ > -s file,/var/lib/varnish/varnish_storage.bin,4G I can instantly tell you that you have too few threads. Generally, you want to set your thread_pool_min to something reasonably high compared to your normal client load. You can also set thread_pools to 8, as you have 8 CPU cores (though I doubt that's the issue). Start with something like -w 200,2000; this should give you a reasonable number of threads. Idle threads are virtually cost-free, while creating new threads is expensive (given the premise that you'll be doing it when you are most busy). Secondly, you should watch your varnishstat and check your syslog. You are looking for overflowed work requests and dropped work requests in varnishstat, and you are looking for assert-errors in your syslog. You should also monitor the backend_connection_failures counter, which should always be low. > I could use any help on attempting to figure out why we get these spikes and > ways to mitigate them. Anecdotally, these spikes appear to happen when the > cache hit ratio is low. What sort of cache hit ratio are we talking about? And what sort of request rate (in requests/second)? > I've read through many of the past threads on tuning varnish for high load and > attempted many of the configurations, yet none have really helped. > > The client activity we are dealing with is, on the high end, around 400 client > programs hitting varnish with an average rate of, say, 30 requests/sec each. This number sounds a bit high. 400 clients with 30 req/s will lead to 12000 req/s. Is this really what varnishstat is telling you too? In that case, your cache hit rate will be crucial. It could also be useful to see the entire output of varnishstat after Varnish has been running for a while, to get a general idea of how varnish is behaving in your setup. -- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From jeremy at hinegardner.org Wed Aug 12 18:08:46 2009 From: jeremy at hinegardner.org (Jeremy Hinegardner) Date: Wed, 12 Aug 2009 12:08:46 -0600 Subject: abnormally high load? In-Reply-To: <4A82F0D0.5030503@gmail.com> References: <20090812155953.GU17093@hinegardner.org> <4A82F0D0.5030503@gmail.com> Message-ID: <20090812180846.GV17093@hinegardner.org> On Wed, Aug 12, 2009 at 05:41:52PM +0100, Rob S wrote: > Jeremy Hinegardner wrote: >> Hi all, >> >> I'm trying to figure out if this is a normal situation or not. We have a >> varnish instance in front of 12 Tokyo Tyrant instances with some inline C >> in the VCL to determine which backend to talk to. >> >> > > If you restart varnish during one of these spikes, does it instantly > disappear? I've seen this happen (though only spiking to about 12), and > this is when Varnish has munched through far more memory than we've > allocated to it. This problem is one I've been looking into with Ken > Brownfield, and touches on > http://projects.linpro.no/pipermail/varnish-misc/2009-April/002743.html and > http://projects.linpro.no/pipermail/varnish-misc/2009-June/002840.html > > Do any of these tie up with your experience?
Possibly; the correlation I can see with those instances is this section of our VCL: sub vcl_recv { ... } else if ( req.request == "PUT" || req.request == "PURGE" ) { purge( "req.url == " req.url ); if ( req.request == "PUT" ) { pass; } else { error 200 "PURGE Success"; } } ... } We do a consistent stream of PUT operations, it's probably 10-15% of all our operations. So our ban list would get fairly large, I'm guessing? I'm not seeing evidence of a memory leak, and the pmap of the process does show 4G in the varnish_storage.bin mapping. I've attached the output of 'varnishstat -1' if that helps. This is after I've diverted some traffic around varnish because of the load. If this purge() is the culprit, then should I make this change? sub vcl_recv { ... } else if ( req.request == "PUT" || req.request == "PURGE" ) { lookup; } ... } sub vcl_hit { if ( req.request == "PUT" || req.request == "PURGE" ) { set obj.ttl = 0s; if ( req.request == "PURGE" ) { error 200 "PURGE Success"; } pass; } } sub vcl_miss { if ( req.request == "PUT" ) { pass; } if ( req.request == "PURGE") { error 404 "Not in cache."; } } enjoy, -jeremy -- ======================================================================== Jeremy Hinegardner jeremy at hinegardner.org -------------- next part -------------- uptime 69385 . Child uptime client_conn 29347846 422.97 Client connections accepted client_req 29342148 422.89 Client requests received cache_hit 15094822 217.55 Cache hits cache_hitpass 0 0.00 Cache hits for pass cache_miss 17210495 248.04 Cache misses backend_conn 23663670 341.05 Backend connections success backend_unhealthy 0 0.00 Backend connections not attempted backend_busy 0 0.00 Backend connections too many backend_fail 0 0.00 Backend connections failures backend_reuse 23663248 341.04 Backend connections reuses backend_recycle 23663599 341.05 Backend connections recycles backend_unused 0 0.00 Backend connections unused n_srcaddr 4 . N struct srcaddr n_srcaddr_act 1 . N active struct srcaddr n_sess_mem 537 . N struct sess_mem n_sess 3752 . N struct sess n_object 314256 . N struct object n_objecthead 314769 . N struct objecthead n_smf 647386 . N struct smf n_smf_frag 22063 . N small free smf n_smf_large 4 . N large free smf n_vbe_conn 377 . N struct vbe_conn n_bereq 175 . N struct bereq n_wrk 45 . N worker threads n_wrk_create 2659 0.04 N worker threads created n_wrk_failed 0 0.00 N worker threads not created n_wrk_max 397529 5.73 N worker threads limited n_wrk_queue 0 0.00 N queued work requests n_wrk_overflow 121342 1.75 N overflowed work requests n_wrk_drop 0 0.00 N dropped work requests n_backend 12 . N backends n_expired 270565 . N expired objects n_lru_nuked 4241197 . N LRU nuked objects n_lru_saved 0 . N LRU saved objects n_lru_moved 15005184 . N LRU moved objects n_deathrow 0 .
N objects on deathrow losthdr 0 0.00 HTTP header overflows n_objsendfile 0 0.00 Objects sent with sendfile n_objwrite 29343483 422.91 Objects sent with write n_objoverflow 0 0.00 Objects overflowing workspace s_sess 29347845 422.97 Total Sessions s_req 29346604 422.95 Total Requests s_pipe 0 0.00 Total pipe s_pass 6482264 93.42 Total pass s_fetch 10818028 155.91 Total fetch s_hdrbytes 5839972531 84167.65 Total header bytes s_bodybytes 26762607898 385711.72 Total body bytes sess_closed 29347707 422.97 Session Closed sess_pipeline 0 0.00 Session Pipeline sess_readahead 0 0.00 Session Read Ahead sess_linger 0 0.00 Session Linger sess_herd 1 0.00 Session herd shm_records 1586983473 22872.14 SHM records shm_writes 170510686 2457.46 SHM writes shm_flushes 27352 0.39 SHM flushes due to overflow shm_cont 1049297 15.12 SHM MTX contention shm_cycles 446 0.01 SHM cycles through buffer sm_nreq 55032175 793.14 allocator requests sm_nobj 625319 . outstanding allocations sm_balloc 3887177728 . bytes allocated sm_bfree 407789568 . bytes free sma_nreq 0 0.00 SMA allocator requests sma_nobj 0 . SMA outstanding allocations sma_nbytes 0 . SMA outstanding bytes sma_balloc 0 . SMA bytes allocated sma_bfree 0 . SMA bytes free sms_nreq 3432449 49.47 SMS allocator requests sms_nobj 0 . SMS outstanding allocations sms_nbytes 18446744073709543408 . SMS outstanding bytes sms_balloc 1505666175 . SMS bytes allocated sms_bfree 1505674383 . SMS bytes freed backend_req 23663973 341.05 Backend requests made n_vcl 3 0.00 N vcl total n_vcl_avail 3 0.00 N vcl available n_vcl_discard 0 0.00 N vcl discarded n_purge 490689 . N total active purges n_purge_add 5083838 73.27 N new purges added n_purge_retire 4593149 66.20 N old purges deleted n_purge_obj_test 15296959 220.46 N objects tested n_purge_re_test 249073265875 3589727.84 N regexps tested against n_purge_dups 0 0.00 N duplicate purges removed hcb_nolock 0 0.00 HCB Lookups without lock hcb_lock 0 0.00 HCB Lookups with lock hcb_insert 0 0.00 HCB Inserts esi_parse 0 0.00 Objects ESI parsed (unlock) esi_errors 0 0.00 ESI parse errors (unlock) From kb+varnish at slide.com Wed Aug 12 19:25:11 2009 From: kb+varnish at slide.com (Ken Brownfield) Date: Wed, 12 Aug 2009 12:25:11 -0700 Subject: abnormally high load? In-Reply-To: <20090812180846.GV17093@hinegardner.org> References: <20090812155953.GU17093@hinegardner.org> <4A82F0D0.5030503@gmail.com> <20090812180846.GV17093@hinegardner.org> Message-ID: My first guess is that you're seeing varnish spawn a lot of threads because your back-end isn't keeping up with the miss rate. My second guess is that these misses are large files that are taking a long time for clients to download, therefore piling up active client connections (and thus worker threads). I'm guessing your load is going high because you're swapping? In top, are your CPUs saturated, or fairly idle? If you're seeing CPU saturation, this is possibly an internal Varnish issue. Your VCL seems sane, but we haven't seen the inline C. It's a fact of life that you may periodically need more back-end or worker threads than you would normally see. If you're on Linux (at least), each of those threads will use 8MB of RAM (the default stack size) which adds up quickly. You can reduce the thread stack size to dramatically decrease how much memory Varnish uses as it scales threads. 
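(To put rough numbers on that, assuming the 8MB Linux default mentioned above: 1,000 worker threads can reserve on the order of 1,000 x 8MB = ~8GB of stack address space, while the same 1,000 threads at a 256KB stack reserve only ~250MB.)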
We run a patch here that adds a startup parameter to change the stack size of backend and worker pthreads, but you could emulate this by reducing stack size before running Varnish: ulimit -s 256 or limit stacksize 256 We run pretty heavy traffic (including inline C) with 256KB stack with no problem. This adjustment should decrease memory usage as thread counts increase, and if you're swapping it might help alleviate the spikes. But if the spikes are due to a slow backend, that's probably where I'd look first. Hope it helps, -- Ken On Aug 12, 2009, at 11:08 AM, Jeremy Hinegardner wrote: > On Wed, Aug 12, 2009 at 05:41:52PM +0100, Rob S wrote: >> Jeremy Hinegardner wrote: >>> Hi all, >>> >>> I'm trying to figure out if this is a normal situation or not. We >>> have a >>> varnish instance in front of 12 tokyo tyrant instances with some >>> inline C >>> in the VCL to determine which backend to talk to. >>> >>> >> >> If you restart varnish during one of these spikes, does it instantly >> disappear? I've seen this happen (though only spiking to about >> 12), and >> this is when Varnish has munched through far more memory than we've >> allocated it. This problem is one I've been looking into with Ken >> Brownfield, and touches on >> http://projects.linpro.no/pipermail/varnish-misc/2009-April/002743.html >> and >> http://projects.linpro.no/pipermail/varnish-misc/2009-June/ >> 002840.html >> >> Do any of these tie up with your experience? > > Possibly, the correlation I can see with those instances is this > section of our > VCL > > sub vcl_recv { > ... > } else if ( req.request == "PUT" || req.request == "PURGE" ) { > purge( "req.url == " req.url ); > if ( req.request == "PUT" ) { > pass; > } else { > error 200 "PURGE Success"; > } > } > ... > } > > We do a consistent stream of PUT operations, its probably 10-15% of > all > our operations. So our ban list would get farily large I'm guessing? > > I'm not seeing evidence of a memory leak, and the pmap of the > process does show > 4G in the varnish_storage.bin mapping. > > I've attached the output of 'varnishstat -1' if that helps. This is > after I've > diverted some traffic around varnish because of the load. > > If this purge() is the culprit, then I should make this change? > > sub vcl_recv { > ... > } else if ( req.request == "PUT" || req.request == "PURGE" ) { > lookup; > } > ... > } > > sub vcl_hit { > if ( req.request == "PUT" || req.request == "PURGE" ) { > set obj.ttl = 0s; > if ( req.request == "PURGE" ) { > error 200 "PURGE Success"; > } > pass; > } > } > > sub vcl_miss { > if ( req.request == "PUT" ) { > pass; > } > if ( req.request == "PURGE") { > error 404 "Not in cache."; > } > } > > enjoy, > > -jeremy > > -- > = > = > ====================================================================== > Jeremy Hinegardner jeremy at hinegardner.org > > _______________________________________________ > varnish-misc mailing list > varnish-misc at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-misc From jeremy at hinegardner.org Wed Aug 12 21:32:00 2009 From: jeremy at hinegardner.org (Jeremy Hinegardner) Date: Wed, 12 Aug 2009 15:32:00 -0600 Subject: abnormally high load? 
In-Reply-To: References: <20090812155953.GU17093@hinegardner.org> <4A82F0D0.5030503@gmail.com> <20090812180846.GV17093@hinegardner.org> Message-ID: <20090812213200.GW17093@hinegardner.org> On Wed, Aug 12, 2009 at 12:25:11PM -0700, Ken Brownfield wrote: > My first guess is that you're seeing varnish spawn a lot of threads because > your back-end isn't keeping up with the miss rate. My second guess is that > these misses are large files that are taking a long time for clients to > download, therefore piling up active client connections (and thus worker > threads). On the backends, I'm seeing extremely idle systems, and for a short spurt last night I changed the config to just 'pass' for everything and the backends were able to mostly handle the load, and I may be able to tune them to handle all of the load. That may be a route I take: just use varnish to determine the backend for each request and pass it all through. The vast majority of our files are in the 1K->4K range; we do get a few on very rare occasion that are 1M or so. All of the machines involved here are on the same gigabit LAN. > I'm guessing your load is going high because you're swapping? In top, are > your CPUs saturated, or fairly idle? The CPUs are saturated, with 0 swap, and virtually 0 iowait. > If you're seeing CPU saturation, this is possibly an internal Varnish > issue. Your VCL seems sane, but we haven't seen the inline C. I could very well be breaking things with our VCL and inline C; here's our complete varnish configuration, inline C, and the small lib the inline C uses: http://gist.github.com/166767 If this is bending varnish in a manner that it shouldn't be, feel free to shake your head and tell me 'how could you do something like that'. So far, it is working well, except for this load issue. > Hope it helps, Yup, so far every comment has helped some. I have a few things I'm going to change today and will see if it alleviates the issue. enjoy, -jeremy > > On Aug 12, 2009, at 11:08 AM, Jeremy Hinegardner wrote: > >> On Wed, Aug 12, 2009 at 05:41:52PM +0100, Rob S wrote: >>> Jeremy Hinegardner wrote: >>>> Hi all, >>>> >>>> I'm trying to figure out if this is a normal situation or not. We have a >>>> varnish instance in front of 12 Tokyo Tyrant instances with some inline C >>>> in the VCL to determine which backend to talk to. >>>> >>>> >>> >>> If you restart varnish during one of these spikes, does it instantly >>> disappear? I've seen this happen (though only spiking to about 12), and >>> this is when Varnish has munched through far more memory than we've >>> allocated to it. This problem is one I've been looking into with Ken >>> Brownfield, and touches on >>> http://projects.linpro.no/pipermail/varnish-misc/2009-April/002743.html >>> and >>> http://projects.linpro.no/pipermail/varnish-misc/2009-June/002840.html >>> >>> Do any of these tie up with your experience? >> >> Possibly, the correlation I can see with those instances is this section >> of our >> VCL >> >> sub vcl_recv { >> ... >> } else if ( req.request == "PUT" || req.request == "PURGE" ) { >> purge( "req.url == " req.url ); >> if ( req.request == "PUT" ) { >> pass; >> } else { >> error 200 "PURGE Success"; >> } >> } >> ... >> } >> >> We do a consistent stream of PUT operations, it's probably 10-15% of all >> our operations. So our ban list would get fairly large, I'm guessing? >> >> I'm not seeing evidence of a memory leak, and the pmap of the process does >> show >> 4G in the varnish_storage.bin mapping.
>> >> I've attached the output of 'varnishstat -1' if that helps. This is after >> I've >> diverted some traffic around varnish because of the load. >> >> If this purge() is the culprit, then should I make this change? >> >> sub vcl_recv { >> ... >> } else if ( req.request == "PUT" || req.request == "PURGE" ) { >> lookup; >> } >> ... >> } >> >> sub vcl_hit { >> if ( req.request == "PUT" || req.request == "PURGE" ) { >> set obj.ttl = 0s; >> if ( req.request == "PURGE" ) { >> error 200 "PURGE Success"; >> } >> pass; >> } >> } >> >> sub vcl_miss { >> if ( req.request == "PUT" ) { >> pass; >> } >> if ( req.request == "PURGE") { >> error 404 "Not in cache."; >> } >> } >> >> enjoy, >> >> -jeremy >> >> -- >> ======================================================================== >> Jeremy Hinegardner jeremy at hinegardner.org >> >> _______________________________________________ >> varnish-misc mailing list >> varnish-misc at projects.linpro.no >> http://projects.linpro.no/mailman/listinfo/varnish-misc From moseleymark at gmail.com Thu Aug 13 00:38:10 2009 From: moseleymark at gmail.com (Mark Moseley) Date: Wed, 12 Aug 2009 17:38:10 -0700 Subject: abnormally high load? In-Reply-To: References: <20090812155953.GU17093@hinegardner.org> <4A82F0D0.5030503@gmail.com> <20090812180846.GV17093@hinegardner.org> Message-ID: <294d5daa0908121738g193a0fbl2142ab5c2e49a85f@mail.gmail.com> On Wed, Aug 12, 2009 at 12:25 PM, Ken Brownfield wrote: > My first guess is that you're seeing varnish spawn a lot of threads > because your back-end isn't keeping up with the miss rate. My second > guess is that these misses are large files that are taking a long time > for clients to download, therefore piling up active client connections > (and thus worker threads). > > I'm guessing your load is going high because you're swapping? In top, > are your CPUs saturated, or fairly idle? > > If you're seeing CPU saturation, this is possibly an internal Varnish > issue. Your VCL seems sane, but we haven't seen the inline C. > > It's a fact of life that you may periodically need more back-end or > worker threads than you would normally see. If you're on Linux (at > least), each of those threads will use 8MB of RAM (the default stack > size) which adds up quickly. You can reduce the thread stack size to > dramatically decrease how much memory Varnish uses as it scales threads. > > We run a patch here that adds a startup parameter to change the stack > size of backend and worker pthreads, but you could emulate this by > reducing stack size before running Varnish: > > ulimit -s 256 > or > limit stacksize 256 > > We run pretty heavy traffic (including inline C) with 256KB stack with > no problem. This adjustment should decrease memory usage as thread > counts increase, and if you're swapping it might help alleviate the > spikes. But if the spikes are due to a slow backend, that's probably > where I'd look first. > > Hope it helps, > -- > Ken Not looking to hijack the thread, but that got my attention. Is there any rule of thumb that would determine how small a stack size varnish could get away with and still perform OK? I dropped our stack size to 1 meg, which seemed drastic at the time. But if you're doing 256k, that's even better. We're using varnish in a web hosting environment on 32-bit Debian Etch boxes (without hope of going 64-bit anytime soon), so I can cram about 800 threads total without hitting the 3GB limit. Traffic patterns are pretty much random, i.e. request sizes are all over the board, though the average is about 40k. Hit rate is ~25% (which sounds awful, but for web hosting we're overjoyed). Any red flags for 256KB threads there?
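In case a data point helps, this is roughly what our init script does today (flags and paths are from memory and specific to our setup, so treat it as a sketch rather than a recommendation):

# drop the default 8MB pthread stack to 1MB before starting varnishd
ulimit -s 1024
varnishd -a :80 -T localhost:6082 \
    -f /etc/varnish/default.vcl \
    -s file,/var/lib/varnish/storage.bin,1G \
    -w 100,800    # max ~800 threads, to stay under the 32-bit 3GB ceiling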
From kb+varnish at slide.com Thu Aug 13 02:06:53 2009 From: kb+varnish at slide.com (Ken Brownfield) Date: Wed, 12 Aug 2009 19:06:53 -0700 Subject: abnormally high load? In-Reply-To: <294d5daa0908121738g193a0fbl2142ab5c2e49a85f@mail.gmail.com> References: <20090812155953.GU17093@hinegardner.org> <4A82F0D0.5030503@gmail.com> <20090812180846.GV17093@hinegardner.org> <294d5daa0908121738g193a0fbl2142ab5c2e49a85f@mail.gmail.com> Message-ID: I never found a way to see how much stack is /used/ vs. /allocated/ in a process or thread, so it would be great if someone had ideas? I could only experiment in production, first moving us to 1MB, then 256KB. I've yet to see any issues at 256KB, but we can reach the upper limits of thread-count sanity on our boxes with that setting, so I haven't dropped it further in production. The minimums I reached in minimal testing were 128KB with the ulimit method, and 64KB with the (IMHO cleaner) backend/worker-thread-only approach. I'm not sure what in Varnish would use more than that much stack, but 256KB seems like the sweet spot. We're 64-bit Ubuntu, and I would assume that a somewhat smaller stack would work on 32-bit, possibly making 128KB safe. Unless you're doing recursion or using large declared structures in inline C, I wouldn't think you'd see large stack allocations or huge shifts in allocation during operation. I don't /believe/ objects are ever allocated on the stack, or that there's a lot of recursion in the code. FWIW I personally don't see any red flags. -- Ken On Aug 12, 2009, at 5:38 PM, Mark Moseley wrote: > Not looking to hijack the thread, but that got my attention. Is there > any rule of thumb that would determine how small a stack size > varnish could get away with and still perform OK? I dropped our stack size > to 1 meg, which seemed drastic at the time. But if you're doing 256k, > that's even better. We're using varnish in a web hosting environment > on 32-bit Debian Etch boxes (without hope of going 64-bit anytime > soon), so I can cram about 800 threads total without hitting the 3GB > limit. Traffic patterns are pretty much random, i.e. request sizes > are all over the board, though the average is about 40k. Hit rate is > ~25% (which sounds awful, but for web hosting we're overjoyed). Any > red flags for 256KB threads there? From sridhar at primesoftsolutionsinc.com Thu Aug 13 05:46:28 2009 From: sridhar at primesoftsolutionsinc.com (Sridhar) Date: Thu, 13 Aug 2009 11:16:28 +0530 Subject: caching problem Message-ID: <0FB0D67F437F4F66843C4BBCDBF458D3@SridharRaju> Hi All, I have set up a Plone site behind varnish and Pound. I observe inconsistency in cache hits: varnishstat is not displaying any cache hits, although previously it used to. Here is my config: # # This is a basic VCL configuration file for varnish. See the vcl(7) # man page for details on VCL syntax and semantics. # # $Id: default.vcl 1424 2007-05-15 19:38:56Z des $ # # Default backend definition. Set this to point to your content # server.
backend default { set backend.host = "x.x.x.x"; set backend.port = "8080"; } acl purge { "localhost"; "x.x.x.x"/24; } sub vcl_hash { set req.hash += req.http.cookie; if (req.request == "GET" && req.http.cookie) { lookup; } lookup; } sub vcl_recv { if (req.request != "GET" && req.request != "HEAD") { # PURGE request if zope asks nicely if (req.request == "PURGE") { if (!client.ip ~ purge) { error 405 "Not allowed."; } lookup; } pipe; } if (req.http.Expect) { pipe; } if (req.http.Authenticate || req.http.Authorization) { pass; } # We only care about the "__ac.*" cookies, used for authentication if (req.http.Cookie && req.http.Cookie ~ "__ac(|_(name|password|persistent))=") { pass; } lookup; } # Do the PURGE thing sub vcl_hit { if (req.request == "PURGE") { set obj.ttl = 0s; error 200 "Purged"; } } sub vcl_miss { if (req.request == "PURGE") { error 404 "Not in cache"; } } # Enforce a minimum TTL, since we PURGE changed objects actively from Zope. #sub vcl_fetch { # if (obj.ttl < 3600s) { # set obj.ttl = 3600s; # } #} # Below is a commented-out copy of the default VCL logic. If you # redefine any of these subroutines, the built-in logic will be # appended to your code. ## Called when a client request is received # #sub vcl_recv { # if (req.request != "GET" && req.request != "HEAD") { # pipe; # } # if (req.http.Expect) { # pipe; # } # if (req.http.Authenticate || req.http.Cookie) { # pass; # } # lookup; #} # ## Called when entering pipe mode # #sub vcl_pipe { # pipe; #} # ## Called when entering pass mode # #sub vcl_pass { # pass; #} # ## Called when entering an object into the cache # #sub vcl_hash { # hash; #} # ## Called when the requested object was found in the cache # #sub vcl_hit { # if (!obj.cacheable) { # pass; # } # deliver; #} # ## Called when the requested object was not found in the cache #sub vcl_miss { # fetch; #} # ## Called when the requested object has been retrieved from the ## backend, or the request to the backend has failed # sub vcl_fetch { if (!obj.valid) { error; } if (!obj.cacheable) { pass; } # if (resp.http.Set-Cookie) { # pass; # } # insert; } # ## Called when an object nears its expiry time # #sub vcl_timeout { # discard; #} Please help me; I have been struggling for a long time to solve this problem. I use Firefox and IE browsers. Regards, Sridhar Raju PrimeSoft Solutions Inc Phone: 040-27762986/27762987 Skype ID: sridharsagi www.primesoftsolutionsinc.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 2699 bytes Desc: not available URL: From plfgoa at gmail.com Thu Aug 13 08:19:41 2009 From: plfgoa at gmail.com (Paras Fadte) Date: Thu, 13 Aug 2009 13:49:41 +0530 Subject: Varnish 503 Service unavailable error Message-ID: <75cf5800908130119o5b10f41bn85e7d76fcb7ebc96@mail.gmail.com> Hi, Sometimes I receive a 503 Service Unavailable error even though there are 4 backends. This would mean that all the backends are unavailable at a given time, which I don't think is the case. What could be done to resolve this? Following are the probe settings. .probe = { .url = "/check.gif"; .timeout = 500 ms; .interval = 1s; .window = 6; .threshold = 4; } I have tried setting the "first_byte_timeout" value to 300 but still sometimes receive the 503 error. Should I increase the "connect_timeout" value? Would that help? Its default value is 0.4 seconds.
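For reference, this is roughly how I set it through the management interface (from memory; port 99 is the -T admin port on my setup, and I am not sure this is the recommended way):

telnet localhost 99
param.set first_byte_timeout 300
param.show first_byte_timeout

Thank you.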
-Paras From rtshilston at gmail.com Thu Aug 13 08:28:09 2009 From: rtshilston at gmail.com (Rob S) Date: Thu, 13 Aug 2009 09:28:09 +0100 Subject: Varnish 503 Service unavailable error In-Reply-To: <75cf5800908130119o5b10f41bn85e7d76fcb7ebc96@mail.gmail.com> References: <75cf5800908130119o5b10f41bn85e7d76fcb7ebc96@mail.gmail.com> Message-ID: <4A83CE99.5080507@gmail.com> Paras Fadte wrote: > Hi, > > Sometimes I receive a 503 Service Unavailable error even though there > are 4 backends. This would mean that all the backends are unavailable > at a given time, which I don't think is the case. Can you replay your varnishlog file, and look for Backend_health items, and confirm that they did all go sick at the same time? If they did, then you'll need to look at your backends themselves. Are you separately monitoring them with Nagios, Zabbix, Pingdom or something like that? If replaying varnishlog shows they weren't sick, then I suggest you get the varnish transaction ID from one of these 503 errors, and then extract the relevant portion of the varnishlog. This might help explain the path taken by your request through the VCL, and help you diagnose a logic problem. Finally, we define all our backends as being monitored by probe, but also redefine them without a probe. director failsafepool random { { .backend = serverAfailsafe; .weight = 1; } { .backend = serverBfailsafe; .weight = 1; } { .backend = serverCfailsafe; .weight = 1; } { .backend = serverDfailsafe; .weight = 1; } } We then use logic like: set req.backend = monitoredpool; if (!req.backend.healthy) { set req.backend = failsafepool; } You can then look in your varnishncsa log to see whether the normal or the failsafe backends were used. Rob From tfheen at redpill-linpro.com Thu Aug 13 09:20:55 2009 From: tfheen at redpill-linpro.com (Tollef Fog Heen) Date: Thu, 13 Aug 2009 11:20:55 +0200 Subject: Varnish 503 Service unavailable error In-Reply-To: <75cf5800908130119o5b10f41bn85e7d76fcb7ebc96@mail.gmail.com> (Paras Fadte's message of "Thu, 13 Aug 2009 13:49:41 +0530") References: <75cf5800908130119o5b10f41bn85e7d76fcb7ebc96@mail.gmail.com> Message-ID: <87fxbw5atk.fsf@qurzaw.linpro.no> ]] Paras Fadte | I have tried setting the "first_byte_timeout" value to 300 but still | sometimes receive the 503 error. Should I increase the | "connect_timeout" value? Would that help? Its default value is 0.4 | seconds It might help to increase the connect timeout, yes. It would be helpful if you posted the output of varnishstat -1 -- Tollef Fog Heen Redpill Linpro -- Changing the game! t: +47 21 54 41 73 From plfgoa at gmail.com Thu Aug 13 09:43:43 2009 From: plfgoa at gmail.com (Paras Fadte) Date: Thu, 13 Aug 2009 15:13:43 +0530 Subject: Varnish 503 Service unavailable error In-Reply-To: <87fxbw5atk.fsf@qurzaw.linpro.no> References: <75cf5800908130119o5b10f41bn85e7d76fcb7ebc96@mail.gmail.com> <87fxbw5atk.fsf@qurzaw.linpro.no> Message-ID: <75cf5800908130243k3c0e7db6i2d01788a42ab352e@mail.gmail.com> Following is the output of varnishstat -1. Also, do I have to restart varnish when I change the "connect_timeout" parameter? I had changed it by telnetting to port 99 and using param.set to set a new value for it. It's currently set to 1 second. ./varnishstat -1 uptime 63373 .
Child uptime client_conn 634043 10.00 Client connections accepted client_req 17826332 281.29 Client requests received cache_hit 9499758 149.90 Cache hits cache_hitpass 18026 0.28 Cache hits for pass cache_miss 8306918 131.08 Cache misses backend_conn 8294978 130.89 Backend connections success backend_unhealthy 0 0.00 Backend connections not attempted backend_busy 0 0.00 Backend connections too many backend_fail 37154 0.59 Backend connections failures backend_reuse 176 0.00 Backend connections reuses backend_recycle 4135 0.07 Backend connections recycles backend_unused 0 0.00 Backend connections unused n_srcaddr 1128 . N struct srcaddr n_srcaddr_act 94 . N active struct srcaddr n_sess_mem 1550 . N struct sess_mem n_sess 251 . N struct sess n_object 296920 . N struct object n_objecthead 297600 . N struct objecthead n_smf 594501 . N struct smf n_smf_frag 5072 . N small free smf n_smf_large 2 . N large free smf n_vbe_conn 79 . N struct vbe_conn n_bereq 233 . N struct bereq n_wrk 129 . N worker threads n_wrk_create 6518 0.10 N worker threads created n_wrk_failed 0 0.00 N worker threads not created n_wrk_max 1689251 26.66 N worker threads limited n_wrk_queue 0 0.00 N queued work requests n_wrk_overflow 40268 0.64 N overflowed work requests n_wrk_drop 0 0.00 N dropped work requests n_backend 7 . N backends n_expired 7859618 . N expired objects n_lru_nuked 122535 . N LRU nuked objects n_lru_saved 0 . N LRU saved objects n_lru_moved 6525309 . N LRU moved objects n_deathrow 0 . N objects on deathrow losthdr 1395 0.02 HTTP header overflows n_objsendfile 0 0.00 Objects sent with sendfile n_objwrite 17776012 280.50 Objects sent with write n_objoverflow 0 0.00 Objects overflowing workspace s_sess 634043 10.00 Total Sessions s_req 17826354 281.29 Total Requests s_pipe 0 0.00 Total pipe s_pass 18274 0.29 Total pass s_fetch 8294976 130.89 Total fetch s_hdrbytes 7506851578 118455.05 Total header bytes s_bodybytes 33144737427 523010.39 Total body bytes sess_closed 101942 1.61 Session Closed sess_pipeline 0 0.00 Session Pipeline sess_readahead 0 0.00 Session Read Ahead sess_linger 0 0.00 Session Linger sess_herd 17790160 280.72 Session herd shm_records 1322178282 20863.43 SHM records shm_writes 48024610 757.81 SHM writes shm_flushes 1486419 23.46 SHM flushes due to overflow shm_cont 3563 0.06 SHM MTX contention shm_cycles 596 0.01 SHM cycles through buffer sm_nreq 16778276 264.75 allocator requests sm_nobj 589427 . outstanding allocations sm_balloc 3745964032 . bytes allocated sm_bfree 549003264 . bytes free sma_nreq 0 0.00 SMA allocator requests sma_nobj 0 . SMA outstanding allocations sma_nbytes 0 . SMA outstanding bytes sma_balloc 0 . SMA bytes allocated sma_bfree 0 . SMA bytes free sms_nreq 31613 0.50 SMS allocator requests sms_nobj 0 . SMS outstanding allocations sms_nbytes 18446744073709550684 . SMS outstanding bytes sms_balloc 14697712 . SMS bytes allocated sms_bfree 14698178 . SMS bytes freed backend_req 8294971 130.89 Backend requests made n_vcl 1 0.00 N vcl total n_vcl_avail 1 0.00 N vcl available n_vcl_discard 0 0.00 N vcl discarded n_purge 1 . 
N total active purges n_purge_add 1 0.00 N new purges added n_purge_retire 0 0.00 N old purges deleted n_purge_obj_test 0 0.00 N objects tested n_purge_re_test 0 0.00 N regexps tested against n_purge_dups 0 0.00 N duplicate purges removed hcb_nolock 0 0.00 HCB Lookups without lock hcb_lock 0 0.00 HCB Lookups with lock hcb_insert 0 0.00 HCB Inserts esi_parse 0 0.00 Objects ESI parsed (unlock) esi_errors 0 0.00 ESI parse errors (unlock) On Thu, Aug 13, 2009 at 2:50 PM, Tollef Fog Heen wrote: > ]] Paras Fadte > > | I have tried setting the "first_byte_timeout" value to 300 but still > | sometimes receive the 503 error. Should I increase the > | "connect_timeout" value? Would that help? Its default value is 0.4 > | seconds > > It might help to increase the connect timeout, yes. It would be helpful > if you posted the output of varnishstat -1 > > -- > Tollef Fog Heen > Redpill Linpro -- Changing the game! > t: +47 21 54 41 73 > _______________________________________________ > varnish-misc mailing list > varnish-misc at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-misc > From phk at phk.freebsd.dk Thu Aug 13 08:28:09 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Thu, 13 Aug 2009 08:28:09 +0000 Subject: abnormally high load? In-Reply-To: Your message of "Wed, 12 Aug 2009 19:06:53 MST." Message-ID: <1331.1250152089@critter.freebsd.dk> In message , Ken Brownfield writes: >Unless you're doing recursion or using large declared structures in >inline C, [...] We do use limited recursion, for instance in the case of ESI where included objects result in reentrance of cache_center.c And yes, wouldn't it have been nice if POSIX had included a way to see how many pages the kernel knows you use? -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From moseleymark at gmail.com Thu Aug 13 18:38:14 2009 From: moseleymark at gmail.com (Mark Moseley) Date: Thu, 13 Aug 2009 11:38:14 -0700 Subject: abnormally high load? In-Reply-To: References: <20090812155953.GU17093@hinegardner.org> <4A82F0D0.5030503@gmail.com> <20090812180846.GV17093@hinegardner.org> <294d5daa0908121738g193a0fbl2142ab5c2e49a85f@mail.gmail.com> Message-ID: <294d5daa0908131138u3cc92b27ta8b98bfc661dd835@mail.gmail.com> On Wed, Aug 12, 2009 at 7:06 PM, Ken Brownfield wrote: > I never found a way to see how much stack is /used/ vs. /allocated/ in > a process or thread, so it would be great if someone had ideas? > > I could only experiment in production, first moving us to 1MB, then > 256KB. I've yet to see any issues at 256KB, but we can reach the > upper limits of thread-count sanity on our boxes with that setting, so > I haven't dropped it further in production. > > The minimums I reached in minimal testing were 128KB with the ulimit > method, and 64KB with the (IMHO cleaner) backend/worker-thread-only > approach. I'm not sure what in Varnish would use more than that much > stack, but 256KB seems like the sweet spot. > > We're 64-bit Ubuntu, and I would assume that a somewhat smaller stack > would work on 32-bit, possibly making 128KB safe. > > Unless you're doing recursion or using large declared structures in > inline C, I wouldn't think you'd see large stack allocations or huge > shifts in allocation during operation. I don't /believe/ objects are > ever allocated on the stack, or that there's a lot of recursion in the > code.
> > FWIW I personally don't see any red flags. > -- > Ken That's excellent information. We're still very early in our varnish deployment, so tuning info is greatly appreciated. Thanks! From kristian at redpill-linpro.com Fri Aug 14 11:05:03 2009 From: kristian at redpill-linpro.com (Kristian Lyngstol) Date: Fri, 14 Aug 2009 13:05:03 +0200 Subject: Security.VCL Message-ID: <20090814110503.GE4896@kjeks.linpro.no> I just committed /varnish-tools/security.vcl, which is an early version of a pet project Edward Bjarte Fjellskål, Kacper Wysocki and myself have been working on. The idea is to add basic filtering of common exploits in VCL, but with minimal impact on normal VCL. This early version has a few ugly details (like hard-coded paths), and some of the rules, especially in vcl/breach/, are likely to be downright wrong. The work is loosely based on mod_security (breach/ is automatically generated based on mod_security), but we've added several of our own rules too. The major drawbacks right now are that we can't parse POST data, and that Varnish uses POSIX regex while mod_security uses Perl regex. If you're curious about Security.VCL, I suggest you take a look at the README and the vcl/main.vcl. We'll continue to work on this sporadically, but patches are welcome. -- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From moseleymark at gmail.com Tue Aug 18 00:55:39 2009 From: moseleymark at gmail.com (Mark Moseley) Date: Mon, 17 Aug 2009 17:55:39 -0700 Subject: Thread pools Message-ID: <294d5daa0908171755y44f5c132o587f3c8188493a4@mail.gmail.com> I've seen various things in the wiki and threads on this list talking about thread pools. In general, the advice is typically conservative, i.e. don't use more than the default 2 thread pools unless you have to. I've also seen the occasional comment suggesting one run as many thread pools as there are cores/CPUs. Is the "conservative" advice still generally correct? Is there any serious penalty for running ${NUM_OF_CORES_OR_NUM_OF_CPUS} thread pools (but not exceeding the # of CPUs/cores)? Or is there a serious penalty for running *fewer* than ${NUM_OF_CORES_OR_NUM_OF_CPUS} thread pools? Given something like a (quite) busy quad-core dual-processor box, can I afford to do 4 thread pools (or even 6 or 8)? So far I've been just doing the default # of thread pools (with a minimum of 500 threads in total). Also, the wiki mentions that raising the number of thread pools is mainly appropriate when you run into locks tying things up. Is that mainly a case of high LRU turnover, or are there other scenarios where locking is an issue? What are the symptoms of locking becoming an issue with the current configuration, and what fields in varnishstat should I be looking at? In my case, LRU turnover is non-trivial. As mentioned on another ticket, we're using varnish in a web hosting environment, so the hit rate is ~25-30%, which for this environment is utterly fantastic -- and so far we've been totally impressed with it. Sorry for the mass of questions, I'm just looking to eke out every possible iota of performance. We've mainly got Dell PowerEdge 850s and 1950s running varnish. The 850s are straightforward enough, but I worry about wasting any CPU% on those 8-core 1950s -- but I also don't want to introduce a whole other set of problems by using too many thread pools. Thanks!
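P.S. For context, here's roughly where our tuning sits today, checked via the management interface (numbers from memory; illustrative of our setup, not a recommendation):

varnishadm -T localhost:6082 param.show thread_pools      # still the default of 2
varnishadm -T localhost:6082 param.set thread_pool_min 250  # 2 pools x 250 = the ~500 minimum mentioned above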
From yann.malet at gmail.com Tue Aug 18 01:08:59 2009 From: yann.malet at gmail.com (Yann Malet) Date: Mon, 17 Aug 2009 21:08:59 -0400 Subject: Proxy from an ordered list of web servers until one of them sends a 200 status code Message-ID: Hello, I am new to varnish and I ended up on this mailing list after several days of Google searching and a day of reading the documentation. Based on this small experience I get the feeling that varnish might be the tool that I am looking for. However, I am not yet sure, because I haven't found an example of VCL configuration doing what I want. Let me give you the context of this request: we will soon replace our old CMS with a new one. Both of them are served by 2 different web servers, or on the same web server but on 2 different ports: weserver_new_cms:8081 and webserver_old_cms:8082. What I would like to do is to have a single frontend web server called frontend:8080 that gets the requested page from either weserver_new_cms:8081 or webserver_old_cms:8082 and caches the result. So the journey of the request will look like this: The browser requests the page *frontend:8080/foo/bar*. This request reaches frontend:8080, which looks up whether the page is in the cache. If the page is in the cache, it serves it from there; otherwise it sends the request to *weserver_new_cms:8081*. There are 2 cases: the page exists or it does not. If the page exists, it is served to the frontend, which puts it in the cache and sends it to the client. If the page does not exist, weserver_new_cms:8081 returns a 404, and the frontend should reverse-proxy to *webserver_old_cms:8082*. There are again 2 cases: the page exists or it does not. If the page exists, it is served to the frontend, which puts it in the cache and sends it to the client. If the page does not exist, a 404 error is returned to the client, because the page does not exist in either (new or old) CMS. It seems to me that *vcl_fetch* is the right place to hook this logic, but so far I have no idea how to write it. Some help/guidance would be very appreciated. Regards, Yann -------------- next part -------------- An HTML attachment was scrubbed... URL: From plfgoa at gmail.com Tue Aug 18 07:02:46 2009 From: plfgoa at gmail.com (Paras Fadte) Date: Tue, 18 Aug 2009 12:32:46 +0530 Subject: Varnish 503 Service unavailable error In-Reply-To: <75cf5800908130243k3c0e7db6i2d01788a42ab352e@mail.gmail.com> References: <75cf5800908130119o5b10f41bn85e7d76fcb7ebc96@mail.gmail.com> <87fxbw5atk.fsf@qurzaw.linpro.no> <75cf5800908130243k3c0e7db6i2d01788a42ab352e@mail.gmail.com> Message-ID: <75cf5800908180002l5369ff16u306aaf34e1c04b8c@mail.gmail.com> Hi, Can anybody please respond to this? -Paras On Thu, Aug 13, 2009 at 3:13 PM, Paras Fadte wrote: > Following is the output of varnishstat -1. Also, do I have to restart > varnish when I change the "connect_timeout" parameter? I had > changed it by telnetting to port 99 and using param.set to set a new > value for it. It's currently set to 1 second. > > ./varnishstat -1 > > uptime 63373 . Child uptime > client_conn 634043 10.00 Client connections accepted > client_req 17826332 281.29 Client requests received > cache_hit 9499758 149.90 Cache hits > cache_hitpass 18026 0.28 Cache hits for pass > cache_miss 8306918 131.08 Cache misses > backend_conn 8294978 130.89 Backend connections success > backend_unhealthy 0
0.00 Backend connections not attempted > backend_busy ? ? ? ? ? ? ? ?0 ? ? ? ? 0.00 Backend connections too many > backend_fail ? ? ? ? ? ?37154 ? ? ? ? 0.59 Backend connections failures > backend_reuse ? ? ? ? ? ? 176 ? ? ? ? 0.00 Backend connections reuses > backend_recycle ? ? ? ? ?4135 ? ? ? ? 0.07 Backend connections recycles > backend_unused ? ? ? ? ? ? ?0 ? ? ? ? 0.00 Backend connections unused > n_srcaddr ? ? ? ? ? ? ? ?1128 ? ? ? ? ?. ? N struct srcaddr > n_srcaddr_act ? ? ? ? ? ? ?94 ? ? ? ? ?. ? N active struct srcaddr > n_sess_mem ? ? ? ? ? ? ? 1550 ? ? ? ? ?. ? N struct sess_mem > n_sess ? ? ? ? ? ? ? ? ? ?251 ? ? ? ? ?. ? N struct sess > n_object ? ? ? ? ? ? ? 296920 ? ? ? ? ?. ? N struct object > n_objecthead ? ? ? ? ? 297600 ? ? ? ? ?. ? N struct objecthead > n_smf ? ? ? ? ? ? ? ? ?594501 ? ? ? ? ?. ? N struct smf > n_smf_frag ? ? ? ? ? ? ? 5072 ? ? ? ? ?. ? N small free smf > n_smf_large ? ? ? ? ? ? ? ? 2 ? ? ? ? ?. ? N large free smf > n_vbe_conn ? ? ? ? ? ? ? ? 79 ? ? ? ? ?. ? N struct vbe_conn > n_bereq ? ? ? ? ? ? ? ? ? 233 ? ? ? ? ?. ? N struct bereq > n_wrk ? ? ? ? ? ? ? ? ? ? 129 ? ? ? ? ?. ? N worker threads > n_wrk_create ? ? ? ? ? ? 6518 ? ? ? ? 0.10 N worker threads created > n_wrk_failed ? ? ? ? ? ? ? ?0 ? ? ? ? 0.00 N worker threads not created > n_wrk_max ? ? ? ? ? ? 1689251 ? ? ? ?26.66 N worker threads limited > n_wrk_queue ? ? ? ? ? ? ? ? 0 ? ? ? ? 0.00 N queued work requests > n_wrk_overflow ? ? ? ? ?40268 ? ? ? ? 0.64 N overflowed work requests > n_wrk_drop ? ? ? ? ? ? ? ? ?0 ? ? ? ? 0.00 N dropped work requests > n_backend ? ? ? ? ? ? ? ? ? 7 ? ? ? ? ?. ? N backends > n_expired ? ? ? ? ? ? 7859618 ? ? ? ? ?. ? N expired objects > n_lru_nuked ? ? ? ? ? ?122535 ? ? ? ? ?. ? N LRU nuked objects > n_lru_saved ? ? ? ? ? ? ? ? 0 ? ? ? ? ?. ? N LRU saved objects > n_lru_moved ? ? ? ? ? 6525309 ? ? ? ? ?. ? N LRU moved objects > n_deathrow ? ? ? ? ? ? ? ? ?0 ? ? ? ? ?. ? N objects on deathrow > losthdr ? ? ? ? ? ? ? ? ?1395 ? ? ? ? 0.02 HTTP header overflows > n_objsendfile ? ? ? ? ? ? ? 0 ? ? ? ? 0.00 Objects sent with sendfile > n_objwrite ? ? ? ? ? 17776012 ? ? ? 280.50 Objects sent with write > n_objoverflow ? ? ? ? ? ? ? 0 ? ? ? ? 0.00 Objects overflowing workspace > s_sess ? ? ? ? ? ? ? ? 634043 ? ? ? ?10.00 Total Sessions > s_req ? ? ? ? ? ? ? ?17826354 ? ? ? 281.29 Total Requests > s_pipe ? ? ? ? ? ? ? ? ? ? ?0 ? ? ? ? 0.00 Total pipe > s_pass ? ? ? ? ? ? ? ? ?18274 ? ? ? ? 0.29 Total pass > s_fetch ? ? ? ? ? ? ? 8294976 ? ? ? 130.89 Total fetch > s_hdrbytes ? ? ? ? 7506851578 ? ?118455.05 Total header bytes > s_bodybytes ? ? ? 33144737427 ? ?523010.39 Total body bytes > sess_closed ? ? ? ? ? ?101942 ? ? ? ? 1.61 Session Closed > sess_pipeline ? ? ? ? ? ? ? 0 ? ? ? ? 0.00 Session Pipeline > sess_readahead ? ? ? ? ? ? ?0 ? ? ? ? 0.00 Session Read Ahead > sess_linger ? ? ? ? ? ? ? ? 0 ? ? ? ? 0.00 Session Linger > sess_herd ? ? ? ? ? ?17790160 ? ? ? 280.72 Session herd > shm_records ? ? ? ?1322178282 ? ? 20863.43 SHM records > shm_writes ? ? ? ? ? 48024610 ? ? ? 757.81 SHM writes > shm_flushes ? ? ? ? ? 1486419 ? ? ? ?23.46 SHM flushes due to overflow > shm_cont ? ? ? ? ? ? ? ? 3563 ? ? ? ? 0.06 SHM MTX contention > shm_cycles ? ? ? ? ? ? ? ?596 ? ? ? ? 0.01 SHM cycles through buffer > sm_nreq ? ? ? ? ? ? ?16778276 ? ? ? 264.75 allocator requests > sm_nobj ? ? ? ? ? ? ? ?589427 ? ? ? ? ?. ? outstanding allocations > sm_balloc ? ? ? ? ?3745964032 ? ? ? ? ?. ? bytes allocated > sm_bfree ? ? ? ? ? ?549003264 ? ? ? ? ?. ? bytes free > sma_nreq ? ? ? ? ? ? ? ? 
? ?0 ? ? ? ? 0.00 SMA allocator requests > sma_nobj ? ? ? ? ? ? ? ? ? ?0 ? ? ? ? ?. ? SMA outstanding allocations > sma_nbytes ? ? ? ? ? ? ? ? ?0 ? ? ? ? ?. ? SMA outstanding bytes > sma_balloc ? ? ? ? ? ? ? ? ?0 ? ? ? ? ?. ? SMA bytes allocated > sma_bfree ? ? ? ? ? ? ? ? ? 0 ? ? ? ? ?. ? SMA bytes free > sms_nreq ? ? ? ? ? ? ? ?31613 ? ? ? ? 0.50 SMS allocator requests > sms_nobj ? ? ? ? ? ? ? ? ? ?0 ? ? ? ? ?. ? SMS outstanding allocations > sms_nbytes ? ? ? 18446744073709550684 ? ? ? ? ?. ? SMS outstanding bytes > sms_balloc ? ? ? ? ? 14697712 ? ? ? ? ?. ? SMS bytes allocated > sms_bfree ? ? ? ? ? ?14698178 ? ? ? ? ?. ? SMS bytes freed > backend_req ? ? ? ? ? 8294971 ? ? ? 130.89 Backend requests made > n_vcl ? ? ? ? ? ? ? ? ? ? ? 1 ? ? ? ? 0.00 N vcl total > n_vcl_avail ? ? ? ? ? ? ? ? 1 ? ? ? ? 0.00 N vcl available > n_vcl_discard ? ? ? ? ? ? ? 0 ? ? ? ? 0.00 N vcl discarded > n_purge ? ? ? ? ? ? ? ? ? ? 1 ? ? ? ? ?. ? N total active purges > n_purge_add ? ? ? ? ? ? ? ? 1 ? ? ? ? 0.00 N new purges added > n_purge_retire ? ? ? ? ? ? ?0 ? ? ? ? 0.00 N old purges deleted > n_purge_obj_test ? ? ? ? ? ?0 ? ? ? ? 0.00 N objects tested > n_purge_re_test ? ? ? ? ? ? 0 ? ? ? ? 0.00 N regexps tested against > n_purge_dups ? ? ? ? ? ? ? ?0 ? ? ? ? 0.00 N duplicate purges removed > hcb_nolock ? ? ? ? ? ? ? ? ?0 ? ? ? ? 0.00 HCB Lookups without lock > hcb_lock ? ? ? ? ? ? ? ? ? ?0 ? ? ? ? 0.00 HCB Lookups with lock > hcb_insert ? ? ? ? ? ? ? ? ?0 ? ? ? ? 0.00 HCB Inserts > esi_parse ? ? ? ? ? ? ? ? ? 0 ? ? ? ? 0.00 Objects ESI parsed (unlock) > esi_errors ? ? ? ? ? ? ? ? ?0 ? ? ? ? 0.00 ESI parse errors (unlock) > > > > > > On Thu, Aug 13, 2009 at 2:50 PM, Tollef Fog > Heen wrote: >> ]] Paras Fadte >> >> | I have tried setting the "first_byte_timeout" value to 300 but still >> | sometimes receive the 503 error . Should I increase the >> | "connect_timeout" value ? would that help ? Its default value is 0.4 >> | seconds >> >> It might help to increase the connect timeout, yes. ?It would be helpful >> if you posted the output of varnishstat -1 >> >> -- >> Tollef Fog Heen >> Redpill Linpro -- Changing the game! >> t: +47 21 54 41 73 >> _______________________________________________ >> varnish-misc mailing list >> varnish-misc at projects.linpro.no >> http://projects.linpro.no/mailman/listinfo/varnish-misc >> > From rtshilston at gmail.com Tue Aug 18 07:19:36 2009 From: rtshilston at gmail.com (Rob S) Date: Tue, 18 Aug 2009 08:19:36 +0100 Subject: Proxy from and ordered list of web server until one of them send a 200 status code In-Reply-To: References: Message-ID: <4A8A5608.1020604@gmail.com> Yann Malet wrote: > Browser request the page : *frontend:8080/foo/bar* > This request reach the frontend:8080, it looks if the page is the > cache. If the page is in the cache it serves it from there else it > sends the request to* weserver_new_cms:8081*. There are 2 cases there > the page exists or not. If the page exists it serves the page to the > frontend that puts it in the cache and sends it to the client. If the > page does not exist it means that weserver_new_cms:8081 returns 404 > the frontend should reverse proxy *to webserver_old_cms:8082. *There > is again 2 cases there the page exists or it doesn't. If the page > exists it serves the page to the frontend that puts it in the cache > and send it to the client. If the page does not exist it returns a 404 > error to the client because the page does not exist in any (new, old) cms. 
>
> It seems to me that *vcl_fetch* is the right place to hook in this
> logic, but so far I have no idea how to write it. Some help/guidance
> would be much appreciated.
>
Yann,

Varnish can definitely do this, and by default Varnish will serve from
its cache anything that is there. So, you just need to worry about the
"it's not in the cache" scenario, and instead do something like the
following. First, you'll need to define your backend nodes:

backend oldcmsnode { .host = "webserver_old_cms"; .port="8082"; }
backend newcmsnode { .host = "webserver_new_cms"; .port="8081"; }

director oldcms random {
{ .backend = oldcmsnode ; .weight = 1; }
}

director newcms random {
{ .backend = newcmsnode ; .weight = 1; }
}

then, at the top of sub vcl_recv, we say "If we're trying for the first
time, use the newcmsnode, otherwise use the oldcmsnode"

set req.backend = newcmsnode;
if (req.restarts > 0) {
set req.backend = oldcmsnode;
}

in vcl_fetch, put some logic to say "if we got a 404, and it was our
first attempt (and therefore we're using the newcmsnode), we should
restart and try again".

if (obj.status == 404 && req.restarts==0) {
restart;
}

I hope this points you in the right direction.

Rob

From yann.malet at gmail.com Tue Aug 18 14:41:16 2009
From: yann.malet at gmail.com (Yann Malet)
Date: Tue, 18 Aug 2009 10:41:16 -0400
Subject: Proxy from an ordered list of web servers until one of them sends a 200 status code
In-Reply-To: <4A8A5608.1020604@gmail.com>
References: <4A8A5608.1020604@gmail.com>
Message-ID: 

Hello Rob,

Thank you for this detailed solution. I have tested it and it works
great. I have created a file called migration.vcl that contains:

"""
backend oldcmsnode { .host = "old_cms_webserver"; .port="8081"; }
backend newcmsnode { .host = "new_cms_webserver"; .port="8082"; }

director oldcms random {
{ .backend = oldcmsnode ; .weight = 1; }
}

director newcms random {
{ .backend = newcmsnode ; .weight = 1; }
}

sub vcl_recv {
set req.backend = newcms;
if (req.restarts > 0) {
set req.backend = oldcms;
}
}

sub vcl_fetch {
if (obj.status == 404 && req.restarts==0) {
restart;
}
}
"""

Then I run it using this command:

sudo varnishd -a 127.0.0.1:8080 -F -f /etc/varnish/migration.vcl

Regards,
Yann

On Tue, Aug 18, 2009 at 3:19 AM, Rob S wrote:
> Yann Malet wrote:
>> The browser requests the page: *frontend:8080/foo/bar*.
>> This request reaches frontend:8080, which looks for the page in its
>> cache. If the page is in the cache it is served from there; otherwise
>> the request is sent to *webserver_new_cms:8081*. There are two cases:
>> the page exists or it does not. If the page exists, it is served to the
>> frontend, which puts it in the cache and sends it to the client. If the
>> page does not exist, webserver_new_cms:8081 returns a 404 and the
>> frontend should reverse proxy to *webserver_old_cms:8082*. Again there
>> are two cases: the page exists or it doesn't. If the page exists, it is
>> served to the frontend, which puts it in the cache and sends it to the
>> client. If the page does not exist, a 404 error is returned to the
>> client, because the page exists in neither the new nor the old CMS.
>>
>> It seems to me that *vcl_fetch* is the right place to hook in this
>> logic, but so far I have no idea how to write it. Some help/guidance
>> would be much appreciated.
>>
> Yann,
>
> Varnish can definitely do this, and by default Varnish will serve from
> its cache anything that is there. So, you just need to worry about the
> "it's not in the cache" scenario, and instead do something like the
> following.
> First, you'll need to define your backend nodes: > > backend oldcmsnode { .host = "webserver_old_cms"; .port="8082"; } > backend newcmsnode { .host = "webserver_new_cms"; .port="8081"; } > > director oldcms random { > { .backend = oldcmsnode ; .weight = 1; } > } > > director newcms random { > { .backend = newcmsnode ; .weight = 1; } > } > > then, at the top of sub vcl_recv, we say "If we're trying for the first > time, use the newcmsnode, otherwise use the oldcmsnode" > > set req.backend = newcmsnode; > if (req.restarts > 0) { > set req.backend = oldcmsnode; > } > > in vcl_fetch, put some logic to say "if we got a 404, and it was our first > attempt (and therefore we're using the newcmsnode), we should restart and > try again". > > if (obj.status == 404 && req.restarts==0) { > restart; > } > > I hope this points you in the right direction. > > > > Rob > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From karl at slideshare.com Tue Aug 18 23:06:03 2009 From: karl at slideshare.com (Karl Pietri) Date: Tue, 18 Aug 2009 16:06:03 -0700 Subject: Varnish, long lived cache and purge on change Message-ID: <673ddc910908181606t3ba57de8ga9e39769f18d4880@mail.gmail.com> Hello everyone, Recently we decided that our primary page that everyone views doesn't really change all that often. In fact it changes very rarely except for the stats counters (views, downloads, etc). So we decided that we wanted to store everything in varnish for a super long time (and tell the client its not cacheable or cacheable for a very short amount of time), flush the page from varnish when it truly changes and have a very fast ajax call to update the stats. This worked great for about 2 days. Then we ran out of ram and varnish started causing a ton of swap activity and it increased the response times of everything on the site to unusable. After poking about i seem to have found the culprit. When you use url.purge it seems to keep a record of that and check every object as it is fetched to see if it was purged or not. To test this i set a script to purge a lot of stuff and got the same problem to happen. from varnishstat -1 n_purge 236369 . N total active purges n_purge_add 236388 2.31 N new purges added n_purge_retire 19 0.00 N old purges deleted n_purge_obj_test 1651452 16.12 N objects tested n_purge_re_test 5052057513 49316.27 N regexps tested against n_purge_dups 0 0.00 N duplicate purges removed each uptick is when i add 100k new purge records. you can see what will happen soon. http://wrenchies.net/~karl/Automator/ScreenShots/20090818160328.png We really want to take advantage of this style of essentially having static html served by varnish and flush it out when it changes. Does any one have advice on how to do this? Originally we had implemented this using the vlc to set the ttl to 0 but with all the combinations of accept-encoding that are possible we were getting many things not being purged from the cache. Another thought would be to refetch the page on change instead of purging it but that has the same problem with accept-encoding. after 36 hours between the 2 machines we have collected 1.3M objects in the cache and have not even come close to running out of space. We would actually like to increase our ttl for the cached objects even longer. I hope someone can help me out here. -Karl Pietri -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From darryl.dixon at winterhouseconsulting.com Tue Aug 18 23:18:06 2009 From: darryl.dixon at winterhouseconsulting.com (Darryl Dixon - Winterhouse Consulting) Date: Wed, 19 Aug 2009 11:18:06 +1200 (NZST) Subject: Varnish, long lived cache and purge on change In-Reply-To: <673ddc910908181606t3ba57de8ga9e39769f18d4880@mail.gmail.com> References: <673ddc910908181606t3ba57de8ga9e39769f18d4880@mail.gmail.com> Message-ID: <61760.58.28.124.90.1250637486.squirrel@services.directender.co.nz> > Then we ran > out of ram and varnish started causing a ton of swap activity and it > increased the response times of everything on the site to unusable. > > After poking about i seem to have found the culprit. When you use > url.purge > it seems to keep a record of that and check every object as it is fetched > to > see if it was purged or not. To test this i set a script to purge a lot > of > stuff and got the same problem to happen. Hi Karl, This is about the fourth thread now on varnish-misc about the perils of using url.purge :) My suggestion in your case, if managing obj.ttl is too tough, is to look at breaking out the page with ESI and having shorter cache times on the portions of the page that will potentially end up getting updated. This combined with a reasonable grace period for those parts of the page should alleviate most of the load you might otherwise experience, and keep response times low and page freshness high. regards, Darryl Dixon Winterhouse Consulting Ltd http://www.winterhouseconsulting.com From kb+varnish at slide.com Tue Aug 18 23:34:37 2009 From: kb+varnish at slide.com (Ken Brownfield) Date: Tue, 18 Aug 2009 16:34:37 -0700 Subject: Varnish, long lived cache and purge on change In-Reply-To: <673ddc910908181606t3ba57de8ga9e39769f18d4880@mail.gmail.com> References: <673ddc910908181606t3ba57de8ga9e39769f18d4880@mail.gmail.com> Message-ID: <9B4C5CF0-CBD6-40EB-AB5D-5794E5A32A28@slide.com> Hey Karl. :-) The implementation of purge in Varnish is really a queue of refcounted ban objects. Every image hit is compared to the ban list to see if the object in cache should be reloaded from a backend. If you have purge_dups off, /every/ request to Varnish will regex against every single ban in the list. If you have purge_dups on, it will at least not compare against duplicate bans. However, a ban that has been created will stay around until /every/ object that was in the cache at the time of that ban has been re- requested, dupe or no. If you have lots of content, especially content that may not be accessed very often, the ban list can become enormous. Even with purge_dups, duplicate ban entries remain in memory. And the bans are only freed from RAM when their refcount hits 0 /AND/ they're at the very tail end of the ban queue. Because of the implementation, there's no clear way around this AFAICT. You can get a list of bans with the "purge.list" management command, but if it's more than ~2400 long you'll need to use netcat to get the list. Also, purged dups will NOT show up in this list, even though they're sitting on RAM. I have a trivial patch that will make dups show up in purge.list if you'd like to get an idea of how many bans you have. The implementation is actually really clever, IMHO, especially with regard to how it avoids locks, and there's really no other scalable way to implement a regex purge that I've been able to dream up. 
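Concretely, all of the moving parts described above can be poked at over
the management port (the -T address varnishd was started with); a rough
sketch, with a made-up URL:

    $ nc localhost 6082          # netcat, since a long purge.list gets truncated otherwise
    url.purge ^/slides/12345$    # adds one ban; each cached object is tested against it on its next hit
    purge.list                   # lists outstanding bans with their reference counts
    param.set purge_dups on      # marks duplicate bans so objects skip re-testing them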
The only memory-reducing option within the existing implementation is to actually delete/free duplicate bans from the list, and to delete/ free bans when an object hit causes the associated ban's refcount to hit 0. However, this requires all access to the ban list to be locked, which is likely a significant performance hit. I've written this patch, and it works, but I haven't put significant load on it. I'm not sure if Varnish supports non-regex/non-wildcard purges? This would at least not have to go through the ban system, but obviously it doesn't work for arbitrary path purges. We version our static content, which avoids cache thrash and this purge side-effect. This is very easy if you have a central URL- generation system in code (templates, ajax, etc), but probably more problematic in situations where the URL needs to be "pretty". Ken On Aug 18, 2009, at 4:06 PM, Karl Pietri wrote: > Hello everyone, > Recently we decided that our primary page that everyone views > doesn't really change all that often. In fact it changes very > rarely except for the stats counters (views, downloads, etc). So we > decided that we wanted to store everything in varnish for a super > long time (and tell the client its not cacheable or cacheable for a > very short amount of time), flush the page from varnish when it > truly changes and have a very fast ajax call to update the stats. > This worked great for about 2 days. Then we ran out of ram and > varnish started causing a ton of swap activity and it increased the > response times of everything on the site to unusable. > > After poking about i seem to have found the culprit. When you use > url.purge it seems to keep a record of that and check every object > as it is fetched to see if it was purged or not. To test this i set > a script to purge a lot of stuff and got the same problem to happen. > > > from varnishstat -1 > > n_purge 236369 . N total active purges > n_purge_add 236388 2.31 N new purges added > n_purge_retire 19 0.00 N old purges deleted > n_purge_obj_test 1651452 16.12 N objects tested > n_purge_re_test 5052057513 49316.27 N regexps tested against > n_purge_dups 0 0.00 N duplicate purges removed > > each uptick is when i add 100k new purge records. you can see what > will happen soon. > > http://wrenchies.net/~karl/Automator/ScreenShots/20090818160328.png > > We really want to take advantage of this style of essentially having > static html served by varnish and flush it out when it changes. > > Does any one have advice on how to do this? > > Originally we had implemented this using the vlc to set the ttl to 0 > but with all the combinations of accept-encoding that are possible > we were getting many things not being purged from the cache. > > Another thought would be to refetch the page on change instead of > purging it but that has the same problem with accept-encoding. > > after 36 hours between the 2 machines we have collected 1.3M objects > in the cache and have not even come close to running out of space. > We would actually like to increase our ttl for the cached objects > even longer. > > I hope someone can help me out here. > > -Karl Pietri > _______________________________________________ > varnish-misc mailing list > varnish-misc at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-misc -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From karl at slideshare.com Tue Aug 18 23:53:54 2009 From: karl at slideshare.com (Karl Pietri) Date: Tue, 18 Aug 2009 16:53:54 -0700 Subject: Varnish, long lived cache and purge on change In-Reply-To: <9B4C5CF0-CBD6-40EB-AB5D-5794E5A32A28@slide.com> References: <673ddc910908181606t3ba57de8ga9e39769f18d4880@mail.gmail.com> <9B4C5CF0-CBD6-40EB-AB5D-5794E5A32A28@slide.com> Message-ID: <673ddc910908181653s62a7ba41w519343fd1238c286@mail.gmail.com> Hey Ken, =) Yeah this is what i was afraid of. I think we have a work around by normalizing the hash key to a few select things we want to support and on change setting the ttl of those objects to 0. This would avoid using the url.purge. All of our urls in this case are pretty, and not images. Thanks for the great info and sorry about the 4th thread on the subject, i did not search thoroughly enough in the archives. -Karl On Tue, Aug 18, 2009 at 4:34 PM, Ken Brownfield > wrote: > Hey Karl. :-) > The implementation of purge in Varnish is really a queue of refcounted ban > objects. Every image hit is compared to the ban list to see if the object > in cache should be reloaded from a backend. > > If you have purge_dups off, /every/ request to Varnish will regex against > every single ban in the list. If you have purge_dups on, it will at least > not compare against duplicate bans. > > However, a ban that has been created will stay around until /every/ object > that was in the cache at the time of that ban has been re-requested, dupe or > no. If you have lots of content, especially content that may not be > accessed very often, the ban list can become enormous. Even with > purge_dups, duplicate ban entries remain in memory. And the bans are only > freed from RAM when their refcount hits 0 /AND/ they're at the very tail end > of the ban queue. > > Because of the implementation, there's no clear way around this AFAICT. > > You can get a list of bans with the "purge.list" management command, but if > it's more than ~2400 long you'll need to use netcat to get the list. Also, > purged dups will NOT show up in this list, even though they're sitting on > RAM. I have a trivial patch that will make dups show up in purge.list if > you'd like to get an idea of how many bans you have. > > The implementation is actually really clever, IMHO, especially with regard > to how it avoids locks, and there's really no other scalable way to > implement a regex purge that I've been able to dream up. > > The only memory-reducing option within the existing implementation is to > actually delete/free duplicate bans from the list, and to delete/free bans > when an object hit causes the associated ban's refcount to hit 0. However, > this requires all access to the ban list to be locked, which is likely a > significant performance hit. I've written this patch, and it works, but I > haven't put significant load on it. > > I'm not sure if Varnish supports non-regex/non-wildcard purges? This would > at least not have to go through the ban system, but obviously it doesn't > work for arbitrary path purges. > > We version our static content, which avoids cache thrash and this purge > side-effect. This is very easy if you have a central URL-generation system > in code (templates, ajax, etc), but probably more problematic in situations > where the URL needs to be "pretty". > > Ken > > On Aug 18, 2009, at 4:06 PM, Karl Pietri wrote: > > Hello everyone, Recently we decided that our primary page that everyone > views doesn't really change all that often. 
In fact it changes very rarely > except for the stats counters (views, downloads, etc). So we decided that > we wanted to store everything in varnish for a super long time (and tell the > client its not cacheable or cacheable for a very short amount of time), > flush the page from varnish when it truly changes and have a very fast ajax > call to update the stats. This worked great for about 2 days. Then we ran > out of ram and varnish started causing a ton of swap activity and it > increased the response times of everything on the site to unusable. > > After poking about i seem to have found the culprit. When you use > url.purge it seems to keep a record of that and check every object as it is > fetched to see if it was purged or not. To test this i set a script to > purge a lot of stuff and got the same problem to happen. > > > from varnishstat -1 > > n_purge 236369 . N total active purges > n_purge_add 236388 2.31 N new purges added > n_purge_retire 19 0.00 N old purges deleted > n_purge_obj_test 1651452 16.12 N objects tested > n_purge_re_test 5052057513 49316.27 N regexps tested against > n_purge_dups 0 0.00 N duplicate purges removed > > each uptick is when i add 100k new purge records. you can see what will > happen soon. > > http://wrenchies.net/~karl/Automator/ScreenShots/20090818160328.png > > We really want to take advantage of this style of essentially having static > html served by varnish and flush it out when it changes. > > Does any one have advice on how to do this? > > Originally we had implemented this using the vlc to set the ttl to 0 but > with all the combinations of accept-encoding that are possible we were > getting many things not being purged from the cache. > > Another thought would be to refetch the page on change instead of purging > it but that has the same problem with accept-encoding. > > after 36 hours between the 2 machines we have collected 1.3M objects in the > cache and have not even come close to running out of space. We would > actually like to increase our ttl for the cached objects even longer. > > I hope someone can help me out here. > > -Karl Pietri > _______________________________________________ > varnish-misc mailing list > varnish-misc at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-misc > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rtshilston at gmail.com Wed Aug 19 07:57:42 2009 From: rtshilston at gmail.com (Rob S) Date: Wed, 19 Aug 2009 08:57:42 +0100 Subject: Varnish, long lived cache and purge on change In-Reply-To: <673ddc910908181653s62a7ba41w519343fd1238c286@mail.gmail.com> References: <673ddc910908181606t3ba57de8ga9e39769f18d4880@mail.gmail.com> <9B4C5CF0-CBD6-40EB-AB5D-5794E5A32A28@slide.com> <673ddc910908181653s62a7ba41w519343fd1238c286@mail.gmail.com> Message-ID: <4A8BB076.50009@gmail.com> phk and other deep Varnish developers, Do you think it'd ever be viable to have a sort of process that goes through the tail of the purge queue and applies the purges then deletes them from the queue? If so, how much work would it be to implement? There are a fair number of us who would really appreciate something like this, and I'm sure would make a contribution if someone was to implement something. Thanks, Rob Karl Pietri wrote: > Hey Ken, =) > Yeah this is what i was afraid of. I think we have a work around > by normalizing the hash key to a few select things we want to support > and on change setting the ttl of those objects to 0. This would avoid > using the url.purge. 
All of our urls in this case are pretty, and not > images. > > Thanks for the great info and sorry about the 4th thread on the > subject, i did not search thoroughly enough in the archives. > > -Karl > > On Tue, Aug 18, 2009 at 4:34 PM, Ken Brownfield > wrote: > > Hey Karl. :-) > > The implementation of purge in Varnish is really a queue of > refcounted ban objects. Every image hit is compared to the ban > list to see if the object in cache should be reloaded from a backend. > > If you have purge_dups off, /every/ request to Varnish will regex > against every single ban in the list. If you have purge_dups on, > it will at least not compare against duplicate bans. > > However, a ban that has been created will stay around until > /every/ object that was in the cache at the time of that ban has > been re-requested, dupe or no. If you have lots of content, > especially content that may not be accessed very often, the ban > list can become enormous. Even with purge_dups, duplicate ban > entries remain in memory. And the bans are only freed from RAM > when their refcount hits 0 /AND/ they're at the very tail end of > the ban queue. > > Because of the implementation, there's no clear way around this > AFAICT. > > You can get a list of bans with the "purge.list" management > command, but if it's more than ~2400 long you'll need to use > netcat to get the list. Also, purged dups will NOT show up in > this list, even though they're sitting on RAM. I have a trivial > patch that will make dups show up in purge.list if you'd like to > get an idea of how many bans you have. > > The implementation is actually really clever, IMHO, especially > with regard to how it avoids locks, and there's really no other > scalable way to implement a regex purge that I've been able to > dream up. > > The only memory-reducing option within the existing implementation > is to actually delete/free duplicate bans from the list, and to > delete/free bans when an object hit causes the associated ban's > refcount to hit 0. However, this requires all access to the ban > list to be locked, which is likely a significant performance hit. > I've written this patch, and it works, but I haven't put > significant load on it. > > I'm not sure if Varnish supports non-regex/non-wildcard purges? > This would at least not have to go through the ban system, but > obviously it doesn't work for arbitrary path purges. > > We version our static content, which avoids cache thrash and this > purge side-effect. This is very easy if you have a central > URL-generation system in code (templates, ajax, etc), but probably > more problematic in situations where the URL needs to be "pretty". > > Ken > > On Aug 18, 2009, at 4:06 PM, Karl Pietri wrote: > >> Hello everyone, >> Recently we decided that our primary page that everyone views >> doesn't really change all that often. In fact it changes very >> rarely except for the stats counters (views, downloads, etc). So >> we decided that we wanted to store everything in varnish for a >> super long time (and tell the client its not cacheable or >> cacheable for a very short amount of time), flush the page from >> varnish when it truly changes and have a very fast ajax call to >> update the stats. This worked great for about 2 days. Then we >> ran out of ram and varnish started causing a ton of swap activity >> and it increased the response times of everything on the site to >> unusable. >> >> After poking about i seem to have found the culprit. 
When you
>> use url.purge it seems to keep a record of that and check every
>> object as it is fetched to see if it was purged or not. To test
>> this i set a script to purge a lot of stuff and got the same
>> problem to happen.
>>
>> from varnishstat -1
>>
>> n_purge            236369        .    N total active purges
>> n_purge_add        236388       2.31  N new purges added
>> n_purge_retire         19       0.00  N old purges deleted
>> n_purge_obj_test  1651452      16.12  N objects tested
>> n_purge_re_test 5052057513 49316.27  N regexps tested against
>> n_purge_dups            0       0.00  N duplicate purges removed
>>
>> each uptick is when i add 100k new purge records. you can see
>> what will happen soon.
>>
>> http://wrenchies.net/~karl/Automator/ScreenShots/20090818160328.png
>>
>> We really want to take advantage of this style of essentially
>> having static html served by varnish and flush it out when it
>> changes.
>>
>> Does any one have advice on how to do this?
>>
>> Originally we had implemented this using the vlc to set the ttl
>> to 0 but with all the combinations of accept-encoding that are
>> possible we were getting many things not being purged from the cache.
>>
>> Another thought would be to refetch the page on change instead of
>> purging it but that has the same problem with accept-encoding.
>>
>> after 36 hours between the 2 machines we have collected 1.3M
>> objects in the cache and have not even come close to running out
>> of space. We would actually like to increase our ttl for the
>> cached objects even longer.
>>
>> I hope someone can help me out here.
>>
>> -Karl Pietri
>> _______________________________________________
>> varnish-misc mailing list
>> varnish-misc at projects.linpro.no
>> http://projects.linpro.no/mailman/listinfo/varnish-misc
>

From phk at phk.freebsd.dk Wed Aug 19 10:54:48 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Wed, 19 Aug 2009 10:54:48 +0000
Subject: Thread pools
In-Reply-To: Your message of "Mon, 17 Aug 2009 17:55:39 MST." <294d5daa0908171755y44f5c132o587f3c8188493a4@mail.gmail.com>
Message-ID: <1886.1250679288@critter.freebsd.dk>

In message <294d5daa0908171755y44f5c132o587f3c8188493a4 at mail.gmail.com>, Mark Moseley writes:
>I've seen various things in the wiki and threads on this list talking
>about thread pools. In general, the advice is typically conservative,
>i.e. don't use more than the default 2 thread pools unless you have
>to. I've also seen the occasional comment suggesting one run as many
>thread pools as there are cores/cpus.

I think the point here is "don't make 1000 or even 100 pools".

One pool per core should be all you need to practically eliminate
thread contention, but to truly realize this, we would have to pin
pools on cores and other nasty and often backfiring "optimizations".

Having a few too many pools probably does not hurt too much, but
may increase the thread create/kill ratio a bit.

>Also, the wiki mentions that it's mainly appropriate when you run into
>locks tying things up. Is that mainly a case of high LRU turnover or
>are there other scenarios where locking is an issue? What are the
>symptoms of locking becoming an issue with the current configuration
>and what fields in varnishstat should I be looking at?

POSIX unfortunately does not offer any standard tools for analysing
lock behaviour, so we have had to make some pretty crude ones
ourselves.

The main sign of lock issues is that the number of context switches
increases drastically; your OS can probably give you a view of that
number.
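To make that concrete, on a Linux box the following is usually enough to
see both sides of it; the varnishstat field names are the ones that show
up in the dumps elsewhere in this thread:

    $ vmstat 1                                  # the "cs" column is context switches per second
    $ varnishstat -1 | egrep 'n_wrk|shm_cont'   # queued/overflowed worker threads, SHM mutex contention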
If you want to go deeper, we have a flag in diag_bitmaps that enables
shmlogging of lock contentions (or even all lock operations), together
with varnishtop suitably filtered, that gives a good idea which locks
we have trouble with.

>but I worry about wasting any CPU% on those 8 core 1950s [...]

Varnish is all about wasting CPU%; normally we barely touch the
CPUs, and many systems run with 80-90% idle CPUs.

Have you played a bit with varnishhist? I suspect that may be
the most sensitive, if crude, indicator of overall performance
we can offer.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From phk at phk.freebsd.dk Wed Aug 19 15:51:26 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Wed, 19 Aug 2009 15:51:26 +0000
Subject: Varnish, long lived cache and purge on change
In-Reply-To: Your message of "Wed, 19 Aug 2009 08:57:42 +0100." <4A8BB076.50009@gmail.com>
Message-ID: <2837.1250697086@critter.freebsd.dk>

In message <4A8BB076.50009 at gmail.com>, Rob S writes:
>phk and other deep Varnish developers,
>
>Do you think it'd ever be viable to have a sort of process that goes
>through the tail of the purge queue and applies the purges then deletes
>them from the queue? If so, how much work would it be to implement?

Right now we do not know which objects hold onto a ban, only the
number of objects that do. To implement it, we would need to put a
linked list in each ban and wire the referencing objects onto it.

My only worry is that it adds a linked list to the objcore structure
taking it from 88 to 104 bytes.

I seem to recall that the locking is benign.

Probably the more interesting question is how aggressive you want it to
be: if it is too militant, it will cause a lot of needless disk activity.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From rtshilston at gmail.com Wed Aug 19 16:00:10 2009
From: rtshilston at gmail.com (Rob S)
Date: Wed, 19 Aug 2009 17:00:10 +0100
Subject: Varnish, long lived cache and purge on change
In-Reply-To: <2837.1250697086@critter.freebsd.dk>
References: <2837.1250697086@critter.freebsd.dk>
Message-ID: <4A8C218A.3000105@gmail.com>

Poul-Henning Kamp wrote:
> My only worry is that it adds a linked list to the objcore structure
> taking it from 88 to 104 bytes.
>
I realise this could be undesirable, but at the moment varnish is
proving quite difficult to use in sites that frequently purge, with
different users adding their own workarounds (versioning URLs,
restarting Varnish, tweaking the hash key etc). Everything is a trade
off, but I think it's desirable to increase the memory footprint per
object so as to not bring down the server with massive memory growth.

> Probably the more interesting question is how aggressive you want it to
> be: if it is too militant, it will cause a lot of needless disk activity

I feel that some sort of hysteresis on the size of the purge list would
make most sense, perhaps starting to process if the list exceeds more
than X bytes, and stopping when the list is < Y bytes.

Having thought a little more about this, I realise I don't know whether
graced requests respect bans. If they don't, then processing the ban
list will change Varnish's behaviour.
Rob From phk at phk.freebsd.dk Wed Aug 19 16:16:50 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Wed, 19 Aug 2009 16:16:50 +0000 Subject: Varnish, long lived cache and purge on change In-Reply-To: Your message of "Wed, 19 Aug 2009 17:00:10 +0100." <4A8C218A.3000105@gmail.com> Message-ID: <2993.1250698610@critter.freebsd.dk> In message <4A8C218A.3000105 at gmail.com>, Rob S writes: >Having thought a little more about this, I realise I don't know whether >graced requests respect bans. If they don't, then processing the ban >list will change Varnish's behaviour. Bans are always respected. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From phk at phk.freebsd.dk Wed Aug 19 21:04:33 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Wed, 19 Aug 2009 21:04:33 +0000 Subject: Varnish, long lived cache and purge on change In-Reply-To: Your message of "Wed, 19 Aug 2009 15:51:26 GMT." <2837.1250697086@critter.freebsd.dk> Message-ID: <30191.1250715873@critter.freebsd.dk> In message <2837.1250697086 at critter.freebsd.dk>, "Poul-Henning Kamp" writes: >In message <4A8BB076.50009 at gmail.com>, Rob S writes: Just to follow up to myself after trying to hack up a solution in -trunk: >I seem to recall that the locking is benign. Make that "Mostly benign" :-) >Probably the more interesting question is how aggressive you want it to >be: if it is too militant, it will cause a lot of needless disk activity. There was actually a far more interesting question, or rather issue: The lurker thread does not have a HTTP request. That means that we can not evaluate a ban test like "req.url ~ foo": we simply don't have a req.url to compare with. So provided you only have obj.* tests in your bans, it is possible, for req.* tests it is a no go... The obvious workaround is evident, store the req.* fields you need in obscure obj.* headers (possibly stripping them in vcl_deliver). With that caveat, give r4206 a shot if you dare... -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From moseleymark at gmail.com Fri Aug 21 22:44:41 2009 From: moseleymark at gmail.com (Mark Moseley) Date: Fri, 21 Aug 2009 15:44:41 -0700 Subject: Thread pools In-Reply-To: <1886.1250679288@critter.freebsd.dk> References: <294d5daa0908171755y44f5c132o587f3c8188493a4@mail.gmail.com> <1886.1250679288@critter.freebsd.dk> Message-ID: <294d5daa0908211544o4f89d1a6s2d7d580b15358828@mail.gmail.com> On Wed, Aug 19, 2009 at 3:54 AM, Poul-Henning Kamp wrote: > In message <294d5daa0908171755y44f5c132o587f3c8188493a4 at mail.gmail.com>, Mark M > oseley writes: >>I've seen various things in the wiki and threads on this list talking >>about thread pools. In general, the advice is typically conservative, >>i.e. don't use more than the default 2 thread pools unless you have >>to. I've also seen the occasional comment suggesting one run as many >>thread pools as there are cores/cpus. > > I think the point here is "don't make 1000 or even 100 pools". > > One pool per core should be all you need to practically eliminate > thread contention, but to truly realize this, we would have to pin > pools on cores and other nasty and often backfiring "optimizations". 
> Having a few too many pools probably does not hurt too much, but
> may increase the thread create/kill ratio a bit.
>
>>Also, the wiki mentions that it's mainly appropriate when you run into
>>locks tying things up. Is that mainly a case of high LRU turnover or
>>are there other scenarios where locking is an issue? What are the
>>symptoms of locking becoming an issue with the current configuration
>>and what fields in varnishstat should I be looking at?
>
> POSIX unfortunately does not offer any standard tools for analysing
> lock behaviour, so we have had to make some pretty crude ones
> ourselves.
>
> The main sign of lock issues is that the number of context switches
> increases drastically; your OS can probably give you a view of that
> number.
>
> If you want to go deeper, we have a flag in diag_bitmaps that enables
> shmlogging of lock contentions (or even all lock operations), together
> with varnishtop suitably filtered, that gives a good idea which locks
> we have trouble with.
>
>>but I worry about wasting any CPU% on those 8 core 1950s [...]
>
> Varnish is all about wasting CPU%; normally we barely touch the
> CPUs, and many systems run with 80-90% idle CPUs.
>
> Have you played a bit with varnishhist? I suspect that may be
> the most sensitive, if crude, indicator of overall performance
> we can offer.

This is all excellent info, lots to chew on. Thanks! I've stared at
varnishhist a number of times, and both our hits and misses are showing
up in pretty expected places.

From h.paulissen at qbell.nl Sat Aug 22 16:17:10 2009
From: h.paulissen at qbell.nl (Henry Paulissen)
Date: Sat, 22 Aug 2009 18:17:10 +0200
Subject: Thank you varnish team
Message-ID: <007001ca2344$0099ace0$01cd06a0$@paulissen@qbell.nl>

Because there is no other real place to do this, I'm going to clutter
this mailing list a bit: I would like to thank everybody who is involved
in the development of varnish. It's a super product and its performance
is outstanding (especially if you're used to squid).

At first we struggled a bit with the config, but it sure is flexible and
highly customizable.

We are currently doing 5000 connections per second on a regular day
(regular static images), and 500 connections per second for big photos
(1280x1024). Both run as separate varnish daemons on the same physical
host.

Before this setup we used 2 physical lighttpd servers to serve all the
images, but at the busiest hours it was a bit laggy and load times
varied from 200ms to 5s per image. Most likely this is due to the fact
that lighttpd has some threading problems (it only uses 1 thread).

With our new varnish setup we have 2 physical servers serving cache-miss
images and a varnish server that caches them. We chose 2 backend servers
for redundancy; one server could serve all the images. In the near
future we will move to more varnish servers to add redundancy, and maybe
we will build a CDN with it.

Varnish server:
Intel XEON 3.2GHZ
4GB Memory

CPU Load:
Cpu0 : 2.1% us, 1.1% sy, 0.0% ni, 96.3% id, 0.4% wa, 0.0% hi, 0.0% si
Cpu1 : 0.5% us, 0.4% sy, 0.0% ni, 98.6% id, 0.5% wa, 0.0% hi, 0.0% si
Cpu2 : 2.8% us, 0.9% sy, 0.0% ni, 96.2% id, 0.1% wa, 0.0% hi, 0.0% si
Cpu3 : 0.3% us, 0.2% sy, 0.0% ni, 99.1% id, 0.3% wa, 0.0% hi, 0.0% si

Maybe I could install boinc on it, so it can crunch some spare CPU to
find ET :p.

Keep up the development. But watch out that you aren't making a huge,
laggy squid out of it, with features nobody is using ;).

My customer prefers to stay anonymous.
But I can say he's in the top 500 of the Alexa world ranking.

Regards,
Henry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From v.bilek at 1art.cz Thu Aug 27 06:58:13 2009
From: v.bilek at 1art.cz (=?UTF-8?B?VsOhY2xhdiBCw61sZWs=?=)
Date: Thu, 27 Aug 2009 08:58:13 +0200
Subject: bad national characters in synthetic
Message-ID: <4A962E85.8040809@1art.cz>

Hello,
I have tried to set up localized error pages, but varnish seems to
return screwed-up national characters.

Part of the VCL:

set obj.http.Content-Type = "text/html; charset=utf-8";
synthetic {"
Děkujeme za pochopení, Tým

"}; returned from browser:
D?77777704?77777633kujeme za pochopen?77777703?77777655, T?77777703?77777675m

Any suggestions?

Vasek Bilek

From phk at phk.freebsd.dk Thu Aug 27 07:46:09 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Thu, 27 Aug 2009 07:46:09 +0000
Subject: bad national characters in synthetic
In-Reply-To: Your message of "Thu, 27 Aug 2009 08:58:13 +0200." <4A962E85.8040809@1art.cz>
Message-ID: <34372.1251359169@critter.freebsd.dk>

>Any suggestions?

File a bugreport.

Use &...; syntax until I have time to look at it.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From gerald.leier at lixto.com Thu Aug 27 14:29:13 2009
From: gerald.leier at lixto.com (Gerald Leier)
Date: Thu, 27 Aug 2009 16:29:13 +0200
Subject: can't get restart; to fetch and deliver from other backend on HTTP error
Message-ID: <1251383353.21431.108.camel@pioneer>

hello,

I want to use varnish to "hide" HTTP 50x codes. As far as I understand,
this is done in "sub vcl_fetch".

After setting up 2 servers (one returning the requested page, the other
returning 500 errors) I tested a bit, but I have some bug in there that
I can't get a grip on. After the first node returns an HTTP 500 error,
varnish continues with the second node. Here is the part where it stops
doing what I want:

.....
10 VCL_return c restart
10 VCL_call c recv
10 VCL_return c pass
10 VCL_call c pass
10 VCL_return c pass
11 BackendOpen b test2 10.10.1.1 51154 10.10.1.50 38080
10 Backend c 11 www_director test2
11 TxRequest b POST
11 TxURL b /testservice
11 TxProtocol b HTTP/1.1
11 TxHeader b User-Agent: curl
11 TxHeader b Host: 10.10.1.1:38080
11 TxHeader b Accept: */*
11 TxHeader b Content-Type: text/xml;charset=UTF-8
11 TxHeader b Content-Length: 326
11 TxHeader b X-Varnish: 614647428
11 TxHeader b X-Forwarded-For: 10.21.1.40
11 BackendClose b test2
10 VCL_call c error
10 VCL_return c deliver
.....

Has anyone a hint on what I am doing wrong?

thanks
gerald

.....
./varnishd -f ../etc/varnish/testing.vcl -a 10.10.1.1:38080 ../etc/varnish/testing.vcl: backend test1 { .host = "10.10.1.50"; .port = "38080"; } backend test2 { .host = "10.10.1.60"; .port = "38080"; } director www_director round-robin { { .backend = test1; } { .backend = test2; } } sub vcl_recv { # Force lookup if the request is a no-cache request from the client if (req.http.Cache-Control ~ "no-cache") { purge_url(req.url); } if (req.restarts == 0) { set req.backend = www_director; } else { set req.backend = www_director; } if (req.request != "GET" && req.request != "HEAD") { /* We only deal with GET and HEAD by default */ return (pass); } if (req.http.Authorization || req.http.Cookie) { /* Not cacheable by default */ return (pass); } # set an artificial header to pass on the true client IP address remove req.http.X-Varnish-Client-IP; set req.http.X-Varnish-Client-IP = client.ip; } sub vcl_fetch { if (obj.status == 500 || obj.status == 503 || obj.status == 504) { restart; } } sub vcl_deliver { # hide existence of this proxy server remove resp.http.X-Varnish; remove resp.http.Via; deliver; } ------------------------------------------------------------------------ log: ------------------------------------------------------------------------ 10 SessionOpen c 10.21.1.40 7984 10.10.1.1:38080 10 ReqStart c 10.21.1.40 7984 614647428 10 RxRequest c POST 10 RxURL c /testservice 10 RxProtocol c HTTP/1.1 10 RxHeader c User-Agent: curl 10 RxHeader c Host: 10.10.1.1:38080 10 RxHeader c Accept: */* 10 RxHeader c Content-Type: text/xml;charset=UTF-8 10 RxHeader c Content-Length: 326 10 VCL_call c recv 10 VCL_return c pass 10 VCL_call c pass 10 VCL_return c pass 11 BackendOpen b test1 10.10.1.1 60016 10.10.1.60 38080 10 Backend c 11 www_director test1 11 TxRequest b POST 11 TxURL b /testservice 11 TxProtocol b HTTP/1.1 11 TxHeader b User-Agent: curl 11 TxHeader b Host: 10.10.1.1:38080 11 TxHeader b Accept: */* 11 TxHeader b Content-Type: text/xml;charset=UTF-8 11 TxHeader b Content-Length: 326 11 TxHeader b X-Varnish: 614647428 11 TxHeader b X-Forwarded-For: 10.21.1.40 0 CLI - Rd ping 0 CLI - Wr 0 200 PONG 1251380397 1.0 0 CLI - Rd ping 0 CLI - Wr 0 200 PONG 1251380400 1.0 11 RxProtocol b HTTP/1.1 11 RxStatus b 500 11 RxResponse b Internal Server Error 11 RxHeader b X-Powered-By: Servlet/2.5 11 RxHeader b Server: Sun GlassFish Enterprise Server 11 RxHeader b Content-Type: text/xml;charset="utf-8" 11 RxHeader b Transfer-Encoding: chunked 11 RxHeader b Date: Thu, 27 Aug 2009 13:39:56 GMT 11 RxHeader b Connection: close 10 ObjProtocol c HTTP/1.1 10 ObjStatus c 500 10 ObjResponse c Internal Server Error 10 ObjHeader c X-Powered-By: Servlet/2.5 10 ObjHeader c Server: Sun GlassFish Enterprise Server 10 ObjHeader c Content-Type: text/xml;charset="utf-8" 10 ObjHeader c Date: Thu, 27 Aug 2009 13:39:56 GMT 11 BackendClose b test1 10 TTL c 614647428 RFC 120 1251380396 0 0 0 0 10 VCL_call c fetch 10 VCL_return c restart 10 VCL_call c recv 10 VCL_return c pass 10 VCL_call c pass 10 VCL_return c pass 11 BackendOpen b test2 10.10.1.1 51154 10.10.1.50 38080 10 Backend c 11 www_director test2 11 TxRequest b POST 11 TxURL b /testservice 11 TxProtocol b HTTP/1.1 11 TxHeader b User-Agent: curl 11 TxHeader b Host: 10.10.1.1:38080 11 TxHeader b Accept: */* 11 TxHeader b Content-Type: text/xml;charset=UTF-8 11 TxHeader b Content-Length: 326 11 TxHeader b X-Varnish: 614647428 11 TxHeader b X-Forwarded-For: 10.21.1.40 11 BackendClose b test2 10 VCL_call c error 10 VCL_return c deliver 10 Length c 465 10 VCL_call c deliver 10 
VCL_return c deliver 10 TxProtocol c HTTP/1.1 10 TxStatus c 503 10 TxResponse c Service Unavailable 10 TxHeader c Server: Varnish 10 TxHeader c Retry-After: 0 10 TxHeader c Content-Type: text/html; charset=utf-8 10 TxHeader c Content-Length: 465 10 TxHeader c Date: Thu, 27 Aug 2009 13:40:01 GMT 10 TxHeader c Age: 5 10 TxHeader c Connection: close 10 ReqEnd c 614647428 1251380396.585108042 1251380401.618464947 0.000111103 5.033315897 0.000041008 10 SessionClose c error From maillists0 at gmail.com Fri Aug 28 10:20:40 2009 From: maillists0 at gmail.com (maillists0 at gmail.com) Date: Fri, 28 Aug 2009 06:20:40 -0400 Subject: cache location Message-ID: Newbie question. I've searched the docs (which are great, btw), but I can't find this. Apologies if it's obvious... a pointer to the right place would be much appreciated. I need to tell Varnish to keep its disk cache on a specific device, and I'd also like to limit the size. Where do I control the location and size of the cache? -------------- next part -------------- An HTML attachment was scrubbed... URL: From phk at phk.freebsd.dk Fri Aug 28 11:54:27 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Fri, 28 Aug 2009 11:54:27 +0000 Subject: cache location In-Reply-To: Your message of "Fri, 28 Aug 2009 06:20:40 -0400." Message-ID: <4335.1251460467@critter.freebsd.dk> In message , maill ists0 at gmail.com writes: >I need to tell Varnish to keep its disk cache on a specific device, and I'd >also like to limit the size. Where do I control the location and size of the >cache? -sfile,$filename,$size For instance: -sfile,/tmp/foo,1g -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From pubcrawler.com at gmail.com Sat Aug 29 04:09:15 2009 From: pubcrawler.com at gmail.com (pub crawler) Date: Sat, 29 Aug 2009 00:09:15 -0400 Subject: Varnish memory consumption issues? Message-ID: <4c3149fb0908282109w59772378x4a3c40a31e0a5032@mail.gmail.com> Hello, new to Varnish. We have been running Varnish for about 5 days now. So far, excellent product. We have a potential issue and I haven't seen anything like this before. We just restarted Varnish - we have a 1GB cache file on disk. When I run top, I see Varnish is using : 10m RES 153g VIRT Why is Varnish reporting this sort of VIRT memory? How do we correct that usage number or reset it? From m at ooh.dk Sat Aug 29 07:28:09 2009 From: m at ooh.dk (=?UTF-8?B?TWlra2VsIEjDuGdo?=) Date: Sat, 29 Aug 2009 09:28:09 +0200 Subject: Drupal.org now powered by Varnish Message-ID: <52d800cd0908290028p6bc28751n63cd47136000983b@mail.gmail.com> As another Varnish success story, I thought I'd mention that open source CMS Drupal has replaced Squid with Varnish with great results: http://nnewton.org/node/9 -- Kind regards, Mikkel H?gh From stuart.yeates at vuw.ac.nz Mon Aug 31 01:11:33 2009 From: stuart.yeates at vuw.ac.nz (stuart yeates) Date: Mon, 31 Aug 2009 13:11:33 +1200 Subject: behaviour when client drops the connection? Message-ID: <4A9B2345.3030402@vuw.ac.nz> Hello I'm looking to use varnish in front of an XSLT/PostgreSQL/Tomcat/Java site and have a question about varnish's behaviour when a client drops a connection. The problem is that there are a relatively small number of pages which are _very_ slow to generate (>60 minutes) and a large number that are relatively fast. 
Clients tend to abort the slow pages, so (using our current reverse
proxy) they never make it into the cache. Ideally I'd like to make
varnish continue the request and cache the result. Is that possible?

Is there a way to make these slow-to-generate pages have a much longer
cache-time?

cheers
stuart
--
Stuart Yeates
http://www.nzetc.org/ New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/ Institutional Repository

From david.birdsong at gmail.com Mon Aug 31 06:11:59 2009
From: david.birdsong at gmail.com (David Birdsong)
Date: Sun, 30 Aug 2009 23:11:59 -0700
Subject: forcing cache under original url
Message-ID: 

I'm trying to wrap my brain around the VCL needed to do what I'm
thinking of, but it'd be helpful to know up front if what I'm trying
to achieve isn't possible.

I'm hoping to provide a backend of last resort for certain types of
requests, for the cases of:

1. an actual down backend
2. where obj.status != 200 && req.request == "GET" && req.url ~
"\.(gif|jpg|swf|css|js|png|jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|js|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)"

Case 2 means I can't rely on probes to set my backend to the backend of
last resort.

Here's the pertinent VCL:

sub vcl_fetch {
        # force a minimum ttl of one week (604800s)
        if (obj.ttl < 604800s) {
                set obj.ttl = 604800s;
        }
        if (obj.status != 200) {
          if (req.restarts > 0) {
            deliver;
          }
          if (req.request == "GET" && req.url ~
"\.(gif|jpg|swf|css|js|png|jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|js|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$") {
            restart;
          } else {
            deliver;
          }
        }
}

sub vcl_recv {
        if (req.restarts == 0) {
          if (req.http.host ~ "^img3\.(.*)$") {
                  set req.backend = b0;
          } else {
                  error 750 "invalid host header";
          }
        }
        if (req.request == "GET" && req.url ~
"\.(gif|jpg|swf|css|js|png|jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|js|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$") {
          if (! req.backend.healthy && req.restarts == 0) {
            set req.backend = last_resort_backend;
          }
          lookup;
        }
        lookup;
}

I'm able to get varnish to re-send a failed request to
last_resort_backend. I'm still working on the necessary VCL to follow
the 2 redirects that the last_resort_backend will send; I'd like
varnish to complete these and not pass them on to the client.

What I'd like to do is instruct it to cache the static file under the
original url, not the url of the last 302.

My inexperience with varnish has me thinking about having to set
req.url first for the 301 and then again for the 302, combined with
restarts, which means I'd have to store the original url somewhere and
set it back again. Is this type of thing possible?

Apologies if this is not clear.

From david.birdsong at gmail.com Mon Aug 31 06:52:56 2009
From: david.birdsong at gmail.com (David Birdsong)
Date: Sun, 30 Aug 2009 23:52:56 -0700
Subject: forcing cache under original url
In-Reply-To: 
References: 
Message-ID: 

Actually, I've thought better of getting in the way of all the
redirects. I can just hand all the objects back to the clients and let
varnish cache objects as they go past. It just works.
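For the record, the "store the original url somewhere and set it back
again" idea from the first mail maps naturally onto a request header
that survives restarts; a rough, untested sketch, where X-Orig-Url is an
invented header name:

    sub vcl_recv {
        if (req.restarts == 0) {
            set req.http.X-Orig-Url = req.url;   # remember the URL of the first attempt
        } else {
            set req.url = req.http.X-Orig-Url;   # look up and cache under the original URL
        }
    }

Since the default hash is built from req.url, setting it back before
lookup would file the fetched object under the first URL rather than the
last redirect target.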
On Sun, Aug 30, 2009 at 11:11 PM, David Birdsong wrote:
> I'm trying to wrap my brain around the VCL needed to do what I'm
> thinking of, but it'd be helpful to know up front if what I'm trying
> to achieve isn't possible.
> [...]
> I'm able to get varnish to re-send a failed request to
> last_resort_backend. I'm still working on the necessary VCL to follow
> the two redirects that the last_resort_backend will send - I'd like
> varnish to complete these and not pass them on to the client.
>
> What I'd like to do is instruct it to cache the static file under the
> original url, not the url of the last 302.
> [...]

From david.birdsong at gmail.com  Mon Aug 31 12:36:51 2009
From: david.birdsong at gmail.com (David Birdsong)
Date: Mon, 31 Aug 2009 05:36:51 -0700
Subject: many workers threads failed with EAGAIN
Message-ID:

varnishlog has a lot of these:

    0 Debug        - "Create worker thread failed 11 Resource temporarily unavailable"

Sure enough, overflowed and dropped work requests are steadily on the
rise:

Hitrate ratio:       10      100     1000
Hitrate avg:     0.8584   0.8506   0.8581

     1123681       423.11       380.14 Client connections accepted
      706252       189.05       238.92 Client requests received
      587166       159.04       198.64 Cache hits
        6928         3.00         2.34 Cache hits for pass
      111419        27.01        37.69 Cache misses
      119397        30.01        40.39 Backend conn. success
           2         0.00         0.00 Backend conn. not attempted
        5744            .            . N struct sess_mem
        3663            .            . N struct sess
       95290            .            . N struct object
       95912            .            . N struct objecthead
           6            .            . N struct vbe_conn
        1007            .            . N worker threads
        1007         0.00         0.34 N worker threads created
       12425         4.00         4.20 N worker threads not created
        2002         1.00         0.68 N queued work requests
      651706       157.04       220.47 N overflowed work requests
      460396       262.07       155.75 N dropped work requests
           9            .            . N backends
       16469            .            . N LRU nuked objects
           0            .            . N LRU moved objects
           5         0.00         0.00 HTTP header overflows
      580405       158.04       196.35 Objects sent with write
      674521       186.05       228.19 Total Sessions
      706252       189.05       238.92 Total Requests
         904         0.00         0.31 Total pipe
        6928         3.00         2.34 Total pass
      118116        30.01        39.96 Total fetch
   189339016     50665.57     64052.44 Total header bytes
 49792939291  18055006.14  16844702.06 Total body bytes
      143000        47.01        48.38 Session Closed
         779         0.00         0.26 Session Pipeline
          83         0.00         0.03 Session Read Ahead
      609288       155.04       206.12 Session Linger

My arguments to start varnishd:

ulimit -n 500000
# stack size
ulimit -s 256

/usr/local/sbin/varnishd \
        -T localhost:6082 \
        -f $VCL \
        -s malloc,10GB \
        -P /var/run/varnish.pid \
        -h classic,1000003 \
        -p lru_interval=3600 \
        -p thread_pool_min=500 \
        -p thread_pool_max=2000 \
        -p thread_pools=2 \
        -p obj_workspace=4096 \
        -u apache \
        -p listen_depth=4096

From maillists0 at gmail.com  Mon Aug 31 21:20:52 2009
From: maillists0 at gmail.com (maillists0 at gmail.com)
Date: Mon, 31 Aug 2009 17:20:52 -0400
Subject: Cache configuration
Message-ID:

I'm calling Varnish with the option "-s file,/dir/cachefile,1g". When I
restart it, I see that everything is fetched fresh from the source
server. Is that expected behaviour? Can I make the cache persist across
restarts/reboots?

Related question: how should I plan memory/swap usage? I have millions
of 4-5k objects. I'd like to limit the cache size of a single varnish
server to around 50 gigs. Should I expect that varnish will try to use
that same amount in swap? What about corresponding memory usage?

Also, may I add that varnishlog is the coolest thing since sliced
bread?

From phk at phk.freebsd.dk  Mon Aug 31 21:31:01 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Mon, 31 Aug 2009 21:31:01 +0000
Subject: Cache configuration
In-Reply-To: Your message of "Mon, 31 Aug 2009 17:20:52 -0400."
Message-ID: <45113.1251754261@critter.freebsd.dk>

In message , maillists0 at gmail.com writes:

>I'm calling Varnish with the option "-s file,/dir/cachefile,1g". When
>I restart it, I see that everything is fetched fresh from the source
>server. Is that expected behaviour? Can I make the cache persist
>across restarts/reboots?

I am working on a persistent storage module as we speak...

>Related question: how should I plan memory/swap usage? I have millions
>of 4-5k objects. I'd like to limit the cache size of a single varnish
>server to around 50 gigs. Should I expect that varnish will try to use
>that same amount in swap? What about corresponding memory usage?

If you have millions of objects, you will want to tune the parameter
obj_workspace as far down as you sensibly can, to reduce overhead.

As for memory use, I do not have an exact formula I can give you, but
most of the non-storage-file space used is proportional to the number
of objects, and not very tweakable.

>Also, may I add that varnishlog is the coolest thing since sliced
>bread?

Have you tried varnishtop ?
:-)

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From phk at phk.freebsd.dk  Mon Aug 31 21:41:50 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Mon, 31 Aug 2009 21:41:50 +0000
Subject: many workers threads failed with EAGAIN
In-Reply-To: Your message of "Mon, 31 Aug 2009 05:36:51 MST."
Message-ID: <45161.1251754910@critter.freebsd.dk>

In message , David Birdsong writes:

>varnishlog has a lot of these:
>
>    0 Debug        - "Create worker thread failed 11 Resource temporarily unavailable"
>
>Sure enough, overflowed and dropped work requests are steadily on the
>rise:
>
>Hitrate ratio:       10      100     1000
>Hitrate avg:     0.8584   0.8506   0.8581
>
>        1007            .            . N worker threads
>        1007         0.00         0.34 N worker threads created
>       12425         4.00         4.20 N worker threads not created
>        2002         1.00         0.68 N queued work requests
>      651706       157.04       220.47 N overflowed work requests
>      460396       262.07       155.75 N dropped work requests

It is not clear to me why you ended up needing so many threads, but the
usual explanation is comms problems in either the client or backend
direction.

If you have not enabled backend probing, you should do so, since that
prevents the threads from getting stuck on a troubled backend.

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From phk at phk.freebsd.dk  Mon Aug 31 21:47:11 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Mon, 31 Aug 2009 21:47:11 +0000
Subject: Raising funds for developing Varnish
Message-ID: <45219.1251755231@critter.freebsd.dk>

Software development may be open, and the result shared with an Open
Source Software licence, but the actual hours of programming are not
gratis.

Just like everybody else, I need money for mortgage, kids and food -
money I make by doing things with computers, for people who are willing
to pay for that.

One of the things I do for money is develop Varnish.

From the very beginning of the project, the Norwegian company
Redpill-Linpro has channeled money from sponsors and customers and paid
for my time.

Redpill-Linpro also offers commercial services based on Varnish:
consultancy, hosting, support and other services.

That way, the companies which have a contract with Redpill-Linpro help
pay for the future development of Varnish.

For all practical purposes this has worked great until now.

Unfortunately, some obscure tax rules make it pretty nasty for me and
my accountant: if more than 30% of my work is for the same customer in
a Nordic country, I may be deemed an employee of said company, with
possible double taxation and other unpleasant paperwork as a result.

This effectively puts a cap on the amount of work I can do for
Redpill-Linpro and consequently on Varnish.

It is my impression that a fair number of Varnish users are not likely
to need the professional services of Redpill-Linpro, and are thus
unlikely to help pay for future Varnish development via that route.

This is where the "Varnish Moral License" comes into the picture:

The Varnish Moral License is a voluntary license payment, directly to
the author of Varnish, which helps pay for the development of Varnish.
Buying a Varnish Moral License is 100% voluntary; if you do not make
money from your website, there is no reason why you should pay for a
license to use Varnish on it.

If, however, Varnish helps your website generate a profit, you should
consider getting a Varnish Moral License.

In all cases, it is entirely up to you (and your morals) whether you
should get a license or not.

That is why I called it a "Moral License".

Please buy one.

More details and FAQ at: http://phk.freebsd.dk/VML/

Poul-Henning Kamp

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From david.birdsong at gmail.com  Mon Aug 31 22:10:08 2009
From: david.birdsong at gmail.com (David Birdsong)
Date: Mon, 31 Aug 2009 15:10:08 -0700
Subject: many workers threads failed with EAGAIN
In-Reply-To: <45161.1251754910@critter.freebsd.dk>
References: <45161.1251754910@critter.freebsd.dk>
Message-ID:

I'm caching a pretty large working set of small objects, so I'm pretty
sure I need a large number of threads. I started out with:

# -p thread_pool_min not specified
-p thread_pool_max=1000 \
-p thread_pools=2 \

...but quickly saw many overflowed work requests. My estimates are
rough, but this was measured after a few minutes with varnish in front
of two backends that normally serve between 700-1100 requests/sec of
about 100k to 300k unique objects per day (my number of objects could
be higher; I'm uncertain).

So before posting I had completely dwelled on a max-threads knob in
Linux, which was already set quite high:

cat /proc/sys/kernel/threads-max
143360

I then thought more about how threads get their own process entries in
Linux, and so increased max user processes (ulimit -u) to 71680. No
more "Create worker thread failed 11 Resource temporarily unavailable"
messages.

Now varnish consumes the 8 GB of RAM until it starts swapping the last
2 GB; IO wait consumes the box, the cache hit ratio eventually
degrades, and varnishd wedges... all traffic from the box zeroes out.

On Mon, Aug 31, 2009 at 2:41 PM, Poul-Henning Kamp wrote:
> It is not clear to me why you ended up needing so many threads, but
> the usual explanation is comms problems in either the client or
> backend direction.
>
> If you have not enabled backend probing, you should do so, since that
> prevents the threads from getting stuck on a troubled backend.
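A sizing observation on the command line earlier in this thread, not an
official rule: "-s malloc,10GB" on an 8 GB machine guarantees the
process outgrows physical RAM once the cache fills, since per-object
overhead comes on top of the storage size. A sketch of a more
conservative invocation for an 8 GB box - the 5GB figure is an
assumption, chosen to leave headroom for object overhead and the OS:

/usr/local/sbin/varnishd \
        -T localhost:6082 \
        -f $VCL \
        -s malloc,5GB \
        -P /var/run/varnish.pid \
        -p thread_pool_min=500 \
        -p thread_pool_max=2000 \
        -p thread_pools=2 \
        -u apache

Keeping storage below physical RAM trades a few extra cache misses for
never touching swap, which the symptoms described in this thread
suggest is by far the worse of the two.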
From jlevine at iwin.com  Mon Aug 31 22:24:11 2009
From: jlevine at iwin.com (Joshua Levine)
Date: Mon, 31 Aug 2009 15:24:11 -0700
Subject: Raising funds for developing Varnish
In-Reply-To: <45219.1251755231@critter.freebsd.dk>
Message-ID:

Very much respected, and understood. Thank you for a great tool.

Joshua

On 8/31/09 2:47 PM, "Poul-Henning Kamp" wrote:
> This is where the "Varnish Moral License" comes into the picture:
>
> The Varnish Moral License is a voluntary license payment, directly to
> the author of Varnish, which helps pay for the development of
> Varnish.
> [...]
> More details and FAQ at: http://phk.freebsd.dk/VML/

From phk at phk.freebsd.dk  Mon Aug 31 22:42:48 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Mon, 31 Aug 2009 22:42:48 +0000
Subject: many workers threads failed with EAGAIN
In-Reply-To: Your message of "Mon, 31 Aug 2009 15:10:08 MST."
Message-ID: <45411.1251758568@critter.freebsd.dk>

In message , David Birdsong writes:

>...but quickly saw many overflowed work requests. My estimates are
>rough, but this was measured after a few minutes with varnish in front
>of two backends that normally serve between 700-1100 requests/sec of
>about 100k to 300k unique objects per day (my number of objects could
>be higher; I'm uncertain).

I'm surprised if you need 1000 threads for that load; why does it take
so long to serve them ?  (Avg 1 s ?)

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
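The arithmetic behind that question is Little's law: busy threads are
roughly the request rate times the mean time a thread spends on each
request. Working it backwards from the numbers reported earlier in this
thread (a back-of-envelope reading, assuming threads are only busy
while serving):

        busy threads ~= rate x service time
        1007 threads / ~1000 requests/sec  =>  ~1 second per request

At the few milliseconds a cached response should take, the same load
would keep only a handful of threads busy, which is why a high thread
count points at requests stalling on slow clients or troubled backends
rather than at raw request volume.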
From david.birdsong at gmail.com  Mon Aug 31 23:36:13 2009
From: david.birdsong at gmail.com (David Birdsong)
Date: Mon, 31 Aug 2009 16:36:13 -0700
Subject: many workers threads failed with EAGAIN
In-Reply-To: <45411.1251758568@critter.freebsd.dk>
References: <45411.1251758568@critter.freebsd.dk>
Message-ID:

On Mon, Aug 31, 2009 at 3:42 PM, Poul-Henning Kamp wrote:
> I'm surprised if you need 1000 threads for that load; why does it
> take so long to serve them ?  (Avg 1 s ?)

Perhaps I do not need so many workers. Since I have your attention,
I'll provide some more concrete numbers.

Each server pushes roughly 600K unique objects per day; an hour of peak
is about 90K unique objects and an hour in the trough is about 50K
objects, where each object is between 60-120 KB.

My goal was to determine the maximum number of backends a single
instance of varnish could handle given that traffic pattern. My blind
guess/hope was 5-7 backends, so I estimated about 3-4.6 million unique
objects per day. But with my parameters, on a dual-core 2.5 GHz Intel
with 8 GB of RAM, 10 GB malloc'd, and a swap partition on a non-system
drive, I was only able to serve traffic for two backends while varnish
consumed the RAM. Once it went into swap, IO wait made varnish
unstable.

I couldn't discern much out of varnishstat to tell me what was going on
under the hood. This is when I started moving thread_pool_max up and
down.

Oh, and I do have health checks on my backends:

backend img10 {
        .host = "vimg10.imageshack.us";
        .port = "80";
        .probe = {
                .url = "/hc.txt";
                .timeout = 0.5 s;
                .window = 8;      # how many probes are examined
                .threshold = 3;   # how many must pass for us to be healthy
                .interval = 3s;   # time between health checks
        }
}

I'm about to try another run with the same backends on a dual quad-core
Xeon with 16 GB of RAM; this time I will probably have varnish malloc
just under the max memory, to avoid so much swapping, which is
seemingly worse than the cache miss.

If you have any suggestions or need me to provide more info, I'd
appreciate your expertise.

From g.wildmann at datamatix.at  Fri Aug 21 14:09:29 2009
From: g.wildmann at datamatix.at (Guenter Wildmann)
Date: Fri, 21 Aug 2009 14:09:29 -0000
Subject: rewrite redirects
Message-ID: <4A8EA68C.7020403@datamatix.at>

Hello!

Can varnish rewrite redirects from the backend (equivalent to
ProxyPassReverse in apache)? If so, how is it done?

-- 
Regards
Guenter
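For reference, the closest VCL analogue of ProxyPassReverse is to
rewrite the Location header on responses as they come back from the
backend. A minimal sketch in vcl_fetch - the hostnames are hypothetical
stand-ins for the backend's internal name and the public site name:

sub vcl_fetch {
        if (obj.http.Location) {
                # map absolute redirects that leak the backend's
                # hostname onto the public hostname before the
                # response is cached and delivered
                set obj.http.Location = regsub(obj.http.Location,
                    "^http://backend\.example\.com/",
                    "http://www.example.com/");
        }
}

regsub() replaces only the first match, which is what a leading-anchor
rewrite like this wants; relative redirects (a bare path in Location)
pass through untouched.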