From phk at phk.freebsd.dk Tue Sep 1 08:59:18 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Tue, 01 Sep 2009 08:59:18 +0000
Subject: many workers threads failed with EAGAIN
In-Reply-To: Your message of "Mon, 31 Aug 2009 16:36:13 MST."
Message-ID: <47325.1251795558@critter.freebsd.dk>

In message , David Birdsong writes:

What do your response times look like in varnishhist ?

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From david.birdsong at gmail.com Tue Sep 1 09:19:09 2009
From: david.birdsong at gmail.com (David Birdsong)
Date: Tue, 1 Sep 2009 02:19:09 -0700
Subject: many workers threads failed with EAGAIN
In-Reply-To: <47325.1251795558@critter.freebsd.dk>
References: <47325.1251795558@critter.freebsd.dk>
Message-ID: 

misses favor 1e-2[s] with a few at 1e-1[s]

after retesting on a 16Gb machine, i'm getting much better results.
but still, after varnish consumes all ram, iowait goes up and process
load skyrockets to something like 2200 processes marked for run.
sometimes varnish recovers in 5 minutes, other times in about 1 minute.
this makes me think that all the threads are piling up waiting for
something like access to the lru list.

when i first started testing tonight, the first pile-up was at 1 hour,
which was what my lru_interval was set to.  i've since reduced it via
varnishadm to 10 mins and it now hangs as soon as ram is consumed
(which takes a little over 10 mins).  i tried re-raising it to 3600 and
even 4200, but it still seems to pile up sooner.

here's what a few graphs look like...
http://img162.imageshack.us/img162/9300/hourlyvarnishhang.png

aside from these pile-ups, i'm almost at a workable state, though i'd
like to increase the backends to 5-6 if possible.
On Tue, Sep 1, 2009 at 1:59 AM, Poul-Henning Kamp wrote:
> In message , David Birdsong writes:
>
> What do your response times look like in varnishhist ?
>
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk at FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
>

From gerald.leier at lixto.com Tue Sep 1 09:26:09 2009
From: gerald.leier at lixto.com (Gerald Leier)
Date: Tue, 01 Sep 2009 11:26:09 +0200
Subject: if obj.status == 50[1|2|3|x] -> reissue request on next backend(s)
Message-ID: <1251797169.13050.48.camel@pioneer>

hello again,

Is it or isn't it possible to make varnish ask another backend if the
first returns an HTTP 500, or any other user-defined HTTP code, when
forwarding a user's request?

and if it's possible -> what's the varnish way to do that?

thanks for ANY answer

lg
gerald
--

From david.birdsong at gmail.com Tue Sep 1 09:44:55 2009
From: david.birdsong at gmail.com (David Birdsong)
Date: Tue, 1 Sep 2009 02:44:55 -0700
Subject: many workers threads failed with EAGAIN
In-Reply-To: 
References: <47325.1251795558@critter.freebsd.dk>
Message-ID: 

ok, this time it's completely wedged.  it's resetting all new tcp
connections to the process.  if i check status through the management
port, it says the child is alive.

On Tue, Sep 1, 2009 at 2:19 AM, David Birdsong wrote:
> misses favor 1e-2[s] with a few at 1e-1[s]
>
> after retesting on a 16Gb machine, i'm getting much better results.
> but still after varnish consumes all ram, iowait goes up, process load
> skyrockets to like 2200 processes marked for run.  sometimes varnish
> recovers in 5 minutes and others in about 1 minute.  this makes me
> think that all the threads are piling up for something like access to
> the lru list.
>
> when i first starting testing tonight, the first pile-up was at 1 hour
> which was what my lru_interval was set to.
> i've since reduced via
> varnishadm to 10 mins and it now hangs as soon as ram is
> consumed (which takes a little over 10 mins).  i tried re-raising it to
> 3600 and even 4200, but it still seems to pile up sooner.
>
> here's what a few graphs look like...
> http://img162.imageshack.us/img162/9300/hourlyvarnishhang.png
>
> aside from these pile-ups, i'm almost to a workable state.  though i'd
> like to increase the backends to 5-6 if possible.
>
> On Tue, Sep 1, 2009 at 1:59 AM, Poul-Henning Kamp wrote:
>> In message , David Birdsong writes:
>>
>> What do your response times look like in varnishhist ?
>>
>> --
>> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
>> phk at FreeBSD.ORG         | TCP/IP since RFC 956
>> FreeBSD committer       | BSD since 4.3-tahoe
>> Never attribute to malice what can adequately be explained by incompetence.
>>
>

From phk at phk.freebsd.dk Tue Sep 1 09:48:25 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Tue, 01 Sep 2009 09:48:25 +0000
Subject: if obj.status == 50[1|2|3|x] -> reissue request on next backend(s)
In-Reply-To: Your message of "Tue, 01 Sep 2009 11:26:09 +0200." <1251797169.13050.48.camel@pioneer>
Message-ID: <59713.1251798505@critter.freebsd.dk>

In message <1251797169.13050.48.camel at pioneer>, Gerald Leier writes:
>hello again,
>
>Is or isnt it possible to make varnish ask another backend
>if the first returns a HTTP 500 or any other user defined
>HTTP code when forwarding a users request?
>
>and if its possible -> whats the varnish way to do that?

Use the "restart" facility, which basically tries the request once more
from the beginning, with any possible modifications you have made or
will make.

Typically, you would set another backend in vcl_recv{}, something like:

...
if (req.restarts > 0) {
    set req.backend = better_one;
} else {
    set req.backend = normal_one;
}

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From gerald.leier at lixto.com Tue Sep 1 15:15:14 2009
From: gerald.leier at lixto.com (Gerald Leier)
Date: Tue, 01 Sep 2009 17:15:14 +0200
Subject: if obj.status == 50[1|2|3|x] -> reissue request on next backend(s)
In-Reply-To: <59713.1251798505@critter.freebsd.dk>
References: <59713.1251798505@critter.freebsd.dk>
Message-ID: <1251818114.16835.96.camel@pioneer>

Hello again,

thanks for the fast answer!

unfortunately i am still stuck.

On Tue, 2009-09-01 at 09:48 +0000, Poul-Henning Kamp wrote:
> In message <1251797169.13050.48.camel at pioneer>, Gerald Leier writes:
> >hello again,
> >
> >Is or isnt it possible to make varnish ask another backend
> >if the first returns a HTTP 500 or any other user defined
> >HTTP code when forwarding a users request?
> >
> >and if its possible -> whats the varnish way to do that?
>
> Use the "restart" facility, which basically tried the request
> once more from the beginning, with any possible modifications
> you have made or will make.
>
> Typically, you would set another backend in vcl_recv{},
> something like:
>
> ...
> if (req.restarts > 0) {
>     set req.backend = better_one;
> } else {
>     set req.backend = normal_one
> }
>

my setup consists of 3 machines:
1 loadbalancer
2 webservers (application servers)

varnish version used:
varnishd (varnish-2.0.4)

i really want to test nothing more than this:
have two content servers.
round-robin select one of them.
if the selected one returns a 50x -> forward the request to the other node.
i don't need caching, i don't need funny rewriting,
i don't need to embed fancy C code into VCL.
nothing like that.
i even checked out and compiled the latest varnish because i thought:
http://varnish.projects.linpro.no/changeset/3948
(have been rolling back to 2.0.4 since then for "my" vcl code; the
2.1... varnish made it even worse)

i don't want to post my whole config again. last time i did that,
no one seemed to think it was worth answering such "massive amounts"
of spam.
(Subject: cant get restart; to fetch and deliver from other backend on
HTTP error)

so here i am, with snippets:
.........
backend test1 {
    .host = "10.10.10.20";
    .port = "80";
}

backend mars1 {
    .host = "10.10.10.30";
    .port = "80";
}
.........

.........
sub vcl_recv {
    if (req.restarts > 0) {
        set req.backend = test1;
    } else {
        set req.backend = test2;
    }
.........

.........
sub vcl_fetch {
    if (obj.status == 500 || obj.status == 503 || obj.status == 504) {
        restart;
    }
.........

=> 503 guru meditation.

any hints, links or examples are very welcome.

i don't like to give up on something i've spent a few days on by now...
at least not now. that of course may change if i don't get this baby
up and running within another few days.

lg
gerald

> --

From v.bilek at 1art.cz Wed Sep 2 12:36:49 2009
From: v.bilek at 1art.cz (Václav Bílek)
Date: Wed, 02 Sep 2009 14:36:49 +0200
Subject: too many clients
Message-ID: <4A9E66E1.1090203@1art.cz>

Hello

I have a problem using varnish in front of a webserver cluster running
around 20K req/s. when i tried to put varnish (4 load-balanced machines,
8 CPU cores each, 32GB ram) in front of this farm, it serves OK for a
few seconds, but then messages like this appear in syslog:

kernel: TCP: drop open request from IP/port

and varnish stops serving. I think it is related to the number of
clients (which was around 30K when the problem appeared) and the request
rate from each of these clients (around 1 request every 5 seconds). I
have tried tuning the kernel TCP stack ... are there any recommendations
on varnish settings for many clients?
Vaclav Bilek

my params:

accept_fd_holdoff 50 [ms]
acceptor default (epoll, poll)
auto_restart on [bool]
backend_http11 on [bool]
between_bytes_timeout 60.000000 [s]
cache_vbe_conns off [bool]
cc_command "exec cc -fpic -shared -Wl,-x -o %o %s"
cli_buffer 8192 [bytes]
cli_timeout 5 [seconds]
client_http11 off [bool]
clock_skew 10 [s]
connect_timeout 0.400000 [s]
default_grace 10
default_ttl 60 [seconds]
diag_bitmap 0x0 [bitmap]
err_ttl 0 [seconds]
esi_syntax 0 [bitmap]
fetch_chunksize 128 [kilobytes]
first_byte_timeout 60.000000 [s]
group nogroup (65534)
listen_address 0.0.0.0:80
listen_depth 10240 [connections]
log_hashstring off [bool]
log_local_address off [bool]
lru_interval 2 [seconds]
max_esi_includes 5 [includes]
max_restarts 4 [restarts]
obj_workspace 8192 [bytes]
overflow_max 100 [%]
ping_interval 3 [seconds]
pipe_timeout 60 [seconds]
prefer_ipv6 off [bool]
purge_dups off [bool]
purge_hash on [bool]
rush_exponent 3 [requests per request]
send_timeout 600 [seconds]
sess_timeout 1 [seconds]
sess_workspace 16384 [bytes]
session_linger 0 [ms]
shm_reclen 255 [bytes]
shm_workspace 8192 [bytes]
srcaddr_hash 1049 [buckets]
srcaddr_ttl 30 [seconds]
thread_pool_add_delay 20 [milliseconds]
thread_pool_add_threshold 2 [requests]
thread_pool_fail_delay 200 [milliseconds]
thread_pool_max 2048 [threads]
thread_pool_min 500 [threads]
thread_pool_purge_delay 1000 [milliseconds]
thread_pool_timeout 10 [seconds]
thread_pools 4 [pools]
user nobody (65534)
vcl_trace off [bool]

stats:

7514 Client connections accepted
35229 Client requests received
27328 Cache hits
57 Cache hits for pass
5396 Cache misses
7901 Backend connections success
0 Backend connections not attempted
0 Backend connections too many
0 Backend connections failures
0 Backend connections reuses
0 Backend connections recycles
0 Backend connections unused
5203 N struct srcaddr
3758 N active struct srcaddr
4520 N struct sess_mem
4298 N struct sess
5433 N struct object
1623 N struct objecthead
6923 N struct smf
0 N small free smf
32 N large free smf
2 N struct vbe_conn
19 N struct bereq
2000 N worker threads
2000 N worker threads created
0 N worker threads not created
0 N worker threads limited
0 N queued work requests
4 N overflowed work requests
0 N dropped work requests
1 N backends
34 N expired objects
0 N LRU nuked objects
0 N LRU saved objects
3239 N LRU moved objects
0 N objects on deathrow
0 HTTP header overflows
0 Objects sent with sendfile
8973 Objects sent with write
0 Objects overflowing workspace
7514 Total Sessions
35225 Total Requests
0 Total pipe
2505 Total pass
7899 Total fetch
3497006 Total header bytes
32744211 Total body bytes
631 Session Closed
0 Session Pipeline
0 Session Read Ahead
0 Session Linger
34676 Session herd
1399198 SHM records
96468 SHM writes
2 SHM flushes due to overflow
10 SHM MTX contention
0 SHM cycles through buffer
10926 allocator requests
6891 outstanding allocations
57233408 bytes allocated
29953720320 bytes free
0 SMA allocator requests
0 SMA outstanding allocations
0 SMA outstanding bytes
0 SMA bytes allocated
0 SMA bytes free
0 SMS allocator requests
0 SMS outstanding allocations
0 SMS outstanding bytes
0 SMS bytes allocated
0 SMS bytes freed
7900 Backend requests made
1 N vcl total
1 N vcl available
0 N vcl discarded
707 N total active purges
741 N new purges added
34 N old purges deleted
38968 N objects tested
504633 N regexps tested against
0 N duplicate purges removed
0 HCB Lookups without lock
0 HCB Lookups with lock
0 HCB Inserts
0 Objects ESI parsed (unlock)
0 ESI parse errors (unlock)

From kristian at redpill-linpro.com Wed Sep 2 13:25:04 2009
From: kristian at redpill-linpro.com (Kristian Lyngstol)
Date: Wed, 2 Sep 2009 15:25:04 +0200
Subject: Varnish memory consumption issues?
In-Reply-To: <4c3149fb0908282109w59772378x4a3c40a31e0a5032@mail.gmail.com>
References: <4c3149fb0908282109w59772378x4a3c40a31e0a5032@mail.gmail.com>
Message-ID: <20090902132503.GB17535@kjeks.linpro.no>

On Sat, Aug 29, 2009 at 12:09:15AM -0400, pub crawler wrote:
> Hello, new to Varnish.
>
> We have been running Varnish for about 5 days now. So far, excellent product.
>
> We have a potential issue and I haven't seen anything like this before.
>
> We just restarted Varnish - we have a 1GB cache file on disk. When I
> run top, I see Varnish is using :
>
> 10m RES
> 153g VIRT

I'll leave a more detailed explanation to PHK or someone more
experienced with the deep magics of the VM system, but the 10m is the
real memory consumption, which I'm sure top/free can confirm.

Though 153G seems very high for VIRT. What I usually see is VIRT = 2x
cache size, or thereabouts, but that might not be conclusive.

Do you care to paste your startup arguments and parameters?

-- 
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497

From pubcrawler.com at gmail.com Wed Sep 2 13:36:25 2009
From: pubcrawler.com at gmail.com (pub crawler)
Date: Wed, 2 Sep 2009 09:36:25 -0400
Subject: Varnish memory consumption issues?
In-Reply-To: <20090902132503.GB17535@kjeks.linpro.no>
References: <4c3149fb0908282109w59772378x4a3c40a31e0a5032@mail.gmail.com> <20090902132503.GB17535@kjeks.linpro.no>
Message-ID: <4c3149fb0909020636p7cc03fb4hdf40c2bb3b275008@mail.gmail.com>

Thanks Kristian for the explanation.

I posted this, and I believe it likely was our issue - perhaps not
starting Cherokee with the proper parameters included. Our configuration
is pretty barebones, default per se.

What is troubling is that 153GB is more resources than this machine has
RAM + disk combined.
Stumped as to how Varnish or the OS would allocate all this memory when
the resources aren't there.

It's worth noting that we changed the command line we use to spawn
Varnish, and Varnish seems to be more reasonable with memory now:

5775 nobody 20 0 4363m 169m 162m S 1 8.6 1:00.03 varnishd

4GB is the size of the cache file we set up for Cherokee, so VIRT is
right on, consumption-wise. RES memory at 169m is quite fine /
reasonable also.

These numbers seem to match what others experience, right?

-Paul

On Wed, Sep 2, 2009 at 9:25 AM, Kristian Lyngstol wrote:
> On Sat, Aug 29, 2009 at 12:09:15AM -0400, pub crawler wrote:
>> Hello, new to Varnish.
>>
>> We have been running Varnish for about 5 days now. So far, excellent product.
>>
>> We have a potential issue and I haven't seen anything like this before.
>>
>> We just restarted Varnish - we have a 1GB cache file on disk. When I
>> run top, I see Varnish is using :
>>
>> 10m RES
>> 153g VIRT
>
> I'll leave a more detail explanation to PHK or someone more experienced
> with the deep magics of a VM, but the 10m is the real memory consumption
> which I'm sure top/free can confirm.
>
> Though 153G seem very high for VIRT. What I usually see is VIRT = 2x cache
> size, or thereabouts, but that might not be conclusive.
>
> Do you care to paste your startup arguments and parameters?
>
> --
> Kristian Lyngstøl
> Redpill Linpro AS
> Tlf: +47 21544179
> Mob: +47 99014497
>

From maillists0 at gmail.com Wed Sep 2 14:12:00 2009
From: maillists0 at gmail.com (maillists0 at gmail.com)
Date: Wed, 2 Sep 2009 10:12:00 -0400
Subject: Child Died
Message-ID: 

I just started my first instance of varnish in production. Within 12
hours, there were alerts from our monitoring system that Varnish was
taking 90% of the cpu. Right after that, I found these messages in
/var/log/messages, several times over a 2 minute period:

varnishd[12461]: Child (20086) not responding to ping, killing it.

The child restarted, and the stats and cache all disappeared.
This is a machine with 8 gigs of ram and a pair of slightly older
quad-core xeons. The storage method is file with a 50 gig limit. At its
peak, the machine is serving around 40 requests a second, about 5000k a
second. The configs are the defaults.

What should my first steps be to troubleshoot this? Is there a likely
culprit?

From dbaker-varnish-misc at flightaware.com Wed Sep 2 15:26:31 2009
From: dbaker-varnish-misc at flightaware.com (Daniel Baker)
Date: Wed, 2 Sep 2009 10:26:31 -0500
Subject: Flushing buffer during page delivery
Message-ID: <9FE0CAA8-149C-4534-AF2B-1A808AD92989@flightaware.com>

Hello --

We have our apache children flush the buffer when generating a page:
after emitting the JS/CSS includes, after emitting the page framework,
etc. However, despite the fact that we're flushing the buffer to
Varnish, it doesn't seem to deliver any of the data to the user until
the page is complete.

Although the page generation time is minimal, flushing the buffer early
lets the user's browser start pulling dependencies while apache spends
another 100ms or so delivering the rest of the page.

Is there any way to get Varnish to flush the buffer regularly, or to
accomplish this in some other way?

Thanks!

Daniel Baker

From gerald.leier at lixto.com Wed Sep 2 15:52:34 2009
From: gerald.leier at lixto.com (Gerald Leier)
Date: Wed, 02 Sep 2009 17:52:34 +0200
Subject: if obj.status == 50[1|2|3|x] -> reissue request on next backend(s)
In-Reply-To: <10a8a2180909020715j726c4d27xcc72ffd96b3fbe3e@mail.gmail.com>
References: <59713.1251798505@critter.freebsd.dk> <1251818114.16835.96.camel@pioneer> <10a8a2180909020715j726c4d27xcc72ffd96b3fbe3e@mail.gmail.com>
Message-ID: <1251906754.9300.29.camel@pioneer>

hello again,

On Wed, 2009-09-02 at 11:15 -0300, Alexandre Haguiar wrote:
> This is my try...
>
> If you dont need cache you can try to pass to pipe mode in vcl_recv so
> all requests will be passed to the backend and no 503 error will appear.
>
> sub vcl_recv {
>     if (req.restarts > 0) {
>         set req.backend = test1;
>     } else {
>         set req.backend = test2;
>     }
>
>     pipe;
>
> Alexandre Haguiar

thanks, but this doesn't work as i desire.

i reduced my config a bit more to match your code snippet:
(removed director)
------------------
backend test1 {
    .host = "10.10.10.20";
    .port = "80";
}

backend test2 {
    .host = "10.10.10.30";
    .port = "80";
}

sub vcl_recv {
    if (req.restarts > 0) {
        set req.backend = test2;
    } else {
        set req.backend = test1;
    }
    pipe;
}

sub vcl_fetch {
    if (obj.status == 500 || obj.status == 501 || obj.status == 502 ||
        obj.status == 503 || obj.status == 504) {
        restart;
    }
}

sub vcl_deliver {
    deliver;
}
--------

setup:
test1 returns HTTP/1.1 500 Internal Server Error
test2 returns HTTP/1.1 200 OK

now with the above config, varnish hands the request to test1 ->
returns the 500 error to the client :( that's not what i want :(

please tell me if i am trying to use varnish for something it isn't
intended to be used for. by now i think i may be playing with the
wrong tool.

thanks for any hint
gerald

> On Tue, Sep 1, 2009 at 12:15 PM, Gerald Leier wrote:
> > Hello again,
> >
> > thanks for the fast answer!
> >
> > unfortunatly i am still stuck.
> >
> > On Tue, 2009-09-01 at 09:48 +0000, Poul-Henning Kamp wrote:
> > > In message <1251797169.13050.48.camel at pioneer>, Gerald Leier writes:
> > > >hello again,
> > > >
> > > >Is or isnt it possible to make varnish ask another backend
> > > >if the first returns a HTTP 500 or any other user defined
> > > >HTTP code when forwarding a users request?
> > > >
> > > >and if its possible -> whats the varnish way to do that?
> > >
> > > Use the "restart" facility, which basically tried the request
> > > once more from the beginning, with any possible modifications
> > > you have made or will make.
> > >
> > > Typically, you would set another backend in vcl_recv{},
> > > something like:
> > >
> > > ...
> > > if (req.restarts > 0) {
> > >     set req.backend = better_one;
> > > } else {
> > >     set req.backend = normal_one
> > > }
> > >
> >
> > my setup consists of 3 machines.
> > 1 loadbalancer
> > 2 webservers(applicationservers)
> >
> > varnish version used:
> > varnishd (varnish-2.0.4)
> >
> > i really do want to test nothing more then:
> > have two content servers.
> > round robin select one of them.
> > if the selected returns a 50x -> forward request to other node
> > i dont need cashing, i dont need funny rewriting,
> > i dont need to embed fancy C code into VCL.
> > nothing like that.
> >
> > i even checked out and compiled the latest varnish because i thought:
> > http://varnish.projects.linpro.no/changeset/3948
> > (have been rolling back to 2.0.4 since then for "my" vcl code and the
> > 2.1...varnish made it even worse)
> >
> > i dont want to post my whole config again. last time i did that
> > no one seemed to think its worth answering such "massiv amounts"
> > of spam.
> > (Subject: cant get restart; to fetch and deliver from other backend on
> > HTTP error)
> >
> > so here i am, with snipplets
> > .........
> > backend test1 {
> >     .host = "10.10.10.20";
> >     .port = "80";
> > }
> >
> > backend mars1 {
> >     .host = "10.10.10.30";
> >     .port = "80";
> > }
> > .........
> >
> > .........
> > sub vcl_recv {
> >     if (req.restarts > 0) {
> >         set req.backend = test1;
> >     } else {
> >         set req.backend = test2;
> >     }
> > .........
> >
> > .........
> > sub vcl_fetch {
> >     if (obj.status == 500 || obj.status == 503 || obj.status == 504) {
> >         restart;
> >     }
> > .........
> >
> > => 503 guru meditation.
> >
> > any hints, links or examples are very welcome.
> >
> > i dont like to give up on something i spent a few days on by now...
> > at least not now. that of course may change if i dont get this baby
> > up and running within another few days.
> >
> > lg
> > gerald
> >
> > --
> >
> > _______________________________________________
> > varnish-misc mailing list
> > varnish-misc at projects.linpro.no
> > http://projects.linpro.no/mailman/listinfo/varnish-misc
>
> --
> Alexandre Haguiar
--

From v.bilek at 1art.cz Thu Sep 3 05:58:45 2009
From: v.bilek at 1art.cz (Václav Bílek)
Date: Thu, 03 Sep 2009 07:58:45 +0200
Subject: purging
Message-ID: <4A9F5B15.8070605@1art.cz>

Hello

I have a question about the purging architecture. In our application we
use several backends, each of them serving the same data and purging
parts of it. The purging is done in a way that tends to create duplicate
purge requests.

questions:
1) How does purging work with duplicate requests?
2) is it a problem for performance to have thousands of purges?
3) how does a purge request change state from "total active purges" to
"old purges deleted" or to "duplicate purges removed"?

Vaclav Bilek

From phk at phk.freebsd.dk Thu Sep 3 08:34:05 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Thu, 03 Sep 2009 08:34:05 +0000
Subject: Flushing buffer during page delivery
In-Reply-To: Your message of "Wed, 02 Sep 2009 10:26:31 EST." <9FE0CAA8-149C-4534-AF2B-1A808AD92989@flightaware.com>
Message-ID: <79836.1251966845@critter.freebsd.dk>

In message <9FE0CAA8-149C-4534-AF2B-1A808AD92989 at flightaware.com>, Daniel Baker writes:

>We have our apache children flush the buffer when generating a page
>after emitting the JS/CSS includes, after emitting the page framework,
>etc. However, despite that we're flushing the buffer to Varnish, it
>doesn't seem to deliver any of the data to the user until the page is
>complete.

Correct. That's how Varnish works.

There is no workaround, if you want caching.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
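[Editorial note: the store-and-forward delivery PHK describes can be sketched as a toy model. The function and chunk names below are invented for illustration; this is not Varnish internals, just the reasoning: a caching proxy that must assemble and store the whole object before delivering it cannot propagate the backend's early flushes to the client.]

```python
# Toy illustration (not Varnish code) of store-and-forward delivery:
# the backend "flushes" three times, but the client sees one delivery.
def backend_chunks():
    yield b"<head>...</head>"   # apache flushes here
    yield b"<body>..."          # ...and keeps generating for ~100ms more
    yield b"</body>"

def store_and_forward(chunks):
    body = b"".join(chunks)     # whole object assembled (and cacheable) first
    return [body]               # then a single delivery to the client

deliveries = store_and_forward(backend_chunks())
print(len(deliveries))  # 1 - three backend flushes collapse into one delivery
```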
From phk at phk.freebsd.dk Thu Sep 3 08:54:19 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Thu, 03 Sep 2009 08:54:19 +0000
Subject: purging
In-Reply-To: Your message of "Thu, 03 Sep 2009 07:58:45 +0200." <4A9F5B15.8070605@1art.cz>
Message-ID: <79951.1251968059@critter.freebsd.dk>

In message <4A9F5B15.8070605 at 1art.cz>, Václav Bílek writes:

>1) How does purging work with duplicate request?

We try to eliminate them, and if they are exactly identical, we succeed.

>2) is it a problem for performance to have thousands of purges?

They will cost you some memory and cpu time, but it shouldn't be a problem.

>3) how does purge request change state from "total active purges" to
>"old purges deleted" or to "duplicate purges removed"?

Old purges deleted are purges that no longer apply, ie: they have been
tested against all objects that were in the cache when they were entered
(or the objects expired without being requested or tested).

Duplicates are the ones above: if you add a purge for

    req.url == "FOO"

then any older identical purges will be marked "gone" and not tested
against, as the new purge will catch the offending objects. (We cannot
remove the old purge from the list, for reference-counting reasons.)

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
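[Editorial note: the purge bookkeeping PHK describes — a newer identical purge marks older ones "gone", but they stay in the list for reference-counting reasons — can be modelled with a small sketch. All names below are invented for illustration; this is a toy model, not Varnish's implementation.]

```python
from dataclasses import dataclass

@dataclass
class Purge:
    expr: str            # e.g. 'req.url == "FOO"'
    gone: bool = False   # superseded by an identical, newer purge

class PurgeList:
    """Toy model: the newest purge wins; identical older purges are
    marked gone but remain in the list (mirroring the refcount constraint)."""
    def __init__(self):
        self.purges = []

    def add(self, expr):
        for p in self.purges:
            if p.expr == expr:
                p.gone = True   # removed from testing, not from the list
        self.purges.append(Purge(expr))

    def active(self):
        return [p for p in self.purges if not p.gone]

pl = PurgeList()
pl.add('req.url == "FOO"')
pl.add('req.url == "BAR"')
pl.add('req.url == "FOO"')   # duplicate: supersedes the first purge
print(len(pl.purges), len(pl.active()))  # 3 kept for refcounting, 2 active
```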
From kristian at redpill-linpro.com Thu Sep 3 09:00:32 2009
From: kristian at redpill-linpro.com (Kristian Lyngstol)
Date: Thu, 3 Sep 2009 11:00:32 +0200
Subject: cant get restart; to fetch and deliver from other backend on HTTP error
In-Reply-To: <1251383353.21431.108.camel@pioneer>
References: <1251383353.21431.108.camel@pioneer>
Message-ID: <20090903090032.GC17535@kjeks.linpro.no>

On Thu, Aug 27, 2009 at 04:29:13PM +0200, Gerald Leier wrote:
> After setting up 2 servers (one returning the requested page,
> the other returning 500 errors) i tested a bit but i have
> some bug in there i cant get a grip on.
>
> after the first node returns a http 500 error varnish continues with
> the second node....here is the part where it stops doing what i want:
(...)
> 11 TxHeader b X-Forwarded-For: 10.21.1.40
> 11 BackendClose b test2
> 10 VCL_call c error
(...)

If the connection failed, as it seems to have done here, you do not end
up in vcl_fetch. Note how Varnish doesn't receive any headers from the
web server (no RxHeader for the second backend).

Varnish is unable to connect properly to your second server, and that's
what's causing problems. Varnishstat will probably reveal backend
failures.

-- 
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497

From quasirob at googlemail.com Thu Sep 3 10:41:22 2009
From: quasirob at googlemail.com (Rob Ayres)
Date: Thu, 3 Sep 2009 11:41:22 +0100
Subject: varnish redundancy
Message-ID: 

Hi,

We are using varnish on a single server with four backend servers. This
is working very well (thanks to the varnish developers!) but we wish to
add some redundancy to the system. The obvious way is just to add
another varnish running on another server in a different datacentre.
The downside to this is that keeping two caches populated will increase
the load on the backend servers.

We do have the facility to use "backup chaining", which would allow us
to use varnish_1 and then, if that goes down, automatically fail over to
varnish_2 - but this would have an empty cache, so a sudden and huge
load would hit the backend servers.

Does anyone have any experience or tips on using varnish in this way?

Thanks, Rob

From phk at phk.freebsd.dk Thu Sep 3 11:01:55 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Thu, 03 Sep 2009 11:01:55 +0000
Subject: varnish redundancy
In-Reply-To: Your message of "Thu, 03 Sep 2009 11:41:22 +0100."
Message-ID: <19494.1251975715@critter.freebsd.dk>

In message , Rob Ayres writes:

>Does anyone have any experience or tips on using varnish in this way?

You can do some really interesting things. One such is to configure the
two varnishes along these lines:

vcl_recv {
    if (client.ip ~ other_varnish) {
        set req.backend = real_backend;
    } else {
        set req.backend = other_varnish;
    }
}

vcl_pass {
    set req.backend = real_backend;
}

vcl_pipe {
    set req.backend = real_backend;
}

For it to be really smart you want to use directors for the
"other_varnish" and probes to ascertain health. We do not have a
"priority_director" (we probably should have), but you can get much the
same effect with the random director and very uneven weights:

director other_backend random {
    { .backend = b_other_varnish; .weight = 100000; }
    { .backend = b_real_backend; .weight = 1; }
}

Should the probes mark the other_varnish unhealthy, all traffic will go
to the real backend.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
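[Editorial note: the uneven-weights trick can also be sketched outside VCL. The toy model below (invented names, not Varnish code) shows why a weighted random pick over *healthy* members behaves like a priority director when one weight dwarfs the other: the low-weight member is essentially never chosen until probes remove the high-weight one.]

```python
import random

def pick(members, rng=random):
    """Weighted random pick over healthy members.
    members: list of (name, weight, healthy) tuples."""
    healthy = [(n, w) for n, w, ok in members if ok]
    if not healthy:
        raise RuntimeError("no healthy backend")
    total = sum(w for _, w in healthy)
    r = rng.uniform(0, total)
    for name, w in healthy:
        r -= w
        if r <= 0:
            return name
    return healthy[-1][0]   # guard against floating-point edge cases

# While the peer varnish is healthy, it wins essentially every pick.
members = [("other_varnish", 100000, True), ("real_backend", 1, True)]
picks = {pick(members) for _ in range(100)}
print(picks)  # {'other_varnish'} with overwhelming probability (100000:1)

# Once probes mark it sick, all traffic goes to the real backend.
members = [("other_varnish", 100000, False), ("real_backend", 1, True)]
print(pick(members))  # real_backend
```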
From armdan20 at gmail.com Thu Sep 3 10:57:17 2009
From: armdan20 at gmail.com (andan andan)
Date: Thu, 3 Sep 2009 12:57:17 +0200
Subject: varnish redundancy
In-Reply-To: 
References: 
Message-ID: 

2009/9/3 Rob Ayres :
> Hi,
>
> We are using varnish on a single server with four backend servers. This is
> working very well (thanks to the varnish developers!) but we wish to add
> some redundancy to the system. The obvious way is just to add another
> varnish running on another server in a different datacentre. The downside to
> this is that keeping two caches populated will increase the load on the
> backend servers.
>
> We do have the facility to use "backup chaining" which would allow us to use
> varnish_1 and then if that goes down we would automatically failover to
> varnish_2 but this would have an empty cache so a sudden and huge load would
> hit the backend servers.
>
> Does anyone have any experience or tips on using varnish in this way?

Hi. We have two varnish load balanced with LVS (active-active). The
difference between using 1 or 2 varnish (for the backend load) is
practically zero, because the hit rate is 90%.

Hope this helps.

BR.

From gerald.leier at lixto.com Thu Sep 3 13:25:32 2009
From: gerald.leier at lixto.com (Gerald Leier)
Date: Thu, 03 Sep 2009 15:25:32 +0200
Subject: cant get restart; to fetch and deliver from other backend on HTTP error
In-Reply-To: <20090903090032.GC17535@kjeks.linpro.no>
References: <1251383353.21431.108.camel@pioneer> <20090903090032.GC17535@kjeks.linpro.no>
Message-ID: <1251984332.24106.33.camel@pioneer>

hi,

On Thu, 2009-09-03 at 11:00 +0200, Kristian Lyngstol wrote:
> On Thu, Aug 27, 2009 at 04:29:13PM +0200, Gerald Leier wrote:
> > After setting up 2 servers(one returning the requested page
> > the other returning 500 errors) i tested a bit but i have
> > some bug in there i cant get a grip on.
> >
> > after the first node returns a http 500 error varnish continues with
> > the second node....here is the part where it stops doing what i want:
>
> (...)
> > 11 TxHeader b X-Forwarded-For: 10.21.1.40
> > 11 BackendClose b test2
> > 10 VCL_call c error
> (...)
>
> If the connection failed, like it seems to have done here, you do not end
> up in VCL fetch. Note how varnish doesn't receive any headers from the web
> server (no RxHeader for the second backend).
>
> Varnish is unable to connect properly to your second server and that's
> what's causing problems.

well, if varnish connects to the "HTTP 1.x 200 OK" server first, it has
no problem whatsoever delivering the result. but if it hits a 500
first.... well..... i'm pretty sure that it's my vcl configs that are
buggy and that this has nothing to do with the backend.

but anyway, i am out of clues, so i'll set up another test system to
validate the results i got out of the primary test setup.

gerald

> Varnishstat will probably reveal backend failures.
> --

From kristian at redpill-linpro.com Fri Sep 4 08:42:19 2009
From: kristian at redpill-linpro.com (Kristian Lyngstol)
Date: Fri, 4 Sep 2009 10:42:19 +0200
Subject: cant get restart; to fetch and deliver from other backend on HTTP error
In-Reply-To: <1251984332.24106.33.camel@pioneer>
References: <1251383353.21431.108.camel@pioneer> <20090903090032.GC17535@kjeks.linpro.no> <1251984332.24106.33.camel@pioneer>
Message-ID: <20090904084219.GB3883@kjeks.redpill-linpro.com>

On Thu, Sep 03, 2009 at 03:25:32PM +0200, Gerald Leier wrote:
> On Thu, 2009-09-03 at 11:00 +0200, Kristian Lyngstol wrote:
> > On Thu, Aug 27, 2009 at 04:29:13PM +0200, Gerald Leier wrote:
> > >
> > > After setting up 2 servers(one returning the requested page
> > > the other returning 500 errors) i tested a bit but i have
> > > some bug in there i cant get a grip on.

(snip snip)

All I can say is that your VCL looks sane.
I'm not ruling out that Varnish might be a factor here, but I'd have to play with the setup a bit to see what happens. Perhaps a packet capture between the varnish server and test backends could reveal what's going on? -- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From niallo at metaweb.com Fri Sep 4 23:58:27 2009 From: niallo at metaweb.com (Niall O'Higgins) Date: Fri, 4 Sep 2009 16:58:27 -0700 Subject: req.xid semantics? Message-ID: <20090904235827.GA6403@digdug.corp.631h.metaweb.com> Hi all, I am looking for some sort of VCL identifier which I can use to disambiguate requests which occur within the same second, but in different threads. I *think* that req.xid will work here, but I'm unable to determine exactly what its semantics are, since there doesn't seem to be any documentation for it. It appears from my tests to increase monotonically with each request. Is this indeed the case? Thanks! -- Niall O'Higgins Software Engineer Metaweb Technologies, Inc. From phk at phk.freebsd.dk Sat Sep 5 08:10:41 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Sat, 05 Sep 2009 08:10:41 +0000 Subject: req.xid semantics? In-Reply-To: Your message of "Fri, 04 Sep 2009 16:58:27 MST." <20090904235827.GA6403@digdug.corp.631h.metaweb.com> Message-ID: <4226.1252138241@critter.freebsd.dk> In message <20090904235827.GA6403 at digdug.corp.631h.metaweb.com>, Niall O'Higgins writes: >I am looking for some sort of VCL identifier which I can use to >disambiguate requests which occur within the same second, but in >different threads. That is indeed what req.xid is for. It starts from a random value on startup, and each request gets a unique value.
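[Archive note: a minimal illustration, assuming a Varnish 2.x VCL in which req.xid can be read as a string in vcl_deliver; the header name here is only an example. Copying the XID into a response header lets an external logger disambiguate requests that share a timestamp, since the same XID also appears in the shared-memory log.]

```
sub vcl_deliver {
    # req.xid is unique per request, so exposing it correlates each
    # client response with its shmlog records.
    set resp.http.X-Varnish-XID = req.xid;
}
```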
-- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From kristian at redpill-linpro.com Mon Sep 7 09:58:01 2009 From: kristian at redpill-linpro.com (Kristian Lyngstol) Date: Mon, 7 Sep 2009 11:58:01 +0200 Subject: Child Died In-Reply-To: References: Message-ID: <20090907095801.GC4909@kjeks.redpill-linpro.com> On Wed, Sep 02, 2009 at 10:12:00AM -0400, maillists0 at gmail.com wrote: > I just started my first instance of varnish in production. Within 12 hours, > there were alerts from our monitoring system that Varnish was taking 90% of > the cpu. Right after that, I find these messages in /var/log/messages, > several times over a 2 minute period: Did you check syslog for assert errors too? > varnishd[12461]: Child (20086) not responding to ping, killing it. > > The child restarted, and the stats and cache all disappeared. > > This is a machine with 8 gigs of ram and a pair of slightly older quad core > xeons. The storage method is file with a 50 gig limit. At its peak, the > machine is serving around 40 requests a second, about 5000k a second. The > configs are the defaults. > > What should my first steps be to troubleshoot this? Is there a likely > culprit? The first thing I'd do is check syslog for assert errors. If it's being killed in the same place, something must be wrong (... ). Secondly, I'd check the value of cli_timeout. This default has changed over time, but a very busy varnish can be slow to reply to pings from the management thread, and thus get killed needlessly. You can check it with the telnet interface or "varnishadm -T localhost:yourmanagementport param.show cli_timeout". The new default is 10s, which should be enough, though it still might be too low for extremely busy threads. You may also want to supply a varnishstat -1 (after varnish has had a chance to warm up) and any custom VCL to the list.
-- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From alexandrehaguiar at gmail.com Mon Sep 7 23:53:08 2009 From: alexandrehaguiar at gmail.com (Alexandre Haguiar) Date: Mon, 7 Sep 2009 20:53:08 -0300 Subject: Force proxy port Message-ID: <10a8a2180909071653j666ae77cwee8e05d51e0ca520@mail.gmail.com> Hi, My varnish installations fetch from nginx on port 8080 and respond on 80, but there are some links that, when called, change the URL port to 8080, like below: called http://meublog.org/wp-admin response http://meublog.org:8080/wp-admin/ Is there some way to force the URL to a fixed port (in this case 80)? Thanks Alexandre Haguiar -------------- next part -------------- An HTML attachment was scrubbed... URL: From v.bilek at 1art.cz Tue Sep 8 08:58:38 2009 From: v.bilek at 1art.cz (=?UTF-8?B?VsOhY2xhdiBCw61sZWs=?=) Date: Tue, 08 Sep 2009 10:58:38 +0200 Subject: client keepalive on high trafic site Message-ID: <4AA61CBE.2090007@1art.cz> Hello We have a problem with varnish using keepalive for clients: when established connections go above 8K, varnish slows down, which means the site stops responding. Unfortunately we can't use set resp.http.Connection="close"; because IE6 has a problem with that (after loading the page source there is a lag of a few seconds before downloading page elements). Is there any way to disable keepalive just for certain clients (IE6)? How hard would it be to implement a global keepalive off in varnish? Thanks for your time Vaclav Bilek From phk at phk.freebsd.dk Tue Sep 8 09:02:37 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Tue, 08 Sep 2009 09:02:37 +0000 Subject: client keepalive on high trafic site In-Reply-To: Your message of "Tue, 08 Sep 2009 10:58:38 +0200."
<4AA61CBE.2090007@1art.cz> Message-ID: <31664.1252400557@critter.freebsd.dk> In message <4AA61CBE.2090007 at 1art.cz>, =?UTF-8?B?VsOhY2xhdiBCw61sZWs=?= writes: >We have a problem with varnish using keepalive for clients, when >established connections goes above 8K varnish slows down which means the >site stops reponding. > >Unfortunately we cant use > set resp.http.Connection="close"; >because IE6 has problem with that ( after loading page source there is a >lag(few seconds) before downloading page elements) > >is there any way how to disable keepalive just for certain clients (IE6); > >How hard would it be to implement global keepalive off in varnish? You can tune the client keepalive interval with the parameter sess_timeout. It defaults to 5 seconds. Poul-Henning -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From v.bilek at 1art.cz Tue Sep 8 10:37:23 2009 From: v.bilek at 1art.cz (=?UTF-8?B?VsOhY2xhdiBCw61sZWs=?=) Date: Tue, 08 Sep 2009 12:37:23 +0200 Subject: client keepalive on high trafic site In-Reply-To: <31920.1252402740@critter.freebsd.dk> References: <31920.1252402740@critter.freebsd.dk> Message-ID: <4AA633E3.6030508@1art.cz> Poul-Henning Kamp napsal(a): > In message <4AA620D7.1060807 at 1art.cz>, =?UTF-8?B?VsOhY2xhdiBCw61sZWs=?= writes: > >> our sess_timeout is set to 1sec alredy :( > > Then I doubt keepalives are your problem > > It is much more likely you have workerthreads waiting to push stuff > to slow clients. I think that is not the problem... when I disable keepalive by seting "connection: close" header then i get 4x higher throughput ( but works bad in IE6) ... till the 8K established connection everithing works fine (low latency responses, only 10%cpu usage on server, load around 1-2) ... 
the problem is that there are, by estimation, around 150K clients, which is impossible to handle with keepalive enabled. the webserver cluster we used till now had the same problem ... keepalive even set to 1 second caused so many open connections that the system stopped working... > > Consider increasing your kernels tcp sendbuffers ? set large enough for 95% of requests is it at all possible to implement a global sess_timeout=0? or is it a design problem? From phk at phk.freebsd.dk Tue Sep 8 10:41:56 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Tue, 08 Sep 2009 10:41:56 +0000 Subject: client keepalive on high trafic site In-Reply-To: Your message of "Tue, 08 Sep 2009 12:37:23 +0200." <4AA633E3.6030508@1art.cz> Message-ID: <32115.1252406516@critter.freebsd.dk> In message <4AA633E3.6030508 at 1art.cz>, =?UTF-8?B?VsOhY2xhdiBCw61sZWs=?= writes: >I think that is not the problem... when I disable keepalive by seting >"connection: close" header then i get 4x higher throughput ( but works >bad in IE6) >... till the 8K established connection everithing works fine (low >latency responses, only 10%cpu usage on server, load around 1-2) Why don't you restrict "connection: close" to non-IE6 then ? You can test the User-Agent header in your VCL code -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
when I disable keepalive by seting >> "connection: close" header then i get 4x higher throughput ( but works >> bad in IE6) >> ... till the 8K established connection everithing works fine (low >> latency responses, only 10%cpu usage on server, load around 1-2) > > Why don't you restrict "connection: close" to non-IE6 then ? > > You can test the User-Agent header in your VCL code > Because I am not sure how to do that ... I can't test User-Agent in vcl_deliver because the 'req.' object is not accessible there. Where should I put such a test? From phk at phk.freebsd.dk Tue Sep 8 10:53:22 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Tue, 08 Sep 2009 10:53:22 +0000 Subject: client keepalive on high trafic site In-Reply-To: Your message of "Tue, 08 Sep 2009 12:47:49 +0200." <4AA63655.5020707@1art.cz> Message-ID: <32157.1252407202@critter.freebsd.dk> In message <4AA63655.5020707 at 1art.cz>, =?UTF-8?B?VsOhY2xhdiBCw61sZWs=?= writes: >Because I am not sure how to do that ... >I can't test User-Agent in vcl_deliver because the 'req.' object is not >accessible there > >where should I put such a test? Ahh, that gets tricky if you are running a version before we enabled that. How much IE6 traffic do you have ? If it is not too much, you could simply pass it. If it is too much for that, one dirty workaround is to define yourself as a backend, and send all IE6 to pass with yourself as backend; that way you get a cache-hit, but can use vcl_fetch on the pass transaction to nuke the Connection: close header. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
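[Archive note: a hedged sketch of the User-Agent approach discussed in this thread, assuming a Varnish version in which req.http.User-Agent is visible in vcl_deliver (as noted above, older versions did not allow that, in which case the check has to go elsewhere). The "MSIE 6" pattern is an approximation of IE6's User-Agent string.]

```
sub vcl_deliver {
    # Disable client keepalive for everyone except IE6, which stalls
    # for a few seconds when it sees "Connection: close".
    if (req.http.User-Agent !~ "MSIE 6") {
        set resp.http.Connection = "close";
    }
}
```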
From v.bilek at 1art.cz Tue Sep 8 11:22:26 2009 From: v.bilek at 1art.cz (=?UTF-8?B?VsOhY2xhdiBCw61sZWs=?=) Date: Tue, 08 Sep 2009 13:22:26 +0200 Subject: client keepalive on high trafic site In-Reply-To: <32157.1252407202@critter.freebsd.dk> References: <32157.1252407202@critter.freebsd.dk> Message-ID: <4AA63E72.4010101@1art.cz> Poul-Henning Kamp napsal(a): > In message <4AA63655.5020707 at 1art.cz>, =?UTF-8?B?VsOhY2xhdiBCw61sZWs=?= writes: > >> Because i am not shure how to do that ... >> I cant test User-Agent in vcl_deliver because 'req.' object is not >> accesible there >> >> where shoud i put such a test? > > Ahh, that gets trick if you are running a version before we enabled that. > > How much IE6 traffic do you have ? around 15% > > If it is not too much, you could simply pass it. will try as for-now workaround > > If it is too much for that, one dirty workaround is to define > yourself as a backend, and send all IE6 to pass with yourself > as backend, that way you get a cache-hit, but can use vcl_fetch > on the pass transaction to nuke the connection closed header. > > From carsten.ranfeld at evolver.de Tue Sep 8 14:02:33 2009 From: carsten.ranfeld at evolver.de (Carsten Ranfeld) Date: Tue, 08 Sep 2009 16:02:33 +0200 Subject: TCP RST in varnish reply Message-ID: <1252418553.2828.58.camel@cranfeld-linux.evolver.de> Hello, we use varnish at a customer's site for about 2 weeks and it sped up delivery performance a lot. Although we experience some problems, including the following: A HTTP POST request is made. For the respective URL part in vcl_recv() as well as in vcl_fetch() we configured a pass explicitly to circumvent any caching. So what happens? 1. the client sends its request to varnish 2. varnish passes the request to one of two backends 3. the backend replies 4. varnish replies to the client - and here the problem occurs - in the middle of the HTTP response the TCP connection is reset by varnish (or the machine). So content is not delivered fully. 
varnish log doesn't show any errors. Tests using different clients show comparable behavior - TCP RST and partially delivered content; just the size of the content delivered differs. Did anyone experience a similar problem? I searched through documentation, mailing lists and Google. Unfortunately no helpful page was found. Here are the important configuration parts: sub vcl_recv { # just to make sure, we always pass any POST requests if (req.request != "GET" && req.request != "HEAD") { /* We only deal with GET and HEAD by default */ pass; } if (req.url !~ "^/(foo|bar|_em_cms)") { lookup; } } sub vcl_fetch { if (req.url ~ "^/_em_cms/") { pass; } } Regards and thanks in advance Carsten From v.bilek at 1art.cz Wed Sep 9 13:27:15 2009 From: v.bilek at 1art.cz (=?UTF-8?B?VsOhY2xhdiBCw61sZWs=?=) Date: Wed, 09 Sep 2009 15:27:15 +0200 Subject: performance scalability of a multi-core Message-ID: <4AA7AD33.1060801@1art.cz> Hello what are your experiences using varnish on multi-CPU systems on Linux/FreeBSD? my experience on Linux on 8 cores is that varnish never gets more than 20% of all CPUs; only when it is overloaded does it take all CPU (but performance drops). Vaclav Bilek From phk at phk.freebsd.dk Wed Sep 9 16:30:28 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Wed, 09 Sep 2009 16:30:28 +0000 Subject: performance scalability of a multi-core In-Reply-To: Your message of "Wed, 09 Sep 2009 15:27:15 +0200." <4AA7AD33.1060801@1art.cz> Message-ID: <75856.1252513828@critter.freebsd.dk> In message <4AA7AD33.1060801 at 1art.cz>, =?UTF-8?B?VsOhY2xhdiBCw61sZWs=?= writes: >Helo > > >what are your experiences using varnish on multi CPU systems on >Linux/Freebsd? > >my experience in Linux on 8core is that varnish never gets more than >20% of all CPUs, only vhen it is overloaded, then it takes all CPU ( but >berformance drops). That is pretty typical behaviour.
In general Varnish does not need much CPU, it needs only an average of seven sytem calls for a cache hit (last I looked). The problem is that when things pile up, for whatever reason, Varnish sort of climbs the pile. Various ideas have been floated, how to deal better with that, but no really good idea has been found yet. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From kb+varnish at slide.com Wed Sep 9 18:17:53 2009 From: kb+varnish at slide.com (Ken Brownfield) Date: Wed, 9 Sep 2009 11:17:53 -0700 Subject: performance scalability of a multi-core In-Reply-To: <75856.1252513828@critter.freebsd.dk> References: <75856.1252513828@critter.freebsd.dk> Message-ID: <1D5E4E47-4B6B-4811-A290-4128246D552A@slide.com> The bottleneck you would typically see is interrupts from network traffic (especially if you're tracking connections), bandwidth limits, slow backends, too many keepalive sessions, and pthread stack size. Some of those can exacerbate the thread count and memory usage on an already stodgy pthreads library. And these limitations are pretty much endemic of any proxy or server on every platform, IMHO. Perhaps a combination of thread- and event-based workers could scale more smoothly? In any case, we've seen Varnish saturating GigE with about 2-3 cores (out of 8). From real life experience in our usage case, Varnish would saturate 10GigE on an 8-core box. Few people need (or would want) to saturate 1GigE on a single box, much less 10GigE. I guess my point is that certain use cases (some valid, some not, some involving bad pthread libraries in distributions (lots of them out there!)) could cause specific scalability issues, and those specific cases should be the focus. "Varnish only scales to two cores" is a generality that my experience refutes, for what it's worth. 
Ken On Sep 9, 2009, at 9:30 AM, Poul-Henning Kamp wrote: > In message <4AA7AD33.1060801 at 1art.cz>, =?UTF-8?B? > VsOhY2xhdiBCw61sZWs=?= writes: >> Helo >> >> >> what are your experiences using varnish on multi CPU systems on >> Linux/Freebsd? >> >> my experience in Linux on 8core is that varnish never gets more than >> 20% of all CPUs, only vhen it is overloaded, then it takes all CPU >> ( but >> berformance drops). > > That is pretty typical behaviour. > > In general Varnish does not need much CPU, it needs only an average > of seven sytem calls for a cache hit (last I looked). > > The problem is that when things pile up, for whatever reason, Varnish > sort of climbs the pile. > > Various ideas have been floated, how to deal better with that, but > no really good idea has been found yet. > > -- > Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 > phk at FreeBSD.ORG | TCP/IP since RFC 956 > FreeBSD committer | BSD since 4.3-tahoe > Never attribute to malice what can adequately be explained by > incompetence. > _______________________________________________ > varnish-misc mailing list > varnish-misc at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-misc From v.bilek at 1art.cz Thu Sep 10 06:56:18 2009 From: v.bilek at 1art.cz (=?UTF-8?B?VsOhY2xhdiBCw61sZWs=?=) Date: Thu, 10 Sep 2009 08:56:18 +0200 Subject: returning expired data Message-ID: <4AA8A312.1080109@1art.cz> Helo I have a problem that varnish sometimes returns expired data. The ttl of objects is from 1 to 10 seconds but varnish retuned objects older than tens of minutes. Grace is set to 60s. default ttl to 60s. Age header of such old object had negative value... Age: -6643 or Age: -4803 Any sugestion? Thanks... 
Vaclav Bilek From v.bilek at 1art.cz Thu Sep 10 07:04:07 2009 From: v.bilek at 1art.cz (=?ISO-8859-1?Q?V=E1clav_B=EDlek?=) Date: Thu, 10 Sep 2009 09:04:07 +0200 Subject: returning expired data In-Reply-To: <4AA8A312.1080109@1art.cz> References: <4AA8A312.1080109@1art.cz> Message-ID: <4AA8A4E7.1090106@1art.cz> Do not know if it is related but this sometime apears in the log: varnishd[12160]: Child (12161) died signal=6 varnishd[12160]: Child (12161) Panic message: Assert error in Tcheck(), cache.h line 648:#012 Condition((t.e) != 0) not true. thread = (cache-worker)sp = 0x7fd7a1952008 {#012 fd = 1213, id = 1213, xid = 905469220,#012 client = 213.220.224.168:1871,#012 step = STP_PIPE,#012 handling = pipe,#012 err_code = 400, err_reason = (null),#012 ws = 0x7fd7a1952078 { #012 id = "sess",#012 {s,f,r,e} = {0x7fd7a1952808,,+601,(nil),+16384},#012 },#012 worker = 0x7fd83b365be0 {#012 },#012 vcl = {#012 srcname = {#012 "input",#012 "Default",#012 },#012 },#012},#012 varnishd[12160]: Child cleanup complete varnishd[12160]: child (3092) Started varnishd[12160]: Child (3092) said Closed fds: 4 5 8 9 11 12 varnishd[12160]: Child (3092) said Child starts varnishd[12160]: Child (3092) said managed to mmap 30010953728 bytes of 30010953728 varnishd[12160]: Child (3092) said Ready V?clav B?lek napsal(a): > Helo > > > I have a problem that varnish sometimes returns expired data. > > The ttl of objects is from 1 to 10 seconds but varnish retuned objects > older than tens of minutes. > Grace is set to 60s. > default ttl to 60s. > > Age header of such old object had negative value... > Age: -6643 or > Age: -4803 > > Any sugestion? > > Thanks... 
> Vaclav Bilek > _______________________________________________ > varnish-misc mailing list > varnish-misc at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-misc From v.bilek at 1art.cz Thu Sep 10 08:10:44 2009 From: v.bilek at 1art.cz (=?ISO-8859-1?Q?V=E1clav_B=EDlek?=) Date: Thu, 10 Sep 2009 10:10:44 +0200 Subject: client keepalive on high trafic site In-Reply-To: <4AA63E72.4010101@1art.cz> References: <32157.1252407202@critter.freebsd.dk> <4AA63E72.4010101@1art.cz> Message-ID: <4AA8B484.80708@1art.cz> > >> If it is too much for that, one dirty workaround is to define >> yourself as a backend, and send all IE6 to pass with yourself >> as backend, that way you get a cache-hit, but can use vcl_fetch >> on the pass transaction to nuke the connection closed header. Pass for IE6 did not solve the problem; I had to pipe it, which means it's uncacheable :( Are there any workarounds or tweaks for IE in Varnish? ... For example, when IE asks for something via the HTTP/1.1 protocol, why is the answer in HTTP/1.0? Is there any way to force the response to be the same protocol version? Vaclav Bilek From v.bilek at 1art.cz Thu Sep 10 08:22:44 2009 From: v.bilek at 1art.cz (=?ISO-8859-1?Q?V=E1clav_B=EDlek?=) Date: Thu, 10 Sep 2009 10:22:44 +0200 Subject: performance scalability of a multi-core In-Reply-To: <1D5E4E47-4B6B-4811-A290-4128246D552A@slide.com> References: <75856.1252513828@critter.freebsd.dk> <1D5E4E47-4B6B-4811-A290-4128246D552A@slide.com> Message-ID: <4AA8B754.1080706@1art.cz> > > I guess my point is that certain use cases (some valid, some not, some > involving bad pthread libraries in distributions (lots of them out > there!)) How can I identify if our pthread libraries are in trouble?
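[Archive note: one quick first check, sketched here under the assumption of a glibc-based Linux system, is to ask libc which pthread implementation it provides; the older LinuxThreads library behaves far worse under heavy threading than NPTL:]

```shell
# Reports the pthread implementation glibc was built with, e.g.
# "NPTL 2.7"; LinuxThreads-based systems report "linuxthreads-..."
# instead and are known to scale poorly with many threads.
getconf GNU_LIBPTHREAD_VERSION
```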
From kristian at redpill-linpro.com Thu Sep 10 09:45:31 2009 From: kristian at redpill-linpro.com (Kristian Lyngstol) Date: Thu, 10 Sep 2009 11:45:31 +0200 Subject: returning expired data In-Reply-To: <4AA8A4E7.1090106@1art.cz> References: <4AA8A312.1080109@1art.cz> <4AA8A4E7.1090106@1art.cz> Message-ID: <20090910094531.GA4779@kjeks.redpill-linpro.com> On Thu, Sep 10, 2009 at 09:04:07AM +0200, V?clav B?lek wrote: > Do not know if it is related but this sometime apears in the log: Not likely to be related, no. > varnishd[12160]: Child (12161) died signal=6 > varnishd[12160]: Child (12161) Panic message: Assert error in Tcheck(), > cache.h line 648:#012 Condition((t.e) != 0) not true. thread = > (cache-worker)sp = 0x7fd7a1952008 {#012 fd = 1213, id = 1213, xid = > 905469220,#012 client = 213.220.224.168:1871,#012 step = STP_PIPE,#012 > handling = pipe,#012 err_code = 400, err_reason = (null),#012 ws = > 0x7fd7a1952078 { #012 id = "sess",#012 {s,f,r,e} = > {0x7fd7a1952808,,+601,(nil),+16384},#012 },#012 worker = > 0x7fd83b365be0 {#012 },#012 vcl = {#012 srcname = {#012 > "input",#012 "Default",#012 },#012 },#012},#012 > varnishd[12160]: Child cleanup complete > varnishd[12160]: child (3092) Started > varnishd[12160]: Child (3092) said Closed fds: 4 5 8 9 11 12 > varnishd[12160]: Child (3092) said Child starts > varnishd[12160]: Child (3092) said managed to mmap 30010953728 bytes of > 30010953728 > varnishd[12160]: Child (3092) said Ready This is some sort of bug or mishap; Varnish is throwing an assert error and essentially emptying your cache before the child is restarted. In your case, this happens in STP_PIPE, so when you're piping some data. > V?clav B?lek napsal(a): > > I have a problem that varnish sometimes returns expired data. > > > > The ttl of objects is from 1 to 10 seconds but varnish retuned objects > > older than tens of minutes. > > Grace is set to 60s. > > default ttl to 60s. Can you attach all the header-data you have regarding this issue? 
> > Age header of such old object had negative value... > > Age: -6643 or > > Age: -4803 Looks fairly strange.... -- Kristian Lyngst?l Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From v.bilek at 1art.cz Thu Sep 10 09:54:00 2009 From: v.bilek at 1art.cz (=?UTF-8?B?VsOhY2xhdiBCw61sZWs=?=) Date: Thu, 10 Sep 2009 11:54:00 +0200 Subject: returning expired data In-Reply-To: <20090910094531.GA4779@kjeks.redpill-linpro.com> References: <4AA8A312.1080109@1art.cz> <4AA8A4E7.1090106@1art.cz> <20090910094531.GA4779@kjeks.redpill-linpro.com> Message-ID: <4AA8CCB8.7080208@1art.cz> >> V?clav B?lek napsal(a): >>> I have a problem that varnish sometimes returns expired data. >>> >>> The ttl of objects is from 1 to 10 seconds but varnish retuned objects >>> older than tens of minutes. >>> Grace is set to 60s. >>> default ttl to 60s. > > Can you attach all the header-data you have regarding this issue? on one object we are trying to save trafic so the only other haeder is: "Connection: close" on other objects: Vary Accept-Encoding Last-Modified Thu, 10 Sep 2009 09:50:35 GMT Cache-Control no-cache,no-store,must-revalidate Content-Type text/plain; charset=utf-8 Content-Length 1 Date Thu, 10 Sep 2009 09:50:50 GMT Age 0 #negative value if old data Connection close > >>> Age header of such old object had negative value... >>> Age: -6643 or >>> Age: -4803 > > Looks fairly strange.... 
From v.bilek at 1art.cz Thu Sep 10 10:30:29 2009 From: v.bilek at 1art.cz (=?UTF-8?B?VsOhY2xhdiBCw61sZWs=?=) Date: Thu, 10 Sep 2009 12:30:29 +0200 Subject: returning expired data In-Reply-To: <20090910094531.GA4779@kjeks.redpill-linpro.com> References: <4AA8A312.1080109@1art.cz> <4AA8A4E7.1090106@1art.cz> <20090910094531.GA4779@kjeks.redpill-linpro.com> Message-ID: <4AA8D545.4090405@1art.cz> Is it possible that it is related to the clock shift we had after reboot? After reboot the time was set 2 hours forward for a few minutes (before ntp corrected it) ... we forgot to add /dev/rtc to the kernel >>> I have a problem that varnish sometimes returns expired data. >>> >>> The ttl of objects is from 1 to 10 seconds but varnish retuned objects >>> older than tens of minutes. >>> Grace is set to 60s. >>> default ttl to 60s. > > Can you attach all the header-data you have regarding this issue? > >>> Age header of such old object had negative value... >>> Age: -6643 or >>> Age: -4803 > > Looks fairly strange.... > From v.bilek at 1art.cz Thu Sep 10 11:13:09 2009 From: v.bilek at 1art.cz (=?ISO-8859-1?Q?V=E1clav_B=EDlek?=) Date: Thu, 10 Sep 2009 13:13:09 +0200 Subject: returning expired data SOLVED In-Reply-To: <4AA8D545.4090405@1art.cz> References: <4AA8A312.1080109@1art.cz> <4AA8A4E7.1090106@1art.cz> <20090910094531.GA4779@kjeks.redpill-linpro.com> <4AA8D545.4090405@1art.cz> Message-ID: <4AA8DF45.7070404@1art.cz> Václav Bílek napsal(a): > Is it possible that it is related to the clock shift we had after reboot? > After reboot the time was set 2 hours forward for a few minutes (before > ntp corrected it) ... we forgot to add /dev/rtc to the kernel > confirmed...
it was a problem with the time shift From kb+varnish at slide.com Thu Sep 10 16:32:46 2009 From: kb+varnish at slide.com (Ken Brownfield) Date: Thu, 10 Sep 2009 09:32:46 -0700 Subject: performance scalability of a multi-core In-Reply-To: <4AA8B754.1080706@1art.cz> References: <75856.1252513828@critter.freebsd.dk> <1D5E4E47-4B6B-4811-A290-4128246D552A@slide.com> <4AA8B754.1080706@1art.cz> Message-ID: <5C379B2C-12AF-415D-A80A-FAC0F0D38422@slide.com> My weapon of choice there would be oprofile; run something like this under high load and/or when you have a lot of threads active: opcontrol --init # You'll want a debug kernel # For example, the Ubuntu package is linux-image-debug-server opcontrol --setup --vmlinux=/boot/vmlinux-2.6.24-server opcontrol --reset ; opcontrol --start ; sleep 60 ; opcontrol --stop opreport -r -l (with and without -l) At the bottom you'll see the most costly calls and any obvious stand-outs. My runs on Varnish show an interestingly high amount of timezone stuff in libc, but no real low-hanging fruit. -- kb On Sep 10, 2009, at 1:22 AM, Václav Bílek wrote: > > >> >> I guess my point is that certain use cases (some valid, some not, >> some >> involving bad pthread libraries in distributions (lots of them out >> there!)) > > How can I identify if our pthread libraries are in trouble? From joe at joetify.com Thu Sep 10 19:03:56 2009 From: joe at joetify.com (Joe Williams) Date: Thu, 10 Sep 2009 12:03:56 -0700 Subject: if-none-match status Message-ID: <20090910120356.27ae58af@der-dieb> I am just curious about the status of if-none-match support. From the commit logs and this mailing list email (http://www.mail-archive.com/varnish-misc at projects.linpro.no/msg02738.html) it looks like it's been committed. As was asked in the aforementioned email, are there any examples of how to use/configure/etc. this new feature? Thanks. -Joe -- Name: Joseph A.
Williams Email: joe at joetify.com Blog: http://www.joeandmotorboat.com/ From niallo at metaweb.com Thu Sep 10 21:27:40 2009 From: niallo at metaweb.com (Niall O'Higgins) Date: Thu, 10 Sep 2009 14:27:40 -0700 Subject: Sub-second granularity logs Message-ID: <20090910212740.GV6403@digdug.corp.631h.metaweb.com> Hi all, I am currently working on a custom logger for Varnish. One of the things we need is sub-second granularity for the start request log event (millisecond). It seems that currently Varnish only writes timestamps with a granularity of one second. I imagine that the reason for not providing finer-grained timestamps is to avoid additional gettimeofday(2) system calls. That makes sense, however for our application, we are probably willing to take the performance hit for higher accuracy in our statistics. So, any thoughts or hints on this feature? I am starting work on a patch to implement this for our own needs, if there was general interest from the community I'd be happy to contribute it back, and take into account any feedback. Thanks. -- Niall O'Higgins Software Engineer Metaweb Technologies, Inc. From phk at phk.freebsd.dk Thu Sep 10 21:33:09 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Thu, 10 Sep 2009 21:33:09 +0000 Subject: Sub-second granularity logs In-Reply-To: Your message of "Thu, 10 Sep 2009 14:27:40 MST." <20090910212740.GV6403@digdug.corp.631h.metaweb.com> Message-ID: <1499.1252618389@critter.freebsd.dk> In message <20090910212740.GV6403 at digdug.corp.631h.metaweb.com>, Niall O'Higgin s writes: >Hi all, > >I am currently working on a custom logger for Varnish. One of the >things we need is sub-second granularity for the start request log >event (millisecond). All the internal timestamps are in nanosecond resolution, just find the right printf and fix the format. Getting a timespec is just as expensive as getting a time_t, because time(2) simply calls clock_gettime(2) under your feet. Which shmrecord are we talking about ? 
-- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From niallo at metaweb.com Thu Sep 10 21:54:39 2009 From: niallo at metaweb.com (Niall O'Higgins) Date: Thu, 10 Sep 2009 14:54:39 -0700 Subject: Sub-second granularity logs In-Reply-To: <1499.1252618389@critter.freebsd.dk> References: <1499.1252618389@critter.freebsd.dk> Message-ID: <20090910215439.GX6403@digdug.corp.631h.metaweb.com> On Thu, Sep 10, 2009 at 09:33:09PM +0000, Poul-Henning Kamp wrote: > All the internal timestamps are in nanosecond resolution, just > find the right printf and fix the format. > > Getting a timespec is just as expensive as getting a time_t, because > time(2) simply calls clock_gettime(2) under your feet. Ok great, of course what you say makes sense now that I think about it. So it should just be a matter of exposing this time in various ways. > Which shmrecord are we talking about ? I have only started looking through the existing shmlog tags to see exactly what corresponds to our needs, but basically we are looking for millisecond accuracy at each of these points: - When the cache received the request from the client - When the request header was forwarded by the cache - When the first byte of the response header was recieved back - When the end of the response header was received or detected - When the first byte of the response body was received or detected - When the cache finished sending the response back to the client Thanks! -- Niall O'Higgins Software Engineer Metaweb Technologies, Inc. From v.bilek at 1art.cz Fri Sep 11 14:06:33 2009 From: v.bilek at 1art.cz (=?UTF-8?B?VsOhY2xhdiBCw61sZWs=?=) Date: Fri, 11 Sep 2009 16:06:33 +0200 Subject: varnish Connection: close and IE6 Message-ID: <4AAA5969.5010002@1art.cz> Hello We are trying to deploy varnish in production but IE6 is a big problem for us. 
On our first try at launching varnish we learned that it is impossible to use keepalive because of the number of clients. So we tried to disable keepalive by adding: set resp.http.Connection = "close"; in vcl_deliver. All worked fine except for IE6, where page load times rose dramatically. So we tried to disable keepalive for IE6 by piping it through, but pipe in varnish is probably buggy - we got many asserts and varnish became unusable. Then we investigated further into the reason for IE's slow page loads... the reason is that IE on some objects (for example png) ignores the "Connection: close" and doesn't close the connection after getting the content; then after the varnish session timeout varnish closes the connection, IE requests another img, waits, and so on... Question: Is there any way around that? Thanks Vaclav Bilek From ender at tuenti.com Fri Sep 11 16:40:55 2009 From: ender at tuenti.com (David Martínez Moreno) Date: Fri, 11 Sep 2009 18:40:55 +0200 Subject: TCP RST in varnish reply In-Reply-To: <1252418553.2828.58.camel@cranfeld-linux.evolver.de> References: <1252418553.2828.58.camel@cranfeld-linux.evolver.de> Message-ID: <200909111840.55581.ender@tuenti.com> On Tuesday, 8 September 2009, Carsten Ranfeld wrote: > Hello, > > we use varnish at a customer's site for about 2 weeks and it sped up > delivery performance a lot. However, we experience some problems, > including the following: > > An HTTP POST request is made. For the respective URL part in vcl_recv() > as well as in vcl_fetch() we configured a pass explicitly to circumvent > any caching. So what happens? > > 1. the client sends its request to varnish > 2. varnish passes the request to one of two backends > 3. the backend replies > 4. varnish replies to the client - and here the problem occurs - in the > middle of the HTTP response the TCP connection is reset by varnish (or > the machine). So content is not delivered fully. varnish log doesn't > show any errors.
Tests using different clients show a comparable > behavior - TCP RST and partially delivered content; just the size of the > delivered content differs. > > > > Did anyone experience a similar problem? I searched through > documentation, mailing lists and Google. Unfortunately no helpful > page was found. > > Here are the important configuration parts: > > sub vcl_recv { > > # just to make sure, we always pass any POST requests > if (req.request != "GET" && req.request != "HEAD") { > /* We only deal with GET and HEAD by default */ > pass; > } > > if (req.url !~ "^/(foo|bar|_em_cms)") { > lookup; > } > } > sub vcl_fetch { > if (req.url ~ "^/_em_cms/") { > pass; > } > } > > Regards and thanks in advance > Carsten Hello, Carsten, can you run and post the following in the proxy? sysctl -a|grep tw Do you have any other sysctl setting altered from the vanilla kernel? Best regards, Ender. -- I am a married potato! I am a married potato! -- Mr. Potato (Toy Story 2). -- Systems manager tuenti.com From carsten.ranfeld at evolver.de Fri Sep 11 18:43:39 2009 From: carsten.ranfeld at evolver.de (Carsten Ranfeld) Date: Fri, 11 Sep 2009 20:43:39 +0200 Subject: TCP RST in varnish reply In-Reply-To: <200909111840.55581.ender@tuenti.com> References: <1252418553.2828.58.camel@cranfeld-linux.evolver.de> <200909111840.55581.ender@tuenti.com> Message-ID: <1252694620.14055.8.camel@happy> Hello David, On Friday, 11.09.2009 at 18:40 +0200, David Martínez Moreno wrote: [...] > > 4. varnish replies to the client - and here the problem occurs - in > > the middle of the HTTP response the TCP connection is reset by > > varnish (or the machine). So content is not delivered fully. varnish > > log doesn't show any errors. Tests using different clients show a > > comparable behavior - TCP RST and partially delivered content, just > > the size of the delivered content differs. > > > > > > > > Did anyone experience a similar problem? I searched through > > documentation, mailing lists and Google.
Unfortunately no helpful > > page was found. > > > > Here are the important configuration parts: [...] > > Hello, Carsten, can you run and post the following in the proxy? > > sysctl -a|grep tw > Sure: net.ipv4.tcp_max_tw_buckets = 180000 net.ipv4.tcp_tw_recycle = 0 net.ipv4.tcp_tw_reuse = 0 As far as I know there should be a log message created if the tw_buckets limit was hit. We didn't find anything similar. > Do you have any other sysctl setting altered from the vanilla kernel? > We have standard settings: net.ipv4.conf.default.rp_filter=1 net.ipv4.icmp_echo_ignore_broadcasts = 1 and modified settings: net.core.rmem_max=8388608 net.core.wmem_max=8388608 net.core.rmem_default=8388608 net.core.wmem_default=8388608 net.ipv4.tcp_rmem=4096 87380 8388608 net.ipv4.tcp_wmem=4096 16384 8388608 net.ipv4.tcp_mem=196608 262144 8388608 Best regards, Carsten From joe at joetify.com Mon Sep 14 02:14:12 2009 From: joe at joetify.com (Joe Williams) Date: Sun, 13 Sep 2009 19:14:12 -0700 Subject: if-none-match status In-Reply-To: <20090910120356.27ae58af@der-dieb> References: <20090910120356.27ae58af@der-dieb> Message-ID: <20090913191412.4a30b791@der-dieb> Anyone tried this out yet? Devs, have any info on how to use it? -Joe On Thu, 10 Sep 2009 12:03:56 -0700 Joe Williams wrote: > > I am just curious about the status of if-none-match support; from the > commit logs and this mailing list email > (http://www.mail-archive.com/varnish-misc at projects.linpro.no/msg02738.html) > it looks like it's been committed. As was asked in the > aforementioned email, are there any examples of how to > use/configure/etc this new feature? > > Thanks. > > -Joe > > -- Name: Joseph A. Williams Email: joe at joetify.com Blog: http://www.joeandmotorboat.com/ From david.birdsong at gmail.com Mon Sep 14 05:51:56 2009 From: david.birdsong at gmail.com (David Birdsong) Date: Sun, 13 Sep 2009 22:51:56 -0700 Subject: smart health check response?
Message-ID: So far my experience with Varnish has been that it runs great until it becomes overworked and consumes all resources on a machine and then finally doesn't respond to the parent's health check, at which point the parent restarts it - all of the cache is destroyed. We have a pretty large working set and so we're still experimenting with Varnish configurations that can handle our extremely random IO load. One thing I've found is that Varnish will try its hardest to serve from cache, which I can't argue against in theory, but in practice it would be nice if it just gave up sometimes and fell back on the backends to keep itself healthy during times of extreme IOwait. I'd rather Varnish keep its cache and let the backends pick up some slack than have it get killed by its parent watcher PID. Once persistent storage is available, frequent restarts may not be such a big deal. So what I'm thinking is to work around this for now with HAproxy. Have it health check or track connections and decide to go straight to the backends given some conditions. This would give varnish a chance to serve its existing requests, potentially let IOwait drop, alert us (we can add more varnish instances) and then pass health checks again. HAproxy polls varnishd; I've already added a simple health check in vcl_recv: if (req.url ~ "^/hc/vnsh\.health") { error 704 ""; } and vcl_error: sub vcl_error { if (obj.status == 704) { set obj.status = 204; deliver; } } ...are any of the stats available inside vcl_error? something like the number of running threads, or something else that would empower the health check with some introspection? The alternative is for me to have HAproxy track connections that could correlate to max_threads. Any feedback? Have I missed any settings in Varnish that could perform this same thing?
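[Editor's note: the two snippets in the message above combine into one self-contained sketch. The /hc/vnsh.health URL and the 704 marker status are taken verbatim from the message; nothing else is implied about the rest of the config.]

```
sub vcl_recv {
    # Answer HAproxy's probe from varnish itself, bypassing cache and backends.
    if (req.url ~ "^/hc/vnsh\.health") {
        error 704 "";
    }
}

sub vcl_error {
    if (obj.status == 704) {
        # Rewrite the internal 704 marker into a cheap 204 "I am alive" reply.
        set obj.status = 204;
        deliver;
    }
}
```

HAproxy then only needs an HTTP check against /hc/vnsh.health, treating anything other than 204 as down.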
From v.bilek at 1art.cz Mon Sep 14 14:07:44 2009 From: v.bilek at 1art.cz (Václav Bílek) Date: Mon, 14 Sep 2009 16:07:44 +0200 Subject: varnish Connection: close and IE6 In-Reply-To: <4AAA5969.5010002@1art.cz> References: <4AAA5969.5010002@1art.cz> Message-ID: <4AAE4E30.5000607@1art.cz> Václav Bílek wrote: > Hello > > > We are trying to deploy varnish in production but IE6 is a big problem > for us. > > On our first try at launching varnish we learned that it is impossible to > use keepalive because of the number of clients. So we tried to disable > keepalive by adding: > > set resp.http.Connection = "close"; > > in vcl_deliver. All worked fine except for IE6, where page load times > rose dramatically. So we tried to disable keepalive for IE6 by piping > it through, but pipe in varnish is probably buggy - we got many > asserts and varnish became unusable. > Then we investigated further into the reason for IE's slow page loads... > the reason is that IE on some objects (for example png) ignores > the "Connection: close" and doesn't close the connection after getting > the content; then after the varnish session timeout varnish closes the > connection, IE requests another img, waits, and so on... > > > Question: > > Is there any way around that? Hello Since we don't know exactly how to patch varnish properly to disable keepalive, we tried a nasty hack: diff bin/varnishd/cache_acceptor_epoll.c bin/varnishd/cache_acceptor_epoll.c.new 114c114 < deadline = TIM_real() - params->sess_timeout; --- > // deadline = TIM_real() - params->sess_timeout; 117c117 < if (sp->t_open > deadline) --- > // if (sp->t_open > deadline) It worked in the test environment, but with real traffic it was even worse (IE6 hanging for a long time). I will be glad for any advice.
Thanks Vaclav Bilek From l at lrowe.co.uk Mon Sep 14 16:49:38 2009 From: l at lrowe.co.uk (Laurence Rowe) Date: Mon, 14 Sep 2009 17:49:38 +0100 Subject: varnish redundancy In-Reply-To: <19494.1251975715@critter.freebsd.dk> References: <19494.1251975715@critter.freebsd.dk> Message-ID: 2009/9/3 Poul-Henning Kamp : > For it to be really smart you want to use directors for the > "other_varnish" and probes to ascertain health. > > We do not have a "priority_director" (we probably should have) > but you can get much the same effect with the random director > and very uneven weights: > >        director other_backend { >                { .backend = b_other_varnish ; weight=100000; } >                { .backend = b_real_backend ; weight=1; } >        } > > Should the probes mark the other_varnish unhealthy, all traffic > will go to the real backend. Is there an advantage in using a director here instead of the following? sub vcl_recv { set req.backend = haproxy01; if (!req.backend.healthy) { set req.backend = haproxy02; } ... Laurence From kristian at redpill-linpro.com Mon Sep 14 16:54:40 2009 From: kristian at redpill-linpro.com (Kristian Lyngstol) Date: Mon, 14 Sep 2009 18:54:40 +0200 Subject: varnish redundancy In-Reply-To: References: <19494.1251975715@critter.freebsd.dk> Message-ID: <20090914165437.GE7745@kjeks.linpro.no> On Mon, Sep 14, 2009 at 05:49:38PM +0100, Laurence Rowe wrote: > 2009/9/3 Poul-Henning Kamp : > > > For it to be really smart you want to use directors for the > > "other_varnish" and probes to ascertain health. > > > > We do not have a "priority_director" (we probably should have) > > but you can get much the same effect with the random director > > and very uneven weights: > > > >        director other_backend { > >                { .backend = b_other_varnish ; weight=100000; } > >                { .backend = b_real_backend ; weight=1; } > >        }
> > > > Should the probes mark the other_varnish unhealthy, all traffic > > will go to the real backend. > > Is there an advantage in using a director here instead of the following? > > sub vcl_recv { > set req.backend = haproxy01; > if (!req.backend.healthy) { > set req.backend = haproxy02; > } Both approaches have their benefits. One benefit of using a random director is that it can have multiple fallbacks, but then again, you can achieve that by having two directors: a pool of primary directors (or a single backend), and do if (!req.backend.healthy) { set req.backend = fallbackdirector; } Using a single director with the weight-approach does have the benefit of being nicer to read and maintain, but will give the fallback backends some marginal amount of traffic even when the primary backends are healthy. -- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From phk at phk.freebsd.dk Mon Sep 14 13:05:32 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Mon, 14 Sep 2009 13:05:32 +0000 Subject: smart health check response? In-Reply-To: Your message of "Sun, 13 Sep 2009 22:51:56 MST." Message-ID: <1207.1252933532@critter.freebsd.dk> In message , David Birdsong writes: >...are any of the stats available inside vcl_error? Not directly, but you can relatively easily get your hands on them if you use inline C-code. Poul-Henning -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From david.birdsong at gmail.com Mon Sep 14 18:57:25 2009 From: david.birdsong at gmail.com (David Birdsong) Date: Mon, 14 Sep 2009 11:57:25 -0700 Subject: obj.ttl derived?
Message-ID: How is obj.ttl derived when both Expires and max-age are set by the backends? We had a case where the backend was setting Expires to 60 seconds after the request and max-age was 5184000. Additionally in vcl_fetch: sub vcl_fetch { if (obj.ttl < 9000s) { set obj.ttl = 9000s; } } What would obj.ttl be set to given the Expires and max-age contradiction? From kristian at redpill-linpro.com Mon Sep 14 19:22:36 2009 From: kristian at redpill-linpro.com (Kristian Lyngstol) Date: Mon, 14 Sep 2009 21:22:36 +0200 Subject: obj.ttl derived? In-Reply-To: References: Message-ID: <20090914192236.GC4472@kjeks.linpro.no> On Mon, Sep 14, 2009 at 11:57:25AM -0700, David Birdsong wrote: > How is obj.ttl derived when both Expires and max-age are set by the backends? > > We had a case where the backend was setting Expires to 60 seconds after > the request and max-age was 5184000. Additionally in vcl_fetch: > > sub vcl_fetch { > if (obj.ttl < 9000s) { > set obj.ttl = 9000s; > } > } > > What would obj.ttl be set to given the Expires and max-age contradiction? If s-maxage is set, use that as default, otherwise use max-age as default. If Expires is sooner than the default, use that. So in your example, the ttl should be 60s. -- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From david.birdsong at gmail.com Mon Sep 14 19:38:06 2009 From: david.birdsong at gmail.com (David Birdsong) Date: Mon, 14 Sep 2009 12:38:06 -0700 Subject: obj.ttl derived? In-Reply-To: <20090914192236.GC4472@kjeks.linpro.no> References: <20090914192236.GC4472@kjeks.linpro.no> Message-ID: awesome, thanks. this explains poor cache-hit ratio.
On Mon, Sep 14, 2009 at 12:22 PM, Kristian Lyngstol wrote: > On Mon, Sep 14, 2009 at 11:57:25AM -0700, David Birdsong wrote: >> How is obj.ttl derived when both Expires and max-age are set by the backends? >> >> We had a case where the backend was setting Expires to 60 seconds after >> the request and max-age was 5184000. Additionally in vcl_fetch: >> >> sub vcl_fetch { >>         if (obj.ttl < 9000s) { >>                 set obj.ttl = 9000s; >>         } >> } >> >> What would obj.ttl be set to given the Expires and max-age contradiction? > > If s-maxage is set, use that as default, otherwise use max-age as default. > > If Expires is sooner than the default, use that. > > So in your example, the ttl should be 60s. > > -- > Kristian Lyngstøl > Redpill Linpro AS > Tlf: +47 21544179 > Mob: +47 99014497 > From david.birdsong at gmail.com Mon Sep 14 19:43:20 2009 From: david.birdsong at gmail.com (David Birdsong) Date: Mon, 14 Sep 2009 12:43:20 -0700 Subject: obj.ttl derived? In-Reply-To: References: <20090914192236.GC4472@kjeks.linpro.no> Message-ID: On Mon, Sep 14, 2009 at 12:38 PM, David Birdsong wrote: > awesome, thanks. this explains poor cache-hit ratio. > > On Mon, Sep 14, 2009 at 12:22 PM, Kristian Lyngstol > wrote: >> On Mon, Sep 14, 2009 at 11:57:25AM -0700, David Birdsong wrote: >>> How is obj.ttl derived when both Expires and max-age are set by the backends? >>> >>> We had a case where the backend was setting Expires to 60 seconds after >>> the request and max-age was 5184000. Additionally in vcl_fetch: >>> >>> sub vcl_fetch { >>>         if (obj.ttl < 9000s) { >>>                 set obj.ttl = 9000s; >>>         } >>> } >>> >>> What would obj.ttl be set to given the Expires and max-age contradiction? >> >> If s-maxage is set, use that as default, otherwise use max-age as default. >> >> If Expires is sooner than the default, use that. >> >> So in your example, the ttl should be 60s. oh, to clarify...
would the stanza in vcl_fetch override the obj.ttl which was set by the Expires header? sub vcl_fetch { if (obj.ttl < 9000s) { set obj.ttl = 9000s; } } or do I have the order wrong? >> >> -- >> Kristian Lyngstøl >> Redpill Linpro AS >> Tlf: +47 21544179 >> Mob: +47 99014497 >> > From kristian at redpill-linpro.com Mon Sep 14 19:45:34 2009 From: kristian at redpill-linpro.com (Kristian Lyngstol) Date: Mon, 14 Sep 2009 21:45:34 +0200 Subject: obj.ttl derived? In-Reply-To: References: <20090914192236.GC4472@kjeks.linpro.no> Message-ID: <20090914194534.GF4472@kjeks.linpro.no> On Mon, Sep 14, 2009 at 12:43:20PM -0700, David Birdsong wrote: > On Mon, Sep 14, 2009 at 12:38 PM, David Birdsong > wrote: > > awesome, thanks. this explains poor cache-hit ratio. > > > > On Mon, Sep 14, 2009 at 12:22 PM, Kristian Lyngstol > > wrote: > >> On Mon, Sep 14, 2009 at 11:57:25AM -0700, David Birdsong wrote: > >>> How is obj.ttl derived when both Expires and max-age are set by the backends? > >>> > >>> We had a case where the backend was setting Expires to 60 seconds after > >>> the request and max-age was 5184000. Additionally in vcl_fetch: > >>> > >>> sub vcl_fetch { > >>>         if (obj.ttl < 9000s) { > >>>                 set obj.ttl = 9000s; > >>>         } > >>> } > >>> > >>> What would obj.ttl be set to given the Expires and max-age contradiction? > >> > >> If s-maxage is set, use that as default, otherwise use max-age as default. > >> > >> If Expires is sooner than the default, use that. > >> > >> So in your example, the ttl should be 60s. > oh, to clarify... > > would the stanza in vcl_fetch override the obj.ttl which was set by > the Expires header? > sub vcl_fetch { > if (obj.ttl < 9000s) { > set obj.ttl = 9000s; > } > } > > or do I have the order wrong? Ah, yes, it would, I misread the < as >, sorry about that.
-- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From david.birdsong at gmail.com Mon Sep 14 19:52:15 2009 From: david.birdsong at gmail.com (David Birdsong) Date: Mon, 14 Sep 2009 12:52:15 -0700 Subject: obj.ttl derived? In-Reply-To: <20090914194534.GF4472@kjeks.linpro.no> References: <20090914192236.GC4472@kjeks.linpro.no> <20090914194534.GF4472@kjeks.linpro.no> Message-ID: On Mon, Sep 14, 2009 at 12:45 PM, Kristian Lyngstol wrote: > On Mon, Sep 14, 2009 at 12:43:20PM -0700, David Birdsong wrote: >> On Mon, Sep 14, 2009 at 12:38 PM, David Birdsong >> wrote: >> > awesome, thanks. this explains poor cache-hit ratio. >> > >> > On Mon, Sep 14, 2009 at 12:22 PM, Kristian Lyngstol >> > wrote: >> >> On Mon, Sep 14, 2009 at 11:57:25AM -0700, David Birdsong wrote: >> >>> How is obj.ttl derived when both Expires and max-age are set by the backends? >> >>> >> >>> We had a case where the backend was setting Expires to 60 seconds after >> >>> the request and max-age was 5184000. Additionally in vcl_fetch: >> >>> >> >>> sub vcl_fetch { >> >>>         if (obj.ttl < 9000s) { >> >>>                 set obj.ttl = 9000s; >> >>>         } >> >>> } >> >>> >> >>> What would obj.ttl be set to given the Expires and max-age contradiction? >> >> >> >> If s-maxage is set, use that as default, otherwise use max-age as default. >> >> >> >> If Expires is sooner than the default, use that. >> >> >> >> So in your example, the ttl should be 60s. >> oh, to clarify... >> >> would the stanza in vcl_fetch override the obj.ttl which was set by >> the Expires header? >> sub vcl_fetch { >>        if (obj.ttl < 9000s) { >>                set obj.ttl = 9000s; >>        } >> } >> >> or do I have the order wrong? > > Ah, yes, it would, I misread the < as >, sorry about that.
> > But are you sure you're getting hits at all? varnishstat shows a hit rate around .54 for the 10 and 1000 second averages. The "N expired objects" counter grows by anywhere between 10-40 a second though, and varnishd has only been running for about 12 hours. i expect the hit rate to drop sharply as peak traffic continues to diminish for the day. i think a 9000s obj.ttl is a complete mistake on our part. so i've followed some of the steps found here http://varnish.projects.linpro.no/wiki/VCLExampleLongerCaching on 1 of our servers. i'm going to compare it in a day or so. why do you ask though? > > -- > Kristian Lyngstøl > Redpill Linpro AS > Tlf: +47 21544179 > Mob: +47 99014497 > From kristian at redpill-linpro.com Mon Sep 14 20:01:14 2009 From: kristian at redpill-linpro.com (Kristian Lyngstol) Date: Mon, 14 Sep 2009 22:01:14 +0200 Subject: obj.ttl derived? In-Reply-To: References: <20090914192236.GC4472@kjeks.linpro.no> <20090914194534.GF4472@kjeks.linpro.no> Message-ID: <20090914200114.GG4472@kjeks.linpro.no> On Mon, Sep 14, 2009 at 12:52:15PM -0700, David Birdsong wrote: (snip snip) > > But are you sure you're getting hits at all? > > varnishstat shows a hit rate around .54 for the 10 and 1000 second averages. The > "N expired objects" counter grows by anywhere between 10-40 a second though, and varnishd > has only been running for about 12 hours. i expect the hit rate to > drop sharply as peak traffic continues to diminish for the day. Depending on the content, that might be a very low hit-rate. > i think a 9000s obj.ttl is a complete mistake on our part. so i've > followed some of the steps found here > http://varnish.projects.linpro.no/wiki/VCLExampleLongerCaching > on 1 of our servers. i'm going to compare it in a day or so. > > why do you ask though? It's fairly common to have pages that aren't cached. If I'm going to give you a single tip right now, it's to look at the output of: varnishtop -i TxURL That's a sorted list of what urls Varnish requests from your backend (in other words: cache misses).
Most items should have a 1 or less next to them. The number represents a decaying average, and you should know why if you have items there that have double-digit representation. It should also give you an indication of whether this is a problem with your TTL or a few pages that aren't cached at all, although the pass/hitpass counter in varnishstat will tell you that too. -- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From phk at phk.freebsd.dk Tue Sep 15 10:28:44 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Tue, 15 Sep 2009 10:28:44 +0000 Subject: travel: eurobsdcon2009 and vug1 Message-ID: <21357.1253010524@critter.freebsd.dk> Hi guys, I will be semi-offline from tomorrow, while I attend EuroBSDcon2009 in Cambridge. After EuroBSDcon2009 I will take a train to London to attend the very first V(arnish) U(ser) G(roup) meeting monday and tuesday. (http://varnish.projects.linpro.no/wiki/200909UserGroupMeeting) Wednesday I'll hang out with an old friend and thursday I'll start to dig myself out of the heaps which have accumulated. See you there... Poul-Henning -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
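[Editor's note: the redundancy thread earlier in this archive boils down to two interchangeable patterns. Below is a minimal sketch of the explicit-fallback variant; the haproxy01/haproxy02 names come from Laurence's mail, while the hosts and probe settings are hypothetical placeholders.]

```
backend haproxy01 {
    .host = "192.0.2.1";    # hypothetical address
    .port = "80";
    .probe = { .url = "/"; }
}
backend haproxy02 {
    .host = "192.0.2.2";    # hypothetical address
    .port = "80";
    .probe = { .url = "/"; }
}

sub vcl_recv {
    # Prefer the primary; fail over only when its probe marks it unhealthy.
    set req.backend = haproxy01;
    if (!req.backend.healthy) {
        set req.backend = haproxy02;
    }
}
```

Compared with the weighted random director, this sends the fallback no traffic at all while the primary is healthy, at the cost of one fallback level per if-branch.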
From v.bilek at 1art.cz Wed Sep 16 05:32:11 2009 From: v.bilek at 1art.cz (Václav Bílek) Date: Wed, 16 Sep 2009 07:32:11 +0200 Subject: varnish Connection: close and IE6 In-Reply-To: References: <4AAA5969.5010002@1art.cz> <4AAE4E30.5000607@1art.cz> Message-ID: <4AB0785B.2030400@1art.cz> Laurence Rowe wrote: > You can easily up the maximum open file descriptor limit from the > default of 1024 with `ulimit -n some_large_value` in the script used > to start varnish (must be done as root). Varnish should cope fine with > a large number of connections. > > Laurence > We already did, but that is not a solution... with ses_timeout set to 1s varnish scales nicely up to 10K established connections, but then performance drops. Our estimate of the number of clients is between 180K and 250K, each hitting our servers at least every 7 seconds (ajax live data update) + x requests when they click on anything... we serve that traffic with 4 load-balanced varnish servers, but with keepalive on it's not possible; at peak rates we get around 8K established connections per server even with keepalive off (Connection: close header), but there is that problem with IE6 that blocks us. Vaclav Bilek From kb+varnish at slide.com Wed Sep 16 16:54:25 2009 From: kb+varnish at slide.com (Ken Brownfield) Date: Wed, 16 Sep 2009 09:54:25 -0700 Subject: #551: Varnish Crash: Missing errorhandling code in HSH_Prepare(), cache_hash.c line 188 In-Reply-To: <063.820eed1cc8278cb1996e9291e33fec3f@projects.linpro.no> References: <054.929506018381bcb32841815315546cac@projects.linpro.no> <063.820eed1cc8278cb1996e9291e33fec3f@projects.linpro.no> Message-ID: Ah, I stand corrected. But I was definitely having random crashes when I enabled the vcl_fetch() section below: sub vcl_recv { ... set req.http.Unmodified-Host = req.http.Host; set req.http.Unmodified-URL = req.url; ... } sub vcl_fetch { ...
set obj.http.X-Token-URL = req.url; set obj.http.X-Original-URL = req.http.Unmodified-URL; set obj.http.X-Token-Host = req.http.Host; set obj.http.X-Original-Host = req.http.Unmodified-Host; set obj.http.X-Set-Cookie = obj.http.Set-Cookie; ... } I'm a bit loath to re-enable this to get a full stacktrace and gdb output, but if there's really nothing wrong with this I might consider it. Also, using trunk (a couple weeks ago) I can't reference obj in vcl_fetch() at all, which I assumed was an intentional side-step of the #310 bug. Thx, -- Ken On Sep 16, 2009, at 9:30 AM, Varnish wrote: > #551: Varnish Crash: Missing errorhandling code in HSH_Prepare(), > cache_hash.c > line 188 > ---------------------- > +----------------------------------------------------- > Reporter: cheerios | Owner: phk > Type: defect | Status: closed > Priority: normal | Milestone: > Component: varnishd | Version: 2.0 > Severity: normal | Resolution: invalid > Keywords: | > ---------------------- > +----------------------------------------------------- > Changes (by kristian): > > * status: new => closed > * resolution: => invalid > > Comment: > > Cheerios: I'm going to close this for now, since this sounds exactly > like > a sess_workspace issue. Feel free to re-open this if you can confirm > that > this is unaffected by sess_workspace. Further discussion should go > on the > mail list though. > > ... and: > > Replying to [comment:4 kb]: >> Possibly unrelated, but modifying obj in vcl_fetch() will cause >> crashes > (see #310); I found out the hard way. > > You mean vcl_hit. The object is safely locked in vcl_fetch, and can be > modified. > >> Odd though that setting obj.ttl specifically seems to be safe. > > Nah, not really that odd. Setting a ttl is fairly atomic, while > manipulating > strings usually means copying and replacing. But this discussion > doesn't > belong here.
> > -- > Ticket URL: > Varnish > The Varnish HTTP Accelerator > _______________________________________________ > varnish-bugs mailing list > varnish-bugs at projects.linpro.no > http://projects.linpro.no/mailman/listinfo/varnish-bugs From kristian at redpill-linpro.com Wed Sep 16 17:03:11 2009 From: kristian at redpill-linpro.com (Kristian Lyngstol) Date: Wed, 16 Sep 2009 19:03:11 +0200 Subject: #551: Varnish Crash: Missing errorhandling code in HSH_Prepare(), cache_hash.c line 188 In-Reply-To: References: <054.929506018381bcb32841815315546cac@projects.linpro.no> <063.820eed1cc8278cb1996e9291e33fec3f@projects.linpro.no> Message-ID: <20090916170310.GB31173@kjeks.linpro.no> On Wed, Sep 16, 2009 at 09:54:25AM -0700, Ken Brownfield wrote: > Ah, I stand corrected. But I was definitely having random crashes > when I enabled the vcl_fetch() section below: (...) > I'm a bit loathe to reenable this to get a full stacktrace and gdb > output, but if there's really nothing wrong with this I might consider > it. Nothing wrong with it, but my first guess would be obj_workspace being overloaded, which would look similar to #551 unless you know exactly what to look for. > Also, using trunk (a couple weeks ago) I can't reference obj in > vcl_fetch() at all, which I assumed was an intentional side-step of > the #310 bug. Nope, it's just renamed beresp. For now, it's still the same thing for all intents and purposes, but the idea is to only fetch headers so as to be able to go from fetch to pipe in some unknown future. - Kristian -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From nkinkade at creativecommons.org Wed Sep 16 17:16:20 2009 From: nkinkade at creativecommons.org (Nathan Kinkade) Date: Wed, 16 Sep 2009 13:16:20 -0400 Subject: Content-Range header problem Message-ID: <1c5831080909161016m10a7e830j3b496d2bdb73e620@mail.gmail.com> Hi all, For some reason Varnish (2.0.4) doesn't seem to be respecting the Content-Range header, so our users are unable to resume large downloads that somehow got interrupted. Is this expected and normal behavior, or should this work? Do we need to add a rule in vcl_recv that passes any incoming requests containing a Content-Range header directly to the backend? Thanks, Nathan From kristian at redpill-linpro.com Wed Sep 16 17:29:58 2009 From: kristian at redpill-linpro.com (Kristian Lyngstol) Date: Wed, 16 Sep 2009 19:29:58 +0200 Subject: Content-Range header problem In-Reply-To: <1c5831080909161016m10a7e830j3b496d2bdb73e620@mail.gmail.com> References: <1c5831080909161016m10a7e830j3b496d2bdb73e620@mail.gmail.com> Message-ID: <20090916172958.GC31173@kjeks.linpro.no> Hi, On Wed, Sep 16, 2009 at 01:16:20PM -0400, Nathan Kinkade wrote: > For some reason Varnish (2.0.4) doesn't seem to be respecting the > Content-Range header, so our users are unable to resume large > downloads that somehow got interrupted. Is this expected and normal > behavior, or should this work? Do we need to add a rule in vcl_recv > that passes any incoming requests containing a Content-Range header > directly to the backend? We currently do not have support for range-headers, so you'll have to pipe these requests. Note that pass won't do the trick, since it'll still be "cached"/buffered in varnish (but immediately discarded). -- Kristian Lyngstøl Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From nkinkade at creativecommons.org Wed Sep 16 17:36:00 2009 From: nkinkade at creativecommons.org (Nathan Kinkade) Date: Wed, 16 Sep 2009 13:36:00 -0400 Subject: Content-Range header problem In-Reply-To: <20090916172958.GC31173@kjeks.linpro.no> References: <1c5831080909161016m10a7e830j3b496d2bdb73e620@mail.gmail.com> <20090916172958.GC31173@kjeks.linpro.no> Message-ID: <1c5831080909161036l6e47039r37c9e9cb768b91c5@mail.gmail.com> On Wed, Sep 16, 2009 at 1:29 PM, Kristian Lyngstol wrote: > Hi, > > On Wed, Sep 16, 2009 at 01:16:20PM -0400, Nathan Kinkade wrote: >> For some reason Varnish (2.0.4) doesn't seem to be respecting the >> Content-Range header, so our users are unable to resume large >> downloads that somehow got interrupted. Is this expected and normal >> behavior, or should this work? Do we need to add a rule in vcl_recv >> that passes any incoming requests containing a Content-Range header >> directly to the backend? > > We currently do not have support for range-headers, so you'll have to pipe > these requests. Note that pass won't do the trick, since it'll still be > "cached"/buffered in varnish (but immediately discarded). Thanks. The resolution is simple, but I just wanted to be sure. In response to Michael's reply regarding why we are proxying such files, we haven't been doing it explicitly or on purpose ... it was just happening by default. We didn't have any explicit rules for those directories or files, though we may shortly after I finish this email.
:-)

Thanks again,
Nathan

From kristian at redpill-linpro.com Wed Sep 16 17:41:34 2009
From: kristian at redpill-linpro.com (Kristian Lyngstol)
Date: Wed, 16 Sep 2009 19:41:34 +0200
Subject: Content-Range header problem
In-Reply-To: <1c5831080909161036l6e47039r37c9e9cb768b91c5@mail.gmail.com>
References: <1c5831080909161016m10a7e830j3b496d2bdb73e620@mail.gmail.com> <20090916172958.GC31173@kjeks.linpro.no> <1c5831080909161036l6e47039r37c9e9cb768b91c5@mail.gmail.com>
Message-ID: <20090916174134.GD31173@kjeks.linpro.no>

On Wed, Sep 16, 2009 at 01:36:00PM -0400, Nathan Kinkade wrote:
> Thanks. The resolution is simple, but I just wanted to be sure.

Always nice to verify :)

By the way, I just read the labs.cc.o post [1] from when you started
using Varnish and I'm curious as to how you're doing now with regards to
these issues?

[1] http://labs.creativecommons.org/2008/04/03/varnish-cache-at-cc/

--
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497

From kb+varnish at slide.com Wed Sep 16 17:51:00 2009
From: kb+varnish at slide.com (Ken Brownfield)
Date: Wed, 16 Sep 2009 10:51:00 -0700
Subject: #551: Varnish Crash: Missing errorhandling code in HSH_Prepare(), cache_hash.c line 188
In-Reply-To: <20090916170310.GB31173@kjeks.linpro.no>
References: <054.929506018381bcb32841815315546cac@projects.linpro.no> <063.820eed1cc8278cb1996e9291e33fec3f@projects.linpro.no> <20090916170310.GB31173@kjeks.linpro.no>
Message-ID: <647354B7-D835-483D-9541-1E5D13288E77@slide.com>

On Sep 16, 2009, at 10:03 AM, Kristian Lyngstol wrote:
> On Wed, Sep 16, 2009 at 09:54:25AM -0700, Ken Brownfield wrote:
>> I'm a bit loathe to reenable this to get a full stacktrace and gdb
>> output, but if there's really nothing wrong with this I might
>> consider it.
> Nothing wrong with it, but my first guess would be obj_workspace being
> overloaded, which would look similar to #551 unless you know exactly
> what to look for.

I'm probably being a little too clever with my "-p obj_workspace=2048",
which is normally enough.

>> Also, using trunk (a couple weeks ago) I can't reference obj in
>> vcl_fetch() at all, which I assumed was an intentional side-step of
>> the #310 bug.
>
> Nope, it's just renamed beresp. For now, it's still the same thing for
> all intents and purposes, but the idea is to only fetch headers so as
> to be able to go from fetch to pipe in some unknown future.

Thanks much, I think I'm all squared away with my user error. :-)
Though it might be nice to more gracefully handle (or at least report)
workspace overflows.
--
Ken

> - Kristian

From kristian at redpill-linpro.com Wed Sep 16 17:54:24 2009
From: kristian at redpill-linpro.com (Kristian Lyngstol)
Date: Wed, 16 Sep 2009 19:54:24 +0200
Subject: #551: Varnish Crash: Missing errorhandling code in HSH_Prepare(), cache_hash.c line 188
In-Reply-To: <647354B7-D835-483D-9541-1E5D13288E77@slide.com>
References: <054.929506018381bcb32841815315546cac@projects.linpro.no> <063.820eed1cc8278cb1996e9291e33fec3f@projects.linpro.no> <20090916170310.GB31173@kjeks.linpro.no> <647354B7-D835-483D-9541-1E5D13288E77@slide.com>
Message-ID: <20090916175424.GE31173@kjeks.linpro.no>

On Wed, Sep 16, 2009 at 10:51:00AM -0700, Ken Brownfield wrote:
(...)
> Thanks much, I think I'm all squared away with my user error. :-)
> Though it might be nice to more gracefully handle (or at least report)
> workspace overflows.

Yeah, I know... We haven't made any specific plans, but it's at the top
of my personal wish-list right now.

--
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497
-------------- next part --------------
A non-text attachment was scrubbed...
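The obj-to-beresp rename Kristian mentions means Varnish 2.0-era vcl_fetch code needs a mechanical translation on trunk. A hypothetical before/after sketch (the TTL value and header are chosen for illustration, not taken from the thread):

```vcl
# Varnish 2.0.x vcl_fetch addresses the backend response as 'obj':
#
#   sub vcl_fetch {
#       set obj.ttl = 1h;
#       unset obj.http.Set-Cookie;
#   }
#
# On trunk the same object is called 'beresp' (backend response):
sub vcl_fetch {
    set beresp.ttl = 1h;
    unset beresp.http.Set-Cookie;
}
```

The two subs cannot coexist in one file; they belong to the two different VCL dialects.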
Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From nkinkade at creativecommons.org Wed Sep 16 19:40:00 2009 From: nkinkade at creativecommons.org (Nathan Kinkade) Date: Wed, 16 Sep 2009 15:40:00 -0400 Subject: Content-Range header problem In-Reply-To: <20090916174134.GD31173@kjeks.linpro.no> References: <1c5831080909161016m10a7e830j3b496d2bdb73e620@mail.gmail.com> <20090916172958.GC31173@kjeks.linpro.no> <1c5831080909161036l6e47039r37c9e9cb768b91c5@mail.gmail.com> <20090916174134.GD31173@kjeks.linpro.no> Message-ID: <1c5831080909161240q3c56cf32v32dde728978464c2@mail.gmail.com> On Wed, Sep 16, 2009 at 1:41 PM, Kristian Lyngstol wrote: > On Wed, Sep 16, 2009 at 01:36:00PM -0400, Nathan Kinkade wrote: >> Thanks. ?The resolution is simple, but I just wanted to be sure. > > Always nice to verify :) > > By the way, I just read the labs.cc.o post [1] from when you started using > Varnish and I'm curious as to how you're doing now with regards to these > issues? > > [1] http://labs.creativecommons.org/2008/04/03/varnish-cache-at-cc/ I haven't actually checked the issue with 600M+ files. It's not often that we're hosting anything that big. In any case, since that time there have been no complaints or problem related to any large file. Anyway, I suppose it's better to explicitly pipe directly to Apache any requests for ISO CD images. :-) Bazaar. It happens that we don't do anything with Bazaar really. It was a coincidence that at the time I was writing a plugin for Planet Venus (feed aggregator) and those devs use Bazaar so I had been trying to make my patches available to them. That issue just sort of went away and we haven't had any reason to use Bazaar since, though we have used git and subversion quite a lot through Varnish without any issue. The bbPress issue seems to have gone away as well. The Apache KeepAlive issue was fixed at some point and I turned it back on a long time ago. 
The log file size issue was resolved by simply upgrading all the
machines to the amd64 distribution of Debian. Doing that remotely and
in-place was outlined here:
http://labs.creativecommons.org/2008/07/15/32-to-64bit-remotely/

We love Varnish at CC. Thanks!

Nathan

From ebe at dmi.dk Thu Sep 17 07:10:21 2009
From: ebe at dmi.dk (Eivind Bengtsson)
Date: Thu, 17 Sep 2009 09:10:21 +0200
Subject: travel: eurobsdcon2009 and vug1
In-Reply-To: <21357.1253010524@critter.freebsd.dk>
References: <21357.1253010524@critter.freebsd.dk>
Message-ID: <4AB1E0DD.4030709@dmi.dk>

Hi yourself ;-)

As the VUG meeting approaches, a few questions arise :-)

What is the exact address (Millbank tower - yes, but which floor/company :-))
Which hotel are you guys staying at? (can you recommend one?)

Have a nice day ..
/Eivind

Poul-Henning Kamp wrote:
> Hi guys,
>
> I will be semi-offline from tomorrow, while I attend EuroBSDcon2009
> in Cambridge.
>
> After EuroBSDcon2009 I will take a train to London to attend the
> very first V(arnish) U(ser) G(roup) meeting monday and tuesday.
>
> (http://varnish.projects.linpro.no/wiki/200909UserGroupMeeting)
>
> Wednesday I'll hang out with an old friend and thursday I'll start
> to dig myself out of the heaps which have accumulated.
>
> See you there...
>
> Poul-Henning

--
Eivind Bengtsson
Systemadministrator - Cand.merc.(dat)
Danmarks Meteorologiske Institut
Lyngbyvej 100
2100 København Ø
Direkte tlf. : 39157544
Email: ebe at dmi.dk
echo 'This is not a pipe.' | cat -> /dev/tty

From schmidt at ze.tum.de Thu Sep 17 08:15:44 2009
From: schmidt at ze.tum.de (Gerhard Schmidt)
Date: Thu, 17 Sep 2009 10:15:44 +0200
Subject: feature request: Direct control of directors via control connection
Message-ID: <4AB1F030.8050502@ze.tum.de>

Hi,

we are trying to replace our squid reverse proxy with varnish. We are
running a management software that allows us to dynamically start, stop
and move servers between hosted sites for maintenance and dynamic load
distribution.
In squid this is implemented via a redirector script. Right now I'm
trying to implement this by dynamically writing new config files with
different director settings and loading them via the control connection.

The feature I would like to see is direct control of the directors via
the control connection, e.g.:

director.list          list all defined directors
director.getbackends   list all backends of a director
director.addbackend    add a backend to a director
director.delbackend    delete a backend from a director
director.setbackend    change backend settings like .weight etc.

Regards
Gerhard

--
----------------------------------------------------------
Gerhard Schmidt       | E-Mail: schmidt at ze.tum.de
Technische Universität München |
WWW & Online Services | Tel: +49 89 289-25270 | PGP-PublicKey
Fax: +49 89 289-25257 | on request

From kristian at redpill-linpro.com Thu Sep 17 09:30:02 2009
From: kristian at redpill-linpro.com (Kristian Lyngstol)
Date: Thu, 17 Sep 2009 11:30:02 +0200
Subject: travel: eurobsdcon2009 and vug1
In-Reply-To: <4AB1E0DD.4030709@dmi.dk>
References: <21357.1253010524@critter.freebsd.dk> <4AB1E0DD.4030709@dmi.dk>
Message-ID: <20090917093002.GI31173@kjeks.linpro.no>

On Thu, Sep 17, 2009 at 09:10:21AM +0200, Eivind Bengtsson wrote:
> What is the exact address (Millbank tower - yes but which floor/company :-))
> Which hotel are you guys staying at ? (can you recommend)

I can't answer for everyone, but I'm staying at Best Western Corona
Hotel. I believe Tollef is too (?).

As for location: It's Canonical's offices; if memory serves me right,
that's the 28th floor. But I might be mistaken on the floor number.

--
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497
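Stepping back to Gerhard's feature request: the interim workaround he describes — regenerating the VCL and loading it over the control connection — can be sketched with the management CLI. The paths and configuration names below are invented for illustration; `vcl.load`, `vcl.use` and `vcl.discard` are the actual management commands:

```shell
# Sketch (invented file names): regenerate a VCL containing the new
# director members, then swap it in over the management port.
# 127.0.0.1:6082 is an assumed -T address.
varnishadm -T 127.0.0.1:6082 vcl.load cfg_new /etc/varnish/directors-new.vcl
varnishadm -T 127.0.0.1:6082 vcl.use cfg_new
# Once nothing references the old config, drop it:
varnishadm -T 127.0.0.1:6082 vcl.discard cfg_old
```

Loading compiles the new VCL while the old one keeps serving, so the switch itself is atomic from the client's point of view.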
From l at lrowe.co.uk Thu Sep 17 11:37:31 2009
From: l at lrowe.co.uk (Laurence Rowe)
Date: Thu, 17 Sep 2009 12:37:31 +0100
Subject: travel: eurobsdcon2009 and vug1
In-Reply-To: <4AB1E0DD.4030709@dmi.dk>
References: <21357.1253010524@critter.freebsd.dk> <4AB1E0DD.4030709@dmi.dk>
Message-ID: 

And what time do we kick off on Monday morning?

Laurence

2009/9/17 Eivind Bengtsson :
> Hi yourself ;-)
>
> As the VUG meeting approaches a few questions arise :-)
>
> What is the exact address (Millbank tower - yes but which floor/company :-))
> Which hotel are you guys staying at ? (can you recommend)
>
> Have a nice day ..
> /Eivind
>
> Poul-Henning Kamp wrote:
>> Hi guys,
>>
>> I will be semi-offline from tomorrow, while I attend EuroBSDcon2009
>> in Cambridge.
>>
>> After EuroBSDcon2009 I will take a train to London to attend the
>> very first V(arnish) U(ser) G(roup) meeting monday and tuesday.
>>
>> (http://varnish.projects.linpro.no/wiki/200909UserGroupMeeting)
>>
>> Wednesday I'll hang out with an old friend and thursday I'll start
>> to dig myself out of the heaps which have accumulated.
>>
>> See you there...
>>
>> Poul-Henning
>
> --
> Eivind Bengtsson
> Systemadministrator - Cand.merc.(dat)
> Danmarks Meteorologiske Institut
> Lyngbyvej 100
> 2100 København Ø
> Direkte tlf. : 39157544
> Email: ebe at dmi.dk
> echo 'This is not a pipe.' | cat -> /dev/tty

From v.bilek at 1art.cz Thu Sep 17 15:02:37 2009
From: v.bilek at 1art.cz (Václav Bílek)
Date: Thu, 17 Sep 2009 17:02:37 +0200
Subject: trunk and obj.ttl obj.grace obj.http.set-cookie
Message-ID: <4AB24F8D.8000601@1art.cz>

Hello

I have tried a trunk release and hit a problem with VCL that worked in 2.0.4...
Variable 'obj.grace' not accessible in method 'vcl_fetch'
Variable 'obj.http.set-cookie' not accessible in method 'vcl_fetch'
Variable 'obj.ttl' not accessible in method 'vcl_fetch'

Did the syntax change?

Vaclav Bilek

From kristian at redpill-linpro.com Thu Sep 17 15:10:54 2009
From: kristian at redpill-linpro.com (Kristian Lyngstol)
Date: Thu, 17 Sep 2009 17:10:54 +0200
Subject: trunk and obj.ttl obj.grace obj.http.set-cookie
In-Reply-To: <4AB24F8D.8000601@1art.cz>
References: <4AB24F8D.8000601@1art.cz>
Message-ID: <20090917151054.GO31173@kjeks.linpro.no>

On Thu, Sep 17, 2009 at 05:02:37PM +0200, Václav Bílek wrote:
> I have tried a trunk release and hit a problem with VCL that worked in 2.0.4...
>
> Variable 'obj.grace' not accessible in method 'vcl_fetch'
> Variable 'obj.http.set-cookie' not accessible in method 'vcl_fetch'
> Variable 'obj.ttl' not accessible in method 'vcl_fetch'
>
> Did the syntax change?

Yup, in fetch, most of what used to be 'obj' is now available as
'beresp'. So: beresp.grace, beresp.http.*, beresp.ttl in your case.

--
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497

From sveniu at opera.com Sat Sep 19 17:46:19 2009
From: sveniu at opera.com (Sven Ulland)
Date: Sat, 19 Sep 2009 19:46:19 +0200
Subject: Dropped connections with tcp_tw_recycle=1
Message-ID: <4AB518EB.1050309@opera.com>

I was recently debugging an issue where several clients experienced
sporadic problems connecting to a website cached by varnish. Every now
and then (something like every 20th-50th TCP connection), a connection
would time out, or sometimes take a few SYNs before being accepted.

Here's a typical example.
It's observed at the spot marked 'X' in this network structure, from
the client network's perspective:

[clients] -> [NAT gateway] -> [bridge firewall]X -> [Internet]

 0.00 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283647429 TSER=0 WS=6
 2.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283648179 TSER=0 WS=6
 8.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283649679 TSER=0 WS=6
20.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283652679 TSER=0 WS=6
44.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283658679 TSER=0 WS=6
93.00 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283670679 TSER=0 WS=6
93.00 varni-extip natgw-extip TCP http > 4292 [SYN, ACK] TSV=2342207123 TSER=283670679

Note: The NAT gateway didn't do port translation here. Also, the
timestamp values were not touched by the NAT gateway. The varnish node
is behind LVS-TUN, but the LVS was not the culprit.

After troubleshooting with the website owner, tcpdumping at various
points on both sides, it was clear that the packets were reaching the
varnish node, but all except the last SYN were dropped. This turned out
to be because the varnish node had the tcp_tw_recycle sysctl enabled.
Switching it off fixed the problem.

The performance page on the varnish wiki has recommended Linux sysctl
settings, including enabling tcp_tw_recycle, since April 2008. The
recycle setting was removed from that page recently, but I would think
there are a lot of installations around the world that have it enabled.

I tried to figure out exactly how the recycling mechanism works, but
the code is too complex to figure out without time or kernel network
experience. Recycling was introduced by David Miller in 2.3.15, ref
and e.g. . Does anyone have a good grasp on how it works, its
connection to the RFC 1323 PAWS mechanism, and its claimed
incompatibility with NAT (ref )?
When observing the same issue previously (dropped SYNs), I ditched
tw_recycle in favour of tcp_tw_reuse, which doesn't seem to cause any
problems (this was on a normal Apache system). It too is severely
underdocumented, so I was hoping to shed some light on them both, and
the exact circumstances where they are suitable for use.

Sven

From nick at loman.net Sun Sep 20 06:27:39 2009
From: nick at loman.net (Nick Loman)
Date: Sun, 20 Sep 2009 15:57:39 +0930
Subject: Dropped connections with tcp_tw_recycle=1
In-Reply-To: <4AB518EB.1050309@opera.com>
References: <4AB518EB.1050309@opera.com>
Message-ID: <4AB5CB5B.1000603@loman.net>

Hi Sven,

I don't know the precise basis for it, but I can vouch for the fact
that tcp_tw_recycle is incompatible with NAT on the server side. I
would guess it is because the NAT gateway keeps a connection tracking
list and is unhappy that the webserver is trying to reuse the same
ip:port hash whilst it is registered in TIME_WAIT mode.

There was a discussion of this previously:
http://projects.linpro.no/pipermail/varnish-misc/2009-April/002764.html

As you say, tw_reuse works OK with NAT.

Cheers,

Nick.

Sven Ulland wrote:
> I was recently debugging an issue where several clients experienced
> sporadic problems connecting to a website cached by varnish. Every now
> and then (something like every 20th-50th TCP connection), a connection
> would time out, or sometimes take a few SYNs before being accepted.
>
> Here's a typical example.
It's observed at the spot marked 'X' in this > network structure from the client network's perspective: > > [clients] -> [NAT gateway] -> [bridge firewall]X -> [Internet] > > 0.00 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283647429 > TSER=0 WS=6 > 2.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283648179 > TSER=0 WS=6 > 8.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283649679 > TSER=0 WS=6 > 20.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283652679 TSER=0 > WS=6 > 44.99 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283658679 TSER=0 > WS=6 > 93.00 natgw-extip varni-extip TCP 4292 > http [SYN] TSV=283670679 TSER=0 > WS=6 > 93.00 varni-extip natgw-extip TCP http > 4292 [SYN, ACK] TSV=2342207123 > TSER=283670679 > > Note: The NAT gateway didn't do port translation here. Also, the > timestamp values were not touched by the NAT gateway. The varnish node > is behind LVS-TUN, but the LVS was not the culprit. > > After troubleshooting with the website owner, tcpdumping at various > points on both sides, it was clear that the packets were reaching the > varnish node, but except the last SYN, they were all dropped. This > turned out to be because the varnish node had the tcp_tw_recycle sysctl > enabled. Switching it off fixed the problem. > > The performance page on the varnish wiki features recommends Linux > sysctl settings, including enabling tcp_tw_recycle, since april 2008. > The recycle setting was removed from that page recently, but I would > think there are a lot of installations around the world that have it > enabled. > > I tried to figure out exactly how the recycling mechanism works, but the > code is too complex to figure out without time or kernel network > experience. Recycling was introduced by David Miller in 2.3.15, ref > > and e.g. . > Do anyone have a good grasp on how it works, its connection to the RFC > 1323 PAWS mechanism, and its claimed incompatibility with NAT (ref > )? 
> When observing the same issue previously (dropped SYNs), I ditched
> tw_recycle in favour of tcp_tw_reuse, which doesn't seem to cause any
> problems (this was on a normal Apache system). It too is severely
> underdocumented, so I was hoping to shed some light on them both, and
> the exact circumstances where they are suitable for use.
>
> Sven

From slink at schokola.de Sun Sep 20 13:20:34 2009
From: slink at schokola.de (Nils Goroll)
Date: Sun, 20 Sep 2009 15:20:34 +0200
Subject: Dropped connections with tcp_tw_recycle=1
In-Reply-To: <4AB5CB5B.1000603@loman.net>
References: <4AB518EB.1050309@opera.com> <4AB5CB5B.1000603@loman.net>
Message-ID: <4AB62C22.5000908@schokola.de>

> tcp_tw_recycle is incompatible with NAT on the server side

... because it will enforce the verification of TCP time stamps. Unless
all clients behind a NAT (actually PAT/masquerading) device use
identical timestamps (within a certain range), most of them will send
invalid TCP timestamps, so their SYNs will get dropped.

This issue had also kept me busy for long hours, and the basic insight
is simple: Premature optimization is the root of all evil ;-), or, less
philosophically: don't tune experimental parameters (the kernel docs are
very clear about this!).
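The advice in this thread, restated as a sysctl configuration fragment (Linux; parameter names as used above — behaviour varies by kernel version, so verify against your kernel's ip-sysctl documentation before applying):

```ini
# /etc/sysctl.conf -- sketch based on this thread's findings.
# tcp_tw_recycle validates TCP timestamps per peer address, so distinct
# clients behind one NAT/PAT device send "invalid" timestamps and have
# their SYNs silently dropped. Leave it off.
net.ipv4.tcp_tw_recycle = 0
# tcp_tw_reuse only affects outgoing connections and is reported in
# this thread to coexist with NATed clients.
net.ipv4.tcp_tw_reuse = 1
```

Apply with `sysctl -p` (or set the values at runtime via `sysctl -w`).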
Nils

From kristian at redpill-linpro.com Sun Sep 20 15:36:45 2009
From: kristian at redpill-linpro.com (Kristian Lyngstol)
Date: Sun, 20 Sep 2009 17:36:45 +0200
Subject: Varnish User Group Meeting 2009-09
In-Reply-To: <87bpms7x7d.fsf@qurzaw.linpro.no>
References: <87bpms7x7d.fsf@qurzaw.linpro.no>
Message-ID: <20090920153645.GB5979@kjeks>

On Fri, Aug 07, 2009 at 12:08:38PM +0200, Tollef Fog Heen wrote:
> On September 21st and 22nd, the first Varnish User Group meeting will be
> held, in Canonical Ltd's offices in Millbank Tower, London, UK.
>
> Please see http://varnish.projects.linpro.no/wiki/200909UserGroupMeeting

A little update, since we seem to have forgotten to mention it: We will
begin at 09:00 London-time and keep going through the day. Canonical
have been kind enough to lend us the meeting room we'll be using.

See you there :)

--
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497

From phk at phk.freebsd.dk Sun Sep 20 16:34:05 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Sun, 20 Sep 2009 16:34:05 +0000
Subject: Varnish User Group Meeting 2009-09
In-Reply-To: Your message of "Sun, 20 Sep 2009 17:36:45 +0200." <20090920153645.GB5979@kjeks>
Message-ID: <5438.1253464445@critter.freebsd.dk>

In message <20090920153645.GB5979 at kjeks>, Kristian Lyngstol writes:

>We will begin at 09:00 London-time and keep going through the day.
>Canonical have been kind enough to lend us the meeting room we'll be using.

I will attempt to be there at 9, but I have still not figured out the
details of getting from Cambridge to London; working on that right now.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
From l at lrowe.co.uk Sun Sep 20 17:09:15 2009
From: l at lrowe.co.uk (Laurence Rowe)
Date: Sun, 20 Sep 2009 18:09:15 +0100
Subject: Varnish User Group Meeting 2009-09
In-Reply-To: <5438.1253464445@critter.freebsd.dk>
References: <20090920153645.GB5979@kjeks> <5438.1253464445@critter.freebsd.dk>
Message-ID: 

From Cambridge, take the train to London Kings Cross (approximately 50
minutes, runs every half hour). From Kings Cross take the Victoria Line
(Underground) to Pimlico. Millbank tower is then a 1km walk.

London journey planner: http://www.tfl.gov.uk/
National rail journey planner: http://www.nationalrail.co.uk/

Laurence

2009/9/20 Poul-Henning Kamp :
> In message <20090920153645.GB5979 at kjeks>, Kristian Lyngstol writes:
>
>> We will begin at 09:00 London-time and keep going through the day.
>> Canonical have been kind enough to lend us the meeting room we'll be using.
>
> I will attempt to be there at 9, but I have still not figured out the
> details of getting from Cambridge to London; working on that right now.
>
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk at FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.

From phk at phk.freebsd.dk Sun Sep 20 17:28:02 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Sun, 20 Sep 2009 17:28:02 +0000
Subject: Varnish User Group Meeting 2009-09
In-Reply-To: Your message of "Sun, 20 Sep 2009 18:09:15 +0100."
Message-ID: <5628.1253467682@critter.freebsd.dk>

In message , Laurence Rowe writes:

> From Cambridge, take the train to London Kings Cross (approximately 50
> minutes, runs every half hour). From Kings Cross take the Victoria
> Line (Underground) to Pimlico. Millbank tower is then a 1km walk.
Yes, I have reached the same conclusion. I think I'll aim for the 0715 from cambridge, that should have me at Pimlico around 0830. Poul-Henning -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From michael at dynamine.net Sun Sep 20 20:25:13 2009 From: michael at dynamine.net (Michael S. Fischer) Date: Sun, 20 Sep 2009 13:25:13 -0700 Subject: Dropped connections with tcp_tw_recycle=1 In-Reply-To: <4AB62C22.5000908@schokola.de> References: <4AB518EB.1050309@opera.com> <4AB5CB5B.1000603@loman.net> <4AB62C22.5000908@schokola.de> Message-ID: <257F6D6A-4749-4E32-B48F-D4902831311D@dynamine.net> On Sep 20, 2009, at 6:20 AM, Nils Goroll wrote: >> tcp_tw_recycle is incompatible with NAT on the server side > > ... because it will enforce the verification of TCP time stamps. > Unless all > clients behind a NAT (actually PAD/masquerading) device use > identical timestamps > (within a certain range), most of them will send invalid TCP > timestamps so SYNs > will get dropped. Since you seem pretty knowledgeable on the subject, can you please explain the difference between tcp_tw_reuse and tcp_tw_recycle? Thanks, --Michael From ml at tinwong.com Sun Sep 20 22:29:51 2009 From: ml at tinwong.com (M L) Date: Mon, 21 Sep 2009 06:29:51 +0800 Subject: died signal=6 , panic and restart every few sec. to min. Message-ID: <113d871c0909201529p1a21ffbesbf410298fae8f1a4@mail.gmail.com> Plz help, anyone have idea howto solve this problem ? 
varnishd -a 0.0.0.0:80 -T 127.0.0.1:3500 -p client_http11=on -f vconf2 -s file,/usr/local/varnish/cache.bin,80G -h classic,500009 -p listen_depth=4096 -p obj_workspace=32768 -p sess_workspace=32768 -p send_timeout=327 I got this message from /var/log/messages Sep 20 21:26:36 x2 varnishd[21933]: Child (21934) died signal=6 Sep 20 21:26:36 x2 varnishd[21933]: Child (21934) Panic message: Assert error in VRT_IP_string(), cache_vrt.c line 693: Condition((p = WS_Alloc(sp->http->ws, len)) != 0) nlient = 211.74.185.119:2909, step = STP_RECV, handling = error, err_code = 503, err_reason = (null), ws = 0x2abeb5926078 { overflow id = "sess", {s,f,r,e} = cname = { "input", "Default", }, }, }, Sep 20 21:26:36 x2 varnishd[21933]: child (21952) Started Sep 20 21:26:36 x2 varnishd[21933]: Child (21952) said Closed fds: 4 5 8 9 11 12 Sep 20 21:26:36 x2 varnishd [21933] : Child (21952) said Child starts Sep 20 21:26:36 x2 varnishd[21933]: Child (21952) said managed to mmap 85899345920 bytes of 85899345920 Sep 20 21:26:36 x2 varnishd[21933]: Child (21952) said Ready Sep 20 21:28:10 x2 varnishd[21933]: Child (21952) died signal=6 Sep 20 21:28:10 x2 varnishd[21933]: Child (21952) Panic message: Assert error in WS_Release(), cache_ws.c line 170: Condition(bytes <= ws->e - ws->f) not true. thread = (10:32759, step = STP_RECV, handling = error, err_code = 503, err_reason = (null), ws = 0x2abeb5a65078 { id = "sess", {s,f,r,e} = {0x2abeb5a65808+32738,+32 "Default", }, }, }, Thanks alot T W -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.birdsong at gmail.com Sun Sep 20 22:33:49 2009 From: david.birdsong at gmail.com (David Birdsong) Date: Sun, 20 Sep 2009 15:33:49 -0700 Subject: died signal=6 , panic and restart every few sec. to min. 
In-Reply-To: <113d871c0909201529p1a21ffbesbf410298fae8f1a4@mail.gmail.com> References: <113d871c0909201529p1a21ffbesbf410298fae8f1a4@mail.gmail.com> Message-ID: On Sun, Sep 20, 2009 at 3:29 PM, M L wrote: > Plz help, anyone have idea howto solve this problem ? > > varnishd -a 0.0.0.0:80 -T 127.0.0.1:3500 -p client_http11=on -f vconf2 -s > file,/usr/local/varnish/cache.bin,80G -h classic,500009 -p listen_depth=4096 > -p obj_workspace=32768 -p sess_workspace=32768 -p send_timeout=327 > > I got this message from /var/log/messages > > Sep 20 21:26:36 x2 varnishd[21933]: Child (21934) died signal=6 Sep 20 > 21:26:36 x2 varnishd[21933]: Child (21934) Panic message: Assert error in > VRT_IP_string(), cache_vrt.c line 693: Condition((p = WS_Alloc(sp->http->ws, > len)) != 0) nlient = 211.74.185.119:2909, step = STP_RECV, handling = error, > err_code = 503, err_reason = (null), ws = 0x2abeb5926078 { overflow id = > "sess", {s,f,r,e} = cname = { "input", "Default", }, }, }, > > Sep 20 21:26:36 x2 varnishd[21933]: child (21952) Started Sep 20 21:26:36 x2 > varnishd[21933]: Child (21952) said Closed fds: 4 5 8 9 11 12 Sep 20 > 21:26:36 x2 varnishd[21933]: Child (21952) said Child starts Sep 20 21:26:36 > x2 varnishd[21933]: Child (21952) said managed to mmap 85899345920 bytes of > 85899345920 Sep 20 21:26:36 x2 varnishd[21933]: Child (21952) said Ready Sep > 20 21:28:10 x2 varnishd[21933]: Child (21952) died signal=6 Sep 20 21:28:10 > x2 varnishd[21933]: Child (21952) Panic message: Assert error in > WS_Release(), cache_ws.c line 170: Condition(bytes <= ws->e - ws->f) not > true. thread = (10:32759, step = STP_RECV, handling = error, err_code = 503, > err_reason = (null), ws = 0x2abeb5a65078 { id = "sess", {s,f,r,e} = > {0x2abeb5a65808+32738,+32 "Default", }, }, }, what about your vcl file? are you modifying the object in vcl_hit at all? 
> > Thanks alot
> >
> > _______________________________________________
> varnish-misc mailing list
> varnish-misc at projects.linpro.no
> http://projects.linpro.no/mailman/listinfo/varnish-misc
>
>

From ml at tinwong.com Sun Sep 20 22:45:56 2009
From: ml at tinwong.com (M L)
Date: Mon, 21 Sep 2009 06:45:56 +0800
Subject: died signal=6 , panic and restart every few sec. to min.
In-Reply-To:
References: <113d871c0909201529p1a21ffbesbf410298fae8f1a4@mail.gmail.com>
Message-ID: <113d871c0909201545l1541510dt6d1c50cbec3cfdb@mail.gmail.com>

Hi David,

Thanks for the reply. I am not modifying vcl_hit. My VCL:

backend default {
    .host = "10.0.0.5";
    .port = "80";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
}

backend srv1 {
    .host = "10.0.0.5";
    .port = "80";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
}

backend srv2 {
    .host = "10.0.0.5";
    .port = "80";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
}

acl purge {
    "localhost";
    "127.0.0.1";
}

#recv
sub vcl_recv {
    if (req.http.host ~ "www.foobar.com") {
        set req.http.host = "www.foobar.com";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = allhabit2; }
    } elseif (req.http.host ~ "www.zoobar.com") {
        set req.http.host = "www.zoobar.com";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = srv2; }
    } elseif (req.http.host ~ "www.yoobar.com") {
        set req.http.host = "www.yoobar.com";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = srv2; }
    } elseif (req.http.host ~ "218.242.39.202") {
        set req.http.host = "118.142.39.202";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = srv2; }
    } elseif (req.http.host ~ "218.242.39.203") {
        set req.http.host = "118.142.39.203";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = srv2; }
    } elseif (req.http.host ~ "204.186.59.41") {
        set req.http.host = "204.186.59.41";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = srv2; }
    } elseif (req.http.host ~ "204.126.59.45") {
        set req.http.host = "204.126.59.45";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = srv2; }
    } else {
        error 401 "Bad Domain";
    }
    #set req.grace = 30s;
    # Add a unique header containing the client address
    remove req.http.X-Forwarded-For;
    set req.http.X-Forwarded-For = client.ip;
    # [...]
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not Allowed";
        }
        lookup;
    }
    #if (req.request != "GET" && req.request != "HEAD") {
    #    pipe;
    #}
    #if (req.request == "POST") {
    #    pass;
    #}
    if (req.http.Expect) {
        pipe;
    }
    if (req.request != "GET" && req.request != "HEAD" &&
        req.request != "PUT" && req.request != "POST" &&
        req.request != "TRACE" && req.request != "OPTIONS" &&
        req.request != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        pipe;
    }
    if (req.request != "GET" && req.request != "HEAD") {
        /* We only deal with GET and HEAD by default */
        pass;
    }
    if (req.http.Cache-Control ~ "no-cache") {
        pass;
    }
    if (req.http.Authenticate) {
        pass;
    }
    #if (req.http.Cookie) {
    #    pass;
    #}
    if (req.url ~ "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") {
        unset req.http.cookie;
        lookup;
        # unset req.http.authenticate;
    }
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }
}
#end recv

sub vcl_hash {
    set req.hash += req.url;
    set req.hash += req.http.host;
    #set req.hash += req.http.cookie;
    #set req.hash += server.ip;
    hash;
}
#end hash

# sub vcl_hash {
#     set req.hash += req.url;
#     if (req.http.host) {
#         set req.hash += req.http.host;
#     } else {
#         set req.hash += server.ip;
#     }
#     hash;
# }
#if (req.http.Accept-Encoding ~ "gzip") {
#    set req.hash += "gzip";
#}
#else if (req.http.Accept-Encoding ~ "deflate") {
#    set req.hash += "deflate";
#}
#hash;
#}
#end hash
#sub vcl_hash {
#    set req.hash += req.url;
#    set req.hash += req.http.host;
#    if (req.http.Accept-Encoding ~ "gzip") {
#        set req.hash += "gzip";
#    }
#    else if (req.http.Accept-Encoding ~ "deflate") {
#        set req.hash += "deflate";
#    }
#}

# strip the cookie before the image is inserted into cache.
sub vcl_fetch {
    #if (obj.status != 200 && obj.status != 302) {
    #restart;
    #}
    if (obj.http.Set-Cookie) {
        pass;
    }
    if (obj.http.Pragma ~ "no-cache" ||
        obj.http.Cache-Control ~ "no-cache" ||
        obj.http.Cache-Control ~ "private") {
        pass;
    }
    # set obj.grace = 30s;
    if (req.url ~ "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") {
        unset obj.http.set-cookie;
        set obj.ttl = 1w;
    }
    # if (req.request == "GET" && req.url ~ "\.(txt|js)$") {
    #     set obj.ttl = 1d;
    # } else {
    #     set obj.ttl = 1w;
    # }
    if (req.url ~ "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") {
        unset obj.http.expires;
        set obj.http.cache-control = "max-age=315360000, public";
        set obj.ttl = 1w;
        set obj.http.magicmarker = "1";
    }
    # if (obj.cacheable) {
    #     /* Remove Expires from backend, it's not long enough */
    #     unset obj.http.expires;
    #     /* Set the clients TTL on this object */
    #     set obj.http.cache-control = "max-age=315360000, public";
    #     /* Set how long Varnish will keep it */
    #     set obj.ttl = 1w;
    #     /* marker for vcl_deliver to reset Age: */
    #     set obj.http.magicmarker = "1";
    # }
}
#fetch end

sub vcl_deliver {
    if (resp.http.magicmarker) {
        /* Remove the magic marker */
        unset resp.http.magicmarker;
        /* By definition we have a fresh object */
        set resp.http.age = "0";
        if (obj.hits > 0) {
            set resp.http.X-Cache = "HIT";
        } else {
            set resp.http.X-Cache = "MISS";
        }
    }
}
#deliver end

sub vcl_pipe {
    # http://varnish.projects.linpro.no/ticket/451
    # This forces every pipe request to be the first one.
    set bereq.http.connection = "close";
}
#pipe end

sub vcl_error {
    if (obj.status == 503) {
        restart;
    }
}
#error end

Thx
TW

On Mon, Sep 21, 2009 at 6:33 AM, David Birdsong wrote:
> On Sun, Sep 20, 2009 at 3:29 PM, M L wrote:
> > Plz help, anyone have an idea how to solve this problem?
> > > > varnishd -a 0.0.0.0:80 -T 127.0.0.1:3500 -p client_http11=on -f vconf2 > -s > > file,/usr/local/varnish/cache.bin,80G -h classic,500009 -p > listen_depth=4096 > > -p obj_workspace=32768 -p sess_workspace=32768 -p send_timeout=327 > > > > I got this message from /var/log/messages > > > > Sep 20 21:26:36 x2 varnishd[21933]: Child (21934) died signal=6 Sep 20 > > 21:26:36 x2 varnishd[21933]: Child (21934) Panic message: Assert error in > > VRT_IP_string(), cache_vrt.c line 693: Condition((p = > WS_Alloc(sp->http->ws, > > len)) != 0) nlient = 211.74.185.119:2909, step = STP_RECV, handling = > error, > > err_code = 503, err_reason = (null), ws = 0x2abeb5926078 { overflow id = > > "sess", {s,f,r,e} = cname = { "input", "Default", }, }, }, > > > > Sep 20 21:26:36 x2 varnishd[21933]: child (21952) Started Sep 20 21:26:36 > x2 > > varnishd[21933]: Child (21952) said Closed fds: 4 5 8 9 11 12 Sep 20 > > 21:26:36 x2 varnishd[21933]: Child (21952) said Child starts Sep 20 > 21:26:36 > > x2 varnishd[21933]: Child (21952) said managed to mmap 85899345920 bytes > of > > 85899345920 Sep 20 21:26:36 x2 varnishd[21933]: Child (21952) said Ready > Sep > > 20 21:28:10 x2 varnishd[21933]: Child (21952) died signal=6 Sep 20 > 21:28:10 > > x2 varnishd[21933]: Child (21952) Panic message: Assert error in > > WS_Release(), cache_ws.c line 170: Condition(bytes <= ws->e - ws->f) not > > true. thread = (10:32759, step = STP_RECV, handling = error, err_code = > 503, > > err_reason = (null), ws = 0x2abeb5a65078 { id = "sess", {s,f,r,e} = > > {0x2abeb5a65808+32738,+32 "Default", }, }, }, > > what about your vcl file? > > are you modifying the object in vcl_hit at all? > > > > > Thanks alot > > > > T W > > > > _______________________________________________ > > varnish-misc mailing list > > varnish-misc at projects.linpro.no > > http://projects.linpro.no/mailman/listinfo/varnish-misc > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From pgnet.dev+varnish at gmail.com Mon Sep 21 02:57:17 2009
From: pgnet.dev+varnish at gmail.com (PGNet Dev)
Date: Sun, 20 Sep 2009 19:57:17 -0700
Subject: httpd asking for AUTH _twice_ when behind Varnish proxy ? works as expected without Varnish ...
Message-ID: <94f2e81e0909201957r62ef4008q25960358560afaa5@mail.gmail.com>

hi,

i've just done a 1st migration from apache2+mod_ssl to pound + varnish + apache2 using,

  pound -V
    Version 2.4.5
  varnishd -V
    varnishd (varnish-2.0.4)
  httpd2 -V
    Server version: Apache/2.2.13 (Linux/SUSE)

in my original apache/ssl config, i've httpd DIGEST Auth set up (atm) on the web root. it works as expected.

now that i've switched to the pound/varnish/apache2 setup, Auth still works -- but makes the request twice! if i visit https://www.mysite.com i get an initial request for AUTH at my defined realm :443, then after entering credentials there, the page paints -- and i get a second http AUTH dialog for the _same_ realm, but at :8081. switch back to a direct connect, and just the one AUTH dialog ...

my relevant configs are below ... any ideas as to what's causing the double-AUTH request, and how to fix it, would be much appreciated! thanks!

/etc/pound.cfg

  ListenHTTP
    Address xx.xx.xx.xx
    Port 80
    Service
      Redirect "https://www.mysite.com"
    End
  End
  ListenHTTPS
    Address xx.xx.xx.xx
    Port 443
    Cert "/crypt/ssl/ssl.crt/combined.pem"
    Ciphers "AES256-SHA:AES128-SHA"
    NoHTTPS11 2
    Service
      BackEnd
        Address 127.0.0.1
        Port 8080
      End
    End
  End

/etc/sysconfig/varnish

  VARNISHD_PARAMS="-f /etc/varnish/vcl.conf -a 127.0.0.1:8080 -T 127.0.0.1:6082 -s file,/var/cache/varnish/varnish.bin,100M -n test"

/etc/varnish/vcl.conf

  # cp of /etc/varnish/default.vcl, except:
  backend default {
    .host = "xx.xx.xx.xx";
    .port = "8081";
  }

/etc/apache2/vhosts.d/www.mysite.com

  ...
  ...
  DocumentRoot /svr/www/mysite
  ...
  Options +ExecCGI +FollowSymLinks +Indexes
  DirectoryIndex index.html index.php
  AuthType Digest
  AuthName "AUTH mysite"
  AuthDigestProvider file
  AuthUserFile /crypt/wwwauth/.passwords.md5
  AuthDigestDomain /
  require valid-user
  AddHandler fcgid-script .php
  FCGIWrapper "/usr/bin/php-cgi5 -d apc.shm_size=25 -c /etc/php5/fastcgi/" .php
  ...

From v.bilek at 1art.cz Mon Sep 21 06:43:07 2009
From: v.bilek at 1art.cz (=?UTF-8?B?VsOhY2xhdiBCw61sZWs=?=)
Date: Mon, 21 Sep 2009 08:43:07 +0200
Subject: trunk and obj.ttl obj.grace obj.http.set-cookie
In-Reply-To: <20090917151054.GO31173@kjeks.linpro.no>
References: <4AB24F8D.8000601@1art.cz> <20090917151054.GO31173@kjeks.linpro.no>
Message-ID: <4AB7207B.70408@1art.cz>

Thanks a lot... Is there anything else that changed in the VCL config in trunk?

Kristian Lyngstol wrote:
> On Thu, Sep 17, 2009 at 05:02:37PM +0200, Václav Bílek wrote:
>> I have tried the trunk release and hit a problem with VCL which worked in 2.0.4...
>>
>> Variable 'obj.grace' not accessible in method 'vcl_fetch'
>> Variable 'obj.http.set-cookie' not accessible in method 'vcl_fetch'
>> Variable 'obj.ttl' not accessible in method 'vcl_fetch'
>>
>>
>> did the syntax change?
>
> Yup, in fetch, most of what used to be 'obj' is now available as 'beresp'.
>
> So: beresp.grace, beresp.http.*, beresp.ttl in your case.
>

From slink at schokola.de Mon Sep 21 08:38:24 2009
From: slink at schokola.de (Nils Goroll)
Date: Mon, 21 Sep 2009 10:38:24 +0200
Subject: Dropped connections with tcp_tw_recycle=1
In-Reply-To: <257F6D6A-4749-4E32-B48F-D4902831311D@dynamine.net>
References: <4AB518EB.1050309@opera.com> <4AB5CB5B.1000603@loman.net> <4AB62C22.5000908@schokola.de> <257F6D6A-4749-4E32-B48F-D4902831311D@dynamine.net>
Message-ID: <4AB73B80.7000004@schokola.de>

Hi Michael and all,

>>> tcp_tw_recycle is incompatible with NAT on the server side
>>
>> ... because it will enforce the verification of TCP time stamps.
>> Unless all >> clients behind a NAT (actually PAD/masquerading) device use identical >> timestamps >> (within a certain range), most of them will send invalid TCP >> timestamps so SYNs >> will get dropped. > > Since you seem pretty knowledgeable on the subject, can you please > explain the difference between tcp_tw_reuse and tcp_tw_recycle? I think I have understood the reason why tcp_tw_recycle does not work with NAT connections, but I must say I haven't fully devoured the linux TCP implementation to explain to you the design decisions regarding these two options. The very basic idea is to re-use tcp connections in TIME_WAIT state, saving the overhead of destroying and recreating TCP state. I remember that at one point I had thought to have understood the difference, but I can't recall at the moment. In short: I can tell you that you *must not* use tcp_tw_recycle for any machine talking to machines behind masquerading firewalls (iow, only use it inside isolated networks). But I cannot tell you what exactly it is supposed to do and what the difference is to tcp_tw_reuse. If anyone finds out, please let me know as well! 
Nils

From ml at tinwong.com Mon Sep 21 10:18:07 2009
From: ml at tinwong.com (M L)
Date: Mon, 21 Sep 2009 18:18:07 +0800
Subject: Plz Help Varnish died signal=6 , keep panic and restart every few second - minute
Message-ID: <113d871c0909210318t79e5b515ke7853bde65895efd@mail.gmail.com>

Hi list,

I have googled for a few days with no luck finding any clue to my problem. Does anyone have an idea how to solve it? Thanks a lot.

Box config: CentOS 5.3, kernel 2.6.18-8.1.14.el5 64bit / varnish 2.0.4

varnishd -a 0.0.0.0:80 -T 127.0.0.1:3500 -p client_http11=on -f vconf2
    -s file,/usr/local/varnish/cache.bin,80G -h classic,500009
    -p listen_depth=4096 -p obj_workspace=32768 -p sess_workspace=32768
    -p send_timeout=327

from /var/log/messages:

Sep 20 21:26:36 x2 varnishd[21933]: Child (21934) died signal=6
Sep 20 21:26:36 x2 varnishd[21933]: Child (21934) Panic message: Assert error in VRT_IP_string(), cache_vrt.c line 693: Condition((p = WS_Alloc(sp->http->ws, len)) != 0) nlient = 211.74.185.119:2909, step = STP_RECV, handling = error, err_code = 503, err_reason = (null), ws = 0x2abeb5926078 { overflow id = "sess", {s,f,r,e} = cname = { "input", "Default", }, }, },
Sep 20 21:26:36 x2 varnishd[21933]: child (21952) Started
Sep 20 21:26:36 x2 varnishd[21933]: Child (21952) said Closed fds: 4 5 8 9 11 12
Sep 20 21:26:36 x2 varnishd[21933]: Child (21952) said Child starts
Sep 20 21:26:36 x2 varnishd[21933]: Child (21952) said managed to mmap 85899345920 bytes of 85899345920
Sep 20 21:26:36 x2 varnishd[21933]: Child (21952) said Ready
Sep 20 21:28:10 x2 varnishd[21933]: Child (21952) died signal=6
Sep 20 21:28:10 x2 varnishd[21933]: Child (21952) Panic message: Assert error in WS_Release(), cache_ws.c line 170: Condition(bytes <= ws->e - ws->f) not true.
thread = (10:32759, step = STP_RECV, handling = error, err_code = 503,
err_reason = (null), ws = 0x2abeb5a65078 { id = "sess", {s,f,r,e} =
{0x2abeb5a65808+32738,+32 "Default", }, }, },

VCL config:

backend default {
    .host = "10.0.0.5";
    .port = "80";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
}

backend srv1 {
    .host = "10.0.0.5";
    .port = "80";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
}

backend srv2 {
    .host = "10.0.0.5";
    .port = "80";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
}

acl purge {
    "localhost";
    "127.0.0.1";
}

#recv
sub vcl_recv {
    if (req.http.host ~ "www.foobar.com") {
        set req.http.host = "www.foobar.com";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = allhabit2; }
    } elseif (req.http.host ~ "www.zoobar.com") {
        set req.http.host = "www.zoobar.com";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = srv2; }
    } elseif (req.http.host ~ "www.yoobar.com") {
        set req.http.host = "www.yoobar.com";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = srv2; }
    } elseif (req.http.host ~ "218.242.39.202") {
        set req.http.host = "118.142.39.202";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = srv2; }
    } elseif (req.http.host ~ "218.242.39.203") {
        set req.http.host = "118.142.39.203";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = srv2; }
    } elseif (req.http.host ~ "204.186.59.41") {
        set req.http.host = "204.186.59.41";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = srv2; }
    } elseif (req.http.host ~ "204.126.59.45") {
        set req.http.host = "204.126.59.45";
        if (req.restarts == 0) { set req.backend = srv1; }
        else if (req.restarts == 1) { set req.backend = srv2; }
    } else {
        error 401 "Bad Domain";
    }
    #set req.grace = 30s;
    # Add a unique header containing the client address
    remove req.http.X-Forwarded-For;
    set req.http.X-Forwarded-For = client.ip;
    # [...]
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not Allowed";
        }
        lookup;
    }
    #if (req.request != "GET" && req.request != "HEAD") {
    #    pipe;
    #}
    #if (req.request == "POST") {
    #    pass;
    #}
    if (req.http.Expect) {
        pipe;
    }
    if (req.request != "GET" && req.request != "HEAD" &&
        req.request != "PUT" && req.request != "POST" &&
        req.request != "TRACE" && req.request != "OPTIONS" &&
        req.request != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        pipe;
    }
    if (req.request != "GET" && req.request != "HEAD") {
        /* We only deal with GET and HEAD by default */
        pass;
    }
    if (req.http.Cache-Control ~ "no-cache") {
        pass;
    }
    if (req.http.Authenticate) {
        pass;
    }
    #if (req.http.Cookie) {
    #    pass;
    #}
    if (req.url ~ "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") {
        unset req.http.cookie;
        lookup;
        # unset req.http.authenticate;
    }
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }
}
#end recv

sub vcl_hash {
    set req.hash += req.url;
    set req.hash += req.http.host;
    #set req.hash += req.http.cookie;
    #set req.hash += server.ip;
    hash;
}
#end hash

# sub vcl_hash {
#     set req.hash += req.url;
#     if (req.http.host) {
#         set req.hash += req.http.host;
#     } else {
#         set req.hash += server.ip;
#     }
#     hash;
# }
#if (req.http.Accept-Encoding ~ "gzip") {
#    set req.hash += "gzip";
#}
#else if (req.http.Accept-Encoding ~ "deflate") {
#    set req.hash += "deflate";
#}
#hash;
#}
#end hash
#sub vcl_hash {
#    set req.hash += req.url;
#    set req.hash += req.http.host;
#    if (req.http.Accept-Encoding ~ "gzip") {
#        set req.hash += "gzip";
#    }
#    else if (req.http.Accept-Encoding ~ "deflate") {
#        set req.hash += "deflate";
#    }
#}

# strip the cookie before the image is inserted into cache.
sub vcl_fetch {
    #if (obj.status != 200 && obj.status != 302) {
    #restart;
    #}
    if (obj.http.Set-Cookie) {
        pass;
    }
    if (obj.http.Pragma ~ "no-cache" ||
        obj.http.Cache-Control ~ "no-cache" ||
        obj.http.Cache-Control ~ "private") {
        pass;
    }
    # set obj.grace = 30s;
    if (req.url ~ "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") {
        unset obj.http.set-cookie;
        set obj.ttl = 1w;
    }
    # if (req.request == "GET" && req.url ~ "\.(txt|js)$") {
    #     set obj.ttl = 1d;
    # } else {
    #     set obj.ttl = 1w;
    # }
    if (req.url ~ "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") {
        unset obj.http.expires;
        set obj.http.cache-control = "max-age=315360000, public";
        set obj.ttl = 1w;
        set obj.http.magicmarker = "1";
    }
    # if (obj.cacheable) {
    #     /* Remove Expires from backend, it's not long enough */
    #     unset obj.http.expires;
    #     /* Set the clients TTL on this object */
    #     set obj.http.cache-control = "max-age=315360000, public";
    #     /* Set how long Varnish will keep it */
    #     set obj.ttl = 1w;
    #     /* marker for vcl_deliver to reset Age: */
    #     set obj.http.magicmarker = "1";
    # }
}
#fetch end

sub vcl_deliver {
    if (resp.http.magicmarker) {
        /* Remove the magic marker */
        unset resp.http.magicmarker;
        /* By definition we have a fresh object */
        set resp.http.age = "0";
        if (obj.hits > 0) {
            set resp.http.X-Cache = "HIT";
        } else {
            set resp.http.X-Cache = "MISS";
        }
    }
}
#deliver end

sub vcl_pipe {
    # http://varnish.projects.linpro.no/ticket/451
    # This forces every pipe request to be the first one.
    set bereq.http.connection = "close";
}
#pipe end

sub vcl_error {
    if (obj.status == 503) {
        restart;
    }
}
#error end

Thanks
TW
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From v.bilek at 1art.cz Mon Sep 21 13:46:16 2009
From: v.bilek at 1art.cz (=?ISO-8859-1?Q?V=E1clav_B=EDlek?=)
Date: Mon, 21 Sep 2009 15:46:16 +0200
Subject: varnish Connection: close and IE6
In-Reply-To: <4AAE4E30.5000607@1art.cz>
References: <4AAA5969.5010002@1art.cz> <4AAE4E30.5000607@1art.cz>
Message-ID: <4AB783A8.4070009@1art.cz>

>
>
> With knowledge of that we don't know exactly how to patch for disabling
> keepalive we tried a nasty hack:
>
> diff bin/varnishd/cache_acceptor_epoll.c
> bin/varnishd/cache_acceptor_epoll.c.new
> 114c114
> <               deadline = TIM_real() - params->sess_timeout;
> ---
>> //            deadline = TIM_real() - params->sess_timeout;
> 117c117
> <                       if (sp->t_open > deadline)
> ---
>> //                    if (sp->t_open > deadline)
>
>
> it worked in the testing environment but in real traffic it was even worse
> (IE6 hanging for a long time).

As we try to disable client side keepalive we made a change in
cache_acceptor_epoll.c by disabling the session timeout check and
shortening this timeout:

    if ((tmp_timeout - last_timeout) > 60)

to:

    if ((tmp_timeout - last_timeout) > 0.1)

and it looks like it solved our problem ...

Is there any problem we should be expecting on high load after this
modification?

From joe at joetify.com Mon Sep 21 17:29:52 2009
From: joe at joetify.com (Joe Williams)
Date: Mon, 21 Sep 2009 10:29:52 -0700
Subject: if-none-match status
In-Reply-To: <20090913191412.4a30b791@der-dieb>
References: <20090910120356.27ae58af@der-dieb> <20090913191412.4a30b791@der-dieb>
Message-ID: <20090921102952.5277153a@der-dieb>

Anything?

-Joe

On Sun, 13 Sep 2009 19:14:12 -0700
Joe Williams wrote:

>
> Anyone tried this out yet?
>
>
> Devs,
>
> Have any info on how to use it?
>
>
> -Joe
>
>
> On Thu, 10 Sep 2009 12:03:56 -0700
> Joe Williams wrote:
>
> >
> > I am just curious about the status of if-none-match support; from the
> > commit logs and this mailing list email
> > (http://www.mail-archive.com/varnish-misc at projects.linpro.no/msg02738.html)
> > it looks like it's been committed.
As was asked in the
> > aforementioned email are there any examples of how to
> > use/configure/etc this new feature?
> >
> > Thanks.
> >
> > -Joe
> >
> >

--
Name: Joseph A. Williams
Email: joe at joetify.com
Blog: http://www.joeandmotorboat.com/

From stockrt at gmail.com Mon Sep 21 17:44:27 2009
From: stockrt at gmail.com (=?ISO-8859-1?Q?Rog=E9rio_Schneider?=)
Date: Mon, 21 Sep 2009 14:44:27 -0300
Subject: varnish Connection: close and IE6
In-Reply-To: <4AB783A8.4070009@1art.cz>
References: <4AAA5969.5010002@1art.cz> <4AAE4E30.5000607@1art.cz> <4AB783A8.4070009@1art.cz>
Message-ID: <100657c90909211044o62c2d04bn79a46df64713dd35@mail.gmail.com>

Václav, you should read these two tickets:

http://varnish.projects.linpro.no/ticket/492
http://varnish.projects.linpro.no/ticket/235

Ticket 492 may be a better fix for your problem, better than
"if ((tmp_timeout - last_timeout) > 0.1)". There we have patches for
2.0.4 too.

The only problem I can see with the trick you made is that you are
cleaning sockets too frequently, and this can make your varnish
somewhat slow on higher loads.

Good luck,
Rogério Schneider

2009/9/21 Václav Bílek :
>
>>
>>
>> With knowledge of that we don't know exactly how to patch for disabling
>> keepalive we tried a nasty hack:
>>
>> diff bin/varnishd/cache_acceptor_epoll.c
>> bin/varnishd/cache_acceptor_epoll.c.new
>> 114c114
>> <               deadline = TIM_real() - params->sess_timeout;
>> ---
>>> //            deadline = TIM_real() - params->sess_timeout;
>> 117c117
>> <                       if (sp->t_open > deadline)
>> ---
>>> //                    if (sp->t_open > deadline)
>>
>>
>> it worked in the testing environment but in real traffic it was even worse
>> (IE6 hanging for a long time).
>
> As we try to disable client side keepalive we made a change in
> cache_acceptor_epoll.c by disabling the session timeout check and
> shortening this timeout:
> if ((tmp_timeout - last_timeout) > 60)
> to:
> if ((tmp_timeout - last_timeout) > 0.1)
>
> and it looks like it solved our problem ...
>
> is there any problem we should be expecting on high load after this
> modification?
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at projects.linpro.no
> http://projects.linpro.no/mailman/listinfo/varnish-misc
>

--
Rogério Schneider
MSN: stockrt at hotmail.com
GTalk: stockrt at gmail.com
TerraVoip: stockrt
Skype: stockrt
http://stockrt.github.com

From sveniu at opera.com Mon Sep 21 19:06:10 2009
From: sveniu at opera.com (Sven Ulland)
Date: Mon, 21 Sep 2009 21:06:10 +0200
Subject: Dropped connections with tcp_tw_recycle=1
In-Reply-To: <4AB62C22.5000908@schokola.de>
References: <4AB518EB.1050309@opera.com> <4AB5CB5B.1000603@loman.net> <4AB62C22.5000908@schokola.de>
Message-ID: <4AB7CEA2.2060906@opera.com>

Nils Goroll wrote:
>> tcp_tw_recycle is incompatible with NAT on the server side
>
> ... because it will enforce the verification of TCP time stamps.
> Unless all clients behind a NAT (actually PAD/masquerading) device
> use identical timestamps (within a certain range), most of them will
> send invalid TCP timestamps so SYNs will get dropped.

I've been digging a bit more. The drops happen because PAWS thinks
they are "old duplicate segments from earlier incarnations of the
connection".
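[Editor's note: the NAT failure mode described in this thread -- many clients sharing one public IP but having unrelated timestamp clocks -- can be sketched as a toy model. This is hypothetical Python, not the kernel code; the constant values mirror the Linux defaults, but the cache-update policy is simplified.]

```python
# Toy model of the per-host PAWS check: one cached timestamp per source IP,
# so clients behind the same NAT that send "older" timestamps look like
# duplicates of an earlier connection and get their SYNs dropped.
TCP_PAWS_MSL = 60      # cached peer timestamp considered valid for 60 s
TCP_PAWS_WINDOW = 1    # tsval must be > 1 tick older to count as a dupe

peer_cache = {}        # source IP -> (last seen tsval, wall clock seconds)

def syn_allowed(src_ip, tsval, now):
    """Return False when the simplified PAWS check would drop the SYN."""
    if src_ip in peer_cache:
        cached_ts, stamp = peer_cache[src_ip]
        if now < stamp + TCP_PAWS_MSL and cached_ts - tsval > TCP_PAWS_WINDOW:
            return False   # would be counted as a passive PAWS reject
    peer_cache[src_ip] = (tsval, now)
    return True

# Two clients behind one NAT share a public IP but have unrelated timestamp
# clocks, so the "slower" one is mistaken for an old duplicate:
print(syn_allowed("203.0.113.7", tsval=900000, now=100))  # True  (fast host)
print(syn_allowed("203.0.113.7", tsval=50000, now=101))   # False (slow host)
```

Note the cache is keyed on the IP alone, which is exactly the point raised below: a different source port behind the same address gets no separate entry.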
A new incoming connection request will eventually call
tcp_ipv4.c:tcp_v4_conn_request(), where we find the following code
that ends up dropping some SYNs if recycling is enabled:

	if (tmp_opt.saw_tstamp &&
	    tcp_death_row.sysctl_tw_recycle &&
	    (dst = inet_csk_route_req(sk, req)) != NULL &&
	    (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
	    peer->v4daddr == saddr) {
		if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
		    (s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW) {
			NET_INC_STATS_BH(sock_net(sk),
					 LINUX_MIB_PAWSPASSIVEREJECTED);
			goto drop_and_release;
		}
	}

The outer conditional verifies that the incoming SYN has a timestamp,
that tcp_tw_recycle is enabled, and that the origin exists in our
peer cache. Note that it only checks the IP of the origin. Doesn't it
make sense to also match on port?

The inner conditional tests two things: First, that the peer's last
seen timestamp has not expired (it expires in 60 ticks). Next, that
the new incoming timestamp [req->ts_recent] is at least one tick
[TCP_PAWS_WINDOW] *before* the last seen timestamp from the peer
[peer->tcp_ts] (i.e. that it's an old duplicate).

(Also, you can verify if you get drops by checking the PAWSPassive
value in /proc/net/netstat.)

Here's the origin of the code, appx B.2 (b) in VJ et al's RFC 1323:

"""
An additional mechanism could be added to the TCP, a per-host cache
of the last timestamp received from any connection [peer->tcp_ts].
This value [peer->tcp_ts] could then be used in the PAWS mechanism to
reject old duplicate segments [req] from earlier incarnations of the
connection, if the timestamp clock can be guaranteed to have ticked
at least once [TCP_PAWS_WINDOW] since the old connection was open.
"""
-- http://tools.ietf.org/html/rfc1323#page-29

I'm wondering why the source port is not taken into consideration
here. A "previous incarnation of the connection" would surely have
the same source port?
So if a new incoming connection has a different source port, it should not be a candidate for rejection. tcp_tw_recycle and _reuse's actual reuse of tw buckets seems to happen when setting up outbound connections. I haven't looked at those yet. Sven From slink at schokola.de Tue Sep 22 07:19:33 2009 From: slink at schokola.de (Nils Goroll) Date: Tue, 22 Sep 2009 09:19:33 +0200 Subject: Dropped connections with tcp_tw_recycle=1 In-Reply-To: <4AB7CEA2.2060906@opera.com> References: <4AB518EB.1050309@opera.com> <4AB5CB5B.1000603@loman.net> <4AB62C22.5000908@schokola.de> <4AB7CEA2.2060906@opera.com> Message-ID: <4AB87A85.4030605@schokola.de> Sven, >>> tcp_tw_recycle is incompatible with NAT on the server side >> >> ... because it will enforce the verification of TCP time stamps. >> Unless all clients behind a NAT (actually PAD/masquerading) device >> use identical timestamps (within a certain range), most of them will >> send invalid TCP timestamps so SYNs will get dropped. > > I've been digging a bit more. [...] Thank you very much for your writeup regarding tcp_tw_recycle and timestamp verification. This is the part which I think I had already understood ... > tcp_tw_recycle and _reuse's actual reuse of tw buckets seems to happen > when setting up outbound connections. I haven't looked at those yet. ... but this is the part which I don't have a good understanding of yet. > The outer conditional verifies that the incoming SYN has a timestamp, > that tcp_tw_recycle is enabled, and that the origin exists in our > peer cache. Note that it only checks the IP of the origin. Doesn't it > make sense to also match on port? My understanding is that the fact that the connection is in TIME_WAIT implies that the source port should not be reused at this time. 
Nils

From sveniu at opera.com Tue Sep 22 07:51:59 2009
From: sveniu at opera.com (Sven Ulland)
Date: Tue, 22 Sep 2009 09:51:59 +0200
Subject: Dropped connections with tcp_tw_recycle=1
In-Reply-To: <4AB87A85.4030605@schokola.de>
References: <4AB518EB.1050309@opera.com> <4AB5CB5B.1000603@loman.net> <4AB62C22.5000908@schokola.de> <4AB7CEA2.2060906@opera.com> <4AB87A85.4030605@schokola.de>
Message-ID: <4AB8821F.6000105@opera.com>

Nils Goroll wrote:
>> The outer conditional verifies that the incoming SYN has
>> a timestamp, that tcp_tw_recycle is enabled, and that the origin
>> exists in our peer cache. Note that it only checks the IP of the
>> origin. Doesn't it make sense to also match on port?
>
> My understanding is that the fact that the connection is in
> TIME_WAIT implies that the source port should not be reused at this
> time.

Right, you're saying that the srcaddr+srcport pair of a connection in
TIME_WAIT should not be reused under this scheme (i.e. the SYN can be
dropped), and I agree. Then I don't understand why a new connection
originating from a *different* source port (although from the same
source IP) is also considered a dupe and dropped. SYN retries don't
change/increase the source port after all.

Is this a mistake in the TCP code, or maybe in my understanding of
the issue?

Sven

From v.bilek at 1art.cz Tue Sep 22 08:44:42 2009
From: v.bilek at 1art.cz (=?UTF-8?B?VsOhY2xhdiBCw61sZWs=?=)
Date: Tue, 22 Sep 2009 10:44:42 +0200
Subject: PIPE asserts
Message-ID: <4AB88E7A.9050100@1art.cz>

Hello

On high load we are getting asserts (and varnish restarts) like this:

varnishd[23515]: Child (7569) Panic message: Assert error in Tcheck(), cache.h line 648:#012 Condition((t.e) != 0) not true. thread = (cache-worker)sp = 0x7f76c5875008 {#012 fd = 611, id = 611, xid = 778413112,#012 client = 62.141.2.8:56778,#012 step = STP_PIPE,#012 handling = pipe,#012 err_code = 400, err_reason = (null),#012 ws = 0x7f76c5875078 { #012 id = "sess",#012 {s,f,r,e} = {0x7f76c5875808,,+40,(nil),+16384},#012 },#012 worker = 0x7f7694353be0 {#012 },#012 vcl = {#012 srcname = {#012 "input",#012 "Default",#012 },#012 },#012},#012

We do not PIPE anything explicitly ... after a day of running we have
tens of passes in hundreds of millions of requests.

Is there any way to completely disable piping of requests?
What kinds of requests are piped?

Is there anything I can do to debug this?

Vaclav Bilek

From slink at schokola.de Tue Sep 22 12:33:50 2009
From: slink at schokola.de (Nils Goroll)
Date: Tue, 22 Sep 2009 14:33:50 +0200
Subject: Dropped connections with tcp_tw_recycle=1
In-Reply-To: <4AB8821F.6000105@opera.com>
References: <4AB518EB.1050309@opera.com> <4AB5CB5B.1000603@loman.net> <4AB62C22.5000908@schokola.de> <4AB7CEA2.2060906@opera.com> <4AB87A85.4030605@schokola.de> <4AB8821F.6000105@opera.com>
Message-ID: <4AB8C42E.9040807@schokola.de>

Sven,

> Right, you're saying that the srcaddr+srcport pair of a connection in
> TIME_WAIT should not be reused under this scheme (i.e. the SYN can be
> dropped), and I agree. Then I don't understand why a new connection
> originating from a *different* source port (although from the same
> source IP) is also considered a dupe and dropped.

Are you referring to this code?
	if (tmp_opt.saw_tstamp &&
	    tcp_death_row.sysctl_tw_recycle &&
	    (dst = inet_csk_route_req(sk, req)) != NULL &&
	    (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
	    peer->v4daddr == saddr) {
		if (xtime.tv_sec < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
		    (s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW) {
			NET_INC_STATS_BH(LINUX_MIB_PAWSPASSIVEREJECTED);
			dst_release(dst);
			goto drop_and_free;
		}
	}

Again, I cannot tell you what the intention of the implementors might
have been, but my interpretation is that they wanted to implement time
stamp checking as a (from the security standpoint positive) side effect
of tw_recycle.

I haven't thought about how (or if) the tw_recycle code could be
improved, because I believe the benefits of TCP state reuse are
overrated and the disadvantages outweigh the advantages. Also, my work
focuses on OSes which don't have this issue ;-)

Thanks, Nils

From v.bilek at 1art.cz Tue Sep 22 13:50:45 2009
From: v.bilek at 1art.cz (=?ISO-8859-1?Q?V=E1clav_B=EDlek?=)
Date: Tue, 22 Sep 2009 15:46:45 +0200
Subject: PIPE asserts
In-Reply-To: <4AB88E7A.9050100@1art.cz>
References: <4AB88E7A.9050100@1art.cz>
Message-ID: <4AB8D635.5060804@1art.cz>

I redefined vcl_recv this way:

    if (req.request != "GET" &&
        req.request != "HEAD" &&
        req.request != "PUT" &&
        req.request != "POST" &&
        req.request != "TRACE" &&
        req.request != "OPTIONS" &&
        req.request != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (error);
    }

and get asserts like this:

varnishd[7432]: Child (26652) Panic message: Assert error in http_StatusMessage(), cache_http.c line 111:#012 Condition(status >= 100 && status <= 999) not true.
errno = 104 (Connection reset by peer) thread = (cache-worker)sp = 0x7f71469a7008 {#012 fd = 393, id = 393, xid = 1393611480,#012 client = 94.246.126.148:35576,#012 step = STP_ERROR,#012 handling = error,#012 ws = 0x7f71469a7078 { #012 id = "sess",#012 {s,f,r,e} = {0x7f71469a7808,,+460,(nil),+16384},#012 },#012 worker = 0x7f710977bbe0 {#012 },#012 vcl = {#012 srcname = {#012 "input",#012 "Default",#012 },#012 },#012 obj = 0x7f780f4b1000 {#012 refcnt = 1, xid = 1393611480,#012 ws = 0x7f780f4b1028 { #012 id = "obj",#012 {s,f,r,e} = {0x7f780f4b1358,,+78,(nil),+7336},#012 },#012 http = {#012 ws = 0x7f780f4b1028 { #012 id = "obj",#012 {s,f,r,e} = {0x7f780f4b1358,,+78,(nil),+7336},#012 },#012 hd = {#012 "Date: Tue, 22 Sep 2009 13:06:36 GMT",#012 "Server: Varnish",#012 "Retry-After: 0",#012 },#012 },#012 len = 0,#012 store = {#012 },#012 },#012},#012

Is there any way to block such bad requests?

Václav Bílek wrote:
> Hello
>
> On high load we are getting asserts (and varnish restarts) like this:
>
> varnishd[23515]: Child (7569) Panic message: Assert error in Tcheck(),
> cache.h line 648:#012 Condition((t.e) != 0) not true. thread =
> (cache-worker)sp = 0x7f76c5875008 {#012 fd = 611, id = 611, xid =
> 778413112,#012 client = 62.141.2.8:56778,#012 step = STP_PIPE,#012
> handling = pipe,#012 err_code = 400, err_reason = (null),#012 ws =
> 0x7f76c5875078 { #012 id = "sess",#012 {s,f,r,e} =
> {0x7f76c5875808,,+40,(nil),+16384},#012 },#012 worker =
> 0x7f7694353be0 {#012 },#012 vcl = {#012 srcname = {#012
> "input",#012 "Default",#012 },#012 },#012},#012
>
> We do not PIPE anything explicitly ... after a day of running we have
> tens of passes in hundreds of millions of requests.
> Is there any way to completely disable piping of requests?
> What kinds of requests are piped?
>
> Is there anything I can do to debug this?
>
> Vaclav Bilek
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at projects.linpro.no
> http://projects.linpro.no/mailman/listinfo/varnish-misc

From ryanchan404 at gmail.com Tue Sep 22 14:41:27 2009
From: ryanchan404 at gmail.com (Ryan Chan)
Date: Tue, 22 Sep 2009 22:41:27 +0800
Subject: Serving FLV files using varnish vs nginx?
Message-ID: <45d40ce30909220741y10e1d12dn5e2b5780f752ffc0@mail.gmail.com>

Hello,

Anyone got experience using varnish to serve FLV files? How does it compare with nginx? Is it recommended to use varnish for video streaming?

Thanks.

From ml at tinwong.com Tue Sep 22 17:22:06 2009
From: ml at tinwong.com (M L)
Date: Wed, 23 Sep 2009 01:22:06 +0800
Subject: Backend fail & 503 Service Unavailable
Message-ID: <113d871c0909221022g2eff01d8gf6857f0dfe27aad8@mail.gmail.com>

Hi list

I love varnish and really want to use it :D Any clue to fix my problem? It shows a lot of backend failures (I guess a timeout problem).

My setup:

1. CentOS 5.3 64bit, varnish / webserver
2. nginx backend server (it has run over 200+ days at 2Mil pv/day without any problem & healthy hardware)
3. varnish connects to nginx on the same internal switches (3com 5500 Giga layer4)
4. Tested different nginx versions, the same happens (nginx-0.6.36 & nginx-0.7.61)
5. Tested 2 different hardware setups for varnish, the same happens
6. Changed different nginx timeouts, the same happens; if changed to keepalive 0 there are more backend failures
7. When I changed the vcl to a director rr x50 times, it didn't show 503 Service Unavailable on the client side but like 2-8 sec.
lag when then Backend fail number increase #start varnishd -p lru_interval=3600 -a 0.0.0.0:80 -T localhost:3500 -p client_http11=on -f vconf2 -s file,/usr/local/varnish/cache.bin,80G -h classic,500009 -p listen_depth=4096 -p obj_workspace=32768 -p sess_workspace=32768 -p send_timeout=327 -p first_byte_timeout=300 -p connect_timeout=5 -p vcl_trace=on #varnishlog 140 ReqStart c 121.203.78.124 4755 1383283991 140 RxRequest c GET 140 RxURL c /thread-1131553-1-1.html 140 RxProtocol c HTTP/1.1 140 RxHeader c Host: www.zoobar.com 140 RxHeader c User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.9.0.14) Gecko/2009082707 Firefox/3.0.14 140 RxHeader c Accept: image/png,image/*;q=0.8,*/*;q=0.5 140 RxHeader c Accept-Language: zh-tw,en-us;q=0.7,en;q=0.3 140 RxHeader c Accept-Encoding: gzip,deflate 140 RxHeader c Accept-Charset: Big5,utf-8;q=0.7,*;q=0.7 140 RxHeader c Keep-Alive: 300 140 RxHeader c Connection: keep-alive 140 RxHeader c Referer: http://www.zoobar.com/thread-1131553-1-1.html 140 RxHeader c Cookie: cdb_sid=Nf6FM3; cdb_oldtopics=D1131553D1131773D1129353D1129581D1130425D1131274D1121699D1122534D1122581D1124932D1125704D1126044D1126641D1126650D1127247D1128227D1128288D1128329D1128640D1129165D1129403D1130965D1131013D1131057D1131460D1131464D1131491D 140 VCL_call c recv 140 VCL_trace c 1 106.14 140 VCL_trace c 2 110.5 140 VCL_trace c 3 110.41 140 VCL_trace c 9 143.5 140 VCL_trace c 14 157.1 140 VCL_trace c 15 157.5 140 VCL_trace c 17 162.1 140 VCL_trace c 18 162.5 140 VCL_trace c 26 172.9 140 VCL_trace c 27 172.13 140 VCL_trace c 30 180.1 140 VCL_trace c 31 180.5 140 VCL_trace c 33 184.1 140 VCL_trace c 34 184.5 140 VCL_trace c 36 192.1 140 VCL_trace c 37 192.5 140 VCL_trace c 39 198.1 140 VCL_trace c 40 198.5 140 VCL_trace c 41 198.31 140 VCL_trace c 42 199.9 140 VCL_trace c 44 202.18 140 VCL_trace c 45 202.53 140 VCL_trace c 49 213.1 140 VCL_trace c 84 42.14 140 VCL_trace c 85 43.9 140 VCL_trace c 93 53.5 140 VCL_trace c 94 53.9 140 VCL_trace c 97 57.5 140 
VCL_trace c 98 57.9 140 VCL_trace c 99 57.35 140 VCL_trace c 100 57.52 140 VCL_return c pass 140 VCL_call c pass 140 VCL_trace c 103 74.14 140 VCL_return c pass 140 VCL_call c error 140 VCL_trace c 124 129.15 140 VCL_return c deliver 140 Length c 466 140 VCL_call c deliver 140 VCL_trace c 69 327.17 140 VCL_trace c 70 328.21 140 VCL_trace c 75 344.1 140 VCL_trace c 120 110.17 140 VCL_return c deliver 140 TxProtocol c HTTP/1.1 140 TxStatus c 503 140 TxResponse c Service Unavailable 140 TxHeader c Server: Varnish 140 TxHeader c Retry-After: 0 140 TxHeader c Content-Type: text/html; charset=utf-8 140 TxHeader c Content-Length: 466 140 TxHeader c Date: Tue, 22 Sep 2009 16:15:52 GMT 140 TxHeader c X-Varnish: 1383283991 140 TxHeader c Age: 0 140 TxHeader c Via: 1.1 varnish 140 TxHeader c Connection: close 140 ReqEnd c 1383283991 1253636152.715658903 1253636152.715944052 0.016985893 0.000265121 0.000020027 140 SessionClose c error 140 StatSess c 121.203.78.124 4755 0 1 1 0 1 0 235 466 #varnishstat -1 uptime 266 . Child uptime client_conn 13993 52.61 Client connections accepted client_req 43378 163.08 Client requests received cache_hit 31219 117.36 Cache hits cache_hitpass 86 0.32 Cache hits for pass cache_miss 3523 13.24 Cache misses backend_conn 12054 45.32 Backend connections success backend_unhealthy 0 0.00 Backend connections not attempted backend_busy 0 0.00 Backend connections too many backend_fail 5900 22.18 Backend connections failures backend_reuse 3503 13.17 Backend connections reuses backend_recycle 11552 43.43 Backend connections recycles backend_unused 0 0.00 Backend connections unused n_srcaddr 1246 . N struct srcaddr n_srcaddr_act 64 . N active struct srcaddr n_sess_mem 974 . N struct sess_mem n_sess 84 . N struct sess n_object 3040 . N struct object n_objecthead 1972 . N struct objecthead n_smf 6460 . N struct smf n_smf_frag 573 . N small free smf n_smf_large 7 . N large free smf n_vbe_conn 119 . N struct vbe_conn n_bereq 240 . N struct bereq n_wrk 261 . 
N worker threads n_wrk_create 261 0.98 N worker threads created n_wrk_failed 0 0.00 N worker threads not created n_wrk_max 336496 1265.02 N worker threads limited n_wrk_queue 0 0.00 N queued work requests n_wrk_overflow 4696 17.65 N overflowed work requests n_wrk_drop 374 1.41 N dropped work requests n_backend 60 . N backends n_expired 675 . N expired objects n_lru_nuked 0 . N LRU nuked objects n_lru_saved 0 . N LRU saved objects n_lru_moved 0 . N LRU moved objects n_deathrow 0 . N objects on deathrow losthdr 0 0.00 HTTP header overflows n_objsendfile 0 0.00 Objects sent with sendfile n_objwrite 41590 156.35 Objects sent with write n_objoverflow 0 0.00 Objects overflowing workspace s_sess 10425 39.19 Total Sessions s_req 43325 162.88 Total Requests s_pipe 0 0.00 Total pipe s_pass 8542 32.11 Total pass s_fetch 11996 45.10 Total fetch s_hdrbytes 16332373 61399.90 Total header bytes s_bodybytes 266640005 1002406.03 Total body bytes sess_closed 2116 7.95 Session Closed sess_pipeline 54 0.20 Session Pipeline sess_readahead 34 0.13 Session Read Ahead sess_linger 0 0.00 Session Linger sess_herd 41284 155.20 Session herd shm_records 3806411 14309.82 SHM records shm_writes 134677 506.30 SHM writes shm_flushes 40 0.15 SHM flushes due to overflow shm_cont 884 3.32 SHM MTX contention shm_cycles 1 0.00 SHM cycles through buffer sm_nreq 24335 91.48 allocator requests sm_nobj 5880 . outstanding allocations sm_balloc 129392640 . bytes allocated sm_bfree 85769953280 . bytes free sma_nreq 0 0.00 SMA allocator requests sma_nobj 0 . SMA outstanding allocations sma_nbytes 0 . SMA outstanding bytes sma_balloc 0 . SMA bytes allocated sma_bfree 0 . SMA bytes free sms_nreq 111 0.42 SMS allocator requests sms_nobj 0 . SMS outstanding allocations sms_nbytes 0 . SMS outstanding bytes sms_balloc 50376 . SMS bytes allocated sms_bfree 50376 . 
SMS bytes freed backend_req 12054 45.32 Backend requests made n_vcl 1 0.00 N vcl total n_vcl_avail 1 0.00 N vcl available n_vcl_discard 0 0.00 N vcl discarded n_purge 1 . N total active purges n_purge_add 1 0.00 N new purges added n_purge_retire 0 0.00 N old purges deleted n_purge_obj_test 0 0.00 N objects tested n_purge_re_test 0 0.00 N regexps tested against n_purge_dups 0 0.00 N duplicate purges removed hcb_nolock 0 0.00 HCB Lookups without lock hcb_lock 0 0.00 HCB Lookups with lock hcb_insert 0 0.00 HCB Inserts esi_parse 0 0.00 Objects ESI parsed (unlock) esi_errors 0 0.00 ESI parse errors (unlock) # my vcl director srv1 round-robin { { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; 
.port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { 
.backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; } } } acl purge { "localhost"; "127.0.0.1"; } #recv sub vcl_recv { #set req.grace = 30s; if (req.http.host ~ "www.zoobar.com") { set req.http.host = "www.zoobar.com"; set req.backend = srv1; }elseif ( req.http.host ~ "www.voobar.com") { set req.http.host = "www.voobar.com"; set req.backend = srv1; }elseif ( req.http.host ~ "www.hoobar.com") { set req.http.host = "www.hoobar.com"; set req.backend = srv1; }else{ error 401 "Bad Domain"; } # Add a unique header containing the client address remove req.http.X-Forwarded-For; set req.http.X-Forwarded-For = client.ip; # [...] 
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not Allowed";
        }
        lookup;
    }

    if (req.http.Expect) {
        pipe;
    }

    if (req.request != "GET" &&
        req.request != "HEAD" &&
        req.request != "PUT" &&
        req.request != "POST" &&
        req.request != "TRACE" &&
        req.request != "OPTIONS" &&
        req.request != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        pipe;
    }

    if (req.request != "GET" && req.request != "HEAD") {
        /* We only deal with GET and HEAD by default */
        pass;
    }

    if (req.http.Cache-Control ~ "no-cache") {
        pass;
    }

    if (req.http.Authenticate) {
        pass;
    }

    #if (req.http.Cookie) {
    #    pass;
    #}

    if (req.url ~ "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") {
        unset req.http.cookie;
        lookup;
        # unset req.http.authenticate;
    }

    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }

} #end recv

sub vcl_hash {
    set req.hash += req.url;
    set req.hash += req.http.host;
    #set req.hash += req.http.cookie;
    #set req.hash += server.ip;
    hash;
} #end hash

# strip the cookie before the image is inserted into cache.
sub vcl_fetch {

    #set obj.grace = 30s;

    if (obj.http.Set-Cookie) {
        pass;
    }

    if (obj.http.Pragma ~ "no-cache" ||
        obj.http.Cache-Control ~ "no-cache" ||
        obj.http.Cache-Control ~ "private") {
        pass;
    }

    if (req.url ~ "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") {
        unset obj.http.set-cookie;
        set obj.ttl = 1w;
    }

    if (req.url ~ "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") {
        unset obj.http.expires;
        set obj.http.cache-control = "max-age=315360000, public";
        set obj.ttl = 1w;
        set obj.http.magicmarker = "1";
    }

    if (obj.status == 503) {
        restart;
    }

    # if (obj.cacheable) {
    #     /* Remove Expires from backend, it's not long enough */
    #     unset obj.http.expires;
    #     /* Set the clients TTL on this object */
    #     set obj.http.cache-control = "max-age=315360000, public";
    #     /* Set how long Varnish will keep it */
    #     set obj.ttl = 1w;
    #     /* marker for vcl_deliver to reset Age: */
    #     set obj.http.magicmarker = "1";
    # }

} #fetch end

sub vcl_deliver {
    if (resp.http.magicmarker) {
        /* Remove the magic marker */
        unset resp.http.magicmarker;
        /* By definition we have a fresh object */
        set resp.http.age = "0";
        if (obj.hits > 0) {
            set resp.http.X-Cache = "HIT";
        } else {
            set resp.http.X-Cache = "MISS";
        }
    }
} #deliver end

sub vcl_pipe {
    # http://varnish.projects.linpro.no/ticket/451
    # This forces every pipe request to be the first one.
    set bereq.http.connection = "close";
} #pipe end

sub vcl_hit {
    if (req.request == "PURGE") {
        set obj.ttl = 0s;
        error 200 "Purged.";
    }
    if (!obj.cacheable) {
        pass;
    }
    deliver;
}

Thank you

TW

-------------- next part --------------
An HTML attachment was scrubbed...
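One note on the `if (obj.status == 503) { restart; }` pattern in that vcl_fetch: with every director member pointing at the same host, an unguarded restart can loop until the restart limit is hit. A sketch of bounding the retries, assuming a Varnish 2.x version where `req.restarts` is available:

```vcl
sub vcl_fetch {
    # retry a failed fetch at most twice, then let the
    # 503 through to the client instead of looping
    if (obj.status == 503 && req.restarts < 2) {
        restart;
    }
}
```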
URL: From pubcrawler.com at gmail.com Tue Sep 22 20:45:09 2009 From: pubcrawler.com at gmail.com (pub crawler) Date: Tue, 22 Sep 2009 16:45:09 -0400 Subject: Backend fail & 503 Service Unavailable In-Reply-To: <113d871c0909221022g2eff01d8gf6857f0dfe27aad8@mail.gmail.com> References: <113d871c0909221022g2eff01d8gf6857f0dfe27aad8@mail.gmail.com> Message-ID: <4c3149fb0909221345u38a88630pa465aaca2f3a37d6@mail.gmail.com> Well I might be a bit off on this, but 503's often involve ulimit ceiling maximums being hit and exceeded. ulimit -a What is that limit set at for open files? In our experience many high performance web related software products suffer from low open file limit. I've seen 256 in SunOS, 1024 in Ubuntu. We bump our settings in startup scripts for key servers to 8096. We run on dedicated servers - so no resource contention with other users. -Paul On 9/22/09, M L wrote: > Hi list > > > I love varnish and really want to use it :D Any clue to fix my problem , it > come out alot backend fail ( i guess timeout problem ) > > my setup > > 1.Centos 5.3 64bit varnish / webserver > 2.nginx backend server (it run over 200+days 2Mil pv/day without any problem > & healthy hardware) > 3.varnish connect to nginx in same internal switchs (3com 5500 Giga layer4) > 4.Tested different version nginx was same happen ( nginx-0.6.36 & > nginx-0.7.61) > 5.Tested 2 different hardware for varnish same happen > 6.Changed nginx different timeout same happen , if changed to keepalive 0 > will more backend fail > 7.When i changed vcl to director rr x50 times , it didnt show 503 Service > Unavailable on client side but like 2-8 sec. 
lag when then Backend fail > number increase > > > #start > > varnishd -p lru_interval=3600 -a 0.0.0.0:80 -T localhost:3500 -p > client_http11=on -f vconf2 -s file,/usr/local/varnish/cache.bin,80G -h > classic,500009 -p listen_depth=4096 -p obj_workspace=32768 -p > sess_workspace=32768 -p send_timeout=327 -p first_byte_timeout=300 -p > connect_timeout=5 -p vcl_trace=on > > > > #varnishlog > > 140 ReqStart c 121.203.78.124 4755 1383283991 > 140 RxRequest c GET > 140 RxURL c /thread-1131553-1-1.html > 140 RxProtocol c HTTP/1.1 > 140 RxHeader c Host: www.zoobar.com > 140 RxHeader c User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; > zh-TW; rv:1.9.0.14) Gecko/2009082707 Firefox/3.0.14 > 140 RxHeader c Accept: image/png,image/*;q=0.8,*/*;q=0.5 > 140 RxHeader c Accept-Language: zh-tw,en-us;q=0.7,en;q=0.3 > 140 RxHeader c Accept-Encoding: gzip,deflate > 140 RxHeader c Accept-Charset: Big5,utf-8;q=0.7,*;q=0.7 > 140 RxHeader c Keep-Alive: 300 > 140 RxHeader c Connection: keep-alive > 140 RxHeader c Referer: http://www.zoobar.com/thread-1131553-1-1.html > 140 RxHeader c Cookie: cdb_sid=Nf6FM3; > cdb_oldtopics=D1131553D1131773D1129353D1129581D1130425D1131274D1121699D1122534D1122581D1124932D1125704D1126044D1126641D1126650D1127247D1128227D1128288D1128329D1128640D1129165D1129403D1130965D1131013D1131057D1131460D1131464D1131491D > 140 VCL_call c recv > 140 VCL_trace c 1 106.14 > 140 VCL_trace c 2 110.5 > 140 VCL_trace c 3 110.41 > 140 VCL_trace c 9 143.5 > 140 VCL_trace c 14 157.1 > 140 VCL_trace c 15 157.5 > 140 VCL_trace c 17 162.1 > 140 VCL_trace c 18 162.5 > 140 VCL_trace c 26 172.9 > 140 VCL_trace c 27 172.13 > 140 VCL_trace c 30 180.1 > 140 VCL_trace c 31 180.5 > 140 VCL_trace c 33 184.1 > 140 VCL_trace c 34 184.5 > 140 VCL_trace c 36 192.1 > 140 VCL_trace c 37 192.5 > 140 VCL_trace c 39 198.1 > 140 VCL_trace c 40 198.5 > 140 VCL_trace c 41 198.31 > 140 VCL_trace c 42 199.9 > 140 VCL_trace c 44 202.18 > 140 VCL_trace c 45 202.53 > 140 VCL_trace c 49 213.1 > 140 VCL_trace 
c 84 42.14 > 140 VCL_trace c 85 43.9 > 140 VCL_trace c 93 53.5 > 140 VCL_trace c 94 53.9 > 140 VCL_trace c 97 57.5 > 140 VCL_trace c 98 57.9 > 140 VCL_trace c 99 57.35 > 140 VCL_trace c 100 57.52 > 140 VCL_return c pass > 140 VCL_call c pass > 140 VCL_trace c 103 74.14 > 140 VCL_return c pass > 140 VCL_call c error > 140 VCL_trace c 124 129.15 > 140 VCL_return c deliver > 140 Length c 466 > 140 VCL_call c deliver > 140 VCL_trace c 69 327.17 > 140 VCL_trace c 70 328.21 > 140 VCL_trace c 75 344.1 > 140 VCL_trace c 120 110.17 > 140 VCL_return c deliver > 140 TxProtocol c HTTP/1.1 > 140 TxStatus c 503 > 140 TxResponse c Service Unavailable > 140 TxHeader c Server: Varnish > 140 TxHeader c Retry-After: 0 > 140 TxHeader c Content-Type: text/html; charset=utf-8 > 140 TxHeader c Content-Length: 466 > 140 TxHeader c Date: Tue, 22 Sep 2009 16:15:52 GMT > 140 TxHeader c X-Varnish: 1383283991 > 140 TxHeader c Age: 0 > 140 TxHeader c Via: 1.1 varnish > 140 TxHeader c Connection: close > 140 ReqEnd c 1383283991 1253636152.715658903 1253636152.715944052 > 0.016985893 0.000265121 0.000020027 > 140 SessionClose c error > 140 StatSess c 121.203.78.124 4755 0 1 1 0 1 0 235 466 > > > > > #varnishstat -1 > > uptime 266 . Child uptime > client_conn 13993 52.61 Client connections accepted > client_req 43378 163.08 Client requests received > cache_hit 31219 117.36 Cache hits > cache_hitpass 86 0.32 Cache hits for pass > cache_miss 3523 13.24 Cache misses > backend_conn 12054 45.32 Backend connections success > backend_unhealthy 0 0.00 Backend connections not > attempted > backend_busy 0 0.00 Backend connections too many > backend_fail 5900 22.18 Backend connections failures > backend_reuse 3503 13.17 Backend connections reuses > backend_recycle 11552 43.43 Backend connections recycles > backend_unused 0 0.00 Backend connections unused > n_srcaddr 1246 . N struct srcaddr > n_srcaddr_act 64 . N active struct srcaddr > n_sess_mem 974 . N struct sess_mem > n_sess 84 . 
N struct sess > n_object 3040 . N struct object > n_objecthead 1972 . N struct objecthead > n_smf 6460 . N struct smf > n_smf_frag 573 . N small free smf > n_smf_large 7 . N large free smf > n_vbe_conn 119 . N struct vbe_conn > n_bereq 240 . N struct bereq > n_wrk 261 . N worker threads > n_wrk_create 261 0.98 N worker threads created > n_wrk_failed 0 0.00 N worker threads not created > n_wrk_max 336496 1265.02 N worker threads limited > n_wrk_queue 0 0.00 N queued work requests > n_wrk_overflow 4696 17.65 N overflowed work requests > n_wrk_drop 374 1.41 N dropped work requests > n_backend 60 . N backends > n_expired 675 . N expired objects > n_lru_nuked 0 . N LRU nuked objects > n_lru_saved 0 . N LRU saved objects > n_lru_moved 0 . N LRU moved objects > n_deathrow 0 . N objects on deathrow > losthdr 0 0.00 HTTP header overflows > n_objsendfile 0 0.00 Objects sent with sendfile > n_objwrite 41590 156.35 Objects sent with write > n_objoverflow 0 0.00 Objects overflowing workspace > s_sess 10425 39.19 Total Sessions > s_req 43325 162.88 Total Requests > s_pipe 0 0.00 Total pipe > s_pass 8542 32.11 Total pass > s_fetch 11996 45.10 Total fetch > s_hdrbytes 16332373 61399.90 Total header bytes > s_bodybytes 266640005 1002406.03 Total body bytes > sess_closed 2116 7.95 Session Closed > sess_pipeline 54 0.20 Session Pipeline > sess_readahead 34 0.13 Session Read Ahead > sess_linger 0 0.00 Session Linger > sess_herd 41284 155.20 Session herd > shm_records 3806411 14309.82 SHM records > shm_writes 134677 506.30 SHM writes > shm_flushes 40 0.15 SHM flushes due to overflow > shm_cont 884 3.32 SHM MTX contention > shm_cycles 1 0.00 SHM cycles through buffer > sm_nreq 24335 91.48 allocator requests > sm_nobj 5880 . outstanding allocations > sm_balloc 129392640 . bytes allocated > sm_bfree 85769953280 . bytes free > sma_nreq 0 0.00 SMA allocator requests > sma_nobj 0 . SMA outstanding allocations > sma_nbytes 0 . SMA outstanding bytes > sma_balloc 0 . 
SMA bytes allocated > sma_bfree 0 . SMA bytes free > sms_nreq 111 0.42 SMS allocator requests > sms_nobj 0 . SMS outstanding allocations > sms_nbytes 0 . SMS outstanding bytes > sms_balloc 50376 . SMS bytes allocated > sms_bfree 50376 . SMS bytes freed > backend_req 12054 45.32 Backend requests made > n_vcl 1 0.00 N vcl total > n_vcl_avail 1 0.00 N vcl available > n_vcl_discard 0 0.00 N vcl discarded > n_purge 1 . N total active purges > n_purge_add 1 0.00 N new purges added > n_purge_retire 0 0.00 N old purges deleted > n_purge_obj_test 0 0.00 N objects tested > n_purge_re_test 0 0.00 N regexps tested against > n_purge_dups 0 0.00 N duplicate purges removed > hcb_nolock 0 0.00 HCB Lookups without lock > hcb_lock 0 0.00 HCB Lookups with lock > hcb_insert 0 0.00 HCB Inserts > esi_parse 0 0.00 Objects ESI parsed (unlock) > esi_errors 0 0.00 ESI parse errors (unlock) > > > # my vcl > > director srv1 round-robin { > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 
2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 
2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = "80"; > } } > } > > > acl purge { > > "localhost"; "127.0.0.1"; > } > > #recv > sub vcl_recv { > > #set req.grace = 30s; > > if (req.http.host ~ "www.zoobar.com") { > set req.http.host = "www.zoobar.com"; > set req.backend = srv1; > > > }elseif ( req.http.host ~ "www.voobar.com") { > 
set req.http.host = "www.voobar.com"; > set req.backend = srv1; > > > }elseif ( req.http.host ~ "www.hoobar.com") { > set req.http.host = "www.hoobar.com"; > set req.backend = srv1; > > }else{ > error 401 "Bad Domain"; > } > > # Add a unique header containing the client address > remove req.http.X-Forwarded-For; > set req.http.X-Forwarded-For = client.ip; > # [...] > > > if (req.request == "PURGE") { > if(!client.ip ~ purge) { > error 405 "Not Allowed"; > } lookup;} > > if (req.http.Expect) { > pipe; > } > > > if (req.request != "GET" && > req.request != "HEAD" && > req.request != "PUT" && > req.request != "POST" && > req.request != "TRACE" && > req.request != "OPTIONS" && > req.request != "DELETE") { > /* Non-RFC2616 or CONNECT which is weird. */ > pipe; > } > if (req.request != "GET" && req.request != "HEAD") { > /* We only deal with GET and HEAD by default */ > pass; > } > > > > > if (req.http.Cache-Control ~ "no-cache") { > pass; > } > > if (req.http.Authenticate) { > pass; > } > > #if (req.http.Cookie) { > # pass; > # } > > if (req.url ~ > "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") { > unset req.http.cookie; > lookup; > # unset req.http.authenticate; > } > > if (req.http.Accept-Encoding) { > if (req.url ~ > "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") { > # No point in compressing these > remove req.http.Accept-Encoding; > } elsif (req.http.Accept-Encoding ~ "gzip") { > set req.http.Accept-Encoding = "gzip"; > } elsif (req.http.Accept-Encoding ~ "deflate") { > set req.http.Accept-Encoding = "deflate"; > } else { > # unkown algorithm > remove req.http.Accept-Encoding; > } > } > > > } #end recv > > > sub vcl_hash { > set req.hash += req.url; > set req.hash += req.http.host; > #set req.hash += req.http.cookie; > #set req.hash += server.ip; > hash; > } #end hash > > > # strip the cookie before the image is inserted into cache. 
> sub vcl_fetch { > > #set obj.grace = 30s; > > > if(obj.http.Set-Cookie){ > pass; > } > > > if(obj.http.Pragma ~ "no-cache" || > obj.http.Cache-Control ~ "no-cache" || > obj.http.Cache-Control ~ "private"){ > pass; > } > > > if (req.url ~ > "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") { > unset obj.http.set-cookie; > set obj.ttl = 1w; > } > > > > > if (req.url ~ > "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") { > unset obj.http.expires; > set obj.http.cache-control = "max-age=315360000, public"; > set obj.ttl = 1w; > set obj.http.magicmarker = "1"; > } > > > if (obj.status == 503) { > > restart; > } > > > # if (obj.cacheable) { > # /* Remove Expires from backend, it's not long > enough */ > # unset obj.http.expires; > > # /* Set the clients TTL on this object */ > # set obj.http.cache-control = "max-age=315360000, > public"; > > # /* Set how long Varnish will keep it */ > # set obj.ttl = 1w; > > # /* marker for vcl_deliver to reset Age: */ > # set obj.http.magicmarker = "1"; > # } > > > } #fetch end > > > > > sub vcl_deliver { > if (resp.http.magicmarker) { > /* Remove the magic marker */ > unset resp.http.magicmarker; > > /* By definition we have a fresh object */ > set resp.http.age = "0"; > if (obj.hits > 0) { > set resp.http.X-Cache = "HIT"; > } else { > set resp.http.X-Cache = "MISS"; > } > > } > > > } #deliver end > > > sub vcl_pipe { > # http://varnish.projects.linpro.no/ticket/451 > # This forces every pipe request to be the first one. 
> set bereq.http.connection = "close"; > } #pipe end > > sub vcl_hit { > if (req.request == "PURGE") { > set obj.ttl = 0s; > error 200 "Purged."; > } > > if (!obj.cacheable) { > pass; > } > deliver; > } > > > > Thank you > > TW > From angelo.iannello at dp2000.it Wed Sep 23 07:03:20 2009 From: angelo.iannello at dp2000.it (angelo iannello) Date: Wed, 23 Sep 2009 09:03:20 +0200 Subject: basic authentication varnish before squid Message-ID: <001101ca3c1b$efd18630$6700a8c0@comdata.local> Hi all, I have configured Varnish to cache, for speed improvement, all requests passed to a backend based on a Squid proxy. lan => varnish => squid => internet Squid has basic authentication, but the corresponding popup does not appear in the client browser; Squid reports "cache denied authentication". If I use Squid directly, the popup appears and I can authenticate successfully. Is it possible that Varnish does not recognize this authentication? Does a workaround exist to solve this problem? tnx Angelo From phk at phk.freebsd.dk Wed Sep 23 11:30:50 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Wed, 23 Sep 2009 11:30:50 +0000 Subject: VUG1 meeting report Message-ID: <1950.1253705450@critter.freebsd.dk> I thought I would drop you a short report on our "Varnish User Group" meeting this Monday and Tuesday in London. About a dozen people participated, filling the meeting room Canonical Software kindly provided to us for free. Obviously, my "TODO" list grew continuously during the meeting, but it was my impression that everybody got something out of the meeting; on that I will let people speak for themselves. We talked about a lot of things, and the result is more or less a sort of roadmap, but I need to do some text-processing before I can give a coherent view of it. My personal impression was that the "user group" format worked out well; we are not a big enough community to do actual conferences yet.
I also think most of the knowledge that needs to be shared is in the brains of everybody other than me[1]. One of the action items was to get more VCL code up on the wiki, both snippets and complete examples, so that Varnish users can learn and find inspiration from each other. We talked about the frequency of these meetings, and I think the overall consensus was "no more than a couple of times every year". There was no clear consensus on whether it would be a good idea to piggyback on other conferences (FOSDEM etc.) to synergize travel; the end result, I think, was that whoever arranges the meeting gets to decide where and when. Next meeting will probably be in .NL in the feb-march time-frame. See you there! Poul-Henning [1] I was the only one in the room who did not run Varnish :-) -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From ml at tinwong.com Wed Sep 23 12:36:25 2009 From: ml at tinwong.com (M L) Date: Wed, 23 Sep 2009 20:36:25 +0800 Subject: Backend fail & 503 Service Unavailable In-Reply-To: <4c3149fb0909221345u38a88630pa465aaca2f3a37d6@mail.gmail.com> References: <113d871c0909221022g2eff01d8gf6857f0dfe27aad8@mail.gmail.com> <4c3149fb0909221345u38a88630pa465aaca2f3a37d6@mail.gmail.com> Message-ID: <113d871c0909230536x5b2521f4n1ff951e4d1dd9f4c@mail.gmail.com> Thanks Paul. Oh yes, my fault: my Varnish server's open-files limit was 1024. I guess my problem is solved, thanks a lot. Thanks TW On Wed, Sep 23, 2009 at 4:45 AM, pub crawler wrote: > Well I might be a bit off on this, but 503s often involve ulimit > ceilings being hit and exceeded. > > ulimit -a > > What is that limit set at for open files? > > In our experience, many high-performance web-related software products > suffer from a low open-file limit. I've seen 256 in SunOS, 1024 in > Ubuntu. We bump our settings in startup scripts for key servers to > 8096.
We run on dedicated servers - so no resource contention with > other users. > > -Paul > > On 9/22/09, M L wrote: > > Hi list > > > > > > I love varnish and really want to use it :D Any clue to fix my problem , > it > > come out alot backend fail ( i guess timeout problem ) > > > > my setup > > > > 1.Centos 5.3 64bit varnish / webserver > > 2.nginx backend server (it run over 200+days 2Mil pv/day without any > problem > > & healthy hardware) > > 3.varnish connect to nginx in same internal switchs (3com 5500 Giga > layer4) > > 4.Tested different version nginx was same happen ( nginx-0.6.36 & > > nginx-0.7.61) > > 5.Tested 2 different hardware for varnish same happen > > 6.Changed nginx different timeout same happen , if changed to keepalive 0 > > will more backend fail > > 7.When i changed vcl to director rr x50 times , it didnt show 503 Service > > Unavailable on client side but like 2-8 sec. lag when then Backend fail > > number increase > > > > > > #start > > > > varnishd -p lru_interval=3600 -a 0.0.0.0:80 -T localhost:3500 -p > > client_http11=on -f vconf2 -s file,/usr/local/varnish/cache.bin,80G -h > > classic,500009 -p listen_depth=4096 -p obj_workspace=32768 -p > > sess_workspace=32768 -p send_timeout=327 -p first_byte_timeout=300 -p > > connect_timeout=5 -p vcl_trace=on > > > > > > > > #varnishlog > > > > 140 ReqStart c 121.203.78.124 4755 1383283991 > > 140 RxRequest c GET > > 140 RxURL c /thread-1131553-1-1.html > > 140 RxProtocol c HTTP/1.1 > > 140 RxHeader c Host: www.zoobar.com > > 140 RxHeader c User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; > > zh-TW; rv:1.9.0.14) Gecko/2009082707 Firefox/3.0.14 > > 140 RxHeader c Accept: image/png,image/*;q=0.8,*/*;q=0.5 > > 140 RxHeader c Accept-Language: zh-tw,en-us;q=0.7,en;q=0.3 > > 140 RxHeader c Accept-Encoding: gzip,deflate > > 140 RxHeader c Accept-Charset: Big5,utf-8;q=0.7,*;q=0.7 > > 140 RxHeader c Keep-Alive: 300 > > 140 RxHeader c Connection: keep-alive > > 140 RxHeader c Referer: > 
http://www.zoobar.com/thread-1131553-1-1.html > > 140 RxHeader c Cookie: cdb_sid=Nf6FM3; > > > cdb_oldtopics=D1131553D1131773D1129353D1129581D1130425D1131274D1121699D1122534D1122581D1124932D1125704D1126044D1126641D1126650D1127247D1128227D1128288D1128329D1128640D1129165D1129403D1130965D1131013D1131057D1131460D1131464D1131491D > > 140 VCL_call c recv > > 140 VCL_trace c 1 106.14 > > 140 VCL_trace c 2 110.5 > > 140 VCL_trace c 3 110.41 > > 140 VCL_trace c 9 143.5 > > 140 VCL_trace c 14 157.1 > > 140 VCL_trace c 15 157.5 > > 140 VCL_trace c 17 162.1 > > 140 VCL_trace c 18 162.5 > > 140 VCL_trace c 26 172.9 > > 140 VCL_trace c 27 172.13 > > 140 VCL_trace c 30 180.1 > > 140 VCL_trace c 31 180.5 > > 140 VCL_trace c 33 184.1 > > 140 VCL_trace c 34 184.5 > > 140 VCL_trace c 36 192.1 > > 140 VCL_trace c 37 192.5 > > 140 VCL_trace c 39 198.1 > > 140 VCL_trace c 40 198.5 > > 140 VCL_trace c 41 198.31 > > 140 VCL_trace c 42 199.9 > > 140 VCL_trace c 44 202.18 > > 140 VCL_trace c 45 202.53 > > 140 VCL_trace c 49 213.1 > > 140 VCL_trace c 84 42.14 > > 140 VCL_trace c 85 43.9 > > 140 VCL_trace c 93 53.5 > > 140 VCL_trace c 94 53.9 > > 140 VCL_trace c 97 57.5 > > 140 VCL_trace c 98 57.9 > > 140 VCL_trace c 99 57.35 > > 140 VCL_trace c 100 57.52 > > 140 VCL_return c pass > > 140 VCL_call c pass > > 140 VCL_trace c 103 74.14 > > 140 VCL_return c pass > > 140 VCL_call c error > > 140 VCL_trace c 124 129.15 > > 140 VCL_return c deliver > > 140 Length c 466 > > 140 VCL_call c deliver > > 140 VCL_trace c 69 327.17 > > 140 VCL_trace c 70 328.21 > > 140 VCL_trace c 75 344.1 > > 140 VCL_trace c 120 110.17 > > 140 VCL_return c deliver > > 140 TxProtocol c HTTP/1.1 > > 140 TxStatus c 503 > > 140 TxResponse c Service Unavailable > > 140 TxHeader c Server: Varnish > > 140 TxHeader c Retry-After: 0 > > 140 TxHeader c Content-Type: text/html; charset=utf-8 > > 140 TxHeader c Content-Length: 466 > > 140 TxHeader c Date: Tue, 22 Sep 2009 16:15:52 GMT > > 140 TxHeader c X-Varnish: 1383283991 > > 140 
TxHeader c Age: 0 > > 140 TxHeader c Via: 1.1 varnish > > 140 TxHeader c Connection: close > > 140 ReqEnd c 1383283991 1253636152.715658903 1253636152.715944052 > > 0.016985893 0.000265121 0.000020027 > > 140 SessionClose c error > > 140 StatSess c 121.203.78.124 4755 0 1 1 0 1 0 235 466 > > > > > > > > > > #varnishstat -1 > > > > uptime 266 . Child uptime > > client_conn 13993 52.61 Client connections accepted > > client_req 43378 163.08 Client requests received > > cache_hit 31219 117.36 Cache hits > > cache_hitpass 86 0.32 Cache hits for pass > > cache_miss 3523 13.24 Cache misses > > backend_conn 12054 45.32 Backend connections success > > backend_unhealthy 0 0.00 Backend connections not > > attempted > > backend_busy 0 0.00 Backend connections too many > > backend_fail 5900 22.18 Backend connections failures > > backend_reuse 3503 13.17 Backend connections reuses > > backend_recycle 11552 43.43 Backend connections recycles > > backend_unused 0 0.00 Backend connections unused > > n_srcaddr 1246 . N struct srcaddr > > n_srcaddr_act 64 . N active struct srcaddr > > n_sess_mem 974 . N struct sess_mem > > n_sess 84 . N struct sess > > n_object 3040 . N struct object > > n_objecthead 1972 . N struct objecthead > > n_smf 6460 . N struct smf > > n_smf_frag 573 . N small free smf > > n_smf_large 7 . N large free smf > > n_vbe_conn 119 . N struct vbe_conn > > n_bereq 240 . N struct bereq > > n_wrk 261 . N worker threads > > n_wrk_create 261 0.98 N worker threads created > > n_wrk_failed 0 0.00 N worker threads not created > > n_wrk_max 336496 1265.02 N worker threads limited > > n_wrk_queue 0 0.00 N queued work requests > > n_wrk_overflow 4696 17.65 N overflowed work requests > > n_wrk_drop 374 1.41 N dropped work requests > > n_backend 60 . N backends > > n_expired 675 . N expired objects > > n_lru_nuked 0 . N LRU nuked objects > > n_lru_saved 0 . N LRU saved objects > > n_lru_moved 0 . N LRU moved objects > > n_deathrow 0 . 
N objects on deathrow > > losthdr 0 0.00 HTTP header overflows > > n_objsendfile 0 0.00 Objects sent with sendfile > > n_objwrite 41590 156.35 Objects sent with write > > n_objoverflow 0 0.00 Objects overflowing workspace > > s_sess 10425 39.19 Total Sessions > > s_req 43325 162.88 Total Requests > > s_pipe 0 0.00 Total pipe > > s_pass 8542 32.11 Total pass > > s_fetch 11996 45.10 Total fetch > > s_hdrbytes 16332373 61399.90 Total header bytes > > s_bodybytes 266640005 1002406.03 Total body bytes > > sess_closed 2116 7.95 Session Closed > > sess_pipeline 54 0.20 Session Pipeline > > sess_readahead 34 0.13 Session Read Ahead > > sess_linger 0 0.00 Session Linger > > sess_herd 41284 155.20 Session herd > > shm_records 3806411 14309.82 SHM records > > shm_writes 134677 506.30 SHM writes > > shm_flushes 40 0.15 SHM flushes due to overflow > > shm_cont 884 3.32 SHM MTX contention > > shm_cycles 1 0.00 SHM cycles through buffer > > sm_nreq 24335 91.48 allocator requests > > sm_nobj 5880 . outstanding allocations > > sm_balloc 129392640 . bytes allocated > > sm_bfree 85769953280 . bytes free > > sma_nreq 0 0.00 SMA allocator requests > > sma_nobj 0 . SMA outstanding allocations > > sma_nbytes 0 . SMA outstanding bytes > > sma_balloc 0 . SMA bytes allocated > > sma_bfree 0 . SMA bytes free > > sms_nreq 111 0.42 SMS allocator requests > > sms_nobj 0 . SMS outstanding allocations > > sms_nbytes 0 . SMS outstanding bytes > > sms_balloc 50376 . SMS bytes allocated > > sms_bfree 50376 . SMS bytes freed > > backend_req 12054 45.32 Backend requests made > > n_vcl 1 0.00 N vcl total > > n_vcl_avail 1 0.00 N vcl available > > n_vcl_discard 0 0.00 N vcl discarded > > n_purge 1 . 
N total active purges > > n_purge_add 1 0.00 N new purges added > > n_purge_retire 0 0.00 N old purges deleted > > n_purge_obj_test 0 0.00 N objects tested > > n_purge_re_test 0 0.00 N regexps tested against > > n_purge_dups 0 0.00 N duplicate purges removed > > hcb_nolock 0 0.00 HCB Lookups without lock > > hcb_lock 0 0.00 HCB Lookups with lock > > hcb_insert 0 0.00 HCB Inserts > > esi_parse 0 0.00 Objects ESI parsed (unlock) > > esi_errors 0 0.00 ESI parse errors (unlock) > > > > > > # my vcl > > > > director srv1 round-robin { > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > 
> { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { 
.connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > { .backend = { .connect_timeout = 2s; .host = "10.0.0.5"; .port = > "80"; > > } } > > } > > > > > > acl purge { > > > > "localhost"; "127.0.0.1"; > > } > > > > #recv > > sub vcl_recv { > > > > #set req.grace = 30s; > > > > if (req.http.host ~ "www.zoobar.com") { > > set req.http.host = "www.zoobar.com"; > > set req.backend = srv1; > > > > > > }elseif ( req.http.host ~ 
"www.voobar.com") { > > set req.http.host = "www.voobar.com"; > > set req.backend = srv1; > > > > > > }elseif ( req.http.host ~ "www.hoobar.com") { > > set req.http.host = "www.hoobar.com"; > > set req.backend = srv1; > > > > }else{ > > error 401 "Bad Domain"; > > } > > > > # Add a unique header containing the client address > > remove req.http.X-Forwarded-For; > > set req.http.X-Forwarded-For = client.ip; > > # [...] > > > > > > if (req.request == "PURGE") { > > if(!client.ip ~ purge) { > > error 405 "Not Allowed"; > > } lookup;} > > > > if (req.http.Expect) { > > pipe; > > } > > > > > > if (req.request != "GET" && > > req.request != "HEAD" && > > req.request != "PUT" && > > req.request != "POST" && > > req.request != "TRACE" && > > req.request != "OPTIONS" && > > req.request != "DELETE") { > > /* Non-RFC2616 or CONNECT which is weird. */ > > pipe; > > } > > if (req.request != "GET" && req.request != "HEAD") { > > /* We only deal with GET and HEAD by default */ > > pass; > > } > > > > > > > > > > if (req.http.Cache-Control ~ "no-cache") { > > pass; > > } > > > > if (req.http.Authenticate) { > > pass; > > } > > > > #if (req.http.Cookie) { > > # pass; > > # } > > > > if (req.url ~ > > "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") { > > unset req.http.cookie; > > lookup; > > # unset req.http.authenticate; > > } > > > > if (req.http.Accept-Encoding) { > > if (req.url ~ > > "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") { > > # No point in compressing these > > remove req.http.Accept-Encoding; > > } elsif (req.http.Accept-Encoding ~ "gzip") { > > set req.http.Accept-Encoding = "gzip"; > > } elsif (req.http.Accept-Encoding ~ "deflate") { > > set req.http.Accept-Encoding = "deflate"; > > } else { > > # unkown algorithm > > remove req.http.Accept-Encoding; > > } > > } > > > > > > } #end recv > > > > > > sub vcl_hash { > > set req.hash += req.url; > > set req.hash += req.http.host; > > #set req.hash += req.http.cookie; > > 
#set req.hash += server.ip; > > hash; > > } #end hash > > > > > > # strip the cookie before the image is inserted into cache. > > sub vcl_fetch { > > > > #set obj.grace = 30s; > > > > > > if(obj.http.Set-Cookie){ > > pass; > > } > > > > > > if(obj.http.Pragma ~ "no-cache" || > > obj.http.Cache-Control ~ "no-cache" || > > obj.http.Cache-Control ~ "private"){ > > pass; > > } > > > > > > if (req.url ~ > > "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") { > > unset obj.http.set-cookie; > > set obj.ttl = 1w; > > } > > > > > > > > > > if (req.url ~ > > "\.(zip|ico|dat|torrent|png|gif|jpg|swf|css|js|bmp|bz2|tbz|mp3|ogg)$") { > > unset obj.http.expires; > > set obj.http.cache-control = "max-age=315360000, public"; > > set obj.ttl = 1w; > > set obj.http.magicmarker = "1"; > > } > > > > > > if (obj.status == 503) { > > > > restart; > > } > > > > > > # if (obj.cacheable) { > > # /* Remove Expires from backend, it's not long > > enough */ > > # unset obj.http.expires; > > > > # /* Set the clients TTL on this object */ > > # set obj.http.cache-control = "max-age=315360000, > > public"; > > > > # /* Set how long Varnish will keep it */ > > # set obj.ttl = 1w; > > > > # /* marker for vcl_deliver to reset Age: */ > > # set obj.http.magicmarker = "1"; > > # } > > > > > > } #fetch end > > > > > > > > > > sub vcl_deliver { > > if (resp.http.magicmarker) { > > /* Remove the magic marker */ > > unset resp.http.magicmarker; > > > > /* By definition we have a fresh object */ > > set resp.http.age = "0"; > > if (obj.hits > 0) { > > set resp.http.X-Cache = "HIT"; > > } else { > > set resp.http.X-Cache = "MISS"; > > } > > > > } > > > > > > } #deliver end > > > > > > sub vcl_pipe { > > # http://varnish.projects.linpro.no/ticket/451 > > # This forces every pipe request to be the first one. 
> > set bereq.http.connection = "close"; > > } #pipe end > > > > > > sub vcl_hit { > > if (req.request == "PURGE") { > > set obj.ttl = 0s; > > error 200 "Purged."; > > } > > > > if (!obj.cacheable) { > > pass; > > } > > deliver; > > } > > > > > > > > > > Thank you > > > > TW > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From phk at phk.freebsd.dk Thu Sep 24 07:09:25 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Thu, 24 Sep 2009 07:09:25 +0000 Subject: Backend fail & 503 Service Unavailable In-Reply-To: Your message of "Wed, 23 Sep 2009 01:22:06 +0800." <113d871c0909221022g2eff01d8gf6857f0dfe27aad8@mail.gmail.com> Message-ID: <1823.1253776165@critter.freebsd.dk> In message <113d871c0909221022g2eff01d8gf6857f0dfe27aad8 at mail.gmail.com>, M L w rites: >backend_conn 12054 45.32 Backend connections success >backend_unhealthy 0 0.00 Backend connections not attempted >backend_busy 0 0.00 Backend connections too many >backend_fail 5900 22.18 Backend connections failures >backend_reuse 3503 13.17 Backend connections reuses >backend_recycle 11552 43.43 Backend connections recycles >backend_unused 0 0.00 Backend connections unused I don't think there is much I can suggest you, the majority of your problem seems to be that your backend cannot keep up with the traffic you throw at it. Almost 1 in 3 connections to the backend fails: 5900 / (12054 + 5900) = 0.33 -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From phk at phk.freebsd.dk Thu Sep 24 06:53:16 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Thu, 24 Sep 2009 06:53:16 +0000 Subject: PIPE asserts In-Reply-To: Your message of "Tue, 22 Sep 2009 15:50:45 +0200." 
<4AB8D635.5060804@1art.cz> Message-ID: <1666.1253775196@critter.freebsd.dk> In message <4AB8D635.5060804 at 1art.cz>, =?ISO-8859-1?Q?V=E1clav_B=EDlek?= writes : >i redefined vcl_recv this way: > > if (req.request != "GET" && > req.request != "HEAD" && > req.request != "PUT" && > req.request != "POST" && > req.request != "TRACE" && > req.request != "OPTIONS" && > req.request != "DELETE") { > /* Non-RFC2616 or CONNECT which is weird. */ > return (error); > } Please open a ticket on this one. As a workaround, you can use: [...] req.request != "DELETE") { /* Non-RFC2616 or CONNECT which is weird. */ error 503; (or any other status code than 503 which you might prefer) -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From kristian at redpill-linpro.com Thu Sep 24 21:00:10 2009 From: kristian at redpill-linpro.com (Kristian Lyngstol) Date: Thu, 24 Sep 2009 23:00:10 +0200 Subject: Restructuring the Wiki Message-ID: <20090924210010.GA20909@kjeks.getinternet.no> As we talked about during the VUG meeting, the Wiki could use some love. Unless anyone has any strong objections, I'll restructure it shortly. I'll probably rewrite, rename and add a considerable amount of information, so I apologize in advance for the inconvenience. I hope to avoid removing data unless it's incorrect, obsolete or otherwise unwanted. I did a quick tour of the Wiki, and figured I'd divide the information into categories similar to the following (these are my raw notes, so fill in your own blanks): Wiki - Overview - News - Varnish Major.minor.patch - ? 
- Features - Architect Notes - Version N - Current - "Frequently declined" - Shopping List - Planned - FAQ - Source - Community - IRC / ML - Documentation - Introduction - Reference documentation - Options - CLI - VCL - Request Flowcharts - Guides - "Getting started" - Preparing for production - OS tweaks - Tips and tricks for hit-rate - Examples - Reference examples (VCL examples demonstrating specific features) - General VCL example-snippets - Complete examples - Reviewed - Raw - Tuning examples (Think various parameter-tweaks) - Develop - Debugging Varnish - (most of what is under DeveloperResources) - Resources - Varnish-cache.com - Commercial support - Accelerating wordpress with varnish, etc - "trouble log" -> void? Or fix? - "who uses..." ? This is likely to change during the actual work, but it should give you an idea at what I'm aiming at. My goal is to make it easy to find what you are looking for and make the front page useful both for first time visitors and for those of us who use the Wiki on a daily basis. Knowing myself and how I've done wiki-restructuring in the past, I expect the majority of the work will happen in one or two evenings, so thou has been warned. So again: comments, objections etc. are welcome, as you're the people who will use this wiki and hopefully help improve it. -- Kristian Lyngst?l Redpill Linpro AS Tlf: +47 21544179 Mob: +47 99014497 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From v.bilek at 1art.cz Fri Sep 25 06:34:29 2009 From: v.bilek at 1art.cz (=?UTF-8?B?VsOhY2xhdiBCw61sZWs=?=) Date: Fri, 25 Sep 2009 08:34:29 +0200 Subject: PIPE asserts In-Reply-To: <1666.1253775196@critter.freebsd.dk> References: <1666.1253775196@critter.freebsd.dk> Message-ID: <4ABC6475.3000001@1art.cz> >> /* Non-RFC2616 or CONNECT which is weird. */ >> return (error); >> } > > Please open a ticket on this one. 
> > As a workaround, you can use: > > [...] > req.request != "DELETE") { > /* Non-RFC2616 or CONNECT which is weird. */ > error 503; > > (or any status code other than 503 that you might prefer) > Hello What is the difference between [...] req.request != "DELETE") { /* Non-RFC2616 or CONNECT which is weird. */ error 503; and [...] req.request != "DELETE") { /* Non-RFC2616 or CONNECT which is weird. */ return (error); Vaclav Bilek From phk at phk.freebsd.dk Fri Sep 25 06:38:22 2009 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Fri, 25 Sep 2009 06:38:22 +0000 Subject: PIPE asserts In-Reply-To: Your message of "Fri, 25 Sep 2009 08:34:29 +0200." <4ABC6475.3000001@1art.cz> Message-ID: <2862.1253860702@critter.freebsd.dk> In message <4ABC6475.3000001 at 1art.cz>, Václav Bílek writes: >Hello > >What is the difference between > > [...] > req.request != "DELETE") { > /* Non-RFC2616 or CONNECT which is weird. */ > error 503; This sets the returned status to "503". >and > [...] > req.request != "DELETE") { > /* Non-RFC2616 or CONNECT which is weird. */ > return (error); This reuses whatever status you might already have received from the backend; in the case of vcl_recv{} there was none, hence the panic. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
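[Editor's note] Assembled from the fragments quoted in this thread, phk's workaround slots into vcl_recv roughly as follows. This is only a sketch in Varnish 2.0 VCL syntax; the 503 status is the example from the thread, and per phk any other status code can be substituted:

```vcl
sub vcl_recv {
    if (req.request != "GET" &&
        req.request != "HEAD" &&
        req.request != "PUT" &&
        req.request != "POST" &&
        req.request != "TRACE" &&
        req.request != "OPTIONS" &&
        req.request != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        /* An explicit status is needed here: a bare "return (error);"
           has no backend status to reuse in vcl_recv, which is what
           triggered the assert reported in this thread. */
        error 503 "Method not supported";
    }
}
```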
From ml at tinwong.com Fri Sep 25 06:43:16 2009 From: ml at tinwong.com (M L) Date: Fri, 25 Sep 2009 14:43:16 +0800 Subject: Backend fail & 503 Service Unavailable In-Reply-To: <1823.1253776165@critter.freebsd.dk> References: <113d871c0909221022g2eff01d8gf6857f0dfe27aad8@mail.gmail.com> <1823.1253776165@critter.freebsd.dk> Message-ID: <113d871c0909242343v12905c8bmcb1899ba2ee09bda@mail.gmail.com> Thanks for the reply, phk. My mistake: I didn't set a high enough open-file limit (ulimit) on the Varnish server. The problem is fixed; it runs smoothly with 0 backend_fail now. Thanks for helping :D Thanks List TW On Thu, Sep 24, 2009 at 3:09 PM, Poul-Henning Kamp wrote: > In message <113d871c0909221022g2eff01d8gf6857f0dfe27aad8 at mail.gmail.com>, > M L writes: > > >backend_conn 12054 45.32 Backend connections success > >backend_unhealthy 0 0.00 Backend connections not > attempted > >backend_busy 0 0.00 Backend connections too many > >backend_fail 5900 22.18 Backend connections failures > >backend_reuse 3503 13.17 Backend connections reuses > >backend_recycle 11552 43.43 Backend connections recycles > >backend_unused 0 0.00 Backend connections unused > > I don't think there is much I can suggest to you; the majority of your > problem seems to be that your backend cannot keep up with the traffic > you throw at it. > > Almost 1 in 3 connections to the backend fails: > > 5900 / (12054 + 5900) = 0.33 > > -- > Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 > phk at FreeBSD.ORG | TCP/IP since RFC 956 > FreeBSD committer | BSD since 4.3-tahoe > Never attribute to malice what can adequately be explained by incompetence. > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From roi at metacafe.com Mon Sep 21 11:55:07 2009 From: roi at metacafe.com (Roi Avinoam) Date: Mon, 21 Sep 2009 14:55:07 +0300 Subject: Varnish virtual memory usage Message-ID: <533E1F3F0E92C246A5129A1CF8F6F694F73D77@brain.office.mc> Hey, At Metacafe we're testing the integration with Varnish, and I was tasked with benchmarking our Varnish setup. I intentionally over-flooded the server with requests, in an attempt to see how the system will behave under extensive traffic. Surprisingly, the server ran out of swap and crashed. In our configuration, "-s file,/var/lib/varnish/varnish_storage.bin,1G". Does that mean Varnish shouldn't use more than 1GB of virtual memory? Is there any other way to limit the memory/storage usage? The system has 4GB RAM, 2GB swap on Red Hat Enterprise. Varnish version 2.0.3. Thanks for your help! :D -- Roi Avinoam Programmer Metacafe http://www.metacafe.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From angelo.iannello at dp2000.it Mon Sep 21 13:04:56 2009 From: angelo.iannello at dp2000.it (angelo iannello) Date: Mon, 21 Sep 2009 15:04:56 +0200 Subject: authentication varnish before squid Message-ID: <00c701ca3abc$227e84e0$6700a8c0@comdata.local> Hi all, I have configured Varnish to cache, for speed improvement, all requests passed to a backend based on a Squid proxy. lan => varnish => squid => internet Squid has basic authentication, but the corresponding popup does not appear in the client browser; Squid reports "cache denied authentication". If I use Squid directly, the popup appears and I can authenticate successfully. Is it possible that Varnish does not recognize this authentication? Does a workaround exist to solve this problem?
tnx Angelo From scaunter at topscms.com Fri Sep 25 12:19:07 2009 From: scaunter at topscms.com (Caunter, Stefan) Date: Fri, 25 Sep 2009 08:19:07 -0400 Subject: Varnish virtual memory usage Message-ID: <064FF286FD17EC418BFB74629578BED112C6C83C@tmg-mail4.torstar.net> What is the VCL config, and what is in the RH system log? -------------------------- Sent using BlackBerry 416 561 4871 -----Original Message----- From: varnish-misc-bounces at projects.linpro.no To: varnish-misc at projects.linpro.no Sent: Mon Sep 21 07:55:07 2009 Subject: Varnish virtual memory usage Hey, At Metacafe we're testing the integration with Varnish, and I was tasked with benchmarking our Varnish setup. I intentionally over-flooded the server with requests, in an attempt to see how the system will behave under extensive traffic. Surprisingly, the server ran out of swap and crashed. In our configuration, "-s file,/var/lib/varnish/varnish_storage.bin,1G". Does that mean Varnish shouldn't use more than 1GB of virtual memory? Is there any other way to limit the memory/storage usage? The system has 4GB RAM, 2GB swap on Red Hat Enterprise. Varnish version 2.0.3. Thanks for your help! :D -- Roi Avinoam Programmer Metacafe http://www.metacafe.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.boer at netclever.nl Mon Sep 28 09:46:51 2009 From: martin.boer at netclever.nl (Martin Boer) Date: Mon, 28 Sep 2009 11:46:51 +0200 Subject: newbie questions Message-ID: <4AC0860B.1080206@netclever.nl> Hello, I am looking into Varnish to see if it can replace Squid, and so far that seems fairly easy to do. I've read the documentation and still have some questions/remarks. Remarks: - The documentation on the website seems to try to describe both the 1.1 and the 2.0 syntax. Or, more likely, not all the documentation has been upgraded to the 2.0 syntax. The man pages were more helpful, but of course reading the man pages wasn't exactly the first thing I tried.
:)

- At the moment I'm testing Varnish using jmeter and browsing my brains
out. It seems that Varnish can easily handle everything I throw at it,
but it would have been nice if the documentation included some
performance figures. Figures for different CPUs, memory sizes, OS brands
and 32/64-bit versions would be nice, and of course how much bandwidth
you can serve before saturating Varnish. I understand that every
situation is different, but it would help when someone needs to convince
management.

Questions:

- I've seen that I could use -s file or -s malloc. What happens if I use
neither of them, and what are the pros and cons of either option? The
man pages describe perfectly -how- to use the options but not really
-why-. I'm now using the factory default
-s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G
but don't really know if that's smart. The same goes for some other
options; some 'if you want , you might try ' guidance would help.

Our websites are very dynamic, but a TTL of 300s and a grace period of
10m are workable. As most data except the pictures will be outdated very
quickly, it wouldn't really be a problem if there were no storage file.
If that would help performance on a daily basis, it would be preferable
to a faster startup in the unlikely event Varnish crashes.

- I've seen that Varnish also has load balancing features like pound.
Will more features be added, so I could skip pound in the near future? I
know there's no way you can really answer that question for me, but
never mind that. I suppose dropping pound would decrease latency, which
is always good.

Regards,
Martin

From quasirob at googlemail.com  Mon Sep 28 11:48:54 2009
From: quasirob at googlemail.com (Rob Ayres)
Date: Mon, 28 Sep 2009 12:48:54 +0100
Subject: Is it possible to simulate this rewrite in vcl?
Message-ID: 

Hi,

I've had to put an instance of apache behind varnish as a redirector, as
I can't think of a way of making varnish do it. Can this apache rewrite
be done in VCL?
RewriteRule ^/(.*)/home/(.*)$ http://$1_host.example.com/$2 [P]

I've been looking at a mix of using regsub to change parts of the URL
and redirecting using an "error 750", but it's becoming more
complicated, and using apache is looking like the simplest solution
unless I'm missing something obvious in VCL.

Thanks,
Rob
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From phk at phk.freebsd.dk  Mon Sep 28 11:51:47 2009
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Mon, 28 Sep 2009 11:51:47 +0000
Subject: Is it possible to simulate this rewrite in vcl?
In-Reply-To: Your message of "Mon, 28 Sep 2009 12:48:54 +0100."
Message-ID: <11187.1254138707@critter.freebsd.dk>

In message , Rob Ayres writes:

>I've had to put an instance of apache behind varnish as a redirector as I
>can't think of a way of making varnish do it. Can this apache rewrite be
>done in vcl?
>
>RewriteRule ^/(.*)/home/(.*)$ http://$1_host.example.com/$2 [P]

Not right off the bat.

Provided you add backend instances for all the servers serving these
domains it is possible, but you need to do the $1 part as an
if/elseif/elseif/elseif/.../else to set the req.backend.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From martin.boer at netclever.nl  Tue Sep 29 12:42:22 2009
From: martin.boer at netclever.nl (Martin Boer)
Date: Tue, 29 Sep 2009 14:42:22 +0200
Subject: using grace
Message-ID: <4AC200AE.90301@netclever.nl>

I'm playing around a bit with "set obj" and grace, and it's not quite
working the way I expect. I would like:

0-1.5 minutes: serve the object to the end user out of cache.
1.5-61.5 minutes: serve the object to the end user out of cache, fetch a
new object, put that in cache, and reset the timer for that object.

The latter doesn't seem to work; after 1.5 minutes the loading of a
webpage is as slow as without using varnish.
Am I using the wrong configuration, or does grace not work the way I
want it to? At the moment I'm using the following configuration:

DAEMON_OPTS="-a 192.168.100.11:80 \
             -T 192.168.100.11:6082 \
             -f /etc/varnish/bizztravel.vcl \
             -p lru_interval=3600 -p thread_pool_max=4000 \
             -p thread_pools=4 \
             -p listen_depth=4096 \
             -h classic,500009 \
             -s file,/chroot/varnish/varnish_storage.bin,1G"

sub vcl_recv {
    set req.grace = 60m;
    if (req.url ~ "error.cgi" ||
        req.url ~ "admin" ||
        req.url ~ "feeds" ||
        req.url ~ "static") {
        return(pass);
    }
    if (req.url ~ "whatever") {
        return(pipe);
    }
    return(lookup);
}

sub vcl_fetch {
    if (obj.ttl < 90s) {
        set obj.ttl = 90s;
    }
    set obj.grace = 60m;
    if (req.url ~ "error.cgi" ||
        req.url ~ "admin" ||
        req.url ~ "feeds" ||
        req.url ~ "static") {
        return(pass);
    }
    if (req.url ~ "^/pics" ||
        req.url ~ "^/css" ||
        req.url ~ "^/js" ||
        req.url ~ "^/images") {
        set obj.ttl = 86400s;
    }
    return(deliver);
}

Regards,
Martin Boer
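A note on the "using grace" question above, offered as a guess rather than a definitive diagnosis: in Varnish 2.x, grace only delivers a stale object while some *other* request is already busy fetching a fresh copy; a lone test client hitting an expired object still waits for the backend fetch, which would explain the slow page after 90 seconds. Independent of that, the configuration can be simplified so the pass decision is made once, in vcl_recv, before lookup. A minimal sketch (URL patterns copied from the posted config, not a tested drop-in):

```vcl
sub vcl_recv {
    # allow stale delivery for up to an hour past TTL
    set req.grace = 60m;
    if (req.url ~ "error.cgi" || req.url ~ "admin" ||
        req.url ~ "feeds" || req.url ~ "static") {
        # deciding pass here means vcl_fetch need not repeat the test
        return(pass);
    }
    return(lookup);
}

sub vcl_fetch {
    # keep objects around 60m past expiry so grace has something to serve
    set obj.grace = 60m;
    if (obj.ttl < 90s) {
        set obj.ttl = 90s;
    }
    if (req.url ~ "^/pics" || req.url ~ "^/css" ||
        req.url ~ "^/js" || req.url ~ "^/images") {
        set obj.ttl = 86400s;
    }
    return(deliver);
}
```

Under real concurrent traffic, one client triggers the refresh while the others get the graced copy; with a single browser doing the testing, grace never gets a chance to kick in.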
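phk's if/elseif suggestion in the "simulate this rewrite" thread above can be sketched in VCL roughly as follows. The backend names and the "foo"/"bar" URL prefixes are placeholders, not Rob's real hosts, and the regsub patterns assume the `^/(.*)/home/(.*)$` shape of the original RewriteRule:

```vcl
# one backend declaration per target host (placeholders)
backend foo_host {
    .host = "foo_host.example.com";
    .port = "80";
}
backend bar_host {
    .host = "bar_host.example.com";
    .port = "80";
}

sub vcl_recv {
    if (req.url ~ "^/[^/]+/home/") {
        # the $1 part of the RewriteRule picks the backend
        if (req.url ~ "^/foo/") {
            set req.backend = foo_host;
        } elsif (req.url ~ "^/bar/") {
            set req.backend = bar_host;
        } # ... one elsif per host, as phk describes
        # rewrite Host and URL to match what the backend expects
        set req.http.host =
            regsub(req.url, "^/([^/]+)/home/.*$", "\1_host.example.com");
        set req.url = regsub(req.url, "^/[^/]+/home/", "/");
    }
}
```

This proxies the request the way the `[P]` flag does; issuing an HTTP redirect instead would be the separate "error 750" technique Rob mentions.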
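On Martin's earlier -s file vs. -s malloc question, a rough comparison as a config fragment; the paths and sizes are illustrative, in the same DAEMON_OPTS style as his posted configuration:

```shell
# -s malloc: the whole cache lives in heap memory; sizing is bounded
# by RAM (plus per-object overhead).
DAEMON_OPTS="... -s malloc,1G"

# -s file: the cache is an mmap()ed file; the kernel pages it in and
# out as needed, so the cache may be larger than physical RAM.
DAEMON_OPTS="... -s file,/var/lib/varnish/varnish_storage.bin,1G"
```

One point worth noting against the "faster startup after a crash" consideration: in this Varnish version, neither storage type persists the cache across a varnishd restart; the choice mainly trades heap allocation against a kernel-paged backing file.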