From kristian at varnish-software.com Thu Sep 1 09:35:12 2011 From: kristian at varnish-software.com (Kristian Lyngstol) Date: Thu, 1 Sep 2011 11:35:12 +0200 Subject: Experimental-ims branch tested: [Fryer] FAIL. 22 of 24 tests succeeded. Message-ID: <20110901093512.GA16206@freud.kly.no> The branch mostly survives. No crashes. However, two tests are failing: one designed to stress rapid expiry, and one that tests rapid ttl=0s - so it's probably triggering the same bug. The default VCL used (err, the fryer default, not the varnish default) sets beresp.do_stream = true, so unless a test explicitly disables it or overrides the VCL, all the tests are run with that set. I ran the rapid-expiry test with do_stream = false to check if that was the issue. Disabling do_stream improved the result - the expected number of requests was received - but it still gave hitpass. When I investigated the varnishlog with do_stream=true, it showed Stream Errors. It also verified hitpass, though there was no reason that this traffic should cause a hitpass. As this is also a bit of a test of fryer (the below mail will be sent to a public list of some sort soon-ish, instead of to myself) I will resist explaining in detail what every test does - unless asked (do ask - anyone!). Grab hold of me on IRC or mail if you want me to re-test or test with some specific vcl or parameter, and I'll set it up. - Kristian PS: For the record: Master passes all of them, though the 4g ones are not properly measuring the traffic due to s_bodybytes issues. ----- Forwarded message from kristian at oneiros.varnish-software.com ----- Date: Thu, 1 Sep 2011 00:12:45 +0200 (CEST) From: kristian at oneiros.varnish-software.com To: kristian at varnish-software.com Subject: [Fryer] FAIL. 22 of 24 tests succeeded. 
Tests Failed: httperf-rapid-expire purge-fail Tests OK: httperf-lru-default httperf-lru-stream-nogzip 4gpluss-nostream 4gpluss-stream 4gpluss-nogzip cold-default basic-fryer cold-gzip httperf-lru-nostream-gzip streaming cold-nogzip streaming-gzip httperf-lru-nostream-nogzip 4gpluss httperf-hot memleak sky-misc httperf-lru-stream-gzip httperf-lru-nostream-default streaming-grace siege-test httperf-lru-stream-default 2011-08-31 19:15:56 [1,20]: Server tristran checked out varnish-3.0.0-beta2-221-g71ee192 of branch experimental-ims 2011-08-31 19:17:16 [2,80]: httperf-lru-default(httperf): Starting test 2011-08-31 19:20:32 [2,195]: httperf-lru-stream-nogzip(httperf): Starting test 2011-08-31 19:23:47 [2,194]: 4gpluss-nostream(httperf): Starting test 2011-08-31 19:45:16 [2,1289]: 4gpluss-stream(httperf): Starting test 2011-08-31 20:06:37 [2,1280]: 4gpluss-nogzip(httperf): Starting test 2011-08-31 20:50:27 [2,2630]: cold-default(httperf): Starting test 2011-08-31 20:54:40 [2,253]: basic-fryer(httperf): Starting test 2011-08-31 20:55:00 [2,20]: cold-gzip(httperf): Starting test 2011-08-31 20:59:24 [2,263]: httperf-lru-nostream-gzip(httperf): Starting test 2011-08-31 21:02:37 [2,192]: streaming(httperf): Starting test 2011-08-31 21:05:23 [2,166]: cold-nogzip(httperf): Starting test 2011-08-31 21:09:07 [2,223]: streaming-gzip(httperf): Starting test 2011-08-31 21:11:50 [2,162]: httperf-lru-nostream-nogzip(httperf): Starting test 2011-08-31 21:14:50 [2,180]: 4gpluss(httperf): Starting test 2011-08-31 21:35:30 [2,1240]: httperf-hot(httperf): Starting test 2011-08-31 21:37:21 [2,110]: memleak(httperf): Starting test 2011-08-31 21:48:30 [2,669]: sky-misc(httperf): Starting test 2011-08-31 21:54:14 [2,343]: httperf-lru-stream-gzip(httperf): Starting test 2011-08-31 21:57:40 [2,206]: httperf-lru-nostream-default(httperf): Starting test 2011-08-31 22:01:07 [2,206]: httperf-rapid-expire(httperf): Starting test 2011-08-31 22:02:51 WARNING [0,104]: httperf-rapid-expire(httperf): Out 
of bounds: client_req(123737) less than lower boundary 999640 2011-08-31 22:02:51 WARNING [0, 0]: httperf-rapid-expire(httperf): Out of bounds: cache_hitpass(97376) more than upper boundary 0 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Load: 00:02:52 up 28 days, 12:18, 2 users, load average: 2.05, 9.29, 11.92 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Test name: httperf-rapid-expire 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Varnish options: 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): -t=2 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Varnish parameters: 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Payload size (excludes headers): 256 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Branch: experimental-ims 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Number of clients involved: 24 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Type of test: httperf 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Test iterations: 1 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Runtime: 98 seconds 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): VCL: backend foo { .host = "localhost"; .port = "80"; } sub vcl_fetch { set beresp.do_stream = true; } 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Number of total connections: 100000 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Note: connections are subject to rounding when divided among clients. Expect slight deviations. 
2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Requests per connection: 10 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Extra options to httperf: --wset=100,0.30 2011-08-31 22:02:52 [1, 0]: httperf-rapid-expire(httperf): Httperf command (last client): httperf --hog --timeout 60 --num-calls 10 --num-conns 4166 --port 8080 --burst-length 10 --client 23/24 --server 10.20.100.4 --wset=100,0.30 2011-08-31 22:02:59 [2, 7]: purge-fail(httperf): Starting test 2011-08-31 22:05:22 WARNING [0,142]: purge-fail(httperf): Out of bounds: cache_hitpass(243398) more than upper boundary 0 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Load: 00:05:23 up 28 days, 12:21, 2 users, load average: 2.00, 6.68, 10.56 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Test name: purge-fail 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Varnish options: 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): -w=5,100 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Varnish parameters: 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Payload size (excludes headers): 1K 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Branch: experimental-ims 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Number of clients involved: 24 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Type of test: httperf 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Test iterations: 1 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Runtime: 136 seconds 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): VCL: backend foo { .host = "localhost"; .port = "80"; } sub vcl_recv { if (!req.url ~ "/0/0.html") { set req.request = "PURGE"; } set req.url = "/foo"; return (lookup); } sub vcl_hit { if (req.request == "PURGE") { set obj.ttl = 0s; error 200 "OK"; } } sub vcl_miss { if (req.request == "PURGE") { error 200 "Not in cache but not confusing httperf"; } } 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Number of total connections: 300000 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Note: connections are 
subject to rounding when divided among clients. Expect slight deviations. 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Requests per connection: 1 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Extra options to httperf: --wset=999,0.5 --timeout=5 2011-08-31 22:05:23 [1, 0]: purge-fail(httperf): Httperf command (last client): httperf --hog --timeout 60 --num-calls 1 --num-conns 12500 --port 8080 --burst-length 10 --client 23/24 --server 10.20.100.4 --wset=999,0.5 --timeout=5 2011-08-31 22:05:30 [2, 7]: streaming-grace(httperf): Starting test 2011-08-31 22:08:16 [2,166]: siege-test(siege): Starting test 2011-08-31 22:09:31 [2,74]: httperf-lru-stream-default(httperf): Starting test 2011-08-31 22:12:45 WARNING [0,193]: Tests finished with problems detected. Failed expectations: 2 Total run time: 10630 seconds ----- End forwarded message ----- From jbq at caraldi.com Fri Sep 2 12:20:41 2011 From: jbq at caraldi.com (Jean-Baptiste Quenot) Date: Fri, 2 Sep 2011 14:20:41 +0200 Subject: Assert error in WSLR() on SLT_HttpGarbage Message-ID: Hello Varnish, With 3.0.1 rc1 I get the following crashes frequently: https://gist.github.com/1188467 Is it related to HTTP/1.0? I can't see a crash with HTTP/1.1. Any idea? -- Jean-Baptiste Quenot From jbq at caraldi.com Fri Sep 2 12:25:26 2011 From: jbq at caraldi.com (Jean-Baptiste Quenot) Date: Fri, 2 Sep 2011 14:25:26 +0200 Subject: varnishncsa outage Message-ID: Hello list, We have been running varnish 3.0.1 rc1 in production for a few days, and notice that varnishncsa sometimes disappears! It runs for hours and suddenly exits without an explanation. The only exit() calls I can find in the code are when the format string is invalid, which would only occur at startup. No core dump, no kernel message. 
I don't know if varnishncsa printed an error because when run as a daemon there is no error log (maybe using syslog for this would be more appropriate, BTW). Your help will be appreciated, -- Jean-Baptiste Quenot From thierry.magnien at sfr.com Fri Sep 2 12:39:46 2011 From: thierry.magnien at sfr.com (MAGNIEN, Thierry) Date: Fri, 2 Sep 2011 14:39:46 +0200 Subject: varnishncsa outage In-Reply-To: References: Message-ID: <4A029B1A60B8E340A50D654D2F130DAA2FE8F1DC6F@EXCV001.encara.local.ads> Hi, We experience the same thing with varnish 3.0.1. I left a varnishncsa running with gdb attached to it and I'm still waiting for it to stop. I hope I'll see something when it does. Regards, Thierry -----Original Message----- From: varnish-dev-bounces at varnish-cache.org [mailto:varnish-dev-bounces at varnish-cache.org] On behalf of Jean-Baptiste Quenot Sent: Friday, 2 September 2011 14:25 To: varnish-dev at varnish-cache.org Subject: varnishncsa outage Hello list, We are running varnish 3.0.1 rc1 in production since a few days, and notice that varnishncsa sometimes disappears! It runs for hours and suddenly exits without an explanation. The only exit() calls I can find in the code are when the format string is invalid, which would only occur on start. No core dump, no kernel message. 
Message-ID: <3185.1314974426@critter.freebsd.dk> In message , Jean-Baptiste Quenot writes: >With 3.0.1 rc1 I get the following crashes frequently: > >https://gist.github.com/1188467 > >Is it related to HTTP/1.0? I can't see a crash with HTTP/1.1. It is probably related to cookies longer than the http_req_hdr_len param -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From jbq at caraldi.com Fri Sep 2 15:01:22 2011 From: jbq at caraldi.com (Jean-Baptiste Quenot) Date: Fri, 2 Sep 2011 17:01:22 +0200 Subject: Assert error in WSLR() on SLT_HttpGarbage In-Reply-To: <3185.1314974426@critter.freebsd.dk> References: <3185.1314974426@critter.freebsd.dk> Message-ID: Dumb questions: 1) is it a difficult task to provide a meaningful error message in the crash report? 2) why not issue a 413 Entity Too Large when some headers are longer than http_req_hdr_len, instead of crashing? I mean, is it normal behavior for varnish to crash in those circumstances, or is it a bug? I also had similar "assert" crashes with varnish 2.1. 2011/9/2 Poul-Henning Kamp : > In message > , Jean-Baptiste Quenot writes: > >>With 3.0.1 rc1 I get the following crashes frequently: >> >>https://gist.github.com/1188467 >> >>Is it related to HTTP/1.0? I can't see a crash with HTTP/1.1. > > It is probably related to cookies longer than the http_req_hdr_len param > > -- > Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 > phk at FreeBSD.ORG | TCP/IP since RFC 956 > FreeBSD committer | BSD since 4.3-tahoe > Never attribute to malice what can adequately be explained by incompetence. 
> -- Jean-Baptiste Quenot From phk at phk.freebsd.dk Fri Sep 2 15:19:15 2011 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Fri, 02 Sep 2011 15:19:15 +0000 Subject: Assert error in WSLR() on SLT_HttpGarbage In-Reply-To: Your message of "Fri, 02 Sep 2011 17:01:22 +0200." Message-ID: <3333.1314976755@critter.freebsd.dk> In message , Jean-Baptiste Quenot writes: >1) is it a difficult task to provide a meaningful error message in the >crash report? It's a bug which only got fixed just after 3.0.1, unfortunately. >2) why not issue a 413 entity too large when some headers longer than >http_req_hdr_len, instead of crashing? History. At the time, the consensus was that if we saw something that was too long, it was probably an attack of some kind and just closing was the best idea. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From jbq at caraldi.com Fri Sep 2 15:40:43 2011 From: jbq at caraldi.com (Jean-Baptiste Quenot) Date: Fri, 2 Sep 2011 17:40:43 +0200 Subject: varnishncsa outage In-Reply-To: <4A029B1A60B8E340A50D654D2F130DAA2FE8F1DC6F@EXCV001.encara.local.ads> References: <4A029B1A60B8E340A50D654D2F130DAA2FE8F1DC6F@EXCV001.encara.local.ads> Message-ID: It turns out to be a good old "segmentation fault" crash, caused by the SLT_ReqEnd tag processing "time to first byte". 
Here is the gdb backtrace: https://gist.github.com/1188943 -- Jean-Baptiste Quenot From tfheen at varnish-software.com Mon Sep 5 06:58:23 2011 From: tfheen at varnish-software.com (Tollef Fog Heen) Date: Mon, 05 Sep 2011 08:58:23 +0200 Subject: varnishncsa outage In-Reply-To: (Jean-Baptiste Quenot's message of "Fri, 2 Sep 2011 17:40:43 +0200") References: <4A029B1A60B8E340A50D654D2F130DAA2FE8F1DC6F@EXCV001.encara.local.ads> Message-ID: <87wrdn5vcg.fsf@qurzaw.varnish-software.com> ]] Jean-Baptiste Quenot | It turns out to be a good old "segmentation fault" crash, caused by the | SLT_ReqEnd tag processing "time to first byte". No, it's not, it's from: case 'm': VSB_cat(os, lp->df_m); break; now why lp->df_m, which is the request method, is null, I don't know. -- Tollef Fog Heen Varnish Software t: +47 21 98 92 64 From kristian at varnish-software.com Mon Sep 5 10:07:19 2011 From: kristian at varnish-software.com (Kristian Lyngstol) Date: Mon, 5 Sep 2011 12:07:19 +0200 Subject: Automated stress testing with Fryer Message-ID: <20110905100719.GD3515@freud.kly.no> Greetings, I've just now put fryer in "production", which means you can throw your branch in the fryer and see if it comes out well-done or over-cooked. I did some rudimentary documentation of usage at: https://www.varnish-cache.org/trac/wiki/AutomatedStressTesting The gist of it is: - Fryer checks out varnish from a clean git repo of our choice, builds and installs on a dedicated server - Fryer then executes a number of different tests, using 3 other machines to generate traffic. - The result is evaluated by looking for assert errors and certain values in varnishstat. Typical things to test for are that the requested number of requests matches the actual number. The size. Number of expired objects. Hitpass. Etc. - Results are sent to varnish-test at varnish-cache.org, see https://www.varnish-cache.org/lists/pipermail/varnish-test/2011-September/thread.html for a few of the test-runs I ran this weekend. 
- If you have git commit access, you can add or remove which branches are tested. If you don't, just ask :) - New tests are written on-demand. They mainly use httperf, with siege as an option. - 24 tests currently exist. The source is currently not available, but will be, when I get around to it. (It's cleared with The Boss (if he remembers)). I can also, of course, run fryer manually on request. Each complete run takes several hours (2-ish, see the report mails), but I can run individual tests. For those who haven't been paying attention, we've already either discovered or verified several bugs in the last few weeks using fryer. I'm quite happy with that. All comments are welcome. - Kristian From jbq at caraldi.com Tue Sep 6 08:18:13 2011 From: jbq at caraldi.com (Jean-Baptiste Quenot) Date: Tue, 6 Sep 2011 10:18:13 +0200 Subject: varnishncsa outage In-Reply-To: <87wrdn5vcg.fsf@qurzaw.varnish-software.com> References: <4A029B1A60B8E340A50D654D2F130DAA2FE8F1DC6F@EXCV001.encara.local.ads> <87wrdn5vcg.fsf@qurzaw.varnish-software.com> Message-ID: Here is more information about this crash: (gdb) up (gdb) print *lp $1 = {df_H = 0x0, df_U = 0x0, df_q = 0x0, df_b = 0x0, df_h = 0x11e56a0 "84.103.222.181", df_m = 0x0, df_s = 0x0, df_t = {tm_sec = 37, tm_min = 30, tm_hour = 17, tm_mday = 2, tm_mon = 8, tm_year = 111, tm_wday = 5, tm_yday = 244, tm_isdst = 1, tm_gmtoff = 7200, tm_zone = 0x11bca30 "CEST"}, df_u = 0x0, df_ttfb = 0x11dbc00 "nan", df_hitmiss = 0x0, df_handling = 0x0, active = 1, complete = 1, bitmap = 0, req_headers = {vtqh_first = 0x0, vtqh_last = 0x0}, resp_headers = {vtqh_first = 0x0, vtqh_last = 0x0}} Please find attached a patch that checks for mandatory fields. Curiously the code was there but disabled. 
2011/9/5 Tollef Fog Heen : > ]] Jean-Baptiste Quenot > > | It turns out to be a good old "segmentation fault" crash, caused by the > | SLT_ReqEnd tag processing "time to first byte". > > No, it's not, it's from: > > case 'm': > VSB_cat(os, lp->df_m); > break; > > now why lp->df_m, which is the request method, is null, I don't know. > > -- > Tollef Fog Heen > Varnish Software > t: +47 21 98 92 64 > -- Jean-Baptiste Quenot -------------- next part -------------- A non-text attachment was scrubbed... Name: Varnish_3.0.1-Skip-log-line-when-a-mandatory-field-is-missing.patch Type: application/octet-stream Size: 858 bytes Desc: not available URL: From jbq at caraldi.com Tue Sep 6 08:21:22 2011 From: jbq at caraldi.com (Jean-Baptiste Quenot) Date: Tue, 6 Sep 2011 10:21:22 +0200 Subject: varnishncsa outage In-Reply-To: References: <4A029B1A60B8E340A50D654D2F130DAA2FE8F1DC6F@EXCV001.encara.local.ads> <87wrdn5vcg.fsf@qurzaw.varnish-software.com> Message-ID: I notice another crash with varnishncsa, related to header processing: https://gist.github.com/1196892 AFAICT this is fixed in the recent commits "Ignore invalid HTTP headers" from Andreas Plesner Jacobsen in the Varnish master branch. Can someone confirm? -- Jean-Baptiste Quenot From apj at mutt.dk Tue Sep 6 08:26:59 2011 From: apj at mutt.dk (Andreas Plesner Jacobsen) Date: Tue, 6 Sep 2011 10:26:59 +0200 Subject: varnishncsa outage In-Reply-To: References: <4A029B1A60B8E340A50D654D2F130DAA2FE8F1DC6F@EXCV001.encara.local.ads> <87wrdn5vcg.fsf@qurzaw.varnish-software.com> Message-ID: <20110906082659.GN1944@nerd.dk> On Tue, Sep 06, 2011 at 10:21:22AM +0200, Jean-Baptiste Quenot wrote: > I notice another crash with varnishncsa, related to header processing: > > https://gist.github.com/1196892 > > AFAICT this is fixed in the recent commits "Ignore invalid HTTP headers" > from Andreas Plesner Jacobsen in the Varnish master branch. Can someone > confirm? Looks like the one I fixed, yes. 
-- Andreas From tfheen at varnish-software.com Tue Sep 6 09:41:43 2011 From: tfheen at varnish-software.com (Tollef Fog Heen) Date: Tue, 06 Sep 2011 11:41:43 +0200 Subject: varnishncsa outage In-Reply-To: (Jean-Baptiste Quenot's message of "Tue, 6 Sep 2011 10:18:13 +0200") References: <4A029B1A60B8E340A50D654D2F130DAA2FE8F1DC6F@EXCV001.encara.local.ads> <87wrdn5vcg.fsf@qurzaw.varnish-software.com> Message-ID: <87sjoavwh4.fsf@qurzaw.varnish-software.com> ]] Jean-Baptiste Quenot | Here is more information about this crash: | | (gdb) up | (gdb) print *lp | $1 = {df_H = 0x0, df_U = 0x0, df_q = 0x0, df_b = 0x0, df_h = 0x11e56a0 | "84.103.222.181", df_m = 0x0, df_s = 0x0, df_t = {tm_sec = 37, tm_min | = 30, tm_hour = 17, tm_mday = 2, tm_mon = 8, tm_year = 111, tm_wday = | 5, tm_yday = 244, tm_isdst = 1, tm_gmtoff = 7200, | tm_zone = 0x11bca30 "CEST"}, df_u = 0x0, df_ttfb = 0x11dbc00 | "nan", df_hitmiss = 0x0, df_handling = 0x0, active = 1, complete = 1, | bitmap = 0, req_headers = {vtqh_first = 0x0, vtqh_last = 0x0}, | resp_headers = {vtqh_first = 0x0, vtqh_last = 0x0}} Can you please capture varnishlog from a request which causes the crash? The backtrace above does unfortunately not help me. | Please find attached a patch that checks for mandatory fields. | Curiously the code was there but disabled. That code isn't correct with user-specifiable fields, so it should probably just be removed. Regards, -- Tollef Fog Heen Varnish Software t: +47 21 98 92 64 From tfheen at varnish-software.com Tue Sep 6 11:25:36 2011 From: tfheen at varnish-software.com (Tollef Fog Heen) Date: Tue, 6 Sep 2011 13:25:36 +0200 Subject: [PATCH] Use predetermined -s file name by default Message-ID: <1315308336-23911-1-git-send-email-tfheen@varnish-software.com> If no -s argument was given, we would generate a random file name in the runtime directory. This file was never removed, something that would fill up the disk over time. Use a predetermined name instead. 
--- bin/varnishd/storage_file.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/bin/varnishd/storage_file.c b/bin/varnishd/storage_file.c index ea78526..bb0bb43 100644 --- a/bin/varnishd/storage_file.c +++ b/bin/varnishd/storage_file.c @@ -117,7 +117,7 @@ smf_initfile(struct stevedore *st, struct smf_sc *sc, const char *size) } static const char default_size[] = "50%"; -static const char default_filename[] = "."; +static const char default_filename[] = "./varnish.storage"; static void smf_init(struct stevedore *parent, int ac, char * const *av) -- 1.7.5.4 From tfheen at varnish-software.com Tue Sep 6 11:31:03 2011 From: tfheen at varnish-software.com (Tollef Fog Heen) Date: Tue, 6 Sep 2011 13:31:03 +0200 Subject: [PATCH] Switch default storage allocator to malloc, limited to 100MB Message-ID: <1315308663-24062-1-git-send-email-tfheen@varnish-software.com> --- bin/varnishd/varnishd.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/bin/varnishd/varnishd.c b/bin/varnishd/varnishd.c index 5fb054a..8ff592c 100644 --- a/bin/varnishd/varnishd.c +++ b/bin/varnishd/varnishd.c @@ -349,7 +349,7 @@ main(int argc, char * const *argv) const char *n_arg = NULL; const char *P_arg = NULL; const char *S_arg = NULL; - const char *s_arg = "file"; + const char *s_arg = "malloc,100M"; int s_arg_given = 0; const char *T_arg = NULL; char *p, *vcl = NULL; -- 1.7.5.4 From thierry.magnien at sfr.com Wed Sep 7 14:59:53 2011 From: thierry.magnien at sfr.com (MAGNIEN, Thierry) Date: Wed, 7 Sep 2011 16:59:53 +0200 Subject: [PATCH] vmod_digest In-Reply-To: <4E4D1C2C.3000005@schokola.de> References: <4D42F382.5050402@schokola.de> <4E4D1C2C.3000005@schokola.de> Message-ID: <4A029B1A60B8E340A50D654D2F130DAA2FE8FE41BF@EXCV001.encara.local.ads> Hi Nils and all, I'm very interested in this vmod as I have to implement such functions in a near future. Do you plan to release a newer version soon or do I start with what you posted and try to continue ? 
Regards, Thierry -----Original Message----- From: varnish-dev-bounces at varnish-cache.org [mailto:varnish-dev-bounces at varnish-cache.org] On behalf of Nils Goroll Sent: Thursday, 18 August 2011 16:06 To: Laurence Rowe Cc: Varnish Development Subject: Re: [PATCH] vmod_digest Hi Laurence and all, sorry for the _very_ late response; I've been highly customer-demand driven (and chased by deadlines) during the last months, so much of the work I started on Varnish stayed unfinished. The good news is that customers now demand more Varnish work again, so I hope to return to the more active circle again this year. >> phk had spotted a race condition in base64_init which I had overlooked (and even >> commented on sarcastically), so I've added code to pre-generate the base64 >> lookup tables. >> >> All other changes are due to the changes to varnish code since then. > > This looks really interesting. Do you have an up to date version of > this patch? It no longer applies and I was unable to find the base git > revision in the repository (maybe my lack of git-fu.) phk wasn't happy with the patch at the time for two reasons, IIUC: - he didn't like the generator for the base64_tables.h to be done in python (which I didn't quite understand at the time because he uses python scripts to generate such things all over the place) - he had asked me to clean up the digest functions code to use algorithm-specific function pointers rather than the case statements Because I had no resources to finish these two things at the time, the module did not make it into any official repo. I am using the code in production, but not on current versions of varnish. I will definitely need a Varnish 3-compatible digest module by the end of 2011, so I will come back to this. If anyone else wants to finish this, feel free. 
Nils From jbq at caraldi.com Wed Sep 7 19:33:35 2011 From: jbq at caraldi.com (Jean-Baptiste Quenot) Date: Wed, 7 Sep 2011 21:33:35 +0200 Subject: varnishncsa outage In-Reply-To: <87sjoavwh4.fsf@qurzaw.varnish-software.com> References: <4A029B1A60B8E340A50D654D2F130DAA2FE8F1DC6F@EXCV001.encara.local.ads> <87wrdn5vcg.fsf@qurzaw.varnish-software.com> <87sjoavwh4.fsf@qurzaw.varnish-software.com> Message-ID: 2011/9/6 Tollef Fog Heen : > > Can you please capture varnishlog from a request which causes the crash? > The backtrace above does unfortunately not help me. I'm sorry, I can't afford to crash varnishncsa anymore. My boss is getting angry with holes in the monitoring graphs. > | Please find attached a patch that checks for mandatory fields. > | Curiously the code was there but disabled. > > That code isn't correct with user-specifiable fields, so it should > probably just be removed. What do you mean by user-specifiable fields? Since I did that change (and the broken header patch) varnishncsa did not crash at all. All the best, -- Jean-Baptiste Quenot From tfheen at varnish-software.com Thu Sep 8 07:11:07 2011 From: tfheen at varnish-software.com (Tollef Fog Heen) Date: Thu, 08 Sep 2011 09:11:07 +0200 Subject: varnishncsa outage In-Reply-To: (Jean-Baptiste Quenot's message of "Wed, 7 Sep 2011 21:33:35 +0200") References: <4A029B1A60B8E340A50D654D2F130DAA2FE8F1DC6F@EXCV001.encara.local.ads> <87wrdn5vcg.fsf@qurzaw.varnish-software.com> <87sjoavwh4.fsf@qurzaw.varnish-software.com> Message-ID: <87bouv1pbo.fsf@qurzaw.varnish-software.com> ]] Jean-Baptiste Quenot | 2011/9/6 Tollef Fog Heen : | > | > Can you please capture varnishlog from a request which causes the crash? | > The backtrace above does unfortunately not help me. | | I'm sorry I can't afford to crash varnishncsa anymore. 
My boss is | getting angry with holes in the monitoring graphs. You can run two varnishncsas side by side, one patched and one unpatched. | > | Please find attached a patch that checks for mandatory fields. | > | Curiously the code was there but disabled. | > | > That code isn't correct with user-specifiable fields, so it should | > probably just be removed. | | What do you mean by user-specifiable fields? Since I did that | change (and the broken header patch) varnishncsa did not crash at all. Using the -F format. -- Tollef Fog Heen Varnish Software t: +47 21 98 92 64 From jbq at caraldi.com Thu Sep 8 07:58:32 2011 From: jbq at caraldi.com (Jean-Baptiste Quenot) Date: Thu, 8 Sep 2011 09:58:32 +0200 Subject: varnishncsa outage In-Reply-To: <87bouv1pbo.fsf@qurzaw.varnish-software.com> References: <4A029B1A60B8E340A50D654D2F130DAA2FE8F1DC6F@EXCV001.encara.local.ads> <87wrdn5vcg.fsf@qurzaw.varnish-software.com> <87sjoavwh4.fsf@qurzaw.varnish-software.com> <87bouv1pbo.fsf@qurzaw.varnish-software.com> Message-ID: 2011/9/8 Tollef Fog Heen : > | > That code isn't correct with user-specifiable fields, so it should > | > probably just be removed. > | > | What do you mean by user-specifiable fields? Since I did that > | change (and the broken header patch) varnishncsa did not crash at all. > > Using the -F format. I do use the -F command-line option. Usually when user-specified fields are missing, they are replaced with a hyphen: "-". This is fine, except for very important fields like request method, protocol version, path and status, hence the verification. So I gave it a try: I ran telnet against the varnish HTTP port and typed a slash followed by two newlines. 
Here is the result: 12 SessionOpen c 127.0.0.1 57867 :80 12 Debug c herding 12 SessionClose c junk 12 ReqStart c 127.0.0.1 57867 1750343256 12 HttpGarbage c / 12 ReqEnd c 1750343256 1315468371.652468920 1315468371.652515173 0.640084982 nan nan 12 StatSess c 127.0.0.1 57867 1 1 1 0 0 0 0 0 As you can see there is no RxRequest tag, so df_m is not set. Varnishncsa crashed, as expected. -- Jean-Baptiste Quenot From thomas at prommer.net Thu Sep 15 15:32:27 2011 From: thomas at prommer.net (Thomas Prommer) Date: Thu, 15 Sep 2011 11:32:27 -0400 Subject: Load balancing geographically distributed nodes with varnish - a good idea? Message-ID: Varnish Community, We are managing a cluster farm of 6 nodes that are geographically distributed across Europe (Amsterdam, London, Lisbon, Frankfurt, Zurich, Milan) delivering our internationalized application for the appropriate ccTLD for such nodes. All nodes have the same server image and application deployed. The server distribution is critical to ensure low latency in local markets as well as for SEO reasons. The application is a simple LAMP application (no centralized data) that is using Varnish and Lighttpd FastCGI for optimal scaling. However, we still run into scaling issues where essentially one node gets hit hard with local traffic while all the other servers are pretty idle. Our question is if there is a common recommendation for load balancing a server cluster where the servers are geographically distributed, and also if varnish or the lighttpd fastcgi server would be more appropriate to carry out the load balancing? We know that both systems allow for load balancing but we are concerned that simply load balancing the IPs of geographically distributed servers wouldn't perform too well because an additional round trip to a remote server location would be introduced. In a nutshell, our questions are: Are there any good strategies around load balancing geographically distributed servers? 
What are the evaluation points for deciding if either Varnish or Lighttpd FastCGI would be more appropriate to own the load balancing responsibility? Thanks /Thomas From sky at crucially.net Thu Sep 15 15:54:14 2011 From: sky at crucially.net (Artur Bergman) Date: Thu, 15 Sep 2011 08:54:14 -0700 Subject: Load balancing geographically distributed nodes with varnish - a good idea? In-Reply-To: References: Message-ID: Not really a lighttpd or varnish issue. DNS is really your answer. Also, if you get hit hard in one location, sending traffic somewhere else means you no longer have the latency advantage. On Sep 15, 2011, at 8:32 AM, Thomas Prommer wrote: > Varnish Community, > > We are managing a cluster farm of 6 nodes that are geographically distributed across Europe (Amsterdam, London, Lisbon, Frankfurt, Zurich, Milan) delivering our internationalized application for the appropriate ccTLD for such nodes. All nodes have the same server image and application deployed. The server distribution is critical to ensure low latency in local markets as well as for SEO reasons. > > The application is a simple LAMP application (no centralized data) that is using Varnish and Lighttpd FastCGI for optimal scaling. However, we still run into scaling issues where essentially one node gets hit hard with local traffic while all the other servers are pretty idle. > > Our question is if there is a common recommendation of load balancing a server cluster where the servers are geographically distributed and also if varnish or the lighttpd fastcgi server would be more appropriate to carry out the load balancing? > > We know that both systems allow for load balancing but we are concerned that simply load balancing the IPs of geographically distributed servers wouldn't perform too well because an additional round trip to a remote server location would be introduced. 
> > In a nutshell, our questions are: > > Are there any good strategies around load balancing geographical distributed servers? > What are the evaluation points for deciding if either Varnish or Lighttpd FastCGI would be more appropriate to own the load balancing responsibility? > > Thanks /Thomas > > _______________________________________________ > varnish-dev mailing list > varnish-dev at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From aotto at mosso.com Thu Sep 15 16:10:19 2011 From: aotto at mosso.com (Adrian Otto) Date: Thu, 15 Sep 2011 09:10:19 -0700 Subject: Load balancing geographically distributed nodes with varnish - a good idea? In-Reply-To: References: Message-ID: <6C5F4372-D706-4D51-9DAB-7A7A7C543704@mosso.com> Thomas, I agree with Artur that DNS is a good place to decide where you want to route people on a global basis in a situation where you have a temporary regional overload. That's how most CDN networks solve that issue. You could use an API controlled DNS service like Route 53 from AWS, and when you detect the congestion condition, adjust the DNS records to respond with alternate IP addresses that have adequate capacity available. You will need to use a DNS TTL with a relatively low value, perhaps in the 10 minute range. This should allow you to automatically shed load away from your congested sites. Adrian On Sep 15, 2011, at 8:32 AM, Thomas Prommer wrote: > Varnish Community, > > We are managing a cluster farm of 6 nodes that are geographically distributed across Europe (Amsterdam, London, Lisbon, Frankfurt, Zurich, Milan) delivering our internationalized application for the appropriate CCTLD for such nodes. All nodes have the same server image and application deployed. The server distribution is critical to ensure low latency in local markets as well as for SEO reasons. 
> > The application is a simple LAMP application (no centralized data) that is using Varnish and Lighttpd Fast CGI for optimal scaling. However, we still run into scaling issues were essentially one node gets hit hard with local traffic while all the other severs are pretty idle. > > Our question is if there is a common recommendation of load balancing a server cluster where the servers are geographically distributed and also if varnish or the lighttpd fastcgi server would be more appropriate to carry out the load balancing? > > We know that both systems allow for load balancing but we are concerned that simply load balancing the IPs of geographically servers wouldn't perform too well because an additional round trip to a remote server location would be introduced. > > In a nutshell, our questions are: > > Are there any good strategies around load balancing geographical distributed servers? > What are the evaluation points for deciding if either Varnish or Lighttpd FastCGI would be more appropriate to own the load balancing responsibility? > > Thanks /Thomas > > _______________________________________________ > varnish-dev mailing list > varnish-dev at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From footplus at gmail.com Thu Sep 15 21:02:13 2011
From: footplus at gmail.com (Aurélien)
Date: Thu, 15 Sep 2011 23:02:13 +0200
Subject: Persistent storage question
Message-ID:

Hello,

I'm currently investigating an issue on some caches we are trying to put in production, and I think we'll make a separate post about the whole setup, but I'm currently personally interested in the following messages:

default[18290]: Child (19438) said Out of space in persistent silo
default[18290]: Child (19438) said Committing suicide, restart will make space

These can be triggered in storage_persistent_silo.c, but I'm not exactly clear on why varnish commits "suicide", and how this could be a "normal" condition (exit 0 + auto restart).

We're using one of the latest trunk versions (d56069e), with various persistent storage sizes (tried 3*30G, 1*90G), on a Linux server with 48GB memory. We're caching relatively big files (avg size: ~25 MB), and they have a long expiry time (~1 year).

Also, the document I found, https://www.varnish-cache.org/trac/wiki/ArchitecturePersistentStorage, does not exactly explain if/how the segments are reused (or I did not understand it).

What is the reason and intent behind this restart? Are the cache contents lost in this case? Could this be caused by a certain workflow or configuration?

Thanks, Best regards, -- Aurélien Guillaume

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From mailing.lists.wam at gmail.com Fri Sep 16 08:25:47 2011
From: mailing.lists.wam at gmail.com (mailing.lists.wam mailing.lists.wam)
Date: Fri, 16 Sep 2011 10:25:47 +0200
Subject: Persistent storage question
In-Reply-To: References: Message-ID:

I would like to add some details to this case: we encounter various Varnish panics (the forked process crashes, won't restart, and nothing listens on port 80 anymore) with persistent storage (tested with 20/35/40/90G) and kernel address randomization both on and off.
Same servers with file/malloc storage instead of persistent are healthy. Feel free to contact me to get the full coredump. All details below :)

1) System:

Varnish version: 3 - trunk d56069e Sep 06, 2011 (d56069e8ef221310d75455feb9b03483c9caf63b)
Ubuntu 10.04 64 bits, Linux 2.6.32-33-generic #72-Ubuntu SMP Fri Jul 29 21:07:13 UTC 2011 x86_64 GNU/Linux
48G RAM / two Intel(R) Xeon(R) CPU L5640 @ 2.27GHz
SSD-SATA 90G

2) Startup config:

VARNISH_INSTANCE=default
START=yes
NFILES="131072"
MEMLOCK="82000"
VARNISH_VCL_CONF=/etc/varnish/default/default.vcl
VARNISH_LISTEN_ADDRESS=
VARNISH_LISTEN_PORT=80
VARNISH_ADMIN_LISTEN_ADDRESS=127.0.0.1
VARNISH_ADMIN_LISTEN_PORT=6082
VARNISH_SECRET_FILE=/etc/varnish/default/secret
VARNISH_THREAD_POOLS=12
VARNISH_STORAGE_FILE_1=/mnt/ssd/varnish/cachefile1
VARNISH_STORAGE_SIZE=30G
VARNISH_STORAGE_1="persistent,${VARNISH_STORAGE_FILE_1},${VARNISH_STORAGE_SIZE}"
DAEMON_OPTS=" -n ${VARNISH_INSTANCE} \
 -u root \
 -a ${VARNISH_LISTEN_ADDRESS}:${VARNISH_LISTEN_PORT} \
 -f ${VARNISH_VCL_CONF} \
 -T ${VARNISH_ADMIN_LISTEN_ADDRESS}:${VARNISH_ADMIN_LISTEN_PORT} \
 -S ${VARNISH_SECRET_FILE} \
 -s ${VARNISH_STORAGE_1} \
 -s Transient=malloc,1G \
 -p first_byte_timeout=5 \
 -p between_bytes_timeout=5 \
 -p pipe_timeout=5 \
 -p send_timeout=2700 \
 -p default_grace=240 \
 -p default_ttl=3600 \
 -p http_gzip_support=off \
 -p http_range_support=on \
 -p max_restarts=2 \
 -p thread_pool_add_delay=2 \
 -p thread_pool_max=4000 \
 -p thread_pool_min=80 \
 -p thread_pool_timeout=120 \
 -p thread_pools=12 \
 -p thread_stats_rate=50"

#### VCL FILE #####

### SECDownMod ### https://github.com/footplus/libvmod-secdown
import secdown;

include "/etc/varnish/backend/director_edge_2xx.vcl";
include "/etc/varnish/acl/purge.vcl";

sub vcl_recv {
    set req.backend = origin;

    if (req.request !~ "(GET|HEAD|PURGE)") {
        error 405 "Not allowed.";
    }

    if (req.url ~ "^/files") {
        set req.url = secdown.check_url(req.url, "MySecretIsNotYourSecret", "/link-expired.html", "/link-error.html");
    }

    # Before anything else we need to fix gzip compression
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|flv|ts|mp4)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } else if (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } else if (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }

    # Allow a PURGE method to clear cache via regular expression.
    if (req.request == "PURGE") {
        # If the client has not an authorized IP or
        # if he comes from the HTTPS proxy on localhost, deny it.
        if (!client.ip ~ purge || req.http.X-Forwarded-For) {
            error 405 "Not allowed.";
        }
        ban_url(req.url);
        error 200 "Expression " + req.url + " added to ban.list.";
    }
}

sub vcl_pipe {
    set bereq.http.connection = "close";
}

sub vcl_pass {
    # return (pass);
}

sub vcl_hash {
    hash_data(req.url);
    return (hash);
}

sub vcl_hit {
    # return (deliver);
}

sub vcl_miss {
    # return (fetch);
}

sub vcl_fetch {
    unset beresp.http.expires;
    set beresp.http.cache-control = "max-age=86400";
    set beresp.ttl = 365d;

    if (beresp.status >= 400) {
        set beresp.ttl = 1m;
    }

    if ((beresp.status == 301) || (beresp.status == 302) || (beresp.status == 401)) {
        return (hit_for_pass);
    }
}

sub vcl_deliver {
    # Rename Varnish XIDs headers
    if (resp.http.X-Varnish) {
        set resp.http.X-Object-ID = resp.http.X-Varnish;
        unset resp.http.X-Varnish;
    }
    remove resp.http.Via;
    remove resp.http.X-Powered-By;
    # return (deliver);
}

sub vcl_error {
    # Do not reveal what's inside the box :)
    remove obj.http.Server;
    set obj.http.Server = "EdgeCache/1.4";
}

sub vcl_init {
    # return (ok);
}

sub vcl_fini {
    # return (ok);
}

3) Assert message (from syslog):

Sep 15 18:21:02 e101 default[18290]: Child (19438) said Out of space in persistent silo
Sep 15 18:21:02 e101 default[18290]: Child (19438) said Committing suicide, restart will make space
Sep 15 18:21:02 e101 default[18290]: Child (19438) ended
Sep 15 18:21:02 e101 default[18290]: Child cleanup complete
Sep 15 18:21:02 e101 default[18290]: child (20924) Started
Sep 15 18:21:02 e101 default[18290]: Child (20924) said Child starts
Sep 15 18:21:02 e101 default[18290]: Child (20924) said Dropped 11 segments to make free_reserve
Sep 15 18:21:02 e101 default[18290]: Child (20924) said Silo completely loaded
Sep 15 18:21:27 e101 default[18290]: Child (20924) died signal=6 (core dumped)
Sep 15 18:21:27 e101 default[18290]: Child (20924) Panic message: Assert error in smp_oc_getobj(), storage_persistent_silo.c line 401:#012 Condition((o)->magic == 0x32851d42) not true.#012thread = (ban-lurker)#012ident = Linux,2.6.32-33-generic,x86_64,-spersistent,-smalloc,-hcritbit,epoll#012Backtrace:#012 0x437e49: pan_backtrace+19#012 0x43811e: pan_ic+1ad#012 0x45da38: smp_oc_getobj+282#012 0x415407: oc_getobj+14c#012 0x417848: ban_lurker_work+299#012 0x41793d: ban_lurker+5b#012 0x43ad91: wrk_bgthread+184#012 0x7ffff6a9c9ca: _end+7ffff6408692#012 0x7ffff67f970d: _end+7ffff61653d5#012
Sep 15 18:21:27 e101 default[18290]: Child cleanup complete
Sep 15 18:21:27 e101 default[18290]: child (21898) Started
Sep 15 18:21:27 e101 default[18290]: Pushing vcls failed: CLI communication error (hdr)
Sep 15 18:21:27 e101 default[18290]: Stopping Child
Sep 15 18:21:27 e101 default[18290]: Child (21898) died signal=6 (core dumped)
Sep 15 18:21:27 e101 default[18290]: Child (21898) Panic message: Assert error in smp_open_segs(), storage_persistent.c line 239:#012 Condition(sg1->p.offset != sg->p.offset) not true.#012thread = (cache-main)#012ident = Linux,2.6.32-33-generic,x86_64,-spersistent,-smalloc,-hcritbit,no_waiter#012Backtrace:#012 0x437e49: pan_backtrace+19#012 0x43811e: pan_ic+1ad#012 0x45a568: smp_open_segs+415#012 0x45ab93: smp_open+236#012 0x456391: STV_open+40#012 0x435fa4: child_main+124#012 0x44d3a7: start_child+36a#012 0x44ddce: mgt_sigchld+3e7#012 0x7ffff7bd1fec: _end+7ffff753dcb4#012 0x7ffff7bd2348: _end+7ffff753e010#012
Sep 15 18:21:27 e101
default[18290]: Child (-1) said Child starts
Sep 15 18:21:27 e101 default[18290]: Child cleanup complete

4) GDB core bt:

(gdb) bt
#0  0x00007ffff6746a75 in raise () from /lib/libc.so.6
#1  0x00007ffff674a5c0 in abort () from /lib/libc.so.6
#2  0x00000000004381dd in pan_ic (func=0x482dd5 "smp_open_segs", file=0x4827c4 "storage_persistent.c", line=239, cond=0x48283f "sg1->p.offset != sg->p.offset", err=0, xxx=0) at cache_panic.c:374
#3  0x000000000045a568 in smp_open_segs (sc=0x7ffff6433000, ctx=0x7ffff6433220) at storage_persistent.c:239
#4  0x000000000045ab93 in smp_open (st=0x7ffff64213c0) at storage_persistent.c:331
#5  0x0000000000456391 in STV_open () at stevedore.c:406
#6  0x0000000000435fa4 in child_main () at cache_main.c:128
#7  0x000000000044d3a7 in start_child (cli=0x0) at mgt_child.c:345
#8  0x000000000044ddce in mgt_sigchld (e=0x7ffff64da1d0, what=-1) at mgt_child.c:524
#9  0x00007ffff7bd1fec in vev_sched_signal (evb=0x7ffff6408380) at vev.c:435
#10 0x00007ffff7bd2348 in vev_schedule_one (evb=0x7ffff6408380) at vev.c:478
#11 0x00007ffff7bd1d2a in vev_schedule (evb=0x7ffff6408380) at vev.c:363
#12 0x000000000044e1c9 in MGT_Run () at mgt_child.c:602
#13 0x0000000000461a64 in main (argc=0, argv=0x7fffffffebd0) at varnishd.c:650

5) Last lines of varnishlog:

221 SessionOpen c 85.93.199.29 58335 :80
234 SessionOpen c 77.196.147.182 2273 :80

2011/9/15 Aurélien

> Hello,
>
> I'm currently investigating an issue on some caches we are trying to put in
> production, and I think we'll make a separate post about the whole setup,
> but i'm currently personnally interested in the following messages:
>
> default[18290]: Child (19438) said Out of space in persistent silo
> default[18290]: Child (19438) said Committing suicide, restart will make
> space
>
> These can be triggered in storage_persistent_silo.c, but I'm not exactly
> clear on why varnish commits "suicide", and how this could be a "normal"
> condition (exit 0 + auto restart).
> We're using one of the latest trunk versions (d56069e), with various
> persistent storage sizes (tried 3*30G, 1*90Gb), on a Linux server with 48Gb
> memory. We're caching relatively big files (avg size: ~25 Mb), and they have
> a long expiry time (~1year).
>
> Also, the document I found,
> https://www.varnish-cache.org/trac/wiki/ArchitecturePersistentStorage,
> does not exactly explain if/how the segments are reused (or I did not
> understand it).
>
> What is the reason and intent behind this restart ? Are the cache contents
> lost in this case ? Could this be caused by a certain workflow or
> configuration ?
>
> Thanks,
> Best regards,
> --
> Aurélien Guillaume
>
> _______________________________________________
> varnish-dev mailing list
> varnish-dev at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From thomas at prommer.net Fri Sep 16 08:53:54 2011
From: thomas at prommer.net (Thomas Prommer)
Date: Fri, 16 Sep 2011 04:53:54 -0400
Subject: Load balancing geographically distributed nodes with varnish - a good idea?
In-Reply-To: References: Message-ID:

Thanks Artur,

Can you provide some more detail, or a reference pointer, on how you would see this being managed at the DNS level? We are currently not managing our own DNS server, nor do we have plans to do so. Are there OOTB DNS solutions that consider the health status of defined nodes and adapt DNS dispatching accordingly?

Coming back to my original question: would using Varnish to load-balance geographically distributed nodes necessarily always be a bad idea, or is it an acceptable and effective practice in high-load scenarios? Naturally we would be willing to sacrifice optimized latency and SEO benefits temporarily in order to resolve CPU and memory peak conditions on one particular node.
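[Editor's note] The DNS-based shedding suggested earlier in this thread boils down to a routing decision: answer each region's query with the nearest node unless that node is congested, then fall back to the nearest node with spare capacity. The following is only an illustrative sketch under assumed names and thresholds — the node data, the 85% cutoff, and the metrics feed are all hypothetical, not part of any product discussed here; a real deployment would push the chosen records to an API-driven DNS service with a low TTL.

```python
SHED_THRESHOLD = 0.85  # assumed: shed traffic away from nodes above 85% utilisation


def pick_node(client_region, nodes):
    """Return the IP to publish in DNS for client_region.

    nodes: list of dicts with "ip", "latency_ms" ({region: ms}),
    "load" and "capacity" (in the same unit, e.g. requests/s).
    """
    def utilisation(node):
        return node["load"] / node["capacity"]

    def latency(node):
        return node["latency_ms"].get(client_region, float("inf"))

    healthy = [n for n in nodes if utilisation(n) < SHED_THRESHOLD]
    # If every node is congested, degrade to the least-loaded one
    # rather than returning no answer at all.
    candidates = healthy or [min(nodes, key=utilisation)]
    return min(candidates, key=latency)["ip"]
```

The DNS layer would re-evaluate this on every health-check cycle and update the records; the record TTL then bounds how long clients keep hitting a node that has been shed.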
Appreciate your input /Thomas On Thu, Sep 15, 2011 at 11:54 AM, Artur Bergman wrote: > Not really a lightthpd or varnish issue. DNS is really you answer. > > Also, if you get hit hard in on location, sending it somewhere else and you > no longer have latency advantage/ > > On Sep 15, 2011, at 8:32 AM, Thomas Prommer wrote: > > Varnish Community, > > We are managing a cluster farm of 6 nodes that are geographically > distributed across Europe (Amsterdam, London, Lisbon, Frankfurt, Zurich, > Milan) delivering our internationalized application for the appropriate > CCTLD for such nodes. All nodes have the same server image and application > deployed. The server distribution is critical to ensure low latency in local > markets as well as for SEO reasons. > > The application is a simple LAMP application (no centralized data) that is > using Varnish and Lighttpd Fast CGI for optimal scaling. However, we still > run into scaling issues were essentially one node gets hit hard with local > traffic while all the other severs are pretty idle. > > Our question is if there is a common recommendation of load balancing a > server cluster where the servers are geographically distributed and also if > varnish or the lighttpd fastcgi server would be more appropriate to carry > out the load balancing? > > We know that both systems allow for load balancing but we are concerned > that simply load balancing the IPs of geographically servers wouldn't > perform too well because an additional round trip to a remote server > location would be introduced. > > In a nutshell, our questions are: > > Are there any good strategies around load balancing geographical > distributed servers? > What are the evaluation points for deciding if either Varnish or Lighttpd > FastCGI would be more appropriate to own the load balancing responsibility? 
> > Thanks /Thomas
> _______________________________________________
> varnish-dev mailing list
> varnish-dev at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From rameshts at rediff.co.in Fri Sep 16 09:47:41 2011
From: rameshts at rediff.co.in (Ramesh T S)
Date: 16 Sep 2011 09:47:41 -0000
Subject: Re: Load balancing geographically distributed nodes with varnish - a good idea?
Message-ID: <1316163369.S.11670.51607.H.WVRob21hcyBQcm9tbWVyAFJlOiBMb2FkIGJhbGFuY2luZyBnZW9ncmFwaGljYWxseSBkaXN0cmlidXRlZCA_.pro-236-98.old.1316166461.18430@webmail.rediffmail.com>

PowerDNS with GeoIP should be a good starting point, plus a Ruby module for service health checks.

From: Thomas Prommer <thomas at prommer.net>
Sent: Fri, 16 Sep 2011 14:26:09
To: Artur Bergman <sky at crucially.net>
Cc: varnish-dev at varnish-cache.org
Subject: Re: Load balancing geographically distributed nodes with varnish - a good idea?

Thanks Artur, Can you provide some more detail/a reference pointer how you would see this being managed on a DNS level? We are currently not managing our our DNS server or have plans to do so. Are there OOTB DNS solutions that consider the health status of defined nodes and adapt DNS dispatching accordingly? Coming back to my original question, would using Varnish to load-balance geographically distributed nodes be necessarily always a bad idea or an acceptable and effective practice in high load scenarios? Naturally we would be willing to sacrifice the loss of optimized latency and SEO benefits temporarily in benefit of resolving cpu & mem peak conditions on one particular node. Appreciate your input /Thomas

On Thu, Sep 15, 2011 at 11:54 AM, Artur Bergman <sky at crucially.net> wrote: Not really a lightthpd or varnish issue. DNS is really you answer.
Also, if you get hit hard in on location, sending it somewhere else and you no longer have latency advantage/ On Sep 15, 2011, at 8:32 AM, Thomas Prommer wrote: Varnish Community,We are managing a cluster farm of 6 nodes that are geographically distributed across Europe (Amsterdam, London, Lisbon, Frankfurt, Zurich, Milan) delivering our internationalized application for the appropriate CCTLD for such nodes. All nodes have the same server image and application deployed. The server distribution is critical to ensure low latency in local markets as well as for SEO reasons. The application is a simple LAMP application (no centralized data) that is using Varnish and Lighttpd Fast CGI for optimal scaling. However, we still run into scaling issues were essentially one node gets hit hard with local traffic while all the other severs are pretty idle. Our question is if there is a common recommendation of load balancing a server cluster where the servers are geographically distributed and also if varnish or the lighttpd fastcgi server would be more appropriate to carry out the load balancing? We know that both systems allow for load balancing but we are concerned that simply load balancing the IPs of geographically servers wouldn't perform too well because an additional round trip to a remote server location would be introduced. In a nutshell, our questions are:Are there any good strategies around load balancing geographical distributed servers? What are the evaluation points for deciding if either Varnish or Lighttpd FastCGI would be more appropriate to own the load balancing responsibility? 
Thanks /Thomas

_______________________________________________
varnish-dev mailing list
varnish-dev at varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From thomas.souvignet at smartjog.com Tue Sep 27 12:42:43 2011
From: thomas.souvignet at smartjog.com (Thomas SOUVIGNET)
Date: Tue, 27 Sep 2011 14:42:43 +0200
Subject: Streaming development
Message-ID: <4E81C4C3.6000109@smartjog.com>

Hi,

I'm working for a company named SmartJog (http://www.smartjog.com). We are currently trying to build a file caching solution for HTTP servers streaming video and audio content. After having benchmarked several caching solutions, we found that Varnish was the best performance-wise. Unfortunately, it seems to lack several features we would need, and we ran into several issues.

Currently, our main problem is that when using a Varnish server in "streaming mode", and a client connects and asks for a file not yet cached, Varnish starts retrieving the file and sending it to the client while filling its cache. But if a second client connects in the meantime and asks for the same file, Varnish waits until it has finished retrieving the whole file before it starts responding to this second client.

Our plan at first is to patch this with the following behavior: if the second client asks for a file that is currently being cached but whose caching isn't finished, we retrieve the file from the origin server (or from the current cache if enough of it is filled) and send it to the second client, except this time we mark it as non-cacheable.

Would you be interested in this patch? If so, are we at least doing it the right way?
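[Editor's note] The behavior proposed above can be modeled in a few lines. This is only a toy sketch with hypothetical names — not Varnish's actual waiting-list code: the first miss fetches the object and fills the cache, while a concurrent request for the same URL, arriving while that fetch is still in flight, bypasses it and gets a private, non-cacheable copy straight from the origin instead of waiting for the full body.

```python
class StreamingCache:
    def __init__(self, origin_fetch):
        self.origin_fetch = origin_fetch  # callable: url -> body
        self.cache = {}                   # url -> cached body
        self.in_flight = set()            # urls currently being fetched

    def get(self, url):
        """Return (body, verdict); verdict is "hit", "miss" or "pass"."""
        if url in self.cache:
            return self.cache[url], "hit"
        if url in self.in_flight:
            # Proposed behaviour: don't queue behind the busy object;
            # fetch an uncacheable copy for this client only.
            return self.origin_fetch(url), "pass"
        self.in_flight.add(url)
        try:
            body = self.origin_fetch(url)
            self.cache[url] = body        # only the first fetch populates the cache
            return body, "miss"
        finally:
            self.in_flight.discard(url)
```

The design choice in the middle branch is the whole patch: trading one extra origin fetch per concurrent client against making the second client wait for the entire body.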
Secondly, as we are going to need to do more development on the streaming part of Varnish, would you be interested in SmartJog contributing to Varnish for this?

Thanks in advance for your answers. Regards, -- Thomas SOUVIGNET, R&E Engineer SmartJog SAS - http://www.smartjog.com - A TDF Group Company Office: 27, blvd Hippolyte Marques 94200 Ivry-sur-Seine - France EU Phone: +33 (0)1 5868 6207

From jon-lists at jfpossibilities.com Wed Sep 28 18:56:51 2011
From: jon-lists at jfpossibilities.com (Jon Foster)
Date: Wed, 28 Sep 2011 11:56:51 -0700
Subject: VSL_OpenStats()
Message-ID: <4E836DF3.9070807@jfpossibilities.com>

I see that the VSL_OpenStats() function was removed from the Varnish API in v3. I got lost trying to follow the Git logs, so can someone enlighten me as to what it's been replaced with? Or let me know where the API documentation is?

THX - Jon -- Jon Foster JF Possibilities, Inc. jon at jfpossibilities.com 541-410-2760 Making computers work for you!

From varnish at bsdchicks.com Thu Sep 29 06:35:38 2011
From: varnish at bsdchicks.com (Rogier R. Mulhuijzen)
Date: Thu, 29 Sep 2011 08:35:38 +0200 (CEST)
Subject: VSL_OpenStats()
In-Reply-To: <4E836DF3.9070807@jfpossibilities.com> References: <4E836DF3.9070807@jfpossibilities.com> Message-ID: <20110929083450.Q37583@ishtar.drwilco.net>

I believe VSC_Open might be what you're looking for? When in doubt, read the varnishstat source.

Cheers, DocWilco

On Wed, 28 Sep 2011, Jon Foster wrote:

> I see that the VSL_OpenStats() function was removed from the Varnish API
> in v3. I got lost trying to follow GIT logs so can someone enlighten me
> as to what its replaced with? Or let me know where the API documentation is?
>
> THX - Jon
>
> --
> Jon Foster
> JF Possibilities, Inc.
> jon at jfpossibilities.com
> 541-410-2760
> Making computers work for you!
> > > > _______________________________________________ > varnish-dev mailing list > varnish-dev at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev > From phk at phk.freebsd.dk Thu Sep 29 11:11:26 2011 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Thu, 29 Sep 2011 11:11:26 +0000 Subject: Streaming development In-Reply-To: Your message of "Tue, 27 Sep 2011 14:42:43 +0200." <4E81C4C3.6000109@smartjog.com> Message-ID: <92788.1317294686@critter.freebsd.dk> In message <4E81C4C3.6000109 at smartjog.com>, Thomas SOUVIGNET writes: >Secondly, as we are going to need to do more developments on the >streaming part of Varnish, would you be interested in SmartJog >contributing to Varnish for this ? Martin at varnish-software.com already has a more complete streaming implementation which will be integrated pretty soon, you should help him test that. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From slink at schokola.de Thu Sep 29 11:44:15 2011 From: slink at schokola.de (Nils Goroll) Date: Thu, 29 Sep 2011 13:44:15 +0200 Subject: [PATCH] solaris sandbox / least privileges overhaul Message-ID: <4E845A0F.7040809@schokola.de> An embedded and charset-unspecified text was scrubbed... Name: 0001-solaris-sandbox-least-privileges-overhaul.patch URL: From thomas.souvignet at smartjog.com Thu Sep 29 12:08:26 2011 From: thomas.souvignet at smartjog.com (Thomas SOUVIGNET) Date: Thu, 29 Sep 2011 14:08:26 +0200 Subject: Streaming development In-Reply-To: <92788.1317294686@critter.freebsd.dk> References: <92788.1317294686@critter.freebsd.dk> Message-ID: <4E845FBA.3080600@smartjog.com> On 09/29/2011 01:11 PM, Poul-Henning Kamp wrote: > Martin at varnish-software.com already has a more complete streaming > implementation which will be integrated pretty soon, you should > help him test that. 
>

Ok, I'll contact him right away, thanks for your answer. -- Thomas SOUVIGNET, R&E Engineer SmartJog SAS - http://www.smartjog.com - A TDF Group Company Office: 27, blvd Hippolyte Marques 94200 Ivry-sur-Seine - France EU Phone: +33 (0)1 5868 6207

From slink at schokola.de Thu Sep 29 17:34:39 2011
From: slink at schokola.de (Nils Goroll)
Date: Thu, 29 Sep 2011 19:34:39 +0200
Subject: fix compiler warnings: [PATCH] solaris sandbox / least privileges overhaul
In-Reply-To: <4E845A0F.7040809@schokola.de> References: <4E845A0F.7040809@schokola.de> Message-ID: <4E84AC2F.3060202@schokola.de>

Looks like it's been too long without varnish and too long without C for me...

-------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: 0001-solaris-sandbox-least-privileges-overhaul.patch URL:

From martin at varnish-software.com Fri Sep 30 11:18:41 2011
From: martin at varnish-software.com (Martin Blix Grydeland)
Date: Fri, 30 Sep 2011 07:18:41 -0400
Subject: Streaming development
In-Reply-To: <4A029B1A60B8E340A50D654D2F130DAA2FEB7D0264@EXCV001.encara.local.ads> References: <4E81C4C3.6000109@smartjog.com> <92788.1317294686@critter.freebsd.dk> <4A029B1A60B8E340A50D654D2F130DAA2FEB7D0264@EXCV001.encara.local.ads> Message-ID:

Hi Thierry,

Thank you for your interest in the streaming bits.

The streaming branch is open and available on my GitHub Varnish repository. It can be accessed at http://github.com/mbgrydeland/varnish-cache-streaming — look at the branch 'streaming' for a master-based one, and streaming-3.0 for one based on the 3.0 tree.

All experiences and feedback are most welcome.

Regards, Martin Blix Grydeland

On Thursday, September 29, 2011, MAGNIEN, Thierry wrote:

> Hi Martin,
>
> I'm Thierry Magnien from French carrier SFR. We met at the Varnish Administration Course in Paris last December. As PHK wrote in an email to the mailing-list, it seems that you have a working implementation of the streaming branch of Varnish.
It seems it will be integrated in version 3.1, but is there a possibility to have access to this branch? As we use Varnish for our CDN, streaming is one of the things we are eagerly waiting for ;-)
>
> We would be happy to test and report bugs and/or performance results that may help you.
>
> Thanks a lot,
> Thierry (thierr1 on IRC)
>
> Thierry MAGNIEN
> DGRE/DT/TPS/PFD
> SFR
> T : 01 70 18 50 61 - M : 06 28 09 90 94 - F : 01 70 18 xx xx
> thierry.magnien at sfr.com
> 40-42 quai du Point du Jour
> 92659 Boulogne Billancourt
> www.groupeneufcegetel.fr
> www.sfr.fr

-- Martin Blix Grydeland Varnish Software AS

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From thomas.souvignet at smartjog.com Fri Sep 30 12:22:44 2011
From: thomas.souvignet at smartjog.com (Thomas SOUVIGNET)
Date: Fri, 30 Sep 2011 14:22:44 +0200
Subject: Streaming development
In-Reply-To: References: <4E81C4C3.6000109@smartjog.com> <92788.1317294686@critter.freebsd.dk> <4A029B1A60B8E340A50D654D2F130DAA2FEB7D0264@EXCV001.encara.local.ads> Message-ID: <4E85B494.5060509@smartjog.com>

On 09/30/2011 01:18 PM, Martin Blix Grydeland wrote:
> Hi Thierry,
>
> Thank you for your interest in the streaming bits.
>
> The streaming branch is open and available on my github Varnish
> repository. It can be accessed at
> http://github.com/mbgrydeland/varnish-cache-streaming
> Look at the branch 'streaming' for a master based one, and
> streaming-3.0 for one based on the 3.0 tree.

Thanks a lot, it looks really promising. I'll get back to you as soon as we have run our tests.
-- Thomas SOUVIGNET, R&E Engineer SmartJog SAS - http://www.smartjog.com - A TDF Group Company Office: 27, blvd Hippolyte Marques 94200 Ivry-sur-Seine - France EU Phone: +33 (0)1 5868 6207 From phk at phk.freebsd.dk Fri Sep 30 13:38:38 2011 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Fri, 30 Sep 2011 13:38:38 +0000 Subject: fix compiler warnings: [PATCH] solaris sandbox / least privileges overhaul In-Reply-To: Your message of "Thu, 29 Sep 2011 19:34:39 +0200." <4E84AC2F.3060202@schokola.de> Message-ID: <28005.1317389918@critter.freebsd.dk> Committed. I put the solaris sandbox in its own sourcefile for clarity, yell at me (and send patches) until it works :-) -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk at FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.