From info at dubistmeinheld.de Thu Jun 1 11:51:00 2017 From: info at dubistmeinheld.de (info at dubistmeinheld.de) Date: Thu, 1 Jun 2017 13:51:00 +0200 Subject: Can't create more than 494 threads Message-ID: Hi, I've been using Varnish for a couple of years and I'm very satisfied. Thank you for the work! Recently I set up a new server with Varnish 5.0. For some reason, threads_created is stuck at 494 and threads_failed increases every second, even with no load (-s malloc,2G -p thread_pools=2 -p thread_pool_min=250 -p thread_pool_max=2000 -p thread_pool_fail_delay=2). MAIN.threads 494 . Total number of threads MAIN.threads_limited 0 0.00 Threads hit max MAIN.threads_created 494 0.01 Threads created MAIN.threads_destroyed 0 0.00 Threads destroyed MAIN.threads_failed 356 0.01 Thread creation failed I have been trying for a couple of days to fix this issue and I'm looking in the area of a kernel parameter. But I'm completely stuck on what's causing it. I would appreciate any help pointing me in the right direction. Best regards, Jens From japrice at gmail.com Thu Jun 1 16:52:42 2017 From: japrice at gmail.com (Jason Price) Date: Thu, 1 Jun 2017 12:52:42 -0400 Subject: Can't create more than 494 threads In-Reply-To: References: Message-ID: dmesg might help. Log files should indicate if you're hitting an 'open files limit' issue... I don't think varnishlog will show the kind of errors you're looking for here. On Thu, Jun 1, 2017 at 7:51 AM, info at dubistmeinheld.de < info at dubistmeinheld.de> wrote: > Hi, > > I've been using Varnish for a couple of years and I'm very satisfied. Thank > you for the work! > > Recently I set up a new server with Varnish 5.0. For some reason, > threads_created is stuck at 494 and threads_failed increases every > second, even with no load (-s malloc,2G -p thread_pools=2 -p > thread_pool_min=250 -p thread_pool_max=2000 -p thread_pool_fail_delay=2). > > MAIN.threads 494 . 
Total number of threads > MAIN.threads_limited 0 0.00 Threads hit max > MAIN.threads_created 494 0.01 Threads created > MAIN.threads_destroyed 0 0.00 Threads destroyed > MAIN.threads_failed 356 0.01 Thread creation failed > > I tried now for a couple of days to fix this issue and I'm looking in > the area of a kernel param. But I'm completely stuck on what's causing it. > > I would appreciate any help and pointing me into the right direction. > > Best regards, > Jens > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > -------------- next part -------------- An HTML attachment was scrubbed... URL: From np.lists at sharphosting.uk Thu Jun 1 21:22:02 2017 From: np.lists at sharphosting.uk (Nigel Peck) Date: Thu, 1 Jun 2017 16:22:02 -0500 Subject: Unexplained Cache MISSes In-Reply-To: References: <211c667a-ce70-6373-c840-4482c159e38c@sharphosting.uk> <35dfe986-72dc-95f5-0319-9d0743aebbe4@sharphosting.uk> <3fdcafb5-4000-3d64-478b-fb60baa9a783@sharphosting.uk> Message-ID: <53cec1b0-0b57-7110-4b78-9ac280eaa782@sharphosting.uk> On 31/05/2017 18:33, Dridi Boukelmoune wrote: > There's no ordering guarantee in the varnishlog output, although they > should likely be ordered since they share the same hash. You'd need to > check the Timestamp records to get a grasp of chronology. Thanks, I'll keep that in mind. I looked at a typical set of entries that I saved. This is not a busy site. All the timestamps are in order. I've included the full log below, based on a search for the ReqURL. There is the PURGE and then the restart that gets a HIT. The restart shows itself as being the 7th hit on that object - X-Cache: HIT (7) - I can't check the VXID because the PURGE entry doesn't include it[1]. And then the next request, 40 minutes later, gets a MISS. All caching on this server is for one week. The next request an hour after that gets HIT(1). 
So all working properly apart from the restart getting a HIT, resulting in the next request getting a MISS instead[2]. It seems clear to me that there is some bug causing a delay on the PURGE going through in some cases (around 10% of purges in my case), so the restart comes back round before the PURGE has completed. The purge completes after the restart. [1] - I'm not sure how I can get the VXID for a purge, since it seems vcl_purge does not have access to the obj it is going to purge. Hopefully the obj.hits being more than 1 or 2 in the restarted hit is evidence enough, and the lack of intervening entries on this non-busy site. [2] - Very noticeable in my case, because I am using Varnish to ensure every request is a cache HIT, even for pages that only get viewed once or twice a week, to improve performance. So I'm monitoring MISSes and seeing this happening. > If it's a bug, it might be one of those hard to reproduce... > > Amazingly enough I never looked at the logs of a purge, maybe ExpKill > could give us a VXID to then check against the hit. If only SomeoneElse(tm) > could spare me the time and look at it themselves and tell us (wink wink=). I'm very happy to help in any way I can. Please let me know anything I can do or information I can provide. I'm no C programmer (web developer/server admin), so can't help out with coding/patching/debugging[3], but anything else I can do, please let me know what you need. Would a cleanly installed server and absolute minimum VCL to reproduce this be useful? You would be welcome to have access to that server, if useful, once I've got it set up and producing the same problem. Nigel [3] - Assuming it's a bug, which for my part I'm convinced it is at this point. 
* << Request >> 266604 - Begin req 266603 rxreq - Timestamp Start: 1495662133.465511 0.000000 0.000000 - Timestamp Req: 1495662133.465511 0.000000 0.000000 - ReqStart xxx.xxx.xxx.xx2 57250 - ReqMethod PURGE - ReqURL /example/url - ReqProtocol HTTP/1.1 - ReqHeader TE: deflate,gzip;q=0.3 - ReqHeader Connection: TE, close - ReqHeader Accept-Encoding: gzip - ReqHeader Host: www.example.com - ReqHeader User-Agent: SuperDuperApps-Cache-Purger/0.1 - ReqHeader X-Forwarded-For: xxx.xxx.xxx.xx2 - VCL_call RECV - ReqHeader X-Processed-By: Melian - VCL_acl MATCH purgers "xxx.xxx.xxx.xx2" - VCL_return purge - VCL_call HASH - VCL_return lookup - VCL_call PURGE - ReqMethod GET - VCL_return restart - Timestamp Restart: 1495662133.465563 0.000052 0.000052 - Link req 266605 restart - End * << Request >> 266605 - Begin req 266604 restart - Timestamp Start: 1495662133.465563 0.000052 0.000000 - ReqStart xxx.xxx.xxx.xx2 57250 - ReqMethod GET - ReqURL /example/url - ReqProtocol HTTP/1.1 - ReqHeader TE: deflate,gzip;q=0.3 - ReqHeader Connection: TE, close - ReqHeader Accept-Encoding: gzip - ReqHeader Host: www.example.com - ReqHeader User-Agent: SuperDuperApps-Cache-Purger/0.1 - ReqHeader X-Forwarded-For: xxx.xxx.xxx.xx2 - ReqHeader X-Processed-By: Melian - VCL_call RECV - VCL_return hash - VCL_call HASH - VCL_return lookup - Hit 132102 - VCL_call HIT - VCL_return deliver - RespProtocol HTTP/1.1 - RespStatus 200 - RespReason OK - RespHeader Date: Wed, 24 May 2017 02:37:14 GMT - RespHeader Server: Apache/2 - RespHeader P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM" - RespHeader Last-Modified: Wed, 24 May 2017 02:37:15 GMT - RespHeader Content-Type: text/html; charset=utf-8 - RespHeader X-Host: www.example.com - RespHeader X-URL: /example/url - RespHeader Cache-Control: max-age=3600 - RespHeader Content-Encoding: gzip - RespHeader Vary: Accept-Encoding - RespHeader X-Varnish: 266605 132102 - RespHeader Age: 68698 - RespHeader Via: 1.1 varnish-v4 - VCL_call DELIVER - RespUnset Age: 
68698 - RespHeader Age: 0 - RespHeader X-Cache: HIT (7) - RespUnset X-Host: www.example.com - RespUnset X-URL: /example/url - RespUnset X-Varnish: 266605 132102 - RespUnset Via: 1.1 varnish-v4 - RespHeader Via: Varnish - VCL_return deliver - Timestamp Process: 1495662133.465618 0.000107 0.000055 - RespHeader Accept-Ranges: bytes - RespHeader Content-Length: 7493 - Debug "RES_MODE 2" - RespHeader Connection: close - Timestamp Resp: 1495662133.465660 0.000149 0.000042 - ReqAcct 225 0 225 396 7493 7889 - End * << Request >> 3017 - Begin req 3016 rxreq - Timestamp Start: 1495664394.000921 0.000000 0.000000 - Timestamp Req: 1495664394.000921 0.000000 0.000000 - ReqStart xxx.xxx.xxx.xx3 45771 - ReqMethod GET - ReqURL /example/url - ReqProtocol HTTP/1.1 - ReqHeader Host: www.example.com - ReqHeader Connection: Keep-alive - ReqHeader Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 - ReqHeader From: googlebot(at)googlebot.com - ReqHeader User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) - ReqHeader Accept-Encoding: gzip,deflate,br - ReqHeader If-Modified-Since: Wed, 24 May 2017 15:16:04 GMT - ReqHeader X-Forwarded-For: xxx.xxx.xxx.xx3 - VCL_call RECV - ReqHeader X-Processed-By: Melian - VCL_return hash - ReqUnset Accept-Encoding: gzip,deflate,br - ReqHeader Accept-Encoding: gzip - VCL_call HASH - VCL_return lookup - VCL_call MISS - VCL_return fetch - Link bereq 3018 fetch - Timestamp Fetch: 1495664394.381188 0.380267 0.380267 - RespProtocol HTTP/1.1 - RespStatus 200 - RespReason OK - RespHeader Date: Wed, 24 May 2017 22:19:54 GMT - RespHeader Server: Apache/2 - RespHeader P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM" - RespHeader Last-Modified: Wed, 24 May 2017 22:19:54 GMT - RespHeader Content-Type: text/html; charset=utf-8 - RespHeader X-Host: www.example.com - RespHeader X-URL: /example/url - RespHeader Cache-Control: max-age=3600 - RespHeader Content-Encoding: gzip - RespHeader Vary: Accept-Encoding - 
RespHeader X-Varnish: 3017 - RespHeader Age: 0 - RespHeader Via: 1.1 varnish-v4 - VCL_call DELIVER - RespHeader X-Cache: MISS - RespUnset X-Host: www.example.com - RespUnset X-URL: /example/url - RespUnset X-Varnish: 3017 - RespUnset Via: 1.1 varnish-v4 - RespHeader Via: Varnish - VCL_return deliver - Timestamp Process: 1495664394.381214 0.380294 0.000026 - RespHeader Accept-Ranges: bytes - RespHeader Transfer-Encoding: chunked - Debug "RES_MODE 8" - RespHeader Connection: keep-alive - Timestamp Resp: 1495664394.396562 0.395641 0.015347 - ReqAcct 409 0 409 404 7493 7897 - End * << Request >> 35821 - Begin req 35820 rxreq - Timestamp Start: 1495668065.207785 0.000000 0.000000 - Timestamp Req: 1495668065.207785 0.000000 0.000000 - ReqStart xxx.xxx.xxx.xx1 33904 - ReqMethod GET - ReqURL /example/url - ReqProtocol HTTP/1.1 - ReqHeader Host: www.example.com - ReqHeader Connection: Keep-alive - ReqHeader Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 - ReqHeader From: googlebot(at)googlebot.com - ReqHeader User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) - ReqHeader Accept-Encoding: gzip,deflate,br - ReqHeader If-Modified-Since: Wed, 24 May 2017 22:19:54 GMT - ReqHeader X-Forwarded-For: xxx.xxx.xxx.xx1 - VCL_call RECV - ReqHeader X-Processed-By: Melian - VCL_return hash - ReqUnset Accept-Encoding: gzip,deflate,br - ReqHeader Accept-Encoding: gzip - VCL_call HASH - VCL_return lookup - Hit 3018 - VCL_call HIT - VCL_return deliver - RespProtocol HTTP/1.1 - RespStatus 200 - RespReason OK - RespHeader Date: Wed, 24 May 2017 22:19:54 GMT - RespHeader Server: Apache/2 - RespHeader P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM" - RespHeader Last-Modified: Wed, 24 May 2017 22:19:54 GMT - RespHeader Content-Type: text/html; charset=utf-8 - RespHeader X-Host: www.example.com - RespHeader X-URL: /example/url - RespHeader Cache-Control: max-age=3600 - RespHeader Content-Encoding: gzip - RespHeader Vary: 
Accept-Encoding - RespHeader X-Varnish: 35821 3018 - RespHeader Age: 3670 - RespHeader Via: 1.1 varnish-v4 - VCL_call DELIVER - RespUnset Age: 3670 - RespHeader Age: 0 - RespHeader X-Cache: HIT (1) - RespUnset X-Host: www.example.com - RespUnset X-URL: /example/url - RespUnset X-Varnish: 35821 3018 - RespUnset Via: 1.1 varnish-v4 - RespHeader Via: Varnish - VCL_return deliver - Timestamp Process: 1495668065.207927 0.000142 0.000142 - RespProtocol HTTP/1.1 - RespStatus 304 - RespReason Not Modified - RespReason Not Modified - Debug "RES_MODE 0" - RespHeader Connection: keep-alive - Timestamp Resp: 1495668065.208002 0.000217 0.000075 - ReqAcct 409 0 409 367 0 367 - End From np.lists at sharphosting.uk Thu Jun 1 21:29:57 2017 From: np.lists at sharphosting.uk (Nigel Peck) Date: Thu, 1 Jun 2017 16:29:57 -0500 Subject: Unexplained Cache MISSes In-Reply-To: References: <211c667a-ce70-6373-c840-4482c159e38c@sharphosting.uk> <35dfe986-72dc-95f5-0319-9d0743aebbe4@sharphosting.uk> Message-ID: <3a9c36a9-d503-0bde-802b-b5dcf15b3e3f@sharphosting.uk> On 31/05/2017 18:21, Dridi Boukelmoune wrote: > On Wed, May 31, 2017 at 6:25 PM, Guillaume Quintard >> I got that idea too, but the HIT after the purge return an object with a >> large age. > > The age is something that could come from the backend. Does the VXID > match the one that was just purged when a restart gets a hit? As mentioned in the other email, a purge log entry does not include the VXID as far as I can see, and "obj" is not available in vcl_purge either. I can say that in my case there is definitely no Age header coming from the back-end. Also as shown in the example I sent it is the 7th HIT on that object. Nigel From info at dubistmeinheld.de Fri Jun 2 08:49:06 2017 From: info at dubistmeinheld.de (info at dubistmeinheld.de) Date: Fri, 2 Jun 2017 10:49:06 +0200 Subject: Can't create more than 494 threads In-Reply-To: References: Message-ID: On 01.06.2017 18:52, Jason Price wrote: > dimesg might help. 
log files should indicate if you're in an 'open > files limit' issue... You pointed me in the right direction regarding log files. Thanks! From syslog (this error only shows up when starting Varnish, then it's omitted): kernel: [84513.627267] cgroup: fork rejected by pids controller in /system.slice/varnish.service Which led me to the solution of changing the systemd settings: https://www.novell.com/support/kb/doc.php?id=7018594 In the future this may affect other distros besides openSUSE, too. Happy again! > On Thu, Jun 1, 2017 at 7:51 AM, info at dubistmeinheld.de > > wrote: > > Hi, > > I'm using Varnish since a couple of years and I'm very satisfied. Thank > you for the work! > > Recently I setup a new server with Varnish 5.0. For some reasons, > threads_created is stuck at 494 and threads_failed increase every > second, even with no load (-s malloc,2G -p thread_pools=2 -p > thread_pool_min=250 -p thread_pool_max=2000 -p > thread_pool_fail_delay=2). > > MAIN.threads 494 . Total number of threads > MAIN.threads_limited 0 0.00 Threads hit max > MAIN.threads_created 494 0.01 Threads created > MAIN.threads_destroyed 0 0.00 Threads destroyed > MAIN.threads_failed 356 0.01 Thread creation failed > > I tried now for a couple of days to fix this issue and I'm looking in > the area of a kernel param. But I'm completely stuck on what's > causing it. > > I would appreciate any help and pointing me into the right direction. > > Best regards, > Jens > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > > > From joh.hendriks at gmail.com Fri Jun 2 10:06:18 2017 From: joh.hendriks at gmail.com (Johan Hendriks) Date: Fri, 2 Jun 2017 12:06:18 +0200 Subject: Varnish performance with phpinfo Message-ID: <6fa4576d-d25e-b770-44da-98877379a815@gmail.com> Hello all, First, sorry for the long email. I have a strange issue with varnish. 
At least I think it is strange. We started some tests with Varnish, but we have an issue. I am running varnish 4.1.6 on FreeBSD 11.1-prerelease. Varnish listens on port 82 and Apache on 80; this is just for the tests. We use the following start options. # Varnish varnishd_enable="YES" varnishd_listen="192.168.2.247:82" varnishd_pidfile="/var/run/varnishd.pid" varnishd_storage="default=malloc,2024M" varnishd_config="/usr/local/etc/varnish/default.vcl" varnishd_hash="critbit" varnishd_admin=":6082" varnishncsa_enable="YES" We did a test with a static page and that went fine. The first time we see it is not cached, the second attempt is cached. root at desk:~ # curl -I www.testdomain.nl:82/info.html HTTP/1.1 200 OK Date: Fri, 02 Jun 2017 09:19:52 GMT Last-Modified: Thu, 01 Jun 2017 12:50:37 GMT ETag: "cf4-550e57bc1f812" Content-Length: 3316 Content-Type: text/html cache-control: max-age = 259200 X-Varnish: 2 Age: 0 Via: 1.1 varnish-v4 Server: varnish X-Powered-By: My Varnish X-Cache: MISS Accept-Ranges: bytes Connection: keep-alive root at desk:~ # curl -I www.testdomain.nl:82/info.html HTTP/1.1 200 OK Date: Fri, 02 Jun 2017 09:19:52 GMT Last-Modified: Thu, 01 Jun 2017 12:50:37 GMT ETag: "cf4-550e57bc1f812" Content-Length: 3316 Content-Type: text/html cache-control: max-age = 259200 X-Varnish: 5 3 Age: 6 Via: 1.1 varnish-v4 Server: varnish X-Powered-By: My Varnish X-Cache: HIT Accept-Ranges: bytes Connection: keep-alive If I benchmark the server I get the following. The first run is directly to Apache: root at testserver:~ # bombardier -c400 -n10000 http://www.testdomain.nl/info.html Bombarding http://www.testdomain.nl/info.html with 10000 requests using 400 connections 10000 / 10000 [=============================================================] 100.00% 0s Done! Statistics Avg Stdev Max Reqs/sec 12459.00 898.32 13301 Latency 31.04ms 25.28ms 280.90ms HTTP codes: 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0 others - 0 Throughput: 42.16MB/s This is via varnish. 
So that works as intended. Varnish does its job and serves the page better: root at testserver:~ # bombardier -c400 -n10000 http://www.testdomain.nl:82/info.html Bombarding http://www.testdomain.nl:82/info.html with 10000 requests using 400 connections 10000 / 10000 [=============================================================] 100.00% 0s Done! Statistics Avg Stdev Max Reqs/sec 19549.00 7649.32 24313 Latency 17.90ms 66.77ms 485.07ms HTTP codes: 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0 others - 0 Throughput: 71.58MB/s The next one is against an info.php file, which runs phpinfo(). So first against the server without Varnish: root at testserver:~ # bombardier -c400 -n10000 http://www.testdomain.nl/info.php Bombarding http://www.testdomain.nl/info.php with 10000 requests using 400 connections 10000 / 10000 [============================================================] 100.00% 11s Done! Statistics Avg Stdev Max Reqs/sec 828.00 127.66 1010 Latency 472.10ms 59.10ms 740.43ms HTTP codes: 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0 others - 0 Throughput: 75.51MB/s Then against the server with Varnish. First we make sure it is in cache: root at desk:~ # curl -I www.testdomain.nl:82/info.php HTTP/1.1 200 OK Date: Fri, 02 Jun 2017 09:36:16 GMT Content-Type: text/html; charset=UTF-8 cache-control: max-age = 259200 X-Varnish: 7 Age: 0 Via: 1.1 varnish-v4 Server: varnish X-Powered-By: My Varnish X-Cache: MISS Accept-Ranges: bytes Connection: keep-alive root at desk:~ # curl -I www.testdomain.nl:82/info.php HTTP/1.1 200 OK Date: Fri, 02 Jun 2017 09:36:16 GMT Content-Type: text/html; charset=UTF-8 cache-control: max-age = 259200 X-Varnish: 10 8 Age: 2 Via: 1.1 varnish-v4 Server: varnish X-Powered-By: My Varnish X-Cache: HIT Accept-Ranges: bytes Connection: keep-alive So it is in cache now. 
root at testserver:~ # bombardier -c400 -n10000 http://www.testdomain.nl:82/info.php Bombarding http://www.testdomain.nl:82/info.php with 10000 requests using 400 connections 10000 / 10000 [===========================================================================================================================================================================================================] 100.00% 8s Done! Statistics Avg Stdev Max Reqs/sec 1179.00 230.77 1981 Latency 219.94ms 340.29ms 2.00s HTTP codes: 1xx - 0, 2xx - 9938, 3xx - 0, 4xx - 0, 5xx - 0 others - 62 Errors: dialing to the given TCP address timed out - 62 Throughput: 83.16MB/s I expected this to be much more in favour of varnish, but it even generated some errors! Time taken is lower but I expected it to be much faster. Also the 62 errors is not good i guess. I do see the following with varnish log * << Request >> 11141123 - Begin req 1310723 rxreq - Timestamp Start: 1496396250.098654 0.000000 0.000000 - Timestamp Req: 1496396250.098654 0.000000 0.000000 - ReqStart 192.168.2.39 14818 - ReqMethod GET - ReqURL /info.php - ReqProtocol HTTP/1.1 - ReqHeader User-Agent: fasthttp - ReqHeader Host: www.testdomain.nl:82 - ReqHeader X-Forwarded-For: 192.168.2.39 - VCL_call RECV - ReqUnset X-Forwarded-For: 192.168.2.39 - ReqHeader X-Forwarded-For: 192.168.2.39, 192.168.2.39 - VCL_return hash - VCL_call HASH - VCL_return lookup - Hit 8 - VCL_call HIT - VCL_return deliver - RespProtocol HTTP/1.1 - RespStatus 200 - RespReason OK - RespHeader Date: Fri, 02 Jun 2017 09:36:16 GMT - RespHeader Server: Apache/2.4.25 (FreeBSD) OpenSSL/1.0.2l - RespHeader X-Powered-By: PHP/7.0.19 - RespHeader Content-Type: text/html; charset=UTF-8 - RespHeader cache-control: max-age = 259200 - RespHeader X-Varnish: 11141123 8 - RespHeader Age: 73 - RespHeader Via: 1.1 varnish-v4 - VCL_call DELIVER - RespUnset Server: Apache/2.4.25 (FreeBSD) OpenSSL/1.0.2l - RespHeader Server: varnish - RespUnset X-Powered-By: PHP/7.0.19 - RespHeader 
X-Powered-By: My Varnish - RespHeader X-Cache: HIT - VCL_return deliver - Timestamp Process: 1496396250.098712 0.000058 0.000058 - RespHeader Accept-Ranges: bytes - RespHeader Content-Length: 95200 - Debug "RES_MODE 2" - RespHeader Connection: keep-alive *- Debug "Hit idle send timeout, wrote = 89972/95508; retrying"** **- Debug "Write error, retval = -1, len = 5536, errno = Resource temporarily unavailable"* - Timestamp Resp: 1496396371.131526 121.032872 121.032814 - ReqAcct 82 0 82 308 95200 95508 - End Sometimes I see this Debug line also - *Debug "Write error, retval = -1, len = 95563, errno = Broken pipe"* I also installed varnish 5.1.2 but the results are the same. Is there something I miss? My vcl file is pretty basic. https://pastebin.com/rbb42x7h Thanks all for your time. regards Johan -------------- next part -------------- An HTML attachment was scrubbed... URL: From lagged at gmail.com Fri Jun 2 11:18:45 2017 From: lagged at gmail.com (Andrei) Date: Fri, 2 Jun 2017 06:18:45 -0500 Subject: Can't create more than 494 threads In-Reply-To: References: Message-ID: Good catch. Thanks for the details! On Fri, Jun 2, 2017 at 3:49 AM, info at dubistmeinheld.de < info at dubistmeinheld.de> wrote: > On 01.06.2017 18:52, Jason Price wrote: > > dimesg might help. log files should indicate if you're in an 'open > > files limit' issue... > > You pointed me in the right direction regarding log files. Thanks! > > From syslog (only shown when starting varnish this error comes up, then > it's omitted): > kernel: [84513.627267] cgroup: > fork rejected by pids controller in /system.slice/varnish.service > > Which led me to the solution to change systemd settings: > https://www.novell.com/support/kb/doc.php?id=7018594 > > This may also affect not only openSuSE distros in the future. Happy again! > > > On Thu, Jun 1, 2017 at 7:51 AM, info at dubistmeinheld.de > > > > wrote: > > > > Hi, > > > > I'm using Varnish since a couple of years and I'm very satisfied. 
> Thank > > you for the work! > > > > Recently I setup a new server with Varnish 5.0. For some reasons, > > threads_created is stuck at 494 and threads_failed increase every > > second, even with no load (-s malloc,2G -p thread_pools=2 -p > > thread_pool_min=250 -p thread_pool_max=2000 -p > > thread_pool_fail_delay=2). > > > > MAIN.threads 494 . Total number of > threads > > MAIN.threads_limited 0 0.00 Threads hit max > > MAIN.threads_created 494 0.01 Threads created > > MAIN.threads_destroyed 0 0.00 Threads destroyed > > MAIN.threads_failed 356 0.01 Thread creation > failed > > > > I tried now for a couple of days to fix this issue and I'm looking in > > the area of a kernel param. But I'm completely stuck on what's > > causing it. > > > > I would appreciate any help and pointing me into the right direction. > > > > Best regards, > > Jens > > > > _______________________________________________ > > varnish-misc mailing list > > varnish-misc at varnish-cache.org cache.org> > > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > > > > > > > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dridi at varni.sh Fri Jun 2 23:08:17 2017 From: dridi at varni.sh (Dridi Boukelmoune) Date: Sat, 3 Jun 2017 01:08:17 +0200 Subject: Unexplained Cache MISSes In-Reply-To: <53cec1b0-0b57-7110-4b78-9ac280eaa782@sharphosting.uk> References: <211c667a-ce70-6373-c840-4482c159e38c@sharphosting.uk> <35dfe986-72dc-95f5-0319-9d0743aebbe4@sharphosting.uk> <3fdcafb5-4000-3d64-478b-fb60baa9a783@sharphosting.uk> <53cec1b0-0b57-7110-4b78-9ac280eaa782@sharphosting.uk> Message-ID: >> Amazingly enough I never looked at the logs of a purge, maybe ExpKill >> could give us a VXID to then check against the hit. 
If only SomeoneElse(tm) >> could spare me the time and look at it themselves and tell us (wink wink=). > > > I'm very happy to help in any way I can. Please let me know anything I can > do or information I can provide. I'm no C programmer (web developer/server > admin), so can't help out with coding/patching/debugging[3], but anything > else I can do, please let me know what you need. Well, luckily I didn't write any C code to find out what purge logs look like. I'm certainly not going to debug code I'm not familiar with ;) I wrote a dummy test case instead: varnishtest "purge logs" server s1 { rxreq expect req.url == "/to-be-purged" txresp } -start varnish v1 -vcl+backend { sub vcl_recv { if (req.method == "PURGE") { return (purge); } } } -start client c1 { txreq -url "/to-be-purged" rxresp txreq -req PURGE -url "/to-be-purged" rxresp txreq -req PURGE -url "/unknown" rxresp } -run And then looked at the logs manually: varnishtest test.vtc -v | grep vsl | less Here's a sample: [...] **** v1 0.4 vsl| 1002 VCL_return b deliver **** v1 0.4 vsl| 1002 Storage b malloc s0 [...] **** v1 0.4 vsl| 0 ExpKill - EXP_When p=0x7f420b027000 e=1496443420.703764200 f=0xe **** v1 0.4 vsl| 0 ExpKill - EXP_expire p=0x7f420b027000 e=-0.000092268 f=0x0 **** v1 0.4 vsl| 0 ExpKill - EXP_Expired x=1002 t=-0 [...] **** v1 0.4 vsl| 1003 ReqMethod c PURGE **** v1 0.4 vsl| 1003 ReqURL c /to-be-purged [...] **** v1 0.4 vsl| 1003 VCL_return c purge **** v1 0.4 vsl| 1003 VCL_call c HASH **** v1 0.4 vsl| 1003 VCL_return c lookup **** v1 0.4 vsl| 1003 VCL_call c PURGE **** v1 0.4 vsl| 1003 VCL_return c synth [...] **** v1 0.4 vsl| 1004 ReqMethod c PURGE **** v1 0.4 vsl| 1004 ReqURL c /unknown [...] **** v1 0.4 vsl| 1004 VCL_return c purge **** v1 0.4 vsl| 1004 VCL_call c HASH **** v1 0.4 vsl| 1004 VCL_return c lookup **** v1 0.4 vsl| 1004 VCL_call c PURGE **** v1 0.4 vsl| 1004 VCL_return c synth [...] The interesting transaction id (VXID) is 1002. 
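Picking such VXIDs out of a captured raw log can be scripted; below is a minimal, hypothetical sketch in Python (only the "ExpKill ... EXP_Expired x=" record shape is taken from the sample above; the helper name and regex are illustrative assumptions, not something Varnish ships):

```python
import re

# Matches expiry-thread records like:
#   0 ExpKill - EXP_Expired x=1002 t=-0
# (record shape copied from the varnishtest sample above; the
# regex itself is an assumption, not part of Varnish)
EXP_EXPIRED = re.compile(r"ExpKill\s+-\s+EXP_Expired\s+x=(\d+)")

def expired_vxids(lines):
    """Return the VXIDs reported as expired in raw-grouped log lines."""
    vxids = []
    for line in lines:
        m = EXP_EXPIRED.search(line)
        if m:
            vxids.append(int(m.group(1)))
    return vxids

# Sample records from the test run above:
sample = [
    "0 ExpKill - EXP_When p=0x7f420b027000 e=1496443420.703764200 f=0xe",
    "0 ExpKill - EXP_expire p=0x7f420b027000 e=-0.000092268 f=0x0",
    "0 ExpKill - EXP_Expired x=1002 t=-0",
]
print(expired_vxids(sample))  # -> [1002]
```

The idea would be to feed it lines captured with raw grouping (varnishlog -g raw) and compare the extracted VXIDs against the id reported in the restarted hit.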
So 1) purge-related logs will only show up with raw grouping in varnishlog (which I find unfortunate but I should have remembered the expiry thread would have been involved) and 2) we don't see in a transaction log how many objects were actually purged (moved to the expiry inbox). The ExpKill records appear before because transactions commit their logs when they finish by default. > Would a cleanly installed server and absolute minimum VCL to reproduce this > be useful? You would be welcome to have access to that server, if useful, > once I've got it set up and producing the same problem. Not yet, at this point we know that we were looking at an incomplete picture so what you need to do is capture raw logs and we will be able to get both a VXID and a timestamp from the ExpKill records (although the timestamp for EXP_expire puzzles me). See man varnishlog to see how to write (-w) and then read (-r) logs to/from a file. When you notice the alleged bug, note the transaction id and write the current logs (with the -d option) so that you can pick up all the interesting bits at rest (instead of doing it on live traffic). > I can say that in my case there is definitely no Age header coming from the > back-end. Also as shown in the example I sent it is the 7th HIT on that > object. Yes, smells like a bug. But before capturing logs, make sure to remove Hash records from the vsl_mask (man varnishd) so we can confirm what's being purged too. I have a theory, a long shot that will only prove how unfamiliar I am with this part of Varnish. Since the purge moves the object to the expiry inbox, it could be that under load the restart may happen before the expiry thread marks it as expired, thus creating a race with the next lookup. 
Cheers, Dridi From info at dubistmeinheld.de Thu Jun 8 13:32:17 2017 From: info at dubistmeinheld.de (info at dubistmeinheld.de) Date: Thu, 8 Jun 2017 15:32:17 +0200 Subject: Stuck with sc_rx_timeout Message-ID: Hi, I'm closely monitoring varnish 5.0 stats. I could fix some issues, but I am stuck on where sc_rx_timeout is coming from. Looking at the documentation, it says "Number of session closes with Error RX_TIMEOUT (Receive timeout)". I do not fully understand this sentence. - Is this a timeout expressing that varnish does not receive an answer within a certain time from the backend? - Do these timeouts happen from time to time and are ok, or is there an issue on the server (code, params?). - If parameters, which ones could be candidates to tune? - Also I have issues with more parameters like sc_req_http10 (please have a look below) and unsure if they are severe. I would be happy if you could point me in some direction. Cheers, Jens # /usr/sbin/varnishstat -1 | grep -i '\(err\|fail\|drop\)' MAIN.sess_drop 0 0.00 Sessions dropped MAIN.sess_fail 0 0.00 Session accept failures MAIN.client_req_400 2 0.00 Client requests received, subject to 400 errors MAIN.client_req_417 0 0.00 Client requests received, subject to 417 errors MAIN.backend_fail 2 0.00 Backend conn. 
failures MAIN.fetch_failed 0 0.00 Fetch failed (all causes) MAIN.fetch_no_thread 0 0.00 Fetch failed (no thread) MAIN.threads_failed 0 0.00 Thread creation failed MAIN.sess_dropped 0 0.00 Sessions dropped for thread MAIN.sess_closed_err 21784 0.30 Session Closed with error MAIN.sc_req_http10 1223 0.02 Session Err REQ_HTTP10 MAIN.sc_rx_bad 0 0.00 Session Err RX_BAD MAIN.sc_rx_body 34 0.00 Session Err RX_BODY MAIN.sc_rx_junk 2 0.00 Session Err RX_JUNK MAIN.sc_rx_overflow 0 0.00 Session Err RX_OVERFLOW MAIN.sc_rx_timeout 20525 0.29 Session Err RX_TIMEOUT MAIN.sc_tx_error 0 0.00 Session Err TX_ERROR MAIN.sc_overload 0 0.00 Session Err OVERLOAD MAIN.sc_pipe_overflow 0 0.00 Session Err PIPE_OVERFLOW MAIN.sc_range_short 0 0.00 Session Err RANGE_SHORT MAIN.esi_errors 0 0.00 ESI parse errors (unlock) SMA.s0.c_fail 0 0.00 Allocator failures SMA.Transient.c_fail 0 0.00 Allocator failures From remofurlanetto at gmail.com Fri Jun 9 20:10:31 2017 From: remofurlanetto at gmail.com (Remo Furlanetto) Date: Fri, 9 Jun 2017 22:10:31 +0200 Subject: varnish daemon as non root - vmods libraries directory Message-ID: Hi, Is there a way to configure Varnish to read the vmod libraries from a different directory? I am asking because of permissions: I have compiled and installed Varnish in a different path using another user (non-root). Everything works when I don't use any import in my VCL file, but when I try to use for example "import std;", I receive an error because the process is trying to read a system directory. 
[docker at localhost varnish]$ /home/docker/varnish/sbin/varnishd -a :29800 -f /home/docker/varnish/etc/default.vcl -T 127.0.0.1:6082 -p thread_pool_min=50 -p thread_pool_max=1000 -S /home/docker/varnish/etc/secret -s malloc,256M -n /home/docker/varnish/tmp -P /home/docker/varnish/run/ varnish.pid Error: Message from VCC-compiler: Could not load VMOD std * File name: libvmod_std.so* * dlerror: /usr/local/lib/varnish/vmods/libvmod_std.so: cannot open shared object file: No such file or directory* ('/home/docker/varnish/etc/default.vcl' Line 3 Pos 8) import std; -------###- Running VCC-compiler failed, exited with 2 VCL compilation failed I am not sure, but I believe that could be possible to configure other folder because when the installation has finished, I saw that the libraries is under a folder "lib" [docker at localhost varnish]$ ls -ltr /home/docker/varnish total 40 drwxrwxr-x 3 docker docker 4096 Jun 9 11:24 include drwxrwxr-x 2 docker docker 4096 Jun 9 11:24 bin drwxr-xr-x 3 docker docker 4096 Jun 9 11:24 var drwxrwxr-x 6 docker docker 4096 Jun 9 11:24 share drwxrwxr-x 4 docker docker 4096 Jun 9 11:24 lib drwxrwxr-x 2 docker docker 4096 Jun 9 11:56 sysconfig drwxrwxr-x 2 docker docker 4096 Jun 9 12:29 etc drwxrwxr-x 2 docker docker 4096 Jun 9 12:29 run drwxrwxr-x 2 docker docker 4096 Jun 9 12:33 sbin drwxrwxr-x 4 docker docker 4096 Jun 9 12:37 tmp [docker at localhost lib]$ find /home/docker/varnish/lib/ /home/docker/varnish/lib/ /home/docker/varnish/lib/pkgconfig /home/docker/varnish/lib/pkgconfig/varnishapi.pc /home/docker/varnish/lib/libvarnishapi.so /home/docker/varnish/lib/libvarnishapi.la /home/docker/varnish/lib/libvarnishapi.so.1.0.6 /home/docker/varnish/lib/varnish /home/docker/varnish/lib/varnish/vmods /home/docker/varnish/lib/varnish/vmods/libvmod_directors.so */home/docker/varnish/lib/varnish/vmods/libvmod_std.so* /home/docker/varnish/lib/varnish/vmods/libvmod_directors.la /home/docker/varnish/lib/varnish/vmods/libvmod_std.la 
/home/docker/varnish/lib/libvarnishapi.so.1

I would appreciate it if someone could help me. Thank you very much.

--
Remo M. Furlanetto
E-mail: remofurlanetto at gmail.com
Phone: (11) 99910-0565

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dridi at varni.sh Fri Jun 9 21:37:54 2017
From: dridi at varni.sh (Dridi Boukelmoune)
Date: Fri, 9 Jun 2017 23:37:54 +0200
Subject: varnish daemon as non root - vmods libraries directory
In-Reply-To: 
References: 
Message-ID: 

On Fri, Jun 9, 2017 at 10:10 PM, Remo Furlanetto wrote:
> Hi,
>
> Is there a way to configure varnish to read the vmod libraries in a
> different directory?

You can use the `from` keyword:

    import <vmod> from "<path>";

There is also a vmod_dir or vmod_path parameter, depending on your
version of Varnish; see man varnishd.

Dridi

From remofurlanetto at gmail.com Fri Jun 9 22:16:26 2017
From: remofurlanetto at gmail.com (Remo Furlanetto)
Date: Sat, 10 Jun 2017 00:16:26 +0200
Subject: varnish daemon as non root - vmods libraries directory
In-Reply-To: 
References: 
Message-ID: 

Hi Dridi,

Thank you for your answer.

I have found a solution. I had to run the "configure" script with
--exec-prefix:

wget https://repo.varnish-cache.org/source/varnish-5.1.2.tar.gz
tar -xvzf varnish-5.1.2.tar.gz
cd varnish-5.1.2
./autogen.sh
./configure --prefix=/home/docker/varnish --exec-prefix=/home/docker/varnish
make
make install

Thank you,
Remo.

On Fri, Jun 9, 2017 at 11:37 PM, Dridi Boukelmoune wrote:
> On Fri, Jun 9, 2017 at 10:10 PM, Remo Furlanetto
> wrote:
> > Hi,
> >
> > Is there a way to configure varnish to read the vmod libraries in a
> > different directory?
>
> You can use the `from` keyword:
>
>     import <vmod> from "<path>";
>
> There is also a vmod_dir or vmod_path parameter, depending on your
> version of Varnish; see man varnishd.
>
> Dridi

--
Remo M. Furlanetto
E-mail: remofurlanetto at gmail.com
Phone: (11) 99910-0565

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dridi at varni.sh Fri Jun 9 22:30:42 2017
From: dridi at varni.sh (Dridi Boukelmoune)
Date: Sat, 10 Jun 2017 00:30:42 +0200
Subject: varnish daemon as non root - vmods libraries directory
In-Reply-To: 
References: 
Message-ID: 

On Sat, Jun 10, 2017 at 12:16 AM, Remo Furlanetto wrote:
> Hi Dridi,
>
> Thank you for your answer.
>
> I have found a solution. I had to run the "configure" script with
> --exec-prefix

Oh yes, that too. Since you mentioned installing to a different prefix
on purpose, I didn't think you needed help in this area.

What's weird is that if you set --prefix but not --exec-prefix, the
latter falls back to the former. Glad to see it worked out anyway.

Cheers

From np.lists at sharphosting.uk Fri Jun 16 18:27:15 2017
From: np.lists at sharphosting.uk (Nigel Peck)
Date: Fri, 16 Jun 2017 13:27:15 -0500
Subject: Unexplained Cache MISSes
In-Reply-To: 
References: <211c667a-ce70-6373-c840-4482c159e38c@sharphosting.uk> <35dfe986-72dc-95f5-0319-9d0743aebbe4@sharphosting.uk> <3fdcafb5-4000-3d64-478b-fb60baa9a783@sharphosting.uk> <53cec1b0-0b57-7110-4b78-9ac280eaa782@sharphosting.uk>
Message-ID: <1a0267d7-8cc4-4a9c-5f0a-9719db34321d@sharphosting.uk>

Sorry for the delay in working on this. I've read your email a few times
now and am still confused! I need to read the man pages suggested but
haven't got to it yet. Will let you know when I make some progress on it.

I'm fixing the issue in the interim here by issuing another GET request
in my cache refresh scripts for any PURGE requests that come back with a
HIT.

Nigel

On 02/06/2017 18:08, Dridi Boukelmoune wrote:
>>> Amazingly enough I never looked at the logs of a purge, maybe ExpKill
>>> could give us a VXID to then check against the hit. If only SomeoneElse(tm)
>>> could spare me the time and look at it themselves and tell us (wink wink=).
>> >> >> I'm very happy to help in any way I can. Please let me know anything I can >> do or information I can provide. I'm no C programmer (web developer/server >> admin), so can't help out with coding/patching/debugging[3], but anything >> else I can do, please let me know what you need. > > Well, luckily I didn't write any C code to find out what purge logs > look like. I'm certainly not going to debug code I'm not familiar with ;) > > I wrote a dummy test case instead: > > varnishtest "purge logs" > > server s1 { > rxreq > expect req.url == "/to-be-purged" > txresp > } -start > > varnish v1 -vcl+backend { > sub vcl_recv { > if (req.method == "PURGE") { > return (purge); > } > } > } -start > > client c1 { > txreq -url "/to-be-purged" > rxresp > > txreq -req PURGE -url "/to-be-purged" > rxresp > > txreq -req PURGE -url "/unknown" > rxresp > } -run > > And then looked at the logs manually: > > varnishtest test.vtc -v | grep vsl | less > > Here's a sample: > > [...] > **** v1 0.4 vsl| 1002 VCL_return b deliver > **** v1 0.4 vsl| 1002 Storage b malloc s0 > [...] > **** v1 0.4 vsl| 0 ExpKill - EXP_When > p=0x7f420b027000 e=1496443420.703764200 f=0xe > **** v1 0.4 vsl| 0 ExpKill - EXP_expire > p=0x7f420b027000 e=-0.000092268 f=0x0 > **** v1 0.4 vsl| 0 ExpKill - EXP_Expired x=1002 t=-0 > [...] > **** v1 0.4 vsl| 1003 ReqMethod c PURGE > **** v1 0.4 vsl| 1003 ReqURL c /to-be-purged > [...] > **** v1 0.4 vsl| 1003 VCL_return c purge > **** v1 0.4 vsl| 1003 VCL_call c HASH > **** v1 0.4 vsl| 1003 VCL_return c lookup > **** v1 0.4 vsl| 1003 VCL_call c PURGE > **** v1 0.4 vsl| 1003 VCL_return c synth > [...] > **** v1 0.4 vsl| 1004 ReqMethod c PURGE > **** v1 0.4 vsl| 1004 ReqURL c /unknown > [...] > **** v1 0.4 vsl| 1004 VCL_return c purge > **** v1 0.4 vsl| 1004 VCL_call c HASH > **** v1 0.4 vsl| 1004 VCL_return c lookup > **** v1 0.4 vsl| 1004 VCL_call c PURGE > **** v1 0.4 vsl| 1004 VCL_return c synth > [...] > > The interesting transaction id (VXID) is 1002. 
> > So 1) purge-related logs will only show up with raw grouping in > varnishlog (which I find unfortunate but I should have remembered the > expiry thread would have been involved) and 2) we don't see in a > transaction log how many objects were actually purged (moved to the > expiry inbox). > > The ExpKill records appear before because transactions commit their > logs when they finish by default. > >> Would a cleanly installed server and absolute minimum VCL to reproduce this >> be useful? You would be welcome to have access to that server, if useful, >> once I've got it set up and producing the same problem. > > Not yet, at this point we know that we were looking at an incomplete > picture so what you need to do is capture raw logs and we will be able > to get both a VXID and a timestamp from the ExpKill records (although > the timestamp for EXP_expire puzzles me). > > See man varnishlog to see how to write (-w) and then read (-r) logs > to/from a file. When you notice the alleged bug, note the transaction > id and write the current logs (with the -d option) so that you can > pick up all the interesting bits at rest (instead of doing it on live > traffic). > >> I can say that in my case there is definitely no Age header coming from the >> back-end. Also as shown in the example I sent it is the 7th HIT on that >> object. > > Yes, smells like a bug. But before capturing logs, make sure to remove > Hash records from the vsl_mask (man varnishd) so we can confirm what's > being purged too. > > I have a theory, a long shot that will only prove how unfamiliar I am > with this part of Varnish. Since the purge moves the object to the > expiry inbox, it could be that under load the restart may happen > before the expiry thread marks it as expired, thus creating a race > with the next lookup. 
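(As a reference point for the purge-then-restart pattern discussed throughout this thread, a minimal VCL sketch — stock subroutine names and return actions, but not the actual VCL of any poster — could look like this:)

```vcl
vcl 4.0;

backend default {
    # hypothetical backend; adjust to your setup
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    if (req.method == "PURGE") {
        # ACL checks omitted for brevity; real configs should restrict PURGE
        return (purge);
    }
}

sub vcl_purge {
    # refresh-on-purge: once the object is purged, restart the
    # transaction as a GET so the cache is immediately repopulated
    set req.method = "GET";
    return (restart);
}
```

With something like this in place, a single PURGE both evicts the object and fetches a fresh copy, which is the behaviour the cache refresh scripts in this thread appear to rely on.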
> > Cheers, > Dridi > From np.lists at sharphosting.uk Fri Jun 16 19:09:40 2017 From: np.lists at sharphosting.uk (Nigel Peck) Date: Fri, 16 Jun 2017 14:09:40 -0500 Subject: Unexplained Cache MISSes In-Reply-To: <1a0267d7-8cc4-4a9c-5f0a-9719db34321d@sharphosting.uk> References: <211c667a-ce70-6373-c840-4482c159e38c@sharphosting.uk> <35dfe986-72dc-95f5-0319-9d0743aebbe4@sharphosting.uk> <3fdcafb5-4000-3d64-478b-fb60baa9a783@sharphosting.uk> <53cec1b0-0b57-7110-4b78-9ac280eaa782@sharphosting.uk> <1a0267d7-8cc4-4a9c-5f0a-9719db34321d@sharphosting.uk> Message-ID: <499ebc8d-e952-571c-e378-0fe092c6c709@sharphosting.uk> Here's an interesting thing about this. When I refreshed the cache just now (PURGE) for 204 URLs, 78 of them were a HIT instead of a MISS. All had been in the cache for 9 hours at least. (a re-issued GET request received a MISS for all 78) When I immediately issued a PURGE again a few seconds later for all 204 URLs, every one of them was a MISS and purged successfully. I did it again a few seconds after that, and again all good. Same again a few minutes after that. No HITs. So this seems to be in some way related to how long the objects have been in the cache. Nigel On 16/06/2017 13:27, Nigel Peck wrote: > > Sorry for the delay on working on this. I've read your email a few times > now and am still confused! I need to read the man pages suggested but > haven't got to it yet. Will let you know when I make some progress on it. > > I'm fixing the issue in the interim here by issuing another GET request > in my cache refresh scripts for any PURGE requests that come back with a > HIT. > > Nigel > > On 02/06/2017 18:08, Dridi Boukelmoune wrote: >>>> Amazingly enough I never looked at the logs of a purge, maybe ExpKill >>>> could give us a VXID to then check against the hit. If only >>>> SomeoneElse(tm) >>>> could spare me the time and look at it themselves and tell us (wink >>>> wink=). >>> >>> >>> I'm very happy to help in any way I can. 
Please let me know anything >>> I can >>> do or information I can provide. I'm no C programmer (web >>> developer/server >>> admin), so can't help out with coding/patching/debugging[3], but >>> anything >>> else I can do, please let me know what you need. >> >> Well, luckily I didn't write any C code to find out what purge logs >> look like. I'm certainly not going to debug code I'm not familiar with ;) >> >> I wrote a dummy test case instead: >> >> varnishtest "purge logs" >> >> server s1 { >> rxreq >> expect req.url == "/to-be-purged" >> txresp >> } -start >> >> varnish v1 -vcl+backend { >> sub vcl_recv { >> if (req.method == "PURGE") { >> return (purge); >> } >> } >> } -start >> >> client c1 { >> txreq -url "/to-be-purged" >> rxresp >> >> txreq -req PURGE -url "/to-be-purged" >> rxresp >> >> txreq -req PURGE -url "/unknown" >> rxresp >> } -run >> >> And then looked at the logs manually: >> >> varnishtest test.vtc -v | grep vsl | less >> >> Here's a sample: >> >> [...] >> **** v1 0.4 vsl| 1002 VCL_return b deliver >> **** v1 0.4 vsl| 1002 Storage b malloc s0 >> [...] >> **** v1 0.4 vsl| 0 ExpKill - EXP_When >> p=0x7f420b027000 e=1496443420.703764200 f=0xe >> **** v1 0.4 vsl| 0 ExpKill - EXP_expire >> p=0x7f420b027000 e=-0.000092268 f=0x0 >> **** v1 0.4 vsl| 0 ExpKill - EXP_Expired >> x=1002 t=-0 >> [...] >> **** v1 0.4 vsl| 1003 ReqMethod c PURGE >> **** v1 0.4 vsl| 1003 ReqURL c /to-be-purged >> [...] >> **** v1 0.4 vsl| 1003 VCL_return c purge >> **** v1 0.4 vsl| 1003 VCL_call c HASH >> **** v1 0.4 vsl| 1003 VCL_return c lookup >> **** v1 0.4 vsl| 1003 VCL_call c PURGE >> **** v1 0.4 vsl| 1003 VCL_return c synth >> [...] >> **** v1 0.4 vsl| 1004 ReqMethod c PURGE >> **** v1 0.4 vsl| 1004 ReqURL c /unknown >> [...] >> **** v1 0.4 vsl| 1004 VCL_return c purge >> **** v1 0.4 vsl| 1004 VCL_call c HASH >> **** v1 0.4 vsl| 1004 VCL_return c lookup >> **** v1 0.4 vsl| 1004 VCL_call c PURGE >> **** v1 0.4 vsl| 1004 VCL_return c synth >> [...] 
>> >> The interesting transaction id (VXID) is 1002. >> >> So 1) purge-related logs will only show up with raw grouping in >> varnishlog (which I find unfortunate but I should have remembered the >> expiry thread would have been involved) and 2) we don't see in a >> transaction log how many objects were actually purged (moved to the >> expiry inbox). >> >> The ExpKill records appear before because transactions commit their >> logs when they finish by default. >> >>> Would a cleanly installed server and absolute minimum VCL to >>> reproduce this >>> be useful? You would be welcome to have access to that server, if >>> useful, >>> once I've got it set up and producing the same problem. >> >> Not yet, at this point we know that we were looking at an incomplete >> picture so what you need to do is capture raw logs and we will be able >> to get both a VXID and a timestamp from the ExpKill records (although >> the timestamp for EXP_expire puzzles me). >> >> See man varnishlog to see how to write (-w) and then read (-r) logs >> to/from a file. When you notice the alleged bug, note the transaction >> id and write the current logs (with the -d option) so that you can >> pick up all the interesting bits at rest (instead of doing it on live >> traffic). >> >>> I can say that in my case there is definitely no Age header coming >>> from the >>> back-end. Also as shown in the example I sent it is the 7th HIT on that >>> object. >> >> Yes, smells like a bug. But before capturing logs, make sure to remove >> Hash records from the vsl_mask (man varnishd) so we can confirm what's >> being purged too. >> >> I have a theory, a long shot that will only prove how unfamiliar I am >> with this part of Varnish. Since the purge moves the object to the >> expiry inbox, it could be that under load the restart may happen >> before the expiry thread marks it as expired, thus creating a race >> with the next lookup. 
>> Cheers,
>> Dridi
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc

From r at roze.lv Wed Jun 21 12:23:42 2017
From: r at roze.lv (Reinis Rozitis)
Date: Wed, 21 Jun 2017 15:23:42 +0300
Subject: Assert error in http1_minimal_response(), http1/cache_http1_fsm.c line 234
Message-ID: <509EFFB6DD2147BE8DCF2FC0A6BD4E3A@MasterPC>

Hello, before making a new issue I wanted to clarify: after upgrading
varnish to 5.1.2 (from 3.x) it sometimes panics with:

Assert error in http1_minimal_response(), http1/cache_http1_fsm.c line 234:
Condition(VTCP_Check(1)) not true.
version = varnish-5.1.2 revision 6ece695, vrt api = 6.0
ident = Linux,4.11.4-1.gcba98ee-default,x86_64,-jnone,-sfile,-sfile,-sfile,-sfile,-smalloc,-hcritbit,epoll
now = 681947.494149 (mono), 1498046899.128715 (real)
Backtrace:
0x43af07: /data/varnish5/sbin/varnishd() [0x43af07]
0x45bfe5: /data/varnish5/sbin/varnishd() [0x45bfe5]
0x45cf72: /data/varnish5/sbin/varnishd() [0x45cf72]
0x454049: /data/varnish5/sbin/varnishd() [0x454049]
0x4544ab: /data/varnish5/sbin/varnishd() [0x4544ab]
0x7fee77d44744: /lib64/libpthread.so.0(+0x8744) [0x7fee77d44744]
0x7fee77a82d3d: /lib64/libc.so.6(clone+0x6d) [0x7fee77a82d3d]
errno = 32 (Broken pipe)

[also full backtrace]

I've checked past issues and there was exactly one match:
https://github.com/varnishcache/varnish-cache/issues/2267
but it is kind of closed with
https://github.com/varnishcache/varnish-cache/commit/a8b453cb432e9717e1a8afab91433aa4294ba27e

Should I add to the existing bug report or create a new one?
rr

From r at roze.lv Wed Jun 21 12:34:13 2017
From: r at roze.lv (Reinis Rozitis)
Date: Wed, 21 Jun 2017 15:34:13 +0300
Subject: Assert error in http1_minimal_response(), http1/cache_http1_fsm.c line 234
In-Reply-To: <509EFFB6DD2147BE8DCF2FC0A6BD4E3A@MasterPC>
References: <509EFFB6DD2147BE8DCF2FC0A6BD4E3A@MasterPC>
Message-ID: <77172C33544E4F7CBC9B79D585638E83@MasterPC>

Also, the fix ( https://github.com/varnishcache/varnish-cache/commit/a8b453cb432e9717e1a8afab91433aa4294ba27e )
itself is a bit odd, since it's within:

#if (defined (__SVR4) && defined (__sun)) || defined (__NetBSD__)

.. but both the original author's environment and mine are Linux (e.g. I
can't see how it actually changes anything regarding the issue).

rr

From guillaume at varnish-software.com Fri Jun 23 08:58:21 2017
From: guillaume at varnish-software.com (Guillaume Quintard)
Date: Fri, 23 Jun 2017 10:58:21 +0200
Subject: Varnish performance with phpinfo
In-Reply-To: <6fa4576d-d25e-b770-44da-98877379a815@gmail.com>
References: <6fa4576d-d25e-b770-44da-98877379a815@gmail.com>
Message-ID: 

Stupid question, but aren't you being limited by your client, or maybe a
firewall?

--
Guillaume Quintard

On Fri, Jun 2, 2017 at 12:06 PM, Johan Hendriks wrote:
> Hello all. First, sorry for the long email.
> I have a strange issue with varnish. At least I think it is strange.
>
> We started some tests with varnish, but we have an issue.
>
> I am running varnish 4.1.6 on FreeBSD 11.1-prerelease, where varnish
> listens on port 82 and apache on 80. This is just for the tests.
> We use the following start options.
>
> # Varnish
> varnishd_enable="YES"
> varnishd_listen="192.168.2.247:82"
> varnishd_pidfile="/var/run/varnishd.pid"
> varnishd_storage="default=malloc,2024M"
> varnishd_config="/usr/local/etc/varnish/default.vcl"
> varnishd_hash="critbit"
> varnishd_admin=":6082"
> varnishncsa_enable="YES"
>
> We did a test with a static page and that went fine. First we see it is
> not cached; the second attempt is cached.
>
> root at desk:~ # curl -I www.testdomain.nl:82/info.html
> HTTP/1.1 200 OK
> Date: Fri, 02 Jun 2017 09:19:52 GMT
> Last-Modified: Thu, 01 Jun 2017 12:50:37 GMT
> ETag: "cf4-550e57bc1f812"
> Content-Length: 3316
> Content-Type: text/html
> cache-control: max-age = 259200
> X-Varnish: 2
> Age: 0
> Via: 1.1 varnish-v4
> Server: varnish
> X-Powered-By: My Varnish
> X-Cache: MISS
> Accept-Ranges: bytes
> Connection: keep-alive
>
> root at desk:~ # curl -I www.testdomain.nl:82/info.html
> HTTP/1.1 200 OK
> Date: Fri, 02 Jun 2017 09:19:52 GMT
> Last-Modified: Thu, 01 Jun 2017 12:50:37 GMT
> ETag: "cf4-550e57bc1f812"
> Content-Length: 3316
> Content-Type: text/html
> cache-control: max-age = 259200
> X-Varnish: 5 3
> Age: 6
> Via: 1.1 varnish-v4
> Server: varnish
> X-Powered-By: My Varnish
> X-Cache: HIT
> Accept-Ranges: bytes
> Connection: keep-alive
>
> If I benchmark the server I get the following.
> The first run is directly to Apache:
>
> root at testserver:~ # bombardier -c400 -n10000 http://www.testdomain.nl/info.html
> Bombarding http://www.testdomain.nl/info.html with 10000 requests using 400 connections
> 10000 / 10000 [=============================================================] 100.00% 0s
> Done!
> Statistics    Avg        Stdev      Max
> Reqs/sec      12459.00   898.32     13301
> Latency       31.04ms    25.28ms    280.90ms
> HTTP codes:
> 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0
> others - 0
> Throughput: 42.16MB/s
>
> This is via varnish. So that works as intended;
> varnish does its job and serves the page better.
>
> root at testserver:~ # bombardier -c400 -n10000 http://www.testdomain.nl:82/info.html
> Bombarding http://www.testdomain.nl:82/info.html with 10000 requests using 400 connections
> 10000 / 10000 [=============================================================] 100.00% 0s
> Done!
> Statistics    Avg        Stdev      Max
> Reqs/sec      19549.00   7649.32    24313
> Latency       17.90ms    66.77ms    485.07ms
> HTTP codes:
> 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0
> others - 0
> Throughput: 71.58MB/s
>
> The next one is against an info.php file, which runs phpinfo();
>
> So first against the server without varnish:
>
> root at testserver:~ # bombardier -c400 -n10000 http://www.testdomain.nl/info.php
> Bombarding http://www.testdomain.nl/info.php with 10000 requests using 400 connections
> 10000 / 10000 [=============================================================] 100.00% 11s
> Done!
> Statistics    Avg        Stdev     Max
> Reqs/sec      828.00     127.66    1010
> Latency       472.10ms   59.10ms   740.43ms
> HTTP codes:
> 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0
> others - 0
> Throughput: 75.51MB/s
>
> But then against the server with varnish.
> So we first make sure it is in cache:
>
> root at desk:~ # curl -I www.testdomain.nl:82/info.php
> HTTP/1.1 200 OK
> Date: Fri, 02 Jun 2017 09:36:16 GMT
> Content-Type: text/html; charset=UTF-8
> cache-control: max-age = 259200
> X-Varnish: 7
> Age: 0
> Via: 1.1 varnish-v4
> Server: varnish
> X-Powered-By: My Varnish
> X-Cache: MISS
> Accept-Ranges: bytes
> Connection: keep-alive
>
> root at desk:~ # curl -I www.testdomain.nl:82/info.php
> HTTP/1.1 200 OK
> Date: Fri, 02 Jun 2017 09:36:16 GMT
> Content-Type: text/html; charset=UTF-8
> cache-control: max-age = 259200
> X-Varnish: 10 8
> Age: 2
> Via: 1.1 varnish-v4
> Server: varnish
> X-Powered-By: My Varnish
> X-Cache: HIT
> Accept-Ranges: bytes
> Connection: keep-alive
>
> So it is in cache now.
> root at testserver:~ # bombardier -c400 -n10000 http://www.testdomain.nl:82/info.php
> Bombarding http://www.testdomain.nl:82/info.php with 10000 requests using 400 connections
> 10000 / 10000 [=============================================================] 100.00% 8s
> Done!
> Statistics    Avg        Stdev      Max
> Reqs/sec      1179.00    230.77     1981
> Latency       219.94ms   340.29ms   2.00s
> HTTP codes:
> 1xx - 0, 2xx - 9938, 3xx - 0, 4xx - 0, 5xx - 0
> others - 62
> Errors:
> dialing to the given TCP address timed out - 62
> Throughput: 83.16MB/s
>
> I expected this to be much more in favour of varnish, but it even
> generated some errors! The time taken is lower, but I expected it to be
> much faster. Also, the 62 errors are not good, I guess.
>
> I do see the following in the varnish log:
> * << Request >> 11141123
> - Begin          req 1310723 rxreq
> - Timestamp      Start: 1496396250.098654 0.000000 0.000000
> - Timestamp      Req: 1496396250.098654 0.000000 0.000000
> - ReqStart       192.168.2.39 14818
> - ReqMethod      GET
> - ReqURL         /info.php
> - ReqProtocol    HTTP/1.1
> - ReqHeader      User-Agent: fasthttp
> - ReqHeader      Host: www.testdomain.nl:82
> - ReqHeader      X-Forwarded-For: 192.168.2.39
> - VCL_call       RECV
> - ReqUnset       X-Forwarded-For: 192.168.2.39
> - ReqHeader      X-Forwarded-For: 192.168.2.39, 192.168.2.39
> - VCL_return     hash
> - VCL_call       HASH
> - VCL_return     lookup
> - Hit            8
> - VCL_call       HIT
> - VCL_return     deliver
> - RespProtocol   HTTP/1.1
> - RespStatus     200
> - RespReason     OK
> - RespHeader     Date: Fri, 02 Jun 2017 09:36:16 GMT
> - RespHeader     Server: Apache/2.4.25 (FreeBSD) OpenSSL/1.0.2l
> - RespHeader     X-Powered-By: PHP/7.0.19
> - RespHeader     Content-Type: text/html; charset=UTF-8
> - RespHeader     cache-control: max-age = 259200
> - RespHeader     X-Varnish: 11141123 8
> - RespHeader     Age: 73
> - RespHeader     Via: 1.1 varnish-v4
> - VCL_call       DELIVER
> - RespUnset      Server:
Apache/2.4.25 (FreeBSD) OpenSSL/1.0.2l
> - RespHeader     Server: varnish
> - RespUnset      X-Powered-By: PHP/7.0.19
> - RespHeader     X-Powered-By: My Varnish
> - RespHeader     X-Cache: HIT
> - VCL_return     deliver
> - Timestamp      Process: 1496396250.098712 0.000058 0.000058
> - RespHeader     Accept-Ranges: bytes
> - RespHeader     Content-Length: 95200
> - Debug          "RES_MODE 2"
> - RespHeader     Connection: keep-alive
> - Debug          "Hit idle send timeout, wrote = 89972/95508; retrying"
> - Debug          "Write error, retval = -1, len = 5536, errno = Resource temporarily unavailable"
> - Timestamp      Resp: 1496396371.131526 121.032872 121.032814
> - ReqAcct        82 0 82 308 95200 95508
> - End
>
> Sometimes I also see this Debug line:
> - Debug          "Write error, retval = -1, len = 95563, errno = Broken pipe"
>
> I also installed varnish 5.1.2, but the results are the same.
> Is there something I'm missing?
>
> My vcl file is pretty basic:
>
> https://pastebin.com/rbb42x7h
>
> Thanks all for your time.
>
> regards
> Johan
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guillaume at varnish-software.com Fri Jun 23 09:05:33 2017
From: guillaume at varnish-software.com (Guillaume Quintard)
Date: Fri, 23 Jun 2017 11:05:33 +0200
Subject: Stuck with sc_rx_timeout
In-Reply-To: 
References: 
Message-ID: 

Don't worry about it; your client just left while you were expecting
data from it.

--
Guillaume Quintard

On Thu, Jun 8, 2017 at 3:32 PM, info at dubistmeinheld.de
<info at dubistmeinheld.de> wrote:
> Hi,
>
> I'm closely monitoring varnish 5.0 stats. I could fix some issues, but I
> am stuck on where sc_rx_timeout is coming from.
>
> Looking at the documentation, it says "Number of session closes with
> Error RX_TIMEOUT (Receive timeout)".
>
> I do not fully understand this sentence.
> - Is this a timeout expressing that varnish does not receive an answer > within a certain time from the backend? > - Do these timeouts happen from time to time and are ok, or is there an > issue on the server (code, params?). > - If parameters, which ones could be candidates to tune? > - Also I have issues with more parameters like sc_req_http10 (please > have a look below) and unsure if they are severe. > > I would be happy if you could point me in some direction. > > Cheers, > Jens > > # /usr/sbin/varnishstat -1 | grep -i '\(err\|fail\|drop\)' > MAIN.sess_drop 0 0.00 Sessions dropped > MAIN.sess_fail 0 0.00 Session accept failures > MAIN.client_req_400 2 0.00 Client requests received, > subject to 400 errors > MAIN.client_req_417 0 0.00 Client requests received, > subject to 417 errors > MAIN.backend_fail 2 0.00 Backend conn. failures > MAIN.fetch_failed 0 0.00 Fetch failed (all causes) > MAIN.fetch_no_thread 0 0.00 Fetch failed (no thread) > MAIN.threads_failed 0 0.00 Thread creation failed > MAIN.sess_dropped 0 0.00 Sessions dropped for > thread > MAIN.sess_closed_err 21784 0.30 Session Closed with error > MAIN.sc_req_http10 1223 0.02 Session Err REQ_HTTP10 > MAIN.sc_rx_bad 0 0.00 Session Err RX_BAD > MAIN.sc_rx_body 34 0.00 Session Err RX_BODY > MAIN.sc_rx_junk 2 0.00 Session Err RX_JUNK > MAIN.sc_rx_overflow 0 0.00 Session Err RX_OVERFLOW > MAIN.sc_rx_timeout 20525 0.29 Session Err RX_TIMEOUT > MAIN.sc_tx_error 0 0.00 Session Err TX_ERROR > MAIN.sc_overload 0 0.00 Session Err OVERLOAD > MAIN.sc_pipe_overflow 0 0.00 Session Err PIPE_OVERFLOW > MAIN.sc_range_short 0 0.00 Session Err RANGE_SHORT > MAIN.esi_errors 0 0.00 ESI parse > errors (unlock) > SMA.s0.c_fail 0 0.00 Allocator > failures > SMA.Transient.c_fail 0 0.00 Allocator > failures > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > -------------- next part 
-------------- An HTML attachment was scrubbed... URL: From guillaume at varnish-software.com Fri Jun 23 09:09:50 2017 From: guillaume at varnish-software.com (Guillaume Quintard) Date: Fri, 23 Jun 2017 11:09:50 +0200 Subject: Unexplained Cache MISSes In-Reply-To: <499ebc8d-e952-571c-e378-0fe092c6c709@sharphosting.uk> References: <211c667a-ce70-6373-c840-4482c159e38c@sharphosting.uk> <35dfe986-72dc-95f5-0319-9d0743aebbe4@sharphosting.uk> <3fdcafb5-4000-3d64-478b-fb60baa9a783@sharphosting.uk> <53cec1b0-0b57-7110-4b78-9ac280eaa782@sharphosting.uk> <1a0267d7-8cc4-4a9c-5f0a-9719db34321d@sharphosting.uk> <499ebc8d-e952-571c-e378-0fe092c6c709@sharphosting.uk> Message-ID: Hum, could you toy with ttl/grace/keep periods? Like having only a one week TTL but no grace/keep, then a one week grace but no TTL/keep? The period when the purge occurs may be important... -- Guillaume Quintard On Fri, Jun 16, 2017 at 9:09 PM, Nigel Peck wrote: > > Here's an interesting thing about this. When I refreshed the cache just > now (PURGE) for 204 URLs, 78 of them were a HIT instead of a MISS. All had > been in the cache for 9 hours at least. (a re-issued GET request received a > MISS for all 78) > > When I immediately issued a PURGE again a few seconds later for all 204 > URLs, every one of them was a MISS and purged successfully. I did it again > a few seconds after that, and again all good. Same again a few minutes > after that. No HITs. > > So this seems to be in some way related to how long the objects have been > in the cache. > > Nigel > > > On 16/06/2017 13:27, Nigel Peck wrote: > >> >> Sorry for the delay on working on this. I've read your email a few times >> now and am still confused! I need to read the man pages suggested but >> haven't got to it yet. Will let you know when I make some progress on it. >> >> I'm fixing the issue in the interim here by issuing another GET request >> in my cache refresh scripts for any PURGE requests that come back with a >> HIT. 
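(Guillaume's TTL/grace/keep experiment above could be expressed as a vcl_backend_response override — a sketch only, with the one-week duration assumed from the caching policy mentioned earlier in the thread:)

```vcl
sub vcl_backend_response {
    # experiment 1: one-week ttl, no grace, no keep
    set beresp.ttl = 1w;
    set beresp.grace = 0s;
    set beresp.keep = 0s;

    # experiment 2 (swap in instead): no ttl, one-week grace
    # set beresp.ttl = 0s;
    # set beresp.grace = 1w;
    # set beresp.keep = 0s;
}
```

Comparing how the PURGE-then-HIT anomaly behaves under each variant should narrow down which lifetime period the stale hits fall into.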
>> >> Nigel >> >> On 02/06/2017 18:08, Dridi Boukelmoune wrote: >> >>> Amazingly enough I never looked at the logs of a purge, maybe ExpKill >>>>> could give us a VXID to then check against the hit. If only >>>>> SomeoneElse(tm) >>>>> could spare me the time and look at it themselves and tell us (wink >>>>> wink=). >>>>> >>>> >>>> >>>> I'm very happy to help in any way I can. Please let me know anything I >>>> can >>>> do or information I can provide. I'm no C programmer (web >>>> developer/server >>>> admin), so can't help out with coding/patching/debugging[3], but >>>> anything >>>> else I can do, please let me know what you need. >>>> >>> >>> Well, luckily I didn't write any C code to find out what purge logs >>> look like. I'm certainly not going to debug code I'm not familiar with ;) >>> >>> I wrote a dummy test case instead: >>> >>> varnishtest "purge logs" >>> >>> server s1 { >>> rxreq >>> expect req.url == "/to-be-purged" >>> txresp >>> } -start >>> >>> varnish v1 -vcl+backend { >>> sub vcl_recv { >>> if (req.method == "PURGE") { >>> return (purge); >>> } >>> } >>> } -start >>> >>> client c1 { >>> txreq -url "/to-be-purged" >>> rxresp >>> >>> txreq -req PURGE -url "/to-be-purged" >>> rxresp >>> >>> txreq -req PURGE -url "/unknown" >>> rxresp >>> } -run >>> >>> And then looked at the logs manually: >>> >>> varnishtest test.vtc -v | grep vsl | less >>> >>> Here's a sample: >>> >>> [...] >>> **** v1 0.4 vsl| 1002 VCL_return b deliver >>> **** v1 0.4 vsl| 1002 Storage b malloc s0 >>> [...] >>> **** v1 0.4 vsl| 0 ExpKill - EXP_When >>> p=0x7f420b027000 e=1496443420.703764200 f=0xe >>> **** v1 0.4 vsl| 0 ExpKill - EXP_expire >>> p=0x7f420b027000 e=-0.000092268 f=0x0 >>> **** v1 0.4 vsl| 0 ExpKill - EXP_Expired x=1002 >>> t=-0 >>> [...] >>> **** v1 0.4 vsl| 1003 ReqMethod c PURGE >>> **** v1 0.4 vsl| 1003 ReqURL c /to-be-purged >>> [...] 
>>> **** v1 0.4 vsl| 1003 VCL_return c purge >>> **** v1 0.4 vsl| 1003 VCL_call c HASH >>> **** v1 0.4 vsl| 1003 VCL_return c lookup >>> **** v1 0.4 vsl| 1003 VCL_call c PURGE >>> **** v1 0.4 vsl| 1003 VCL_return c synth >>> [...] >>> **** v1 0.4 vsl| 1004 ReqMethod c PURGE >>> **** v1 0.4 vsl| 1004 ReqURL c /unknown >>> [...] >>> **** v1 0.4 vsl| 1004 VCL_return c purge >>> **** v1 0.4 vsl| 1004 VCL_call c HASH >>> **** v1 0.4 vsl| 1004 VCL_return c lookup >>> **** v1 0.4 vsl| 1004 VCL_call c PURGE >>> **** v1 0.4 vsl| 1004 VCL_return c synth >>> [...] >>> >>> The interesting transaction id (VXID) is 1002. >>> >>> So 1) purge-related logs will only show up with raw grouping in >>> varnishlog (which I find unfortunate but I should have remembered the >>> expiry thread would have been involved) and 2) we don't see in a >>> transaction log how many objects were actually purged (moved to the >>> expiry inbox). >>> >>> The ExpKill records appear before because transactions commit their >>> logs when they finish by default. >>> >>> Would a cleanly installed server and absolute minimum VCL to reproduce >>>> this >>>> be useful? You would be welcome to have access to that server, if >>>> useful, >>>> once I've got it set up and producing the same problem. >>>> >>> >>> Not yet, at this point we know that we were looking at an incomplete >>> picture so what you need to do is capture raw logs and we will be able >>> to get both a VXID and a timestamp from the ExpKill records (although >>> the timestamp for EXP_expire puzzles me). >>> >>> See man varnishlog to see how to write (-w) and then read (-r) logs >>> to/from a file. When you notice the alleged bug, note the transaction >>> id and write the current logs (with the -d option) so that you can >>> pick up all the interesting bits at rest (instead of doing it on live >>> traffic). >>> >>> I can say that in my case there is definitely no Age header coming from >>>> the >>>> back-end. 
Also as shown in the example I sent it is the 7th HIT on that >>>> object. >>>> >>> >>> Yes, smells like a bug. But before capturing logs, make sure to remove >>> Hash records from the vsl_mask (man varnishd) so we can confirm what's >>> being purged too. >>> >>> I have a theory, a long shot that will only prove how unfamiliar I am >>> with this part of Varnish. Since the purge moves the object to the >>> expiry inbox, it could be that under load the restart may happen >>> before the expiry thread marks it as expired, thus creating a race >>> with the next lookup. >>> >>> Cheers, >>> Dridi >>> >>> >> _______________________________________________ >> varnish-misc mailing list >> varnish-misc at varnish-cache.org >> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >> > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanobaldo at gmail.com Fri Jun 23 14:01:26 2017 From: stefanobaldo at gmail.com (Stefano Baldo) Date: Fri, 23 Jun 2017 11:01:26 -0300 Subject: Child process recurrently being restarted Message-ID: Hello. I am having a critical problem with Varnish Cache in production for over a month and any help will be appreciated. The problem is that Varnish child process is recurrently being restarted after 10~20h of use, with the following message: Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not responding to CLI, killed it. 
Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from ping: 400 CLI communication error Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died signal=9 Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said Child starts Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said SMF.s0 mmap'ed 483183820800 bytes of 483183820800 The following link is the varnishstat output just 1 minute before a restart: https://pastebin.com/g0g5RVTs Environment: varnish-5.1.2 revision 6ece695 Debian 8.7 - Debian GNU/Linux 8 (3.16.0) Installed using pre-built package from official repo at packagecloud.io CPU 2x2.9 GHz Mem 3.69 GiB Running inside a Docker container NFILES=131072 MEMLOCK=82000 Additional info: - I need to cache a large number of objects and the cache should last for almost a week, so I have set up a 450G storage space, I don't know if this is a problem; - I use ban a lot. There were about 40k bans in the system just before the last crash. I really don't know if this is too much or may have anything to do with it; - No registered CPU spikes (almost always around 30%); - No panic is reported, the only info I can retrieve is from syslog; - During all the time, even moments before the crashes, everything is okay and requests are being responded to very fast. Best, Stefano Baldo -------------- next part -------------- An HTML attachment was scrubbed... URL: From guillaume at varnish-software.com Fri Jun 23 14:30:18 2017 From: guillaume at varnish-software.com (Guillaume Quintard) Date: Fri, 23 Jun 2017 16:30:18 +0200 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Hi Stefano, Let's cover the usual suspects: I/Os. I think here Varnish gets stuck trying to push/pull data and can't make time to reply to the CLI.
I'd recommend monitoring the disk activity (bandwidth and iops) to confirm. After some time, the file storage is terrible on a hard drive (SSDs take a bit more time to degrade) because of fragmentation. One solution to help the disks cope is to overprovision them if they're SSDs, and you can try different options in the file storage definition on the command line (the last parameter, after granularity). Is your /var/lib/varnish mounted on tmpfs? That could help too. 40K bans is a lot, are they ban-lurker friendly? -- Guillaume Quintard On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo wrote: > Hello. > > I am having a critical problem with Varnish Cache in production for over a > month and any help will be appreciated. > The problem is that Varnish child process is recurrently being restarted > after 10~20h of use, with the following message: > > Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not responding > to CLI, killed it. > Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from ping: > 400 CLI communication error > Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died signal=9 > Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete > Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started > Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said Child > starts > Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said SMF.s0 > mmap'ed 483183820800 bytes of 483183820800 > > The following link is the varnishstat output just 1 minute before a > restart: > > https://pastebin.com/g0g5RVTs > > Environment: > > varnish-5.1.2 revision 6ece695 > Debian 8.7 - Debian GNU/Linux 8 (3.16.0) > Installed using pre-built package from official repo at packagecloud.io > CPU 2x2.9 GHz > Mem 3.69 GiB > Running inside a Docker container > NFILES=131072 > MEMLOCK=82000 > > Additional info: > > - I need to cache a large number of objects and the cache should last for > almost a week, so I have set up a 450G
storage space, I don't know if this > is a problem; > - I use ban a lot. There were about 40k bans in the system just before the > last crash. I really don't know if this is too much or may have anything to > do with it; > - No registered CPU spikes (almost always around 30%); > - No panic is reported, the only info I can retrieve is from syslog; > - During all the time, even moments before the crashes, everything is > okay and requests are being responded to very fast. > > Best, > Stefano Baldo > > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joh.hendriks at gmail.com Fri Jun 23 14:52:53 2017 From: joh.hendriks at gmail.com (Johan Hendriks) Date: Fri, 23 Jun 2017 16:52:53 +0200 Subject: Varnish performance with phpinfo In-Reply-To: References: <6fa4576d-d25e-b770-44da-98877379a815@gmail.com> Message-ID: <83029bff-6f19-5d12-0514-fa6441ecbd6a@gmail.com> Thanks for your answer. I was thinking about that also, but I could not find anything that pointed in that direction. But shouldn't I hit that limit with the info.html file too, or could it be the size of the page? The info.html is of course way smaller than the whole php.info page. regards Johan On 23/06/2017 at 10:58, Guillaume Quintard wrote: > Stupid question but, aren't you being limited by your client, or a > firewall, maybe? > > -- > Guillaume Quintard > > On Fri, Jun 2, 2017 at 12:06 PM, Johan Hendriks > > wrote: > > Hello all, First sorry for the long email. > I have a strange issue with varnish. At least I think it is strange. > > We start some tests with varnish, but we have an issue. > > I am running varnish 4.1.6 on FreeBSD 11.1-prerelease, where > Varnish listens on port 82 and Apache on 80. This is just for the > tests. > We use the following start options.
> > # Varnish > varnishd_enable="YES" > varnishd_listen="192.168.2.247:82 " > varnishd_pidfile="/var/run/varnishd.pid" > varnishd_storage="default=malloc,2024M" > varnishd_config="/usr/local/etc/varnish/default.vcl" > varnishd_hash="critbit" > varnishd_admin=":6082" > varnishncsa_enable="YES" > > We did a test with a static page and that went fine. First we see > it is not cached, second attempt is cached. > > root at desk:~ # curl -I www.testdomain.nl:82/info.html > > HTTP/1.1 200 OK > Date: Fri, 02 Jun 2017 09:19:52 GMT > Last-Modified: Thu, 01 Jun 2017 12:50:37 GMT > ETag: "cf4-550e57bc1f812" > Content-Length: 3316 > Content-Type: text/html > cache-control: max-age = 259200 > X-Varnish: 2 > Age: 0 > Via: 1.1 varnish-v4 > Server: varnish > X-Powered-By: My Varnish > X-Cache: MISS > Accept-Ranges: bytes > Connection: keep-alive > > root at desk:~ # curl -I www.testdomain.nl:82/info.html > > HTTP/1.1 200 OK > Date: Fri, 02 Jun 2017 09:19:52 GMT > Last-Modified: Thu, 01 Jun 2017 12:50:37 GMT > ETag: "cf4-550e57bc1f812" > Content-Length: 3316 > Content-Type: text/html > cache-control: max-age = 259200 > X-Varnish: 5 3 > Age: 6 > Via: 1.1 varnish-v4 > Server: varnish > X-Powered-By: My Varnish > X-Cache: HIT > Accept-Ranges: bytes > Connection: keep-alive > > If I benchmark the server I get the following. > First is directly to Apache > > root at testserver:~ # bombardier -c400 -n10000 > http://www.testdomain.nl/info.html > > Bombarding http://www.testdomain.nl/info.html > with 10000 requests using 400 > connections > 10000 / 10000 > [=============================================================] > 100.00% 0s > Done! > Statistics Avg Stdev Max > Reqs/sec 12459.00 898.32 13301 > Latency 31.04ms 25.28ms 280.90ms > HTTP codes: > 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0 > others - 0 > Throughput: 42.16MB/s > > This is via varnish. So that works as intended. > Varnish does its job and serves the page better.
> > root at testserver:~ # bombardier -c400 -n10000 > http://www.testdomain.nl:82/info.html > > Bombarding http://www.testdomain.nl:82/info.html > with 10000 requests using > 400 connections > 10000 / 10000 > [=============================================================] > 100.00% 0s > Done! > Statistics Avg Stdev Max > Reqs/sec 19549.00 7649.32 24313 > Latency 17.90ms 66.77ms 485.07ms > HTTP codes: > 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0 > others - 0 > Throughput: 71.58MB/s > > > The next one is against an info.php file, which runs phpinfo(); > > So first against the server without varnish. > > root at testserver:~ # bombardier -c400 -n10000 > http://www.testdomain.nl/info.php > Bombarding http://www.testdomain.nl/info.php > with 10000 requests using 400 > connections > 10000 / 10000 > [============================================================] > 100.00% 11s > Done! > Statistics Avg Stdev Max > Reqs/sec 828.00 127.66 1010 > Latency 472.10ms 59.10ms 740.43ms > HTTP codes: > 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0 > others - 0 > Throughput: 75.51MB/s > > But then against the server with varnish. > So we make sure it is in cache > > root at desk:~ # curl -I www.testdomain.nl:82/info.php > > HTTP/1.1 200 OK > Date: Fri, 02 Jun 2017 09:36:16 GMT > Content-Type: text/html; charset=UTF-8 > cache-control: max-age = 259200 > X-Varnish: 7 > Age: 0 > Via: 1.1 varnish-v4 > Server: varnish > X-Powered-By: My Varnish > X-Cache: MISS > Accept-Ranges: bytes > Connection: keep-alive > > root at desk:~ # curl -I www.testdomain.nl:82/info.php > > HTTP/1.1 200 OK > Date: Fri, 02 Jun 2017 09:36:16 GMT > Content-Type: text/html; charset=UTF-8 > cache-control: max-age = 259200 > X-Varnish: 10 8 > Age: 2 > Via: 1.1 varnish-v4 > Server: varnish > X-Powered-By: My Varnish > X-Cache: HIT > Accept-Ranges: bytes > Connection: keep-alive > > So it is in cache now.
> root at testserver:~ # bombardier -c400 -n10000 > http://www.testdomain.nl:82/info.php > > Bombarding http://www.testdomain.nl:82/info.php > with 10000 requests using > 400 connections > 10000 / 10000 > [===========================================================================================================================================================================================================] > 100.00% 8s > Done! > Statistics Avg Stdev Max > Reqs/sec 1179.00 230.77 1981 > Latency 219.94ms 340.29ms 2.00s > HTTP codes: > 1xx - 0, 2xx - 9938, 3xx - 0, 4xx - 0, 5xx - 0 > others - 62 > Errors: > dialing to the given TCP address timed out - 62 > Throughput: 83.16MB/s > > I expected this to be much more in favour of varnish, but it even > generated some errors! Time taken is lower but I expected it to be > much faster. Also the 62 errors is not good i guess. > > I do see the following with varnish log > * << Request >> 11141123 > - Begin req 1310723 rxreq > - Timestamp Start: 1496396250.098654 0.000000 0.000000 > - Timestamp Req: 1496396250.098654 0.000000 0.000000 > - ReqStart 192.168.2.39 14818 > - ReqMethod GET > - ReqURL /info.php > - ReqProtocol HTTP/1.1 > - ReqHeader User-Agent: fasthttp > - ReqHeader Host: www.testdomain.nl:82 > > - ReqHeader X-Forwarded-For: 192.168.2.39 > - VCL_call RECV > - ReqUnset X-Forwarded-For: 192.168.2.39 > - ReqHeader X-Forwarded-For: 192.168.2.39, 192.168.2.39 > - VCL_return hash > - VCL_call HASH > - VCL_return lookup > - Hit 8 > - VCL_call HIT > - VCL_return deliver > - RespProtocol HTTP/1.1 > - RespStatus 200 > - RespReason OK > - RespHeader Date: Fri, 02 Jun 2017 09:36:16 GMT > - RespHeader Server: Apache/2.4.25 (FreeBSD) OpenSSL/1.0.2l > - RespHeader X-Powered-By: PHP/7.0.19 > - RespHeader Content-Type: text/html; charset=UTF-8 > - RespHeader cache-control: max-age = 259200 > - RespHeader X-Varnish: 11141123 8 > - RespHeader Age: 73 > - RespHeader Via: 1.1 varnish-v4 > - VCL_call DELIVER > - RespUnset Server: 
Apache/2.4.25 (FreeBSD) OpenSSL/1.0.2l > - RespHeader Server: varnish > - RespUnset X-Powered-By: PHP/7.0.19 > - RespHeader X-Powered-By: My Varnish > - RespHeader X-Cache: HIT > - VCL_return deliver > - Timestamp Process: 1496396250.098712 0.000058 0.000058 > - RespHeader Accept-Ranges: bytes > - RespHeader Content-Length: 95200 > - Debug "RES_MODE 2" > - RespHeader Connection: keep-alive > *- Debug "Hit idle send timeout, wrote = 89972/95508; > retrying"** > **- Debug "Write error, retval = -1, len = 5536, errno > = Resource temporarily unavailable"* > - Timestamp Resp: 1496396371.131526 121.032872 121.032814 > - ReqAcct 82 0 82 308 95200 95508 > - End > > Sometimes I see this Debug line also - *Debug "Write > error, retval = -1, len = 95563, errno = Broken pipe"* > > > I also installed varnish 5.1.2 but the results are the same. > Is there something I'm missing? > > My vcl file is pretty basic. > > https://pastebin.com/rbb42x7h > > Thanks all for your time. > > regards > Johan > > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guillaume at varnish-software.com Fri Jun 23 15:36:49 2017 From: guillaume at varnish-software.com (Guillaume Quintard) Date: Fri, 23 Jun 2017 17:36:49 +0200 Subject: Varnish performance with phpinfo In-Reply-To: <83029bff-6f19-5d12-0514-fa6441ecbd6a@gmail.com> References: <6fa4576d-d25e-b770-44da-98877379a815@gmail.com> <83029bff-6f19-5d12-0514-fa6441ecbd6a@gmail.com> Message-ID: Simple way to test: grow the info.html size :-) -- Guillaume Quintard On Fri, Jun 23, 2017 at 4:52 PM, Johan Hendriks wrote: > Thanks for your answer. > I was thinking about that also, but I could not find anything that pointed > in that direction.
> But should I hit that limit also with the info.html file then or could it > be the size of the page. > The info.html is off cource way smaller than the whole php.info page. > > regards > Johan > > Op 23/06/2017 om 10:58 schreef Guillaume Quintard: > > Stupid question but, aren't you being limited by your client, or a > firewall, maybe? > > -- > Guillaume Quintard > > On Fri, Jun 2, 2017 at 12:06 PM, Johan Hendriks > wrote: > >> Hello all, First sorry for the long email. >> I have a strange issue with varnish. At least I think it is strange. >> >> We start some tests with varnish, but we have an issue. >> >> I am running varnish 4.1.6 on FreeBSD 11.1-prerelease. Where varnish >> listen on port 82 and apache on 80, This is just for the tests. >> We use the following start options. >> >> # Varnish >> varnishd_enable="YES" >> varnishd_listen="192.168.2.247:82" >> varnishd_pidfile="/var/run/varnishd.pid" >> varnishd_storage="default=malloc,2024M" >> varnishd_config="/usr/local/etc/varnish/default.vcl" >> varnishd_hash="critbit" >> varnishd_admin=":6082" >> varnishncsa_enable="YES" >> >> We did a test with a static page and that went fine. First we see it is >> not cached, second attempt is cached. 
>> >> root at desk:~ # curl -I www.testdomain.nl:82/info.html >> HTTP/1.1 200 OK >> Date: Fri, 02 Jun 2017 09:19:52 GMT >> Last-Modified: Thu, 01 Jun 2017 12:50:37 GMT >> ETag: "cf4-550e57bc1f812" >> Content-Length: 3316 >> Content-Type: text/html >> cache-control: max-age = 259200 >> X-Varnish: 2 >> Age: 0 >> Via: 1.1 varnish-v4 >> Server: varnish >> X-Powered-By: My Varnish >> X-Cache: MISS >> Accept-Ranges: bytes >> Connection: keep-alive >> >> root at desk:~ # curl -I www.testdomain.nl:82/info.html >> HTTP/1.1 200 OK >> Date: Fri, 02 Jun 2017 09:19:52 GMT >> Last-Modified: Thu, 01 Jun 2017 12:50:37 GMT >> ETag: "cf4-550e57bc1f812" >> Content-Length: 3316 >> Content-Type: text/html >> cache-control: max-age = 259200 >> X-Varnish: 5 3 >> Age: 6 >> Via: 1.1 varnish-v4 >> Server: varnish >> X-Powered-By: My Varnish >> X-Cache: HIT >> Accept-Ranges: bytes >> Connection: keep-alive >> >> if I benchmark the server I get the following. >> First is derectly to Apache >> >> root at testserver:~ # bombardier -c400 -n10000 >> http://www.testdomain.nl/info.html >> Bombarding http://www.testdomain.nl/info.html with 10000 requests using >> 400 connections >> 10000 / 10000 [============================= >> ================================] 100.00% 0s >> Done! >> Statistics Avg Stdev Max >> Reqs/sec 12459.00 898.32 13301 >> Latency 31.04ms 25.28ms 280.90ms >> HTTP codes: >> 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0 >> others - 0 >> Throughput: 42.16MB/s >> >> This is via varnish. So that works as intended. >> Varnish does its job and servers the page better. >> >> root at testserver:~ # bombardier -c400 -n10000 >> http://www.testdomain.nl:82/info.html >> Bombarding http://www.testdomain.nl:82/info.html with 10000 requests >> using 400 connections >> 10000 / 10000 [============================= >> ================================] 100.00% 0s >> Done! 
>> Statistics Avg Stdev Max >> Reqs/sec 19549.00 7649.32 24313 >> Latency 17.90ms 66.77ms 485.07ms >> HTTP codes: >> 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0 >> others - 0 >> Throughput: 71.58MB/s >> >> >> The next one is against a info.php file, which runs phpinfo(); >> >> So first agains the server without varnish. >> >> root at testserver:~ # bombardier -c400 -n10000 >> http://www.testdomain.nl/info.php >> Bombarding http://www.testdomain.nl/info.php with 10000 requests using >> 400 connections >> 10000 / 10000 [============================= >> ===============================] 100.00% 11s >> Done! >> Statistics Avg Stdev Max >> Reqs/sec 828.00 127.66 1010 >> Latency 472.10ms 59.10ms 740.43ms >> HTTP codes: >> 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0 >> others - 0 >> Throughput: 75.51MB/s >> >> But then against the server with varnish. >> So we make sure it is in cache >> >> root at desk:~ # curl -I www.testdomain.nl:82/info.php >> HTTP/1.1 200 OK >> Date: Fri, 02 Jun 2017 09:36:16 GMT >> Content-Type: text/html; charset=UTF-8 >> cache-control: max-age = 259200 >> X-Varnish: 7 >> Age: 0 >> Via: 1.1 varnish-v4 >> Server: varnish >> X-Powered-By: My Varnish >> X-Cache: MISS >> Accept-Ranges: bytes >> Connection: keep-alive >> >> root at desk:~ # curl -I www.testdomain.nl:82/info.php >> HTTP/1.1 200 OK >> Date: Fri, 02 Jun 2017 09:36:16 GMT >> Content-Type: text/html; charset=UTF-8 >> cache-control: max-age = 259200 >> X-Varnish: 10 8 >> Age: 2 >> Via: 1.1 varnish-v4 >> Server: varnish >> X-Powered-By: My Varnish >> X-Cache: HIT >> Accept-Ranges: bytes >> Connection: keep-alive >> >> So it is in cache now. 
>> root at testserver:~ # bombardier -c400 -n10000 >> http://www.testdomain.nl:82/info.php >> Bombarding http://www.testdomain.nl:82/info.php with 10000 requests >> using 400 connections >> 10000 / 10000 [============================= >> ============================================================ >> ============================================================ >> ======================================================] 100.00% 8s >> Done! >> Statistics Avg Stdev Max >> Reqs/sec 1179.00 230.77 1981 >> Latency 219.94ms 340.29ms 2.00s >> HTTP codes: >> 1xx - 0, 2xx - 9938, 3xx - 0, 4xx - 0, 5xx - 0 >> others - 62 >> Errors: >> dialing to the given TCP address timed out - 62 >> Throughput: 83.16MB/s >> >> I expected this to be much more in favour of varnish, but it even >> generated some errors! Time taken is lower but I expected it to be much >> faster. Also the 62 errors is not good i guess. >> >> I do see the following with varnish log >> * << Request >> 11141123 >> - Begin req 1310723 rxreq >> - Timestamp Start: 1496396250.098654 0.000000 0.000000 >> - Timestamp Req: 1496396250.098654 0.000000 0.000000 >> - ReqStart 192.168.2.39 14818 >> - ReqMethod GET >> - ReqURL /info.php >> - ReqProtocol HTTP/1.1 >> - ReqHeader User-Agent: fasthttp >> - ReqHeader Host: www.testdomain.nl:82 >> - ReqHeader X-Forwarded-For: 192.168.2.39 >> - VCL_call RECV >> - ReqUnset X-Forwarded-For: 192.168.2.39 >> - ReqHeader X-Forwarded-For: 192.168.2.39, 192.168.2.39 >> - VCL_return hash >> - VCL_call HASH >> - VCL_return lookup >> - Hit 8 >> - VCL_call HIT >> - VCL_return deliver >> - RespProtocol HTTP/1.1 >> - RespStatus 200 >> - RespReason OK >> - RespHeader Date: Fri, 02 Jun 2017 09:36:16 GMT >> - RespHeader Server: Apache/2.4.25 (FreeBSD) OpenSSL/1.0.2l >> - RespHeader X-Powered-By: PHP/7.0.19 >> - RespHeader Content-Type: text/html; charset=UTF-8 >> - RespHeader cache-control: max-age = 259200 >> - RespHeader X-Varnish: 11141123 8 >> - RespHeader Age: 73 >> - RespHeader Via: 1.1 
varnish-v4 >> - VCL_call DELIVER >> - RespUnset Server: Apache/2.4.25 (FreeBSD) OpenSSL/1.0.2l >> - RespHeader Server: varnish >> - RespUnset X-Powered-By: PHP/7.0.19 >> - RespHeader X-Powered-By: My Varnish >> - RespHeader X-Cache: HIT >> - VCL_return deliver >> - Timestamp Process: 1496396250.098712 0.000058 0.000058 >> - RespHeader Accept-Ranges: bytes >> - RespHeader Content-Length: 95200 >> - Debug "RES_MODE 2" >> - RespHeader Connection: keep-alive >> *- Debug "Hit idle send timeout, wrote = 89972/95508; >> retrying"* >> *- Debug "Write error, retval = -1, len = 5536, errno = >> Resource temporarily unavailable"* >> - Timestamp Resp: 1496396371.131526 121.032872 121.032814 >> - ReqAcct 82 0 82 308 95200 95508 >> - End >> >> Sometimes I see this Debug line also - *Debug "Write error, >> retval = -1, len = 95563, errno = Broken pipe"* >> >> >> I also installed varnish 5.1.2 but the results are the same. >> Is there something I miss? >> >> My vcl file is pretty basic. >> >> https://pastebin.com/rbb42x7h >> >> Thanks all for your time. >> >> regards >> Johan >> >> >> _______________________________________________ >> varnish-misc mailing list >> varnish-misc at varnish-cache.org >> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >> > > > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From np.lists at sharphosting.uk Fri Jun 23 19:02:50 2017 From: np.lists at sharphosting.uk (Nigel Peck) Date: Fri, 23 Jun 2017 14:02:50 -0500 Subject: Unexplained Cache MISSes In-Reply-To: References: <211c667a-ce70-6373-c840-4482c159e38c@sharphosting.uk> <35dfe986-72dc-95f5-0319-9d0743aebbe4@sharphosting.uk> <3fdcafb5-4000-3d64-478b-fb60baa9a783@sharphosting.uk> <53cec1b0-0b57-7110-4b78-9ac280eaa782@sharphosting.uk> <1a0267d7-8cc4-4a9c-5f0a-9719db34321d@sharphosting.uk> <499ebc8d-e952-571c-e378-0fe092c6c709@sharphosting.uk> Message-ID: Sure, that's something I can understand! Will gather some data over the next couple of days for different time periods and configurations. On 23/06/2017 04:09, Guillaume Quintard wrote: > Hum, could you toy with ttl/grace/keep periods? Like having only a one > week TTL but no grace/keep, then a one week grace but no TTL/keep? > The period when the purge occurs may be important... > > -- > Guillaume Quintard > > On Fri, Jun 16, 2017 at 9:09 PM, Nigel Peck > wrote: > > > Here's an interesting thing about this. When I refreshed the cache > just now (PURGE) for 204 URLs, 78 of them were a HIT instead of a > MISS. All had been in the cache for 9 hours at least. (a re-issued > GET request received a MISS for all 78) > > When I immediately issued a PURGE again a few seconds later for all > 204 URLs, every one of them was a MISS and purged successfully. I > did it again a few seconds after that, and again all good. Same > again a few minutes after that. No HITs. > > So this seems to be in some way related to how long the objects have > been in the cache. > > Nigel > > > On 16/06/2017 13:27, Nigel Peck wrote: > > > Sorry for the delay on working on this. I've read your email a > few times now and am still confused! I need to read the man > pages suggested but haven't got to it yet. Will let you know > when I make some progress on it. 
> > I'm fixing the issue in the interim here by issuing another GET > request in my cache refresh scripts for any PURGE requests that > come back with a HIT. > > Nigel > > On 02/06/2017 18:08, Dridi Boukelmoune wrote: > > Amazingly enough I never looked at the logs of a > purge, maybe ExpKill > could give us a VXID to then check against the hit. > If only SomeoneElse(tm) > could spare me the time and look at it themselves > and tell us (wink wink=). > > > > I'm very happy to help in any way I can. Please let me > know anything I can > do or information I can provide. I'm no C programmer > (web developer/server > admin), so can't help out with > coding/patching/debugging[3], but anything > else I can do, please let me know what you need. > > > Well, luckily I didn't write any C code to find out what > purge logs > look like. I'm certainly not going to debug code I'm not > familiar with ;) > > I wrote a dummy test case instead: > > varnishtest "purge logs" > > server s1 { > rxreq > expect req.url == "/to-be-purged" > txresp > } -start > > varnish v1 -vcl+backend { > sub vcl_recv { > if (req.method == "PURGE") { > return (purge); > } > } > } -start > > client c1 { > txreq -url "/to-be-purged" > rxresp > > txreq -req PURGE -url "/to-be-purged" > rxresp > > txreq -req PURGE -url "/unknown" > rxresp > } -run > > And then looked at the logs manually: > > varnishtest test.vtc -v | grep vsl | less > > Here's a sample: > > [...] > **** v1 0.4 vsl| 1002 VCL_return b deliver > **** v1 0.4 vsl| 1002 Storage b malloc s0 > [...] > **** v1 0.4 vsl| 0 ExpKill - EXP_When > p=0x7f420b027000 e=1496443420.703764200 f=0xe > **** v1 0.4 vsl| 0 ExpKill - > EXP_expire > p=0x7f420b027000 e=-0.000092268 f=0x0 > **** v1 0.4 vsl| 0 ExpKill - > EXP_Expired x=1002 t=-0 > [...] > **** v1 0.4 vsl| 1003 ReqMethod c PURGE > **** v1 0.4 vsl| 1003 ReqURL c > /to-be-purged > [...] 
> **** v1 0.4 vsl| 1003 VCL_return c purge > **** v1 0.4 vsl| 1003 VCL_call c HASH > **** v1 0.4 vsl| 1003 VCL_return c lookup > **** v1 0.4 vsl| 1003 VCL_call c PURGE > **** v1 0.4 vsl| 1003 VCL_return c synth > [...] > **** v1 0.4 vsl| 1004 ReqMethod c PURGE > **** v1 0.4 vsl| 1004 ReqURL c /unknown > [...] > **** v1 0.4 vsl| 1004 VCL_return c purge > **** v1 0.4 vsl| 1004 VCL_call c HASH > **** v1 0.4 vsl| 1004 VCL_return c lookup > **** v1 0.4 vsl| 1004 VCL_call c PURGE > **** v1 0.4 vsl| 1004 VCL_return c synth > [...] > > The interesting transaction id (VXID) is 1002. > > So 1) purge-related logs will only show up with raw grouping in > varnishlog (which I find unfortunate but I should have > remembered the > expiry thread would have been involved) and 2) we don't see in a > transaction log how many objects were actually purged (moved > to the > expiry inbox). > > The ExpKill records appear before because transactions > commit their > logs when they finish by default. > > Would a cleanly installed server and absolute minimum > VCL to reproduce this > be useful? You would be welcome to have access to that > server, if useful, > once I've got it set up and producing the same problem. > > > Not yet, at this point we know that we were looking at an > incomplete > picture so what you need to do is capture raw logs and we > will be able > to get both a VXID and a timestamp from the ExpKill records > (although > the timestamp for EXP_expire puzzles me). > > See man varnishlog to see how to write (-w) and then read > (-r) logs > to/from a file. When you notice the alleged bug, note the > transaction > id and write the current logs (with the -d option) so that > you can > pick up all the interesting bits at rest (instead of doing > it on live > traffic). > > I can say that in my case there is definitely no Age > header coming from the > back-end. Also as shown in the example I sent it is the > 7th HIT on that > object. > > > Yes, smells like a bug. 
But before capturing logs, make sure > to remove > Hash records from the vsl_mask (man varnishd) so we can > confirm what's > being purged too. > > I have a theory, a long shot that will only prove how > unfamiliar I am > with this part of Varnish. Since the purge moves the object > to the > expiry inbox, it could be that under load the restart may happen > before the expiry thread marks it as expired, thus creating > a race > with the next lookup. > > Cheers, > Dridi > > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > > > From stefanobaldo at gmail.com Mon Jun 26 14:51:40 2017 From: stefanobaldo at gmail.com (Stefano Baldo) Date: Mon, 26 Jun 2017 11:51:40 -0300 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Hi Guillaume. Thanks for answering. I'm using an SSD disk. I've changed from ext4 to ext2 to increase performance but it still keeps restarting. Also, I checked the I/O performance for the disk and there is no sign of overload. I've changed the /var/lib/varnish to a tmpfs and increased its 80m default size passing "-l 200m,20m" to varnishd and using "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a problem here. After a couple of hours varnish died and I received a "no space left on device" message - deleting the /var/lib/varnish solved the problem and varnish was up again, but it's weird because there was free memory on the host to be used with the tmpfs directory, so I don't know what could have happened. I will stop increasing the /var/lib/varnish size. Anyway, I am worried about the bans. You asked me if the bans are lurker friendly. Well, I don't think so.
My bans are created this way: ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + " && req.http.User-Agent !~ Googlebot"); Are they lurker friendly? I was taking a quick look at the documentation and it looks like they're not. Best, Stefano On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < guillaume at varnish-software.com> wrote: > Hi Stefano, > > Let's cover the usual suspects: I/Os. I think here Varnish gets stuck > trying to push/pull data and can't make time to reply to the CLI. I'd > recommend monitoring the disk activity (bandwidth and iops) to confirm. > > After some time, the file storage is terrible on a hard drive (SSDs take a > bit more time to degrade) because of fragmentation. One solution to help > the disks cope is to overprovision them if they're SSDs, and you can try > different options in the file storage definition on the command line (the last > parameter, after granularity). > > Is your /var/lib/varnish mounted on tmpfs? That could help too. > > 40K bans is a lot, are they ban-lurker friendly? > > -- > Guillaume Quintard > > On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo > wrote: > >> Hello. >> >> I am having a critical problem with Varnish Cache in production for over >> a month and any help will be appreciated. >> The problem is that Varnish child process is recurrently being restarted >> after 10~20h of use, with the following message: >> >> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >> responding to CLI, killed it.
>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from ping: >> 400 CLI communication error >> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died signal=9 >> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete >> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started >> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said Child >> starts >> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said SMF.s0 >> mmap'ed 483183820800 bytes of 483183820800 >> >> The following link is the varnishstat output just 1 minute before a >> restart: >> >> https://pastebin.com/g0g5RVTs >> >> Environment: >> >> varnish-5.1.2 revision 6ece695 >> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >> Installed using pre-built package from official repo at packagecloud.io >> CPU 2x2.9 GHz >> Mem 3.69 GiB >> Running inside a Docker container >> NFILES=131072 >> MEMLOCK=82000 >> >> Additional info: >> >> - I need to cache a large number of objets and the cache should last for >> almost a week, so I have set up a 450G storage space, I don't know if this >> is a problem; >> - I use ban a lot. There was about 40k bans in the system just before the >> last crash. I really don't know if this is too much or may have anything to >> do with it; >> - No registered CPU spikes (almost always by 30%); >> - No panic is reported, the only info I can retrieve is from syslog; >> - During all the time, event moments before the crashes, everything is >> okay and requests are being responded very fast. >> >> Best, >> Stefano Baldo >> >> >> _______________________________________________ >> varnish-misc mailing list >> varnish-misc at varnish-cache.org >> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
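[Editor's illustration of the "ban-lurker friendly" distinction this thread revolves around. This is a toy model in Python, not Varnish source code; the dict fields and the `lurker_pass` function are invented for the example. The idea it sketches comes from the thread itself: a ban that tests only stored-object (obj.*) fields can be evaluated by a background thread and retired, while a ban that references req.* can only be tested when a client request actually looks the object up, so such bans pile up on the ban list.]

```python
# Toy model (illustration only, not Varnish internals).
# A "lurker-friendly" ban needs nothing but the cached object, so a
# background sweep can apply it to every object and then delete it.
# A ban mentioning req.* must wait for live lookups, so it survives
# every sweep -- which is how a ban list grows to 40k and never shrinks.

def lurker_pass(bans, objects):
    """One background sweep: apply and retire obj.*-only bans."""
    for ban in bans:
        if ban["uses_req"]:
            continue  # lurker has no request to test against; skip
        objects[:] = [o for o in objects if not ban["test"](o)]
    # Only the req.*-based bans remain pending after the sweep.
    return [b for b in bans if b["uses_req"]]

objects = [{"x-url": "example.com/a"}, {"x-url": "example.com/b"}]
bans = [
    # lurker-friendly: predicate uses only the stored object's headers
    {"uses_req": False, "test": lambda o: o["x-url"] == "example.com/a"},
    # not lurker-friendly: predicate would need the live request
    {"uses_req": True, "test": None},
]
remaining = lurker_pass(bans, objects)
print(len(remaining))   # the req.* ban is still pending
print(objects)          # the obj.* ban evicted example.com/a
```

This is why the advice later in the thread is to copy the host, URL, and User-Agent into beresp.http.* headers at fetch time and ban against those obj.http.* copies: the lurker can then work through the list in the background and MAIN.bans_deleted starts climbing.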
URL: From guillaume at varnish-software.com Mon Jun 26 15:43:54 2017 From: guillaume at varnish-software.com (Guillaume Quintard) Date: Mon, 26 Jun 2017 17:43:54 +0200 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Not lurker friendly at all indeed. You'll need to avoid req.* expression. Easiest way is to stash the host, user-agent and url in beresp.http.* and ban against those (unset them in vcl_deliver). I don't think you need to expand the VSL at all. -- Guillaume Quintard On Jun 26, 2017 16:51, "Stefano Baldo" wrote: Hi Guillaume. Thanks for answering. I'm using a SSD disk. I've changed from ext4 to ext2 to increase performance but it stills restarting. Also, I checked the I/O performance for the disk and there is no signal of overhead. I've changed the /var/lib/varnish to a tmpfs and increased its 80m default size passing "-l 200m,20m" to varnishd and using "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a problem here. After a couple of hours varnish died and I received a "no space left on device" message - deleting the /var/lib/varnish solved the problem and varnish was up again, but it's weird because there was free memory on the host to be used with the tmpfs directory, so I don't know what could have happened. I will try to stop increasing the /var/lib/varnish size. Anyway, I am worried about the bans. You asked me if the bans are lurker friedly. Well, I don't think so. My bans are created this way: ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + " && req.http.User-Agent !~ Googlebot"); Are they lurker friendly? I was taking a quick look and the documentation and it looks like they're not. Best, Stefano On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < guillaume at varnish-software.com> wrote: > Hi Stefano, > > Let's cover the usual suspects: I/Os. I think here Varnish gets stuck > trying to push/pull data and can't make time to reply to the CLI. 
I'd > recommend monitoring the disk activity (bandwidth and iops) to confirm. > > After some time, the file storage is terrible on a hard drive (SSDs take a > bit more time to degrade) because of fragmentation. One solution to help > the disks cope is to overprovision themif they're SSDs, and you can try > different advices in the file storage definition in the command line (last > parameter, after granularity). > > Is your /var/lib/varnish mount on tmpfs? That could help too. > > 40K bans is a lot, are they ban-lurker friendly? > > -- > Guillaume Quintard > > On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo > wrote: > >> Hello. >> >> I am having a critical problem with Varnish Cache in production for over >> a month and any help will be appreciated. >> The problem is that Varnish child process is recurrently being restarted >> after 10~20h of use, with the following message: >> >> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >> responding to CLI, killed it. >> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from ping: >> 400 CLI communication error >> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died signal=9 >> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete >> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started >> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said Child >> starts >> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said SMF.s0 >> mmap'ed 483183820800 bytes of 483183820800 >> >> The following link is the varnishstat output just 1 minute before a >> restart: >> >> https://pastebin.com/g0g5RVTs >> >> Environment: >> >> varnish-5.1.2 revision 6ece695 >> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >> Installed using pre-built package from official repo at packagecloud.io >> CPU 2x2.9 GHz >> Mem 3.69 GiB >> Running inside a Docker container >> NFILES=131072 >> MEMLOCK=82000 >> >> Additional info: >> >> - I need to cache a large number of objets and 
the cache should last for >> almost a week, so I have set up a 450G storage space, I don't know if this >> is a problem; >> - I use ban a lot. There was about 40k bans in the system just before the >> last crash. I really don't know if this is too much or may have anything to >> do with it; >> - No registered CPU spikes (almost always by 30%); >> - No panic is reported, the only info I can retrieve is from syslog; >> - During all the time, event moments before the crashes, everything is >> okay and requests are being responded very fast. >> >> Best, >> Stefano Baldo >> >> >> _______________________________________________ >> varnish-misc mailing list >> varnish-misc at varnish-cache.org >> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanobaldo at gmail.com Mon Jun 26 17:06:05 2017 From: stefanobaldo at gmail.com (Stefano Baldo) Date: Mon, 26 Jun 2017 14:06:05 -0300 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Hi Guillaume, Can the following be considered "ban lurker friendly"? sub vcl_backend_response { set beresp.http.x-url = bereq.http.host + bereq.url; set beresp.http.x-user-agent = bereq.http.user-agent; } sub vcl_recv { if (req.method == "PURGE") { ban("obj.http.x-url == " + req.http.host + req.url + " && obj.http.x-user-agent !~ Googlebot"); return(synth(750)); } } sub vcl_deliver { unset resp.http.x-url; unset resp.http.x-user-agent; } Best, Stefano On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < guillaume at varnish-software.com> wrote: > Not lurker friendly at all indeed. You'll need to avoid req.* expression. > Easiest way is to stash the host, user-agent and url in beresp.http.* and > ban against those (unset them in vcl_deliver). > > I don't think you need to expand the VSL at all. > > -- > Guillaume Quintard > > On Jun 26, 2017 16:51, "Stefano Baldo" wrote: > > Hi Guillaume. 
> > Thanks for answering. > > I'm using a SSD disk. I've changed from ext4 to ext2 to increase > performance but it stills restarting. > Also, I checked the I/O performance for the disk and there is no signal of > overhead. > > I've changed the /var/lib/varnish to a tmpfs and increased its 80m default > size passing "-l 200m,20m" to varnishd and using > "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a > problem here. After a couple of hours varnish died and I received a "no > space left on device" message - deleting the /var/lib/varnish solved the > problem and varnish was up again, but it's weird because there was free > memory on the host to be used with the tmpfs directory, so I don't know > what could have happened. I will try to stop increasing the > /var/lib/varnish size. > > Anyway, I am worried about the bans. You asked me if the bans are lurker > friedly. Well, I don't think so. My bans are created this way: > > ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + " > && req.http.User-Agent !~ Googlebot"); > > Are they lurker friendly? I was taking a quick look and the documentation > and it looks like they're not. > > Best, > Stefano > > > On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < > guillaume at varnish-software.com> wrote: > >> Hi Stefano, >> >> Let's cover the usual suspects: I/Os. I think here Varnish gets stuck >> trying to push/pull data and can't make time to reply to the CLI. I'd >> recommend monitoring the disk activity (bandwidth and iops) to confirm. >> >> After some time, the file storage is terrible on a hard drive (SSDs take >> a bit more time to degrade) because of fragmentation. One solution to help >> the disks cope is to overprovision themif they're SSDs, and you can try >> different advices in the file storage definition in the command line (last >> parameter, after granularity). >> >> Is your /var/lib/varnish mount on tmpfs? That could help too. 
>> >> 40K bans is a lot, are they ban-lurker friendly? >> >> -- >> Guillaume Quintard >> >> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo >> wrote: >> >>> Hello. >>> >>> I am having a critical problem with Varnish Cache in production for over >>> a month and any help will be appreciated. >>> The problem is that Varnish child process is recurrently being restarted >>> after 10~20h of use, with the following message: >>> >>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>> responding to CLI, killed it. >>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from >>> ping: 400 CLI communication error >>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died signal=9 >>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete >>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started >>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said Child >>> starts >>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said SMF.s0 >>> mmap'ed 483183820800 bytes of 483183820800 >>> >>> The following link is the varnishstat output just 1 minute before a >>> restart: >>> >>> https://pastebin.com/g0g5RVTs >>> >>> Environment: >>> >>> varnish-5.1.2 revision 6ece695 >>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>> Installed using pre-built package from official repo at packagecloud.io >>> CPU 2x2.9 GHz >>> Mem 3.69 GiB >>> Running inside a Docker container >>> NFILES=131072 >>> MEMLOCK=82000 >>> >>> Additional info: >>> >>> - I need to cache a large number of objets and the cache should last for >>> almost a week, so I have set up a 450G storage space, I don't know if this >>> is a problem; >>> - I use ban a lot. There was about 40k bans in the system just before >>> the last crash. 
I really don't know if this is too much or may have >>> anything to do with it; >>> - No registered CPU spikes (almost always by 30%); >>> - No panic is reported, the only info I can retrieve is from syslog; >>> - During all the time, event moments before the crashes, everything is >>> okay and requests are being responded very fast. >>> >>> Best, >>> Stefano Baldo >>> >>> >>> _______________________________________________ >>> varnish-misc mailing list >>> varnish-misc at varnish-cache.org >>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guillaume at varnish-software.com Mon Jun 26 18:10:37 2017 From: guillaume at varnish-software.com (Guillaume Quintard) Date: Mon, 26 Jun 2017 20:10:37 +0200 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Looking good! -- Guillaume Quintard On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo wrote: > Hi Guillaume, > > Can the following be considered "ban lurker friendly"? > > sub vcl_backend_response { > set beresp.http.x-url = bereq.http.host + bereq.url; > set beresp.http.x-user-agent = bereq.http.user-agent; > } > > sub vcl_recv { > if (req.method == "PURGE") { > ban("obj.http.x-url == " + req.http.host + req.url + " && > obj.http.x-user-agent !~ Googlebot"); > return(synth(750)); > } > } > > sub vcl_deliver { > unset resp.http.x-url; > unset resp.http.x-user-agent; > } > > Best, > Stefano > > > On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < > guillaume at varnish-software.com> wrote: > >> Not lurker friendly at all indeed. You'll need to avoid req.* expression. >> Easiest way is to stash the host, user-agent and url in beresp.http.* and >> ban against those (unset them in vcl_deliver). >> >> I don't think you need to expand the VSL at all. >> >> -- >> Guillaume Quintard >> >> On Jun 26, 2017 16:51, "Stefano Baldo" wrote: >> >> Hi Guillaume. 
>> >> Thanks for answering. >> >> I'm using a SSD disk. I've changed from ext4 to ext2 to increase >> performance but it stills restarting. >> Also, I checked the I/O performance for the disk and there is no signal >> of overhead. >> >> I've changed the /var/lib/varnish to a tmpfs and increased its 80m >> default size passing "-l 200m,20m" to varnishd and using >> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a >> problem here. After a couple of hours varnish died and I received a "no >> space left on device" message - deleting the /var/lib/varnish solved the >> problem and varnish was up again, but it's weird because there was free >> memory on the host to be used with the tmpfs directory, so I don't know >> what could have happened. I will try to stop increasing the >> /var/lib/varnish size. >> >> Anyway, I am worried about the bans. You asked me if the bans are lurker >> friedly. Well, I don't think so. My bans are created this way: >> >> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + " >> && req.http.User-Agent !~ Googlebot"); >> >> Are they lurker friendly? I was taking a quick look and the documentation >> and it looks like they're not. >> >> Best, >> Stefano >> >> >> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >> guillaume at varnish-software.com> wrote: >> >>> Hi Stefano, >>> >>> Let's cover the usual suspects: I/Os. I think here Varnish gets stuck >>> trying to push/pull data and can't make time to reply to the CLI. I'd >>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>> >>> After some time, the file storage is terrible on a hard drive (SSDs take >>> a bit more time to degrade) because of fragmentation. One solution to help >>> the disks cope is to overprovision themif they're SSDs, and you can try >>> different advices in the file storage definition in the command line (last >>> parameter, after granularity). >>> >>> Is your /var/lib/varnish mount on tmpfs? 
That could help too. >>> >>> 40K bans is a lot, are they ban-lurker friendly? >>> >>> -- >>> Guillaume Quintard >>> >>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo >>> wrote: >>> >>>> Hello. >>>> >>>> I am having a critical problem with Varnish Cache in production for >>>> over a month and any help will be appreciated. >>>> The problem is that Varnish child process is recurrently being >>>> restarted after 10~20h of use, with the following message: >>>> >>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>> responding to CLI, killed it. >>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from >>>> ping: 400 CLI communication error >>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died >>>> signal=9 >>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete >>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started >>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said Child >>>> starts >>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said SMF.s0 >>>> mmap'ed 483183820800 bytes of 483183820800 >>>> >>>> The following link is the varnishstat output just 1 minute before a >>>> restart: >>>> >>>> https://pastebin.com/g0g5RVTs >>>> >>>> Environment: >>>> >>>> varnish-5.1.2 revision 6ece695 >>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>> Installed using pre-built package from official repo at packagecloud.io >>>> CPU 2x2.9 GHz >>>> Mem 3.69 GiB >>>> Running inside a Docker container >>>> NFILES=131072 >>>> MEMLOCK=82000 >>>> >>>> Additional info: >>>> >>>> - I need to cache a large number of objets and the cache should last >>>> for almost a week, so I have set up a 450G storage space, I don't know if >>>> this is a problem; >>>> - I use ban a lot. There was about 40k bans in the system just before >>>> the last crash. 
I really don't know if this is too much or may have >>>> anything to do with it; >>>> - No registered CPU spikes (almost always by 30%); >>>> - No panic is reported, the only info I can retrieve is from syslog; >>>> - During all the time, even moments before the crashes, everything is >>>> okay and requests are being responded to very fast. >>>> >>>> Best, >>>> Stefano Baldo >>>> >>>> >>>> _______________________________________________ >>>> varnish-misc mailing list >>>> varnish-misc at varnish-cache.org >>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >>>> >>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanobaldo at gmail.com Mon Jun 26 18:21:43 2017 From: stefanobaldo at gmail.com (Stefano Baldo) Date: Mon, 26 Jun 2017 15:21:43 -0300 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Hi Guillaume. I think things will start going better now after changing the bans. This is what my last varnishstat looked like moments before a crash regarding the bans: MAIN.bans 41336 . Count of bans MAIN.bans_completed 37967 . Number of bans marked 'completed' MAIN.bans_obj 0 . Number of bans using obj.* MAIN.bans_req 41335 . Number of bans using req.* MAIN.bans_added 41336 0.68 Bans added MAIN.bans_deleted 0 0.00 Bans deleted And this is what it looks like now: MAIN.bans 2 . Count of bans MAIN.bans_completed 1 . Number of bans marked 'completed' MAIN.bans_obj 2 . Number of bans using obj.* MAIN.bans_req 0 . Number of bans using req.* MAIN.bans_added 2016 0.69 Bans added MAIN.bans_deleted 2014 0.69 Bans deleted Before the changes, bans were never deleted! Now the bans are added and quickly deleted after a minute or even a couple of seconds. May this have been the cause of the problem? It seems like varnish had a large number of bans to manage and test against. I will let it ride now. Let's see if the problem persists or it's gone! 
:-) Best, Stefano On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < guillaume at varnish-software.com> wrote: > Looking good! > > -- > Guillaume Quintard > > On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo > wrote: > >> Hi Guillaume, >> >> Can the following be considered "ban lurker friendly"? >> >> sub vcl_backend_response { >> set beresp.http.x-url = bereq.http.host + bereq.url; >> set beresp.http.x-user-agent = bereq.http.user-agent; >> } >> >> sub vcl_recv { >> if (req.method == "PURGE") { >> ban("obj.http.x-url == " + req.http.host + req.url + " && >> obj.http.x-user-agent !~ Googlebot"); >> return(synth(750)); >> } >> } >> >> sub vcl_deliver { >> unset resp.http.x-url; >> unset resp.http.x-user-agent; >> } >> >> Best, >> Stefano >> >> >> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >> guillaume at varnish-software.com> wrote: >> >>> Not lurker friendly at all indeed. You'll need to avoid req.* >>> expression. Easiest way is to stash the host, user-agent and url in >>> beresp.http.* and ban against those (unset them in vcl_deliver). >>> >>> I don't think you need to expand the VSL at all. >>> >>> -- >>> Guillaume Quintard >>> >>> On Jun 26, 2017 16:51, "Stefano Baldo" wrote: >>> >>> Hi Guillaume. >>> >>> Thanks for answering. >>> >>> I'm using a SSD disk. I've changed from ext4 to ext2 to increase >>> performance but it stills restarting. >>> Also, I checked the I/O performance for the disk and there is no signal >>> of overhead. >>> >>> I've changed the /var/lib/varnish to a tmpfs and increased its 80m >>> default size passing "-l 200m,20m" to varnishd and using >>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a >>> problem here. 
After a couple of hours varnish died and I received a "no >>> space left on device" message - deleting the /var/lib/varnish solved the >>> problem and varnish was up again, but it's weird because there was free >>> memory on the host to be used with the tmpfs directory, so I don't know >>> what could have happened. I will try to stop increasing the >>> /var/lib/varnish size. >>> >>> Anyway, I am worried about the bans. You asked me if the bans are lurker >>> friedly. Well, I don't think so. My bans are created this way: >>> >>> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + " >>> && req.http.User-Agent !~ Googlebot"); >>> >>> Are they lurker friendly? I was taking a quick look and the >>> documentation and it looks like they're not. >>> >>> Best, >>> Stefano >>> >>> >>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>> guillaume at varnish-software.com> wrote: >>> >>>> Hi Stefano, >>>> >>>> Let's cover the usual suspects: I/Os. I think here Varnish gets stuck >>>> trying to push/pull data and can't make time to reply to the CLI. I'd >>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>> >>>> After some time, the file storage is terrible on a hard drive (SSDs >>>> take a bit more time to degrade) because of fragmentation. One solution to >>>> help the disks cope is to overprovision themif they're SSDs, and you can >>>> try different advices in the file storage definition in the command line >>>> (last parameter, after granularity). >>>> >>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>> >>>> 40K bans is a lot, are they ban-lurker friendly? >>>> >>>> -- >>>> Guillaume Quintard >>>> >>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo >>>> wrote: >>>> >>>>> Hello. >>>>> >>>>> I am having a critical problem with Varnish Cache in production for >>>>> over a month and any help will be appreciated. 
>>>>> The problem is that Varnish child process is recurrently being >>>>> restarted after 10~20h of use, with the following message: >>>>> >>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>> responding to CLI, killed it. >>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from >>>>> ping: 400 CLI communication error >>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died >>>>> signal=9 >>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete >>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started >>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said Child >>>>> starts >>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>> SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>> >>>>> The following link is the varnishstat output just 1 minute before a >>>>> restart: >>>>> >>>>> https://pastebin.com/g0g5RVTs >>>>> >>>>> Environment: >>>>> >>>>> varnish-5.1.2 revision 6ece695 >>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>> Installed using pre-built package from official repo at >>>>> packagecloud.io >>>>> CPU 2x2.9 GHz >>>>> Mem 3.69 GiB >>>>> Running inside a Docker container >>>>> NFILES=131072 >>>>> MEMLOCK=82000 >>>>> >>>>> Additional info: >>>>> >>>>> - I need to cache a large number of objets and the cache should last >>>>> for almost a week, so I have set up a 450G storage space, I don't know if >>>>> this is a problem; >>>>> - I use ban a lot. There was about 40k bans in the system just before >>>>> the last crash. I really don't know if this is too much or may have >>>>> anything to do with it; >>>>> - No registered CPU spikes (almost always by 30%); >>>>> - No panic is reported, the only info I can retrieve is from syslog; >>>>> - During all the time, event moments before the crashes, everything is >>>>> okay and requests are being responded very fast. 
>>>>> >>>>> Best, >>>>> Stefano Baldo >>>>> >>>>> >>>>> _______________________________________________ >>>>> varnish-misc mailing list >>>>> varnish-misc at varnish-cache.org >>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >>>>> >>>> >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guillaume at varnish-software.com Mon Jun 26 18:47:33 2017 From: guillaume at varnish-software.com (Guillaume Quintard) Date: Mon, 26 Jun 2017 20:47:33 +0200 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Nice! It may have been the cause; time will tell. Can you report back in a few days to let us know? -- Guillaume Quintard On Jun 26, 2017 20:21, "Stefano Baldo" wrote: > Hi Guillaume. > > I think things will start to going better now after changing the bans. > This is how my last varnishstat looked like moments before a crash > regarding the bans: > > MAIN.bans 41336 . Count of bans > MAIN.bans_completed 37967 . Number of bans marked > 'completed' > MAIN.bans_obj 0 . Number of bans using obj.* > MAIN.bans_req 41335 . Number of bans using req.* > MAIN.bans_added 41336 0.68 Bans added > MAIN.bans_deleted 0 0.00 Bans deleted > > And this is how it looks like now: > > MAIN.bans 2 . Count of bans > MAIN.bans_completed 1 . Number of bans marked > 'completed' > MAIN.bans_obj 2 . Number of bans using obj.* > MAIN.bans_req 0 . Number of bans using req.* > MAIN.bans_added 2016 0.69 Bans added > MAIN.bans_deleted 2014 0.69 Bans deleted > > Before the changes, bans were never deleted! > Now the bans are added and quickly deleted after a minute or even a couple > of seconds. > > May this was the cause of the problem? It seems like varnish was having a > large number of bans to manage and test against. > I will let it ride now. Let's see if the problem persists or it's gone! 
:-) > > Best, > Stefano > > > On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < > guillaume at varnish-software.com> wrote: > >> Looking good! >> >> -- >> Guillaume Quintard >> >> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo >> wrote: >> >>> Hi Guillaume, >>> >>> Can the following be considered "ban lurker friendly"? >>> >>> sub vcl_backend_response { >>> set beresp.http.x-url = bereq.http.host + bereq.url; >>> set beresp.http.x-user-agent = bereq.http.user-agent; >>> } >>> >>> sub vcl_recv { >>> if (req.method == "PURGE") { >>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>> obj.http.x-user-agent !~ Googlebot"); >>> return(synth(750)); >>> } >>> } >>> >>> sub vcl_deliver { >>> unset resp.http.x-url; >>> unset resp.http.x-user-agent; >>> } >>> >>> Best, >>> Stefano >>> >>> >>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>> guillaume at varnish-software.com> wrote: >>> >>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>> expression. Easiest way is to stash the host, user-agent and url in >>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>> >>>> I don't think you need to expand the VSL at all. >>>> >>>> -- >>>> Guillaume Quintard >>>> >>>> On Jun 26, 2017 16:51, "Stefano Baldo" wrote: >>>> >>>> Hi Guillaume. >>>> >>>> Thanks for answering. >>>> >>>> I'm using a SSD disk. I've changed from ext4 to ext2 to increase >>>> performance but it stills restarting. >>>> Also, I checked the I/O performance for the disk and there is no signal >>>> of overhead. >>>> >>>> I've changed the /var/lib/varnish to a tmpfs and increased its 80m >>>> default size passing "-l 200m,20m" to varnishd and using >>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a >>>> problem here. 
After a couple of hours varnish died and I received a "no >>>> space left on device" message - deleting the /var/lib/varnish solved the >>>> problem and varnish was up again, but it's weird because there was free >>>> memory on the host to be used with the tmpfs directory, so I don't know >>>> what could have happened. I will try to stop increasing the >>>> /var/lib/varnish size. >>>> >>>> Anyway, I am worried about the bans. You asked me if the bans are >>>> lurker friedly. Well, I don't think so. My bans are created this way: >>>> >>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + >>>> " && req.http.User-Agent !~ Googlebot"); >>>> >>>> Are they lurker friendly? I was taking a quick look and the >>>> documentation and it looks like they're not. >>>> >>>> Best, >>>> Stefano >>>> >>>> >>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>> guillaume at varnish-software.com> wrote: >>>> >>>>> Hi Stefano, >>>>> >>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets stuck >>>>> trying to push/pull data and can't make time to reply to the CLI. I'd >>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>> >>>>> After some time, the file storage is terrible on a hard drive (SSDs >>>>> take a bit more time to degrade) because of fragmentation. One solution to >>>>> help the disks cope is to overprovision themif they're SSDs, and you can >>>>> try different advices in the file storage definition in the command line >>>>> (last parameter, after granularity). >>>>> >>>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>>> >>>>> 40K bans is a lot, are they ban-lurker friendly? >>>>> >>>>> -- >>>>> Guillaume Quintard >>>>> >>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo >>>> > wrote: >>>>> >>>>>> Hello. >>>>>> >>>>>> I am having a critical problem with Varnish Cache in production for >>>>>> over a month and any help will be appreciated. 
>>>>>> The problem is that Varnish child process is recurrently being >>>>>> restarted after 10~20h of use, with the following message: >>>>>> >>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>>> responding to CLI, killed it. >>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from >>>>>> ping: 400 CLI communication error >>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died >>>>>> signal=9 >>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete >>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started >>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>> Child starts >>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>> SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>> >>>>>> The following link is the varnishstat output just 1 minute before a >>>>>> restart: >>>>>> >>>>>> https://pastebin.com/g0g5RVTs >>>>>> >>>>>> Environment: >>>>>> >>>>>> varnish-5.1.2 revision 6ece695 >>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>> Installed using pre-built package from official repo at >>>>>> packagecloud.io >>>>>> CPU 2x2.9 GHz >>>>>> Mem 3.69 GiB >>>>>> Running inside a Docker container >>>>>> NFILES=131072 >>>>>> MEMLOCK=82000 >>>>>> >>>>>> Additional info: >>>>>> >>>>>> - I need to cache a large number of objets and the cache should last >>>>>> for almost a week, so I have set up a 450G storage space, I don't know if >>>>>> this is a problem; >>>>>> - I use ban a lot. There was about 40k bans in the system just before >>>>>> the last crash. I really don't know if this is too much or may have >>>>>> anything to do with it; >>>>>> - No registered CPU spikes (almost always by 30%); >>>>>> - No panic is reported, the only info I can retrieve is from syslog; >>>>>> - During all the time, event moments before the crashes, everything >>>>>> is okay and requests are being responded very fast. 
>>>>>> >>>>>> Best, >>>>>> Stefano Baldo >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> varnish-misc mailing list >>>>>> varnish-misc at varnish-cache.org >>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >>>>>> >>>>> >>>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanobaldo at gmail.com Mon Jun 26 19:08:54 2017 From: stefanobaldo at gmail.com (Stefano Baldo) Date: Mon, 26 Jun 2017 16:08:54 -0300 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Sure, will do! On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < guillaume at varnish-software.com> wrote: > Nice! It may have been the cause, time will tell. Can you report back in a > few days to let us know? > -- > Guillaume Quintard > > On Jun 26, 2017 20:21, "Stefano Baldo" wrote: > >> Hi Guillaume. >> >> I think things will start to go better now after changing the bans. >> This is what my last varnishstat looked like moments before a crash, >> regarding the bans: >> >> MAIN.bans 41336 . Count of bans >> MAIN.bans_completed 37967 . Number of bans marked >> 'completed' >> MAIN.bans_obj 0 . Number of bans using >> obj.* >> MAIN.bans_req 41335 . Number of bans using >> req.* >> MAIN.bans_added 41336 0.68 Bans added >> MAIN.bans_deleted 0 0.00 Bans deleted >> >> And this is what it looks like now: >> >> MAIN.bans 2 . Count of bans >> MAIN.bans_completed 1 . Number of bans marked >> 'completed' >> MAIN.bans_obj 2 . Number of bans using >> obj.* >> MAIN.bans_req 0 . Number of bans using >> req.* >> MAIN.bans_added 2016 0.69 Bans added >> MAIN.bans_deleted 2014 0.69 Bans deleted >> >> Before the changes, bans were never deleted! >> Now the bans are added and quickly deleted after a minute or even a >> couple of seconds. >> >> Maybe this was the cause of the problem? It seems like varnish was having a >> large number of bans to manage and test against. >> I will let it ride now. Let's see if the problem persists or it's gone! >> :-) >> >> Best, >> Stefano >> >> >> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >> guillaume at varnish-software.com> wrote: >> >>> Looking good! >>> >>> -- >>> Guillaume Quintard >>> >>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo >>> wrote: >>> >>>> Hi Guillaume, >>>> >>>> Can the following be considered "ban lurker friendly"? >>>> >>>> sub vcl_backend_response { >>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>> } >>>> >>>> sub vcl_recv { >>>> if (req.method == "PURGE") { >>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>> obj.http.x-user-agent !~ Googlebot"); >>>> return(synth(750)); >>>> } >>>> } >>>> >>>> sub vcl_deliver { >>>> unset resp.http.x-url; >>>> unset resp.http.x-user-agent; >>>> } >>>> >>>> Best, >>>> Stefano >>>> >>>> >>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>> guillaume at varnish-software.com> wrote: >>>> >>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>> expressions. The easiest way is to stash the host, user-agent and url in >>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>> >>>>> I don't think you need to expand the VSL at all. >>>>> >>>>> -- >>>>> Guillaume Quintard >>>>> >>>>> On Jun 26, 2017 16:51, "Stefano Baldo" wrote: >>>>> >>>>> Hi Guillaume. >>>>> >>>>> Thanks for answering. >>>>> >>>>> I'm using an SSD disk. I've changed from ext4 to ext2 to increase >>>>> performance but it still keeps restarting. >>>>> Also, I checked the I/O performance for the disk and there is no >>>>> sign of overload. >>>>> >>>>> I've changed the /var/lib/varnish to a tmpfs and increased its 80m >>>>> default size passing "-l 200m,20m" to varnishd and using >>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a >>>>> problem here. 
After a couple of hours varnish died and I received a "no >>>>> space left on device" message - deleting the /var/lib/varnish solved the >>>>> problem and varnish was up again, but it's weird because there was free >>>>> memory on the host to be used with the tmpfs directory, so I don't know >>>>> what could have happened. I will try to stop increasing the >>>>> /var/lib/varnish size. >>>>> >>>>> Anyway, I am worried about the bans. You asked me if the bans are >>>>> lurker friedly. Well, I don't think so. My bans are created this way: >>>>> >>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + >>>>> " && req.http.User-Agent !~ Googlebot"); >>>>> >>>>> Are they lurker friendly? I was taking a quick look and the >>>>> documentation and it looks like they're not. >>>>> >>>>> Best, >>>>> Stefano >>>>> >>>>> >>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>> guillaume at varnish-software.com> wrote: >>>>> >>>>>> Hi Stefano, >>>>>> >>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets stuck >>>>>> trying to push/pull data and can't make time to reply to the CLI. I'd >>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>>> >>>>>> After some time, the file storage is terrible on a hard drive (SSDs >>>>>> take a bit more time to degrade) because of fragmentation. One solution to >>>>>> help the disks cope is to overprovision themif they're SSDs, and you can >>>>>> try different advices in the file storage definition in the command line >>>>>> (last parameter, after granularity). >>>>>> >>>>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>>>> >>>>>> 40K bans is a lot, are they ban-lurker friendly? >>>>>> >>>>>> -- >>>>>> Guillaume Quintard >>>>>> >>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>> stefanobaldo at gmail.com> wrote: >>>>>> >>>>>>> Hello. 
>>>>>>> >>>>>>> I am having a critical problem with Varnish Cache in production for >>>>>>> over a month and any help will be appreciated. >>>>>>> The problem is that Varnish child process is recurrently being >>>>>>> restarted after 10~20h of use, with the following message: >>>>>>> >>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>>>> responding to CLI, killed it. >>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from >>>>>>> ping: 400 CLI communication error >>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died >>>>>>> signal=9 >>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete >>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started >>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>> Child starts >>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>> SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>> >>>>>>> The following link is the varnishstat output just 1 minute before a >>>>>>> restart: >>>>>>> >>>>>>> https://pastebin.com/g0g5RVTs >>>>>>> >>>>>>> Environment: >>>>>>> >>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>> Installed using pre-built package from official repo at >>>>>>> packagecloud.io >>>>>>> CPU 2x2.9 GHz >>>>>>> Mem 3.69 GiB >>>>>>> Running inside a Docker container >>>>>>> NFILES=131072 >>>>>>> MEMLOCK=82000 >>>>>>> >>>>>>> Additional info: >>>>>>> >>>>>>> - I need to cache a large number of objets and the cache should last >>>>>>> for almost a week, so I have set up a 450G storage space, I don't know if >>>>>>> this is a problem; >>>>>>> - I use ban a lot. There was about 40k bans in the system just >>>>>>> before the last crash. 
I really don't know if this is too much or may have >>>>>>> anything to do with it; >>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>> - No panic is reported, the only info I can retrieve is from syslog; >>>>>>> - During all the time, event moments before the crashes, everything >>>>>>> is okay and requests are being responded very fast. >>>>>>> >>>>>>> Best, >>>>>>> Stefano Baldo >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> varnish-misc mailing list >>>>>>> varnish-misc at varnish-cache.org >>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanobaldo at gmail.com Tue Jun 27 12:37:54 2017 From: stefanobaldo at gmail.com (Stefano Baldo) Date: Tue, 27 Jun 2017 09:37:54 -0300 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Hi Guillaume, FYI, restarted again after ~16h :-( Uptime mgt: 0+17:48:50 Uptime child: 0+02:17:14 Best, Stefano On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < guillaume at varnish-software.com> wrote: > Nice! It may have been the cause, time will tell.can you report back in a > few days to let us know? > -- > Guillaume Quintard > > On Jun 26, 2017 20:21, "Stefano Baldo" wrote: > >> Hi Guillaume. >> >> I think things will start to going better now after changing the bans. >> This is how my last varnishstat looked like moments before a crash >> regarding the bans: >> >> MAIN.bans 41336 . Count of bans >> MAIN.bans_completed 37967 . Number of bans marked >> 'completed' >> MAIN.bans_obj 0 . Number of bans using >> obj.* >> MAIN.bans_req 41335 . Number of bans using >> req.* >> MAIN.bans_added 41336 0.68 Bans added >> MAIN.bans_deleted 0 0.00 Bans deleted >> >> And this is how it looks like now: >> >> MAIN.bans 2 . Count of bans >> MAIN.bans_completed 1 . 
Number of bans marked >> 'completed' >> MAIN.bans_obj 2 . Number of bans using >> obj.* >> MAIN.bans_req 0 . Number of bans using >> req.* >> MAIN.bans_added 2016 0.69 Bans added >> MAIN.bans_deleted 2014 0.69 Bans deleted >> >> Before the changes, bans were never deleted! >> Now the bans are added and quickly deleted after a minute or even a >> couple of seconds. >> >> May this was the cause of the problem? It seems like varnish was having a >> large number of bans to manage and test against. >> I will let it ride now. Let's see if the problem persists or it's gone! >> :-) >> >> Best, >> Stefano >> >> >> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >> guillaume at varnish-software.com> wrote: >> >>> Looking good! >>> >>> -- >>> Guillaume Quintard >>> >>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo >>> wrote: >>> >>>> Hi Guillaume, >>>> >>>> Can the following be considered "ban lurker friendly"? >>>> >>>> sub vcl_backend_response { >>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>> } >>>> >>>> sub vcl_recv { >>>> if (req.method == "PURGE") { >>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>> obj.http.x-user-agent !~ Googlebot"); >>>> return(synth(750)); >>>> } >>>> } >>>> >>>> sub vcl_deliver { >>>> unset resp.http.x-url; >>>> unset resp.http.x-user-agent; >>>> } >>>> >>>> Best, >>>> Stefano >>>> >>>> >>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>> guillaume at varnish-software.com> wrote: >>>> >>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>> expression. Easiest way is to stash the host, user-agent and url in >>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>> >>>>> I don't think you need to expand the VSL at all. >>>>> >>>>> -- >>>>> Guillaume Quintard >>>>> >>>>> On Jun 26, 2017 16:51, "Stefano Baldo" wrote: >>>>> >>>>> Hi Guillaume. >>>>> >>>>> Thanks for answering. 
>>>>> >>>>> I'm using a SSD disk. I've changed from ext4 to ext2 to increase >>>>> performance but it stills restarting. >>>>> Also, I checked the I/O performance for the disk and there is no >>>>> signal of overhead. >>>>> >>>>> I've changed the /var/lib/varnish to a tmpfs and increased its 80m >>>>> default size passing "-l 200m,20m" to varnishd and using >>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a >>>>> problem here. After a couple of hours varnish died and I received a "no >>>>> space left on device" message - deleting the /var/lib/varnish solved the >>>>> problem and varnish was up again, but it's weird because there was free >>>>> memory on the host to be used with the tmpfs directory, so I don't know >>>>> what could have happened. I will try to stop increasing the >>>>> /var/lib/varnish size. >>>>> >>>>> Anyway, I am worried about the bans. You asked me if the bans are >>>>> lurker friedly. Well, I don't think so. My bans are created this way: >>>>> >>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + >>>>> " && req.http.User-Agent !~ Googlebot"); >>>>> >>>>> Are they lurker friendly? I was taking a quick look and the >>>>> documentation and it looks like they're not. >>>>> >>>>> Best, >>>>> Stefano >>>>> >>>>> >>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>> guillaume at varnish-software.com> wrote: >>>>> >>>>>> Hi Stefano, >>>>>> >>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets stuck >>>>>> trying to push/pull data and can't make time to reply to the CLI. I'd >>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>>> >>>>>> After some time, the file storage is terrible on a hard drive (SSDs >>>>>> take a bit more time to degrade) because of fragmentation. 
One solution to >>>>>> help the disks cope is to overprovision themif they're SSDs, and you can >>>>>> try different advices in the file storage definition in the command line >>>>>> (last parameter, after granularity). >>>>>> >>>>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>>>> >>>>>> 40K bans is a lot, are they ban-lurker friendly? >>>>>> >>>>>> -- >>>>>> Guillaume Quintard >>>>>> >>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>> stefanobaldo at gmail.com> wrote: >>>>>> >>>>>>> Hello. >>>>>>> >>>>>>> I am having a critical problem with Varnish Cache in production for >>>>>>> over a month and any help will be appreciated. >>>>>>> The problem is that Varnish child process is recurrently being >>>>>>> restarted after 10~20h of use, with the following message: >>>>>>> >>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>>>> responding to CLI, killed it. >>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from >>>>>>> ping: 400 CLI communication error >>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died >>>>>>> signal=9 >>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete >>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started >>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>> Child starts >>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>> SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>> >>>>>>> The following link is the varnishstat output just 1 minute before a >>>>>>> restart: >>>>>>> >>>>>>> https://pastebin.com/g0g5RVTs >>>>>>> >>>>>>> Environment: >>>>>>> >>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>> Installed using pre-built package from official repo at >>>>>>> packagecloud.io >>>>>>> CPU 2x2.9 GHz >>>>>>> Mem 3.69 GiB >>>>>>> Running inside a Docker container >>>>>>> NFILES=131072 >>>>>>> MEMLOCK=82000 
>>>>>>> >>>>>>> Additional info: >>>>>>> >>>>>>> - I need to cache a large number of objets and the cache should last >>>>>>> for almost a week, so I have set up a 450G storage space, I don't know if >>>>>>> this is a problem; >>>>>>> - I use ban a lot. There was about 40k bans in the system just >>>>>>> before the last crash. I really don't know if this is too much or may have >>>>>>> anything to do with it; >>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>> - No panic is reported, the only info I can retrieve is from syslog; >>>>>>> - During all the time, event moments before the crashes, everything >>>>>>> is okay and requests are being responded very fast. >>>>>>> >>>>>>> Best, >>>>>>> Stefano Baldo >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> varnish-misc mailing list >>>>>>> varnish-misc at varnish-cache.org >>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanobaldo at gmail.com Tue Jun 27 21:07:31 2017 From: stefanobaldo at gmail.com (Stefano Baldo) Date: Tue, 27 Jun 2017 18:07:31 -0300 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Hi Guillaume. It keeps restarting. Would you mind taking a quick look in the following VCL file to check if you find anything suspicious? Thank you very much. Best, Stefano vcl 4.0; import std; backend default { .host = "sites-web-server-lb"; .port = "80"; } include "/etc/varnish/bad_bot_detection.vcl"; sub vcl_recv { call bad_bot_detection; if (req.url == "/nocache" || req.url == "/version") { return(pass); } unset req.http.Cookie; if (req.method == "PURGE") { ban("obj.http.x-host == " + req.http.host + " && obj.http.x-user-agent !~ Googlebot"); return(synth(750)); } set req.url = regsuball(req.url, "(? 
" + req.url); return(deliver); } elsif (resp.status == 501) { set resp.status = 200; set resp.http.Content-Type = "text/html; charset=utf-8"; synthetic(std.fileread("/etc/varnish/pages/invalid_domain.html")); return(deliver); } } sub vcl_backend_response { unset beresp.http.Set-Cookie; set beresp.http.x-host = bereq.http.host; set beresp.http.x-user-agent = bereq.http.user-agent; if (bereq.url == "/themes/basic/assets/theme.min.css" || bereq.url == "/api/events/PAGEVIEW" || bereq.url ~ "^\/assets\/img\/") { set beresp.http.Cache-Control = "max-age=0"; } else { unset beresp.http.Cache-Control; } if (beresp.status == 200 || beresp.status == 301 || beresp.status == 302 || beresp.status == 404) { if (bereq.url ~ "\&ordenar=aleatorio$") { set beresp.http.X-TTL = "1d"; set beresp.ttl = 1d; } else { set beresp.http.X-TTL = "1w"; set beresp.ttl = 1w; } } if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") { set beresp.do_gzip = true; } } sub vcl_pipe { set bereq.http.connection = "close"; return (pipe); } sub vcl_deliver { unset resp.http.x-host; unset resp.http.x-user-agent; } sub vcl_backend_error { if (beresp.status == 502 || beresp.status == 503 || beresp.status == 504) { set beresp.status = 200; set beresp.http.Content-Type = "text/html; charset=utf-8"; synthetic(std.fileread("/etc/varnish/pages/maintenance.html")); return (deliver); } } sub vcl_hash { if (req.http.User-Agent ~ "Google Page Speed") { hash_data("Google Page Speed"); } elsif (req.http.User-Agent ~ "Googlebot") { hash_data("Googlebot"); } } sub vcl_deliver { if (resp.status == 501) { return (synth(resp.status)); } if (obj.hits > 0) { set resp.http.X-Cache = "hit"; } else { set resp.http.X-Cache = "miss"; } } On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < guillaume at varnish-software.com> wrote: > Nice! It may have been the cause, time will tell.can you report back in a > few days to let us know? 
> -- > Guillaume Quintard > > On Jun 26, 2017 20:21, "Stefano Baldo" wrote: > >> Hi Guillaume. >> >> I think things will start to going better now after changing the bans. >> This is how my last varnishstat looked like moments before a crash >> regarding the bans: >> >> MAIN.bans 41336 . Count of bans >> MAIN.bans_completed 37967 . Number of bans marked >> 'completed' >> MAIN.bans_obj 0 . Number of bans using >> obj.* >> MAIN.bans_req 41335 . Number of bans using >> req.* >> MAIN.bans_added 41336 0.68 Bans added >> MAIN.bans_deleted 0 0.00 Bans deleted >> >> And this is how it looks like now: >> >> MAIN.bans 2 . Count of bans >> MAIN.bans_completed 1 . Number of bans marked >> 'completed' >> MAIN.bans_obj 2 . Number of bans using >> obj.* >> MAIN.bans_req 0 . Number of bans using >> req.* >> MAIN.bans_added 2016 0.69 Bans added >> MAIN.bans_deleted 2014 0.69 Bans deleted >> >> Before the changes, bans were never deleted! >> Now the bans are added and quickly deleted after a minute or even a >> couple of seconds. >> >> May this was the cause of the problem? It seems like varnish was having a >> large number of bans to manage and test against. >> I will let it ride now. Let's see if the problem persists or it's gone! >> :-) >> >> Best, >> Stefano >> >> >> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >> guillaume at varnish-software.com> wrote: >> >>> Looking good! >>> >>> -- >>> Guillaume Quintard >>> >>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo >>> wrote: >>> >>>> Hi Guillaume, >>>> >>>> Can the following be considered "ban lurker friendly"? 
>>>> >>>> sub vcl_backend_response { >>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>> } >>>> >>>> sub vcl_recv { >>>> if (req.method == "PURGE") { >>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>> obj.http.x-user-agent !~ Googlebot"); >>>> return(synth(750)); >>>> } >>>> } >>>> >>>> sub vcl_deliver { >>>> unset resp.http.x-url; >>>> unset resp.http.x-user-agent; >>>> } >>>> >>>> Best, >>>> Stefano >>>> >>>> >>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>> guillaume at varnish-software.com> wrote: >>>> >>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>> expression. Easiest way is to stash the host, user-agent and url in >>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>> >>>>> I don't think you need to expand the VSL at all. >>>>> >>>>> -- >>>>> Guillaume Quintard >>>>> >>>>> On Jun 26, 2017 16:51, "Stefano Baldo" wrote: >>>>> >>>>> Hi Guillaume. >>>>> >>>>> Thanks for answering. >>>>> >>>>> I'm using a SSD disk. I've changed from ext4 to ext2 to increase >>>>> performance but it stills restarting. >>>>> Also, I checked the I/O performance for the disk and there is no >>>>> signal of overhead. >>>>> >>>>> I've changed the /var/lib/varnish to a tmpfs and increased its 80m >>>>> default size passing "-l 200m,20m" to varnishd and using >>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a >>>>> problem here. After a couple of hours varnish died and I received a "no >>>>> space left on device" message - deleting the /var/lib/varnish solved the >>>>> problem and varnish was up again, but it's weird because there was free >>>>> memory on the host to be used with the tmpfs directory, so I don't know >>>>> what could have happened. I will try to stop increasing the >>>>> /var/lib/varnish size. >>>>> >>>>> Anyway, I am worried about the bans. You asked me if the bans are >>>>> lurker friedly. 
Well, I don't think so. My bans are created this way: >>>>> >>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + >>>>> " && req.http.User-Agent !~ Googlebot"); >>>>> >>>>> Are they lurker friendly? I was taking a quick look and the >>>>> documentation and it looks like they're not. >>>>> >>>>> Best, >>>>> Stefano >>>>> >>>>> >>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>> guillaume at varnish-software.com> wrote: >>>>> >>>>>> Hi Stefano, >>>>>> >>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets stuck >>>>>> trying to push/pull data and can't make time to reply to the CLI. I'd >>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>>> >>>>>> After some time, the file storage is terrible on a hard drive (SSDs >>>>>> take a bit more time to degrade) because of fragmentation. One solution to >>>>>> help the disks cope is to overprovision themif they're SSDs, and you can >>>>>> try different advices in the file storage definition in the command line >>>>>> (last parameter, after granularity). >>>>>> >>>>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>>>> >>>>>> 40K bans is a lot, are they ban-lurker friendly? >>>>>> >>>>>> -- >>>>>> Guillaume Quintard >>>>>> >>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>> stefanobaldo at gmail.com> wrote: >>>>>> >>>>>>> Hello. >>>>>>> >>>>>>> I am having a critical problem with Varnish Cache in production for >>>>>>> over a month and any help will be appreciated. >>>>>>> The problem is that Varnish child process is recurrently being >>>>>>> restarted after 10~20h of use, with the following message: >>>>>>> >>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>>>> responding to CLI, killed it. 
>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from >>>>>>> ping: 400 CLI communication error >>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died >>>>>>> signal=9 >>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete >>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started >>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>> Child starts >>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>> SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>> >>>>>>> The following link is the varnishstat output just 1 minute before a >>>>>>> restart: >>>>>>> >>>>>>> https://pastebin.com/g0g5RVTs >>>>>>> >>>>>>> Environment: >>>>>>> >>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>> Installed using pre-built package from official repo at >>>>>>> packagecloud.io >>>>>>> CPU 2x2.9 GHz >>>>>>> Mem 3.69 GiB >>>>>>> Running inside a Docker container >>>>>>> NFILES=131072 >>>>>>> MEMLOCK=82000 >>>>>>> >>>>>>> Additional info: >>>>>>> >>>>>>> - I need to cache a large number of objets and the cache should last >>>>>>> for almost a week, so I have set up a 450G storage space, I don't know if >>>>>>> this is a problem; >>>>>>> - I use ban a lot. There was about 40k bans in the system just >>>>>>> before the last crash. I really don't know if this is too much or may have >>>>>>> anything to do with it; >>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>> - No panic is reported, the only info I can retrieve is from syslog; >>>>>>> - During all the time, event moments before the crashes, everything >>>>>>> is okay and requests are being responded very fast. 
>>>>>>> >>>>>>> Best, >>>>>>> Stefano Baldo >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> varnish-misc mailing list >>>>>>> varnish-misc at varnish-cache.org >>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From guillaume at varnish-software.com Wed Jun 28 07:12:46 2017 From: guillaume at varnish-software.com (Guillaume Quintard) Date: Wed, 28 Jun 2017 09:12:46 +0200 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Sadly, nothing suspicious here, you can still try: - bumping the cli_timeout - changing your disk scheduler - changing the advice option of the file storage I'm still convinced this is due to Varnish getting stuck waiting for the disk because of the file storage fragmentation. Maybe you could look at SMF.*.g_alloc and compare it to the number of objects. Ideally, we would have a 1:1 relation between objects and allocations. If that number drops prior to a restart, that would be a good clue. -- Guillaume Quintard On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo wrote: > Hi Guillaume. > > It keeps restarting. > Would you mind taking a quick look in the following VCL file to check if > you find anything suspicious? > > Thank you very much. > > Best, > Stefano > > vcl 4.0; > > import std; > > backend default { > .host = "sites-web-server-lb"; > .port = "80"; > } > > include "/etc/varnish/bad_bot_detection.vcl"; > > sub vcl_recv { > call bad_bot_detection; > > if (req.url == "/nocache" || req.url == "/version") { > return(pass); > } > > unset req.http.Cookie; > if (req.method == "PURGE") { > ban("obj.http.x-host == " + req.http.host + " && obj.http.x-user-agent > !~ Googlebot"); > return(synth(750)); > } > > set req.url = regsuball(req.url, "(? 
} > > sub vcl_synth { > if (resp.status == 750) { > set resp.status = 200; > synthetic("PURGED => " + req.url); > return(deliver); > } elsif (resp.status == 501) { > set resp.status = 200; > set resp.http.Content-Type = "text/html; charset=utf-8"; > synthetic(std.fileread("/etc/varnish/pages/invalid_domain.html")); > return(deliver); > } > } > > sub vcl_backend_response { > unset beresp.http.Set-Cookie; > set beresp.http.x-host = bereq.http.host; > set beresp.http.x-user-agent = bereq.http.user-agent; > > if (bereq.url == "/themes/basic/assets/theme.min.css" > || bereq.url == "/api/events/PAGEVIEW" > || bereq.url ~ "^\/assets\/img\/") { > set beresp.http.Cache-Control = "max-age=0"; > } else { > unset beresp.http.Cache-Control; > } > > if (beresp.status == 200 || > beresp.status == 301 || > beresp.status == 302 || > beresp.status == 404) { > if (bereq.url ~ "\&ordenar=aleatorio$") { > set beresp.http.X-TTL = "1d"; > set beresp.ttl = 1d; > } else { > set beresp.http.X-TTL = "1w"; > set beresp.ttl = 1w; > } > } > > if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") > { > set beresp.do_gzip = true; > } > } > > sub vcl_pipe { > set bereq.http.connection = "close"; > return (pipe); > } > > sub vcl_deliver { > unset resp.http.x-host; > unset resp.http.x-user-agent; > } > > sub vcl_backend_error { > if (beresp.status == 502 || beresp.status == 503 || beresp.status == > 504) { > set beresp.status = 200; > set beresp.http.Content-Type = "text/html; charset=utf-8"; > synthetic(std.fileread("/etc/varnish/pages/maintenance.html")); > return (deliver); > } > } > > sub vcl_hash { > if (req.http.User-Agent ~ "Google Page Speed") { > hash_data("Google Page Speed"); > } elsif (req.http.User-Agent ~ "Googlebot") { > hash_data("Googlebot"); > } > } > > sub vcl_deliver { > if (resp.status == 501) { > return (synth(resp.status)); > } > if (obj.hits > 0) { > set resp.http.X-Cache = "hit"; > } else { > set resp.http.X-Cache = "miss"; > } > } > > > On Mon, Jun 26, 
2017 at 3:47 PM, Guillaume Quintard < > guillaume at varnish-software.com> wrote: > >> Nice! It may have been the cause, time will tell.can you report back in a >> few days to let us know? >> -- >> Guillaume Quintard >> >> On Jun 26, 2017 20:21, "Stefano Baldo" wrote: >> >>> Hi Guillaume. >>> >>> I think things will start to going better now after changing the bans. >>> This is how my last varnishstat looked like moments before a crash >>> regarding the bans: >>> >>> MAIN.bans 41336 . Count of bans >>> MAIN.bans_completed 37967 . Number of bans marked >>> 'completed' >>> MAIN.bans_obj 0 . Number of bans using >>> obj.* >>> MAIN.bans_req 41335 . Number of bans using >>> req.* >>> MAIN.bans_added 41336 0.68 Bans added >>> MAIN.bans_deleted 0 0.00 Bans deleted >>> >>> And this is how it looks like now: >>> >>> MAIN.bans 2 . Count of bans >>> MAIN.bans_completed 1 . Number of bans marked >>> 'completed' >>> MAIN.bans_obj 2 . Number of bans using >>> obj.* >>> MAIN.bans_req 0 . Number of bans using >>> req.* >>> MAIN.bans_added 2016 0.69 Bans added >>> MAIN.bans_deleted 2014 0.69 Bans deleted >>> >>> Before the changes, bans were never deleted! >>> Now the bans are added and quickly deleted after a minute or even a >>> couple of seconds. >>> >>> May this was the cause of the problem? It seems like varnish was having >>> a large number of bans to manage and test against. >>> I will let it ride now. Let's see if the problem persists or it's gone! >>> :-) >>> >>> Best, >>> Stefano >>> >>> >>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >>> guillaume at varnish-software.com> wrote: >>> >>>> Looking good! >>>> >>>> -- >>>> Guillaume Quintard >>>> >>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo >>>> wrote: >>>> >>>>> Hi Guillaume, >>>>> >>>>> Can the following be considered "ban lurker friendly"? 
>>>>> >>>>> sub vcl_backend_response { >>>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>> } >>>>> >>>>> sub vcl_recv { >>>>> if (req.method == "PURGE") { >>>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>>> obj.http.x-user-agent !~ Googlebot"); >>>>> return(synth(750)); >>>>> } >>>>> } >>>>> >>>>> sub vcl_deliver { >>>>> unset resp.http.x-url; >>>>> unset resp.http.x-user-agent; >>>>> } >>>>> >>>>> Best, >>>>> Stefano >>>>> >>>>> >>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>>> guillaume at varnish-software.com> wrote: >>>>> >>>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>>> expressions. The easiest way is to stash the host, user-agent and url in >>>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>>> >>>>>> I don't think you need to expand the VSL at all. >>>>>> >>>>>> -- >>>>>> Guillaume Quintard >>>>>> >>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" >>>>>> wrote: >>>>>> >>>>>> Hi Guillaume. >>>>>> >>>>>> Thanks for answering. >>>>>> >>>>>> I'm using an SSD disk. I've changed from ext4 to ext2 to increase >>>>>> performance but it's still restarting. >>>>>> Also, I checked the I/O performance of the disk and there is no >>>>>> sign of overhead. >>>>>> >>>>>> I've changed the /var/lib/varnish to a tmpfs and increased its 80m >>>>>> default size, passing "-l 200m,20m" to varnishd and using >>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was >>>>>> a problem here. After a couple of hours varnish died and I received a "no >>>>>> space left on device" message - deleting /var/lib/varnish solved the >>>>>> problem and varnish was up again, but it's weird because there was free >>>>>> memory on the host to be used by the tmpfs directory, so I don't know >>>>>> what could have happened. I will try to stop increasing the >>>>>> /var/lib/varnish size.
>>>>>> >>>>>> Anyway, I am worried about the bans. You asked me if the bans are >>>>>> lurker friendly. Well, I don't think so. My bans are created this way: >>>>>> >>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url >>>>>> + " && req.http.User-Agent !~ Googlebot"); >>>>>> >>>>>> Are they lurker friendly? I took a quick look at the >>>>>> documentation and it looks like they're not. >>>>>> >>>>>> Best, >>>>>> Stefano >>>>>> >>>>>> >>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>>> guillaume at varnish-software.com> wrote: >>>>>> >>>>>>> Hi Stefano, >>>>>>> >>>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets >>>>>>> stuck trying to push/pull data and can't make time to reply to the CLI. I'd >>>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>>>> >>>>>>> After some time, the file storage is terrible on a hard drive (SSDs >>>>>>> take a bit more time to degrade) because of fragmentation. One solution to >>>>>>> help the disks cope is to overprovision them if they're SSDs, and you can >>>>>>> try different advice values in the file storage definition on the command >>>>>>> line (last parameter, after granularity). >>>>>>> >>>>>>> Is your /var/lib/varnish mounted on tmpfs? That could help too. >>>>>>> >>>>>>> 40K bans is a lot, are they ban-lurker friendly? >>>>>>> >>>>>>> -- >>>>>>> Guillaume Quintard >>>>>>> >>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>> >>>>>>>> Hello. >>>>>>>> >>>>>>>> I am having a critical problem with Varnish Cache in production for >>>>>>>> over a month and any help will be appreciated. >>>>>>>> The problem is that the Varnish child process is recurrently being >>>>>>>> restarted after 10~20h of use, with the following message: >>>>>>>> >>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>>>>> responding to CLI, killed it.
>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from >>>>>>>> ping: 400 CLI communication error >>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died >>>>>>>> signal=9 >>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete >>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started >>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>>> Child starts >>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>>> SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>>> >>>>>>>> The following link is the varnishstat output just 1 minute before a >>>>>>>> restart: >>>>>>>> >>>>>>>> https://pastebin.com/g0g5RVTs >>>>>>>> >>>>>>>> Environment: >>>>>>>> >>>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>>> Installed using pre-built package from official repo at >>>>>>>> packagecloud.io >>>>>>>> CPU 2x2.9 GHz >>>>>>>> Mem 3.69 GiB >>>>>>>> Running inside a Docker container >>>>>>>> NFILES=131072 >>>>>>>> MEMLOCK=82000 >>>>>>>> >>>>>>>> Additional info: >>>>>>>> >>>>>>>> - I need to cache a large number of objects and the cache should >>>>>>>> last for almost a week, so I have set up a 450G storage space, I don't know >>>>>>>> if this is a problem; >>>>>>>> - I use ban a lot. There were about 40k bans in the system just >>>>>>>> before the last crash. I really don't know if this is too much or may have >>>>>>>> anything to do with it; >>>>>>>> - No registered CPU spikes (almost always around 30%); >>>>>>>> - No panic is reported, the only info I can retrieve is from syslog; >>>>>>>> - During all this time, even moments before the crashes, everything >>>>>>>> is okay and requests are being answered very fast.
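The "not responding to CLI, killed it." line quoted above is regular enough to alert on automatically. As a minimal illustrative sketch (not part of Varnish; the helper name is invented), a filter that pulls the PIDs of killed children out of a syslog stream:

```python
import re

# Matches the manager's message when it kills an unresponsive child, e.g.
#   varnishd[11816]: Child (11824) not responding to CLI, killed it.
KILL_RE = re.compile(
    r"varnishd\[\d+\]: Child \((\d+)\) not responding to CLI, killed it\."
)

def find_child_kills(log_lines):
    """Return the PIDs of child processes killed for missing CLI pings."""
    pids = []
    for line in log_lines:
        m = KILL_RE.search(line)
        if m:
            pids.append(int(m.group(1)))
    return pids
```

Fed the syslog excerpt above, this reports child 11824; counting how often it fires over time gives a cheap restart-frequency metric.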
>>>>>>>> >>>>>>>> Best, >>>>>>>> Stefano Baldo >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> varnish-misc mailing list >>>>>>>> varnish-misc at varnish-cache.org >>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joh.hendriks at gmail.com Wed Jun 28 08:58:44 2017 From: joh.hendriks at gmail.com (Johan Hendriks) Date: Wed, 28 Jun 2017 10:58:44 +0200 Subject: Varnish performance with phpinfo In-Reply-To: References: <6fa4576d-d25e-b770-44da-98877379a815@gmail.com> <83029bff-6f19-5d12-0514-fa6441ecbd6a@gmail.com> Message-ID: I created a .html file from the php.info page and the results are the same. So I think it is a local problem on the client. Thank you for your time and sorry for the noise. Regards, Johan Hendriks On 23/06/2017 at 17:36, Guillaume Quintard wrote: > Simple way to test: grow the info.html size :-) > > -- > Guillaume Quintard > > On Fri, Jun 23, 2017 at 4:52 PM, Johan Hendriks > > wrote: > > Thanks for your answer. > I was thinking about that also, but I could not find anything that > pointed in that direction. > But should I hit that limit also with the info.html file then, or > could it be the size of the page? > The info.html is of course way smaller than the whole php.info > page. > > regards > Johan > > > On 23/06/2017 at 10:58, Guillaume Quintard wrote: >> Stupid question but, aren't you being limited by your client, or >> a firewall, maybe? >> >> -- >> Guillaume Quintard >> >> On Fri, Jun 2, 2017 at 12:06 PM, Johan Hendriks >> > wrote: >> >> Hello all, First sorry for the long email. >> I have a strange issue with varnish. At least I think it is >> strange. >> >> We started some tests with varnish, but we have an issue. >> >> I am running varnish 4.1.6 on FreeBSD 11.1-prerelease.
Where >> varnish listens on port 82 and apache on 80. This is just for >> the tests. >> We use the following start options. >> >> # Varnish >> varnishd_enable="YES" >> varnishd_listen="192.168.2.247:82 " >> varnishd_pidfile="/var/run/varnishd.pid" >> varnishd_storage="default=malloc,2024M" >> varnishd_config="/usr/local/etc/varnish/default.vcl" >> varnishd_hash="critbit" >> varnishd_admin=":6082" >> varnishncsa_enable="YES" >> >> We did a test with a static page and that went fine. First we >> see it is not cached, the second attempt is cached. >> >> root at desk:~ # curl -I www.testdomain.nl:82/info.html >> >> HTTP/1.1 200 OK >> Date: Fri, 02 Jun 2017 09:19:52 GMT >> Last-Modified: Thu, 01 Jun 2017 12:50:37 GMT >> ETag: "cf4-550e57bc1f812" >> Content-Length: 3316 >> Content-Type: text/html >> cache-control: max-age = 259200 >> X-Varnish: 2 >> Age: 0 >> Via: 1.1 varnish-v4 >> Server: varnish >> X-Powered-By: My Varnish >> X-Cache: MISS >> Accept-Ranges: bytes >> Connection: keep-alive >> >> root at desk:~ # curl -I www.testdomain.nl:82/info.html >> >> HTTP/1.1 200 OK >> Date: Fri, 02 Jun 2017 09:19:52 GMT >> Last-Modified: Thu, 01 Jun 2017 12:50:37 GMT >> ETag: "cf4-550e57bc1f812" >> Content-Length: 3316 >> Content-Type: text/html >> cache-control: max-age = 259200 >> X-Varnish: 5 3 >> Age: 6 >> Via: 1.1 varnish-v4 >> Server: varnish >> X-Powered-By: My Varnish >> X-Cache: HIT >> Accept-Ranges: bytes >> Connection: keep-alive >> >> If I benchmark the server I get the following. >> First directly to Apache >> >> root at testserver:~ # bombardier -c400 -n10000 >> http://www.testdomain.nl/info.html >> >> Bombarding http://www.testdomain.nl/info.html >> with 10000 requests >> using 400 connections >> 10000 / 10000 >> [=============================================================] >> 100.00% 0s >> Done!
>> Statistics Avg Stdev Max >> Reqs/sec 12459.00 898.32 13301 >> Latency 31.04ms 25.28ms 280.90ms >> HTTP codes: >> 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0 >> others - 0 >> Throughput: 42.16MB/s >> >> This is via varnish. So that works as intended. >> Varnish does its job and serves the page better. >> >> root at testserver:~ # bombardier -c400 -n10000 >> http://www.testdomain.nl:82/info.html >> >> Bombarding http://www.testdomain.nl:82/info.html >> with 10000 requests >> using 400 connections >> 10000 / 10000 >> [=============================================================] >> 100.00% 0s >> Done! >> Statistics Avg Stdev Max >> Reqs/sec 19549.00 7649.32 24313 >> Latency 17.90ms 66.77ms 485.07ms >> HTTP codes: >> 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0 >> others - 0 >> Throughput: 71.58MB/s >> >> >> The next one is against an info.php file, which runs phpinfo(); >> >> So first against the server without varnish. >> >> root at testserver:~ # bombardier -c400 -n10000 >> http://www.testdomain.nl/info.php >> >> Bombarding http://www.testdomain.nl/info.php >> with 10000 requests using >> 400 connections >> 10000 / 10000 >> [============================================================] >> 100.00% 11s >> Done! >> Statistics Avg Stdev Max >> Reqs/sec 828.00 127.66 1010 >> Latency 472.10ms 59.10ms 740.43ms >> HTTP codes: >> 1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0 >> others - 0 >> Throughput: 75.51MB/s >> >> But then against the server with varnish.
>> So we make sure it is in cache >> >> root at desk:~ # curl -I www.testdomain.nl:82/info.php >> >> HTTP/1.1 200 OK >> Date: Fri, 02 Jun 2017 09:36:16 GMT >> Content-Type: text/html; charset=UTF-8 >> cache-control: max-age = 259200 >> X-Varnish: 7 >> Age: 0 >> Via: 1.1 varnish-v4 >> Server: varnish >> X-Powered-By: My Varnish >> X-Cache: MISS >> Accept-Ranges: bytes >> Connection: keep-alive >> >> root at desk:~ # curl -I www.testdomain.nl:82/info.php >> >> HTTP/1.1 200 OK >> Date: Fri, 02 Jun 2017 09:36:16 GMT >> Content-Type: text/html; charset=UTF-8 >> cache-control: max-age = 259200 >> X-Varnish: 10 8 >> Age: 2 >> Via: 1.1 varnish-v4 >> Server: varnish >> X-Powered-By: My Varnish >> X-Cache: HIT >> Accept-Ranges: bytes >> Connection: keep-alive >> >> So it is in cache now. >> root at testserver:~ # bombardier -c400 -n10000 >> http://www.testdomain.nl:82/info.php >> >> Bombarding http://www.testdomain.nl:82/info.php >> with 10000 requests >> using 400 connections >> 10000 / 10000 >> [===========================================================================================================================================================================================================] >> 100.00% 8s >> Done! >> Statistics Avg Stdev Max >> Reqs/sec 1179.00 230.77 1981 >> Latency 219.94ms 340.29ms 2.00s >> HTTP codes: >> 1xx - 0, 2xx - 9938, 3xx - 0, 4xx - 0, 5xx - 0 >> others - 62 >> Errors: >> dialing to the given TCP address timed out - 62 >> Throughput: 83.16MB/s >> >> I expected this to be much more in favour of varnish, but it >> even generated some errors! Time taken is lower but I >> expected it to be much faster. Also, the 62 errors are not good, >> I guess.
>> >> I do see the following with varnish log >> * << Request >> 11141123 >> - Begin req 1310723 rxreq >> - Timestamp Start: 1496396250.098654 0.000000 0.000000 >> - Timestamp Req: 1496396250.098654 0.000000 0.000000 >> - ReqStart 192.168.2.39 14818 >> - ReqMethod GET >> - ReqURL /info.php >> - ReqProtocol HTTP/1.1 >> - ReqHeader User-Agent: fasthttp >> - ReqHeader Host: www.testdomain.nl:82 >> >> - ReqHeader X-Forwarded-For: 192.168.2.39 >> - VCL_call RECV >> - ReqUnset X-Forwarded-For: 192.168.2.39 >> - ReqHeader X-Forwarded-For: 192.168.2.39, 192.168.2.39 >> - VCL_return hash >> - VCL_call HASH >> - VCL_return lookup >> - Hit 8 >> - VCL_call HIT >> - VCL_return deliver >> - RespProtocol HTTP/1.1 >> - RespStatus 200 >> - RespReason OK >> - RespHeader Date: Fri, 02 Jun 2017 09:36:16 GMT >> - RespHeader Server: Apache/2.4.25 (FreeBSD) OpenSSL/1.0.2l >> - RespHeader X-Powered-By: PHP/7.0.19 >> - RespHeader Content-Type: text/html; charset=UTF-8 >> - RespHeader cache-control: max-age = 259200 >> - RespHeader X-Varnish: 11141123 8 >> - RespHeader Age: 73 >> - RespHeader Via: 1.1 varnish-v4 >> - VCL_call DELIVER >> - RespUnset Server: Apache/2.4.25 (FreeBSD) OpenSSL/1.0.2l >> - RespHeader Server: varnish >> - RespUnset X-Powered-By: PHP/7.0.19 >> - RespHeader X-Powered-By: My Varnish >> - RespHeader X-Cache: HIT >> - VCL_return deliver >> - Timestamp Process: 1496396250.098712 0.000058 0.000058 >> - RespHeader Accept-Ranges: bytes >> - RespHeader Content-Length: 95200 >> - Debug "RES_MODE 2" >> - RespHeader Connection: keep-alive >> - Debug "Hit idle send timeout, wrote = >> 89972/95508; retrying" >> - Debug "Write error, retval = -1, len = 5536, >> errno = Resource temporarily unavailable" >> - Timestamp Resp: 1496396371.131526 121.032872 121.032814 >> - ReqAcct 82 0 82 308 95200 95508 >> - End >> >> Sometimes I see this Debug line also - Debug >> "Write error, retval = -1, len = 95563, errno = Broken pipe" >> >> >> I also installed varnish 5.1.2 but the
results are the same. >> Is there something I'm missing? >> >> My vcl file is pretty basic. >> >> https://pastebin.com/rbb42x7h >> >> Thanks all for your time. >> >> regards >> Johan >> >> >> _______________________________________________ >> varnish-misc mailing list >> varnish-misc at varnish-cache.org >> >> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >> >> >> > > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanobaldo at gmail.com Wed Jun 28 13:20:16 2017 From: stefanobaldo at gmail.com (Stefano Baldo) Date: Wed, 28 Jun 2017 10:20:16 -0300 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Hi Guillaume. I increased the cli_timeout yesterday to 900sec (15min) and it restarted anyway, which seems to indicate that the thread is really stalled. This was 1 minute after the last restart: MAIN.n_object 3908216 . object structs made SMF.s0.g_alloc 7794510 . Allocations outstanding I've just changed the I/O scheduler to noop to see what happens. One interesting thing I've found is about the memory usage. In the 1st minute of use: MemTotal: 3865572 kB MemFree: 120768 kB MemAvailable: 2300268 kB 1 minute before a restart: MemTotal: 3865572 kB MemFree: 82480 kB MemAvailable: 68316 kB It seems like the system is possibly running out of memory. When calling varnishd, I'm specifying only "-s file,..." as storage. I see in some examples that it is common to use "-s file" AND "-s malloc" together. Should I be passing "-s malloc" as well to somehow try to limit the memory usage by varnishd?
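The two /proc/meminfo snapshots quoted above can be compared mechanically. A minimal sketch, assuming the snapshots are parsed into dicts of kB values; the 5% warning threshold is an arbitrary illustration, not a Varnish or kernel default:

```python
def mem_available_ratio(meminfo):
    """Fraction of total memory the kernel reports as reclaimable/available.

    meminfo: dict of /proc/meminfo fields, values in kB.
    """
    return meminfo["MemAvailable"] / meminfo["MemTotal"]

# The two snapshots quoted in the message above:
first_minute = {"MemTotal": 3865572, "MemFree": 120768, "MemAvailable": 2300268}
before_restart = {"MemTotal": 3865572, "MemFree": 82480, "MemAvailable": 68316}

for name, snap in (("first minute", first_minute), ("before restart", before_restart)):
    ratio = mem_available_ratio(snap)
    # Flag the box as memory-starved below the (arbitrary) 5% threshold.
    print(f"{name}: {ratio:.1%} available" + (" - LOW" if ratio < 0.05 else ""))
```

Run on these numbers, it shows MemAvailable falling from roughly 59% of RAM to under 2%, which supports the "running out of memory" reading: MemFree barely moves, but the reclaimable page cache backing the file storage has been consumed.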
Best, Stefano On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard < guillaume at varnish-software.com> wrote: > Sadly, nothing suspicious here, you can still try: > - bumping the cli_timeout > - changing your disk scheduler > - changing the advice option of the file storage > > I'm still convinced this is due to Varnish getting stuck waiting for the > disk because of the file storage fragmentation. > > Maybe you could look at SMF.*.g_alloc and compare it to the number of > objects. Ideally, we would have a 1:1 relation between objects and > allocations. If that number drops prior to a restart, that would be a good > clue. > > > -- > Guillaume Quintard > > On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo > wrote: > >> Hi Guillaume. >> >> It keeps restarting. >> Would you mind taking a quick look in the following VCL file to check if >> you find anything suspicious? >> >> Thank you very much. >> >> Best, >> Stefano >> >> vcl 4.0; >> >> import std; >> >> backend default { >> .host = "sites-web-server-lb"; >> .port = "80"; >> } >> >> include "/etc/varnish/bad_bot_detection.vcl"; >> >> sub vcl_recv { >> call bad_bot_detection; >> >> if (req.url == "/nocache" || req.url == "/version") { >> return(pass); >> } >> >> unset req.http.Cookie; >> if (req.method == "PURGE") { >> ban("obj.http.x-host == " + req.http.host + " && >> obj.http.x-user-agent !~ Googlebot"); >> return(synth(750)); >> } >> >> set req.url = regsuball(req.url, "(?> } >> >> sub vcl_synth { >> if (resp.status == 750) { >> set resp.status = 200; >> synthetic("PURGED => " + req.url); >> return(deliver); >> } elsif (resp.status == 501) { >> set resp.status = 200; >> set resp.http.Content-Type = "text/html; charset=utf-8"; >> synthetic(std.fileread("/etc/varnish/pages/invalid_domain.html")); >> return(deliver); >> } >> } >> >> sub vcl_backend_response { >> unset beresp.http.Set-Cookie; >> set beresp.http.x-host = bereq.http.host; >> set beresp.http.x-user-agent = bereq.http.user-agent; >> >> if (bereq.url == 
"/themes/basic/assets/theme.min.css" >> || bereq.url == "/api/events/PAGEVIEW" >> || bereq.url ~ "^\/assets\/img\/") { >> set beresp.http.Cache-Control = "max-age=0"; >> } else { >> unset beresp.http.Cache-Control; >> } >> >> if (beresp.status == 200 || >> beresp.status == 301 || >> beresp.status == 302 || >> beresp.status == 404) { >> if (bereq.url ~ "\&ordenar=aleatorio$") { >> set beresp.http.X-TTL = "1d"; >> set beresp.ttl = 1d; >> } else { >> set beresp.http.X-TTL = "1w"; >> set beresp.ttl = 1w; >> } >> } >> >> if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") >> { >> set beresp.do_gzip = true; >> } >> } >> >> sub vcl_pipe { >> set bereq.http.connection = "close"; >> return (pipe); >> } >> >> sub vcl_deliver { >> unset resp.http.x-host; >> unset resp.http.x-user-agent; >> } >> >> sub vcl_backend_error { >> if (beresp.status == 502 || beresp.status == 503 || beresp.status == >> 504) { >> set beresp.status = 200; >> set beresp.http.Content-Type = "text/html; charset=utf-8"; >> synthetic(std.fileread("/etc/varnish/pages/maintenance.html")); >> return (deliver); >> } >> } >> >> sub vcl_hash { >> if (req.http.User-Agent ~ "Google Page Speed") { >> hash_data("Google Page Speed"); >> } elsif (req.http.User-Agent ~ "Googlebot") { >> hash_data("Googlebot"); >> } >> } >> >> sub vcl_deliver { >> if (resp.status == 501) { >> return (synth(resp.status)); >> } >> if (obj.hits > 0) { >> set resp.http.X-Cache = "hit"; >> } else { >> set resp.http.X-Cache = "miss"; >> } >> } >> >> >> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < >> guillaume at varnish-software.com> wrote: >> >>> Nice! It may have been the cause, time will tell.can you report back in >>> a few days to let us know? >>> -- >>> Guillaume Quintard >>> >>> On Jun 26, 2017 20:21, "Stefano Baldo" wrote: >>> >>>> Hi Guillaume. >>>> >>>> I think things will start to going better now after changing the bans. 
>>>> This is how my last varnishstat looked like moments before a crash >>>> regarding the bans: >>>> >>>> MAIN.bans 41336 . Count of bans >>>> MAIN.bans_completed 37967 . Number of bans marked >>>> 'completed' >>>> MAIN.bans_obj 0 . Number of bans using >>>> obj.* >>>> MAIN.bans_req 41335 . Number of bans using >>>> req.* >>>> MAIN.bans_added 41336 0.68 Bans added >>>> MAIN.bans_deleted 0 0.00 Bans deleted >>>> >>>> And this is how it looks like now: >>>> >>>> MAIN.bans 2 . Count of bans >>>> MAIN.bans_completed 1 . Number of bans marked >>>> 'completed' >>>> MAIN.bans_obj 2 . Number of bans using >>>> obj.* >>>> MAIN.bans_req 0 . Number of bans using >>>> req.* >>>> MAIN.bans_added 2016 0.69 Bans added >>>> MAIN.bans_deleted 2014 0.69 Bans deleted >>>> >>>> Before the changes, bans were never deleted! >>>> Now the bans are added and quickly deleted after a minute or even a >>>> couple of seconds. >>>> >>>> May this was the cause of the problem? It seems like varnish was having >>>> a large number of bans to manage and test against. >>>> I will let it ride now. Let's see if the problem persists or it's gone! >>>> :-) >>>> >>>> Best, >>>> Stefano >>>> >>>> >>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >>>> guillaume at varnish-software.com> wrote: >>>> >>>>> Looking good! >>>>> >>>>> -- >>>>> Guillaume Quintard >>>>> >>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo >>>> > wrote: >>>>> >>>>>> Hi Guillaume, >>>>>> >>>>>> Can the following be considered "ban lurker friendly"? 
>>>>>> >>>>>> sub vcl_backend_response { >>>>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>> } >>>>>> >>>>>> sub vcl_recv { >>>>>> if (req.method == "PURGE") { >>>>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>> return(synth(750)); >>>>>> } >>>>>> } >>>>>> >>>>>> sub vcl_deliver { >>>>>> unset resp.http.x-url; >>>>>> unset resp.http.x-user-agent; >>>>>> } >>>>>> >>>>>> Best, >>>>>> Stefano >>>>>> >>>>>> >>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>>>> guillaume at varnish-software.com> wrote: >>>>>> >>>>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>>>> expression. Easiest way is to stash the host, user-agent and url in >>>>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>>>> >>>>>>> I don't think you need to expand the VSL at all. >>>>>>> >>>>>>> -- >>>>>>> Guillaume Quintard >>>>>>> >>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" >>>>>>> wrote: >>>>>>> >>>>>>> Hi Guillaume. >>>>>>> >>>>>>> Thanks for answering. >>>>>>> >>>>>>> I'm using a SSD disk. I've changed from ext4 to ext2 to increase >>>>>>> performance but it stills restarting. >>>>>>> Also, I checked the I/O performance for the disk and there is no >>>>>>> signal of overhead. >>>>>>> >>>>>>> I've changed the /var/lib/varnish to a tmpfs and increased its 80m >>>>>>> default size passing "-l 200m,20m" to varnishd and using >>>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was >>>>>>> a problem here. After a couple of hours varnish died and I received a "no >>>>>>> space left on device" message - deleting the /var/lib/varnish solved the >>>>>>> problem and varnish was up again, but it's weird because there was free >>>>>>> memory on the host to be used with the tmpfs directory, so I don't know >>>>>>> what could have happened. 
I will try to stop increasing the >>>>>>> /var/lib/varnish size. >>>>>>> >>>>>>> Anyway, I am worried about the bans. You asked me if the bans are >>>>>>> lurker friendly. Well, I don't think so. My bans are created this way: >>>>>>> >>>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url >>>>>>> + " && req.http.User-Agent !~ Googlebot"); >>>>>>> >>>>>>> Are they lurker friendly? I took a quick look at the >>>>>>> documentation and it looks like they're not. >>>>>>> >>>>>>> Best, >>>>>>> Stefano >>>>>>> >>>>>>> >>>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>>>> guillaume at varnish-software.com> wrote: >>>>>>> >>>>>>>> Hi Stefano, >>>>>>>> >>>>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets >>>>>>>> stuck trying to push/pull data and can't make time to reply to the CLI. I'd >>>>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>>>>> >>>>>>>> After some time, the file storage is terrible on a hard drive (SSDs >>>>>>>> take a bit more time to degrade) because of fragmentation. One solution to >>>>>>>> help the disks cope is to overprovision them if they're SSDs, and you can >>>>>>>> try different advice values in the file storage definition on the command line >>>>>>>> (last parameter, after granularity). >>>>>>>> >>>>>>>> Is your /var/lib/varnish mounted on tmpfs? That could help too. >>>>>>>> >>>>>>>> 40K bans is a lot, are they ban-lurker friendly? >>>>>>>> >>>>>>>> -- >>>>>>>> Guillaume Quintard >>>>>>>> >>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hello. >>>>>>>>> >>>>>>>>> I am having a critical problem with Varnish Cache in production >>>>>>>>> for over a month and any help will be appreciated.
>>>>>>>>> The problem is that Varnish child process is recurrently being >>>>>>>>> restarted after 10~20h of use, with the following message: >>>>>>>>> >>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>>>>>> responding to CLI, killed it. >>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply >>>>>>>>> from ping: 400 CLI communication error >>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died >>>>>>>>> signal=9 >>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup >>>>>>>>> complete >>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started >>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>>>> Child starts >>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>>>> SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>>>> >>>>>>>>> The following link is the varnishstat output just 1 minute before >>>>>>>>> a restart: >>>>>>>>> >>>>>>>>> https://pastebin.com/g0g5RVTs >>>>>>>>> >>>>>>>>> Environment: >>>>>>>>> >>>>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>>>> Installed using pre-built package from official repo at >>>>>>>>> packagecloud.io >>>>>>>>> CPU 2x2.9 GHz >>>>>>>>> Mem 3.69 GiB >>>>>>>>> Running inside a Docker container >>>>>>>>> NFILES=131072 >>>>>>>>> MEMLOCK=82000 >>>>>>>>> >>>>>>>>> Additional info: >>>>>>>>> >>>>>>>>> - I need to cache a large number of objets and the cache should >>>>>>>>> last for almost a week, so I have set up a 450G storage space, I don't know >>>>>>>>> if this is a problem; >>>>>>>>> - I use ban a lot. There was about 40k bans in the system just >>>>>>>>> before the last crash. 
I really don't know if this is too much or may have >>>>>>>>> anything to do with it; >>>>>>>>> - No registered CPU spikes (almost always around 30%); >>>>>>>>> - No panic is reported, the only info I can retrieve is from >>>>>>>>> syslog; >>>>>>>>> - During all this time, even moments before the crashes, >>>>>>>>> everything is okay and requests are being answered very fast. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Stefano Baldo >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> varnish-misc mailing list >>>>>>>>> varnish-misc at varnish-cache.org >>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guillaume at varnish-software.com Wed Jun 28 13:26:20 2017 From: guillaume at varnish-software.com (Guillaume Quintard) Date: Wed, 28 Jun 2017 15:26:20 +0200 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Hi, can you look at "varnishstat -1 | grep g_bytes" and see if it matches the memory you are seeing? -- Guillaume Quintard On Wed, Jun 28, 2017 at 3:20 PM, Stefano Baldo wrote: > Hi Guillaume. > > I increased the cli_timeout yesterday to 900sec (15min) and it restarted > anyway, which seems to indicate that the thread is really stalled. > > This was 1 minute after the last restart: > > MAIN.n_object 3908216 . object structs made > SMF.s0.g_alloc 7794510 . Allocations outstanding > > I've just changed the I/O scheduler to noop to see what happens. > > One interesting thing I've found is about the memory usage. > > In the 1st minute of use: > MemTotal: 3865572 kB > MemFree: 120768 kB > MemAvailable: 2300268 kB > > 1 minute before a restart: > MemTotal: 3865572 kB > MemFree: 82480 kB > MemAvailable: 68316 kB > > It seems like the system is possibly running out of memory. > > When calling varnishd, I'm specifying only "-s file,..."
as storage. I see > in some examples that is common to use "-s file" AND "-s malloc" together. > Should I be passing "-s malloc" as well to somehow try to limit the memory > usage by varnishd? > > Best, > Stefano > > > On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard < > guillaume at varnish-software.com> wrote: > >> Sadly, nothing suspicious here, you can still try: >> - bumping the cli_timeout >> - changing your disk scheduler >> - changing the advice option of the file storage >> >> I'm still convinced this is due to Varnish getting stuck waiting for the >> disk because of the file storage fragmentation. >> >> Maybe you could look at SMF.*.g_alloc and compare it to the number of >> objects. Ideally, we would have a 1:1 relation between objects and >> allocations. If that number drops prior to a restart, that would be a good >> clue. >> >> >> -- >> Guillaume Quintard >> >> On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo >> wrote: >> >>> Hi Guillaume. >>> >>> It keeps restarting. >>> Would you mind taking a quick look in the following VCL file to check if >>> you find anything suspicious? >>> >>> Thank you very much. 
>>> >>> Best, >>> Stefano >>> >>> vcl 4.0; >>> >>> import std; >>> >>> backend default { >>> .host = "sites-web-server-lb"; >>> .port = "80"; >>> } >>> >>> include "/etc/varnish/bad_bot_detection.vcl"; >>> >>> sub vcl_recv { >>> call bad_bot_detection; >>> >>> if (req.url == "/nocache" || req.url == "/version") { >>> return(pass); >>> } >>> >>> unset req.http.Cookie; >>> if (req.method == "PURGE") { >>> ban("obj.http.x-host == " + req.http.host + " && >>> obj.http.x-user-agent !~ Googlebot"); >>> return(synth(750)); >>> } >>> >>> set req.url = regsuball(req.url, "(?>> } >>> >>> sub vcl_synth { >>> if (resp.status == 750) { >>> set resp.status = 200; >>> synthetic("PURGED => " + req.url); >>> return(deliver); >>> } elsif (resp.status == 501) { >>> set resp.status = 200; >>> set resp.http.Content-Type = "text/html; charset=utf-8"; >>> synthetic(std.fileread("/etc/varnish/pages/invalid_domain.html")); >>> return(deliver); >>> } >>> } >>> >>> sub vcl_backend_response { >>> unset beresp.http.Set-Cookie; >>> set beresp.http.x-host = bereq.http.host; >>> set beresp.http.x-user-agent = bereq.http.user-agent; >>> >>> if (bereq.url == "/themes/basic/assets/theme.min.css" >>> || bereq.url == "/api/events/PAGEVIEW" >>> || bereq.url ~ "^\/assets\/img\/") { >>> set beresp.http.Cache-Control = "max-age=0"; >>> } else { >>> unset beresp.http.Cache-Control; >>> } >>> >>> if (beresp.status == 200 || >>> beresp.status == 301 || >>> beresp.status == 302 || >>> beresp.status == 404) { >>> if (bereq.url ~ "\&ordenar=aleatorio$") { >>> set beresp.http.X-TTL = "1d"; >>> set beresp.ttl = 1d; >>> } else { >>> set beresp.http.X-TTL = "1w"; >>> set beresp.ttl = 1w; >>> } >>> } >>> >>> if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") >>> { >>> set beresp.do_gzip = true; >>> } >>> } >>> >>> sub vcl_pipe { >>> set bereq.http.connection = "close"; >>> return (pipe); >>> } >>> >>> sub vcl_deliver { >>> unset resp.http.x-host; >>> unset resp.http.x-user-agent; >>> } >>> >>> 
sub vcl_backend_error { >>> if (beresp.status == 502 || beresp.status == 503 || beresp.status == >>> 504) { >>> set beresp.status = 200; >>> set beresp.http.Content-Type = "text/html; charset=utf-8"; >>> synthetic(std.fileread("/etc/varnish/pages/maintenance.html")); >>> return (deliver); >>> } >>> } >>> >>> sub vcl_hash { >>> if (req.http.User-Agent ~ "Google Page Speed") { >>> hash_data("Google Page Speed"); >>> } elsif (req.http.User-Agent ~ "Googlebot") { >>> hash_data("Googlebot"); >>> } >>> } >>> >>> sub vcl_deliver { >>> if (resp.status == 501) { >>> return (synth(resp.status)); >>> } >>> if (obj.hits > 0) { >>> set resp.http.X-Cache = "hit"; >>> } else { >>> set resp.http.X-Cache = "miss"; >>> } >>> } >>> >>> >>> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < >>> guillaume at varnish-software.com> wrote: >>> >>>> Nice! It may have been the cause, time will tell.can you report back in >>>> a few days to let us know? >>>> -- >>>> Guillaume Quintard >>>> >>>> On Jun 26, 2017 20:21, "Stefano Baldo" wrote: >>>> >>>>> Hi Guillaume. >>>>> >>>>> I think things will start to going better now after changing the bans. >>>>> This is how my last varnishstat looked like moments before a crash >>>>> regarding the bans: >>>>> >>>>> MAIN.bans 41336 . Count of bans >>>>> MAIN.bans_completed 37967 . Number of bans marked >>>>> 'completed' >>>>> MAIN.bans_obj 0 . Number of bans using >>>>> obj.* >>>>> MAIN.bans_req 41335 . Number of bans using >>>>> req.* >>>>> MAIN.bans_added 41336 0.68 Bans added >>>>> MAIN.bans_deleted 0 0.00 Bans deleted >>>>> >>>>> And this is how it looks like now: >>>>> >>>>> MAIN.bans 2 . Count of bans >>>>> MAIN.bans_completed 1 . Number of bans marked >>>>> 'completed' >>>>> MAIN.bans_obj 2 . Number of bans using >>>>> obj.* >>>>> MAIN.bans_req 0 . Number of bans using >>>>> req.* >>>>> MAIN.bans_added 2016 0.69 Bans added >>>>> MAIN.bans_deleted 2014 0.69 Bans deleted >>>>> >>>>> Before the changes, bans were never deleted! 
>>>>> Now the bans are added and quickly deleted after a minute or even a >>>>> couple of seconds. >>>>> >>>>> May this was the cause of the problem? It seems like varnish was >>>>> having a large number of bans to manage and test against. >>>>> I will let it ride now. Let's see if the problem persists or it's >>>>> gone! :-) >>>>> >>>>> Best, >>>>> Stefano >>>>> >>>>> >>>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >>>>> guillaume at varnish-software.com> wrote: >>>>> >>>>>> Looking good! >>>>>> >>>>>> -- >>>>>> Guillaume Quintard >>>>>> >>>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo < >>>>>> stefanobaldo at gmail.com> wrote: >>>>>> >>>>>>> Hi Guillaume, >>>>>>> >>>>>>> Can the following be considered "ban lurker friendly"? >>>>>>> >>>>>>> sub vcl_backend_response { >>>>>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>>> } >>>>>>> >>>>>>> sub vcl_recv { >>>>>>> if (req.method == "PURGE") { >>>>>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>>> return(synth(750)); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> sub vcl_deliver { >>>>>>> unset resp.http.x-url; >>>>>>> unset resp.http.x-user-agent; >>>>>>> } >>>>>>> >>>>>>> Best, >>>>>>> Stefano >>>>>>> >>>>>>> >>>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>>>>> guillaume at varnish-software.com> wrote: >>>>>>> >>>>>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>>>>> expression. Easiest way is to stash the host, user-agent and url in >>>>>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>>>>> >>>>>>>> I don't think you need to expand the VSL at all. >>>>>>>> >>>>>>>> -- >>>>>>>> Guillaume Quintard >>>>>>>> >>>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" >>>>>>>> wrote: >>>>>>>> >>>>>>>> Hi Guillaume. >>>>>>>> >>>>>>>> Thanks for answering. >>>>>>>> >>>>>>>> I'm using a SSD disk. 
I've changed from ext4 to ext2 to increase >>>>>>>> performance but it stills restarting. >>>>>>>> Also, I checked the I/O performance for the disk and there is no >>>>>>>> signal of overhead. >>>>>>>> >>>>>>>> I've changed the /var/lib/varnish to a tmpfs and increased its 80m >>>>>>>> default size passing "-l 200m,20m" to varnishd and using >>>>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There >>>>>>>> was a problem here. After a couple of hours varnish died and I received a >>>>>>>> "no space left on device" message - deleting the /var/lib/varnish solved >>>>>>>> the problem and varnish was up again, but it's weird because there was free >>>>>>>> memory on the host to be used with the tmpfs directory, so I don't know >>>>>>>> what could have happened. I will try to stop increasing the >>>>>>>> /var/lib/varnish size. >>>>>>>> >>>>>>>> Anyway, I am worried about the bans. You asked me if the bans are >>>>>>>> lurker friedly. Well, I don't think so. My bans are created this way: >>>>>>>> >>>>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + >>>>>>>> req.url + " && req.http.User-Agent !~ Googlebot"); >>>>>>>> >>>>>>>> Are they lurker friendly? I was taking a quick look and the >>>>>>>> documentation and it looks like they're not. >>>>>>>> >>>>>>>> Best, >>>>>>>> Stefano >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>> >>>>>>>>> Hi Stefano, >>>>>>>>> >>>>>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets >>>>>>>>> stuck trying to push/pull data and can't make time to reply to the CLI. I'd >>>>>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>>>>>> >>>>>>>>> After some time, the file storage is terrible on a hard drive >>>>>>>>> (SSDs take a bit more time to degrade) because of fragmentation. 
One >>>>>>>>> solution to help the disks cope is to overprovision themif they're SSDs, >>>>>>>>> and you can try different advices in the file storage definition in the >>>>>>>>> command line (last parameter, after granularity). >>>>>>>>> >>>>>>>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>>>>>>> >>>>>>>>> 40K bans is a lot, are they ban-lurker friendly? >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Guillaume Quintard >>>>>>>>> >>>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hello. >>>>>>>>>> >>>>>>>>>> I am having a critical problem with Varnish Cache in production >>>>>>>>>> for over a month and any help will be appreciated. >>>>>>>>>> The problem is that Varnish child process is recurrently being >>>>>>>>>> restarted after 10~20h of use, with the following message: >>>>>>>>>> >>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>>>>>>> responding to CLI, killed it. >>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply >>>>>>>>>> from ping: 400 CLI communication error >>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died >>>>>>>>>> signal=9 >>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup >>>>>>>>>> complete >>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>> Started >>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>>>>> Child starts >>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>>>>> SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>>>>> >>>>>>>>>> The following link is the varnishstat output just 1 minute before >>>>>>>>>> a restart: >>>>>>>>>> >>>>>>>>>> https://pastebin.com/g0g5RVTs >>>>>>>>>> >>>>>>>>>> Environment: >>>>>>>>>> >>>>>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>>>>> Installed using pre-built package from official repo at 
>>>>>>>>>> packagecloud.io >>>>>>>>>> CPU 2x2.9 GHz >>>>>>>>>> Mem 3.69 GiB >>>>>>>>>> Running inside a Docker container >>>>>>>>>> NFILES=131072 >>>>>>>>>> MEMLOCK=82000 >>>>>>>>>> >>>>>>>>>> Additional info: >>>>>>>>>> >>>>>>>>>> - I need to cache a large number of objects and the cache should >>>>>>>>>> last for almost a week, so I have set up a 450G storage space, I don't know >>>>>>>>>> if this is a problem; >>>>>>>>>> - I use ban a lot. There was about 40k bans in the system just >>>>>>>>>> before the last crash. I really don't know if this is too much or may have >>>>>>>>>> anything to do with it; >>>>>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>>>>> - No panic is reported, the only info I can retrieve is from >>>>>>>>>> syslog; >>>>>>>>>> - During all the time, even moments before the crashes, >>>>>>>>>> everything is okay and requests are being responded to very fast. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Stefano Baldo >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> varnish-misc mailing list >>>>>>>>>> varnish-misc at varnish-cache.org >>>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanobaldo at gmail.com Wed Jun 28 13:39:52 2017 From: stefanobaldo at gmail.com (Stefano Baldo) Date: Wed, 28 Jun 2017 10:39:52 -0300 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Hi. root at 2c6c325b279f:/# varnishstat -1 | grep g_bytes SMA.Transient.g_bytes 519022 . Bytes outstanding SMF.s0.g_bytes 23662845952 . Bytes outstanding You mean g_bytes from SMA.Transient? I have set no malloc storage. 
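The "-s file" vs. "-s malloc" question above can be made concrete on the varnishd command line. The sketch below is only illustrative (paths and sizes are assumptions, not values from this thread): adding "-s malloc" would create a second, separate storage rather than cap memory use of the file storage, while naming the Transient storage bounds the malloc space Varnish otherwise allocates without limit for short-lived and uncacheable objects.

```sh
# Hypothetical invocation, illustrative paths and sizes only.
# "-s file" keeps object bodies in an mmap'ed file; the kernel page cache
# decides how much of it stays resident in RAM.
# Naming "Transient" caps the otherwise-unbounded malloc storage used for
# short-lived/uncacheable objects.
varnishd -a :80 \
    -f /etc/varnish/default.vcl \
    -s file,/var/lib/varnish/storage.bin,450G \
    -s Transient=malloc,256m
```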
On Wed, Jun 28, 2017 at 10:26 AM, Guillaume Quintard < guillaume at varnish-software.com> wrote: > Hi, > > can you look at "varnishstat -1 | grep g_bytes" and see if it matches > the memory you are seeing? > > -- > Guillaume Quintard > > On Wed, Jun 28, 2017 at 3:20 PM, Stefano Baldo > wrote: > >> Hi Guillaume. >> >> I increased the cli_timeout yesterday to 900sec (15min) and it restarted >> anyway, which seems to indicate that the thread is really stalled. >> >> This was 1 minute after the last restart: >> >> MAIN.n_object 3908216 . object structs made >> SMF.s0.g_alloc 7794510 . Allocations outstanding >> >> I've just changed the I/O Scheduler to noop to see what happens. >> >> One interesting thing I've found is about the memory usage. >> >> In the 1st minute of use: >> MemTotal: 3865572 kB >> MemFree: 120768 kB >> MemAvailable: 2300268 kB >> >> 1 minute before a restart: >> MemTotal: 3865572 kB >> MemFree: 82480 kB >> MemAvailable: 68316 kB >> >> It seems like the system is possibly running out of memory. >> >> When calling varnishd, I'm specifying only "-s file,..." as storage. I >> see in some examples that it is common to use "-s file" AND "-s malloc" >> together. Should I be passing "-s malloc" as well to somehow try to limit >> the memory usage by varnishd? >> >> Best, >> Stefano >> >> >> On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard < >> guillaume at varnish-software.com> wrote: >> >>> Sadly, nothing suspicious here, you can still try: >>> - bumping the cli_timeout >>> - changing your disk scheduler >>> - changing the advice option of the file storage >>> >>> I'm still convinced this is due to Varnish getting stuck waiting for the >>> disk because of the file storage fragmentation. >>> >>> Maybe you could look at SMF.*.g_alloc and compare it to the number of >>> objects. Ideally, we would have a 1:1 relation between objects and >>> allocations. If that number drops prior to a restart, that would be a good >>> clue. 
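The allocations-to-objects comparison suggested above is easy to script. A sketch, using the two counters quoted earlier in this thread as canned input; in live use, the here-document would be replaced by real `varnishstat -1` output:

```shell
# Ratio of file-storage allocations to cached objects; a ratio well above
# 1.0 suggests fragmentation (one object split across several extents).
# The counter lines below are the ones reported earlier in this thread.
ratio=$(awk '$1 == "SMF.s0.g_alloc" { a = $2 }
             $1 == "MAIN.n_object"  { o = $2 }
             END { printf "%.1f", a / o }' <<'EOF'
MAIN.n_object 3908216 . object structs made
SMF.s0.g_alloc 7794510 . Allocations outstanding
EOF
)
echo "allocations per object: $ratio"
```

With the thread's own numbers this works out to about 2.0 allocations per object, i.e. well away from the 1:1 ideal, which is consistent with the fragmentation hypothesis.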
>>> >>> >>> -- >>> Guillaume Quintard >>> >>> On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo >>> wrote: >>> >>>> Hi Guillaume. >>>> >>>> It keeps restarting. >>>> Would you mind taking a quick look in the following VCL file to check >>>> if you find anything suspicious? >>>> >>>> Thank you very much. >>>> >>>> Best, >>>> Stefano >>>> >>>> vcl 4.0; >>>> >>>> import std; >>>> >>>> backend default { >>>> .host = "sites-web-server-lb"; >>>> .port = "80"; >>>> } >>>> >>>> include "/etc/varnish/bad_bot_detection.vcl"; >>>> >>>> sub vcl_recv { >>>> call bad_bot_detection; >>>> >>>> if (req.url == "/nocache" || req.url == "/version") { >>>> return(pass); >>>> } >>>> >>>> unset req.http.Cookie; >>>> if (req.method == "PURGE") { >>>> ban("obj.http.x-host == " + req.http.host + " && >>>> obj.http.x-user-agent !~ Googlebot"); >>>> return(synth(750)); >>>> } >>>> >>>> set req.url = regsuball(req.url, "(?>>> } >>>> >>>> sub vcl_synth { >>>> if (resp.status == 750) { >>>> set resp.status = 200; >>>> synthetic("PURGED => " + req.url); >>>> return(deliver); >>>> } elsif (resp.status == 501) { >>>> set resp.status = 200; >>>> set resp.http.Content-Type = "text/html; charset=utf-8"; >>>> synthetic(std.fileread("/etc/varnish/pages/invalid_domain.html")); >>>> return(deliver); >>>> } >>>> } >>>> >>>> sub vcl_backend_response { >>>> unset beresp.http.Set-Cookie; >>>> set beresp.http.x-host = bereq.http.host; >>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>> >>>> if (bereq.url == "/themes/basic/assets/theme.min.css" >>>> || bereq.url == "/api/events/PAGEVIEW" >>>> || bereq.url ~ "^\/assets\/img\/") { >>>> set beresp.http.Cache-Control = "max-age=0"; >>>> } else { >>>> unset beresp.http.Cache-Control; >>>> } >>>> >>>> if (beresp.status == 200 || >>>> beresp.status == 301 || >>>> beresp.status == 302 || >>>> beresp.status == 404) { >>>> if (bereq.url ~ "\&ordenar=aleatorio$") { >>>> set beresp.http.X-TTL = "1d"; >>>> set beresp.ttl = 1d; >>>> } else { >>>> set 
beresp.http.X-TTL = "1w"; >>>> set beresp.ttl = 1w; >>>> } >>>> } >>>> >>>> if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") >>>> { >>>> set beresp.do_gzip = true; >>>> } >>>> } >>>> >>>> sub vcl_pipe { >>>> set bereq.http.connection = "close"; >>>> return (pipe); >>>> } >>>> >>>> sub vcl_deliver { >>>> unset resp.http.x-host; >>>> unset resp.http.x-user-agent; >>>> } >>>> >>>> sub vcl_backend_error { >>>> if (beresp.status == 502 || beresp.status == 503 || beresp.status == >>>> 504) { >>>> set beresp.status = 200; >>>> set beresp.http.Content-Type = "text/html; charset=utf-8"; >>>> synthetic(std.fileread("/etc/varnish/pages/maintenance.html")); >>>> return (deliver); >>>> } >>>> } >>>> >>>> sub vcl_hash { >>>> if (req.http.User-Agent ~ "Google Page Speed") { >>>> hash_data("Google Page Speed"); >>>> } elsif (req.http.User-Agent ~ "Googlebot") { >>>> hash_data("Googlebot"); >>>> } >>>> } >>>> >>>> sub vcl_deliver { >>>> if (resp.status == 501) { >>>> return (synth(resp.status)); >>>> } >>>> if (obj.hits > 0) { >>>> set resp.http.X-Cache = "hit"; >>>> } else { >>>> set resp.http.X-Cache = "miss"; >>>> } >>>> } >>>> >>>> >>>> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < >>>> guillaume at varnish-software.com> wrote: >>>> >>>>> Nice! It may have been the cause, time will tell.can you report back >>>>> in a few days to let us know? >>>>> -- >>>>> Guillaume Quintard >>>>> >>>>> On Jun 26, 2017 20:21, "Stefano Baldo" wrote: >>>>> >>>>>> Hi Guillaume. >>>>>> >>>>>> I think things will start to going better now after changing the bans. >>>>>> This is how my last varnishstat looked like moments before a crash >>>>>> regarding the bans: >>>>>> >>>>>> MAIN.bans 41336 . Count of bans >>>>>> MAIN.bans_completed 37967 . Number of bans >>>>>> marked 'completed' >>>>>> MAIN.bans_obj 0 . Number of bans using >>>>>> obj.* >>>>>> MAIN.bans_req 41335 . 
Number of bans using >>>>>> req.* >>>>>> MAIN.bans_added 41336 0.68 Bans added >>>>>> MAIN.bans_deleted 0 0.00 Bans deleted >>>>>> >>>>>> And this is how it looks like now: >>>>>> >>>>>> MAIN.bans 2 . Count of bans >>>>>> MAIN.bans_completed 1 . Number of bans >>>>>> marked 'completed' >>>>>> MAIN.bans_obj 2 . Number of bans using >>>>>> obj.* >>>>>> MAIN.bans_req 0 . Number of bans using >>>>>> req.* >>>>>> MAIN.bans_added 2016 0.69 Bans added >>>>>> MAIN.bans_deleted 2014 0.69 Bans deleted >>>>>> >>>>>> Before the changes, bans were never deleted! >>>>>> Now the bans are added and quickly deleted after a minute or even a >>>>>> couple of seconds. >>>>>> >>>>>> May this was the cause of the problem? It seems like varnish was >>>>>> having a large number of bans to manage and test against. >>>>>> I will let it ride now. Let's see if the problem persists or it's >>>>>> gone! :-) >>>>>> >>>>>> Best, >>>>>> Stefano >>>>>> >>>>>> >>>>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >>>>>> guillaume at varnish-software.com> wrote: >>>>>> >>>>>>> Looking good! >>>>>>> >>>>>>> -- >>>>>>> Guillaume Quintard >>>>>>> >>>>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo < >>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Guillaume, >>>>>>>> >>>>>>>> Can the following be considered "ban lurker friendly"? 
>>>>>>>> >>>>>>>> sub vcl_backend_response { >>>>>>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>>>> } >>>>>>>> >>>>>>>> sub vcl_recv { >>>>>>>> if (req.method == "PURGE") { >>>>>>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>>>> return(synth(750)); >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> sub vcl_deliver { >>>>>>>> unset resp.http.x-url; >>>>>>>> unset resp.http.x-user-agent; >>>>>>>> } >>>>>>>> >>>>>>>> Best, >>>>>>>> Stefano >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>> >>>>>>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>>>>>> expression. Easiest way is to stash the host, user-agent and url in >>>>>>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>>>>>> >>>>>>>>> I don't think you need to expand the VSL at all. >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Guillaume Quintard >>>>>>>>> >>>>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi Guillaume. >>>>>>>>> >>>>>>>>> Thanks for answering. >>>>>>>>> >>>>>>>>> I'm using a SSD disk. I've changed from ext4 to ext2 to increase >>>>>>>>> performance but it stills restarting. >>>>>>>>> Also, I checked the I/O performance for the disk and there is no >>>>>>>>> signal of overhead. >>>>>>>>> >>>>>>>>> I've changed the /var/lib/varnish to a tmpfs and increased its 80m >>>>>>>>> default size passing "-l 200m,20m" to varnishd and using >>>>>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There >>>>>>>>> was a problem here. 
After a couple of hours varnish died and I received a >>>>>>>>> "no space left on device" message - deleting the /var/lib/varnish solved >>>>>>>>> the problem and varnish was up again, but it's weird because there was free >>>>>>>>> memory on the host to be used with the tmpfs directory, so I don't know >>>>>>>>> what could have happened. I will try to stop increasing the >>>>>>>>> /var/lib/varnish size. >>>>>>>>> >>>>>>>>> Anyway, I am worried about the bans. You asked me if the bans are >>>>>>>>> lurker friedly. Well, I don't think so. My bans are created this way: >>>>>>>>> >>>>>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + >>>>>>>>> req.url + " && req.http.User-Agent !~ Googlebot"); >>>>>>>>> >>>>>>>>> Are they lurker friendly? I was taking a quick look and the >>>>>>>>> documentation and it looks like they're not. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Stefano >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Stefano, >>>>>>>>>> >>>>>>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets >>>>>>>>>> stuck trying to push/pull data and can't make time to reply to the CLI. I'd >>>>>>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>>>>>>> >>>>>>>>>> After some time, the file storage is terrible on a hard drive >>>>>>>>>> (SSDs take a bit more time to degrade) because of fragmentation. One >>>>>>>>>> solution to help the disks cope is to overprovision themif they're SSDs, >>>>>>>>>> and you can try different advices in the file storage definition in the >>>>>>>>>> command line (last parameter, after granularity). >>>>>>>>>> >>>>>>>>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>>>>>>>> >>>>>>>>>> 40K bans is a lot, are they ban-lurker friendly? 
>>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Guillaume Quintard >>>>>>>>>> >>>>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hello. >>>>>>>>>>> >>>>>>>>>>> I am having a critical problem with Varnish Cache in production >>>>>>>>>>> for over a month and any help will be appreciated. >>>>>>>>>>> The problem is that Varnish child process is recurrently being >>>>>>>>>>> restarted after 10~20h of use, with the following message: >>>>>>>>>>> >>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>>>>>>>> responding to CLI, killed it. >>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply >>>>>>>>>>> from ping: 400 CLI communication error >>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died >>>>>>>>>>> signal=9 >>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup >>>>>>>>>>> complete >>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>> Started >>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>>>>>> Child starts >>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>>>>>> SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>>>>>> >>>>>>>>>>> The following link is the varnishstat output just 1 minute >>>>>>>>>>> before a restart: >>>>>>>>>>> >>>>>>>>>>> https://pastebin.com/g0g5RVTs >>>>>>>>>>> >>>>>>>>>>> Environment: >>>>>>>>>>> >>>>>>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>>>>>> Installed using pre-built package from official repo at >>>>>>>>>>> packagecloud.io >>>>>>>>>>> CPU 2x2.9 GHz >>>>>>>>>>> Mem 3.69 GiB >>>>>>>>>>> Running inside a Docker container >>>>>>>>>>> NFILES=131072 >>>>>>>>>>> MEMLOCK=82000 >>>>>>>>>>> >>>>>>>>>>> Additional info: >>>>>>>>>>> >>>>>>>>>>> - I need to cache a large number of objets and the cache should >>>>>>>>>>> last for almost a week, 
so I have set up a 450G storage space, I don't know >>>>>>>>>>> if this is a problem; >>>>>>>>>>> - I use ban a lot. There was about 40k bans in the system just >>>>>>>>>>> before the last crash. I really don't know if this is too much or may have >>>>>>>>>>> anything to do with it; >>>>>>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>>>>>> - No panic is reported, the only info I can retrieve is from >>>>>>>>>>> syslog; >>>>>>>>>>> - During all the time, event moments before the crashes, >>>>>>>>>>> everything is okay and requests are being responded very fast. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Stefano Baldo >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> varnish-misc mailing list >>>>>>>>>>> varnish-misc at varnish-cache.org >>>>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish >>>>>>>>>>> -misc >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guillaume at varnish-software.com Wed Jun 28 13:43:55 2017 From: guillaume at varnish-software.com (Guillaume Quintard) Date: Wed, 28 Jun 2017 15:43:55 +0200 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Yeah, I was wondering about Transient, but it seems under control. Apart from moving away from file storage, I have nothing at the moment :-/ -- Guillaume Quintard On Wed, Jun 28, 2017 at 3:39 PM, Stefano Baldo wrote: > Hi. > > root at 2c6c325b279f:/# varnishstat -1 | grep g_bytes > SMA.Transient.g_bytes 519022 . Bytes > outstanding > SMF.s0.g_bytes 23662845952 . Bytes > outstanding > > You mean g_bytes from SMA.Transient? I have set no malloc storage. 
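One way to reconcile the MemAvailable collapse reported above with the object counts: even with a pure file storage, Varnish keeps per-object bookkeeping in RAM. Assuming the commonly cited estimate of roughly 1 kB of metadata per cached object (an assumption, not a figure from this thread), the ~3.9 million objects alone approach the host's 3.69 GiB:

```shell
# Back-of-envelope: n_object (from the thread) times an assumed ~1 kB of
# per-object metadata, expressed in MB. Compare with the reported
# MemTotal of 3865572 kB (~3.7 GB).
n_object=3908216
overhead_mb=$(( n_object * 1024 / 1024 / 1024 ))   # kB -> MB, i.e. n_object / 1024
echo "approx. object metadata: ${overhead_mb} MB"
```

At roughly 3.8 GB of estimated metadata against under 4 GB of RAM, the host would be squeezed even before any page-cache pressure from the 450G file storage, which fits the suggestion to move away from this setup or add memory.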
> > > On Wed, Jun 28, 2017 at 10:26 AM, Guillaume Quintard < > guillaume at varnish-software.com> wrote: > >> Hi, >> >> can you look that "varnishstat -1 | grep g_bytes" and see if if matches >> the memory you are seeing? >> >> -- >> Guillaume Quintard >> >> On Wed, Jun 28, 2017 at 3:20 PM, Stefano Baldo >> wrote: >> >>> Hi Guillaume. >>> >>> I increased the cli_timeout yesterday to 900sec (15min) and it restarted >>> anyway, which seems to indicate that the thread is really stalled. >>> >>> This was 1 minute after the last restart: >>> >>> MAIN.n_object 3908216 . object structs made >>> SMF.s0.g_alloc 7794510 . Allocations outstanding >>> >>> I've just changed the I/O Scheduler to noop to see what happens. >>> >>> One interest thing I've found is about the memory usage. >>> >>> In the 1st minute of use: >>> MemTotal: 3865572 kB >>> MemFree: 120768 kB >>> MemAvailable: 2300268 kB >>> >>> 1 minute before a restart: >>> MemTotal: 3865572 kB >>> MemFree: 82480 kB >>> MemAvailable: 68316 kB >>> >>> It seems like the system is possibly running out of memory. >>> >>> When calling varnishd, I'm specifying only "-s file,..." as storage. I >>> see in some examples that is common to use "-s file" AND "-s malloc" >>> together. Should I be passing "-s malloc" as well to somehow try to limit >>> the memory usage by varnishd? >>> >>> Best, >>> Stefano >>> >>> >>> On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard < >>> guillaume at varnish-software.com> wrote: >>> >>>> Sadly, nothing suspicious here, you can still try: >>>> - bumping the cli_timeout >>>> - changing your disk scheduler >>>> - changing the advice option of the file storage >>>> >>>> I'm still convinced this is due to Varnish getting stuck waiting for >>>> the disk because of the file storage fragmentation. >>>> >>>> Maybe you could look at SMF.*.g_alloc and compare it to the number of >>>> objects. Ideally, we would have a 1:1 relation between objects and >>>> allocations. 
If that number drops prior to a restart, that would be a good >>>> clue. >>>> >>>> >>>> -- >>>> Guillaume Quintard >>>> >>>> On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo >>> > wrote: >>>> >>>>> Hi Guillaume. >>>>> >>>>> It keeps restarting. >>>>> Would you mind taking a quick look in the following VCL file to check >>>>> if you find anything suspicious? >>>>> >>>>> Thank you very much. >>>>> >>>>> Best, >>>>> Stefano >>>>> >>>>> vcl 4.0; >>>>> >>>>> import std; >>>>> >>>>> backend default { >>>>> .host = "sites-web-server-lb"; >>>>> .port = "80"; >>>>> } >>>>> >>>>> include "/etc/varnish/bad_bot_detection.vcl"; >>>>> >>>>> sub vcl_recv { >>>>> call bad_bot_detection; >>>>> >>>>> if (req.url == "/nocache" || req.url == "/version") { >>>>> return(pass); >>>>> } >>>>> >>>>> unset req.http.Cookie; >>>>> if (req.method == "PURGE") { >>>>> ban("obj.http.x-host == " + req.http.host + " && >>>>> obj.http.x-user-agent !~ Googlebot"); >>>>> return(synth(750)); >>>>> } >>>>> >>>>> set req.url = regsuball(req.url, "(?>>>> } >>>>> >>>>> sub vcl_synth { >>>>> if (resp.status == 750) { >>>>> set resp.status = 200; >>>>> synthetic("PURGED => " + req.url); >>>>> return(deliver); >>>>> } elsif (resp.status == 501) { >>>>> set resp.status = 200; >>>>> set resp.http.Content-Type = "text/html; charset=utf-8"; >>>>> synthetic(std.fileread("/etc/varnish/pages/invalid_domain.html")); >>>>> return(deliver); >>>>> } >>>>> } >>>>> >>>>> sub vcl_backend_response { >>>>> unset beresp.http.Set-Cookie; >>>>> set beresp.http.x-host = bereq.http.host; >>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>> >>>>> if (bereq.url == "/themes/basic/assets/theme.min.css" >>>>> || bereq.url == "/api/events/PAGEVIEW" >>>>> || bereq.url ~ "^\/assets\/img\/") { >>>>> set beresp.http.Cache-Control = "max-age=0"; >>>>> } else { >>>>> unset beresp.http.Cache-Control; >>>>> } >>>>> >>>>> if (beresp.status == 200 || >>>>> beresp.status == 301 || >>>>> beresp.status == 302 || >>>>> beresp.status == 
404) { >>>>> if (bereq.url ~ "\&ordenar=aleatorio$") { >>>>> set beresp.http.X-TTL = "1d"; >>>>> set beresp.ttl = 1d; >>>>> } else { >>>>> set beresp.http.X-TTL = "1w"; >>>>> set beresp.ttl = 1w; >>>>> } >>>>> } >>>>> >>>>> if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") >>>>> { >>>>> set beresp.do_gzip = true; >>>>> } >>>>> } >>>>> >>>>> sub vcl_pipe { >>>>> set bereq.http.connection = "close"; >>>>> return (pipe); >>>>> } >>>>> >>>>> sub vcl_deliver { >>>>> unset resp.http.x-host; >>>>> unset resp.http.x-user-agent; >>>>> } >>>>> >>>>> sub vcl_backend_error { >>>>> if (beresp.status == 502 || beresp.status == 503 || beresp.status == >>>>> 504) { >>>>> set beresp.status = 200; >>>>> set beresp.http.Content-Type = "text/html; charset=utf-8"; >>>>> synthetic(std.fileread("/etc/varnish/pages/maintenance.html")); >>>>> return (deliver); >>>>> } >>>>> } >>>>> >>>>> sub vcl_hash { >>>>> if (req.http.User-Agent ~ "Google Page Speed") { >>>>> hash_data("Google Page Speed"); >>>>> } elsif (req.http.User-Agent ~ "Googlebot") { >>>>> hash_data("Googlebot"); >>>>> } >>>>> } >>>>> >>>>> sub vcl_deliver { >>>>> if (resp.status == 501) { >>>>> return (synth(resp.status)); >>>>> } >>>>> if (obj.hits > 0) { >>>>> set resp.http.X-Cache = "hit"; >>>>> } else { >>>>> set resp.http.X-Cache = "miss"; >>>>> } >>>>> } >>>>> >>>>> >>>>> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < >>>>> guillaume at varnish-software.com> wrote: >>>>> >>>>>> Nice! It may have been the cause, time will tell.can you report back >>>>>> in a few days to let us know? >>>>>> -- >>>>>> Guillaume Quintard >>>>>> >>>>>> On Jun 26, 2017 20:21, "Stefano Baldo" >>>>>> wrote: >>>>>> >>>>>>> Hi Guillaume. >>>>>>> >>>>>>> I think things will start to going better now after changing the >>>>>>> bans. >>>>>>> This is how my last varnishstat looked like moments before a crash >>>>>>> regarding the bans: >>>>>>> >>>>>>> MAIN.bans 41336 . Count of bans >>>>>>> MAIN.bans_completed 37967 . 
Number of bans >>>>>>> marked 'completed' >>>>>>> MAIN.bans_obj 0 . Number of bans >>>>>>> using obj.* >>>>>>> MAIN.bans_req 41335 . Number of bans >>>>>>> using req.* >>>>>>> MAIN.bans_added 41336 0.68 Bans added >>>>>>> MAIN.bans_deleted 0 0.00 Bans deleted >>>>>>> >>>>>>> And this is how it looks like now: >>>>>>> >>>>>>> MAIN.bans 2 . Count of bans >>>>>>> MAIN.bans_completed 1 . Number of bans >>>>>>> marked 'completed' >>>>>>> MAIN.bans_obj 2 . Number of bans >>>>>>> using obj.* >>>>>>> MAIN.bans_req 0 . Number of bans >>>>>>> using req.* >>>>>>> MAIN.bans_added 2016 0.69 Bans added >>>>>>> MAIN.bans_deleted 2014 0.69 Bans deleted >>>>>>> >>>>>>> Before the changes, bans were never deleted! >>>>>>> Now the bans are added and quickly deleted after a minute or even a >>>>>>> couple of seconds. >>>>>>> >>>>>>> May this was the cause of the problem? It seems like varnish was >>>>>>> having a large number of bans to manage and test against. >>>>>>> I will let it ride now. Let's see if the problem persists or it's >>>>>>> gone! :-) >>>>>>> >>>>>>> Best, >>>>>>> Stefano >>>>>>> >>>>>>> >>>>>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >>>>>>> guillaume at varnish-software.com> wrote: >>>>>>> >>>>>>>> Looking good! >>>>>>>> >>>>>>>> -- >>>>>>>> Guillaume Quintard >>>>>>>> >>>>>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo < >>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Guillaume, >>>>>>>>> >>>>>>>>> Can the following be considered "ban lurker friendly"? 
>>>>>>>>> >>>>>>>>> sub vcl_backend_response { >>>>>>>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>>>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>>>>> } >>>>>>>>> >>>>>>>>> sub vcl_recv { >>>>>>>>> if (req.method == "PURGE") { >>>>>>>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>>>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>>>>> return(synth(750)); >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> sub vcl_deliver { >>>>>>>>> unset resp.http.x-url; >>>>>>>>> unset resp.http.x-user-agent; >>>>>>>>> } >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Stefano >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>> >>>>>>>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>>>>>>> expression. Easiest way is to stash the host, user-agent and url in >>>>>>>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>>>>>>> >>>>>>>>>> I don't think you need to expand the VSL at all. >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Guillaume Quintard >>>>>>>>>> >>>>>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi Guillaume. >>>>>>>>>> >>>>>>>>>> Thanks for answering. >>>>>>>>>> >>>>>>>>>> I'm using a SSD disk. I've changed from ext4 to ext2 to increase >>>>>>>>>> performance but it stills restarting. >>>>>>>>>> Also, I checked the I/O performance for the disk and there is no >>>>>>>>>> signal of overhead. >>>>>>>>>> >>>>>>>>>> I've changed the /var/lib/varnish to a tmpfs and increased its >>>>>>>>>> 80m default size passing "-l 200m,20m" to varnishd and using >>>>>>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There >>>>>>>>>> was a problem here. 
After a couple of hours varnish died and I received a >>>>>>>>>> "no space left on device" message - deleting the /var/lib/varnish solved >>>>>>>>>> the problem and varnish was up again, but it's weird because there was free >>>>>>>>>> memory on the host to be used with the tmpfs directory, so I don't know >>>>>>>>>> what could have happened. I will try to stop increasing the >>>>>>>>>> /var/lib/varnish size. >>>>>>>>>> >>>>>>>>>> Anyway, I am worried about the bans. You asked me if the bans are >>>>>>>>>> lurker friedly. Well, I don't think so. My bans are created this way: >>>>>>>>>> >>>>>>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + >>>>>>>>>> req.url + " && req.http.User-Agent !~ Googlebot"); >>>>>>>>>> >>>>>>>>>> Are they lurker friendly? I was taking a quick look and the >>>>>>>>>> documentation and it looks like they're not. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Stefano >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Stefano, >>>>>>>>>>> >>>>>>>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets >>>>>>>>>>> stuck trying to push/pull data and can't make time to reply to the CLI. I'd >>>>>>>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>>>>>>>> >>>>>>>>>>> After some time, the file storage is terrible on a hard drive >>>>>>>>>>> (SSDs take a bit more time to degrade) because of fragmentation. One >>>>>>>>>>> solution to help the disks cope is to overprovision themif they're SSDs, >>>>>>>>>>> and you can try different advices in the file storage definition in the >>>>>>>>>>> command line (last parameter, after granularity). >>>>>>>>>>> >>>>>>>>>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>>>>>>>>> >>>>>>>>>>> 40K bans is a lot, are they ban-lurker friendly? 
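[Editor's note] The distinction this thread circles around can be sketched in one place. A ban is "lurker friendly" when its expression references only obj.*, so the background ban-lurker thread can test it against cached objects without any request context; req.*-based bans can only be tested when a matching request arrives, which is why they pile up (40k bans, none deleted). A minimal sketch, reusing the x-url / x-user-agent header names from the snippet quoted above:

```vcl
sub vcl_backend_response {
    # Stash request facts on the object so bans can match on obj.* later.
    set beresp.http.x-url = bereq.http.host + bereq.url;
    set beresp.http.x-user-agent = bereq.http.user-agent;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        # Lurker-UNFRIENDLY (references req.*; only request threads can
        # test it, so completed bans accumulate):
        #   ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url);
        # Lurker-friendly (references obj.* only; the ban lurker can walk
        # the cache and retire the ban in the background):
        ban("obj.http.x-url == " + req.http.host + req.url +
            " && obj.http.x-user-agent !~ Googlebot");
        return (synth(750));
    }
}

sub vcl_deliver {
    # Keep the stashed bookkeeping headers out of client responses.
    unset resp.http.x-url;
    unset resp.http.x-user-agent;
}
```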
>>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Guillaume Quintard >>>>>>>>>>> >>>>>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hello. >>>>>>>>>>>> >>>>>>>>>>>> I am having a critical problem with Varnish Cache in production >>>>>>>>>>>> for over a month and any help will be appreciated. >>>>>>>>>>>> The problem is that Varnish child process is recurrently being >>>>>>>>>>>> restarted after 10~20h of use, with the following message: >>>>>>>>>>>> >>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>>>>>>>>> responding to CLI, killed it. >>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply >>>>>>>>>>>> from ping: 400 CLI communication error >>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) >>>>>>>>>>>> died signal=9 >>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup >>>>>>>>>>>> complete >>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>> Started >>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>> said Child starts >>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>> said SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>>>>>>> >>>>>>>>>>>> The following link is the varnishstat output just 1 minute >>>>>>>>>>>> before a restart: >>>>>>>>>>>> >>>>>>>>>>>> https://pastebin.com/g0g5RVTs >>>>>>>>>>>> >>>>>>>>>>>> Environment: >>>>>>>>>>>> >>>>>>>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>>>>>>> Installed using pre-built package from official repo at >>>>>>>>>>>> packagecloud.io >>>>>>>>>>>> CPU 2x2.9 GHz >>>>>>>>>>>> Mem 3.69 GiB >>>>>>>>>>>> Running inside a Docker container >>>>>>>>>>>> NFILES=131072 >>>>>>>>>>>> MEMLOCK=82000 >>>>>>>>>>>> >>>>>>>>>>>> Additional info: >>>>>>>>>>>> >>>>>>>>>>>> - I need to cache a large number of objets and the 
cache should >>>>>>>>>>>> last for almost a week, so I have set up a 450G storage space, I don't know >>>>>>>>>>>> if this is a problem; >>>>>>>>>>>> - I use ban a lot. There was about 40k bans in the system just >>>>>>>>>>>> before the last crash. I really don't know if this is too much or may have >>>>>>>>>>>> anything to do with it; >>>>>>>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>>>>>>> - No panic is reported, the only info I can retrieve is from >>>>>>>>>>>> syslog; >>>>>>>>>>>> - During all the time, event moments before the crashes, >>>>>>>>>>>> everything is okay and requests are being responded very fast. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Stefano Baldo >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> varnish-misc mailing list >>>>>>>>>>>> varnish-misc at varnish-cache.org >>>>>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish >>>>>>>>>>>> -misc >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanobaldo at gmail.com Wed Jun 28 13:47:17 2017 From: stefanobaldo at gmail.com (Stefano Baldo) Date: Wed, 28 Jun 2017 10:47:17 -0300 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: SMA.Transient.g_alloc 3518 . Allocations outstanding SMA.Transient.g_bytes 546390 . Bytes outstanding SMA.Transient.g_space 0 . Bytes available g_space is always 0. It could mean anything? On Wed, Jun 28, 2017 at 10:43 AM, Guillaume Quintard < guillaume at varnish-software.com> wrote: > Yeah, I was wondering about Transient, but it seems under control. > > Apart from moving away from file storage, I have nothing at the moment :-/ > > -- > Guillaume Quintard > > On Wed, Jun 28, 2017 at 3:39 PM, Stefano Baldo > wrote: > >> Hi. 
>> >> root at 2c6c325b279f:/# varnishstat -1 | grep g_bytes >> SMA.Transient.g_bytes 519022 . Bytes >> outstanding >> SMF.s0.g_bytes 23662845952 . Bytes >> outstanding >> >> You mean g_bytes from SMA.Transient? I have set no malloc storage. >> >> >> On Wed, Jun 28, 2017 at 10:26 AM, Guillaume Quintard < >> guillaume at varnish-software.com> wrote: >> >>> Hi, >>> >>> can you look that "varnishstat -1 | grep g_bytes" and see if if matches >>> the memory you are seeing? >>> >>> -- >>> Guillaume Quintard >>> >>> On Wed, Jun 28, 2017 at 3:20 PM, Stefano Baldo >>> wrote: >>> >>>> Hi Guillaume. >>>> >>>> I increased the cli_timeout yesterday to 900sec (15min) and it >>>> restarted anyway, which seems to indicate that the thread is really stalled. >>>> >>>> This was 1 minute after the last restart: >>>> >>>> MAIN.n_object 3908216 . object structs made >>>> SMF.s0.g_alloc 7794510 . Allocations outstanding >>>> >>>> I've just changed the I/O Scheduler to noop to see what happens. >>>> >>>> One interest thing I've found is about the memory usage. >>>> >>>> In the 1st minute of use: >>>> MemTotal: 3865572 kB >>>> MemFree: 120768 kB >>>> MemAvailable: 2300268 kB >>>> >>>> 1 minute before a restart: >>>> MemTotal: 3865572 kB >>>> MemFree: 82480 kB >>>> MemAvailable: 68316 kB >>>> >>>> It seems like the system is possibly running out of memory. >>>> >>>> When calling varnishd, I'm specifying only "-s file,..." as storage. I >>>> see in some examples that is common to use "-s file" AND "-s malloc" >>>> together. Should I be passing "-s malloc" as well to somehow try to limit >>>> the memory usage by varnishd? 
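[Editor's note] On the question quoted above: adding "-s malloc" next to "-s file" defines a second, separate storage backend rather than capping the file storage's footprint. With file storage, resident memory largely comes from the kernel's page cache over the mmap'ed file, plus the Transient storage, which by default is an unlimited malloc store. If the goal is bounding Transient, it can be declared explicitly. A hedged sketch (paths and sizes are placeholders, not the poster's actual values):

```shell
# Illustrative varnishd invocation: keep the file storage, but give
# Transient an explicit malloc cap instead of adding an unnamed -s malloc.
varnishd \
    -a :80 \
    -f /etc/varnish/default.vcl \
    -s file,/var/lib/varnish/storage.bin,450G \
    -s Transient=malloc,256m
```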
>>>> >>>> Best, >>>> Stefano >>>> >>>> >>>> On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard < >>>> guillaume at varnish-software.com> wrote: >>>> >>>>> Sadly, nothing suspicious here, you can still try: >>>>> - bumping the cli_timeout >>>>> - changing your disk scheduler >>>>> - changing the advice option of the file storage >>>>> >>>>> I'm still convinced this is due to Varnish getting stuck waiting for >>>>> the disk because of the file storage fragmentation. >>>>> >>>>> Maybe you could look at SMF.*.g_alloc and compare it to the number of >>>>> objects. Ideally, we would have a 1:1 relation between objects and >>>>> allocations. If that number drops prior to a restart, that would be a good >>>>> clue. >>>>> >>>>> >>>>> -- >>>>> Guillaume Quintard >>>>> >>>>> On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo < >>>>> stefanobaldo at gmail.com> wrote: >>>>> >>>>>> Hi Guillaume. >>>>>> >>>>>> It keeps restarting. >>>>>> Would you mind taking a quick look in the following VCL file to check >>>>>> if you find anything suspicious? >>>>>> >>>>>> Thank you very much. 
>>>>>> >>>>>> Best, >>>>>> Stefano >>>>>> >>>>>> vcl 4.0; >>>>>> >>>>>> import std; >>>>>> >>>>>> backend default { >>>>>> .host = "sites-web-server-lb"; >>>>>> .port = "80"; >>>>>> } >>>>>> >>>>>> include "/etc/varnish/bad_bot_detection.vcl"; >>>>>> >>>>>> sub vcl_recv { >>>>>> call bad_bot_detection; >>>>>> >>>>>> if (req.url == "/nocache" || req.url == "/version") { >>>>>> return(pass); >>>>>> } >>>>>> >>>>>> unset req.http.Cookie; >>>>>> if (req.method == "PURGE") { >>>>>> ban("obj.http.x-host == " + req.http.host + " && >>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>> return(synth(750)); >>>>>> } >>>>>> >>>>>> set req.url = regsuball(req.url, "(?>>>>> } >>>>>> >>>>>> sub vcl_synth { >>>>>> if (resp.status == 750) { >>>>>> set resp.status = 200; >>>>>> synthetic("PURGED => " + req.url); >>>>>> return(deliver); >>>>>> } elsif (resp.status == 501) { >>>>>> set resp.status = 200; >>>>>> set resp.http.Content-Type = "text/html; charset=utf-8"; >>>>>> synthetic(std.fileread("/etc/varnish/pages/invalid_domain.ht >>>>>> ml")); >>>>>> return(deliver); >>>>>> } >>>>>> } >>>>>> >>>>>> sub vcl_backend_response { >>>>>> unset beresp.http.Set-Cookie; >>>>>> set beresp.http.x-host = bereq.http.host; >>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>> >>>>>> if (bereq.url == "/themes/basic/assets/theme.min.css" >>>>>> || bereq.url == "/api/events/PAGEVIEW" >>>>>> || bereq.url ~ "^\/assets\/img\/") { >>>>>> set beresp.http.Cache-Control = "max-age=0"; >>>>>> } else { >>>>>> unset beresp.http.Cache-Control; >>>>>> } >>>>>> >>>>>> if (beresp.status == 200 || >>>>>> beresp.status == 301 || >>>>>> beresp.status == 302 || >>>>>> beresp.status == 404) { >>>>>> if (bereq.url ~ "\&ordenar=aleatorio$") { >>>>>> set beresp.http.X-TTL = "1d"; >>>>>> set beresp.ttl = 1d; >>>>>> } else { >>>>>> set beresp.http.X-TTL = "1w"; >>>>>> set beresp.ttl = 1w; >>>>>> } >>>>>> } >>>>>> >>>>>> if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") >>>>>> { >>>>>> 
set beresp.do_gzip = true; >>>>>> } >>>>>> } >>>>>> >>>>>> sub vcl_pipe { >>>>>> set bereq.http.connection = "close"; >>>>>> return (pipe); >>>>>> } >>>>>> >>>>>> sub vcl_deliver { >>>>>> unset resp.http.x-host; >>>>>> unset resp.http.x-user-agent; >>>>>> } >>>>>> >>>>>> sub vcl_backend_error { >>>>>> if (beresp.status == 502 || beresp.status == 503 || beresp.status >>>>>> == 504) { >>>>>> set beresp.status = 200; >>>>>> set beresp.http.Content-Type = "text/html; charset=utf-8"; >>>>>> synthetic(std.fileread("/etc/varnish/pages/maintenance.html")); >>>>>> return (deliver); >>>>>> } >>>>>> } >>>>>> >>>>>> sub vcl_hash { >>>>>> if (req.http.User-Agent ~ "Google Page Speed") { >>>>>> hash_data("Google Page Speed"); >>>>>> } elsif (req.http.User-Agent ~ "Googlebot") { >>>>>> hash_data("Googlebot"); >>>>>> } >>>>>> } >>>>>> >>>>>> sub vcl_deliver { >>>>>> if (resp.status == 501) { >>>>>> return (synth(resp.status)); >>>>>> } >>>>>> if (obj.hits > 0) { >>>>>> set resp.http.X-Cache = "hit"; >>>>>> } else { >>>>>> set resp.http.X-Cache = "miss"; >>>>>> } >>>>>> } >>>>>> >>>>>> >>>>>> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < >>>>>> guillaume at varnish-software.com> wrote: >>>>>> >>>>>>> Nice! It may have been the cause, time will tell.can you report back >>>>>>> in a few days to let us know? >>>>>>> -- >>>>>>> Guillaume Quintard >>>>>>> >>>>>>> On Jun 26, 2017 20:21, "Stefano Baldo" >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Guillaume. >>>>>>>> >>>>>>>> I think things will start to going better now after changing the >>>>>>>> bans. >>>>>>>> This is how my last varnishstat looked like moments before a crash >>>>>>>> regarding the bans: >>>>>>>> >>>>>>>> MAIN.bans 41336 . Count of bans >>>>>>>> MAIN.bans_completed 37967 . Number of bans >>>>>>>> marked 'completed' >>>>>>>> MAIN.bans_obj 0 . Number of bans >>>>>>>> using obj.* >>>>>>>> MAIN.bans_req 41335 . 
Number of bans >>>>>>>> using req.* >>>>>>>> MAIN.bans_added 41336 0.68 Bans added >>>>>>>> MAIN.bans_deleted 0 0.00 Bans deleted >>>>>>>> >>>>>>>> And this is how it looks like now: >>>>>>>> >>>>>>>> MAIN.bans 2 . Count of bans >>>>>>>> MAIN.bans_completed 1 . Number of bans >>>>>>>> marked 'completed' >>>>>>>> MAIN.bans_obj 2 . Number of bans >>>>>>>> using obj.* >>>>>>>> MAIN.bans_req 0 . Number of bans >>>>>>>> using req.* >>>>>>>> MAIN.bans_added 2016 0.69 Bans added >>>>>>>> MAIN.bans_deleted 2014 0.69 Bans deleted >>>>>>>> >>>>>>>> Before the changes, bans were never deleted! >>>>>>>> Now the bans are added and quickly deleted after a minute or even a >>>>>>>> couple of seconds. >>>>>>>> >>>>>>>> May this was the cause of the problem? It seems like varnish was >>>>>>>> having a large number of bans to manage and test against. >>>>>>>> I will let it ride now. Let's see if the problem persists or it's >>>>>>>> gone! :-) >>>>>>>> >>>>>>>> Best, >>>>>>>> Stefano >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>> >>>>>>>>> Looking good! >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Guillaume Quintard >>>>>>>>> >>>>>>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo < >>>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Guillaume, >>>>>>>>>> >>>>>>>>>> Can the following be considered "ban lurker friendly"? 
>>>>>>>>>> >>>>>>>>>> sub vcl_backend_response { >>>>>>>>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>>>>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> sub vcl_recv { >>>>>>>>>> if (req.method == "PURGE") { >>>>>>>>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>>>>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>>>>>> return(synth(750)); >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> sub vcl_deliver { >>>>>>>>>> unset resp.http.x-url; >>>>>>>>>> unset resp.http.x-user-agent; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Stefano >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>>> >>>>>>>>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>>>>>>>> expression. Easiest way is to stash the host, user-agent and url in >>>>>>>>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>>>>>>>> >>>>>>>>>>> I don't think you need to expand the VSL at all. >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Guillaume Quintard >>>>>>>>>>> >>>>>>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Guillaume. >>>>>>>>>>> >>>>>>>>>>> Thanks for answering. >>>>>>>>>>> >>>>>>>>>>> I'm using a SSD disk. I've changed from ext4 to ext2 to increase >>>>>>>>>>> performance but it stills restarting. >>>>>>>>>>> Also, I checked the I/O performance for the disk and there is no >>>>>>>>>>> signal of overhead. >>>>>>>>>>> >>>>>>>>>>> I've changed the /var/lib/varnish to a tmpfs and increased its >>>>>>>>>>> 80m default size passing "-l 200m,20m" to varnishd and using >>>>>>>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There >>>>>>>>>>> was a problem here. 
After a couple of hours varnish died and I received a >>>>>>>>>>> "no space left on device" message - deleting the /var/lib/varnish solved >>>>>>>>>>> the problem and varnish was up again, but it's weird because there was free >>>>>>>>>>> memory on the host to be used with the tmpfs directory, so I don't know >>>>>>>>>>> what could have happened. I will try to stop increasing the >>>>>>>>>>> /var/lib/varnish size. >>>>>>>>>>> >>>>>>>>>>> Anyway, I am worried about the bans. You asked me if the bans >>>>>>>>>>> are lurker friedly. Well, I don't think so. My bans are created this way: >>>>>>>>>>> >>>>>>>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + >>>>>>>>>>> req.url + " && req.http.User-Agent !~ Googlebot"); >>>>>>>>>>> >>>>>>>>>>> Are they lurker friendly? I was taking a quick look and the >>>>>>>>>>> documentation and it looks like they're not. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Stefano >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Stefano, >>>>>>>>>>>> >>>>>>>>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets >>>>>>>>>>>> stuck trying to push/pull data and can't make time to reply to the CLI. I'd >>>>>>>>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>>>>>>>>> >>>>>>>>>>>> After some time, the file storage is terrible on a hard drive >>>>>>>>>>>> (SSDs take a bit more time to degrade) because of fragmentation. One >>>>>>>>>>>> solution to help the disks cope is to overprovision themif they're SSDs, >>>>>>>>>>>> and you can try different advices in the file storage definition in the >>>>>>>>>>>> command line (last parameter, after granularity). >>>>>>>>>>>> >>>>>>>>>>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>>>>>>>>>> >>>>>>>>>>>> 40K bans is a lot, are they ban-lurker friendly? 
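[Editor's note] One way to verify that a ban rewrite like the one discussed here actually helps is to watch the ban counters over time: with lurker-friendly bans, MAIN.bans_deleted should keep pace with MAIN.bans_added instead of the ban list growing monotonically, as it does later in this thread. A small sketch using varnishstat's field filter:

```shell
# Print only the ban-related counters (glob field filter).
varnishstat -1 -f 'MAIN.bans*'

# Or sample the add/delete rates every 10 seconds to watch bans being retired.
while sleep 10; do
    varnishstat -1 -f MAIN.bans_added -f MAIN.bans_deleted
done
```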
>>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Guillaume Quintard >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hello. >>>>>>>>>>>>> >>>>>>>>>>>>> I am having a critical problem with Varnish Cache in >>>>>>>>>>>>> production for over a month and any help will be appreciated. >>>>>>>>>>>>> The problem is that Varnish child process is recurrently being >>>>>>>>>>>>> restarted after 10~20h of use, with the following message: >>>>>>>>>>>>> >>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) >>>>>>>>>>>>> not responding to CLI, killed it. >>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply >>>>>>>>>>>>> from ping: 400 CLI communication error >>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) >>>>>>>>>>>>> died signal=9 >>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup >>>>>>>>>>>>> complete >>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>>> Started >>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>>> said Child starts >>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>>> said SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>>>>>>>> >>>>>>>>>>>>> The following link is the varnishstat output just 1 minute >>>>>>>>>>>>> before a restart: >>>>>>>>>>>>> >>>>>>>>>>>>> https://pastebin.com/g0g5RVTs >>>>>>>>>>>>> >>>>>>>>>>>>> Environment: >>>>>>>>>>>>> >>>>>>>>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>>>>>>>> Installed using pre-built package from official repo at >>>>>>>>>>>>> packagecloud.io >>>>>>>>>>>>> CPU 2x2.9 GHz >>>>>>>>>>>>> Mem 3.69 GiB >>>>>>>>>>>>> Running inside a Docker container >>>>>>>>>>>>> NFILES=131072 >>>>>>>>>>>>> MEMLOCK=82000 >>>>>>>>>>>>> >>>>>>>>>>>>> Additional info: >>>>>>>>>>>>> >>>>>>>>>>>>> - 
I need to cache a large number of objets and the cache >>>>>>>>>>>>> should last for almost a week, so I have set up a 450G storage space, I >>>>>>>>>>>>> don't know if this is a problem; >>>>>>>>>>>>> - I use ban a lot. There was about 40k bans in the system just >>>>>>>>>>>>> before the last crash. I really don't know if this is too much or may have >>>>>>>>>>>>> anything to do with it; >>>>>>>>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>>>>>>>> - No panic is reported, the only info I can retrieve is from >>>>>>>>>>>>> syslog; >>>>>>>>>>>>> - During all the time, event moments before the crashes, >>>>>>>>>>>>> everything is okay and requests are being responded very fast. >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Stefano Baldo >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> varnish-misc mailing list >>>>>>>>>>>>> varnish-misc at varnish-cache.org >>>>>>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish >>>>>>>>>>>>> -misc >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanobaldo at gmail.com Wed Jun 28 13:54:28 2017 From: stefanobaldo at gmail.com (Stefano Baldo) Date: Wed, 28 Jun 2017 10:54:28 -0300 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Also, we are running varnish inside a docker container. The storage disk is attached to the same host, and mounted to the container via docker volume. Do you think it's worth a try to run varnish directly on the host, avoiding docker? I don't see how this could be a problem but I don't know what to do anymore. Best, On Wed, Jun 28, 2017 at 10:43 AM, Guillaume Quintard < guillaume at varnish-software.com> wrote: > Yeah, I was wondering about Transient, but it seems under control. 
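[Editor's note] Regarding the earlier "no space left on device" on the tmpfs-backed /var/lib/varnish: "-l 200m,20m" reserves about 220 MB for the shared memory log alone, and the working directory also holds other VSM segments and the compiled VCL shared objects, so a 256 MB tmpfs leaves little headroom. A quick check (mount point per the thread; adjust as needed):

```shell
# Compare the tmpfs capacity with what varnishd actually keeps in it.
df -h /var/lib/varnish
du -sh /var/lib/varnish/*
```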
I need to cache a large number of objets and the cache >>>>>>>>>>>>> should last for almost a week, so I have set up a 450G storage space, I >>>>>>>>>>>>> don't know if this is a problem; >>>>>>>>>>>>> - I use ban a lot. There was about 40k bans in the system just >>>>>>>>>>>>> before the last crash. I really don't know if this is too much or may have >>>>>>>>>>>>> anything to do with it; >>>>>>>>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>>>>>>>> - No panic is reported, the only info I can retrieve is from >>>>>>>>>>>>> syslog; >>>>>>>>>>>>> - During all the time, event moments before the crashes, >>>>>>>>>>>>> everything is okay and requests are being responded very fast. >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Stefano Baldo >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> varnish-misc mailing list >>>>>>>>>>>>> varnish-misc at varnish-cache.org >>>>>>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish >>>>>>>>>>>>> -misc >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guillaume at varnish-software.com Wed Jun 28 13:58:55 2017 From: guillaume at varnish-software.com (Guillaume Quintard) Date: Wed, 28 Jun 2017 15:58:55 +0200 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Transient is not limited I suppose, so the g_space == 0 is normal. You can try running on bare metal, not sure there will be a difference -- Guillaume Quintard On Wed, Jun 28, 2017 at 3:54 PM, Stefano Baldo wrote: > Also, we are running varnish inside a docker container. > The storage disk is attached to the same host, and mounted to the > container via docker volume. > > Do you think it's worth a try to run varnish directly on the host, > avoiding docker? 
I don't see how this could be a problem but I don't know > what to do anymore. > > Best, > > > On Wed, Jun 28, 2017 at 10:43 AM, Guillaume Quintard < > guillaume at varnish-software.com> wrote: > >> Yeah, I was wondering about Transient, but it seems under control. >> >> Apart from moving away from file storage, I have nothing at the moment :-/ >> >> -- >> Guillaume Quintard >> >> On Wed, Jun 28, 2017 at 3:39 PM, Stefano Baldo >> wrote: >> >>> Hi. >>> >>> root at 2c6c325b279f:/# varnishstat -1 | grep g_bytes >>> SMA.Transient.g_bytes 519022 . Bytes >>> outstanding >>> SMF.s0.g_bytes 23662845952 . Bytes >>> outstanding >>> >>> You mean g_bytes from SMA.Transient? I have set no malloc storage. >>> >>> >>> On Wed, Jun 28, 2017 at 10:26 AM, Guillaume Quintard < >>> guillaume at varnish-software.com> wrote: >>> >>>> Hi, >>>> >>>> can you look at "varnishstat -1 | grep g_bytes" and see if it matches >>>> the memory you are seeing? >>>> >>>> -- >>>> Guillaume Quintard >>>> >>>> On Wed, Jun 28, 2017 at 3:20 PM, Stefano Baldo >>>> wrote: >>>> >>>>> Hi Guillaume. >>>>> >>>>> I increased the cli_timeout yesterday to 900sec (15min) and it >>>>> restarted anyway, which seems to indicate that the thread is really stalled. >>>>> >>>>> This was 1 minute after the last restart: >>>>> >>>>> MAIN.n_object 3908216 . object structs made >>>>> SMF.s0.g_alloc 7794510 . Allocations >>>>> outstanding >>>>> >>>>> I've just changed the I/O Scheduler to noop to see what happens. >>>>> >>>>> One interesting thing I've found is about the memory usage. >>>>> >>>>> In the 1st minute of use: >>>>> MemTotal: 3865572 kB >>>>> MemFree: 120768 kB >>>>> MemAvailable: 2300268 kB >>>>> >>>>> 1 minute before a restart: >>>>> MemTotal: 3865572 kB >>>>> MemFree: 82480 kB >>>>> MemAvailable: 68316 kB >>>>> >>>>> It seems like the system is possibly running out of memory. >>>>> >>>>> When calling varnishd, I'm specifying only "-s file,..." as storage. 
I >>>>> see in some examples that is common to use "-s file" AND "-s malloc" >>>>> together. Should I be passing "-s malloc" as well to somehow try to limit >>>>> the memory usage by varnishd? >>>>> >>>>> Best, >>>>> Stefano >>>>> >>>>> >>>>> On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard < >>>>> guillaume at varnish-software.com> wrote: >>>>> >>>>>> Sadly, nothing suspicious here, you can still try: >>>>>> - bumping the cli_timeout >>>>>> - changing your disk scheduler >>>>>> - changing the advice option of the file storage >>>>>> >>>>>> I'm still convinced this is due to Varnish getting stuck waiting for >>>>>> the disk because of the file storage fragmentation. >>>>>> >>>>>> Maybe you could look at SMF.*.g_alloc and compare it to the number of >>>>>> objects. Ideally, we would have a 1:1 relation between objects and >>>>>> allocations. If that number drops prior to a restart, that would be a good >>>>>> clue. >>>>>> >>>>>> >>>>>> -- >>>>>> Guillaume Quintard >>>>>> >>>>>> On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo < >>>>>> stefanobaldo at gmail.com> wrote: >>>>>> >>>>>>> Hi Guillaume. >>>>>>> >>>>>>> It keeps restarting. >>>>>>> Would you mind taking a quick look in the following VCL file to >>>>>>> check if you find anything suspicious? >>>>>>> >>>>>>> Thank you very much. 
>>>>>>> >>>>>>> Best, >>>>>>> Stefano >>>>>>> >>>>>>> vcl 4.0; >>>>>>> >>>>>>> import std; >>>>>>> >>>>>>> backend default { >>>>>>> .host = "sites-web-server-lb"; >>>>>>> .port = "80"; >>>>>>> } >>>>>>> >>>>>>> include "/etc/varnish/bad_bot_detection.vcl"; >>>>>>> >>>>>>> sub vcl_recv { >>>>>>> call bad_bot_detection; >>>>>>> >>>>>>> if (req.url == "/nocache" || req.url == "/version") { >>>>>>> return(pass); >>>>>>> } >>>>>>> >>>>>>> unset req.http.Cookie; >>>>>>> if (req.method == "PURGE") { >>>>>>> ban("obj.http.x-host == " + req.http.host + " && >>>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>>> return(synth(750)); >>>>>>> } >>>>>>> >>>>>>> set req.url = regsuball(req.url, "(?>>>>>> } >>>>>>> >>>>>>> sub vcl_synth { >>>>>>> if (resp.status == 750) { >>>>>>> set resp.status = 200; >>>>>>> synthetic("PURGED => " + req.url); >>>>>>> return(deliver); >>>>>>> } elsif (resp.status == 501) { >>>>>>> set resp.status = 200; >>>>>>> set resp.http.Content-Type = "text/html; charset=utf-8"; >>>>>>> synthetic(std.fileread("/etc/varnish/pages/invalid_domain.ht >>>>>>> ml")); >>>>>>> return(deliver); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> sub vcl_backend_response { >>>>>>> unset beresp.http.Set-Cookie; >>>>>>> set beresp.http.x-host = bereq.http.host; >>>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>>> >>>>>>> if (bereq.url == "/themes/basic/assets/theme.min.css" >>>>>>> || bereq.url == "/api/events/PAGEVIEW" >>>>>>> || bereq.url ~ "^\/assets\/img\/") { >>>>>>> set beresp.http.Cache-Control = "max-age=0"; >>>>>>> } else { >>>>>>> unset beresp.http.Cache-Control; >>>>>>> } >>>>>>> >>>>>>> if (beresp.status == 200 || >>>>>>> beresp.status == 301 || >>>>>>> beresp.status == 302 || >>>>>>> beresp.status == 404) { >>>>>>> if (bereq.url ~ "\&ordenar=aleatorio$") { >>>>>>> set beresp.http.X-TTL = "1d"; >>>>>>> set beresp.ttl = 1d; >>>>>>> } else { >>>>>>> set beresp.http.X-TTL = "1w"; >>>>>>> set beresp.ttl = 1w; >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> if (bereq.url !~ 
"\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") >>>>>>> { >>>>>>> set beresp.do_gzip = true; >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> sub vcl_pipe { >>>>>>> set bereq.http.connection = "close"; >>>>>>> return (pipe); >>>>>>> } >>>>>>> >>>>>>> sub vcl_deliver { >>>>>>> unset resp.http.x-host; >>>>>>> unset resp.http.x-user-agent; >>>>>>> } >>>>>>> >>>>>>> sub vcl_backend_error { >>>>>>> if (beresp.status == 502 || beresp.status == 503 || beresp.status >>>>>>> == 504) { >>>>>>> set beresp.status = 200; >>>>>>> set beresp.http.Content-Type = "text/html; charset=utf-8"; >>>>>>> synthetic(std.fileread("/etc/varnish/pages/maintenance.html")); >>>>>>> return (deliver); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> sub vcl_hash { >>>>>>> if (req.http.User-Agent ~ "Google Page Speed") { >>>>>>> hash_data("Google Page Speed"); >>>>>>> } elsif (req.http.User-Agent ~ "Googlebot") { >>>>>>> hash_data("Googlebot"); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> sub vcl_deliver { >>>>>>> if (resp.status == 501) { >>>>>>> return (synth(resp.status)); >>>>>>> } >>>>>>> if (obj.hits > 0) { >>>>>>> set resp.http.X-Cache = "hit"; >>>>>>> } else { >>>>>>> set resp.http.X-Cache = "miss"; >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> >>>>>>> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < >>>>>>> guillaume at varnish-software.com> wrote: >>>>>>> >>>>>>>> Nice! It may have been the cause, time will tell.can you report >>>>>>>> back in a few days to let us know? >>>>>>>> -- >>>>>>>> Guillaume Quintard >>>>>>>> >>>>>>>> On Jun 26, 2017 20:21, "Stefano Baldo" >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Guillaume. >>>>>>>>> >>>>>>>>> I think things will start to going better now after changing the >>>>>>>>> bans. >>>>>>>>> This is how my last varnishstat looked like moments before a crash >>>>>>>>> regarding the bans: >>>>>>>>> >>>>>>>>> MAIN.bans 41336 . Count of bans >>>>>>>>> MAIN.bans_completed 37967 . Number of bans >>>>>>>>> marked 'completed' >>>>>>>>> MAIN.bans_obj 0 . 
Number of bans >>>>>>>>> using obj.* >>>>>>>>> MAIN.bans_req 41335 . Number of bans >>>>>>>>> using req.* >>>>>>>>> MAIN.bans_added 41336 0.68 Bans added >>>>>>>>> MAIN.bans_deleted 0 0.00 Bans deleted >>>>>>>>> >>>>>>>>> And this is how it looks like now: >>>>>>>>> >>>>>>>>> MAIN.bans 2 . Count of bans >>>>>>>>> MAIN.bans_completed 1 . Number of bans >>>>>>>>> marked 'completed' >>>>>>>>> MAIN.bans_obj 2 . Number of bans >>>>>>>>> using obj.* >>>>>>>>> MAIN.bans_req 0 . Number of bans >>>>>>>>> using req.* >>>>>>>>> MAIN.bans_added 2016 0.69 Bans added >>>>>>>>> MAIN.bans_deleted 2014 0.69 Bans deleted >>>>>>>>> >>>>>>>>> Before the changes, bans were never deleted! >>>>>>>>> Now the bans are added and quickly deleted after a minute or even >>>>>>>>> a couple of seconds. >>>>>>>>> >>>>>>>>> May this was the cause of the problem? It seems like varnish was >>>>>>>>> having a large number of bans to manage and test against. >>>>>>>>> I will let it ride now. Let's see if the problem persists or it's >>>>>>>>> gone! :-) >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Stefano >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>> >>>>>>>>>> Looking good! >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Guillaume Quintard >>>>>>>>>> >>>>>>>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo < >>>>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Guillaume, >>>>>>>>>>> >>>>>>>>>>> Can the following be considered "ban lurker friendly"? 
>>>>>>>>>>> >>>>>>>>>>> sub vcl_backend_response { >>>>>>>>>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>>>>>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> sub vcl_recv { >>>>>>>>>>> if (req.method == "PURGE") { >>>>>>>>>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>>>>>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>>>>>>> return(synth(750)); >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> sub vcl_deliver { >>>>>>>>>>> unset resp.http.x-url; >>>>>>>>>>> unset resp.http.x-user-agent; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Stefano >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>>>>>>>>> expression. Easiest way is to stash the host, user-agent and url in >>>>>>>>>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>>>>>>>>> >>>>>>>>>>>> I don't think you need to expand the VSL at all. >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Guillaume Quintard >>>>>>>>>>>> >>>>>>>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi Guillaume. >>>>>>>>>>>> >>>>>>>>>>>> Thanks for answering. >>>>>>>>>>>> >>>>>>>>>>>> I'm using a SSD disk. I've changed from ext4 to ext2 to >>>>>>>>>>>> increase performance but it stills restarting. >>>>>>>>>>>> Also, I checked the I/O performance for the disk and there is >>>>>>>>>>>> no signal of overhead. >>>>>>>>>>>> >>>>>>>>>>>> I've changed the /var/lib/varnish to a tmpfs and increased its >>>>>>>>>>>> 80m default size passing "-l 200m,20m" to varnishd and using >>>>>>>>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. >>>>>>>>>>>> There was a problem here. 
After a couple of hours varnish died and I >>>>>>>>>>>> received a "no space left on device" message - deleting the >>>>>>>>>>>> /var/lib/varnish solved the problem and varnish was up again, but it's >>>>>>>>>>>> weird because there was free memory on the host to be used with the tmpfs >>>>>>>>>>>> directory, so I don't know what could have happened. I will try to stop >>>>>>>>>>>> increasing the /var/lib/varnish size. >>>>>>>>>>>> >>>>>>>>>>>> Anyway, I am worried about the bans. You asked me if the bans >>>>>>>>>>>> are lurker friedly. Well, I don't think so. My bans are created this way: >>>>>>>>>>>> >>>>>>>>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + >>>>>>>>>>>> req.url + " && req.http.User-Agent !~ Googlebot"); >>>>>>>>>>>> >>>>>>>>>>>> Are they lurker friendly? I was taking a quick look and the >>>>>>>>>>>> documentation and it looks like they're not. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Stefano >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Stefano, >>>>>>>>>>>>> >>>>>>>>>>>>> Let's cover the usual suspects: I/Os. I think here Varnish >>>>>>>>>>>>> gets stuck trying to push/pull data and can't make time to reply to the >>>>>>>>>>>>> CLI. I'd recommend monitoring the disk activity (bandwidth and iops) to >>>>>>>>>>>>> confirm. >>>>>>>>>>>>> >>>>>>>>>>>>> After some time, the file storage is terrible on a hard drive >>>>>>>>>>>>> (SSDs take a bit more time to degrade) because of fragmentation. One >>>>>>>>>>>>> solution to help the disks cope is to overprovision themif they're SSDs, >>>>>>>>>>>>> and you can try different advices in the file storage definition in the >>>>>>>>>>>>> command line (last parameter, after granularity). >>>>>>>>>>>>> >>>>>>>>>>>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>>>>>>>>>>> >>>>>>>>>>>>> 40K bans is a lot, are they ban-lurker friendly? 
>>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Guillaume Quintard >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>>>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hello. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am having a critical problem with Varnish Cache in >>>>>>>>>>>>>> production for over a month and any help will be appreciated. >>>>>>>>>>>>>> The problem is that Varnish child process is recurrently >>>>>>>>>>>>>> being restarted after 10~20h of use, with the following message: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) >>>>>>>>>>>>>> not responding to CLI, killed it. >>>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected >>>>>>>>>>>>>> reply from ping: 400 CLI communication error >>>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) >>>>>>>>>>>>>> died signal=9 >>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup >>>>>>>>>>>>>> complete >>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>>>> Started >>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>>>> said Child starts >>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>>>> said SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>>>>>>>>> >>>>>>>>>>>>>> The following link is the varnishstat output just 1 minute >>>>>>>>>>>>>> before a restart: >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://pastebin.com/g0g5RVTs >>>>>>>>>>>>>> >>>>>>>>>>>>>> Environment: >>>>>>>>>>>>>> >>>>>>>>>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>>>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>>>>>>>>> Installed using pre-built package from official repo at >>>>>>>>>>>>>> packagecloud.io >>>>>>>>>>>>>> CPU 2x2.9 GHz >>>>>>>>>>>>>> Mem 3.69 GiB >>>>>>>>>>>>>> Running inside a Docker container >>>>>>>>>>>>>> NFILES=131072 >>>>>>>>>>>>>> MEMLOCK=82000 >>>>>>>>>>>>>> >>>>>>>>>>>>>> 
Additional info: >>>>>>>>>>>>>> >>>>>>>>>>>>>> - I need to cache a large number of objets and the cache >>>>>>>>>>>>>> should last for almost a week, so I have set up a 450G storage space, I >>>>>>>>>>>>>> don't know if this is a problem; >>>>>>>>>>>>>> - I use ban a lot. There was about 40k bans in the system >>>>>>>>>>>>>> just before the last crash. I really don't know if this is too much or may >>>>>>>>>>>>>> have anything to do with it; >>>>>>>>>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>>>>>>>>> - No panic is reported, the only info I can retrieve is from >>>>>>>>>>>>>> syslog; >>>>>>>>>>>>>> - During all the time, event moments before the crashes, >>>>>>>>>>>>>> everything is okay and requests are being responded very fast. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Stefano Baldo >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> varnish-misc mailing list >>>>>>>>>>>>>> varnish-misc at varnish-cache.org >>>>>>>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish >>>>>>>>>>>>>> -misc >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From reza at varnish-software.com Wed Jun 28 14:33:49 2017 From: reza at varnish-software.com (Reza Naghibi) Date: Wed, 28 Jun 2017 10:33:49 -0400 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Assuming the problem is running out of memory, you will need to do some memory tuning, especially given the number of threads you are using and your access patterns. 
Your options: - Add more memory to the system - Reduce thread_pool_max - Reduce jemalloc's thread cache (MALLOC_CONF="lg_tcache_max:10") - Use some of the tuning params in here: https://info.varnish-software.com/blog/understanding-varnish-cache-memory-usage -- Reza Naghibi Varnish Software On Wed, Jun 28, 2017 at 9:26 AM, Guillaume Quintard < guillaume at varnish-software.com> wrote: > Hi, > > can you look that "varnishstat -1 | grep g_bytes" and see if if matches > the memory you are seeing? > > -- > Guillaume Quintard > > On Wed, Jun 28, 2017 at 3:20 PM, Stefano Baldo > wrote: > >> Hi Guillaume. >> >> I increased the cli_timeout yesterday to 900sec (15min) and it restarted >> anyway, which seems to indicate that the thread is really stalled. >> >> This was 1 minute after the last restart: >> >> MAIN.n_object 3908216 . object structs made >> SMF.s0.g_alloc 7794510 . Allocations outstanding >> >> I've just changed the I/O Scheduler to noop to see what happens. >> >> One interest thing I've found is about the memory usage. >> >> In the 1st minute of use: >> MemTotal: 3865572 kB >> MemFree: 120768 kB >> MemAvailable: 2300268 kB >> >> 1 minute before a restart: >> MemTotal: 3865572 kB >> MemFree: 82480 kB >> MemAvailable: 68316 kB >> >> It seems like the system is possibly running out of memory. >> >> When calling varnishd, I'm specifying only "-s file,..." as storage. I >> see in some examples that is common to use "-s file" AND "-s malloc" >> together. Should I be passing "-s malloc" as well to somehow try to limit >> the memory usage by varnishd? 
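For illustration only, Reza's tuning options might be combined on a concrete `varnishd` invocation. This is a sketch with placeholder paths and sizes; `thread_pool_max=1000` and the jemalloc `lg_tcache_max` value are examples to adapt, not tuned recommendations, and `cli_timeout=900` reflects the value Stefano tried earlier in the thread:

```shell
# Shrink jemalloc's per-thread cache before the child starts
export MALLOC_CONF="lg_tcache_max:10"

varnishd \
  -a :80 \
  -f /etc/varnish/default.vcl \
  -s file,/var/lib/varnish/storage.bin,450G \
  -p thread_pool_max=1000 \
  -p cli_timeout=900
```

Note that a second `-s malloc` would define an additional storage backend, not a cap on the memory used by the `file` storage's page cache.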
>> >> Best, >> Stefano >> >> >> On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard < >> guillaume at varnish-software.com> wrote: >> >>> Sadly, nothing suspicious here, you can still try: >>> - bumping the cli_timeout >>> - changing your disk scheduler >>> - changing the advice option of the file storage >>> >>> I'm still convinced this is due to Varnish getting stuck waiting for the >>> disk because of the file storage fragmentation. >>> >>> Maybe you could look at SMF.*.g_alloc and compare it to the number of >>> objects. Ideally, we would have a 1:1 relation between objects and >>> allocations. If that number drops prior to a restart, that would be a good >>> clue. >>> >>> >>> -- >>> Guillaume Quintard >>> >>> On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo >>> wrote: >>> >>>> Hi Guillaume. >>>> >>>> It keeps restarting. >>>> Would you mind taking a quick look in the following VCL file to check >>>> if you find anything suspicious? >>>> >>>> Thank you very much. >>>> >>>> Best, >>>> Stefano >>>> >>>> vcl 4.0; >>>> >>>> import std; >>>> >>>> backend default { >>>> .host = "sites-web-server-lb"; >>>> .port = "80"; >>>> } >>>> >>>> include "/etc/varnish/bad_bot_detection.vcl"; >>>> >>>> sub vcl_recv { >>>> call bad_bot_detection; >>>> >>>> if (req.url == "/nocache" || req.url == "/version") { >>>> return(pass); >>>> } >>>> >>>> unset req.http.Cookie; >>>> if (req.method == "PURGE") { >>>> ban("obj.http.x-host == " + req.http.host + " && >>>> obj.http.x-user-agent !~ Googlebot"); >>>> return(synth(750)); >>>> } >>>> >>>> set req.url = regsuball(req.url, "(?>>> } >>>> >>>> sub vcl_synth { >>>> if (resp.status == 750) { >>>> set resp.status = 200; >>>> synthetic("PURGED => " + req.url); >>>> return(deliver); >>>> } elsif (resp.status == 501) { >>>> set resp.status = 200; >>>> set resp.http.Content-Type = "text/html; charset=utf-8"; >>>> synthetic(std.fileread("/etc/varnish/pages/invalid_domain.html")); >>>> return(deliver); >>>> } >>>> } >>>> >>>> sub 
vcl_backend_response { >>>> unset beresp.http.Set-Cookie; >>>> set beresp.http.x-host = bereq.http.host; >>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>> >>>> if (bereq.url == "/themes/basic/assets/theme.min.css" >>>> || bereq.url == "/api/events/PAGEVIEW" >>>> || bereq.url ~ "^\/assets\/img\/") { >>>> set beresp.http.Cache-Control = "max-age=0"; >>>> } else { >>>> unset beresp.http.Cache-Control; >>>> } >>>> >>>> if (beresp.status == 200 || >>>> beresp.status == 301 || >>>> beresp.status == 302 || >>>> beresp.status == 404) { >>>> if (bereq.url ~ "\&ordenar=aleatorio$") { >>>> set beresp.http.X-TTL = "1d"; >>>> set beresp.ttl = 1d; >>>> } else { >>>> set beresp.http.X-TTL = "1w"; >>>> set beresp.ttl = 1w; >>>> } >>>> } >>>> >>>> if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") >>>> { >>>> set beresp.do_gzip = true; >>>> } >>>> } >>>> >>>> sub vcl_pipe { >>>> set bereq.http.connection = "close"; >>>> return (pipe); >>>> } >>>> >>>> sub vcl_deliver { >>>> unset resp.http.x-host; >>>> unset resp.http.x-user-agent; >>>> } >>>> >>>> sub vcl_backend_error { >>>> if (beresp.status == 502 || beresp.status == 503 || beresp.status == >>>> 504) { >>>> set beresp.status = 200; >>>> set beresp.http.Content-Type = "text/html; charset=utf-8"; >>>> synthetic(std.fileread("/etc/varnish/pages/maintenance.html")); >>>> return (deliver); >>>> } >>>> } >>>> >>>> sub vcl_hash { >>>> if (req.http.User-Agent ~ "Google Page Speed") { >>>> hash_data("Google Page Speed"); >>>> } elsif (req.http.User-Agent ~ "Googlebot") { >>>> hash_data("Googlebot"); >>>> } >>>> } >>>> >>>> sub vcl_deliver { >>>> if (resp.status == 501) { >>>> return (synth(resp.status)); >>>> } >>>> if (obj.hits > 0) { >>>> set resp.http.X-Cache = "hit"; >>>> } else { >>>> set resp.http.X-Cache = "miss"; >>>> } >>>> } >>>> >>>> >>>> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < >>>> guillaume at varnish-software.com> wrote: >>>> >>>>> Nice! 
It may have been the cause, time will tell.can you report back >>>>> in a few days to let us know? >>>>> -- >>>>> Guillaume Quintard >>>>> >>>>> On Jun 26, 2017 20:21, "Stefano Baldo" wrote: >>>>> >>>>>> Hi Guillaume. >>>>>> >>>>>> I think things will start to going better now after changing the bans. >>>>>> This is how my last varnishstat looked like moments before a crash >>>>>> regarding the bans: >>>>>> >>>>>> MAIN.bans 41336 . Count of bans >>>>>> MAIN.bans_completed 37967 . Number of bans >>>>>> marked 'completed' >>>>>> MAIN.bans_obj 0 . Number of bans using >>>>>> obj.* >>>>>> MAIN.bans_req 41335 . Number of bans using >>>>>> req.* >>>>>> MAIN.bans_added 41336 0.68 Bans added >>>>>> MAIN.bans_deleted 0 0.00 Bans deleted >>>>>> >>>>>> And this is how it looks like now: >>>>>> >>>>>> MAIN.bans 2 . Count of bans >>>>>> MAIN.bans_completed 1 . Number of bans >>>>>> marked 'completed' >>>>>> MAIN.bans_obj 2 . Number of bans using >>>>>> obj.* >>>>>> MAIN.bans_req 0 . Number of bans using >>>>>> req.* >>>>>> MAIN.bans_added 2016 0.69 Bans added >>>>>> MAIN.bans_deleted 2014 0.69 Bans deleted >>>>>> >>>>>> Before the changes, bans were never deleted! >>>>>> Now the bans are added and quickly deleted after a minute or even a >>>>>> couple of seconds. >>>>>> >>>>>> May this was the cause of the problem? It seems like varnish was >>>>>> having a large number of bans to manage and test against. >>>>>> I will let it ride now. Let's see if the problem persists or it's >>>>>> gone! :-) >>>>>> >>>>>> Best, >>>>>> Stefano >>>>>> >>>>>> >>>>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >>>>>> guillaume at varnish-software.com> wrote: >>>>>> >>>>>>> Looking good! >>>>>>> >>>>>>> -- >>>>>>> Guillaume Quintard >>>>>>> >>>>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo < >>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Guillaume, >>>>>>>> >>>>>>>> Can the following be considered "ban lurker friendly"? 
>>>>>>>> >>>>>>>> sub vcl_backend_response { >>>>>>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>>>> } >>>>>>>> >>>>>>>> sub vcl_recv { >>>>>>>> if (req.method == "PURGE") { >>>>>>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>>>> return(synth(750)); >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> sub vcl_deliver { >>>>>>>> unset resp.http.x-url; >>>>>>>> unset resp.http.x-user-agent; >>>>>>>> } >>>>>>>> >>>>>>>> Best, >>>>>>>> Stefano >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>> >>>>>>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>>>>>> expression. Easiest way is to stash the host, user-agent and url in >>>>>>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>>>>>> >>>>>>>>> I don't think you need to expand the VSL at all. >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Guillaume Quintard >>>>>>>>> >>>>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi Guillaume. >>>>>>>>> >>>>>>>>> Thanks for answering. >>>>>>>>> >>>>>>>>> I'm using a SSD disk. I've changed from ext4 to ext2 to increase >>>>>>>>> performance but it stills restarting. >>>>>>>>> Also, I checked the I/O performance for the disk and there is no >>>>>>>>> signal of overhead. >>>>>>>>> >>>>>>>>> I've changed the /var/lib/varnish to a tmpfs and increased its 80m >>>>>>>>> default size passing "-l 200m,20m" to varnishd and using >>>>>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There >>>>>>>>> was a problem here. 
After a couple of hours varnish died and I received a >>>>>>>>> "no space left on device" message - deleting the /var/lib/varnish solved >>>>>>>>> the problem and varnish was up again, but it's weird because there was free >>>>>>>>> memory on the host to be used with the tmpfs directory, so I don't know >>>>>>>>> what could have happened. I will try to stop increasing the >>>>>>>>> /var/lib/varnish size. >>>>>>>>> >>>>>>>>> Anyway, I am worried about the bans. You asked me if the bans are >>>>>>>>> lurker friedly. Well, I don't think so. My bans are created this way: >>>>>>>>> >>>>>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + >>>>>>>>> req.url + " && req.http.User-Agent !~ Googlebot"); >>>>>>>>> >>>>>>>>> Are they lurker friendly? I was taking a quick look and the >>>>>>>>> documentation and it looks like they're not. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Stefano >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Stefano, >>>>>>>>>> >>>>>>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets >>>>>>>>>> stuck trying to push/pull data and can't make time to reply to the CLI. I'd >>>>>>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>>>>>>> >>>>>>>>>> After some time, the file storage is terrible on a hard drive >>>>>>>>>> (SSDs take a bit more time to degrade) because of fragmentation. One >>>>>>>>>> solution to help the disks cope is to overprovision themif they're SSDs, >>>>>>>>>> and you can try different advices in the file storage definition in the >>>>>>>>>> command line (last parameter, after granularity). >>>>>>>>>> >>>>>>>>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>>>>>>>> >>>>>>>>>> 40K bans is a lot, are they ban-lurker friendly? 
>>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Guillaume Quintard >>>>>>>>>> >>>>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hello. >>>>>>>>>>> >>>>>>>>>>> I am having a critical problem with Varnish Cache in production >>>>>>>>>>> for over a month and any help will be appreciated. >>>>>>>>>>> The problem is that Varnish child process is recurrently being >>>>>>>>>>> restarted after 10~20h of use, with the following message: >>>>>>>>>>> >>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>>>>>>>> responding to CLI, killed it. >>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply >>>>>>>>>>> from ping: 400 CLI communication error >>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died >>>>>>>>>>> signal=9 >>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup >>>>>>>>>>> complete >>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>> Started >>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>>>>>> Child starts >>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said >>>>>>>>>>> SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>>>>>> >>>>>>>>>>> The following link is the varnishstat output just 1 minute >>>>>>>>>>> before a restart: >>>>>>>>>>> >>>>>>>>>>> https://pastebin.com/g0g5RVTs >>>>>>>>>>> >>>>>>>>>>> Environment: >>>>>>>>>>> >>>>>>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>>>>>> Installed using pre-built package from official repo at >>>>>>>>>>> packagecloud.io >>>>>>>>>>> CPU 2x2.9 GHz >>>>>>>>>>> Mem 3.69 GiB >>>>>>>>>>> Running inside a Docker container >>>>>>>>>>> NFILES=131072 >>>>>>>>>>> MEMLOCK=82000 >>>>>>>>>>> >>>>>>>>>>> Additional info: >>>>>>>>>>> >>>>>>>>>>> - I need to cache a large number of objets and the cache should >>>>>>>>>>> last for almost a week, 
so I have set up a 450G storage space, I don't know >>>>>>>>>>> if this is a problem; >>>>>>>>>>> - I use ban a lot. There was about 40k bans in the system just >>>>>>>>>>> before the last crash. I really don't know if this is too much or may have >>>>>>>>>>> anything to do with it; >>>>>>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>>>>>> - No panic is reported, the only info I can retrieve is from >>>>>>>>>>> syslog; >>>>>>>>>>> - During all the time, event moments before the crashes, >>>>>>>>>>> everything is okay and requests are being responded very fast. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Stefano Baldo >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> varnish-misc mailing list >>>>>>>>>>> varnish-misc at varnish-cache.org >>>>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish >>>>>>>>>>> -misc >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >>> >> > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > -------------- next part -------------- An HTML attachment was scrubbed... URL: From reza at varnish-software.com Wed Jun 28 14:48:36 2017 From: reza at varnish-software.com (Reza Naghibi) Date: Wed, 28 Jun 2017 10:48:36 -0400 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Transient is using memory. One other option I forgot to mention: - Move transient allocations to file: -s Transient=file,/tmp,1G -- Reza Naghibi Varnish Software On Wed, Jun 28, 2017 at 9:39 AM, Stefano Baldo wrote: > Hi. > > root at 2c6c325b279f:/# varnishstat -1 | grep g_bytes > SMA.Transient.g_bytes 519022 . Bytes > outstanding > SMF.s0.g_bytes 23662845952 . Bytes > outstanding > > You mean g_bytes from SMA.Transient? I have set no malloc storage. 
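To put the memory numbers discussed in this exchange in context, a rough back-of-the-envelope check suggests why a ~3.7 GiB machine can run out of memory even with "-s file": object bodies live on disk, but per-object metadata stays in RAM. The per-object overhead range below is an assumption for illustration only, not a measured Varnish figure; the object count and MemTotal are the values quoted in this thread:

```python
# Back-of-envelope check: with "-s file" the object *bodies* are mmap'ed
# on disk, but per-object metadata (objhead/objcore/hash entries) stays
# in RAM. The per-object overhead below is an assumed range for
# illustration, not a measured Varnish number.

N_OBJECTS = 3_908_216          # MAIN.n_object reported just before a restart
OVERHEAD_LOW = 300             # assumed bytes of in-RAM metadata per object (low)
OVERHEAD_HIGH = 1024           # assumed bytes of in-RAM metadata per object (high)
RAM_BYTES = 3_865_572 * 1024   # MemTotal from the thread (kB -> bytes)

def gib(n):
    return n / 2**30

low = N_OBJECTS * OVERHEAD_LOW
high = N_OBJECTS * OVERHEAD_HIGH

print(f"metadata: {gib(low):.2f}-{gib(high):.2f} GiB of {gib(RAM_BYTES):.2f} GiB RAM")
# -> metadata: 1.09-3.73 GiB of 3.69 GiB RAM
```

Under the high-end assumption, metadata alone is comparable to the machine's entire RAM, which is consistent with the out-of-memory behaviour reported in this thread.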
> > > On Wed, Jun 28, 2017 at 10:26 AM, Guillaume Quintard < > guillaume at varnish-software.com> wrote: >> Hi, >> can you look at "varnishstat -1 | grep g_bytes" and see if it matches >> the memory you are seeing? >> -- >> Guillaume Quintard >> On Wed, Jun 28, 2017 at 3:20 PM, Stefano Baldo >> wrote: >>> Hi Guillaume. >>> I increased the cli_timeout yesterday to 900sec (15min) and it restarted >>> anyway, which seems to indicate that the thread is really stalled. >>> This was 1 minute after the last restart: >>> MAIN.n_object 3908216 . object structs made >>> SMF.s0.g_alloc 7794510 . Allocations outstanding >>> I've just changed the I/O Scheduler to noop to see what happens. >>> One interesting thing I've found is about the memory usage. >>> In the 1st minute of use: >>> MemTotal: 3865572 kB >>> MemFree: 120768 kB >>> MemAvailable: 2300268 kB >>> 1 minute before a restart: >>> MemTotal: 3865572 kB >>> MemFree: 82480 kB >>> MemAvailable: 68316 kB >>> It seems like the system is possibly running out of memory. >>> When calling varnishd, I'm specifying only "-s file,..." as storage. I >>> see in some examples that it is common to use "-s file" AND "-s malloc" >>> together. Should I be passing "-s malloc" as well to somehow try to limit >>> the memory usage by varnishd? >>> Best, >>> Stefano >>> On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard < >>> guillaume at varnish-software.com> wrote: >>>> Sadly, nothing suspicious here, you can still try: >>>> - bumping the cli_timeout >>>> - changing your disk scheduler >>>> - changing the advice option of the file storage >>>> I'm still convinced this is due to Varnish getting stuck waiting for >>>> the disk because of the file storage fragmentation. >>>> Maybe you could look at SMF.*.g_alloc and compare it to the number of >>>> objects. Ideally, we would have a 1:1 relation between objects and >>>> allocations. 
If that number drops prior to a restart, that would be a good >>>> clue. >>>> >>>> >>>> -- >>>> Guillaume Quintard >>>> >>>> On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo >>> > wrote: >>>> >>>>> Hi Guillaume. >>>>> >>>>> It keeps restarting. >>>>> Would you mind taking a quick look in the following VCL file to check >>>>> if you find anything suspicious? >>>>> >>>>> Thank you very much. >>>>> >>>>> Best, >>>>> Stefano >>>>> >>>>> vcl 4.0; >>>>> >>>>> import std; >>>>> >>>>> backend default { >>>>> .host = "sites-web-server-lb"; >>>>> .port = "80"; >>>>> } >>>>> >>>>> include "/etc/varnish/bad_bot_detection.vcl"; >>>>> >>>>> sub vcl_recv { >>>>> call bad_bot_detection; >>>>> >>>>> if (req.url == "/nocache" || req.url == "/version") { >>>>> return(pass); >>>>> } >>>>> >>>>> unset req.http.Cookie; >>>>> if (req.method == "PURGE") { >>>>> ban("obj.http.x-host == " + req.http.host + " && >>>>> obj.http.x-user-agent !~ Googlebot"); >>>>> return(synth(750)); >>>>> } >>>>> >>>>> set req.url = regsuball(req.url, "(?>>>> } >>>>> >>>>> sub vcl_synth { >>>>> if (resp.status == 750) { >>>>> set resp.status = 200; >>>>> synthetic("PURGED => " + req.url); >>>>> return(deliver); >>>>> } elsif (resp.status == 501) { >>>>> set resp.status = 200; >>>>> set resp.http.Content-Type = "text/html; charset=utf-8"; >>>>> synthetic(std.fileread("/etc/varnish/pages/invalid_domain.html")); >>>>> return(deliver); >>>>> } >>>>> } >>>>> >>>>> sub vcl_backend_response { >>>>> unset beresp.http.Set-Cookie; >>>>> set beresp.http.x-host = bereq.http.host; >>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>> >>>>> if (bereq.url == "/themes/basic/assets/theme.min.css" >>>>> || bereq.url == "/api/events/PAGEVIEW" >>>>> || bereq.url ~ "^\/assets\/img\/") { >>>>> set beresp.http.Cache-Control = "max-age=0"; >>>>> } else { >>>>> unset beresp.http.Cache-Control; >>>>> } >>>>> >>>>> if (beresp.status == 200 || >>>>> beresp.status == 301 || >>>>> beresp.status == 302 || >>>>> beresp.status == 
404) { >>>>> if (bereq.url ~ "\&ordenar=aleatorio$") { >>>>> set beresp.http.X-TTL = "1d"; >>>>> set beresp.ttl = 1d; >>>>> } else { >>>>> set beresp.http.X-TTL = "1w"; >>>>> set beresp.ttl = 1w; >>>>> } >>>>> } >>>>> >>>>> if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") >>>>> { >>>>> set beresp.do_gzip = true; >>>>> } >>>>> } >>>>> >>>>> sub vcl_pipe { >>>>> set bereq.http.connection = "close"; >>>>> return (pipe); >>>>> } >>>>> >>>>> sub vcl_deliver { >>>>> unset resp.http.x-host; >>>>> unset resp.http.x-user-agent; >>>>> } >>>>> >>>>> sub vcl_backend_error { >>>>> if (beresp.status == 502 || beresp.status == 503 || beresp.status == >>>>> 504) { >>>>> set beresp.status = 200; >>>>> set beresp.http.Content-Type = "text/html; charset=utf-8"; >>>>> synthetic(std.fileread("/etc/varnish/pages/maintenance.html")); >>>>> return (deliver); >>>>> } >>>>> } >>>>> >>>>> sub vcl_hash { >>>>> if (req.http.User-Agent ~ "Google Page Speed") { >>>>> hash_data("Google Page Speed"); >>>>> } elsif (req.http.User-Agent ~ "Googlebot") { >>>>> hash_data("Googlebot"); >>>>> } >>>>> } >>>>> >>>>> sub vcl_deliver { >>>>> if (resp.status == 501) { >>>>> return (synth(resp.status)); >>>>> } >>>>> if (obj.hits > 0) { >>>>> set resp.http.X-Cache = "hit"; >>>>> } else { >>>>> set resp.http.X-Cache = "miss"; >>>>> } >>>>> } >>>>> >>>>> >>>>> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < >>>>> guillaume at varnish-software.com> wrote: >>>>> >>>>>> Nice! It may have been the cause, time will tell.can you report back >>>>>> in a few days to let us know? >>>>>> -- >>>>>> Guillaume Quintard >>>>>> >>>>>> On Jun 26, 2017 20:21, "Stefano Baldo" >>>>>> wrote: >>>>>> >>>>>>> Hi Guillaume. >>>>>>> >>>>>>> I think things will start to going better now after changing the >>>>>>> bans. >>>>>>> This is how my last varnishstat looked like moments before a crash >>>>>>> regarding the bans: >>>>>>> >>>>>>> MAIN.bans 41336 . Count of bans >>>>>>> MAIN.bans_completed 37967 . 
Number of bans >>>>>>> marked 'completed' >>>>>>> MAIN.bans_obj 0 . Number of bans >>>>>>> using obj.* >>>>>>> MAIN.bans_req 41335 . Number of bans >>>>>>> using req.* >>>>>>> MAIN.bans_added 41336 0.68 Bans added >>>>>>> MAIN.bans_deleted 0 0.00 Bans deleted >>>>>>> >>>>>>> And this is how it looks like now: >>>>>>> >>>>>>> MAIN.bans 2 . Count of bans >>>>>>> MAIN.bans_completed 1 . Number of bans >>>>>>> marked 'completed' >>>>>>> MAIN.bans_obj 2 . Number of bans >>>>>>> using obj.* >>>>>>> MAIN.bans_req 0 . Number of bans >>>>>>> using req.* >>>>>>> MAIN.bans_added 2016 0.69 Bans added >>>>>>> MAIN.bans_deleted 2014 0.69 Bans deleted >>>>>>> >>>>>>> Before the changes, bans were never deleted! >>>>>>> Now the bans are added and quickly deleted after a minute or even a >>>>>>> couple of seconds. >>>>>>> >>>>>>> May this was the cause of the problem? It seems like varnish was >>>>>>> having a large number of bans to manage and test against. >>>>>>> I will let it ride now. Let's see if the problem persists or it's >>>>>>> gone! :-) >>>>>>> >>>>>>> Best, >>>>>>> Stefano >>>>>>> >>>>>>> >>>>>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >>>>>>> guillaume at varnish-software.com> wrote: >>>>>>> >>>>>>>> Looking good! >>>>>>>> >>>>>>>> -- >>>>>>>> Guillaume Quintard >>>>>>>> >>>>>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo < >>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Guillaume, >>>>>>>>> >>>>>>>>> Can the following be considered "ban lurker friendly"? 
>>>>>>>>> >>>>>>>>> sub vcl_backend_response { >>>>>>>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>>>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>>>>> } >>>>>>>>> >>>>>>>>> sub vcl_recv { >>>>>>>>> if (req.method == "PURGE") { >>>>>>>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>>>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>>>>> return(synth(750)); >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> sub vcl_deliver { >>>>>>>>> unset resp.http.x-url; >>>>>>>>> unset resp.http.x-user-agent; >>>>>>>>> } >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Stefano >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>> >>>>>>>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>>>>>>> expression. Easiest way is to stash the host, user-agent and url in >>>>>>>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>>>>>>> >>>>>>>>>> I don't think you need to expand the VSL at all. >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Guillaume Quintard >>>>>>>>>> >>>>>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi Guillaume. >>>>>>>>>> >>>>>>>>>> Thanks for answering. >>>>>>>>>> >>>>>>>>>> I'm using a SSD disk. I've changed from ext4 to ext2 to increase >>>>>>>>>> performance but it stills restarting. >>>>>>>>>> Also, I checked the I/O performance for the disk and there is no >>>>>>>>>> signal of overhead. >>>>>>>>>> >>>>>>>>>> I've changed the /var/lib/varnish to a tmpfs and increased its >>>>>>>>>> 80m default size passing "-l 200m,20m" to varnishd and using >>>>>>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There >>>>>>>>>> was a problem here. 
After a couple of hours varnish died and I received a >>>>>>>>>> "no space left on device" message - deleting /var/lib/varnish solved >>>>>>>>>> the problem and varnish was up again, but it's weird because there was free >>>>>>>>>> memory on the host to be used with the tmpfs directory, so I don't know >>>>>>>>>> what could have happened. I will try to stop increasing the >>>>>>>>>> /var/lib/varnish size. >>>>>>>>>> Anyway, I am worried about the bans. You asked me if the bans are >>>>>>>>>> lurker friendly. Well, I don't think so. My bans are created this way: >>>>>>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + >>>>>>>>>> req.url + " && req.http.User-Agent !~ Googlebot"); >>>>>>>>>> Are they lurker friendly? I was taking a quick look at the >>>>>>>>>> documentation and it looks like they're not. >>>>>>>>>> Best, >>>>>>>>>> Stefano >>>>>>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>>>> Hi Stefano, >>>>>>>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets >>>>>>>>>>> stuck trying to push/pull data and can't make time to reply to the CLI. I'd >>>>>>>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>>>>>>>> After some time, the file storage is terrible on a hard drive >>>>>>>>>>> (SSDs take a bit more time to degrade) because of fragmentation. One >>>>>>>>>>> solution to help the disks cope is to overprovision them if they're SSDs, >>>>>>>>>>> and you can try different advice settings in the file storage definition in the >>>>>>>>>>> command line (last parameter, after granularity). >>>>>>>>>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>>>>>>>>> 40K bans is a lot, are they ban-lurker friendly? 
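For readers skimming the thread: the fix the discussion converges on is to make bans reference only obj.* expressions, so the ban lurker can evaluate them in the background against cached objects. A minimal sketch of that pattern follows; the helper header names mirror the ones used elsewhere in this thread, and the snippet is illustrative rather than a drop-in for this exact setup:

```vcl
# Lurker-friendly bans: stash request properties on the object at fetch
# time, then ban against obj.* only (req.* bans cannot be processed by
# the background ban lurker and pile up until matched on a request).

sub vcl_backend_response {
    set beresp.http.x-url = bereq.http.host + bereq.url;
    set beresp.http.x-user-agent = bereq.http.user-agent;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        ban("obj.http.x-url == " + req.http.host + req.url +
            " && obj.http.x-user-agent !~ Googlebot");
        return (synth(750));
    }
}

sub vcl_deliver {
    # keep the helper headers out of client responses
    unset resp.http.x-url;
    unset resp.http.x-user-agent;
}
```

With only obj.* in the expression, completed bans are deleted shortly after the lurker walks the cache, instead of accumulating by the tens of thousands as reported above.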
>>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Guillaume Quintard >>>>>>>>>>> >>>>>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hello. >>>>>>>>>>>> >>>>>>>>>>>> I am having a critical problem with Varnish Cache in production >>>>>>>>>>>> for over a month and any help will be appreciated. >>>>>>>>>>>> The problem is that Varnish child process is recurrently being >>>>>>>>>>>> restarted after 10~20h of use, with the following message: >>>>>>>>>>>> >>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>>>>>>>>> responding to CLI, killed it. >>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply >>>>>>>>>>>> from ping: 400 CLI communication error >>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) >>>>>>>>>>>> died signal=9 >>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup >>>>>>>>>>>> complete >>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>> Started >>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>> said Child starts >>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>> said SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>>>>>>> >>>>>>>>>>>> The following link is the varnishstat output just 1 minute >>>>>>>>>>>> before a restart: >>>>>>>>>>>> >>>>>>>>>>>> https://pastebin.com/g0g5RVTs >>>>>>>>>>>> >>>>>>>>>>>> Environment: >>>>>>>>>>>> >>>>>>>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>>>>>>> Installed using pre-built package from official repo at >>>>>>>>>>>> packagecloud.io >>>>>>>>>>>> CPU 2x2.9 GHz >>>>>>>>>>>> Mem 3.69 GiB >>>>>>>>>>>> Running inside a Docker container >>>>>>>>>>>> NFILES=131072 >>>>>>>>>>>> MEMLOCK=82000 >>>>>>>>>>>> >>>>>>>>>>>> Additional info: >>>>>>>>>>>> >>>>>>>>>>>> - I need to cache a large number of objets and the 
cache should >>>>>>>>>>>> last for almost a week, so I have set up a 450G storage space, I don't know >>>>>>>>>>>> if this is a problem; >>>>>>>>>>>> - I use ban a lot. There was about 40k bans in the system just >>>>>>>>>>>> before the last crash. I really don't know if this is too much or may have >>>>>>>>>>>> anything to do with it; >>>>>>>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>>>>>>> - No panic is reported, the only info I can retrieve is from >>>>>>>>>>>> syslog; >>>>>>>>>>>> - During all the time, event moments before the crashes, >>>>>>>>>>>> everything is okay and requests are being responded very fast. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Stefano Baldo >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> varnish-misc mailing list >>>>>>>>>>>> varnish-misc at varnish-cache.org >>>>>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish >>>>>>>>>>>> -misc >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>> >>> >> > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > -------------- next part -------------- An HTML attachment was scrubbed... URL: From reza at varnish-software.com Wed Jun 28 16:20:33 2017 From: reza at varnish-software.com (Reza Naghibi) Date: Wed, 28 Jun 2017 12:20:33 -0400 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: That means its unlimited. Those numbers are from the Varnish perspective, so they don't account for how jemalloc manages those allocations. -- Reza Naghibi Varnish Software On Wed, Jun 28, 2017 at 9:47 AM, Stefano Baldo wrote: > SMA.Transient.g_alloc 3518 . Allocations > outstanding > SMA.Transient.g_bytes 546390 . Bytes > outstanding > SMA.Transient.g_space 0 . Bytes > available > > g_space is always 0. 
Could it mean anything? > > On Wed, Jun 28, 2017 at 10:43 AM, Guillaume Quintard < > guillaume at varnish-software.com> wrote: >> Yeah, I was wondering about Transient, but it seems under control. >> Apart from moving away from file storage, I have nothing at the moment :-/ >> -- >> Guillaume Quintard >> On Wed, Jun 28, 2017 at 3:39 PM, Stefano Baldo >> wrote: >>> Hi. >>> root at 2c6c325b279f:/# varnishstat -1 | grep g_bytes >>> SMA.Transient.g_bytes 519022 . Bytes >>> outstanding >>> SMF.s0.g_bytes 23662845952 . Bytes >>> outstanding >>> You mean g_bytes from SMA.Transient? I have set no malloc storage. >>> On Wed, Jun 28, 2017 at 10:26 AM, Guillaume Quintard < >>> guillaume at varnish-software.com> wrote: >>>> Hi, >>>> can you look at "varnishstat -1 | grep g_bytes" and see if it matches >>>> the memory you are seeing? >>>> -- >>>> Guillaume Quintard >>>> On Wed, Jun 28, 2017 at 3:20 PM, Stefano Baldo >>>> wrote: >>>>> Hi Guillaume. >>>>> I increased the cli_timeout yesterday to 900sec (15min) and it >>>>> restarted anyway, which seems to indicate that the thread is really stalled. >>>>> This was 1 minute after the last restart: >>>>> MAIN.n_object 3908216 . object structs made >>>>> SMF.s0.g_alloc 7794510 . Allocations >>>>> outstanding >>>>> I've just changed the I/O Scheduler to noop to see what happens. >>>>> One interesting thing I've found is about the memory usage. >>>>> In the 1st minute of use: >>>>> MemTotal: 3865572 kB >>>>> MemFree: 120768 kB >>>>> MemAvailable: 2300268 kB >>>>> 1 minute before a restart: >>>>> MemTotal: 3865572 kB >>>>> MemFree: 82480 kB >>>>> MemAvailable: 68316 kB >>>>> It seems like the system is possibly running out of memory. >>>>> When calling varnishd, I'm specifying only "-s file,..." as storage. I >>>>> see in some examples that it is common to use "-s file" AND "-s malloc" >>>>> together. 
Should I be passing "-s malloc" as well to somehow try to limit >>>>> the memory usage by varnishd? >>>>> >>>>> Best, >>>>> Stefano >>>>> >>>>> >>>>> On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard < >>>>> guillaume at varnish-software.com> wrote: >>>>> >>>>>> Sadly, nothing suspicious here, you can still try: >>>>>> - bumping the cli_timeout >>>>>> - changing your disk scheduler >>>>>> - changing the advice option of the file storage >>>>>> >>>>>> I'm still convinced this is due to Varnish getting stuck waiting for >>>>>> the disk because of the file storage fragmentation. >>>>>> >>>>>> Maybe you could look at SMF.*.g_alloc and compare it to the number of >>>>>> objects. Ideally, we would have a 1:1 relation between objects and >>>>>> allocations. If that number drops prior to a restart, that would be a good >>>>>> clue. >>>>>> >>>>>> >>>>>> -- >>>>>> Guillaume Quintard >>>>>> >>>>>> On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo < >>>>>> stefanobaldo at gmail.com> wrote: >>>>>> >>>>>>> Hi Guillaume. >>>>>>> >>>>>>> It keeps restarting. >>>>>>> Would you mind taking a quick look in the following VCL file to >>>>>>> check if you find anything suspicious? >>>>>>> >>>>>>> Thank you very much. 
>>>>>>> >>>>>>> Best, >>>>>>> Stefano >>>>>>> >>>>>>> vcl 4.0; >>>>>>> >>>>>>> import std; >>>>>>> >>>>>>> backend default { >>>>>>> .host = "sites-web-server-lb"; >>>>>>> .port = "80"; >>>>>>> } >>>>>>> >>>>>>> include "/etc/varnish/bad_bot_detection.vcl"; >>>>>>> >>>>>>> sub vcl_recv { >>>>>>> call bad_bot_detection; >>>>>>> >>>>>>> if (req.url == "/nocache" || req.url == "/version") { >>>>>>> return(pass); >>>>>>> } >>>>>>> >>>>>>> unset req.http.Cookie; >>>>>>> if (req.method == "PURGE") { >>>>>>> ban("obj.http.x-host == " + req.http.host + " && >>>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>>> return(synth(750)); >>>>>>> } >>>>>>> >>>>>>> set req.url = regsuball(req.url, "(?>>>>>> } >>>>>>> >>>>>>> sub vcl_synth { >>>>>>> if (resp.status == 750) { >>>>>>> set resp.status = 200; >>>>>>> synthetic("PURGED => " + req.url); >>>>>>> return(deliver); >>>>>>> } elsif (resp.status == 501) { >>>>>>> set resp.status = 200; >>>>>>> set resp.http.Content-Type = "text/html; charset=utf-8"; >>>>>>> synthetic(std.fileread("/etc/varnish/pages/invalid_domain.ht >>>>>>> ml")); >>>>>>> return(deliver); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> sub vcl_backend_response { >>>>>>> unset beresp.http.Set-Cookie; >>>>>>> set beresp.http.x-host = bereq.http.host; >>>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>>> >>>>>>> if (bereq.url == "/themes/basic/assets/theme.min.css" >>>>>>> || bereq.url == "/api/events/PAGEVIEW" >>>>>>> || bereq.url ~ "^\/assets\/img\/") { >>>>>>> set beresp.http.Cache-Control = "max-age=0"; >>>>>>> } else { >>>>>>> unset beresp.http.Cache-Control; >>>>>>> } >>>>>>> >>>>>>> if (beresp.status == 200 || >>>>>>> beresp.status == 301 || >>>>>>> beresp.status == 302 || >>>>>>> beresp.status == 404) { >>>>>>> if (bereq.url ~ "\&ordenar=aleatorio$") { >>>>>>> set beresp.http.X-TTL = "1d"; >>>>>>> set beresp.ttl = 1d; >>>>>>> } else { >>>>>>> set beresp.http.X-TTL = "1w"; >>>>>>> set beresp.ttl = 1w; >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> if (bereq.url !~ 
"\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") >>>>>>> { >>>>>>> set beresp.do_gzip = true; >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> sub vcl_pipe { >>>>>>> set bereq.http.connection = "close"; >>>>>>> return (pipe); >>>>>>> } >>>>>>> >>>>>>> sub vcl_deliver { >>>>>>> unset resp.http.x-host; >>>>>>> unset resp.http.x-user-agent; >>>>>>> } >>>>>>> >>>>>>> sub vcl_backend_error { >>>>>>> if (beresp.status == 502 || beresp.status == 503 || beresp.status >>>>>>> == 504) { >>>>>>> set beresp.status = 200; >>>>>>> set beresp.http.Content-Type = "text/html; charset=utf-8"; >>>>>>> synthetic(std.fileread("/etc/varnish/pages/maintenance.html")); >>>>>>> return (deliver); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> sub vcl_hash { >>>>>>> if (req.http.User-Agent ~ "Google Page Speed") { >>>>>>> hash_data("Google Page Speed"); >>>>>>> } elsif (req.http.User-Agent ~ "Googlebot") { >>>>>>> hash_data("Googlebot"); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> sub vcl_deliver { >>>>>>> if (resp.status == 501) { >>>>>>> return (synth(resp.status)); >>>>>>> } >>>>>>> if (obj.hits > 0) { >>>>>>> set resp.http.X-Cache = "hit"; >>>>>>> } else { >>>>>>> set resp.http.X-Cache = "miss"; >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> >>>>>>> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < >>>>>>> guillaume at varnish-software.com> wrote: >>>>>>> >>>>>>>> Nice! It may have been the cause, time will tell.can you report >>>>>>>> back in a few days to let us know? >>>>>>>> -- >>>>>>>> Guillaume Quintard >>>>>>>> >>>>>>>> On Jun 26, 2017 20:21, "Stefano Baldo" >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Guillaume. >>>>>>>>> >>>>>>>>> I think things will start to going better now after changing the >>>>>>>>> bans. >>>>>>>>> This is how my last varnishstat looked like moments before a crash >>>>>>>>> regarding the bans: >>>>>>>>> >>>>>>>>> MAIN.bans 41336 . Count of bans >>>>>>>>> MAIN.bans_completed 37967 . Number of bans >>>>>>>>> marked 'completed' >>>>>>>>> MAIN.bans_obj 0 . 
Number of bans >>>>>>>>> using obj.* >>>>>>>>> MAIN.bans_req 41335 . Number of bans >>>>>>>>> using req.* >>>>>>>>> MAIN.bans_added 41336 0.68 Bans added >>>>>>>>> MAIN.bans_deleted 0 0.00 Bans deleted >>>>>>>>> >>>>>>>>> And this is how it looks like now: >>>>>>>>> >>>>>>>>> MAIN.bans 2 . Count of bans >>>>>>>>> MAIN.bans_completed 1 . Number of bans >>>>>>>>> marked 'completed' >>>>>>>>> MAIN.bans_obj 2 . Number of bans >>>>>>>>> using obj.* >>>>>>>>> MAIN.bans_req 0 . Number of bans >>>>>>>>> using req.* >>>>>>>>> MAIN.bans_added 2016 0.69 Bans added >>>>>>>>> MAIN.bans_deleted 2014 0.69 Bans deleted >>>>>>>>> >>>>>>>>> Before the changes, bans were never deleted! >>>>>>>>> Now the bans are added and quickly deleted after a minute or even >>>>>>>>> a couple of seconds. >>>>>>>>> >>>>>>>>> May this was the cause of the problem? It seems like varnish was >>>>>>>>> having a large number of bans to manage and test against. >>>>>>>>> I will let it ride now. Let's see if the problem persists or it's >>>>>>>>> gone! :-) >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Stefano >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>> >>>>>>>>>> Looking good! >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Guillaume Quintard >>>>>>>>>> >>>>>>>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo < >>>>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Guillaume, >>>>>>>>>>> >>>>>>>>>>> Can the following be considered "ban lurker friendly"? 
>>>>>>>>>>> >>>>>>>>>>> sub vcl_backend_response { >>>>>>>>>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>>>>>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> sub vcl_recv { >>>>>>>>>>> if (req.method == "PURGE") { >>>>>>>>>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>>>>>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>>>>>>> return(synth(750)); >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> sub vcl_deliver { >>>>>>>>>>> unset resp.http.x-url; >>>>>>>>>>> unset resp.http.x-user-agent; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Stefano >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>>>>>>>>> expression. Easiest way is to stash the host, user-agent and url in >>>>>>>>>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>>>>>>>>> >>>>>>>>>>>> I don't think you need to expand the VSL at all. >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Guillaume Quintard >>>>>>>>>>>> >>>>>>>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi Guillaume. >>>>>>>>>>>> >>>>>>>>>>>> Thanks for answering. >>>>>>>>>>>> >>>>>>>>>>>> I'm using a SSD disk. I've changed from ext4 to ext2 to >>>>>>>>>>>> increase performance but it stills restarting. >>>>>>>>>>>> Also, I checked the I/O performance for the disk and there is >>>>>>>>>>>> no signal of overhead. >>>>>>>>>>>> >>>>>>>>>>>> I've changed the /var/lib/varnish to a tmpfs and increased its >>>>>>>>>>>> 80m default size passing "-l 200m,20m" to varnishd and using >>>>>>>>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. >>>>>>>>>>>> There was a problem here. 
After a couple of hours varnish died and I >>>>>>>>>>>> received a "no space left on device" message - deleting >>>>>>>>>>>> /var/lib/varnish solved the problem and varnish was up again, but it's >>>>>>>>>>>> weird because there was free memory on the host to be used with the tmpfs >>>>>>>>>>>> directory, so I don't know what could have happened. I will try to stop >>>>>>>>>>>> increasing the /var/lib/varnish size. >>>>>>>>>>>> Anyway, I am worried about the bans. You asked me if the bans >>>>>>>>>>>> are lurker friendly. Well, I don't think so. My bans are created this way: >>>>>>>>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + >>>>>>>>>>>> req.url + " && req.http.User-Agent !~ Googlebot"); >>>>>>>>>>>> Are they lurker friendly? I was taking a quick look at the >>>>>>>>>>>> documentation and it looks like they're not. >>>>>>>>>>>> Best, >>>>>>>>>>>> Stefano >>>>>>>>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>>>>>> Hi Stefano, >>>>>>>>>>>>> Let's cover the usual suspects: I/Os. I think here Varnish >>>>>>>>>>>>> gets stuck trying to push/pull data and can't make time to reply to the >>>>>>>>>>>>> CLI. I'd recommend monitoring the disk activity (bandwidth and iops) to >>>>>>>>>>>>> confirm. >>>>>>>>>>>>> After some time, the file storage is terrible on a hard drive >>>>>>>>>>>>> (SSDs take a bit more time to degrade) because of fragmentation. One >>>>>>>>>>>>> solution to help the disks cope is to overprovision them if they're SSDs, >>>>>>>>>>>>> and you can try different advice settings in the file storage definition in the >>>>>>>>>>>>> command line (last parameter, after granularity). >>>>>>>>>>>>> Is your /var/lib/varnish mount on tmpfs? That could help too. >>>>>>>>>>>>> 40K bans is a lot, are they ban-lurker friendly? 
>>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Guillaume Quintard >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>>>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hello. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am having a critical problem with Varnish Cache in >>>>>>>>>>>>>> production for over a month and any help will be appreciated. >>>>>>>>>>>>>> The problem is that Varnish child process is recurrently >>>>>>>>>>>>>> being restarted after 10~20h of use, with the following message: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) >>>>>>>>>>>>>> not responding to CLI, killed it. >>>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected >>>>>>>>>>>>>> reply from ping: 400 CLI communication error >>>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) >>>>>>>>>>>>>> died signal=9 >>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup >>>>>>>>>>>>>> complete >>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>>>> Started >>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>>>> said Child starts >>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>>>> said SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>>>>>>>>> >>>>>>>>>>>>>> The following link is the varnishstat output just 1 minute >>>>>>>>>>>>>> before a restart: >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://pastebin.com/g0g5RVTs >>>>>>>>>>>>>> >>>>>>>>>>>>>> Environment: >>>>>>>>>>>>>> >>>>>>>>>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>>>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>>>>>>>>> Installed using pre-built package from official repo at >>>>>>>>>>>>>> packagecloud.io >>>>>>>>>>>>>> CPU 2x2.9 GHz >>>>>>>>>>>>>> Mem 3.69 GiB >>>>>>>>>>>>>> Running inside a Docker container >>>>>>>>>>>>>> NFILES=131072 >>>>>>>>>>>>>> MEMLOCK=82000 >>>>>>>>>>>>>> >>>>>>>>>>>>>> 
Additional info: >>>>>>>>>>>>>> >>>>>>>>>>>>>> - I need to cache a large number of objets and the cache >>>>>>>>>>>>>> should last for almost a week, so I have set up a 450G storage space, I >>>>>>>>>>>>>> don't know if this is a problem; >>>>>>>>>>>>>> - I use ban a lot. There was about 40k bans in the system >>>>>>>>>>>>>> just before the last crash. I really don't know if this is too much or may >>>>>>>>>>>>>> have anything to do with it; >>>>>>>>>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>>>>>>>>> - No panic is reported, the only info I can retrieve is from >>>>>>>>>>>>>> syslog; >>>>>>>>>>>>>> - During all the time, event moments before the crashes, >>>>>>>>>>>>>> everything is okay and requests are being responded very fast. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Stefano Baldo >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> varnish-misc mailing list >>>>>>>>>>>>>> varnish-misc at varnish-cache.org >>>>>>>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish >>>>>>>>>>>>>> -misc >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > > _______________________________________________ > varnish-misc mailing list > varnish-misc at varnish-cache.org > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanobaldo at gmail.com Thu Jun 29 17:09:32 2017 From: stefanobaldo at gmail.com (Stefano Baldo) Date: Thu, 29 Jun 2017 14:09:32 -0300 Subject: Child process recurrently being restarted In-Reply-To: References: Message-ID: Hi Guillaume and Reza. This time varnish restarted but it left some more info on syslog. It seems like the system is running out of memory. 
Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.297487] pool_herder invoked oom-killer: gfp_mask=0x2000d0, order=2, oom_score_adj=0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.300992] pool_herder cpuset=/ mems_allowed=0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.303157] CPU: 1 PID: 16214 Comm: pool_herder Tainted: G C O 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] Hardware name: Xen HVM domU, BIOS 4.2.amazon 02/16/2017 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] 0000000000000000 ffffffff815123b5 ffff8800eb3652f0 0000000000000000 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] ffffffff8150ff8d 0000000000000000 ffffffff810d6e3f 0000000000000000 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] ffffffff81516d2e 0000000000000200 ffffffff810689d3 ffffffff810c43e4 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] Call Trace: Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? dump_stack+0x5d/0x78 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? dump_header+0x76/0x1e8 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? smp_call_function_single+0x5f/0xa0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? mutex_lock+0xe/0x2a Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? put_online_cpus+0x23/0x80 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? rcu_oom_notify+0xc4/0xe0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? do_try_to_free_pages+0x4ac/0x520 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? oom_kill_process+0x21d/0x370 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? find_lock_task_mm+0x3d/0x90 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? out_of_memory+0x473/0x4b0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? __alloc_pages_nodemask+0x9ef/0xb50 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? copy_process.part.25+0x116/0x1c50 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? 
__do_page_fault+0x1d1/0x4f0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? do_fork+0xe0/0x3d0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? stub_clone+0x69/0x90 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] [] ? system_call_fast_compare_end+0x10/0x15 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.367638] Mem-Info: Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.368962] Node 0 DMA per-cpu: Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.370768] CPU 0: hi: 0, btch: 1 usd: 0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.373249] CPU 1: hi: 0, btch: 1 usd: 0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.375652] Node 0 DMA32 per-cpu: Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.377508] CPU 0: hi: 186, btch: 31 usd: 29 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.379898] CPU 1: hi: 186, btch: 31 usd: 0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318] active_anon:846474 inactive_anon:1913 isolated_anon:0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318] active_file:408 inactive_file:415 isolated_file:32 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318] unevictable:20736 dirty:27 writeback:0 unstable:0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318] free:16797 slab_reclaimable:15276 slab_unreclaimable:10521 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318] mapped:22002 shmem:22935 pagetables:30362 bounce:0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318] free_cma:0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.397242] Node 0 DMA free:15192kB min:184kB low:228kB high:276kB active_anon:416kB inactive_anon:60kB active_file:0kB inactive_file:0kB unevictable:20kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:20kB dirty:0kB writeback:0kB mapped:20kB shmem:80kB slab_reclaimable:32kB slab_unreclaimable:0kB kernel_stack:112kB pagetables:20kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.416338] lowmem_reserve[]: 0 3757 3757 3757 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.419030] Node 0 DMA32 free:50120kB min:44868kB low:56084kB high:67300kB active_anon:3386780kB inactive_anon:7592kB active_file:1732kB inactive_file:2060kB unevictable:82924kB isolated(anon):0kB isolated(file):128kB present:3915776kB managed:3849676kB mlocked:82924kB dirty:108kB writeback:0kB mapped:88432kB shmem:91660kB slab_reclaimable:61072kB slab_unreclaimable:42184kB kernel_stack:27248kB pagetables:121428kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.440095] lowmem_reserve[]: 0 0 0 0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.442202] Node 0 DMA: 22*4kB (UEM) 6*8kB (EM) 1*16kB (E) 2*32kB (UM) 2*64kB (UE) 2*128kB (EM) 3*256kB (UEM) 1*512kB (E) 3*1024kB (UEM) 3*2048kB (EMR) 1*4096kB (M) = 15192kB Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.451936] Node 0 DMA32: 4031*4kB (EM) 2729*8kB (EM) 324*16kB (EM) 1*32kB (R) 1*64kB (R) 0*128kB 0*256kB 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 0*4096kB = 46820kB Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.460240] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.464122] 24240 total pagecache pages Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.466048] 0 pages in swap cache Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.467672] Swap cache stats: add 0, delete 0, find 0/0 Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.470159] Free swap = 0kB Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.471513] Total swap = 0kB Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.472980] 982941 pages RAM Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.474380] 0 pages HighMem/MovableOnly Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.476190] 16525 pages reserved Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.477772] 0 pages hwpoisoned Jun 29 13:11:01 ip-172-25-2-8 
kernel: [93823.479189] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.482698] [ 163] 0 163 10419 1295 21 0 0 systemd-journal Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.486646] [ 165] 0 165 10202 136 21 0 -1000 systemd-udevd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.490598] [ 294] 0 294 6351 1729 14 0 0 dhclient Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.494457] [ 319] 0 319 6869 62 18 0 0 cron Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.498260] [ 321] 0 321 4964 67 14 0 0 systemd-logind Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.502346] [ 326] 105 326 10558 101 25 0 -900 dbus-daemon Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.506315] [ 342] 0 342 65721 228 31 0 0 rsyslogd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.510222] [ 343] 0 343 88199 2108 61 0 -500 dockerd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.514022] [ 350] 106 350 18280 181 36 0 0 zabbix_agentd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.518040] [ 351] 106 351 18280 475 36 0 0 zabbix_agentd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.522041] [ 352] 106 352 18280 187 36 0 0 zabbix_agentd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.526025] [ 353] 106 353 18280 187 36 0 0 zabbix_agentd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.530067] [ 354] 106 354 18280 187 36 0 0 zabbix_agentd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.534033] [ 355] 106 355 18280 190 36 0 0 zabbix_agentd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.538001] [ 358] 0 358 66390 1826 32 0 0 fail2ban-server Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.541972] [ 400] 0 400 35984 444 24 0 -500 docker-containe Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.545879] [ 568] 0 568 13796 168 30 0 -1000 sshd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.549733] [ 576] 0 576 3604 41 12 0 0 agetty Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.553569] [ 577] 0 577 3559 38 12 0 0 agetty Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.557322] [16201] 0 16201 29695 20707 60 
0 0 varnishd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.561103] [16209] 108 16209 118909802 822425 29398 0 0 cache-main Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.565002] [27352] 0 27352 20131 214 42 0 0 sshd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.568682] [27354] 1000 27354 20165 211 41 0 0 sshd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.572307] [27355] 1000 27355 5487 146 17 0 0 bash Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.575920] [27360] 0 27360 11211 107 26 0 0 sudo Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.579593] [27361] 0 27361 11584 97 27 0 0 su Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.583155] [27362] 0 27362 5481 142 15 0 0 bash Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.586782] [27749] 0 27749 20131 214 41 0 0 sshd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.590428] [27751] 1000 27751 20164 211 39 0 0 sshd Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.593979] [27752] 1000 27752 5487 147 15 0 0 bash Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.597488] [28762] 0 28762 26528 132 17 0 0 varnishstat Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.601239] [28764] 0 28764 11211 106 26 0 0 sudo Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.604737] [28765] 0 28765 11584 97 26 0 0 su Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.608602] [28766] 0 28766 5481 141 15 0 0 bash Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.612288] [28768] 0 28768 26528 220 18 0 0 varnishstat Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.616189] Out of memory: Kill process 16209 (cache-main) score 880 or sacrifice child Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.620106] Killed process 16209 (cache-main) total-vm:475639208kB, anon-rss:3289700kB, file-rss:0kB Jun 29 13:11:01 ip-172-25-2-8 varnishd[16201]: Child (16209) died signal=9 Jun 29 13:11:01 ip-172-25-2-8 varnishd[16201]: Child cleanup complete Jun 29 13:11:01 ip-172-25-2-8 varnishd[16201]: Child (30313) Started Jun 29 13:11:01 ip-172-25-2-8 varnishd[16201]: Child (30313) said Child starts Jun 29 13:11:01 
ip-172-25-2-8 varnishd[16201]: Child (30313) said SMF.s0 mmap'ed 483183820800 bytes of 483183820800 Best, Stefano On Wed, Jun 28, 2017 at 11:33 AM, Reza Naghibi wrote: > Assuming the problem is running out of memory, you will need to do some > memory tuning, especially given the number of threads you are using and > your access patterns. Your options: > > - Add more memory to the system > - Reduce thread_pool_max > - Reduce jemalloc's thread cache (MALLOC_CONF="lg_tcache_max:10") > - Use some of the tuning params in here: https://info.varnish- > software.com/blog/understanding-varnish-cache-memory-usage > > > > -- > Reza Naghibi > Varnish Software > > On Wed, Jun 28, 2017 at 9:26 AM, Guillaume Quintard < > guillaume at varnish-software.com> wrote: > >> Hi, >> >> can you look at "varnishstat -1 | grep g_bytes" and see if it matches >> the memory you are seeing? >> >> -- >> Guillaume Quintard >> >> On Wed, Jun 28, 2017 at 3:20 PM, Stefano Baldo >> wrote: >> >>> Hi Guillaume. >>> >>> I increased the cli_timeout yesterday to 900sec (15min) and it restarted >>> anyway, which seems to indicate that the thread is really stalled. >>> >>> This was 1 minute after the last restart: >>> >>> MAIN.n_object 3908216 . object structs made >>> SMF.s0.g_alloc 7794510 . Allocations outstanding >>> >>> I've just changed the I/O Scheduler to noop to see what happens. >>> >>> One interesting thing I've found is about the memory usage. >>> >>> In the 1st minute of use: >>> MemTotal: 3865572 kB >>> MemFree: 120768 kB >>> MemAvailable: 2300268 kB >>> >>> 1 minute before a restart: >>> MemTotal: 3865572 kB >>> MemFree: 82480 kB >>> MemAvailable: 68316 kB >>> >>> It seems like the system is possibly running out of memory. >>> >>> When calling varnishd, I'm specifying only "-s file,..." as storage. I >>> see in some examples that it is common to use "-s file" AND "-s malloc" >>> together. Should I be passing "-s malloc" as well to somehow try to limit >>> the memory usage by varnishd?
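A back-of-the-envelope check against the counters quoted above: even with file storage, each cached object keeps its bookkeeping structures (objcore, objhead, hash entry) on the heap. Assuming roughly 1 KiB of overhead per object, which is an approximation in the spirit of the Varnish Software memory-usage article linked above (the real figure varies by version and workload), the ~3.9 million objects alone would outgrow this box's RAM:

```python
# Rough per-object RAM accounting for the numbers quoted in this thread.
n_object = 3_908_216          # MAIN.n_object one minute after a restart
overhead_per_obj_kib = 1      # assumed average metadata cost per object (approximation)
mem_total_kib = 3_865_572     # MemTotal from /proc/meminfo

est_overhead_gib = n_object * overhead_per_obj_kib / 1024 / 1024
mem_total_gib = mem_total_kib / 1024 / 1024

# Estimated bookkeeping alone (~3.73 GiB) already exceeds total RAM (~3.69 GiB).
print(round(est_overhead_gib, 2), round(mem_total_gib, 2))
```

If this estimate is even roughly right, the object bodies living in the 450G mmap are not the problem; the sheer object count is, which is consistent with the oom-killer taking out cache-main.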
>>> >>> Best, >>> Stefano >>> >>> >>> On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard < >>> guillaume at varnish-software.com> wrote: >>> >>>> Sadly, nothing suspicious here, you can still try: >>>> - bumping the cli_timeout >>>> - changing your disk scheduler >>>> - changing the advice option of the file storage >>>> >>>> I'm still convinced this is due to Varnish getting stuck waiting for >>>> the disk because of the file storage fragmentation. >>>> >>>> Maybe you could look at SMF.*.g_alloc and compare it to the number of >>>> objects. Ideally, we would have a 1:1 relation between objects and >>>> allocations. If that number drops prior to a restart, that would be a good >>>> clue. >>>> >>>> >>>> -- >>>> Guillaume Quintard >>>> >>>> On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo >>> > wrote: >>>> >>>>> Hi Guillaume. >>>>> >>>>> It keeps restarting. >>>>> Would you mind taking a quick look in the following VCL file to check >>>>> if you find anything suspicious? >>>>> >>>>> Thank you very much. 
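Guillaume's 1:1 heuristic can be checked directly against the two counters Stefano quoted (MAIN.n_object and SMF.s0.g_alloc); a small sketch:

```python
# Fragmentation clue: ideally one storage allocation per cached object.
# Numbers are the ones reported earlier in this thread.
n_object = 3_908_216   # MAIN.n_object (object structs made)
g_alloc = 7_794_510    # SMF.s0.g_alloc (allocations outstanding)

ratio = g_alloc / n_object
print(round(ratio, 2))
```

A ratio of roughly 2 means each object averages about two outstanding extents in the file storage; whether that indicates harmful fragmentation depends on object sizes, but per the advice above it is the trend of this ratio before a restart that would be the telling clue.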
>>>>> >>>>> Best, >>>>> Stefano >>>>> >>>>> vcl 4.0; >>>>> >>>>> import std; >>>>> >>>>> backend default { >>>>> .host = "sites-web-server-lb"; >>>>> .port = "80"; >>>>> } >>>>> >>>>> include "/etc/varnish/bad_bot_detection.vcl"; >>>>> >>>>> sub vcl_recv { >>>>> call bad_bot_detection; >>>>> >>>>> if (req.url == "/nocache" || req.url == "/version") { >>>>> return(pass); >>>>> } >>>>> >>>>> unset req.http.Cookie; >>>>> if (req.method == "PURGE") { >>>>> ban("obj.http.x-host == " + req.http.host + " && >>>>> obj.http.x-user-agent !~ Googlebot"); >>>>> return(synth(750)); >>>>> } >>>>> >>>>> set req.url = regsuball(req.url, "(?>>>> } >>>>> >>>>> sub vcl_synth { >>>>> if (resp.status == 750) { >>>>> set resp.status = 200; >>>>> synthetic("PURGED => " + req.url); >>>>> return(deliver); >>>>> } elsif (resp.status == 501) { >>>>> set resp.status = 200; >>>>> set resp.http.Content-Type = "text/html; charset=utf-8"; >>>>> synthetic(std.fileread("/etc/varnish/pages/invalid_domain.html")); >>>>> return(deliver); >>>>> } >>>>> } >>>>> >>>>> sub vcl_backend_response { >>>>> unset beresp.http.Set-Cookie; >>>>> set beresp.http.x-host = bereq.http.host; >>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>> >>>>> if (bereq.url == "/themes/basic/assets/theme.min.css" >>>>> || bereq.url == "/api/events/PAGEVIEW" >>>>> || bereq.url ~ "^\/assets\/img\/") { >>>>> set beresp.http.Cache-Control = "max-age=0"; >>>>> } else { >>>>> unset beresp.http.Cache-Control; >>>>> } >>>>> >>>>> if (beresp.status == 200 || >>>>> beresp.status == 301 || >>>>> beresp.status == 302 || >>>>> beresp.status == 404) { >>>>> if (bereq.url ~ "\&ordenar=aleatorio$") { >>>>> set beresp.http.X-TTL = "1d"; >>>>> set beresp.ttl = 1d; >>>>> } else { >>>>> set beresp.http.X-TTL = "1w"; >>>>> set beresp.ttl = 1w; >>>>> } >>>>> } >>>>> >>>>> if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") >>>>> { >>>>> set beresp.do_gzip = true; >>>>> } >>>>> } >>>>> >>>>> sub vcl_pipe { >>>>> set 
bereq.http.connection = "close"; >>>>> return (pipe); >>>>> } >>>>> >>>>> sub vcl_deliver { >>>>> unset resp.http.x-host; >>>>> unset resp.http.x-user-agent; >>>>> } >>>>> >>>>> sub vcl_backend_error { >>>>> if (beresp.status == 502 || beresp.status == 503 || beresp.status == >>>>> 504) { >>>>> set beresp.status = 200; >>>>> set beresp.http.Content-Type = "text/html; charset=utf-8"; >>>>> synthetic(std.fileread("/etc/varnish/pages/maintenance.html")); >>>>> return (deliver); >>>>> } >>>>> } >>>>> >>>>> sub vcl_hash { >>>>> if (req.http.User-Agent ~ "Google Page Speed") { >>>>> hash_data("Google Page Speed"); >>>>> } elsif (req.http.User-Agent ~ "Googlebot") { >>>>> hash_data("Googlebot"); >>>>> } >>>>> } >>>>> >>>>> sub vcl_deliver { >>>>> if (resp.status == 501) { >>>>> return (synth(resp.status)); >>>>> } >>>>> if (obj.hits > 0) { >>>>> set resp.http.X-Cache = "hit"; >>>>> } else { >>>>> set resp.http.X-Cache = "miss"; >>>>> } >>>>> } >>>>> >>>>> >>>>> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard < >>>>> guillaume at varnish-software.com> wrote: >>>>> >>>>>> Nice! It may have been the cause, time will tell.can you report back >>>>>> in a few days to let us know? >>>>>> -- >>>>>> Guillaume Quintard >>>>>> >>>>>> On Jun 26, 2017 20:21, "Stefano Baldo" >>>>>> wrote: >>>>>> >>>>>>> Hi Guillaume. >>>>>>> >>>>>>> I think things will start to going better now after changing the >>>>>>> bans. >>>>>>> This is how my last varnishstat looked like moments before a crash >>>>>>> regarding the bans: >>>>>>> >>>>>>> MAIN.bans 41336 . Count of bans >>>>>>> MAIN.bans_completed 37967 . Number of bans >>>>>>> marked 'completed' >>>>>>> MAIN.bans_obj 0 . Number of bans >>>>>>> using obj.* >>>>>>> MAIN.bans_req 41335 . Number of bans >>>>>>> using req.* >>>>>>> MAIN.bans_added 41336 0.68 Bans added >>>>>>> MAIN.bans_deleted 0 0.00 Bans deleted >>>>>>> >>>>>>> And this is how it looks like now: >>>>>>> >>>>>>> MAIN.bans 2 . Count of bans >>>>>>> MAIN.bans_completed 1 . 
Number of bans >>>>>>> marked 'completed' >>>>>>> MAIN.bans_obj 2 . Number of bans >>>>>>> using obj.* >>>>>>> MAIN.bans_req 0 . Number of bans >>>>>>> using req.* >>>>>>> MAIN.bans_added 2016 0.69 Bans added >>>>>>> MAIN.bans_deleted 2014 0.69 Bans deleted >>>>>>> >>>>>>> Before the changes, bans were never deleted! >>>>>>> Now the bans are added and quickly deleted after a minute or even a >>>>>>> couple of seconds. >>>>>>> >>>>>>> Could this have been the cause of the problem? It seems like varnish >>>>>>> had a large number of bans to manage and test against. >>>>>>> I will let it ride now. Let's see if the problem persists or it's >>>>>>> gone! :-) >>>>>>> >>>>>>> Best, >>>>>>> Stefano >>>>>>> >>>>>>> >>>>>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard < >>>>>>> guillaume at varnish-software.com> wrote: >>>>>>> >>>>>>>> Looking good! >>>>>>>> >>>>>>>> -- >>>>>>>> Guillaume Quintard >>>>>>>> >>>>>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo < >>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Guillaume, >>>>>>>>> >>>>>>>>> Can the following be considered "ban lurker friendly"? >>>>>>>>> >>>>>>>>> sub vcl_backend_response { >>>>>>>>> set beresp.http.x-url = bereq.http.host + bereq.url; >>>>>>>>> set beresp.http.x-user-agent = bereq.http.user-agent; >>>>>>>>> } >>>>>>>>> >>>>>>>>> sub vcl_recv { >>>>>>>>> if (req.method == "PURGE") { >>>>>>>>> ban("obj.http.x-url == " + req.http.host + req.url + " && >>>>>>>>> obj.http.x-user-agent !~ Googlebot"); >>>>>>>>> return(synth(750)); >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> sub vcl_deliver { >>>>>>>>> unset resp.http.x-url; >>>>>>>>> unset resp.http.x-user-agent; >>>>>>>>> } >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Stefano >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard < >>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>> >>>>>>>>>> Not lurker friendly at all indeed. You'll need to avoid req.* >>>>>>>>>> expressions.
Easiest way is to stash the host, user-agent and url in >>>>>>>>>> beresp.http.* and ban against those (unset them in vcl_deliver). >>>>>>>>>> >>>>>>>>>> I don't think you need to expand the VSL at all. >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Guillaume Quintard >>>>>>>>>> >>>>>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi Guillaume. >>>>>>>>>> >>>>>>>>>> Thanks for answering. >>>>>>>>>> >>>>>>>>>> I'm using an SSD disk. I've changed from ext4 to ext2 to increase >>>>>>>>>> performance but it still keeps restarting. >>>>>>>>>> Also, I checked the I/O performance for the disk and there is no >>>>>>>>>> sign of overload. >>>>>>>>>> >>>>>>>>>> I've changed the /var/lib/varnish to a tmpfs and increased its >>>>>>>>>> 80m default size passing "-l 200m,20m" to varnishd and using >>>>>>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There >>>>>>>>>> was a problem here. After a couple of hours varnish died and I received a >>>>>>>>>> "no space left on device" message - deleting the /var/lib/varnish solved >>>>>>>>>> the problem and varnish was up again, but it's weird because there was free >>>>>>>>>> memory on the host to be used with the tmpfs directory, so I don't know >>>>>>>>>> what could have happened. I will try to stop increasing the >>>>>>>>>> /var/lib/varnish size. >>>>>>>>>> >>>>>>>>>> Anyway, I am worried about the bans. You asked me if the bans are >>>>>>>>>> lurker friendly. Well, I don't think so. My bans are created this way: >>>>>>>>>> >>>>>>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + >>>>>>>>>> req.url + " && req.http.User-Agent !~ Googlebot"); >>>>>>>>>> >>>>>>>>>> Are they lurker friendly? I took a quick look at the >>>>>>>>>> documentation and it looks like they're not.
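The rule of thumb behind this exchange: the background ban lurker has no client request to evaluate against, so a ban that mentions any req.* field can only be tested lazily when an object is actually looked up, and such bans accumulate. A crude string check over the two ban shapes from this thread (a heuristic for illustration only, not a VCL parser; note it would also flag bereq.*, since "bereq." contains "req."):

```python
# A ban is lurker-friendly only if the lurker can test it without a
# request, i.e. it references obj.* fields exclusively.
def lurker_friendly(ban_expr: str) -> bool:
    return "req." not in ban_expr

# The original ban shape (req.*-based) and the reworked one (obj.*-based).
old_ban = 'req.http.host == x && req.url ~ y && req.http.User-Agent !~ Googlebot'
new_ban = 'obj.http.x-url == x && obj.http.x-user-agent !~ Googlebot'

print(lurker_friendly(old_ban), lurker_friendly(new_ban))
```

This matches the varnishstat evidence quoted later in the thread: with req.*-based bans, MAIN.bans_req grew and MAIN.bans_deleted stayed at zero; after switching to obj.http.* bans, bans are deleted within seconds.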
>>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Stefano >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard < >>>>>>>>>> guillaume at varnish-software.com> wrote: >>>>>>>>>>> Hi Stefano, >>>>>>>>>>> >>>>>>>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets >>>>>>>>>>> stuck trying to push/pull data and can't make time to reply to the CLI. I'd >>>>>>>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm. >>>>>>>>>>> >>>>>>>>>>> After some time, the file storage is terrible on a hard drive >>>>>>>>>>> (SSDs take a bit more time to degrade) because of fragmentation. One >>>>>>>>>>> solution to help the disks cope is to overprovision them if they're SSDs, >>>>>>>>>>> and you can try different advice values in the file storage definition in the >>>>>>>>>>> command line (last parameter, after granularity). >>>>>>>>>>> >>>>>>>>>>> Is your /var/lib/varnish mounted on tmpfs? That could help too. >>>>>>>>>>> >>>>>>>>>>> 40K bans is a lot, are they ban-lurker friendly? >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Guillaume Quintard >>>>>>>>>>> >>>>>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo < >>>>>>>>>>> stefanobaldo at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hello. >>>>>>>>>>>> >>>>>>>>>>>> I am having a critical problem with Varnish Cache in production >>>>>>>>>>>> for over a month and any help will be appreciated. >>>>>>>>>>>> The problem is that Varnish child process is recurrently being >>>>>>>>>>>> restarted after 10~20h of use, with the following message: >>>>>>>>>>>> >>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not >>>>>>>>>>>> responding to CLI, killed it.
>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply >>>>>>>>>>>> from ping: 400 CLI communication error >>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) >>>>>>>>>>>> died signal=9 >>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup >>>>>>>>>>>> complete >>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>> Started >>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>> said Child starts >>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) >>>>>>>>>>>> said SMF.s0 mmap'ed 483183820800 bytes of 483183820800 >>>>>>>>>>>> >>>>>>>>>>>> The following link is the varnishstat output just 1 minute >>>>>>>>>>>> before a restart: >>>>>>>>>>>> >>>>>>>>>>>> https://pastebin.com/g0g5RVTs >>>>>>>>>>>> >>>>>>>>>>>> Environment: >>>>>>>>>>>> >>>>>>>>>>>> varnish-5.1.2 revision 6ece695 >>>>>>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0) >>>>>>>>>>>> Installed using pre-built package from official repo at >>>>>>>>>>>> packagecloud.io >>>>>>>>>>>> CPU 2x2.9 GHz >>>>>>>>>>>> Mem 3.69 GiB >>>>>>>>>>>> Running inside a Docker container >>>>>>>>>>>> NFILES=131072 >>>>>>>>>>>> MEMLOCK=82000 >>>>>>>>>>>> >>>>>>>>>>>> Additional info: >>>>>>>>>>>> >>>>>>>>>>>> - I need to cache a large number of objets and the cache should >>>>>>>>>>>> last for almost a week, so I have set up a 450G storage space, I don't know >>>>>>>>>>>> if this is a problem; >>>>>>>>>>>> - I use ban a lot. There was about 40k bans in the system just >>>>>>>>>>>> before the last crash. I really don't know if this is too much or may have >>>>>>>>>>>> anything to do with it; >>>>>>>>>>>> - No registered CPU spikes (almost always by 30%); >>>>>>>>>>>> - No panic is reported, the only info I can retrieve is from >>>>>>>>>>>> syslog; >>>>>>>>>>>> - During all the time, event moments before the crashes, >>>>>>>>>>>> everything is okay and requests are being responded very fast. 
>>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Stefano Baldo >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> varnish-misc mailing list >>>>>>>>>>>> varnish-misc at varnish-cache.org >>>>>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish >>>>>>>>>>>> -misc >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>> >>> >> >> _______________________________________________ >> varnish-misc mailing list >> varnish-misc at varnish-cache.org >> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc >> > >