From phk at phk.freebsd.dk Tue Jan 8 14:32:03 2008
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Tue, 08 Jan 2008 14:32:03 +0000
Subject: development efforts on the Solaris side.
In-Reply-To: Your message of "Mon, 31 Dec 2007 16:26:16 EST." <762EF0EC-5C51-41BA-A38F-65DDCD4F9249@omniti.com>
Message-ID: <65784.1199802723@critter.freebsd.dk>

In message <762EF0EC-5C51-41BA-A38F-65DDCD4F9249 at omniti.com>, Theo Schlossnagle writes:
>Hi guys,
>
>I'd really like to be able to contribute some of the improvements
>we've made to varnish back. Is there a way I can get access to
>commit? I'd be happy to stay in my own branch. My current patch set
>is unwieldy and I'm very tempted to just start my own repos... That,
>of course, seems silly. I've fixed up (removed) some of the gccism in
>favor of more portability (#include over -include). I've fixed a few
>bugs, made the VCC line a bit smarter and more accepting of non-gcc
>compilers, I've added a portfs acceptor and built a storage_umem
>allocator facility that rides on Solaris' excellent libumem (highly
>scalable allocator) which we also ported to run on Linux and FreeBSD
>(and Mac OS X): https://labs.omniti.com/trac/portableumem
>
>Next steps?

Can you mail me a link to the patch ?

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From theo at omniti.com Tue Jan 8 16:38:56 2008
From: theo at omniti.com (Theo Schlossnagle)
Date: Tue, 8 Jan 2008 11:38:56 -0500
Subject: development efforts on the Solaris side.
In-Reply-To: <65784.1199802723@critter.freebsd.dk>
References: <65784.1199802723@critter.freebsd.dk>
Message-ID:

On Jan 8, 2008, at 9:32 AM, Poul-Henning Kamp wrote:
> In message <762EF0EC-5C51-41BA-A38F-65DDCD4F9249 at omniti.com>, Theo Schlossnagle writes:
>> Hi guys,
>>
>> I'd really like to be able to contribute some of the improvements
>> we've made to varnish back. Is there a way I can get access to
>> commit? I'd be happy to stay in my own branch. My current patch set
>> is unwieldy and I'm very tempted to just start my own repos... That,
>> of course, seems silly. I've fixed up (removed) some of the gccism
>> in favor of more portability (#include over -include). I've fixed a few
>> bugs, made the VCC line a bit smarter and more accepting of non-gcc
>> compilers, I've added a portfs acceptor and built a storage_umem
>> allocator facility that rides on Solaris' excellent libumem (highly
>> scalable allocator) which we also ported to run on Linux and FreeBSD
>> (and Mac OS X): https://labs.omniti.com/trac/portableumem
>>
>> Next steps?
>
> Can you mail me a link to the patch ?

http://lethargy.org/~jesus/misc/varnish-solaris-trunk-2328.diff

Enjoy!

--
Theo Schlossnagle
Principal/CEO
OmniTI Computer Consulting, Inc.
W: http://omniti.com
P: +1.443.325.1357 x201
F: +1.410.872.4911

From des at linpro.no Tue Jan 8 19:18:01 2008
From: des at linpro.no (Dag-Erling Smørgrav)
Date: Tue, 08 Jan 2008 20:18:01 +0100
Subject: Solaris support
In-Reply-To: <63AB202A-A1CE-486D-8231-7F61588F84D1@omniti.com> (Theo Schlossnagle's message of "Fri, 21 Dec 2007 00:07:23 -0500")
References: <63AB202A-A1CE-486D-8231-7F61588F84D1@omniti.com>
Message-ID:

Theo Schlossnagle writes:
> We've been running this for a while on Solaris. Works really well.

Only because you haven't noticed the bugs yet... for instance, session timeout is broken (commented out, actually) in your patch, so broken backends and / or clients will bog you down.

> What we need is the function in cache_acceptor.c:
> [...]
> I see this as a cleaner mechanism regardless as there is no reason for
> the generic cache_acceptor to care about int vca_pipes[2]; -- it's an
> implementation detail. How's that sound?

That sounds like a good idea. Even better if you can submit an isolated patch :)

> sendfile and sendfilev on solaris

Probably not a good idea unless sendfile() semantics are significantly better on Solaris than on FreeBSD and Linux.

> using fcntl() when flock() is unavailable

There are issues here as well; the semantics are subtly different from OS to OS. For instance, what happens if separate threads in the same process try to lock the same file? It's even less fun if you take into consideration systems that support both.

DES
--
Dag-Erling Smørgrav
Senior Software Developer
Linpro AS - www.linpro.no

From jesus at omniti.com Tue Jan 8 20:06:30 2008
From: jesus at omniti.com (Theo Schlossnagle)
Date: Tue, 8 Jan 2008 15:06:30 -0500
Subject: Solaris support
In-Reply-To:
References: <63AB202A-A1CE-486D-8231-7F61588F84D1@omniti.com>
Message-ID: <6DBBB3B9-BF32-46D8-B5FA-166BD4D05E25@omniti.com>

On Jan 8, 2008, at 2:18 PM, Dag-Erling Smørgrav wrote:
> Theo Schlossnagle writes:
>> We've been running this for a while on Solaris. Works really well.
>
> Only because you haven't noticed the bugs yet... for instance,
> session timeout is broken (commented out, actually) in your patch, so
> broken backends and / or clients will bog you down.

Sure. When I said "works well," I meant "as well as on Linux."

>
>> What we need is the function in cache_acceptor.c:
>> [...]
>> I see this as a cleaner mechanism regardless as there is no reason
>> for the generic cache_acceptor to care about int vca_pipes[2]; -- it's an
>> implementation detail. How's that sound?
>
> That sounds like a good idea. Even better if you can submit an
> isolated patch :)

Okay, I hadn't tackled that, I'll look at submitting a patch specific to that.
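
The idea discussed here (keep the wake-up mechanism out of the generic cache_acceptor and let each acceptor bring its own) could look roughly like the sketch below. This is not code from the patch or from the Varnish tree: struct acceptor, its wake member, pipe_priv, pipe_wake() and acceptor_wake() are all hypothetical names made up for illustration. A pipe-based acceptor wakes its thread by writing one byte; an event-port acceptor would plug in a different function.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical acceptor descriptor: the generic code only sees 'wake'. */
struct acceptor {
	const char	*name;
	void		(*wake)(void *priv);	/* per-acceptor wake-up method */
	void		*priv;			/* acceptor-private state */
};

/* Private state of a pipe-based acceptor; the pipe stays an implementation detail. */
struct pipe_priv {
	int	fds[2];		/* fds[0] is read by the acceptor thread */
};

/* Wake a pipe-based acceptor by writing a single NUL byte to its pipe. */
static void
pipe_wake(void *priv)
{
	struct pipe_priv *pp = priv;
	ssize_t n;

	n = write(pp->fds[1], "", 1);
	assert(n == 1);
}

/* Generic entry point: no knowledge of pipes, ports or kqueues needed. */
static void
acceptor_wake(const struct acceptor *a)
{
	a->wake(a->priv);
}
```

With this shape the generic code never touches vca_pipes[] directly; whether a function pointer in a struct is the preferred style is a design question for the maintainers.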
>
>> sendfile and sendfilev on solaris
>
> Probably not a good idea unless sendfile() semantics are significantly
> better on Solaris than on FreeBSD and Linux.

It's sendfile, it has all the advantages of sendfile. To support them, you have to conform to their APIs. I just added support so it could say "oh, look, I know how to use that sendfile..." and then actually use it (just as linux and freebsd now). And I think sendfilev on Solaris is pretty slick.

>
>> using fcntl() when flock() is unavailable
>
> There are issues here as well; the semantics are subtly different from
> OS to OS. For instance, what happens if separate threads in the same
> process try to lock the same file? It's even less fun if you take
> into consideration systems that support both.

As I see it you only supported flock(). If I don't have it, I use fcntl(). It will certainly not break any system that currently works. It doesn't use fcntl() when available; it uses fcntl() only when flock() isn't available. fcntl() on Linux acts really strange -- so it would be bad -- as you intimate above.

Best regards,

Theo

From des at linpro.no Tue Jan 8 20:45:48 2008
From: des at linpro.no (Dag-Erling Smørgrav)
Date: Tue, 08 Jan 2008 21:45:48 +0100
Subject: Solaris support
In-Reply-To: <6DBBB3B9-BF32-46D8-B5FA-166BD4D05E25@omniti.com> (Theo Schlossnagle's message of "Tue, 8 Jan 2008 15:06:30 -0500")
References: <63AB202A-A1CE-486D-8231-7F61588F84D1@omniti.com> <6DBBB3B9-BF32-46D8-B5FA-166BD4D05E25@omniti.com>
Message-ID:

Theo Schlossnagle writes:
> Dag-Erling Smørgrav writes:
> > Theo Schlossnagle writes:
> > > We've been running this for a while on Solaris. Works really well.
> > Only because you haven't noticed the bugs yet...
> > for instance, session timeout is broken (commented out, actually)
> > in your patch, so broken backends and / or clients will bog you down.
> Sure. When I said "works well," I meant "as well as on Linux."

Uh, no, Linux actually supports SO_{RCV,SND}TIMEO, so Varnish does *not* work as well on Solaris as on Linux, with or without your patch.

> > > sendfile and sendfilev on solaris
> > Probably not a good idea unless sendfile() semantics are significantly
> > better on Solaris than on FreeBSD and Linux.
> It's sendfile, it has all the advantages of sendfile. To support
> them, you have to conform to their APIs. I just added support so it
> could say "oh, look, I know how to use that sendfile..." and then
> actually use it (just as linux and freebsd now). And I think
> sendfilev on Solaris is pretty slick.

So you've missed the numerous threads on sendfile() bugs affecting Varnish, and the more recent threads on sendfile() in FreeBSD and Linux being broken by design so that Varnish cannot reliably use it, and Poul-Henning's commit disabling the sendfile() detection in configure.ac to stop the whining.

> > > using fcntl() when flock() is unavailable
> > There are issues here as well; the semantics are subtly different from
> > OS to OS. For instance, what happens if separate threads in the same
> > process try to lock the same file? It's even less fun if you take
> > into consideration systems that support both.
> As I see it you only supported flock().

You've got it exactly backwards - Varnish has used fcntl() locks exclusively for... what... five months now? Ever since I determined that in addition to being more portable, fcntl() tends to be the least broken on platforms that support both (though not on FreeBSD, where flock() is slightly better, but I didn't consider it "better enough" to warrant an #ifdef). I even credited you in the commit log.
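
For readers who have not used fcntl() locks, a minimal sketch of whole-file advisory locking follows. It is not the code from the Varnish tree; lock_whole_file() and unlock_whole_file() are hypothetical helper names. One relevant detail for the "separate threads in the same process" question: POSIX fcntl() locks are owned by the process, so a second lock request from the same process on the same file does not conflict with the first, even through another descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to take an exclusive advisory lock on the
 * whole file without blocking. Returns 0 on success, -1 on failure
 * with errno set by fcntl().
 */
static int
lock_whole_file(int fd)
{
	struct flock fl;

	assert(fd >= 0);
	memset(&fl, 0, sizeof fl);
	fl.l_type = F_WRLCK;		/* exclusive (write) lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;			/* length 0 means "through end of file" */
	return (fcntl(fd, F_SETLK, &fl));
}

/* Hypothetical counterpart: release this process's lock on the whole file. */
static int
unlock_whole_file(int fd)
{
	struct flock fl;

	assert(fd >= 0);
	memset(&fl, 0, sizeof fl);
	fl.l_type = F_UNLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;
	return (fcntl(fd, F_SETLK, &fl));
}
```

Because the lock belongs to the process rather than to the descriptor or thread, these helpers cannot be used on their own to serialize threads inside one process; a mutex would still be needed for that.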

DES

From jesus at omniti.com Tue Jan 8 21:27:16 2008
From: jesus at omniti.com (Theo Schlossnagle)
Date: Tue, 8 Jan 2008 16:27:16 -0500
Subject: Solaris support
In-Reply-To:
References: <63AB202A-A1CE-486D-8231-7F61588F84D1@omniti.com> <6DBBB3B9-BF32-46D8-B5FA-166BD4D05E25@omniti.com>
Message-ID:

On Jan 8, 2008, at 3:45 PM, Dag-Erling Smørgrav wrote:
> Theo Schlossnagle writes:
>> Dag-Erling Smørgrav writes:
>>> Theo Schlossnagle writes:
>>>> We've been running this for a while on Solaris. Works really well.
>>> Only because you haven't noticed the bugs yet... for instance,
>>> session timeout is broken (commented out, actually) in your patch,
>>> so broken backends and / or clients will bog you down.
>> Sure. When I said "works well," I meant "as well as on Linux."
>
> Uh, no, Linux actually supports SO_{RCV,SND}TIMEO, so Varnish does
> *not* work as well on Solaris as on Linux, with or without your patch.

And Solaris supports portfs which is better than epoll. It's not really a competition.

I'm kinda lost as to how this turned into an argument. I have had a good experience on Solaris so far and I don't know what is gained by rebutting my comments assuming I don't realize there might be bugs. The patch is against /trunk/ and as such I would assume many bugs, half baked features, prototype code, etc. I'd assume bugs both in my patch and in the /trunk/ to which it is applied. Varnish as a whole only seems to work because of the bugs we haven't noticed, right? I just look forward to having solid support for the OS features that are applicable.

I'd note that the performance from the umem stevedore implementation is pretty nice. And that works on FreeBSD and Linux now that libumem is ported there. Obviously, every implementation has its advantages and disadvantages, but umem stuff is an excellent alternative for the malloc based stevedore under similar usage.

>
>>>> sendfile and sendfilev on solaris
>>> Probably not a good idea unless sendfile() semantics are
>>> significantly better on Solaris than on FreeBSD and Linux.
>> It's sendfile, it has all the advantages of sendfile. To support
>> them, you have to conform to their APIs. I just added support so it
>> could say "oh, look, I know how to use that sendfile..." and then
>> actually use it (just as linux and freebsd now). And I think
>> sendfilev on Solaris is pretty slick.
>
> So you've missed the numerous threads on sendfile() bugs affecting
> Varnish, and the more recent threads on sendfile() in FreeBSD and
> Linux being broken by design so that Varnish cannot reliably use it,
> and Poul-Henning's commit disabling the sendfile() detection in
> configure.ac to stop the whining.

Well, so far so good. There are some bugs in Solaris' sendfile as well, of course. I haven't been able to tickle them in my testing. I didn't miss the discussion, but I did miss that commit.

>
>>>> using fcntl() when flock() is unavailable
>>> There are issues here as well; the semantics are subtly different
>>> from OS to OS. For instance, what happens if separate threads in the
>>> same process try to lock the same file? It's even less fun if you take
>>> into consideration systems that support both.
>> As I see it you only supported flock().
>
> You've got it exactly backwards - Varnish has used fcntl() locks
> exclusively for... what... five months now? ever since I determined
> that in addition to being more portable, fcntl() tends to be the least
> broken on platforms that support both (though not on FreeBSD, where
> flock() is slightly better, but I didn't consider it "better enough"
> to warrant an #ifdef). I even credited you in the commit log.

It looks like when updating to trunk that part was in conflict. I had removed my code as yours did the trick. So, that patch was reverted in my set a while ago and wasn't in the one I linked to.

--
Theo Schlossnagle
Esoteric Curio -- http://lethargy.org/
OmniTI Computer Consulting, Inc. -- http://omniti.com/

From des at linpro.no Tue Jan 8 23:53:19 2008
From: des at linpro.no (Dag-Erling Smørgrav)
Date: Wed, 09 Jan 2008 00:53:19 +0100
Subject: Solaris support
In-Reply-To: (Theo Schlossnagle's message of "Tue, 8 Jan 2008 16:27:16 -0500")
References: <63AB202A-A1CE-486D-8231-7F61588F84D1@omniti.com> <6DBBB3B9-BF32-46D8-B5FA-166BD4D05E25@omniti.com>
Message-ID:

Theo Schlossnagle writes:
> Dag-Erling Smørgrav writes:
> > Theo Schlossnagle writes:
> > > Dag-Erling Smørgrav writes:
> > > > [...] session timeout is broken (commented out, actually) in
> > > > your patch, so broken backends and / or clients will bog you
> > > > down.
> > > Sure. When I said "works well," I meant "as well as on Linux."
> > Uh, no, Linux actually supports SO_{RCV,SND}TIMEO, so Varnish does
> > *not* work as well on Solaris as on Linux, with or without your
> > patch.
> And Solaris supports portfs which is better than epoll.

This is not a valid parallel: event ports and epoll are two ways of implementing the same functionality, but your patch completely disables the idle session timeout instead of replacing it with something that works in Solaris.

Implementing session timeouts portably (without SO_{RCV,SND}TIMEO) would not be impossible, but it would certainly be hard. Removing sendfile support entirely might help, by simplifying the code.

> I'd note that the performance from the umem stevedore implementation
> is pretty nice.

What umem stevedore implementation? I've never seen this mentioned anywhere on or off the lists.

> And that works on FreeBSD and Linux now that libumem is ported
> there. Obviously, every implementation has its advantages and
> disadvantages, but umem stuff is an excellent alternative for the
> malloc based stevedore under similar usage.

The malloc based stevedore is not intended to be used at all, except for debugging - and it hasn't been used for even that in ages, so it's really pretty much just a proof of concept, or a sample implementation if you will.

DES

From phk at phk.freebsd.dk Wed Jan 9 09:40:12 2008
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Wed, 09 Jan 2008 09:40:12 +0000
Subject: development efforts on the Solaris side.
In-Reply-To: Your message of "Tue, 08 Jan 2008 11:38:56 EST."
Message-ID: <82666.1199871612@critter.freebsd.dk>

In message , Theo Schlossnagle writes:
>> Can you mail me a link to the patch ?
>
>http://lethargy.org/~jesus/misc/varnish-solaris-trunk-2328.diff

Ok, various comments:

The autoconf stuff and the #inclusion of "varnish_config.h" all over the place you will have to negotiate with DES, I only do C code.

lib/libvarnish/time.c:
    I don't like #ifdefs like that in the middle of functions.
    The appropriate way is to add a compat function in libvarnishcompat.

flock/fcntl-locks you're already talking with DES about.

curses:
    What exactly is the semantics of KEY_RESIZE ?

mgt_vcc.c:
    The suffix shouldn't matter to dlopen(), please explain ?
    White-space and {} spam.
    Adding the c-compiler command to the error message is bound to make it overflow the linewidth.

VCL_WaitForActive()
    Need to look at that in more detail. I deliberately tried to limit varnish to only use mutexes for locking, so introducing semaphores just for this seems somewhat silly.

storage_umem.c
    I'm not sure I see much point in this. The main advantage of umem is SMP localized storage management, and the Varnish objects are exactly not local to any one CPU.
    Benchmarks could possibly convince me otherwise.

#include <err.h>
    This may be a portability issue where the easiest way to fix is to avoid it.

sendfile():
    Does the solaris sendfile guarantee that storage is no longer touched when it returns ?
    Otherwise it's as little use as the FreeBSD and Linux versions.

 /* pick a stevedore and bump the head along */
 /* XXX: only safe as long as pointer writes are atomic */
+/* jesus: dear god, are you crazy? */
 stv = stevedores = stevedores->next;

    God to Jesus: No I'm not.

-/* Note: intentionally not IOV_MAX */
+#ifdef IOV_MAX
+#define MAX_IOVS IOV_MAX
+#else
 #define MAX_IOVS (HTTP_HDR_MAX * 2)
+#endif

    Which bit of "intentionally" didn't you understand ?
    Have you examined the value of IOV_MAX, looked at where this is used and measured the impact ?

cache_acceptor.c
    Under no circumstances should #ifdef HAVE_PORT_CREATE be necessary here. If a new method is necessary for the acceptors, so be it.

From theo at omniti.com Wed Jan 9 14:27:13 2008
From: theo at omniti.com (Theo Schlossnagle)
Date: Wed, 9 Jan 2008 09:27:13 -0500
Subject: development efforts on the Solaris side.
In-Reply-To: <82666.1199871612@critter.freebsd.dk>
References: <82666.1199871612@critter.freebsd.dk>
Message-ID: <03B017F1-86E3-45AD-9387-10CD3B80AC32@omniti.com>

On Jan 9, 2008, at 4:40 AM, Poul-Henning Kamp wrote:
> Ok, various comments:
>
> The autoconf stuff and the #inclusion of "varnish_config.h" all over
> the place you will have to negotiate with DES, I only do C code.

First off, let me say thanks for a detailed, clear and critical response.

>
> lib/libvarnish/time.c:
>
> I don't like #ifdefs like that in the middle of functions.
> The appropriate way is to add a compat function in
> libvarnishcompat.

okay. makes sense.

>
> flock/fcntl-locks you're already talking with DES about.
>
> curses:
> What exactly is the semantics of KEY_RESIZE ?

Curses sends KEY_RESIZE on a resize event, however Solaris' curses is a bit aged and doesn't support that.

>
> mgt_vcc.c:
> The suffix shouldn't matter to dlopen(), please explain ?
> White-space and {} spam.
> Adding the c-compiler command to the error message is bound
> to make it overflow the linewidth.

It isn't dlopen, it is the compiler linker chain that fails on Solaris. While it is arguable that it is a SunStudio bug, I think the limitation is extremely unnecessary and annoying -- but alas. The work around does not break gcc.

>
> VCL_WaitForActive()
> Need to look at that in more detail. I deliberately tried
> to limit varnish to only use mutexes for locking, so introducing
> semaphores just for this seems somewhat silly.

I added it while troubleshooting a different timeout that was triggering. I happened to be running this on ZFS and when it asked "how big can I make my backing file?" it answered 21TB, as the vfsstat stuff has a different meaning when you don't have a fixed number of fs blocks. I was timing out trying to mmap 21TB -- so I added that semaphore. When I further understood the problem, I left the code as it seemed more correct.

>
> storage_umem.c
> I'm not sure I see much point in this. The main advantage of
> umem is SMP localized storage management, and the Varnish
> objects are exactly not local to any one CPU.
> Benchmarks could possibly convince me otherwise.

The thing it adds is slab allocation with reduced CPU contention and that seems to work pretty well in my tests. I completely agree that benchmarks would be a needed step to legitimize the approach. I added it when I was having the 21TB mmap issue above. It was such simple code to add and seemed useful even if it remained an experimental stevedore implementation.

>
> #include <err.h>
> This may be a portability issue where the easiest way to fix
> is to avoid it.

Agreed. When I started working on this, I got an error that was satisfied by adding err.h.
I just checked and removing it no longer produces an error, so either I disabled some code path in the preprocessor or the code that argued was removed. It's been a few revisions since I started this patch.

> sendfile():
> Does the solaris sendfile guarantee that storage is no longer
> touched when it returns ? Otherwise it's as little use as
> the FreeBSD and Linux versions.
>
> /* pick a stevedore and bump the head along */
> /* XXX: only safe as long as pointer writes are atomic */
> +/* jesus: dear god, are you crazy? */
> stv = stevedores = stevedores->next;
>
> God to Jesus: No I'm not.

heh... cute. Just my annotations about the assumption. I was worried about the assumption that both assignments were atomic. One is, then the next is. Perhaps that doesn't matter. It does seem like a hard-to-trigger race condition to me because, while the assignment is atomic, the access to stevedores->next is not guaranteed to be view consistent in that assignment:

T1: get stevedores->next in R(T1,a)
T2: get stevedores->next in R(T2,a)
T1: set stevedores to R(T1,a)
T2: set stevedores to R(T2,a)

Now T1 and T2 both "advanced the pointer" but they did the same work and the stv they have is the same. Is that not a problem? Perhaps I don't understand the impact of that scenario correctly.

>
> -/* Note: intentionally not IOV_MAX */
> +#ifdef IOV_MAX
> +#define MAX_IOVS IOV_MAX
> +#else
> #define MAX_IOVS (HTTP_HDR_MAX * 2)
> +#endif
>
> Which bit of "intentionally" didn't you understand ?
> Have you examined the value of IOV_MAX, looked at where
> this is used and measured the impact ?

I understood intentionally. I also understood that it doesn't work on Solaris. On Solaris, IOV_MAX is the maximum number of elements that can be passed to writev(2), using a larger number will fail. Perhaps this is the wrong solution to the problem. You cannot use HTTP_HDR_MAX*2 as the nvec to writev(2) on Solaris. It is strictly limited to IOV_MAX. So, the app breaks.
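
To make the constraint concrete: what is wanted is the smaller of the application's preferred iovec count and whatever the platform allows for writev(2). A rough sketch follows; it is not taken from the patch or from Varnish. clamp_iovs() and max_iovs() are hypothetical helpers, and the HTTP_HDR_MAX value is a placeholder chosen only so the example compiles. It asks the OS at run time via sysconf(_SC_IOV_MAX) instead of relying on the compile-time macro.

```c
#include <assert.h>
#include <unistd.h>

#define HTTP_HDR_MAX	32	/* placeholder value, for illustration only */

/*
 * Hypothetical helper: decide how many iovecs to hand to writev(2).
 * 'wanted' is the application's preferred count; 'os_limit' is the
 * platform ceiling, or <= 0 if unknown, in which case it is ignored.
 */
static int
clamp_iovs(int wanted, long os_limit)
{
	assert(wanted > 0);
	if (os_limit > 0 && os_limit < wanted)
		return ((int)os_limit);
	return (wanted);
}

/* Query the limit at run time; sysconf() returns -1 when no limit is defined. */
static int
max_iovs(void)
{
	return (clamp_iovs(HTTP_HDR_MAX * 2, sysconf(_SC_IOV_MAX)));
}
```

The deliberately small default stays in force wherever the OS allows it, and the count only shrinks on platforms whose limit is lower.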

After reading the code, it looked like that was the right place to fix it. Perhaps there should be an autoconf fragment that detects the "real" OS IOV_MAX and then uses that in the event that it is lower than HTTP_HDR_MAX*2. Or am I thinking about this all wrong?

>
> cache_acceptor.c
> Under no circumstances should #ifdef HAVE_PORT_CREATE be
> necessary here. If a new method is necessary for the
> acceptors, so be it.

Completely agreed. I noted that it was a hack in a previous email. I'd like to add the ping stuff as a function into each acceptor so they can have their own approach to waking the acceptor thread up. Solaris' portfs is much more like kqueue than epoll and supports more than just file descriptors. It allows user-space eventing, so it is really easy to have one thread just say "dude, wake up" to another waiting in port_get(n). People tend to have strong preferences about adding functions to structures in C, so I was pretty confident that I should propose the design before implementing it, as it would likely be redone.

Best regards,

Theo

From theo at omniti.com Wed Jan 9 16:42:36 2008
From: theo at omniti.com (Theo Schlossnagle)
Date: Wed, 9 Jan 2008 11:42:36 -0500
Subject: development efforts on the Solaris side.
In-Reply-To: <82666.1199871612@critter.freebsd.dk>
References: <82666.1199871612@critter.freebsd.dk>
Message-ID: <79838813-4375-4B63-ACB7-3538293BE8E6@omniti.com>

On Jan 9, 2008, at 4:40 AM, Poul-Henning Kamp wrote:
>
> sendfile():
> Does the solaris sendfile guarantee that storage is no longer
> touched when it returns ? Otherwise it's as little use as
> the FreeBSD and Linux versions.

I just conferred with a Solaris kernel engineer and...

Solaris guarantees that the address ranges (and file data) referenced as the source for data in calls to both sendfile() and sendfilev() will not be touched after control returns to the caller.

Destination guarantees are more complicated. If the destination is a file, it may not be on disk after the call returns; opening the file O_DSYNC or calling fsync() would be required. However, I don't see how this aspect will apply to varnish at all as the destination is a network socket -- and for the sake of this discussion we really only care about the semantics of the source data and how that is accessed relative to the return of control to the caller.

From daniel at papasian.org Thu Jan 10 02:31:58 2008
From: daniel at papasian.org (Daniel Papasian)
Date: Wed, 09 Jan 2008 21:31:58 -0500
Subject: race condition in ticket 144 - 1.1 backport fix? [patch]
Message-ID: <4785839E.1020503@papasian.org>

Hello everyone,

First off, thank you all for the work you've put into varnish - the VCL configuration is clean and exciting, the code is elegantly written, and the performance I've seen in tests is astounding. I'm looking forward to putting it into production use in the coming months.

I believe the crash reported in this ticket:
http://varnish.projects.linpro.no/ticket/144
is the race condition mentioned in cache_backend.c with the backend address list getting deleted while it's being used. I see that in 1.2 the backend code is significantly different and more flexible, and I suspect the bug does not exist there.

Are there plans to backport this backend code to 1.1? If not, I have a patch that I believe fixes the issue on the 1.1 branch by acquiring a lock before the addr structure is used. I wrote it before seeing that it was fixed in 1.2 and trunk, so the variable names are perhaps a bit off, but I've made sure the important one (the mutex in the backend struct) is the same.

I've been unable to repeat the crash from the ticket itself so for all I know I'm not helping at all, but I don't see it start to crash when I apply the patch and it certainly looks more correct to me.

The patch is here:
http://papasian.org/~dannyp/dpapasian-144.patch

--
Daniel Papasian
daniel at papasian.org

From phk at phk.freebsd.dk Thu Jan 10 22:07:49 2008
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Thu, 10 Jan 2008 22:07:49 +0000
Subject: development efforts on the Solaris side.
In-Reply-To: Your message of "Wed, 09 Jan 2008 09:27:13 EST." <03B017F1-86E3-45AD-9387-10CD3B80AC32@omniti.com>
Message-ID: <5069.1200002869@critter.freebsd.dk>

In message <03B017F1-86E3-45AD-9387-10CD3B80AC32 at omniti.com>, Theo Schlossnagle writes:

>> mgt_vcc.c:
>> The suffix shouldn't matter to dlopen(), please explain ?
>> White-space and {} spam.
>> Adding the c-compiler command to the error message is bound
>> to make it overflow the linewidth.
>
>It isn't dlopen, it is the compiler linker chain that fails on
>Solaris. While it is arguable that it is a SunStudio bug, I think the
>limitation is extremely unnecessary and annoying -- but alas. The
>work around does not break gcc.

Then just specify a compiler parameter as:

    "cc -bla -bla -o %o.so %s && mv %o.so %o"

>> -/* Note: intentionally not IOV_MAX */
>> +#ifdef IOV_MAX
>> +#define MAX_IOVS IOV_MAX
>> +#else
>> #define MAX_IOVS (HTTP_HDR_MAX * 2)
>> +#endif
>>
>> Which bit of "intentionally" didn't you understand ?
>> Have you examined the value of IOV_MAX, looked at where
>> this is used and measured the impact ?
>
>I understood intentionally. I also understood that it doesn't work on
>Solaris. On Solaris, IOV_MAX is the maximum number of elements that
>can be passed to writev(2), using a larger number will fail.

I will fix this to use IOV_MAX if it is less than the desired size.

>> cache_acceptor.c
>> Under no circumstances should #ifdef HAVE_PORT_CREATE be
>> necessary here.
>> If a new method is necessary for the acceptors, so be it.
>
>Completely agreed. I noted that it was a hack in a previous email.
>I'd like to add the ping stuff as a function into each acceptor so
>they can have their own approach to waking the acceptor thread up.
>Solaris' portfs is much more like kqueue than epoll and supports more
>than just file descriptors. It allows user-space eventing, so it is
>really easy to have one thread just say "dude, wake up" to another
>waiting in port_get(n). People tend to have strong preferences about
>adding functions to structures in C, so I was pretty confident that I
>should propose the design before implementing it, as it would likely
>be redone.

I'm not sure I see what the point is here ?

I'll pull in the KEY_RESIZE #ifdef also.

From aotto at mosso.com Sat Jan 26 09:11:34 2008
From: aotto at mosso.com (Adrian Otto)
Date: Sat, 26 Jan 2008 01:11:34 -0800
Subject: Apparent bug in exp_prefetch()
Message-ID: <21A902F1-DF66-4A9C-9481-2B5D8BBD5AC1@mosso.com>

Hello,

I'm new to varnish but I think I've found a bug and I'm seeking help to confirm it. This is varnish 1.1.2 on RHEL4. I'm trying to use this in my VCL:

sub vcl_timeout {
    fetch;
}

This is in contrast to the default setting:

sub vcl_timeout {
    discard;
}

According to the man page, this is supposed to fetch a fresh copy of the document as it's about to expire:

vcl_timeout
    Called by the reaper thread shortly before a cached document reaches its expiry time.

    The vcl_timeout subroutine may terminate with one of the following keywords:

    fetch      Request a fresh copy of the object from the backend.
    discard    Discard the object.

What I want is a way to always have a fresh copy of a given document cached, so that no client connections actually have to fall through to "pass" to fetch the document from the backend.

Now, I looked into the source, and tracked that capability back to the file cache_expire.c and function exp_prefetch. This is where I found it using GDB:

(gdb) info threads
  6 Thread -1217016912 (LWP 17397) 0x002777a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  5 Thread -1227506768 (LWP 17398) 0x002777a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  4 Thread -1237996624 (LWP 17399) 0x002777a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  3 Thread 1972149168 (LWP 17400) 0x002777a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  2 Thread 1961659312 (LWP 17401) 0x002777a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  1 Thread -1208621376 (LWP 17396) 0x002777a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) thread 5
[Switching to thread 5 (Thread -1227506768 (LWP 17398))]#0 0x002777a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0 0x002777a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x0031bce6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2 0x0031baac in sleep () from /lib/tls/libc.so.6
#3 0x080516d2 in exp_prefetch (arg=0x0) at cache_expire.c:178
#4 0x004f33b8 in start_thread () from /lib/tls/libpthread.so.0
#5 0x0035a2fe in clone () from /lib/tls/libc.so.6

It's always running in thread 5. Looking carefully at the source, I only see how an object could be removed from the heap. I don't see how it could actually get replaced. It only seems to know how to expire something. It seems that line 183 runs, and takes it out of the binheap, and then the lock is released. After that happens, it follows what seemed to be a rather confusing sequence of code under VCL_timeout_method(sp).
I have not yet learned how the VCL actually controls the internals of varnish, so I'm not pretending to fathom it. I think what might work nicely is a code sequence like this:

A) Find an object that's expired.
B) Fetch a new copy if it was modified since the last fetch, and verify it.
C) Lock the binheap.
D) Drop the old object, and swap in the new one.
E) Unlock the binheap.

It might also be nice if the 'varnishlog' process could have visibility into the progress of the above code sequence as well.

Am I missing something, or is there a bug here that makes the 'fetch' method in 'vcl_timeout' impossible to use?

Here are steps to reproduce the problem:

1) Redefine your vcl_timeout like this:

sub vcl_timeout {
    fetch;
}

2) Start varnish, and varnishlog.
3) Browse a page that has a short TTL, like 120 seconds.
4) Wait for the TTL to expire. At this point I expect to see the varnishlog output indicate that it's fetching a fresh copy, but I don't.
5) Browse to the same page again, and you'll notice that it re-fetches the document from the origin at the time your browser requests the file via 'miss', not before the miss occurred.

I look forward to any guidance you can provide.

Thanks,

Adrian

From phk at phk.freebsd.dk Sat Jan 26 09:51:38 2008
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Sat, 26 Jan 2008 09:51:38 +0000
Subject: Apparent bug in exp_prefetch()
In-Reply-To: Your message of "Sat, 26 Jan 2008 01:11:34 PST." <21A902F1-DF66-4A9C-9481-2B5D8BBD5AC1@mosso.com>
Message-ID: <9034.1201341098@critter.freebsd.dk>

In message <21A902F1-DF66-4A9C-9481-2B5D8BBD5AC1 at mosso.com>, Adrian Otto writes:
>Hello,
>
>I'm new to varnish but I think I've found a bug and I'm seeking help
>to confirm it. This is varnish 1.1.2 on RHEL4. I'm trying to use this
>in my VCL:
>
>sub vcl_timeout {
> fetch;
>}

This is not yet supported, I'm still working on the code.
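
Purely as an illustration of steps C) to E) in the sequence proposed in the report (the fetch in step B happens first, outside the lock), here is a small sketch. It is not Varnish code and says nothing about how prefetch will eventually be implemented: struct obj, struct cache_slot and slot_replace() are hypothetical names, and a single slot stands in for the binheap.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical cached object; 'version' only exists to tell copies apart. */
struct obj {
	int	version;
};

/* Hypothetical stand-in for the expiry structure: one slot and its lock. */
struct cache_slot {
	pthread_mutex_t	mtx;
	struct obj	*cur;
};

/*
 * Swap a freshly fetched object into the slot. The lock is held only
 * for the pointer exchange, never for the fetch itself. Returns the
 * old object so the caller can release it once nothing references it.
 */
static struct obj *
slot_replace(struct cache_slot *slot, struct obj *fresh)
{
	struct obj *old;

	assert(fresh != NULL);
	pthread_mutex_lock(&slot->mtx);		/* C) lock */
	old = slot->cur;			/* D) drop the old object... */
	slot->cur = fresh;			/*    ...and swap in the new one */
	pthread_mutex_unlock(&slot->mtx);	/* E) unlock */
	return (old);
}
```

Keeping the fetch outside the critical section means clients keep hitting the old copy until the new one is ready, which is the behaviour the report asks for.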

From jodok at lovelysystems.com Sat Jan 26 22:39:04 2008
From: jodok at lovelysystems.com (Jodok Batlogg)
Date: Sat, 26 Jan 2008 23:39:04 +0100
Subject: how to build the nagios-plugin?
Message-ID:

hi,

i'm trying to build the nagios-plugin for varnish but i fail already at the ./configure step...

checking for VARNISHAPI... configure: error: Package requirements (varnishapi) were not met:

No package 'varnishapi' found

Consider adjusting the PKG_CONFIG_PATH environment variable if you installed software in a non-standard prefix.

Alternatively, you may set the environment variables VARNISHAPI_CFLAGS and VARNISHAPI_LIBS to avoid the need to call pkg-config. See the pkg-config man page for more details.

any idea?

thanks

jodok batlogg