From des at linpro.no Mon Jun 19 10:34:19 2006
From: des at linpro.no (Dag-Erling Smørgrav)
Date: Mon, 19 Jun 2006 12:34:19 +0200
Subject: Summer of code.
References: <1825.193.213.34.102.1146694978.squirrel@denise.vg.no>
Message-ID:

"Anders Berg" writes:
> Subject ID
> httpd-cache-test
> [...]
> Too bad the scope did not involve the reverse proxy. But anyway, hope
> someone takes this one...

It wasn't picked up - either no one volunteered, or whoever did wasn't
approved by Google. The following cache-related projects were
approved:

  Derby LRU Cache Manager
    by Gokul Soundararajan, mentored by Øystein Grøvlen

  Redesigning and extending the Apache cache architecture.
    by Davi Einstein Melges Arnaut, mentored by Paul Querna

DES
--
Dag-Erling Smørgrav
Senior Software Developer
Linpro AS - www.linpro.no

From Anders.Berg at vg.no Mon Jun 19 10:55:08 2006
From: Anders.Berg at vg.no (Anders Berg)
Date: Mon, 19 Jun 2006 12:55:08 +0200
Subject: SV: Summer of code.
Message-ID: <6AD33D89A21F7B479107D712EF96FDA54868A9@VG-EXC-VIR-1.Akersgt.local>

> -----Original Message-----
> From: varnish-dev-bounces at projects.linpro.no
> [mailto:varnish-dev-bounces at projects.linpro.no] On behalf of
> Dag-Erling Smørgrav
> Sent: 19 June 2006 12:34
> To: varnish-dev at projects.linpro.no
> Subject: Re: Summer of code.
>
> "Anders Berg" writes:
> > Subject ID
> > httpd-cache-test
> > [...]
> > Too bad the scope did not involve the reverse proxy. But anyway,
> > hope someone takes this one...
>
> It wasn't picked up - either no one volunteered, or whoever
> did wasn't approved by Google. The following cache-related
> projects were approved:

Too bad :((

> Derby LRU Cache Manager
> by Gokul Soundararajan, mentored by Øystein Grøvlen

I assume this is database caching.

> Redesigning and extending the Apache cache architecture.
> by Davi Einstein Melges Arnaut, mentored by Paul Querna

Makes me even more certain Varnish is the way to go :)

Anders Berg

> DES
> --
> Dag-Erling Smørgrav
> Senior Software Developer
> Linpro AS - www.linpro.no
> _______________________________________________
> varnish-dev mailing list
> varnish-dev at projects.linpro.no
> http://projects.linpro.no/mailman/listinfo/varnish-dev
>

*****************************************************************
This footnote confirms that this email message has been swept by
MailSweeper for the presence of computer viruses.
*****************************************************************

From phk at phk.freebsd.dk Tue Jun 20 21:41:16 2006
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Tue, 20 Jun 2006 21:41:16 +0000
Subject: Expiring a million items
Message-ID: <76396.1150839676@critter.freebsd.dk>

Warning: algorithm-wanking in progress :-)

Imagine a varnish running and having a million objects in the cache.

Each of those objects has a TTL and consequently an expiry time,
but before that time, we want the prefetcher to get a shot at the
object.

Our only requirement of the data structure, apart from speed, is
that we can pick out the lowest expiry time fast. What order the
remaining elements are in, or how they are organized, is not really
relevant.

A sorted list
-------------

The simple solution of a sorted list will not work, as every insert
of a long-TTL object will have to traverse the majority of the list
to find the right spot.

Callout Wheel
-------------

A callout wheel is a circle of A sorted linked lists. An object
which expires at time B is inserted into the ordered list number
(B mod A).

The prefetcher will run around the circle like a hamster in a
treadmill, taking one bucket per second and looking for all those
items which are about to expire.
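
The wheel described above can be sketched roughly as follows. This is
an illustration only, not Varnish code: the class name CalloutWheel,
its methods and the (expiry, name) tuples are all hypothetical, and
plain Python lists kept sorted with bisect stand in for the sorted
linked lists.

```python
import bisect


class CalloutWheel:
    """Hypothetical sketch: a circle of `slots` sorted buckets."""

    def __init__(self, slots):
        self.slots = slots                      # "A" in the text
        self.buckets = [[] for _ in range(slots)]

    def insert(self, expiry, name):
        # An object expiring at time B goes into bucket number (B mod A);
        # insort keeps each bucket ordered by expiry time.
        bisect.insort(self.buckets[expiry % self.slots], (expiry, name))

    def tick(self, now):
        # Visit the single bucket for this second and split off the
        # entries that are due; later laps of the wheel stay in place.
        bucket = self.buckets[now % self.slots]
        due = 0
        while due < len(bucket) and bucket[due][0] <= now:
            due += 1
        expired = bucket[:due]
        del bucket[:due]
        return expired


wheel = CalloutWheel(10)
wheel.insert(13, "a")
wheel.insert(23, "b")   # same bucket as "a", one lap later
print(wheel.tick(13))   # only "a" is due; "b" stays in bucket 3
```

Each insert only searches one bucket, which is where the saving comes
from: with A buckets, a bucket holds roughly 1/A of the objects.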

Basically, this reduces the work by around a factor A.

But A is the trouble. If we have 100 objects, A=10 is a good choice.
If we have a million objects, A should probably be 10000 or maybe
50000.

Changing A on the fly is not fun and takes extra CPU work if it has
to be perfect. If we allow some objects to miss the prefetcher
opportunity when A changes, it is less troublesome.

A Tree (of some sort)
---------------------

The main problem with using a tree structure is locking.

To avoid cross-object locks, it needs to be a tree where branch
nodes are distinct from objects. Put another way: the tree holds
pointer(s) to the object, the object holds pointer(s) to the tree
but not to any other object.

Most tree types can be implemented this way, but seldom are.

One exception to this rule is the binary heap: it is often implemented
as an array of pointers to leaf elements where element [i] is the
parent of elements [2i+1] and [2i+2].

The binary heap has the extra bonus that the root is always the
extreme key according to the chosen sort order; this matches our
requirement perfectly.

We already have an object store for large sequential chunks of data
in Varnish. If we need to extend the array-object, we will incur a
memcpy hit in order to move to a larger object, but I think this
will be lost in the noise performance-wise. Alternatives are to use
an array of equal-sized objects or a linked list of objects to
implement the array.

After studying the search-trees for an evening, I have not found
any that beat a minimum binary heap for our needs:

We have no need to merge trees, so the advanced B-tree families
are not relevant.

A plain B-tree would keep getting skewed as time passes; that means
a lot of one-sided rebalancing effort.

I doubt the caching of the least recently used node will buy us
anything at all, so splay trees are not interesting, and the complexity
of the red-black tree also seems surplus to requirements.
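
To make the array layout concrete, here is a minimal Python sketch of
a minimum binary heap keyed on expiry time. It is an illustration
under stated assumptions, not the implementation proposed in this
mail: the names CacheObject, ExpiryHeap and heap_index are made up.
The heap array points at objects, and each object holds only its own
index into the array, never a pointer to another object.

```python
class CacheObject:
    """Hypothetical cache object; knows only its own heap slot."""

    def __init__(self, name, expiry):
        self.name = name
        self.expiry = expiry
        self.heap_index = None      # back-pointer into the heap array


class ExpiryHeap:
    """Array-backed min-heap: slot i is parent of 2i+1 and 2i+2."""

    def __init__(self):
        self._a = []

    def __len__(self):
        return len(self._a)

    def _place(self, i, obj):
        self._a[i] = obj
        obj.heap_index = i

    def _sift_up(self, i):
        obj = self._a[i]
        while i > 0:
            parent = (i - 1) // 2
            if self._a[parent].expiry <= obj.expiry:
                break
            self._place(i, self._a[parent])   # pull parent down
            i = parent
        self._place(i, obj)

    def _sift_down(self, i):
        obj = self._a[i]
        n = len(self._a)
        while 2 * i + 1 < n:
            child = 2 * i + 1
            if child + 1 < n and self._a[child + 1].expiry < self._a[child].expiry:
                child += 1                    # pick the smaller child
            if obj.expiry <= self._a[child].expiry:
                break
            self._place(i, self._a[child])    # pull child up
            i = child
        self._place(i, obj)

    def insert(self, obj):
        self._a.append(obj)
        self._sift_up(len(self._a) - 1)

    def peek(self):
        # The root is always the object with the lowest expiry time.
        return self._a[0] if self._a else None

    def remove(self, obj):
        # Delete any object via its stored index, e.g. on early purge.
        i = obj.heap_index
        last = self._a.pop()
        obj.heap_index = None
        if i < len(self._a):                  # obj was not the last slot
            self._place(i, last)
            self._sift_up(i)
            self._sift_down(last.heap_index)

    def pop(self):
        root = self._a[0]
        self.remove(root)
        return root
```

In this sketch peek() is constant time, and insert, pop and remove
each touch at most one root-to-leaf path, about 20 levels for a
million objects.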

I don't think I know of any binary heap implementations we can adopt,
so I will probably be writing my own.

Unless there are better suggestions?

Poul-Henning

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG      | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From phk at phk.freebsd.dk Sat Jun 24 21:31:40 2006
From: phk at phk.freebsd.dk (Poul-Henning Kamp)
Date: Sat, 24 Jun 2006 21:31:40 +0000
Subject: www.vg.no (cache) clock is wrong ?
Message-ID: <21425.1151184700@critter.freebsd.dk>

I was testing with "www.vg.no" as backend, and my TTL calculation
freaked out because I got headers like the ones below.

Looks like one of your boxes is three months off time-wise...

critter phk> fetch -vv -o /dev/null http://www.vg.no/gfk/front/theshadow.jpg
[...]
<<< HTTP/1.0 200 OK
<<< Date: Mon, 27 Mar 2006 00:45:42 GMT
<<< Server: Apache/1.3.27 (Unix)
<<< Cache-Control: max-age=604800
<<< Expires: Mon, 03 Apr 2006 00:45:42 GMT
<<< Last-Modified: Tue, 05 Oct 2004 06:44:48 GMT
<<< Content-Type: image/jpeg
<<< X-Cache: HIT from www.vg.no
<<< Age: 1982
<<< X-Cache: HIT from www.vg.no
<<< Connection: close

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG      | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
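
As a rough check of the headers quoted in the last message, the
arithmetic can be sketched in Python. Two assumptions are made here
and are not from the mail: the Date line of the mail itself stands in
for the observer's clock, and remaining freshness is simplified to
max-age minus Age. The variable names are made up for illustration.

```python
from email.utils import parsedate_to_datetime

# Values copied from the quoted response headers.
date_hdr = "Mon, 27 Mar 2006 00:45:42 GMT"
expires_hdr = "Mon, 03 Apr 2006 00:45:42 GMT"
max_age = 604800    # Cache-Control: max-age, in seconds
age = 1982          # Age header, in seconds

# Assumption: the mail's own Date line approximates the real time.
observed = parsedate_to_datetime("Sat, 24 Jun 2006 21:31:40 +0000")

server_date = parsedate_to_datetime(date_hdr)
expires = parsedate_to_datetime(expires_hdr)

# Expires minus Date should agree with max-age on a consistent server.
lifetime = int((expires - server_date).total_seconds())

# Simplified remaining freshness: lifetime minus time spent in caches.
remaining = max_age - age

# Apparent gap between the observer's clock and the Date header.
skew = observed - server_date

print(lifetime, remaining, skew.days)   # 604800 602818 89
```

The response is internally consistent (Expires minus Date equals
max-age), but its Date header is 89 days behind the time the mail was
sent, which matches the "three months off" remark.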