Iocaine: What it is and how to integrate it with Vinyl Cache

Iocaine gained attention earlier in 2026 as a defense option to cope with the AI crawler problem. We put Iocaine into context from the perspective of Vinyl Cache, explain what it does and does not do, show how to integrate it with Vinyl Cache, and reimplement Iocaine’s classifier in VCL.

Note

This article has been written with good intentions based on the Iocaine reference documentation with some reference to the Iocaine source code and contains personal opinion alongside professional judgement. The author does not have production experience using Iocaine. If you have feedback, criticism, corrections or improvements to propose, please do get in touch, open an issue or propose a change.

The source code reference is acb32d3551a2ec221a3edde5b68717268f47e2d0 and statements are made based on a “vanilla” configuration using the bundled example scripts.

What is it?

Iocaine combines and integrates several components with an external HTTP proxy:

  • A classifier to judge whether or not an HTTP request originates from a bot (AI crawler)

  • A content generator (tarpit, garbage and link maze generator) to potentially “poison” AI training

  • nftables firewall control via netlink

  • Management functions: Metrics and Logging

The first two components will be commented on in more detail in reverse order, but first we need to understand another basic concept.

How Iocaine integrates with proxies

“Integrating iocaine with reverse proxies” documents the concept well: Iocaine expects to see the original request as received from the client. It then responds either with a 421 status code if the proxy should serve the original content, or with content which the proxy should serve instead.

We will get back to this later, but obviously, if this concept were applied to Vinyl Cache directly, using Iocaine would increase the load for cache hits: a request would need to be sent to a backend, processed by Iocaine, and the response interpreted, when we could just serve content from cache.

The garbage generator

Iocaine’s content generator includes several text and image generation options. In the author’s personal opinion, this is an interesting option to have available, and the idea of fighting back against exploitative practices might have some appeal, in particular an emotional one. It remains questionable, however, whether this approach leads to the intended result of poisoning AI training datasets, for the following reasons:

  • For integration into and coexistence with other website content, Iocaine needs to identify “garbage URLs”. This is currently done using a naive approach of adding fixed strings (“poison-ids”) to URLs. Crawlers can trivially identify these as repeating, random patterns in URLs and exclude them.

    This could potentially be improved by implementing a steganographic approach of hiding the identifier in otherwise normal looking URLs.

  • The generated HTML output can be identified equally trivially.

    This could be improved by adjusting generator details to match the “legitimate content” of the respective website.

  • It seems likely that the generator method will expose statistical properties which can be detected.

In summary, the author considers the goal of actually poisoning AI training data very ambitious. Because there (likely) is no feedback loop back from AI crawler operators to Iocaine developers and operators, potential improvements will have to be made “in the blind”, leading to an arms race where the impact of action is unclear.

One motivation for deploying Iocaine might be to simply avoid (expensive) content generation for AI crawlers, so the question of whether or not the garbage generator actually is “The deadliest poison known to AI” might be irrelevant to users. And this is fine.

The classifier

The main goal of the classifier in this context is to identify AI bots. The high level classifier code is implemented in scripting languages, which can then use facilities provided by the Iocaine runtime environment.

Iocaine offers three scripting languages (Lua, Fennel, Roto) in which the classifier (the decide(request) function) can be implemented together with the output generator (the output(request, decision) function).

In the context of Vinyl Cache, this closely matches what can be done in VCL sub vcl_recv {}, resulting in a return(synth(421)) to instruct the external proxy to deliver original content, or a (hypothetical) return(synth(xxx)) for generating garbage.
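As a rough illustration of this analogy, a decide/output pair could be sketched in VCL as follows. The single User-Agent rule, the use of status 200 for garbage and the reason string vcai-demo-garbage are assumptions made for this example only; they are not taken from Iocaine. The fragment is meant to be placed in an existing VCL.

```vcl
sub vcl_recv {
        # Stand-in for decide(request): one deliberately naive rule
        if (req.http.User-Agent ~ "(?i)perplexity") {
                # Stand-in for output(request, decision); the reason
                # string is a hypothetical marker picked up in vcl_synth
                return (synth(200, "vcai-demo-garbage"));
        }
        # Instruct the external proxy to serve the original content
        return (synth(421));
}

sub vcl_synth {
        if (resp.reason == "vcai-demo-garbage") {
                set resp.reason = "OK";
                set resp.http.Content-Type = "text/plain; charset=utf-8";
                # Placeholder body; a real generator would go here
                set resp.body = "placeholder garbage";
                return (deliver);
        }
}
```

In this sketch Vinyl Cache plays the role Iocaine has behind an external proxy. It only shows the control flow and does not generate anything resembling Iocaine's output.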

Request information

There is nothing an Iocaine decide(request) function can do which could not be done in sub vcl_recv {} with some additional VMODs, simply because the data available to the decide(request) function is the HTTP request, which is less than what sub vcl_recv {} has available. Matching the Request type documentation, the VCL equivalents are:

  • path: req.url

  • method: req.method

  • headers: req.http.*

  • query parameters: req.url

  • cookies: req.http.cookie

Regarding query parameters and cookies, native access in Vinyl Cache is via the “data on the line”, but VMODs like vmod_querystring and vmod_cookie can be used to simplify handling.

In addition, Vinyl Cache provides access to information about the network socket and information received via the PROXY protocol, which can include information about a TLS connection (see Configuring TLS with haproxy).
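To make the mapping more concrete, the following fragment sketches how these pieces of request information might be accessed in sub vcl_recv {}. It assumes that vmod_std and vmod_cookie are installed; the x-demo-path header and the cookie name session are hypothetical and only serve the illustration.

```vcl
import cookie;  # cookie parsing helpers
import std;     # std.log()

sub vcl_recv {
        # path: req.url with the query string stripped
        # (hypothetical header, used here only to hold the result)
        set req.http.x-demo-path = regsub(req.url, "\?.*$", "");

        # method and headers are available directly
        if (req.method == "GET" && req.http.User-Agent) {
                std.log("demo: GET from " + req.http.User-Agent);
        }

        # cookies: parse the Cookie header once, then query by name
        cookie.parse(req.http.Cookie);
        if (cookie.isset("session")) {
                std.log("demo: session cookie present");
        }

        # socket information, which goes beyond the HTTP request itself
        std.log("demo: client " + client.ip + " on " + local.ip);
}
```

The fragment is intended for inclusion in an existing VCL and only logs what it finds, so it does not change how requests are handled apart from setting the demo header.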

Pattern matching

The Pattern matching capabilities in Iocaine are matched in Vinyl Cache as follows:

  • Substrings and Regular Expressions: See Sets in VMODs re2 and selector

  • IP Prefixes: acl with the match operator ~: See ACL

  • ASNs & GeoIP: See vmod_geoip2

  • Static matchers: if (true) {} and if (false) {}

  • Sec-CH-UA: req.http.sec-ch-ua
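For readers who have not used these facilities, the following fragment sketches the simplest variants using only core VCL (an acl and the regular expression match operator) rather than the set objects from the VMODs listed above. All names, address prefixes and patterns are made up for the example and are not Iocaine's defaults.

```vcl
# IP prefixes: an ACL, matched with the ~ operator
# (documentation prefixes, purely illustrative)
acl demo_crawler_nets {
        "192.0.2.0"/24;
        "2001:db8::"/32;
}

sub vcl_recv {
        # hypothetical marker header; clear whatever the client sent
        unset req.http.x-demo-match;

        if (client.ip ~ demo_crawler_nets) {
                set req.http.x-demo-match = "ip-prefix";
        } else if (req.http.User-Agent ~ "(?i)(gptbot|claudebot)") {
                # regular expression, case-insensitive
                set req.http.x-demo-match = "user-agent";
        } else if (req.http.Sec-CH-UA ~ "HeadlessChrome") {
                # client hint header, matched like any other header
                set req.http.x-demo-match = "sec-ch-ua";
        }
}
```

For larger pattern lists, the set objects from vmod_re2 and vmod_selector are likely the better fit, because they match many patterns in one call.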

What it is not

Iocaine, sadly, does not have a magic bullet for bot detection. It provides, and allows combining, well-known methods for user agent classification, all of which have been available in Vinyl Cache for years.

This is not a bad thing, and there is nothing wrong with having freedom of choice, particularly if the freedom comes in the form of FOSS.

Also, Iocaine brings with it sensible defaults and examples for the particular purpose of AI bot detection, which certainly are a useful and helpful baseline and starting configuration.

The problem with “traditional” bot detection

The fundamental problem with this “traditional” bot detection is that it ultimately relies on information being sent voluntarily by the user agent. In particular, the User-Agent header can be set at will, and if we have to deal with bots not playing by the rules (by ignoring robots.txt and disrespecting common sense crawl frequency limits), why should these exploitative crawlers set headers such that we can easily identify them?

Similarly, the age of successful classification and rate limiting by IP and ASN seems to be largely coming to an end, with residential IP proxies sold wholesale as SaaS.

New ideas are needed, and some already exist, but that is a topic for another article…

That said, we still want to explore how Iocaine can play with Vinyl Cache.

Integrating Iocaine into Vinyl Cache (the simple way)

Applying the proxy integration concept of Iocaine as intended would, in the case of Vinyl Cache, actually increase overhead significantly for cache hits:

We would need to turn the cache hit into a pass first, send a backend request to Iocaine, and then, on the client side, restart if the response is a 421, now allowing the cache hit to succeed.

If the goal is to reduce load from AI crawlers, this approach is wrong: With Vinyl Cache, serving an object from cache is extremely efficient. If we put aside the question of used bandwidth, it is thus much more efficient and simpler to serve cache hits no matter what, and call out to Iocaine only for cache misses and passes.

This is what the following VCL snippet does:

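import std;     # provides std.rollback()

# Send the first attempt of each backend request to Iocaine
sub iocaine_backend_fetch {
        # On a retry, keep the backend already selected by the caller
        if (bereq.retries > 0) {
                return;
        }
        set bereq.backend = iocaine;
        return (fetch);
}

# Interpret the answer from Iocaine
sub iocaine_backend_response {
        # Responses from any other backend are left alone
        if (bereq.backend != iocaine) {
                return;
        }
        # 421: serve the original content, so undo any changes to
        # bereq and retry, this time against the real backend
        if (beresp.status == 421) {
                std.rollback(bereq);
                return (retry);
        }
        # Anything else is garbage from Iocaine: deliver, do not cache
        set beresp.uncacheable = true;
}

sub vcl_backend_fetch {
        set bereq.backend = my_normal_backend;
        call iocaine_backend_fetch;
}
sub vcl_backend_response {
        call iocaine_backend_response;
}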

Upon the first backend request (bereq.retries == 0), we send the request to Iocaine. If it comes back with a 421, we roll back and retry; otherwise we deliver the garbage from Iocaine, but do not cache it.

To use this snippet:

  • Download vcl/iocaine.inc.vcl and install it in a directory in vcl_path.

  • Install Iocaine and start it on the default port 127.0.0.1:42069 (or change the backend iocaine definition accordingly).

  • Include the code in your VCL:

    sub vcl_backend_fetch {
            # Important
            set bereq.backend = your_real_backend;
    }
    include "iocaine.inc.vcl";
    

It is important that the real backend is set before the include, as shown in the example. Alternatively, call iocaine_backend_fetch explicitly and set the real backend either before or after the call.
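For the case where Iocaine does not listen on the default address, an adjusted backend iocaine definition might look roughly like the following. The port and the timeout values are hypothetical, and the sketch assumes that it replaces the definition which comes with iocaine.inc.vcl, because a backend name can only be defined once.

```vcl
backend iocaine {
        .host = "127.0.0.1";
        .port = "8042";                 # hypothetical non-default port
        .connect_timeout = 1s;          # assumed values: fail fast if
        .first_byte_timeout = 5s;       # Iocaine is not responding
}
```

Short timeouts are a design choice for this sketch: Iocaine sits in front of every cache miss, so a slow Iocaine would otherwise delay all backend requests.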

Tip

To ensure the VCL code works as expected, vtc files for use with vinyltest are provided for all code in this article.

To run the test case for iocaine.inc.vcl, download vtc/iocaine.vtc into a vtc subdirectory, put iocaine.inc.vcl in a vcl subdirectory and run the test with vinyltest vtc/iocaine.vtc.

Make sure your PATH contains the vinyld binary, that you have all VMODs installed and that iocaine is running on the default port, else the test will return errors or be skipped.

Again, with this method, we only call Iocaine for cache misses and passes, but otherwise use it as originally intended, with the Iocaine classifier being used.

If we also want to run a classifier for cache hits, we either need to do something inefficient (as laid out before), or we need to use Vinyl Cache for what it’s made for:

Vinylcaine: The Iocaine classifier in VCL

As laid out before, the classifier part of Iocaine can be implemented in VCL relatively easily; this is mostly busywork. The content generator could also be ported to a VMOD, but this part is less interesting, because we can just connect to Iocaine to get garbage if we need it.

All the information necessary for the implementation is already given above, so you could just do that yourself, but actions speak louder than words, right?

So here’s a reimplementation of Iocaine’s default matcher in VCL:

To use it:

  • Download vcl/vinylcaine.inc.vcl and install it in a directory in vcl_path.

  • Download an mmdb database mapping from IP addresses to Autonomous system numbers (ASNs) and replace /tmp/GeoLite2-ASN.mmdb with the respective path.

    Such databases are available, for example, from

    The code is for Maxmind. For IPinfo, it needs minor adjustments: The ASN is called asn instead of autonomous_system_number and is prefixed with AS, according to the link above.

    Alternatively, all references to geoip2 and geo in the code can be removed to just disable the ASN lookup functionality.

  • Install vmod_re2, vmod_selector and vmod_geoip2 (a quick adjustment to Vinyl Cache, see issue 70 for context).

  • Install Iocaine and start it on the default port 127.0.0.1:42069

  • Include the code in your VCL:

    include "vinylcaine.inc.vcl";
    

That’s it. You now have the classifier functionality equivalent to Iocaine implemented in VCL, plus the garbage generator from Iocaine itself, with one deliberate difference: The trusted paths (vcai_trusted_paths) are applied with a prefix match.

As always, changes to VCL are easy. For example, the matchers can easily be changed from selector to re2 in order to get regular expressions, and the actual patterns can be improved or extended.

The code assumes Iocaine running on 127.0.0.1:42069 with unwanted-visitors configured to contain Perplexity (the default). We need this configuration as a way to make Iocaine return garbage when we need it.

Tip

To run the test case for vinylcaine.inc.vcl, download vtc/vinylcaine.vtc into a vtc subdirectory, put vinylcaine.inc.vcl in a vcl subdirectory and run the test with vinyltest vtc/vinylcaine.vtc.

Make sure your PATH contains the vinyld binary, that you have all VMODs installed and that iocaine is running on the default port, else the test will return errors or be skipped.

Compared to the simple way, this code looks quite elaborate, but it actually is not: This is almost 1:1 the default classifier from Iocaine, which would otherwise run as a Lua, Fennel or Roto script, so in terms of the actual code being executed, we have just moved it from Iocaine to Vinyl Cache.

Moving the code to VCL allows us to cheaply run the classifier at request time, so we can also sensibly apply it to cached content.

And, from the perspective of a Vinyl Cache administrator, now all the configuration and classifier logic is in one place.

Vinylcaine performance compared with Iocaine

To ensure that we recommend an option which is at least on par with Iocaine, we ran a microbenchmark.

First and foremost, to jump ahead: the important result is that yes, Iocaine really is fast. From the perspective of a Vinyl Cache maintainer, other software usually performs at least some orders of magnitude slower. In this case, however, given all the things Iocaine and the Rust runtime need to do (generating content dynamically, generating and collecting a lot of garbage), and considering all the optimizations we have in the highly specialized Vinyl Cache server, the Iocaine results do impress.

That said, when running Iocaine, you still need a reverse proxy anyway, so we are really comparing apples and pears: Here we compare running Vinyl Cache with the Vinylcaine VCL (and Iocaine, but it does not do much in this case) vs. just Iocaine, which by itself does not provide the reverse proxy functionality.

But enough jumping ahead, what are the tests and the results?

The microbenchmark was run on Debian 12.13 with Linux 6.12.85+deb12-amd64 on a Laptop (Lenovo T590) with an Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz (4 cores / 8 threads). wrk with 100 connections was run for 20 seconds against Iocaine and Vinyl Cache with the Vinylcaine VCL, in both cases sending the User-Agent: Perplexity header to make the classifier evaluate all the steps until the last.

In the case of the test against Iocaine directly, it was generating the garbage content for all requests, while in the case of Vinyl Cache, that content was still fetched from Iocaine, but cached, so it was only generated once.

To not test MMDB lookup performance, the ASN lookup was disabled both for Iocaine and Vinylcaine.

Because this is a microbenchmark, we should not compare the numbers in detail. For example, the amount of body data which Iocaine generates differs between runs. For comparison, we picked runs where the Vinylcaine setup had a disadvantage (because more body data was transferred).

By default, the Vinylcaine VCL as presented here compresses cached content, but wrk requests uncompressed content and Iocaine does not compress. So for the direct comparison with the VCL presented here, Vinyl Cache needs to decompress the cached content for each response. For completeness, we also ran the microbenchmark with compression disabled.

For all runs, all CPU cores were basically saturated. getrusage numbers are comparable for all runs.

The Numbers

The rough numbers are:

  • Iocaine: 65k req/s

  • Vinylcaine including gunzip: 88k req/s

  • Vinylcaine without gunzip: 113k req/s

Again, we do not want to make Iocaine look bad; its numbers are fine. The point is just that reimplementing the same classifier in Vinyl Cache is not slower.
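To put the rough numbers into proportion, the ratios can be computed from the Requests/sec figures reported by wrk in the logs in the Details section. These are single runs of a microbenchmark, so the ratios should be read as an indication only:

```latex
\[
\frac{88564.46}{65590.39} \approx 1.35
\qquad
\frac{113263.08}{65590.39} \approx 1.73
\]
```

In these runs, the Vinylcaine setup thus handled roughly a third more requests per second than Iocaine with gunzip, and roughly 70% more without.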

Details

This section contains all the details about software versions and how they were compiled and run.

Iocaine build:

slink@haggis21:~/Devel/madhouse/iocaine (main)$ git rev-parse HEAD
acb32d3551a2ec221a3edde5b68717268f47e2d0
slink@haggis21:~/Devel/madhouse/iocaine (main)$ rustup show
Default host: x86_64-unknown-linux-gnu
rustup home:  /home/slink/.rustup

installed toolchains
--------------------
nightly-2025-06-12-x86_64-unknown-linux-gnu
1.89-x86_64-unknown-linux-gnu (active, default)

active toolchain
----------------
name: 1.89-x86_64-unknown-linux-gnu
active because: it's the default toolchain
installed targets:
  x86_64-unknown-linux-gnu

slink@haggis21:~/Devel/madhouse/iocaine (main)$ RUST_MIN_STACK=16777216 cargo build --profile release
   Compiling libnftables1-sys v0.1.2
   Compiling iocaine-table v4.0.0-snapshot (/home/slink/Devel/madhouse/iocaine/iocaine-table)
   Compiling iocaine-powder v4.0.0-snapshot (/home/slink/Devel/madhouse/iocaine/iocaine-powder)
   Compiling iocaine v4.0.0-snapshot (/home/slink/Devel/madhouse/iocaine/iocaine)
    Finished `release` profile [optimized] target(s) in 3m 14s

Adjustments to vinylcaine.inc.vcl

  • Make trusted IPs config equal to Iocaine default

  • Remove ASN check

diff --git a/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl b/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl
index 263ab8f..edcff6b 100644
--- a/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl
+++ b/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl
@@ -86,7 +86,7 @@ sub vcl_init {
 }

 acl vcai_trusted_ips {
-       "127.0.0.1"/32;
+#      "127.0.0.1"/32;
 }

 sub vcl_init {
@@ -307,11 +307,11 @@ sub vinylcaine_recv_decide_default {
 ##     if ASN.matches(request.header("x-forwarded-for")) {
 ##             return augment_decision(request, "garbage", "asn");
 ##     }
-       if (vcai_asn.match(geo.lookup("autonomous_system_number", client.ip))) {
-               set req.http.vcai-decision = "garbage";
-               set req.http.vcai-reason = "asn";
-               return;
-       }
+#      if (vcai_asn.match(geo.lookup("autonomous_system_number", client.ip))) {
+#              set req.http.vcai-decision = "garbage";
+#              set req.http.vcai-reason = "asn";
+#              return;
+#      }

 ##     if AI_ROBOTS_TXT.matches(user_agent) {
 ##             return augment_decision(request, "garbage", "ai.robots.txt");
@@ -352,6 +352,8 @@ sub vinylcaine_backend_fetch {
        # force iocaine to generate garbage (requires Perplexity
        # in UNWANTED_VISITORS (default))
        set bereq.http.User-Agent = "Perplexity";
+       set bereq.http.Host = "localhost:42069";
+       unset bereq.http.Accept;
        # prevent hitting X-F-F 127.0.0.1 rule (trusted ips)
        unset bereq.http.X-Forwarded-For;
        set bereq.backend = iocaine;

Vinyl Cache and VMODs build:

slink@haggis21:~/Devel/varnish-git/vinyl-cache (main)$ clang-19 -v
Debian clang version 19.1.7 (3~deb12u1)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-19/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/10
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Candidate multilib: .;@m64
Selected multilib: .;@m64
slink@haggis21:~/Devel/varnish-git/vinyl-cache (main)$ git rev-parse HEAD
00a97d073d499ee5af52d39ecc313e273123dfc7
slink@haggis21:~/Devel/varnish-git/vinyl-cache (main)$ CC=clang-19 ./configure --prefix=/tmp && make -j20 install
...

slink@haggis21:~/Devel/varnish-git/libvmod-selector (master)$ git rev-parse HEAD && CC=clang-19 ./configure --prefix=/tmp && make -j20 install
2b29a0836103649bca6e7cdbdb855d0b525588b9
...

slink@haggis21:~/Devel/varnish-git/libvmod-re2 (master)$ git rev-parse HEAD && CC=clang-19 ./configure --prefix=/tmp && make -j20 install
c009a80895cc4a5fa88bd4c1433d8e6955521be7
...

slink@haggis21:~/Devel/varnish-git/libvmod-geoip2 (dumb_viylize)$ git rev-parse HEAD && CC=clang-19 ./configure --prefix=/tmp && make -j20 install
48012d78719056dfda32f818c46d3c07be463416
...

Iocaine run:

slink@haggis21:~/Devel/madhouse/iocaine (main)$ time ./target/release/iocaine start & sleep 2 ; wrk -c 100 -d 20 -H 'User-Agent: Perplexity' http://localhost:42069 ; pkill iocaine
[1] 196864
2026-06-12T17:28:29.642161Z  WARN iocaine::user: No ai-robots-txt-path configured, using default
2026-06-12T17:28:29.643754Z  WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
Running 20s test @ http://localhost:42069
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.50ms    0.86ms  17.73ms   75.38%
    Req/Sec    32.96k     2.04k   44.15k    73.75%
  1314021 requests in 20.03s, 1.28GB read
Requests/sec:  65590.39
Transfer/sec:     65.49MB
slink@haggis21:~/Devel/madhouse/iocaine (main)$
real    0m22,067s
user    1m50,493s
sys     0m22,652s

Vinyl Cache run with gunzip:

slink@haggis21:~/Devel/madhouse/iocaine (main)$ export VINYL_DEFAULT_N=/tmp/t
slink@haggis21:~/Devel/madhouse/iocaine (main)$ time /tmp/sbin/vinyld -f /tmp/t.vcl -a 127.0.0.1:8080 -F & time ./target/release/iocaine start & sleep 2 ; wrk -c 100 -d 20 -H 'User-Agent: Perplexity' http://localhost:8080 ; pkill vinyld ; pkill iocaine
[1] 196568
[2] 196569
2026-06-12T17:27:28.891373Z  WARN iocaine::user: No ai-robots-txt-path configured, using default
2026-06-12T17:27:28.893156Z  WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
Debug: Version: vinyl-cache-trunk revision 00a97d073d499ee5af52d39ecc313e273123dfc7
Debug: Platform: Linux,6.12.85+deb12-amd64,x86_64,-jnone,-sdefault,-sdefault,-ssynth,-hcritbit
Debug: Child (196594) Started
Child launched OK
Info: Child (196594) said Child starts
Running 20s test @ http://localhost:8080
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.95ms    1.23ms  51.56ms   94.55%
    Req/Sec    44.54k    11.51k   74.69k    67.00%
  1773670 requests in 20.03s, 1.96GB read
Requests/sec:  88564.46
Transfer/sec:    100.27MB
Info: Manager got SIGTERM from PID 196828
Debug: Stopping Child

real    0m22,079s
user    0m0,090s
sys     0m0,024s
[2]+  Done                    time ./target/release/iocaine start
slink@haggis21:~/Devel/madhouse/iocaine (main)$ Info: Child (196594) said shutdown waiting for 96 references on boot
Info: Child (196594) said shutdown waiting for 41 references on boot
Info: Child (196594) said Child dies usr=90.588737 sys=39.215018
Info: Child (196594) ended
Debug: Child cleanup complete
Info: manager stopping child
Info: manager dies

real    0m24,664s
user    1m30,960s
sys     0m39,319s

Adjust VCL for no gunzip:

diff --git a/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl b/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl
index edcff6b..1954f41 100644
--- a/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl
+++ b/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl
@@ -370,7 +370,7 @@ sub vinylcaine_backend_response {
        if (! beresp.uncacheable) {
                set beresp.ttl = 1h;
                # to save cache space
-               set beresp.do_gzip = true;
+               #set beresp.do_gzip = true;
        }
        return (deliver);
 }

Vinyl Cache run without gunzip:

slink@haggis21:~/Devel/madhouse/iocaine (main)$ time /tmp/sbin/vinyld -f /tmp/t.vcl -a 127.0.0.1:8080 -F & time ./target/release/iocaine start & sleep 2 ; wrk -c 100 -d 20 -H 'User-Agent: Perplexity' http://localhost:8080 ; pkill vinyld ; pkill iocaine
[1] 197259
[2] 197260
2026-06-12T17:33:23.639303Z  WARN iocaine::user: No ai-robots-txt-path configured, using default
2026-06-12T17:33:23.640940Z  WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
Debug: Version: vinyl-cache-trunk revision 00a97d073d499ee5af52d39ecc313e273123dfc7
Debug: Platform: Linux,6.12.85+deb12-amd64,x86_64,-jnone,-sdefault,-sdefault,-ssynth,-hcritbit
Debug: Child (197283) Started
Child launched OK
Info: Child (197283) said Child starts
Running 20s test @ http://localhost:8080
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   689.50us    0.90ms  41.74ms   96.03%
    Req/Sec    56.93k    10.19k   76.01k    63.75%
  2265658 requests in 20.00s, 2.46GB read
Requests/sec: 113263.08
Transfer/sec:    125.86MB
Info: Manager got SIGTERM from PID 197516
Debug: Stopping Child

real    0m22,059s
user    0m0,086s
sys     0m0,016s
[2]+  Done                    time ./target/release/iocaine start
slink@haggis21:~/Devel/madhouse/iocaine (main)$ Info: Child (197283) said shutdown waiting for 83 references on boot
Info: Child (197283) said shutdown waiting for 28 references on boot
Info: Child (197283) said Child dies usr=77.671676 sys=46.360983
Info: Child (197283) ended
Debug: Child cleanup complete
Info: manager stopping child
Info: manager dies

real    0m24,646s
user    1m18,037s
sys     0m46,461s

Conclusion

We have explained what, from the perspective of a Vinyl Cache maintainer, Iocaine is, how it works, how it can be integrated with Vinyl Cache and how its classifier can be ported to VCL without any efficiency loss.

References

This work was motivated by Mastodon feedback to the presentation WAF: Wrong Approach Firewall at GPN24.

Acknowledgements

Development of the VCL code and this documentation has been funded through an investment of the Sovereign Tech Agency.