Iocaine: What it is and how to integrate it with Vinyl Cache
Iocaine gained attention earlier in 2026 as a defense option against the AI crawler problem. We contextualize Iocaine from the perspective of Vinyl Cache, explain what it does and does not do, show how to integrate it with Vinyl Cache, and reimplement Iocaine’s classifier in VCL.
Note
This article has been written with good intentions, based on the Iocaine reference documentation and, in parts, on the Iocaine source code. It contains personal opinion alongside professional judgement. The author does not have production experience with Iocaine. If you have feedback, criticism, corrections or improvements to propose, please get in touch, open an issue or propose a change.
The source code reference is acb32d3551a2ec221a3edde5b68717268f47e2d0 and
statements are made based on a “vanilla” configuration using the bundled
example scripts.
What it is
Iocaine combines several components and integrates with an external HTTP proxy:
A classifier to judge whether or not an HTTP request originates from a bot (AI crawler)
A content generator (tarpit, garbage and link maze generator) to potentially “poison” AI training
nftables firewall control via netlink
Management functions: Metrics and Logging
The first two components will be commented on in more detail in reverse order, but first we need to understand another basic concept.
How Iocaine integrates with proxies
Integrating iocaine with reverse proxies documents the concept well: Iocaine
expects to see the original request as received from the client. It then
either responds with a 421 status code, if the proxy should serve the
original content, or with a response which the proxy should serve instead.
We will get back to this later, but obviously, if this concept were applied to Vinyl Cache directly, using Iocaine would increase the load for cache hits: a request would need to be sent to a backend, processed by Iocaine and the response interpreted, when we could just serve content from cache.
The garbage generator
Iocaine’s content generator includes several text and image generation options. In the author’s personal opinion, this is an interesting option to have available, and the idea of fighting back when being abused by exploitative practices might have some appeal, in particular an emotional one. But it remains questionable whether this approach achieves the intended result of poisoning AI training datasets, for the following reasons:
For integration into and coexistence with other website content, Iocaine needs to identify “garbage URLs”. This is currently done using a naive approach of adding fixed strings (“poison-ids”) to URLs. Crawlers can trivially identify these as repeating, random patterns in URLs and exclude them.
This could potentially be improved by implementing a steganographic approach of hiding the identifier in otherwise normal-looking URLs.
The generated HTML output can be identified equally trivially.
This could be improved by adjusting generator details to match the “legitimate content” of the respective website.
It seems likely that the generator method will expose statistical properties which can be detected.
In summary, the author considers the goal of actually poisoning AI training data very ambitious. Because there is (likely) no feedback loop from AI crawler operators back to Iocaine developers and operators, potential improvements will have to be made “in the blind”, leading to an arms race where the impact of any action is unclear.
One motivation for deploying Iocaine might simply be to avoid (expensive) content generation for AI crawlers, so the question of whether or not the garbage generator actually is “The deadliest poison known to AI” might be irrelevant to users. And this is fine.
The classifier
The main goal of the classifier in this context is to identify AI bots. The high level classifier code is implemented in scripting languages, which can then use facilities provided by the Iocaine runtime environment.
Iocaine offers three scripting languages (Lua, Fennel, Roto) in which the
classifier (the decide(request) function) can be implemented together with
the output generator (the output(request, decision) function).
In the context of Vinyl Cache, this closely matches what can be done in VCL
sub vcl_recv {}, resulting in a return(synth(421)) to instruct the
external proxy to deliver the original content, or a (hypothetical)
return(synth(xxx)) for generating garbage.
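As a rough illustration of this mapping, the following sketch shows a vcl_recv playing the role of a decide(request)/output(request, decision) pair behind an external proxy. The User-Agent pattern and the 200 status used for the garbage case are placeholders, not Iocaine’s defaults, and the actual garbage generation is left out.

```vcl
sub vcl_recv {
	# hypothetical "garbage" decision: a placeholder pattern standing
	# in for a real classifier
	if (req.http.User-Agent ~ "(?i)perplexity") {
		# the external proxy would serve whatever we return here;
		# status and reason are placeholders
		return (synth(200, "garbage"));
	}
	# "default" decision: 421 tells the external proxy to serve the
	# original content
	return (synth(421));
}
```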
Request information
There is nothing an Iocaine decide(request) function can do which could
not be done in sub vcl_recv {} with some additional VMODs, simply because
the data available to the decide(request) function is the HTTP request,
which is less than what sub vcl_recv {} has available. Matching the Request
type documentation, the VCL equivalents are:
path: req.url
method: req.method
headers: req.http.*
query parameters: req.url
cookies: req.http.cookie
Regarding query parameters and cookies, native access in Vinyl Cache is via the “data on the line”, but VMODs like vmod_querystring and vmod_cookie can be used to simplify handling.
In addition, Vinyl Cache provides access to information about the network socket and information received via the PROXY protocol, which can include information about a TLS connection (see Configuring TLS with haproxy).
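A minimal sketch of how the request information listed above, plus some connection information, can be accessed in vcl_recv. It only logs the values with std.log(). The query parameter q and the cookie session are hypothetical names, the query parameter is extracted with a plain regsub() instead of vmod_querystring, and the sketch assumes vmod_cookie and vmod_proxy are available, with the TLS check only returning something when TLS information arrives via the PROXY protocol.

```vcl
import std;
import cookie;
import proxy;

sub vcl_recv {
	# path (req.url without the query string) and method
	std.log("path: " + regsub(req.url, "\?.*$", ""));
	std.log("method: " + req.method);
	# any header via req.http.*
	std.log("user-agent: " + req.http.User-Agent);
	# query parameter "q" (hypothetical), taken from req.url
	if (req.url ~ "\?(.*&)?q=") {
		std.log("q: " + regsub(req.url, "^[^?]*\?(.*&)?q=([^&]*).*$", "\2"));
	}
	# cookie "session" (hypothetical), parsed with vmod_cookie
	cookie.parse(req.http.Cookie);
	std.log("session: " + cookie.get("session"));
	# beyond the HTTP request: socket and PROXY protocol information
	std.log("client: " + client.ip + " local: " + local.ip);
	if (proxy.is_ssl()) {
		std.log("tls: " + proxy.ssl_version());
	}
}
```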
Pattern matching
The pattern matching capabilities of Iocaine map to Vinyl Cache as follows:
Substrings and Regular Expressions: see sets in VMODs re2 and selector
IP Prefixes: acl with the match operator ~
ASNs & GeoIP: see vmod_geoip2
Static matchers: if (true) {} and if (false) {}
Sec-CH-UA: req.http.sec-ch-ua
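The following sketch combines these building blocks into one classifier step. It assumes vmod_re2, vmod_selector and vmod_geoip2 are installed and the mmdb file exists at the path given (the same path used later in this article, adjust as needed). All object names, patterns, the header x-example-decision and the ASN 64496 (from the documentation range) are hypothetical.

```vcl
import re2;
import selector;
import geoip2;

# IP prefixes: an ACL, matched with the ~ operator
acl example_trusted_ips {
	"127.0.0.1"/32;
	"192.0.2.0"/24;
}

sub vcl_init {
	# regular expressions (vmod_re2): one set, one match call
	new example_ua_re = re2.set();
	example_ua_re.add("(?i)perplexity");
	example_ua_re.add("(?i)gptbot");
	example_ua_re.compile();

	# fixed strings (vmod_selector), here a list of ASNs
	new example_asns = selector.set();
	example_asns.add("64496");
	example_asns.compile();

	# ASN lookup (vmod_geoip2)
	new example_geo = geoip2.geoip2("/tmp/GeoLite2-ASN.mmdb");
}

sub vcl_recv {
	# never trust a decision header sent by the client
	unset req.http.x-example-decision;
	if (client.ip ~ example_trusted_ips) {
		set req.http.x-example-decision = "default";
	} elsif (req.http.User-Agent &&
	    example_ua_re.match(req.http.User-Agent)) {
		set req.http.x-example-decision = "garbage";
	} elsif (example_asns.match(
	    example_geo.lookup("autonomous_system_number", client.ip))) {
		set req.http.x-example-decision = "garbage";
	}
}
```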
What it is not
Iocaine, sadly, does not have a magic bullet for bot detection. It makes available, and allows combining, well-known methods for user agent classification, all of which have been available in Vinyl Cache for years.
This is not a bad thing, and there is nothing wrong with having freedom of choice, particularly if that freedom comes in the form of FOSS.
Also, Iocaine brings with it sensible defaults and examples for the particular purpose of AI bot detection, which certainly are a useful and helpful baseline and starting configuration.
The problem with “traditional” bot detection
The fundamental problem with this “traditional” bot detection is that it
ultimately relies on information being sent voluntarily by the user agent. In
particular, the User-Agent header can be set at will, and if we have to deal
with bots not playing by the rules (by ignoring robots.txt and disrespecting
common sense crawl frequency limits), why should these exploitative crawlers set
headers such that we can easily identify them?
Similarly, the era of successful classification and rate limiting by IP and ASN seems to be largely coming to an end, now that residential IP proxies are sold wholesale as SaaS.
New ideas are needed, and some already exist, but that is a topic for another article…
That said, we still want to explore how Iocaine can play with Vinyl Cache.
Integrating Iocaine into Vinyl Cache (the simple way)
Applying the proxy integration concept of Iocaine as intended would, in the case of Vinyl Cache, actually increase overhead significantly for cache hits:
We would need to turn the cache hit into a pass first, send a backend
request to Iocaine, and then, on the client side, restart if the
response is a 421, now allowing the cache hit to succeed.
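For illustration only, a sketch of this inefficient variant: the first pass of every request bypasses the cache to ask Iocaine, and only a 421 answer leads to a restart with a regular cache lookup. The backend names iocaine and my_normal_backend are placeholders assumed to be defined elsewhere, and requests with bodies would additionally need their body cached (std.cache_req_body()) to survive the restart.

```vcl
sub vcl_recv {
	if (req.restarts == 0) {
		# first pass: always ask Iocaine, even if the object is cached
		set req.backend_hint = iocaine;
		return (pass);
	}
	# after a restart: normal processing, cache hits are possible
	set req.backend_hint = my_normal_backend;
}

sub vcl_deliver {
	if (req.restarts == 0 && resp.status == 421) {
		# Iocaine wants the original content: start over
		return (restart);
	}
}
```

Every request, including those which would have been cache hits, costs a full backend roundtrip to Iocaine here, which is exactly the overhead the simple way avoids.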
If the goal is to reduce load from AI crawlers, this approach is wrong: with Vinyl Cache, serving an object from cache is extremely efficient. Putting aside the question of bandwidth used, it is thus much more efficient and simpler to serve cache hits no matter what, and to call out to Iocaine only for cache misses and passes.
This is what the following VCL snippet does:
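import std;

# Iocaine listening on its default port
backend iocaine {
	.host = "127.0.0.1";
	.port = "42069";
}

sub iocaine_backend_fetch {
	# only the first attempt goes to Iocaine, retries go to the
	# real backend set by the caller
	if (bereq.retries > 0) {
		return;
	}
	set bereq.backend = iocaine;
	return (fetch);
}

sub iocaine_backend_response {
	if (bereq.backend != iocaine) {
		return;
	}
	if (beresp.status == 421) {
		# Iocaine says "serve the original content": restore the
		# original backend request and try again
		std.rollback(bereq);
		return (retry);
	}
	# garbage from Iocaine: deliver it, but do not cache it
	set beresp.uncacheable = true;
}

sub vcl_backend_fetch {
	# my_normal_backend: your real backend, defined elsewhere
	set bereq.backend = my_normal_backend;
	call iocaine_backend_fetch;
}

sub vcl_backend_response {
	call iocaine_backend_response;
}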
Upon the first backend request (bereq.retries == 0), we send the request to
Iocaine. If it comes back with a 421, we roll back and retry with the real
backend; otherwise, we deliver the garbage from Iocaine, but do not cache it.
To use this snippet:
Download vcl/iocaine.inc.vcl and install it in a directory in vcl_path.
Install Iocaine and start it on the default port 127.0.0.1:42069 (or change the backend iocaine definition accordingly).
Include the code in your VCL:
sub vcl_backend_fetch {
	# Important
	set bereq.backend = your_real_backend;
}
include "iocaine.inc.vcl";
It is important that the real backend is set before the include, either as shown
in the example, or by calling iocaine_backend_fetch explicitly and setting
the real backend either before or after.
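A sketch of the explicit variant with the real backend set after the call, based on iocaine_backend_fetch as shown above: on the first attempt, the call ends vcl_backend_fetch with return (fetch), so the assignment below it only takes effect on the retry after a 421. The name your_real_backend is a placeholder, and the sketch assumes the included file hooks iocaine_backend_response into vcl_backend_response as in the snippet above.

```vcl
include "iocaine.inc.vcl";

sub vcl_backend_fetch {
	# first attempt: sends the request to Iocaine and returns
	call iocaine_backend_fetch;
	# only reached on retries, after Iocaine answered with 421
	set bereq.backend = your_real_backend;
}
```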
Tip
To ensure the VCL code works as expected, vtc files for use with
vinyltest are provided for all code in this article.
To run the test case for iocaine.inc.vcl, download
vtc/iocaine.vtc into a vtc subdirectory, put
iocaine.inc.vcl in a vcl subdirectory and run the test with
vinyltest vtc/iocaine.vtc.
Make sure your PATH contains the vinyld binary, that you have all
VMODs installed and that iocaine is running on the default port, else the
test will return errors or be skipped.
Again, with this method, we only call Iocaine for cache misses and passes, but otherwise use it as originally intended, with the Iocaine classifier being used.
If we also want to run a classifier for cache hits, we either need to do something inefficient (as laid out before), or we need to use Vinyl Cache for what it’s made for:
Vinylcaine: The Iocaine classifier in VCL
As laid out before, the classifier part of Iocaine can be implemented in VCL relatively easily; this is mostly busywork. The content generator could also be ported to a VMOD, but this part is less interesting, because we can just connect to Iocaine to get garbage when we need it.
All the information necessary for the implementation has already been given above, so you could just do it yourself, but actions speak louder than words, right?
So here’s a reimplementation of Iocaine’s default matcher in VCL:
To use it:
Download vcl/vinylcaine.inc.vcl and install it in a directory in vcl_path.
Download an mmdb database mapping IP addresses to Autonomous System Numbers (ASNs) and replace /tmp/GeoLite2-ASN.mmdb with the respective path. Such databases are available, for example, from Maxmind and IPinfo.
The code is written for Maxmind. For IPinfo, it needs minor adjustments: the ASN field is called asn instead of autonomous_system_number, and its value is prefixed with AS. Alternatively, all references to geoip2 and geo in the code can be removed to disable the ASN lookup functionality.
Install vmod_re2, vmod_selector and vmod_geoip2 (vmod_geoip2 needs a quick adjustment to Vinyl Cache, see issue 70 for context).
Install Iocaine and start it on the default port 127.0.0.1:42069.
Include the code in your VCL:
include "vinylcaine.inc.vcl";
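For the IPinfo adjustment described above, one option is to strip the AS prefix, so that the same list of ASNs matches with either database. A minimal standalone sketch: the object name example_geo and the header x-example-asn are hypothetical, and the path must point to your database.

```vcl
import geoip2;

sub vcl_init {
	# adjust the path to your database
	new example_geo = geoip2.geoip2("/tmp/GeoLite2-ASN.mmdb");
}

sub vcl_recv {
	# Maxmind: field "autonomous_system_number", e.g. "64496"
	set req.http.x-example-asn =
	    example_geo.lookup("autonomous_system_number", client.ip);

	# IPinfo: field "asn", e.g. "AS64496"; use this instead
	# set req.http.x-example-asn =
	#     regsub(example_geo.lookup("asn", client.ip), "^AS", "");
}
```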
That’s it. You now have the classifier functionality equivalent to Iocaine
implemented in VCL, plus the garbage generator from Iocaine itself, with one
deliberate difference: The trusted paths (vcai_trusted_paths) are applied
with a prefix match.
As always, changing VCL is easy: for example, the matchers can be
changed from selector to re2 in order to get regular expressions, and
the actual patterns can be improved or extended.
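To illustrate both the prefix match for trusted paths and the switch from selector to re2, here are two equivalent matchers for the same hypothetical list of trusted paths. The paths and the header x-example-decision are placeholders, not Iocaine’s defaults. re2.set(anchor=start) anchors every pattern at the start of the URL, which reproduces the prefix semantics of hasprefix() while allowing arbitrary regular expressions.

```vcl
import selector;
import re2;

sub vcl_init {
	# selector: fixed strings, matched as prefixes below
	new example_trusted = selector.set();
	example_trusted.add("/robots.txt");
	example_trusted.add("/.well-known/");
	example_trusted.compile();

	# re2: the same prefixes as regular expressions anchored at the
	# start of the subject
	new example_trusted_re = re2.set(anchor=start);
	example_trusted_re.add("/robots\.txt");
	example_trusted_re.add("/\.well-known/");
	example_trusted_re.compile();
}

sub vcl_recv {
	# use either condition; both match the same URLs
	if (example_trusted.hasprefix(req.url)) {
		set req.http.x-example-decision = "default";
	}
	if (example_trusted_re.match(req.url)) {
		set req.http.x-example-decision = "default";
	}
}
```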
The code assumes Iocaine running on 127.0.0.1:42069 with
unwanted-visitors configured to contain Perplexity (the default). We
need this configuration as a way to make Iocaine return garbage when we
need it.
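A minimal sketch of this technique, modelled on the vinylcaine_backend_fetch excerpt visible in the benchmark diff in the Details section: for requests classified as garbage, the backend request is rewritten so that Iocaine’s own classifier arrives at the same decision. The header x-example-decision is hypothetical, and the backend definition assumes Iocaine’s default port.

```vcl
backend iocaine {
	.host = "127.0.0.1";
	.port = "42069";
}

sub vcl_backend_fetch {
	if (bereq.http.x-example-decision == "garbage") {
		# force Iocaine to generate garbage (requires Perplexity in
		# unwanted-visitors, the default)
		set bereq.http.User-Agent = "Perplexity";
		# prevent hitting the X-Forwarded-For 127.0.0.1 trusted IP rule
		unset bereq.http.X-Forwarded-For;
		set bereq.backend = iocaine;
		return (fetch);
	}
}
```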
Tip
To run the test case for vinylcaine.inc.vcl, download
vtc/vinylcaine.vtc into a vtc subdirectory, put
vinylcaine.inc.vcl in a vcl subdirectory and run the test with
vinyltest vtc/vinylcaine.vtc.
Make sure your PATH contains the vinyld binary, that you have all
VMODs installed and that iocaine is running on the default port, else the
test will return errors or be skipped.
Compared to the simple way, this code looks quite elaborate, but it actually is not: this is almost 1:1 the default classifier from Iocaine, which would otherwise run as a Lua, Fennel or Roto script. So in terms of the actual code being executed, we have just moved it from Iocaine to Vinyl Cache.
Moving the code to VCL allows us to cheaply run the classifier at request time, so we can also sensibly apply it to cached content.
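One way to let classification and caching coexist is to keep the classifier’s decision in the cache key, so that garbage and original content for the same URL are stored as separate objects. This is a sketch under that assumption, using the hypothetical header x-example-decision set in vcl_recv; it is not necessarily how vinylcaine.inc.vcl handles it.

```vcl
sub vcl_hash {
	# separate cache objects per decision, e.g. "garbage" vs. original
	if (req.http.x-example-decision) {
		hash_data(req.http.x-example-decision);
	}
	# no return here: the built-in vcl_hash adds URL and Host and
	# performs the lookup
}
```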
And, from the perspective of a Vinyl Cache administrator, now all the configuration and classifier logic is in one place.
Vinylcaine performance compared with Iocaine
To make sure that we recommend an option which is at least on par with Iocaine, we ran a microbenchmark.
First and foremost, to jump ahead, the important result is: yes, Iocaine really is fast. From the perspective of a Vinyl Cache maintainer, other software usually performs at least some orders of magnitude slower. In this case, however, the Iocaine results do impress, in particular given all the things Iocaine and the Rust runtime need to do (generating content dynamically, generating and collecting a lot of garbage) and considering all the optimizations in the highly specialized Vinyl Cache server.
That said, when running Iocaine, you still need a reverse proxy anyway, so we are really comparing apples with pears: we compare running Vinyl Cache with the Vinylcaine VCL (and Iocaine, which does not do much in this case) against just Iocaine, which by itself does not provide reverse proxy functionality.
But enough jumping ahead, what are the tests and the results?
The microbenchmark was run on Debian 12.13 with Linux 6.12.85+deb12-amd64 on a
laptop (Lenovo T590) with an Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz (4 cores /
8 threads). wrk
with 100 connections was run for 20 seconds against Iocaine and Vinyl Cache with
the Vinylcaine VCL, in both cases sending the User-Agent: Perplexity header
to make the classifier evaluate all the steps until the last.
In the case of the test against Iocaine directly, it was generating the garbage content for all requests, while in the case of Vinyl Cache, that content was still fetched from Iocaine, but cached, so it was only generated once.
To avoid testing MMDB lookup performance, the ASN lookup was disabled both for Iocaine and Vinylcaine.
Because this is a microbenchmark, we should not compare the numbers in detail. For example, the amount of body data which Iocaine generates differs between runs; for the comparison, we picked runs where the Vinylcaine setup was at a disadvantage (because more body data was transferred).
By default, the Vinylcaine VCL as presented here compresses cached content, but wrk requests uncompressed content and Iocaine does not compress. So for the direct comparison with the VCL presented here, Vinyl Cache needs to decompress the cached content for each response. For completeness, we also ran the microbenchmark with compression disabled.
For all runs, all CPU cores were basically saturated. The getrusage numbers are comparable for all runs.
The Numbers
The rough numbers are:
Iocaine: 65kreq/s
Vinylcaine including gunzip: 88kreq/s
Vinylcaine without gunzip: 113kreq/s
Again, we do not want to make Iocaine look bad; their numbers are fine. The point is just that reimplementing the same classifier in Vinyl Cache does not make it slower.
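Relating the Requests/sec and Transfer/sec figures of the three runs shown in the Details section makes the comparison explicit. Since wrk reports all transfer rates in the same unit, the ratios do not depend on whether MB means 10^6 or 2^20 bytes.

```latex
% throughput of Vinylcaine relative to Iocaine (Requests/sec)
\frac{88564.46}{65590.39} \approx 1.35 \quad \text{(with gunzip)}, \qquad
\frac{113263.08}{65590.39} \approx 1.73 \quad \text{(without gunzip)}

% transferred data per 1000 requests (Transfer/sec divided by kreq/s)
\text{Iocaine: } \frac{65.49}{65.59} \approx 1.00\,\tfrac{\text{MB}}{\text{kreq}}, \quad
\text{with gunzip: } \frac{100.27}{88.56} \approx 1.13\,\tfrac{\text{MB}}{\text{kreq}}, \quad
\text{without gunzip: } \frac{125.86}{113.26} \approx 1.11\,\tfrac{\text{MB}}{\text{kreq}}
```

So the Vinylcaine runs served roughly 35% and 73% more requests per second while transferring about 11 to 13% more data per request, which is the disadvantage mentioned above.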
Details
This section contains all the details about software versions and how they were compiled and run.
Iocaine build:
slink@haggis21:~/Devel/madhouse/iocaine (main)$ git rev-parse HEAD
acb32d3551a2ec221a3edde5b68717268f47e2d0
slink@haggis21:~/Devel/madhouse/iocaine (main)$ rustup show
Default host: x86_64-unknown-linux-gnu
rustup home: /home/slink/.rustup
installed toolchains
--------------------
nightly-2025-06-12-x86_64-unknown-linux-gnu
1.89-x86_64-unknown-linux-gnu (active, default)
active toolchain
----------------
name: 1.89-x86_64-unknown-linux-gnu
active because: it's the default toolchain
installed targets:
x86_64-unknown-linux-gnu
slink@haggis21:~/Devel/madhouse/iocaine (main)$ RUST_MIN_STACK=16777216 cargo build --profile release
Compiling libnftables1-sys v0.1.2
Compiling iocaine-table v4.0.0-snapshot (/home/slink/Devel/madhouse/iocaine/iocaine-table)
Compiling iocaine-powder v4.0.0-snapshot (/home/slink/Devel/madhouse/iocaine/iocaine-powder)
Compiling iocaine v4.0.0-snapshot (/home/slink/Devel/madhouse/iocaine/iocaine)
Finished `release` profile [optimized] target(s) in 3m 14s
Adjustments to vinylcaine.inc.vcl
Make trusted IPs config equal to Iocaine default
Remove ASN check
diff --git a/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl b/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl
index 263ab8f..edcff6b 100644
--- a/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl
+++ b/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl
@@ -86,7 +86,7 @@ sub vcl_init {
}
acl vcai_trusted_ips {
- "127.0.0.1"/32;
+# "127.0.0.1"/32;
}
sub vcl_init {
@@ -307,11 +307,11 @@ sub vinylcaine_recv_decide_default {
## if ASN.matches(request.header("x-forwarded-for")) {
## return augment_decision(request, "garbage", "asn");
## }
- if (vcai_asn.match(geo.lookup("autonomous_system_number", client.ip))) {
- set req.http.vcai-decision = "garbage";
- set req.http.vcai-reason = "asn";
- return;
- }
+# if (vcai_asn.match(geo.lookup("autonomous_system_number", client.ip))) {
+# set req.http.vcai-decision = "garbage";
+# set req.http.vcai-reason = "asn";
+# return;
+# }
## if AI_ROBOTS_TXT.matches(user_agent) {
## return augment_decision(request, "garbage", "ai.robots.txt");
@@ -352,6 +352,8 @@ sub vinylcaine_backend_fetch {
# force iocaine to generate garbage (requires Perplexity
# in UNWANTED_VISITORS (default))
set bereq.http.User-Agent = "Perplexity";
+ set bereq.http.Host = "localhost:42069";
+ unset bereq.http.Accept;
# prevent hitting X-F-F 127.0.0.1 rule (trusted ips)
unset bereq.http.X-Forwarded-For;
set bereq.backend = iocaine;
Vinyl Cache and VMODs build:
slink@haggis21:~/Devel/varnish-git/vinyl-cache (main)$ clang-19 -v
Debian clang version 19.1.7 (3~deb12u1)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-19/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/10
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Candidate multilib: .;@m64
Selected multilib: .;@m64
slink@haggis21:~/Devel/varnish-git/vinyl-cache (main)$ git rev-parse HEAD
00a97d073d499ee5af52d39ecc313e273123dfc7
slink@haggis21:~/Devel/varnish-git/vinyl-cache (main)$ CC=clang-19 ./configure --prefix=/tmp && make -j20 install
...
slink@haggis21:~/Devel/varnish-git/libvmod-selector (master)$ git rev-parse HEAD && CC=clang-19 ./configure --prefix=/tmp && make -j20 install
2b29a0836103649bca6e7cdbdb855d0b525588b9
...
slink@haggis21:~/Devel/varnish-git/libvmod-re2 (master)$ git rev-parse HEAD && CC=clang-19 ./configure --prefix=/tmp && make -j20 install
c009a80895cc4a5fa88bd4c1433d8e6955521be7
...
slink@haggis21:~/Devel/varnish-git/libvmod-geoip2 (dumb_viylize)$ git rev-parse HEAD && CC=clang-19 ./configure --prefix=/tmp && make -j20 install
48012d78719056dfda32f818c46d3c07be463416
...
Iocaine run:
slink@haggis21:~/Devel/madhouse/iocaine (main)$ time ./target/release/iocaine start & sleep 2 ; wrk -c 100 -d 20 -H 'User-Agent: Perplexity' http://localhost:42069 ; pkill iocaine
[1] 196864
2026-06-12T17:28:29.642161Z WARN iocaine::user: No ai-robots-txt-path configured, using default
2026-06-12T17:28:29.643754Z WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
Running 20s test @ http://localhost:42069
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.50ms 0.86ms 17.73ms 75.38%
Req/Sec 32.96k 2.04k 44.15k 73.75%
1314021 requests in 20.03s, 1.28GB read
Requests/sec: 65590.39
Transfer/sec: 65.49MB
slink@haggis21:~/Devel/madhouse/iocaine (main)$
real 0m22,067s
user 1m50,493s
sys 0m22,652s
Vinyl Cache run with gunzip:
slink@haggis21:~/Devel/madhouse/iocaine (main)$ export VINYL_DEFAULT_N=/tmp/t
slink@haggis21:~/Devel/madhouse/iocaine (main)$ time /tmp/sbin/vinyld -f /tmp/t.vcl -a 127.0.0.1:8080 -F & time ./target/release/iocaine start & sleep 2 ; wrk -c 100 -d 20 -H 'User-Agent: Perplexity' http://localhost:8080 ; pkill vinyld ; pkill iocaine
[1] 196568
[2] 196569
2026-06-12T17:27:28.891373Z WARN iocaine::user: No ai-robots-txt-path configured, using default
2026-06-12T17:27:28.893156Z WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
Debug: Version: vinyl-cache-trunk revision 00a97d073d499ee5af52d39ecc313e273123dfc7
Debug: Platform: Linux,6.12.85+deb12-amd64,x86_64,-jnone,-sdefault,-sdefault,-ssynth,-hcritbit
Debug: Child (196594) Started
Child launched OK
Info: Child (196594) said Child starts
Running 20s test @ http://localhost:8080
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.95ms 1.23ms 51.56ms 94.55%
Req/Sec 44.54k 11.51k 74.69k 67.00%
1773670 requests in 20.03s, 1.96GB read
Requests/sec: 88564.46
Transfer/sec: 100.27MB
Info: Manager got SIGTERM from PID 196828
Debug: Stopping Child
real 0m22,079s
user 0m0,090s
sys 0m0,024s
[2]+ Done time ./target/release/iocaine start
slink@haggis21:~/Devel/madhouse/iocaine (main)$ Info: Child (196594) said shutdown waiting for 96 references on boot
Info: Child (196594) said shutdown waiting for 41 references on boot
Info: Child (196594) said Child dies usr=90.588737 sys=39.215018
Info: Child (196594) ended
Debug: Child cleanup complete
Info: manager stopping child
Info: manager dies
real 0m24,664s
user 1m30,960s
sys 0m39,319s
Adjust VCL for no gunzip:
diff --git a/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl b/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl
index edcff6b..1954f41 100644
--- a/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl
+++ b/R1/source/tips/iocaine/vcl/vinylcaine.inc.vcl
@@ -370,7 +370,7 @@ sub vinylcaine_backend_response {
if (! beresp.uncacheable) {
set beresp.ttl = 1h;
# to save cache space
- set beresp.do_gzip = true;
+ #set beresp.do_gzip = true;
}
return (deliver);
}
Vinyl Cache run without gunzip:
slink@haggis21:~/Devel/madhouse/iocaine (main)$ time /tmp/sbin/vinyld -f /tmp/t.vcl -a 127.0.0.1:8080 -F & time ./target/release/iocaine start & sleep 2 ; wrk -c 100 -d 20 -H 'User-Agent: Perplexity' http://localhost:8080 ; pkill vinyld ; pkill iocaine
[1] 197259
[2] 197260
2026-06-12T17:33:23.639303Z WARN iocaine::user: No ai-robots-txt-path configured, using default
2026-06-12T17:33:23.640940Z WARN iocaine::user: No unwanted-asns.db-path configured, check disabled
Debug: Version: vinyl-cache-trunk revision 00a97d073d499ee5af52d39ecc313e273123dfc7
Debug: Platform: Linux,6.12.85+deb12-amd64,x86_64,-jnone,-sdefault,-sdefault,-ssynth,-hcritbit
Debug: Child (197283) Started
Child launched OK
Info: Child (197283) said Child starts
Running 20s test @ http://localhost:8080
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 689.50us 0.90ms 41.74ms 96.03%
Req/Sec 56.93k 10.19k 76.01k 63.75%
2265658 requests in 20.00s, 2.46GB read
Requests/sec: 113263.08
Transfer/sec: 125.86MB
Info: Manager got SIGTERM from PID 197516
Debug: Stopping Child
real 0m22,059s
user 0m0,086s
sys 0m0,016s
[2]+ Done time ./target/release/iocaine start
slink@haggis21:~/Devel/madhouse/iocaine (main)$ Info: Child (197283) said shutdown waiting for 83 references on boot
Info: Child (197283) said shutdown waiting for 28 references on boot
Info: Child (197283) said Child dies usr=77.671676 sys=46.360983
Info: Child (197283) ended
Debug: Child cleanup complete
Info: manager stopping child
Info: manager dies
real 0m24,646s
user 1m18,037s
sys 0m46,461s
Conclusion
We have explained what Iocaine is from the perspective of a Vinyl Cache maintainer, how it works, how it can be integrated with Vinyl Cache, and how its classifier can be ported to VCL without loss of efficiency, as our microbenchmark suggests.
References
This work was motivated by Mastodon feedback to the presentation WAF: Wrong Approach Firewall at GPN24.
Acknowledgements
Development of the VCL code and this documentation has been funded through an investment of the Sovereign Tech Agency.