Filter request- or response-headers with VMOD re2 sets

Why filter headers at all?

When we care about security, less is often more. If we prevent malicious headers from reaching the backends, they cannot be used to exploit security issues there.

In general, there are two approaches: a denylist and an allowlist. Both can be implemented efficiently using vmod_re2.

The denylist approach is (way) less secure, but it is used by most commercial WAFs and CDNs with WAF features because it needs less customization. The allowlist approach is much more restrictive and provides the best security.

Denylist with plain vcl

Before we dive into vmod_re2, let’s warm up with a very simple denylist example using plain vcl:

sub vcl_recv {
        unset req.http.Chaotic;
        unset req.http.Evil;
        # remove Wicked only if its value is "wicked" (case-insensitive)
        if (req.http.Wicked ~ "(?i)^wicked$") {
                unset req.http.Wicked;
        }
        # ... the rest of your vcl_recv
}

All unwanted headers are removed. So far, so good. But this doesn’t scale: with thousands of patterns it gets really slow, because the patterns are checked one after another. Also, you can’t implement an allowlist with plain vcl, because vcl cannot iterate over all headers of a request. Using hdr_filter from vmod_re2 solves both problems.

Tip

To ensure they are working as expected, vtc files for use with vinyltest are provided for all examples in this tutorial.

Download the vtc file for this first example, hdr_filter-simple-denylist-plain-vcl.vtc, and run it with: vinyltest <testfile.vtc>.

Make sure your PATH contains the vinyld binary. You can easily modify the files to try things out. The reference manual for vtc is available here.

Denylist with vmod_re2 and hdr_filter

So, to get started with vmod_re2 and hdr_filter, let’s do exactly the same as before:

import re2;

sub vcl_init {
        new deny = re2.set(anchor=start, case_sensitive=false);
        deny.add("chaotic:");
        deny.add("Evil:");
        deny.add("Wicked: wicked$");
}

sub vcl_recv {
        deny.hdr_filter(req, false);   # "false" makes this filter a denylist
        # ... the rest of your vcl_recv
}

Tip

Download full example as vtc: hdr_filter-simple-denylist.vtc

Two parameters are passed to the set constructor:

  1. anchor=start puts an implicit anchor ^ at the beginning of each regex. It is equivalent to:

    new deny = re2.set(case_sensitive=false);
    deny.add("^chaotic:");
    deny.add("^Evil:");
    deny.add("^Wicked: wicked$");
    
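  2. case_sensitive=false is very useful for our use case: HTTP header names are case-insensitive, so without it an attacker could bypass our filters simply by sending evil: instead of Evil:. Note that the flag applies to the whole pattern, so header values are matched case-insensitively as well.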
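If you need the header name to be matched case-insensitively but the value exactly, a sketch like the following should work, assuming RE2’s inline flag group syntax (?i:...). The set name deny_exact is made up for this illustration only.

```vcl
import re2;

sub vcl_init {
        # hypothetical set; case_sensitive is left at its default
        new deny_exact = re2.set(anchor=start);
        # (?i:...) makes only the header name case-insensitive,
        # the value must be exactly "wicked" in lower case
        deny_exact.add("(?i:Wicked): wicked$");
}

sub vcl_recv {
        deny_exact.hdr_filter(req, false);   # "false" makes this filter a denylist
}
```

Keep in mind that with a denylist this lets case variants of the value (for example Wicked: WICKED) pass, so use it deliberately.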

Important

Notice that the request is not rejected when it contains headers matching the denylist; those headers are simply removed, as if unset had been called.

Allowlists with vmod_re2 and hdr_filter

Now let’s proceed to what we really want to do: filter out all headers except for a list of explicitly allowed headers:

import re2;

sub vcl_init {
        new allow = re2.set(anchor=start, case_sensitive=false);
        allow.add("Host:");
        allow.add("If-(Modified-Since|None-Match):");
        allow.add("Non-Standard-Header:");
}

sub vcl_recv {
        allow.hdr_filter(req, true);   # "true" makes this filter an allowlist
        # ... the rest of your vcl_recv
}

Tip

Download full example as vtc: hdr_filter-simple-allowlist.vtc

The second parameter of hdr_filter may also be omitted, because true (allowlist) is the default. The following allowlist examples omit it.

Ok, now we filter out everything except headers matching the added patterns, which makes for a really restrictive and secure setup. So far, we do this in vcl_recv, which is pretty early in request processing.

Filter in backend_fetch

There are reasons to delay the filtering until vcl_backend_fetch, just before the request is sent to the backends, or to filter a second time at this point. For example:

  • the client sends some headers we want to honour within vcl, but there is no need to send them to our backends

  • we might set some artificial request headers in vcl for internal purposes, which we also don’t want to send to our backends

  • we want to modify the cache-key in vcl_hash based on headers the backends don’t need

Fortunately, hdr_filter can also be used on the backend side:

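import re2;

sub vcl_init {
        # allowlist for incoming client requests
        new allow_recv = re2.set(anchor=start, case_sensitive=false);
        allow_recv.add("Host:");
        allow_recv.add("If-(Modified-Since|None-Match):");
        allow_recv.add("Non-Standard-Header:");

        # stricter allowlist for requests sent to the backends:
        # Non-Standard-Header may still be used in vcl, but is not forwarded
        new allow_backend_fetch = re2.set(anchor=start, case_sensitive=false);
        allow_backend_fetch.add("Host:");
        allow_backend_fetch.add("If-(Modified-Since|None-Match):");
}

sub vcl_recv {
        allow_recv.hdr_filter(req);
        # ... the rest of your vcl_recv
}

sub vcl_backend_fetch {
        allow_backend_fetch.hdr_filter(bereq);
        # ... the rest of your vcl_backend_fetch
}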

Tip

Download full example as vtc: hdr_filter-simple-allowlist-backend_fetch.vtc
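To illustrate the second and third bullet points above, here is a sketch of an internal request header that is set in vcl_recv, added to the cache key in vcl_hash and then dropped by the backend allowlist. The header name X-Device and the simple mobile detection are hypothetical, chosen for this illustration only.

```vcl
import re2;

sub vcl_init {
        new allow_backend_fetch = re2.set(anchor=start, case_sensitive=false);
        allow_backend_fetch.add("Host:");
        allow_backend_fetch.add("If-(Modified-Since|None-Match):");
}

sub vcl_recv {
        # hypothetical internal header: a coarse device class
        if (req.http.User-Agent ~ "(?i)mobile") {
                set req.http.X-Device = "mobile";
        } else {
                set req.http.X-Device = "desktop";
        }
}

sub vcl_hash {
        # add the device class to the cache key; the built-in vcl_hash
        # still runs afterwards and adds URL and Host
        hash_data(req.http.X-Device);
}

sub vcl_backend_fetch {
        # X-Device is not on the allowlist, so it is removed from bereq
        allow_backend_fetch.hdr_filter(bereq);
}
```

Because X-Device is derived from User-Agent, the backend never needs to see it: the cache varies on it, the backend gets only the allowed headers.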

Patterns in detail

Now let’s look in more detail at some examples of patterns:

  • fixed header name:

    allow.add("Host:");
    
  • regular expressions for header names:

    allow.add("X-Forwarded-(Host|Proto):");
    
  • prefix matches on header names:

    allow.add("Accept");
    

    Notice the missing colon! This matches every header whose name starts with Accept, with any value, for example Accept: xyz, Accept-Encoding: gzip and Accept-Language: en-US.

  • regular expression for header name and value:

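    allow.add("Content-Type: \w+/\w+(; charset=\w+)?$");

    Note that \w does not match a hyphen, so common values such as text/html; charset=utf-8 or multipart/form-data are rejected by this pattern. Widen the character classes, for example to [\w.+-], if you need to allow them.
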
Tip

Download examples as vtc: hdr_filter-simple-allowlist-pattern.vtc
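All of these pattern types can be mixed in a single set. A sketch of one allowlist combining the examples above might look like this; the only deviation is the widened character classes in the Content-Type pattern, an assumption made so that values with hyphens or dots also pass.

```vcl
import re2;

sub vcl_init {
        new allow = re2.set(anchor=start, case_sensitive=false);
        # fixed header name, any value
        allow.add("Host:");
        # regular expression for the header name
        allow.add("X-Forwarded-(Host|Proto):");
        # prefix match: no colon, so Accept, Accept-Encoding, ... all pass
        allow.add("Accept");
        # header name and value must both match; widened classes so that
        # e.g. multipart/form-data and charset=utf-8 are allowed too
        allow.add("Content-Type: [\w.+-]+/[\w.+-]+(; charset=[\w-]+)?$");
}

sub vcl_recv {
        # allowlist is the default, so the second parameter is omitted
        allow.hdr_filter(req);
}
```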

Response header filtering

So far we’ve been looking at filtering request headers of incoming requests, which is certainly the most important use case from a security perspective.

But vmod_re2 is also capable of response header filtering. You might need this in some cases, for example:

  • If misbehaving backends outside your control send response headers you don’t want, filter them in vcl_backend_response using .hdr_filter(beresp).

  • Response headers from backends that steer Vinyl Cache behaviour and should not leak to clients can be filtered either in vcl_backend_response, after they have been evaluated, using .hdr_filter(beresp), or in vcl_deliver using .hdr_filter(resp).

  • To prevent information disclosure through debug headers or other internal headers, filter in vcl_deliver using .hdr_filter(resp).
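For the second case the order matters: a steering header has to be evaluated before it is filtered. A sketch, assuming a hypothetical backend header X-Cache-TTL that carries a TTL in seconds:

```vcl
import re2;
import std;

sub vcl_init {
        new deny_beresp = re2.set(anchor=start, case_sensitive=false);
        # hypothetical steering header set by the backends
        deny_beresp.add("X-Cache-TTL:");
}

sub vcl_backend_response {
        # evaluate the steering header first ...
        if (beresp.http.X-Cache-TTL) {
                # e.g. "120" becomes 120s; unparsable values keep the current TTL
                set beresp.ttl = std.duration(beresp.http.X-Cache-TTL + "s", beresp.ttl);
        }
        # ... then remove it, so it is neither cached nor sent to clients
        deny_beresp.hdr_filter(beresp, false);   # false makes this filter a denylist
}
```

Filtering in vcl_backend_response rather than vcl_deliver also keeps the header out of the cached object.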

To summarize all options:

             client side               backend side
request      req (e.g. vcl_recv)       bereq (e.g. vcl_backend_fetch)
response     resp (e.g. vcl_deliver)   beresp (e.g. vcl_backend_response)

Response header filtering example

Let’s take a look at a simple denylist example for response headers. Of course you could use allowlists here as well, but unlike for request headers, denylists can be more practical for response headers if the backends can be trusted:

sub vcl_init {
        new deny_beresp = re2.set(anchor=start, case_sensitive=false);
        deny_beresp.add("Some-debug-header-send-by-backends: ");
}

sub vcl_backend_response {
        deny_beresp.hdr_filter(beresp, false);   # false makes this filter a denylist
}

Tip

Download example as vtc: hdr_filter-simple-response-denylist.vtc

Depending on your scenario, you could do the same with .hdr_filter(resp, false) in vcl_deliver; the vtc file includes an example.
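A vcl_deliver variant might look like the following sketch. The patterns X-Debug- (a prefix match, note the missing colon) and X-Backend-Server: are hypothetical stand-ins for your own debug or internal headers.

```vcl
import re2;

sub vcl_init {
        new deny_resp = re2.set(anchor=start, case_sensitive=false);
        # hypothetical header names, for illustration only
        deny_resp.add("X-Debug-");
        deny_resp.add("X-Backend-Server:");
}

sub vcl_deliver {
        # false makes this filter a denylist
        deny_resp.hdr_filter(resp, false);
}
```

Unlike filtering beresp, this leaves the headers in the cached object and only strips them from what is delivered to clients.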

And yes, if you only use denylists with just a handful of headers, as in this example, plain vcl unset works as well. But the more headers you want to filter, the more useful vmod_re2 gets, in addition to being more efficient. And if you want to use allowlists, there is no alternative anyway.

After this excursion to response header filtering let’s get back to request header filtering.

Which headers should I put on my allowlist without breaking stuff?

Let’s say we want to build an allowlist for request headers. Here’s some practical help on how to analyse your live traffic:

# for a quick overview, just dump the content of the VSL buffer:
vinyllog -d -g request -c -i ReqHeader \
  -q 'RespStatus >= 200 and RespStatus <= 399' \
  -w /tmp/reqheader

# for more data, let vinyllog run longer, for example for one day:
timeout 1d vinyllog -g request -c -i ReqHeader \
  -q 'RespStatus >= 200 and RespStatus <= 399' \
  -w /tmp/reqheader

# you may also restrict the capture to trusted IP addresses that don't send malicious headers:
timeout 1d vinyllog -g request -c -i ReqHeader \
  -q 'RespStatus >= 200 and RespStatus <= 399 and ReqStart ~ "^10\.72\."' \
  -w /tmp/reqheader

We group by request so that we can grab the ReqHeader records while still restricting the output to requests whose responses have a status between 200 and 399. This helps to keep illegitimate headers out of the sample. For example, if you have Apache with mod_security running, it will respond to requests classified as malicious with a 403 status.

If you let vinyllog run for a longer time, please make sure you have enough space in /tmp/ or choose another location.

When it’s time for the evaluation, you might want to run:

vinyllog -r /tmp/reqheader | awk '$2 == "ReqHeader" {print tolower($3)}' |
  head -10000 | sort | uniq -c | sort -n

You’ll get a histogram like the following, where the first column is the number of occurrences of each header name in your log file:

...
    109 authorization:
    118 priority:
    146 content-length:
    168 sec-fetch-dest:
    168 sec-fetch-site:
    195 content-type:
    219 sec-fetch-mode:
    224 cookie:
    257 referer:
    312 accept-language:
    394 connection:
    423 accept:
    490 user-agent:
    555 host:
    555 via:
    600 accept-encoding:

This might serve as a good starting point to build your own allowlist.
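As a sketch, an allowlist derived from the sample histogram above could look like this. It simply lists every header name that occurred; which ones you actually need (Via or Connection, for example) depends on your setup, so treat it as a starting point to test against your own traffic, not a recommendation.

```vcl
import re2;

sub vcl_init {
        # derived from the sample histogram; adjust to your own traffic
        new allow = re2.set(anchor=start, case_sensitive=false);
        allow.add("Host:");
        allow.add("Accept");            # Accept, Accept-Encoding, Accept-Language
        allow.add("User-Agent:");
        allow.add("Via:");
        allow.add("Connection:");
        allow.add("Referer:");
        allow.add("Cookie:");
        allow.add("Sec-Fetch-(Dest|Mode|Site):");
        allow.add("Content-(Length|Type):");
        allow.add("Priority:");
        allow.add("Authorization:");
}

sub vcl_recv {
        # allowlist is the default for hdr_filter
        allow.hdr_filter(req);
}
```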

Documentation / Further Reading

The current documentation for vmod_re2 is available via man vmod_re2 or online here.

Contributing

If you found any mistake in this tutorial, I’d like to quote Poul-Henning: “We’d absolutely love to have you help improve the project homepage, send us pull requests!” https://code.vinyl-cache.org/vinyl-cache/homepage