Squid Web Cache wiki


🔗 Feature: Store URL Rewriting

🔗 Details

My main focus with this feature is to support caching content from CDNs which map the same resource to multiple URLs. Initially I’m targeting Google content - Google Earth, Google Maps, Google Video, YouTube - but the same technique can be used to cache similar content from CDNs such as Akamai (think “Microsoft Updates”).

The current changes to Squid-2.HEAD implement the functionality through a number of structural changes:

🔗 Squid Configuration

First, you need to determine which URLs to send to the store URL rewriter.

acl store_rewrite_list dstdomain mt.google.com mt0.google.com mt1.google.com mt2.google.com
acl store_rewrite_list dstdomain mt3.google.com
acl store_rewrite_list dstdomain kh.google.com kh0.google.com kh1.google.com kh2.google.com
acl store_rewrite_list dstdomain kh3.google.com
acl store_rewrite_list dstdomain kh.google.com.au kh0.google.com.au kh1.google.com.au
acl store_rewrite_list dstdomain kh2.google.com.au kh3.google.com.au

# This needs to be narrowed down quite a bit!
acl store_rewrite_list dstdomain .youtube.com

storeurl_access allow store_rewrite_list
storeurl_access deny all
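To get a feel for which requests the ACL above selects, here is a rough model of dstdomain matching in Python (an illustrative helper function, not Squid code): a bare hostname matches exactly, while a value with a leading dot matches that domain and all of its subdomains.

```python
def dstdomain_match(host: str, acl_values: list[str]) -> bool:
    """Approximate Squid dstdomain semantics (illustrative only).

    A bare hostname matches exactly; a value with a leading dot
    matches that domain and any of its subdomains.
    """
    host = host.lower().rstrip(".")
    for value in acl_values:
        value = value.lower()
        if value.startswith("."):
            # ".youtube.com" matches youtube.com and *.youtube.com
            if host == value[1:] or host.endswith(value):
                return True
        elif host == value:
            return True
    return False

store_rewrite_list = ["mt.google.com", "mt0.google.com", ".youtube.com"]
print(dstdomain_match("mt0.google.com", store_rewrite_list))   # exact match
print(dstdomain_match("www.youtube.com", store_rewrite_list))  # subdomain match
print(dstdomain_match("maps.google.com", store_rewrite_list))  # not listed
```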

Then you need to configure a rewriter helper.

storeurl_rewrite_program /Users/adrian/work/squid/run/local/store_url_rewrite

Then, to cache the content from Google Maps and friends, you need to change the defaults so that URLs containing “?” aren’t automatically made uncacheable. Search your configuration and remove these two lines:

#We recommend you to use the following two lines.
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY

Make sure you check your configuration file for cache and no_cache directives; disable them and use refresh_pattern rules where applicable to tell Squid what not to cache!

Then, add these refresh patterns at the bottom of your refresh_pattern section.

refresh_pattern -i (/cgi-bin/|\?)   0       0%      0
refresh_pattern .                   0       20%     4320

These rules make sure that cgi-bin and “?” URLs aren’t cached unless expiry information is explicitly given. Don’t add the rules after an existing “refresh_pattern .” line; refresh_pattern entries are evaluated in order and the first match wins, so the “.” entry must always come last!
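The first-match behaviour described above can be sketched as follows (a simplified model for illustration, not Squid’s actual implementation):

```python
import re

# (compiled regex, min, percent, max) in squid.conf order.
refresh_patterns = [
    (re.compile(r"(/cgi-bin/|\?)", re.IGNORECASE), 0, 0, 0),
    (re.compile(r"."), 0, 20, 4320),  # catch-all; must come last
]

def matching_rule(url_path):
    """Return the first refresh_pattern whose regex matches, as Squid does."""
    for pattern, min_age, percent, max_age in refresh_patterns:
        if pattern.search(url_path):
            return (min_age, percent, max_age)
    return None

print(matching_rule("/get_video?video_id=abc"))  # "?" hits the first rule
print(matching_rule("/images/logo.png"))         # falls through to "."
```

If the “.” entry were listed first, it would match every URL and the cgi-bin/“?” rule would never be consulted.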

🔗 Storage URL re-writing Helper

Here’s what I’ve been using:

#!/usr/bin/perl
$| = 1;     # unbuffered output - required for Squid helpers
while (<>) {
        chomp;  # strip the trailing newline so we don't emit blank lines
        # print STDERR $_ . "\n";
        if (m/kh(.*?)\.google\.com(.*?)\/(.*?) /) {
                print "http://keyhole-srv.google.com" . $2 . ".SQUIDINTERNAL/" . $3 . "\n";
                # print STDERR "KEYHOLE\n";
        } elsif (m/mt(.*?)\.google\.com(.*?)\/(.*?) /) {
                print "http://map-srv.google.com" . $2 . ".SQUIDINTERNAL/" . $3 . "\n";
                # print STDERR "MAPSRV\n";
        } elsif (m/^http:\/\/([A-Za-z]*?)-(.*?)\.(.*)\.youtube\.com\/get_video\?video_id=(.*) /) {
                # e.g. http://lax-v290.lax.youtube.com/get_video?video_id=jqx1ZmzX0k0
                print "http://video-srv.youtube.com.SQUIDINTERNAL/get_video?video_id=" . $4 . "\n";
        } else {
                print $_ . "\n";
        }
}
A simple, very fast rewriter called Squirm is also worth checking out; it uses the regex library for pattern matching.

An even faster and more fully featured rewriter is jesred.

🔗 How do I make my own?

The helper program must read URLs (one per line) on standard input and write rewritten URLs or blank lines on standard output. Squid writes additional information after the URL which the rewriter can use to make a decision.

Input line received from Squid:

[channel-ID] URL [key-extras]

Result line sent back to Squid:

[channel-ID] [result] [kv-pair] [URL]

:information_source: the result field is only accepted by Squid-3.4 and newer.

:information_source: the kv-pair returned by this helper can be logged by the %note logformat code.
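Putting the protocol lines above together, a minimal helper could also be written in Python. This is a sketch assuming the Squid-3.4+ reply format (leading channel-ID, “OK store-id=…” / “ERR” results); the mtN.google.com rewrite rule itself is illustrative, not a recommended rule set.

```python
#!/usr/bin/env python3
import re
import sys

# Illustrative rule: collapse the mtN.google.com mirrors onto one store URL.
MT_RE = re.compile(r"^http://mt\d*\.google\.com(/.*)$")

def rewrite(url):
    """Return the canonical store URL, or None if no rewrite applies."""
    m = MT_RE.match(url)
    if m:
        return "http://map-srv.google.com.SQUIDINTERNAL" + m.group(1)
    return None

def handle_line(line):
    # Input is "[channel-ID] URL [key-extras]"; the channel-ID is present
    # when the helper is started with concurrency enabled.
    parts = line.strip().split(" ")
    channel, url = parts[0], parts[1]
    new_url = rewrite(url)
    if new_url:
        # Squid-3.4+ reply; older Squid expects just the rewritten URL
        # (or a blank line for "no change") instead.
        return f"{channel} OK store-id={new_url}"
    return f"{channel} ERR"

if __name__ == "__main__":
    for line in sys.stdin:
        print(handle_line(line), flush=True)  # flush: Squid needs unbuffered replies
```

As with the Perl helper, unbuffered output is essential; a buffered helper will appear to hang Squid.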

Categories: Feature
