Squid Web Cache wiki

Squid Web Cache documentation

🔗 Configuring Squid

🔗 Before you start configuring

by Gregori Parker

The best all around advice I can give on Squid is to start simple! Once everything works the way you expect, then start tweaking your way into complexity with a means to track the (in)effectiveness of each change you make (and a known good configuration that you can always go back to when you inevitably fubar the thing!).

🔗 How do I configure Squid without re-compiling it?

The squid.conf file. By default, this file is located at /etc/squid/squid.conf or maybe /usr/local/squid/etc/squid.conf.

Also, a QUICKSTART guide has been included with the source distribution. Please see the directory where you unpacked the source archive.

🔗 What does the squid.conf file do?

The squid.conf file defines the configuration for squid. The configuration includes (but not limited to) HTTP port number, the ICP request port number, incoming and outgoing requests, information about firewall access, and various timeout information.

🔗 Where can I find examples and configuration for a Feature?

There is still a fair bit of config knowledge buried in the old SquidFaq and Guide pages of this wiki. We are endeavoring to pull them into a layout easier to use.

What we have so far is:

🔗 Do you have a squid.conf example?

Yes.

After you make install, a sample squid.conf.default* file will exist in the etc directory under the Squid installation directory

🔗 How do I configure Squid to work behind a firewall?

If you are behind a firewall which can’t make direct connections to the outside world, you must use a parent cache. Normally Squid tries to be smart and only uses cache peers when it makes sense from a perspective of global hit ratio, and thus you need to tell Squid when it can not go direct and must use a parent proxy even if it knows the request will be a cache miss.

You can use the never_direct access list in squid.conf to specify which requests must be forwarded to your parent cache outside the firewall, and the always_direct access list to specify which requests must not be forwarded. For example, if Squid must connect directly to all servers that end with mydomain.com, but must use the parent for all others, you would write:

acl INSIDE dstdomain .mydomain.com
always_direct allow INSIDE
never_direct allow all

You could also specify internal servers by IP address

acl INSIDE_IP dst 1.2.3.0/24
always_direct allow INSIDE_IP
never_direct allow all

Note, however that when you use IP addresses, Squid must perform a DNS lookup to convert URL hostnames to an address. Your internal DNS servers may not be able to lookup external domains.

If you use never_direct and you have multiple parent caches, then you probably will want to mark one of them as a default choice in case Squid can’t decide which one to use. That is done with the default keyword on a cache_peer line. For example:

cache_peer xyz.mydomain.com parent 3128 0 no-query default

🔗 How do I configure Squid forward all requests to another proxy?

see Features/CacheHierarchy

🔗 What ‘‘cache_dir’’ size should I use?

This chapter assumes that you are dedicating an entire disk partition to a squid cache_dir, as is often the case.

Generally speaking, setting the cache_dir to be the same size as the disk partition is not a wise choice, for two reasons. The first is that squid is not very tolerant to running out of disk space. On top of the cache_dir size, squid will use some extra space for swap.state and then some more temporary storage as work-areas, for instance when rebuilding swap.state. So in any case make sure to leave some extra room for this, or your cache will enter an endless crash-restart cycle.

The second reason is fragmentation (note, this won’t apply to the COSS object storage engine - when it will be ready): filesystems can only do so much to avoid fragmentation, and in order to be effective they need to have the space to try and optimize file placement. If the disk is full, optimization is very hard, and when the disk is 100% full optimizing is plain impossible. Get your disk fragmented, and it will most likely be your worst bottleneck, by far offsetting the modest gain you got by having more storage.

Let’s see an example: you have a 9Gb disk (these times they’re even hard to find..). First thing, manifacturers often lie about disk capacity (the whole Megabyte vs Mebibyte issue), and then the OS needs some space for its accounting structures, so you’ll reasonably end up with 8Gib of useable space. You then have to account for another 10% in overhead for Squid, and then the space needed for keeping fragmentation at bay. So in the end the recommended cache_dir setting is 6000 to 7000 Mebibyte.

cache_dir ... 7000 16 256

Its better to start out with a conservative setting and then, after the cache has been filled, look at the disk usage. If you think there is plenty of unused space, then increase the cache_dir setting a little.

If you’re getting “disk full” write errors, then you definitely need to decrease your cache size.

🔗 I’m adding a new cache_dir. Will I lose my cache?

No. You can add and delete cache_dir lines without affecting any of the others.

🔗 Squid and http-gw from the TIS toolkit

Several people on both the fwtk-users and the squid-users mailing asked about using Squid in combination with http-gw from the TIS toolkit. The most elegant way in my opinion is to run an internal Squid caching proxyserver which handles client requests and let this server forward it’s requests to the http-gw running on the firewall. Cache hits won’t need to be handled by the firewall.

In this example Squid runs on the same server as the http-gw, Squid uses 8000 and http-gw uses 8080 (web). The local domain is home.nl.

🔗 Firewall configuration

Either run http-gw as a daemon from the /etc/rc.d/rc.local(Linux Slackware):

exec /usr/local/fwtk/http-gw -daemon 8080

or run it from inetd like this:

web stream      tcp      nowait.100  root /usr/local/fwtk/http-gw http-gw

I increased the watermark to 100 because a lot of people run into problems with the default value.

Make sure you have at least the following line in /usr/local/etc/netperm-table:

http-gw: hosts 127.0.0.1

You could add the IP-address of your own workstation to this rule and make sure the http-gw by itself works, like:

http-gw:                hosts 127.0.0.1 10.0.0.1

🔗 Squid configuration

The following settings are important:

http_port       8000
icp_port        0
cache_peer      localhost.home.nl parent 8080 0 default
acl HOME        dstdomain .home.nl
alwayws_direct  allow HOME
never_direct    allow all

This tells Squid to use the parent for all domains other than home.nl. Below,access.logentries show what happens if you do a reload on the Squid-homepage:

872739961.631 1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/ - DEFAULT_PARENT/localhost.home.nl -
872739962.976 1266 10.0.0.21 TCP_CLIENT_REFRESH/304 88 GET http://www.nlanr.net/Images/cache_now.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.007 1299 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/Icons/squidnow.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.061 1354 10.0.0.21 TCP_CLIENT_REFRESH/304 83 GET http://www.squid-cache.org/Icons/Squidlogo2.gif - DEFAULT_PARENT/localhost.home.nl

http-gw entries in syslog:

Aug 28 02:46:00 memo http-gw[2052]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:00 memo http-gw[2052]: log host=localhost/127.0.0.1 protocol=HTTP cmd=dir dest=www.squid-cache.org path=/
Aug 28 02:46:01 memo http-gw[2052]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:01 memo http-gw[2053]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2053]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/Squidlogo2.gif
Aug 28 02:46:01 memo http-gw[2054]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2054]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/squidnow.gif
Aug 28 02:46:01 memo http-gw[2055]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2055]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.nlanr.net path=/Images/cache_now.gif
Aug 28 02:46:02 memo http-gw[2055]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:03 memo http-gw[2053]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=2
Aug 28 02:46:04 memo http-gw[2054]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=3

To summarize:

Advantages:

Disadvantages:

contributed by Rodney van den Oever

🔗 What is “HTTP_X_FORWARDED_FOR”? Why does squid provide it to WWW servers, and how can I stop it?

see Security - X-Forwarded-For

When a proxy-cache is used, a server does not see the connection coming from the originating client. Many people like to implement access controls based on the client address. To accommodate these people, Squid adds the request header called “X-Forwarded-For” which looks like this:

X-Forwarded-For: 128.138.243.150, unknown, 192.52.106.30

Entries are always IP addresses, or the word unknown if the address could not be determined or if it has been disabled with the forwarded_for configuration option.

We must note that access controls based on this header are extremely weak and simple to fake. Anyone may hand-enter a request with any IP address whatsoever. This is perhaps the reason why client IP addresses have been omitted from the HTTP/1.1 specification.

Because of the weakness of this header, access controls based on X-Forwarded-For are not used by default. It’s needs to be specifically enabled with follow_x_forwarded_for.

🔗 Can Squid anonymize HTTP requests?

Yes it can, however the way of doing it has changed from earlier versions of squid. Please follow the instructions for the version of squid that you are using. As a default, no anonymizing is done.

If you choose to use the anonymizer you might wish to investigate the forwarded_for option to prevent the client address being disclosed. Failure to turn off the forwarded_for option will reduce the effectiveness of the anonymizer. Finally if you filter the User-Agent header using the fake_user_agent option can prevent some user problems as some sites require the User-Agent header.

NP: Squid must be built with the –enable-http-violations configure option before building.

Current squid releases provide a mix of header control directives and capability;

For details see the documentation in squid.conf.default or squid.conf.documented for your specific version of squid.

References: Anonymous WWW

🔗 Can I make Squid go direct for some sites?

Sure, just use the always_direct access list.

For example, if you want Squid to connect directly to hotmail.com servers, you can use these lines in your config file:

acl hotmail dstdomain .hotmail.com
always_direct allow hotmail

🔗 Can I make Squid proxy only, without caching anything?

Sure, there are few things you can do.

You can use the cache access list to make Squid never cache any response:

cache deny all

With Squid-2.7, Squid-3.1 and later you can also remove all ‘cache_dir’ options from your squid.conf to avoid having a cache directory.

With Squid-2.4, 2.5, 2.6, and 3.0 you need to use the “null” storage module:

cache_dir null /tmp

Note: a null cache_dir does not disable caching, but it does save you from creating a cache structure if you have disabled caching with cache. The directory (e.g., /tmp) must exist so that squid can chdir to it, unless you also use the coredump_dir option.

To configure Squid for the “null” storage module, specify it on the configurecommand line:

--enable-storeio=null,...

🔗 Can I prevent users from downloading large files?

You can set the global reply_body_max_size parameter. This option controls the largest HTTP message body that will be sent to a cache client for one request.

If the HTTP response coming from the server has a Content-length header, then Squid compares the content-length value to the reply_body_max_size value. If the content-length is larger,the server connection is closed and the user receives an error message from Squid.

Some responses don’t have Content-length headers. In this case, Squid counts how many bytes are written to the client. Once the limit is reached, the client’s connection is simply closed.

:bulb: Note that “creative” user-agents will still be able to download really large files through the cache using HTTP/1.1 range requests.

To the FAQ Index

Navigation: Site Search, Site Pages, Categories, 🔼 go up