Contents

    1. What is Squid?
    2. What is Internet object caching?
    3. Why is it called Squid?
    4. What is the latest version of Squid?
    5. Who is responsible for Squid?
    6. Where can I get Squid?
    7. What Operating Systems does Squid support?
    8. What Squid mailing lists are available?
    9. What other Squid-related documentation is available?
    10. What's the legal status of Squid?
    11. How to add a new Squid feature, enhance, or fix something?
    12. Can I pay someone for Squid support?
    13. Squid FAQ contributors
    14. About This Document
    15. Want to contribute?
  1. Compiling Squid
    1. Which file do I download to get Squid?
    2. Do you have pre-compiled binaries available?
    3. How do I compile Squid?
    4. Building Squid on ...
    5. I see a lot of warnings while compiling Squid.
    6. undefined reference to __inet_ntoa
    7. How big of a system do I need to run Squid?
    8. How do I install Squid?
    9. How do I start Squid?
    10. How do I start Squid automatically when the system boots?
    11. How do I tell if Squid is running?
    12. squid command line options
    13. How do I see how Squid works?
    14. Can Squid benefit from SMP systems?
    15. Is it okay to use separate drives for Squid?
    16. Is it okay to use RAID on Squid?
    17. Is it okay to use ZFS on Squid?
  2. Configuring Squid
  3. Before you start configuring
    1. How do I configure Squid without re-compiling it?
    2. What does the squid.conf file do?
    3. Where can I find examples and configuration for a Feature?
    4. Do you have a squid.conf example?
    5. How do I configure Squid to work behind a firewall?
    6. How do I configure Squid to forward all requests to another proxy?
    7. What ''cache_dir'' size should I use?
    8. I'm adding a new cache_dir. Will I lose my cache?
    9. Squid and http-gw from the TIS toolkit.
    10. What is "HTTP_X_FORWARDED_FOR"? Why does squid provide it to WWW servers, and how can I stop it?
    11. Can Squid anonymize HTTP requests?
    12. Can I make Squid go direct for some sites?
    13. Can I make Squid proxy only, without caching anything?
    14. Can I prevent users from downloading large files?
    15. Communication between browsers and Squid
    16. Recommended network configuration
    17. Manual Browser Configuration
    18. Partially Automatic Configuration
    19. Fully Automatic Configuration
    20. Redundant Proxy Auto-Configuration
    21. Proxy Auto-Configuration with URL Hashing
    22. Where can I find more information about PAC?
    23. How do I tell Squid to use a specific username for FTP urls?
    24. IE 5.0x crops trailing slashes from FTP URL's
    25. IE 6.0 SP1 fails when using authentication
    26. Squid Log Files
    27. Which log files can I delete safely?
    28. How can I disable Squid's log files?
    29. What is the maximum size of access.log?
    30. My log files get very big!
    31. I want to use another tool to maintain the log files.
    32. Managing log files
    33. Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?
    34. What does ERR_LIFETIME_EXP mean?
    35. Retrieving "lost" files from the cache
    36. Can I use store.log to figure out if a response was cachable?
    37. Can I pump the squid access.log directly into a pipe?
    38. How do I see system level Squid statistics?
    39. Managing the Cache Storage
    40. Using ICMP to Measure the Network
    41. Why are so few requests logged as TCP_IMS_MISS?
    42. Why do I need to run Squid as root? Why can't I just use cache_effective_user root?
    43. Can you tell me a good way to upgrade Squid with minimal downtime?
    44. Can Squid listen on more than one HTTP port?
    45. Can I make origin servers see the client's IP address when going through Squid?
    46. Why does Squid use so much memory!?
    47. How can I tell how much memory my Squid process is using?
    48. Why does Squid use so much cache memory?
    49. My Squid process grows without bounds.
    50. I set cache_mem to XX, but the process grows beyond that!
    51. How do I analyze memory usage from the cache manager output?
    52. The "Total memory accounted" value is less than the size of my Squid process.
    53. xmalloc: Unable to allocate 4096 bytes!
    54. fork: (12) Cannot allocate memory
    55. What can I do to reduce Squid's memory usage?
    56. Using an alternate malloc library
    57. How much memory do I need in my Squid server?
    58. Why can't my Squid process grow beyond a certain size?
  4. Access Controls in Squid
    1. The Basics: How the parts fit together
    2. ACL elements
    3. Access Lists
    4. How do I allow my clients to use the cache?
    5. How do I configure Squid not to cache a specific server?
    6. How do I implement an ACL ban list?
    7. How do I block specific users or groups from accessing my cache?
    8. Is there a way to do ident lookups only for a certain host and compare the result with a userlist in squid.conf?
    9. Do you have a CGI program which lets users change their own proxy passwords?
    10. Common Mistakes
    11. I set up my access controls, but they don't work! why?
    12. Proxy-authentication and neighbor caches
    13. Is there an easy way of banning all Destination addresses except one?
    14. How can I block access to porn sites?
    15. Does anyone have a ban list of porn sites and such?
    16. Squid doesn't match my subdomains
    17. Why does Squid deny some port numbers?
    18. Does Squid support the use of a database such as mySQL for storing the ACL list?
    19. How can I allow a single address to access a specific URL?
    20. How can I allow some clients to use the cache at specific times?
    21. How can I allow some users to use the cache at specific times?
    22. Problems with IP ACL's that have complicated netmasks
    23. Can I set up ACL's based on MAC address rather than IP?
    24. Can I limit the number of connections from a client?
    25. I'm trying to deny ''foo.com'', but it's not working.
    26. I want to customize, or make my own error messages.
    27. I want to use local time zone in error messages.
    28. I want to put ACL parameters in an external file.
    29. I want to authorize users depending on their MS Windows group memberships
    30. Maximum length of an acl name
    31. Fast and Slow ACLs
    32. Starting Point
    33. Why am I getting "Proxy Access Denied?"
    34. Connection Refused when reaching a sibling
    35. Running out of filedescriptors
    36. What are these strange lines about removing objects?
    37. Can I change a Windows NT FTP server to list directories in Unix format?
    38. Why am I getting "Ignoring MISS from non-peer x.x.x.x?"
    39. DNS lookups for domain names with underscores (_) always fail.
    40. Why does Squid say: "Illegal character in hostname; underscores are not allowed?"
    41. Why am I getting access denied from a sibling cache?
    42. Cannot bind socket FD NN to *:8080 (125) Address already in use
    43. icpDetectClientClose: ERROR xxx.xxx.xxx.xxx: (32) Broken pipe
    44. icpDetectClientClose: FD 135, 255 unexpected bytes
    45. Does Squid work with NTLM Authentication?
    46. My Squid becomes very slow after it has been running for some time.
    47. WARNING: Failed to start 'dnsserver'
    48. Sending bug reports to the Squid team
    49. FATAL: ipcache_init: DNS name lookup tests failed
    50. FATAL: Failed to make swap directory /var/spool/cache: (13) Permission denied
    51. FATAL: Cannot open HTTP Port
    52. FATAL: All redirectors have exited!
    53. FATAL: Cannot open /usr/local/squid/logs/access.log: (13) Permission denied
    54. pingerOpen: icmp_sock: (13) Permission denied
    55. What is a forwarding loop?
    56. accept failure: (71) Protocol error
    57. storeSwapInFileOpened: ... Size mismatch
    58. Why do I get ''fwdDispatch: Cannot retrieve 'https://www.buy.com/corp/ordertracking.asp' ''
    59. Squid can't access URLs like http://3626046468/ab2/cybercards/moreinfo.html
    60. I get a lot of "URI has whitespace" error messages in my cache log, what should I do?
    61. commBind: Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address
    62. What does "sslReadClient: FD 14: read failure: (104) Connection reset by peer" mean?
    63. What does ''Connection refused'' mean?
    64. squid: ERROR: no running copy
    65. FATAL: getgrnam failed to find groupid for effective group 'nogroup'
    66. Squid uses 100% CPU
    67. Webmin's ''cachemgr.cgi'' crashes the operating system
    68. Segment Violation at startup or upon first request
    69. urlParse: Illegal character in hostname 'proxy.mydomain.com:8080proxy.mydomain.com'
    70. Requests for international domain names do not work
    71. Why do I sometimes get "Zero Sized Reply"?
    72. Why do I get "The request or reply is too large" errors?
    73. Negative or very large numbers in Store Directory Statistics, or constant complaints about cache above limit
    74. Problems with Windows update
    75. What are cachable objects?
    76. What is the ICP protocol?
    77. What is a cache hierarchy? What are parents and siblings?
    78. What is the Squid cache resolution algorithm?
    79. What features are Squid developers currently working on?
    80. Tell me more about Internet traffic workloads
    81. What are the tradeoffs of caching with the NLANR cache system?
    82. Where can I find out more about firewalls?
    83. What is the "Storage LRU Expiration Age?"
    84. What is "Failure Ratio at 1.01; Going into hit-only-mode for 5 minutes"?
    85. Does squid periodically re-read its configuration file?
    86. How does ''unlinkd'' work?
    87. What is an icon URL?
    88. Can I make my regular FTP clients use a Squid cache?
    89. Why is the select loop average time so high?
    90. How does Squid deal with Cookies?
    91. How does Squid decide when to refresh a cached object?
    92. What exactly is a ''deferred read''?
    93. Why is my cache's inbound traffic equal to the outbound traffic?
    94. How come some objects do not get cached?
    95. What does ''keep-alive ratio'' mean?
    96. How does Squid's cache replacement algorithm work?
    97. What are private and public keys?
    98. What is FORW_VIA_DB for?
    99. Does Squid send packets to port 7 (echo)? If so, why?
    100. What does "WARNING: Reply from unknown nameserver [a.b.c.d]" mean?
    101. How does Squid distribute cache files among the available directories?
    102. Why do I see negative byte hit ratio?
    103. What does "Disabling use of private keys" mean?
    104. What is a half-closed filedescriptor?
    105. What does --enable-heap-replacement do?
    106. Why is actual filesystem space used greater than what Squid thinks?
    107. How do ''positive_dns_ttl'' and ''negative_dns_ttl'' work?
    108. What does ''swapin MD5 mismatch'' mean?
    109. What does ''failed to unpack swapfile meta data'' mean?
    110. Why doesn't Squid make ''ident'' lookups in interception mode?
    111. What are FTP passive connections?
    112. When does Squid re-forward a client request?
    113. General advice
    114. FreeBSD
    115. Solaris
    116. FreeBSD
    117. OSF1/3.2
    118. BSD/OS
    119. Linux
    120. IRIX
    121. SCO-UNIX
    122. AIX
    123. What is a Cache Digest?
    124. How and why are they used?
    125. What is the theory behind Cache Digests?
    126. How is the size of the Cache Digest in Squid determined?
    127. What hash functions (and how many of them) does Squid use?
    128. How are objects added to the Cache Digest in Squid?
    129. Does Squid support deletions in Cache Digests? What are diffs/deltas?
    130. When and how often is the local digest built?
    131. How are Cache Digests transferred between peers?
    132. How and where are Cache Digests stored?
    133. How are the Cache Digest statistics in the Cache Manager to be interpreted?
    134. What are False Hits and how should they be handled?
    135. How can Cache Digest related activity be traced/debugged?
    136. What about ICP?
    137. Is there a Cache Digest Specification?
    138. Would it be possible to stagger the timings when cache_digests are retrieved from peers?
    139. Concepts of Interception Caching
    140. Requirements and methods for Interception Caching
    141. Steps involved in configuring Interception Caching
    142. Issues with HotMail
    143. What are the new features in squid 2.X?
    144. How do I configure 'ssl_proxy' now?
    145. Adding a new cache disk
    146. How do I configure proxy authentication?
    147. Why does proxy-auth reject all users after upgrading from Squid-2.1 or earlier?
    148. My squid.conf from version 1.1 doesn't work!
  5. Reverse Proxy Mode
    1. What is the Reverse Proxy (httpd-accelerator) mode?
    2. How do I set it up?
    3. Running the web server on the same server
    4. Load balancing of backend servers
    5. Common Problems
    6. Clients
    7. Load Balancers
    8. HA Clusters
    9. Monitoring
    10. Logfile Analysis
    11. Configuration Tools
    12. Squid add-ons
    13. Ident Servers
    14. Cacheability Validators
    15. Neighbor
    16. Regular Expression
    17. Open-access proxies
    18. Mail relaying
    19. Hijackable proxies
    20. X-Forwarded-For fiddling
    21. The Safe_Ports and SSL_Ports ACL
    22. The manager ACLs
    23. Way Too Many Cache Misses
    24. Pruning the Cache Down
    25. Changing the Cache Levels

What is Squid?

Squid is a high-performance proxy caching server for web clients, supporting FTP, gopher, and HTTP data objects. Squid handles all requests in a single, non-blocking, I/O-driven process over IPv4 or IPv6.

Squid keeps meta data and especially hot objects cached in RAM, caches DNS lookups, supports non-blocking DNS lookups, and implements negative caching of failed requests.

Squid supports SSL, extensive access controls, and full request logging. By using the lightweight Internet Cache Protocol, Squid caches can be arranged in a hierarchy or mesh for additional bandwidth savings.

Squid consists of a main server program squid, some optional programs for rewriting requests and performing authentication, and some management and client tools.

Squid is originally derived from the ARPA-funded Harvest project. Since then it has gone through many changes and has many new features.

What is Internet object caching?

Internet object caching is a way to store requested Internet objects (i.e., data available via the HTTP, FTP, and gopher protocols) on a system closer to the requesting site than to the source. Web browsers can then use the local Squid cache as a proxy HTTP server, reducing access time as well as bandwidth consumption.

Why is it called Squid?

Harris' Lament says, "All the good ones are taken."

We needed to distinguish this new version from the Harvest cache software. Squid was the code name for initial development, and it stuck.

What is the latest version of Squid?

This is best answered by the Squid Versions page, where you can also download the sources.

Who is responsible for Squid?

Squid is the result of efforts by numerous individuals from the Internet community.

  • The Squid Software Foundation provides representation and oversight of the Squid Project.
  • The core team and main contributors list is at WhoWeAre.
  • A list of our many excellent code contributors can be seen in the CONTRIBUTORS file within each copy of published sources.

Where can I get Squid?

You can download Squid via FTP or HTTP from one of the many worldwide mirror sites or the primary FTP site.

Many sushi bars also have Squid.

What Operating Systems does Squid support?

The software is designed to operate on any modern system, and is known to work on at least the following platforms:

BSD:

  • FreeBSD
  • NetBSD
  • OpenBSD

Linux:

  • Most distributions, including CentOS, Debian, Fedora, RedHat, and Ubuntu

Unix:

  • AIX
  • HP-UX
  • IRIX
  • SCO Unix
  • Solaris
  • OmniOS
  • OpenIndiana
  • OSF/Digital Unix/Tru64

Windows: (Cygwin and MinGW)

  • Windows 2000 Server
  • Windows NT
  • Windows XP
  • Windows 2003 Server
  • Windows Vista

Other:

  • OS/2

If you encounter any platform-specific problems, please let us know by registering an entry in our bug database. If you're curious about the best OS to run Squid on, see BestOsForSquid.

If you would like your favorite OS to join the list above, please try to build the latest Squid on it and send any feedback to the squid-dev mailing list.

What Squid mailing lists are available?

That question is best answered by the official mailing lists page at http://www.squid-cache.org/Support/mailing-lists.html

I can't figure out how to unsubscribe from your mailing list.

All of our mailing lists have "-subscribe" and "-unsubscribe" addresses that you must use for subscribe and unsubscribe requests. To unsubscribe from the squid-users list, you send a message to <squid-users-unsubscribe AT squid-cache DOT org>.

What's the legal status of Squid?

Squid is copyrighted by The Squid Software Foundation and contributors. Squid copyright holders are listed in the CONTRIBUTORS file.

Squid is Free Software, distributed under the terms of the GNU General Public License, version 2 (GPLv2). Squid includes various software components distributed under several GPLv2-compatible open source licenses listed in the CREDITS file.

Squid contributors and components change with Squid software. The appropriate CONTRIBUTORS and CREDITS files can be found in the corresponding Squid sources, available for download.

Official Squid artwork distribution terms are detailed elsewhere.

How to add a new Squid feature, enhance, or fix something?

Adding new features, enhancing, or fixing Squid behavior usually requires source code modifications. Several options are generally available to those who need Squid development:

  • Wait for somebody to do it: Waiting is free but may take forever. If you want to use this option, make sure you file a bugzilla report describing the bug or enhancement so that others know what you need. Posting feature requests to a mailing list is often useful because it can generate interest and discussion, but without a bugzilla record, your request may be overlooked or forgotten.

  • Do it yourself: Enhancing Squid and working with other developers can be a very rewarding experience. However, this option requires understanding and modifying the source code, which is getting better, but it is still very complex, often ugly, and lacking documentation. These obstacles affect the required development effort. In most cases, you would want your changes to be incorporated into the official Squid sources for long-term support. To get the code committed, one needs to cooperate with other developers. It is a good idea to describe the changes you are going to work on before diving into development. Development-related discussions happen on squid-dev mailing list. Documenting upcoming changes as a bugzilla entry or a wiki feature page helps attract contributors or sponsors.

  • Pay somebody to do it: Many organizations and individuals offer commercial Squid development services. When selecting the developer, discuss how they plan to integrate the changes with the official Squid sources, and consider the company's past contributions to the Squid project. Please see the "Can I pay?" entry for more details.

The best development option depends on many factors. Here is some project dynamics information that may help you pick the right one: most Squid feature and maintenance work is done by individual contributors, working alone or in small development/consulting shops. In the early years (1990-2000), these developers were able to work on Squid using their free time, research grants, or similarly broad-scope financial support. Requested features were often added on demand because many folks could work on them. Most recent (2006-2008) contributions, especially large features, are the result of paid development contracts, reflecting both the maturity of the software and the lack of "free" time among active Squid developers.

Can I pay someone for Squid support?

Yes. Please see Squid Support Services. Unfortunately, that page is poorly maintained and has many stale/bogus entries, but we do plan to improve it in the foreseeable future. Please do not email the Squid Project asking for official recommendations -- the Project itself cannot recommend specific Squid administrators or developers due to various conflicts of interests. However, if the Project could make official referrals, they would probably form a (tiny) subset of the listed entries.

Besides the Services page, you can post a Request For Proposals to squid-users (Squid administration and integration) or squid-dev (Squid development) mailing list. A good RFP contains enough details (including your deadlines and Squid versions) for the respondents to provide a ballpark cost estimate. Expect private responses to your RFPs and avoid discussing private arrangements on the public mailing lists. Please do not email RFPs to the Project info@ alias for the reasons discussed in the previous paragraph.

You can also donate money or equipment to the Squid project.

Squid FAQ contributors

The following people have made contributions to this document:

Dodjie Nava, Jonathan Larmour, Cord Beermann, Tony Sterrett, Gerard Hynes, Katayama, Takeo, Duane Wessels, K Claffy, Paul Southworth, Oskar Pearson, Ong Beng Hui, Torsten Sturm, James R Grinter, Rodney van den Oever, Kolics Bertold, Carson Gaspar, Michael O'Reilly, Hume Smith, Richard Ayres, John Saunders, Miquel van Smoorenburg, David J N Begley, Kevin Sartorelli, Andreas Doering, Mark Visser, tom minchin, Jens-S. Vöckler, Andre Albsmeier, Doug Nazar, HenrikNordstrom, Mark Reynolds, Arjan de Vet, Peter Wemm, John Line, Jason Armistead, Chris Tilbury, Jeff Madison, Mike Batchelor, Bill Bogstad, Radu Greab, F.J. Bosscha, Brian Feeny, Martin Lyons, David Luyer, Chris Foote, Jens Elkner, Simon White, Jerry Murdock, Gerard Eviston, Rob Poe, FrancescoChemolli, ReubenFarrelly, AlexRousskov, AmosJeffries

About This Document

This FAQ was maintained for a long time as an XML Docbook file. It was converted to a Wiki in March 2006. The wiki is now the authoritative version.

Want to contribute?

We always welcome help keeping the Squid FAQ up-to-date. If you would like to help out, please register with this Wiki and type away.

Compiling Squid

Which file do I download to get Squid?

That depends on the version of Squid you have chosen to try. The list of current versions released can be found at http://www.squid-cache.org/Versions/. Each version has a page of release bundles. Usually you want the release bundle that is listed as the most current.

You must download a source archive file of the form squid-x.y.tar.gz or squid-x.y.tar.bz2 (eg, squid-2.6.STABLE14.tar.bz2).

We recommend you first try one of our mirror sites for the actual download. They are usually faster.

Alternatively, the main Squid WWW site www.squid-cache.org, and FTP site ftp.squid-cache.org have these files.

Context diffs are usually available for upgrading to new versions. These can be applied with the patch program (available from the GNU FTP site or your distribution).

Do you have pre-compiled binaries available?

see SquidFaq/BinaryPackages

How do I compile Squid?

You must run the configure script yourself before running make. We suggest that you first invoke ./configure --help and make a note of the configure options you need in order to support the features you intend to use. Do not compile in features you do not think you will need.

% tar xzf squid-2.6.RELEASExy.tar.gz
% cd squid-2.6.RELEASExy
% ./configure --with-MYOPTION --with-MYOPTION2 etc
% make

...and finally install:

% make install

Squid will, by default, install into /usr/local/squid. If you wish to install somewhere else, see the --prefix option for configure.

What kind of compiler do I need?

You will need a C++ compiler:

  • To compile Squid v3, any decent C++ compiler would do. Almost all modern Unix systems come with pre-installed C++ compilers which work just fine.
  • To compile Squid v4 and later, you will need a C++11-compliant compiler. Most recent Unix distributions come with pre-installed compilers that support C++11.

/!\ Squid v3.4 and v3.5 automatically enable C++11 support in the compiler if ./configure detects such support. Later Squid versions require C++11 support while earlier ones may fail to build if C++11 compliance is enforced by the compiler.

If you are uncertain about your system's compiler, the GNU compilers are widely available and supplied with almost all operating systems. They are also well tested with Squid. If your OS does not come with GCC you may download it from the GNU FTP site. In addition to gcc and g++, you may also want or need to install the binutils package and a number of libraries, depending on the feature-set you want to enable.

Clang is a popular alternative to gcc, especially on BSD systems. It also generally works well for building Squid. Other alternatives which are or were tested in the past are Intel's C++ compiler and Sun's SunStudio. Microsoft Visual C++ is another target the Squid developers aim for, but at the time of this writing (April 2014) it is still quite a way off.

/!\ Please note that due to a bug in clang's support for atomic operations, squid doesn't build on clang older than 3.2.

What else do I need to compile Squid?

You will need the automake toolset for compiling from Makefiles.

You will need Perl installed on your system.

Each feature you choose to enable may also require additional libraries or tools to build.

How do I cross-compile Squid ?

Use the ./configure option --host to specify the cross-compilation tuplet for the machine which Squid will be installed on. The autotools manual has some simple documentation for this and other cross-configuration options; in particular, understanding what they mean is very useful.

Additionally, Squid is created using several custom tools which are themselves created during the build process. This requires a C++ compiler to generate binaries which can run on the build platform. The HOSTCXX= parameter needs to be provided with the name or path to this compiler.
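
For example, a minimal cross-build sketch (the ARM tuplet and compiler name here are illustrative; substitute the names matching your own toolchain):

% ./configure --host=arm-linux-gnueabihf HOSTCXX=g++
% make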

How do I apply a patch or a diff?

You need the patch program. You should probably duplicate the entire directory structure before applying the patch. For example, if you are upgrading from squid-2.6.STABLE13 to 2.6.STABLE14, you would run these commands:

cp -rl squid-2.6.STABLE13 squid-2.6.STABLE14
cd squid-2.6.STABLE14
zcat /tmp/squid-2.6.STABLE13-STABLE14.diff.gz | patch -p1
  • {i} Squid-2 patches require the -p1 option.

  • {i} Squid-3 patches require the -p0 option.

After the patch has been applied, you must rebuild Squid from the very beginning, i.e.:

make distclean
./configure [--option --option...]
make
make install

If your patch program seems to complain or refuses to work, you should get a more recent version, from the GNU FTP site, for example.

Ideally you should use the patch command which comes with your OS.

configure options

The configure script can take numerous options. The most useful is --prefix to install it in a different directory. The default installation directory is /usr/local/squid/. To change the default, you could do:

% cd squid-x.y.z
% ./configure --prefix=/some/other/directory/squid

Some OS require files to be installed in certain locations. See the OS specific instructions below for ./configure options required to make those installations happen correctly.

Type

% ./configure --help

to see all available options. You will need to specify some of these options to enable or disable certain features. Some options which are used often include:

--prefix=PREFIX         install architecture-independent files in PREFIX
                        [/usr/local/squid]
--enable-dlmalloc[=LIB] Compile & use the malloc package by Doug Lea
--enable-gnuregex       Compile GNUregex
--enable-xmalloc-debug  Do some simple malloc debugging
--enable-xmalloc-debug-trace
                        Detailed trace of memory allocations
--enable-xmalloc-statistics
                        Show malloc statistics in status page
--enable-async-io       Do ASYNC disk I/O using threads
--enable-icmp           Enable ICMP pinging and network measurement
--enable-delay-pools    Enable delay pools to limit bandwidth usage
--enable-useragent-log  Enable logging of User-Agent header
--enable-kill-parent-hack
                        Kill parent on shutdown
--enable-cachemgr-hostname[=hostname]
                        Make cachemgr.cgi default to this host
--enable-htcp           Enable HTCP protocol
--enable-forw-via-db    Enable Forw/Via database
--enable-cache-digests  Use Cache Digests
                        see http://www.squid-cache.org/Doc/FAQ/FAQ-16.html

These are also commonly needed by Squid-2, but are now defaults in Squid-3.

--enable-carp           Enable CARP support
--enable-snmp           Enable SNMP monitoring
--enable-err-language=lang
                        Select language for Error pages (see errors dir)

Building Squid on ...

BSD/OS or BSDI

{X} Known Problem:

cache_cf.c: In function `parseConfigFile':
cache_cf.c:1353: yacc stack overflow before `token'
...

You may need to upgrade your gcc installation to a more recent version. Check your gcc version with

  gcc -v

If it is earlier than 2.7.2, you should upgrade. Note that even gcc 2.7.2 is very old and not widely supported, so a modern release is preferable.

CentOS

# You will need the usual build chain
yum install -y perl gcc autoconf automake make sudo wget

# and some extra packages
yum install libxml2-devel libcap-devel

# bootstrapping and building from bzr also needs this package
yum install libtool-ltdl-devel

The following ./configure options install Squid into the CentOS structure properly:

  --prefix=/usr
  --includedir=/usr/include
  --datadir=/usr/share
  --bindir=/usr/sbin
  --libexecdir=/usr/lib/squid
  --localstatedir=/var
  --sysconfdir=/etc/squid
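
Put together as a single invocation (append any feature options you need):

% ./configure --prefix=/usr --includedir=/usr/include \
    --datadir=/usr/share --bindir=/usr/sbin \
    --libexecdir=/usr/lib/squid --localstatedir=/var \
    --sysconfdir=/etc/squid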

Debian, Ubuntu

Many versions of Ubuntu and Debian are routinely build-tested and unit-tested as part of our BuildFarm and are known to compile OK.

  • /!\ The Linux system layout differs markedly from the Squid defaults. The following ./configure options are needed to install Squid into the Debian / Ubuntu standard filesystem locations:

--prefix=/usr \
--localstatedir=/var \
--libexecdir=${prefix}/lib/squid \
--datadir=${prefix}/share/squid \
--sysconfdir=/etc/squid \
--with-default-user=proxy \
--with-logdir=/var/log/squid \
--with-pidfile=/var/run/squid.pid

Plus, of course, any custom configuration options you may need.

  • {X} For Debian Jessie (8), Ubuntu Oneiric (11.10), or older squid3 packages, the above squid labels should have a 3 appended.

  • {X} Remember these are only defaults. By altering squid.conf you can point the logs at the right paths without any workaround or patching.

As always, additional libraries may be required to support the features you want to build. The default package dependencies can be installed using:

aptitude build-dep squid

This requires only that your sources.list contain the deb-src repository to pull the source package information. Features which are not supported by the distribution package will need investigation to discover the dependency package and install it.
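
As an illustration, a deb-src entry in /etc/apt/sources.list looks like this (the mirror and suite here are examples; match them to your existing deb lines):

deb-src http://deb.debian.org/debian stable main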

  • {i} The usual one requested is libssl-dev for SSL support.

    • /!\ However, please note that Squid-3.5 is not compatible with OpenSSL v1.1+. As of Debian Stretch or Ubuntu Zesty, the libssl1.0-dev package must be used instead. This is resolved in the Squid-4 packages.

Init Script

The init.d script is part of the official Debian/Ubuntu packaging. It does not come with Squid directly, so you will need to download a copy from https://alioth.debian.org/plugins/scmgit/cgi-bin/gitweb.cgi?p=pkg-squid/pkg-squid3.git;a=blob_plain;f=debian/squid.rc to /etc/init.d/squid

Fedora

Rebuilding the binary rpm is most easily done by checking out the package definition from cvs

cvs -d :pserver:anonymous@cvs.fedoraproject.org:/cvs/pkgs/ co squid

then do a "make local" in the version you want to recompile.

FreeBSD, NetBSD, OpenBSD

Squid is developed on FreeBSD. The general build instructions should be all you need.

However, if you wish to integrate patching of Squid with patching of your other FreeBSD packages, it might be easiest to install Squid from the Ports collection. There are three ports, matching the three packages for the current Squid releases:

  • squid33 - the Squid 3.3 tree.

 cd /usr/ports/www/squid33
 make install clean
  • squid32 - the Squid 3.2 tree;

 cd /usr/ports/www/squid32
 make install clean
  • squid - the Squid 2.7 tree;

 cd /usr/ports/www/squid
 make install clean

Each port will prompt for configuration information for your Squid installation. The following list of options is from the Squid 3.1 port on FreeBSD 8.0:

[X] SQUID_KERB_AUTH      Install Kerberos authentication helpers
[ ] SQUID_LDAP_AUTH      Install LDAP authentication helpers
[X] SQUID_NIS_AUTH       Install NIS/YP authentication helpers
[ ] SQUID_SASL_AUTH      Install SASL authentication helpers
[X] SQUID_IPV6           Enable IPv6 support
[ ] SQUID_DELAY_POOLS    Enable delay pools
[X] SQUID_SNMP           Enable SNMP support
[ ] SQUID_SSL            Enable SSL support for reverse proxies
[ ] SQUID_PINGER         Install the icmp helper
[ ] SQUID_DNS_HELPER     Use the old 'dnsserver' helper
[X] SQUID_HTCP           Enable HTCP support
[ ] SQUID_VIA_DB         Enable forward/via database
[ ] SQUID_CACHE_DIGESTS  Enable cache digests
[X] SQUID_WCCP           Enable Web Cache Coordination Prot. v1
[ ] SQUID_WCCPV2         Enable Web Cache Coordination Prot. v2

Windows

  • These instructions apply to building Squid-3.x. Squid-2 packages are available for download.

New configure options:

  • --enable-win32-service

Updated configure options:

  • --enable-default-hostsfile

Unsupported configure options:

  • --with-large-files: No suitable build environment is available on either Cygwin or MinGW, but --enable-large-files works fine

Compiling with Cygwin

  • This section needs re-writing. It has very little on compiling Squid and much about installation.

In order to compile Squid, you need to have Cygwin fully installed.

Usage of the Cygwin environment is very similar to other Unix/Linux environments; note that the -devel versions of libraries must be installed.

{i}

Squid will, by default, install into /usr/local/squid. If you wish to install somewhere else, see the --prefix option for configure.

Now, add a new Cygwin user - see the Cygwin user guide - and map it to SYSTEM, or create a new NT user and a matching Cygwin user; this account becomes the Squid run-as user.

Read the squid FAQ on permissions if you are using CYGWIN=ntsec.

When that has completed run:

squid -z

If that succeeds, try:

squid -N -D -d1

Squid should start. Check that there are no errors. If everything looks good, try browsing through squid.

Now, configure cygrunsrv to run Squid as a service as the chosen username. You may need to check permissions here.

Compiling with MinGW

In order to compile squid using the MinGW environment, the packages MSYS, MinGW and msysDTK must be installed. Some additional libraries and tools must be downloaded separately.

Before building Squid with SSL support, some operations are needed (in the following example OpenSSL is installed in C:\OpenSSL and MinGW in C:\MinGW):

  • Copy C:\OpenSSL\lib\MinGW content to C:\MinGW\lib
  • Copy C:\OpenSSL\include\openssl content to C:\MinGW\include\openssl
  • Rename C:\MinGW\lib\ssleay32.a to C:\MinGW\lib\libssleay32.a

Unpack the source archive as usual and run configure.

The following are the recommended minimal options for Windows:

Squid-3 : (requires Squid-3.5 or later, see porting efforts section below)

--prefix=c:/squid
--enable-default-hostsfile=none

Then run make and install as usual.

Squid will install into c:\squid. If you wish to install somewhere else, change the --prefix option for configure.

When that has completed run:

squid -z

If that succeeds, try:

squid -N -D -d1

Squid should start. Check that there are no errors. If everything looks good, try browsing through squid.

Now, to run Squid as a Windows system service, run squid -n. This will create a service named "Squid" with automatic startup. To start it, run net start squid from the command line prompt or use the Services Administrative Applet.

Always check the provided release notes for any version specific detail.

OS/2

by Doug Nazar (<nazard AT man-assoc DOT on DOT ca>).

In order to compile squid, you need to have a reasonable facsimile of a Unix system installed. This includes bash, make, sed, emx, various file utilities and a few more. I've set up a TVFS drive that matches a Unix file system, but this probably isn't strictly necessary.

I made a few modifications to the pristine EMX 0.9d install.

  • added defines for strcasecmp() & strncasecmp() to string.h

  • changed all occurrences of time_t to signed long instead of unsigned long
  • hacked ld.exe
    • to search for both xxxx.a and libxxxx.a
    • to produce the correct filename when using the -Zexe option

You will need to run scripts/convert.configure.to.os2 (in the Squid source distribution) to modify the configure script so that it can search for the various programs.

Next, you need to set a few environment variables (see EMX docs for meaning):

export EMXOPT="-h256 -c"
export LDFLAGS="-Zexe -Zbin -s"

Now you are ready to configure, make, and install Squid.

Now, don't forget to set EMXOPT before running squid each time. I recommend using the -Y and -N options.

RedHat, RHEL

The following ./configure options install Squid into the RedHat structure properly:

  --prefix=/usr
  --includedir=/usr/include
  --datadir=/usr/share
  --bindir=/usr/sbin
  --libexecdir=/usr/lib/squid
  --localstatedir=/var
  --sysconfdir=/etc/squid

Solaris

In order to successfully build squid on Solaris, a complete build-chain has to be available.

Squid-3.x

In order to successfully build squid, a few GNU-related packages need to be available. Unfortunately, not all of the software is available on a stock Solaris install.

What you need is:

 pkg install SUNWgnu-coreutils SUNWgtar SUNWgm4 SUNWgmake SUNWlxml  SUNWgsed

and of course a compiler. You can choose between

 pkg install SUNWgcc

and

 pkg install sunstudioexpress SUNWbtool

com_err.h: warning: ignoring #pragma ident

This problem occurs with certain kerberos library headers distributed with Solaris 10. It has been fixed in later releases of the kerberos library.

{X} Unfortunately the /usr/include/kerberosv5/com_err.h system-include file sports a #pragma directive which is not compatible with gcc.

There are several options available:

  1. Upgrading your library to a working version is the recommended best option.
  2. Applying a patch distributed with Squid ( contrib/solaris/solaris-krb5-include.patch ) which updates the krb5.h header to match the one found in later working krb5 library releases.

  3. Editing com_err.h directly to change the line

#pragma ident   "%Z%%M% %I%     %E% SMI"

to

#if !defined(__GNUC__)
#pragma ident   "%Z%%M% %I%     %E% SMI"
#endif

--enable-ipf-transparent support

{X} Unfortunately the /usr/include/inet/mib2.h header required for IPF interception support clashes with Squid-3.1 class definitions. This has been fixed in the 3.2 series.

For 3.1 to build you may need to run this class rename command in the top Squid sources directory:

find . -type f -print | xargs perl -i -p -e 's/\b(IpAddress\b[^.])/Squid$1/g'

Squid-2.x and older

The following error occurs on Solaris systems using gcc when the Solaris C compiler is not installed:

/usr/bin/rm -f libmiscutil.a
/usr/bin/false r libmiscutil.a rfc1123.o rfc1738.o util.o ...
make[1]: *** [libmiscutil.a] Error 255
make[1]: Leaving directory `/tmp/squid-1.1.11/lib'
make: *** [all] Error 1

Note the /usr/bin/false on the second line. This is supposed to be a path to the ar program. If configure cannot find ar on your system, then it substitutes false.

To fix this you either need to:

  • Add /usr/ccs/bin to your PATH. This is where the ar command should be. You need to install SUNWbtool if ar is not there. Otherwise,

  • Install the binutils package from the GNU FTP site. This package includes programs such as ar, as, and ld.

Other Platforms

Please let us know of other platforms on which you have built Squid, whether successfully or not.

Please check the page of platforms on which Squid is known to compile.

If you have a problem not listed above, mail squid-dev with what you are trying, your Squid version, and the problems you encounter.

I see a lot of warnings while compiling Squid.

Warnings are usually not a big concern, and can be common with software designed to operate on multiple platforms. Squid 3.2 and later should build without generating any warnings; a big effort went into making the code truly portable.

undefined reference to __inet_ntoa

Probably you have bind 8.x installed.

UPDATE: That version of bind is now officially obsolete and known to be vulnerable to a critical infrastructure flaw. It should be upgraded to bind 9.x or replaced as soon as possible.

How big of a system do I need to run Squid?

There are no hard-and-fast rules. The most important resource for Squid is physical memory, so put as much in your Squid box as you can. Your processor does not need to be ultra-fast. We recommend buying whatever is economical at the time.

Your disk system will be the major bottleneck, so fast disks are important for high-volume caches. SCSI disks generally perform better than ATA, if you can afford them. Serial ATA (SATA) performs somewhere between the two. Your system disk, and logfile disk can probably be IDE without losing any cache performance.

The ratio of memory-to-disk can be important. We recommend that you have at least 32 MB of RAM for each GB of disk space that you plan to use for caching.
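
For example, under that guideline a 100 GB cache_dir calls for at least 3200 MB of RAM for the cache index alone, on top of cache_mem and the memory needs of the operating system itself.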

How do I install Squid?

From Binary Packages if available for your operating system.

Or from Source Code.

After SquidFaq/CompilingSquid, you can install it with this simple command:

% make install

If you have enabled ICMP or the pinger then you will also want to type

% su
# make install-pinger

After installing, you will want to read SquidFaq/ConfiguringSquid to edit and customize Squid to run the way you want it to.

How do I start Squid?

First you need to check your Squid configuration. The Squid configuration can be found in /usr/local/squid/etc/squid.conf and includes documentation on all directives.

In the Squid distribution there is a small QUICKSTART guide indicating which directives you need to look at more closely and why. At an absolute minimum you need to change the http_access configuration to allow access from your clients.
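
As a minimal illustration (the subnet below is an example; substitute your own client network), such an http_access setup in squid.conf could look like:

acl mynetwork src 192.168.0.0/16
http_access allow mynetwork
http_access deny all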

To verify your configuration file you can use the -k parse option

% /usr/local/squid/sbin/squid -k parse

If this outputs any errors, they are syntax errors or other fatal misconfigurations and need to be corrected before you continue. If it is silent and immediately gives back the command prompt, then your squid.conf is syntactically correct and could be understood by Squid.

After you've finished editing the configuration file, you can start Squid for the first time. The procedure depends a little bit on which version you are using.

First, you must create the swap directories. Do this by running Squid with the -z option:

% /usr/local/squid/sbin/squid -z

<!>

If you run Squid as root then you may need to first create /usr/local/squid/var/logs and your cache_dir directories and assign ownership of these to the cache_effective_user configured in your squid.conf

Once the creation of the cache directories completes, you can start Squid and try it out. Probably the best thing to do is run it from your terminal and watch the debugging output. Use this command:

% /usr/local/squid/sbin/squid -NCd1

If everything is working okay, you will see the line:

Ready to serve requests.

If you want to run squid in the background, as a daemon process, just leave off all options:

% /usr/local/squid/sbin/squid

<!>

Depending on which http_port you select you may need to start squid as root (http_port <1024)

How do I start Squid automatically when the system boots?

by hand

Squid has a restart feature built in. This greatly simplifies starting Squid and means that you don't need to use RunCache or inittab. At the minimum, you only need to enter the pathname to the Squid executable. For example:

/usr/local/squid/sbin/squid

Squid will automatically background itself and then spawn a child process. In your syslog messages file, you should see something like this:

Sep 23 23:55:58 kitty squid[14616]: Squid Parent: child process 14617 started

That means that process ID 14616 is the parent process which monitors the child process (pid 14617). The child process is the one that does all of the work. The parent process just waits for the child process to exit. If the child process exits unexpectedly, the parent will automatically start another child process. In that case, syslog shows:

Sep 23 23:56:02 kitty squid[14616]: Squid Parent: child process 14617 exited with status 1
Sep 23 23:56:05 kitty squid[14616]: Squid Parent: child process 14619 started

If there is some problem and Squid cannot start, the parent process will give up after a while. Your syslog will show:

Sep 23 23:56:12 kitty squid[14616]: Exiting due to repeated, frequent failures

When this happens you should check your syslog messages and cache.log file for error messages.

When you look at a process (ps command) listing, you'll see two squid processes:

24353  ??  Ss     0:00.00 /usr/local/squid/bin/squid
24354  ??  R      0:03.39 (squid) (squid)

The first is the parent process, and the child process is the one called "(squid)". Note that if you accidentally kill the parent process, the child process will not notice.

If you want to run Squid from your terminal and prevent it from backgrounding and spawning a child process, use the -N command line option.

/usr/local/squid/bin/squid -N

from inittab

On systems which have an /etc/inittab file (Digital Unix, old Solaris, IRIX, HP-UX, Linux), you can add a line like this:

sq:3:respawn:/usr/local/squid/sbin/squid.sh < /dev/null >> /tmp/squid.log 2>&1

We recommend using a squid.sh shell script, but you could instead call Squid directly with the -N option and other options you may require. A sample squid.sh script is shown below:

C=/usr/local/squid
PATH=/usr/bin:$C/bin
TZ=PST8PDT
export PATH TZ

# User to notify on restarts
notify="root"

# Squid command line options
opts=""

cd $C
umask 022
sleep 10
while [ -f /var/run/nosquid ]; do
        sleep 1
done
/usr/bin/tail -20 $C/logs/cache.log \
        | Mail -s "Squid restart on `hostname` at `date`" $notify
exec bin/squid -N $opts

from rc.local

On BSD-ish systems, you will need to start Squid from the "rc" files, usually /etc/rc.local. For example:

if [ -f /usr/local/squid/sbin/squid ]; then
        echo -n ' Squid'
        /usr/local/squid/sbin/squid
fi

from init.d

Squid ships with an init.d type startup script in contrib/squid.rc which works on most init.d type systems. Or you can write your own using any normal init.d script found on your system as a template and adding the start/stop fragments shown below.

Start:

/usr/local/squid/sbin/squid

Stop:

/usr/local/squid/sbin/squid -k shutdown
n=120
while /usr/local/squid/sbin/squid -k check && [ $n -gt 0 ]; do
    sleep 1
    echo -n .
    n=`expr $n - 1`
done

with daemontools

Create the squid service directory, and the log directory (if they do not exist yet).

mkdir -p /usr/local/squid/supervise/log /var/log/squid
chown squid /var/log/squid

Then, change to the service directory,

cd /usr/local/squid/supervise

and create 2 executable scripts: run

rm -f /var/run/squid/squid.pid
exec /usr/local/squid/sbin/squid -N 2>&1

and log/run.

exec /usr/local/bin/multilog t /var/log/squid

Finally, start the squid service by linking it into the svscan-monitored area.

cd /service
ln -s /usr/local/squid/supervise squid

Squid should start within 5 seconds.

from SMF

On newer Solaris (10 and above) inittab/sysvinit is deprecated and it is recommended to use the new SMF (Service Management Facility).

To do that you need to create a service manifest in XML format like this:

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!--   Manifest-file for Squid
-->
<service_bundle type='manifest' name='Squid'>

<service
        name='network/squid'
        type='service'
        version='1'>

        <create_default_instance enabled='false' />

        <single_instance />

        <dependency name='fs-local'
                grouping='require_all'
                restart_on='none'
                type='service'>
                <service_fmri
                        value='svc:/system/filesystem/local' />
        </dependency>

        <dependency name='net-loopback'
                grouping='require_all'
                restart_on='none'
                type='service'>
                <service_fmri value='svc:/network/loopback' />
        </dependency>

        <dependency name='net-physical'
                grouping='require_all'
                restart_on='none'
                type='service'>
                <service_fmri value='svc:/network/physical' />
        </dependency>

        <dependency name='utmp'
                grouping='require_all'
                restart_on='none'
                type='service'>
                <service_fmri value='svc:/system/utmp' />
        </dependency>

        <dependency name='squid_config_data'
                grouping='require_all'
                restart_on='refresh'
                type='path'>
                <service_fmri value='file://localhost/usr/local/squid/etc/squid.conf' />
        </dependency>

        <exec_method
                type='method'
                name='start'
                exec='/lib/svc/method/init.squid %m'
                timeout_seconds='60'/>

        <exec_method
                type='method'
                name='stop'
                exec='/lib/svc/method/init.squid %m'
                timeout_seconds='60' />

        <exec_method
                type='method'
                name='refresh'
                exec='/lib/svc/method/init.squid %m'
                timeout_seconds='60' />

        <exec_method
                type='method'
                name='restart'
                exec='/lib/svc/method/init.squid %m'
                timeout_seconds='60' />

        <property_group name='general' type='framework'>
                <!-- to start stop squid -->
                <propval name='action_authorization' type='astring'
                        value='solaris.smf.manage' />
        </property_group>

        <stability value='Unstable' />

        <template>
                <common_name>
                        <loctext xml:lang='C'>
                        Squid proxy server
                        </loctext>
                </common_name>
                <documentation>
                        <manpage title='squid' section='8' manpath='/usr/local/squid/share/man/man8' />
                </documentation>
        </template>

</service>

</service_bundle>

then put this file in the /var/svc/manifest/network directory and execute the command

svccfg import /var/svc/manifest/network/squid.xml

as root. Then create an init-like script (the service method) handling the command-line arguments start|stop|refresh|restart, put it into /lib/svc/method, and execute the command

svcadm enable squid

You can get complete Squid SMF scripts, with everything needed to run it on Solaris, here: squid_autostart25.tar.gz

How do I tell if Squid is running?

You can use the squidclient program:

% squidclient http://www.netscape.com/ > test

There are other command-line HTTP client programs available as well. Two that you may find useful are wget and echoping.
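
For example, wget honors the http_proxy environment variable, so a quick test through a proxy on localhost port 3128 (adjust to your http_port) might look like:

% http_proxy=http://localhost:3128/ wget -O /dev/null http://www.example.com/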

Another way is to use Squid itself to see if it can signal a running Squid process:

% squid -k check

And then check the shell's exit status variable.
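
For example, in a Bourne-style shell (an exit status of 0 means a running Squid accepted the signal; non-zero means none was found):

% squid -k check
% echo $?
0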

Also, check the log files, most importantly the access.log and cache.log files.

squid command line options

These are the command line options for Squid-2:

-a Specify an alternate port number for incoming HTTP requests. Useful for testing a configuration file on a non-standard port.

-d Debugging level for "stderr" messages. If you use this option, then debugging messages up to the specified level will also be written to stderr.

-f Specify an alternate squid.conf file instead of the pathname compiled into the executable.

-h Prints the usage and help message.

-k reconfigure Sends a HUP signal, which causes Squid to re-read its configuration files.

-k rotate Sends an USR1 signal, which causes Squid to rotate its log files. Note, if logfile_rotate is set to zero, Squid still closes and re-opens all log files.

-k shutdown Sends a TERM signal, which causes Squid to wait briefly for current connections to finish and then exit. The amount of time to wait is specified with shutdown_lifetime.

-k interrupt Sends an INT signal, which causes Squid to shut down immediately, without waiting for current connections.

-k kill Sends a KILL signal, which causes the Squid process to exit immediately, without closing any connections or log files. Use this only as a last resort.

-k debug Sends an USR2 signal, which causes Squid to generate full debugging messages until the next USR2 signal is received. Obviously very useful for debugging problems.

-k check Sends a "ZERO" signal to the Squid process. This simply checks whether or not the process is actually running.

-s Send debugging (level 0 only) message to syslog.

-u Specify an alternate port number for ICP messages. Useful for testing a configuration file on a non-standard port.

-v Prints the Squid version.

-z Creates disk swap directories. You must use this option when installing Squid for the first time, or when you add or modify the cache_dir configuration.

-D Do not make initial DNS tests. Normally, Squid looks up some well-known DNS hostnames to ensure that your DNS name resolution service is working properly. (!) obsolete in 3.1 and later.

-F If the swap.state logs are clean, then the cache is rebuilt in the "foreground" before any requests are served. This will decrease the time required to rebuild the cache, but HTTP requests will not be satisfied during this time.

-N Do not automatically become a background daemon process.

-R Do not set the SO_REUSEADDR option on sockets.

-X Enable full debugging while parsing the config file.

-Y Return ICP_OP_MISS_NOFETCH instead of ICP_OP_MISS while the swap.state file is being read. If your cache has mostly child caches which use ICP, this will allow your cache to rebuild faster.

How do I see how Squid works?

  • Check the cache.log file in your logs directory. It logs interesting things as a part of its normal operation and can be boosted to show all the boring details (see the example below).

  • Install and use the ../CacheManager.
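
For example, the logging detail can be raised permanently with the debug_options directive in squid.conf (section ALL at level 9 is extremely noisy; pick the level you need):

debug_options ALL,5

or toggled on a running Squid with:

% squid -k debug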

Can Squid benefit from SMP systems?

Squid is a single-process application and cannot make use of SMP. If you want Squid to benefit from an SMP system you will need to run multiple instances of Squid and find a way to distribute your users across the different Squid instances, just as if you had multiple Squid boxes.
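
A rough sketch of that approach (paths and ports are examples): give each instance its own configuration file with a distinct http_port, pid_filename, cache_dir and log paths, then start each instance with the -f option:

% /usr/local/squid/sbin/squid -f /etc/squid/squid-3128.conf
% /usr/local/squid/sbin/squid -f /etc/squid/squid-3129.conf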

Having two CPUs is indeed nice for running other CPU intensive tasks on the same server as the proxy, such as if you have a lot of logs and need to run various statistics collections during peak hours.

The authentication and group helpers barely use any CPU and do not benefit much from a dual-CPU configuration.

Is it okay to use separate drives for Squid?

Yes. Running Squid on drives separate from those your OS runs on is often a very good idea.

Generally, seek time is what you want to optimize for Squid, or more precisely the total number of seeks per second your system can sustain. This is why it is better to have your cache_dir spread over multiple smaller disks than one huge drive (especially with SCSI).

If your system is very I/O bound, you will want to have both your OS and log directories running on separate drives.
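
For example (the mount points and sizes are illustrative), two dedicated cache disks could be configured as:

cache_dir aufs /cache1 50000 16 256
cache_dir aufs /cache2 50000 16 256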

Is it okay to use RAID on Squid?

see Section on RAID

Is it okay to use ZFS on Squid?

Yes. Running Squid on native ZFS-supporting systems, like Solaris or OpenIndiana, is a well-known practice.

In general, just set up a ZFS mirror (usually best with separate controllers for each spindle) and set recordsize to 4-64k (depending on your preferred cache_replacement_policy). It can also be better for disk I/O performance to set primarycache=metadata, secondarycache=none, and atime=off on cache_dir filesystems. Consider setting the logbias property correctly for the ZFS filesystem holding Squid's cache. The default value for this property is latency, which is appropriate for a software ZFS raid/mirror. When ZFS is created over hardware RAID5/10, set this property to throughput to avoid many TCP_SWAPFAIL_MISS responses. At the system level it is a good idea to limit the ZFS ARC size to 1/8-1/4 of RAM by setting zfs:zfs_arc_max.
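
A sketch of those settings (the filesystem name tank/squid is an example):

zfs set recordsize=32K tank/squid
zfs set primarycache=metadata tank/squid
zfs set secondarycache=none tank/squid
zfs set atime=off tank/squid
zfs set logbias=throughput tank/squid   # only over hardware RAID5/10

On Solaris the ARC cap can be set in /etc/system, e.g. set zfs:zfs_arc_max = 1073741824 for a 1 GB limit.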

ZFS works perfectly with both the diskd and aufs Squid storeIO modules (the best choice depends on your box/storage architecture).



Configuring Squid

Before you start configuring

  • The best all around advice I can give on Squid is to start simple! Once everything works the way you expect, then start tweaking your way into complexity with a means to track the (in)effectiveness of each change you make (and a known good configuration that you can always go back to when you inevitably fubar the thing!).

    by Gregori Parker. Seconded by all the Squid developers and Squid helpers.

How do I configure Squid without re-compiling it?

The squid.conf file. By default, this file is located at /etc/squid/squid.conf or maybe /usr/local/squid/etc/squid.conf.

Also, a QUICKSTART guide has been included with the source distribution. Please see the directory where you unpacked the source archive.

What does the squid.conf file do?

The squid.conf file defines the configuration for Squid. The configuration includes (but is not limited to) the HTTP port number, the ICP request port number, handling of incoming and outgoing requests, information about firewall access, and various timeout settings.

Where can I find examples and configuration for a Feature?

There is still a fair bit of config knowledge buried in the old SquidFaq and Guide pages of this wiki. We are endeavoring to pull them into an easier-to-use layout.

What we have so far is:

  • The general background configuration info here on this page
  • Specific feature descriptions, pros/cons, and some config are linked from the main SquidFaq in a features section.

  • Any complex tuning material mixing features, plus specific demos, lives in ConfigExamples and is usually linked from the related feature or FAQ pages as well.

Do you have a squid.conf example?

Yes.

For Squid 2.x and 3.0, after you run make install a sample squid.conf.default file will exist in the etc directory under the Squid installation directory.

From 2.6 the Squid developers also provide a set of Configuration Guides online. They list all the options each version of Squid can accept in its squid.conf file, including guides for the current development test releases.

Squid-3.1 default config

From 3.1 a lot of configuration cleanups have been done to make things easier.

  • /!\

    This minimal configuration does not work with versions earlier than 3.1 which are missing special cleanup done to the code.

http_port 3128

refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320

acl manager url_regex -i ^cache_object:// +i ^https?://[^/]+/squid-internal-mgr/

acl localhost src 127.0.0.1/32 ::1
acl to_localhost dst 127.0.0.0/8 0.0.0.0/32 ::1

acl localnet src 10.0.0.0/8     # RFC 1918 possible internal network
acl localnet src 172.16.0.0/12  # RFC 1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC 1918 possible internal network
acl localnet src fc00::/7       # RFC 4193 local private network range
acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines

acl SSL_ports port 443
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT

http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost
http_access allow localnet
http_access deny all

Squid-3.2 default config

From 3.2 further configuration cleanups have been done to make things easier and safer. The manager, localhost, and to_localhost ACL definitions are now built-in.

  • /!\

    This minimal configuration does not work with versions earlier than 3.2 which are missing special cleanup done to the code.

http_port 3128

refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320

acl localnet src 10.0.0.0/8     # RFC 1918 possible internal network
acl localnet src 172.16.0.0/12  # RFC 1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC 1918 possible internal network
acl localnet src fc00::/7       # RFC 4193 local private network range
acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines

acl SSL_ports port 443

acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT

http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost
http_access allow localnet
http_access deny all

Squid-3.3 default config

From 3.3 a few performance improvements have been done. The manager regex ACLs have been moved after the DoS and protocol smuggling attack protections.

  • /!\

    This minimal configuration does not work with versions earlier than 3.2 which are missing special cleanup done to the code.

http_port 3128

refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320

acl localnet src 10.0.0.0/8     # RFC 1918 possible internal network
acl localnet src 172.16.0.0/12  # RFC 1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC 1918 possible internal network
acl localnet src fc00::/7       # RFC 4193 local private network range
acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines

acl SSL_ports port 443          # https

acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http

acl CONNECT method CONNECT

http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost manager
http_access deny manager
http_access allow localnet
http_access allow localhost
http_access deny all

Squid-3.5 default config

From 3.5 a few performance improvements have been done. The manager regex ACLs have been moved after the DoS and protocol smuggling attack protections.

  • /!\

    This minimal configuration does not work with versions earlier than 3.2 which are missing special cleanup done to the code.

http_port 3128

acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
acl localnet src fc00::/7       # RFC 4193 local private network range
acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines

acl SSL_ports port 443

acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl Safe_ports port 1025-65535  # unregistered ports

acl CONNECT method CONNECT

http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost manager
http_access deny manager

#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#

http_access allow localnet
http_access allow localhost
http_access deny all

coredump_dir /squid/var/cache/squid

refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320

How do I configure Squid to work behind a firewall?

If you are behind a firewall which can't make direct connections to the outside world, you must use a parent cache. Normally Squid tries to be smart and only uses cache peers when it makes sense from the perspective of global hit ratio, so you need to tell Squid when it cannot go direct and must use a parent proxy even if it knows the request will be a cache miss.

You can use the never_direct access list in squid.conf to specify which requests must be forwarded to your parent cache outside the firewall, and the always_direct access list to specify which requests must not be forwarded. For example, if Squid must connect directly to all servers that end with mydomain.com, but must use the parent for all others, you would write:

acl INSIDE dstdomain .mydomain.com
always_direct allow INSIDE
never_direct allow all

You could also specify internal servers by IP address:

acl INSIDE_IP dst 1.2.3.0/24
always_direct allow INSIDE_IP
never_direct allow all

Note, however, that when you use IP addresses, Squid must perform a DNS lookup to convert URL hostnames to addresses. Your internal DNS servers may not be able to look up external domains.

If you use never_direct and you have multiple parent caches, then you probably will want to mark one of them as a default choice in case Squid can't decide which one to use. That is done with the default keyword on a cache_peer line. For example:

cache_peer xyz.mydomain.com parent 3128 0 no-query default

How do I configure Squid to forward all requests to another proxy?

see Features/CacheHierarchy

What ''cache_dir'' size should I use?

This chapter assumes that you are dedicating an entire disk partition to a squid cache_dir, as is often the case.

Generally speaking, setting the cache_dir to be the same size as the disk partition is not a wise choice, for two reasons. The first is that Squid is not very tolerant of running out of disk space. On top of the cache_dir size, Squid will use some extra space for swap.state and then some more temporary storage as work areas, for instance when rebuilding swap.state. So in any case make sure to leave some extra room for this, or your cache will enter an endless crash-restart cycle.

The second reason is fragmentation (note: this won't apply to the COSS object storage engine, when it is ready): filesystems can only do so much to avoid fragmentation, and in order to be effective they need the space to try and optimize file placement. If the disk is full, optimization is very hard, and when the disk is 100% full optimizing is plain impossible. Let your disk get fragmented, and it will most likely become your worst bottleneck, by far offsetting the modest gain you got from having more storage.

Let's see an example: you have a 9GB disk (these days they're even hard to find...). First, manufacturers often overstate disk capacity (the whole megabyte vs. mebibyte issue), and the OS needs some space for its accounting structures, so you'll reasonably end up with 8GiB of usable space. You then have to account for another 10% in overhead for Squid, plus the space needed for keeping fragmentation at bay. So in the end the recommended cache_dir setting is 6000 to 7000 mebibytes.

cache_dir ... 7000 16 256

It's better to start out with a conservative setting and then, after the cache has been filled, look at the disk usage. If you think there is plenty of unused space, then increase the cache_dir setting a little.

If you're getting "disk full" write errors, then you definitely need to decrease your cache size.
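
Putting the chapter's example into a concrete line (the aufs storage type and the /cache1 mount point are assumptions for illustration, not requirements):

cache_dir aufs /cache1 6500 16 256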

I'm adding a new cache_dir. Will I lose my cache?

No. You can add and delete cache_dir lines without affecting any of the others.

Squid and http-gw from the TIS toolkit.

Several people on both the fwtk-users and the squid-users mailing lists have asked about using Squid in combination with http-gw from the TIS toolkit. The most elegant way, in my opinion, is to run an internal Squid caching proxy server which handles client requests and have it forward its requests to the http-gw running on the firewall. Cache hits won't need to be handled by the firewall.

In this example Squid runs on the same server as http-gw; Squid uses port 8000 and http-gw uses port 8080 (web). The local domain is home.nl.

Firewall configuration

Either run http-gw as a daemon from the /etc/rc.d/rc.local (Linux Slackware):

exec /usr/local/fwtk/http-gw -daemon 8080

or run it from inetd like this:

web stream      tcp      nowait.100  root /usr/local/fwtk/http-gw http-gw

I increased the watermark to 100 because a lot of people run into problems with the default value.

Make sure you have at least the following line in /usr/local/etc/netperm-table:

http-gw: hosts 127.0.0.1

You could add the IP-address of your own workstation to this rule and make sure the http-gw by itself works, like:

http-gw:                hosts 127.0.0.1 10.0.0.1

Squid configuration

The following settings are important:

http_port       8000
icp_port        0
cache_peer      localhost.home.nl parent 8080 0 default
acl HOME        dstdomain .home.nl
always_direct   allow HOME
never_direct    allow all

This tells Squid to use the parent for all domains other than home.nl. Below, access.log entries show what happens if you do a reload on the Squid-homepage:

872739961.631 1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/ - DEFAULT_PARENT/localhost.home.nl -
872739962.976 1266 10.0.0.21 TCP_CLIENT_REFRESH/304 88 GET http://www.nlanr.net/Images/cache_now.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.007 1299 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/Icons/squidnow.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.061 1354 10.0.0.21 TCP_CLIENT_REFRESH/304 83 GET http://www.squid-cache.org/Icons/Squidlogo2.gif - DEFAULT_PARENT/localhost.home.nl

http-gw entries in syslog:

Aug 28 02:46:00 memo http-gw[2052]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:00 memo http-gw[2052]: log host=localhost/127.0.0.1 protocol=HTTP cmd=dir dest=www.squid-cache.org path=/
Aug 28 02:46:01 memo http-gw[2052]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:01 memo http-gw[2053]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2053]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/Squidlogo2.gif
Aug 28 02:46:01 memo http-gw[2054]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2054]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/squidnow.gif
Aug 28 02:46:01 memo http-gw[2055]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2055]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.nlanr.net path=/Images/cache_now.gif
Aug 28 02:46:02 memo http-gw[2055]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:03 memo http-gw[2053]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=2
Aug 28 02:46:04 memo http-gw[2054]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=3

To summarize:

Advantages:

  • http-gw allows you to selectively block ActiveX and Java, and its primary design goal is security.
  • The firewall doesn't need to run large applications like Squid.
  • The internal Squid-server still gives you the benefit of caching.

Disadvantages:

  • The internal Squid proxy server can't (and shouldn't) work with other parent or neighbor caches.
  • Initial requests are slower because they go through http-gw, and http-gw also does reverse lookups. Run a nameserver on the firewall or use an internal nameserver.

(contributed by Rodney van den Oever)

What is "HTTP_X_FORWARDED_FOR"? Why does squid provide it to WWW servers, and how can I stop it?

See: Security - X-Forwarded-For

When a proxy-cache is used, a server does not see the connection coming from the originating client. Many people like to implement access controls based on the client address. To accommodate these people, Squid adds the request header called "X-Forwarded-For" which looks like this:

X-Forwarded-For: 128.138.243.150, unknown, 192.52.106.30

Entries are always IP addresses, or the word unknown if the address could not be determined or if it has been disabled with the forwarded_for configuration option.

We must note that access controls based on this header are extremely weak and simple to fake. Anyone may hand-enter a request with any IP address whatsoever. This is perhaps the reason why client IP addresses have been omitted from the HTTP/1.1 specification.

Because of the weakness of this header, access controls based on X-Forwarded-For are not used by default. It needs to be specifically enabled with follow_x_forwarded_for.
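
For example (a sketch; the 192.0.2.1 frontend address is illustrative, and forwarded_for delete requires Squid-3.1 or later):

# stop disclosing client addresses upstream: send "unknown" instead ...
forwarded_for off
# ... or remove the header entirely (Squid-3.1+)
#forwarded_for delete

# trust X-Forwarded-For only when it comes from a known frontend proxy
acl frontend src 192.0.2.1
follow_x_forwarded_for allow frontend
follow_x_forwarded_for deny all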

Can Squid anonymize HTTP requests?

Yes it can; however, the way of doing it has changed from earlier versions of Squid. Please follow the instructions for the version of Squid that you are using. By default, no anonymizing is done.

If you choose to use the anonymizer you might wish to investigate the forwarded_for option to prevent the client address being disclosed. Failure to turn off the forwarded_for option will reduce the effectiveness of the anonymizer. Finally, if you filter the User-Agent header, the fake_user_agent option can prevent some user problems, as some sites require a User-Agent header.

NP: Squid must be configured with the --enable-http-violations option before building.

Current Squid releases provide a mix of header control directives and capabilities:

Squid 2.6 - 2.7

Allow erasure or replacement of specific headers through the http_header_access and header_replace options.

Squid 3.0

Allows selective erasure or replacement of specific headers in either the request or the reply with the request_header_access, reply_header_access, and header_replace settings.

Squid 3.1
Adds to the 3.0 capability with truncation, replacement, or removal of the X-Forwarded-For header.

For details see the documentation in squid.conf.default or squid.conf.documented for your specific version of squid.
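
As an illustrative sketch for Squid-3.x (the header list here is an example policy, not a recommendation; verify each directive against your version's documentation):

request_header_access From deny all
request_header_access Referer deny all
request_header_access User-Agent deny all
header_replace User-Agent Mozilla/5.0 (Anonymized)
forwarded_for off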

http://www.squid-cache.org/Versions/v2/HEAD/cfgman/ , http://www.squid-cache.org/Versions/v3/3.HEAD/cfgman/ , References: Anonymous WWW

Can I make Squid go direct for some sites?

Sure, just use the always_direct access list.

For example, if you want Squid to connect directly to hotmail.com servers, you can use these lines in your config file:

acl hotmail dstdomain .hotmail.com
always_direct allow hotmail

Can I make Squid proxy only, without caching anything?

Sure, there are a few things you can do.

You can use the cache access list to make Squid never cache any response:

cache deny all

With Squid-2.7, Squid-3.1 and later you can also remove all 'cache_dir' options from your squid.conf to avoid having a cache directory.

With Squid-2.4, 2.5, 2.6, and 3.0 you need to use the "null" storage module:

cache_dir null /tmp

Note: a null cache_dir does not disable caching, but it does save you from creating a cache structure if you have disabled caching with cache. The directory (e.g., /tmp) must exist so that squid can chdir to it, unless you also use the coredump_dir option.

To configure Squid for the "null" storage module, specify it on the configure command line:

--enable-storeio=null,...

Can I prevent users from downloading large files?

You can set the global reply_body_max_size parameter. This option controls the largest HTTP message body that will be sent to a cache client for one request.

If the HTTP response coming from the server has a Content-Length header, then Squid compares the content-length value to the reply_body_max_size value. If the content-length is larger, the server connection is closed and the user receives an error message from Squid.

Some responses don't have Content-length headers. In this case, Squid counts how many bytes are written to the client. Once the limit is reached, the client's connection is simply closed.

  • (!) Note that "creative" user-agents will still be able to download really large files through the cache using HTTP/1.1 range requests.
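
For example, a one-line setting for the reply_body_max_size parameter described above (Squid-3.x syntax; the 50 MB limit is purely illustrative, and Squid-2.x uses a different allow/deny syntax, so check squid.conf.documented for your version):

reply_body_max_size 50 MB all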


Communication between browsers and Squid

Most web browsers available today support proxying and are easily configured to use a Squid server as a proxy. Some browsers support advanced features such as lists of domains or URL patterns that shouldn't be fetched through the proxy, or JavaScript automatic proxy configuration.

There are three ways to configure browsers to use Squid. The first is to manually configure the proxy in each browser. Alternatively, the location of a proxy.pac file can be entered into each browser so that it downloads the proxy settings (partially automatic configuration). Lastly, all modern browsers can (and by default do) fully automatically configure themselves if the network is set up to support this.

Recommended network configuration

For best use of the proxy we recommend a multiple-layers approach. The following are the layers we recommend, in order of preference.

We are aware that many networks only implement layers 3 and 4 of this design, due to administrators' familiarity with NAT, confusion about the benefits, and historic problems with the upper two layers.

  1. Web Proxy Automatic Detection (WPAD) (aka transparent configuration)

    • Browsers set to auto-detect the proxy for whatever network they are plugged into. This is particularly useful for mobile users.
    • The big problem with this layer is that there is no formal RFC standard to follow, so browser implementations vary, and there are two separate systems (DNS and DHCP) to set up.
    • It requires the PAC file (the next layer) to be implemented.
  2. Proxy auto-configuration (PAC)

    • As a backup to per-machine configuration.
    • Some systems support PAC file to be explicitly set in the machine-wide environment.
  3. Machine-wide Configuration
    • Using a system-wide environment variable http_proxy (or GUI configuration which sets it). Most operating systems support this. Windows is the exception, however the IE settings are used in an equivalent way.

    • A lot of software supports it. Only set once per machine.
    • Some systems allow this to be pushed out across the network (Windows uses a Domain Policy)
  4. NAT or TPROXY interception. (aka transparent proxy)

    • Client software does not need to be touched.
    • Security takes several major hits (whole families of vulnerabilities are created, proxy authentication disappears, and peering abilities disappear).
    • System resources and connection reliability also take major hits.
  5. Manual Configuration.

    • Nothing beats an explicit manual configuration for "it works!" excitement. However, doing it for each and every piece of software on a machine is quite a hassle. Doing it for a whole network is unrealistic outside of highly paranoid systems. It is mentioned here simply as an option.

  6. For completeness' sake: the best underlying secure systems back several of these layers up with a complete firewall ban on direct web traffic. This prevents users and machines from bypassing the proxy control points.

Manual Browser Configuration

This involves manually specifying the proxy server and port name in each browser.

Firefox and Thunderbird manual configuration

Both Firefox and Thunderbird are configured in the same way. Look in the Tools menu, Options, General and then Connection Settings. The options in there are fairly self-explanatory. Firefox and Thunderbird support manually specifying the proxy server, automatically downloading a wpad.dat file from a specified source, and WPAD auto-detection.

Thunderbird uses these settings for downloading HTTP images in emails.

In both cases, if you are manually configuring proxies, make sure you add relevant statements for your network in the "No Proxy For" boxes.

Microsoft Internet Explorer manual configuration

Select Options from the View menu. Click on the Connection tab. Tick the Connect through Proxy Server option and hit the Proxy Settings button. For each protocol that your Squid server supports (by default, HTTP, FTP, and gopher) enter the Squid server's hostname or IP address and put the HTTP port number for the Squid server (by default, 3128) in the Port column. For any protocols that your Squid does not support, leave the fields blank.

Netscape manual configuration

Select Network Preferences from the Options menu. On the Proxies page, click the radio button next to Manual Proxy Configuration and then click on the View button. For each protocol that your Squid server supports (by default, HTTP, FTP, and gopher) enter the Squid server's hostname or IP address and put the HTTP port number for the Squid server (by default, 3128) in the Port column. For any protocols that your Squid does not support, leave the fields blank.

Lynx and Mosaic manual configuration

For Mosaic and Lynx, you can set environment variables before starting the application. For example (assuming csh or tcsh):

% setenv http_proxy http://mycache.example.com:3128/
% setenv gopher_proxy http://mycache.example.com:3128/
% setenv ftp_proxy http://mycache.example.com:3128/

For Lynx you can also edit the lynx.cfg file to configure proxy usage. This has the added benefit of causing all Lynx users on a system to access the proxy without making environment variable changes for each user. For example:

http_proxy:http://mycache.example.com:3128/
ftp_proxy:http://mycache.example.com:3128/
gopher_proxy:http://mycache.example.com:3128/

Opera 2.12 manual configuration

by Hume Smith

Select Proxy Servers... from the Preferences menu. Check each protocol that your Squid server supports (by default, HTTP, FTP, and Gopher) and enter the Squid server's address as hostname:port (e.g. mycache.example.com:3128 or 192.0.2.2:3128). Click on Okay to accept the setup.

Notes:

  • Opera 2.12 doesn't support gopher on its own, but requires a proxy; therefore Squid's gopher proxying can extend the utility of your Opera immensely.
  • Unfortunately, Opera 2.12 chokes on some HTTP requests, for example abuse.net.

At the moment I think it has something to do with cookies. If you have trouble with a site, try disabling the HTTP proxying by unchecking that protocol in the Preferences|Proxy Servers... dialogue. Opera will remember the address, so reenabling is easy.

Netmanage Internet Chameleon WebSurfer manual configuration

Netmanage WebSurfer supports manual proxy configuration and exclusion lists for hosts or domains that should not be fetched via proxy (this information is current as of WebSurfer 5.0). Select Preferences from the Settings menu. Click on the Proxies tab. Select the Use Proxy options for HTTP, FTP, and gopher. For each protocol, enter the Squid server's hostname or IP address and put the HTTP port number for the Squid server (by default, 3128) in the Port boxes. For any protocols that your Squid does not support, leave the fields blank.

On the same configuration window, you'll find a button to bring up the exclusion list dialog box, which will let you enter some hosts or domains that you don't want fetched via proxy.

Partially Automatic Configuration

This involves the browser being preconfigured with the location of an autoconfiguration script.

Netscape automatic configuration

Netscape Navigator's proxy configuration can be automated with JavaScript (for Navigator versions 2.0 or higher). Select Network Preferences from the Options menu. On the Proxies page, click the radio button next to Automatic Proxy Configuration and then fill in the URL for your JavaScript proxy configuration file in the text box. The box is too small, but the text will scroll to the right as you go.

You may also wish to consult Netscape's documentation for the Navigator JavaScript proxy configuration

Here is a sample auto configuration file from Oskar Pearson (link to save at the bottom):

//We (www.is.co.za) run a central cache for our customers that they
//access through a firewall - thus if they want to connect to their intranet
//system (or anything in their domain at all) they have to connect
//directly - hence all the "fiddling" to see if they are trying to connect
//to their local domain.
//
//Replace each occurrence of company.com with your domain name
//and if you have some kind of intranet system, make sure
//that you put its name in place of "internal" below.
//
//We also assume that your cache is called "cache.company.com", and
//that it runs on port 8080. Change it down at the bottom.
//
//(C) Oskar Pearson and the Internet Solution (http://www.is.co.za)

function FindProxyForURL(url, host)
{
    //If they have only specified a hostname, go directly.
    if (isPlainHostName(host))
            return "DIRECT";

    //These connect directly if the machine they are trying to
    //connect to starts with "intranet" - ie http://intranet
    //Connect directly if it is intranet.*
    //If you have another machine that you want them to
    //access directly, replace "internal*" with that
    //machine's name
    if (shExpMatch(host, "intranet*")||
                    shExpMatch(host, "internal*"))
        return "DIRECT";

    //Connect directly to our domains (NB for Important News)
    if (dnsDomainIs(host, "company.com")||
    //If you have another domain that you wish to connect to
    //directly, put it in here
                    dnsDomainIs(host, "sistercompany.com"))
        return "DIRECT";

    //So the error message "no such host" will appear through the
    //normal Netscape box - less support queries :)
    if (!isResolvable(host))
            return "DIRECT";

    //We only cache http, ftp and gopher
    if (url.substring(0, 5) == "http:" ||
                    url.substring(0, 4) == "ftp:"||
                    url.substring(0, 7) == "gopher:")

    //Change the ":8080" to the port that your cache
    //runs on, and "cache.company.com" to the machine that
    //you run the cache on
            return "PROXY cache.company.com:8080; DIRECT";

    //We don't cache WAIS
    if (url.substring(0, 5) == "wais:")
            return "DIRECT";

    else
            return "DIRECT";
}

sample1.pac.txt

Microsoft Internet Explorer

Microsoft Internet Explorer, versions 4.0 and above, supports JavaScript automatic proxy configuration in a Netscape-compatible way. Just select Options from the View menu. Click on the Advanced tab. In the lower left-hand corner, click on the Automatic Configuration button. Fill in the URL for your JavaScript file in the dialog box it presents you. Then exit MSIE and restart it for the changes to take effect. MSIE will reload the JavaScript file every time it starts.

Fully Automatic Configuration

by Mark Reynolds

You may like to start by reading the Expired Internet-Draft that describes WPAD.

After reading the steps below, if you don't understand any of the terms or methods mentioned, you probably shouldn't be doing this. Implementing WPAD requires you to fully understand:

  • web server installations and modifications.
  • squid proxy server (or others) installation etc.
  • Domain Name System maintenance etc.

<!>

Please don't bombard the squid list with web server or DNS questions. See your system administrator, or do some more research on those topics.

This is not a recommendation for any product or version. All major browsers out now implement WPAD. I think WPAD is an excellent feature that will return several hours of life per month.

There are probably many more tricks and tips which hopefully will be detailed here in the future. Things like wpad.dat files being served from the proxy server themselves, maybe with a round robin dns setup for the WPAD host.

I have only focused on the domain name method, to the exclusion of the DHCP method. I think the DNS method might be easier for most people. I don't currently, and may never, fully understand WPAD and IE5, but this method worked for me. It may work for you.

But if you'd rather just have a go ...

The PAC file

Create a standard Netscape auto proxy config file. The sample provided above is more than adequate to get you going. No doubt all the other load balancing and backup scripts will be fine also.

Store the resultant file in the document root directory of a handy web server as wpad.dat (Not proxy.pac as you may have previously done.) Andrei Ivanov notes that you should be able to use an HTTP redirect if you want to store the wpad.dat file somewhere else. You can probably even redirect wpad.dat to proxy.pac:

Redirect /wpad.dat http://example.com/proxy.pac

If you do nothing more, a URL like http://www.example.com/wpad.dat should bring up the script text in your browser window.

Insert the following entry into your web server mime.types file. Maybe in addition to your pac file type, if you've done this before.

application/x-ns-proxy-autoconfig       dat

And then restart your web server for the new MIME type to take effect.

Browser Configurations

Internet explorer 5

Under Tools, Internet Options, Connections, Settings or LAN Settings, set ONLY Use Automatic Configuration Script to the URL where your new wpad.dat file can be found.

i.e. http://www.example.com/wpad.dat.

Test that all works as per your script and network. There's no point continuing until this works...

Automatic WPAD with DNS

Create/install/implement a DNS record so that wpad.example.com resolves to the host above where you have a functioning auto config script running. You should now be able to use http://wpad.example.com/wpad.dat as the Auto Config Script location in the Internet Explorer setup above.
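
A minimal BIND zone-file sketch for this (all names are illustrative):

; in the example.com zone file
wpad    IN  CNAME   www.example.com.

With that record, http://wpad.example.com/wpad.dat serves the script from the same web server as above.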

And finally, go back to the setup screen detailed above, and choose nothing but the Automatically Detect Settings option, turning everything else off. Best to restart IE5, as you normally do with any Microsoft product... And it should all work. Did for me anyway.

One final question might be "Which domain name does the client (IE5) use for the wpad... lookup?" It uses the hostname from the control panel setting. It starts the search by adding the hostname wpad to the current fully-qualified domain name. For instance, a client in a.b.example.com would search for a WPAD server at wpad.a.b.example.com. If it could not locate one, it would remove the bottom-most domain and try again; for instance, it would try wpad.b.example.com next. IE 5 would stop searching when it found a WPAD server or reached the bottom-level domain, wpad.

Automatic WPAD with DHCP

You can also use DHCP to configure browsers for WPAD. This technique allows you to set any URL as the PAC URL. For ISC DHCPD, enter a line like this in your dhcpd.conf file:

option wpad code 252 = text;
option wpad "http://www.example.com/proxy.pac";

Replace the hostname with the name or address of your own server.

Ilja Pavkovic notes that the DHCP mode does not work reliably with every version of Internet Explorer. The DNS name method to find wpad.dat is more reliable.

Another user adds that IE 6.01 seems to strip the last character from the URL. By adding a trailing newline, he is able to make it work with both IE 5.0 and 6.0:

option wpad "http://www.example.com/proxy.pac\n";

Redundant Proxy Auto-Configuration

by Rodney van den Oever

There's one nasty side-effect to using auto-proxy scripts: if you start the web browser it will try and load the auto-proxy-script.

If your script isn't available either because the web server hosting the script is down or your workstation can't reach the web server (e.g. because you're working off-line with your notebook and just want to read a previously saved HTML-file) you'll get different errors depending on the browser you use.

The Netscape browser will just return an error after a timeout (after that it tries to find the site 'www.proxy.com' if the script you use is called 'proxy.pac').

Microsoft Internet Explorer, on the other hand, won't even start: no window displays, and only after about a minute will it display a window asking whether to go on with or without proxy configuration.

The point is that your workstations always need to locate the proxy-script. I created some extra redundancy by hosting the script on two web servers (actually Apache web servers on the proxy servers themselves) and adding the following records to my primary nameserver:

proxy   IN      A       192.0.2.1 ; IP address of proxy1
        IN      A       192.0.2.2 ; IP address of proxy2

The clients just refer to 'http://proxy/proxy.pac'. This script looks like this:

function FindProxyForURL(url,host)
{
// Hostname without domainname or host within our own domain?
// Try them directly:
// http://www.domain.com actually lives before the firewall, so
// make an exception:
if ((isPlainHostName(host)||dnsDomainIs(host, ".domain.com")) &&
        !localHostOrDomainIs(host, "www.domain.com"))
        return "DIRECT";

// First try proxy1 then proxy2. One server mostly caches '.com'
// to make sure both servers are not
// caching the same data in the normal situation. The other
// server caches the other domains normally.
// If one of them is down the client will try the other server.
else if (shExpMatch(host, "*.com"))
        return "PROXY proxy1.domain.com:8080; PROXY proxy2.domain.com:8081; DIRECT";
return "PROXY proxy2.domain.com:8081; PROXY proxy1.domain.com:8080; DIRECT";
}

sample2.pac.txt

I made sure every client domain has the appropriate 'proxy' entry. The clients are automatically configured with two nameservers using DHCP.

Proxy Auto-Configuration with URL Hashing

The Sharp Super Proxy Script page contains a lot of good information about hash-based proxy auto-configuration scripts. With these you can distribute the load between a number of caching proxies.

Where can I find more information about PAC?

There is a community website explaining PAC features and functions at http://findproxyforurl.com/.

How do I tell Squid to use a specific username for FTP urls?

There are several ways the login can be done with FTP through Squid.

The ftp_user directive accepts a username or username:password value to be used by default on all FTP login requests. It is overridden by any other available login credentials.
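
For example, a one-line squid.conf sketch (the address is illustrative; check squid.conf.documented for your version's exact semantics):

ftp_user squid@example.com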

The strongest credentials that override all others are credentials added to the URL itself.

Insert your username in the host part of the URL, for example:

ftp://joecool@ftp.example.com/

Squid (from 2.6 through to 3.0) will then use a default password.

Alternatively, you can specify both your username and password in the URL itself:

ftp://joecool:secret@ftp.example.com/

However, we certainly do not recommend this, as it could be very easy for someone to see or grab your password.

Starting with Squid-3.1, the above will be tried first, then regular HTTP Basic authentication will be used to recover new credentials. If a login is required and none is given, a regular website login popup box will appear asking for the credentials to be entered.

IE 5.0x crops trailing slashes from FTP URL's

by ReubenFarrelly

There was a bug in the 5.0x releases of Internet Explorer in which IE cropped any trailing slash off an FTP URL. The URL showed up correctly in the browser's "Address:" field, however Squid logs showed that the trailing slash was being taken off.

An example of where this impacted Squid: a setup where Squid would go direct for FTP directory listings but forward requests to a parent for FTP file transfers. This was useful if your upstream proxy was an older version of Squid or another vendor's software which displayed directory listings with broken icons, and you wanted your own local Squid to generate proper FTP directory listings instead. The workaround is to add a double slash to any directory listing URL in which the trailing slash is important, or else upgrade IE to at least 5.5. (Or use Firefox if you cannot upgrade your IE.)

IE 6.0 SP1 fails when using authentication

When using authentication with Internet Explorer 6 SP1, you may encounter issues when you first launch Internet Explorer. The problem shows itself when you first authenticate: you will receive a "Page Cannot Be Displayed" error. However, if you click refresh, the page will be displayed correctly.

This only happens immediately after you authenticate.

This is not a Squid error or bug. Microsoft broke the Basic Authentication when they put out IE6 SP1.

  • /!\ this appears to be fixed again in later service packs and IE 7+

There is a knowledgebase article (KB 331906) regarding this issue, which contains a link to a downloadable "hot fix." They do warn that this code is not "regression tested" but so far there have not been any reports of it breaking anything else. The problematic file is wininet.dll. Please note that this hotfix is included in the latest security update.

Lloyd Parkes notes that the article references another article, KB 312176. He says that you must not have the registry entry that KB 312176 encourages users to add to their registry.

According to Joao Coutinho, this simple solution also corrects the problem:

  • Go to Tools/Internet
  • Go to Options/Advanced
  • UNSELECT "Show friendly HTTP error messages" under Browsing.

Another possible workaround to these problems is to make the ERR_CACHE_ACCESS_DENIED error page larger than 1460 bytes. This should trigger IE to handle the authentication in a slightly different manner.


Squid Log Files

The logs are a valuable source of information about Squid workloads and performance. The logs record not only access information, but also system configuration errors and resource consumption (e.g. memory, disk space). There are several log files maintained by Squid. Some have to be explicitly activated at compile time; others can safely be deactivated at run time.

There are a few basic points common to all log files. The time stamps logged into the log files are usually UTC seconds unless stated otherwise. The initial time stamp usually contains a millisecond extension.

cache.log

The cache.log file contains the debug and error messages that Squid generates. If you start your Squid using the -s command line option, a copy of certain messages will go into your syslog facilities. It is a matter of personal preference whether to use a separate file for the Squid log data.

From the standpoint of automated log file analysis, the cache.log file does not have much to offer. You will usually look into this file for error reports when programming Squid, testing new features, or searching for the reason behind a perceived misbehavior.
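
The amount of detail written to cache.log is controlled by the debug_options directive in squid.conf. For example, to log everything at the default level 1 while raising one debug section for closer inspection (section 33 is an arbitrary choice here):

debug_options ALL,1 33,2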

Squid Error Messages

Error messages come in several forms. Debug traces are not logged at level 0 or level 1. These levels are reserved for important and critical administrative messages.

  • FATAL messages indicate a problem which has killed the Squid process, affecting all client traffic being supplied by that Squid instance.

    • Obviously if these occur when starting or configuring a Squid component it must be resolved before you can run Squid.

  • ERROR messages indicate a serious problem which has broken an individual client transaction and may have some indirect effect on other clients, but which has not completely aborted all traffic service.

    • These can also occur when starting or configuring Squid components. In that case, any service actions which that component would have supplied will not happen until the problem is resolved and Squid is reconfigured.
    • NOTE: Some log level 0 error messages inherited from older Squid versions exist without any prioritization tag.
  • WARNING messages indicate problems which might be causing problems for the client, but which Squid is capable of working around automatically. These usually only display at log level 1 and higher.

    • NOTE: Some log level 1 warning messages inherited from older Squid versions exist without any prioritization tag.
  • SECURITY ERROR messages indicate problems processing a client request with the security controls which Squid has been configured with. Some impossible condition is required to pass the security test.

    • This is commonly seen when testing whether to accept a client request based on some reply detail which will only be available in the future.

  • SECURITY ALERT messages indicate that a security attack has been detected. This is only for problems which are unambiguous. Attack signatures which can also appear in normal traffic are logged as regular WARNINGs.

    • A complete solution to these usually requires fixing the client, which may not be possible.
    • Administrative workarounds (extra firewall rules etc) can assist Squid in reducing the damage to network performance.
    • Attack notices may seem rather critical, but occur at level 1 since in all cases Squid also has some workaround it can perform.
  • SECURITY NOTICE messages can appear during startup and reconfigure to indicate security related problems with the configuration file setting. These are accompanied by hints for better configuration where possible, and an indication of what Squid is going to do instead of the configured action.

Some of the more frequently questioned messages and what they mean are outlined in the KnowledgeBase.

access.log

Most log file analysis programs are based on the entries in access.log.

Squid-2.7 and Squid-3.2 allow administrators to configure their logfile format and log output method with great flexibility. Previous versions offered much more limited functionality.
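
For example, with those versions an Apache "combined"-style log can be produced alongside (or instead of) the native format. This sketch follows the commonly documented pattern; verify the format codes against your version's squid.conf.documented:

logformat combined %>a %[ui %[un [%tl] "%rm %ru HTTP/%rv" %>Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
access_log /var/log/squid/access.log combined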

Squid result codes

The Squid result code is composed of several tags (separated by underscore characters) which describe the response sent to the client.

  • One of these tags always exists to describe how it was delivered:

    TCP

    Requests on the HTTP port (usually 3128).

    UDP

    Requests on the ICP port (usually 3130) or HTCP port (usually 4128). If ICP logging was disabled using the log_icp_queries option, no ICP replies will be logged.

    NONE

    Squid delivered an unusual response or no response at all. Seen with cachemgr requests and errors, usually when the transaction fails before being classified into one of the above outcomes. Also seen with responses to CONNECT requests.

  • These tags are optional and describe why the particular handling was performed or where the request came from:

    CF

    At least one request in this transaction was collapsed. See collapsed_forwarding for more details about request collapsing. Support for this tag has been added to Squid v5 on 2018-06-18 (commit d2a6dc). It may not be available in earlier Squid versions.

    CLIENT

    The client request placed limits affecting the response. Usually seen when the client issued a "no-cache" or analogous cache-control command along with the request. Thus, the cache has to validate the object.

    IMS

    The client sent a revalidation (conditional) request.

    ASYNC

    The request was generated internally by Squid. Usually this is background fetches for cache information exchanges, background revalidation from stale-while-revalidate cache controls, or ESI sub-objects being loaded.

    SWAPFAIL

    The object was believed to be in the cache, but could not be accessed. A new copy was requested from the server.

    REFRESH

    A revalidation (conditional) request was sent to the server.

    SHARED

    This tag is not supported yet. This request was combined with an existing transaction by collapsed forwarding. NOTE: the existing request is not marked as SHARED.

    REPLY

    The HTTP reply from the server or peer. Usually seen on DENIED due to http_reply_access ACLs preventing delivery of the server's response object to the client.

  • These tags are optional and describe what type of object was produced:

    NEGATIVE

    Only seen on HIT responses, indicating the response was a cached error response, e.g. "404 Not Found".

    STALE

    The object was cached and served stale. This is usually caused by stale-while-revalidate or stale-if-error cache controls.

    OFFLINE

    The requested object was retrieved from the cache during offline_mode. The offline mode never validates any object.

    INVALID

    An invalid request was received. An error response was delivered indicating what the problem was.

    FAIL

    Only seen on REFRESH to indicate the revalidation request failed. The response object may be the server provided network error or the stale object which was being revalidated depending on stale-if-error cache control.

    MODIFIED

    Only seen on REFRESH responses to indicate revalidation produced a new modified object.

    UNMODIFIED

    Only seen on REFRESH responses to indicate revalidation produced a 304 (Not Modified) status. The client gets either a full 200 (OK), a 304 (Not Modified), or (in theory) another response, depending on the client request and other details.

    REDIRECT

    Squid generated an HTTP redirect response to this request. Only on Squid-3.2+ or Squid built with -DLOG_TCP_REDIRECTS compiler flag.

  • These tags are optional and describe whether the response was loaded from cache, network, or otherwise:

    HIT

    The response object delivered was the local cache object.

    MEM

    Additional tag indicating the response object came from memory cache, avoiding disk accesses. Only seen on HIT responses.

    MISS

    The response object delivered was the network response object.

    DENIED

    The request was denied by access controls.

    NOFETCH

    An ICP-specific type, indicating that the service is alive but not to be used for this request. During "-Y" startup, or during frequent failures, a cache in hit-only mode will return either UDP_HIT or UDP_MISS_NOFETCH. Neighbours will thus only fetch hits.

    TUNNEL

    A binary tunnel was established for this transaction. Only on Squid-3.5+

  • These tags are optional and describe some error conditions which occurred during response delivery (if any):

    ABORTED

    The response was not completed due to the connection being aborted (usually by the client).

    TIMEOUT

    The response was not completed due to a connection timeout.

    IGNORED

    While refreshing a previously cached response A, Squid got a response B that was older than A (as determined by the Date header field). Squid ignored response B (and attempted to use A instead). This "ignore older responses" logic complies with RFC 7234 Section 4 requirement: a cache MUST use the most recent response (as determined by the Date header field).

HTTP status codes

These are taken from RFC 1945 (HTTP/1.0) and RFC 2616 (HTTP/1.1) and verified for Squid. Squid uses almost all codes except 416 (Request Range Not Satisfiable). Extra codes used in the Squid logs (but not in live traffic) include 000 for a result code being unavailable, and 600 to signal an invalid header, a proxy error. Also, some definitions were added per RFC 2518 and RFC 4918 (WebDAV). Yes, there are really two entries for status code 424:

 Status  Description                                          RFC(s)
 ------  ---------------------------------------------------  ----------------
 000     Used mostly with UDP traffic.                        N/A

 Informational
 100     Continue                                             2616
 101     Switching Protocols                                  2616
 102     Processing                                           2518

 Successful Transaction
 200     OK                                                   1945, 2616
 201     Created                                              1945, 2616
 202     Accepted                                             1945, 2616
 203     Non-Authoritative Information                        2616
 204     No Content                                           1945, 2616, 4918
 205     Reset Content                                        2616
 206     Partial Content                                      2616
 207     Multi Status                                         2518, 4918

 Redirection
 300     Multiple Choices                                     1945, 2616, 4918
 301     Moved Permanently                                    1945, 2616, 4918
 302     Moved Temporarily                                    1945, 2616, 4918
 303     See Other                                            2616, 4918
 304     Not Modified                                         1945, 2616
 305     Use Proxy                                            2616, 4918
 307     Temporary Redirect                                   2616, 4918

 Client Error
 400     Bad Request                                          1945, 2616, 4918
 401     Unauthorized                                         1945, 2616
 402     Payment Required                                     2616
 403     Forbidden                                            1945, 2616, 4918
 404     Not Found                                            1945, 2616
 405     Method Not Allowed                                   2616
 406     Not Acceptable                                       2616
 407     Proxy Authentication Required                        2616
 408     Request Timeout                                      2616
 409     Conflict                                             2616, 4918
 410     Gone                                                 2616
 411     Length Required                                      2616
 412     Precondition Failed                                  2616, 4918
 413     Request Entity Too Large                             2616
 414     Request URI Too Large                                2616, 4918
 415     Unsupported Media Type                               2616
 416     Request Range Not Satisfiable                        2616
 417     Expectation Failed                                   2616
 422     Unprocessable Entity                                 2518, 4918
 424     Locked                                               (broken WebDAV implementations??)
 424     Failed Dependency                                    2518, 4918
 433     Unprocessable Entity

 Server Errors
 500     Internal Server Error                                1945, 2616
 501     Not Implemented                                      1945, 2616
 502     Bad Gateway                                          1945, 2616
 503     Service Unavailable                                  1945, 2616
 504     Gateway Timeout                                      2616
 505     HTTP Version Not Supported                           2616
 507     Insufficient Storage                                 2518, 4918

 Broken Server Software
 600     Squid: header parsing error
 601     Squid: header size overflow detected while parsing
 601     roundcube: software configuration error
 603     roundcube: invalid authorization

Request methods

Squid recognizes several request methods as defined in RFC 2616, plus the RFC 2518 "HTTP Extensions for Distributed Authoring -- WEBDAV" extensions.

 method    defined    cachabil.  meaning
 --------- ---------- ---------- -------------------------------------------
 GET       HTTP/0.9   possibly   object retrieval and simple searches.
 HEAD      HTTP/1.0   possibly   metadata retrieval.
 POST      HTTP/1.0   CC or Exp. submit data (to a program).
 PUT       HTTP/1.1   never      upload data (e.g. to a file).
 DELETE    HTTP/1.1   never      remove resource (e.g. file).
 TRACE     HTTP/1.1   never      appl. layer trace of request route.
 OPTIONS   HTTP/1.1   never      request available comm. options.
 CONNECT   HTTP/1.1r3 never      tunnel SSL connection.
 ICP_QUERY Squid      never      used for ICP based exchanges.
 PURGE     Squid      never      remove object from cache.
 PROPFIND  rfc2518    ?          retrieve properties of an object.
 PROPPATCH rfc2518    ?          change properties of an object.
 MKCOL     rfc2518    never      create a new collection.
 COPY      rfc2518    never      create a duplicate of src in dst.
 MOVE      rfc2518    never      atomically move src to dst.
 LOCK      rfc2518    never      lock an object against modifications.
 UNLOCK    rfc2518    never      unlock an object.

In the cachability column, "possibly" means cachable subject to the normal freshness rules, and "CC or Exp." means cachable only when the response carries explicit freshness information (Cache-Control or Expires). Note that since Squid 3.1, methods not listed here (such as PATCH) are supported "out of the box."

Hierarchy Codes

NONE For TCP HIT, TCP failures, cachemgr requests and all UDP requests, there is no hierarchy information.

DIRECT The object was fetched from the origin server.

SIBLING_HIT The object was fetched from a sibling cache which replied with UDP_HIT.

PARENT_HIT The object was requested from a parent cache which replied with UDP_HIT.

DEFAULT_PARENT No ICP queries were sent. This parent was chosen because it was marked "default" in the config file.

SINGLE_PARENT The object was requested from the only parent appropriate for the given URL.

FIRST_UP_PARENT The object was fetched from the first parent in the list of parents.

NO_PARENT_DIRECT The object was fetched from the origin server, because no parents existed for the given URL.

FIRST_PARENT_MISS The object was fetched from the parent with the fastest (possibly weighted) round trip time.

CLOSEST_PARENT_MISS This parent was chosen, because it included the lowest RTT measurement to the origin server. See also the closest-only peer configuration option.

CLOSEST_PARENT The parent selection was based on our own RTT measurements.

CLOSEST_DIRECT Our own RTT measurements returned a shorter time than any parent.

NO_DIRECT_FAIL The object could not be requested because of a firewall configuration, see also never_direct and related material, and no parents were available.

SOURCE_FASTEST The origin site was chosen, because the source ping arrived fastest.

ROUNDROBIN_PARENT No ICP replies were received from any parent. The parent was chosen, because it was marked for round robin in the config file and had the lowest usage count.

CACHE_DIGEST_HIT The peer was chosen, because the cache digest predicted a hit. This option was later replaced in order to distinguish between parents and siblings.

CD_PARENT_HIT The parent was chosen, because the cache digest predicted a hit.

CD_SIBLING_HIT The sibling was chosen, because the cache digest predicted a hit.

NO_CACHE_DIGEST_DIRECT This output seems to be unused?

CARP The peer was selected by CARP.

PINNED The server connection was pinned by NTLM or Negotiate authentication requirements.

ORIGINAL_DST The server connection was limited to the client provided destination IP. This occurs on interception proxies when Host security is enabled, or client_dst_passthru transparency is enabled.

ANY_OLD_PARENT (former ANY_PARENT?) Squid used the first considered-alive parent it could reach. This happens when none of the specific parent cache selection algorithms (e.g., userhash or carp) were enabled, all enabled algorithms failed to find a suitable parent, or all suitable parents found by those algorithms failed when Squid tried to forward the request to them.

INVALID CODE part of src/peer_select.c:hier_strings[].

Almost any of these may be preceded by 'TIMEOUT_' if the two-second (default) timeout occurs waiting for all ICP replies to arrive from neighbors, see also the icp_query_timeout configuration option.
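
For example, the ICP wait time can be tuned in squid.conf with the directive mentioned above; the value is in milliseconds, and the one shown here is purely illustrative:

icp_query_timeout 3000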

The following hierarchy codes were removed from Squid-2:

code                  meaning
--------------------  -------------------------------------------------
PARENT_UDP_HIT_OBJ    hit objects are no longer available.
SIBLING_UDP_HIT_OBJ   hit objects are no longer available.
SSL_PARENT_MISS       SSL can now be handled by squid.
FIREWALL_IP_DIRECT    No special logging for hosts inside the firewall.
LOCAL_IP_DIRECT       No special logging for local networks.

store.log

This file covers the objects currently kept on disk as well as recently removed ones. As a kind of transaction log (or journal) it is usually used for debugging purposes. A definitive statement about whether an object resides on your disks is only possible after analyzing the complete log file: the release (deletion) of an object may be logged later than the swap out (save to disk).

The store.log file may be of interest for log file analysis that looks into the objects on your disks and the time they spend there, or how many times a hot object was accessed (the latter may be covered by another log file, too). With knowledge of the cache_dir configuration option, this log file allows for a URL-to-filename mapping without recursing through your cache disks. However, the Squid developers recommend treating store.log primarily as a debug file, and so should you, unless you know what you are doing.

The print format for a store log entry (one line) consists of thirteen space-separated columns, compare with the storeLog() function in file src/store_log.c:

%9ld.%03d %-7s %02d %08X %s %4d %9ld %9ld %9ld %s %ld/%ld %s %s
  1. time The timestamp when the line was logged in UTC with a millisecond fraction.

  2. action The action the object was submitted to, compare with src/store_log.c:

    • CREATE Seems to be unused.

    • RELEASE The object was removed from the cache (see also file number below).

    • SWAPOUT The object was saved to disk.

    • SWAPIN The object existed on disk and was read into memory.

  3. dir number The cache_dir number this object was stored into, starting at 0 for your first cache_dir line.

  4. file number The file number for the object storage file. Please note that the path to this file is calculated according to your cache_dir configuration. A file number of FFFFFFFF indicates "memory only" objects. Any action code for such a file number refers to an object which existed only in memory, not on disk. For instance, if a RELEASE code was logged with file number FFFFFFFF, the object existed only in memory, and was released from memory.

  5. hash The hash value used to index the object in the cache. Squid currently uses MD5 for the hash value.

  6. status The HTTP reply status code.

  7. datehdr The value of the HTTP Date reply header.

  8. lastmod The value of the HTTP Last-Modified reply header.

  9. expires The value of the HTTP "Expires: " reply header.

  10. type The HTTP Content-Type major value, or "unknown" if it cannot be determined.

  11. sizes This column consists of two slash separated fields:

    • The advertised content length from the HTTP Content-Length reply header.

    • The size actually read.
      • If the advertised (or expected) length is missing, it will be set to zero. If the advertised length is not zero, but not equal to the real length, the object will be released from the cache.
  12. method The request method for the object, e.g. GET.

  13. key The key to the object, usually the URL.

    • The datehdr, lastmod, and expires values are all expressed in UTC seconds. The actual values are parsed from the HTTP reply headers. An unparsable header is represented by a value of -1, and a missing header is represented by a value of -2.
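
For example, a quick way to list the URLs of all objects written to disk, based on the thirteen-column format above (column 2 is the action, column 13 is the key):

awk '$2 == "SWAPOUT" {print $13}' store.log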

swap.state

This file has a rather unfortunate history which has led to it often being called the swap log. It is in fact a journal of the cache index with a record of every cache object written to disk. It is read when Squid starts up to "reload" the cache quickly.

If you remove this file when squid is NOT running, you will effectively wipe out your index of the cache contents. Squid can rebuild it from the original files, but that procedure can take a long time as every file in the cache must be fully scanned for metadata.

If you remove this file while squid IS running, you can easily recreate it. The safest way is to simply shutdown the running process:

% squid -k shutdown

This will disrupt service, but at least you will have your swap log back. Alternatively, you can tell squid to rotate its log files. This also causes a clean swap log to be written.

% squid -k rotate

By default the swap.state file is stored in the top-level of each cache_dir. You can move the logs to a different location with the cache_swap_state option.

The file is a binary format that includes MD5 checksums, and StoreEntry fields. Please see the Programmers' Guide for information on the contents and format of that file.

squid.out

If you run your Squid from the RunCache script, a file squid.out contains the Squid startup times, and also all fatal errors, e.g. as produced by an assert() failure. If you are not using RunCache, you will not see such a file.

  • /!\ RunCache has been obsoleted since Squid-2.6. Modern Squid runs as a daemon and usually logs this output to the system syslog facility or, if run manually, to stdout of the account which operates the master daemon process.

useragent.log

  • /!\ Starting from Squid-3.2 this log has become one of the default access.log formats and is always available for use. It is no longer a special separate log file.

The user agent log file is only maintained if

  • you configured the compile time --enable-useragent-log option, and

  • you pointed the useragent_log configuration option to a file.

From the user agent log file you are able to find out about the distribution of browsers among your clients. Using this option in conjunction with a loaded production squid might not be the best of all ideas.

Which log files can I delete safely?

You should never delete access.log, store.log, or cache.log while Squid is running. On Unix, you can delete a file while a process has the file open; however, the filesystem space is not reclaimed until the process closes the file.

If you accidentally delete swap.state while Squid is running, you can recover it by following the instructions in the previous question. If you delete the others while Squid is running, you cannot recover them.

The correct way to maintain your log files is with Squid's "rotate" feature. You should rotate your log files at least once per day. The current log files are closed and then renamed with numeric extensions (.0, .1, etc). If you want to, you can write your own scripts to archive or remove the old log files. If not, Squid will only keep up to logfile_rotate versions of each log file. The logfile rotation procedure also writes a clean swap.state file, but it does not leave numbered versions of the old files.

If you set logfile_rotate to 0, Squid simply closes and then re-opens the logs. This allows third-party logfile management systems, such as newsyslog, to maintain the log files.

To rotate Squid's logs, simply use this command:

squid -k rotate

For example, use this cron entry to rotate the logs at midnight:

0 0 * * * /usr/local/squid/bin/squid -k rotate

How can I disable Squid's log files?

To disable access.log:

access_log none

To disable store.log:

cache_store_log none

To disable cache.log:

cache_log /dev/null

<!>

It is a bad idea to disable the cache.log because this file contains many important status and debugging messages. However, if you really want to, you can use the cache_log /dev/null directive shown above.

/!\

If /dev/null is specified for any of the above log files, logfile_rotate MUST also be set to 0, or else you risk Squid rotating away /dev/null and turning it into a plain log file.

{i}

Instead of disabling the log files, it is advisable to use a smaller value for logfile_rotate and to rotate Squid's log files properly from your cron. That way, your log files are more controllable and self-maintained by your system.

What is the maximum size of access.log?

Squid does not impose a size limit on its log files. Some operating systems have a maximum file size limit, however. If a Squid log file exceeds the operating system's size limit, Squid receives a write error and shuts down. You should regularly rotate Squid's log files so that they do not become very large.

/!\

Logging is very important to Squid. In fact, it is so important that it will shut itself down if it can't write to its logfiles. This includes cases such as a full log disk, or logfiles getting too big.

My log files get very big!

You need to rotate your log files with a cron job. For example:

0 0 * * * /usr/local/squid/bin/squid -k rotate

When debug information is being logged, cache.log can easily become extremely large, and when a long access.log traffic history is required (i.e. by law in some countries), storing equally large cache.log files for that time is not reasonable. From Squid-3.2, cache.log can be rotated with an individual cap set by the debug_options rotate=N option, to store fewer of these large files in the .0 to .N series of backups. The default is to store the same number as for access.log, set in the logfile_rotate directive.

I want to use another tool to maintain the log files.

If you set logfile_rotate to 0, Squid simply closes and then re-opens the logs. This allows third-party logfile management systems, such as newsyslog or logrotate, to maintain the log files.

Squid-2.7 and Squid-3.2 and later also provide modular logging outputs which provide flexibility for sending log data to alternative logging systems.
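
As a sketch of the modular syntax (module:place, optionally followed by a logformat name; the paths and addresses here are illustrative, and the available modules vary by version):

access_log daemon:/usr/local/squid/logs/access.log squid
access_log udp://203.0.113.9:514 squid
access_log tcp://203.0.113.9:5514 squid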

Managing log files

The preferred log file for analysis is the access.log file in native format. For long term evaluations, the log file should be obtained at regular intervals. Squid offers an easy to use API for rotating log files, in order that they may be moved (or removed) without disturbing the cache operations in progress. The procedures were described above.

Depending on the disk space allocated for log file storage, it is recommended to set up a cron job which rotates the log files every 24, 12, or 8 hours. You will need to set your logfile_rotate to a sufficiently large number. During a time of some idleness, you can safely transfer the log files to your analysis host in one burst.
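
For example, a crontab entry for 8-hourly rotation might look like this (installation path as used elsewhere in this FAQ):

0 */8 * * * /usr/local/squid/bin/squid -k rotate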

Before transport, the log files can be compressed during off-peak time. On the analysis host, the log files are concatenated into one file, yielding one file per 24 hours. Also note that with log_icp_queries enabled, you might have around 1 GB of uncompressed log information per day and busy cache. Look at your cache manager info page to make an educated guess about the size of your log files.

The EU project DESIRE developed some basic rules to obey when handling and processing log files:

  • Respect the privacy of your clients when publishing results.
  • Keep logs unavailable unless anonymized. Most countries have laws on privacy protection, and some even on how long you are legally allowed to keep certain kinds of information.
  • Rotate and process log files at least once a day. Even if you don't process the log files, they will grow quite large; see My log files get very big above. If you rely on processing the log files, reserve a large enough partition solely for log files.

  • Keep the size in mind when processing. It might take longer to process log files than to generate them!
  • Limit yourself to the numbers you are interested in. There is data beyond your dreams available in your log file, some quite obvious, others by combination of different views. Here are some examples for figures to watch:
    • The hosts using your cache.
    • The elapsed time for HTTP requests - this is the latency the user sees. Usually, you will want to make a distinction for HITs and MISSes and overall times. Also, medians are preferred over averages.
    • The requests handled per interval (e.g. second, minute or hour).

Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?

This message means that the requested object was in "Delete Behind" mode and the user aborted the transfer. An object will go into "Delete Behind" mode if

  • It is larger than maximum_object_size

  • It is being fetched from a neighbor which has the proxy-only option set.

What does ERR_LIFETIME_EXP mean?

This means that a timeout occurred while the object was being transferred. Most likely the retrieval of this object was very slow (or it stalled before finishing) and the user aborted the request. However, depending on your settings for quick_abort, Squid may have continued trying to retrieve the object. Squid imposes a maximum amount of time on all open sockets, so after some amount of time the stalled request was aborted and logged with an ERR_LIFETIME_EXP message.

Retrieving "lost" files from the cache

"I've been asked to retrieve an object which was accidentally destroyed at the source for recovery. So, how do I figure out where the things are so I can copy them out and strip off the headers?""

The following method applies only to the Squid-1.1 versions:

Use grep to find the named object (URL) in the cache swap log file (cache/log in Squid-1.1; not to be confused with cache.log). The first field in this file is an integer file number.

Then, find the file fileno-to-pathname.pl from the "scripts" directory of the Squid source distribution. The usage is

perl fileno-to-pathname.pl [-c squid.conf]

File numbers are read on stdin, and pathnames are printed on stdout.

Can I use store.log to figure out if a response was cachable?

Sort of. You can use store.log to find out if a particular response was cached.

Cached responses are logged with the SWAPOUT tag. Uncached responses are logged with the RELEASE tag.

However, your analysis must also consider that when a cached response is removed from the cache (for example due to cache replacement) it is also logged in store.log with the RELEASE tag. To differentiate the two, you can look at the file number (4th) field: when an uncachable response is released, the file number is FFFFFFFF (-1); any other file number indicates that a cached response was released.
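
As a sketch, this one-liner tallies the two kinds of RELEASE entries, using the column positions described in the store.log section (action in column 2, file number in column 4):

awk '$2 == "RELEASE" {if ($4 == "FFFFFFFF") u++; else c++} END {print u+0 " uncachable, " c+0 " evicted"}' store.log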

Can I pump the squid access.log directly into a pipe?

Several people have asked for this, usually to feed the log into some kind of external database, or to analyze them in real-time.

The answer is No. Well, yes, sorta. Using a pipe directly opens up a whole load of possible problems.

/!\

Logging is very important to Squid. In fact, it is so important that it will shut itself down if it can't write to its logfiles.

There are several alternatives which are much safer to set up and use. The basic capabilities are:

since Squid-2.6:

  • logging to system syslog

since Squid-2.7:

  • logging to an external service via UDP packets
  • logging through IPC to a custom local daemon

since Squid-3.2:

  • logging to an external service via TCP streams

See the Log Modules feature for technical details on setting up a daemon or other output modules.


How do I see system level Squid statistics?

The Squid distribution includes a CGI utility called cachemgr.cgi which can be used to view squid statistics with a web browser. See ../CacheManager for more information on its usage and installation.

Managing the Cache Storage

How can I make Squid NOT cache some servers or URLs?

From Squid-2.6, you use the cache option to specify uncachable requests and any exceptions to your cachable rules.

For example, this makes all responses from origin servers in the 10.0.1.0/24 network uncachable:

acl localnet dst 10.0.1.0/24
cache deny localnet

This example makes all URL's with '.html' uncachable:

acl HTML url_regex .html$
cache deny HTML

This example makes a specific URL uncachable:

acl XYZZY url_regex ^http://www.i.suck.com/foo.html$
cache deny XYZZY

This example caches nothing between the hours of 8AM to 11AM:

acl Morning time 08:00-11:00
cache deny Morning

How can I purge an object from my cache?

Squid does not allow you to purge objects unless it is configured with access controls in squid.conf. First you must add something like

acl PURGE method PURGE
acl localhost src 127.0.0.1
http_access allow PURGE localhost
http_access deny PURGE

The above only allows purge requests which come from the local host and denies all other purge requests.

To purge an object, you can use the squidclient program:

squidclient -m PURGE http://www.miscreant.com/

If the purge was successful, you will see a "200 OK" response:

HTTP/1.0 200 OK
Date: Thu, 17 Jul 1997 16:03:32 GMT
...

Sometimes if the object was not found in the cache, you will see a "404 Not Found" response:

HTTP/1.0 404 Not Found
Date: Thu, 17 Jul 1997 16:03:22 GMT
...

Such 404s are not failures. It simply means the object had already been purged by other means or never existed, so the final result you wanted (object no longer in the cache) has been achieved.

How can I purge multiple objects from my cache?

It's not possible; you have to purge the objects one by one by URL. This is because squid doesn't keep in memory the URL of every object it stores, but only a compact representation of it (a hash). Finding the hash given the URL is easy; the other way around is not possible.

Purging by wildcard, by domain, by time period, etc. are unfortunately not possible at this time.
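
If you have a list of exact URLs to remove, you can still script the one-by-one purges. A minimal sketch, assuming a file urls.txt with one URL per line and the PURGE access controls shown above:

while read url; do squidclient -m PURGE "$url"; done < urls.txt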

How can I find the biggest objects in my cache?

sort -rn -k 5,5 access.log | awk '{print $5, $7}' | head -25

If your cache processes several hundred hits per second, good luck.

How can I add a cache directory?

  1. Edit squid.conf and add a new cache_dir line (see the example after this list).

  2. Shutdown Squid  squid -k shutdown 

  3. Initialize the new directory by running

     squid -z 
  4. Start Squid again
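
For step 1, a cache_dir line might look like this (ufs storage with the conventional 16 first-level and 256 second-level directories; the path and the 10000 MB size are purely illustrative):

cache_dir ufs /usr/local/squid/cache2 10000 16 256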

How can I delete a cache directory?

  • {i} If you don't have any cache_dir lines in your squid.conf, then Squid was using the default. From Squid-3.1 the default has been changed to memory-only cache and does not involve cache_dir.

    For Squid older than 3.1 using the default you'll need to add a new cache_dir line because Squid will continue to use the default otherwise. You can add a small, temporary directory, for example:

    /usr/local/squid/cachetmp ....
    see above about creating a new cache directory.

    /!\ do not use /tmp !! That will cause Squid to periodically encounter fatal errors.

The removal:

  1. Edit your squid.conf file and comment out, or delete the cache_dir line for the cache directory that you want to remove.

  2. You can not delete a cache directory from a running Squid process; you can not simply reconfigure squid.
  3. You must shutdown Squid:

    squid -k shutdown
  4. Once Squid exits, you may immediately start it up again.

Since you deleted the old cache_dir from squid.conf, Squid won't try to access that directory. If you use the RunCache script, Squid should start up again automatically.

Now Squid is no longer using the cache directory that you removed from the config file. You can verify this by checking "Store Directory" information with the cache manager. From the command line, type:

squidclient mgr:storedir

I want to restart Squid with a clean cache

Squid-2.6 and later contain mechanisms which will automatically detect dirty information in both the cache directories and swap.state file. When squid starts up it runs these validation and security checks. The objects which fail for any reason are automatically purged from the cache.

The above mechanisms can be triggered manually to force squid into a full cache_dir scan and re-load of all objects from disk: simply shut down Squid and delete the swap.state journal from each cache_dir before restarting.

  • NP: Deleting the swap.state before shutting down will cause Squid to generate new ones and fail to do the re-scan you wanted.

I want to restart Squid with an empty cache

To erase the entire contents of the cache and make Squid start fresh the following commands provide the fastest recovery time:

 squid -k shutdown
 mv /dir/cache /dir/cache.old

repeat for each cache_dir location you wish to empty.

 squid -z
 squid
 rm -rf /dir/cache.old

The rm command may take some time, but since Squid is already back up and running the service downtime is reduced.

Using ICMP to Measure the Network

As of version 1.1.9, Squid is able to utilize ICMP Round-Trip-Time (RTT) measurements to select the optimal location to forward a cache miss. Previously, cache misses would be forwarded to the parent cache which returned the first ICP reply message. These were logged with FIRST_PARENT_MISS in the access.log file. Now we can select the parent which is closest (RTT-wise) to the origin server.

Supporting ICMP in your Squid cache

It is more important that your parent caches enable the ICMP features. If you are acting as a parent, then you may want to enable ICMP on your cache. Also, if your cache makes RTT measurements, it will fetch objects directly if your cache is closer than any of the parents.

If you want your Squid cache to measure RTT's to origin servers, Squid must be compiled with the USE_ICMP option. This is easily accomplished by uncommenting "-DUSE_ICMP=1" in src/Makefile and/or src/Makefile.in.

An external program called pinger is responsible for sending and receiving ICMP packets. It must run with root privileges. After Squid has been compiled, the pinger program must be installed separately. A special Makefile target will install pinger with appropriate permissions.

% make install
% su
# make install-pinger

There are three configuration file options for tuning the measurement database on your cache. netdb_low and netdb_high specify high and low water marks for keeping the database to a certain size (e.g. just like with the IP cache). The netdb_ttl option specifies the minimum rate for pinging a site. If netdb_ttl is set to 300 seconds (5 minutes) then an ICMP packet will not be sent to the same site more than once every five minutes. Note that a site is only pinged when an HTTP request for the site is received.
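
A sketch using the directives named above (the values are illustrative, and the exact directive names and defaults vary by Squid version; check your squid.conf documentation):

netdb_low  900
netdb_high 1000
netdb_ttl  5 minutes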

Another option, minimum_direct_hops can be used to try finding servers which are close to your cache. If the measured hop count to the origin server is less than or equal to minimum_direct_hops, the request will be forwarded directly to the origin server.

Utilizing your parents database

Your parent caches can be asked to include the RTT measurements in their ICP replies. To do this, you must enable query_icmp in your config file:

query_icmp on

This causes a flag to be set in your outgoing ICP queries.

If your parent caches return ICMP RTT measurements then the eighth column of your access.log will have lines similar to:

CLOSEST_PARENT_MISS/it.cache.nlanr.net

In this case, it means that it.cache.nlanr.net returned the lowest RTT to the origin server. If your cache measured a lower RTT than any of the parents, the request will be logged with

CLOSEST_DIRECT/www.sample.com

Inspecting the database

The measurement database can be viewed from the cachemgr by selecting "Network Probe Database." Hostnames are aggregated into /24 networks. All measurements made are averaged over time. Measurements are made to specific hosts, taken from the URLs of HTTP requests. The recv and sent fields are the number of ICMP packets sent and received. At this time they are only informational.

A typical database entry looks something like this:

    Network              recv/sent     RTT  Hops  Hostnames
    192.41.10.0            20/  21    82.3   6.0  www.jisedu.org www.dozo.com
      bo.cache.nlanr.net              42.0   7.0
      uc.cache.nlanr.net              48.0  10.0
      pb.cache.nlanr.net              55.0  10.0
      it.cache.nlanr.net             185.0  13.0

This means we have sent 21 pings to both www.jisedu.org and www.dozo.com. The average RTT is 82.3 milliseconds. The next four lines show the measured values from our parent caches. Since bo.cache.nlanr.net has the lowest RTT, it would be selected as the location to forward a request for a www.jisedu.org or www.dozo.com URL.

Why are so few requests logged as TCP_IMS_MISS?

When Squid receives an If-Modified-Since request, it will not forward the request unless the object needs to be refreshed according to the refresh_pattern rules. If the request does need to be refreshed, then it will be logged as TCP_REFRESH_HIT or TCP_REFRESH_MISS.

If the request is not forwarded, Squid replies to the IMS request according to the object in its cache. If the modification times are the same, then Squid returns TCP_IMS_HIT. If the modification times are different, then Squid returns TCP_IMS_MISS. In most cases, the cached object will not have changed, so the result is TCP_IMS_HIT. Squid will only return TCP_IMS_MISS if some other client causes a newer version of the object to be pulled into the cache.

Why do I need to run Squid as root? why can't I just use cache_effective_user root?

  • by Antony Stone and Dave J Woolley

Why run the parent squid process as root and the child as user proxy? Is that normal? Is it best practice? Should I chmod or chown cache and other directories?

It is completely normal for a great many applications providing network services, and yes, it is best practice. In fact, some will not allow you to run them as root without an unprivileged user to run the main process as. This applies to all programs that don't absolutely need root status, not just squid.

The reasoning is simple:

  1. You need root privileges to do certain things when you start an application (such as bind to a network socket, open a log file, perhaps read a configuration file), therefore it starts as root.

  2. Any application might contain bugs which lead to security vulnerabilities, which can be remotely exploited through the network connection, and until the bugs are fixed, you at least want to minimise the risk presented by them.

  3. Therefore, as soon as you've done all the things involved in (1) above, you drop the privilege level of the application, and/or spawn a child process with reduced privilege, so that it still runs and does everything you need, but if a vulnerability is exploited, it no longer has root privilege and therefore cannot cause as much damage as it might have done.

Squid does this with cache_effective_user. The coordinator (daemon manager) process must be run as 'root' in order to setup the administrative details and will downgrade its privileges to the cache_effective_user account before running any of the more risky network operations.

If the cache_effective_group is configured Squid will drop additional group privileges and run as only the user:group specified.

The -N command line option makes Squid run without spawning low-privileged child processes for safe networking. When this option is used, the Squid main process will drop its privileges down to the cache_effective_user account but will try to retain some means of regaining root privileges for reconfiguration. Some components which rely on the more dangerous root privileges cannot be altered with just a reconfigure and will need a full restart.

Can you tell me a good way to upgrade Squid with minimal downtime?

Here is a technique that was described by Radu Greab.

Start a second Squid server on an unused HTTP port (say 4128). This instance of Squid probably doesn't need a large disk cache. When this second server has finished reloading the disk store, swap the http_port values in the two squid.conf files: set the original Squid to use port 4128, and the second one to use 3128. Next, run "squid -k reconfigure" for both Squids. New requests will go to the second Squid, now on port 3128, and the first Squid will finish handling its current requests. After a few minutes, it should be safe to fully shut down the first Squid and upgrade it. Later you can simply repeat this process in reverse.
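
A minimal sketch of the reconfigure step, assuming the second instance reads its own configuration file at an illustrative path:

% squid -k reconfigure
% squid -f /usr/local/squid/etc/squid2.conf -k reconfigure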

Can Squid listen on more than one HTTP port?

Note: The information here is current for version 2.3.

Yes, you can specify multiple http_port lines in your squid.conf file. Squid attempts to bind() to each port that you specify. Sometimes Squid may not be able to bind to a port, either because of permissions or because the port is already in use. If Squid can bind to at least one port, then it will continue running. If it can not bind to any of the ports, then Squid stops.

With version 2.3 and later you can specify IP addresses and port numbers together (see the squid.conf comments).

Can I make origin servers see the client's IP address when going through Squid?

Normally you cannot. Most TCP/IP stacks do not allow applications to create sockets with the local endpoint assigned to a foreign IP address. However, some folks have patches to Linux that allow exactly that.

In this situation, you must ensure that all HTTP packets destined for the client IP addresses are routed to the Squid box. If the packets take another path, the real clients will send TCP resets to the origin servers, thereby breaking the connections.


Why does Squid use so much memory!?

Squid uses a lot of memory for performance reasons. It takes much, much longer to read something from disk than it does to read directly from memory.

A small amount of metadata for each cached object is kept in memory: the StoreEntry data structure. This is 56 bytes on 32-bit architectures and 88 bytes on 64-bit architectures. In addition, there is a 16-byte cache key (MD5 checksum) associated with each StoreEntry. This means there are 72 or 104 bytes of metadata in memory for every object in your cache. A cache with 1,000,000 objects therefore requires 72 MB (32-bit) or 104 MB (64-bit) of memory for metadata alone. In practice it requires much more than that.

Uses of memory by Squid include:

reason                                                            parameter        explanation
----------------------------------------------------------------  ---------------  -----------
Disk buffers for reading and writing                               -                -
Network I/O buffers                                                read_ahead_gap*  D
IP Cache contents                                                  ipcache_size     DNS
FQDN Cache contents                                                fqdncache_size   DNS
Netdb ICMP measurement database                                    -                N
Per-request state information, including full request             -                D
  and reply headers
Miscellaneous statistics collection                                -                D
Index of on-disk cache (metadata, kept in memory)                  cache_dir        I
In-memory cache with "hot objects"                                 cache_mem        M+I

Explanation of letters:

letter  explanation
------  ---------------------------------------------------------------------
D       dynamic; more memory is used if more users visit more websites
I       10 MB of memory per 1 GB on disk for 32-bit Squid;
        14 MB of memory per 1 GB on disk for 64-bit Squid
N       not used often
M       rule of thumb: cache_mem is usually one third of the total memory
        consumption. On top of the configured value there is also memory
        used by the index of these objects (see 'I').
DNS     not recommended to change; only increase for very large caches or
        if there is a slow DNS server

  • read_ahead_gap only caps the window of data read from a server and not yet delivered to the client. There are at least two buffers (client-to-server and server-to-client directions) and an additional one for each ICAP service the current transaction is going through.

There is also memory used indirectly: the Operating System has buffers for TCP connections and file system I/O.

How can I tell how much memory my Squid process is using?

One way is to simply look at ps output on your system. For BSD-ish systems, you probably want to use the -u option and look at the VSZ and RSS fields:

wessels ~ 236% ps -axuhm
USER       PID %CPU %MEM   VSZ  RSS     TT  STAT STARTED       TIME COMMAND
squid     9631  4.6 26.4 141204 137852  ??  S    10:13PM   78:22.80 squid -NCYs

For SYSV-ish systems, you probably want to use the -l option. When interpreting the ps output, be sure to check your ps manual page; it may not be obvious whether the reported numbers are kbytes or pages (usually 4 kB).

A nicer way to check the memory usage is with a program called top:

last pid: 20128;  load averages:  0.06,  0.12,  0.11                   14:10:58
46 processes:  1 running, 45 sleeping
CPU states:     % user,     % nice,     % system,     % interrupt,     % idle
Mem: 187M Active, 1884K Inact, 45M Wired, 268M Cache, 8351K Buf, 1296K Free
Swap: 1024M Total, 256K Used, 1024M Free

  PID USERNAME PRI NICE SIZE    RES STATE    TIME   WCPU    CPU COMMAND
 9631 squid     2   0   138M   135M select  78:45  3.93%  3.93% squid

Finally, you can ask the Squid process to report its own memory usage. This is available on the Cache Manager info page. Your output may vary depending upon your operating system and Squid version, but it looks similar to this:

Resource usage for squid:
Maximum Resident Size: 137892 KB
Memory usage for squid via mstats():
Total space in arena:  140144 KB
Total free:              8153 KB 6%

If your RSS (Resident Set Size) value is much lower than your process size, then your cache performance is most likely suffering due to Paging. See also ../CacheManager

Why does Squid use so much cache memory?

It can appear that a machine running Squid is using a huge amount of memory as "cached Mem":

 KiB Mem:   4037016 total,  3729152 used,   307864 free,   120508 buffers
 KiB Swap:  8511484 total,        0 used,  8511484 free.  2213580 cached Mem

This is normal behaviour in Linux - everything that's once read from disk is cached in RAM, as long as there is free memory. If the RAM is needed in another way, the cache in memory will be reduced. See also https://www.linuxatemyram.com/

Machines running Squid can show unusual amounts of this disk I/O caching happening because Squid caches contain a lot of files and access them randomly.

My Squid process grows without bounds.

You might just have your cache_mem parameter set too high. See What can I do to reduce Squid's memory usage? below.

When a process continually grows in size, without levelling off or slowing down, it often indicates a memory leak. A memory leak is when some chunk of memory is used, but not free'd when it is done being used.

Memory leaks are a real problem for programs (like Squid) which do all of their processing within a single process. Historically, Squid has had real memory leak problems. But as the software has matured, we believe almost all of Squid's memory leaks have been eliminated, and new ones are at least easy to identify.

Memory leaks may also be present in your system's libraries, such as libc.a or even libmalloc.a. If you experience the ever-growing process size phenomenon, we suggest you first try an alternate malloc library (see Using an alternate malloc library below).

I set cache_mem to XX, but the process grows beyond that!

The cache_mem parameter does NOT specify the maximum size of the process. It only specifies how much memory to use for caching "hot" (very popular) replies. Squid's actual memory usage depends very strongly on your cache_dir sizes (disk space) and your incoming request load. Reducing cache_mem will usually also reduce the process size, but not necessarily, and there are other ways to reduce Squid's memory usage (see below).

See also How much memory do I need in my Squid server?.

How do I analyze memory usage from the cache manager output?

Note: This information is specific to Squid-1.1 versions

Look at your cachemgr.cgi Cache Information page. For example:

Memory usage for squid via mallinfo():
       Total space in arena:   94687 KB
       Ordinary blocks:        32019 KB 210034 blks
       Small blocks:           44364 KB 569500 blks
       Holding blocks:             0 KB   5695 blks
       Free Small blocks:       6650 KB
       Free Ordinary blocks:   11652 KB
       Total in use:           76384 KB 81%
       Total free:             18302 KB 19%

Meta Data:
StoreEntry                246043 x 64 bytes =  15377 KB
IPCacheEntry              971 x   88 bytes  =     83 KB
Hash link                 2 x   24 bytes    =      0 KB
URL strings                                 =  11422 KB
Pool MemObject structures 514 x  144 bytes  =     72 KB (    70 free)
Pool for Request structur 516 x 4380 bytes  =   2207 KB (  2121 free)
Pool for in-memory object 6200 x 4096 bytes =  24800 KB ( 22888 free)
Pool for disk I/O         242 x 8192 bytes =   1936 KB (  1888 free)
Miscellaneous                              =   2600 KB
total Accounted                            =  58499 KB

First note that mallinfo() reports 94M in "arena." This is pretty close to what top says (97M).

Of that 94M, 81% (76M) is actually being used at the moment. The rest has been freed, or pre-allocated by malloc(3) and not yet used.

Of the 76M in use, we can account for 58.5M (76%). There are some calls to malloc(3) for which we can't account.

The Meta Data list gives the breakdown of where the accounted memory has gone. 45% has gone to StoreEntry and URL strings. Another 42% has gone to buffering objects in VM while they are fetched and relayed to the clients (Pool for in-memory object).

The pool sizes are specified by squid.conf parameters. In version 1.0, these pools are somewhat broken: we keep a stack of unused pages instead of freeing the block. In the Pool for in-memory object, the unused stack size is 1/2 of cache_mem. The Pool for disk I/O is hardcoded at 200. For MemObject and Request it's 1/8 of your system's FD_SETSIZE value.

If you need to lower your process size, we recommend lowering the max object sizes in the 'http', 'ftp' and 'gopher' config lines. You may also want to lower cache_mem to suit your needs. But if you make cache_mem too low, then some objects may not get saved to disk during high-load periods. Newer Squid versions allow you to set memory_pools OFF to disable the free memory pools.

The "Total memory accounted" value is less than the size of my Squid process.

We are not able to account for all memory that Squid uses. This would require excessive amounts of code to keep track of every last byte. We do our best to account for the major uses of memory.

Also, note that the malloc and free functions have their own overhead. Some additional memory is required to keep track of which chunks are in use, and which are free. Additionally, most operating systems do not allow processes to shrink in size. When a process gives up memory by calling free, the total process size does not shrink. So the process size really represents the maximum size your Squid process has reached.

xmalloc: Unable to allocate 4096 bytes!

by HenrikNordström

Messages like "FATAL: xcalloc: Unable to allocate 4096 blocks of 1 bytes!" appear when Squid can't allocate more memory. On most operating systems (including BSD) there are only two possible reasons:

  • The machine is out of swap
  • The process' maximum data segment size has been reached

The first case is detected using the normal swap monitoring tools available on the platform (pstat on SunOS, perhaps pstat is used on BSD as well).

To tell if it is the second case, first rule out the first case and then monitor the size of the Squid process. If it dies at a certain size with plenty of swap left, then the maximum data segment size has without doubt been reached.

The data segment size can be limited by two factors:

  • Kernel imposed maximum, which no user can go above
  • The size set with ulimit, which the user can control.

When squid starts it sets the data and file ulimits to the hard level. If you manually tune ulimit before starting Squid, make sure that you set the hard limit and not only the soft limit (the default operation of ulimit is to only change the soft limit). Only root is allowed to raise the hard limit.

This command prints the hard limits:

ulimit -aH

This command sets the data size to unlimited:

ulimit -HSd unlimited

BSD/OS

by Arjan de Vet

The default kernel limit on BSD/OS for datasize is 64MB (at least on 3.0 which I'm using).

Recompile a kernel with larger datasize settings:

maxusers        128
# Support for large inpcb hash tables, e.g. busy WEB servers.
options         INET_SERVER
# support for large routing tables, e.g. gated with full Internet routing:
options         "KMEMSIZE=\(16*1024*1024\)"
options         "DFLDSIZ=\(128*1024*1024\)"
options         "DFLSSIZ=\(8*1024*1024\)"
options         "SOMAXCONN=128"
options         "MAXDSIZ=\(256*1024*1024\)"

See /usr/share/doc/bsdi/config.n for more info.

In /etc/login.conf I have this:

default:\
        :path=/bin /usr/bin /usr/contrib/bin:\
        :datasize-cur=256M:\
        :openfiles-cur=1024:\
        :openfiles-max=1024:\
        :maxproc-cur=1024:\
        :stacksize-cur=64M:\
        :radius-challenge-styles=activ,crypto,skey,snk,token:\
        :tc=auth-bsdi-defaults:\
        :tc=auth-ftp-bsdi-defaults:

#
# Settings used by /etc/rc and root
# This must be set properly for daemons started as root by inetd as well.
# Be sure reset these values back to system defaults in the default class!
#
daemon:\
        :path=/bin /usr/bin /sbin /usr/sbin:\
        :widepasswords:\
        :tc=default:
#       :datasize-cur=128M:\
#       :openfiles-cur=256:\
#       :maxproc-cur=256:\

This should give enough space for a 256MB squid process.

FreeBSD (2.2.X)

by [wessels Duane Wessels]

The procedure is almost identical to that for BSD/OS above. Increase the open file descriptor limit in /sys/conf/param.c:

int     maxfiles = 4096;
int     maxfilesperproc = 1024;

Increase the maximum and default data segment size in your kernel config file, e.g. /sys/conf/i386/CONFIG:

options         "MAXDSIZ=(512*1024*1024)"
options         "DFLDSIZ=(128*1024*1024)"

We also found it necessary to increase the number of mbuf clusters:

options         "NMBCLUSTERS=10240"

And, if you have more than 256 MB of physical memory, you probably have to disable BOUNCE_BUFFERS (whatever that is), so comment out this line:

#options        BOUNCE_BUFFERS          #include support for DMA bounce buffers

Also, update limits in /etc/login.conf:

# Settings used by /etc/rc
#
daemon:\
        :coredumpsize=infinity:\
        :datasize=infinity:\
        :maxproc=256:\
        :maxproc-cur@:\
        :memoryuse-cur=64M:\
        :memorylocked-cur=64M:\
        :openfiles=4096:\
        :openfiles-cur@:\
        :stacksize=64M:\
        :tc=default:

And don't forget to run "cap_mkdb /etc/login.conf" after editing that file.

OSF, Digital Unix

by Ong Beng Hui

To increase the data size for Digital UNIX, edit the file /etc/sysconfigtab and add the entry...

proc:
        per-proc-data-size=1073741824

Or, with csh, use the limit command, such as

> limit datasize 1024M

Editing /etc/sysconfigtab requires a reboot, but the limit command doesn't.

fork: (12) Cannot allocate memory

When Squid is reconfigured (SIGHUP) or the logs are rotated (SIGUSR1), some of the helper processes (dnsserver) must be killed and restarted. If your system does not have enough virtual memory, the Squid process may not be able to fork to start the new helper processes. This is due to the UNIX way of starting child processes with the fork() system call, which temporarily duplicates the whole Squid process; when rapidly starting many child processes, such as on "squid -k rotate", memory usage can temporarily grow to many times the normal amount due to several temporary copies of the whole process.

The best way to fix this is to increase your virtual memory by adding swap space. Normally your system uses raw disk partitions for swap space, but most operating systems also support swapping on regular files (Digital Unix excepted). See your system manual pages for swap, swapon, and mkfile. Alternatively you can use the sleep_after_fork directive to make Squid sleep a little while invoking helpers, allowing each helper to start up before Squid tries to start the next one. This can be helpful if you find that Squid sometimes fails to restart all helpers on "squid -k reconfigure".

What can I do to reduce Squid's memory usage?

If your cache performance is suffering because of memory limitations, you might consider buying more memory. But if that is not an option, there are a number of things to try:

  • Try a different malloc library (see below)
  • Reduce the cache_mem parameter in the config file. This controls how many "hot" objects are kept in memory. Reducing this parameter will not significantly affect performance, but you may receive some warnings in cache.log if your cache is busy.

  • Turn the memory_pools OFF in the config file. This causes Squid to give up unused memory by calling free() instead of holding on to the chunk for potential, future use. Generally speaking, this is a bad idea as it will induce heap fragmentation. Use memory_pools_limit instead.

  • Reduce the cache_swap_low or cache_dir parameter in your config file. This will reduce the number of objects Squid keeps. Your overall hit ratio may go down a little, but your cache will perform significantly better.

Using an alternate malloc library

Many users have found improved performance and memory utilization when linking Squid with an external malloc library. We recommend either GNU malloc, or dlmalloc.

GNU malloc

To make Squid use GNU malloc follow these simple steps:

  • Download the GNU malloc source, available from one of The GNU mirrors.

  • Compile it

% gzip -dc malloc.tar.gz | tar xf -
% cd malloc
% vi Makefile     # edit as needed
% make
  • Copy libmalloc.a to your system's library directory and be sure to name it libgnumalloc.a.

% su
# cp malloc.a /usr/lib/libgnumalloc.a
  • (Optional) Copy the GNU malloc.h to your system's include directory and be sure to name it gnumalloc.h. This step is not required, but if you do this, then Squid will be able to use the mstat() function to report memory usage statistics on the cachemgr info page.

# cp malloc.h /usr/include/gnumalloc.h
  • Reconfigure and recompile Squid

% make distclean
% ./configure ...
% make
% make install

As Squid's configure script runs, watch its output. You should find that it locates libgnumalloc.a and optionally gnumalloc.h.

How much memory do I need in my Squid server?

As a rule of thumb, Squid uses approximately 10 MB of RAM per GB of the total of all cache_dirs (more on 64-bit servers such as Alpha), plus your cache_mem setting and about an additional 10-20 MB. It is recommended to have at least twice this amount of physical RAM available on your Squid server. For a more detailed discussion of Squid's memory usage see the sections above.
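
For example, a 32-bit server with 100 GB of total cache_dir space and cache_mem set to 256 MB works out to roughly 100 x 10 MB + 256 MB + 20 MB, about 1.3 GB for the Squid process alone, so at least 2.5 GB of physical RAM would be advisable.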

The recommended extra RAM besides what is used by Squid is used by the operating system to improve disk I/O performance and by other applications or services running on the server. This will be true even of a server which runs Squid as the only tcp service, since there is a minimum level of memory needed for process management, logging, and other OS level routines.

If you have a low-memory server and a large disk, then you will not necessarily be able to use all the disk space, since as the cache fills the available memory will become insufficient, forcing Squid to swap out memory and degrading performance. A very large cache_dir total with insufficient physical RAM + swap could cause Squid to stop functioning completely. The solution for larger caches is to get more physical RAM; allocating more to Squid via cache_mem will not help.

Why can't my Squid process grow beyond a certain size?

by [AdrianChadd Adrian Chadd]

A number of people are running Squid with more than a gigabyte of memory. Here are some things to keep in mind.

  • The Operating System may put a limit on how much memory available per-process. Check the resource limits (/etc/security/limits.conf or similar under PAM systems, 'ulimit', etc.)
  • The Operating System may have a limit on the size of processes. 32-bit platforms are sometimes "split" to be 2gb process/2gb kernel; this can be changed to be 3gb process/1gb kernel through a kernel recompile or boot-time option. Check your operating system's documentation for specific details.
  • Some malloc implementations may not support > 2gb of memory - eg dlmalloc. Don't use dlmalloc unless your platform is very broken (and then realise you won't be able to use >2gb RAM using it.)

  • Make sure Squid has been compiled as a 64-bit binary (with modern Unix-like OSes you can use the 'file' command to check); some platforms may have a 64-bit kernel but a 32-bit userland, or the compiler may default to a 32-bit userland.



Access Controls in Squid

Contents

  1. Access Controls in Squid
    1. The Basics: How the parts fit together
    2. ACL elements
    3. Access Lists
    4. How do I allow my clients to use the cache?
    5. how do I configure Squid not to cache a specific server?
    6. How do I implement an ACL ban list?
    7. How do I block specific users or groups from accessing my cache?
      1. Using Ident
    8. Is there a way to do ident lookups only for a certain host and compare the result with a userlist in squid.conf?
      1. Using Proxy Authentication
    9. Do you have a CGI program which lets users change their own proxy passwords?
    10. Common Mistakes
      1. And/Or logic
      2. allow/deny mixups
      3. Differences between ''src'' and ''srcdomain'' ACL types
    11. I set up my access controls, but they don't work! why?
    12. Proxy-authentication and neighbor caches
    13. Is there an easy way of banning all Destination addresses except one?
    14. How can I block access to porn sites?
    15. Does anyone have a ban list of porn sites and such?
    16. Squid doesn't match my subdomains
    17. Why does Squid deny some port numbers?
    18. Does Squid support the use of a database such as mySQL for storing the ACL list?
    19. How can I allow a single address to access a specific URL?
    20. How can I allow some clients to use the cache at specific times?
    21. How can I allow some users to use the cache at specific times?
    22. Problems with IP ACL's that have complicated netmasks
    23. Can I set up ACL's based on MAC address rather than IP?
    24. Can I limit the number of connections from a client?
    25. I'm trying to deny ''foo.com'', but it's not working.
    26. I want to customize, or make my own error messages.
    27. I want to use local time zone in error messages.
    28. I want to put ACL parameters in an external file.
    29. I want to authorize users depending on their MS Windows group memberships
    30. Maximum length of an acl name
    31. Fast and Slow ACLs

The Basics: How the parts fit together

Squid's access control scheme is relatively comprehensive and difficult for some people to understand. There are two different components: ACL elements, and access lists. An access list consists of an allow or deny action followed by a number of ACL elements.

When loading the configuration file Squid processes all the acl lines (directives) into memory as tests which can be performed against any request transaction. Types of tests are outlined in the next section, ACL Elements. By themselves these tests do nothing. For example, the word "Sunday" matches a day of the week, but does not indicate which day of the week you are reading this.

To process a transaction another type of line is used. As each processing action needs to take place, a check is run to test what action or limitations are to occur for the transaction. The types of checks are outlined in the next section, Access Lists, followed by details of how the checks operate.
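
For example, a minimal pair of directives, one defining a test and one applying it (the name and network are illustrative):

acl mynet src 192.0.2.0/24
http_access allow mynet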

ACL elements

{i}

The information here is current for version 3.1. See acl in the configuration guide for the latest list of available types.

Squid knows about the following types of ACL elements:

  • src: source (client) IP addresses

  • dst: destination (server) IP addresses

  • myip: the local IP address of a client's connection

  • arp: Ethernet (MAC) address matching

  • srcdomain: source (client) domain name

  • dstdomain: destination (server) domain name

  • srcdom_regex: source (client) regular expression pattern matching

  • dstdom_regex: destination (server) regular expression pattern matching

  • src_as: source (client) Autonomous System number

  • dst_as: destination (server) Autonomous System number

  • peername: name tag assigned to the cache_peer where request is expected to be sent.

  • time: time of day, and day of week

  • url_regex: URL regular expression pattern matching

  • urlpath_regex: URL-path regular expression pattern matching, leaves out the protocol and hostname

  • port: destination (server) port number

  • myport: local port number that client connected to

  • myportname: name tag assigned to the squid listening port that client connected to

  • proto: transfer protocol (http, ftp, etc)

  • method: HTTP request method (get, post, etc)

  • http_status: HTTP response status (200 302 404 etc.)

  • browser: regular expression pattern matching on the request user-agent header

  • referer_regex: regular expression pattern matching on the request http-referer header

  • ident: string matching on the user's name

  • ident_regex: regular expression pattern matching on the user's name

  • proxy_auth: user authentication via external processes

  • proxy_auth_regex: regular expression pattern matching on user authentication via external processes

  • snmp_community: SNMP community string matching

  • maxconn: a limit on the maximum number of connections from a single client IP address

  • max_user_ip: a limit on the maximum number of IP addresses one user can login from

  • req_mime_type: regular expression pattern matching on the request content-type header

  • req_header: regular expression pattern matching on a request header content

  • rep_mime_type: regular expression pattern matching on the reply (downloaded content) content-type header. This is only usable in the http_reply_access directive, not http_access.

  • rep_header: regular expression pattern matching on a reply header content. This is only usable in the http_reply_access directive, not http_access.

  • external: lookup via external acl helper defined by external_acl_type

  • user_cert: match against attributes in a user SSL certificate

  • ca_cert: match against attributes in a user's issuing CA SSL certificate

  • ext_user: match on user= field returned by external acl helper defined by external_acl_type

  • ext_user_regex: regular expression pattern matching on user= field returned by external acl helper defined by external_acl_type

Notes:

Not all of the ACL elements can be used with all types of access lists (described below). For example, snmp_community is only meaningful when used with snmp_access. The src_as and dst_as types are only used in cache_peer_access lines.

The arp ACL requires the special configure option --enable-arp-acl in Squid-3.1 and older; in newer Squid versions EUI-48 (aka MAC address) support is enabled by default. Furthermore, the ARP / EUI-48 code is not portable to all operating systems. It works on Linux, Solaris, and some *BSD variants.

The SNMP ACL element and access list require the --enable-snmp configure option.

Some ACL elements can cause processing delays. For example, use of srcdomain and srcdom_regex requires a reverse DNS lookup on the client's IP address. This lookup adds some delay to the request.

Each ACL element is assigned a unique name. A named ACL element consists of a list of values. When checking for a match, the multiple values use OR logic. In other words, an ACL element is matched when any one of its values is a match.

You can't give the same name to two different types of ACL elements. It will generate a syntax error.

You can put different values for the same ACL name on different lines. Squid combines them into one list.
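
For example, the following two lines (using a hypothetical ACL name blockeddomains) are combined into one list, exactly as if both domains had been given on a single acl line:

acl blockeddomains dstdomain .example.com
acl blockeddomains dstdomain .example.net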

Access Lists

There are a number of different access lists:

  • http_access: Allows HTTP clients (browsers) to access the HTTP port. This is the primary access control list.

  • http_reply_access: Allows HTTP clients (browsers) to receive the reply to their request. This further restricts permissions given by http_access, and is primarily intended to be used together with rep_mime_type acl for blocking different content types.

  • icp_access: Allows neighbor caches to query your cache with ICP.

  • miss_access: Allows certain clients to forward cache misses through your cache. This further restricts permissions given by http_access, and is primarily intended to be used for enforcing sibling relations by denying siblings from forwarding cache misses through your cache.

  • cache: Defines responses that should not be cached.

  • url_rewrite_access: Controls which requests are sent through the redirector pool.

  • ident_lookup_access: Controls which requests need an Ident lookup.

  • always_direct: Controls which requests should always be forwarded directly to origin servers.

  • never_direct: Controls which requests should never be forwarded directly to origin servers.

  • snmp_access: Controls SNMP client access to the cache.

  • broken_posts: Defines requests for which squid appends an extra CRLF after POST message bodies as required by some broken origin servers.

  • cache_peer_access: Controls which requests can be forwarded to a given neighbor (cache_peer).

  • htcp_access: Controls which remote machines are able to make HTCP requests.

  • htcp_clr_access: Controls which remote machines are able to make HTCP CLR requests.

  • request_header_access: Controls which request headers are removed when violating HTTP protocol.

  • reply_header_access: Controls which reply headers are removed from delivery to the client when violating HTTP protocol.

  • delay_access: Controls which requests are handled by which delay pool.

  • icap_access: (replaced by adaptation_access in Squid-3.1) What requests may be sent to a particular ICAP server.

  • adaptation_access: What requests may be sent to a particular ICAP or eCAP filter service.

  • log_access: Controls which requests are logged. This is global and overrides specific file access lists appended to access_log directives.

Notes:

An access list rule consists of an allow or deny keyword, followed by a list of ACL element names.

An access list consists of one or more access list rules.

Access list rules are checked in the order they are written. List searching terminates as soon as one of the rules is a match.

If a rule has multiple ACL elements, it uses AND logic. In other words, all ACL elements of the rule must be a match in order for the rule to be a match. This means that it is possible to write a rule that can never be matched. For example, a port number can never be equal to both 80 AND 8000 at the same time.

To summarize, the ACL logic can be described as follows (note: the AND/OR below is just for illustration, not part of the syntax):

http_access allow|deny acl AND acl AND ...
        OR
http_access allow|deny acl AND acl AND ...
        OR
...

If none of the rules are matched, then the default action is the opposite of the last rule in the list. It's a good idea to be explicit with the default action. The best way is to use the all ACL. For example:

http_access deny all

How do I allow my clients to use the cache?

Define an ACL that corresponds to your client's IP addresses. For example:

acl myclients src 172.16.5.0/24

Next, allow those clients in the http_access list:

http_access allow myclients

How do I configure Squid not to cache a specific server?

acl someserver dstdomain .someserver.com
cache deny someserver

How do I implement an ACL ban list?

As an example, we will assume that you would like to prevent users from accessing cooking recipes.

One way to implement this would be to deny access to any URLs that contain the words "cooking" or "recipe." You would use these configuration lines:

acl Cooking1 url_regex cooking
acl Recipe1 url_regex recipe
acl myclients src 172.16.5.0/24
http_access deny Cooking1
http_access deny Recipe1
http_access allow myclients
http_access deny all

The url_regex means to search the entire URL for the regular expression you specify. Note that these regular expressions are case-sensitive, so a URL containing "Cooking" would not be denied.
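
If you want the matching to be case-insensitive, the regular expression ACL types accept the -i flag:

acl Cooking1 url_regex -i cooking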

Another way is to deny access to specific servers which are known to hold recipes. For example:

acl Cooking2 dstdomain www.gourmet-chef.com
http_access deny Cooking2
http_access allow all

The dstdomain means to search the hostname in the URL for the string "www.gourmet-chef.com." Note that when IP addresses are used in URLs (instead of domain names), Squid may have to do a DNS lookup to determine whether the ACL matches: If a domain name for the IP address is already in Squid's "FQDN cache", then Squid can immediately compare the destination domain against the access controls. Otherwise, Squid does an asynchronous reverse DNS lookup and evaluates the ACL after that lookup is over. Subsequent ACL evaluations may be able to use the cached lookup result (if any).

Asynchronous lookups are done for http_access and other directives that support so called "slow" ACLs. If a directive does not support a required asynchronous DNS lookup, then modern Squids use "none" instead of the actual domain name to determine whether a dstdomain ACL matches, but you should not rely on that behavior. To disable DNS lookups, use the "-n" ACL option (where supported).
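
For example, in Squid versions that support the -n option, a sketch of a lookup-free dstdomain ACL:

acl someserver dstdomain -n .someserver.com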

How do I block specific users or groups from accessing my cache?

Using Ident

You can use ident lookups to allow specific users access to your cache. This requires that an ident server process runs on the user's machine(s). In your squid.conf configuration file you would write something like this:

ident_lookup_access allow all
acl friends ident kim lisa frank joe
http_access allow friends
http_access deny all

Note that ident_lookup_access only controls whether a machine is tested for its Ident. It does not directly alter access for the user's request.

Is there a way to do ident lookups only for a certain host and compare the result with a userlist in squid.conf?

You can use the ident_lookup_access directive to control for which hosts Squid will issue ident lookup requests.

Additionally, if you use an ident ACL in squid.conf, then Squid will make sure an ident lookup is performed while evaluating the acl, even if ident_lookup_access does not indicate that ident lookups should be performed earlier.

However, Squid does not wait for the lookup to complete unless the ACL rules require it. Consider this configuration:

acl host1 src 10.0.0.1
acl host2 src 10.0.0.2
acl pals  ident kim lisa frank joe
http_access allow host1
http_access allow host2 pals

Requests coming from 10.0.0.1 will be allowed immediately because there are no user requirements for that host. However, requests from 10.0.0.2 will be allowed only after the ident lookup completes, and if the username is in the set kim, lisa, frank, or joe.

Using Proxy Authentication

Another option is to use proxy-authentication. In this scheme, you assign usernames and passwords to individuals. When they first use the proxy they are asked to authenticate themselves by entering their username and password.

In Squid this authentication is handled via external processes. For information on how to configure this, please see SquidFaq/ProxyAuthentication.
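
As a minimal sketch only (helper names and paths vary by version and installation; basic_ncsa_auth is the NCSA-style Basic helper shipped with recent Squid releases, and the password file path here is an example):

auth_param basic program /usr/local/squid/libexec/basic_ncsa_auth /usr/local/squid/etc/passwd
acl authenticated proxy_auth REQUIRED
http_access allow authenticated
http_access deny all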

Do you have a CGI program which lets users change their own proxy passwords?

Pedro L Orso has adapted Apache's htpasswd into a CGI program called chpasswd.cgi.

Common Mistakes

And/Or logic

You've probably noticed (and been frustrated by) the fact that you cannot combine access controls with terms like "and" or "or." These operations are already built into the access control scheme in a fundamental way which you must understand.

  • All elements of an acl entry are OR'ed together.

  • All elements of an access entry are AND'ed together (e.g. http_access and icp_access)

For example, the following access control configuration will never work:

acl ME src 10.0.0.1
acl YOU src 10.0.0.2
http_access allow ME YOU

In order for the request to be allowed, it must match the "ME" acl AND the "YOU" acl. This is impossible because any IP address could only match one or the other. This should instead be rewritten as:

acl ME src 10.0.0.1
acl YOU src 10.0.0.2
http_access allow ME
http_access allow YOU

Or, alternatively, this would also work:

acl US src 10.0.0.1 10.0.0.2
http_access allow US

allow/deny mixups

I have read through my squid.conf numerous times, spoken to my neighbors, read the FAQ and Squid Docs and cannot for the life of me work out why the following will not work.

I can successfully access cachemgr.cgi from our web server machine here, but I would like to use MRTG to monitor various aspects of our proxy. When I try to use squidclient or GET cache_object from the machine the proxy is running on, I always get access denied.

acl manager proto cache_object
acl localhost src 127.0.0.1
acl server    src 1.2.3.4
acl ourhosts  src 1.2.0.0/24
http_access deny manager !localhost !server
http_access allow ourhosts
http_access deny all

The intent here is to allow cache manager requests from the localhost and server addresses, and deny all others. This policy has been expressed here:

http_access deny manager !localhost !server

The problem here is that for allowable requests, this access rule is not matched. For example,

  • if the source IP address is localhost, then "!localhost" is false and the access rule is not matched, so Squid continues checking the other rules.

  • if the source IP address is server, then "!server" is false and the access rule is not matched, so Squid continues checking the other rules.

Cache manager requests from the server address work because server is a subset of ourhosts and the second access rule will match and allow the request.

  • /!\ Also note that this means any cache manager request from ourhosts would be allowed.

To implement the desired policy correctly, the access rules should be rewritten as

http_access allow manager localhost
http_access allow manager server
http_access deny manager
http_access allow ourhosts
http_access deny all

If you're using miss_access, then don't forget to also add a miss_access rule for the cache manager:

miss_access allow manager

You may be concerned that having five access rules instead of three may have an impact on cache performance. In our experience this is not the case. Squid is able to handle a moderate amount of access control checking without degrading overall performance. You may like to verify that for yourself, however.

Differences between ''src'' and ''srcdomain'' ACL types

For the srcdomain ACL type, Squid does a reverse lookup of the client's IP address and checks the result against the domains given on the acl line. With the src ACL type, Squid converts hostnames to IP addresses at startup and then only compares the client's IP address. The src ACL is preferred over srcdomain because it does not require address-to-name lookups for each request.
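
To illustrate the difference (the network and domain here are hypothetical):

# src compares the client IP address directly
acl mynet src 10.0.0.0/24
# srcdomain requires a reverse DNS lookup of each client address
acl mydomain srcdomain .example.com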

I set up my access controls, but they don't work! Why?

If ACLs are giving you problems and you don't know why they aren't working, you can use this tip to debug them.

In squid.conf enable debugging for section 33 at level 2. For example:

debug_options ALL,1 33,2

Then restart or reconfigure squid.

From now on, your cache.log should contain a line for every request that explains if it was allowed, or denied, and which ACL was the last one that it matched.

If this does not give you sufficient information to nail down the problem, you can also enable detailed debug information on ACL processing:

debug_options ALL,1 33,2 28,9

Then restart or reconfigure squid as above.

From now on, your cache.log should contain detailed traces of all access list processing. Be warned that this can amount to quite a few lines per request.

See also SquidFaq/TroubleShooting.

Proxy-authentication and neighbor caches

The problem

               [ Parents ]
               /         \
              /           \
       [ Proxy A ] --- [ Proxy B ]
           |
           |
          USER

Proxy A sends an ICP query to Proxy B about an object; Proxy B replies with an ICP_HIT. Proxy A forwards the HTTP request to Proxy B, but does not pass on the authentication details, therefore the HTTP GET from Proxy A fails.

Only ONE proxy cache in a chain is allowed to "use" the Proxy-Authentication request header. Once the header is used, it must not be passed on to other proxies.

Therefore, you must allow the neighbor caches to request from each other without proxy authentication. This is simply accomplished by listing the neighbor ACL's first in the list of http_access lines. For example:

acl proxy-A src 10.0.0.1
acl proxy-B src 10.0.0.2
acl user_passwords proxy_auth /tmp/user_passwds
http_access allow proxy-A
http_access allow proxy-B
http_access allow user_passwords
http_access deny all

Squid-2.5 allows two exceptions to this rule, by defining the appropriate cache_peer options:

cache_peer parent.foo.com parent login=PASS

This will forward the user's credentials as-is to the parent proxy, which will thus be able to authenticate again.

<!>

This will only work with the Basic authentication scheme. If any other scheme is enabled, it will fail.

cache_peer parent.foo.com parent login=*:somepassword

This will perform Basic authentication against the parent, sending the username of the current client connection and always somepassword as the password. The parent will need to authorize against the child cache's IP address, as if there were no authentication forwarding, and it will need to validate all client usernames against somepassword via a specially-designed authentication helper. The purpose is to log the client cache's usernames into the parent's access.log. You can find an example semi-tested helper of that kind as parent_auth.pl .

Is there an easy way of banning all Destination addresses except one?

acl GOOD dst 10.0.0.1
http_access allow GOOD
http_access deny all

How can I block access to porn sites?

Often, the hardest part about using Squid to deny pornography is coming up with the list of sites that should be blocked. You may want to maintain such a list yourself, or get one from somewhere else (see below). Note that once you start blocking web content, users will try to use web proxies to circumvent the porn filter, hence you will also need to block all web proxies (visit http://www.proxy.org if you do not know what a web proxy is).

The ACL syntax for using such a list depends on its contents. If the list contains regular expressions, use this:

acl PornSites url_regex "/usr/local/squid/etc/pornlist"
http_access deny PornSites

On the other hand, if the list contains origin server hostnames, simply change url_regex to dstdomain in this example.
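
For example, assuming a hypothetical file /usr/local/squid/etc/pornsites.domains containing one domain per line:

acl PornSites dstdomain "/usr/local/squid/etc/pornsites.domains"
http_access deny PornSites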

Does anyone have a ban list of porn sites and such?

  • The http://www.squidblacklist.org/ site contains a number of free blacklists designed specifically for use in Squid.

  • The SquidGuard redirector folks have links to some lists.

  • The maintainer of the free ufdbGuard redirector has a commercial URL database.

  • Bill Stearns maintains the sa-blacklist of known spammers. By blocking the spammer web sites in squid, users can no longer use up bandwidth downloading spam images and html. Even more importantly, they can no longer send out requests for things like scripts and gifs that have a unique identifier attached, showing that they opened the email and making their addresses more valuable to the spammer.

  • The SleezeBall site has a list of patterns that you can download.

  • The Shalla Secure Services provide a nice downloadable blacklist, free of charge, with many categories.

Note that once you start blocking web content, users will try to use web proxies to circumvent the filtering, hence you will also need to block all web proxies.

Squid doesn't match my subdomains

If you are using Squid-2.4 or later then keep in mind that dstdomain acls use different syntax for exact host matches and entire domain matches: www.example.com matches the exact host www.example.com, while .example.com matches the entire domain example.com (including example.com alone).

There are also subtle issues if your dstdomain ACLs contain matches for both an exact host in a domain and the whole domain itself (i.e. both www.example.com and .example.com). Depending on how your data is ordered this may cause only the most specific of these (e.g. www.example.com) to be used.

{i}

Squid-2.4 and later will warn you when this kind of configuration is used. If your Squid does not warn you while reading the configuration file, you do not have the problem described below. Note also that the configuration here uses the dstdomain syntax of Squid-2.1 or earlier (Squid-2.2 and later needs to have domains prefixed by a dot).

There is a subtle problem with domain-name based access controls when a single ACL element has an entry that is a subdomain of another entry. For example, consider this list:

acl FOO dstdomain boulder.co.us vail.co.us .co.us

In the first place, the above list is simply wrong because the first two (boulder.co.us and vail.co.us) are unnecessary. Any domain name that matches one of the first two will also match the last one (co.us). Ok, but why does this happen?

The problem stems from the data structure used to index domain names in an access control list. Squid uses Splay trees for lists of domain names. As with other tree-based data structures, the searching algorithm requires a comparison function that returns -1, 0, or +1 for any pair of keys (domain names). This is similar to the way that strcmp() works.

The problem is that it is wrong to say that co.us is greater-than, equal-to, or less-than boulder.co.us.

For example, if you said that co.us is LESS than fff.co.us, then the Splay tree searching algorithm might never discover co.us as a match for kkk.co.us.

Similarly, if you said that co.us is GREATER than fff.co.us, then the Splay tree searching algorithm might never discover co.us as a match for bbb.co.us.

The bottom line is that you can't have one entry that is a subdomain of another. Squid will warn you if it detects this condition.

Why does Squid deny some port numbers?

It is dangerous to allow Squid to connect to certain port numbers. For example, it has been demonstrated that someone can use Squid as an SMTP (email) relay. As I'm sure you know, SMTP relays are one of the ways that spammers are able to flood our mailboxes. To prevent mail relaying, Squid denies requests when the URL port number is 25. Other ports should be blocked as well, as a precaution against other less common attacks.

There are two ways to filter by port number: either allow specific ports, or deny specific ports. By default, Squid does the first. This is the ACL entry that comes in the default squid.conf:

acl Safe_ports port 80 21 443 563 70 210 1025-65535
http_access deny !Safe_ports

The above configuration denies requests when the URL port number is not in the list. The list allows connections to the standard ports for HTTP, FTP, Gopher, SSL, WAIS, and all non-privileged ports.

Another approach is to deny dangerous ports. The dangerous port list should look something like:

acl Dangerous_ports port 7 9 19 22 23 25 53 109 110 119
http_access deny Dangerous_ports

...and probably many others.

Please consult the /etc/services file on your system for a list of known ports and protocols.

Does Squid support the use of a database such as MySQL for storing the ACL list?

Yes, Squid supports ACL interaction with external data sources via the external_acl_type directive. Helpers for LDAP and NT Domain group membership are included in the distribution, and it's very easy to write additional helpers to fit your environment.
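
A minimal sketch, assuming a hypothetical helper /usr/local/squid/libexec/sql_check that reads a username per line and answers OK or ERR:

external_acl_type sql_lookup %LOGIN /usr/local/squid/libexec/sql_check
acl db_users external sql_lookup
http_access allow db_users
http_access deny all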

How can I allow a single address to access a specific URL?

This example allows only the special_client to access the special_url. Any other client that tries to access the special_url is denied.

acl special_client src 10.1.2.3
acl special_url url_regex ^http://www.squid-cache.org/Doc/FAQ/$
http_access allow special_client special_url
http_access deny special_url

How can I allow some clients to use the cache at specific times?

Let's say you have two workstations that should only be allowed access to the Internet during working hours (8:30 - 17:30). You can use something like this:

acl FOO src 10.1.2.3 10.1.2.4
acl WORKING time MTWHF 08:30-17:30
http_access allow FOO WORKING
http_access deny FOO

How can I allow some users to use the cache at specific times?

acl USER1 proxy_auth Dick
acl USER2 proxy_auth Jane
acl DAY time 06:00-18:00
http_access allow USER1 DAY
http_access deny USER1
http_access allow USER2 !DAY
http_access deny USER2

Problems with IP ACL's that have complicated netmasks

The following ACL entry gives inconsistent or unexpected results:

acl restricted  src 10.0.0.128/255.0.0.128 10.85.0.0/16

The reason is that IP access lists are stored in "splay" tree data structures. These trees require the keys to be sortable. When you use a complicated, or non-standard, netmask (255.0.0.128), it confuses the function that compares two address/mask pairs.

The best way to fix this problem is to use separate ACL names for each ACL value. For example, change the above to:

acl restricted1 src 10.0.0.128/255.0.0.128
acl restricted2 src 10.85.0.0/16

Then, of course, you'll have to rewrite your http_access lines as well.

Can I set up ACL's based on MAC address rather than IP?

Yes, for some operating systems. The ACL type is named arp after the ARP protocol used in IPv4 to fetch the EUI-48 / MAC address. This ACL is supported on Linux, Solaris, and probably BSD variants.

/!\

The MAC address is only available for clients that are on the same subnet. If the client is on a different subnet, then Squid can not find out its MAC address, as the MAC is replaced by the router's MAC when a packet is routed.

For Squid-3.1 and older to use ARP (MAC) access controls, you first need to compile in the optional code.

Do this with the --enable-arp-acl configure option:

% ./configure --enable-arp-acl ...
% make clean
% make

If src/acl.c doesn't compile, then ARP ACLs are probably not supported on your system.

For Squid-3.2 and newer the EUI support is enabled by default whenever it can be used.

Add some arp ACL lines to your squid.conf:

acl M1 arp 01:02:03:04:05:06
acl M2 arp 11:12:13:14:15:16
http_access allow M1
http_access allow M2
http_access deny all

Run squid -k parse to confirm that the ARP / EUI support is available and the ACLs are going to work.

Can I limit the number of connections from a client?

Yes, use the maxconn ACL type in conjunction with http_access deny. For example:

acl losers src 1.2.3.0/24
acl 5CONN maxconn 5
http_access deny 5CONN losers

Given the above configuration, when a client whose source IP address is in the 1.2.3.0/24 subnet tries to establish 6 or more connections at once, Squid returns an error page. Unless you use the deny_info feature, the error message will just say "access denied."

The maxconn ACL requires the client_db feature. If you've disabled client_db (for example with client_db off) then maxconn ACLs will not work.

Note, the maxconn ACL type is kind of tricky because of its comparison direction: the ACL is a match when the number of established connections is greater than the value you specify. Because of that, you don't want to use the maxconn ACL with http_access allow.

Also note that you could use maxconn in conjunction with a user type (ident, proxy_auth), rather than an IP address type.
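
For example, a sketch limiting a hypothetical ident user rather than a subnet:

acl joe ident joe
acl 3CONN maxconn 3
http_access deny 3CONN joe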

I'm trying to deny ''foo.com'', but it's not working.

In Squid-2.3 we changed the way that Squid matches subdomains. There is a difference between .foo.com and foo.com. The first matches any domain in foo.com, while the latter matches only "foo.com" exactly. So if you want to deny bar.foo.com, you should write

acl yuck dstdomain .foo.com
http_access deny yuck

I want to customize, or make my own error messages.

You can customize the existing error messages as described in Customizable Error Messages in SquidFaq/MiscFeatures. You can also create new error messages and use these in conjunction with the deny_info option.

For example, let's say you want your users to see a special message when they request something that matches your pornography list. First, create a file named ERR_NO_PORNO in the /usr/local/squid/etc/errors directory. That file might contain something like this:

Our company policy is to deny requests to known porno sites.  If you
feel you've received this message in error, please contact
the support staff (support@this.company.com, 555-1234).

Next, set up your access controls as follows:

acl porn url_regex "/usr/local/squid/etc/porno.txt"
deny_info ERR_NO_PORNO porn
http_access deny porn
(additional http_access lines ...)

I want to use local time zone in error messages.

Squid, by default, uses GMT as the timestamp in all generated error messages. This is to allow the cache to participate in a hierarchy of caches in different timezones without risking confusion about what the time is.

To change the timestamp in Squid generated error messages you must change the Squid signature. See Customizable Error Messages in MiscFeatures. The signature by default uses %T as the timestamp, but if you like you can use %t instead for a timestamp using the local time zone.

I want to put ACL parameters in an external file.

by Adam Aube

Squid can read ACL parameters from an external file. To do this, first place the acl parameters, one per line, in a file. Then, on the ACL line in squid.conf, put the full path to the file in double quotes.

For example, instead of:

acl trusted_users proxy_auth john jane jim

you would have:

acl trusted_users proxy_auth "/usr/local/squid/etc/trusted_users.txt"

Inside trusted_users.txt, there is:

john
jane
jim

I want to authorize users depending on their MS Windows group memberships

There is an excellent resource over at http://workaround.org/squid-ldap on how to use LDAP-based group membership checking.

Also the LDAP or Active Directory config example here in the squid wiki might prove useful.

Maximum length of an acl name

By default the maximum length of an ACL name is 32-1 = 31 characters. It can be changed by editing ACL_NAME_SZ in defines.h in the source:

#define ACL_NAME_SZ 32

Fast and Slow ACLs

Some ACL types require information which may not be already available to Squid. Checking them requires suspending work on the current request, querying some external source, and resuming work when the needed information becomes available. This is, for example, the case for DNS, authenticators or external authorization scripts. ACLs can thus be divided into FAST ACLs, which do not require going to external sources to be fulfilled, and SLOW ACLs, which do.

Fast ACLs include (as of squid 3.1.0.7):

  • all (built-in)
  • src
  • dstdomain
  • dstdom_regex
  • myip
  • arp
  • src_as
  • peername
  • time
  • url_regex
  • urlpath_regex
  • port
  • myport
  • myportname
  • proto
  • method
  • http_status {R}
  • browser
  • referer_regex
  • snmp_community
  • maxconn
  • max_user_ip
  • req_mime_type
  • req_header
  • rep_mime_type {R}
  • user_cert
  • ca_cert

Slow ACLs include:

  • dst
  • dst_as
  • srcdomain
  • srcdom_regex
  • ident
  • ident_regex
  • proxy_auth
  • proxy_auth_regex
  • external
  • ext_user
  • ext_user_regex

This list may be incomplete or out-of-date. See your squid.conf.documented file for details. ACL types marked with {R} are reply ACLs, see the dedicated FAQ chapter.

Squid caches the results of ACL lookups whenever possible, thus slow ACLs will not always need to go to the external data-source.

Knowing the behaviour of an ACL type is relevant because not all ACL matching directives support all kinds of ACLs. Some check-points will not suspend the request: they allow (or deny) immediately. If a SLOW acl has to be checked there, and the results of the check are not cached, the corresponding ACL result will be as if it did not match. In other words, such ACL types are in general not reliable in all access check clauses.

Whether a given access directive is a slow or a fast clause is documented in squid.conf.documented: http_access, for example, is a slow clause that may wait for lookups to complete, while clauses such as icp_access are fast and cannot wait.

Thus the safest course of action is to only use fast ACLs in fast access clauses, and any kind of ACL in slow access clauses.

A possible workaround which can mitigate the effect of this characteristic consists of exploiting caching: place some "useless" ACL checks in slow clauses, so that subsequent fast clauses may have a cached result to evaluate against.


Contents

  1. Starting Point
  2. Why am I getting "Proxy Access Denied?"
  3. Connection Refused when reaching a sibling
  4. Running out of filedescriptors
    1. Linux
    2. FreeBSD
      1. 2015
      2. older than 2015
    3. General BSD
      1. SunOS
      2. FreeBSD (from the 2.1.6 kernel)
      3. BSD/OS (from the 2.1 kernel)
    4. Reconfigure afterwards
  5. What are these strange lines about removing objects?
  6. Can I change a Windows NT FTP server to list directories in Unix format?
  7. Why am I getting "Ignoring MISS from non-peer x.x.x.x?"
  8. DNS lookups for domain names with underscores (_) always fail.
  9. Why does Squid say: "Illegal character in hostname; underscores are not allowed?"
  10. Why am I getting access denied from a sibling cache?
  11. Cannot bind socket FD NN to *:8080 (125) Address already in use
  12. icpDetectClientClose: ERROR xxx.xxx.xxx.xxx: (32) Broken pipe
  13. icpDetectClientClose: FD 135, 255 unexpected bytes
  14. Does Squid work with NTLM Authentication?
  15. My Squid becomes very slow after it has been running for some time.
  16. WARNING: Failed to start 'dnsserver'
  17. Sending bug reports to the Squid team
  18. FATAL: ipcache_init: DNS name lookup tests failed
  19. FATAL: Failed to make swap directory /var/spool/cache: (13) Permission denied
  20. FATAL: Cannot open HTTP Port
  21. FATAL: All redirectors have exited!
  22. FATAL: Cannot open /usr/local/squid/logs/access.log: (13) Permission denied
  23. pingerOpen: icmp_sock: (13) Permission denied
  24. What is a forwarding loop?
  25. accept failure: (71) Protocol error
  26. storeSwapInFileOpened: ... Size mismatch
  27. Why do I get ''fwdDispatch: Cannot retrieve 'https://www.buy.com/corp/ordertracking.asp' ''
  28. Squid can't access URLs like http://3626046468/ab2/cybercards/moreinfo.html
  29. I get a lot of "URI has whitespace" error messages in my cache log, what should I do?
  30. commBind: Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address
  31. What does "sslReadClient: FD 14: read failure: (104) Connection reset by peer" mean?
  32. What does ''Connection refused'' mean?
  33. squid: ERROR: no running copy
  34. FATAL: getgrnam failed to find groupid for effective group 'nogroup'
  35. Squid uses 100% CPU
  36. Webmin's ''cachemgr.cgi'' crashes the operating system
  37. Segment Violation at startup or upon first request
  38. urlParse: Illegal character in hostname 'proxy.mydomain.com:8080proxy.mydomain.com'
  39. Requests for international domain names do not work
  40. Why do I sometimes get "Zero Sized Reply"?
  41. Why do I get "The request or reply is too large" errors?
  42. Negative or very large numbers in Store Directory Statistics, or constant complaints about cache above limit
  43. Problems with Windows update

Starting Point

If your Squid version is older than 2.6 it is very outdated. Many of the issues experienced in those versions are now fixed in 2.6 and later.

Your first point of troubleshooting should be to test with a newer supported release and resolve any remaining issues with that install.

Current releases can be retrieved from http://www.squid-cache.org/Versions or your operating system distributor. How to do this is outlined in the system-specific help pages.

Additional problems and resolutions for your specific system may be found in the system-specific troubleshooting pages.

Some common situations have their own detailed explanations and workarounds in the sections below.

Why am I getting "Proxy Access Denied?"

You may need to set up the http_access option to allow requests from your IP addresses. Please see ../SquidAcl for information about that.

Alternately, you may have misconfigured one of your ACLs. Check the access.log and squid.conf files for clues.

Connection Refused when reaching a sibling

I get Connection Refused when the cache tries to retrieve an object located on a sibling, even though the sibling thinks it delivered the object to my cache.

If the HTTP port number is wrong but the ICP port is correct, you will send ICP queries correctly, and the ICP replies will fool your cache into thinking the configuration is correct, but large objects will fail since you don't have the correct HTTP port for the sibling in your squid.conf file. If your sibling changed their http_port, you could have this problem for some time before noticing.

Running out of filedescriptors

If you see the Too many open files error message, you are most likely running out of file descriptors. This may be due to running Squid on an operating system with a low filedescriptor limit. This limit is often configurable in the kernel or with other system tuning tools. There are two ways to run out of file descriptors: first, you can hit the per-process limit on file descriptors. Second, you can hit the system limit on total file descriptors for all processes.

{i} (!)

Squid 2.0-2.6 provide a ./configure option --with-maxfd=N

{i} (!)

Squid 2.7+ provide a squid.conf option max_filedescriptors

{i} (!)

Squid 3.x provide a ./configure option --with-filedescriptors=N

  • {X} Even with Squid built to support a large number of FDs and the system configured by default to permit large numbers to be used, the ulimit or equivalent tools can change those limits under Squid at any time. Before reporting this as a problem or bug, please carefully check your startup scripts and any tools used to run or manage Squid to discover if they are setting a low FD limit.

Linux

Linux kernel 2.2.12 and later supports an "unlimited" number of open files without patching. So does most of glibc-2.1.1 and later (all areas touched by Squid are safe from what I can tell, even more so in later glibc releases). But you still need to take some action, as the kernel defaults to only allowing processes to use up to 1024 filedescriptors, and Squid picks up the limit at build time.

  • Before configuring Squid, run "ulimit -HS -n ####" (where #### is the number of filedescriptors you need to support). Be sure to run "make clean" before configure if you have already run configure, as the script might otherwise have cached the prior result.

  • Configure, build and install Squid as usual
  • Make sure your script for starting Squid contains the above ulimit command to raise the filedescriptor limit (a minimal sketch follows this list). You may also need to allow a larger port span for outgoing connections (set in /proc/sys/net/ipv4/, like in "echo 1024 32768 > /proc/sys/net/ipv4/ip_local_port_range")

    /!\ NOTE that the -n option is separate from the -HS options. ulimit will fail on some systems if you try to combine them.
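
A minimal start-script sketch (the binary path and limit value are examples; adjust them for your installation):

#!/bin/sh
# raise the filedescriptor limit, then launch Squid
ulimit -HS -n 8192
exec /usr/local/squid/sbin/squid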

Alternatively you can

  • Run configure with your needed configure options
  • edit include/autoconf.h and define SQUID_MAXFD to your desired limit. Make sure to make it a clean multiple of 64, to avoid various bugs in the libc headers.
  • build and install Squid as usual
  • Set the runtime ulimit as described above when starting Squid.

If running things as root is not an option then get your sysadmin to install the needed ulimit command in /etc/initscript (see man initscript), install a patched kernel where INR_OPEN in include/linux/fs.h is changed to at least the amount you need, or have them install a small suid program which sets the limit (see link below).

More information can be found from Henrik's How to get many filedescriptors on Linux 2.2.X and later page.

FreeBSD

2015

Eliezer Croitoru:

  • Referencing the Tuning Kernel Limits guide for FreeBSD, based on an article by Adrian Chadd.

  • The docs describe that the basic "server accept" socket is bound to a queue of 128 connections.

  • You would probably see something like "connection reset by peer", and you will need to increase kern.ipc.somaxconn to 2048 to match something useful for a production network of about 300 users (see the example after this list).

  • If you have a loaded server you may need to increase it past the 16384 limit.
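
For example (the value is an example; tune it to your load):

sysctl -w kern.ipc.somaxconn=2048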

older than 2015

  • >:> This information is outdated, and may no longer be relevant.

by Torsten Sturm

  • How do I check my maximum filedescriptors?
    • Do sysctl -a and look for the value of kern.maxfilesperproc.

  • How do I increase them?

sysctl -w kern.maxfiles=XXXX
sysctl -w kern.maxfilesperproc=XXXX

/!\

You probably want maxfiles > maxfilesperproc if you're going to be pushing the limit.

  • What is the upper limit?
    • I don't think there is a formal upper limit inside the kernel. All the data structures are dynamically allocated. In practice there might be unintended metaphenomena (kernel spending too much time searching tables, for example).

General BSD

  • >:> This information is outdated, and may no longer be relevant.

For most BSD-derived systems (SunOS, 4.4BSD, OpenBSD, FreeBSD, NetBSD, BSD/OS, 386BSD, Ultrix) you can also use the "brute force" method to increase these values in the kernel (requires a kernel rebuild):

  • How do I check my maximum filedescriptors?
    • Do pstat -T and look for the files value, typically expressed as the ratio of current/maximum.

  • How do I increase them the easy way?
    • One way is to increase the value of the maxusers variable in the kernel configuration file and build a new kernel. This method is quick and easy but also has the effect of increasing a wide variety of other variables that you may not need or want increased.

  • Is there a more precise method?
    • Another way is to find the param.c file in your kernel build area and change the arithmetic behind the relationship between maxusers and the maximum number of open files.

Here are a few examples which should lead you in the right direction:

SunOS

  • >:> This information is outdated, and may no longer be relevant.

Change the value of nfile in /usr/kvm/sys/conf.common/param.c by altering the equation that computes it, where NPROC is defined by:

#define NPROC (10 + 16 * MAXUSERS)

FreeBSD (from the 2.1.6 kernel)

  • >:> This information is outdated, and may no longer be relevant.

Very similar to SunOS, edit /usr/src/sys/conf/param.c and alter the relationship between maxusers and the maxfiles and maxfilesperproc variables:

int     maxfiles = NPROC*2;
int     maxfilesperproc = NPROC*2;

Where NPROC is defined by:

#define NPROC (20 + 16 * MAXUSERS)

The per-process limit can also be adjusted directly in the kernel configuration file with the following directive:

options OPEN_MAX=128

BSD/OS (from the 2.1 kernel)

  • >:> This information is outdated, and may no longer be relevant.

Edit /usr/src/sys/conf/param.c and adjust the maxfiles math here:

int     maxfiles = 3 * (NPROC + MAXUSERS) + 80;

Where NPROC is defined by:

#define NPROC (20 + 16 * MAXUSERS)

You should also set the OPEN_MAX value in your kernel configuration file to change the per-process limit.

Reconfigure afterwards

  • >:> This information is outdated, and may no longer be relevant.

After you rebuild/reconfigure your kernel with more filedescriptors, you must then recompile Squid. Squid's configure script determines how many filedescriptors are available, so you must make sure the configure script runs again as well. For example:

cd squid-1.1.x
make realclean
./configure --prefix=/usr/local/squid
make

What are these strange lines about removing objects?

For example:

97/01/23 22:31:10| Removed 1 of 9 objects from bucket 3913
97/01/23 22:33:10| Removed 1 of 5 objects from bucket 4315
97/01/23 22:35:40| Removed 1 of 14 objects from bucket 6391

These log entries are normal, and do not indicate that squid has reached cache_swap_high.

Consult your cache information page in cachemgr.cgi for a line like this:

Storage LRU Expiration Age:     364.01 days

Objects which have not been used for that amount of time are removed as a part of the regular maintenance. You can set an upper limit on the LRU Expiration Age value with reference_age in the config file.

Can I change a Windows NT FTP server to list directories in Unix format?

Why, yes you can! Select the following menus:

  • Start
  • Programs
  • Microsoft Internet Server (Common)
  • Internet Service Manager

This will bring up a box with icons for your various services. One of them should be a little ftp "folder." Double click on this.

You will then have to select the server (there should only be one). Select that and then choose "Properties" from the menu and choose the "directories" tab along the top.

There will be an option at the bottom saying "Directory listing style." Choose the "Unix" type, not the "MS-DOS" type.

by Oskar Pearson

Why am I getting "Ignoring MISS from non-peer x.x.x.x?"

You are receiving ICP MISSes (via UDP) from a parent or sibling cache whose IP address your cache does not know about. This may happen in two situations.

If the peer is multihomed, it is sending packets out an interface which is not advertised in the DNS. Unfortunately, this is a configuration problem at the peer site. You can tell them to either add the IP address interface to their DNS, or use Squid's "udp_outgoing_address" option to force the replies out a specific interface. For example, on your parent's squid.conf:

udp_outgoing_address proxy.parent.com

on your squid.conf:

cache_peer proxy.parent.com parent 3128 3130

You can also see this warning when sending ICP queries to multicast addresses. For security reasons, Squid requires your configuration to list all other caches listening on the multicast group address. If an unknown cache listens to that address and sends replies, your cache will log the warning message. To fix this situation, either tell the unknown cache to stop listening on the multicast address, or if they are legitimate, add them to your configuration file.

DNS lookups for domain names with underscores (_) always fail.

The standards for naming hosts (RFC 952 and RFC 1101) do not allow underscores in domain names:

A "name" (Net, Host, Gateway, or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.).

The resolver library that ships with recent versions of BIND enforces this restriction, returning an error for any host with underscore in the hostname. The best solution is to complain to the hostmaster of the offending site, and ask them to rename their host.

See also the comp.protocols.tcp-ip.domains FAQ.

Some people have noticed that RFC 1033 implies that underscores are allowed. However, this is an informational RFC with a poorly chosen example, and not a standard by any means.

Why does Squid say: "Illegal character in hostname; underscores are not allowed?"

See the above question. The underscore character is not valid for hostnames.

Some DNS resolvers allow the underscore, so yes, the hostname might work fine when you don't use Squid.

To make Squid allow underscores in hostnames:

  • {i}

    Squid 2.x

    Re-build with --enable-underscores configure option

    {i}

    Squid-3.x

    add to squid.conf: enable_underscores on

Why am I getting access denied from a sibling cache?

The answer to this is somewhat complicated, so please hold on.

An ICP query does not include any parent or sibling designation, so the receiver really has no indication of how the peer cache is configured to use it. This issue becomes important when a cache is willing to serve cache hits to anyone, but only handle cache misses for its paying users or customers. In other words, whether or not to allow the request depends on if the result is a hit or a miss. To accomplish this, Squid acquired the miss_access feature in October of 1996.

The necessity of "miss access" makes life a little bit complicated, and not only because it was awkward to implement. Miss access means that the ICP query reply must be an extremely accurate prediction of the result of a subsequent HTTP request. Ascertaining this result is actually very hard, if not impossible to do, since the ICP request cannot convey the full HTTP request. Additionally, there are more types of HTTP request results than there are for ICP. The ICP query reply will either be a hit or miss. However, the HTTP request might result in a "304 Not Modified" reply sent from the origin server. Such a reply is not strictly a hit since the peer needed to forward a conditional request to the source. At the same time, it's not strictly a miss either since the local object data is still valid, and the Not-Modified reply is quite small.

One serious problem for cache hierarchies is mismatched freshness parameters. Consider a cache C using "strict" freshness parameters so its users get maximally current data. C has a sibling S with less strict freshness parameters. When an object is requested at C, C might find that S already has the object via an ICP query and ICP HIT response. C then retrieves the object from S.

In an HTTP/1.0 world, C (and C's client) will receive an object that was never subject to its local freshness rules. Neither HTTP/1.0 nor ICP provides any way to ask only for objects less than a certain age. If the retrieved object is stale by C's rules, it will be removed from C's cache, but it will subsequently be fetched from S so long as it remains fresh there. This configuration miscoupling problem is a significant deterrent to establishing both parent and sibling relationships.

HTTP/1.1 provides numerous request headers to specify freshness requirements, which actually introduces a different problem for cache hierarchies: ICP still does not include any age information, neither in query nor reply. So S may return an ICP HIT if its copy of the object is fresh by its configuration parameters, but the subsequent HTTP request may result in a cache miss due to any Cache-control: headers originated by C or by C's client. Situations now emerge where the ICP reply no longer matches the HTTP request result.

In the end, the fundamental problem is that the ICP query does not provide enough information to accurately predict whether the HTTP request will be a hit or miss. In fact, the current ICP Internet Draft is very vague on this subject. What does ICP HIT really mean? Does it mean "I know a little about that URL and have some copy of the object?" Or does it mean "I have a valid copy of that object and you are allowed to get it from me?"

So, what can be done about this problem? We really need to change ICP so that freshness parameters are included. Until that happens, the members of a cache hierarchy have only two options to totally eliminate the "access denied" messages from sibling caches:

  • Make sure all members have the same refresh_rules parameters.

  • Do not use miss_access at all. Promise your sibling cache administrator that your cache is properly configured and that you will not abuse their generosity. The sibling cache administrator can check his log files to make sure you are keeping your word.

If neither of these is realistic, then the sibling relationship should not exist.

Cannot bind socket FD NN to *:8080 (125) Address already in use

This means that another process is already listening on port 8080 (or whatever port you're using). It could mean that you have a Squid process already running, or it could be from another program. To verify, use the netstat command:

netstat -antup | grep 8080
  • {i} (!) Windows Users need to use netstat -ant and manually find the entry.

If you find that some process has bound to your port, but you're not sure which process it is, you might be able to use the excellent lsof program. It will show you which processes own every open file descriptor on your system.
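
For example, assuming lsof is installed, this lists the processes bound to port 8080:

lsof -i :8080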

icpDetectClientClose: ERROR xxx.xxx.xxx.xxx: (32) Broken pipe

This means that the client socket was closed by the client before Squid was finished sending data to it. Squid detects this by trying to read(2) some data from the socket. If the read(2) call fails, then Squid knows the socket has been closed. Normally the read(2) call returns ECONNRESET: Connection reset by peer and these are NOT logged. Any other error messages (such as EPIPE: Broken pipe) are logged to cache.log. See the "intro" of section 2 of your Unix manual for a list of all error codes.

icpDetectClientClose: FD 135, 255 unexpected bytes

These are caused by misbehaving Web clients attempting to use persistent connections.

Does Squid work with NTLM Authentication?

Yes, Squid supports Microsoft NTLM authentication to authenticate users accessing the proxy server itself (be it in a forward or reverse setup). See ../ProxyAuthentication for further details

Squid 2.6+ and 3.1+ also support the kind of infrastructure that's needed to properly allow a user to authenticate against an NTLM-enabled webserver.

As NTLM authentication backends go, the real work is usually done by Samba on Squid's behalf. That being the case, Squid supports any authentication backend supported by Samba, including Samba itself and MS Windows NT 3.51 and onwards Domain Controllers.

NTLM for HTTP is, however, a horrible example of an authentication protocol, and we recommend avoiding it in favour of saner and standards-sanctioned alternatives such as Digest.

My Squid becomes very slow after it has been running for some time.

This is most likely because Squid is using more memory than it should be for your system. When the Squid process becomes large, it experiences a lot of paging. This will very rapidly degrade the performance of Squid. Memory usage is a complicated problem. There are a number of things to consider.

WARNING: Failed to start 'dnsserver'

  • {i} (!) All current Squid versions contain an optimized internal DNS engine, which is much faster and more responsive than the dnsserver helper and should be used by preference.

This could be a permission problem. Does the Squid userid have permission to execute the dnsserver program?

Sending bug reports to the Squid team

see SquidFaq/BugReporting

FATAL: ipcache_init: DNS name lookup tests failed

  • {i} (!) This issue is now permanently resolved in Squid 3.1 and later.

Squid normally tests your system's DNS configuration before it starts serving requests. Squid tries to resolve some common DNS names, as defined in the dns_testnames configuration directive. If Squid cannot resolve these names, it could mean:

  • your DNS nameserver is unreachable or not running.
  • your System is in the process of booting
  • your /etc/resolv.conf file may contain incorrect information.
  • your /etc/resolv.conf file may have incorrect permissions, and may be unreadable by Squid.

To disable this feature, use the -D command line option. Because this issue can appear at boot time, it is highly recommended that OS startup scripts for Squid releases earlier than 3.1 use this option to disable the tests.

Note, Squid does NOT use the dnsservers to test the DNS. The test is performed internally, before the dnsservers start.

FATAL: Failed to make swap directory /var/spool/cache: (13) Permission denied

Starting with version 1.1.15, we have required that you first run

squid -z

to create the swap directories on your filesystem.

Squid's basic default is the user nobody. This can be overridden in packages with the --with-default-user option when building, or in squid.conf with the cache_effective_user option.

The Squid process takes on the given userid before making the directories. If the cache_dir directory (e.g. /var/spool/cache) does not exist, and the Squid userid does not have permission to create it, then you will get the "permission denied" error. This can be simply fixed by manually creating the cache directory.
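
For example, assuming the default user nobody and a cache_dir of /var/spool/cache:

# mkdir -p /var/spool/cache
# chown nobody /var/spool/cache
# squid -z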

Alternatively, if the directory already exists, then your operating system may be returning "Permission Denied" instead of "File Exists" on the mkdir() system call.

FATAL: Cannot open HTTP Port

Either

  1. the Squid userid does not have permission to bind to the port, or
  2. some other process has bound itself to the port

    {i} Remember that root privileges are required to open port numbers less than 1024. If you see this message when using a high port number, or even when starting Squid as root, then the port has already been opened by another process.

    {i} SELinux can also deny squid access to port 80, even if you are starting squid as root. Configure SELinux to allow squid to open port 80 or disable SELinux in this case.

    {i} Maybe you are running in the HTTP Accelerator mode and there is already an HTTP server running on port 80? If you're really stuck, install the way cool lsof utility to show you which process has your port in use.

FATAL: All redirectors have exited!

This is explained in Features/Redirectors.

FATAL: Cannot open /usr/local/squid/logs/access.log: (13) Permission denied

In Unix, things like processes and files have an owner. For Squid, the process owner and file owner should be the same. If they are not the same, you may get messages like "permission denied."

To find out who owns a file, use the command:

ls -l

A process is normally owned by the user who starts it. However, Unix sometimes allows a process to change its owner. If you specified a value for the cache_effective_user option in squid.conf, then that will be the process owner. The files must be owned by this same userid.

If all this is confusing, then you probably should not be running Squid until you learn some more about Unix. As a reference, I suggest Learning the UNIX Operating System, 4th Edition.

pingerOpen: icmp_sock: (13) Permission denied

This means your pinger helper program does not have root privileges.

You should either do this when building Squid:

make install pinger

or

# chown root /usr/local/squid/bin/pinger
# chmod 4755 /usr/local/squid/bin/pinger
  • {i} (!) location of the pinger binary may vary. I recommend searching for it first:

locate bin/pinger

What is a forwarding loop?

A forwarding loop is when a request passes through one proxy more than once. You can get a forwarding loop if

  • a cache forwards requests to itself. This might happen with interception caching (or server acceleration) configurations.
  • a pair or group of caches forward requests to each other. This can happen when Squid uses ICP, Cache Digests, or the ICMP RTT database to select a next-hop cache.

Forwarding loops are detected by examining the Via request header. Each cache which "touches" a request must add its hostname to the Via header. If a cache notices its own hostname in this header for an incoming request, it knows there is a forwarding loop somewhere.

  • /!\ Squid may report a forwarding loop if a request goes through two caches that have the same visible_hostname value. If you want to have multiple machines with the same visible_hostname then you must give each machine a different unique_hostname so that forwarding loops are correctly detected.

When Squid detects a forwarding loop, it is logged to the cache.log file with the received Via header. From this header you can determine which cache (the last in the list) forwarded the request to you.

  • (!) One way to reduce forwarding loops is to change a parent relationship to a sibling relationship.

    (!) Another way is to use cache_peer_access rules.
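
For example, a squid.conf sketch for two load-balanced proxies that share one published name (the hostnames are examples):

visible_hostname proxy.example.com
unique_hostname proxy1.example.com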

accept failure: (71) Protocol error

This error message is seen mostly on Solaris systems. Mark Kennedy gives a great explanation:

Error 71 [EPROTO] is an obscure way of reporting that clients made it onto your
server's TCP incoming connection queue but the client tore down the
connection before the server could accept it.  I.e.  your server ignored
its clients for too long.  We've seen this happen when we ran out of
file descriptors.  I guess it could also happen if something made squid
block for a long time.

storeSwapInFileOpened: ... Size mismatch

{i}

These messages are specific to squid 2.x

Got these messages in my cache.log - I guess it means that the index contents do not match the contents on disk.

What does Squid do in this case?

These happen when Squid reads an object from disk for a cache hit. After it opens the file, Squid checks to see if the size is what it expects it should be. If the size doesn't match, the error is printed. In this case, Squid does not send the wrong object to the client. It will re-fetch the object from the source.

Why do I get ''fwdDispatch: Cannot retrieve 'https://www.buy.com/corp/ordertracking.asp' ''

These messages are caused by buggy clients, mostly Netscape Navigator. What happens is, Netscape sends an HTTPS/SSL request over a persistent HTTP connection. Normally, when Squid gets an SSL request, it looks like this:

CONNECT www.buy.com:443 HTTP/1.0

Then Squid opens a TCP connection to the destination host and port, and the real request is sent encrypted over this connection. That's the whole point of SSL: all of the information must be sent encrypted.

With this client bug, however, Squid receives a request like this:

CONNECT https://www.buy.com/corp/ordertracking.asp HTTP/1.0

Now all of the headers and the message body have been sent, unencrypted, to Squid. There is no way for Squid to somehow turn this into an SSL request, so the only thing it can do is return the error message.

  • <!> This browser bug does represent a security risk because the browser is sending sensitive information unencrypted over the network.

Squid can't access URLs like http://3626046468/ab2/cybercards/moreinfo.html

by Dave J Woolley (DJW at bts dot co dot uk)

  • (!) These are illegal URLs, generally only used by illegal sites; typically the web site that supports a spammer and is expected to survive a few hours longer than the spamming account.

Their intention is to:

  • confuse content filtering rules on proxies, and possibly some browsers' idea of whether they are trusted sites on the local intranet;
  • confuse whois (?);
  • make people think they are not IP addresses and unknown domain names, in an attempt to stop them trying to locate and complain to the ISP.

Any browser or proxy that works with them should be considered a security risk.
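
For reference, the big decimal number is simply the 32-bit IP address written as a single integer. A quick sketch of converting one back to dotted-quad form (bash arithmetic, using the address from the URL above):

$ printf '%d.%d.%d.%d\n' $((3626046468 >> 24 & 255)) $((3626046468 >> 16 & 255)) $((3626046468 >> 8 & 255)) $((3626046468 & 255))
216.33.20.4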

RFC 1738 has this to say about the hostname part of a URL:

The fully qualified domain name of a network host, or its IP
address as a set of four decimal digit groups separated by
".". Fully qualified domain names take the form as described
in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
[5]: a sequence of domain labels separated by ".", each domain
label starting and ending with an alphanumerical character and
possibly also containing "-" characters. The rightmost domain
label will never start with a digit, though, which
syntactically distinguishes all domain names from the IP
addresses.

I get a lot of "URI has whitespace" error messages in my cache log, what should I do?

  • (!) Whitespace characters (space, tab, newline, carriage return) are not allowed in URI's and URL's.

Unfortunately, a number of web services generate URL's with whitespace. Of course your favorite browser silently accommodates these bad URL's. The servers (or people) that generate these URL's are in violation of Internet standards. The whitespace characters should be encoded.

If you want Squid to accept URL's with whitespace, you have to decide how to handle them. There are four choices that you can set with the uri_whitespace option in squid.conf:

  • STRIP

    • (!) This is the correct way to handle them. This is the default for Squid 3.x.

  • DENY

    • (!) The request is denied with an "Invalid Request" message. This is the default for Squid2.x.

  • ALLOW

    • The request is allowed and the URL remains unchanged.

  • ENCODE

    • The whitespace characters are encoded according to RFC 1738.

  • CHOP

    • The URL is chopped at the first whitespace character and then processed normally.

STRIP and DENY are the only approved ways of handling these URIs. The others are technically violations and should not be used. The broken web service should be fixed instead; it is breaking much more of the Internet than just your proxy.
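
For example, to set the compliant behaviour explicitly in squid.conf:

uri_whitespace strip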

commBind: Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address

This likely means that your system does not have a loopback network device, or that device is not properly configured. All Unix systems should have a network device named lo0, and it should be configured with the address 127.0.0.1. If not, you may get the above error message. To check your system, run:

ifconfig

  • {i} Windows users must use: ipconfig

The result should contain:

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host

If you use FreeBSD, see freebsd-no-lo0
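
If lo0 exists but is not configured, something like the following should bring it up (run as root; the exact syntax varies between systems):

# ifconfig lo0 127.0.0.1 netmask 255.0.0.0 up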

What does "sslReadClient: FD 14: read failure: (104) Connection reset by peer" mean?

"Connection reset by peer" is an error code that Unix operating systems sometimes return for read, write, connect, and other system calls.

Connection reset means that the other host, the peer, sent us a RESET packet on a TCP connection. A host sends a RESET when it receives an unexpected packet for a nonexistent connection. For example, if one side sends data at the same time that the other side closes a connection, when the other side receives the data it may send a reset back.

The fact that these messages appear in Squid's log might indicate a problem, such as a broken origin server or parent cache. On the other hand, they might be "normal," especially since some applications are known to force connection resets rather than a proper close.

You probably don't need to worry about them, unless you receive a lot of user complaints relating to SSL sites.

Rick Jones notes that if the server is running a Microsoft TCP stack, clients receive RST segments whenever the listen queue overflows. In other words, if the server is really busy, new connections receive the reset message. This is contrary to rational behaviour, but is unlikely to change.

What does ''Connection refused'' mean?

This is an error message, generated by your operating system, in response to a connect() system call. It happens when there is no server at the other end listening on the port number that we tried to connect to.

It is quite easy to generate this error on your own. Simply telnet to a random, high numbered port:

telnet localhost 12345

It happens because there is no server listening for connections on port 12345.

When you see this in response to a URL request, it probably means the origin server web site is temporarily down. It may also mean that your parent cache is down, if you have one.

squid: ERROR: no running copy

You may get this message when you run commands like squid -k rotate.

This error message usually means that the squid.pid file is missing. Since the PID file is normally present when squid is running, the absence of the PID file usually means Squid is not running. If you accidentally delete the PID file, Squid will continue running, and you won't be able to send it any signals.

  • {i} If you accidentally removed the PID file, there are two ways to get it back.

First, locate the process ID by running ps and finding Squid. You'll probably see two processes, like this:

% ps ax | grep squid

  PID TTY      STAT   TIME COMMAND
 2267 ?        Ss     0:00 /usr/sbin/squid-ipv6 -D -sYC
 2735 pts/0    S+     0:00 grep squid
 8893 ?        Rl     2:57 (squid) -D -sYC
 8894 ?        Ss     0:17 /bin/bash /etc/squid3/helper/redirector.sh

You want the (squid) process id, 8893 in this case.

The first solution is to create the PID file yourself and put the process id number there. For example:

echo 8893 > /usr/local/squid/logs/squid.pid
  • /!\ Be careful of file permissions. It's no use having a .pid file if squid can't update it when things change.

The second solution is to use the above technique to find the Squid process id, and then send the process a HUP signal, which is the same as squid -k reconfigure:

kill -SIGHUP 8893

The reconfigure process creates a new PID file automatically.

FATAL: getgrnam failed to find groupid for effective group 'nogroup'

You are probably starting Squid as root. Squid is trying to find a group-id without any special privileges to run as. The default is nogroup, but this may not be defined on your system.

The best fix for this is to assign Squid a low-privilege userid and make that userid a member of an existing unprivileged group. There is a good chance that the user nobody will work for you as part of the group nogroup.

Alternatively, in older Squid versions the cache_effective_group directive in squid.conf may be changed to the name of an unprivileged group from /etc/group. There is a good chance that nobody will work for you.
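
A minimal sketch of creating such an account and pointing Squid at it (the account name and tool options are assumptions; useradd flags vary between systems):

# groupadd squid
# useradd -g squid -s /sbin/nologin squid

and then in squid.conf:

cache_effective_user squid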

Squid uses 100% CPU

There may be many causes for this.

Andrew Doroshenko reports that removing /dev/null, or mounting a filesystem with the nodev option, can cause Squid to use 100% of CPU. His suggested solution is to "touch /dev/null."

Webmin's ''cachemgr.cgi'' crashes the operating system

Mikael Andersson reports that clicking on Webmin's cachemgr.cgi link creates numerous instances of cachemgr.cgi that quickly consume all available memory and bring the system to its knees.

Joe Cooper reports this to be caused by SSL problems in some outdated browsers (mainly Netscape 6.x/Mozilla) if your Webmin is SSL enabled. Try with a more current browser or disable SSL encryption in Webmin.

Segment Violation at startup or upon first request

Some versions of GCC (notably 2.95.1 through 2.95.4 at least) have bugs with compiler optimization. These GCC bugs may cause NULL pointer accesses in Squid, resulting in a "FATAL: Received Segment Violation...dying" message and a core dump.

urlParse: Illegal character in hostname 'proxy.mydomain.com:8080proxy.mydomain.com'

By Yomler of fnac.net

A combination of a bad configuration of Internet Explorer and any application which uses the cydoor DLLs will produce this entry in the log. See cydoor.com for a complete list.

The bad IE configuration is the use of an automatic configuration script (proxy.pac) together with manual proxy settings that are filled in (whether enabled or not). IE will only use the proxy.pac. Cydoor applications will use both, and will generate the errors.

Disabling the old proxy settings in IE is not enough; you should delete them completely and use only the proxy.pac, for example.

Requests for international domain names do not work

by HenrikNordström.

Some people have asked why requests for domain names using national symbols, as "supported" by certain domain registrars, do not work in Squid. This is because there is as yet no standard on how to manage national characters in the current Internet protocols such as HTTP or DNS. The current Internet standards are very strict about what is an acceptable hostname, and only accept A-Z, a-z, 0-9 and - in Internet hostname labels. Anything outside this is outside the current Internet standards and will cause interoperability issues, such as the problems seen with such names and Squid.

When there is consensus in the DNS and HTTP standardization groups on how to handle international domain names, Squid will be changed to support this if any changes to Squid are required.

If you are interested in the progress of the standardization process for international domain names please see the IETF IDN working group's dedicated page.

Why do I sometimes get "Zero Sized Reply"?

This happens when Squid makes a TCP connection to an origin server, but for some reason, the connection is closed before Squid reads any data. Depending on various factors, Squid may be able to retry the request again. If you see the "Zero Sized Reply" error message, it means that Squid was unable to retry, or that all retry attempts also failed.

What causes a connection to close prematurely? It could be a number of things, including:

  • An overloaded origin server.
  • TCP implementation/interoperability bugs. See ../SystemWeirdnesses for details.

  • Race conditions with HTTP persistent connections.
  • Buggy or misconfigured NAT boxes, firewalls, and load-balancers.
  • Denial of service attacks.
  • Utilizing TCP blackholing on FreeBSD (check ../SystemWeirdnesses).

You may be able to use tcpdump to track down and observe the problem.

  • {i} Some users believe the problem is caused by very large cookies. One user reports that his Zero Sized Reply problem went away when he told Internet Explorer to not accept third-party cookies.

Here are some things you can try to reduce the occurrence of the Zero Sized Reply error:

  • Delete or rename your cookie file and configure your browser to prompt you before accepting any new cookies.
  • Disable HTTP persistent connections with the server_persistent_connections and client_persistent_connections directives.

  • Disable any advanced TCP features on the Squid system. Disable ECN on Linux with echo 0 > /proc/sys/net/ipv4/tcp_ecn.

  • (!) Upgrade to Squid-2.6 or later to work around a Host header related bug in Cisco PIX HTTP inspection. The Cisco PIX firewall wrongly assumes the Host header can be found in the first packet of the request.

If this error causes serious problems for you and the above does not help, Squid developers would be happy to help you uncover the problem. However, we will require high-quality debugging information from you, such as tcpdump output, server IP addresses, operating system versions, and access.log entries with full HTTP headers.

If you want to make Squid give the Zero Sized error on demand, you can use a short C program. Simply compile and start the program on a system that doesn't already have a server running on port 80. Then try to connect to this fake server through Squid.
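
A minimal sketch of such a fake server (an assumption, not necessarily the program this FAQ originally referred to): it accepts each connection and immediately closes it without sending any data, which is exactly what produces the error:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in addr;
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(80);          /* binding to port 80 requires root */

    if (bind(s, (struct sockaddr *) &addr, sizeof(addr)) < 0) { perror("bind"); return 1; }
    if (listen(s, 5) < 0) { perror("listen"); return 1; }

    for (;;) {
        int c = accept(s, NULL, NULL);  /* accept the connection ...        */
        if (c >= 0)
            close(c);                   /* ... and close it without a reply */
    }
}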

Why do I get "The request or reply is too large" errors?

by Grzegorz Janoszka

This error message appears when you try to download a large file using GET, or to upload it using POST/PUT. There are several parameters to look for:

request_body_max_size and reply_body_max_size are set to 0 by default, which means no limits at all. They should not be limited unless you really know how that affects your Squid's behavior, and preferably not at all in a standard proxy.

request_header_max_size and reply_header_max_size default to 64 kB starting from Squid-3.1. Earlier versions of Squid had defaults as low as 2 kB. In some rather rare circumstances even 64 kB is too low, so you can increase this value.
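
For example, a squid.conf sketch (all values are illustrative only):

# limit uploads to 2 GB (0 means unlimited)
request_body_max_size 2 GB
# raise the header limits if legitimate traffic needs it
request_header_max_size 128 KB
reply_header_max_size 128 KB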

Negative or very large numbers in Store Directory Statistics, or constant complaints about cache above limit

In some situations where swap.state has been corrupted Squid can become very confused about how much data it has in the cache. Such corruption may happen after a power failure or similar fatal event. To recover, first stop Squid, then delete the swap.state files from each cache directory, and then start Squid again. Squid will automatically rebuild the swap.state index from the cached files reasonably well.
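
A sketch of the recovery steps (the cache_dir path is an assumption; repeat the rm for each of your cache directories):

# squid -k shutdown
# rm /var/spool/squid/swap.state
# squid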

If this does not work or causes too high load on your server due to the reindexing of the cache then delete the cache content as explained in ../OperatingSquid.

Problems with Windows update


Contents

  1. What are cachable objects?
  2. What is the ICP protocol?
  3. What is a cache hierarchy? What are parents and siblings?
  4. What is the Squid cache resolution algorithm?
  5. What features are Squid developers currently working on?
  6. Tell me more about Internet traffic workloads
  7. What are the tradeoffs of caching with the NLANR cache system?
  8. Where can I find out more about firewalls?
  9. What is the "Storage LRU Expiration Age?"
  10. What is "Failure Ratio at 1.01; Going into hit-only-mode for 5 minutes"?
  11. Does squid periodically re-read its configuration file?
  12. How does ''unlinkd'' work?
  13. What is an icon URL?
  14. Can I make my regular FTP clients use a Squid cache?
  15. Why is the select loop average time so high?
  16. How does Squid deal with Cookies?
  17. How does Squid decide when to refresh a cached object?
  18. What exactly is a ''deferred read''?
  19. Why is my cache's inbound traffic equal to the outbound traffic?
  20. How come some objects do not get cached?
  21. What does ''keep-alive ratio'' mean?
  22. How does Squid's cache replacement algorithm work?
  23. What are private and public keys?
  24. What is FORW_VIA_DB for?
  25. Does Squid send packets to port 7 (echo)? If so, why?
  26. What does "WARNING: Reply from unknown nameserver [a.b.c.d]" mean?
  27. How does Squid distribute cache files among the available directories?
  28. Why do I see negative byte hit ratio?
  29. What does "Disabling use of private keys" mean?
  30. What is a half-closed filedescriptor?
  31. What does --enable-heap-replacement do?
  32. Why is actual filesystem space used greater than what Squid thinks?
  33. How do ''positive_dns_ttl'' and ''negative_dns_ttl'' work?
  34. What does ''swapin MD5 mismatch'' mean?
  35. What does ''failed to unpack swapfile meta data'' mean?
  36. Why doesn't Squid make ''ident'' lookups in interception mode?
  37. What are FTP passive connections?
  38. When does Squid re-forward a client request?

What are cachable objects?

An Internet Object is a file, document or response to a query for an Internet service such as FTP, HTTP, or gopher. A client requests an Internet object from a caching proxy; if the object is not already cached, the proxy server fetches the object (either from the host specified in the URL or from a parent or sibling cache) and delivers it to the client.

What is the ICP protocol?

ICP is a protocol used for communication among squid caches. The ICP protocol is defined in two Internet RFC's. RFC 2186 describes the protocol itself, while RFC 2187 describes the application of ICP to hierarchical Web caching.

ICP is primarily used within a cache hierarchy to locate specific objects in sibling caches. If a squid cache does not have a requested document, it sends an ICP query to its siblings, and the siblings respond with ICP replies indicating a "HIT" or a "MISS." The cache then uses the replies to choose from which cache to resolve its own MISS.

ICP also supports multiplexed transmission of multiple object streams over a single TCP connection. ICP is currently implemented on top of UDP. Current versions of Squid also support ICP via multicast.

What is a cache hierarchy? What are parents and siblings?

A cache hierarchy is a collection of caching proxy servers organized in a logical parent/child and sibling arrangement so that caches closest to Internet gateways (closest to the backbone transit entry-points) act as parents to caches at locations farther from the backbone. The parent caches resolve "misses" for their children. In other words, when a cache requests an object from its parent, and the parent does not have the object in its cache, the parent fetches the object, caches it, and delivers it to the child. This ensures that the hierarchy achieves the maximum reduction in bandwidth utilization on the backbone transit links, helps reduce load on Internet information servers outside the network served by the hierarchy, and builds a rich cache on the parents so that the other child caches in the hierarchy will obtain better "hit" rates against their parents.

In addition to the parent-child relationships, squid supports the notion of siblings: caches at the same level in the hierarchy, provided to distribute cache server load. Each cache in the hierarchy independently decides whether to fetch the reference from the object's home site or from parent or sibling caches, using a simple resolution protocol. Siblings will not fetch an object for another sibling to resolve a cache "miss."

What is the Squid cache resolution algorithm?

  1. Send ICP queries to all appropriate siblings
  2. Wait for all replies to arrive with a configurable timeout (the default is two seconds).
    1. Begin fetching the object upon receipt of the first HIT reply, or
    2. Fetch the object from the first parent which replied with MISS (subject to weighting values), or
    3. Fetch the object from the source

The algorithm is somewhat more complicated when firewalls are involved.

The cache_peer no-query option can be used to skip the ICP queries if the only appropriate source is a parent cache (i.e., if there's only one place you'd fetch the object from, why bother querying?)
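
For example, a squid.conf sketch of a single parent queried without ICP (the hostname is an example):

cache_peer parent.example.com parent 3128 0 no-query default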

What features are Squid developers currently working on?

The features and areas we work on are always changing. See the Squid Road Maps for more details on current activities.

Tell me more about Internet traffic workloads

Workload can be characterized as the burden a client or group of clients imposes on a system. Understanding the nature of workloads is important to managing system capacity.

If you are interested in Internet traffic workloads then NLANR's Network Analysis activities is a good place to start.

What are the tradeoffs of caching with the NLANR cache system?

The NLANR root caches are at the NSF supercomputer centers (SCCs), which are interconnected via NSF's high speed backbone service (vBNS). So inter-cache communication between the NLANR root caches does not cross the Internet.

The benefits of hierarchical caching (namely, reduced network bandwidth consumption, reduced access latency, and improved resiliency) come at a price. Caches higher in the hierarchy must field the misses of their descendants. If the equilibrium hit rate of a leaf cache is 50%, half of all leaf references have to be resolved through a second level cache rather than directly from the object's source. If this second level cache has most of the documents, it is usually still a win, but if higher level caches often don't have the document, or become overloaded, then they could actually increase access latency, rather than reduce it.

Where can I find out more about firewalls?

Please see the Firewalls FAQ information site.

What is the "Storage LRU Expiration Age?"

For example:

Storage LRU Expiration Age:      4.31 days

The LRU expiration age is a dynamically-calculated value. Any objects which have not been accessed for this amount of time will be removed from the cache to make room for new, incoming objects. Another way of looking at this is that it would take your cache approximately this many days to go from empty to full at your current traffic levels.

As your cache becomes more busy, the LRU age becomes lower so that more objects will be removed to make room for the new ones. Ideally, your cache will have an LRU age value in the range of at least 3 days. If the LRU age is lower than 3 days, then your cache is probably not big enough to handle the volume of requests it receives. By adding more disk space you could increase your cache hit ratio.

What is "Failure Ratio at 1.01; Going into hit-only-mode for 5 minutes"?

Consider a pair of caches named A and B. It may be the case that A can reach B, and vice-versa, but B has poor reachability to the rest of the Internet. In this case, we would like B to recognize that it has poor reachability and somehow convey this fact to its neighbor caches.

Squid will track the ratio of failed-to-successful requests over short time periods. A failed request is one which is logged as ERR_DNS_FAIL, ERR_CONNECT_FAIL, or ERR_READ_ERROR. When the failed-to-successful ratio exceeds 1.0, then Squid will return ICP_MISS_NOFETCH instead of ICP_MISS to neighbors. Note, Squid will still return ICP_HIT for cache hits.

Does squid periodically re-read its configuration file?

No, you must send a HUP signal to have Squid re-read its configuration file, including access control lists. An easy way to do this is with the -k command line option:

squid -k reconfigure

How does ''unlinkd'' work?

unlinkd is an external process used for unlinking unused cache files. Performing the unlink operation in an external process opens up some race-condition problems for Squid. If we are not careful, the following sequence of events could occur:

  • An object with swap file number S is removed from the cache.

  • We want to unlink file F which corresponds to swap file number S, so we write pathname F to the unlinkd socket. We also mark S as available in the filemap.

  • We have a new object to swap out. It is allocated to the first available file number, which happens to be S. Squid opens file F for writing.

  • The unlinkd process reads the request to unlink F and issues the actual unlink call.

So, the problem is, how can we guarantee that unlinkd will not remove a cache file that Squid has recently allocated to a new object? The approach we have taken is to have Squid keep a stack of unused (but not deleted!) swap file numbers. The stack size is hard-coded at 128 entries. We only give unlink requests to unlinkd when the unused file number stack is full. Thus, if we ever have to start unlinking files, we have a pool of 128 file numbers to choose from which we know will not be removed by unlinkd.

In terms of implementation, the only way to send unlink requests to the unlinkd process is via the storePutUnusedFileno function.

Unfortunately there are times when Squid can not use the unlinkd process but must call unlink(2) directly. One of these times is when the cache swap size is over the high water mark. If we push the released file numbers onto the unused file number stack, and the stack is not full, then no files will be deleted, and the actual disk usage will remain unchanged. So, when we exceed the high water mark, we must call unlink(2) directly.

What is an icon URL?

One of the most unpleasant things Squid must do is generate HTML pages of Gopher and FTP directory listings. For some strange reason, people like to have little icons next to each listing entry, denoting the type of object to which the link refers (image, text file, etc.).

We include a set of icons in the source distribution for this purpose. These icon files are loaded by Squid as cached objects at runtime. Thus, every Squid cache now has its own icons to use in Gopher and FTP listings. Just like other objects available on the web, we refer to the icons with Uniform Resource Locators, or URLs.

Can I make my regular FTP clients use a Squid cache?

Nope, it's not possible. Squid only accepts HTTP requests. It speaks FTP on the server-side, but not on the client-side.

The very cool wget will download FTP URLs via Squid (and probably any other proxy cache).

Why is the select loop average time so high?

Is there any way to speed up the time spent dealing with select? Cachemgr shows:

 Select loop called: 885025 times, 714.176 ms avg

This number is NOT how much time it takes to handle filedescriptor I/O. We simply count the number of times select was called, and divide the total process running time by the number of select calls.

This means that, on average, it takes your cache 0.714 seconds to check all the open file descriptors once. But this also includes time select() spends in a wait state when there is no I/O on any file descriptors. My relatively idle workstation cache has similar numbers:

Select loop called: 336782 times, 715.938 ms avg

But my busy caches have much lower times:

Select loop called: 16940436 times, 10.427 ms avg
Select loop called: 80524058 times, 10.030 ms avg
Select loop called: 10590369 times, 8.675 ms avg
Select loop called: 84319441 times, 9.578 ms avg

How does Squid deal with Cookies?

The presence of Cookie headers in requests does not affect whether or not an HTTP reply can be cached. Similarly, the presence of Set-Cookie headers in replies does not affect whether the reply can be cached.

The proper way to deal with Set-Cookie reply headers, according to RFC 2109 is to cache the whole object, EXCEPT the Set-Cookie header lines.

However, we can filter out specific HTTP headers. But instead of filtering them on the receiving-side, we filter them on the sending-side. Thus, Squid does cache replies with Set-Cookie headers, but it filters out the Set-Cookie header itself for cache hits.

How does Squid decide when to refresh a cached object?

When checking the object freshness, we calculate these values:

  • OBJ_DATE is the time when the object was given out by the origin server. This is taken from the HTTP Date reply header.

  • OBJ_LASTMOD is the time when the object was last modified, given by the HTTP Last-Modified reply header.

  • OBJ_AGE is how much the object has aged since it was retrieved:

    OBJ_AGE = NOW - OBJ_DATE

  • LM_AGE is how old the object was when it was retrieved:

    LM_AGE = OBJ_DATE - OBJ_LASTMOD

  • LM_FACTOR is the ratio of OBJ_AGE to LM_AGE:

    LM_FACTOR = OBJ_AGE / LM_AGE

  • CLIENT_MAX_AGE is the (optional) maximum object age the client will accept, as taken from the HTTP/1.1 Cache-Control request header.

  • EXPIRES is the (optional) expiry time from the server reply headers.

These values are compared with the parameters of the refresh_pattern rules. The refresh parameters are:

  • URL regular expression
  • CONF_MIN: The time (in minutes) an object without an explicit expiry time should be considered fresh. The recommended value is 0; any higher values may cause dynamic applications to be erroneously cached unless the application designer has taken the appropriate actions.

  • CONF_PERCENT: A percentage of the object's age (time since last modification) for which an object without an explicit expiry time will be considered fresh.

  • CONF_MAX: An upper limit on how long objects without an explicit expiry time will be considered fresh.

The URL regular expressions are checked in the order listed until a match is found. Then the algorithms below are applied for determining if an object is fresh or stale.

The refresh algorithm used in Squid-2 looks like this:

    if (EXPIRES) {
        if (EXPIRES <= NOW)
            return STALE
        else
            return FRESH
    }
    if (CLIENT_MAX_AGE)
        if (OBJ_AGE > CLIENT_MAX_AGE)
            return STALE
    if (OBJ_AGE > CONF_MAX)
        return STALE
    if (OBJ_DATE > OBJ_LASTMOD) {
        if (LM_FACTOR < CONF_PERCENT)
            return FRESH
        else
            return STALE
    }
    if (OBJ_AGE <= CONF_MIN)
        return FRESH
    return STALE
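
As a concrete illustration, the refresh_pattern line below (values are only an example) sets CONF_MIN to 0 minutes, CONF_PERCENT to 20%, and CONF_MAX to 3 days (4320 minutes) for all URLs:

refresh_pattern . 0 20% 4320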

What exactly is a ''deferred read''?

The cachemanager I/O page lists deferred reads for various server-side protocols.

Sometimes reading on the server-side gets ahead of writing to the client-side, especially if your cache is on a fast network and your clients are connected at modem speeds. Squid will read up to read_ahead_gap bytes (default of 16 KB) ahead of the client before it starts to defer the server-side reads.

Why is my cache's inbound traffic equal to the outbound traffic?

I've been monitoring the traffic on my cache's ethernet adapter and found a behavior I can't explain: the inbound traffic is equal to the outbound traffic. The differences are negligible. The hit ratio reports 40%. Shouldn't the outbound be at least 40% greater than the inbound?

by David J N Begley

I can't account for the exact behavior you're seeing, but I can offer this advice: whenever you start measuring raw Ethernet or IP traffic on interfaces, you can forget about getting all the numbers to exactly match what Squid reports as the amount of traffic it has sent/received.

Why?

Squid is an application - it counts whatever data is sent to, or received from, the lower-level networking functions; at each successively lower layer, additional traffic is involved (such as header overhead, retransmits and fragmentation, unrelated broadcasts/traffic, etc.). The additional traffic is never seen by Squid and thus isn't counted - but if you run MRTG (or any SNMP/RMON measurement tool) against a specific interface, all this additional traffic will "magically appear".

Also remember that an interface has no concept of upper-layer networking (so an Ethernet interface doesn't distinguish between IP traffic that's entirely internal to your organization, and traffic that's to/from the Internet); this means that when you start measuring an interface, you have to be aware of *what* you are measuring before you can start comparing numbers elsewhere.

It is possible (though by no means guaranteed) that you are seeing roughly equivalent input/output because you're measuring an interface that both retrieves data from the outside world (Internet), *and* serves it to end users (internal clients). That wouldn't be the whole answer, but hopefully it gives you a few ideas to start applying to your own circumstance.

To interpret any statistic, you have to first know what you are measuring; for example, an interface counts inbound and outbound bytes - that's it. The interface doesn't distinguish between inbound bytes from external Internet sites or from internal (to the organization) clients (making requests). If you want that, try looking at RMON2.

Also, if you're talking about a 40% hit rate in terms of object requests/counts then there's absolutely no reason why you should expect a 40% reduction in traffic; after all, not every request/object is going to be the same size so you may be saving a lot in terms of requests but very little in terms of actual traffic.

How come some objects do not get cached?

To determine whether a given object may be cached, Squid takes many things into consideration. The current algorithm (for Squid-2) goes something like this:

  • Responses with Cache-Control: Private are NOT cachable.

  • Responses with Cache-Control: No-Cache are NOT cachable by Squid older than Squid-3.2.

  • Responses with Cache-Control: No-Store are NOT cachable.

  • Responses for requests with an Authorization header are cachable ONLY if the response includes Cache-Control: Public or some other special parameters controlling revalidation.

  • The following HTTP status codes are cachable:
    • 200 OK
    • 203 Non-Authoritative Information
    • 300 Multiple Choices
    • 301 Moved Permanently
    • 410 Gone

However, if Squid receives one of these responses from a neighbor cache, it will NOT be cached if ALL of the Date, Last-Modified, and Expires reply headers are missing. This prevents such objects from bouncing back-and-forth between siblings forever.

A 302 Moved Temporarily response is cachable ONLY if the response also includes an Expires header.

The following HTTP status codes are "negatively cached" for a short amount of time (configurable):

  • 204 No Content
  • 305 Use Proxy
  • 400 Bad Request
  • 403 Forbidden
  • 404 Not Found
  • 405 Method Not Allowed
  • 414 Request-URI Too Large
  • 500 Internal Server Error
  • 501 Not Implemented
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Time-out

All other HTTP status codes are NOT cachable, including:

  • 206 Partial Content
  • 303 See Other
  • 304 Not Modified
  • 401 Unauthorized
  • 407 Proxy Authentication Required

What does ''keep-alive ratio'' mean?

The keep-alive ratio shows up in the server_list cache manager page.

This is a mechanism to try detecting neighbor caches which might not be able to deal with persistent connections. Every time we send a Connection: keep-alive request header to a neighbor, we count how many times the neighbor sent us a Connection: keep-alive reply header. Thus, the keep-alive ratio is the ratio of these two counters.

If the ratio stays above 0.5, then we continue to assume the neighbor properly implements persistent connections. Otherwise, we will stop sending the keep-alive request header to that neighbor.

How does Squid's cache replacement algorithm work?

Squid uses an LRU (least recently used) algorithm to replace old cache objects. This means objects which have not been accessed for the longest time are removed first. In the source code, the StoreEntry->lastref value is updated every time an object is accessed.

Objects are not necessarily removed "on-demand." Instead, a regularly scheduled event runs to periodically remove objects. Normally this event runs every second.

Squid keeps the cache disk usage between the low and high water marks. By default the low mark is 90%, and the high mark is 95% of the total configured cache size. When the disk usage is close to the low mark, the replacement is less aggressive (fewer objects removed). When the usage is close to the high mark, the replacement is more aggressive (more objects removed).
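
For reference, these are the corresponding squid.conf directives, shown with their default values:

cache_swap_low 90
cache_swap_high 95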

When selecting objects for removal, Squid examines some number of objects and determines which can be removed and which cannot. A number of factors determine whether or not any given object can be removed. If the object is currently being requested, or retrieved from an upstream site, it will not be removed. If the object is "negatively-cached" it will be removed. If the object has a private cache key, it will be removed (there would be no reason to keep it -- because the key is private, it can never be "found" by subsequent requests). Finally, if the time since last access is greater than the LRU threshold, the object is removed.

The LRU threshold value is dynamically calculated based on the current cache size and the low and high marks. The LRU threshold is scaled exponentially between the high and low water marks. When the store swap size is near the low water mark, the LRU threshold is large. When the store swap size is near the high water mark, the LRU threshold is small. The threshold automatically adjusts to the rate of incoming requests. In fact, when your cache size has stabilized, the LRU threshold represents how long it takes to fill (or fully replace) your cache at the current request rate. Typical values for the LRU threshold are 1 to 10 days.

Back to selecting objects for removal. Obviously it is not possible to check every object in the cache every time we need to remove some of them. We can only check a small subset each time.

Every time an object is accessed, it gets moved to the top of a list. Over time, the least used objects migrate to the bottom of the list. When looking for objects to remove, we only need to check the last 100 or so objects in the list. Unfortunately this approach increases our memory usage because of the need to store three additional pointers per cache object. We also use cache keys with MD5 hashes.

What are private and public keys?

keys refers to the database keys which Squid uses to index cache objects. Every object in the cache--whether saved on disk or currently being downloaded--has a cache key. We use MD5 checksums for cache keys.

The Squid cache uses the notions of private and public cache keys. An object can start out as being private, but may later be changed to public status. Private objects are associated with only a single client whereas a public object may be sent to multiple clients at the same time. In other words, public objects can be located by any cache client. Private keys can only be located by a single client--the one who requested it.

Objects are changed from private to public after all of the HTTP reply headers have been received and parsed. In some cases, the reply headers will indicate the object should not be made public. For example, if the private Cache-Control directive is used.

What is FORW_VIA_DB for?

We use it to collect data for Plankton.

Does Squid send packets to port 7 (echo)? If so, why?

It may. This is an old feature from the Harvest cache software. The cache would send an ICP "SECHO" message to the echo ports of origin servers. If the SECHO message came back before any of the other ICP replies, then it meant the origin server was probably closer than any neighbor cache. In that case Harvest/Squid sent the request directly to the origin server.

With more attention focused on security, many administrators filter UDP packets to port 7. The Computer Emergency Response Team (CERT) once issued an advisory note ( CA-96.01: UDP Port Denial-of-Service Attack) that says UDP echo and chargen services can be used for a denial of service attack. This made admins extremely nervous about any packets hitting port 7 on their systems, and they made complaints.

The source_ping feature has been disabled in Squid-2. If you're seeing packets to port 7 that are coming from a Squid cache (remote port 3130), then it's probably a very old version of Squid.

What does "WARNING: Reply from unknown nameserver [a.b.c.d]" mean?

It means Squid sent a DNS query to one IP address, but the response came back from a different IP address. By default Squid checks that the addresses match. If not, Squid ignores the response.

There are a number of reasons why this would happen:

  1. Your DNS name server just works this way, either because it's been configured to, or because it's stupid and doesn't know any better.
  2. You have a weird broadcast address, like 0.0.0.0, in your /etc/resolv.conf file.

  3. Somebody is trying to send spoofed DNS responses to your cache.

If you recognize the IP address in the warning as one of your name server hosts, then it's probably reason (1) or (2).

You can make these warnings stop, and allow responses from "unknown" name servers by setting this configuration option:

ignore_unknown_nameservers off
  • /!\ WARNING: this opens your Squid up to many possible security breaches. You should prefer to configure your set of possible nameserver IPs correctly.

How does Squid distribute cache files among the available directories?

Note: The information here is current for version 2.2.

See storeDirMapAllocate() in the source code.

When Squid wants to create a new disk file for storing an object, it first selects which cache_dir the object will go into. This is done with the storeDirSelectSwapDir() function. If you have N cache directories, the function identifies the 3N/4 (75%) of them with the most available space. These directories are then used, in order of having the most available space. When Squid has stored one URL to each of the 3N/4 cache_dirs, the process repeats and storeDirSelectSwapDir() finds a new set of 3N/4 cache directories with the most available space.

Once the cache_dir has been selected, the next step is to find an available swap file number. This is accomplished by checking the file map, with the file_map_allocate() function. Essentially the swap file numbers are allocated sequentially. For example, if the last number allocated happens to be 1000, then the next one will be the first number after 1000 that is not already being used.

Why do I see negative byte hit ratio?

Byte hit ratio is calculated a bit differently than Request hit ratio. Squid counts the number of bytes read from the network on the server-side, and the number of bytes written to the client-side. The byte hit ratio is calculated as

        (client_bytes - server_bytes) / client_bytes

If server_bytes is greater than client_bytes, you end up with a negative value.

The server_bytes may be greater than client_bytes for a number of reasons, including:

  • Cache Digests and other internally generated requests. Cache Digest messages are quite large. They are counted in the server_bytes, but since they are consumed internally, they do not count in client_bytes.
  • User-aborted requests. If your quick_abort setting allows it, Squid sometimes continues to fetch aborted requests from the server-side, without sending any data to the client-side.

  • Some range requests, in combination with Squid bugs, can consume more bandwidth on the server-side than on the client-side. In a range request, the client is asking for only some part of the object. Squid may decide to retrieve the whole object anyway, so that it can be used later on. This means downloading more from the server than sending to the client. You can affect this behavior with the range_offset_limit option.
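
For example, a squid.conf sketch (the value is illustrative) that makes Squid fetch the whole object only when the requested range begins within the first 512 KB:

range_offset_limit 512 KB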

What does "Disabling use of private keys" mean?

First you need to understand the difference between public and private keys.

When Squid sends ICP queries, it uses the ICP 'reqnum' field to hold the private key data. In other words, when Squid gets an ICP reply, it uses the 'reqnum' value to build the private cache key for the pending object.

Some ICP implementations always set the 'reqnum' field to zero when they send a reply. Squid can not use private cache keys with such neighbor caches because Squid will not be able to locate cache keys for those ICP replies. Thus, if Squid detects a neighbor cache that sends zero reqnum's, it disables the use of private cache keys.

Not having private cache keys has some important privacy implications. Two users could receive one response that was meant for only one of the users. This response could contain personal, confidential information. You will need to disable the 'zero reqnum' neighbor if you want Squid to use private cache keys.

What is a half-closed filedescriptor?

TCP allows connections to be in a "half-closed" state. This is accomplished with the shutdown(2) system call. In Squid, this means that a client has closed its side of the connection for writing, but leaves it open for reading. Half-closed connections are tricky because Squid can't tell the difference between a half-closed connection, and a fully closed one.

If Squid tries to read a connection, and read() returns 0, and Squid knows that the client doesn't have the whole response yet, Squid marks the filedescriptor as half-closed. Most likely the client has aborted the request and the connection is really closed. However, there is a slight chance that the client is using the shutdown() call, and that it can still read the response.

To disable half-closed connections, simply put this in squid.conf:

half_closed_clients off

Then, Squid will always close its side of the connection instead of marking it as half-closed.

What does --enable-heap-replacement do?

  • This option is only relevant for Squid-2. It has been replaced in Squid-3 by --enable-removal-policies=heap

Squid has traditionally used an LRU replacement algorithm. However with Squid version 2.4 and later you should use this configure option:

./configure --enable-heap-replacement

Currently, the heap replacement code supports two additional algorithms: LFUDA, and GDS.

Then, in squid.conf, you can select different policies with the cache_replacement_policy directive.
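
For example, a squid.conf sketch selecting the LFUDA policy:

cache_replacement_policy heap LFUDA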

The LFUDA and GDS replacement code was contributed by John Dilley and others from Hewlett-Packard. Their work is described in these papers:

  • Enhancement and Validation of Squid's Cache Replacement Policy (HP Tech Report).

  • Enhancement and Validation of the Squid Cache Replacement Policy (WCW 1999 paper).

Why is actual filesystem space used greater than what Squid thinks?

If you compare df output and cachemgr storedir output, you will notice that actual disk usage is greater than what Squid reports. This may be due to a number of reasons:

  • Squid doesn't keep track of the size of the swap.state file, which normally resides on each cache_dir.

  • Directory entries also take up filesystem space.
  • Other applications might be using the same disk partition.
  • Your filesystem block size might be larger than what Squid thinks. When calculating total disk usage, Squid rounds file sizes up to a whole number of 1024 byte blocks. If your filesystem uses larger blocks, then some "wasted" space is not accounted for.
  • Your cache has suffered some minor corruption and some objects have gotten lost without being removed from the swap.state file. Over time, Squid will detect this and automatically fix it.

How do ''positive_dns_ttl'' and ''negative_dns_ttl'' work?

positive_dns_ttl is how long Squid caches a successful DNS lookup. Similarly, negative_dns_ttl is how long Squid caches a failed DNS lookup.

positive_dns_ttl is not always used. It is NOT used in the following cases:

  • Squid-2.3 and later versions with internal DNS lookups. Internal lookups are the default for Squid-2.3 and later.
  • If you applied the "DNS TTL" for BIND as described in ../CompilingSquid.

  • If you are using FreeBSD, then it already has the DNS TTL patch built in.

Let's say you have the following settings:

positive_dns_ttl 1 hours
negative_dns_ttl 1 minutes

When Squid looks up a name like www.squid-cache.org, it gets back an IP address like 204.144.128.89. The address is cached for the next hour. That means, when Squid needs to know the address for www.squid-cache.org again, it uses the cached answer for the next hour. After one hour, the cached information expires, and Squid makes a new query for the address of www.squid-cache.org.

If you have the DNS TTL patch, or are using internal lookups, then each hostname has its own TTL value, which was set by the domain name administrator. You can see these values in the 'ipcache' cache manager page. For example:

 Hostname                      Flags lstref    TTL N
 www.squid-cache.org               C   73043  12784  1( 0)  204.144.128.89-OK
 www.ircache.net                   C   73812  10891  1( 0)   192.52.106.12-OK
 polygraph.ircache.net             C  241768 -181261  1( 0)   192.52.106.12-OK

The TTL field shows how many seconds remain until the entry expires. Negative values mean the entry is already expired, and will be refreshed upon next use.

The negative_dns_ttl directive specifies how long to cache failed DNS lookups. When Squid fails to resolve a hostname, you can be pretty sure that it is a real failure, and you are not likely to get a successful answer within a short time period. Squid retries its lookups many times before declaring a lookup has failed. If you like, you can set negative_dns_ttl to zero.

What does ''swapin MD5 mismatch'' mean?

It means that Squid opened up a disk file to serve a cache hit, but found that the stored object doesn't match the user's request. Squid stores the MD5 digest of the URL at the start of each disk file. When the file is opened, Squid checks that the disk file MD5 matches the MD5 of the URL requested by the user. If they don't match, the warning is printed and Squid forwards the request to the origin server.

You do not need to worry about this warning. It means that Squid is automatically recovering from a corrupted cache directory.

What does ''failed to unpack swapfile meta data'' mean?

Each of Squid's disk cache files has a metadata section at the beginning. This header is used to store the URL MD5, some StoreEntry data, and more. When Squid opens a disk file for reading, it looks for the meta data header and unpacks it.

This warning means that Squid couldn't unpack the meta data. This is a non-fatal error, from which Squid can recover. Perhaps the meta data was just missing, or perhaps the file got corrupted.

You do not need to worry about this warning. It means that Squid is double-checking that the disk file matches what Squid thinks should be there, and the check failed. Squid recovers and generates a cache miss in this case.

Why doesn't Squid make ''ident'' lookups in interception mode?

It is a side-effect of the way interception proxying works.

When Squid is configured for interception proxying, the operating system pretends that it is the origin server. That means that the "local" socket address for intercepted TCP connections is really the origin server's IP address. If you run netstat -n on your interception proxy, you'll see a lot of foreign IP addresses in the Local Address column.

When Squid wants to make an ident query, it creates a new TCP socket and binds the local endpoint to the same IP address as the local end of the client's TCP connection. Since the local address isn't really local (it's some far-away origin server's IP address), the bind() system call fails. Squid handles this as a failed ident lookup.

So why bind in that way? If you know you are interception proxying, then why not bind the local endpoint to the host's (intranet) IP address? Why make the masses suffer needlessly?

Because that's just how ident works. Please read RFC 931, in particular the RESTRICTIONS section.

What are FTP passive connections?

by Colin Campbell

FTP uses two data streams, one for passing commands around, the other for moving data. The command channel is handled by the ftpd listening on port 21.

The data channel varies depending on whether you ask for passive ftp or not. When you request data in a non-passive environment, your client tells the server "I am listening on <ip-address> <port>." The server then connects FROM port 20 to the ip address and port specified by your client. This requires your "security device" to permit any host outside from port 20 to any host inside on any port > 1023. Somewhat of a hole.

In passive mode, when you request a data transfer, the server tells the client "I am listening on <ip address> <port>." Your client then connects to the server on that IP and port and data flows.

When does Squid re-forward a client request?

When Squid forwards an HTTP request to the next hop (either a cache_peer or an origin server), things may go wrong. In some cases, Squid decides to re-forward the request. This section documents the associated Squid decision logic. Notes in {curly braces} are meant to help developers to correlate these comments with Squid sources. Non-developers should ignore those notes.

Warning: Squid uses two somewhat different methods for making re-forwarding decisions: {FwdState::checkRetry} and {FwdState::reforward}. Unfortunately, there are many different cases in which at least one of those methods might be called, and each method's decision may be affected by the calling sequence (i.e., the transaction state). The logic documented below does not match reality in some corner cases. If you find a serious discrepancy with a real-life use case that you care about, please file a documentation bug report.

Squid does not try to re-forward a request if at least one of the following conditions is true:

  • Squid is shutting down, although this is ignored by {FwdState::reforward}, one of the two decision making methods.

  • The number of forwarding attempts exceeded forward_max_tries. For example, if you set forward_max_tries to 1 (one), then no requests will be re-forwarded.

  • Squid successfully received a complete response. See below regarding the meaning of "received" in this context. {!FwdState.self}

  • The process of storing the response body (for the purpose of caching it or just for forwarding it to the client) was aborted. This may happen for numerous reasons usually dealing with some difficult-to-recover-from error conditions, possibly not even related to communication with the next hop. See below regarding the meaning of "received" in this context. {EBIT_TEST(e->flags, ENTRY_ABORTED)} and {entry->store_status != STORE_PENDING}.

  • Squid has not received the end of HTTP response headers but already generated some kind of internal error response. Note that if the response goes through a RESPMOD adaptation service, then "received" here means "received after adaptation" and not "received from the next HTTP hop". {entry->store_status != STORE_PENDING} and {!entry->isEmpty} in {FwdState::checkRetry}?

  • Squid discovers that the origin server speaks an unsupported protocol. {flags.dont_retry} set in {FwdState::dispatch}.

  • Squid detects a persistent connection race on a pinned connection. That is, Squid detects a pinned connection closure after sending [a part of] the request and before receiving anything from the server. Pinned connections are used for connection-based authentication and bumped SSL traffic. {flags.dont_retry} set in {FwdState::fail}.

  • The producer of the request body (either the client or a precache REQMOD adaptation service) has aborted. {flags.dont_retry} set in {ServerStateData::handleRequestBodyProducerAborted}.

  • HTTP response header size sent by the next hop exceeds reply_header_max_size. {flags.dont_retry} set in {HttpStateData::continueAfterParsingHeader}.

  • The received response body size exceeds reply_body_max_size configuration. Currently, this condition may only occur if precache RESPMOD adaptation is enabled for the response. {flags.dont_retry} set in {ServerStateData::sendBodyIsTooLargeError}.

  • A precache RESPMOD adaptation service has aborted. {flags.dont_retry} set in {ServerStateData::handleAdaptationAborted}.

  • A precache RESPMOD adaptation service has blocked the response. {flags.dont_retry} set in {ServerStateData::handleAdaptationBlocked}.

  • Squid FTP code has started STOR data transfer to the origin server. {flags.dont_retry} set in {FtpStateData::readStor}.

  • Squid has consumed some of the request body while trying to send the request to the next hop. This may happen if the request body is larger than the maximum Squid request buffer size: Squid has to consume at least some of the request body bytes in order to receive (and forward) more body bytes. There may be other cases when Squid nibbles at the request body. {request->bodyNibbled}.

  • Squid has successfully established a connection but did not receive HTTP response headers and the request is not "Safe" or "Idempotent" as defined in RFC 2616 Section 9.1. {flags.connected_okay && !checkRetriable}.

  • Squid has no alternative destinations to try. Please note that alternative destinations may include multiple next hop IP addresses and multiple peers.

  • retry_on_error is off and the received HTTP response status code is 403 (Forbidden), 500 (Internal Server Error), 501 (Not Implemented) or 503 (Service Unavailable).

  • The received HTTP response status code is not one of the following codes: 403 (Forbidden), 500 (Internal Server Error), 501 (Not Implemented), 502 (Bad Gateway), 503 (Service Unavailable), and 504 (Gateway Timeout).

In other cases, Squid tries to re-forward the request. If the failure was caused by a persistent connection race, Squid retries using the same destination address. Otherwise, Squid goes to next origin server or peer address in the list of alternative destinations.
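
The conditions above condense to roughly the following C sketch. This is an illustrative summary only - every name in it is made up, and it is not the actual Squid source (the real logic is split across {FwdState::checkRetry} and {FwdState::reforward}):

  #include <stdbool.h>

  /* Hypothetical flattened transaction state; field names are invented. */
  struct fwd_state {
      bool shutting_down;     /* Squid is shutting down */
      int n_tries;            /* forwarding attempts so far */
      int max_tries;          /* forward_max_tries */
      bool response_complete; /* a complete response was already received */
      bool dont_retry;        /* unsupported protocol, pinned-connection race,
                                 aborted body or adaptation, oversized headers
                                 or body, FTP STOR started, ... */
      bool body_nibbled;      /* some request body was already consumed */
      bool connected_okay;    /* connected, but no response headers yet */
      bool retriable_method;  /* "Safe"/"Idempotent" per RFC 2616 Section 9.1 */
      bool more_destinations; /* alternative next hop IPs or peers remain */
      bool retry_on_error;    /* squid.conf retry_on_error */
      int status;             /* received HTTP status code, or 0 if none */
  };

  static bool status_is_retriable(const struct fwd_state *f)
  {
      switch (f->status) {
      case 403: case 500: case 501: case 503:
          return f->retry_on_error; /* retried only with retry_on_error on */
      case 502: case 504:
          return true;
      default:
          return false;             /* any other received status: no retry */
      }
  }

  static bool may_reforward(const struct fwd_state *f)
  {
      if (f->shutting_down) return false;
      if (f->n_tries >= f->max_tries) return false;
      if (f->response_complete) return false;
      if (f->dont_retry) return false;
      if (f->body_nibbled) return false;
      if (f->connected_okay && !f->retriable_method) return false;
      if (!f->more_destinations) return false;
      /* no response at all (e.g. a failed connect) is retriable */
      return f->status == 0 || status_is_retriable(f);
  }

  int main(void)
  {
      struct fwd_state f = { .max_tries = 10, .more_destinations = true,
                             .retriable_method = true };
      return may_reforward(&f) ? 0 : 1;
  }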

Please note that this section covers forwarding retries only. A transaction may fail before Squid tries to forward the request (e.g., an HTTP request itself may be malformed or denied by Squid) or after Squid is done receiving the response (e.g., the response may be denied by Squid).

This analysis is based primarily on {FwdState::checkRetry}, {FwdState::reforward}, and related forwarding source code. This text is based on Squid trunk revision 12993 dated 2013-08-29. Hard-coded logic may have changed since then.



General advice

The settings detailed in this FAQ chapter are suggestions for operating-system-specific settings which may help when running busy caches. It is recommended to verify, using the Cache Manager, that the settings have the desired effect.

FreeBSD

Filedescriptors

For busy caches, it makes sense to increase the number of system-wide available filedescriptors, by setting in /etc/sysctl.conf:

kern.maxfilesperproc=8192
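
The same limit can usually be applied to a running system without a reboot (older FreeBSD versions require sysctl -w):

  # sysctl kern.maxfilesperproc=8192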

Diskd

Warning: this information is out-of-date, as with newer FreeBSD versions these parameters can be tuned at runtime via sysctl. We're looking for contributions to update this page.

In order to run diskd you may need to tweak your kernel settings. Try setting in the kernel config file (larger values may be needed for very busy caches):

options         MSGMNB=8192     # max # of bytes in a queue
options         MSGMNI=40       # number of message queue identifiers
options         MSGSEG=512      # number of message segments per queue
options         MSGSSZ=64       # size of a message segment
options         MSGTQL=2048     # max messages in system

options SHMSEG=16
options SHMMNI=32
options SHMMAX=2097152
options SHMALL=4096
options MAXFILES=16384


Solaris

TCP incompatibility?

J.D. Bronson (jb at ktxg dot com) reported that his Solaris box could not talk to certain origin servers, such as moneycentral.msn.com and www.mbnanetaccess.com. J.D. fixed his problem by setting:

tcp_xmit_hiwat 49152
tcp_xmit_lowat 4096
tcp_recv_hiwat 49152
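
These appear to be Solaris ndd TCP tunables; assuming so, they can be applied to a running system like this (and placed in a boot script to persist):

  # ndd -set /dev/tcp tcp_xmit_hiwat 49152
  # ndd -set /dev/tcp tcp_xmit_lowat 4096
  # ndd -set /dev/tcp tcp_recv_hiwat 49152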

P.S. In Solaris 10 and above these values are the system defaults (by Yuri Voinov).

select()

select(3c) won't handle more than 1024 file descriptors. The configure script should enable poll() by default for Solaris. poll() allows you to use many more filedescriptors, probably 8192 or more.

For older Squid versions you can enable poll() manually by changing HAVE_POLL in include/autoconf.h, or by adding -DUSE_POLL=1 to the DEFINES in src/Makefile.

malloc

libmalloc.a is leaky. Squid's configure does not use -lmalloc on Solaris.

DNS lookups and ''nscd''

by David J N Begley.

DNS lookups can be slow because of some mysterious thing called nscd. You should edit /etc/nscd.conf and make it say:

enable-cache            hosts           no

Apparently nscd serializes DNS queries thus slowing everything down when an application (such as Squid) hits the resolver hard. You may notice something similar if you run a log processor executing many DNS resolver queries - the resolver starts to slow.. right.. down.. . . .

According to at online dot ee Andres Kroonmaa, users of Solaris 2.6 and up should NOT completely disable the nscd daemon. nscd should be running and caching passwd and group files, although it is suggested to disable hosts caching as it may interfere with DNS lookups.

Several library calls rely on free file descriptors below 256 (FD < 256). Systems running without nscd may fail on such calls if the first 256 descriptors are all in use.

Since Solaris 2.6, Sun has changed the way some system calls work, using the nscd daemon to implement them. To communicate with nscd, Solaris uses undocumented door calls. Basically, nscd is used to reduce the memory usage of user-space system libraries that use the passwd and group files. Before 2.6, Solaris cached the full passwd file in library memory on first use, but as this was considered to use too much RAM on large multiuser systems, Sun decided to move the implementation of these calls out of the libraries and into a single dedicated daemon.

DNS lookups and /etc/nsswitch.conf

by Jason Armistead.

The /etc/nsswitch.conf file determines the order of searches for lookups (amongst other things). You might only have it set up to allow NIS and HOSTS files to work. You definitely want the "hosts:" line to include the word dns, e.g.:

hosts:      nis dns [NOTFOUND=return] files

DNS lookups and NIS

by Chris Tilbury.

Our site cache is running on a Solaris 2.6 machine. We use NIS to distribute authentication and local hosts information around and, in common with our multiuser systems, we run a slave NIS server on it to help the response of NIS queries.

We were seeing very high name->ip lookup times (avg ~2 sec) and ip->name lookup times (avg ~8 sec), although there didn't seem to be much of a problem with response times for valid sites until the cache was placed under high load. Then, performance went down the toilet.

After some time, and a bit of detective work, we found the problem. On Solaris 2.6, if you have a local NIS server running (ypserv) and you have NIS in your /etc/nsswitch.conf hosts entry, then check the flags it is being started with. The 2.6 ypstart script checks to see if there is a resolv.conf file present when it starts ypserv. If there is, then it starts it with the -d option.

This has the same effect as putting the YP_INTERDOMAIN key in the hosts table -- namely, that failed NIS host lookups are tried against the DNS by the NIS server.

This is a bad thing(tm)! If NIS itself tries to resolve names using the DNS, then the requests are serialised through the NIS server, creating a bottleneck (This is the same basic problem that is seen with nscd). Thus, one failing or slow lookup can, if you have NIS before DNS in the service switch file (which is the most common setup), hold up every other lookup taking place.

If you're running in this kind of setup, then you will want to make sure that

  • ypserv doesn't start with the -d flag.

  • you don't have the YP_INTERDOMAIN key in the hosts table (find the B=-b line in the yp Makefile and change it to B=)

We changed these here, and saw our average lookup times drop by up to an order of magnitude (~150msec for name-ip queries and ~1.5sec for ip-name queries, the latter still so high, I suspect, because more of these fail and timeout since they are not made so often and the entries are frequently non-existent anyway).

Tuning

Have a look at Tuning your TCP/IP stack and more by Jens-S. Voeckler.

disk write error: (28) No space left on device

You might get this error even if your disk is not full, and is not out of inodes. Check your syslog logs (/var/adm/messages, normally) for messages like either of these:

NOTICE: realloccg /proxy/cache: file system full
NOTICE: alloc: /proxy/cache: file system full

In a nutshell, the UFS filesystem used by Solaris doesn't cope very well with the workload Squid presents to it. The filesystem ends up becoming highly fragmented, until it reaches a point where there are insufficient free blocks left to create files with, and only fragments available. At this point, you'll get this error and Squid will revise its idea of how much space is actually available to it. You can do a "fsck -n raw_device" (no need to unmount; this checks in read-only mode) to look at the fragmentation level of the filesystem. It will probably be quite high (>15%).

Sun suggest two solutions to this problem. One costs money, the other is free but may result in a loss of performance (although Sun do claim it shouldn't, given the already highly random nature of squid disk access).

The first is to buy a copy of VxFS, the Veritas Filesystem. This is an extent-based filesystem and it's capable of having online defragmentation performed on mounted filesystems. This costs money, however (VxFS is not very cheap!)

The second is to change certain parameters of the UFS filesystem. Unmount your cache filesystems and use tunefs to change the optimization to "space" and to reduce the "minfree" value to 3-5% (under Solaris 2.6 and higher, very large filesystems will almost certainly have a minfree of 2% already, and you shouldn't increase this). You should be able to get fragmentation down to around 3% by doing this, with an accompanying increase in the amount of space available.
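
For example (the device name is illustrative; the filesystem must be unmounted first):

  # umount /cache1
  # tunefs -o space -m 3 /dev/rdsk/c0t1d0s0
  # mount /cache1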

Thanks to Chris Tilbury.

Solaris X86 and IPFilter

by Jeff Madison

Important update regarding Squid running on Solaris x86. I have been working for several months to resolve what appeared to be a memory leak in squid when running on Solaris x86 regardless of the malloc that was used. I have made 2 discoveries that anyone running Squid on this platform may be interested in.

Number 1: There is no memory leak in Squid, even though after the system runs for some amount of time (this varies depending on the load the system is under) Top reports that there is very little memory free. True to the claims of the Sun engineer I spoke to, this statistic from Top is incorrect. The odd thing is that you do begin to see performance suffer substantially as time goes on, and the only way to correct the situation is to reboot the system. This leads me to discovery number 2.

Number 2: There is some type of resource problem, memory or other, with IPFilter on Solaris x86. I have not taken the time to investigate what the problem is, because we are no longer using IPFilter. We have switched to an Alteon ACE 180 Gigabit switch, which will do the trans-proxy for you. After moving the trans-proxy redirection process out to the Alteon switch, Squid has run for 3 days straight under a huge load with no problem whatsoever. We currently have 2 boxes with 40 GB of cached objects on each box. This 40 GB was accumulated in the 3 days, so you can see what type of load these boxes are under. Prior to this change we were never able to operate for more than 4 hours.

Because the problem appears to be with IPFilter, I would guess that you would only run into this issue if you are trying to run Squid as an interception proxy using IPFilter. That makes sense. If anyone has information that would indicate my findings are incorrect, I am willing to investigate further.

Changing the directory lookup cache size

by Mike Batchelor

On Solaris, the kernel variable for the directory name lookup cache size is ncsize. In /etc/system, you might want to try

set ncsize = 8192

or even higher. The kernel variable ufs_inode - which is the size of the inode cache itself - scales with ncsize in Solaris 2.5.1 and later. Previous versions of Solaris required both to be adjusted independently, but now, it is not recommended to adjust ufs_inode directly on 2.5.1 and later.

You can set ncsize quite high, but at some point - dependent on the application - a too-large ncsize will increase the latency of lookups.

Defaults are:

Solaris 2.5.1 : (max_nprocs + 16 + maxusers) + 64
Solaris 2.6/Solaris 7 : 4 * (max_nprocs + maxusers) + 320

The priority_paging algorithm

by Mike Batchelor

Another new tuneable (actually a toggle) in Solaris 2.5.1, 2.6 or Solaris 7 is the priority_paging algorithm. This is actually a complete rewrite of the virtual memory system on Solaris. It will page out application data last, and filesystem pages first, if you turn it on (set priority_paging = 1 in /etc/system). As you may know, the Solaris buffer cache grows to fill available pages, and under the old VM system, applications could get paged out to make way for the buffer cache, which can lead to swap thrashing and degraded application performance. The new priority_paging helps keep application and shared library pages in memory, preventing the buffer cache from paging them out, until memory gets REALLY short. Solaris 2.5.1 requires patch 103640-25 or higher and Solaris 2.6 requires 105181-10 or higher to get priority_paging. Solaris 7 needs no patch, but all versions have it turned off by default.

assertion failed: StatHist.c:91: `statHistBin(H, max) == H->capacity - 1'

by Marc

This crash happens on Solaris when the math.h header is missing at compile time. It can presumably happen on any system missing the correct include, but I have not verified that.

The configure script just reports "math.h: no" and continues. The math functions end up badly declared, and this causes the crash.

For 32bit Solaris, "math.h" is found in the SUNWlibm package.

FreeBSD

T/TCP bugs

We have found that with FreeBSD-2.2.2-RELEASE, there are some bugs with T/TCP. FreeBSD will try to use T/TCP if you've enabled the "TCP Extensions." To disable T/TCP, use sysinstall to disable TCP Extensions, or edit /etc/rc.conf and set

tcp_extensions="NO"             # Allow RFC1323 & RFC1544 extensions (or NO).

or add this to your /etc/rc files:

sysctl -w net.inet.tcp.rfc1644=0

mbuf size

We noticed an odd thing with some of Squid's interprocess communication. Often, output from the dnsserver processes would NOT be read in one chunk. With full debugging, it looks like this:

1998/04/02 15:18:48| comm_select: FD 46 ready for reading
1998/04/02 15:18:48| ipcache_dnsHandleRead: Result from DNS ID 2 (100 bytes)
1998/04/02 15:18:48| ipcache_dnsHandleRead: Incomplete reply
....other processing occurs...
1998/04/02 15:18:48| comm_select: FD 46 ready for reading
1998/04/02 15:18:48| ipcache_dnsHandleRead: Result from DNS ID 2 (9 bytes)
1998/04/02 15:18:48| ipcache_parsebuffer: parsing:
$name www.karup.com
$h_name www.karup.inter.net
$h_len 4
$ipcount 2
38.15.68.128
38.15.67.128
$ttl 2348
$end

Interestingly, it is very common to get only 100 bytes on the first read. When two read() calls are required, this adds additional latency to the overall request. On our caches running Digital Unix, the median dnsserver response time was measured at 0.01 seconds. On our FreeBSD cache, however, the median latency was 0.10 seconds.

Here is a simple patch to fix the bug:

============================
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.40
retrieving revision 1.41
diff -p -u -r1.40 -r1.41
--- src/sys/kern/uipc_socket.c  1998/05/15 20:11:30     1.40
+++ /home/ncvs/src/sys/kern/uipc_socket.c       1998/07/06 19:27:14     1.41
@@ -31,7 +31,7 @@
  * SUCH DAMAGE.
  *
  *     @(#)uipc_socket.c       8.3 (Berkeley) 4/15/94
- *     $Id: FAQ.sgml,v 1.250 2005/04/22 19:29:50 hno Exp $
+ *     $Id: FAQ.sgml,v 1.250 2005/04/22 19:29:50 hno Exp $
  */
 #include <sys/param.h>
@@ -491,6 +491,7 @@ restart:
                                mlen = MCLBYTES;
                                len = min(min(mlen, resid), space);
                        } else {
+                               atomic = 1;
 nopages:
                                len = min(min(mlen, resid), space);
                                /*

Another technique which may help, but does not fix the bug, is to increase the kernel's mbuf size. The default is 128 bytes. The MSIZE symbol is defined in /usr/include/machine/param.h. However, to change it we added this line to our kernel configuration file:

        options         MSIZE="256"

Dealing with NIS

/var/yp/Makefile has the following section:

        # The following line encodes the YP_INTERDOMAIN key into the hosts.byname
        # and hosts.byaddr maps so that ypserv(8) will do DNS lookups to resolve
        # hosts not in the current domain. Commenting this line out will disable
        # the DNS lookups.
        B=-b

You will want to comment out the B=-b line so that ypserv does not do DNS lookups.

FreeBSD 3.3: The lo0 (loop-back) device is not configured on startup

Squid requires the loopback interface to be up and configured. If it is not, you will get "commBind" errors.

From FreeBSD 3.3 Errata Notes:

Fix: Assuming that you experience this problem at all, edit ''/etc/rc.conf''
and search for where the network_interfaces variable is set.  In
its value, change the word ''auto'' to ''lo0'' since the auto keyword
doesn't bring the loop-back device up properly, for reasons yet to
be adequately determined.  Since your other interface(s) will already
be set in the network_interfaces variable after initial installation,
it's reasonable to simply s/auto/lo0/ in rc.conf and move on.

Thanks to at lentil dot org Robert Lister.

FreeBSD 3.x or newer: Speed up disk writes using Softupdates

by Andre Albsmeier

FreeBSD 3.x and newer support Softupdates. This is a mechanism to speed up disk writes, similar to what is possible by mounting UFS volumes async. However, Softupdates achieves performance similar to or better than async, without losing safety in the case of a system crash. For more detailed information and the copyright terms, see /sys/contrib/softupdates/README and /sys/ufs/ffs/README.softupdate.

To build a system supporting softupdates, you have to build a kernel with options SOFTUPDATES set (see LINT for a commented-out example). After rebooting with the new kernel, you can enable softupdates on a per-filesystem basis with the command:

        $ tunefs -n enable /mountpoint

The filesystem in question MUST NOT be mounted at this time. After that, softupdates are permanently enabled and the filesystem can be mounted normally. To verify that the softupdates code is running, simply issue a mount command, and output similar to the following will appear:

        $ mount
        /dev/da2a on /usr/local/squid/cache (ufs, local, noatime, soft-updates, writes: sync 70 async 225)

Internal DNS problems with jail environment

Some users report problems with running Squid in the jail environment. Specifically, Squid logs messages like:

2001/10/12 02:08:49| comm_udp_sendto: FD 4, 192.168.1.3, port 53: (22) Invalid argument
2001/10/12 02:08:49| idnsSendQuery: FD 4: sendto: (22) Invalid argument

You can eliminate the problem by putting the jail's network interface address in the udp_outgoing_address configuration option in squid.conf.
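
For example, in squid.conf (the address is illustrative; use the jail's interface address):

  udp_outgoing_address 192.168.1.3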

"Zero Sized Reply" error due to TCP blackholing

by David Landgren

On FreeBSD, make sure that TCP blackholing is not active. You can verify the current setting with:

# /sbin/sysctl net.inet.tcp.blackhole

It should return the following output:

net.inet.tcp.blackhole: 0

If it is set to a positive value (usually, 2), disable it by setting it back to zero with:

# /sbin/sysctl net.inet.tcp.blackhole=0

To make sure the setting survives across reboots, add the following line to the file /etc/sysctl.conf:

net.inet.tcp.blackhole=0

OSF1/3.2

If you compile both libgnumalloc.a and Squid with cc, the mstats() function returns bogus values. However, if you compile libgnumalloc.a with gcc, and Squid with cc, the values are correct.

BSD/OS

gcc/yacc

Some people report difficulties compiling Squid on BSD/OS.

process priority

I've noticed that my Squid process seems to stick at a nice value of four, and clicks back to that even after I renice it to a higher priority. However, looking through the Squid source, I can't find any instance of a setpriority() call, or anything else that would seem to indicate Squid's adjusting its own priority.

by Bill Bogstad

BSD Unices traditionally have auto-niced non-root processes to 4 after they have used a lot (4 minutes???) of CPU time. My guess is that it's BSD/OS, not Squid, that is doing this. I don't know offhand if there is a way to disable this on BSD/OS.

by Arjan de Vet

You can get around this by starting Squid with nice-level -4 (or another negative value).
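
For example (the squid binary path is illustrative):

  nice -n -4 /usr/local/squid/sbin/squid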

by at nl dot compuware dot com Bert Driehuis

The autonice behavior is a leftover from the history of BSD as a university OS. It penalises CPU-bound jobs by nicing them after they use 600 CPU seconds. Adding

        sysctl -w kern.autonicetime=0

to /etc/rc.local will disable the behavior systemwide.

Linux

Generally we recommend you use Squid with an up-to-date Linux distribution, preferably one with a 2.6 kernel. Recent 2.6 kernels have built-in support for features used by newer versions of Squid, such as epoll and WCCP/GRE, which give better performance and flexibility. Note that Squid will still function just fine under older Linux kernels. You will, however, need to be mindful of the security implications of running your Squid proxy on the Internet if you are using a very old and unsupported distribution.

There have been issues with GLIBC in some very old distributions, and upgrading or fixing GLIBC is not for the faint of heart.

FATAL: Don't run Squid as root, set 'cache_effective_user'!

Some users have reported that setting cache_effective_user to nobody under Linux does not work. However, it appears that using any cache_effective_user other than nobody will succeed. One solution is to create a user account for Squid and set cache_effective_user to that. Alternately you can change the UID for the nobody account from 65535 to 65534.

Russ Mellon notes that these problems with cache_effective_user are fixed in version 2.2.x of the Linux kernel.

Large ACL lists make Squid slow

The regular expression library which comes with Linux is known to be very slow. Some people report it entirely fails to work after long periods of time.

To fix, use the GNUregex library included with the Squid source code. With Squid-2, use the --enable-gnuregex configure option.

gethostbyname() leaks memory in RedHat 6.0 with glibc 2.1.1.

by at netsoft dot ro Radu Greab

The gethostbyname() function leaks memory in RedHat 6.0 with glibc 2.1.1. The quick fix is to delete nisplus service from hosts entry in /etc/nsswitch.conf. In my tests dnsserver memory use remained stable after I made the above change.

See RedHat bug id 3919.

assertion failed: StatHist.c:91: `statHistBin(H, max) == H->capacity - 1' on Alpha system.

by Jamie Raymond

Some early versions of Linux have a kernel bug that causes this. All that is needed is a recent kernel that doesn't have the mentioned bug.

tools.c:605: storage size of `rl' isn't known

This is a bug with some versions of glibc. The glibc headers incorrectly depended on the contents of some kernel headers. Everything broke down when the kernel folks rearranged a bit in the kernel-specific header files.

We think this glibc bug is present in versions 2.1.1 (or 2.1.0) and earlier. There are two solutions:

  • Make sure /usr/include/linux and /usr/include/asm are from the kernel version glibc is built/configured for, not any other kernel version. Only the compiling of loadable kernel modules outside of the kernel sources depends on having the current versions of these, and for such builds -I/usr/src/linux/include (or wherever the new kernel headers are located) can be used to resolve the matter.
  • Upgrade glibc to 2.1.2 or later. This is always a good idea anyway, provided a prebuilt upgrade package exists for the Linux distribution used. Note: do not attempt to manually build and install glibc from source unless you know exactly what you are doing, as this can easily render the system unusable.

Can't connect to some sites through Squid

When using Squid, some sites may give errors such as "(111) Connection refused" or "(110) Connection timed out", although these sites work fine without going through Squid.

Linux 2.6 implements Explicit Congestion Notification (ECN) support and this can cause some TCP connections to fail when contacting some sites with broken firewalls or broken TCP/IP implementations.

As of June 2006, the number of sites that fail when ECN is enabled is very low and you may find you benefit more from having this feature enabled than globally turning it off.

To work around such broken sites you can disable ECN with the following command:

echo 0 > /proc/sys/net/ipv4/tcp_ecn
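
On most Linux distributions you can make this persistent across reboots by adding the corresponding line to /etc/sysctl.conf:

  net.ipv4.tcp_ecn = 0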

HenrikNordstrom explains:

ECN is a standard extension to TCP/IP, making TCP/IP behave better in overload conditions where the available bandwidth is all used up (i.e. the default condition for any WAN link). It is defined by RFC 3168, issued by the Networking Working Group at IETF, the standardization body responsible for the evolution of TCP/IP and other core Internet technologies such as routing.

It is implemented by using two previously unused bits (of 6) in the TCP header, plus redefining two bits of the never-standardized TOS field in the IP header (dividing TOS into a 6-bit Diffserv field and a 2-bit ECN field), allowing routers to clearly indicate overload conditions to the participating computers instead of dropping packets and hoping the computers will realize there is too much traffic.

The main problem is the use of those previously unused bits in the TCP header. The TCP/IP standard has always said that those bits are reserved for future use, but many old firewalls assume the bits will never be used and simply drop all traffic using this new feature, considering it invalid use of TCP/IP to evolve beyond the original standards from 1981. ECN in its final form was defined in 2001, but earlier specifications were circulated several years before that.

See also the thread on the NANOG mailing list, RFC 3168 "The Addition of Explicit Congestion Notification (ECN) to IP" (Proposed Standard), Sally Floyd's page on ECN and problems related to it, or the ECN Hall of Shame for more information.

Some sites load extremely slowly or not at all

You may occasionally have problems with TCP Window Scaling on Linux. At first you may be able to establish a TCP connection to the site, but then be unable to transfer any data across the connection, or find that data flows extremely slowly. This is due to some broken firewalls on the Internet (it is not a bug in Linux) mangling the window scaling option when the TCP connection is established. More details and a workaround can be found at lwn.net.

Window scaling is a standard TCP feature which makes TCP perform well over high-speed WAN links. Without window scaling, the round-trip latency seriously limits the bandwidth that can be used by a single TCP connection.

The reason why this is experienced with Linux and not most other OSes is that most desktop OSes advertise quite a small window scaling factor, if any, so the firewall bug goes unnoticed on those systems. Windows is also known to have plenty of workarounds to automatically and silently avoid these issues, whereas the Linux community has a policy of not making such workarounds, most likely in an attempt to put pressure on getting the failing network equipment fixed.

To test if this is the source of your problem try the following:

echo 0 >/proc/sys/net/ipv4/tcp_window_scaling

But be warned that this will quite noticeably degrade TCP performance.

Other possible alternatives are setting tcp_recv_bufsize in squid.conf, or using the /sbin/ip route ... window=xxx option.
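
For example, to clamp the advertised window on the default route (the gateway address is hypothetical):

  /sbin/ip route change default via 192.0.2.1 window 65535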

IRIX

''dnsserver'' always returns 255.255.255.255

There is a problem with GCC (2.8.1 at least) on Irix 6 which causes it to always return the string 255.255.255.255 for _ANY_ address when calling inet_ntoa(). If this happens to you, compile Squid with the native C compiler instead of GCC.

SCO-UNIX

by F.J. Bosscha

To make Squid run comfortably on SCO Unix, you need to do the following:

Increase the NOFILES parameter and the NUMSP parameter, then recompile Squid. Although Squid reported in cache.log that it had 3000 filedescriptors, I had problems with messages saying that no more filedescriptors were available. After I also increased the NUMSP value, the problems were gone.

One thing left is the number of TCP connections the system can handle. The default is 256, but I increased that as well because of the number of clients we have.

AIX

"shmat failed" errors with ''diskd''

32-bit processes on AIX are restricted by default to a maximum of 11 shared memory segments. This restriction can be removed on AIX 4.2.1 and later by setting the environment variable EXTSHM=ON in the script or shell which starts squid.
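
For example, in a Bourne-shell start script (the squid path is illustrative):

  EXTSHM=ON
  export EXTSHM
  /usr/local/squid/sbin/squid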

Core dumps when squid process grows to 256MB

32-bit processes cannot use more than 256MB of stack and data in the default memory model. To force the loader to use large address space for squid, either:

  • set the LDR_CNTRL environment variable,

eg LDR_CNTRL="MAXDATA=0x80000000"; or

  • link with -bmaxdata:0x80000000; or

  • patch the squid binary

See IBM's documentation on large program support for more information, including how to patch an already-compiled program.



Cache Digest FAQs compiled by Niall Doherty <ndoherty AT eei DOT ericsson DOT se>.

What is a Cache Digest?

A Cache Digest is a summary of the contents of an Internet Object Caching Server. It contains, in a compact (i.e. compressed) format, an indication of whether or not particular URLs are in the cache.

A "lossy" technique is used for compression, which means that very high compression factors can be achieved at the expense of not having 100% correct information.

How and why are they used?

Cache servers periodically exchange their digests with each other.

When a request for an object (URL) is received from a client a cache can use digests from its peers to find out which of its peers (if any) have that object. The cache can then request the object from the closest peer (Squid uses the NetDB database to determine this).

Note that Squid will only make digest queries against those digests that are enabled. It will disable a peer's digest if and only if it cannot fetch a valid digest for that peer, and will enable that peer's digest again when a valid one is fetched.

The checks in the digest are very fast and they eliminate the need for per-request queries to peers. Hence:

  • Latency is eliminated and client response time should be improved.
  • Network utilisation may be improved.

Note that the use of Cache Digests (for querying the cache contents of peers) and the generation of a Cache Digest (for retrieval by peers) are independent. So, it is possible for a cache to make a digest available for peers, and not use the functionality itself and vice versa.

What is the theory behind Cache Digests?

Cache Digests are based on Bloom Filters - a method for representing a set of keys with lookup capability, where a lookup answers the question "is this key in the filter or not?".

In building a cache digest:

  • A vector (1-dimensional array) of m bits is allocated, with all bits initially set to 0.
  • A number, k, of independent hash functions are chosen, h1, h2, ..., hk, with range { 1, ..., m } (i.e. a key hashed with any of these functions gives a value between 1 and m inclusive).
  • The set of n keys to be operated on are denoted by: A = { a1, a2, a3, ..., an }.

Adding a Key

To add a key the value of each hash function for that key is calculated. So, if the key was denoted by a, then h1(a), h2(a), ..., hk(a) are calculated.

The value of each hash function for that key represents an index into the array and the corresponding bits are set to 1. So, a digest with 6 hash functions would have 6 bits to be set to 1 for each key added.

Note that the addition of a number of different keys could cause one particular bit to be set to 1 multiple times.

Querying a Key

To query for the existence of a key the indices into the array are calculated from the hash functions as above.

  • If any of the corresponding bits in the array are 0 then the key is not present.
  • If all of the corresponding bits in the array are 1 then the key is likely to be present.

Note the term likely. It is possible for a collision to occur in the digest, whereby the digest incorrectly indicates a key is present. This is the price paid for the compact representation. While the probability of a collision can never be reduced to zero, it can be controlled: larger values for the ratio of digest size to the number of entries added lower the probability, and the number of hash functions chosen also influences it.
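
To make the add/query mechanics concrete, here is a minimal toy Bloom filter in C. The sizes, the salted FNV-1a hash, and all names are illustrative, not Squid's; Squid's real digest derives its four hash values from the MD5 store key, as described later in this section:

  #include <stdint.h>
  #include <stdio.h>

  #define M_BITS 8192   /* m: size of the bit array (illustrative) */
  #define K_HASH 4      /* k: number of hash functions */

  static unsigned char filter[M_BITS / 8];  /* all bits start at 0 */

  /* Salted FNV-1a, standing in for k independent hash functions;
     returns an index in { 0, ..., m-1 }. */
  static uint32_t hash(const char *key, uint32_t salt)
  {
      uint32_t h = 2166136261u ^ salt;
      while (*key) {
          h ^= (unsigned char)*key++;
          h *= 16777619u;
      }
      return h % M_BITS;
  }

  static void bloom_add(const char *key)
  {
      for (uint32_t i = 0; i < K_HASH; ++i) {
          uint32_t bit = hash(key, i);
          filter[bit / 8] |= 1u << (bit % 8);  /* set k bits to 1 */
      }
  }

  static int bloom_query(const char *key)
  {
      for (uint32_t i = 0; i < K_HASH; ++i) {
          uint32_t bit = hash(key, i);
          if (!(filter[bit / 8] & (1u << (bit % 8))))
              return 0;  /* any 0 bit: key is definitely not present */
      }
      return 1;          /* all bits 1: key is probably present */
  }

  int main(void)
  {
      bloom_add("http://www.example.com/");
      printf("%d\n", bloom_query("http://www.example.com/")); /* 1 */
      printf("%d\n", bloom_query("http://www.example.org/")); /* almost
                                                                 certainly 0 */
      return 0;
  }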

Deleting a Key

To delete a key, it is not possible to simply set the associated bits to 0 since any one of those bits could have been set to 1 by the addition of a different key!

Therefore, to support deletions, a counter is required for each bit position in the array (see the sketch after this list). The procedures to follow would be:

  • When adding a key, set appropriate bits to 1 and increment the corresponding counters.
  • When deleting a key, decrement the appropriate counters (while > 0), and if a counter reaches 0 then the corresponding bit is set to 0.
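
A minimal sketch of that counter-based scheme (a "counting Bloom filter"), again with invented names; note that Squid itself does not implement this (see the next question):

  #include <stdint.h>

  #define M 8192                     /* bit positions (illustrative) */
  static uint16_t cnt[M];            /* one counter per bit position */
  static unsigned char bits[M / 8];  /* the bit array itself */

  /* idx[] holds the k hash values computed for one key */
  static void add_key(const uint32_t *idx, int k)
  {
      for (int i = 0; i < k; ++i) {
          bits[idx[i] / 8] |= 1u << (idx[i] % 8);
          ++cnt[idx[i]];
      }
  }

  static void delete_key(const uint32_t *idx, int k)
  {
      for (int i = 0; i < k; ++i) {
          if (cnt[idx[i]] > 0 && --cnt[idx[i]] == 0)
              bits[idx[i] / 8] &= ~(1u << (idx[i] % 8));
      }
  }

  int main(void)
  {
      uint32_t idx[4] = { 1, 42, 4095, 8191 };  /* pretend hash values */
      add_key(idx, 4);
      delete_key(idx, 4);  /* all four bits drop back to 0 */
      return 0;
  }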

How is the size of the Cache Digest in Squid determined?

Upon initialisation, the capacity is set to the number of objects that can be (are) stored in the cache. Note that there are upper and lower limits here.

An arbitrary constant, bits_per_entry (currently set to 5), is used to calculate the size of the array using the following formula:

 number of bits in array = capacity * bits_per_entry + 7

The size of the digest, in bytes, is therefore:

digest size = int (number of bits in array / 8)
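
For example, with bits_per_entry at its current value of 5, a cache capacity of 1,000,000 objects gives 1,000,000 * 5 + 7 = 5,000,007 bits, so the digest occupies int(5,000,007 / 8) = 625,000 bytes, or roughly 610 KB.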

When a digest rebuild occurs, the change in the cache size (capacity) is measured. If the capacity has changed by a large enough amount (10%), then the digest array is freed and new memory is allocated; otherwise the same digest is re-used.

What hash functions (and how many of them) does Squid use?

The protocol design allows for a variable number of hash functions (k). However, Squid employs a very efficient method using a fixed number - four.

Rather than computing a number of independent hash functions over a URL Squid uses a 128-bit MD5 hash of the key (actually a combination of the URL and the HTTP retrieval method) and then splits this into four equal chunks.

Each chunk, modulo the digest size (m), is used as the value for one of the hash functions - i.e. an index into the bit array.

Note: As Squid retrieves objects and stores them in its cache on disk, it adds them to the in-RAM index using a lookup key which is an MD5 hash - the very one discussed above. This means that the values for the Cache Digest hash functions are already available and consequently the operations are extremely efficient!
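
A sketch of that chunk-splitting step, assuming a 16-byte MD5 store key is already at hand (the function and variable names are made up, and byte-order details are glossed over):

  #include <stdint.h>
  #include <string.h>
  #include <stdio.h>

  /* Split a 128-bit MD5 key into four 32-bit chunks; each chunk,
     modulo the digest size m, is one hash value (bit-array index). */
  static void digest_indices(const uint8_t md5[16], uint32_t m,
                             uint32_t out[4])
  {
      for (int i = 0; i < 4; ++i) {
          uint32_t chunk;
          memcpy(&chunk, md5 + 4 * i, sizeof chunk);
          out[i] = chunk % m;
      }
  }

  int main(void)
  {
      const uint8_t key[16] = { 0xde, 0xad, 0xbe, 0xef, 1, 2, 3, 4,
                                5, 6, 7, 8, 9, 10, 11, 12 };  /* fake MD5 */
      uint32_t idx[4];
      digest_indices(key, 5000007u, idx);
      for (int i = 0; i < 4; ++i)
          printf("h%d = %u\n", i + 1, idx[i]);
      return 0;
  }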

Obviously, modifying the code to support a variable number of hash functions would prove a little more difficult and would most likely reduce efficiency.

How are objects added to the Cache Digest in Squid?

Every object referenced in the index in RAM is checked to see if it is suitable for addition to the digest.

A number of objects are not suitable, e.g. those that are private, not cachable, negatively cached etc. and are skipped immediately.

A freshness test is next made in an attempt to guess if the object will expire soon, since if it does, it is not worthwhile adding it to the digest. The object is checked against the refresh patterns for staleness...

Since Squid stores references to objects in its index using the MD5 key discussed earlier there is no URL actually available for each object - which means that the pattern used will fall back to the default pattern, ".". This is an unfortunate state of affairs, but little can be done about it. A cd_refresh_pattern option will be added to the configuration file soon which will at least make the confusion a little clearer :-)

Note that it is best to be conservative with your refresh pattern for the Cache Digest, i.e. do not add objects if they might become stale soon. This will reduce the number of False Hits.

Does Squid support deletions in Cache Digests? What are diffs/deltas?

Squid does not support deletions from the digest. Because of this the digest must, periodically, be rebuilt from scratch to erase stale bits and prevent digest pollution.

A more sophisticated option is to use diffs or deltas. These would be created by building a new digest and comparing with the current/old one. They would essentially consist of aggregated deletions and additions since the previous digest.

Since less bandwidth should be required using these it would be possible to have more frequent updates (and hence, more accurate information).

Costs:

  • RAM - extra RAM is needed to hold two digests while the comparisons take place.
  • CPU - probably a negligible amount.

When and how often is the local digest built?

The local digest is built:

  • when store_rebuild completes after startup (the cache contents have been indexed in RAM), and
  • periodically thereafter. Currently, it is rebuilt every hour (more data and experience is required before other periods, whether fixed or dynamically varying, can "intelligently" be chosen). The good thing is that the local cache decides on the expiry time and peers must obey (see later).

While the (new) digest is being built in RAM the old version (stored on disk) is still valid, and will be returned to any peer requesting it. When the digest has completed building it is then swapped out to disk, overwriting the old version.

The rebuild is CPU intensive, but not overly so. Since Squid is programmed using an event-handling model, the approach taken is to split the digest building task into chunks (i.e. chunks of entries to add) and to register each chunk as an event. If CPU load is overly high, it is possible to extend the build period - as long as it is finished before the next rebuild is due!

It may prove more efficient to implement the digest building as a separate process/thread in the future...

How are Cache Digests transferred between peers?

Cache Digests are fetched from peers using the standard HTTP protocol (note that a pull rather than push technique is used).

After the first access to a peer, a peerDigestValidate event is queued (this event decides if it is time to fetch a new version of a digest from a peer). The queuing delay depends on the number of peers already queued for validation - so that all digests from different peers are not fetched simultaneously.

A peer answering a request for its digest will specify an expiry time for that digest by using the HTTP Expires header. The requesting cache thus knows when it should request a fresh copy of that peer's digest.

Note: requesting caches use an If-Modified-Since request in case the peer has not rebuilt its digest for some reason since the last time it was fetched.

How and where are Cache Digests stored?

Cache Digest built locally

Since the local digest is generated purely for the benefit of its neighbours keeping it in RAM is not strictly required. However, it was decided to keep the local digest in RAM partly because of the following:

  • Approximately the same amount of memory will be (re-)allocated on every rebuild of the digest
  • the memory requirements are probably quite small (when compared to other requirements of the cache server)
  • if ongoing updates of the digest are to be supported (e.g. additions/deletions) it will be necessary to perform these operations on a digest in RAM
  • if diffs/deltas are to be supported the "old" digest would have to be swapped into RAM anyway for the comparisons.

When the digest is built in RAM, it is then swapped out to disk, where it is stored as a "normal" cache item - which is how peers request it.

Cache Digest fetched from peer

When a query from a client arrives, fast lookups are required to decide if a request should be made to a neighbour cache. It is therefore necessary to keep all peer digests in RAM.

Peer digests are also stored on disk for the following reasons:

  • Recovery: if stopped and restarted, peer digests can be reused from the local on-disk copy (they will soon be validated using an HTTP IMS request to the appropriate peers, as discussed earlier).

  • Sharing: peer digests are stored as normal objects in the cache. This allows them to be given to neighbour caches.

How are the Cache Digest statistics in the Cache Manager to be interpreted?

Cache Digest statistics can be seen from the Cache Manager or through the squidclient utility. The following examples show how to use the squidclient utility to request the list of possible operations from the localhost, local digest statistics from the localhost, refresh statistics from the localhost and local digest statistics from another cache, respectively.

  squidclient mgr:menu
  squidclient mgr:store_digest
  squidclient mgr:refresh
  squidclient -h peer mgr:store_digest

The available statistics provide a lot of useful debugging information. The refresh statistics include a section for Cache Digests which explains why items were added (or not) to the digest.

The following example shows local digest statistics for a 16GB cache in a corporate intranet environment (may be a useful reference for the discussion below).