Contents
- What is Squid?
- What is Internet object caching?
- Why is it called Squid?
- What is the latest version of Squid?
- Who is responsible for Squid?
- Where can I get Squid?
- What Operating Systems does Squid support?
- Does Squid run on Windows?
- What Squid mailing lists are available?
- I can't figure out how to unsubscribe from your mailing list.
- What other Squid-related documentation is available?
- Does Squid support SSL/HTTPS/TLS?
- What's the legal status of Squid?
- How to add a new Squid feature, enhance, or fix something?
- Can I pay someone for Squid support?
- Squid FAQ contributors
- About This Document
- Want to contribute?
- Which file do I download to get Squid?
- Do you have pre-compiled binaries available?
- How do I compile Squid?
- Building Squid on ...
- I see a lot of warnings while compiling Squid.
- undefined reference to __inet_ntoa
- How big of a system do I need to run Squid?
- How do I install Squid?
- How do I start Squid?
- How do I start Squid automatically when the system boots?
- How do I tell if Squid is running?
- squid command line options
- How do I see how Squid works?
- Can Squid benefit from SMP systems?
- Is it okay to use separate drives for Squid?
- Is it okay to use RAID on Squid?
- How do I configure Squid without re-compiling it?
- What does the squid.conf file do?
- Where can I find examples and configuration for a Feature?
- Do you have a squid.conf example?
- How do I join a cache hierarchy?
- How do I join NLANR's cache hierarchy?
- Why should I want to join NLANR's cache hierarchy?
- How do I register my cache with NLANR's registration service?
- How do I find other caches close to me and arrange parent/child/sibling relationships with them?
- My cache registration is not appearing in the Tracker database.
- How do I configure Squid to work behind a firewall?
- How do I configure Squid to forward all requests to another proxy?
- What ''cache_dir'' size should I use?
- I'm adding a new cache_dir. Will I lose my cache?
- Squid and http-gw from the TIS toolkit.
- What is "HTTP_X_FORWARDED_FOR"? Why does squid provide it to WWW servers, and how can I stop it?
- Can Squid anonymize HTTP requests?
- Can I make Squid go direct for some sites?
- Can I make Squid proxy only, without caching anything?
- Can I prevent users from downloading large files?
- Communication between browsers and Squid
- Manual Browser Configuration
- Partially Automatic Configuration
- Fully Automatically Configuring Browsers for WPAD
- Redundant Proxy Auto-Configuration
- Proxy Auto-Configuration with URL Hashing
- How do I tell Squid to use a specific username for FTP urls?
- IE 5.0x crops trailing slashes from FTP URL's
- IE 6.0 SP1 fails when using authentication
- Squid Log Files
- squid.out
- cache.log
- useragent.log
- store.log
- access.log
- access.log native format in detail
- sending access.log to syslog
- customizable access.log
- swap.state
- Which log files can I delete safely?
- How can I disable Squid's log files?
- What is the maximum size of access.log?
- My log files get very big!
- I want to use another tool to maintain the log files.
- Managing log files
- Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?
- What does ERR_LIFETIME_EXP mean?
- Retrieving "lost" files from the cache
- Can I use store.log to figure out if a response was cachable?
- Can I pump the squid access.log directly into a pipe?
- How do I see system level Squid statistics?
- Managing the Cache Storage
- Using ICMP to Measure the Network
- Why are so few requests logged as TCP_IMS_MISS?
- Why can't I run Squid as root?
- Can you tell me a good way to upgrade Squid with minimal downtime?
- Can Squid listen on more than one HTTP port?
- Can I make origin servers see the client's IP address when going through Squid?
- Why does Squid use so much memory!?
- How can I tell how much memory my Squid process is using?
- My Squid process grows without bounds.
- I set cache_mem to XX, but the process grows beyond that!
- How do I analyze memory usage from the cache manager output?
- The "Total memory accounted" value is less than the size of my Squid process.
- xmalloc: Unable to allocate 4096 bytes!
- fork: (12) Cannot allocate memory
- What can I do to reduce Squid's memory usage?
- Using an alternate malloc library
- How much memory do I need in my Squid server?
- Why can't my Squid process grow beyond a certain size?
- What is the cache manager?
- How do you set it up?
- Cache manager configuration for CERN httpd 3.0
- Cache manager configuration for Apache 1.x
- Cache manager configuration for Apache 2.x
- Cache manager configuration for Roxen 2.0 and later
- Cache manager access from squidclient
- Cache manager ACLs in squid.conf
- Why does it say I need a password and a URL?
- I want to shutdown the cache remotely. What's the password?
- How do I make the cache host default to my cache?
- What's the difference between Squid TCP connections and Squid UDP connections?
- It says the storage expiration will happen in 1970!
- What do the Meta Data entries mean?
- In the utilization section, what is Other?
- In the utilization section, why is the Transfer KB/sec column always zero?
- In the utilization section, what is the Object Count?
- In the utilization section, what is the Max/Current/Min KB?
- What is the I/O section about?
- What is the Objects section for?
- What is the VM Objects section for?
- What does AVG RTT mean?
- In the IP cache section, what's the difference between a hit, a negative hit and a miss?
- What do the IP cache contents mean anyway?
- What is the fqdncache and how is it different from the ipcache?
- What does "Page faults with physical i/o: 4897" mean?
- What does the IGNORED field mean in the 'cache server list'?
- ACL elements
- Access Lists
- How do I allow my clients to use the cache?
- How do I configure Squid not to cache a specific server?
- How do I implement an ACL ban list?
- How do I block specific users or groups from accessing my cache?
- Do you have a CGI program which lets users change their own proxy passwords?
- Is there a way to do ident lookups only for a certain host and compare the result with a userlist in squid.conf?
- Common Mistakes
- I set up my access controls, but they don't work! why?
- Proxy-authentication and neighbor caches
- Is there an easy way of banning all Destination addresses except one?
- How can I block access to porn sites?
- Does anyone have a ban list of porn sites and such?
- Squid doesn't match my subdomains
- Why does Squid deny some port numbers?
- Does Squid support the use of a database such as mySQL for storing the ACL list?
- How can I allow a single address to access a specific URL?
- How can I allow some clients to use the cache at specific times?
- How can I allow some users to use the cache at specific times?
- Problems with IP ACL's that have complicated netmasks
- Can I set up ACL's based on MAC address rather than IP?
- Can I limit the number of connections from a client?
- I'm trying to deny ''foo.com'', but it's not working.
- I want to customize, or make my own error messages.
- I want to use local time zone in error messages.
- I want to put ACL parameters in an external file.
- I want to authorize users depending on their MS Windows group memberships
- Maximum length of an acl name
- Fast and Slow ACLs
- Starting Point
- Why am I getting "Proxy Access Denied?"
- Connection Refused when reaching a sibling
- Running out of filedescriptors
- What are these strange lines about removing objects?
- Can I change a Windows NT FTP server to list directories in Unix format?
- Why am I getting "Ignoring MISS from non-peer x.x.x.x?"
- DNS lookups for domain names with underscores (_) always fail.
- Why does Squid say: "Illegal character in hostname; underscores are not allowed"?
- Why am I getting access denied from a sibling cache?
- Cannot bind socket FD NN to *:8080 (125) Address already in use
- icpDetectClientClose: ERROR xxx.xxx.xxx.xxx: (32) Broken pipe
- icpDetectClientClose: FD 135, 255 unexpected bytes
- Does Squid work with NTLM Authentication?
- "Hotmail" complains about: Intrusion Logged. Access denied.
- My Squid becomes very slow after it has been running for some time.
- WARNING: Failed to start 'dnsserver'
- Sending bug reports to the Squid team
- Debugging Squid
- FATAL: ipcache_init: DNS name lookup tests failed
- FATAL: Failed to make swap directory /var/spool/cache: (13) Permission denied
- FATAL: Cannot open HTTP Port
- FATAL: All redirectors have exited!
- FATAL: Cannot open /usr/local/squid/logs/access.log: (13) Permission denied
- pingerOpen: icmp_sock: (13) Permission denied
- What is a forwarding loop?
- accept failure: (71) Protocol error
- storeSwapInFileOpened: ... Size mismatch
- Why do I get ''fwdDispatch: Cannot retrieve 'https://www.buy.com/corp/ordertracking.asp' ''
- Squid can't access URLs like http://3626046468/ab2/cybercards/moreinfo.html
- I get a lot of "URI has whitespace" error messages in my cache log, what should I do?
- commBind: Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address
- What does "sslReadClient: FD 14: read failure: (104) Connection reset by peer" mean?
- What does ''Connection refused'' mean?
- squid: ERROR: no running copy
- FATAL: getgrnam failed to find groupid for effective group 'nogroup'
- Squid uses 100% CPU
- Webmin's ''cachemgr.cgi'' crashes the operating system
- Segment Violation at startup or upon first request
- urlParse: Illegal character in hostname 'proxy.mydomain.com:8080proxy.mydomain.com'
- Requests for international domain names do not work
- Why do I sometimes get "Zero Sized Reply"?
- Why do I get "The request or reply is too large" errors?
- Negative or very large numbers in Store Directory Statistics, or constant complaints about cache above limit
- Squid problems with Windows Update v5
- What are cachable objects?
- What is the ICP protocol?
- What is a cache hierarchy? What are parents and siblings?
- What is the Squid cache resolution algorithm?
- What features are Squid developers currently working on?
- Tell me more about Internet traffic workloads
- What are the tradeoffs of caching with the NLANR cache system?
- Where can I find out more about firewalls?
- What is the "Storage LRU Expiration Age?"
- What is "Failure Ratio at 1.01; Going into hit-only-mode for 5 minutes"?
- Does squid periodically re-read its configuration file?
- How does ''unlinkd'' work?
- What is an icon URL?
- Can I make my regular FTP clients use a Squid cache?
- Why is the select loop average time so high?
- How does Squid deal with Cookies?
- How does Squid decide when to refresh a cached object?
- What exactly is a ''deferred read''?
- Why is my cache's inbound traffic equal to the outbound traffic?
- How come some objects do not get cached?
- What does ''keep-alive ratio'' mean?
- How does Squid's cache replacement algorithm work?
- What are private and public keys?
- What is FORW_VIA_DB for?
- Does Squid send packets to port 7 (echo)? If so, why?
- What does "WARNING: Reply from unknown nameserver [a.b.c.d]" mean?
- How does Squid distribute cache files among the available directories?
- Why do I see negative byte hit ratio?
- What does "Disabling use of private keys" mean?
- What is a half-closed filedescriptor?
- What does --enable-heap-replacement do?
- Why is actual filesystem space used greater than what Squid thinks?
- How do ''positive_dns_ttl'' and ''negative_dns_ttl'' work?
- What does ''swapin MD5 mismatch'' mean?
- What does ''failed to unpack swapfile meta data'' mean?
- Why doesn't Squid make ''ident'' lookups in interception mode?
- What are FTP passive connections?
- What is Multicast?
- How do I know if my network has multicast?
- Should I be using Multicast ICP?
- How do I configure Squid to send Multicast ICP queries?
- How do I know what Multicast TTL to use?
- How do I configure Squid to receive and respond to Multicast ICP?
- General advice
- FreeBSD
- Solaris
- FreeBSD
- OSF1/3.2
- BSD/OS
- Linux
- IRIX
- SCO-UNIX
- AIX
- What is a Cache Digest?
- How and why are they used?
- What is the theory behind Cache Digests?
- How is the size of the Cache Digest in Squid determined?
- What hash functions (and how many of them) does Squid use?
- How are objects added to the Cache Digest in Squid?
- Does Squid support deletions in Cache Digests? What are diffs/deltas?
- When and how often is the local digest built?
- How are Cache Digests transferred between peers?
- How and where are Cache Digests stored?
- How are the Cache Digest statistics in the Cache Manager to be interpreted?
- What are False Hits and how should they be handled?
- How can Cache Digest related activity be traced/debugged?
- What about ICP?
- Is there a Cache Digest Specification?
- Would it be possible to stagger the timings when cache_digests are retrieved from peers?
- Concepts of Interception Caching
- Requirements and methods for Interception Caching
- Steps involved in configuring Interception Caching
- Issues with HotMail
- What are the new features in squid 2.X?
- How do I configure 'ssl_proxy' now?
- Adding a new cache disk
- How do I configure proxy authentication?
- Why does proxy-auth reject all users after upgrading from Squid-2.1 or earlier?
- My squid.conf from version 1.1 doesn't work!
- What is the Reverse Proxy (httpd-accelerator) mode?
- How do I set it up?
- Running the web server on the same server
- Load balancing of backend servers
- When using an httpd-accelerator, the port number or host name for redirects or CGI-generated content is wrong
- Access to password protected content fails via the reverse proxy
- Clients
- Load Balancers
- HA Clusters
- Monitoring
- Logfile Analysis
- Configuration Tools
- Squid add-ons
- Ident Servers
- Cacheability Validators
- Neighbor
- Regular Expression
- Open-access proxies
- Mail relaying
- Hijackable proxies
- X-Forwarded-For fiddling
- Safe_Ports and SSL_Ports ACL
- Way Too Many Cache Misses
- Pruning the Cache Down
- Changing the Cache Levels
What is Squid?
Squid is a high-performance proxy caching server for web clients, supporting FTP, gopher, and HTTP data objects. Squid handles all requests in a single, non-blocking, I/O-driven process over IPv4 or IPv6.
Squid keeps meta data and especially hot objects cached in RAM, caches DNS lookups, supports non-blocking DNS lookups, and implements negative caching of failed requests.
Squid supports SSL, extensive access controls, and full request logging. By using the lightweight Internet Cache Protocol, Squid caches can be arranged in a hierarchy or mesh for additional bandwidth savings.
Squid consists of a main server program squid, some optional programs for rewriting requests and performing authentication, and some management and client tools.
Squid is originally derived from the ARPA-funded Harvest project. Since then it has gone through many changes and has many new features.
What is Internet object caching?
Internet object caching is a way to store requested Internet objects (i.e., data available via the HTTP, FTP, and gopher protocols) on a system closer to the requesting site than to the source. Web browsers can then use the local Squid cache as a proxy HTTP server, reducing access time as well as bandwidth consumption.
Why is it called Squid?
Harris' Lament says, "All the good ones are taken."
We needed to distinguish this new version from the Harvest cache software. Squid was the code name for initial development, and it stuck.
What is the latest version of Squid?
This is best answered by the Squid Versions page, where you can also download the sources for releases and versions.
Who is responsible for Squid?
Squid is the result of efforts by numerous individuals from the Internet community. The core team and main contributors list is at WhoWeAre; a list of our excellent contributors can be seen in the CONTRIBUTORS file.
Where can I get Squid?
You can download Squid via FTP from one of the many worldwide mirror sites or the primary FTP site.
Many sushi bars also have Squid.
What Operating Systems does Squid support?
The software is designed to operate on any modern system, and is known to work on at least the following platforms:
BSD:
- BSDI
- DragonflyBSD
- FreeBSD
- Mac OS/X
- NetBSD
- NeXTStep
- OpenBSD
- SunOS/Solaris
Linux:
- CentOS
- Debian
- Fedora Core
- Gentoo
- RedHat Enterprise Linux
- Ubuntu
Unix:
- OSF/Digital Unix/Tru64
- IRIX
- SCO Unix
- AIX
- HP-UX
Windows: (Cygwin and MinGW)
- Windows 2000 Server
- Windows NT
- Windows XP Server
- Windows 2003 Server
- Windows Vista Server
Other:
- OS/2
If you encounter any platform-specific problems, please let us know by registering an entry in our bug database. If you're curious about what is the best OS to run Squid, see BestOsForSquid.
If you would like your favorite OS to join the list above, please try to build the latest Squid on it and send any feedback to the squid-dev mailing list.
Does Squid run on Windows?
Starting from 2.6.STABLE4, Squid will compile and run on Windows NT and later incarnations with the Cygwin / MinGW packages.
GuidoSerassio maintains the official native Windows port of Squid (built using the Microsoft toolchain) and is actively working on having the needed changes integrated into the standard Squid distribution. His effort is partially based on an earlier Windows NT port by Romeo Anghelache.
The original development code name of the 2.5 project port was SquidNT, but after the 2.6.STABLE4 release, this project was complete. So when speaking about Squid on Windows, people should always refer to Squid instead of the old SquidNT name.
What Squid mailing lists are available?
<squid-users AT squid-cache DOT org> hosts general discussions about the Squid cache software. Subscribe via <squid-users-subscribe AT squid-cache DOT org>. Previous messages are available for browsing at the Squid Users Archive, and also at theaimsgroup.com and MarkMail.
squid-users-digest: digested (daily) version of above. Subscribe via <squid-users-digest-subscribe AT squid-cache DOT org>.
<squid-announce AT squid-cache DOT org> is a receive-only list for announcements of new versions and any major security issues. Subscribe via <squid-announce-subscribe AT squid-cache DOT org>.
<squid-bugs AT squid-cache DOT org> is meant for sending us bug reports. Bug reports received here are given priority over those mentioned on squid-users.
<squid AT squid-cache DOT org>: A closed list for sending us feedback and ideas.
<squid-faq AT squid-cache DOT org>: A closed list for sending us feedback, updates, and additions to the Squid FAQ. The Bugzilla Website section can also be used.
<squid-dev AT squid-cache DOT org>: An open list for developer discussions about Squid code.
I can't figure out how to unsubscribe from your mailing list.
All of our mailing lists have "-subscribe" and "-unsubscribe" addresses that you must use for subscribe and unsubscribe requests. To unsubscribe from the squid-users list, you send a message to <squid-users-unsubscribe AT squid-cache DOT org>.
What other Squid-related documentation is available?
The Squid home page for information on the Squid software
Squid: The Definitive Guide written by Duane Wessels and published by O'Reilly and Associates January 2004.
The IRCache Mesh gives information on our operational mesh of caches.
The Squid FAQ (uh, you're reading it).
Authoritative Config Guides are available in the menu on squid-cache.org
Squid documentation in German, Turkish, Italian, Brazilian Portuguese, and another in Brazilian Portuguese.
Squid Programmers Guide. Yeah, it's extremely incomplete. I assure you this is the most recent version. Please send any description updates to the <squid-dev AT squid-cache DOT org> mailing list.
RFC 2186 ICPv2 -- Protocol
RFC 2187 ICPv2 -- Application
Does Squid support SSL/HTTPS/TLS?
As of version 2.5, Squid can terminate SSL connections. This is perhaps only useful in a surrogate (http accelerator) configuration. You must run configure with --enable-ssl. See https_port in squid.conf for more information.
Squid also supports these encrypted protocols by "tunneling" traffic between clients and servers. In this case, Squid can relay the encrypted bits between a client and a server.
Normally, when your browser comes across an https URL, it does one of two things:
- The browser opens an SSL connection directly to the origin server.
- The browser tunnels the request through Squid with the CONNECT request method.
The CONNECT method is a way to tunnel any kind of connection through an HTTP proxy. The proxy doesn't understand or interpret the contents. It just passes bytes back and forth between the client and server. For the gory details on tunnelling and the CONNECT method, please see RFC 2817 and Tunneling TCP based protocols through Web proxy servers (expired).
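As a rough illustration of the tunnelling described above, the sketch below prints the request a browser would send to the proxy before any encrypted bytes flow. The function name connect_request and the host www.example.com are placeholders invented for this example, and HTTP/1.1 syntax with a default port of 443 is assumed.

```shell
# Print the CONNECT request a client would send to the proxy for one origin server.
# Assumptions: HTTP/1.1 syntax, port 443 unless given; www.example.com is a placeholder.
connect_request() {
    host=$1
    port=${2:-443}
    # Each header line ends in CRLF; an empty line terminates the request.
    printf 'CONNECT %s:%s HTTP/1.1\r\nHost: %s:%s\r\n\r\n' "$host" "$port" "$host" "$port"
}

# Strip the carriage returns only to make the output readable on a terminal.
connect_request www.example.com 443 | tr -d '\r'
```

If the proxy accepts the request it answers with a success status line, and from then on it only relays bytes in both directions without interpreting them.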
What's the legal status of Squid?
Squid as a whole is copyrighted by the University of California San Diego. Squid uses some code developed by others. Individual features may be copyrighted by their contributors.
Squid is Free Software, licensed under the terms of the GNU General Public License.
How to add a new Squid feature, enhance, or fix something?
Adding new features, enhancing, or fixing Squid behavior usually requires source code modifications. Several options are generally available to those who need Squid development:
Wait for somebody to do it: Waiting is free but may take forever. If you want to use this option, make sure you file a bugzilla report describing the bug or enhancement so that others know what you need. Posting feature requests to a mailing list is often useful because it can generate interest and discussion, but without a bugzilla record, your request may be overlooked or forgotten.
Do it yourself: Enhancing Squid and working with other developers can be a very rewarding experience. However, this option requires understanding and modifying the source code, which is getting better, but it is still very complex, often ugly, and lacking documentation. These obstacles affect the required development effort. In most cases, you would want your changes to be incorporated into the official Squid sources for long-term support. To get the code committed, one needs to cooperate with other developers. It is a good idea to describe the changes you are going to work on before diving into development. Development-related discussions happen on the squid-dev mailing list. Documenting upcoming changes as a bugzilla entry or a wiki feature page helps attract contributors or sponsors.
Pay somebody to do it: Many companies offer commercial Squid development services. When selecting the developer, discuss how they plan to integrate the changes with the official Squid sources and consider the company's past contributions to the Squid project.
The best development option depends on many factors. Here is some project dynamics information that may help you pick the right one: Most Squid feature development and maintenance is done by individual contributors, working alone or in small development/consulting shops. In the early years (1990-2000), these developers were able to work on Squid using their free time, research grants, or similarly broad-scope financial support. Requested features were often added on-demand because many folks could work on them. Most recent (2006-2008) contributions, especially large features, are the result of paid development contracts, reflecting both the maturity of the software and the lack of "free" time among active Squid developers.
Can I pay someone for Squid support?
Yes. Please see Squid Support Services. You can also donate money or equipment to members of the squid core team.
Squid FAQ contributors
The following people have made contributions to this document:
Dodjie Nava, Jonathan Larmour, Cord Beermann, Tony Sterrett, Gerard Hynes, Katayama, Takeo, Duane Wessels, K Claffy, Paul Southworth, Oskar Pearson, Ong Beng Hui, Torsten Sturm, James R Grinter, Rodney van den Oever, Kolics Bertold, Carson Gaspar, Michael O'Reilly, Hume Smith, Richard Ayres, John Saunders, Miquel van Smoorenburg, David J N Begley, Kevin Sartorelli, Andreas Doering, Mark Visser, tom minchin, Jens-S. Vöckler, Andre Albsmeier, Doug Nazar, HenrikNordstrom, Mark Reynolds, Arjan de Vet, Peter Wemm, John Line, Jason Armistead, Chris Tilbury, Jeff Madison, Mike Batchelor, Bill Bogstad, Radu Greab, F.J. Bosscha, Brian Feeny, Martin Lyons, David Luyer, Chris Foote, Jens Elkner, Simon White, Jerry Murdock, Gerard Eviston, Rob Poe, FrancescoChemolli, ReubenFarrelly, AlexRousskov, AmosJeffries
About This Document
This FAQ was maintained for a long time as an XML Docbook file. It was converted to a Wiki in March 2006. The wiki is now the authoritative version.
Want to contribute?
We always welcome help keeping the Squid FAQ up-to-date. If you would like to help out, please register with this Wiki and type away. Please also send a note to the wiki operator <wiki AT kinkie DOT it> to inform him of your changes.
Compiling Squid
Which file do I download to get Squid?
That depends on the version of Squid you have chosen to try. The list of current versions released can be found at http://www.squid-cache.org/Versions/. Each version has a page of release bundles. Usually you want the release bundle that is listed as the most current.
You must download a source archive file of the form squid-x.y.tar.gz or squid-x.y.tar.bz2 (eg, squid-2.6.STABLE14.tar.bz2).
We recommend you first try one of our mirror sites for the actual download. They are usually faster.
Alternatively, the main Squid WWW site www.squid-cache.org, and FTP site ftp.squid-cache.org have these files.
Context diffs are usually available for upgrading to new versions. These can be applied with the patch program (available from the GNU FTP site or your distribution).
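The two archive forms named above (squid-x.y.tar.gz and squid-x.y.tar.bz2) need different tar flags when unpacking. The helper below is only a sketch: unpack_command is a made-up name, and it assumes GNU tar, where z selects gzip and j selects bzip2.

```shell
# Print the tar command that matches a Squid source archive name.
# Assumption: GNU tar, where z selects gzip and j selects bzip2.
unpack_command() {
    case $1 in
        *.tar.gz)  echo "tar xzf $1" ;;
        *.tar.bz2) echo "tar xjf $1" ;;
        *)         echo "unrecognised archive name: $1" >&2; return 1 ;;
    esac
}

# Example with the archive name used earlier in this section.
unpack_command squid-2.6.STABLE14.tar.bz2
```

The function only prints the command, so it is safe to try before any archive has been downloaded.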
Do you have pre-compiled binaries available?
How do I compile Squid?
You must run the configure script yourself before running make. We suggest that you first invoke ./configure --help and make a note of the configure options you need in order to support the features you intend to use. Do not compile in features you do not think you will need.
% tar xzf squid-2.6.RELEASExy.tar.gz
% cd squid-2.6.RELEASExy
% ./configure --with-MYOPTION --with-MYOPTION2 etc
% make
... and finally install:
% make install
By default, Squid installs into /usr/local/squid. If you wish to install somewhere else, see the --prefix option for configure.
What kind of compiler do I need?
To compile Squid, you will need an ANSI C compiler. Almost all modern Unix systems come with pre-installed compilers which work just fine. The old SunOS compilers do not have support for ANSI C, and the Sun compiler for Solaris is a product which must be purchased separately.
If you are uncertain about your system's C compiler, the GNU C compiler is widely available and supplied with almost all operating systems. It is also well tested with Squid. If your OS does not come with GCC, you may download it from the GNU FTP site. In addition to gcc, you may also want or need to install the binutils package.
What else do I need to compile Squid?
You will need Perl installed on your system.
How do I apply a patch or a diff?
You need the patch program. You should probably duplicate the entire directory structure before applying the patch. For example, if you are upgrading from squid-2.6.STABLE13 to 2.6.STABLE14, you would run these commands:
cp -rl squid-2.6.STABLE13 squid-2.6.STABLE14
cd squid-2.6.STABLE14
zcat /tmp/squid-2.6.STABLE13-STABLE14.diff.gz | patch -p1
After the patch has been applied, you must rebuild Squid from the very beginning, i.e.:
make distclean
./configure [--option --option...]
make
make install
If your patch program seems to complain or refuses to work, you should get a more recent version, from the GNU FTP site, for example.
Ideally you should use the patch command which comes with your OS.
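Before patching a real source tree it can help to rehearse the procedure on a throwaway copy. The sketch below builds two tiny directories, generates a diff between them, and applies it first with --dry-run and then for real. The directory names and the VERSION file are invented for the demonstration, and the --dry-run option is assumed to be available, as it is in GNU patch.

```shell
# Rehearse a patch on a throwaway copy before touching a real source tree.
# Everything here (directory names, the VERSION file) is made up for the demo.
work=$(mktemp -d)
mkdir "$work/old" "$work/new"
printf 'STABLE13\n' > "$work/old/VERSION"
printf 'STABLE14\n' > "$work/new/VERSION"

# diff exits with status 1 when the trees differ, which is expected here.
(cd "$work" && diff -ru old new > upgrade.diff)

# Work on a copy, as the text recommends, so the original stays untouched.
cp -r "$work/old" "$work/patched"
(
    cd "$work/patched" || exit 1
    patch -p1 --dry-run < ../upgrade.diff   # report problems, change nothing
    patch -p1 < ../upgrade.diff             # apply for real
)
cat "$work/patched/VERSION"
```

The -p1 option strips the leading directory name from the paths recorded in the diff, which is why the patch is applied from inside the copied directory.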
configure options
The configure script can take numerous options. The most useful is --prefix to install it in a different directory. The default installation directory is /usr/local/squid/. To change the default, you could do:
% cd squid-x.y.z
% ./configure --prefix=/some/other/directory/squid
Type
% ./configure --help
to see all available options. You will need to specify some of these options to enable or disable certain features. Some options which are used often include:
--prefix=PREFIX install architecture-independent files in PREFIX
[/usr/local/squid]
--enable-dlmalloc[=LIB] Compile & use the malloc package by Doug Lea
--enable-gnuregex Compile GNUregex
--enable-splaytree Use SPLAY trees to store ACL lists
--enable-xmalloc-debug Do some simple malloc debugging
--enable-xmalloc-debug-trace
Detailed trace of memory allocations
--enable-xmalloc-statistics
Show malloc statistics in status page
--enable-async-io Do ASYNC disk I/O using threads
--enable-icmp Enable ICMP pinging
--enable-delay-pools Enable delay pools to limit bandwidth usage
--enable-mem-gen-trace Do trace of memory stuff
--enable-useragent-log Enable logging of User-Agent header
--enable-kill-parent-hack
Kill parent on shutdown
--enable-cachemgr-hostname[=hostname]
Make cachemgr.cgi default to this host
--enable-arp-acl Enable use of ARP ACL lists (ether address)
--enable-htcp Enable HTCP protocol
--enable-forw-via-db Enable Forw/Via database
--enable-cache-digests Use Cache Digests
see http://www.squid-cache.org/Doc/FAQ/FAQ-16.html
These are also commonly needed by Squid-2, but are now defaults in Squid-3.
--enable-carp Enable CARP support
--enable-snmp Enable SNMP monitoring
--enable-err-language=lang
Select language for Error pages (see errors dir)
Building Squid on ...
BSD/OS or BSDI
Known Problem:
cache_cf.c: In function `parseConfigFile':
cache_cf.c:1353: yacc stack overflow before `token' ...
You may need to upgrade your gcc installation to a more recent version. Check your gcc version with
gcc -v
If it is earlier than 2.7.2, you might consider upgrading. Gcc 2.7.2 is very old and not widely supported.
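The comparison against 2.7.2 can be scripted. This is a minimal sketch: version_lt is a hypothetical helper name, and it relies on sort -V (version sort), which is assumed to be available as in GNU coreutils.

```shell
# Succeed when version $1 is strictly older than version $2.
# Assumption: sort -V (version sort) is available, as in GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Compare the installed gcc, if there is one, against the 2.7.2 threshold.
if command -v gcc >/dev/null 2>&1; then
    installed=$(gcc -dumpversion)
    if version_lt "$installed" 2.7.2; then
        echo "gcc $installed is older than 2.7.2; consider upgrading"
    else
        echo "gcc $installed is 2.7.2 or newer"
    fi
fi
```

A plain string comparison would misorder versions such as 2.10 and 2.7, which is why the sketch uses a version-aware sort.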
Cygwin (Windows)
In order to compile Squid, you need to have Cygwin fully installed.
WCCP is not available on Windows, so the following configure options are needed to disable it:
--disable-wccp --disable-wccpv2
By default, Squid installs into /usr/local/squid. If you wish to install somewhere else, see the --prefix option for configure.
Now add a new Cygwin user (see the Cygwin user guide) and map it to SYSTEM, or create a new NT user and a matching Cygwin user. Whichever you choose becomes the user that Squid runs as.
Read the squid FAQ on permissions if you are using CYGWIN=ntsec.
Afterwards, run squid -z. If that succeeds, try squid -N -D -d1; squid should start. Check that there are no errors. If everything looks good, try browsing through squid.
Now, configure cygrunsrv to run Squid as a service as the chosen username. You may need to check permissions here.
Debian, Ubuntu
From 2.6.STABLE14 Squid should compile easily on this platform.
There is just one known problem. The Linux system layout differs markedly from the Squid defaults. The following ./configure options are needed to install Squid into the Linux structure properly:
--prefix=/usr
--localstatedir=/var
--libexecdir=${prefix}/lib/squid
--srcdir=.
--datadir=${prefix}/share/squid
--sysconfdir=/etc/squid
From Squid 3.0 the default user can also be set. The Debian package default is:
--with-default-user=proxy
From Squid 3.1 the log directory and PID file location are also configurable. The Debian package defaults are:
--with-logdir=/var/log
--with-pidfile=/var/run/squid.pid
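When the layout options listed above are typed on a shell command line, the ${prefix} references need quoting, otherwise the shell replaces them with an empty string before configure sees them. The sketch below prints the assembled command. The function name debian_layout_options is made up for this example, and only the six basic layout options are included. The expectation that the generated build files resolve ${prefix} themselves is an assumption about autoconf-style configure scripts.

```shell
# Print the Debian-style layout options, one per line.
# Single quotes keep ${prefix} literal so the shell passes it through unexpanded.
debian_layout_options() {
    printf '%s\n' \
        '--prefix=/usr' \
        '--localstatedir=/var' \
        '--libexecdir=${prefix}/lib/squid' \
        '--srcdir=.' \
        '--datadir=${prefix}/share/squid' \
        '--sysconfdir=/etc/squid'
}

# Show the command line that would be run from the unpacked source directory.
echo "./configure $(debian_layout_options | tr '\n' ' ')"
```

Because none of the options contain spaces, the same function could feed configure directly as ./configure $(debian_layout_options) from the source directory.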
Older Squid needs the following patch to be applied since the /var/logs/ directory for logs has no configure option. This exact patch requires ./bootstrap.sh to be run again. If that is not possible the same line change can be manually made in src/Makefile.in as well.
--- src/Makefile.am 2007-09-17 14:22:33.000000000 +1200
+++ src/Makefile.am-new 2007-09-12 19:31:53.000000000 +1200
@@ -985,7 +985,7 @@
 DEFAULT_CONFIG_FILE = $(sysconfdir)/squid.conf
 DEFAULT_MIME_TABLE = $(sysconfdir)/mime.conf
 DEFAULT_DNSSERVER = $(libexecdir)/`echo dnsserver | sed '$(transform);s/$$/$(EXEEXT)/'`
-DEFAULT_LOG_PREFIX = $(localstatedir)/logs
+DEFAULT_LOG_PREFIX = $(localstatedir)/log
 DEFAULT_CACHE_LOG = $(DEFAULT_LOG_PREFIX)/cache.log
 DEFAULT_ACCESS_LOG = $(DEFAULT_LOG_PREFIX)/access.log
 DEFAULT_STORE_LOG = $(DEFAULT_LOG_PREFIX)/store.log
FreeBSD, NetBSD, OpenBSD
Squid is developed on FreeBSD. The general build instructions above should be all you need.
RedHat Enterprise Linux
The following ./configure options install Squid into the RedHat structure properly:
--prefix=/usr --includedir=/usr/include --datadir=/usr/share --bindir=/usr/sbin --libexecdir=/usr/lib/squid --localstatedir=/var --sysconfdir=/etc/squid
SELinux on RHEL 5 does not give the proper context to the default SNMP port (3401) (as of selinux-policy-2.4.6-106.el5). The command "semanage port -a -t http_cache_port_t -p udp 3401" takes care of this problem (via http://tanso.net/selinux/squid/).
MinGW (Windows)
In order to compile squid using the MinGW environment, the packages MSYS, MinGW and msysDTK must be installed. Some additional libraries and tools must be downloaded separately:
libcrypt: MinGW packages repository
db-1.85: TinyCOBOL download area
uudecode: Native Win32 ports of some GNU utilities
3.0+ releases do not require uudecode.
Unpack the source archive as usual and run configure.
The following are the recommended minimal options for Windows:
--prefix=c:/squid --disable-wccp --disable-wccpv2 --enable-win32-service --enable-default-hostsfile=none
Then run make and install as usual.
Squid will install into c:\squid. If you wish to install somewhere else, change the --prefix option for configure.
Afterwards, run squid -z. If that succeeds, try squid -N -D -d1; Squid should start. Check that there are no errors. If everything looks good, try browsing through Squid.
Now, to run Squid as a Windows system service, run squid -n, this will create a service named "Squid" with automatic startup. To start it run net start squid from command line prompt or use the Services Administrative Applet.
Always check the provided release notes for any version specific detail.
OS/2
by Doug Nazar (<nazard AT man-assoc DOT on DOT ca>).
In order to compile Squid, you need to have a reasonable facsimile of a Unix system installed. This includes bash, make, sed, emx, various file utilities and a few more. I've set up a TVFS drive that matches a Unix file system, but this probably isn't strictly necessary.
I made a few modifications to the pristine EMX 0.9d install.
- added defines for strcasecmp() & strncasecmp() to string.h
- changed all occurrences of time_t to signed long instead of unsigned long
- hacked ld.exe
- to search for both xxxx.a and libxxxx.a
- to produce the correct filename when using the -Zexe option
You will need to run scripts/convert.configure.to.os2 (in the Squid source distribution) to modify the configure script so that it can search for the various programs.
Next, you need to set a few environment variables (see EMX docs for meaning):
export EMXOPT="-h256 -c"
export LDFLAGS="-Zexe -Zbin -s"
Now you are ready to configure, make, and install Squid.
Now, don't forget to set EMXOPT before running squid each time. I recommend using the -Y and -N options.
Solaris
Many Squid installations run well on Solaris. There is just one known problem encountered when building.
The following error occurs on Solaris systems using gcc when the Solaris C compiler is not installed:
/usr/bin/rm -f libmiscutil.a
/usr/bin/false r libmiscutil.a rfc1123.o rfc1738.o util.o ...
make[1]: *** [libmiscutil.a] Error 255
make[1]: Leaving directory `/tmp/squid-1.1.11/lib'
make: *** [all] Error 1
Note the /usr/bin/false on the second line. This is supposed to be a path to the ar program. If configure cannot find ar on your system, then it substitutes false.
To fix this, do one of the following:
- Add /usr/ccs/bin to your PATH. This is where the ar command should be. You need to install SUNWbtool if ar is not there.
- Install the binutils package from the GNU FTP site. This package includes programs such as ar, as, and ld.
Other Platforms
Please let us know of other platforms on which you have built Squid, whether successfully or not.
Please check the page of platforms on which Squid is known to compile. Your problem might be listed there together with a solution. If it isn't listed there, mail us what you are trying, your Squid version, and the problems you encounter.
I see a lot warnings while compiling Squid.
Warnings are usually not a big concern, and can be common with software designed to operate on multiple platforms. The Squid developers do wish to make Squid build without errors or warnings. If you feel like fixing compile-time warnings, please do so and send us the patches.
undefined reference to __inet_ntoa
Probably you have bind 8.x installed.
UPDATE: That version of bind is now officially obsolete and known to be vulnerable to a critical infrastructure flaw. It should be upgraded to bind 9.x or replaced as soon as possible.
Contents
- How big of a system do I need to run Squid?
- How do I install Squid?
- How do I start Squid?
- How do I start Squid automatically when the system boots?
- How do I tell if Squid is running?
- squid command line options
- How do I see how Squid works?
- Can Squid benefit from SMP systems?
- Is it okay to use separate drives for Squid?
- Is it okay to use RAID on Squid?
How big of a system do I need to run Squid?
There are no hard-and-fast rules. The most important resource for Squid is physical memory, so put as much in your Squid box as you can. Your processor does not need to be ultra-fast. We recommend buying whatever is economical at the time.
Your disk system will be the major bottleneck, so fast disks are important for high-volume caches. SCSI disks generally perform better than ATA, if you can afford them. Serial ATA (SATA) performs somewhere between the two. Your system disk, and logfile disk can probably be IDE without losing any cache performance.
The ratio of memory-to-disk can be important. We recommend that you have at least 32 MB of RAM for each GB of disk space that you plan to use for caching.
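The rule of thumb above is plain arithmetic. A minimal shell sketch, assuming the 32 MB per GB figure from the text; the helper name min_ram_mb is hypothetical and not part of Squid:

```shell
#!/bin/sh
# min_ram_mb: hypothetical helper, not part of Squid.
# Prints the minimum RAM (in MB) suggested by the 32 MB per GB guideline
# for a cache of the given size (in GB).
min_ram_mb() {
    disk_gb=$1
    echo $(( disk_gb * 32 ))
}

# Example: a 100 GB cache would call for at least 3200 MB of RAM.
min_ram_mb 100
```

Treat the result as a lower bound rather than a target; the text above recommends as much memory as you can fit.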
How do I install Squid?
on Debian / Ubuntu
Squid-2:
apt-get install squid
Squid-3:
apt-get install squid3
on RedHat Enterprise Linux
yum install squid
from Source Code
After compiling Squid (see ../CompilingSquid), you can install it with this simple command:
% make install
If you have enabled ICMP or the pinger then you will also want to type
% su
# make install-pinger
After installing, you will want to read ../ConfiguringSquid to edit and customize Squid to run the way you want it to.
How do I start Squid?
First you need to check your Squid configuration. The Squid configuration can be found in /usr/local/squid/etc/squid.conf and includes documentation on all directives.
In the Squid distribution there is a small QUICKSTART guide indicating which directives you need to look closer at and why. At an absolute minimum you need to change the http_access configuration to allow access from your clients.
To verify your configuration file you can use the -k parse option
% /usr/local/squid/sbin/squid -k parse
If this outputs any errors then these are syntax errors or other fatal misconfigurations and need to be corrected before you continue. If it is silent and immediately gives back the command prompt then your squid.conf is syntactically correct and could be understood by Squid.
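The check above can be wrapped in a small script so that the result is explicit. This is a sketch only: the SQUID variable and the config_ok helper are hypothetical names, and it assumes that squid -k parse exits with a non-zero status when it finds a fatal error.

```shell
#!/bin/sh
# SQUID is an assumed variable naming the squid binary; adjust the path
# to match your installation.
SQUID=${SQUID:-/usr/local/squid/sbin/squid}

# config_ok: hypothetical helper. Succeeds only if "squid -k parse"
# exits with status 0 (assumed to mean no fatal configuration errors).
config_ok() {
    "$SQUID" -k parse
}

if config_ok; then
    echo "squid.conf parsed cleanly"
else
    echo "squid.conf has errors (or squid was not found)"
fi
```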
After you've finished editing the configuration file, you can start Squid for the first time. The procedure depends a little bit on which version you are using.
First, you must create the swap directories. Do this by running Squid with the -z option:
% /usr/local/squid/sbin/squid -z
If you run Squid as root then you may need to first create /usr/local/squid/var/logs and your cache_dir directories and assign ownership of these to the cache_effective_user configured in your squid.conf.
Once the creation of the cache directories completes, you can start Squid and try it out. Probably the best thing to do is run it from your terminal and watch the debugging output. Use this command:
% /usr/local/squid/sbin/squid -NCd1
If everything is working okay, you will see the line:
Ready to serve requests.
If you want to run squid in the background, as a daemon process, just leave off all options:
% /usr/local/squid/sbin/squid
Depending on which http_port you select, you may need to start Squid as root (http_port below 1024).
How do I start Squid automatically when the system boots?
by hand
Squid has a restart feature built in. This greatly simplifies starting Squid and means that you don't need to use RunCache or inittab. At the minimum, you only need to enter the pathname to the Squid executable. For example:
/usr/local/squid/sbin/squid
Squid will automatically background itself and then spawn a child process. In your syslog messages file, you should see something like this:
Sep 23 23:55:58 kitty squid[14616]: Squid Parent: child process 14617 started
That means that process ID 14616 is the parent process which monitors the child process (pid 14617). The child process is the one that does all of the work. The parent process just waits for the child process to exit. If the child process exits unexpectedly, the parent will automatically start another child process. In that case, syslog shows:
Sep 23 23:56:02 kitty squid[14616]: Squid Parent: child process 14617 exited with status 1
Sep 23 23:56:05 kitty squid[14616]: Squid Parent: child process 14619 started
If there is some problem, and Squid can not start, the parent process will give up after a while. Your syslog will show:
Sep 23 23:56:12 kitty squid[14616]: Exiting due to repeated, frequent failures
When this happens you should check your syslog messages and cache.log file for error messages.
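To see how often the child has been restarted, you can count the "exited" messages shown above. A minimal sketch, assuming the message format in the examples; the helper name count_child_exits and the syslog path in the usage comment are assumptions:

```shell
#!/bin/sh
# count_child_exits: hypothetical helper. Reads syslog lines on stdin and
# prints how many times the Squid parent reported that its child exited.
count_child_exits() {
    grep -c 'Squid Parent: child process [0-9]* exited'
}

# Usage (the syslog path varies by system and is an assumption here):
#   count_child_exits < /var/log/messages
```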
When you look at a process (ps command) listing, you'll see two squid processes:
24353 ?? Ss 0:00.00 /usr/local/squid/bin/squid
24354 ?? R 0:03.39 (squid) (squid)
The first is the parent process, and the child process is the one called "(squid)". Note that if you accidentally kill the parent process, the child process will not notice.
If you want to run Squid from your terminal and prevent it from backgrounding and spawning a child process, use the -N command line option.
/usr/local/squid/bin/squid -N
from inittab
On systems which have an /etc/inittab file (Digital Unix, Solaris, IRIX, HP-UX, Linux), you can add a line like this:
sq:3:respawn:/usr/local/squid/sbin/squid.sh < /dev/null >> /tmp/squid.log 2>&1
We recommend using a squid.sh shell script, but you could instead call Squid directly with the -N option and other options you may require. A sample squid.sh script is shown below:
C=/usr/local/squid
PATH=/usr/bin:$C/bin
TZ=PST8PDT
export PATH TZ
# User to notify on restarts
notify="root"
# Squid command line options
opts=""
cd $C
umask 022
sleep 10
while [ -f /var/run/nosquid ]; do
    sleep 1
done
/usr/bin/tail -20 $C/logs/cache.log \
| Mail -s "Squid restart on `hostname` at `date`" $notify
exec bin/squid -N $opts
from rc.local
On BSD-ish systems, you will need to start Squid from the "rc" files, usually /etc/rc.local. For example:
if [ -f /usr/local/squid/sbin/squid ]; then
    echo -n ' Squid'
    /usr/local/squid/sbin/squid
fi
from init.d
Squid ships with an init.d-type startup script in contrib/squid.rc which works on most init.d-type systems. Alternatively, you can write your own, using any normal init.d script found on your system as a template and adding the start/stop fragments shown below.
Start:
/usr/local/squid/sbin/squid
Stop:
/usr/local/squid/sbin/squid -k shutdown
n=120
while /usr/local/squid/sbin/squid -k check && [ $n -gt 0 ]; do
    sleep 1
    echo -n .
    n=`expr $n - 1`
done
with daemontools
Create squid service directory, and the log directory (if it does not exist yet).
mkdir -p /usr/local/squid/supervise/log /var/log/squid
chown squid /var/log/squid
Then, change to the service directory,
cd /usr/local/squid/supervise
and create 2 executable scripts: run
#!/bin/sh
rm -f /var/run/squid/squid.pid
exec /usr/local/squid/sbin/squid -N 2>&1
and log/run.
#!/bin/sh
exec /usr/local/bin/multilog t /var/log/squid
Finally, start the squid service by linking it into svscan monitored area.
cd /service
ln -s /usr/local/squid/supervise squid
Squid should start within 5 seconds.
How do I tell if Squid is running?
You can use the squidclient program:
% squidclient http://www.netscape.com/ > test
There are other command-line HTTP client programs available as well. Two that you may find useful are wget and echoping.
Another way is to use Squid itself to see if it can signal a running Squid process:
% squid -k check
And then check the shell's exit status variable.
Also, check the log files, most importantly the access.log and cache.log files.
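The exit status check can be scripted. A minimal sketch, assuming that squid -k check exits with status 0 when a running Squid process could be signalled; the SQUID variable and the squid_status helper are hypothetical names:

```shell
#!/bin/sh
# SQUID is an assumed variable naming the squid binary; adjust the path
# to match your installation.
SQUID=${SQUID:-/usr/local/squid/sbin/squid}

# squid_status: hypothetical helper. Prints "running" if "squid -k check"
# exits with status 0, and "not running" otherwise.
squid_status() {
    if "$SQUID" -k check >/dev/null 2>&1; then
        echo "running"
    else
        echo "not running"
    fi
}

squid_status
```

Interactively, the same information is available by running the command and then printing $? in a Bourne-style shell.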
squid command line options
These are the command line options for Squid-2:
-a Specify an alternate port number for incoming HTTP requests. Useful for testing a configuration file on a non-standard port.
-d Debugging level for "stderr" messages. If you use this option, then debugging messages up to the specified level will also be written to stderr.
-f Specify an alternate squid.conf file instead of the pathname compiled into the executable.
-h Prints the usage and help message.
-k reconfigure Sends a HUP signal, which causes Squid to re-read its configuration files.
-k rotate Sends a USR1 signal, which causes Squid to rotate its log files. Note, if logfile_rotate is set to zero, Squid still closes and re-opens all log files.
-k shutdown Sends a TERM signal, which causes Squid to wait briefly for current connections to finish and then exit. The amount of time to wait is specified with shutdown_lifetime.
-k interrupt Sends an INT signal, which causes Squid to shut down immediately, without waiting for current connections.
-k kill Sends a KILL signal, which causes the Squid process to exit immediately, without closing any connections or log files. Use this only as a last resort.
-k debug Sends a USR2 signal, which causes Squid to generate full debugging messages until the next USR2 signal is received. Obviously very useful for debugging problems.
-k check Sends a "ZERO" signal to the Squid process. This simply checks whether or not the process is actually running.
-s Send debugging (level 0 only) message to syslog.
-u Specify an alternate port number for ICP messages. Useful for testing a configuration file on a non-standard port.
-v Prints the Squid version.
-z Creates disk swap directories. You must use this option when installing Squid for the first time, or when you add or modify the cache_dir configuration.
-D Do not make initial DNS tests. Normally, Squid looks up some well-known DNS hostnames to ensure that your DNS name resolution service is working properly. Obsolete in 3.1 and later.
-F If the swap.state logs are clean, then the cache is rebuilt in the "foreground" before any requests are served. This will decrease the time required to rebuild the cache, but HTTP requests will not be satisfied during this time.
-N Do not automatically become a background daemon process.
-R Do not set the SO_REUSEADDR option on sockets.
-X Enable full debugging while parsing the config file.
-Y Return ICP_OP_MISS_NOFETCH instead of ICP_OP_MISS while the swap.state file is being read. If your cache has mostly child caches which use ICP, this will allow your cache to rebuild faster.
How do I see how Squid works?
Check the cache.log file in your logs directory. It logs interesting things as a part of its normal operation and can be boosted to show all the boring details.
Install and use the ../CacheManager.
Can Squid benefit from SMP systems?
Squid is a single-process application and cannot make use of SMP. If you want Squid to benefit from an SMP system, you will need to run multiple instances of Squid and find a way to distribute your users across them, just as if you had multiple Squid boxes.
Having two CPUs is indeed nice for running other CPU intensive tasks on the same server as the proxy, such as if you have a lot of logs and need to run various statistics collections during peak hours.
The authentication and group helpers barely use any CPU and do not benefit much from a dual-CPU configuration.
Is it okay to use separate drives for Squid?
Yes. Running Squid on drives separate from the one your OS runs on is often a very good idea.
Generally seek time is what you want to optimize for Squid, or more precisely the total amount of seeks/s your system can sustain. This is why it is better to have your cache_dir spread over multiple smaller disks than one huge drive (especially with SCSI).
If your system is very I/O bound, you will want to have both your OS and log directories running on separate drives.
Is it okay to use RAID on Squid?
see Section on RAID
Configuring Squid
Contents
- Configuring Squid
- Before you start configuring
- How do I configure Squid without re-compiling it?
- What does the squid.conf file do?
- Where can I find examples and configuration for a Feature?
- Do you have a squid.conf example?
- How do I join a cache hierarchy?
- How do I join NLANR's cache hierarchy?
- Why should I want to join NLANR's cache hierarchy?
- How do I register my cache with NLANR's registration service?
- How do I find other caches close to me and arrange parent/child/sibling relationships with them?
- My cache registration is not appearing in the Tracker database.
- How do I configure Squid to work behind a firewall?
- How do I configure Squid forward all requests to another proxy?
- What ''cache_dir'' size should I use?
- I'm adding a new cache_dir. Will I lose my cache?
- Squid and http-gw from the TIS toolkit.
- What is "HTTP_X_FORWARDED_FOR"? Why does squid provide it to WWW servers, and how can I stop it?
- Can Squid anonymize HTTP requests?
- Can I make Squid go direct for some sites?
- Can I make Squid proxy only, without caching anything?
- Can I prevent users from downloading large files?
Before you start configuring
- The best all around advice I can give on Squid is to start simple! Once everything works the way you expect, then start tweaking your way into complexity with a means to track the (in)effectiveness of each change you make (and a known good configuration that you can always go back to when you inevitably fubar the thing!).
by Gregori Parker Seconded by all the Squid developers and Squid helpers.
How do I configure Squid without re-compiling it?
The squid.conf file. By default, this file is located at /usr/local/squid/etc/squid.conf.
Also, a QUICKSTART guide has been included with the source distribution. Please see the directory where you unpacked the source archive.
What does the squid.conf file do?
The squid.conf file defines the configuration for Squid. The configuration includes (but is not limited to) the HTTP port number, the ICP request port number, incoming and outgoing requests, information about firewall access, and various timeout information.
Where can I find examples and configuration for a Feature?
There is still a fair bit of config knowledge buried in the old SquidFaq and Guide pages of this wiki. We are endeavoring to pull them into a layout easier to use.
What we have so far is:
- The general background configuration info here on this page
- Specific feature descriptions, pros/cons and some config, linked from the main SquidFaq in a features section.
- Complex tuning that mixes features, and specific demos, in ConfigExamples; these are usually linked from the related features or FAQ pages as well.
Do you have a squid.conf example?
Yes.
For Squid 2.x and 3.0 after you make install, a sample squid.conf.default file will exist in the etc directory under the Squid installation directory.
From 2.6 the Squid developers also provide a set of Configuration Guides online. They list all the options each version of Squid can accept in its squid.conf file:
Squid 2.6 Configuration Guide
Squid 2.7 Configuration Guide
Squid 3.0 Configuration Guide
Squid 3.1 Configuration Guide
including guides for the current development test releases
Squid 2-HEAD Configuration Guide
Squid 3-HEAD Configuration Guide
From 3.1 a lot of configuration cleanups have been done to make things easier.
This minimal configuration does not work with versions earlier than 3.1 which are missing special cleanup done to the code.
http_port 3128
hierarchy_stoplist cgi-bin ?
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
acl manager proto cache_object
acl localhost src 127.0.0.1/32
acl to_localhost dst 127.0.0.0/8
acl localnet src 10.0.0.0/8 # RFC 1918 possible internal network
acl localnet src 172.16.0.0/12 # RFC 1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC 1918 possible internal network
acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localnet
http_access deny all
How do I join a cache hierarchy?
To place your cache in a hierarchy, use the cache_peer directive in squid.conf to specify the parent and sibling nodes.
For example, the following squid.conf file on childcache.example.com configures its cache to retrieve data from one parent cache and two sibling caches:
# squid.conf - On the host: childcache.example.com
#
# Format is: hostname type http_port udp_port
#
cache_peer parentcache.example.com parent 3128 3130
cache_peer childcache2.example.com sibling 3128 3130
cache_peer childcache3.example.com sibling 3128 3130
The cache_peer_domain directive allows you to specify that certain caches are used as siblings or parents only for certain domains:
# squid.conf - On the host: sv.cache.nlanr.net
#
# Format is: hostname type http_port udp_port
#
cache_peer electraglide.geog.unsw.edu.au parent 3128 3130
cache_peer cache1.nzgate.net.nz parent 3128 3130
cache_peer pb.cache.nlanr.net parent 3128 3130
cache_peer it.cache.nlanr.net parent 3128 3130
cache_peer sd.cache.nlanr.net parent 3128 3130
cache_peer uc.cache.nlanr.net sibling 3128 3130
cache_peer bo.cache.nlanr.net sibling 3128 3130
cache_peer_domain electraglide.geog.unsw.edu.au .au
cache_peer_domain cache1.nzgate.net.nz .au .aq .fj .nz
cache_peer_domain pb.cache.nlanr.net .uk .de .fr .no .se .it
cache_peer_domain it.cache.nlanr.net .uk .de .fr .no .se .it
cache_peer_domain sd.cache.nlanr.net .mx .za .mu .zm
The configuration above indicates that the cache will use pb.cache.nlanr.net and it.cache.nlanr.net for domains uk, de, fr, no, se and it, sd.cache.nlanr.net for domains mx, za, mu and zm, and cache1.nzgate.net.nz for domains au, aq, fj, and nz.
How do I join NLANR's cache hierarchy?
We have a simple set of guidelines for joining the NLANR cache hierarchy.
Why should I want to join NLANR's cache hierarchy?
The NLANR hierarchy can provide you with an initial source for parent or sibling caches. Joining the NLANR global cache system will frequently improve the performance of your caching service.
How do I register my cache with NLANR's registration service?
Just enable these options in your squid.conf and you'll be registered:
cache_announce 24
announce_to sd.cache.nlanr.net:3131
Announcing your cache is not the same thing as joining the NLANR cache hierarchy. You can join the NLANR cache hierarchy without registering, and you can register without joining the NLANR cache hierarchy.
How do I find other caches close to me and arrange parent/child/sibling relationships with them?
Visit the NLANR cache registration database to discover other caches near you. Keep in mind that just because a cache is registered in the database does not mean they are willing to be your parent/sibling/child. But it can't hurt to ask...
My cache registration is not appearing in the Tracker database.
- Your site will not be listed if your cache IP address does not have a DNS PTR record. If we can't map the IP address back to a domain name, it will be listed as "Unknown."
- The registration messages are sent with UDP. We may not be receiving your announcement message due to firewalls which block UDP, or dropped packets due to congestion.
How do I configure Squid to work behind a firewall?
If you are behind a firewall then you can't make direct connections to the outside world, so you must use a parent cache. Normally Squid tries to be smart and only uses cache peers when it makes sense from a perspective of global hit ratio, and thus you need to tell Squid when it can not go direct and must use a parent proxy even if it knows the request will be a cache miss.
You can use the never_direct access list in squid.conf to specify which requests must be forwarded to your parent cache outside the firewall, and the always_direct access list to specify which requests must not be forwarded. For example, if Squid must connect directly to all servers that end with mydomain.com, but must use the parent for all others, you would write:
acl INSIDE dstdomain .mydomain.com
always_direct allow INSIDE
never_direct allow all
You could also specify internal servers by IP address
acl INSIDE_IP dst 1.2.3.0/24
always_direct allow INSIDE_IP
never_direct allow all
Note, however, that when you use IP addresses, Squid must perform a DNS lookup to convert URL hostnames to an address. Your internal DNS servers may not be able to look up external domains.
If you use never_direct and you have multiple parent caches, then you probably will want to mark one of them as a default choice in case Squid can't decide which one to use. That is done with the default keyword on a cache_peer line. For example:
cache_peer xyz.mydomain.com parent 3128 0 no-query default
How do I configure Squid forward all requests to another proxy?
First, you need to give Squid a parent cache. Second, you need to tell Squid it can not connect directly to origin servers. This is done with two configuration file lines:
cache_peer parentcache.foo.com parent 3128 0 no-query default
never_direct allow all
Note, with this configuration, if the parent cache fails or becomes unreachable, then every request will result in an error message.
In case you want to be able to use direct connections when all the parents go down you should use a different approach:
cache_peer parentcache.foo.com parent 3128 0 no-query
prefer_direct off
The default behavior of Squid in the absence of positive ICP, HTCP, etc replies is to connect to the origin server instead of using parents. The prefer_direct off directive tells Squid to try parents first.
What ''cache_dir'' size should I use?
This chapter assumes that you are dedicating an entire disk partition to a squid cache_dir, as is often the case.
Generally speaking, setting the cache_dir to be the same size as the disk partition is not a wise choice, for two reasons. The first is that squid is not very tolerant to running out of disk space. On top of the cache_dir size, squid will use some extra space for swap.state and then some more temporary storage as work-areas, for instance when rebuilding swap.state. So in any case make sure to leave some extra room for this, or your cache will enter an endless crash-restart cycle.
The second reason is fragmentation (note, this won't apply to the COSS object storage engine - when it will be ready): filesystems can only do so much to avoid fragmentation, and in order to be effective they need to have the space to try and optimize file placement. If the disk is full, optimization is very hard, and when the disk is 100% full optimizing is plain impossible. Get your disk fragmented, and it will most likely be your worst bottleneck, by far offsetting the modest gain you got by having more storage.
Let's see an example: you have a 9 GB disk (these days they're hard to find). First, manufacturers often lie about disk capacity (the whole megabyte vs mebibyte issue), and then the OS needs some space for its accounting structures, so you'll reasonably end up with 8 GiB of usable space. You then have to account for another 10% in overhead for Squid, and then the space needed for keeping fragmentation at bay. So in the end the recommended cache_dir setting is 6000 to 7000 MiB.
cache_dir ... 7000 16 256
It's better to start out with a conservative setting and then, after the cache has been filled, look at the disk usage. If you think there is plenty of unused space, then increase the cache_dir setting a little.
If you're getting "disk full" write errors, then you definitely need to decrease your cache size.
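The sizing example above can be approximated with a one-line calculation. This is a rough sketch, not an official formula: the 80% factor is an assumption chosen to cover the 10% Squid overhead plus some room against fragmentation, and the helper name suggest_cache_dir_mb is hypothetical.

```shell
#!/bin/sh
# suggest_cache_dir_mb: hypothetical helper. Takes the usable partition
# size in MiB and prints a conservative cache_dir size in MiB.
# The 80% factor is an assumption, not a figure from the Squid developers.
suggest_cache_dir_mb() {
    usable_mib=$1
    echo $(( usable_mib * 80 / 100 ))
}

# Example: 8 GiB (8192 MiB) of usable space gives 6553, which falls
# inside the 6000 to 7000 MiB range recommended above.
suggest_cache_dir_mb 8192
```

Start from a figure like this, then adjust after watching real disk usage as described above.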
I'm adding a new cache_dir. Will I lose my cache?
No. You can add and delete cache_dir lines without affecting any of the others.
Squid and http-gw from the TIS toolkit.
Several people on both the fwtk-users and the squid-users mailing lists asked about using Squid in combination with http-gw from the TIS toolkit. The most elegant way in my opinion is to run an internal Squid caching proxy server which handles client requests and let this server forward its requests to the http-gw running on the firewall. Cache hits won't need to be handled by the firewall.
In this example Squid runs on the same server as the http-gw, Squid uses 8000 and http-gw uses 8080 (web). The local domain is home.nl.
Firewall configuration
Either run http-gw as a daemon from the /etc/rc.d/rc.local (Linux Slackware):
exec /usr/local/fwtk/http-gw -daemon 8080
or run it from inetd like this:
web stream tcp nowait.100 root /usr/local/fwtk/http-gw http-gw
I increased the watermark to 100 because a lot of people run into problems with the default value.
Make sure you have at least the following line in /usr/local/etc/netperm-table:
http-gw: hosts 127.0.0.1
You could add the IP-address of your own workstation to this rule and make sure the http-gw by itself works, like:
http-gw: hosts 127.0.0.1 10.0.0.1
Squid configuration
The following settings are important:
http_port 8000
icp_port 0
cache_peer localhost.home.nl parent 8080 0 default
acl HOME dstdomain .home.nl
always_direct allow HOME
never_direct allow all
This tells Squid to use the parent for all domains other than home.nl. Below, access.log entries show what happens if you do a reload on the Squid-homepage:
872739961.631 1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/ - DEFAULT_PARENT/localhost.home.nl -
872739962.976 1266 10.0.0.21 TCP_CLIENT_REFRESH/304 88 GET http://www.nlanr.net/Images/cache_now.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.007 1299 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/Icons/squidnow.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.061 1354 10.0.0.21 TCP_CLIENT_REFRESH/304 83 GET http://www.squid-cache.org/Icons/Squidlogo2.gif - DEFAULT_PARENT/localhost.home.nl
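Entries like the ones above are easy to summarize with standard tools. A minimal sketch, assuming the native access.log layout shown here, where the fourth whitespace-separated field is the result code and HTTP status joined by a slash; the helper name tally_results is hypothetical:

```shell
#!/bin/sh
# tally_results: hypothetical helper. Reads access.log lines on stdin and
# prints one line per result code (the part of field 4 before the "/"),
# in the form "<count> <code>", sorted by code.
tally_results() {
    awk '{ split($4, a, "/"); n[a[1]]++ }
         END { for (c in n) print n[c], c }' | sort -k2
}

# Usage (the log path depends on your installation and is an assumption):
#   tally_results < /usr/local/squid/var/logs/access.log
```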
http-gw entries in syslog:
Aug 28 02:46:00 memo http-gw[2052]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:00 memo http-gw[2052]: log host=localhost/127.0.0.1 protocol=HTTP cmd=dir dest=www.squid-cache.org path=/
Aug 28 02:46:01 memo http-gw[2052]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:01 memo http-gw[2053]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2053]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/Squidlogo2.gif
Aug 28 02:46:01 memo http-gw[2054]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2054]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/squidnow.gif
Aug 28 02:46:01 memo http-gw[2055]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2055]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.nlanr.net path=/Images/cache_now.gif
Aug 28 02:46:02 memo http-gw[2055]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:03 memo http-gw[2053]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=2
Aug 28 02:46:04 memo http-gw[2054]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=3
To summarize:
Advantages:
- http-gw allows you to selectively block ActiveX and Java, and its primary design goal is security.
- The firewall doesn't need to run large applications like Squid.
- The internal Squid-server still gives you the benefit of caching.
Disadvantages:
- The internal Squid proxyserver can't (and shouldn't) work with other parent or neighbor caches.
- Initial requests are slower because they go through http-gw, which also does reverse lookups. Run a nameserver on the firewall or use an internal nameserver.
(contributed by Rodney van den Oever)
What is "HTTP_X_FORWARDED_FOR"? Why does squid provide it to WWW servers, and how can I stop it?
See Security - X-Forwarded-For
When a proxy-cache is used, a server does not see the connection coming from the originating client. Many people like to implement access controls based on the client address. To accommodate these people, Squid adds the request header called "X-Forwarded-For" which looks like this:
X-Forwarded-For: 128.138.243.150, unknown, 192.52.106.30
Entries are always IP addresses, or the word unknown if the address could not be determined or if it has been disabled with the forwarded_for configuration option.
We must note that access controls based on this header are extremely weak and simple to fake. Anyone may hand-enter a request with any IP address whatsoever. This is perhaps the reason why client IP addresses have been omitted from the HTTP/1.1 specification.
Because of the weakness of this header, access controls based on X-Forwarded-For are not used by default. They need to be specifically enabled with follow_x_forwarded_for.
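As an illustration of the header format described above, here is a minimal Python sketch that splits an X-Forwarded-For value into its entries. The function name parse_x_forwarded_for is made up for this example and is not part of Squid. Because the header is trivial to fake, the result should only be treated as a hint, never as proof of the client address.

```python
import ipaddress


def parse_x_forwarded_for(value):
    """Return the header's entries in order; None stands for "unknown"."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if item.lower() == "unknown":
            # The address could not be determined or was suppressed.
            entries.append(None)
        else:
            # Raises ValueError if the entry is not a valid IP address.
            entries.append(str(ipaddress.ip_address(item)))
    return entries


hops = parse_x_forwarded_for("128.138.243.150, unknown, 192.52.106.30")
print(hops)
```

Each entry is either an address string or None, in the order the header lists them.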
Can Squid anonymize HTTP requests?
Yes, it can. However, the way of doing it has changed from earlier versions of Squid. Please follow the instructions for the version of Squid that you are using. By default, no anonymizing is done.
If you choose to use the anonymizer, you might wish to investigate the forwarded_for option to prevent the client address being disclosed. Failure to turn off the forwarded_for option will reduce the effectiveness of the anonymizer. Finally, if you filter the User-Agent header, using the fake_user_agent option can prevent some user problems, as some sites require the User-Agent header.
NP: Squid must be built with the --enable-http-violations configure option.
Current Squid releases provide a mix of header control directives and capabilities:
- Squid 2.6 - 2.7
Allow erasure or replacement of specific headers through the http_header_access and header_replace options.
- Squid 3.0
Allows selective erasure and replacement of specific headers in either request or reply with the request_header_access and reply_header_access and header_replace settings.
- Squid 3.1
Adds to the 3.0 capability with truncation, replacement, or removal of the X-Forwarded-For header.
For details see the documentation in squid.conf.default or squid.conf.documented for your specific version of squid.
http://www.squid-cache.org/Versions/v2/HEAD/cfgman/
http://www.squid-cache.org/Versions/v3/HEAD/cfgman/
References: Anonymous WWW
Can I make Squid go direct for some sites?
Sure, just use the always_direct access list.
For example, if you want Squid to connect directly to hotmail.com servers, you can use these lines in your config file:
acl hotmail dstdomain .hotmail.com
always_direct allow hotmail
Can I make Squid proxy only, without caching anything?
Sure, there are a few things you can do.
You can use the cache access list to make Squid never cache any response:
cache deny all
With Squid-2.7, Squid-3.1 and later you can also remove all 'cache_dir' options from your squid.conf to avoid having a cache directory.
With Squid-2.4, 2.5, 2.6, and 3.0 you need to use the "null" storage module:
cache_dir null /tmp
Note: a null cache_dir does not disable caching, but it does save you from creating a cache structure if you have disabled caching with cache. The directory (e.g., /tmp) must exist so that squid can chdir to it, unless you also use the coredump_dir option.
To configure Squid for the "null" storage module, specify it on the configure command line:
--enable-storeio=null,...
Can I prevent users from downloading large files?
You can set the global reply_body_max_size parameter. This option controls the largest HTTP message body that will be sent to a cache client for one request.
If the HTTP response coming from the server has a Content-Length header, then Squid compares the Content-Length value to the reply_body_max_size value. If the Content-Length is larger, the server connection is closed and the user receives an error message from Squid.
Some responses don't have Content-length headers. In this case, Squid counts how many bytes are written to the client. Once the limit is reached, the client's connection is simply closed.
Note that "creative" user-agents will still be able to download really large files through the cache using HTTP/1.1 range requests.
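The two cases described above can be sketched in a few lines of Python. This is only a model of the behaviour as this FAQ describes it, not Squid's actual implementation. The function name relay_reply and the outcome labels are invented for the example, and the exact byte at which a real Squid closes the connection is not specified here.

```python
def relay_reply(limit, content_length, chunks):
    """Model the reply_body_max_size decision for one response.

    limit          -- configured maximum body size in bytes
    content_length -- value of the Content-Length header, or None if absent
    chunks         -- iterable of body fragments (bytes) from the server
    Returns a tuple (outcome, bytes_sent_to_client).
    """
    # Case 1: the server advertised a length, so it can be rejected up front.
    if content_length is not None and content_length > limit:
        return ("error_page", 0)

    # Case 2: no usable advertisement; count bytes as they are written.
    sent = 0
    for chunk in chunks:
        if sent + len(chunk) > limit:
            # Limit reached: the client connection is simply closed.
            return ("connection_closed", sent)
        sent += len(chunk)
    return ("complete", sent)


print(relay_reply(100, 500, []))
print(relay_reply(100, None, [b"x" * 60, b"x" * 60]))
print(relay_reply(100, None, [b"x" * 40]))
```

The sketch also shows why a client can get around the limit with range requests: each partial response is checked on its own.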
Contents
- Communication between browsers and Squid
- Manual Browser Configuration
- Partially Automatic Configuration
- Fully Automatically Configuring Browsers for WPAD
- Redundant Proxy Auto-Configuration
- Proxy Auto-Configuration with URL Hashing
- How do I tell Squid to use a specific username for FTP urls?
- IE 5.0x crops trailing slashes from FTP URL's
- IE 6.0 SP1 fails when using authentication
Communication between browsers and Squid
Most web browsers available today support proxying and are easily configured to use a Squid server as a proxy. Some browsers support advanced features such as lists of domains or URL patterns that shouldn't be fetched through the proxy, or JavaScript automatic proxy configuration.
There are three ways to configure browsers to use Squid. The first is to configure the proxy manually in each browser. The second is to enter the location of a proxy.pac file into each browser so that it downloads the proxy settings (partially automatic configuration). The third is fully automatic configuration: all modern browsers can configure themselves, and are set up to do so by default, if the network is configured to support it.
Manual Browser Configuration
This involves manually specifying the proxy server name and port in each browser.
Firefox and Thunderbird manual configuration
Both Firefox and Thunderbird are configured in the same way. Look in the Tools menu, Options, General and then Connection Settings. The options in there are fairly self explanatory. Firefox and Thunderbird support manually specifying the proxy server, automatically downloading a wpad.dat file from a specified source, and additionally wpad auto-detection.
Thunderbird uses these settings for downloading HTTP images in emails.
In both cases, if you are manually configuring proxies, make sure you add the relevant entries for your network in the "No Proxy For" boxes.
Microsoft Internet Explorer manual configuration
Select Options from the View menu. Click on the Connection tab. Tick the Connect through Proxy Server option and hit the Proxy Settings button. For each protocol that your Squid server supports (by default, HTTP, FTP, and gopher) enter the Squid server's hostname or IP address and put the HTTP port number for the Squid server (by default, 3128) in the Port column. For any protocols that your Squid does not support, leave the fields blank.
Netscape manual configuration
Select Network Preferences from the Options menu. On the Proxies page, click the radio button next to Manual Proxy Configuration and then click on the View button. For each protocol that your Squid server supports (by default, HTTP, FTP, and gopher) enter the Squid server's hostname or IP address and put the HTTP port number for the Squid server (by default, 3128) in the Port column. For any protocols that your Squid does not support, leave the fields blank.
Lynx and Mosaic manual configuration
For Mosaic and Lynx, you can set environment variables before starting the application. For example (assuming csh or tcsh):
% setenv http_proxy http://mycache.example.com:3128/
% setenv gopher_proxy http://mycache.example.com:3128/
% setenv ftp_proxy http://mycache.example.com:3128/
For Lynx you can also edit the lynx.cfg file to configure proxy usage. This has the added benefit of causing all Lynx users on a system to access the proxy without making environment variable changes for each user. For example:
http_proxy:http://mycache.example.com:3128/
ftp_proxy:http://mycache.example.com:3128/
gopher_proxy:http://mycache.example.com:3128/
Opera 2.12 manual configuration
by Hume Smith
Select Proxy Servers... from the Preferences menu. Check each protocol that your Squid server supports (by default, HTTP, FTP, and Gopher) and enter the Squid server's address as hostname:port (e.g. mycache.example.com:3128 or 123.45.67.89:3128). Click on Okay to accept the setup.
Notes:
- Opera 2.12 doesn't support gopher on its own, but requires a proxy; therefore Squid's gopher proxying can extend the utility of your Opera immensely.
Unfortunately, Opera 2.12 chokes on some HTTP requests, for example abuse.net.
At the moment I think it has something to do with cookies. If you have trouble with a site, try disabling the HTTP proxying by unchecking that protocol in the Preferences|Proxy Servers... dialogue. Opera will remember the address, so reenabling is easy.
Netmanage Internet Chameleon WebSurfer manual configuration
Netmanage WebSurfer supports manual proxy configuration and exclusion lists for hosts or domains that should not be fetched via proxy (this information is current as of WebSurfer 5.0). Select Preferences from the Settings menu. Click on the Proxies tab. Select the Use Proxy options for HTTP, FTP, and gopher. For each protocol, enter the Squid server's hostname or IP address and put the HTTP port number for the Squid server (by default, 3128) in the Port boxes. For any protocols that your Squid does not support, leave the fields blank.
On the same configuration window, you'll find a button to bring up the exclusion list dialog box, which will let you enter some hosts or domains that you don't want fetched via proxy.
Partially Automatic Configuration
This involves the browser being preconfigured with the location of an autoconfiguration script.
Netscape automatic configuration
Netscape Navigator's proxy configuration can be automated with JavaScript (for Navigator versions 2.0 or higher). Select Network Preferences from the Options menu. On the Proxies page, click the radio button next to Automatic Proxy Configuration and then fill in the URL for your JavaScript proxy configuration file in the text box. The box is too small, but the text will scroll to the right as you go.
You may also wish to consult Netscape's documentation for the Navigator JavaScript proxy configuration.
Here is a sample auto configuration file from Oskar Pearson (link to save at the bottom):
//We (www.is.co.za) run a central cache for our customers that they
//access through a firewall - thus if they want to connect to their intranet
//system (or anything in their domain at all) they have to connect
//directly - hence all the "fiddling" to see if they are trying to connect
//to their local domain.
//
//Replace each occurrence of company.com with your domain name
//and if you have some kind of intranet system, make sure
//that you put its name in place of "internal" below.
//
//We also assume that your cache is called "cache.company.com", and
//that it runs on port 8080. Change it down at the bottom.
//
//(C) Oskar Pearson and the Internet Solution (http://www.is.co.za)
function FindProxyForURL(url, host)
{
    //If they have only specified a hostname, go directly.
    if (isPlainHostName(host))
        return "DIRECT";
    //These connect directly if the machine they are trying to
    //connect to starts with "intranet" - ie http://intranet
    //Connect directly if it is intranet.*
    //If you have another machine that you want them to
    //access directly, replace "internal*" with that
    //machine's name
    if (shExpMatch(host, "intranet*") ||
        shExpMatch(host, "internal*"))
        return "DIRECT";
    //Connect directly to our domains (NB for Important News)
    if (dnsDomainIs(host, "company.com") ||
        //If you have another domain that you wish to connect to
        //directly, put it in here
        dnsDomainIs(host, "sistercompany.com"))
        return "DIRECT";
    //So the error message "no such host" will appear through the
    //normal Netscape box - less support queries :)
    if (!isResolvable(host))
        return "DIRECT";
    //We only cache http, ftp and gopher
    if (url.substring(0, 5) == "http:" ||
        url.substring(0, 4) == "ftp:" ||
        url.substring(0, 7) == "gopher:")
        //Change the ":8080" to the port that your cache
        //runs on, and "cache.company.com" to the machine that
        //you run the cache on
        return "PROXY cache.company.com:8080; DIRECT";
    //We don't cache WAIS
    if (url.substring(0, 5) == "wais:")
        return "DIRECT";
    else
        return "DIRECT";
}
Microsoft Internet Explorer
Microsoft Internet Explorer, versions 4.0 and above, supports JavaScript automatic proxy configuration in a Netscape-compatible way. Just select Options from the View menu. Click on the Advanced tab. In the lower left-hand corner, click on the Automatic Configuration button. Fill in the URL for your JavaScript file in the dialog box it presents you. Then exit MSIE and restart it for the changes to take effect. MSIE will reload the JavaScript file every time it starts.
Fully Automatically Configuring Browsers for WPAD
by Mark Reynolds
You may like to start by reading the Expired Internet-Draft that describes WPAD.
After reading the steps below, if you don't understand any of the terms or methods mentioned, you probably shouldn't be doing this. Implementing WPAD requires you to fully understand:
- web server installations and modifications.
- squid proxy server (or others) installation etc.
- Domain Name System maintenance etc.
Please don't bombard the squid list with web server or DNS questions. See your system administrator, or do some more research on those topics.
This is not a recommendation for any product or version. All major browsers now implement WPAD. I think WPAD is an excellent feature that will return several hours of life per month.
There are probably many more tricks and tips which hopefully will be detailed here in the future. Things like wpad.dat files being served from the proxy server themselves, maybe with a round robin dns setup for the WPAD host.
I have only focused on the domain name method, to the exclusion of the DHCP method. I think the dns method might be easier for most people. I don't currently, and may never, fully understand wpad and IE5, but this method worked for me. It may work for you.
But if you'd rather just have a go ...
The PAC file
Create a standard Netscape auto proxy config file. The sample provided above is more than adequate to get you going. No doubt all the other load balancing and backup scripts will be fine also.
Store the resultant file in the document root directory of a handy web server as wpad.dat (Not proxy.pac as you may have previously done.) Andrei Ivanov notes that you should be able to use an HTTP redirect if you want to store the wpad.dat file somewhere else. You can probably even redirect wpad.dat to proxy.pac:
Redirect /wpad.dat http://racoon.riga.lv/proxy.pac
If you do nothing more, a URL like http://www.your.domain.name/wpad.dat should bring up the script text in your browser window.
Insert the following entry into your web server mime.types file. Maybe in addition to your pac file type, if you've done this before.
application/x-ns-proxy-autoconfig dat
Then restart your web server for the new MIME type to take effect.
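For a quick test before touching a production web server, the same MIME mapping can be tried with Python's built-in http.server module. This is a sketch only. It swaps the mime.types edit for the handler's extensions_map, the class name WpadHandler and the port 8000 are arbitrary choices for the example, and it serves files from the current directory, so it is not meant for production use.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class WpadHandler(SimpleHTTPRequestHandler):
    # Copy the default map so the base class is left untouched, then
    # map the .dat extension to the PAC MIME type named above.
    extensions_map = dict(SimpleHTTPRequestHandler.extensions_map)
    extensions_map[".dat"] = "application/x-ns-proxy-autoconfig"


def serve(port=8000):
    """Serve the current directory (containing wpad.dat) until interrupted."""
    HTTPServer(("", port), WpadHandler).serve_forever()


# Call serve() from the directory holding wpad.dat to start the test server.
print(WpadHandler.extensions_map[".dat"])
```

With the server running, fetching /wpad.dat should return the script with the Content-Type shown above.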
Browser Configurations
Internet Explorer 5
Under Tools, Internet Options, Connections, Settings or Lan Settings, set ONLY Use Automatic Configuration Script to be the URL for where your new wpad.dat file can be found.
i.e. http://www.your.domain.name/wpad.dat.
Test that this all works as per your script and network. There's no point continuing until this works ...
Automatic WPAD with DNS
Create/install/implement a DNS record so that wpad.your.domain.name resolves to the host above where you have a functioning auto config script running. You should now be able to use http://wpad.your.domain.name/wpad.dat as the Auto Config Script location in the Internet Explorer 5 configuration above.
And finally, go back to the same Internet Explorer 5 setup screen, and choose nothing but the Automatically Detect Settings option, turning everything else off. Best to restart IE5, as you normally do with any Microsoft product... And it should all work. Did for me anyway.
One final question might be "Which domain name does the client (IE5) use for the wpad... lookup?" It uses the hostname from the control panel setting. It starts the search by adding the hostname wpad to the current fully-qualified domain name. For instance, a client in a.b.microsoft.com would search for a WPAD server at wpad.a.b.microsoft.com. If it could not locate one, it would remove the bottom-most domain and try again; for instance, it would try wpad.b.microsoft.com next. IE 5 would stop searching when it found a WPAD server or reached the third-level domain, wpad.microsoft.com.
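The search order described above can be written out as a short Python sketch. The function name wpad_candidates is made up for this example. It only lists the names a client would try, in order, under the rule stated above (stop once the name is down to wpad plus a two-label domain), and does not perform any DNS lookups.

```python
def wpad_candidates(client_domain):
    """List the WPAD hostnames a client in client_domain would try, in order."""
    labels = client_domain.lower().strip(".").split(".")
    candidates = []
    # Keep removing the bottom-most label; stop after the third-level
    # name, e.g. wpad.microsoft.com.
    while len(labels) >= 2:
        candidates.append("wpad." + ".".join(labels))
        labels = labels[1:]
    return candidates


names = wpad_candidates("a.b.microsoft.com")
print(names)
```

The first name in the list that resolves to a working WPAD server ends the search.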
Automatic WPAD with DHCP
You can also use DHCP to configure browsers for WPAD. This technique allows you to set any URL as the PAC URL. For ISC DHCPD, enter a line like this in your dhcpd.conf file:
option wpad code 252 = text;
option wpad "http://www.example.com/proxy.pac";
Replace the hostname with the name or address of your own server.
Ilja Pavkovic notes that the DHCP mode does not work reliably with every version of Internet Explorer. The DNS name method to find wpad.dat is more reliable.
Another user adds that IE 6.01 seems to strip the last character from the URL. By adding a trailing newline, he is able to make it work with both IE 5.0 and 6.0:
option wpad "http://www.example.com/proxy.pac\n";
Redundant Proxy Auto-Configuration
by Rodney van den Oever
There's one nasty side-effect to using auto-proxy scripts: if you start the web browser it will try and load the auto-proxy-script.
If your script isn't available either because the web server hosting the script is down or your workstation can't reach the web server (e.g. because you're working off-line with your notebook and just want to read a previously saved HTML-file) you'll get different errors depending on the browser you use.
The Netscape browser will just return an error after a timeout (after that it tries to find the site 'www.proxy.com' if the script you use is called 'proxy.pac').
Microsoft Internet Explorer, on the other hand, won't even start: no window is displayed, and only after about 1 minute will it display a window asking you to go on with or without proxy configuration.
The point is that your workstations always need to locate the proxy-script. I created some extra redundancy by hosting the script on two web servers (actually Apache web servers on the proxy servers themselves) and adding the following records to my primary nameserver:
proxy IN A 10.0.0.1 ; IP address of proxy1
      IN A 10.0.0.2 ; IP address of proxy2
The clients just refer to 'http://proxy/proxy.pac'. This script looks like this:
function FindProxyForURL(url, host)
{
    // Hostname without domainname or host within our own domain?
    // Try them directly:
    // http://www.domain.com actually lives before the firewall, so
    // make an exception:
    if ((isPlainHostName(host) || dnsDomainIs(host, ".domain.com")) &&
        !localHostOrDomainIs(host, "www.domain.com"))
        return "DIRECT";
    // First try proxy1 then proxy2. One server mostly caches '.com'
    // to make sure both servers are not
    // caching the same data in the normal situation. The other
    // server caches the other domains normally.
    // If one of 'm is down the client will try the other server.
    else if (shExpMatch(host, "*.com"))
        return "PROXY proxy1.domain.com:8080; PROXY proxy2.domain.com:8081; DIRECT";
    return "PROXY proxy2.domain.com:8081; PROXY proxy1.domain.com:8080; DIRECT";
}
I made sure every client domain has the appropriate 'proxy' entry. The clients are automatically configured with two nameservers using DHCP.
Proxy Auto-Configuration with URL Hashing
The Sharp Super Proxy Script page contains a lot of good information about hash-based proxy auto-configuration scripts. With these you can distribute the load between a number of caching proxies.
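To illustrate the general idea of hash-based distribution (this is not the algorithm used by the Sharp script, and real PAC files are written in JavaScript), here is a small Python sketch. The function name pick_proxy and the choice of CRC32 as the hash are assumptions made for the example. The proxy names are borrowed from the redundancy example above.

```python
import zlib


def pick_proxy(host, proxies):
    """Map a hostname to one proxy so the same host always hits the same cache."""
    checksum = zlib.crc32(host.lower().encode("utf-8"))
    return proxies[checksum % len(proxies)]


PROXIES = ["proxy1.domain.com:8080", "proxy2.domain.com:8081"]

for name in ("www.squid-cache.org", "www.nlanr.net", "www.example.com"):
    print(name, "->", pick_proxy(name, PROXIES))
```

Because the choice depends only on the hostname, each object is cached on one proxy rather than on all of them. Note that with a plain modulo, adding or removing a proxy reassigns many hosts.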
How do I tell Squid to use a specific username for FTP urls?
Insert your username in the host part of the URL, for example:
ftp://joecool@ftp.foo.org/
Squid and the browser should then prompt you for your account password.
Alternatively, you can specify both your username and password in the URL itself:
ftp://joecool:secret@ftp.foo.org/
However, we certainly do not recommend this, as it could be very easy for someone to see or grab your password.
IE 5.0x crops trailing slashes from FTP URL's
There was a bug in the 5.0x releases of Internet Explorer in which IE cropped any trailing slash off an FTP URL. The URL showed up correctly in the browser's "Address:" field, however squid logs show that the trailing slash was being taken off.
An example of where this affected Squid is a setup where Squid would go direct for FTP directory listings but forward requests to a parent for FTP file transfers. This was useful if your upstream proxy was an older version of Squid or another vendor's software which displayed directory listings with broken icons, and you wanted your own local version of Squid to generate proper FTP directory listings instead. The workaround is to add a double slash to any directory listing in which the slash was important, or else upgrade to IE 5.5 (or use Firefox if you cannot upgrade your IE).
IE 6.0 SP1 fails when using authentication
When using authentication with Internet Explorer 6 SP1, you may encounter issues when you first launch Internet Explorer. The problem shows itself when you first authenticate: you will receive a "Page Cannot Be Displayed" error. However, if you click refresh, the page will be correctly displayed.
This only happens immediately after you authenticate.
This is not a Squid error or bug. Microsoft broke the Basic Authentication when they put out IE6 SP1.
There is a knowledgebase article ( KB 331906) regarding this issue, which contains a link to a downloadable "hot fix." They do warn that this code is not "regression tested" but so far there have not been any reports of this breaking anything else. The problematic file is wininet.dll. Please note that this hotfix is included in the latest security update.
Lloyd Parkes notes that the article references another article, KB 312176. He says that you must not have the registry entry that KB 312176 encourages users to add to their registry.
According to Joao Coutinho, this simple solution also corrects the problem:
- Go to Tools/Internet Options/Advanced
- UNSELECT "Show friendly HTTP error messages" under Browsing.
Another possible workaround to these problems is to make the ERR_CACHE_ACCESS_DENIED error page larger than 1460 bytes. This should trigger IE to handle the authentication in a slightly different manner.
Contents
- Squid Log Files
- squid.out
- cache.log
- useragent.log
- store.log
- access.log
- access.log native format in detail
- sending access.log to syslog
- customizable access.log
- swap.state
- Which log files can I delete safely?
- How can I disable Squid's log files?
- What is the maximum size of access.log?
- My log files get very big!
- I want to use another tool to maintain the log files.
- Managing log files
- Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?
- What does ERR_LIFETIME_EXP mean?
- Retrieving "lost" files from the cache
- Can I use store.log to figure out if a response was cachable?
- Can I pump the squid access.log directly into a pipe?
Squid Log Files
The logs are a valuable source of information about Squid workloads and performance. The logs record not only access information, but also system configuration errors and resource consumption (e.g. memory, disk space). There are several log files maintained by Squid. Some have to be explicitly activated at compile time, others can safely be deactivated at run-time.
There are a few basic points common to all log files. The time stamps logged into the log files are usually UTC seconds unless stated otherwise. The initial time stamp usually contains a millisecond extension.
squid.out
If you run your Squid from the RunCache script, a file squid.out contains the Squid startup times, and also all fatal errors, e.g. as produced by an assert() failure. If you are not using RunCache, you will not see such a file.
cache.log
The cache.log file contains the debug and error messages that Squid generates. If you start your Squid using the default RunCache script, or start it with the -s command line option, a copy of certain messages will go into your syslog facilities. It is a matter of personal preferences to use a separate file for the squid log data.
From the area of automatic log file analysis, the cache.log file does not have much to offer. You will usually look into this file for automated error reports, when programming Squid, testing new features, or searching for reasons of a perceived misbehavior, etc.
useragent.log
The user agent log file is only maintained if
- you configured the compile time --enable-useragent-log option, and
- you pointed the useragent_log configuration option to a file.
From the user agent log file you are able to find out about distribution of browsers of your clients. Using this option in conjunction with a loaded production squid might not be the best of all ideas.
store.log
The store.log file covers the objects currently kept on disk or removed ones. As a kind of transaction log it is usually used for debugging purposes. A definitive statement, whether an object resides on your disks is only possible after analyzing the complete log file. The release (deletion) of an object may be logged at a later time than the swap out (save to disk).
The store.log file may be of interest to log file analysis which looks into the objects on your disks and the time they spend there, or how many times a hot object was accessed. The latter may be covered by another log file, too. With knowledge of the cache_dir configuration option, this log file allows for a URL to filename mapping without recursing your cache disks. However, the Squid developers recommend treating store.log primarily as a debug file, and so should you, unless you know what you are doing.
The print format for a store log entry (one line) consists of thirteen space-separated columns, compare with the storeLog() function in file src/store_log.c:
%9ld.%03d %-7s %02d %08X %s %4d %9ld %9ld %9ld %s %ld/%ld %s %s
time The timestamp when the line was logged in UTC with a millisecond fraction.
action The action the object was submitted to, compare with src/store_log.c:
CREATE Seems to be unused.
RELEASE The object was removed from the cache (see also file number below).
SWAPOUT The object was saved to disk.
SWAPIN The object existed on disk and was read into memory.
dir number The cache_dir number this object was stored into, starting at 0 for your first cache_dir line.
file number The file number for the object storage file. Please note that the path to this file is calculated according to your cache_dir configuration. A file number of FFFFFFFF indicates "memory only" objects. Any action code for such a file number refers to an object which existed only in memory, not on disk. For instance, if a RELEASE code was logged with file number FFFFFFFF, the object existed only in memory, and was released from memory.
hash The hash value used to index the object in the cache. Squid currently uses MD5 for the hash value.
status The HTTP reply status code.
datehdr The value of the HTTP Date reply header.
lastmod The value of the HTTP Last-Modified reply header.
expires The value of the HTTP "Expires: " reply header.
type The HTTP Content-Type major value, or "unknown" if it cannot be determined.
sizes This column consists of two slash separated fields:
- The advertised content length from the HTTP Content-Length reply header.
- The size actually read.
If the advertised (or expected) length is missing, it will be set to zero. If the advertised length is not zero, but not equal to the real length, the object will be released from the cache.
method The request method for the object, e.g. GET.
key The key to the object, usually the URL.
The datehdr, lastmod, and expires values are all expressed in UTC seconds. The actual values are parsed from the HTTP reply headers. An unparsable header is represented by a value of -1, and a missing header is represented by a value of -2.
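A minimal Python sketch of reading one store.log line into the thirteen columns listed above follows. The function name parse_store_log_line and the dictionary keys are invented for this example, and the sample line is made up to match the print format. It is not taken from a real log.

```python
def parse_store_log_line(line):
    """Split one store.log entry into its thirteen named columns."""
    # The key is the last column; maxsplit keeps it in one piece.
    parts = line.strip().split(None, 12)
    if len(parts) != 13:
        raise ValueError("expected 13 columns, got %d" % len(parts))
    (time, action, dir_no, file_no, hash_value, status, datehdr,
     lastmod, expires, ctype, sizes, method, key) = parts
    advertised, actual = sizes.split("/", 1)
    return {
        "time": float(time),                  # UTC seconds with ms fraction
        "action": action,                     # RELEASE, SWAPOUT, SWAPIN, ...
        "dir_number": int(dir_no),
        "file_number": int(file_no, 16),      # logged as 8 hex digits
        "memory_only": file_no.upper() == "FFFFFFFF",
        "hash": hash_value,
        "status": int(status),
        "datehdr": int(datehdr),              # -1 unparsable, -2 missing
        "lastmod": int(lastmod),
        "expires": int(expires),
        "type": ctype,
        "advertised_length": int(advertised),
        "actual_length": int(actual),
        "method": method,
        "key": key,
    }


# Hypothetical sample line, constructed to follow the format above.
sample = ("1066036250.603 SWAPOUT 00 00000012 "
          "8F2A0C1D3B4E5F60718293A4B5C6D7E8 200 1066036249 1065000000 -1 "
          "text/html 1024/1024 GET http://www.example.com/")
entry = parse_store_log_line(sample)
print(entry["action"], entry["file_number"], entry["key"])
```

Remember that a single line does not tell you whether an object is still on disk. As noted above, that requires analyzing the complete log file.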
access.log
Most log file analysis programs are based on the entries in access.log.
Squid allows administrators to configure their logfile format with great flexibility; previous versions offered much more limited functionality.
Previous versions allowed accesses to be logged either in the native log format (default) or in the HTTP common logfile format (CLF). The latter is enabled by specifying the emulate_httpd_log option in squid.conf.
The common log file format
The Common Logfile Format is used by numerous HTTP servers. This format consists of the following seven fields:
remotehost rfc931 authuser [date] "method URL" status bytes
It is parsable by a variety of tools. The common format contains different information than the native log file format. The HTTP version is logged, which is not logged in native log file format.
The native log file format
The format is:
time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost type
The native log file format logs more and different information than the common log file format: the request duration, some timeout information, the next upstream server address, and the content type.
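As a rough illustration of the native format, the following Python sketch splits one entry into the ten fields named in the format line above and converts the timestamp to a UTC datetime. The function name parse_access_log_line and the dictionary keys are invented for the example. The sample line is one of the access.log entries shown earlier in this FAQ.

```python
from datetime import datetime, timezone


def parse_access_log_line(line):
    """Split one native-format access.log entry into named fields."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("expected at least 10 columns, got %d" % len(fields))
    (time, elapsed, remotehost, result, nbytes,
     method, url, rfc931, hierarchy, ctype) = fields[:10]
    code, status = result.split("/", 1)
    peerstatus, peerhost = hierarchy.split("/", 1)
    return {
        "time": datetime.fromtimestamp(float(time), tz=timezone.utc),
        "elapsed_ms": int(elapsed),
        "remotehost": remotehost,
        "code": code,
        "status": int(status),
        "bytes": int(nbytes),
        "method": method,
        "url": url,
        "rfc931": rfc931,
        "peerstatus": peerstatus,
        "peerhost": peerhost,
        "type": ctype,
    }


sample = ("872739961.631 1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET "
          "http://www.squid-cache.org/ - DEFAULT_PARENT/localhost.home.nl -")
entry = parse_access_log_line(sample)
print(entry["time"].isoformat(), entry["code"], entry["status"], entry["peerhost"])
```

The sketch assumes the URL column contains no whitespace.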
There exist tools, which convert one file format into the other. Please mind that even though the log formats share most information, both formats contain information which is not part of the other format, and thus this part of the information is lost when converting. Especially converting back and forth is not possible without loss.
squid2common.pl is a conversion utility, which converts any of the squid log file formats into the old CERN proxy style output. There exist tools to analyse, evaluate and graph results from that format.
access.log native format in detail
We recommend that you use Squid's native log format due to its greater amount of information made available for later analysis. The print format line for native access.log entries looks like this:
"%9d.%03d %6d %s %s/%03d %d %s %s %s %s%s/%s %s"
Therefore, an access.log entry usually consists of (at least) 10 columns separated by one or more spaces:
time A Unix timestamp as UTC seconds with a millisecond resolution. You can convert Unix timestamps into something more human readable using this short perl script:
s/^\d+\.\d+/localtime $&/e;
duration The elapsed time considers how many milliseconds the transaction busied the cache. It differs in interpretation between TCP and UDP:
- For HTTP this is basically the time from having received the request to when Squid finishes sending the last byte of the response.
- For ICP, this is the time between scheduling a reply and actually sending it.
Please note that the entries are logged after the reply finished being sent, not during the lifetime of the transaction.
client address The IP address of the requesting instance, the client IP address. The client_netmask configuration option can distort the clients for data protection reasons, but it makes analysis more difficult. Often it is better to use one of the log file anonymizers. Also, the log_fqdn configuration option may log the fully qualified domain name of the client instead of the dotted quad. The use of that option is discouraged due to its performance impact.
result codes This column is made up of two entries separated by a slash. This column encodes the transaction result:
- The cache result of the request contains information on the kind of request, how it was satisfied, or in what way it failed. Please refer to Squid result codes for valid symbolic result codes. Several codes from older versions are no longer available, were renamed, or split. Especially the ERR_ codes do not seem to appear in the log file any more. Also refer to Squid result codes for details on the codes no longer available.
- The status part contains the HTTP result codes with some Squid specific extensions. Squid uses a subset of the RFC defined error codes for HTTP. Refer to section status codes for details of the status codes recognized.
bytes The size is the amount of data delivered to the client. Mind that this does not constitute the net object size, as headers are also counted. Also, failed requests may deliver an error page, the size of which is also logged here.
request method The request method to obtain an object. Please refer to section request-methods for available methods. If you turned off log_icp_queries in your configuration, you will not see (and thus will be unable to analyze) ICP exchanges. The PURGE method is only available if you have an ACL for "method purge" enabled in your configuration file.
URL This column contains the URL requested. Please note that the log file may contain whitespace for the URI. The default configuration for uri_whitespace denies or truncates whitespace, though.
rfc931 The eighth column may contain the ident lookups for the requesting client. Since ident lookups have a performance impact, the default configuration turns ident_lookups off. If turned off, or if no ident information is available, a "-" will be logged.
hierarchy code The hierarchy information consists of three items:
- Any hierarchy tag may be prefixed with TIMEOUT_, if the timeout occurs waiting for all ICP replies to return from the neighbours. The timeout is either dynamic, if the icp_query_timeout was not set, or the time configured there has elapsed.
- A code that explains how the request was handled, e.g. by forwarding it to a peer, or going straight to the source. Refer to Hierarchy Codes for details on hierarchy codes and removed hierarchy codes.
- The IP address or hostname where the request (if a miss) was forwarded. For requests sent to origin servers, this is the origin server's IP address. For requests sent to a neighbor cache, this is the neighbor's hostname. NOTE: older versions of Squid would put the origin server hostname here.
type The content type of the object as seen in the HTTP reply header. Please note that ICP exchanges usually don't have any content type, and thus are logged "-". Also, some weird replies have content types ":" or even empty ones.
There may be two more columns in the access.log if the (debug) option log_mime_headers is enabled. In this case, the HTTP request headers are logged between a "[" and a "]", and the HTTP reply headers are also logged between "[" and "]". All control characters like CR and LF are URL-escaped, but spaces are not escaped! Parsers should watch out for this.
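The column layout above can be turned into a small parser. The following is only a sketch in Python, not part of Squid: the names AccessEntry and parse_access_line are made up for illustration, and the sample log line is invented. It assumes the default ten columns, no whitespace inside the URL, and log_mime_headers turned off.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class AccessEntry(NamedTuple):
    """One native-format access.log entry (hypothetical helper type)."""
    time: datetime      # column 1, converted from the Unix timestamp (UTC)
    duration_ms: int    # column 2, elapsed time in milliseconds
    client: str         # column 3, client address
    result_code: str    # column 4 before the slash, e.g. TCP_MISS
    status: int         # column 4 after the slash, e.g. 200
    size: int           # column 5, bytes delivered to the client
    method: str         # column 6, request method
    url: str            # column 7, requested URL
    ident: str          # column 8, rfc931 ident or "-"
    hier_code: str      # column 9 before the slash, e.g. DIRECT
    hier_host: str      # column 9 after the slash
    content_type: str   # column 10, content type or "-"


def parse_access_line(line: str) -> AccessEntry:
    # Columns are separated by one or more spaces, so a plain split() works
    # as long as the URL itself contains no whitespace (an assumption here).
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(fields)}")
    result_code, _, status = fields[3].partition("/")
    hier_code, _, hier_host = fields[8].partition("/")
    return AccessEntry(
        time=datetime.fromtimestamp(float(fields[0]), tz=timezone.utc),
        duration_ms=int(fields[1]),
        client=fields[2],
        result_code=result_code,
        status=int(status),
        size=int(fields[4]),
        method=fields[5],
        url=fields[6],
        ident=fields[7],
        hier_code=hier_code,
        hier_host=hier_host,
        content_type=fields[9],
    )


if __name__ == "__main__":
    # Invented sample line, for illustration only.
    sample = ("1066036250.603 180 192.168.0.5 TCP_MISS/200 4512 GET "
              "http://www.example.com/ - DIRECT/192.0.2.10 text/html")
    print(parse_access_line(sample))
```

If uri_whitespace is configured to allow whitespace in logged URLs, or log_mime_headers is enabled, a simple split like this is no longer sufficient.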
Squid result codes
The TCP_ codes refer to requests on the HTTP port (usually 3128). The UDP_ codes refer to requests on the ICP port (usually 3130). If ICP logging was disabled using the log_icp_queries option, no ICP replies will be logged.
The following result codes were taken from a Squid-2, compare with the log_type enum in src/enums.h:
TCP_HIT A valid copy of the requested object was in the cache.
TCP_MISS The requested object was not in the cache.
TCP_REFRESH_HIT The requested object was cached but STALE. The IMS query for the object resulted in "304 not modified".
TCP_REFRESH_FAIL_HIT The requested object was cached but STALE. The IMS query failed and the stale object was delivered.
TCP_REFRESH_MISS The requested object was cached but STALE. The IMS query returned the new content.
TCP_CLIENT_REFRESH_MISS The client issued a "no-cache" pragma, or some analogous cache control command along with the request. Thus, the cache has to refetch the object.
TCP_IMS_HIT The client issued an IMS request for an object which was in the cache and fresh.
TCP_SWAPFAIL_MISS The object was believed to be in the cache, but could not be accessed.
TCP_NEGATIVE_HIT Request for a negatively cached object, e.g. "404 not found", for which the cache believes to know that it is inaccessible. Also refer to the explanations for negative_ttl in your squid.conf file.
TCP_MEM_HIT A valid copy of the requested object was in the cache and it was in memory, thus avoiding disk accesses.
TCP_DENIED Access was denied for this request.
TCP_OFFLINE_HIT The requested object was retrieved from the cache during offline mode. The offline mode never validates any object, see offline_mode in squid.conf file.
TCP_STALE_HIT The object was cached and served stale. This is usually caused by stale-while-revalidate or stale-if-error.
TCP_ASYNC_HIT A background request (e.g., one started by stale-while-revalidate) resulted in a refresh hit.
TCP_ASYNC_MISS A background request (e.g., one started by stale-while-revalidate) resulted in a miss; i.e., the cached object (if any) was updated.
UDP_HIT A valid copy of the requested object was in the cache.
UDP_MISS The requested object is not in this cache.
UDP_DENIED Access was denied for this request.
UDP_INVALID An invalid request was received.
UDP_MISS_NOFETCH During "-Y" startup, or during frequent failures, a cache in hit only mode will return either UDP_HIT or this code. Neighbours will thus only fetch hits.
NONE Seen with errors and cachemgr requests.
The following codes are no longer available in Squid-2:
ERR_* Errors are now contained in the status code.
TCP_CLIENT_REFRESH See: TCP_CLIENT_REFRESH_MISS.
TCP_SWAPFAIL See: TCP_SWAPFAIL_MISS.
TCP_IMS_MISS Deleted, now replaced with TCP_IMS_HIT.
UDP_HIT_OBJ Refers to an old version that would send cache hits in ICP replies. No longer implemented.
UDP_RELOADING See: UDP_MISS_NOFETCH.
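As an illustration of how these codes are typically used in analysis, here is a minimal Python sketch that computes a hit ratio for HTTP requests. The function name tcp_hit_ratio is hypothetical, and the rule "every TCP_ code ending in _HIT counts as a hit" is an assumption of this sketch; you may prefer to count codes such as TCP_REFRESH_HIT differently.

```python
from collections import Counter


def tcp_hit_ratio(result_codes):
    """Return the fraction of TCP_ result codes that are some kind of hit.

    UDP_ codes (ICP traffic) and NONE are ignored. Treating every code that
    ends in _HIT as a hit is a choice made for this sketch.
    """
    counts = Counter(code for code in result_codes if code.startswith("TCP_"))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for code, n in counts.items() if code.endswith("_HIT"))
    return hits / total


if __name__ == "__main__":
    codes = ["TCP_HIT", "TCP_MISS", "TCP_MEM_HIT", "UDP_HIT", "TCP_DENIED"]
    print(tcp_hit_ratio(codes))
```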
HTTP status codes
These are taken from RFC 2616 and verified for Squid. Squid-2 uses almost all codes except 307 (Temporary Redirect), 416 (Request Range Not Satisfiable), and 417 (Expectation Failed). Extra codes include 0 for a result code being unavailable, and 600 to signal an invalid header, a proxy error. Also, some definitions were added as for RFC 2518 (WebDAV). Yes, there are really two entries for status code 424, compare with http_status in src/enums.h:
000 Used mostly with UDP traffic.
100 Continue
101 Switching Protocols
*102 Processing
200 OK
201 Created
202 Accepted
203 Non-Authoritative Information
204 No Content
205 Reset Content
206 Partial Content
*207 Multi Status
300 Multiple Choices
301 Moved Permanently
302 Moved Temporarily
303 See Other
304 Not Modified
305 Use Proxy
[307 Temporary Redirect]
400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Timeout
409 Conflict
410 Gone
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request URI Too Large
415 Unsupported Media Type
[416 Request Range Not Satisfiable]
[417 Expectation Failed]
*424 Locked
*424 Failed Dependency
*433 Unprocessable Entity
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported
*507 Insufficient Storage
600 Squid header parsing error
Request methods
Squid recognizes several request methods as defined in RFC 2616. Newer versions of Squid (2.2.STABLE5 and above) also recognize RFC 2518 "HTTP Extensions for Distributed Authoring -- WEBDAV" extensions.
method     defined     cachabil.   meaning
---------  ----------  ----------  -------------------------------------------
GET        HTTP/0.9    possibly    object retrieval and simple searches.
HEAD       HTTP/1.0    possibly    metadata retrieval.
POST       HTTP/1.0    CC or Exp.  submit data (to a program).
PUT        HTTP/1.1    never       upload data (e.g. to a file).
DELETE     HTTP/1.1    never       remove resource (e.g. file).
TRACE      HTTP/1.1    never       appl. layer trace of request route.
OPTIONS    HTTP/1.1    never       request available comm. options.
CONNECT    HTTP/1.1r3  never       tunnel SSL connection.
ICP_QUERY  Squid       never       used for ICP based exchanges.
PURGE      Squid       never       remove object from cache.
PROPFIND   rfc2518     ?           retrieve properties of an object.
PROPPATCH  rfc2518     ?           change properties of an object.
MKCOL      rfc2518     never       create a new collection.
COPY       rfc2518     never       create a duplicate of src in dst.
MOVE       rfc2518     never       atomically move src to dst.
LOCK       rfc2518     never       lock an object against modifications.
UNLOCK     rfc2518     never       unlock an object.
Hierarchy Codes
The following hierarchy codes are used with Squid-2:
NONE For TCP HIT, TCP failures, cachemgr requests and all UDP requests, there is no hierarchy information.
DIRECT The object was fetched from the origin server.
SIBLING_HIT The object was fetched from a sibling cache which replied with UDP_HIT.
PARENT_HIT The object was requested from a parent cache which replied with UDP_HIT.
DEFAULT_PARENT No ICP queries were sent. This parent was chosen because it was marked "default" in the config file.
SINGLE_PARENT The object was requested from the only parent appropriate for the given URL.
FIRST_UP_PARENT The object was fetched from the first parent in the list of parents.
NO_PARENT_DIRECT The object was fetched from the origin server, because no parents existed for the given URL.
FIRST_PARENT_MISS The object was fetched from the parent with the fastest (possibly weighted) round trip time.
CLOSEST_PARENT_MISS This parent was chosen because it included the lowest RTT measurement to the origin server. See also the closest-only peer configuration option.
CLOSEST_PARENT The parent selection was based on our own RTT measurements.
CLOSEST_DIRECT Our own RTT measurements returned a shorter time than any parent.
NO_DIRECT_FAIL The object could not be requested because of a firewall configuration, see also never_direct and related material, and no parents were available.
SOURCE_FASTEST The origin site was chosen, because the source ping arrived fastest.
ROUNDROBIN_PARENT No ICP replies were received from any parent. The parent was chosen, because it was marked for round robin in the config file and had the lowest usage count.
CACHE_DIGEST_HIT The peer was chosen, because the cache digest predicted a hit. This option was later replaced in order to distinguish between parents and siblings.
CD_PARENT_HIT The parent was chosen, because the cache digest predicted a hit.
CD_SIBLING_HIT The sibling was chosen, because the cache digest predicted a hit.
NO_CACHE_DIGEST_DIRECT This output seems to be unused?
CARP The peer was selected by CARP.
ANY_PARENT part of src/peer_select.c:hier_strings[].
INVALID CODE part of src/peer_select.c:hier_strings[].
Almost any of these may be preceded by 'TIMEOUT_' if the two-second (default) timeout occurs waiting for all ICP replies to arrive from neighbors, see also the icp_query_timeout configuration option.
The following hierarchy codes were removed from Squid-2:
code                  meaning
--------------------  -------------------------------------------------
PARENT_UDP_HIT_OBJ    hit objects are no longer available.
SIBLING_UDP_HIT_OBJ   hit objects are no longer available.
SSL_PARENT_MISS       SSL can now be handled by squid.
FIREWALL_IP_DIRECT    No special logging for hosts inside the firewall.
LOCAL_IP_DIRECT       No special logging for local networks.
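When processing the hierarchy column, the optional TIMEOUT_ prefix has to be separated from the code itself. A minimal Python sketch follows; the function name split_hierarchy and the sample values are invented for illustration.

```python
def split_hierarchy(field):
    """Split a hierarchy column such as 'TIMEOUT_FIRST_UP_PARENT/host'.

    Returns (timed_out, code, host), where timed_out tells whether the
    TIMEOUT_ prefix was present.
    """
    code, _, host = field.partition("/")
    timed_out = code.startswith("TIMEOUT_")
    if timed_out:
        # Drop the prefix so the remaining code can be looked up as usual.
        code = code[len("TIMEOUT_"):]
    return timed_out, code, host


if __name__ == "__main__":
    print(split_hierarchy("TIMEOUT_FIRST_UP_PARENT/proxy.example.net"))
    print(split_hierarchy("DIRECT/192.0.2.10"))
```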
sending access.log to syslog
Squid 2.6 can send access.log contents to a local syslog server if you specify syslog as the file path, for example:
access_log syslog: squid
customizable access.log
Squid 2.6 and later versions feature a customizable access.log format.
swap.state
This file has a rather unfortunate history which has led to it often being called the swap log. It is in fact a journal of the cache index with a record of every cache object written to disk. It is read when Squid starts up to "reload" the cache quickly.
If you remove this file when squid is NOT running, you will effectively wipe out your cache index of contents. Squid can rebuild it from the original files, but that procedure can take a long time as every file in the cache must be fully scanned for meta data.
If you remove this file while squid IS running, you can easily recreate it. The safest way is to simply shut down the running process:
% squid -k shutdown
This will disrupt service, but at least you will have your swap log back. Alternatively, you can tell squid to rotate its log files. This also causes a clean swap log to be written.
% squid -k rotate
By default the swap.state file is stored in the top-level of each cache_dir. You can move the logs to a different location with the cache_swap_log option.
The file is a binary format that includes MD5 checksums, and StoreEntry fields. Please see the Programmers' Guide for information on the contents and format of that file.
Which log files can I delete safely?
You should never delete access.log, store.log, cache.log, or swap.state while Squid is running. With Unix, you can delete a file when a process has the file opened. However, the filesystem space is not reclaimed until the process closes the file.
If you accidentally delete swap.state while Squid is running, you can recover it by following the instructions in the previous questions. If you delete the others while Squid is running, you can not recover them.
The correct way to maintain your log files is with Squid's "rotate" feature. You should rotate your log files at least once per day. The current log files are closed and then renamed with numeric extensions (.0, .1, etc). If you want to, you can write your own scripts to archive or remove the old log files. If not, Squid will only keep up to logfile_rotate versions of each log file. The logfile rotation procedure also writes a clean swap.state file, but it does not leave numbered versions of the old files.
If you set logfile_rotate to 0, Squid simply closes and then re-opens the logs. This allows third-party logfile management systems, such as newsyslog, to maintain the log files.
To rotate Squid's logs, simply use this command:
squid -k rotate
For example, use this cron entry to rotate the logs at midnight:
0 0 * * * /usr/local/squid/bin/squid -k rotate
How can I disable Squid's log files?
To disable access.log:
cache_access_log none
To disable store.log:
cache_store_log none
To disable cache.log:
cache_log /dev/null
It is a bad idea to disable the cache.log because this file contains many important status and debugging messages. However, if you really want to, you can do so as shown above.
If /dev/null is specified for any of the above log files, logfile_rotate must also be set to 0; otherwise Squid risks rotating away /dev/null and turning it into a plain log file.
Instead of disabling the log files, it is advisable to use a smaller value for logfile_rotate and to rotate Squid's log files properly from cron. That way, your log files stay manageable and are maintained by your system.
What is the maximum size of access.log?
Squid does not impose a size limit on its log files. Some operating systems have a maximum file size limit, however. If a Squid log file exceeds the operating system's size limit, Squid receives a write error and shuts down. You should regularly rotate Squid's log files so that they do not become very large.
Logging is very important to Squid. In fact, it is so important that Squid will shut itself down if it can't write to its logfiles. This includes cases such as a full log disk, or logfiles getting too big.
My log files get very big!
You need to rotate your log files with a cron job. For example:
0 0 * * * /usr/local/squid/bin/squid -k rotate
I want to use another tool to maintain the log files.
If you set logfile_rotate to 0, Squid simply closes and then re-opens the logs. This allows third-party logfile management systems, such as newsyslog or logrotate, to maintain the log files.
Managing log files
The preferred log file for analysis is the access.log file in native format. For long term evaluations, the log file should be obtained at regular intervals. Squid offers an easy to use API for rotating log files, in order that they may be moved (or removed) without disturbing the cache operations in progress. The procedures were described above.
Depending on the disk space allocated for log file storage, it is recommended to set up a cron job which rotates the log files every 24, 12, or 8 hours. You will need to set your logfile_rotate to a sufficiently large number. During a time of some idleness, you can safely transfer the log files to your analysis host in one burst.
Before transport, the log files can be compressed during off-peak time. On the analysis host, the log files are concatenated into one file, so one file for 24 hours is the yield. Also note that with log_icp_queries enabled, you might have around 1 GB of uncompressed log information per day and busy cache. Look into your cache manager info page to make an educated guess on the size of your log files.
The EU project DESIRE developed some basic rules to obey when handling and processing log files:
- Respect the privacy of your clients when publishing results.
- Keep logs unavailable unless anonymized. Most countries have laws on privacy protection, and some even on how long you are legally allowed to keep certain kinds of information.
- Rotate and process log files at least once a day. Even if you don't process the log files, they will grow quite large, see My log files get very big above. If you rely on processing the log files, reserve a large enough partition solely for log files.
- Keep the size in mind when processing. It might take longer to process log files than to generate them!
- Limit yourself to the numbers you are interested in. There is data beyond your dreams available in your log file, some quite obvious, others by combination of different views. Here are some examples for figures to watch:
- The hosts using your cache.
- The elapsed time for HTTP requests - this is the latency the user sees. Usually, you will want to make a distinction for HITs and MISSes and overall times. Also, medians are preferred over averages.
- The requests handled per interval (e.g. second, minute or hour).
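The last two figures can be computed directly from the native access.log. The following Python sketch is only an illustration under a few assumptions: the function name summarize is hypothetical, the log is in native format with no whitespace in URLs, any TCP_ code ending in HIT is counted as a hit, and the sample lines are invented.

```python
from collections import Counter
from statistics import median


def summarize(lines, interval=60):
    """Return median elapsed times for hits and misses, plus request counts
    per time bucket of `interval` seconds (HTTP requests only)."""
    hit_times, miss_times = [], []
    per_interval = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip truncated or malformed lines
        code = fields[3].split("/")[0]
        if not code.startswith("TCP_"):
            continue  # ignore ICP (UDP_) and NONE entries
        elapsed_ms = int(fields[1])
        if code.endswith("HIT"):
            hit_times.append(elapsed_ms)
        else:
            miss_times.append(elapsed_ms)
        # Round the timestamp down to the start of its interval.
        bucket = int(float(fields[0])) // interval * interval
        per_interval[bucket] += 1
    return {
        "median_hit_ms": median(hit_times) if hit_times else None,
        "median_miss_ms": median(miss_times) if miss_times else None,
        "requests_per_interval": dict(per_interval),
    }


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        "1000000000.100 10 10.0.0.1 TCP_HIT/200 100 GET http://a.example/ - NONE/- text/html",
        "1000000010.100 30 10.0.0.1 TCP_MEM_HIT/200 100 GET http://a.example/ - NONE/- text/html",
        "1000000070.100 200 10.0.0.2 TCP_MISS/200 500 GET http://b.example/ - DIRECT/192.0.2.1 text/html",
    ]
    print(summarize(sample))
```

The median is used rather than the mean because a few very slow transfers would otherwise dominate the result.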
Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?
This message means that the requested object was in "Delete Behind" mode and the user aborted the transfer. An object will go into "Delete Behind" mode if:
- It is larger than maximum_object_size
- It is being fetched from a neighbor which has the proxy-only option set.
What does ERR_LIFETIME_EXP mean?
This means that a timeout occurred while the object was being transferred. Most likely the retrieval of this object was very slow (or it stalled before finishing) and the user aborted the request. However, depending on your settings for quick_abort, Squid may have continued to try retrieving the object. Squid imposes a maximum amount of time on all open sockets, so after some amount of time the stalled request was aborted and logged with an ERR_LIFETIME_EXP message.
Retrieving "lost" files from the cache
"I've been asked to retrieve an object which was accidentally destroyed at the source for recovery. So, how do I figure out where the things are so I can copy them out and strip off the headers?"
The following method applies only to the Squid-1.1 versions:
Use grep to find the named object (URL) in the cache.log file. The first field in this file is an integer file number.
Then, find the file fileno-to-pathname.pl from the "scripts" directory of the Squid source distribution. The usage is
perl fileno-to-pathname.pl [-c squid.conf]
file numbers are read on stdin, and pathnames are printed on stdout.
Can I use store.log to figure out if a response was cachable?
Sort of. You can use store.log to find out if a particular response was cached.
Cached responses are logged with the SWAPOUT tag. Uncached responses are logged with the RELEASE tag.
However, your analysis must also consider that when a cached response is removed from the cache (for example due to cache replacement) it is also logged in store.log with the RELEASE tag. To differentiate these two, you can look at the filenumber (3rd) field. When an uncachable response is released, the filenumber is FFFFFFFF (-1). Any other filenumber indicates a cached response was released.
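The rule above can be sketched in a few lines of Python. This is an illustration only: the function name classify_store_entry and the labels it returns are made up, the sample lines are shortened and invented, and the sketch assumes the file number is the third whitespace-separated field as described above.

```python
def classify_store_entry(line):
    """Classify one store.log line by its tag and file number.

    Returns 'cached' for SWAPOUT, 'uncachable' for a RELEASE with file
    number FFFFFFFF, 'removed-from-cache' for any other RELEASE, and
    'other' for every other tag.
    """
    fields = line.split()
    if len(fields) < 3:
        return "other"
    tag, fileno = fields[1], fields[2]
    if tag == "SWAPOUT":
        return "cached"
    if tag == "RELEASE":
        # FFFFFFFF (-1) means the response never reached the disk cache.
        if fileno.upper() == "FFFFFFFF":
            return "uncachable"
        return "removed-from-cache"
    return "other"


if __name__ == "__main__":
    # Shortened, invented sample lines.
    print(classify_store_entry("1000000000.000 SWAPOUT 00000A1B 200"))
    print(classify_store_entry("1000000001.000 RELEASE FFFFFFFF 200"))
    print(classify_store_entry("1000000002.000 RELEASE 00000A1B 200"))
```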
Can I pump the squid access.log directly into a pipe?
Several people have asked for this, usually to feed the log into some kind of external database, or to analyze them in real-time.
The answer is No. Well, yes, sorta. Using a pipe directly opens up a whole load of possible problems.
Logging is very important to Squid. In fact, it is so important that Squid will shut itself down if it can't write to its logfiles.
There are several alternatives available which are much safer to set up and use. The basic capabilities present are:
- logging to system syslog
since Squid-2.7:
- logging to an external service via UDP packets
- logging through IPC to a custom local daemon.
See LogDaemon feature for technical details on setting up a daemon.
Contents
- How do I see system level Squid statistics?
- Managing the Cache Storage
- How can I make Squid NOT cache some servers or URLs?
- How can I purge an object from my cache?
- How can I purge multiple objects from my cache?
- How can I find the biggest objects in my cache?
- How can I add a cache directory?
- How can I delete a cache directory?
- I want to restart Squid with a clean cache
- I want to restart Squid with an empty cache
- Using ICMP to Measure the Network
- Why are so few requests logged as TCP_IMS_MISS?
- Why can't I run Squid as root?
- Can you tell me a good way to upgrade Squid with minimal downtime?
- Can Squid listen on more than one HTTP port?
- Can I make origin servers see the client's IP address when going through Squid?
How do I see system level Squid statistics?
The Squid distribution includes a CGI utility called cachemgr.cgi which can be used to view squid statistics with a web browser. See ../CacheManager for more information on its usage and installation.
Managing the Cache Storage
How can I make Squid NOT cache some servers or URLs?
From Squid-2.6, you use the cache option to specify uncachable requests and any exceptions to your cachable rules.
For example, this makes all responses from origin servers in the 10.0.1.0/24 network uncachable:
acl localnet dst 10.0.1.0/24
cache deny localnet
This example makes all URLs ending in '.html' uncachable:
acl HTML url_regex \.html$
cache deny HTML
This example makes a specific URL uncachable:
acl XYZZY url_regex ^http://www\.i\.suck\.com/foo\.html$
cache deny XYZZY
This example caches nothing between 8AM and 11AM:
acl Morning time 08:00-11:00
cache deny Morning
How can I purge an object from my cache?
Squid does not allow you to purge objects unless it is configured with access controls in squid.conf. First you must add something like
acl PURGE method PURGE
acl localhost src 127.0.0.1
http_access allow PURGE localhost
http_access deny PURGE
The above only allows purge requests which come from the local host and denies all other purge requests.
To purge an object, you can use the squidclient program:
squidclient -m PURGE http://www.miscreant.com/
If the purge was successful, you will see a "200 OK" response:
HTTP/1.0 200 OK
Date: Thu, 17 Jul 1997 16:03:32 GMT
...
If the object was not found in the cache, you will see a "404 Not Found" response:
HTTP/1.0 404 Not Found
Date: Thu, 17 Jul 1997 16:03:22 GMT
...
Such a 404 is not a failure. It simply means the object has already been purged by other means or never existed, so the result you wanted (the object no longer exists in the cache) has been achieved.
How can I purge multiple objects from my cache?
It's not possible; you have to purge the objects one by one by URL. This is because squid doesn't keep in memory the URL of every object it stores, but only a compact representation of it (a hash). Finding the hash given the URL is easy, the other way around is not possible.
Purging by wildcard, by domain, by time period, etc. are unfortunately not possible at this time.
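To see why only exact URLs can be purged, consider this toy index in Python. It is not Squid's implementation: the class TinyIndex, the function cache_key and the way the key is built from method and URL are assumptions made for the example. The only point it shares with Squid is that objects are found through an MD5 checksum rather than through the URL text.

```python
import hashlib


def cache_key(method, url):
    """Return a 16-byte MD5 key for a request (illustrative key layout)."""
    return hashlib.md5(f"{method} {url}".encode("utf-8")).digest()


class TinyIndex:
    """Toy in-memory index that stores only keys, never the URLs."""

    def __init__(self):
        self._objects = {}

    def store(self, method, url, body):
        self._objects[cache_key(method, url)] = body

    def purge(self, method, url):
        # An exact URL can be hashed again and looked up directly.
        # Returns True if something was removed, False if nothing was found.
        return self._objects.pop(cache_key(method, url), None) is not None


if __name__ == "__main__":
    index = TinyIndex()
    index.store("GET", "http://www.example.com/a.html", b"...")
    print(index.purge("GET", "http://www.example.com/a.html"))  # found and removed
    print(index.purge("GET", "http://www.example.com/a.html"))  # already gone
```

Because the index holds only the checksums, there is nothing to match a pattern such as "everything under www.example.com" against.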
How can I find the biggest objects in my cache?
sort -r -n -k 5,5 access.log | awk '{print $5, $7}' | head -25
If your cache processes several hundred hits per second, good luck.
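If you would rather not sort the whole log, the same listing can be produced in one pass. Here is a rough Python equivalent; the function name biggest_objects is hypothetical and the sketch assumes the native log format, where column 5 is the size and column 7 the URL.

```python
import heapq


def biggest_objects(lines, count=25):
    """Return the `count` largest (size, url) pairs found in access.log lines."""
    sized = []
    for line in lines:
        fields = line.split()
        # Need at least seven columns and a numeric size in column 5.
        if len(fields) < 7 or not fields[4].isdigit():
            continue
        sized.append((int(fields[4]), fields[6]))
    # nlargest returns the pairs in descending order of size.
    return heapq.nlargest(count, sized)


if __name__ == "__main__":
    import sys
    # Usage (assumed script name): python biggest.py < access.log
    for size, url in biggest_objects(sys.stdin):
        print(size, url)
```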
How can I add a cache directory?
- Edit squid.conf and add a new cache_dir line.
- Shut down Squid:
squid -k shutdown
- Initialize the new directory by running:
squid -z
- Start Squid again.
How can I delete a cache directory?
If you don't have any cache_dir lines in your squid.conf, then Squid was using the default. From Squid-3.1 the default has been changed to memory-only cache and does not involve cache_dir. For Squid older than 3.1 using the default you'll need to add a new cache_dir line because Squid will continue to use the default otherwise. You can add a small, temporary directory, for example:
/usr/local/squid/cachetmp ....
See above about creating a new cache directory.
Do not use /tmp! That will cause Squid to periodically encounter fatal errors.
The removal:
- Edit your squid.conf file and comment out, or delete, the cache_dir line for the cache directory that you want to remove.
- You can not delete a cache directory from a running Squid process; you can not simply reconfigure squid.
- You must shut down Squid:
squid -k shutdown
- Once Squid exits, you may immediately start it up again.
Since you deleted the old cache_dir from squid.conf, Squid won't try to access that directory. If you use the RunCache script, Squid should start up again automatically.
Now Squid is no longer using the cache directory that you removed from the config file. You can verify this by checking "Store Directory" information with the cache manager. From the command line, type:
squidclient mgr:storedir
I want to restart Squid with a clean cache
Squid-2.6 and later contain mechanisms which will automatically detect dirty information in both the cache directories and swap.state file. When squid starts up it runs these validation and security checks. The objects which fail for any reason are automatically purged from the cache.
The above mechanisms can be triggered manually to force squid into a full cache_dir scan and re-load all objects from disk by simply shutting down Squid and deleting the swap.state journal from each cache_dir before restarting.
NP: Deleting the swap.state before shutting down will cause Squid to generate new ones and fail to do the re-scan you wanted.
I want to restart Squid with an empty cache
To erase the entire contents of the cache and make Squid start fresh the following commands provide the fastest recovery time:
squid -k shutdown
mv /dir/cache /dir/cache.old
Repeat for each cache_dir location you wish to empty.
squid -z
squid
rm -rf /dir/cache.old
The rm command may take some time, but since Squid is already back up and running the service downtime is reduced.
Using ICMP to Measure the Network
As of version 1.1.9, Squid is able to utilize ICMP Round-Trip-Time (RTT) measurements to select the optimal location to forward a cache miss. Previously, cache misses would be forwarded to the parent cache which returned the first ICP reply message. These were logged with FIRST_PARENT_MISS in the access.log file. Now we can select the parent which is closest (RTT-wise) to the origin server.
Supporting ICMP in your Squid cache
It is more important that your parent caches enable the ICMP features. If you are acting as a parent, then you may want to enable ICMP on your cache. Also, if your cache makes RTT measurements, it will fetch objects directly if your cache is closer than any of the parents.
If you want your Squid cache to measure RTT's to origin servers, Squid must be compiled with the USE_ICMP option. This is easily accomplished by uncommenting "-DUSE_ICMP=1" in src/Makefile and/or src/Makefile.in.
An external program called pinger is responsible for sending and receiving ICMP packets. It must run with root privileges. After Squid has been compiled, the pinger program must be installed separately. A special Makefile target will install pinger with appropriate permissions.
% make install
% su
# make install-pinger
There are three configuration file options for tuning the measurement database on your cache. netdb_low and netdb_high specify low and high water marks for keeping the database to a certain size (e.g. just like with the IP cache). The netdb_ttl option specifies the minimum rate for pinging a site. If netdb_ttl is set to 300 seconds (5 minutes) then an ICMP packet will not be sent to the same site more than once every five minutes. Note that a site is only pinged when an HTTP request for the site is received.
Another option, minimum_direct_hops can be used to try finding servers which are close to your cache. If the measured hop count to the origin server is less than or equal to minimum_direct_hops, the request will be forwarded directly to the origin server.
Utilizing your parents database
Your parent caches can be asked to include the RTT measurements in their ICP replies. To do this, you must enable query_icmp in your config file:
query_icmp on
This causes a flag to be set in your outgoing ICP queries.
If your parent caches return ICMP RTT measurements then the ninth column (the hierarchy code) of your access.log will have entries similar to:
CLOSEST_PARENT_MISS/it.cache.nlanr.net
In this case, it means that it.cache.nlanr.net returned the lowest RTT to the origin server. If your cache measured a lower RTT than any of the parents, the request will be logged with
CLOSEST_DIRECT/www.sample.com
Inspecting the database
The measurement database can be viewed from the cachemgr by selecting "Network Probe Database." Hostnames are aggregated into /24 networks. All measurements made are averaged over time. Measurements are made to specific hosts, taken from the URLs of HTTP requests. The recv and sent fields are the number of ICMP packets received and sent. At this time they are only informational.
A typical database entry looks something like this:
Network          recv/sent     RTT  Hops Hostnames
192.41.10.0        20/  21    82.3   6.0 www.jisedu.org www.dozo.com
    bo.cache.nlanr.net        42.0   7.0
    uc.cache.nlanr.net        48.0  10.0
    pb.cache.nlanr.net        55.0  10.0
    it.cache.nlanr.net       185.0  13.0
This means we have sent 21 pings to both www.jisedu.org and www.dozo.com. The average RTT is 82.3 milliseconds. The next four lines show the measured values from our parent caches. Since bo.cache.nlanr.net has the lowest RTT, it would be selected as the location to forward a request for a www.jisedu.org or www.dozo.com URL.
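The selection described here amounts to picking the smallest RTT. The following Python sketch restates that rule using the numbers from the entry above. It is a simplification for illustration, not Squid's actual peer selection code, and the function name choose_route is made up.

```python
def choose_route(own_rtt, parent_rtts):
    """Pick where to forward a miss, based on RTT to the origin server.

    own_rtt     -- RTT measured by this cache, in milliseconds
    parent_rtts -- mapping of parent hostname to the RTT that parent reported

    Returns (hierarchy_code, parent_hostname_or_None).
    """
    if not parent_rtts:
        raise ValueError("no parent measurements available")
    best_parent = min(parent_rtts, key=parent_rtts.get)
    if own_rtt < parent_rtts[best_parent]:
        # Our own measurement beats every parent: go direct.
        return ("CLOSEST_DIRECT", None)
    return ("CLOSEST_PARENT_MISS", best_parent)


if __name__ == "__main__":
    parents = {
        "bo.cache.nlanr.net": 42.0,
        "uc.cache.nlanr.net": 48.0,
        "pb.cache.nlanr.net": 55.0,
        "it.cache.nlanr.net": 185.0,
    }
    print(choose_route(82.3, parents))
```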
Why are so few requests logged as TCP_IMS_MISS?
When Squid receives an If-Modified-Since request, it will not forward the request unless the object needs to be refreshed according to the refresh_pattern rules. If the request does need to be refreshed, then it will be logged as TCP_REFRESH_HIT or TCP_REFRESH_MISS.
If the request is not forwarded, Squid replies to the IMS request according to the object in its cache. If the modification times are the same, then Squid returns TCP_IMS_HIT. If the modification times are different, then Squid returns TCP_IMS_MISS. In most cases, the cached object will not have changed, so the result is TCP_IMS_HIT. Squid will only return TCP_IMS_MISS if some other client causes a newer version of the object to be pulled into the cache.
Why can't I run Squid as root?
by Dave J Woolley
If someone were to discover a buffer overrun bug in Squid and it runs as a user other than root, they can only corrupt the files writeable to that user, but if it runs as root, they can take over the whole machine. This applies to all programs that don't absolutely need root status, not just squid.
Can you tell me a good way to upgrade Squid with minimal downtime?
Here is a technique that was described by Radu Greab.
Start a second Squid server on an unused HTTP port (say 4128). This instance of Squid probably doesn't need a large disk cache. When this second server has finished reloading the disk store, swap the http_port values in the two squid.conf files. Set the original Squid to use port 4128, and the second one to use 3128. Next, run "squid -k reconfigure" for both Squids. New requests will go to the second Squid, now on port 3128, and the first Squid will finish handling its current requests. After a few minutes, it should be safe to fully shut down the first Squid and upgrade it. Later you can simply repeat this process in reverse.
Can Squid listen on more than one HTTP port?
Note: The information here is current for version 2.3.
Yes, you can specify multiple http_port lines in your squid.conf file. Squid attempts to bind() to each port that you specify. Sometimes Squid may not be able to bind to a port, either because of permissions or because the port is already in use. If Squid can bind to at least one port, then it will continue running. If it cannot bind to any of the ports, then Squid stops.
With version 2.3 and later you can specify IP addresses and port numbers together (see the squid.conf comments).
Can I make origin servers see the client's IP address when going through Squid?
Normally you cannot. Most TCP/IP stacks do not allow applications to create sockets with the local endpoint assigned to a foreign IP address. However, some folks have some patches to Linux that allow exactly that.
In this situation, you must ensure that all HTTP packets destined for the client IP addresses are routed to the Squid box. If the packets take another path, the real clients will send TCP resets to the origin servers, thereby breaking the connections.
Contents
- Why does Squid use so much memory!?
- How can I tell how much memory my Squid process is using?
- My Squid process grows without bounds.
- I set cache_mem to XX, but the process grows beyond that!
- How do I analyze memory usage from the cache manager output?
- The "Total memory accounted" value is less than the size of my Squid process.
- xmalloc: Unable to allocate 4096 bytes!
- fork: (12) Cannot allocate memory
- What can I do to reduce Squid's memory usage?
- Using an alternate malloc library
- How much memory do I need in my Squid server?
- Why can't my Squid process grow beyond a certain size?
Why does Squid use so much memory!?
Squid uses a lot of memory for performance reasons. It takes much, much longer to read something from disk than it does to read directly from memory.
A small amount of metadata for each cached object is kept in memory. This is the StoreEntry data structure. For Squid-2 this is 56-bytes on "small" pointer architectures (Intel, Sparc, MIPS, etc) and 88-bytes on "large" pointer architectures (Alpha). In addition, there is a 16-byte cache key (MD5 checksum) associated with each StoreEntry. This means there are 72 or 104 bytes of metadata in memory for every object in your cache. A cache with 1,000,000 objects therefore requires 72MB of memory for metadata only. In practice it requires much more than that.
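The arithmetic above can be reproduced with a small Python sketch. The byte counts come from the paragraph above; the function and constant names are made up for illustration, and the result is a lower bound, since in practice Squid needs much more than this.

```python
# Per-object metadata sizes for Squid-2, as given in the text above.
STORE_ENTRY_BYTES = {"small": 56, "large": 88}  # by pointer size
CACHE_KEY_BYTES = 16                            # MD5 checksum


def metadata_bytes(object_count, pointer_arch="small"):
    """Minimum in-memory metadata for object_count cached objects."""
    per_object = STORE_ENTRY_BYTES[pointer_arch] + CACHE_KEY_BYTES
    return object_count * per_object


# One million objects on a "small" pointer architecture.
print(metadata_bytes(1_000_000) / 1_000_000, "MB")
```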
Other uses of memory by Squid include:
- Disk buffers for reading and writing
- Network I/O buffers
- IP Cache contents
- FQDN Cache contents
- Netdb ICMP measurement database
- Per-request state information, including full request and reply headers
- Miscellaneous statistics collection.
- "Hot objects" which are kept entirely in memory.
How can I tell how much memory my Squid process is using?
One way is to simply look at ps output on your system. For BSD-ish systems, you probably want to use the -u option and look at the VSZ and RSS fields:
wessels ~ 236% ps -axuhm
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
squid 9631 4.6 26.4 141204 137852 ?? S 10:13PM 78:22.80 squid -NCYs
For SYSV-ish, you probably want to use the -l option. When interpreting the ps output, be sure to check your ps manual page. It may not be obvious if the reported numbers are kbytes, or pages (usually 4 kb).
A nicer way to check the memory usage is with a program called top:
last pid: 20128; load averages: 0.06, 0.12, 0.11 14:10:58
46 processes: 1 running, 45 sleeping
CPU states: % user, % nice, % system, % interrupt, % idle
Mem: 187M Active, 1884K Inact, 45M Wired, 268M Cache, 8351K Buf, 1296K Free
Swap: 1024M Total, 256K Used, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
9631 squid 2 0 138M 135M select 78:45 3.93% 3.93% squid
Finally, you can ask the Squid process to report its own memory usage. This is available on the Cache Manager info page. Your output may vary depending upon your operating system and Squid version, but it looks similar to this:
Resource usage for squid:
Maximum Resident Size: 137892 KB
Memory usage for squid via mstats():
Total space in arena: 140144 KB
Total free: 8153 KB 6%
If your RSS (Resident Set Size) value is much lower than your process size, then your cache performance is most likely suffering due to paging. See also the Cache Manager chapter.
My Squid process grows without bounds.
You might just have your cache_mem parameter set too high. See What can I do to reduce Squid's memory usage? below.
When a process continually grows in size, without levelling off or slowing down, it often indicates a memory leak. A memory leak is when some chunk of memory is used, but not free'd when it is done being used.
Memory leaks are a real problem for programs (like Squid) which do all of their processing within a single process. Historically, Squid has had real memory leak problems. But as the software has matured, we believe almost all of Squid's memory leaks have been eliminated, and new ones are at least easy to identify.
Memory leaks may also be present in your system's libraries, such as libc.a or even libmalloc.a. If you experience the ever-growing process size phenomenon, we suggest you first try an alternate malloc library (see Using an alternate malloc library below).
I set cache_mem to XX, but the process grows beyond that!
The cache_mem parameter does NOT specify the maximum size of the process. It only specifies how much memory to use for caching "hot" (very popular) replies. Squid's actual memory usage depends very strongly on your cache size (disk space) and your incoming request load. Reducing cache_mem will usually also reduce the process size, but not necessarily, and there are other ways to reduce Squid's memory usage (see below).
See also How much memory do I need in my Squid server?.
How do I analyze memory usage from the cache manager output?
Note: This information is specific to Squid-1.1 versions
Look at your cachemgr.cgi Cache Information page. For example:
Memory usage for squid via mallinfo():
Total space in arena: 94687 KB
Ordinary blocks: 32019 KB 210034 blks
Small blocks: 44364 KB 569500 blks
Holding blocks: 0 KB 5695 blks
Free Small blocks: 6650 KB
Free Ordinary blocks: 11652 KB
Total in use: 76384 KB 81%
Total free: 18302 KB 19%
Meta Data:
StoreEntry 246043 x 64 bytes = 15377 KB
IPCacheEntry 971 x 88 bytes = 83 KB
Hash link 2 x 24 bytes = 0 KB
URL strings = 11422 KB
Pool MemObject structures 514 x 144 bytes = 72 KB ( 70 free)
Pool for Request structur 516 x 4380 bytes = 2207 KB ( 2121 free)
Pool for in-memory object 6200 x 4096 bytes = 24800 KB ( 22888 free)
Pool for disk I/O 242 x 8192 bytes = 1936 KB ( 1888 free)
Miscellaneous = 2600 KB
total Accounted = 58499 KB
First note that mallinfo() reports 94M in "arena." This is pretty close to what top says (97M).
Of that 94M, 81% (76M) is actually being used at the moment. The rest has been freed, or pre-allocated by malloc(3) and not yet used.
Of the 76M in use, we can account for 58.5M (76%). There are some calls to malloc(3) for which we can't account.
The Meta Data list gives the breakdown of where the accounted memory has gone. 45% has gone to StoreEntry and URL strings. Another 42% has gone to buffers that hold objects in VM while they are fetched and relayed to the clients (Pool for in-memory object).
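The percentages quoted in this walk-through can be recomputed from the sample output. The following Python sketch uses only numbers from the listing above; the variable and function names are chosen for illustration.

```python
# Values in KB, copied from the sample cachemgr.cgi output above.
arena_kb = 94687
in_use_kb = 76384
accounted_kb = 58499
store_entry_kb = 15377
url_strings_kb = 11422
in_memory_pool_kb = 24800


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


in_use_pct = share(in_use_kb, arena_kb)            # arena actually in use
accounted_pct = share(accounted_kb, in_use_kb)     # in-use memory we can account for
metadata_pct = share(store_entry_kb + url_strings_kb, accounted_kb)
pool_pct = share(in_memory_pool_kb, accounted_kb)  # Pool for in-memory object

for label, value in [("in use", in_use_pct), ("accounted", accounted_pct),
                     ("StoreEntry + URL strings", metadata_pct),
                     ("in-memory object pool", pool_pct)]:
    print(f"{label}: {value:.1f}%")
```

The results agree with the figures in the text once rounded or truncated to whole percentages.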
The pool sizes are specified by squid.conf parameters. In version 1.0, these pools are somewhat broken: we keep a stack of unused pages instead of freeing the block. In the Pool for in-memory object, the unused stack size is 1/2 of cache_mem. The Pool for disk I/O is hardcoded at 200. For MemObject and Request it's 1/8 of your system's FD_SETSIZE value.
If you need to lower your process size, we recommend lowering the max object sizes in the 'http', 'ftp' and 'gopher' config lines. You may also want to lower cache_mem to suit your needs. But if you make cache_mem too low, then some objects may not get saved to disk during high-load periods. Newer Squid versions allow you to set memory_pools off to disable the free memory pools.
The "Total memory accounted" value is less than the size of my Squid process.
We are not able to account for all memory that Squid uses. This would require excessive amounts of code to keep track of every last byte. We do our best to account for the major uses of memory.
Also, note that the malloc and free functions have their own overhead. Some additional memory is required to keep track of which chunks are in use, and which are free. Additionally, most operating systems do not allow processes to shrink in size. When a process gives up memory by calling free, the total process size does not shrink. So the process size really represents the maximum size your Squid process has reached.
xmalloc: Unable to allocate 4096 bytes!
Messages like "FATAL: xcalloc: Unable to allocate 4096 blocks of 1 bytes!" appear when Squid can't allocate more memory, and on most operating systems (including BSD) there are only two possible reasons:
- The machine is out of swap
- The process' maximum data segment size has been reached
The first case is detected using the normal swap monitoring tools available on the platform (pstat on SunOS, perhaps pstat is used on BSD as well).
To tell if it is the second case, first rule out the first case and then monitor the size of the Squid process. If it dies at a certain size with plenty of swap left, then the maximum data segment size has been reached without doubt.
The data segment size can be limited by two factors:
- Kernel imposed maximum, which no user can go above
- The size set with ulimit, which the user can control.
When Squid starts, it sets the data and file ulimits to the hard level. If you manually tune ulimit before starting Squid, make sure that you set the hard limit and not only the soft limit (the default operation of ulimit is to only change the soft limit). The soft limit can never exceed the hard limit, and only root is allowed to raise the hard limit.
This command prints the hard limits:
ulimit -aH
This command sets the data size to unlimited:
ulimit -HSd unlimited
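If you would rather inspect the limits from a script, the same soft and hard values can be read with Python's standard resource module (Unix only). This is a minimal sketch; the helper names are made up for illustration and it only reads the limits of the process that runs it, not those of a running Squid.

```python
import resource


def data_segment_limits():
    """Return the (soft, hard) data segment limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_DATA)


def describe(limit):
    """Render a limit value the way 'ulimit' would: bytes or 'unlimited'."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes"


soft, hard = data_segment_limits()
print("data segment soft limit:", describe(soft))
print("data segment hard limit:", describe(hard))
```

Run it from the same shell, and as the same user, that starts Squid, so that the limits shown are the ones Squid would inherit.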
BSD/OS
by Arjan de Vet
The default kernel limit on BSD/OS for datasize is 64MB (at least on 3.0 which I'm using).
Recompile a kernel with larger datasize settings:
maxusers 128
# Support for large inpcb hash tables, e.g. busy WEB servers.
options INET_SERVER
# support for large routing tables, e.g. gated with full Internet routing:
options "KMEMSIZE=\(16*1024*1024\)"
options "DFLDSIZ=\(128*1024*1024\)"
options "DFLSSIZ=\(8*1024*1024\)"
options "SOMAXCONN=128"
options "MAXDSIZ=\(256*1024*1024\)"
See /usr/share/doc/bsdi/config.n for more info.
In /etc/login.conf I have this:
default:\
:path=/bin /usr/bin /usr/contrib/bin:\
:datasize-cur=256M:\
:openfiles-cur=1024:\
:openfiles-max=1024:\
:maxproc-cur=1024:\
:stacksize-cur=64M:\
:radius-challenge-styles=activ,crypto,skey,snk,token:\
:tc=auth-bsdi-defaults:\
:tc=auth-ftp-bsdi-defaults:
#
# Settings used by /etc/rc and root
# This must be set properly for daemons started as root by inetd as well.
# Be sure reset these values back to system defaults in the default class!
#
daemon:\
:path=/bin /usr/bin /sbin /usr/sbin:\
:widepasswords:\
:tc=default:
# :datasize-cur=128M:\
# :openfiles-cur=256:\
# :maxproc-cur=256:\
This should give enough space for a 256MB squid process.
FreeBSD (2.2.X)
by Duane Wessels
The procedure is almost identical to that for BSD/OS above. Increase the open file descriptor limit in /sys/conf/param.c:
int maxfiles = 4096;
int maxfilesperproc = 1024;
Increase the maximum and default data segment size in your kernel config file, e.g. /sys/conf/i386/CONFIG:
options "MAXDSIZ=(512*1024*1024)"
options "DFLDSIZ=(128*1024*1024)"
We also found it necessary to increase the number of mbuf clusters:
options "NMBCLUSTERS=10240"
And, if you have more than 256 MB of physical memory, you probably have to disable BOUNCE_BUFFERS (whatever that is), so comment out this line:
#options BOUNCE_BUFFERS #include support for DMA bounce buffers
Also, update limits in /etc/login.conf:
# Settings used by /etc/rc
#
daemon:\
:coredumpsize=infinity:\
:datasize=infinity:\
:maxproc=256:\
:maxproc-cur@:\
:memoryuse-cur=64M:\
:memorylocked-cur=64M:\
:openfiles=4096:\
:openfiles-cur@:\
:stacksize=64M:\
:tc=default:
And don't forget to run "cap_mkdb /etc/login.conf" after editing that file.
OSF, Digital Unix
by Ong Beng Hui
To increase the data size for Digital UNIX, edit the file /etc/sysconfigtab and add the entry...
proc:
per-proc-data-size=1073741824
Or, with csh, use the limit command, such as
> limit datasize 1024M
Editing /etc/sysconfigtab requires a reboot, but the limit command doesn't.
fork: (12) Cannot allocate memory
When Squid is reconfigured (SIGHUP) or the logs are rotated (SIGUSR1), some of the helper processes (dnsserver) must be killed and restarted. If your system does not have enough virtual memory, the Squid process may not be able to fork to start the new helper processes. This is due to the UNIX way of starting child processes with the fork() system call, which temporarily duplicates the whole Squid process. When many child processes are started in rapid succession, such as on "squid -k rotate", memory usage can temporarily grow to many times the normal level because several temporary copies of the whole process exist at once.
The best way to fix this is to increase your virtual memory by adding swap space. Normally your system uses raw disk partitions for swap space, but most operating systems also support swapping on regular files (Digital Unix excepted). See your system manual pages for swap, swapon, and mkfile. Alternatively, you can use the sleep_after_fork directive to make Squid sleep a little while invoking helpers, to allow each helper to start up before Squid tries to start the next one. This can be helpful if you find that Squid sometimes fails to restart all helpers on "squid -k reconfigure".
What can I do to reduce Squid's memory usage?
If your cache performance is suffering because of memory limitations, you might consider buying more memory. But if that is not an option, there are a number of things to try:
- Try a different malloc library (see below)
- Reduce the cache_mem parameter in the config file. This controls how many "hot" objects are kept in memory. Reducing this parameter will not significantly affect performance, but you may receive some warnings in cache.log if your cache is busy.
- Turn the memory_pools off in the config file. This causes Squid to give up unused memory by calling free() instead of holding on to the chunk for potential, future use. Generally speaking, this is a bad idea as it will induce heap fragmentation. Use memory_pools_limit instead.
- Reduce the cache_swap parameter in your config file. This will reduce the number of objects Squid keeps. Your overall hit ratio may go down a little, but your cache will perform significantly better.
Using an alternate malloc library
Many users have found improved performance and memory utilization when linking Squid with an external malloc library. We recommend either GNU malloc, or dlmalloc.
GNU malloc
To make Squid use GNU malloc follow these simple steps:
- Download the GNU malloc source, available from one of The.
- Compile it
% gzip -dc malloc.tar.gz | tar xf -
% cd malloc
% vi Makefile # edit as needed
% make
- Copy libmalloc.a to your system's library directory and be sure to name it libgnumalloc.a.
% su
# cp malloc.a /usr/lib/libgnumalloc.a
- (Optional) Copy the GNU malloc.h to your system's include directory and be sure to name it gnumalloc.h. This step is not required, but if you do this, then Squid will be able to use the mstats() function to report memory usage statistics on the cachemgr info page.
# cp malloc.h /usr/include/gnumalloc.h
- Reconfigure and recompile Squid
% make distclean
% ./configure ...
% make
% make install
As Squid's configure script runs, watch its output. You should find that it locates libgnumalloc.a and optionally gnumalloc.h.
dlmalloc
dlmalloc was written by Doug Lea. According to Doug:
This is not the fastest, most space-conserving, most portable, or most tunable malloc ever written. However it is among the fastest while also being among the most space-conserving, portable and tunable.
dlmalloc is included with the Squid-2 source distribution. To use this library, you simply give an option to the configure script:
% ./configure --enable-dlmalloc ...
How much memory do I need in my Squid server?
As a rule of thumb, Squid uses approximately 10 MB of RAM per GB of the total of all cache_dirs (more on 64 bit servers such as Alpha), plus your cache_mem setting and about an additional 10-20MB. It is recommended to have at least twice this amount of physical RAM available on your Squid server. For a more detailed discussion on Squid's memory usage see the sections above.
The recommended extra RAM besides what is used by Squid is used by the operating system to improve disk I/O performance and by other applications or services running on the server. This will be true even of a server which runs Squid as the only tcp service, since there is a minimum level of memory needed for process management, logging, and other OS level routines.
If you have a low memory server, and a large disk, then you will not necessarily be able to use all the disk space, since as the cache fills the memory available will be insufficient, forcing Squid to swap out memory and affecting performance. A very large cache_dir total and insufficient physical RAM + Swap could cause Squid to stop functioning completely. The solution for larger caches is to get more physical RAM; allocating more to Squid via cache_mem will not help.
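The rule of thumb at the start of this answer can be turned into a quick estimate. The Python sketch below simply applies that formula; the function name and default values are illustrative, and the result is a rough guide, not a guarantee.

```python
def estimate_squid_ram_mb(cache_dir_total_gb, cache_mem_mb,
                          overhead_mb=20, mb_per_gb=10):
    """Apply the rule of thumb: ~10 MB per GB of cache_dir,
    plus cache_mem, plus roughly 10-20 MB of overhead.

    Returns (estimated Squid usage, recommended physical RAM), both in MB.
    """
    squid_mb = cache_dir_total_gb * mb_per_gb + cache_mem_mb + overhead_mb
    recommended_physical_mb = 2 * squid_mb  # "at least twice this amount"
    return squid_mb, recommended_physical_mb


# Example: 50 GB of cache_dir in total and cache_mem set to 64 MB.
squid_mb, physical_mb = estimate_squid_ram_mb(50, 64)
print(f"Squid: about {squid_mb} MB, server: at least {physical_mb} MB")
```

On 64 bit servers the per-GB figure is higher, so pass a larger mb_per_gb there.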
Why can't my Squid process grow beyond a certain size?
by Adrian Chadd
A number of people are running Squid with more than a gigabyte of memory. Here are some things to keep in mind.
- The Operating System may put a limit on how much memory is available per process. Check the resource limits (/etc/security/limits.conf or similar under PAM systems, 'ulimit', etc.)
- The Operating System may have a limit on the size of processes. 32-bit platforms are sometimes "split" to be 2gb process/2gb kernel; this can be changed to be 3gb process/1gb kernel through a kernel recompile or boot-time option. Check your operating system's documentation for specific details.
- Some malloc implementations may not support > 2gb of memory - eg dlmalloc. Don't use dlmalloc unless your platform is very broken (and then realise you won't be able to use >2gb RAM using it.)
- Make sure Squid has been compiled as a 64 bit binary (with modern Unix-like OSes you can use the 'file' command for this); some platforms may have a 64 bit kernel but a 32 bit userland, or the compiler may default to a 32 bit userland.
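For the last point, the 'file' command is the simplest check. If you want to do the same from a script, the following Python sketch reads the class byte from an ELF header. It assumes your platform uses ELF binaries (most modern Unix-like systems do); the function names are made up for illustration, and the path you pass in depends on your own installation.

```python
def elf_class(header):
    """Classify the first bytes of a file as a 32-bit or 64-bit ELF binary."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not ELF"
    # Byte 4 of an ELF header (EI_CLASS): 1 means 32-bit, 2 means 64-bit.
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")


def binary_class(path):
    """Read the start of the file at path and classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))
```

For example, binary_class("/usr/local/squid/sbin/squid") would report the class of a Squid binary installed at that (assumed) location.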
Contents
- What is the cache manager?
- How do you set it up?
- Cache manager configuration for CERN httpd 3.0
- Cache manager configuration for Apache 1.x
- Cache manager configuration for Apache 2.x
- Cache manager configuration for Roxen 2.0 and later
- Cache manager access from squidclient
- Cache manager ACLs in squid.conf
- Why does it say I need a password and a URL?
- I want to shutdown the cache remotely. What's the password?
- How do I make the cache host default to my cache?
- What's the difference between Squid TCP connections and Squid UDP connections?
- It says the storage expiration will happen in 1970!
- What do the Meta Data entries mean?
- In the utilization section, what is Other?
- In the utilization section, why is the Transfer KB/sec column always zero?
- In the utilization section, what is the Object Count?
- In the utilization section, what is the Max/Current/Min KB?
- What is the I/O section about?
- What is the Objects section for?
- What is the VM Objects section for?
- What does AVG RTT mean?
- In the IP cache section, what's the difference between a hit, a negative hit and a miss?
- What do the IP cache contents mean anyway?
- What is the fqdncache and how is it different from the ipcache?
- What does "Page faults with physical i/o: 4897" mean?
- What does the IGNORED field mean in the 'cache server list'?
Chapter contributed by Jonathan Larmour
What is the cache manager?
The cache manager (cachemgr.cgi) is a CGI utility for displaying statistics about the squid process as it runs. The cache manager is a convenient way to manage the cache and view statistics without logging into the server.
How do you set it up?
That depends on which web server you're using. Below you will find instructions for configuring the CERN and Apache servers to permit cachemgr.cgi usage.
EDITOR'S NOTE: readers are encouraged to submit instructions for configuration of cachemgr.cgi on other web server platforms, such as Netscape.
After you edit the server configuration files, you will probably need to either restart your web server or send it a SIGHUP signal to tell it to re-read its configuration files.
When you're done configuring your web server, you'll connect to the cache manager with a web browser, using a URL such as:
http://www.example.com/Squid/cgi-bin/cachemgr.cgi
Cache manager configuration for CERN httpd 3.0
First, you should ensure that only specified workstations can access the cache manager. That is done in your CERN httpd.conf, not in squid.conf.
Protection MGR-PROT {
Mask @(workstation.example.com)
}
Wildcards are acceptable, IP addresses are acceptable, and others can be added with a comma-separated list of IP addresses. There are many more ways of protection. Your server documentation has details.
You also need to add:
Protect /Squid/* MGR-PROT
Exec /Squid/cgi-bin/*.cgi /usr/local/squid/bin/*.cgi
This marks the script as executable to those in MGR-PROT.
Cache manager configuration for Apache 1.x
First, make sure the cgi-bin directory you're using is listed with a ScriptAlias in your Apache httpd.conf file like this:
ScriptAlias /Squid/cgi-bin/ /usr/local/squid/cgi-bin/
It's probably a bad idea to ScriptAlias the entire /usr/local/squid/bin/ directory where all the Squid executables live.
Next, you should ensure that only specified workstations can access the cache manager. That is done in your Apache httpd.conf, not in squid.conf. At the bottom of httpd.conf file, insert:
<Location /Squid/cgi-bin/cachemgr.cgi>
order allow,deny
allow from workstation.example.com
</Location>
You can have more than one allow line, and you can allow domains or networks.
Alternately, cachemgr.cgi can be password-protected. You'd add the following to httpd.conf:
<Location /Squid/cgi-bin/cachemgr.cgi>
AuthUserFile /path/to/password/file
AuthGroupFile /dev/null
AuthName User/Password Required
AuthType Basic
require user cachemanager
</Location>
Consult the Apache documentation for information on using htpasswd to set a password for this "user."
Cache manager configuration for Apache 2.x
First, make sure the cgi-bin directory you're using is listed with a ScriptAlias in your Apache config. In the Apache config there is a sub-directory /etc/apache2/conf.d for application specific settings (unrelated to any specific site). Create a file conf.d/squid containing this:
ScriptAlias /Squid/cgi-bin/cachemgr.cgi /usr/local/squid/cgi-bin/cachemgr.cgi
<Location /Squid/cgi-bin/cachemgr.cgi>
order allow,deny
allow from workstation.example.com
</Location>
SECURITY NOTE: It's possible but a bad idea to ScriptAlias the entire /usr/local/squid/bin/ directory where all the Squid executables live.
You should ensure that only specified workstations can access the cache manager. That is done in your Apache conf.d/squid <Location> settings, not in squid.conf.
You can have more than one allow line, and you can allow domains or networks.
Alternately, cachemgr.cgi can be password-protected. You'd add the following to conf.d/squid:
<Location /Squid/cgi-bin/cachemgr.cgi>
AuthUserFile /path/to/password/file
AuthGroupFile /dev/null
AuthName User/Password Required
AuthType Basic
require user cachemanager
</Location>
Consult the Apache 2.0 documentation for information on using htpasswd to set a password for this "user."
To further protect the cache manager on public systems, you should consider creating a whole new <VirtualHost> segment in the Apache configuration for the Squid manager. This is done by creating a file in the Apache configuration sub-directory .../apache2/sites-enabled/, usually named after the domain name of the new site. See the Apache 2.0 documentation for further details for your system.
Cache manager configuration for Roxen 2.0 and later
Notice: this is not how things would best be done with Roxen, but this is what you need to do to adhere to the example. Also, knowledge of basic Roxen configuration is required.
This is what's required to start up a fresh Virtual Server, only serving the cache manager. If you already have some Virtual Server you wish to use to host the Cache Manager, just add a new CGI support module to it.
Create a new virtual server, and set it to host http://www.example.com/. Add to it at least the following modules:
- Content Types
- CGI scripting support
In the CGI scripting support module, section Settings, change the following settings:
- CGI-bin path: set to /Squid/cgi-bin/
- Handle *.cgi: set to no
- Run user scripts as owner: set to no
- Search path: set to the directory containing the cachemgr.cgi file
In section Security, set Patterns to:
allow ip=1.2.3.4
where 1.2.3.4 is the IP address for workstation.example.com
Save the configuration, and you're done.
Cache manager access from squidclient
A simple way to test the access to the cache manager is:
% ./squidclient -p 8080 mgr:info@yourcachemanagerpassword
Note, 8080 and yourcachemanagerpassword come from your own squid.conf. See squidclient -h for more options.
Cache manager ACLs in squid.conf
The default cache manager access configuration in squid.conf is:
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
With the following rules:
http_access deny manager !localhost
http_access allow all
The first ACL is the most important as the cache manager program interrogates squid using a special cache_object protocol. Try it yourself by doing:
telnet mycache.example.com 3128
GET cache_object://mycache.example.com/info HTTP/1.0
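The same request can be sent from a script instead of telnet. The Python sketch below builds the request line shown above and, optionally, sends it over a plain TCP socket. The function names are hypothetical, the host and port are the placeholders from the telnet example, and whether the request is answered depends on the ACLs discussed in this section.

```python
import socket


def build_cachemgr_request(cache_host, page="info"):
    """Build the raw request that the telnet session above types by hand."""
    request = f"GET cache_object://{cache_host}/{page} HTTP/1.0\r\n\r\n"
    return request.encode("ascii")


def fetch_cachemgr_page(cache_host, port=3128, page="info", timeout=10.0):
    """Send the request to Squid and return the raw reply bytes."""
    with socket.create_connection((cache_host, port), timeout=timeout) as sock:
        sock.sendall(build_cachemgr_request(cache_host, page))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # Squid closes the connection after an HTTP/1.0 reply
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch_cachemgr_page("mycache.example.com") from a host that the ACLs permit should return the same text that cachemgr.cgi formats for the info page.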
The default ACLs say that if the request is for a cache_object, and it isn't the local host, then deny access; otherwise allow access.
In fact, only allowing localhost access means that on the initial cachemgr.cgi form you can only specify the cache host as localhost. We recommend the following:
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl example src 123.123.123.123/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
Where 123.123.123.123 is the IP address of your web server. Then modify the rules like this:
http_access allow manager localhost
http_access allow manager example
http_access deny manager
http_access allow all
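To see why the order of these rules matters, here is a simplified Python model of how the http_access lines are evaluated: rules are checked top to bottom, all ACLs on a line must match, and the first matching line decides. This is a teaching sketch only, not Squid's implementation; requests are plain dictionaries and the case where no line matches is deliberately not modelled.

```python
# ACL definitions mirroring the recommended squid.conf lines above.
ACLS = {
    "manager": lambda req: req["proto"] == "cache_object",
    "localhost": lambda req: req["src"] == "127.0.0.1",
    "example": lambda req: req["src"] == "123.123.123.123",
    "all": lambda req: True,
}

# http_access rules in the order they appear in squid.conf.
RULES = [
    ("allow", ["manager", "localhost"]),
    ("allow", ["manager", "example"]),
    ("deny", ["manager"]),
    ("allow", ["all"]),
]


def acl_matches(name, req):
    """Evaluate one ACL name; a leading '!' negates it."""
    if name.startswith("!"):
        return not ACLS[name[1:]](req)
    return ACLS[name](req)


def http_access(req, rules=RULES):
    """Return the action of the first rule whose ACLs all match."""
    for action, names in rules:
        if all(acl_matches(name, req) for name in names):
            return action
    return None  # no rule matched; not modelled in this sketch


print(http_access({"proto": "cache_object", "src": "123.123.123.123"}))
print(http_access({"proto": "cache_object", "src": "10.0.0.5"}))
```

A cache manager request from the web server's address is allowed by the second rule, while one from any other remote address falls through to "deny manager".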
If you're using miss_access, then don't forget to also add a miss_access rule for the cache manager:
miss_access allow manager
The default ACLs assume that your web server is on the same machine as Squid. Remember that the connection from the cache manager program to Squid originates at the web server, not the browser. So if your web server lives somewhere else, you should make sure that the IP address of the web server that has cachemgr.cgi installed on it is in the example ACL above.
Always be sure to send a SIGHUP signal to squid any time you change the squid.conf file, or to run squid -k reconfigure.
Why does it say I need a password and a URL?
If you "drop" the list box and browse it, you will see that the password is only required to shut down the cache, and the URL is required to refresh an object (i.e., retrieve it from its original source again). Otherwise these fields can be left blank: a password is not required to obtain access to the informational aspects of cachemgr.cgi.
I want to shutdown the cache remotely. What's the password?
See the cachemgr_passwd directive in squid.conf.
How do I make the cache host default to my cache?
When you run configure use the --enable-cachemgr-hostname option:
% ./configure --enable-cachemgr-hostname=`hostname` ...
Note, if you do this after you have already installed Squid, you need to make sure cachemgr.cgi gets recompiled. For example:
% cd src
% rm cachemgr.o cachemgr.cgi
% make cachemgr.cgi
Then copy cachemgr.cgi to your HTTP server's cgi-bin directory.
What's the difference between Squid TCP connections and Squid UDP connections?
Browsers and caches use TCP connections to retrieve web objects from web servers or caches. UDP connections are used when another cache using you as a sibling or parent wants to find out if you have an object in your cache that it's looking for. The UDP connections are ICP queries.
It says the storage expiration will happen in 1970!
Don't worry. The default (and sensible) behavior of squid is to expire an object when it happens to overwrite it. It doesn't explicitly garbage collect (unless you tell it to in other ways).
What do the Meta Data entries mean?
- StoreEntry
- Entry describing an object in the cache.
- IPCacheEntry
- An entry in the DNS cache.
- Hash link
- Link in the cache hash table structure.
- URL strings
- The strings of the URLs themselves that map to an object number in the cache, allowing access to the StoreEntry.
Basically just like the log file in your cache directory:
- Pool MemObject structures
- Info about objects currently in memory, (eg, in the process of being transferred).
- Pool for Request structures
- Information about each request as it happens.
- Pool for in-memory object
- Space for object data as it is retrieved.
If squid is much smaller than this field, run for cover! Something is very wrong, and you should probably restart squid.
In the utilization section, what is Other?
Other is a default category to track objects which don't fall into one of the defined categories.
In the utilization section, why is the Transfer KB/sec column always zero?
This column contains gross estimations of data transfer rates averaged over the entire time the cache has been running. These numbers are unreliable and mostly useless.
In the utilization section, what is the Object Count?
The number of objects of that type in the cache right now.
In the utilization section, what is the Max/Current/Min KB?
These are the largest, current, and smallest total sizes that all the objects of this type have reached.
What is the I/O section about?
These are histograms on the number of bytes read from the network per read(2) call. Somewhat useful for determining maximum buffer sizes.
What is the Objects section for?
This will download to your browser a list of every URL in the cache and statistics about it. It can be very, very large. Sometimes it will be larger than the amount of available memory in your client! You probably don't need this information anyway.
What is the VM Objects section for?
VM Objects are the objects which are in Virtual Memory. These are objects which are currently being retrieved and those which were kept in memory for fast access (accelerator mode).
What does AVG RTT mean?
Average Round Trip Time. This is the average time between sending an ICP ping and receiving a reply.
In the IP cache section, what's the difference between a hit, a negative hit and a miss?
A HIT means that the entry was found in the cache. A MISS means that it wasn't found in the cache. A negative hit means that an entry was found in the cache, but the entry records that the lookup failed (the name doesn't exist).
What do the IP cache contents mean anyway?
The hostname is the name that was requested to be resolved.
For the Flags column:
C means positively cached.
N means negatively cached.
P means the request is pending being dispatched.
D means the request has been dispatched and we're waiting for an answer.
L means it is a locked entry because it represents a parent or sibling.
The TTL column represents "Time To Live" (i.e., how long the cache entry is valid). (May be negative if the entry has expired.)
The N column is the number of IP addresses which the cache holds for that hostname.
The rest of the line lists all the IP addresses that have been associated with that IP cache entry.
What is the fqdncache and how is it different from the ipcache?
IPCache contains data for the Hostname to IP-Number mapping, and FQDNCache does it the other way round. For example:
IP Cache Contents:
Hostname Flags lstref TTL N [IP-Number]
gorn.cc.fh-lippe.de C 0 21581 1 193.16.112.73
lagrange.uni-paderborn.de C 6 21594 1 131.234.128.245
www.altavista.digital.com C 10 21299 4 204.123.2.75 ...
2/ftp.symantec.com DL 1583 -772855 0
Flags: C --> Cached
D --> Dispatched
N --> Negative Cached
L --> Locked
lstref: Time since last use
TTL: Time-To-Live until information expires
N: Count of addresses
FQDN Cache Contents:
IP-Number Flags TTL N Hostname
130.149.17.15 C -45570 1 andele.cs.tu-berlin.de
194.77.122.18 C -58133 1 komet.teuto.de
206.155.117.51 N -73747 0
Flags: C --> Cached
D --> Dispatched
N --> Negative Cached
L --> Locked
TTL: Time-To-Live until information expires
N: Count of names
What does "Page faults with physical i/o: 4897" mean?
This question was asked on the squid-users mailing list, to which there were three excellent replies.
by Jonathan Larmour
You get a "page fault" when your OS tries to access something in memory which is actually swapped to disk. The term "page fault" while correct at the kernel and CPU level, is a bit deceptive to a user, as there's no actual error - this is a normal feature of operation.
Also, this doesn't necessarily mean your squid is swapping by that much. Most operating systems also implement paging for executables, so that only sections of the executable which are actually used are read from disk into memory. Also, whenever squid needs more memory, the fact that the memory was allocated will show up in the page faults.
However, if the number of faults is unusually high, and getting bigger, this could mean that squid is swapping. Another way to verify this is using a program called "vmstat" which is found on most UNIX platforms. If you run this as "vmstat 5" this will update a display every 5 seconds. This can tell you if the system as a whole is swapping a lot (see your local man page for vmstat for more information).
It is very bad for squid to swap, as every single request will be blocked until the requested data is swapped in. It is better to tweak the cache_mem and/or memory_pools setting in squid.conf, or switch to the NOVM versions of squid, than allow this to happen.
by Peter Wemm
There are two different operations at work: paging and swapping. Paging is when individual pages are shuffled (either discarded or swapped to/from disk), while "swapping" generally means the entire process got sent to/from disk.
Needless to say, swapping a process is a pretty drastic event, and usually only reserved for when there's a memory crunch and paging out cannot free enough memory quickly enough. Also, there's some variation on how swapping is implemented in OS's. Some don't do it at all or do a hybrid of paging and swapping instead.
As you say, paging out doesn't necessarily involve disk IO, e.g. text (code) pages are read-only and can simply be discarded if they are not used (and reloaded if/when needed). Data pages are also discarded if unmodified, and paged out if there have been any changes. Allocated memory (malloc) is always saved to disk since there's no executable file to recover the data from. mmap() memory is variable. If it's backed by a file, it uses the same rules as the data segment of a file, i.e. either discarded if unmodified or paged out.
There is also "demand zeroing" of pages, which causes faults as well. If you malloc memory and it calls brk()/sbrk() to allocate new pages, the chances are that you are allocated demand-zero pages, i.e. the pages are not "really" attached to your process yet, but when you access them for the first time, the page fault causes the page to be connected to the process address space and zeroed. This saves unnecessary zeroing of pages that are allocated but never used.
The "page faults with physical IO" comes from the OS via getrusage(). It's highly OS dependent on what it means. Generally, it means that the process accessed a page that was not present in memory (for whatever reason) and there was disk access to fetch it. Many OS's load executables by demand paging as well, so the act of starting squid implicitly causes page faults with disk IO - however, many (but not all) OS's use "read ahead" and "prefault" heuristics to streamline the loading. Some OS's maintain "intent queues" so that pages can be selected as pageout candidates ahead of time. When (say) squid touches a freshly allocated demand zero page and one is needed, the OS can page out one of the candidates on the spot, causing a 'fault with physical IO' with demand zeroing of allocated memory which doesn't happen on many other OS's. (The other OS's generally put the process to sleep while the pageout daemon finds a page for it).
The meaning of "swapping" varies. On FreeBSD for example, swapping out is implemented as unlocking upages, kernel stack, PTD etc for aggressive pageout with the process. The only thing left of the process in memory is the 'struct proc'. The FreeBSD paging system is highly adaptive and can resort to paging in a way that is equivalent to the traditional swapping style operation (ie: entire process). FreeBSD also tries stealing pages from active processes in order to make space for disk cache. I suspect this is why setting 'memory_pools off' on the non-NOVM squids on FreeBSD is reported to work better - the VM/buffer system could be competing with squid to cache the same pages. It's a pity that squid cannot use mmap() to do file IO on the 4K chunks in its memory pool (I can see that this is not a simple thing to do though, but that won't stop me wishing. :-).
by John Line
The comments so far have been about what paging/swapping figures mean in a "traditional" context, but it's worth bearing in mind that on some systems (Sun's Solaris 2, at least), the virtual memory and filesystem handling are unified and what a user process sees as reading or writing a file, the system simply sees as paging something in from disk or a page being updated so it needs to be paged out. (I suppose you could view it as similar to the operating system memory-mapping the files behind-the-scenes.)
The effect of this is that on Solaris 2, paging figures will also include file I/O. Or rather, the figures from vmstat certainly appear to include file I/O, and I presume (but can't quickly test) that figures such as those quoted by Squid will also include file I/O.
To confirm the above (which represents an impression from what I've read and observed, rather than 100% certain facts...), using an otherwise idle Sun Ultra 1 system I just tried using cat (small, shouldn't need to page) to copy (a) one file to another, (b) a file to /dev/null, (c) /dev/zero to a file, and (d) /dev/zero to /dev/null (interrupting the last two with control-C after a while!), while watching with vmstat. 300-600 page-ins or page-outs per second when reading or writing a file (rather than a device), essentially zero in other cases (and when not cat-ing).
So ... beware assuming that all systems are similar and that paging figures represent *only* program code and data being shuffled to/from disk - they may also include the work in reading/writing all those files you were accessing...
Ok, so what is unusually high?
You'll probably want to compare the number of page faults to the number of HTTP requests. If this ratio is close to, or exceeding 1, then Squid is paging too much.
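If you want to automate that comparison, the rule of thumb can be sketched as below. This is only an illustration: the function names, the 1.0 threshold (one reading of "close to, or exceeding 1") and the sample request count are assumptions, and you would take the two counters from your own cache manager output.

```python
def paging_ratio(page_faults: int, http_requests: int) -> float:
    """Return page faults per HTTP request (0.0 if there are no requests yet)."""
    if http_requests <= 0:
        return 0.0
    return page_faults / http_requests


def is_paging_too_much(page_faults: int, http_requests: int,
                       threshold: float = 1.0) -> bool:
    """Apply the rule of thumb: a ratio at or above the threshold is a warning sign."""
    return paging_ratio(page_faults, http_requests) >= threshold


# 4897 is the figure quoted in the question; 250000 requests is a made-up value.
ratio = paging_ratio(4897, 250000)
too_much = is_paging_too_much(4897, 250000)
```

With these sample numbers the ratio is far below 1, so the page fault count by itself would not be a cause for concern.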
What does the IGNORED field mean in the 'cache server list'?
This refers to ICP replies which Squid ignored, for one of these reasons:
- The URL in the reply could not be found in the cache at all.
- The URL in the reply was already being fetched. Probably this ICP reply arrived too late.
- The URL in the reply did not have a MemObject associated with it. Either the request is already finished, or the user aborted before the ICP arrived.
- The reply came from a multicast-responder, but the cache_peer_access configuration does not allow us to forward this request to that neighbor.
- Source-Echo replies from known neighbors are ignored.
- ICP_OP_DENIED replies are ignored after the first 100.
Contents
- ACL elements
- Access Lists
- How do I allow my clients to use the cache?
- how do I configure Squid not to cache a specific server?
- How do I implement an ACL ban list?
- How do I block specific users or groups from accessing my cache?
- Do you have a CGI program which lets users change their own proxy passwords?
- Is there a way to do ident lookups only for a certain host and compare the result with a userlist in squid.conf?
- Common Mistakes
- I set up my access controls, but they don't work! why?
- Proxy-authentication and neighbor caches
- Is there an easy way of banning all Destination addresses except one?
- How can I block access to porn sites?
- Does anyone have a ban list of porn sites and such?
- Squid doesn't match my subdomains
- Why does Squid deny some port numbers?
- Does Squid support the use of a database such as mySQL for storing the ACL list?
- How can I allow a single address to access a specific URL?
- How can I allow some clients to use the cache at specific times?
- How can I allow some users to use the cache at specific times?
- Problems with IP ACL's that have complicated netmasks
- Can I set up ACL's based on MAC address rather than IP?
- Can I limit the number of connections from a client?
- I'm trying to deny ''foo.com'', but it's not working.
- I want to customize, or make my own error messages.
- I want to use local time zone in error messages.
- I want to put ACL parameters in an external file.
- I want to authorize users depending on their MS Windows group memberships
- Maximum length of an acl name
- Fast and Slow ACLs
Squid's access control scheme is relatively comprehensive and difficult for some people to understand. There are two different components: ACL elements, and access lists. An access list consists of an allow or deny action followed by a number of ACL elements.
ACL elements
The information here is current for version 3.1; see http://www.squid-cache.org/Doc/config/acl/ for the current configuration guide.
Squid knows about the following types of ACL elements:
***** ACL TYPES AVAILABLE *****
src: source (client) IP addresses
dst: destination (server) IP addresses
myip: the local IP address of a client's connection
arp: Ethernet (MAC) address matching
srcdomain: source (client) domain name
dstdomain: destination (server) domain name
srcdom_regex: source (client) regular expression pattern matching
dstdom_regex: destination (server) regular expression pattern matching
src_as: source (client) Autonomous System number
dst_as: destination (server) Autonomous System number
peername: name tag assigned to the cache_peer where request is expected to be sent.
time: time of day, and day of week
url_regex: URL regular expression pattern matching
urlpath_regex: URL-path regular expression pattern matching, leaves out the protocol and hostname
port: destination (server) port number
myport: local port number that client connected to
myportname: name tag assigned to the squid listening port that client connected to
proto: transfer protocol (http, ftp, etc)
method: HTTP request method (get, post, etc)
http_status: HTTP response status (200 302 404 etc.)
browser: regular expression pattern matching on the request user-agent header
referer_regex: regular expression pattern matching on the request http-referer header
ident: string matching on the user's name
ident_regex: regular expression pattern matching on the user's name
proxy_auth: user authentication via external processes
proxy_auth_regex: regular expression pattern matching on the username authenticated via external processes
snmp_community: SNMP community string matching
maxconn: a limit on the maximum number of connections from a single client IP address
max_user_ip: a limit on the maximum number of IP addresses one user can login from
req_mime_type: regular expression pattern matching on the request content-type header
req_header: regular expression pattern matching on a request header content
rep_mime_type: regular expression pattern matching on the reply (downloaded content) content-type header. This is only usable in the http_reply_access directive, not http_access.
rep_header: regular expression pattern matching on a reply header content. This is only usable in the http_reply_access directive, not http_access.
external: lookup via external acl helper defined by external_acl_type
user_cert: match against attributes in a user SSL certificate
ca_cert: match against attributes in a user's issuing CA SSL certificate
ext_user: match on user= field returned by external acl helper defined by external_acl_type
ext_user_regex: regular expression pattern matching on user= field returned by external acl helper defined by external_acl_type
Notes:
Not all of the ACL elements can be used with all types of access lists (described below). For example, snmp_community is only meaningful when used with snmp_access. The src_as and dst_as types are only used in cache_peer_access access lists.
The arp ACL requires the special configure option --enable-arp-acl. Furthermore, the ARP ACL code is not portable to all operating systems. It works on Linux, Solaris, and some *BSD variants.
The SNMP ACL element and access list require the --enable-snmp configure option.
Some ACL elements can cause processing delays. For example, use of srcdomain and srcdom_regex requires a reverse DNS lookup on the client's IP address. This lookup adds some delay to the request.
Each ACL element is assigned a unique name. A named ACL element consists of a list of values. When checking for a match, the multiple values use OR logic. In other words, an ACL element is matched when any one of its values is a match.
You can't give the same name to two different types of ACL elements. It will generate a syntax error.
You can put different values for the same ACL name on different lines. Squid combines them into one list.
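The last three notes can be modelled in a few lines of Python. This is a toy sketch, not Squid's parser: the function names and the exception class are made up for illustration, and matching is reduced to plain string equality, whereas real ACL types each have their own comparison rules.

```python
class AclSyntaxError(Exception):
    """Raised when one ACL name is reused for a different ACL type."""


def parse_acl_lines(lines):
    """Build {name: (type, [values])} from 'acl NAME TYPE value...' lines."""
    elements = {}
    for line in lines:
        keyword, name, acl_type, *values = line.split()
        if keyword != "acl":
            raise AclSyntaxError(f"not an acl line: {line!r}")
        if name in elements:
            known_type, known_values = elements[name]
            if known_type != acl_type:
                raise AclSyntaxError(f"{name} is already defined as {known_type}")
            known_values.extend(values)  # same name, same type: merge into one list
        else:
            elements[name] = (acl_type, list(values))
    return elements


def element_matches(elements, name, candidate):
    """An element matches when ANY one of its values matches (OR logic)."""
    _acl_type, values = elements[name]
    return candidate in values


elements = parse_acl_lines([
    "acl friends ident kim lisa",
    "acl friends ident frank joe",  # second line extends the same element
])
```

Here `friends` ends up as one list of four names, and a lookup for any one of them is a match.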
Access Lists
There are a number of different access lists:
http_access: Allows HTTP clients (browsers) to access the HTTP port. This is the primary access control list.
http_reply_access: Allows HTTP clients (browsers) to receive the reply to their request. This further restricts permissions given by http_access, and is primarily intended to be used together with the rep_mime_type acl type for blocking different content types.
icp_access: Allows neighbor caches to query your cache with ICP.
miss_access: Allows certain clients to forward cache misses through your cache. This further restricts permissions given by http_access, and is primarily intended to be used for enforcing sibling relations by denying siblings from forwarding cache misses through your cache.
cache: Defines responses that should not be cached.
redirector_access: Controls which requests are sent through the redirector pool.
ident_lookup_access: Controls which requests need an Ident lookup.
always_direct: Controls which requests should always be forwarded directly to origin servers.
never_direct: Controls which requests should never be forwarded directly to origin servers.
snmp_access: Controls SNMP client access to the cache.
broken_posts: Defines requests for which squid appends an extra CRLF after POST message bodies as required by some broken origin servers.
cache_peer_access: Controls which requests can be forwarded to a given neighbor (peer).
Notes:
An access list rule consists of an allow or deny keyword, followed by a list of ACL element names.
An access list consists of one or more access list rules.
Access list rules are checked in the order they are written. List searching terminates as soon as one of the rules is a match.
If a rule has multiple ACL elements, it uses AND logic. In other words, all ACL elements of the rule must be a match in order for the rule to be a match. This means that it is possible to write a rule that can never be matched. For example, a port number can never be equal to both 80 AND 8000 at the same time.
To summarise, the ACL logic can be described as:
http_access allow|deny acl AND acl AND ...
OR
http_access allow|deny acl AND acl AND ...
OR
...
If none of the rules are matched, then the default action is the opposite of the last rule in the list. It's a good idea to be explicit with the default action. The best way is to use the all ACL. For example:
http_access deny all
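The checking order described in these notes can be sketched in Python. This is a simplified model for illustration only: the `evaluate` function and its data layout are assumptions, and the set of ACL names a request matches is supplied by hand instead of being computed from real ACL elements.

```python
def evaluate(rules, matching_acls):
    """Return 'allow' or 'deny' for one request.

    rules: list of (action, [acl names]); a name starting with '!' is negated.
    matching_acls: set of ACL element names that this request matches.
    """
    if not rules:
        raise ValueError("this sketch needs at least one rule")

    def term_ok(term):
        if term.startswith("!"):
            return term[1:] not in matching_acls
        return term in matching_acls

    for action, terms in rules:
        if all(term_ok(t) for t in terms):  # AND across the rule's elements
            return action                   # first matching rule ends the search
    # No rule matched: the default is the opposite of the last rule's action.
    return "deny" if rules[-1][0] == "allow" else "allow"


only_allow = [("allow", ["myclients"])]
only_deny = [("deny", ["badguys"])]
explicit = [("allow", ["myclients"]), ("deny", ["all"])]
```

A request that matches neither `myclients` nor `badguys` is denied by `only_allow` but allowed by `only_deny`, which is why ending the list with an explicit rule on the all ACL is the safer habit.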
How do I allow my clients to use the cache?
Define an ACL that corresponds to your client's IP addresses. For example:
acl myclients src 172.16.5.0/24
Next, allow those clients in the http_access list:
http_access allow myclients
how do I configure Squid not to cache a specific server?
acl someserver dstdomain .someserver.com
cache deny someserver
How do I implement an ACL ban list?
As an example, we will assume that you would like to prevent users from accessing cooking recipes.
One way to implement this would be to deny access to any URLs that contain the words "cooking" or "recipe." You would use these configuration lines:
acl Cooking1 url_regex cooking
acl Recipe1 url_regex recipe
acl myclients src 172.16.5.0/24
http_access deny Cooking1
http_access deny Recipe1
http_access allow myclients
http_access deny all
The url_regex means to search the entire URL for the regular expression you specify. Note that these regular expressions are case-sensitive, so a url containing "Cooking" would not be denied.
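The case-sensitivity point is easy to try out with Python's re module. Treat this as an approximation: Squid uses its own regular expression library, the helper name below is made up, and the example URLs are invented. Squid's regex ACL types also accept a -i option for case-insensitive matching, which the `ignore_case` flag imitates.

```python
import re


def url_regex_matches(pattern: str, url: str, ignore_case: bool = False) -> bool:
    """Search the entire URL for the pattern, as url_regex is described to do."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, url, flags) is not None


lower = url_regex_matches("cooking", "http://www.example.com/cooking/pasta.html")
upper = url_regex_matches("cooking", "http://www.example.com/Cooking/pasta.html")
upper_i = url_regex_matches("cooking", "http://www.example.com/Cooking/pasta.html",
                            ignore_case=True)
```

Only the second call fails to match, which is the "Cooking" case mentioned above.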
Another way is to deny access to specific servers which are known to hold recipes. For example:
acl Cooking2 dstdomain www.gourmet-chef.com
http_access deny Cooking2
http_access allow all
The dstdomain means to search the hostname in the URL for the string "www.gourmet-chef.com." Note that when IP addresses are used in URLs (instead of domain names), Squid-1.1 implements relaxed access controls. If a domain name for the IP address has been saved in Squid's "FQDN cache," then Squid can compare the destination domain against the access controls. However, if the domain is not immediately available, Squid allows the request and makes a lookup for the IP address so that it may be available for future requests.
How do I block specific users or groups from accessing my cache?
Using Ident
You can use ident lookups to allow specific users access to your cache. This requires that an ident server process runs on the user's machine(s). In your squid.conf configuration file you would write something like this:
ident_lookup_access allow all
acl friends ident kim lisa frank joe
http_access allow friends
http_access deny all
Using Proxy Authentication
Another option is to use proxy-authentication. In this scheme, you assign usernames and passwords to individuals. When they first use the proxy they are asked to authenticate themselves by entering their username and password.
In Squid v2 this authentication is handled via external processes. For information on how to configure this, please see ../ProxyAuthentication.
Do you have a CGI program which lets users change their own proxy passwords?
Pedro L Orso has adapted Apache's htpasswd into a CGI program called chpasswd.cgi (/htpasswd/chpasswd-cgi.tar.gz).
Is there a way to do ident lookups only for a certain host and compare the result with a userlist in squid.conf?
You can use the ident_lookup_access directive to control for which hosts Squid will issue ident lookup requests.
Additionally, if you use an ident ACL in squid.conf, then Squid will make sure an ident lookup is performed while evaluating the acl even if ident_lookup_access does not indicate ident lookups should be performed.
However, Squid does not wait for the lookup to complete unless the ACL rules require it. Consider this configuration:
acl host1 src 10.0.0.1
acl host2 src 10.0.0.2
acl pals ident kim lisa frank joe
http_access allow host1
http_access allow host2 pals
Requests coming from 10.0.0.1 will be allowed immediately because there are no user requirements for that host. However, requests from 10.0.0.2 will be allowed only after the ident lookup completes, and if the username is in the set kim, lisa, frank, or joe.
Common Mistakes
And/Or logic
You've probably noticed (and been frustrated by) the fact that you cannot combine access controls with terms like "and" or "or." These operations are already built in to the access control scheme in a fundamental way which you must understand.
All elements of an acl entry are OR'ed together.
All elements of an access entry are AND'ed together (e.g. http_access and icp_access)
For example, the following access control configuration will never work:
acl ME src 10.0.0.1
acl YOU src 10.0.0.2
http_access allow ME YOU
In order for the request to be allowed, it must match the "ME" acl AND the "YOU" acl. This is impossible because any IP address could only match one or the other. This should instead be rewritten as:
acl ME src 10.0.0.1
acl YOU src 10.0.0.2
http_access allow ME
http_access allow YOU
Or, alternatively, this would also work:
acl US src 10.0.0.1 10.0.0.2
http_access allow US
allow/deny mixups
I have read through my squid.conf numerous times, spoken to my neighbors, read the FAQ and Squid Docs and cannot for the life of me work out why the following will not work.
I can successfully access cachemgr.cgi from our web server machine here, but I would like to use MRTG to monitor various aspects of our proxy. When I try to use 'squidclient' or GET cache_object from the machine the proxy is running on, I always get access denied.
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl server src 1.2.3.4/255.255.255.255
acl ourhosts src 1.2.0.0/255.255.0.0
http_access deny manager !localhost !server
http_access allow ourhosts
http_access deny all
The intent here is to allow cache manager requests from the localhost and server addresses, and deny all others. This policy has been expressed here:
http_access deny manager !localhost !server
The problem here is that for allowable requests, this access rule is not matched. For example, if the source IP address is localhost, then "!localhost" is false and the access rule is not matched, so Squid continues checking the other rules. Cache manager requests from the server address work because server is a subset of ourhosts and the second access rule will match and allow the request. Also note that this means any cache manager request from ourhosts would be allowed.
To implement the desired policy correctly, the access rules should be rewritten as
http_access allow manager localhost
http_access allow manager server
http_access deny manager
http_access allow ourhosts
http_access deny all
If you're using miss_access, then don't forget to also add a miss_access rule for the cache manager:
miss_access allow manager
You may be concerned that having five access rules instead of three may have an impact on the cache performance. In our experience this is not the case. Squid is able to handle a moderate amount of access control checking without degrading overall performance. You may like to verify that for yourself, however.
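To see why the original three rules turn away cache manager requests from localhost, it can help to trace both rule sets by hand. The Python sketch below does that with a toy evaluator; the `evaluate` function and the hand-written sets of matching ACL names are assumptions for illustration, not Squid code.

```python
def evaluate(rules, matched):
    """First rule whose terms ALL hold decides; a leading '!' negates a term."""
    for action, terms in rules:
        if all((t[1:] not in matched) if t.startswith("!") else (t in matched)
               for t in terms):
            return action
    # No rule matched: opposite of the last rule's action.
    return "allow" if rules[-1][0] == "deny" else "deny"


BROKEN = [
    ("deny", ["manager", "!localhost", "!server"]),
    ("allow", ["ourhosts"]),
    ("deny", ["all"]),
]
FIXED = [
    ("allow", ["manager", "localhost"]),
    ("allow", ["manager", "server"]),
    ("deny", ["manager"]),
    ("allow", ["ourhosts"]),
    ("deny", ["all"]),
]

# ACL names matched by a cache manager request from each kind of client.
from_localhost = {"manager", "localhost", "all"}
from_server = {"manager", "server", "ourhosts", "all"}
from_outside = {"manager", "all"}
```

With the broken rules, the localhost request skips the first rule (because !localhost is false), does not match ourhosts, and falls through to "deny all". With the rewritten rules it is allowed by the first line.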
Differences between ''src'' and ''srcdomain'' ACL types
For the srcdomain ACL type, Squid does a reverse lookup of the client's IP address and checks the result with the domains given on the acl line. With the src ACL type, Squid converts hostnames to IP addresses at startup and then only compares the client's IP address. The src ACL is preferred over srcdomain because it does not require address-to-name lookups for each request.
I set up my access controls, but they don't work! why?
If ACLs are giving you problems and you don't know why they aren't working, you can use this tip to debug them.
In squid.conf enable debugging for section 33 at level 2. For example:
debug_options ALL,1 33,2
Then restart or reconfigure squid.
From now on, your cache.log should contain a line for every request that explains if it was allowed, or denied, and which ACL was the last one that it matched.
If this does not give you sufficient information to nail down the problem, you can also enable detailed debug information on ACL processing:
debug_options ALL,1 33,2 28,9
Then restart or reconfigure squid as above.
From now on, your cache.log should contain detailed traces of all access list processing. Be warned that this can produce a large number of lines per request.
See also ../TroubleShooting.
Proxy-authentication and neighbor caches
The problem
               [ Parents ]
               /         \
              /           \
       [ Proxy A ] --- [ Proxy B ]
           |
           |
          USER
Proxy A sends an ICP query to Proxy B about an object, Proxy B replies with an ICP_HIT. Proxy A forwards the HTTP request to Proxy B, but does not pass on the authentication details, therefore the HTTP GET from Proxy A fails.
Only ONE proxy cache in a chain is allowed to "use" the Proxy-Authentication request header. Once the header is used, it must not be passed on to other proxies.
Therefore, you must allow the neighbor caches to request from each other without proxy authentication. This is simply accomplished by listing the neighbor ACL's first in the list of http_access lines. For example:
acl proxy-A src 10.0.0.1
acl proxy-B src 10.0.0.2
acl user_passwords proxy_auth /tmp/user_passwds
http_access allow proxy-A
http_access allow proxy-B
http_access allow user_passwords
http_access deny all
Squid 2.5 allows two exceptions to this rule, by defining the appropriate cache_peer options:
cache_peer parent.foo.com parent login=PASS
This will forward the user's credentials as-is to the parent proxy which will be thus able to authenticate again.
This will only work with the Basic authentication scheme. If any other scheme is enabled, it will fail.
cache_peer parent.foo.com parent login=*:somepassword
This will perform Basic authentication against the parent, sending the username of the current client connection and always somepassword as the password. The parent will need to authorize against the child cache's IP address, as if there was no authentication forwarding, and it will need to perform client authentication for all usernames against somepassword via a specially-designed authentication helper. The purpose is to log the client cache's usernames into the parent's access.log. You can find an example semi-tested helper of that kind as parent_auth.pl.
Is there an easy way of banning all Destination addresses except one?
acl GOOD dst 10.0.0.1
http_access allow GOOD
http_access deny all
How can I block access to porn sites?
Often, the hardest part about using Squid to deny pornography is coming up with the list of sites that should be blocked. You may want to maintain such a list yourself, or get one from somewhere else (see below).
The ACL syntax for using such a list depends on its contents. If the list contains regular expressions, use this:
acl PornSites url_regex "/usr/local/squid/etc/pornlist"
http_access deny PornSites
On the other hand, if the list contains origin server hostnames, simply change url_regex to dstdomain in this example.
Does anyone have a ban list of porn sites and such?
The SquidGuard redirector folks have links to some lists.
Bill Stearns maintains the sa-blacklist of known spammers. By blocking the spammer web sites in squid, users can no longer use up bandwidth downloading spam images and html. Even more importantly, they can no longer send out requests for things like scripts and gifs that have a unique identifier attached, showing that they opened the email and making their addresses more valuable to the spammer.
The SleezeBall site has a list of patterns that you can download.
Squid doesn't match my subdomains
If you are using Squid-2.4 or later then keep in mind that dstdomain acls use different syntax for exact host matches and entire domain matches. www.example.com matches the exact host www.example.com, while .example.com matches the entire domain example.com (including example.com alone).
There are also subtle issues if your dstdomain ACLs contain entries for both an exact host in a domain and the whole domain where both are in the same domain (i.e. both www.example.com and .example.com). Depending on how your data is ordered this may cause only the most specific of these (e.g. www.example.com) to be used.
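The exact-host versus whole-domain rule for a single dstdomain entry can be written down as a small Python function. This is a sketch of the matching rule as described in this answer, with a made-up function name; it does not model how Squid stores or orders multiple entries, which is where the subtle issues come from.

```python
def dstdomain_matches(entry: str, host: str) -> bool:
    """Check one dstdomain entry against one request hostname."""
    entry = entry.lower()
    host = host.lower()
    if entry.startswith("."):
        # Domain entry: the bare domain itself, or any host beneath it.
        return host == entry[1:] or host.endswith(entry)
    # No leading dot: only the exact host matches.
    return host == entry


exact_hit = dstdomain_matches("www.example.com", "www.example.com")
exact_miss = dstdomain_matches("www.example.com", "ftp.example.com")
domain_hit = dstdomain_matches(".example.com", "ftp.example.com")
domain_bare = dstdomain_matches(".example.com", "example.com")
```

Note that the leading-dot form compares against ".example.com" including the dot, so an unrelated host such as notexample.com is not matched.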
Current Squid versions (as of Squid-2.4) will warn you when this kind of configuration is used. If your Squid does not warn you while reading the configuration file, you do not have the problem described below. Also, the configuration here uses the dstdomain syntax of Squid-2.1 or earlier (2.2 and later need domains to be prefixed by a dot).
There is a subtle problem with domain-name based access controls when a single ACL element has an entry that is a subdomain of another entry. For example, consider this list:
acl FOO dstdomain boulder.co.us vail.co.us co.us
In the first place, the above list is simply wrong because the first two (boulder.co.us and vail.co.us) are unnecessary. Any domain name that matches one of the first two will also match the last one (co.us). Ok, but why does this happen?
The problem stems from the data structure used to index domain names in an access control list. Squid uses Splay trees for lists of domain names. As with other tree-based data structures, the searching algorithm requires a comparison function that returns -1, 0, or +1 for any pair of keys (domain names). This is similar to the way that strcmp() works.
The problem is that it is wrong to say that co.us is greater-than, equal-to, or less-than boulder.co.us.
For example, if you said that co.us is LESS than fff.co.us, then the Splay tree searching algorithm might never discover co.us as a match for kkk.co.us.
Similarly, if you said that co.us is GREATER than fff.co.us, then the Splay tree searching algorithm might never discover co.us as a match for bbb.co.us.
The bottom line is that you can't have one entry that is a subdomain of another. Squid-2.2 will warn you if it detects this condition.
Why does Squid deny some port numbers?
It is dangerous to allow Squid to connect to certain port numbers. For example, it has been demonstrated that someone can use Squid as an SMTP (email) relay. As I'm sure you know, SMTP relays are one of the ways that spammers are able to flood our mailboxes. To prevent mail relaying, Squid denies requests when the URL port number is 25. Other ports should be blocked as well, as a precaution.
There are two ways to filter by port number: either allow specific ports, or deny specific ports. By default, Squid does the first. This is the ACL entry that comes in the default squid.conf:
acl Safe_ports port 80 21 443 563 70 210 1025-65535
http_access deny !Safe_ports
The above configuration denies requests when the URL port number is not in the list. The list allows connections to the standard ports for HTTP, FTP, Gopher, SSL, WAIS, and all non-privileged ports.
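If you want to check which ports a list like Safe_ports lets through, the lookup can be imitated in Python. This is a minimal sketch with made-up helper names; it only understands single ports and low-high ranges, as they appear in the ACL line above.

```python
def parse_ports(spec: str):
    """Turn '80 21 1025-65535' into a list of inclusive (low, high) pairs."""
    ranges = []
    for token in spec.split():
        low, _, high = token.partition("-")
        ranges.append((int(low), int(high or low)))  # single port: low == high
    return ranges


def port_in_list(port: int, ranges) -> bool:
    """True when the port falls inside any one of the listed ranges."""
    return any(low <= port <= high for low, high in ranges)


SAFE_PORTS = parse_ports("80 21 443 563 70 210 1025-65535")

smtp_ok = port_in_list(25, SAFE_PORTS)
http_alt_ok = port_in_list(8080, SAFE_PORTS)
```

Port 25 is not in the list, so "http_access deny !Safe_ports" denies it, while 8080 falls inside the 1025-65535 range.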
Another approach is to deny dangerous ports. The dangerous port list should look something like:
acl Dangerous_ports port 7 9 19 22 23 25 53 109 110 119
http_access deny Dangerous_ports
...and probably many others.
Please consult the /etc/services file on your system for a list of known ports and protocols.
Does Squid support the use of a database such as mySQL for storing the ACL list?
Yes, Squid supports acl interaction with external data sources via the external_acl_type directive. Helpers for LDAP and NT Domain group membership are included in the distribution and it's very easy to write additional helpers to fit your environment.
How can I allow a single address to access a specific URL?
This example allows only the special_client to access the special_url. Any other client that tries to access the special_url is denied.
acl special_client src 10.1.2.3
acl special_url url_regex ^http://www.squid-cache.org/Doc/FAQ/$
http_access allow special_client special_url
http_access deny special_url
How can I allow some clients to use the cache at specific times?
Let's say you have two workstations that should only be allowed access to the Internet during working hours (8:30 - 17:30). You can use something like this:
acl FOO src 10.1.2.3 10.1.2.4
acl WORKING time MTWHF 08:30-17:30
http_access allow FOO WORKING
http_access deny FOO
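If it helps to see the decision spelled out, the following Python sketch models the two http_access lines for the FOO workstations. It is only an illustration: whether the end points 08:30 and 17:30 themselves count as inside the window is an assumption here, and the function names are hypothetical.

```python
from datetime import datetime, time

WORKDAYS = {0, 1, 2, 3, 4}            # Monday=0 ... Friday=4, i.e. MTWHF
START, END = time(8, 30), time(17, 30)
FOO = {"10.1.2.3", "10.1.2.4"}


def working(now):
    """Model "acl WORKING time MTWHF 08:30-17:30" (end points assumed inclusive)."""
    return now.weekday() in WORKDAYS and START <= now.time() <= END


def allowed(client_ip, now):
    """Return True or False for FOO clients, None when neither rule applies."""
    if client_ip not in FOO:
        return None  # later http_access lines decide
    # "http_access allow FOO WORKING", then "http_access deny FOO"
    return working(now)


# Monday 1 January 2024, 09:00, from one of the two workstations
print(allowed("10.1.2.3", datetime(2024, 1, 1, 9, 0)))
```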
How can I allow some users to use the cache at specific times?
acl USER1 proxy_auth Dick
acl USER2 proxy_auth Jane
acl DAY time 06:00-18:00
http_access allow USER1 DAY
http_access deny USER1
http_access allow USER2 !DAY
http_access deny USER2
Problems with IP ACL's that have complicated netmasks
The following ACL entry gives inconsistent or unexpected results:
acl restricted src 10.0.0.128/255.0.0.128 10.85.0.0/16
The reason is that IP access lists are stored in "splay" tree data structures. These trees require the keys to be sortable. When you use a complicated, or non-standard, netmask (255.0.0.128), it confuses the function that compares two address/mask pairs.
The best way to fix this problem is to use separate ACL names for each ACL value. For example, change the above to:
acl restricted1 src 10.0.0.128/255.0.0.128
acl restricted2 src 10.85.0.0/16
Then, of course, you'll have to rewrite your http_access lines as well.
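To spot such netmasks before they end up in squid.conf, you could run the ACL values through a small check like the Python sketch below. It flags dotted-quad masks whose one-bits are not contiguous, which is our reading of "complicated, or non-standard" here. The function names are invented for this example.

```python
import ipaddress


def mask_is_contiguous(mask):
    """Return True when the dotted-quad mask is a run of 1 bits followed by 0 bits."""
    value = int(ipaddress.IPv4Address(mask))
    inverted = ~value & 0xFFFFFFFF
    # For a contiguous mask the inverted value looks like 0...01...1,
    # so adding one yields a power of two that shares no bits with it.
    return (inverted & (inverted + 1)) == 0


def suspicious_entries(acl_values):
    """Return the address/mask values that use a non-contiguous dotted mask."""
    bad = []
    for entry in acl_values:
        _, _, mask = entry.partition("/")
        if "." in mask and not mask_is_contiguous(mask):
            bad.append(entry)
    return bad


print(suspicious_entries(["10.0.0.128/255.0.0.128", "10.85.0.0/16"]))
```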
Can I set up ACL's based on MAC address rather than IP?
Yes, for some operating systems. Squid calls these "ARP ACLs" and they are supported on Linux, Solaris, and probably BSD variants.
Note: MAC addresses are only available for clients that are on the same subnet as Squid. If the client is on a different subnet, then Squid cannot find out its MAC address, because the router replaces it with its own MAC address when the packet is routed.
To use ARP (MAC) access controls, you first need to compile in the optional code. Do this with the --enable-arp-acl configure option:
% ./configure --enable-arp-acl ...
% make clean
% make
If src/acl.c doesn't compile, then ARP ACLs are probably not supported on your system.
If everything compiles, then you can add some ARP ACL lines to your squid.conf:
acl M1 arp 01:02:03:04:05:06
acl M2 arp 11:12:13:14:15:16
http_access allow M1
http_access allow M2
http_access deny all
Can I limit the number of connections from a client?
Yes, use the maxconn ACL type in conjunction with http_access deny. For example:
acl losers src 1.2.3.0/24
acl 5CONN maxconn 5
http_access deny 5CONN losers
Given the above configuration, when a client whose source IP address is in the 1.2.3.0/24 subnet tries to establish 6 or more connections at once, Squid returns an error page. Unless you use the deny_info feature, the error message will just say "access denied."
The maxconn ACL requires the client_db feature. If you've disabled client_db (for example with client_db off) then maxconn ACLs will not work.
Note, the maxconn ACL type is kind of tricky because it uses a greater-than comparison. The ACL is a match when the number of established connections is greater than the value you specify. Because of that, you don't want to use the maxconn ACL with http_access allow.
Also note that you could use maxconn in conjunction with a user type (ident, proxy_auth), rather than an IP address type.
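The matching rule described in this answer can be modelled in a few lines of Python. This is only a sketch of the behaviour as stated above, with hypothetical function names, not code taken from Squid.

```python
def maxconn_matches(established, limit):
    """The ACL matches when the client has MORE than `limit` connections."""
    return established > limit


def denied(client_in_losers, established, limit=5):
    """Model "http_access deny 5CONN losers": both ACLs must match."""
    return client_in_losers and maxconn_matches(established, limit)


# A client in 1.2.3.0/24 with exactly 5 connections is still fine,
# the 6th one triggers the deny rule.
print(denied(True, 5), denied(True, 6))
```

This also shows why pairing maxconn with http_access allow is not what you want: the ACL only matches for the clients that are already over the limit.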
I'm trying to deny ''foo.com'', but it's not working.
In Squid-2.3 we changed the way that Squid matches subdomains. There is a difference between .foo.com and foo.com. The first matches any domain in foo.com, while the latter matches only "foo.com" exactly. So if you want to deny bar.foo.com, you should write
acl yuck dstdomain .foo.com
http_access deny yuck
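The difference between the two forms can be sketched in Python as follows. This is a simplified model with a made-up function name. It assumes that the leading-dot form also covers foo.com itself, in addition to every host below it.

```python
def dstdomain_matches(host, pattern):
    """Simplified model of dstdomain matching for one ACL value."""
    host, pattern = host.lower(), pattern.lower()
    if pattern.startswith("."):
        # ".foo.com": the domain itself (assumed) and anything below it
        return host == pattern[1:] or host.endswith(pattern)
    # "foo.com": exact match only
    return host == pattern


print(dstdomain_matches("bar.foo.com", ".foo.com"))
print(dstdomain_matches("bar.foo.com", "foo.com"))
```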
I want to customize, or make my own error messages.
You can customize the existing error messages as described in Customizable Error Messages in ../MiscFeatures. You can also create new error messages and use these in conjunction with the deny_info option.
For example, let's say you want your users to see a special message when they request something that matches your pornography list. First, create a file named ERR_NO_PORNO in the /usr/local/squid/etc/errors directory. That file might contain something like this:
Our company policy is to deny requests to known porno sites. If you feel you've received this message in error, please contact the support staff (support@this.company.com, 555-1234).
Next, set up your access controls as follows:
acl porn url_regex "/usr/local/squid/etc/porno.txt"
deny_info ERR_NO_PORNO porn
http_access deny porn
(additional http_access lines ...)
I want to use local time zone in error messages.
Squid, by default, uses GMT for the timestamp in all generated error messages. This is to allow the cache to participate in a hierarchy of caches in different timezones without risking confusion about what the time is.
To change the timestamp in Squid-generated error messages you must change the Squid signature. See Customizable Error Messages in MiscFeatures. The signature by default uses %T as the timestamp, but if you like you can use %t instead for a timestamp in the local time zone.
I want to put ACL parameters in an external file.
by Adam Aube
Squid can read ACL parameters from an external file. To do this, first place the acl parameters, one per line, in a file. Then, on the ACL line in squid.conf, put the full path to the file in double quotes.
For example, instead of:
acl trusted_users proxy_auth john jane jim
you would have:
acl trusted_users proxy_auth "/usr/local/squid/etc/trusted_users.txt"
Inside trusted_users.txt, there is:
john
jane
jim
I want to authorize users depending on their MS Windows group memberships
There is an excellent resource over at http://workaround.org/squid-ldap on how to use LDAP-based group membership checking.
Also, the LDAP or Active Directory config example here in the Squid wiki might prove useful.
Maximum length of an acl name
By default the maximum length of an ACL name is 32-1 = 31 characters, but it can be changed by editing the following line in defines.h in the source:
#define ACL_NAME_SZ 32
Fast and Slow ACLs
Some ACL types require information which may not be already available to Squid. Checking them requires suspending work on the current request, querying some external source, and resuming work when the needed information becomes available. This is for example the case for DNS, authenticators or external authorization scripts. ACLs can thus be divided in FAST ACLs, which do not require going to external sources to be fulfilled, and SLOW ACLs, which do.
Fast ACLs include (as of squid 3.1.0.7):
- all (built-in)
- src
- myip
- arp
- src_as
- peername
- time
- url_regex
- urlpath_regex
- port
- myport
- myportname
- proto
- method
- http_status {R}
- browser
- referer_regex
- snmp_community
- maxconn
- max_user_ip
- req_mime_type
- req_header
- rep_mime_type {R}
- user_cert
- ca_cert
Slow ACLs include:
- dst
- dst_as
- srcdomain
- dstdomain
- srcdom_regex
- dstdom_regex
- ident
- ident_regex
- proxy_auth
- proxy_auth_regex
- external
- ext_user
- ext_user_regex
This list may be incomplete or out-of-date. See your squid.conf.documented file for details. ACL types marked with {R} are reply ACLs, see the dedicated FAQ chapter.
Squid caches the results of ACL lookups whenever possible, thus slow ACLs will not always need to go to the external data-source.
Knowing the behaviour of an ACL type is relevant because not all ACL matching directives support all kinds of ACLs. Some check-points will not suspend the request: they allow (or deny) immediately. If a SLOW acl has to be checked, and the results of the check are not cached, the corresponding ACL result will be as if it didn't match. In other words, such ACL types are in general not reliable in all access check clauses.
The following are SLOW access clauses:
- http_access
- http_access2
- http_reply_access
- url_rewrite_access
- storeurl_access
- location_rewrite_access
- always_direct
- never_direct
These are instead FAST access clauses:
- icp_access
- htcp_access
- htcp_clr_access
- miss_access
- ident_lookup_access
- reply_body_max_size {R}
- authenticate_ip_shortcircuit_access
- log_access
- header_access
- delay_access
- snmp_access
- cache_peer_access
Thus the safest course of action is to only use fast ACLs in fast access clauses, and any kind of ACL in slow access clauses.
A possible workaround which can mitigate the effect of this characteristic consists in exploiting caching, by setting some "useless" ACL checks in slow clauses, so that subsequent fast clauses may have a cached result to evaluate against.
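The interplay between fast clauses, slow ACLs and the result cache can be pictured with the toy model below. It is not how Squid is implemented internally. The function names, the dictionary used as a cache and the callback parameters are all assumptions made for the sake of the example.

```python
SLOW_ACL_TYPES = {
    "dst", "dst_as", "srcdomain", "dstdomain", "srcdom_regex",
    "dstdom_regex", "ident", "ident_regex", "proxy_auth",
    "proxy_auth_regex", "external", "ext_user", "ext_user_regex",
}


def check_in_fast_clause(acl_type, key, cache, evaluate_fast):
    """A fast clause cannot wait: an uncached slow ACL counts as no match."""
    if acl_type in SLOW_ACL_TYPES:
        return cache.get((acl_type, key), False)
    return evaluate_fast(acl_type, key)


def check_in_slow_clause(acl_type, key, cache, evaluate_fast, lookup_slow):
    """A slow clause may suspend the request, look up the answer and cache it."""
    if acl_type in SLOW_ACL_TYPES:
        if (acl_type, key) not in cache:
            cache[(acl_type, key)] = lookup_slow(acl_type, key)
        return cache[(acl_type, key)]
    return evaluate_fast(acl_type, key)
```

In this model, a dstdomain check made from a fast clause fails on a cold cache, but succeeds once a slow clause has evaluated the same ACL for the same key. That is the effect the "useless" check in a slow clause is meant to achieve.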
Contents
- Starting Point
- Why am I getting "Proxy Access Denied?"
- Connection Refused when reaching a sibling
- Running out of filedescriptors
- What are these strange lines about removing objects?
- Can I change a Windows NT FTP server to list directories in Unix format?
- Why am I getting "Ignoring MISS from non-peer x.x.x.x?"
- DNS lookups for domain names with underscores (_) always fail.
- Why does Squid say: "Illegal character in hostname; underscores are not allowed?"
- Why am I getting access denied from a sibling cache?
- Cannot bind socket FD NN to *:8080 (125) Address already in use
- icpDetectClientClose: ERROR xxx.xxx.xxx.xxx: (32) Broken pipe
- icpDetectClientClose: FD 135, 255 unexpected bytes
- Does Squid work with NTLM Authentication?
- "Hotmail" complains about: Intrusion Logged. Access denied.
- My Squid becomes very slow after it has been running for some time.
- WARNING: Failed to start 'dnsserver'
- Sending bug reports to the Squid team
- Debugging Squid
- FATAL: ipcache_init: DNS name lookup tests failed
- FATAL: Failed to make swap directory /var/spool/cache: (13) Permission denied
- FATAL: Cannot open HTTP Port
- FATAL: All redirectors have exited!
- FATAL: Cannot open /usr/local/squid/logs/access.log: (13) Permission denied
- pingerOpen: icmp_sock: (13) Permission denied
- What is a forwarding loop?
- accept failure: (71) Protocol error
- storeSwapInFileOpened: ... Size mismatch
- Why do I get ''fwdDispatch: Cannot retrieve 'https://www.buy.com/corp/ordertracking.asp' ''
- Squid can't access URLs like http://3626046468/ab2/cybercards/moreinfo.html
- I get a lot of "URI has whitespace" error messages in my cache log, what should I do?
- commBind: Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address
- What does "sslReadClient: FD 14: read failure: (104) Connection reset by peer" mean?
- What does ''Connection refused'' mean?
- squid: ERROR: no running copy
- FATAL: getgrnam failed to find groupid for effective group 'nogroup'
- Squid uses 100% CPU
- Webmin's ''cachemgr.cgi'' crashes the operating system
- Segment Violation at startup or upon first request
- urlParse: Illegal character in hostname 'proxy.mydomain.com:8080proxy.mydomain.com'
- Requests for international domain names do not work
- Why do I sometimes get "Zero Sized Reply"?
- Why do I get "The request or reply is too large" errors?
- Negative or very large numbers in Store Directory Statistics, or constant complaints about cache above limit
- Squid problems with Windows Update v5
Starting Point
If your Squid version is older than 2.6 it is very outdated. Many of the issues experienced in those versions are now fixed in 2.6 and later.
Your first point of troubleshooting should be to test with a newer supported release and resolve any remaining issues with that install.
Current releases can be retrieved from http://www.squid-cache.org/Versions or your operating system distributor.
RHEL users will need to use an unofficial package or build their own, due to RedHat update policies.
Why am I getting "Proxy Access Denied?"
You may need to set up the http_access option to allow requests from your IP addresses. Please see ../SquidAcl for information about that.
Alternately, you may have misconfigured one of your ACLs. Check the access.log and squid.conf files for clues.
Connection Refused when reaching a sibling
I get Connection Refused when the cache tries to retrieve an object located on a sibling, even though the sibling thinks it delivered the object to my cache.
If the HTTP port number is wrong but the ICP port is correct, you will send ICP queries correctly, and the ICP replies will fool your cache into thinking the configuration is correct. Large objects will still fail, because you don't have the correct HTTP port for the sibling in your squid.conf file. If your sibling changed their http_port, you could have this problem for some time before noticing.
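One quick way to test for this situation is to try a plain TCP connection to the sibling's HTTP port from the machine running your cache. A minimal Python sketch, where the function name is invented and the host name and port in the usage comment are placeholders:

```python
import socket


def http_port_reachable(host, port, timeout=3.0):
    """Return True when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name lookup failures
        return False


# Usage (placeholder values, substitute the sibling from your cache_peer line):
#   http_port_reachable("sibling.example.com", 3128)
```

A False result for the port listed in your cache_peer line suggests asking the sibling's administrator which http_port they currently use.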
Running out of filedescriptors
If you see the Too many open files error message, you are most likely running out of file descriptors. This may be due to running Squid on an operating system with a low filedescriptor limit. This limit is often configurable in the kernel or with other system tuning tools. There are two ways to run out of file descriptors: first, you can hit the per-process limit on file descriptors. Second, you can hit the system limit on total file descriptors for all processes.
- Squid 2.0-2.6 provide a ./configure option --with-maxfd=N
- Squid 2.7+ provide a squid.conf option max_filedescriptors
- Squid 3.x provide a ./configure option --with-filedescriptors=N
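To see which per-process limit applies in the environment Squid is started from, you can ask the operating system directly. The sketch below uses Python's standard resource module, which is available on Unix-like systems only. The helper names are made up for this example.

```python
import resource


def fd_limits():
    """Return the (soft, hard) per-process limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as "unlimited"."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = fd_limits()
print("soft limit:", describe(soft))
print("hard limit:", describe(hard))
```

Run it from the same account and startup script as Squid. Note that this only reports the per-process limit, not the system-wide total.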
Linux
This information is outdated, and may no longer be relevant.
Linux kernel 2.2.12 and later support an "unlimited" number of open files without patching. So does most of glibc-2.1.1 and later (all areas touched by Squid are safe from what I can tell, even more so in later glibc releases). But you still need to take some actions as the kernel defaults to only allow processes to use up to 1024 filedescriptors, and Squid picks up the limit at build time.
- Before configuring Squid run "ulimit -HSn ####" (where #### is the number of filedescriptors you need to support). Be sure to run "make clean" before configure if you have already run configure as the script might otherwise have cached the prior result.
- Configure, build and install Squid as usual
- Make sure your script for starting Squid contains the above ulimit command to raise the filedescriptor limit. You may also need to allow a larger port span for outgoing connections (set in /proc/sys/net/ipv4/, like in "echo 1024 32768 > /proc/sys/net/ipv4/ip_local_port_range")
Alternatively you can
- Run configure with your needed configure options
- edit include/autoconf.h and define SQUID_MAXFD to your desired limit. Make sure to make it a nice and clean modulo 64 value (multiple of 64) to avoid various bugs in the libc headers.
- build and install Squid as usual
- Set the runtime ulimit as described above when starting Squid.
If running things as root is not an option, then get your sysadmin to install the needed ulimit command in /etc/initscript (see man initscript), install a patched kernel where INR_OPEN in include/linux/fs.h is changed to at least the amount you need, or have them install a small suid program which sets the limit (see link below).
More information can be found on Henrik's "How to get many filedescriptors on Linux 2.2.X and later" page.
Solaris
This information is outdated, and may no longer be relevant.
Add the following to your /etc/system file and reboot to increase your maximum file descriptors per process:
set rlim_fd_max = 4096
Next you should re-run the configure script in the top directory so that it finds the new value. If it does not find the new limit, then you might try editing include/autoconf.h and setting #define DEFAULT_FD_SETSIZE by hand. Note that include/autoconf.h is created from autoconf.h.in every time you run configure. Thus, if you edit it by hand, you might lose your changes later on.
Jens-S. Voeckler advises that you should NOT change the default soft limit (rlim_fd_cur) to anything larger than 256. It will break other programs, such as the license manager needed for the SUN workshop compiler. Jens-S. also says that it should be safe to raise the limit for the Squid process as high as 16,384, except that there may be problems during reconfigure or logrotate if all of the lower 256 filedescriptors are in use at the time of rotate/reconfigure.
FreeBSD
This information is outdated, and may no longer be relevant.
- How do I check my maximum filedescriptors?
Do sysctl -a and look for the value of kern.maxfilesperproc.
- How do I increase them?
sysctl -w kern.maxfiles=XXXX
sysctl -w kern.maxfilesperproc=XXXX
Note: You probably want maxfiles > maxfilesperproc if you're going to be pushing the limit.
- What is the upper limit?
- I don't think there is a formal upper limit inside the kernel. All the data structures are dynamically allocated. In practice there might be unintended metaphenomena (kernel spending too much time searching tables, for example).
General BSD
This information is outdated, and may no longer be relevant.
For most BSD-derived systems (SunOS, 4.4BSD, OpenBSD, FreeBSD, NetBSD, BSD/OS, 386BSD, Ultrix) you can also use the "brute force" method to increase these values in the kernel (requires a kernel rebuild):
- How do I check my maximum filedescriptors?
Do pstat -T and look for the files value, typically expressed as the ratio of current/maximum.
- How do I increase them the easy way?
One way is to increase the value of the maxusers variable in the kernel configuration file and build a new kernel. This method is quick and easy but also has the effect of increasing a wide variety of other variables that you may not need or want increased.
- Is there a more precise method?
Another way is to find the param.c file in your kernel build area and change the arithmetic behind the relationship between maxusers and the maximum number of open files.
Here are a few examples which should lead you in the right direction:
SunOS
This information is outdated, and may no longer be relevant.
Change the value of nfile in /usr/kvm/sys/conf.common/param.c by altering this equation:
Where NPROC is defined by:
#define NPROC (10 + 16 * MAXUSERS)
FreeBSD (from the 2.1.6 kernel)
This information is outdated, and may no longer be relevant.
Very similar to SunOS, edit /usr/src/sys/conf/param.c and alter the relationship between maxusers and the maxfiles and maxfilesperproc variables:
int maxfiles = NPROC*2;
int maxfilesperproc = NPROC*2;
Where NPROC is defined by:
#define NPROC (20 + 16 * MAXUSERS)
The per-process limit can also be adjusted directly in the kernel configuration file with the following directive:
options OPEN_MAX=128
BSD/OS (from the 2.1 kernel)
This information is outdated, and may no longer be relevant.
Edit /usr/src/sys/conf/param.c and adjust the maxfiles math here:
int maxfiles = 3 * (NPROC + MAXUSERS) + 80;
Where NPROC is defined by:
#define NPROC (20 + 16 * MAXUSERS)
You should also set the OPEN_MAX value in your kernel configuration file to change the per-process limit.
Reconfigure afterwards
This information is outdated, and may no longer be relevant.
After you rebuild/reconfigure your kernel with more filedescriptors, you must then recompile Squid. Squid's configure script determines how many filedescriptors are available, so you must make sure the configure script runs again as well. For example:
cd squid-1.1.x
make realclean
./configure --prefix=/usr/local/squid
make
What are these strange lines about removing objects?
For example:
97/01/23 22:31:10| Removed 1 of 9 objects from bucket 3913
97/01/23 22:33:10| Removed 1 of 5 objects from bucket 4315
97/01/23 22:35:40| Removed 1 of 14 objects from bucket 6391
These log entries are normal, and do not indicate that squid has reached cache_swap_high.
Consult your cache information page in cachemgr.cgi for a line like this:
Storage LRU Expiration Age: 364.01 days
Objects which have not been used for that amount of time are removed as a part of the regular maintenance. You can set an upper limit on the LRU Expiration Age value with reference_age in the config file.
Can I change a Windows NT FTP server to list directories in Unix format?
Why, yes you can! Select the following menus:
- Start
- Programs
- Microsoft Internet Server (Common)
- Internet Service Manager
This will bring up a box with icons for your various services. One of them should be a little ftp "folder." Double click on this.
You will then have to select the server (there should only be one). Select it, then choose "Properties" from the menu and choose the "directories" tab along the top.
There will be an option at the bottom saying "Directory listing style." Choose the "Unix" type, not the "MS-DOS" type.
by Oskar Pearson
Why am I getting "Ignoring MISS from non-peer x.x.x.x?"
You are receiving ICP MISSes (via UDP) from a parent or sibling cache whose IP address your cache does not know about. This may happen in two situations.
If the peer is multihomed, it is sending packets out an interface which is not advertised in the DNS. Unfortunately, this is a configuration problem at the peer site. You can tell them to either add the IP address of that interface to their DNS, or use Squid's "udp_outgoing_address" option to force the replies out a specific interface. For example, in the parent's squid.conf:
udp_outgoing_address proxy.parent.com
and in your squid.conf:
cache_peer proxy.parent.com parent 3128 3130
You can also see this warning when sending ICP queries to multicast addresses. For security reasons, Squid requires your configuration to list all other caches listening on the multicast group address. If an unknown cache listens to that address and sends replies, your cache will log the warning message. To fix this situation, either tell the unknown cache to stop listening on the multicast address, or if they are legitimate, add them to your configuration file.
DNS lookups for domain names with underscores (_) always fail.
The standards for naming hosts (RFC 952 and RFC 1101) do not allow underscores in domain names:
A "name" (Net, Host, Gateway, or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.).
The resolver library that ships with recent versions of BIND enforces this restriction, returning an error for any host with underscore in the hostname. The best solution is to complain to the hostmaster of the offending site, and ask them to rename their host.
See also the comp.protocols.tcp-ip.domains FAQ.
Some people have noticed that RFC 1033 implies that underscores are allowed. However, this is an informational RFC with a poorly chosen example, and not a standard by any means.
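The character set quoted from RFC 952 in this answer is easy to check for. The Python sketch below only tests which characters appear in a name. It deliberately ignores the other rules (such as length limits), and the function name is invented for this example.

```python
import re

# Letters, digits, minus sign and period, as listed in the RFC 952 quote
_ALLOWED = re.compile(r"[A-Za-z0-9.-]+")


def uses_only_rfc952_characters(hostname):
    """Return True when the name contains only letters, digits, '-' and '.'."""
    return _ALLOWED.fullmatch(hostname) is not None


print(uses_only_rfc952_characters("www.squid-cache.org"))
print(uses_only_rfc952_characters("my_host.example.com"))
```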
Why does Squid say: "Illegal character in hostname; underscores are not allowed?"
See the above question. The underscore character is not valid for hostnames.
Some DNS resolvers allow the underscore, so yes, the hostname might work fine when you don't use Squid.
To make Squid allow underscores in hostnames:
Squid 2.x
Re-build with --enable-underscores configure option
Squid-3.x
add to squid.conf: enable_underscores on
Why am I getting access denied from a sibling cache?
The answer to this is somewhat complicated, so please hold on.
