Tuning Linux
Chapter 7. Application Tuning

7.1. Squid Proxy Server

The Squid proxy server has two common uses. The first is as a proxy cache for internal clients accessing the external web. The second is as a reverse proxy, caching content from internal web servers.

7.1.1. Squid as a proxy cache

Squid as a proxy cache stores large numbers of web pages, so that they can be sent to clients on request without another fetch from the internet. Squid uses large amounts of disk space for the cache but, by the nature of the workload, doesn't access any one part of it much more frequently than any other.

The first thing to avoid is a disk bottleneck. In a high-usage environment, you will want to distribute the cache across as many spindles as is practical. Since the cache data is expendable, in that it is flushed out as a matter of course, the additional overhead of RAID 5 or another fault-tolerant multi-spindle system is not necessary. RAID 0 striping will provide the fastest access when no fault tolerance is required.

The next thing to avoid is a memory bottleneck. There should be ABSOLUTELY no OS paging with Squid; Squid will page enough on its own. Squid has, in effect, its own virtual memory system, with thousands of pages on disk and tens or hundreds in memory. If the OS has to page to keep those tens or hundreds in memory, your performance will be horrible.
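One way to confirm that the OS is not paging is to watch the kernel's swap counters while Squid is under load. The following is a minimal sketch, assuming a Linux kernel that exposes the cumulative `pswpin` and `pswpout` counters in /proc/vmstat. The function names and the sampling interval are illustrative choices, not part of Squid or any standard tool.

```python
import time


def swap_counters(vmstat_text):
    """Extract the cumulative swap-in/swap-out page counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before_text, after_text):
    """Return the pages swapped in and out between two /proc/vmstat snapshots."""
    before = swap_counters(before_text)
    after = swap_counters(after_text)
    return {name: after[name] - before[name] for name in ("pswpin", "pswpout")}


def sample_swap_activity(interval_seconds=10, path="/proc/vmstat"):
    """Take two snapshots of the counters, interval_seconds apart (Linux only)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval_seconds)
    with open(path) as f:
        after = f.read()
    return swap_activity(before, after)
```

Calling `sample_swap_activity()` on the Squid host during peak load should return zero for both counters. Any sustained nonzero value suggests the machine needs more real memory or a smaller Squid memory footprint.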

Network performance will probably not be a big deal. If your byte hit rate is 25% and you are maxing out a T1 (1.544 Mbps), you are still only putting around 2 Mbps out your Ethernet connection.
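The arithmetic behind that estimate can be sketched as follows. It assumes the upstream link carries only cache misses, so the client-side traffic is the upstream rate divided by the miss fraction. The function name is illustrative.

```python
T1_MBPS = 1.544  # nominal T1 line rate in megabits per second


def client_side_mbps(upstream_mbps, byte_hit_rate):
    """Estimate the traffic served to clients when the upstream link is saturated.

    Only misses cross the upstream link, so:
        upstream = client_side * (1 - byte_hit_rate)
    """
    if not 0 <= byte_hit_rate < 1:
        raise ValueError("byte_hit_rate must be in the range [0, 1)")
    return upstream_mbps / (1.0 - byte_hit_rate)


# A saturated T1 with a 25% byte hit rate gives roughly 2.06 Mbps to clients.
estimate = client_side_mbps(T1_MBPS, 0.25)
```

Even a much higher byte hit rate leaves the client-side figure far below what a typical Ethernet segment can carry, which is why the network is rarely the bottleneck here.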

7.1.2. Squid as a reverse proxy

As a reverse proxy, the total set of web pages that Squid is proxying is reduced greatly. It must store only the limited number of pages behind it on your web servers, as opposed to the entire internet. Since disk storage requirements are down by an order of magnitude, so are memory requirements.

Even so, one approach to this situation is to add real memory until the entire working set of pages can be held in it, with no paging required.
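A rough way to judge whether the working set can fit is to total the size of the content behind the proxy and compare it with physical memory. The sketch below assumes the content is static files under a single document root, which understates the working set for dynamically generated pages. The `usable_fraction` parameter is a hypothetical allowance for the OS and Squid's own overhead, not a figure from Squid documentation.

```python
import os


def working_set_bytes(root):
    """Sum the sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def physical_ram_bytes():
    """Return installed physical memory in bytes (works on Linux)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def fits_in_ram(working_set, ram_bytes, usable_fraction=0.5):
    """Check whether the working set fits in the share of RAM left for the cache.

    usable_fraction is an assumed allowance, to be adjusted for the host.
    """
    return working_set <= ram_bytes * usable_fraction
```

For example, `fits_in_ram(working_set_bytes("/var/www"), physical_ram_bytes())` gives a first indication, where `/var/www` stands in for wherever the origin content actually lives.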

In this case we're trading a large store with infrequent access for a smaller store with frequent access. We want to optimize the byte hit rate in Squid with real memory to reduce disk operations. Again, there should be NO OS PAGING. If any DNS lookups are required, it might be a good idea to run a DNS cache locally. Beyond that, this setup shouldn't be too difficult to tune for good performance.
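To see whether tuning is improving the byte hit rate, it can be estimated from Squid's access log. This is a minimal sketch, assuming the native access.log format, in which the fourth field is the result code and HTTP status (for example `TCP_HIT/200`) and the fifth is the number of bytes delivered. Treating every result code that contains `HIT` as a cache hit is a simplification, and the function name is illustrative.

```python
def byte_hit_rate(log_lines):
    """Return the fraction of delivered bytes that were served from the cache."""
    hit_bytes = 0
    total_bytes = 0
    for line in log_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        result_code = fields[3].split("/")[0]
        try:
            size = int(fields[4])
        except ValueError:
            continue  # skip lines whose byte count is not a number
        total_bytes += size
        if "HIT" in result_code:
            hit_bytes += size
    return hit_bytes / total_bytes if total_bytes else 0.0
```

Passing an open log file to `byte_hit_rate()` works, since iterating over a file yields its lines. Comparing the figure before and after a memory change shows whether the change had the intended effect.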