Alawadhi Blog

Everything in my Mind

Configuring Squid to query other cache servers for content

Introduction

If more than one cache server is available at the same site, each server can query the others for cached content. For example, a company with two buildings, each with its own cache (e.g. building1-cache and building2-cache), can configure both as [cache siblings], a means of sharing cached content.

What is ICP

As RFC 2186 indicates:

“ICP is a lightweight message format used for communicating among Web caches. ICP is used to exchange hints about the existence of URLs in neighbor caches. Caches exchange ICP queries and replies to gather information to use in selecting the most appropriate location from which to retrieve an object.”

and

“ICP is a message format used for communicating between Web caches. Although Web caches use HTTP for the transfer of object data, caches benefit from a simpler, lighter communication protocol. ICP is primarily used in a cache mesh to locate specific Web objects in neighboring caches. One cache sends an ICP query to its neighbors. The neighbors send back ICP replies indicating a (HIT) or a (MISS)”

So, if the content is available at one of the cache siblings (HIT), it is served to the requester from that sibling instead of being downloaded from the origin server on the Internet.

Configuring Squid

Make sure that the following line exists in /etc/squid/squid.conf and is not commented:

icp_port 3130

This line enables Squid’s ICP port so that other cache servers can query it.
Next, add the following line:

cache_peer x.x.x.x sibling yyyy 3130 proxy-only

Where x.x.x.x is the IP address of the other cache, yyyy is that cache's HTTP port (Squid's default is 3128), 3130 is its ICP port, and proxy-only tells Squid not to store locally any objects it fetches from that cache. If you have more servers, add one such line for each.
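When there are several siblings, the lines can be generated rather than typed. The following is only a sketch: sibling_lines is a made-up helper name, the addresses are placeholders, and it uses the cache_peer directive name found in current Squid releases. It prints the lines so they can be reviewed before being pasted into squid.conf.

```shell
#!/bin/sh
# Print one sibling line per address (hypothetical helper, not part of Squid).
# Usage: sibling_lines HTTP_PORT ICP_PORT ADDRESS...
sibling_lines() {
    http_port=$1
    icp_port=$2
    shift 2
    for addr in "$@"; do
        printf 'cache_peer %s sibling %s %s proxy-only\n' \
            "$addr" "$http_port" "$icp_port"
    done
}

# Example with two placeholder sibling addresses:
sibling_lines 3128 3130 192.0.2.10 192.0.2.20
```

Each server's own address should be left out of the list it receives, since a cache does not need to be its own sibling.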
Issue the Squid reload command to apply the updated config:

#/etc/init.d/squid reload

Repeat the same steps on each of the other servers.

Testing The Settings

After the service is reloaded, a new entry appears in /var/log/squid/cache.log:

Configuring Sibling x.x.x.x/3128/3130

If, after a while (the ICP timeout), an entry like this appears:

Detected DEAD Sibling: x.x.x.x

double-check for network-related problems, because Squid failed to contact the other cache server. Otherwise, cache exchange is working.
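The check can also be scripted. This is a minimal sketch that relies only on the two log messages quoted in this post; sibling_status is a made-up function name, and the log path is passed in because it may differ between systems.

```shell
#!/bin/sh
# Show sibling-related lines from a Squid cache.log (hypothetical helper).
# Usage: sibling_status LOGFILE
# Exit status: 0 if no "Detected DEAD Sibling" line is present, 1 otherwise.
sibling_status() {
    log=$1
    # Print the relevant lines; finding none is not treated as an error.
    grep -E 'Configuring Sibling|Detected DEAD Sibling' "$log" || true
    if grep -q 'Detected DEAD Sibling' "$log"; then
        return 1
    fi
    return 0
}

# Typical call (path as used in this post):
#   sibling_status /var/log/squid/cache.log
```

The log keeps older messages, so a DEAD entry may date from an earlier problem. It is worth comparing its timestamp with the time of the last reload.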

Configuring Squid to block ad sites

Introduction

The basic network is now operational, with a gateway and a cache/proxy configured transparently with WCCP. Blocking some sites (mainly ad servers) is the next step.

Configuring Squid

First, define an ACL that contains the domains to be blocked. Edit /etc/squid/squid.conf and add the following alongside the other acl definitions, before any http_access line that refers to it:

acl blocked_domains dstdomain .clicksor.com
acl blocked_domains dstdomain .paypopup.com
acl blocked_domains dstdomain .bidvertiser.com
acl blocked_domains dstdomain .zedo.com
acl blocked_domains dstdomain .quantserve.com
acl blocked_domains dstdomain .quantcast.com
acl blocked_domains dstdomain .dmoglobal.net
acl blocked_domains dstdomain ads.mininova.org
acl blocked_domains dstdomain .yieldmanager.com
acl blocked_domains dstdomain .bluelithium.com
acl blocked_domains dstdomain .pubmatic.com
acl blocked_domains dstdomain .adbrite.com
acl blocked_domains dstdomain .advertising.com
acl blocked_domains dstdomain .imvu.com
acl blocked_domains dstdomain .games888.com
acl blocked_domains dstdomain .firstperson.nl
acl blocked_domains dstdomain .mario-sonic.com
acl blocked_domains dstdomain .yahwroom.org
acl blocked_domains dstdomain .yieldmanager.edgesuite.net
acl blocked_domains dstdomain .z5x.net

Where blocked_domains is the ACL name and .domain.com (notice the leading dot) matches the domain and all of its sub-domains. An entry without the leading dot, such as ads.mininova.org, matches only that exact host name.
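The effect of the leading dot can be illustrated with a small shell function. This is a simplified model of the matching rule for illustration only, not Squid's own code; dstdomain_matches is a made-up name.

```shell
#!/bin/sh
# Simplified model of how a dstdomain entry matches a host name.
# Usage: dstdomain_matches PATTERN HOST   (exit status 0 means "match")
dstdomain_matches() {
    pattern=$1
    host=$2
    case $pattern in
        .*)
            # Leading dot: match the bare domain and any sub-domain.
            bare=${pattern#.}
            case $host in
                "$bare"|*"$pattern") return 0 ;;
            esac
            ;;
        *)
            # No leading dot: match this exact host name only.
            [ "$host" = "$pattern" ] && return 0
            ;;
    esac
    return 1
}

dstdomain_matches .zedo.com ads.zedo.com && echo "ads.zedo.com is blocked"
dstdomain_matches ads.mininova.org www.mininova.org || echo "www.mininova.org is not blocked"
```

Under this model both echo commands run: the first host is covered by the dotted entry, while the second is not covered by the exact-host entry.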
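Next, tell Squid what to do with the ACL. The following line has to come before any http_access allow rule that would otherwise permit the request, because Squid uses the first http_access rule that matches: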

http_access deny blocked_domains

Here, Squid is told to deny access to any domain contained in the ACL blocked_domains.
With the steps above, access to any domain can be denied. Finally, the following command has to be issued after every change:

#/etc/init.d/squid reload

This instructs Squid to reload its config file without restarting the full service.

Using Squid as a transparent proxy/cache with Cisco Routers

Introduction

When I was planning my home network, I wanted to have the basic components available, e.g. local DNS, a local proxy/cache, etc. I started with a Cisco 1750 router as my home ADSL device, as it has a wide range of configuration capabilities.
One item on my home networking to-do list was a proxy/cache service. Such a service is a must in any multiuser environment, at least for common Internet activities (e.g. Windows Update, antivirus updates, etc.) that download the same files again and again for every PC connected. Also, from time to time I bring several PCs/laptops home for maintenance or a Windows reinstallation, so the need is obvious: having these files locally saves both Internet bandwidth (download once, then serve locally) and time.
Besides saving bandwidth, tricks can be done with scheduled downloads. Most of my family members read newspapers online (PDF versions), so a scheduled task that downloads the PDF files of all the newspapers we read keeps them in the central cache, ready for local access from every PC/laptop on my network.

The Hardware

Any PC with at least 128MB of RAM can do the required job efficiently. Squid runs mainly under the Linux operating system.

The Software

Linux as an operating system.

Preparing the Software

I am not going into detail on how to install the Linux OS. One thing I would like to mention is that I usually install Linux with the command line only, as I never use the graphical interface.
After installing Linux (by the way, I use the latest CentOS Linux distro), I use yum to install Squid and its dependencies automatically. After installing Squid, I configure it with the following (search for each directive in /etc/squid/squid.conf and edit accordingly):

http_port xxx.xxx.xxx.xxx:3128 transparent
icp_port 0
maximum_object_size 71680 KB
cache_replacement_policy heap GDSF
memory_replacement_policy heap GDSF
cache_dir diskd /dir ssss 16 256
acl localnet src nnn.nnn.nnn.0/24
wccp2_router rrr.rrr.rrr.rrr

Where:

  • xxx.xxx.xxx.xxx is the IP address you want Squid to bind to,
  • transparent instructs Squid to run in transparent mode (a mode that needs no end-user configuration, because HTTP traffic is redirected to the proxy transparently),
  • maximum_object_size is the maximum file size stored in the disk cache (here I allow files of up to 70MB to be stored),
  • /dir is the cache directory on disk,
  • ssss is the total size (in MB) of the cache directory (when this size is reached, the replacement policy is applied),
  • nnn.nnn.nnn.0 is the network number you are using,
  • rrr.rrr.rrr.rrr is the IP address of your Cisco router.

I use a general rule for the total cache size based on the link speed. For example, I have a 1Mbps ADSL line to the Internet. Multiply this by 60 and then by 60 again to get the theoretical maximum downloaded per hour in megabits: 1*60*60 = 3600Mb per hour. Multiply the result by 24 to get a full day: 3600*24 = 86400Mb per day. Then divide by 8 (remember, it is still in bits, not bytes): 86400/8 = 10800MB per day. So you have around 10GB of Internet traffic if the line is 100% utilized for 24 hours (neglecting TCP and other headers to simplify the calculation). Then, based on the total disk size you have (nowadays disks are cheap, so getting a 160GB one is easy), you can decide how many days' worth of cache you want (under 100% utilization). My choice was 40GB of cache, so ssss in my case is 40960.
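The same arithmetic can be written as two small shell functions, to try other link speeds or cache sizes. This is just a sketch of the rule of thumb above; the function names are made up, and integer arithmetic is used, so results round down.

```shell
#!/bin/sh
# Theoretical megabytes downloaded per day on a fully used link.
# Usage: daily_traffic_mb LINK_MBPS
daily_traffic_mb() {
    # seconds per hour * hours per day, then bits to bytes
    echo $(( $1 * 60 * 60 * 24 / 8 ))
}

# Whole days of full-rate traffic that fit in a cache of CACHE_MB megabytes.
# Usage: cache_days CACHE_MB LINK_MBPS
cache_days() {
    echo $(( $1 / $(daily_traffic_mb "$2") ))
}

daily_traffic_mb 1     # prints 10800
cache_days 40960 1     # prints 3
```

So the 40GB cache chosen here holds a little over three days of traffic from a fully utilized 1Mbps line.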

Other Linux Settings

A GRE network interface configuration is needed in the Linux box, so do the following:

    • In /etc/rc.local add the following lines:
iptables -F; iptables -t nat -F; iptables -t mangle -F
iptables -P INPUT ACCEPT
iptables -t nat -A PREROUTING -s nnn.nnn.nnn.0/255.255.255.0 -d ! nnn.nnn.nnn.0/255.255.255.0 -i gre0 -p tcp -m tcp --dport 80 -j DNAT --to-destination xxx.xxx.xxx.xxx:3128

Where nnn.nnn.nnn.0 is the network number you are using and xxx.xxx.xxx.xxx is the IP address of the Linux box that Squid is bound to. Keep in mind that the third command must be entered as one continuous line, even if it wraps on your screen. These lines redirect any port 80 (HTTP) request that arrives from the router through the gre0 interface to port 3128, so that it is processed by Squid.
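To avoid typos when filling in the placeholders, the redirect rule can be produced by a small function that only prints the command, so it can be checked before it goes into /etc/rc.local. This is a sketch: wccp_dnat_rule is a made-up name, the addresses in the example are placeholders, and the rule text mirrors the one given in this post.

```shell
#!/bin/sh
# Print (do not run) the DNAT rule for a given LAN and Squid address.
# Usage: wccp_dnat_rule LAN_NET/MASK SQUID_IP [SQUID_PORT]
wccp_dnat_rule() {
    lan=$1
    squid_ip=$2
    squid_port=${3:-3128}   # default to the Squid port used in this post
    printf 'iptables -t nat -A PREROUTING -s %s -d ! %s -i gre0 -p tcp -m tcp --dport 80 -j DNAT --to-destination %s:%s\n' \
        "$lan" "$lan" "$squid_ip" "$squid_port"
}

# Example with placeholder addresses:
wccp_dnat_rule 192.168.1.0/255.255.255.0 192.168.1.2
```

Newer iptables releases expect the negation in front of the option (! -d ...) rather than after it, so the printed rule may need that adjustment on a recent system.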

    • In /etc/modprobe.conf add the following in a separate line:
alias gre0 ip_gre
    • Make a new file in /etc/sysconfig/network-scripts named ifcfg-gre0 with the following lines:
DEVICE=gre0
BOOTPROTO=static
IPADDR=10.190.19.19
NETMASK=255.255.255.252
ONBOOT=yes

The first addition (in /etc/modprobe.conf) enables the GRE interface module in the Linux kernel; the second (ifcfg-gre0) configures the interface with a static IP address (any private IP will do the trick, as long as it is not in the range used locally).

Now you can bring the GRE interface up using:

#ifup gre0

If everything goes smoothly, the interface should come up without any errors. To check, issue this command:

#ifconfig gre0

and you should have an output similar to this:

gre0 Link encap:UNSPEC HWaddr 00-00-05-08-60-FC-00-00-00-00-00-00-00-00
inet addr:10.190.19.19 Mask:255.255.255.252
UP RUNNING NOARP MTU:1476 Metric:1
RX packets:14168479 errors:0 dropped:0 overruns:0 frame:0
TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1707102933 (1.5 GiB) TX bytes:3611 (3.5 KiB)

Here, our Linux settings are finished.

Configuring the Router

Configuring the router is a straightforward job. Do the following:

router1#conf t
router1(config)#ip wccp web-cache
router1(config)#int f0
router1(config-if)#ip wccp web-cache redirect in
router1(config-if)#exit
router1(config)#exit

Here, router configuration is finished.

Cache is Working

A simple way to find out whether cache redirection is working is to issue this command on the router:

router1#show ip wccp

The output should be similar to this:

Global WCCP information:
Router information:
Router Identifier: xxx.xxx.xxx.xxx
Protocol Version: 2.0
Service Identifier: web-cache
Number of Cache Engines: 1
Number of routers: 1
Total Packets Redirected: 2967084
Redirect access-list: -none-
Total Packets Denied Redirect: 0
Total Packets Unassigned: 22
Group access-list: -none-
Total Messages Denied to Group: 0
Total Authentication failures: 0

Where xxx.xxx.xxx.xxx is your router IP address. You will also notice the following in the router console output if you stop Squid (#/etc/init.d/squid stop):

.Dec 16 2008 12:54:17: %WCCP-1-CACHELOST: Web Cache ccc.ccc.ccc.ccc lost

And when you start Squid (#/etc/init.d/squid start):

.Dec 16 2008 12:55:14: %WCCP-5-CACHEFOUND: Web Cache ccc.ccc.ccc.ccc acquired

Where ccc.ccc.ccc.ccc is your Squid IP address.
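If the show ip wccp output is saved to a file (for example, copied from the terminal session), the redirect counter can be pulled out and compared between two runs. A minimal sketch, assuming the output has the "Total Packets Redirected:" line shown earlier; wccp_redirected is a made-up name.

```shell
#!/bin/sh
# Read "show ip wccp" output on stdin and print the redirected-packet count.
wccp_redirected() {
    awk '/Total Packets Redirected:/ { print $NF }'
}

# Example using lines from the output shown earlier:
printf 'Number of routers: 1\nTotal Packets Redirected: 2967084\n' | wccp_redirected
# prints 2967084
```

A count that keeps rising while clients browse suggests that the router is redirecting traffic to the cache.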
Now browse the Internet for a while, then issue this command on the Linux box:

#tail /var/log/squid/access.log

If the output shows your PC's IP address and some of the sites you visited, your cache and router redirection are working perfectly.
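Rather than reading the tail by eye, the entries for one PC can be counted. This sketch assumes Squid's native access.log format, in which the client address is the third field; client_hits is a made-up name and the address in the usage line is a placeholder.

```shell
#!/bin/sh
# Count access.log entries that came from one client address.
# Assumes the native Squid log format (client address in field 3).
# Usage: client_hits CLIENT_IP LOGFILE
client_hits() {
    awk -v ip="$1" '$3 == ip { n++ } END { print n + 0 }' "$2"
}

# Typical call (log path as used in this post):
#   client_hits 192.168.1.10 /var/log/squid/access.log
```

A non-zero count for your PC means its requests are passing through Squid.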
