Skip on down to the menu.
redir.pl -- The Squid Proxy Redirector
What is it?
This simple script will read one or more blacklist and/or whitelist of domain names and use that data to direct the Squid proxy server to deny/accept a requested resource based on whether the domain name is in a blacklist/whitelist.
Why did I write this? Are there not methods already built into Squid to do this?
Yes, Squid can do some of this out of the box, but it is really slow. This method is quite fast as well as very flexible since you can modify it yourself to meet your own needs.
More details are available in the source comments.
The Code
#!c:/Perl/bin/perl.exe
# redir.pl
#
#------------------------------------------------------------------------------#
# Copyright 2005, 2009 Tom Sneddon
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#------------------------------------------------------------------------------#
#
# A simple redirector for the Squid proxy server written in Perl.
#
# Compares the domain names of url requests to a blacklist.
#
# If the domain is on the blacklist, then substitute a special blacklist
# page or a special blacklist image or script. If the domain name is not
# in the blacklist then echo it back to Squid unchanged.
#
# If you require all Internet traffic to go through Squid, then you can
# also filter external images that are called by HTML formatted email
# messages. Url's that end in specified extensions are replaced by a special
# blacklist image. Not a fancy or foolproof filter, but one that works
# for >90% of all images.
#
# A similar method is used to substitute a known, safe, file for any
# url's that appear to be scripts. You can specify your own list of script
# file extensions. The file to be substituted can be anything you like.
# I prefer to use a blank text file. That way I don't have to bother
# detecting what kind of script or how it is inserted into the markup.
# It breaks the page, but hey, that's the whole point anyway.
#
# The blacklist text file should be formatted so as to have one domain name
# or IP address per line.
#
# Here's how you integrate this program into Squid:
#
# TAG: redirect_program
# Specify the location of the executable for the URL redirector.
# Since they can perform almost any function there isn't one included.
# See the FAQ (section 15) for information on how to write one.
# By default, a redirector is not used.
#
#This is an example for a Windows server (*nix/Linux is similar):
# redirect_program C:/perl/bin/perl.exe C:/squid/libexec/redir.pl
#
# TAG: redirect_children
# The number of redirector processes to spawn. If you start
# too few Squid will have to wait for them to process a backlog of
# URLs, slowing it down. If you start too many they will use RAM
# and other system resources.
#
#Adjust as needed for server load (check log files):
# redirect_children 10
#
#
# This program is many times faster than using the acl regex method in Squid.
#
# What is unique about this program is that the blacklist domains are loaded
# as hash keys. Hopefully looking up hash keys is faster than any optimized
# search routine that I could have come up with. I haven't tested it, but I'm
# sure that it's faster than using regular expressions also.
#
#
# 2005-01-12
# Tom Sneddon
# Initial release.
#
# 2005 --> 2008
# Tom Sneddon
# Many changes as needed over the years.
#
# 2008-10-03
# Tom Sneddon
# Released to the public.
#
# 2009-07-06
# Tom Sneddon
# Changed the substitution character for domain names from "-" to "/".
# Now "domain-name.com" and "domain.name.com" will not result in the same
# hash key.
# Look at entire domain name, not just 2nd level domain.
# Added option to log list of domain keys to a file for inspection/debug.
#
$|=1; # Turn off I/O buffer
# The blacklist is a text file of domain names. There is one domain name per
# line in the file. When one of these domain names is requested, either the
# redirect page ($redir_page_url) or the redirect image ($redir_image_url) is
# returned instead. The site is blocked.
# You can get a good blacklist at
# http://www.malwaredomains.com/files/domains.txt
my %blacklist; # Make a hash to store blacklist
my $blacklist_file1 = 'c:/squid/blacklist/domains.txt';
my $blacklist_file2 = 'c:/squid/blacklist/blacklist.txt';
# Path/Filename of blacklist(s)
my $ignore_comment_lines = 1; # ignore lines beginning with "#"
my $log_domain_hash_keys = 1; # Log domain hash keys to a text file.
my $hash_key_log_suffix = '_hashkeys.txt'; # Filename suffix for hashkey log files.
# The whitelist is a text file of domain names. There is one domain name per
# line in the file. When one of these domain names is requested, the requested
# url is returned regardless of the status of the url in the blacklist.
# The site is never blocked.
my %whitelist;
my $whitelist_file1 = 'c:/squid/whitelist/whitelist.txt';
# Path/Filename of whitelist
# The redirect page is a html page that indicates to the user that the requested
# domain name has been blocked.
my $url; # A scalar to store a url
my $redir_page_url = 'http://your.domain.name/blacklisted.html';
# Page to substitute for
# blacklisted sites
# The redirect image is an image that replaces a blocked image. It indicates
# to the user that the requested image has been blocked.
my $redir_image_url = 'http://your.domain.name/images/blacklist-icon.gif';
# Image to substitute for
# blacklisted images
# The redirect script is a file that replaces a blocked script.
# I use a blank text file. It is will break pages for sure,
# but then that's the point isn't it?
my $redir_script_url = 'http://your.domain.name/scripts/blacklist_blank.txt';
# Script to substitute for
# blacklisted scripts
# $domain is a variable that is used to store the currently requested domain
# name. The domain name is extracted from the requested url.
my $domain; # A scalar to store a domain name
# The input to this program is a stream of url's. These url's are the requests
# that are made to the squid proxy server.
my @input; # Input to this program
################################################################################
sub get_domain {
my $domain = shift; # Get the url
$domain = lc($domain); # Force to lower case
$domain =~ s!^.+?://!!; # Remove the protocol
$domain =~ s!/.*$!!; # Remove the path
$domain =~ s!:.*$!!; # Remove port number
# unless ($domain =~ m!\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}!) {
# $domain =~ s!.*\.(.+?\..+?)$!$1!; # Give me the 2nd level domain
# # if not numeric
}
$domain =~ s!\.!/!g; # Replace "." with "/" in domain
return $domain; # Return this altered form of
# domain name or ip
}
################################################################################
sub load_list {
my $list = shift; # pointer to the list hash
my $list_file = shift; # path and filename of the list file
my $domains_file = $list_file . $hash_key_log_suffix;
if ($log_domain_hash_keys) { # are we logging hash keys?
open(OUTFILE, ">$domains_file"); # open hash key log file
}
open(INFILE, $list_file); # or die is pointless.
# Where would we write the output to?
while (<INFILE>) { # assigns each line in turn to $_
my $domain = $_; # read one line of the list
if ($ignore_comment_lines) { # are we ignoring comments?
$domain =~ s/^#.*//; # remove comment line
}
$domain =~ s/^\s*(\S+).*/$1/; # look at the next non-whitespace blob of text
$domain = lc($domain); # force to lower case
# unless ($domain =~ m!\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}!) {
# $domain =~ m!(.+?\..+?)$!; # Give me the 2nd level domain if not numeric
# }
$domain =~ s!\.!/!g; # replace '.' with '/'
$domain =~ s/\n//; # remove newline character
if ($domain ne "") { # is this domain name a null string?
$list->{$domain} = 1; # any scalar will work here
if ($log_domain_hash_keys) { # are we logging hash keys?
print OUTFILE "$domain\n"; # print this domain name to hash key log
}
}
}
close INFILE; # we're done with the file now
if ($log_domain_hash_keys) { # are we logging hash keys?
close OUTFILE; # close hash key log file
}
}
################################################################################
# Load the whitelist and blacklist into memory.
# If there are more than one whitelist/blacklist files, the routines should be
# called once for each file. The list will be appended to each time the
# load_list routine is called.
load_list(\%whitelist, $whitelist_file1);# Load up the whitelist hash
# with whitelist1
load_list(\%blacklist, $blacklist_file1);# Load up the blacklist hash
# with blacklist1
load_list(\%blacklist, $blacklist_file2);# Load up the blacklist hash
# with blacklist2
while (<>) { # Do this forever...
@input = split; # Get some input
$url = @input[0]; # Assign the $url from input string
$domain = get_domain($url); # Determine the domain for this request
# Is it in the whitelist?
if ($whitelist{$domain}) {
print "$url\n"; # Echo the original url if in whitelist
}
# Is it in the blacklist?
elsif ($blacklist{$domain}) {
# If so, then...
# Is the request for an image?
# (Add additional image types here if desired.)
if ($url =~ m/\.(jpg|jpeg|gif|png|bmp|mng|xbm|ico|wmf)$/i) {
# If so then send the blacklist image.
print "$redir_image_url\n"; # Send the redirect image
}
# Is the request for a script?
# (Add additional script types here if desired.)
elsif ($url =~ m/\.(js|htc|hta|pac|vb)$/i) {
# If so then substitue our file for the script.
print "$redir_script_url\n"; # Send the redirect script
}
else {
# Send the blacklist page.
print "$redir_page_url\n"; # Send the redirect url
}
}
# # Banned file types.
# # Currently only .wmf files.
# elsif ($url =~ m/\.wmf$/i) {
# # Send the blacklist image.
# print "$redir_image_url\n"; # Send the redirect image
# }
# If not, then get page.
else {
print "$url\n";
}
}