SpamAssassin: A practical guide to integration and configuration - Alistair McDonald - E-Book

Description

As a busy administrator, you know that spam is a major distraction in today's networks. The effects range from inappropriate content arriving in mailboxes to contact email addresses placed on a website being deluged with unsolicited mail, causing valid enquiries and sales leads to be lost and wasting employee time. The perception of the problem of spam is as big as the reality. In response to the growing problem of spam, a number of free and commercial applications and services have been developed to help network administrators and email users combat it. It's up to you to choose and then get the most out of an anti-spam solution. Free to use, flexible, and effective, SpamAssassin has become the most popular open source anti-spam application. Its unique combination of power and flexibility makes it the right choice. This book will help you set up and optimize SpamAssassin for your network.

Page count: 345

Year of publication: 2004

Table of Contents

SpamAssassin
Credits
About the Author
About the Reviewers
Introduction
What This Book Covers
What You Need for Using This Book
Conventions
Reader Feedback
Customer Support
Downloading the Example Code for the Book
Errata
Questions
1. Introducing Spam
Defining Spam
Definitions
The History of Spam
Spammers
The Costs of Spam
Costs to the Spammer
Costs to the Recipient
Spam and the Law
Summary
2. Spam and Anti-Spam Techniques
Spamming Techniques
Open Relay Exploitation
Collecting Email Addresses
Hiding Content
Statistical Filter Poisoning
Unique Email Generation
Trojanned Machines
Anti-Spam Techniques
Keyword Filters
Open Relay Blacklists (ORBLs)
ISP Complaints
Statistical Filters
Email Header Analysis
Non-Spam Content Tests
Whitelists
Email Content Databases
Sender Validation Systems
Sender Policy Framework (SPF)
Spam Filtering Services
Collect and Forward
Collect and Return
Send and Forward
Choosing an Anti-Spam Service Provider
ISP-Provided Services
Anti-Spam Tools
SpamAssassin
How SpamAssassin Works
Easy to Use
Techniques Used by SpamAssassin
Summary
3. Open Relays
Email Delivery
Open Relay Tests
Automated Open Relay Testers
Manual Open Relay Testing
MTA Configuration
Sendmail
Sendmail Versions 8.9 and Above
Sendmail Versions Below 8.9
Postfix
The mynetworks Configuration Directive
The relay_domains Configuration Directive
Exim
Exim Configuration Parameters
qmail
Summary
4. Protecting Email Addresses
Websites
Alternative Character Representations
JavaScript
Usenet
Trojan Software
Mailing Lists and Archives
Registration for Websites
Tracking Email Address Usage
Sendmail Plus Technique
Rogue Employees
Employees
Business Cards and Promotional Material
How Spammers Verify Email Addresses
Web Bugs
Summary
5. Detecting Spam
Content Tests
Header Tests
DNS-Based Blacklists
Statistical Tests
Message Recognition
URL Recognition
Examining Headers
Faked Headers
Reporting Spammers
Valid Bulk Email Delivery
Summary
6. Installing SpamAssassin
Building from Source
Prerequisites
Checking Current Configuration
Installing Perl
Installing CPAN
Testing for a C Compiler
Using CPAN
Installing by Hand
Resolving Build Failures
Packaged Distributions
RPM
Debian
Gentoo
Other Formats
Windows
Verifying the Installation
Upgrading
Uninstalling
Uninstalling from Source
Using CPANPLUS
Other Packages
Uninstalling on Windows
SpamAssassin Components
Executables
Perl Modules
Documentation
Summary
7. Configuration Files
Configuration Files
Standard Configuration
Site-Wide Configuration
User-Specific Configuration
Rule Files
Rules
Scores
Summary
8. Using SpamAssassin
SpamAssassin as a Daemon
Creating a User Account
SpamAssassin and Procmail
Testing for Procmail
Obtaining and Installing Procmail
Configuring Procmail
MTA Configuration
sendmail
Postfix
Exim
qmail
Configuring User Accounts
Site-Wide Procmail Usage
Integrating SpamAssassin into the MTA
Sendmail
Sendmail Milter Support
MIMEDefang
Postfix
Exim
qmail
Testing and Troubleshooting
Check the MTA
Further Diagnosis
Rejecting Spam
Summary
9. Bayesian Filtering
Scoring
Training
Confirming Operation
Filter Training
User Involvement
Local Users
Unlearning
Auto-learn Thresholds
Bayesian Database Files
Removing a Bayesian Database
Sharing a Bayesian Database
Disabling Bayesian Filtering
Summary
10. Look and Feel
Headers
Changing Headers
Creating Headers
Removing Headers
Reports
Enabling and Disabling Reports
Changing Reports
Subject Rewriting
Summary
11. Network Tests
RBLs
SURBLs
SpamAssassin 2.63
Vipul's Razor
Installing Razor
Configuring Razor
Configuring SpamAssassin
Testing Razor
Pyzor
Installing Pyzor
Configuring Pyzor
Configuring SpamAssassin
Testing Pyzor
Pyzor Headers
DCC
Installing DCC
Configuring SpamAssassin
Testing DCC
DCC Headers
Spamtraps
Choosing a Spamtrap Address
Baiting the Spamtrap
Configuring the Email Account
Summary
12. Rules
Writing Rules
Rules Performance
Meta Rules
Writing Positive Rules
Examples of Positive Rules
Rawbody Rules
Using a Corpus to Test Rules and Scoring
Corpus Development
The Public Corpus
Testing SpamAssassin on a Corpus
Examining Hit Frequencies
Using Other Rulesets
Summary
13. Improving Filtering
Whitelists and Blacklists
Manual Whitelisting and Blacklisting
Whitelisting Domains
The Auto-Whitelist
Resolving Incorrect Classifications
Examining Messages
Changing the Spam Threshold
Re-weighting Test Scores
Increasing the Score of Spam Emails
Coping with False Positives
Bayesian Unlearning and Relearning
Character Sets and Languages
Disallowing Languages
Disallowing Character Sets
Summary
14. Performance
Bottlenecks
Memory
CPUs
Disk I/O
Network I/O
Determining Bottlenecks
Performance Improvement Methodology
Using the SpamAssassin Daemon
Integrating SpamAssassin into the MTA
Omitting Messages
Large Messages
Disabling Tests
Running Network-Based Tests First
Razor, Pyzor, and DCC
Using Additional Machines
Faster File Locking
Using SQL
Requirements
MySQL
Configuration
Spamd with SQL
SQL for User Preferences
Adding New User Preferences
Displaying User Preferences
Altering User Preferences
Deleting User Preferences
Testing if SQL User Preferences Are Being Used
Preference Precedence
SQL for Bayesian Databases
Testing if the SQL Bayesian Database Is Being Used
The Auto-Whitelist Database
Testing if the SQL Auto-Whitelist Database Is Being Used
Summary
15. Housekeeping and Reporting
Separating Levels of Spam
Detecting When SpamAssassin Fails
Spam and Ham Reports
Spam Counter
Keeping Statistics Over a Period of Time
Determining SpamAssassin Processing Time
Summary
16. Building an Anti-Spam Gateway
Choosing a PC Platform
Choosing a Linux Distribution
Installing Linux
Configuring Postfix
Accepting Email for the Domain
Mail for the root User
Basic Spam Filtering with Postfix
Forwarding Email to the Original Email Server
Reloading Postfix
Testing Postfix
Installing Amavisd-new
Installation from Package
Installing Prerequisites
Installing from Source
Creating a User Account for Amavisd-new
Configuring Amavisd-new
Configuring Postfix to Run Amavisd-new
Configuring External Services
Firewall Configuration
Backups
Testing
Going Live
Summary
17. Email Clients
General Configuration Rules
Microsoft Outlook
Microsoft Outlook Express
Mozilla Thunderbird
Qualcomm Eudora
Summary
18. Choosing Other Spam Tools
Spam Policies
Evaluating Spam Filters
Configuring the Second Filter
Using a Single Machine
Using Separate Machines
Sendmail
Postfix
Exim
qmail
Other Techniques
Greylisting
SPF
Sender Validation
Summary
A. Glossary
Index

SpamAssassin

Alistair McDonald

SpamAssassin

A Practical Guide to Configuration, Customization, and Integration

Copyright © 2004 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, Packt Publishing, nor its dealers or distributors will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First edition: September 2004

Published by Packt Publishing Ltd. 32 Lincoln Road Olton Birmingham, B27 6PA, UK.

ISBN 1-904811-12-4

www.packtpub.com

Cover Design by www.visionwt.com

Credits

Author

Alistair McDonald

Additional Material

Chris Santerre

Technical Reviewers

Kevin Peuhkurinen

Chris Santerre

Commissioning Editor

Louay Fatoohi

Technical Editors*

Deepa Aswani

Ashutosh Pande

Layout*

Ashutosh Pande

Indexers*

Niranjan Jahagirdar

Ashutosh Pande

Proofreader

Chris Smith

Cover Designer

Helen Wood

* Services Provided by Editorialindia.com

About the Author

Alistair McDonald is the founder and Managing Director of InRevo Ltd, an IT consultancy based in Berkshire, UK. He worked for several large corporations before founding InRevo in 1994. The company offers security, email, and other IT consultancy, as well as bespoke development.

Alistair is a developer specializing in C++ and Perl. When first introduced to Perl, he described it as "a whole new level of flexibility". Alistair got involved with the role of email administrator for one of InRevo’s clients, and subsequently honed his skills setting up servers for InRevo.

Alistair lists his favorite open-source projects as GNU Emacs, the Linux kernel, the Gentoo Linux distribution, the Perl language, SpamAssassin, and Postfix. He is also a big fan of xplanet and xscreensaver for eye candy.

Alistair is very much a family man, and enjoys spending time with his wife and two children in and around Berkshire, where they have lived for the past ten years.

I can recall getting my first spam email. This was in the mid nineties, when CompuServe provided Internet email addresses for the first time. I had heard of spam, but not experienced it. Strangely, that first spam made me feel that I’d come of age in the Internet, but the second, third and fourth spams soon made me realize what an inconvenience spam was. Back then, I did not realize how much spam would affect the Internet, and how much effort would be put into solving it. I guarded future email addresses until I started using SpamAssassin.

I hope that this book assists fellow system administrators to install and configure SpamAssassin. It really is a great solution to spam and takes very little time to set up.

Writing this book has not been a solo effort, and several people deserve special mentions.

First, my wife Louise, who has put in many long nights critically examining drafts and improving my use of English, while single-handedly bringing up two very lively children. Despite her attempts to eradicate all commas from the text, one or two may remain.

Several friends and colleagues have commented on draft chapters and contributed ideas and inspiration. I would like to take this opportunity to thank them publicly for their efforts. They are: Paul Serjeant, Ian Haycox, Colin Jenkins, and Jamie O’Shaughnessey.

During the writing of this book, I had the misfortune to spend much of my time away from home. This was made bearable as much of the time was spent with my parents. I’d like to thank them for looking after me so well, and I’d also like to apologize for being such an appalling and antisocial house guest at times.

Of course, there are many more people to thank. All the SpamAssassin developers, past and present, should be congratulated for creating such an effective tool. Their work is based on the many developers of the Perl language, another great free software project. Hats off to all of you for your hard work and ingenuity.

Finally, a big thank you to the Trade Router team, for all their inspirational comments. Keep having five a day!

I wrote this book on a Dell laptop running Gentoo Linux. I used VMware to install a total of seven different virtual machines for testing: four separate Gentoo configurations for Sendmail, Postfix, Exim, and Qmail, a Windows 2000 installation, a RedHat 9 installation, and a Debian installation, installed from the wonderful Knoppix CD.

This book is dedicated to my children, Imogen and Keir— So lively during the day, and so peaceful at night.

About the Reviewers

Kevin Peuhkurinen lives in rural Ontario and works as a network security analyst for a financial institution in Toronto, where his incessant Open Source evangelism often annoys his co-workers. When not fighting spam he likes to ride large motorcycles and go lure coursing with his two Irish Wolfhounds. He can be reached on the SpamAssassin-users mailing list and is always happy to help out others.

Chris Santerre is a System Administrator working in Providence, Rhode Island. He started the SpamAssassin Rule Emporium (SARE) at www.rulesemporium.com, which hosts custom rulesets for SpamAssassin. He created a ruleset called BigEvil that looks for known spammer URLs in an email. He is also a content provider for www.SURBL.org. Chris continues to work with the SARE ‘ninjas’ to update SARE rules for SpamAssassin, and keep it as fresh as possible. He also encourages everyone to go see a live professional ice hockey game!

Introduction

SpamAssassin is an open-source spam detector. It is considered the best of breed, and is used by many large organizations and also as the basis for commercial services and products.

SpamAssassin is free to download, install, and use, and is very customizable, configurable, and scalable to large architectures. It can be installed in one afternoon, but rewards further time spent on improving the detection rate.

This book provides a complete guide to the installation, configuration, and customization of SpamAssassin. It also discusses the history of Spam and the various techniques used to combat it. It includes detailed instructions for the most popular Mail Transport Agents (MTAs): Sendmail, Postfix, Exim, and Qmail. It also includes details on installing SpamAssassin on Windows, and adding a separate spam filter to an existing email infrastructure, such as Microsoft Exchange.

Most spam detection systems use only one or two methods of detecting spam. SpamAssassin uses many, and is extensible, allowing users to develop their own rules to identify spam. New techniques to identify spam, such as Sender Policy Framework (SPF) can be added to SpamAssassin by developing them as modules. Users or System Administrators can configure almost every aspect of SpamAssassin, leading to exceptional success rates in detecting spam.

SpamAssassin is Open Source, which means that the program code is freely available for others to examine and modify. SpamAssassin is developed, documented, and supported by a team of volunteers who give their time freely.

What This Book Covers

This book has three main areas or sections. The first section discusses spam, spammers, and anti-spam techniques. The second section discusses SpamAssassin basics, including obtaining, installing, and configuring SpamAssassin. The final section describes techniques to improve the spam detection of SpamAssassin, and to improve the performance of a SpamAssassin installation.

Chapter 1 introduces spam and provides some definitions of terms used in this book. Chapter 2 discusses various spam detection techniques used by spam detection engines and the techniques developed by spammers to subvert them.

Chapter 3 discusses open relays, historically the source of much spam, and includes information on how to check that an existing email server cannot be abused by spammers. It also describes how to rectify an MTA that is acting as an open relay. Chapter 4 describes how spammers collect email addresses and provides solutions to publish email addresses on websites without making them targets for spam. Chapter 5 discusses the mechanics of detecting spam.

Chapter 6 gives detailed instructions on how to install SpamAssassin on Unix, Linux, and Windows platforms, including obtaining and installing any prerequisite packages that SpamAssassin requires.

Chapter 7 provides a brief run through the SpamAssassin configuration files, and provides a foundation for the remaining chapters. Chapter 8 discusses how to integrate SpamAssassin with the MTA, or invoke it using procmail. A variety of strategies are discussed, to suit the needs of different organizations.

Chapter 9 covers the use of SpamAssassin’s Bayesian filter, a tool that learns from spam emails and can improve detection rates dramatically.

SpamAssassin is incredibly flexible, and Chapter 10 discusses how SpamAssassin can alter emails to mark them as spam. Chapter 11 covers adding external Network Tests which utilize databases of known spam emails to improve spam detection rates.

Chapter 12 provides a description of SpamAssassin’s rules, and describes how rules can be written, tested, and scored.

Chapter 13 covers methods to improve the detection rate of SpamAssassin, including whitelists and blacklists.

Chapter 14 describes how to improve the performance of a SpamAssassin installation.

Chapter 15 describes some useful reports and utilities that an administrator can use to streamline the running of a SpamAssassin installation.

Chapter 16 has a complete description of how to create a spam filtering gateway—this covers installing Linux and SpamAssassin, and configuring them all to filter email and forward the non-spam (or ‘ham’) to the existing email server.

Chapter 17 describes how to configure several major email clients to filter email based on the tags that SpamAssassin places in emails.

Finally, Chapter 18 discusses the advantages, disadvantages, and options available when adding an additional spam filter to an existing SpamAssassin installation.

What You Need for Using This Book

SpamAssassin and all the tools it uses are available for download from the Internet. Perl, the main prerequisite, is included in all major Linux distributions and available for most Unix-like operating systems. It can be downloaded from http://www.perl.org/get.html. The Perl CPAN module is normally used to install SpamAssassin; all that is required is an Internet connection.

This book covers integrating with four of the most popular MTAs—Sendmail, Postfix, Exim, and Qmail. MTA integration is only a small part of this book, and most of this book will be relevant no matter which MTA is in use. SpamAssassin can be integrated with most MTAs.

Reader Feedback

Feedback from our readers is always welcome. Let us know what you think about this book, what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply drop an e-mail to <[email protected]>, making sure to mention the book title in the subject of your message.

If there is a book that you need and would like to see us publish, please send us a note in the Suggest a title form on www.packtpub.com or e-mail <[email protected]>.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer Support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the Example Code for the Book

Visit http://www.packtpub.com/support, and select this book from the list of titles to download any example code or extra resources for this book. The code files available for download will then be displayed.

Note

The downloadable files contain instructions on how to use them.

Errata

Although we have taken every care to ensure the accuracy of our contents, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in text or code—we would be grateful if you would report this to us. By doing this you can save other readers from frustration, and also help to improve subsequent versions of this book.

If you find any errata, report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the Submit Errata link, and entering the details of your errata. Once your errata have been verified, your submission will be accepted and the errata added to the list of existing errata. The existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Questions

You can contact us at <[email protected]> if you are having a problem with some aspect of the book, and we will do our best to address it.

Chapter 1. Introducing Spam

Spam is an often-used term, but as with many terms, it means different things to different people. This chapter defines the term 'spam' as used in this book and reviews its history. By examining the economics and costs involved with spam, we will explain why spam has become so invasive to modern computing. Finally, we will describe the current legal position against spam.

Defining Spam

Spam, in computing terms, means something unwanted. It has normally been used to refer to unwanted email or Usenet messages, and it is now also being used to refer to unwanted Instant Messenger (IM) and telephone Short Message Service (SMS) messages. Spam email is unwanted, uninvited, and inevitably promotes something for sale. Often the terms junk email, Unsolicited Bulk Email (UBE), or Unsolicited Commercial Email (UCE) are used to refer to spam email. Spam generally promotes Internet-based sales, but it occasionally promotes telephone-based or other methods of sales.

People who specialize in sending spam are called spammers. Companies pay spammers to send emails on their behalf, and the spammers have developed a range of computerized tools and techniques to send these messages. Spammers also run their own online businesses and market them using spam email.

The term 'spam email' generally excludes email from known sources, regardless of how unwanted the content is. One example of this would be an endless list of jokes sent from acquaintances. Email viruses, trojan horses, and other malware (short for malicious software) are not normally categorized as spam either, although they share some common traits with spam. Emails that are not spam are often referred to as ham, particularly in the anti-spam community. Spam is subjective, and a message considered spam by one recipient may be welcomed by another.

Anti-spam tools can be partially effective in blocking malware; however, they are best at blocking spam. Special anti-virus software can and should be used to protect your inbox from other undesirable emails.

Definitions

The following definitions will be used throughout this book:

Spam: Unsolicited Commercial Email or UCE. It is any email that has not been requested and contains an advertisement of some kind.

Ham: The opposite of spam: email that is wanted.

False negative: A spam email message that was not detected successfully.

False positive: A ham email message that was wrongly detected as spam.
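The four definitions above combine into four possible outcomes for any filtered message. The following Python sketch is illustrative only and is not part of SpamAssassin; the function names and the labels "spam detected" and "ham delivered" are assumptions chosen for this example, while "false negative" and "false positive" follow the definitions given here.

```python
from collections import Counter


def classify_outcome(is_spam, flagged_as_spam):
    """Name the result of one filtering decision.

    is_spam: what the message really is.
    flagged_as_spam: what the filter decided.
    """
    if is_spam and flagged_as_spam:
        return "spam detected"      # illustrative label, not from the book
    if is_spam and not flagged_as_spam:
        return "false negative"     # spam that was not detected
    if not is_spam and flagged_as_spam:
        return "false positive"     # ham wrongly detected as spam
    return "ham delivered"          # illustrative label, not from the book


def tally_outcomes(decisions):
    """Count outcomes for an iterable of (is_spam, flagged_as_spam) pairs."""
    return Counter(classify_outcome(actual, flagged) for actual, flagged in decisions)


if __name__ == "__main__":
    # Hypothetical sample: two spam messages and three ham messages.
    sample = [(True, True), (True, False), (False, True), (False, False), (False, False)]
    for outcome, count in sorted(tally_outcomes(sample).items()):
        print(f"{outcome}: {count}")
```

A false positive is usually the more costly of the two errors, because a wanted message is hidden from its recipient, whereas a false negative only leaves one more spam in the inbox.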

The History of Spam

Here are some important dates in the development of the Internet:

1969: Two computers networked via a router

1971: First email sent using a rudimentary system

1979: Usenet (newsgroups) established

1990: The World Wide Web concept born

2004: The Internet is a major global network annually responsible for billions of dollars of commerce.

There is one omission from this time line:

1978: The first spam email was sent.

Spam has been part of the Internet from a relatively early stage in its development. The first spam email was sent on May 3rd, 1978, over the U.S. Government-funded Arpanet, as the network was then called. The first spammer was a DEC engineer called Gary Thuerk, who invited recipients of his email to attend a product presentation. The email drew an immediate response from the chief of the Arpanet, Major Raymond Czahor, who objected to the violation of the Arpanet's non-commercial policy.

Spam really took off in 1994 when an Arizona attorney, Laurence Canter, automated the posting of messages to many Internet newsgroups (Usenet) to advertise his firm's services. The resultant outcry from Usenet users included the coining of the term 'spam', when one respondent wrote "Send coconuts and cans of Spam to Cantor & Co.". This sparked the beginning of spam as it is now experienced.

Spam email has increased in volume as the Internet has developed. In April 2004, PC Magazine reported that 67% of all email is spam.

Spammers

Typically, spammers are paid to advertise particular websites, products, and companies, and are specialists in sending spam emails. There are several well-known spammers who are responsible for a large proportion of spam and have evaded legal action.

Individual managers of websites can send their own spams, but spammers have extensive mailing lists and superior tools to bypass spam filters and avoid detection. Spammers have a niche in today's marketing industry, and their clients capitalize on this.

Most spam emails are now sent from 'Trojanned' computers, as reported in a press release by broadband specialist Sandvine. The owners or users of trojanned computers have been tricked into running software that allows a spammer to send spam email from the computer without the knowledge of the user. The Trojan software often exploits security holes in the operating system, browser, or email client of a user. When a malicious website is visited, the Trojan software is installed on the computer. Unknown to users, their computer may become the source of thousands of spam emails a day.

Another related risk is from phishing, which occurs when a website appears to represent a bank or other financial provider, but is actually a fake and is used to collect login details of a victim. These details can then be used to perpetrate fraud. Phishing is often initiated via an email, with a web link to the fake site that is disguised to look like the real web address.

The Costs of Spam

Spam is very cheap to send. The costs are insignificant as compared to conventional marketing techniques, so marketing by spam is very cost-effective, despite very low rates of purchases in response. But it translates into major costs for the victim.

Costs to the Spammer

A report by Tom Geller, Executive Director of SpamCon Foundation, estimated that the cost to send a single spam email was as little as one thousandth of a cent, yet the cost to the recipient was around 10 cents.

The overheads in sending spam are low. The main costs are:

An Internet connection: There are lots of flat-rate Internet Service Providers (ISPs) offering packages at around $10/month. A spammer doesn't particularly need a Digital Subscriber Line (DSL) or cable modem service; a dial-up connection will also allow large quantities of spam to be sent. In fact, dial-up accounts are preferable, as spammer accounts are routinely shut down when complaints about spam are received. Dial-up accounts are easy to set up and can be activated within minutes, but DSL typically has a lead time of days.

Software: Specialist spam software is essential. A normal email client will restrict the number of spam messages that can be sent, and require the spammer to spend more time in front of the computer. Spammers usually write their own software, steal someone else's, or buy a package. A spammer with some technical knowledge and starting from scratch can have software ready after a week. To pay someone to develop that software would cost the spammer $1000.

A mailing list: Most spammers will build up their own list of email addresses. For beginners, it is possible to buy a CD with 6 million email addresses on it for around $50. Ironically, these CDs are marketed via spam email. Email addresses that are guaranteed to be currently active sell for larger sums.

A web server: This is an optional cost. It allows a spammer to deliver 'web bug' images to validate their mailing list. Web bugs are discussed further in later chapters. Basic web hosting packages cost less than $10 a month.

For less than $1100, plus monthly costs of less than $150, a spammer could have the software, Internet connection, and a supply of addresses required to be operational.

A single computer can send thousands of emails an hour using dial-up. Spam varies, but a typical message size might be around 6,000 bytes. On a fast dial-up of around 50Kbps, it would take one second to send this email to one recipient. It would take only a little longer to send it to 100 recipients. In other words, at least 3,600 emails can be sent in an hour. For smaller emails, the number sent per hour would be greater. Once the spammer has set up the software, they can leave the computer unattended and go and do something else; they need to invest 15 minutes of their time, and the software will continue to send spam for many hours. With three phone lines, they could work for a total of an hour, and send approximately 10,000 emails an hour or 200,000 a day.
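The throughput estimate above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are invented for this example, 1 Kbps is assumed to mean 1,000 bits per second, and protocol overhead, connection setup, and the savings from sending one message to several recipients are all ignored.

```python
BITS_PER_BYTE = 8
SECONDS_PER_HOUR = 3600


def seconds_per_message(message_bytes, link_bits_per_second):
    """Time to push one message through the link, ignoring all overhead."""
    return message_bytes * BITS_PER_BYTE / link_bits_per_second


def messages_per_hour(message_bytes, link_bits_per_second, lines=1):
    """Whole messages that fit into one hour across a number of phone lines.

    Integer arithmetic is used so the result is exact.
    """
    bits_available = lines * SECONDS_PER_HOUR * link_bits_per_second
    return bits_available // (message_bytes * BITS_PER_BYTE)


if __name__ == "__main__":
    size = 6000       # typical message size in bytes, from the text
    speed = 50_000    # fast dial-up, assumed to be 50,000 bits per second

    print(seconds_per_message(size, speed))          # 0.96
    print(messages_per_hour(size, speed))            # 3750
    print(messages_per_hour(size, speed, lines=3))   # 11250
    print(messages_per_hour(size, speed, lines=3) * 24)  # 270000
```

Under these assumptions a message takes just under one second, which is why the text rounds to "at least 3,600" per hour for a single line. The unrounded results for three lines come out somewhat higher than the approximate 10,000 an hour and 200,000 a day quoted above, so the chapter's figures are the more conservative ones.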

Costs to the Recipient

The European Union performed a study into UCE in 2001. In the findings, it estimated the cost of receiving spam borne by consumers and businesses to be around $8 billion. These costs are partly incurred through lost productivity or time, partly in direct costs, and partly in indirect costs incurred by suppliers, and passed on.

The cost of spam in a commercial environment is estimated to be as high as $600 to $1000 per year, per employee. For a 50-person company, this cost could be as high as $50,000 per year. Spam emails distract employees and take up their time, and they use disk space, processing power, and network bandwidth. Removing spam by hand is time consuming and laborious when there is a large amount of spam. In addition, there is a business risk, as genuine messages may be removed along with unwanted ones. Spam can also contain unsavory topics that some employees won't tolerate.
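The company-wide figure follows directly from the per-employee estimate. As a minimal sketch, assuming the cost scales linearly with headcount (the function name is invented for this example):

```python
def annual_spam_cost(employees, cost_per_employee):
    """Yearly cost of spam for a company, assuming a flat per-employee cost."""
    return employees * cost_per_employee


if __name__ == "__main__":
    headcount = 50
    low = annual_spam_cost(headcount, 600)     # lower per-employee estimate
    high = annual_spam_cost(headcount, 1000)   # upper per-employee estimate
    print(f"${low:,} to ${high:,} per year")   # $30,000 to $50,000 per year
```

For the 50-person company in the text, the range is $30,000 to $50,000 a year, with the upper bound matching the figure quoted above.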

Spam and the Law

In the USA, legislation proceedings on spam have been in progress since 1997. The latest legislation is the CAN-SPAM Act (Bill number S.877) of 2003. This supersedes many state laws and is currently being used to prosecute persistent spammers. However, it is not proving a deterrent; the Coalition Against Unsolicited Commercial Email (CAUCE) reported in June 2004 that despite several high-profile lawsuits by the Federal Trade Commission (FTC) and ISPs, spam volumes were still increasing. The CAN-SPAM Act is seen as weak on two counts: first, consumers have to explicitly opt out of commercial emails, and second, only ISPs can take action against spammers.

In Europe, legislation exists that makes spamming illegal. However, when Directive 2002/58/EC was passed in 2002, there were several problems with it. Business-to-business emails were excluded—a business could spam each and every account at any other business and stay within the law. Additionally, each individual member state has to pass its own laws and penalties for offenders. The law requires spammers to use opt-in emailing, where recipients have to explicitly request to receive commercial emails rather than the opt-out model proposed in the USA, where anyone can receive spam and has to request to be removed from mailing lists.

The Guardian, a UK newspaper, reported in June 2004 that gangs of spammers are moving their operations to the UK due to the leniency of the laws there. The maximum penalty they face in the UK is £5000, while in Italy spammers face up to three years of imprisonment. Until June 2004, no one had been convicted under this act in the UK.

In Australia, The Spam Act 2003 came into effect in April 2004. This makes spam illegal, using an opt-in model. Additionally, there have already been successful prosecutions for spamming in Australia using previous laws.

The Internet is a multinational network, and domestic legislation cannot reach into another country. A US-based spammer would be at risk of prosecution for spamming US citizens and advertising a product made and sold in the US. But a spammer from the Far East would be at very little risk of prosecution. Domestic legislation will not affect the volume of spam, but it may occasionally affect the types of products advertised via spam.

Spammers will often reroute spam via other nations, so spam is sent from the US to another country and then relayed back to the US again. This makes it more difficult to trace the source of the email and to prosecute the sender. Many countries have no anti-spam laws, so there is little or no risk to a spammer operating there. The blurring of geographical boundaries by the Internet does little to aid in tracing spam email to its source. Anti-spam efforts are now moving towards tracking spammers through other means. In May 2004, the New York Times reported that the Direct Marketing Association is successfully using paper trails in the real world to trace spammers in the virtual world.

Summary

Apart from defining the term spam, this chapter has shown that there are true financial rewards for spammers, and that the costs of marketing via spam email are very low for the spammer. The costs borne by the recipients of the email, however, are considerable.

Spam has been a part of the Internet for a long time, even before the World Wide Web took root, and despite legislation, it will probably continue to be a problem in the future. Even if one country had a truly solid anti-spam law, the global nature of the Internet means that spam could still arrive from an overseas source.

As the use of email increases, both for business and personal use, and as the ratio of spam increases still further, it is important for companies to filter out spam to preserve efficient operations. SpamAssassin is a spam-tagging tool that can provide very effective filtering capabilities when configured correctly. This book will describe how to install, configure, and maintain SpamAssassin to provide an effective anti-spam solution.

Chapter 2. Spam and Anti-Spam Techniques

As spam increased in volume and became more of a problem, anti-spam techniques were developed to counteract it. Tools to block spam were developed by a group of professionals. These tools were not always automated, but when used by system administrators of large sites, they could successfully filter spam for a large number of users. In response, spammers evolved their techniques to increase the number of spams delivered by working around and through the filters. As spam filters improved, spammers designed other methods of bypassing the filters and the cycle repeated. This resulted in the development of both spam and anti-spam techniques and tools over a number of years. This evolutionary process continues today.

Anti-spam tools use a wide range of techniques to reduce the volume of spam received by a user. A number of these techniques will be described in this chapter. SpamAssassin is an important Open Source tool that we will examine in the light of the various techniques it uses to filter spam.

Spamming Techniques

Spammers have developed a complex arsenal of techniques for spamming. Important spamming techniques are described in the following sections.

Open Relay Exploitation

An open relay is a mail server that allows any user to send email through it. Spammers use such computers to send spam without the email being traced to its true origin. Open relays are discussed in detail in Chapter 3.

Collecting Email Addresses

Spammers have to collect email addresses in order to send spam. They use a variety of methods, from harvesting email addresses from websites and Internet newsgroups to simply guessing them. Email address collection is discussed in detail in Chapter 4.

Hiding Content

Most people can detect spam from the email subject or sender. It is often easy to discard spam emails without even looking at the body. One technique used by spammers is to hide the true content of their emails. Often, the subject of an email is a simple "Hi"; alternatively, an email might appear to be a reply to a previous email, for example "Re: tonight". Other tricks that spammers use include using random names either for the sender or within the email subject. Spammers can also make an email look important, for example, by alluding to missed credit card or loan payments, or to work-related subjects.

As spam filters block obvious spam words, such as "Viagra", spammers deliberately include misspelled words that are less likely to be filtered out; for example, "Viagra" might become "V1agra" or "V-iaggr@". Although the human mind can easily translate the meaning of such misspellings unconsciously, a computer program that looks only for the exact word will not associate these variants with spam.
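The effect of such misspellings on a simple filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list and the function name contains_spam_word are invented for this example and do not come from any real anti-spam tool.

```python
import re

# Hypothetical list of blocked words, chosen only for this illustration.
SPAM_WORDS = {"viagra"}


def contains_spam_word(subject: str) -> bool:
    """Return True if any whole word in the subject is on the blocked list."""
    # Split the lower-cased subject into runs of letters, digits, '@' and '-'.
    words = re.findall(r"[a-z0-9@-]+", subject.lower())
    return any(word in SPAM_WORDS for word in words)


print(contains_spam_word("Cheap Viagra today"))    # True
print(contains_spam_word("Cheap V1agra today"))    # False
print(contains_spam_word("Cheap V-iaggr@ today"))  # False
```

The exact spelling is caught, while both disguised spellings pass through unnoticed, which is why filters that rely on exact words alone are easy to evade.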

Statistical Filter Poisoning

Statistical filter poisoning involves including many random words within an email to confuse a statistical filter. Statistical filters are described in the Anti-Spam Techniques section in this chapter.

Unique Email Generation

To combat email content databases, which store the content of known spam emails doing the rounds, spammers generate unique emails. To confuse the email content database, the spammer only needs to change one random word in the main body of the email. One popular technique is to use the recipient's name within the body of an email.

Trojanned Machines

Spammers are limited by the speed of their Internet connection, be it DSL or dial-up. They are also directly traceable through ISP records. A recent trend among spammers is to use PC virus technology to infect innocent users' computers with virus-like programs. These programs send spam from the innocent parties' PCs. Such an infection is commonly called a Trojan, after the story of the Greeks invading the city of Troy by surreptitious means.

The computers are infected by either emails or websites that target vulnerabilities in email clients or web browsers. Users may be unaware that their computers are being used to send spam, and this can carry on for months before the breach is detected and the computer taken offline and repaired or rebuilt.

The throughput of 100 computers could be at least 10 million emails a day, and this figure could be much greater if the spammer infects computers on a DSL line. Trojan software can also harvest further email addresses for the spammer's database from the address books stored on the trojanned computer.

Anti-Spam Techniques

As the techniques to deliver spam have become more sophisticated, so have the techniques to detect and filter spam from legitimate email. The main techniques are described in the following sections. These techniques can be used on the email server by a system administrator, or an anti-spam service can be purchased from an external vendor.

Keyword Filters

Keyword filters are based upon common words or phrases in an email body, for example "buy", "last chance", and "Viagra". SpamAssassin includes a variety of keyword filters and allows easy addition of new rules.

Open Relay Blacklists (ORBLs)

Open relay blacklists (ORBLs) are lists of open relays that have been reported and added to these blacklists after being tested. Anti-spam tools can query open relay blacklists and filter out emails originating from these sources. SpamAssassin can integrate with several open relay blacklists.
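Many such blacklists are published through DNS. By convention, the octets of the sending server's IPv4 address are reversed and the name of the blacklist zone is appended to form the name that is looked up. The following Python sketch builds that name only; the zone relays.example.org and the function name dnsbl_query_name are placeholders invented for this illustration, not a real blacklist or a SpamAssassin function.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a DNS-based blacklist."""
    # Validate the address; malformed input raises ValueError.
    address = ipaddress.IPv4Address(ip)
    # Reverse the four octets, then append the blacklist zone.
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.99 is a documentation address; the zone is hypothetical.
print(dnsbl_query_name("192.0.2.99", "relays.example.org"))
# 99.2.0.192.relays.example.org
```

An anti-spam tool would then resolve this name. If the lookup returns an address, the server is listed; if the name does not exist, it is not.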

ISP Complaints

It has always been possible to complain to an ISP about a spammer. Some ISPs take complaints seriously, give a single warning, and after another complaint, they terminate the account of the offender. Other ISPs take a less active approach to spam that will rarely stop a spammer. Spammers naturally gravitate towards ISPs that are lenient with spammers.

ISP complaints remain a manually managed technique, due to the effort that might be wasted if an automatic report is wrong and the email reported is not spam. The website http://www.spamcop.net can examine an email, determine where ISP reports should be directed, and send appropriate messages of complaint to the corresponding ISPs.

Statistical Filters

Statistical filters are those that learn common words in both spam and ham. Subsequently, the data collected is used to examine emails and determine whether they are spam or ham. These filters are often based on the mathematical theory called Bayesian analysis. Statistical filters need to be trained by passing both ham and spam emails through them, enabling the filter to learn the difference between the two. Ideally, a statistical filter should be trained regularly, and some anti-spam tools allow statistical filters to be trained automatically.

SpamAssassin includes a Bayesian filter, along with utilities to train it. SpamAssassin's Bayesian filter can also be configured to automatically learn from incoming spam and ham email.
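The idea of training and then classifying can be illustrated with a toy word-frequency filter in Python. This is a minimal sketch of a naive Bayes classifier with add-one smoothing, not SpamAssassin's actual implementation; the class name TinyBayesFilter and the training messages are invented for the example.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy word-frequency filter. Train with both spam and ham before use."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record the words of one message under the label 'spam' or 'ham'."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate the probability that the text is spam."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: the fraction of training messages with this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in vocabulary:
                    continue  # words never seen in training carry no evidence
                # Add-one smoothing keeps unseen word/label pairs above zero.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            log_score[label] = score
        # Turn the two log scores into a probability of spam.
        return 1 / (1 + math.exp(log_score["ham"] - log_score["spam"]))


spam_filter = TinyBayesFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("last chance buy now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday", "ham")

print(spam_filter.spam_probability("buy pills now") > 0.5)          # True
print(spam_filter.spam_probability("monday meeting agenda") > 0.5)  # False
```

With only four training messages the estimates are crude; a real filter needs a large and regularly refreshed body of spam and ham before its probabilities become useful.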

Email Header Analysis

The software that spammers use often generates unusual headers in the emails produced. Anti-spam tools can detect these unusual headers and use them to separate spam from ham. SpamAssassin includes many email header tests.
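As a rough illustration of a header test, the Python sketch below uses the standard email module to report headers that ordinary mail software normally adds but that are absent from a message. The choice of headers and the function name unusual_header_findings are assumptions made for this example; they are not taken from SpamAssassin's own tests.

```python
from email import message_from_string
from typing import List


def unusual_header_findings(raw_message: str) -> List[str]:
    """Return a list of findings about headers that look unusual."""
    message = message_from_string(raw_message)
    findings = []
    # Hypothetical checks: headers that normal mail clients usually add.
    for header in ("Date", "Message-ID"):
        if message[header] is None:
            findings.append(f"missing {header} header")
    return findings


raw = (
    "From: someone@example.com\n"
    "To: you@example.org\n"
    "Subject: Hi\n"
    "\n"
    "Body text\n"
)
print(unusual_header_findings(raw))
# ['missing Date header', 'missing Message-ID header']
```

A single finding of this kind rarely proves that a message is spam, so such tests are best treated as one piece of evidence among several.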

Non-Spam Content Tests

Ham emails can inadvertently trigger some anti-spam tests. For example, many emails are legitimately but unfortunately routed through a blacklisted open relay. Non-spam content tests indicate that an email is not spam. They are usually created specifically for an individual or organization.

Non-spam content tests are rarely shared in public, as they are specific to an industry or company. They should also not get into the hands of spammers, who could use the information to their advantage.

SpamAssassin allows users to create rules that will subtract from the score of an email if certain content is received. An email administrator might add negative rules for the names of products sold by the company or for industry-related jargon.
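The way positive and negative rules combine into a single score can be sketched in Python. Everything specific here is an assumption made for illustration: the rule patterns, the scores, the threshold, and the product name WidgetMaster 3000 are invented and are not SpamAssassin defaults.

```python
import re

# Hypothetical rules as (pattern, score) pairs. Positive scores suggest spam;
# negative scores suggest legitimate mail for this imaginary company.
RULES = [
    (re.compile(r"\blast chance\b", re.IGNORECASE), 2.5),
    (re.compile(r"\bbuy\b", re.IGNORECASE), 1.0),
    (re.compile(r"\bWidgetMaster 3000\b", re.IGNORECASE), -3.0),  # invented product name
]
THRESHOLD = 3.0  # assumed cut-off, chosen only for this example


def score_message(body: str) -> float:
    """Add up the scores of every rule whose pattern appears in the body."""
    return sum(score for pattern, score in RULES if pattern.search(body))


def is_spam(body: str) -> bool:
    """Treat the message as spam once its total score reaches the threshold."""
    return score_message(body) >= THRESHOLD


print(score_message("Last chance to buy!"))                                        # 3.5
print(score_message("Last chance to buy the WidgetMaster 3000 at the old price"))  # 0.5
```

The second message triggers the same two spam rules as the first, but the negative rule for the company's own product pulls its score back below the threshold.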

Whitelists

Whitelists are the opposite of blacklists—lists of email senders who are trusted to send ham and not spam. Email from someone listed on a whitelist will normally not be marked as spam, no matter what the content of their email.

SpamAssassin allows system administrators or users to create a whitelist for users that send content that may be like spam; for example, mailing lists that discuss spam. SpamAssassin also allows the use of a blacklist. It creates auto-whitelists and blacklists, based on previous emails received from trusted and non-trusted senders.
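A manually maintained whitelist and blacklist amount to a lookup on the sender address that is made before any content tests. The Python sketch below shows the idea only; the addresses and the function name list_verdict are invented for this example, and it does not attempt to model SpamAssassin's automatic lists.

```python
from email.utils import parseaddr

# Hypothetical lists, for illustration only.
WHITELIST = {"announce@lists.example.org"}
BLACKLIST = {"offers@bulk.example.net"}


def list_verdict(from_header: str) -> str:
    """Classify a message by its sender alone: 'ham', 'spam', or 'unknown'."""
    # parseaddr splits "Name <address>" into its two parts.
    _, address = parseaddr(from_header)
    address = address.lower()
    if address in WHITELIST:
        return "ham"
    if address in BLACKLIST:
        return "spam"
    return "unknown"  # fall through to the content-based tests


print(list_verdict("Spam Talk <Announce@lists.example.org>"))  # ham
print(list_verdict("offers@bulk.example.net"))                  # spam
print(list_verdict("Bob <bob@example.com>"))                    # unknown
```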

Email Content Databases

Email content databases store the content of spam emails. These work because the same spam email will often be sent to hundreds or thousands of recipients. Email content databases store these emails and compare the content of new emails to that contained in the database. A single person reporting a spam email to such a database will assist all other users of the service.

SpamAssassin can integrate with several email content databases automatically.
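The principle of an email content database can be sketched in Python by storing a digest of each reported message and comparing new messages against the stored digests. This is a simplified illustration using an exact SHA-256 digest; the function names and the in-memory set are assumptions made for this example and do not describe how any particular service works.

```python
import hashlib

# In-memory stand-in for a shared database of reported spam.
known_spam = set()


def fingerprint(body: str) -> str:
    """Digest of the body after lower-casing and collapsing whitespace."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def report_spam(body: str) -> None:
    """Record a message that a user has reported as spam."""
    known_spam.add(fingerprint(body))


def is_known_spam(body: str) -> bool:
    """Check whether an identical message has already been reported."""
    return fingerprint(body) in known_spam


report_spam("Buy cheap pills  NOW")
print(is_known_spam("buy cheap pills now"))              # True
print(is_known_spam("Dear Alice, buy cheap pills now"))  # False
```

The second lookup fails because adding the recipient's name changes the digest, which is exactly the weakness exploited by the unique email generation technique described earlier. A practical database therefore needs a comparison that tolerates small changes rather than an exact digest.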

Sender Validation Systems

A slightly different approach to spam is taken by sender validation systems. In these systems, when an email is received from an unknown source, the source is sent a challenge email. If a valid response is received to such an email, then the sender is added to a whitelist, the original email is delivered to the recipient, and the sender is never sent a challenge again.