#1
Member (9 bit)
Join Date: Sep 2006
Posts: 393
How do SPAM filters actually work?
I have a couple of online email accounts, Yahoo and Gmail, and find myself getting very little SPAM at this point. Within Yahoo I consistently click on SPAM whenever a junk message comes in, and today I wondered:
If I haven't clicked on a message, how does a filter know an email is SPAM? Is it the number of recipients, a word or two in the body of the message, or what? Of course most filters have their idiosyncrasies, but are there general rules for how SPAM filters actually work?
#2
Come in Ray...
Join Date: Sep 2004
Posts: 1,668
Since pretty much all spam follows the same patterns, the most effective filters implement some sort of "learning" Bayesian algorithm.
http://en.wikipedia.org/wiki/Bayes_Theorem
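For anyone who wants the formula behind that link in spam terms: for a single word $w$, Bayes' theorem gives the probability that a message containing $w$ is spam ("ham" is a common label for legitimate mail). A naive Bayes filter additionally assumes, as a simplification, that words occur independently, so for a message with words $w_1, \dots, w_n$ it multiplies the per-word terms:

$$P(\text{spam} \mid w) = \frac{P(w \mid \text{spam})\,P(\text{spam})}{P(w \mid \text{spam})\,P(\text{spam}) + P(w \mid \text{ham})\,P(\text{ham})}$$

$$P(\text{spam} \mid w_1, \dots, w_n) \propto P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})$$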
#3
Member (9 bit)
Join Date: Sep 2006
Posts: 393
Thanks much, but I was looking for a more understandable answer than Bayes' theorem. The sort of thing on the site referenced above is rather tough to interpret.
I sincerely appreciate the effort, however... marginal probabilities of stochastic events, standardized likelihood, formulas up the wazoo, etc., etc. If there were an English version I could understand it, since I speak a version of same.... I think this is saying that most SPAMMERS use the same tools, so anti-spam vendors build those rules into their software. What this does not answer is whether the text matters for filtering SPAM, or the number of recipients, or some combination of words.... My goal is simply to reduce SPAM to a minimum before a long trip during which I will go 3-5 days without checking my email.
#4
Come in Ray...
Join Date: Sep 2004
Posts: 1,668
In plain English, a Bayesian algorithm basically builds a dictionary of "good" and "bad" words, which is used to calculate a rating. The two dictionaries are built by classifying messages as spam or not spam (about 50 messages is a good starting size); the words from each classified message are stored in the respective dictionary. Then, when a message comes in, its text is scanned against the good and bad dictionaries and, depending on the rating, it is classified as spam or not spam.
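To make the "good and bad dictionaries" idea concrete, here is a minimal naive Bayes sketch in Python. The class and method names, the "ham" label, the add-one smoothing and the 0.9 cutoff are all illustrative assumptions; real filters (SpamBayes, for instance) use more refined tokenizing and scoring.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer: lowercase runs of letters, digits, $ and '.
    return re.findall(r"[a-z0-9$']+", text.lower())


class BayesFilter:
    """Toy naive Bayes filter keeping one word-count dictionary per class."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Classifying a message ("spam" or "ham") adds its words to that dictionary.
        self.words[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab_size = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_msgs = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: share of training messages with this label (add-one smoothed).
            score = math.log((self.messages[label] + 1) / (total_msgs + 2))
            total_words = sum(self.words[label].values())
            for word in tokenize(text):
                # Add-one (Laplace) smoothing; the extra +1 in the denominator
                # reserves probability mass for words never seen in training.
                count = self.words[label][word]
                score += math.log((count + 1) / (total_words + vocab_size + 1))
            log_score[label] = score
        # P(spam | words) = 1 / (1 + exp(log_ham - log_spam)), clamped to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))

    def is_spam(self, text, threshold=0.9):
        # The 0.9 cutoff is an arbitrary illustrative choice.
        return self.spam_probability(text) >= threshold


if __name__ == "__main__":
    f = BayesFilter()
    f.train("Cheap meds, buy now, limited offer", "spam")
    f.train("WIN money now, click here", "spam")
    f.train("Meeting moved to Tuesday, see agenda", "ham")
    f.train("Can you review the trip itinerary?", "ham")
    print(f.is_spam("click here to buy cheap meds"))  # True
    print(f.is_spam("see the agenda for Tuesday"))    # False
```

Working in log space keeps long messages from underflowing to zero, and the smoothing keeps a single never-seen word from vetoing the whole message. With only four training messages the numbers are just a demonstration; as the post says, a few dozen classified messages per side is a more realistic start.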
#5
Member (10 bit)
Join Date: Mar 2006
Location: Toronto, Canada
Posts: 810
These may or may not be used in conjunction with filters that block inline images, since spammers have tried to use images to get around filtering based on words.
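As a rough illustration of why images matter, here is a minimal sketch of one possible image-spam heuristic using only Python's standard `email` package: count the image parts and the words of text, and flag mail that carries images but almost no text. The function names and the 20-word threshold are hypothetical; this is not how Yahoo or Gmail actually handle images.

```python
from email import message_from_string


def image_text_profile(msg):
    """Count image parts and whitespace-separated words in the text parts."""
    images, words = 0, 0
    for part in msg.walk():  # walk() yields the message itself and every sub-part
        ctype = part.get_content_type()
        if ctype.startswith("image/"):
            images += 1
        elif ctype.startswith("text/"):
            payload = part.get_payload(decode=True) or b""
            try:
                text = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
            except LookupError:  # unknown charset name in the header
                text = payload.decode("utf-8", errors="replace")
            # Rough measure: HTML tags in text/html parts are counted as words too.
            words += len(text.split())
    return images, words


def looks_like_image_spam(raw_message, min_words=20):
    """Hypothetical heuristic: flag mail that carries images but almost no text."""
    images, words = image_text_profile(message_from_string(raw_message))
    return images > 0 and words < min_words
```

A check like this would only be one signal among many; a newsletter with a logo and a short note would trip it, which is exactly the kind of false positive the later posts complain about.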
#6
Staff
Premium Member
Join Date: Jul 1999
Location: Arlington, TN
Posts: 5,538
There is not just one spam filter. Several products I have used in the past had 12-14 different methods of filtering. The two worst were Bayesian and keyword filtering, since they yielded the most false positives. It is pretty much a process. First you generally look up the sender's IP address against various blacklists; if it is on one, the message gets deleted. Then you might check whether the sending mail server is listed on an open-relay list. Then you might use a SURBL list, which detects spam via the links in the email: if a link goes to a known spammer, the message is deleted. Then you have other filters, like SPF, that detect forged sender addresses. Of course, if you are large enough, any spam that reaches a user account can be identified as spam and the rules changed or the sender blacklisted.
Despite all of these methods, when you are on the defense all the time you have the harder job, since you have to keep tweaking your filters.
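The first steps of that process can be sketched in Python as an ordered chain of cheap checks that stops at the first hit. DNS blacklists conventionally work by reversing the sender's IPv4 octets and looking the result up under the list's zone, typically answering with a 127.0.0.x address when the IP is listed. The zone name `dnsbl.example`, the function names, and the local set of bad link domains are hypothetical stand-ins (real URI blocklists such as SURBL are usually queried over DNS as well), and the relay-list and SPF checks are not modeled.

```python
import ipaddress
import re
import socket
from urllib.parse import urlparse


def dnsbl_query_name(ip, zone):
    """DNSBL convention: reverse the IPv4 octets and append the list's DNS zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def dns_listed(name):
    """A name that resolves means 'listed'; a failed lookup is treated as 'not listed'."""
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False


# Rough URL matcher; brackets are excluded so urlparse never sees a malformed IPv6 host.
URL_RE = re.compile(r"https?://[^\s\"'<>\[\]]+", re.IGNORECASE)


def link_domains(body):
    """Lowercased host names of every http(s) link in the message body."""
    hosts = (urlparse(url).hostname for url in URL_RE.findall(body))
    return {host for host in hosts if host}


def screen(sender_ip, body, ip_zones, bad_link_domains, lookup=dns_listed):
    """Run the cheap reputation checks in order and stop at the first hit."""
    for zone in ip_zones:  # 1. sender IP against each IP blacklist
        if lookup(dnsbl_query_name(sender_ip, zone)):
            return "reject: sender IP listed on " + zone
    bad = link_domains(body) & bad_link_domains  # 2. SURBL-style link check
    if bad:
        return "reject: links to " + ", ".join(sorted(bad))
    return "pass to content filters"  # e.g. keyword or Bayesian scoring next
```

The `lookup` parameter lets you swap in a fake resolver for testing; querying a real blacklist this way needs network access and is subject to that list's usage terms. Putting the reputation checks first means most junk is rejected before the more expensive (and more error-prone) content filters ever run.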
#7
Member (9 bit)
Join Date: Sep 2006
Posts: 393
Now THAT's a pretty darn comprehensive and simple explanation of the process. Thanks!
#8
Come in Ray...
Join Date: Sep 2004
Posts: 1,668
If you use Outlook (not Outlook Express), this is hands-down the best spam prevention program:
http://spambayes.sourceforge.net/ It uses a Bayesian algorithm (sorry mairv, I have to respectfully disagree about them being the worst), and I've been using it here at work for over 6 months; I can count the number of false positives on one hand.
#9
Staff
Premium Member
Join Date: Jul 1999
Location: Arlington, TN
Posts: 5,538