The never-ending fight against SPAM emails

Many of the websites I build offer visitors a way to contact the website owner directly. I have implemented such contact forms in many variations: as a simple form, as an appointment booking system, or as a callback request. The great advantage of a contact function is that visitors can book an appointment, send an inquiry, or ask a question with one click.

Unfortunately, contact forms are very susceptible to spam. Besides real visitors, automated programs seek out contact forms and fill them with advertising or other unwanted content. For website operators, it is particularly annoying to have to sort through their inbox regularly to separate the spam from real messages.

Collecting SPAM on purpose

Since I no longer want to sort spam out of my mailbox, I decided to conduct a small study on my website to find effective countermeasures against SPAM.

Over an observation period of three months, I tracked all SPAM messages I received. I created a duplicate of my contact form page so that I had one contact form as a baseline and one I could experiment with. During these three months, I received over 60 spam messages. To my surprise, some of them were nicely written and quite elaborately designed. Most, however, contained rather dubious offers and obviously malicious links.

While setting up this experiment, I made two interesting observations. First, after two weeks online, the duplicate contact form received the same SPAM messages as the original form. This suggests that an automated crawler targets every contact form on my website. Second, the SPAM arrived at a constant rate, which implies that a recurring routine sends it to my website.

The Countermeasures

To combat SPAM, I tested six countermeasures, aiming for the best balance between usability and effectiveness in filtering unwanted emails. Unfortunately, the two often conflict: the stronger the spam protection, the more hurdles normal visitors have to overcome to send a message. The following image shows the baseline contact form first and then six variants of it, each adding some form of spam protection:

  • Default Form: My normal contact form, which has no protection against spam.
  • Visibility Honeypot: This method adds an extra field to the contact form. The field exists in the page's code but is hidden with the CSS property visibility:hidden when the website is displayed. SPAM programs that analyze the code of a website therefore see the field, whereas real visitors viewing the website do not. If the hidden field is filled in, the website owner can be sure that the message is SPAM and not a real request. The great advantage of this method is that it does not limit the usability of the contact form at all, since the form looks unchanged from a real visitor's point of view.
  • Z-Index Honeypot: This method uses the same principle as the previous honeypot. However, the additional field is not hidden with visibility:hidden but pushed behind another field using the CSS property z-index:-1. Intelligent SPAM algorithms check whether a field is hidden with visibility:hidden and leave such fields empty. Checking whether a field is hidden behind another field of the contact form, however, is much more complex, which makes the z-index honeypot a good alternative to the visibility honeypot.
  • Gestures: In this method, a complex gesture such as a swipe is added to the contact form. Such a gesture protects against SPAM because most programs cannot simulate complex movements. Unfortunately, gestures are also somewhat less user-friendly than the previous methods, as they are rather uncommon in desktop applications and can thus be misunderstood.
  • Mathematical calculation: This method requires the visitor to solve a simple arithmetic problem, such as “What is 2 + 2?”. The more complex the calculation, the higher the spam protection. At the same time, user-friendliness decreases with increasing complexity, as the additional question becomes harder for the user to solve.
  • Knowledge question: In this method, the visitor has to answer a question whose answer is common knowledge or easy to look up. I chose the question “Who hosts the show: Who wants to be a millionaire?”. Even though such a question offers relatively high SPAM protection, it also demands considerable effort from real visitors. If they do not know the answer, they have to stop filling out the form and look it up first. Besides, I noticed that it is very difficult to formulate questions whose answer is both commonly known and unambiguous.
  • CAPTCHA: CAPTCHAs are small tasks that are easy for humans to solve but hard for computer programs. Google, for example, offers a CAPTCHA service that can easily be integrated into your own contact form. When a visitor views the contact form, this service automatically embeds distorted text or image puzzles in the form. The visitor then has to enter the displayed text or select the matching images. Since these CAPTCHA services are very advanced and widely tested, they offer a very high level of protection against SPAM. Unfortunately, image puzzles in particular can be ambiguous and thus frustrating for visitors, which significantly degrades the user experience.
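The server-side part of two of these countermeasures, the honeypot and the arithmetic question, fits in a few lines of Python. This is a minimal sketch under stated assumptions, not the code behind my own forms: the honeypot field name website, the answer field math_answer, and both helper functions are hypothetical, and the expected answer is assumed to be kept server-side (for example in the session) rather than sent along with the form. The check is identical for the visibility and the z-index honeypot; only the CSS that hides the field differs.

```python
import random

# Hypothetical name of the hidden honeypot input (visibility or z-index variant).
HONEYPOT_FIELD = "website"


def make_math_challenge(rng: random.Random) -> tuple[str, int]:
    """Return a question like 'What is 3 + 4?' and its expected answer."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b


def spam_reasons(form: dict, expected_answer: int) -> list:
    """List why a submission looks like spam; an empty list means accept it.

    expected_answer must come from server-side storage (e.g. the session),
    never from a field in the submitted form.
    """
    reasons = []
    # Honeypot: real visitors never see this field, so any content means a bot.
    if form.get(HONEYPOT_FIELD, "").strip():
        reasons.append("honeypot filled")
    # Arithmetic challenge: the answer must parse as the exact integer.
    try:
        if int(form.get("math_answer", "").strip()) != expected_answer:
            reasons.append("wrong math answer")
    except ValueError:
        reasons.append("math answer not a number")
    return reasons


if __name__ == "__main__":
    question, answer = make_math_challenge(random.Random())
    print(question)
    human = {"message": "Hello", HONEYPOT_FIELD: "", "math_answer": str(answer)}
    bot = {"message": "Cheap offer", HONEYPOT_FIELD: "http://spam.example", "math_answer": "0"}
    print(spam_reasons(human, answer))  # []
    print(spam_reasons(bot, answer))    # ['honeypot filled', 'wrong math answer']
```

Returning a list of reasons instead of a single boolean makes it easy to log which trap caught a message, which is the kind of data the comparison below relies on.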

Results – June 2022

  • Default: Without spam protection, no SPAM is filtered out.
  • Visibility Honeypot: Every spam program targeting my website detected this honeypot and left it empty. As a result, all spam passed my check and every unwanted email ended up in my inbox.
  • Z-Index Honeypot: Unlike the previous honeypot, the Z-Index honeypot was not detected. All spam programs fell into the trap and filled in the hidden field. So my security check worked perfectly and all spam was detected and filtered.
  • Gesture: Adding a gesture also blocked all spam emails. Unfortunately, I found that many real visitors also had trouble performing the unfamiliar gesture.
  • Mathematical calculation: This result stunned me a bit. All other methods stopped either all spam or none of it. The mathematical calculation, however, stopped only some of the spam: 40 % of spam emails contained the correct result and got through. In my opinion, the most plausible explanation is that my website is targeted by several spam programs with different levels of sophistication. This would also explain the large differences in quality between the design and wording of the spam messages.
  • Knowledge question: No spam algorithm answered the knowledge question correctly, so all malicious emails were detected and filtered out. Unfortunately, real visitors also had problems submitting the contact form. For example, some visitors answered “Guenther Jauch” instead of “Günther Jauch” and were therefore rejected by my spam protection.
  • CAPTCHA: As expected, the advanced CAPTCHA filtered out all spam emails.
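Part of the knowledge question's usability problem is strict string comparison. A more forgiving check could ignore case and extra whitespace and accept the usual ASCII spellings of German umlauts, so that “Guenther Jauch”, “Gunther Jauch”, and “Günther Jauch” count as the same answer. The following Python sketch is a hypothetical helper that was not part of the experiment; keep in mind that every extra accepted spelling also slightly widens the target for bots.

```python
import unicodedata

# Common ASCII spellings of German umlauts (casefold() already turns ß into ss).
_TRANSLITERATION = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})


def _variants(text: str) -> set:
    """Return normalized spellings of an answer for tolerant comparison."""
    # Compose characters first, then ignore case and collapse whitespace.
    base = " ".join(unicodedata.normalize("NFC", text).casefold().split())
    # "günther" -> "guenther"
    transliterated = base.translate(_TRANSLITERATION)
    # "günther" -> "gunther": drop combining accent marks after decomposition.
    stripped = "".join(
        ch for ch in unicodedata.normalize("NFKD", base)
        if not unicodedata.combining(ch)
    )
    return {base, transliterated, stripped}


def answer_matches(given: str, expected: str) -> bool:
    """True if any normalized spelling of the visitor's answer matches."""
    return not _variants(given).isdisjoint(_variants(expected))


if __name__ == "__main__":
    print(answer_matches("Guenther Jauch", "Günther Jauch"))     # True
    print(answer_matches("  günther   JAUCH ", "Günther Jauch"))  # True
    print(answer_matches("Thomas Gottschalk", "Günther Jauch"))  # False
```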

In my situation, the z-index-based honeypot is definitely the best choice, as this method detects all spam while being invisible to real visitors. Since this method does not limit the user experience, I will include it by default in all my contact forms in the future.

Updated Results – November 2022

I have started receiving SPAM emails again. After some tests, I concluded that the z-index honeypot is no longer effective. Either the SPAM bots are now able to detect the honeypot, or my website is now targeted by different, more advanced SPAM software than before.

To avoid confusing visitors with unusual gestures or frustrating knowledge questions, I have implemented a CAPTCHA. I found a helpful tutorial on how to create a DSGVO-friendly (GDPR-compliant) CAPTCHA here. Since adding the CAPTCHA, spam emails have no longer been an issue.

Updated Results – January 2023

SPAM emails are unexpectedly getting through the recently implemented contact form with CAPTCHA. Although I believed CAPTCHA to be an effective safeguard, it appears that new and improved bots emerge every time I add a new protection method.

This is a testament to the ever-evolving field of cybersecurity. No system can be guaranteed to be permanently secure. Security is an ongoing arms race: as countermeasures become more sophisticated, attackers adapt and employ more sophisticated tactics to circumvent them. As a result, CAPTCHAs keep changing, and the current status quo of selecting images or swiping puzzle pieces may soon become obsolete.

So what now?

Implementing SPAM protection on my own is no longer feasible. This is unfortunate because it means relying on outside services, which adds dependencies and is not always straightforward from a data privacy standpoint. The most advanced protection for my contact forms is Google’s reCAPTCHA v3. Instead of presenting a puzzle, it analyzes user activity on the website to determine whether a visitor is a human or a bot, expressed as a score between 0.0 (very likely a bot) and 1.0 (very likely a human).
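For reference, the server-side step of reCAPTCHA v3 sends the token produced in the browser, together with the site's secret key, to Google's siteverify endpoint and receives JSON with fields such as success, score, and action. The sketch below uses only the Python standard library and is illustrative rather than production-ready: the secret key, the action name "contact", and the 0.5 threshold (the default Google suggests as a starting point) are assumptions to adapt to your own site.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Google's server-side verification endpoint for reCAPTCHA tokens.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def verify_token(secret: str, token: str, remote_ip: Optional[str] = None,
                 timeout: float = 5.0) -> dict:
    """POST the browser's token to siteverify and return the decoded JSON reply."""
    params = {"secret": secret, "response": token}
    if remote_ip:
        params["remoteip"] = remote_ip  # optional parameter
    data = urllib.parse.urlencode(params).encode("ascii")
    # Passing data= makes urlopen send a POST request.
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=timeout) as resp:
        return json.load(resp)


def is_probably_human(result: dict, expected_action: str,
                      threshold: float = 0.5) -> bool:
    """Accept only a successful check for the expected action with a high enough score.

    The 0.5 threshold is an assumption; tune it against real traffic.
    """
    return (
        result.get("success") is True
        and result.get("action") == expected_action
        and result.get("score", 0.0) >= threshold
    )
```

In a form handler, a call such as is_probably_human(verify_token(SECRET_KEY, token), "contact"), where SECRET_KEY and token are placeholders, would decide whether the email is sent. Checking the action name as well as the score guards against a token that was issued for a different form being replayed here.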

However, such technologies are vulnerable to advances in AI. Neural networks can easily be trained on data about human behavior, so AI can now mimic human behavior online to the point where it may be indistinguishable from the real thing. Similarly, AI-generated images and text are increasingly difficult to tell apart from their human-made counterparts. Thus, there is a pressing need for a new web standard that can reliably discern uniquely human traits using biometric data, such as fingerprint and retina scans, to effectively keep bots out of the online domain.