Chapter 6: Filtering

Previous: Chapter 5

Book Review: Qmail Quickstarter

Next: Chapter 7

Chapter 6 covers filtering email. It explains that, thanks to qmail's modular architecture, such filtering is usually easiest to do at the interfaces normally used between cooperating (if not always mutually trusting) qmail components.

The illustration of the basic qmail architecture, on the first page (Page 85), is enhanced in a subsequent illustration that shows where and how filtering typically is done via use of wrappers around various qmail components. The filters are classified by the author as "Connection Decisions", "Content Gateways and Modifiers", and "Content Modifiers", which are thoroughly elaborated upon. Popular qmail add-ons that provide for flexible filtering at certain points, such as Inter7's simscan, Qscanq, qmail-qfilter, and qmail-scanner, are described.

Also covered are exchanging (usually sending) mail without a queue, using a setup called (by the qmail author) "mini-qmail", which involves the Quick Mail Queueing Protocol (QMQP); blocking viruses (via heavyweight or lightweight filtering); stopping spam (via sender validation such as SPF or DomainKeys); identifying spam (via lightweight or heavyweight techniques); quarantines and challenges; anti-spam classification mistakes; and stopping outgoing spam.

The section on anti-spam classification mistakes is an excellent (one-page) summary of the dilemmas of anti-spam systems, and it avoids the "easy-fix" mentality often found in discussions about how to stop spam.

In addition to describing recipient validation, the author gives several good reasons why validating recipients is insufficient to completely avoid generating bounces. (Such bounces, also known as "outscatter" or "backscatter", are a widely recognized problem with typical qmail installations when they are subject to heavy spam or virus attacks.)

Here, it might have been helpful to explain that the very attributes of qmail that make it inherently secure and flexible also make it much harder to ensure that delivery to all endpoints (such as user mboxes or Maildirs, but also all forwarding and piping to programs) will succeed once the incoming message is accepted by the SMTP server.

Further, these problems are inherent to the SMTP protocol, due to its push-then-bounce-on-failure model. Other models (including IM2000's "pull-and-unpin-when-not-needed" model) would not necessarily suffer from these particular problems.

Well into "Basic Filtering Architecture", on Page 88, there appear two sample invocations of rblsmtpd via tcpserver command lines, but they are in a different coding style than corresponding examples (of tcpserver usage) earlier in the book (such as on Page 81).

At the end of that section, on Page 89, the paragraph beginning "This telescoping, cascading, or chaining behavior..." could mention that this technique is, I believe, also referred to as "Bernstein chaining", crediting the author of qmail for using it effectively for security as well as efficiency reasons.
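The chaining idea can be illustrated with a minimal sketch of a single link in such a chain. This is not rblsmtpd's implementation: the in-memory `BLOCKED_IPS` set, the function names, and the return values are assumptions made for illustration, and rblsmtpd consults DNS-based lists rather than a local set. Only the `TCPREMOTEIP` environment variable (set by tcpserver) and the pattern of running the remainder of the command line come from how these tools are documented to work.

```python
import os

# Hypothetical blocklist for illustration; rblsmtpd consults DNS-based lists instead.
BLOCKED_IPS = {"192.0.2.1"}


def should_block(remote_ip, blocked=BLOCKED_IPS):
    """Return True when this link in the chain should refuse the client."""
    return remote_ip in blocked


def run_link(argv, environ):
    """Do this link's one job, then replace this process with the next program.

    argv[0] is this wrapper's own name; argv[1:] is the rest of the chain.
    An integer is returned only when the chain stops at this link.
    """
    remote_ip = environ.get("TCPREMOTEIP", "")  # set by tcpserver for each connection
    if should_block(remote_ip):
        return 1  # stop here: programs later in the chain never run
    if len(argv) < 2:
        return 2  # nothing left to chain to (arbitrary status for this sketch)
    os.execvpe(argv[1], argv[1:], environ)  # does not return on success
```

A wrapper built this way would sit on a tcpserver command line between tcpserver and qmail-smtpd, in the position rblsmtpd occupies in the book's examples. Because each link replaces itself with the next program, a link that is done with its work leaves no process or privileges behind.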

At the end of the "SPF" section, on Page 95, perhaps point out the arbitrary number of queues that might be involved, by changing "before it is bounced." to "before it is bounced, with virtually no limit on the number of queues through which a message passes (though usually no more than 10 different queues are involved)".

Under "Lightweight" (methods for identifying spam) on Page 98, it is perhaps worth pointing out that SMTP clients (and corresponding servers) sitting behind Network Address Translation (NAT) routers are just one example of the challenging problem of "mixed-blessing" sources of email: sources that send some or even mostly spam, but also send (or might send) legitimate email, so that outright blocking of all email based on the source IP address risks blocking legitimate email as well.

Under "Quarantines and Challenges" on Page 101, the second sentence should include the word "typically" between "These types of filters" and "fall into two categories", because a third type of filter, which, for lack of a better term, I call Abeyance, is possible — indeed, greylisting (normally) is a special case of putting email in abeyance.

Under "Bounce-Back Spam" on Page 103, I prefer the term "outscatter" (or, to hyphenate consistently with the author, "out-scatter") over "blow-back" or "back-scatter", because "outscatter" does not imply the recipient of the "scatter" had anything to do with initiating whatever chain of events led to generation of the "scatter". Indeed, in most cases, "outscatter" is sent not to the initiator, just to the alleged initiator — to whoever owns the email address forged in the "From:" header of the original email, or typed into the web form, etc. So it doesn't necessarily go "back" to whoever "started it".

Under "Recipient Validation is Insufficient" on Page 105, in the second bullet item, beginning "The recipient forwards the message elsewhere", add that another reason a bounce might be generated is that the server might successfully forward the message to another server that subsequently bounces it.

In the paragraph after that list of bullets, clarify that bounce-back spam can never be fully eliminated in SMTP — other protocols exist or can be created that eliminate bounce-back spam by eliminating bounces.

Nits

Well into "Basic Filtering Architecture", on Page 89, in the paragraph beginning "tcpserver knows nothing...", in the second-to-last sentence that reads "rblsmtpd, behaves similarly", delete that spurious comma.

Under "Lightweight Filtering" on Page 92, after the parenthetical URL for Russ Nelson's patch, a comma is preferable.

In the next paragraph, at the top of Page 93, insert a comma after "executable files" and before "so Russ...".

Under "Mistakes" on Page 102, in the last sentence at the end of the second paragraph (the first after the numbered list of questions), move "only" to after "makes sense".

In the next two paragraphs, "On the other hand" is used twice, making it a bit difficult to follow.

Under "Recipient Validation is Insufficient", on Page 106, in the final paragraph of that section, insert "the" between "in" and "future", and either hyphenate the full adjective string at the end (to read "with much-less-time-consuming methods") or, probably better, reword to reduce the number of adjectives.

Under "Summary" on Page 106, hyphenate "mailing list" before "support" (adjective string).

Email Abeyance

When put in abeyance, an email is not normally in the state of having been accepted by the incoming SMTP server (although an upstream server might have accepted it prior to forwarding it along to the next hop). Instead, the server has returned a temporary rejection response to whoever sent (or forwarded) the message, but still keeps at least some information on the message as is done by quarantine and challenge filters.

This normally results in a legitimate sender retrying the delivery at a later point in time, which might be sufficient to trigger successful transition of the email from abeyance to delivery and thereby cause the incoming SMTP server to return an acceptance response to the sender. (This is what typically happens when the sender retries delivery after an initial attempt results in greylisting of the email, though many greylisting implementations concern themselves solely with envelope and sender-IP information, not message content.)

Whether by sender retries, or as a result of human (or some other) intervention (which is required by quarantine-based and challenge-based filtering), a message can successfully transition from abeyance to delivery. Or, absent appropriate stimuli over a period of time, it can transition to permanent failure, which is reported to the sender if and when it next tries to transmit the (same) message.
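The transitions described above can be made concrete with a small sketch. Since I know of no existing implementation, everything here is hypothetical: the class and method names, the two time thresholds, and the choice of reply codes 451 (temporary rejection), 250 (acceptance), and 550 (permanent failure) are all assumptions. The sketch treats a sufficiently delayed retry as enough to release a message, as greylisting typically does; a real filter could require more.

```python
from dataclasses import dataclass

MIN_RETRY_DELAY = 300        # seconds before a retry counts as persistence (assumed)
MAX_ABEYANCE_AGE = 4 * 3600  # seconds before an unreleased message fails for good (assumed)


@dataclass
class Entry:
    first_seen: float
    state: str = "abeyance"  # one of "abeyance", "released", "failed"


class AbeyanceTable:
    def __init__(self):
        self.entries = {}

    def approve(self, key):
        """Human (or other) intervention releases a message held in abeyance."""
        entry = self.entries.get(key)
        if entry is not None and entry.state == "abeyance":
            entry.state = "released"

    def on_attempt(self, key, now):
        """Return the SMTP reply code for a delivery attempt seen at time `now`."""
        entry = self.entries.get(key)
        if entry is None:
            self.entries[key] = Entry(first_seen=now)
            return 451  # temporary rejection: the message enters abeyance
        if entry.state == "abeyance":
            age = now - entry.first_seen
            if age > MAX_ABEYANCE_AGE:
                entry.state = "failed"    # no suitable stimulus arrived in time
            elif age >= MIN_RETRY_DELAY:
                entry.state = "released"  # the sender persisted long enough
        if entry.state == "released":
            return 250
        if entry.state == "failed":
            return 550  # reported only when the sender next tries
        return 451
```

Note that the server never originates a message of its own in any of these paths; every outcome is conveyed as a reply within the sender's own SMTP conversation.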

As with greylisting, abeyance can cause increased bandwidth and other resource utilization between a legitimate (or legitimate-acting) client and a server that doesn't already trust the client. That's mainly because the SMTP protocol currently requires the entire message to be transmitted for the server to be able to (mostly) reliably determine which particular message is being (re-)delivered, as the message envelope (sent in distinct stages in the SMTP conversation prior to the email message itself) does not currently provide for transmitting a unique message identifier.
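Absent a unique identifier in the envelope, one way a server might recognize a retried message is to hash the envelope together with the full message data. The sketch below is only an illustration of that idea: the function name is hypothetical, SHA-256 is an arbitrary choice, and it assumes the client resends byte-identical data on each retry, which is why the whole message must be received before the key can be computed.

```python
import hashlib


def message_key(mail_from, rcpt_to, data):
    """Derive a stable identifier from the envelope and the full message bytes."""
    h = hashlib.sha256()
    for part in (mail_from.encode("utf-8"), rcpt_to.encode("utf-8"), data):
        h.update(len(part).to_bytes(8, "big"))  # length prefix keeps field boundaries unambiguous
        h.update(part)
    return h.hexdigest()
```

A key derived this way could index whatever information the server keeps about a message in abeyance.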

Abeyance has significant advantages over quarantine-based and challenge-based filters, however:

Senders who do not persist in retrying delivery attempts have less impact on the filtering system, because no human intervention is required of either the receiving system (with quarantines, somebody has to check into quarantined messages) or the alleged sender (with challenges, somebody has to receive Challenges and decide whether to respond to them, potentially on a case-by-case basis).

(Almost all senders who do not retry delivery attempts are spammers and virus transmitters, because they send their payloads to huge numbers of recipients with little or no concern regarding whether any particular recipient acknowledges successful receipt of any particular payload.)

Because a message is not accepted by the SMTP server until the abeyance mechanism reasonably believes the sender and message are legitimate, generating bounces to the alleged sender after acceptance is rarely, if ever, required.

Abeyance does not depend on issuing Challenges of any sort (as the sender-retry mechanism is inherent to the SMTP protocol and its proper implementation by the client), so there's none of the significant additional burden of sending Challenges and then being prepared for Responses (including validating them).

Many error conditions discovered while a message is in abeyance can be reported to the sender via the SMTP protocol when the sender retries delivery, rather than reported to the alleged sender via a distinct bounce message.

Indeed, if the SMTP protocol supported it, replies to messages could be returned via this mechanism, thus potentially eliminating concerns about sender-address forgery in many situations, while explicitly allowing for sending of anonymous emails (to willing recipients) and receiving replies to them.

Abeyance can theoretically be used on successive SMTP servers as a message "hops" from one to the next.

While the first server that uses abeyance-based filtering can't prevent the upstream SMTP client from responding to a rejected message by sending a bounce to the alleged sender, it can itself avoid sending bounces as a result of attempting to forward a message (still locally in abeyance) to servers that themselves use abeyance-based filtering. Only when a server accepts (or permanently rejects) the message is that final resolution (and explanation, if any) fed back through the chain of SMTP servers.

This isn't ideally implementable within SMTP as it currently stands (for example, a sender could receive multiple rejection notices from different servers as a result of sending a single message), but it is feasible with modest extensions. By contrast, quarantine-based and challenge-based filters cannot simply and easily handle cases where a message they "accept" is to be forwarded to another email address or server, because that server might itself quarantine or challenge the message, resulting in significant duplication of effort.
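The chained case can be modeled in a few lines, purely as a thought experiment. The `Hop` class, its verdict strings, and the reply codes are all hypothetical, and the model ignores timing, queues, and the SMTP extensions that a real deployment would need. It shows only the central point: a forwarding server answers its client with whatever the downstream server decided, so no server in the chain accepts the message before its final fate is known.

```python
class Hop:
    """One SMTP server in a forwarding chain (hypothetical model)."""

    def __init__(self, name, verdict, next_hop=None):
        self.name = name
        self.verdict = verdict    # callable: message -> "accept", "reject", "hold", or "forward"
        self.next_hop = next_hop

    def attempt(self, message):
        """Return (reply_code, deciding_server) for one delivery attempt."""
        decision = self.verdict(message)
        if decision == "forward" and self.next_hop is not None:
            # Pass the downstream reply back unchanged instead of accepting locally.
            return self.next_hop.attempt(message)
        if decision == "accept":
            return 250, self.name
        if decision == "reject":
            return 550, self.name
        return 451, self.name  # still in abeyance at this hop
```

For instance, a relay whose verdict is "forward" keeps answering 451 for as long as the final server holds the message, and answers 250 or 550 only once the final server has decided.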

Though I'm unaware of any implementations of abeyance-based filters for SMTP, I believe they might be quite useful for SMTP in at least some situations. Conceptually, they could be very useful for many client/server protocols whenever the server cannot necessarily trust that a client is sufficiently committed to injecting content into (or requesting some type of action by) that particular server, versus just trying millions or billions of servers as part of a spam, scam, or virus-infection operation.
