Cache Explosion Attacks


When a database is large enough that it depends on caching for acceptable performance in most cases, allowing an untrusted entity to trigger searches using arbitrary keys can invite an attack on the performance of the cache.

(Note that a database to which a new record is added as a result of a search for a nonexistent record can be viewed, in this context, as a cache for a larger, possibly conceptual, database. E.g. a database of known Internet domain names becomes a sort of cache for all potential domain names if a search for an arbitrary domain name can be triggered by an untrusted party and a new record added if an existing record is not found.)

The Internet's DNS is an example of such a database. It relies heavily on caching to perform properly. To the extent cache hit rates decline, the root servers are increasingly relied upon to respond to each query, potentially overloading the system.

An attacker can "blow up" a cache if it can formulate queries to that cache for nonexistent records by providing arbitrary keys. That is, it can overwhelm the cache's ability to retain useful information.
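A minimal simulation of the effect, assuming an LRU eviction policy (the text does not name one) and made-up sizes: a small set of popular keys fits comfortably in the cache until an attacker interleaves lookups on random keys. The class name, key formats, and parameters below are illustrative only.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache (an assumed eviction policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, key):
        """Return True on a cache hit; on a miss, insert the key and return False."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        # Stands in for the record (or "not found" result) fetched upstream.
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return False


def legit_hit_rate(attack_per_legit, rounds=20000, capacity=500, popular=300, seed=1):
    """Fraction of legitimate lookups served from the cache.

    attack_per_legit is the number of random-key lookups the attacker
    injects for every legitimate lookup.
    """
    rng = random.Random(seed)
    cache = LRUCache(capacity)
    hits = 0
    for _ in range(rounds):
        # Legitimate traffic: drawn from a small set of popular names.
        if cache.lookup("site%d.example" % rng.randrange(popular)):
            hits += 1
        # Attack traffic: arbitrary keys that will never be asked for again.
        for _ in range(attack_per_legit):
            cache.lookup("x%016x.invalid" % rng.getrandbits(64))
    return hits / rounds


if __name__ == "__main__":
    print("no attack:         %.3f" % legit_hit_rate(0))
    print("10 attack lookups: %.3f" % legit_hit_rate(10))
```

With these illustrative numbers the popular keys all fit in the cache when there is no attack, so nearly every legitimate lookup is a hit. Once random keys are mixed in, popular entries tend to be evicted before they are requested again, and the legitimate hit rate falls sharply.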

(In the case of DNS, "useful information" is that which helps users of the local system find resources, such as jcb-sc.com or google.com. In contrast, an incoming spam email with a URL to the spammer's site does not result in useful information — such as the spammer's site's domain name — being searched for in DNS, if the spam itself would not otherwise be followed up on by its recipients.)

Examples of such vulnerabilities include what might be called "backward lookups" on domain names, email addresses, and so on. If an SMTP server does any DNS queries on the domain name for the envelope sender on each incoming email, an attacker can blow up the DNS cache by injecting lots of email, each with an arbitrary (perhaps randomized) envelope sender domain name.

Similarly, automatic backward lookups on URLs in incoming email risk running afoul of this problem, since spammers can provide any number of arbitrary domain names in those URLs. This highlights a weakness in Paul Graham's "Filters that Fight Back" proposal.

(This attack is not so problematic when the keys are not entirely arbitrary. For example, reverse-IP lookups — finding the host name that is associated with an IP address — do not suffer from this weakness on quite the same scale, as long as the attacker cannot easily forge arbitrary IP addresses, especially IPv4 addresses.)
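The difference in scale can be made concrete with a little arithmetic. The sketch below assumes, conservatively, that the attacker varies only a single label built from lowercase letters and digits. Real domain names also allow hyphens, labels of up to 63 characters, and multiple labels, so the actual key space is far larger. The function names are illustrative.

```python
IPV4_KEYS = 2 ** 32  # every possible IPv4 address

ALPHABET_SIZE = 36   # a-z and 0-9 only (an assumed, conservative alphabet)


def label_keys(length):
    """Number of distinct single labels of exactly `length` characters."""
    return ALPHABET_SIZE ** length


def shortest_label_exceeding(limit):
    """Smallest label length whose key count exceeds `limit`."""
    length = 1
    while label_keys(length) <= limit:
        length += 1
    return length


if __name__ == "__main__":
    n = shortest_label_exceeding(IPV4_KEYS)
    print("IPv4 addresses:", IPV4_KEYS)
    print("labels of length %d: %d" % (n, label_keys(n)))
```

Under these assumptions a random label of seven characters already offers more keys than the whole IPv4 address space, and longer labels cost the attacker nothing extra.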

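Anti-forgery systems such as SPF and DomainKeys (DK) have this weakness, since checking a message means issuing DNS queries keyed on a domain name supplied by the sender.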

(To those who claim DNS is "strong enough" to support the overhead of not only having potentially every domain publish SPF and DK records, but having each and every email transaction involve multiple backward-looking queries on arbitrary keys to what would, in most cases, be a variety of nameservers, I ask this question: "Why not just use DNS to publish all the outgoing email sent by each domain?" Putting aside technical details, DNS is conceptually able to provide a record such as "message-id.MsgIDs.example.com" containing each outgoing email with the corresponding Message-ID. That seems like a superb way to handle distribution of submissions to mailing list subscribers, and similar approaches could be used to enable roaming users to efficiently query their mailboxes. If DNS isn't strong enough to cope with this kind of load, how do we know it'll cope with the loads placed on it by schemes like SPF and DK, once they are widely adopted?)

Generally, this form of attack reflects an asymmetry between the attacker and the target in the resources each invests in the pertinent transactions: the attacker need only forge an arbitrary key, while the target must search a (potentially large, possibly remote and therefore unreliable) database using that key.

Countermeasures can offset (but not eliminate) the problem. The best way to avoid this weakness is to "design it out" of the system in the first place (in the protocol, the architecture, or wherever else it arises), by avoiding inherent asymmetries in interactions between untrusted and trusted entities (usually that means between clients and servers, respectively).

In "real life", we do not perform "backward lookups" on the alleged source(s) of each and every communication we receive to assure ourselves that they are the authentic sources of those communications. That's because we realize that we do not have the resources, nor the need, to do so.

Instead, we first determine whether the communication inherently requires such authentication. Does it require some kind of action on our part? Does that action depend on the source of the communication being authenticated? And are the resources we'd expend on the authentication justified by the nature of the communication?

Communications protocols in hostile environments should be designed to avoid this pitfall, by leaving the decision to perform backward lookups to the ultimate recipient of the communication, where authentication can be deemed desirable or necessary based on the recipient's understanding of the implications of the communication.
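One way to picture this design, as a rough sketch rather than a prescription: the receiving side stores the claimed source without resolving it, and a lookup happens only if the recipient later decides the message warrants one. The `Message` and `Mailbox` classes, the `resolver` callable, and the counting stub are all hypothetical names introduced for illustration.

```python
class Message:
    def __init__(self, claimed_domain, body):
        self.claimed_domain = claimed_domain  # supplied by the (untrusted) sender
        self.body = body
        self.verified = None  # None means "not checked yet"


class Mailbox:
    """Accepts mail without any backward lookup; verification is on demand."""

    def __init__(self, resolver):
        self.resolver = resolver  # callable: domain name -> bool (hypothetical)
        self.messages = []

    def deliver(self, message):
        # No lookup here, so an arbitrary key triggers no database search.
        self.messages.append(message)

    def verify(self, index):
        """Called only when the recipient decides authentication is worth it."""
        message = self.messages[index]
        if message.verified is None:
            message.verified = self.resolver(message.claimed_domain)
        return message.verified


class CountingResolver:
    """Stand-in for a real DNS query; counts how often it is invoked."""

    def __init__(self, known_domains):
        self.known_domains = set(known_domains)
        self.queries = 0

    def __call__(self, domain):
        self.queries += 1
        return domain in self.known_domains


if __name__ == "__main__":
    resolver = CountingResolver({"example.com"})
    box = Mailbox(resolver)
    for i in range(1000):
        box.deliver(Message("junk%d.invalid" % i, "spam"))
    box.deliver(Message("example.com", "invoice attached, please pay"))
    print("queries after delivery:", resolver.queries)
    print("sender checks out:", box.verify(1000))
    print("queries after one check:", resolver.queries)
```

In this sketch the thousand junk messages trigger no lookups at all. The only query is the one the recipient chooses to make, for a message that asks for some action.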

For these and other reasons, automatic Challenge/Response systems also have potential problems in sufficiently hostile environments.


Copyright (C) 2006 James Craig Burley, Software Craftsperson
Last modified on 2007-07-10.