> Phil's technical blog : DNS cache poisoning

Friday, August 22, 2008

DNS cache poisoning - the state of things!

After reading the excellent book on BGP that Kip got me I've been turning my attention more to how the internet works and the subject on everyone's lips at the moment is the problems with DNS that Dan Kaminsky discovered earlier this year. Traditional DNS-cache poisoning has been around for ages (I blogged an article from my institute's magazine back in 2005 - read it here). The aspect that made that kind of DNS spoofing not so much of an issue was that the window of opportunity that a hacker had to persuade a DNS server to point a domain name at an incorrect (i.e. compromised!) IP address was tiny - you had to get the fake DNS response in between the end of the record's TTL and the next time the server made a proper query. For that reason it was never a big deal.
The best explanation I've heard about the new attack was best summerised by the mighty Steve Gibson on this week's Security Now podcast;

LEO: ...you stated that the idea of debouncing spoofed DNS by requiring two identical query results had been aired but discarded because this would double the number of packets required for each lookup. Is that really true? The key point is to only allow changes to the DNS when they've been validated with a second query and reply. Each DNS server holds a cache of results from previous lookups. So it will only generate a new request in one of two cases: One, it hasn't seen the address before, boom, then you're going to ask again; or, two, the TTL, the time to live, has run out on the cached result. In this case a single request would be generated. If the result is the same as the cache, no second request necessary. If it has changed, then it would be verified with another request.

So what he's saying is you'd only have to make that second request if something seemed out of the ordinary or was a new address. Assuming that DNS records rarely change, the number of additional verification query packets related to TTL expiration would be very low. The question then becomes, how often are DNS lookups a complete cache miss, that is, no record exists, expired or not, because that's the only case when the 2x increase would occur. Is that often enough that you get a cache miss? Is it still a problem? So is he understanding the idea of debounce properly?

STEVE: Well, kind of. First of all, I liked his clever notion that, if a DNS name server had an entry in its cache which it was going to verify by asking again, then if it got the same answer, and we know that, I mean, the whole concept of DNS is that it's a distributed database, so chunks of the current mapping between domain names and IP addresses are able to float around on servers that only periodically check in to see if there's been any change. So that's a really nice tradeoff. It means that there's no, like, sort of central server that bears the brunt of everyone asking questions. But the tradeoff is that you don't get instantaneous updates in the event that an IP address changes. So, for example, I know you and I, Leo, have from time to time needed to change the IP address of a domain and there's this notion of it needing to propagate through the Internet. What that really means is that all spread around the world are, in locations where people have accessed a domain recently, there is a slowly expiring copy. And as long as it's not yet expired, then when questions come to that server, they'll be answered from the cache rather than going back and having to re-resolve it immediately. So it's a clever, beautifully clever solution. So he's noted that, in the event that a name server has some data in its cache, when it sees that it's expired it could only make a single query in order to verify that what's there is still current. And if the result it gets back is different, then that raises its suspicion that, oh, wait a minute, maybe I need to check this a few more times to increase confidence that I didn't get a spoofed response.
The one thing that this misunderstands, which is why I really liked the question, is that what it was that Dan Kaminsky realized was that you could force servers to make queries rather than waiting for their caches to expire. That was the brilliant gotcha that occurred to him. It used to be, I mean, if you - like three months ago, before this whole DNS spoofing nightmare arose, you could put DNS spoofing or spoofing DNS or something into Google, and you'd get pages and pages and pages. I mean, this notion of spoofing DNS is not new. But everyone believed that you had to wait for the server's existing cached record to expire, and then you could make - as soon as it expired, it's not going to replace it all by itself. It's going to wait for a query. And then, upon receiving a query, it checks the record to see if it's expired, thereby launching its own query out to resolve that name.

So the idea would be you would, you know, you bad hacker person would sit there, wait for the record to expire - because when you ask the server, it tells you how much time there is left remaining on that record. And that's a cool thing, too. If you think about it, it has to because then you might be caching that record, and you don't want to start at eight hours again. You want to understand how stale the record is so that your own cache will expire at the same time that the cache of the source of that record was. So anyone asking a DNS server - you can tell I've been living in DNS for a couple weeks, and I've really got this stuff now on the tip of my tongue. Anyone talking to a DNS server, querying it, knows how much longer it'll be before one of its records will expire. So they're able, as soon as that happens, that's when they launch their query to it, knowing that it's having to launch a query out, and then they rush their spoofed response back in. That's the traditional way. And that limited you to one shot at replacing a record, only every TTL interval. Which is typically one day. A standard Internet time to live for DNS is a day. So once a day you had a tiny window of opportunity. Clearly this wasn't a huge problem.

What Dan realized is you could ask it for a bunch of nonexistent machines in a domain, forcing it to constantly ask upstream for that domain's name server if this machine exists. And you could sneak in a response to that request that carried replacement name server records. And of course that's the nature of the attack. So Paul was right that, if we didn't have what Kaminsky realized happening, then you could do verification essentially in a cost-effective manner, only duplicating some queries if a domain was asked for that was not in the cache, and then only again if a verification that the IP had not changed from what was in the cache came back with a different answer. So I think Paul was clever, but that doesn't solve the problem that we've just had.

LEO: Could you rewrite the way DNS works so that you couldn't ask for these multiple updates? I mean, isn't that a bug, too?

STEVE: Oh, yeah. I mean, there's all kinds, well, I'm sure right now, Leo, there are firewall vendors, you know, corporate firewall vendors madly adding code to their firewalls, and they'll have new bullet points on their new brochures in a month saying "Special next-generation DNS spoof-proofing for your corporate DNS server." So there's all kinds of things you could do that would, I mean, make it really obvious that, you know, here comes a flood of responses in response to one query? Okay, that's wrong. I mean, it stands out like a sore thumb. But right now nothing is aware of that. So I'm sure there are - doubtless firewall vendors are madly rushing to get theirs to market first because essentially one query goes out and 10,000 come in. It's like, okay, maybe I'm going to ask that question again.

As mentioned previously OpenDNS suffers none of these issues and protects you from ISP snooping via the Phorm or NebuAD 'services'(!)

If you go back and listen to the last couple of Security Now podcasts you'll see that the thing that DNS server writers (who maintains BIND?) are going for is to increase the entropy in their DNS look-up requests (and hence DNS responses which is where the poisoning can be inserted) - Randomising the transaction ID and the UDP-port number means that the chance of the spoofed response being accepted drop but the real clever hack that requires no changing to the upstream DNS is the one Steve spoke about last week;

STEVE: This is the galactic hack. I don't know if I can think of something that is more a hack, more sort of like, oh my goodness, but clever. And so, I mean, this is the definition of a hack. And it works without any additional overhead, without any additional packets, without any additional anything within the existing infrastructure of DNS. So, and remember that, as I said toward the beginning of the show, what we want is more bits. The idea being, we want the query that the resolving name server sends out to contain more entropy, more bits of randomness which the responder will be able to easily send back, such as the matching query ID and the matching port number, but also something else. What more could it possibly send back within the existing definition of DNS? It's referred to as the 0x20, or the 0X20 hack. It refers to the 20 bit because that is the difference in an ASCII representation of text between uppercase and lowercase.

LEO: Right. You add that bit, and it's uppercase.

STEVE: For example, 41 and 61 are the hex for upper and lowercase A. The difference being that 20 bit. So get a load of this. The DNS spec, the original RFC that everybody wrote to and coded to and follows, says that case is not significant, but it will be preserved. Meaning that - and you'll notice this. You could do GRC.com in all uppercase, or GRC.com in all lowercase, or GRC.com in any random combination of upper and lowercase, and you'd get to GRC.com. And I notice that, like, TWiT.tv is capital T, capital W, lowercase I, capital T; right? And that works. The point is, case doesn't matter. But in a DNS query and response, case is preserved, meaning that when I issue a query, the response I get comes back with the same case of the alphabetic characters as I issued. Yet the authoritative name server that is checking to see if it's got a match within its records, it ignores the case, doesn't care if the alphabetic characters are upper or lower, to find a match. That gives us more bits.

LEO: Oh. That is clever.

STEVE: Oh my goodness, isn't that just incredibly cool? What it means is that the querying server can randomly upper and lowercase all the alphabetic characters in its query. So, you know, we've got www.domainname.com. So we've got, okay, www, there's three; com is three more alphabetic characters. Plus however many alphabetic characters are in the name. And more if you are several - if you have several domain, dotted domain names. All those characters can have a random case. They will go to the resolver that will - I'm sorry. They will go to the authoritative name server that will ignore your wacky casing that you've done.

LEO: Right, because it's case insensitive, yeah.

STEVE: But it will return your - it will return in its reply, it returns the same case that you sent it, which means you can - you the querier, who is trying to be spoof-resistant, can verify...

LEO: Oh, if it's the same one you sent.

STEVE: In addition to. So the domain name - basically the response echoes the query and adds the answers. So you get back what you queried as part of the answer.

LEO: Plus the dotted quad. But now if somebody's in the middle, if they're doing a man in the middle, they're going to see your randomly capitalized query. So can't they just copy it back?

STEVE: Oh, Leo, DNS is completely hosed by man in the middle. Man in the middle...

LEO: Oh, this doesn't protect against that. This - okay.

STEVE: No, no. In fact, man in the middle is a single query attack because if you're able to see the query and block the reply, you simply respond because you know exactly...

LEO: Okay, all right, all right. So this only is good against cache poisoning.

STEVE: Well, this is good - well, and which is the problem that we're trying to solve. This is good against a third party trying to guess queries in order to spoof replies. What this means is the query is now much more difficult to guess by the number of alphabetic, you know, two to the power of the number of alphabetic characters in the name being queried. So, for example, there could easily be 10 alphabetic characters - www and com gives us six. Seven, eight, nine, 10, so even a four-letter domain would give us ten additional bits. Well, 10, that's another 1,024 possibilities. So we've just added 10 bits of difficulty, looking up www and a four-letter domain, and many domains are much longer than that. So it's just, I mean, that is a hack. I mean, it's like, it's ugly, but it works.

LEO: So that's just something you obviously have to patch the DNS servers to do.

STEVE: You don't have to patch the authoritative name servers. That is to say, they're already programmed to echo the incoming packet's case and ignore the incoming packet's case.

LEO: Oh, so it only would have to be the querying servers you'd have to change.

STEVE: Right. So, I mean, just as we - we just changed the querying servers this month so that they would do reliable source port randomization. So we would just need to change them again if the decision were made or if anyone wanted to make the decision, or you could imagine this would be, you know, all this stuff is open source, in the case of BIND, for example. So you could have a BIND compile time flag saying I want this server to employ the 0x20 hack, and then your build of your BIND would issue queries with random alphabetic case on all the alphabetic characters, thus making itself far more difficult to spoof. And the beauty is it would automatically work with all the other servers on the 'Net that don't need to be changed. So this is something that individual, for example, BIND users, companies, ISPs, end-users, could easily do when this is an available compile-time or run-time option of BIND that just makes that one server much more resistant to spoofing.

LEO: Easy thing to add. But it would be a patch. You'd have to change the send, and you'd also have to remember that and the query and look at the result and make sure that they matched. But that's an easy compare. That's pretty fast.

STEVE: Right. So we're talking, exactly, we're talking additional code. Already obviously there is code which verifies that the incoming port and the incoming transaction ID match up with what's expected. So you just increase the expectation to verify that the incoming reply had case of its alphabetic characters that matched what you sent out, which is what you were expecting. Right now, no servers, as far as I know, care. So all you need to do is to add that they care about matching returning response case, and then randomize your outgoing case, and you've got a much harder to spoof server. Just you. You don't need anything of anybody else. Your server is now substantially more difficult to spoof.

Sorry to cut'n'paste so much from another source but Security Now really is essential listening if you are at all interested in the inner workings of the Internet.

1 comment:

Anonymous said...: I don't understand all of this, but I get some of it. Interesting.

Question, who is transcribing so much of the podcast?

p.s. Outnumbered season 2 is coming!; 11:30 pm