by Younggi Seo Jul 18. 2019

Lecture 12: Network Security

MIT OPEN COURSEWARE

Description: In this lecture, Professor Zeldovich discusses network security, and how TCP/IP has evolved.

Instructor: Nickolai Zeldovich


The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.


https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-858-computer-systems-security-fall-2014/video-lectures/lecture-12-network-security


PROFESSOR: All right, guys, let's get started. So today, we're going to talk about network security. And in particular, we're going to talk about this paper on TCP/IP security by this guy Steve Bellovin, who used to be at AT&T and now is at Columbia. One interesting thing about this paper is it's actually a relatively old paper. It's more than 10 years old. And in fact, it's commentary on a paper that was 10 years before that. And many of you guys actually ask, why are we reading this if many of these problems have been solved in today's TCP protocol stacks?


So one interesting point-- so it's true that some of these problems that Steve describes in this paper have been solved since then. Some of them are still actually problems today. We'll sort of look at that and see what's going on. But you might actually wonder, why didn't people solve all these problems in the first place when they were designing TCP? What were they thinking?


And it's actually not clear. So what do you guys think? Why wasn't TCP designed to be secure with all these considerations up front? Yeah, any guesses? All right, anyone else? Yeah.

AUDIENCE: The internet was a much more trusting place back then.

PROFESSOR: Yeah, this was almost literally a quote from this guy's paper. Yeah, at the time-- the whole internet set of protocols was designed I guess about 40 years ago now. The requirements were totally different. It was to connect a bunch of relatively trusting sites that all knew each other by name.


And I think this is often the case in any system that becomes successful. The requirements change. So it used to be that this was a protocol for a small number of sites. Now it's the entire world. And you don't know all the people connected to the internet by name anymore. You can't call them up on the phone if they do something bad, etcetera.

So I think this is a story for many of the protocols we look at. And many of you guys have questions, like, what the hell were these guys thinking? This is so broken. But in fact, they were designing a totally different system. It got adopted.


Same for the web, like we were looking at in the last couple of weeks. It was designed for a very different goal. And it expanded. And you sort of have these growing pains where you have to figure out how to make the protocol adapt to new requirements.


And another thing that somewhat suddenly happened is I think people also in the process gained a much greater appreciation for the kinds of problems you have to worry about in security. And it used to be the case that you didn't really understand all the things that you should worry about an attacker doing to your system.


And I think it's partly for this reason that it's sort of interesting to look at what happened toTCP security, what went wrong, how could we fix it, etcetera, to both figure out what kinds of problems you might want to avoid when designing your own protocols, and also what's the right mindset for thinking about these kinds of attacks. How do you figure out what an attacker might be able to do in your own protocol when you're designing it so you can avoid similar *pitfalls?


*pitfall: a hidden danger or difficulty


All right, so with that preamble aside, let's actually start talking about what the paper is about. So how should we think about security in a network? So I guess we could try to start from first principles and try to figure out, what is our threat model? So what do we think the attacker is going to be able to do in our network?


Well, relatively straightforwardly, there's presumably being able to intercept packets, and probably being able to modify them. So if you send a packet over the network, it might be prudent to assume that some bad guy out there is going to see your packet and might be able to change it before it reaches the destination, might be able to drop it, and in fact might be able to inject packets of their own that you never sent with arbitrary contents.


And probably-- so this you can sort of come up with fairly straightforwardly by just thinking, well, if you don't trust the network, some bad guy is going to send arbitrary packets, see yours, modify them, etcetera. Somewhat more worryingly, as this paper talks about, the bad guy can also participate in your protocols. They have their own machine, right?

So the attacker has their own computer that they have full control over. So even if all the computers that you trust are reasonably maintained, they all behave correctly, the bad guy has his own computer that he can make do whatever he wants. And in fact, he can participate in a protocol or distributed system.


So if you have a routing protocol, which involves many people talking to each other, at some scale, it's probably going to be impractical to keep the bad guys out. If you're running a routing protocol with 10 participants, then maybe you can just call them all up and say, well, yeah, yeah, I know all you guys.


But at the scale of the internet today, it's infeasible to have sort of direct knowledge of what everyone else or who everyone else in this protocol is. So probably some bad guy is going to be participating in your protocols or distributed systems. And it's important to design distributed systems that can nonetheless do something reasonable with that.

All right, so what are the implications of all these things? I guess we'll go down the list. So intercepting is-- it's on the whole easy to understand. Well, you shouldn't send any important data over the network if you expect a bad guy to intercept it, or at least not in clear text. Maybe you should encrypt your data.


So that seems relatively straightforward to sort of figure out. Although still you should sort of keep it in mind, of course, when designing protocols. Now, injecting packets turns out to lead to a much wider range of interesting problems that this paper talks about. And in particular, attackers can inject packets that can pretend to be from any other sender. Because the way this works in IP is that the IP packet itself has a header that contains the source of the packet and the destination.


And it's up to whoever creates the packet to fill in the right values for the source and destination. And no one checks that the source is necessarily correct. There's some filtering going on these days. But it's sort of fairly spotty, and it's hard to rely on. So to a first approximation, an attacker could fill in any IP address as the source, and it will get to the destination correctly. And it's interesting to try to figure out what could an attacker do with such a capability of sending arbitrary packets.
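To make the point about sender-filled headers concrete, here is a minimal sketch that packs a 20-byte IPv4 header in memory. Nothing is sent on the network. The helper names `build_ipv4_header` and `source_of` are hypothetical, the addresses come from the reserved documentation ranges, and the checksum is left at zero for brevity, so this is an illustration of the header layout rather than a usable packet builder.

```python
import socket
import struct


def build_ipv4_header(src: str, dst: str, payload_len: int = 0) -> bytes:
    """Pack a minimal IPv4 header; the caller picks the source freely."""
    version_ihl = (4 << 4) | 5           # IPv4, header length of five 32-bit words
    total_len = 20 + payload_len
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_len,       # version/IHL, TOS, total length
        0, 0,                            # identification, flags + fragment offset
        64, socket.IPPROTO_TCP, 0,       # TTL, protocol, checksum (zero in this sketch)
        socket.inet_aton(src),           # source address: whatever the sender writes
        socket.inet_aton(dst),           # destination address
    )


def source_of(header: bytes) -> str:
    """Read back the source address field (bytes 12..15 of the header)."""
    return socket.inet_ntoa(header[12:16])


hdr = build_ipv4_header("192.0.2.1", "198.51.100.7")
print(len(hdr), source_of(hdr))
```

The header format has no field that proves the source address belongs to the machine that wrote it, which is why any filtering has to happen in the network rather than in the packet.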


Now, in the several weeks up to this, like in buffer overflows and web security, we looked at, to a large extent, implementation bugs, like, how could you exploit a buffer overflow? And interestingly, the author of this paper is actually not at all interested in implementation bugs. He's really interested in protocol errors or protocol mistakes.

So what's the big deal? Why is he down on implementation bugs, even though we spent several weeks looking at them? Why does it matter? Yeah.


AUDIENCE: Because we have to keep those bugs [INAUDIBLE].

PROFESSOR: Yeah, so this is the really big bummer about a bug in your protocol design. Because it's hard to change. So if you have an implementation bug, well, you had a memcpy or a printf of some sort that didn't check the range. OK, well, you add a range check, and it still works, and now it's also secure. So that's great.


But if you have some bug in the protocol specification, in how the protocol has to work, then fixing a bug is going to require fixing a protocol, which means potentially affecting all the systems that are out there speaking this protocol. So if we find some problem in TCP, it's potentially quite devastating. Because every machine that uses TCP is going to have to change. Because it's going to be hard to make it potentially backwards compatible.


We'll see exactly what these bugs are. But this is the real reason he's so excited about looking at protocol bugs. Because they're fairly fundamental to the TCP protocol that everyone agrees to speak.


So let's look at one of these guys. So the first example he points out has to do with how TCP sequence numbers work. So just to re-explain-- yeah, question.


AUDIENCE: I'm just curious. This is a tiny bit off topic. But let's say you do find a bug in TCP. How do you make the change to it? How do you tell all the computers in the world to change that?

PROFESSOR: Yeah, I think it's a huge problem. What if you find a bug in TCP? Well, it's unclear what to do. And I think the authors here struggle a lot with that. And in many ways, if you could redesign TCP, many of these bugs are relatively easy to fix if you knew what to look for ahead of time.


But because TCP is sort of relatively hard to fix or change, what ends up happening is that people or designers try to look for backwards compatible tweaks that either allow old implementations to coexist with the new implementation or to add some optional field that if it's there, then the communication is more secure in some way.


But it is a big problem. If it's some security issue that's deeply ingrained in TCP, then it's going to be a pretty humongous issue for everyone to just pack up and move onto a TCP version whatever, n plus 1. And you can look at IPv6 as one example of this not happening. We've known this problem was going to come up for like 15 years or 20 years.


IPv6 has been around for well over 10 years now. And it's just hard to convince people to move away from IPv4. It's good enough. It sort of works. It's a lot of overhead to move over. And no one else is speaking IPv6, so why should I start speaking this bizarre protocol that no one else is going to speak to me in?


So it's sort of moving along. But I think it takes a long time. And there has to be some real motivation to migrate. And backwards compatibility helps a lot. Not good enough for, I guess, IPv6-- IPv6 has lots of backwards compatibility plans in it. You can talk to an IPv4 host from IPv6. So they try to engineer all this support. But still, it's hard to convince people to upgrade.

All right, but yeah, looking back at the TCP sequence numbers, we're going to look at actually two problems that have to do with how the TCP handshake works. So let's just spend a little bit of time working out what are the details of how a TCP connection gets initially established.


So there's actually three packets that have to get sent in order for a new TCP connection to be established. So our client generates a packet to connect to a server. And it says, well, here's my IP address, C, client. I'm sending this to the server.


And there's various fields. But the one that's interesting for the purpose of this discussion is going to be a sequence number. So there's going to be a SYN flag saying, I want to synchronize state and establish a new connection. And you include a client sequence number in the initial SYN packet.


Then when the server receives this, the server is going to look and say, well, a client wants to connect to me, so I'll send a packet back to whatever this address is, whoever said they're trying to connect to me. So it'll send a packet from the server to the client and include its own sequence number, SN server. And it'll acknowledge the client's number.


And finally, the client replies back, acknowledging the server sequence number-- acknowledge SNS. And now the client can actually start sending data. So in order to send data, the client has to include some data in the packet, and also put in the sequence number of the client to indicate that this is actually sort of legitimate client data at the start of the connection. It's not some data from later on, for example, that just happens to arrive now because the server missed some initial parts of the data.
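The three packets described above can be sketched as plain data. This is an illustrative model only: the `Segment` class and `handshake` function are hypothetical names, and the acknowledgment values use the "plus 1" convention of real TCP (an ACK names the next byte expected, and a SYN consumes one sequence number), a detail that comes up a little later in the lecture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    src: str
    dst: str
    flags: str
    seq: int
    ack: Optional[int] = None   # None when the ACK flag is not set


def handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments of a TCP connection setup, in order."""
    syn = Segment("C", "S", "SYN", seq=client_isn)
    # The server replies to whatever source address the SYN claimed.
    syn_ack = Segment("S", "C", "SYN+ACK", seq=server_isn, ack=syn.seq + 1)
    # The client proves it saw the server's number by acknowledging it.
    ack = Segment("C", "S", "ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


segments = handshake(client_isn=1000, server_isn=5000)
for seg in segments:
    print(seg)
```

The third segment is the only one that depends on a value chosen by the server, which is the property the rest of the discussion turns on.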


So generally, all these sequence numbers were meant for ensuring in-order delivery of packets. So if the client sends two packets, the one that has the initial sequence number, that's the first chunk of data. And the one with the next sequence number is the next chunk of data.


But it turns out to also be useful for providing some security properties. Here's an example of these requirements changing. So initially, no one was thinking TCP provides any security properties. But then applications started using TCP and sort of relying on these TCP connections not being able to be broken by some arbitrary attacker, or an attacker not being able to inject data into your existing TCP connection. And all of a sudden, this mechanism that was initially meant for just packet ordering now gets used to guarantee some semblance of security for these connections.


So in this case, I guess the problem stems from what could a server assume about this TCP connection. So typically, the server assumes-- implicitly, you might imagine-- that this connection is established with the right client at this IP address C. It seems like a natural thing to assume.


Is there any basis for making this assumption? If a server gets this message saying, here's some data on this connection from a client to a server, and it has sequence number C, why might the server conclude that this was actually the real client sending this?


AUDIENCE: Because the sequence number is hard to guess.

PROFESSOR: Right, so that's sort of the implicit thing going on, that it has to have the right sequence number C here. And in order for this connection to get established, the client must have acknowledged the server sequence number S here. And the server sequence number S was only sent by the server to the intended client IP address. Yeah.


AUDIENCE: How many bits are available for the sequence number?

PROFESSOR: So sequence numbers in TCP are 32 bits long. That's not entirely easy to guess. If it was really a random 32 bit number, it would be hard to just guess. And you'd probably waste a lot of bandwidth trying to guess this. Yeah, question.


AUDIENCE: The data sequence number is higher than the initial sequence number?

PROFESSOR: Yeah, so basically, these things get incremented. So every time you send a SYN, that counts as one byte against your sequence number. So this is SNC. I think actually what happens is this is SNC plus 1. And then it goes on from there.

So if you send 5 bytes, then the next one is SNC initial plus 6. So this just counts the bytes that you're sending. SYNs count as 1 byte each. Make sense? Other questions about this?
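The byte-counting rule can be written down in a couple of lines. A small sketch, with the hypothetical helper `next_seq`: the SYN costs one sequence number, each payload byte costs one more, and the arithmetic wraps at 2 to the 32 because the field is 32 bits wide.

```python
SEQ_SPACE = 2**32  # sequence numbers are 32-bit and wrap around


def next_seq(isn: int, payload_bytes: int, syn_sent: bool = True) -> int:
    """Sequence number of the next byte after a SYN (optional) and some payload."""
    consumed = (1 if syn_sent else 0) + payload_bytes
    return (isn + consumed) % SEQ_SPACE


snc = 1000
print(next_seq(snc, 0))   # right after the SYN: SNC + 1
print(next_seq(snc, 5))   # after the SYN and 5 data bytes: SNC + 6
```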

All right, so typically, or at least the way the TCP specification recommended that people choose these sequence numbers, was to increment them at some roughly fixed rate. So the initial RFCs suggested that you increment these things at something like 250,000 units, plus 250,000, per second.


And the reason that it wasn't entirely random is that these sequence numbers are actually used to prevent out of order packets, or packets from previous connections, from interfering with new connections. So if every time you established a new connection you chose a completely random sequence number, then there's some chance if you establish lots of connections over and over that some packet from a previous connection is going to have a similar enough sequence number to your new connection and is going to be accepted as a valid piece of data on that new connection.


So this is something that the TCP designers worried a lot about-- these out of order packets or delayed packets. So as a result, they really wanted these sequence numbers to progress in a roughly monotonic manner over time, even across connections. If I opened one connection, it might have the same source and destination, port numbers, IP addresses, etcetera. But because I established this connection now instead of earlier, packets from earlier hopefully aren't going to match up with the sequence numbers I have for my new connection. So this was a mechanism to prevent confusion across repeated connection establishments. Yeah.


AUDIENCE: So if you don't know exactly how much your other grid that you're talking to is going to improve the sequencing pack, how do you know that the packet you're getting is the next packet if there wasn't [INAUDIBLE] immediate packet that you--

PROFESSOR: So typically you'll remember the last packet that you received. And if the next sequence number is exactly that, then this is the next packet in sequence. So for example, here, the server knows that I've seen exactly SNC plus 1 worth of data. If the next packet has sequence number SNC plus 1, that's the next one.


AUDIENCE: So you're saying that when you establish a sequence number, then even after that you're committing it--

PROFESSOR: Well, absolutely, yeah, yeah. So these sequence numbers, initially when you establish it, they get picked according to some plan. We'll talk about that plan. You can sort of think they might be random. But over time, they have to have some flow for initial sequence numbers for connection.


But within a connection, once they're established, that's it. They're fixed. And they just tick along as the data gets sent on the connection, exactly. Make sense? All right, so there were some plans suggested for how to manage these sequence numbers. And it was actually a reasonable plan for avoiding duplicate packets in the network causing trouble.

But the problem, of course, showed up that attackers were able to sort of guess these sequence numbers. Because there wasn't a lot of randomness being chosen. So the way that the host machine would choose these sequence numbers is they have just a running counter in memory. Every second they bump it by 250,000. And every time a new connection comes in, they also bump it by some constant like 64k or 128k. I forget the exact number. So this was relatively easy to guess, as you can tell. You send them a connection request, and you see what sequence number comes back. And then you know the next one is going to be 64k higher than that. So there wasn't a huge amount of randomness in this protocol.
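The counter scheme described here is simple enough to model directly. The sketch below is a toy, not any particular TCP stack: `LegacyIsnGenerator` is a hypothetical class, and the per-connection bump of 64,000 is an assumption, since the lecture only says "64k or 128k". It shows why one probe connection from the attacker's own address is enough to predict the number the server will hand out next.

```python
SEQ_SPACE = 2**32


class LegacyIsnGenerator:
    """Toy model of a global, counter-based initial sequence number source."""

    PER_SECOND = 250_000      # rate quoted in the lecture
    PER_CONNECTION = 64_000   # assumed value; the lecture says "64k or 128k"

    def __init__(self, start: int = 0) -> None:
        self.counter = start % SEQ_SPACE

    def tick(self, seconds: int) -> None:
        """Advance the counter for elapsed wall-clock time."""
        self.counter = (self.counter + seconds * self.PER_SECOND) % SEQ_SPACE

    def new_connection(self) -> int:
        """Hand out an ISN, then bump the shared counter."""
        isn = self.counter
        self.counter = (self.counter + self.PER_CONNECTION) % SEQ_SPACE
        return isn


server = LegacyIsnGenerator(start=123_456)

# Step 1: the attacker connects from its own address and reads the ISN.
observed = server.new_connection()

# Step 2: the attacker predicts the ISN of the very next connection.
predicted = (observed + LegacyIsnGenerator.PER_CONNECTION) % SEQ_SPACE

# Step 3: the server picks the ISN for the spoofed connection.
actual = server.new_connection()

print(predicted == actual)
```

In this model the prediction is exact only when no time passes and no one else connects between the two requests; in practice the attacker would be guessing within a small range.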


So we can just sketch out what this looks like. So if I'm an attacker that wants to connect to a server but pretend to be from a particular IP address, then what I might do is send a request to the server, very much like the first step there, include some initial sequence number that I choose. At this point, any sequence number is just as good, because the server shouldn't have any assumptions about what the client's sequence number is.


Now, what does the server do? The server gets the same packet as before. So it performs the same way as before. It sends a packet back to the client with some server sequence number and acknowledges SNC. And now the attacker, if the attacker wants to establish a connection, needs to somehow synthesize a packet that looks exactly like the third packet over there. So it needs to send a packet from the client to the server.


That's easy enough. You just fill in these values in the header. But you have to acknowledge this server sequence number SNS. And this is where sort of the problems start. If the SNS value is relatively easy to guess, then the attacker is good to go. And now the server thinks they have an established connection with a client coming from this IP address.


And now an attacker could inject data into this connection just as before. They just synthesize a packet that looks like this, has the data, and it has the client sequence number that in fact the adversary chose. Maybe it's plus 1 here. But it all hinges on being able to guess this particular server supplied sequence number. All right, does this make sense? Yeah.


AUDIENCE: What's the reason that the server sequence number isn't completely random?

PROFESSOR: So there's two reasons. One, as I was describing earlier, the server wants to make sure that packets from different connections over time don't get confused for one another. So if you establish a connection from one source port to another destination port, and then you close the connection and establish another one of the same source and destination port, you want to make sure the packets from one connection don't appear to be valid in another connection.


AUDIENCE: So the server sequence number is incremented for every one of their packets?

PROFESSOR: Well, so the sequence numbers within a connection, as I was describing, get bumped with all the data in a connection. But there's also the question of, how do you choose the initial sequence number here?


And that gets bumped every time a new connection is established. So the hope is that by the time it wraps around 2 to the 32 and comes back, there's been enough time so that old packets in the network have actually been dropped and will not appear as duplicates anymore. So that's the reason why you don't just choose random points, or they didn't initially choose random points. Yeah.
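As a rough check on the wraparound argument, the arithmetic below estimates how long the counter takes to cycle through all 2 to the 32 values at the 250,000-per-second rate quoted earlier. It assumes only the time-based increment and ignores the extra per-connection bumps, so a busy server would wrap sooner than this figure.

```python
SEQ_SPACE = 2**32
TICKS_PER_SECOND = 250_000   # rate quoted in the lecture

# Time for the time-driven counter alone to return to the same value.
wrap_seconds = SEQ_SPACE / TICKS_PER_SECOND
wrap_hours = wrap_seconds / 3600

print(f"{wrap_seconds:.0f} seconds, about {wrap_hours:.2f} hours")
```

Several hours is far longer than a stray packet is expected to survive in the network, which is the margin the original designers were relying on.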


AUDIENCE: So this is a problem between connections, for a connection between the same guide, the same client, the same server, the same source port, the same destination. And we're worried about old packets--

PROFESSOR: So this is what the original, yeah, TCP designers were worried about, which is why they prescribed this way of picking these initial sequence numbers.


AUDIENCE: If you have different new connections, you could differentiate.

PROFESSOR: That's right, yeah.


AUDIENCE: So then I don't see why the incrementing stuff and not just take randomly.

PROFESSOR: So I think the reason they don't pick randomly is that if you did pick randomly, and you established, I don't know, 1,000 connections within a short amount of time from the same source to the same destination, then, well, every one of them is some random value modulo 2 to the 32. And now there's a nontrivial chance that some packet from one connection will be delayed in the network, and eventually show up again, and will get confused for a packet from another connection. This has sort of nothing to do with security. This is just their design consideration initially for reliable delivery.
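One way to put a number on that "nontrivial chance" is a back-of-the-envelope model. The sketch below assumes, purely for illustration, that a delayed packet is accepted when its sequence number lands inside the receiver's window, that the window is 65,535 bytes, and that stale packets carry uniformly random sequence numbers. None of these figures come from the lecture, and the function names are hypothetical.

```python
SEQ_SPACE = 2**32


def stale_hit_probability(window_bytes: int) -> float:
    """Chance that one stale packet with a random sequence number lands in the window."""
    return window_bytes / SEQ_SPACE


def any_hit_probability(window_bytes: int, stale_packets: int) -> float:
    """Chance that at least one of several independent stale packets is accepted."""
    p = stale_hit_probability(window_bytes)
    return 1.0 - (1.0 - p) ** stale_packets


p_one = stale_hit_probability(65_535)          # assumed window size
p_many = any_hit_probability(65_535, 1_000)    # assumed count of stale packets

print(f"one packet: {p_one:.2e}, any of 1000: {p_many:.2e}")
```

Under these assumptions the per-packet chance is small but it grows roughly linearly with the number of stale packets, which is the kind of accidental confusion the monotonic counter was meant to rule out.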


AUDIENCE: [INAUDIBLE] some other client to the server, right?

PROFESSOR: Sorry?

AUDIENCE: This is [INAUDIBLE] some other client?

PROFESSOR: That's right, yeah. So we haven't actually said why this is interesting at all for the attacker to do. Why bother? You could just go from his own IP address, right?

AUDIENCE: So what happens for the server [INAUDIBLE]?

PROFESSOR: Yes, this is actually an interesting question. What happens here? So this packet doesn't just get dropped. It actually goes to this computer. And what happens?

AUDIENCE: [INAUDIBLE], they just mentioned you try and do it like they would try and do it when the other computer was updating or rebooting or off, or something.

PROFESSOR: Right, certainly they felt, oh, that computer is offline. The packet will just get dropped, and you don't have to worry about it too much. If a computer is actually listening on that IP address, then in the TCP protocol, you're supposed to send a reset packet resetting the connection. Because this is not a connection that computer C knows about.

And in TCP, this is presumed to be because, oh, this is some old packet that I requested long ago, but I've since forgotten about it. So the machine C here might send a packet to the server saying, I want a reset. I actually forget exactly which sequence number goes in there. But the client C here knows all the sequence numbers and can send whatever sequence number is necessary to reset this connection.


So if this computer C is going to do this, then it might interfere with your plan to establish a connection. Because when S gets this packet, it says, oh, sure, if you don't want it, I'll reset your connection.


There's some implementation-ish bugs that you might exploit, or at least that the author talks about potentially exploiting, that would prevent client C from responding. So for example, if you flood C with lots of packets, it's an easy way to get him to drop this one. It turns out there are other more interesting bugs that don't require flooding C with lots of packets that still get C to drop this packet, or at least it used to on some implementations of TCP stacks. Yeah.


AUDIENCE: Presumably, most firewalls would also [INAUDIBLE].

PROFESSOR: This one?

AUDIENCE: No, the SYN.

PROFESSOR: This one.

AUDIENCE: That came into a client, and a client didn't originally send a SYN to that server. And the firewall is going to drop it.

PROFESSOR: It depends, yeah. So certainly if you have a very sophisticated stateful firewall that keeps track of all existing connections, or for example if you have a NAT, then this might happen. On the other hand, a NAT might actually send the RST on behalf of the client.

So it's not clear. I think this is not as common. So for example, on a Comcast network, I certainly don't have anyone intercepting these packets and maintaining state for me and sending RSTs on my behalf or anything like that. Yeah.

AUDIENCE: So why can't the server have independent sequence numbers for each possible source?

PROFESSOR: Right, so this is in fact what TCP stacks do today. This is one example of how you fix this problem in a backwards compatible manner. So we'll get to exactly the formulation of how you arrange this. But yeah, it turns out that if you look at this carefully, as you're doing, you don't need to have this initial sequence number be global. You just scope it to every source/destination pair. And then you have all the duplicate avoidance properties we had before, and you have some security as well.
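A minimal sketch of scoping the initial sequence number to a source/destination pair, in the spirit of the scheme standardized in RFC 6528: a clock-driven counter plus a keyed hash of the connection's addresses and ports. The function names, the secret, and the choice of HMAC-SHA256 truncated to 32 bits are assumptions made for this illustration; a real stack's hash function and clock differ.

```python
import hashlib
import hmac
import struct

SEQ_SPACE = 2**32
SECRET = b"example-secret-key"   # hypothetical; a real host would use a random secret


def isn_offset(local_ip: str, local_port: int,
               remote_ip: str, remote_port: int,
               secret: bytes = SECRET) -> int:
    """Per-connection offset: keyed hash of the four-tuple, truncated to 32 bits."""
    msg = f"{local_ip}:{local_port}|{remote_ip}:{remote_port}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def initial_sequence_number(clock_ticks: int, local_ip: str, local_port: int,
                            remote_ip: str, remote_port: int) -> int:
    """ISN = clock + offset(four-tuple), modulo 2**32."""
    offset = isn_offset(local_ip, local_port, remote_ip, remote_port)
    return (clock_ticks + offset) % SEQ_SPACE


# Same four-tuple, one second apart: the ISN still advances with the clock.
first = initial_sequence_number(0, "198.51.100.7", 80, "192.0.2.1", 40000)
later = initial_sequence_number(250_000, "198.51.100.7", 80, "192.0.2.1", 40000)
print((later - first) % SEQ_SPACE)

# The attacker's own four-tuple hashes to a different offset.
probe = initial_sequence_number(0, "198.51.100.7", 80, "203.0.113.9", 40000)
print(probe == first)
```

For a fixed pair of endpoints the numbers still move forward with time, so the duplicate-avoidance property is preserved, while a value observed on one connection says little about another connection's value unless the secret is known.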


So just to sort of write this out on the board of how the attacker is getting this initial sequence number, the attacker would probably just send a connection from its own IP address to the server saying, I want to establish a new connection, and the server would send a response back to the attacker containing its own sequence number S. And if the SNS for this connection and the SNS for this connection are related, then this is a problem.

But you're saying, let's make them not related. Because this is from a different address. Then this is not a problem anymore. You can't guess what this SNS is going to be based on this SNS for a different connection. Yeah.


AUDIENCE: So you still have a collision problem, because you could engage the 32 bits by the addresses of your peers. So you have a lot of ports for each one of these. So you still have conflicting sequence numbers for all of these connections that you're getting, right?

PROFESSOR: So these sequence numbers are specific, as it turns out, to an IP address and a port number source/destination tuple. So if it's different ports, then they don't interfere with each other at all.

AUDIENCE: Oh, because you're using the port--

PROFESSOR: That's right, yeah, you also use the port in this as well.

AUDIENCE: Because I thought those ports--

PROFESSOR: Yeah, so the ports are sort of below the sequence numbers in some way of thinking about it. Question?


AUDIENCE: If the sequence numbers are global, then doesn't the attacker [INAUDIBLE]?

PROFESSOR: Yeah, good point. So in fact, if the server increments the sequence number by, I don't know, 64k I think it is, or it was, for every connection, then, well, you connect. And then maybe five other people connect. And then you have to do this attack.

So to some extent, you're right, this is a little troublesome. On the other hand, you could probably arrange it for your packet here to be delivered just before this packet. So if you send these guys back to back, then there's a good chance they'll arrive at the server back to back.


The server will get this one, respond with this sequence number. It'll get the next one, this one, respond with the sequence number right afterwards. And then you know exactly what to put in this third packet in your sequence.


So I think this is not a foolproof method of connecting to a server. There's some guessing involved. But if you carefully arrange your packets right, then it's quite easy to make the right guess. Or maybe you try several times, and you'll get lucky. Yeah.


AUDIENCE: So even if it's totally random, and you have to guess it, there are only like 4 billion possibilities. It's not a huge number, right? I feel like in the course of a year, you should be able to probably get through.

PROFESSOR: Right, yeah, so you're absolutely right. You shouldn't really be relying on TCP to provide security very strongly. Because you're right, it's only 4 billion guesses. And you can probably send that many packets certainly within a day if you have a fast enough connection.
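The "within a day" estimate can be checked with simple arithmetic. The sketch below assumes each guess is a minimum-size packet of 40 bytes (a 20-byte IP header plus a 20-byte TCP header, no payload or link-layer overhead) and that the attacker can saturate the link; the function name and the link speeds are illustrative choices.

```python
SEQ_SPACE = 2**32


def hours_to_try_all(link_bits_per_second: float, packet_bytes: int = 40) -> float:
    """Hours needed to send one packet for every possible 32-bit sequence number."""
    total_bits = SEQ_SPACE * packet_bytes * 8
    return total_bits / link_bits_per_second / 3600


for label, rate in [("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    print(f"{label}: {hours_to_try_all(rate):.1f} hours")
```

Under these assumptions a 100 Mbit/s link gets through the whole space in a few hours, and on average a guesser would need only about half of that.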


So it's sort of an interesting argument we're having here in the sense that at some level, TCP is hopelessly insecure. Because it's only 32 bits. There's no way we could make it secure. But I think many applications rely on it enough that not providing any security at all is so much of a nuisance that it really becomes a problem.


But you're absolutely right. In practice, you do want to do some sort of encryption on top of this that will provide stronger guarantees that no one tampered with your data, but where the keys are more than 32 bits long. It still turns out to be useful to prevent people from tampering with TCP connections in most cases.


All right, other questions? All right, so let's see what actually goes wrong. Why is it a bad thing if people are able to spoof TCP connections from arbitrary addresses? So one reason why this is bad is if there is any kind of IP-based authorization. So if some server decides whether an operation is going to be allowed or not based on the IP address it comes from, then this is potentially going to be a problem if an attacker can spoof connections from an arbitrary source address.


So one example where this was a problem-- and it largely isn't anymore-- is this family of r commands, things like rlogin. So it used to be the case that you could run something like rlogin into a machine, let's say athena.dialup.mit.edu. And if your connection was coming from a host at MIT, then this rlogin command would succeed if you say, oh yeah, I'm user Alice on this machine. Let me log in as user Alice onto this other machine. And it'll just trust that all the machines at mit.edu are trustworthy to make these statements.
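The trust decision described here boils down to a check on the connection's source address. The sketch below is a hypothetical illustration, not rlogin's actual code: `TRUSTED_NETWORK` is an assumed address range taken from the reserved documentation space, and `allow_login` is an invented name.

```python
import ipaddress

# Assumed "trusted" range for the example; not a real site's address block.
TRUSTED_NETWORK = ipaddress.ip_network("192.0.2.0/24")


def allow_login(source_ip: str, claimed_user: str) -> bool:
    """IP-based authorization: believe the claimed user if the source looks local."""
    # Note that claimed_user is never verified; only the source address matters.
    return ipaddress.ip_address(source_ip) in TRUSTED_NETWORK


print(allow_login("192.0.2.55", "alice"))    # source inside the trusted range
print(allow_login("203.0.113.9", "alice"))   # source outside the trusted range
```

The check is only as strong as the guarantee that `source_ip` is genuine, which is exactly what a spoofed TCP connection undermines.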


I should say I think dial-up never actually had this problem. It was using Kerberos from the very beginning. But other systems certainly did have such problems. And this is an example of using the IP address where the connection is coming from as some sort of authentication mechanism for whether the caller or the client is trustworthy or not. So this certainly used to be a problem, isn't a problem anymore. So relying on IP seems like such a clearly bad plan.

Yet, this actually is still the case. So rlogin is gone. It was recently replaced by SSH now, which is good. On the other hand, there are still many other examples of protocols that rely on IP-based authentication.


One of them is SMTP. So when you send email, you use SMTP to talk to some mail server to send a message. And to prevent spam, many SMTP servers will only accept incoming messages from a particular source IP address. So for example, Comcast's mail server will only accept mail from Comcast IP addresses. Same for MIT mail servers-- will only accept mail from MIT IP addresses. Or there was at least one server that IS&T runs that has this property.


So this is the case where it's still using IP-based authentication. Here it's not so bad. Worst case, you'll send some piece of spam through the mail server. So that's probably why they're still using it, whereas things that allow you to log into an arbitrary account stopped using IP-based authentication. So does this make sense, why this is a bad plan?

And just to double check, suppose that some server was using rlogin. What would you do to attack it? What bad thing would happen? Suggestions? Yeah.


AUDIENCE: Just getting into your computer, and then make a user that you want to log into, and then you get into the network.

PROFESSOR: Yeah, so basically you get your computer. You synthesize this data to look like a legitimate set of rlogin commands that say, log in as this user and run this command in my Unix shell there. You sort of synthesize this data and you mount this whole attack and send this data as if a legitimate user was interacting with an rlogin client, and then you're good to go.

OK, so this is one reason why you probably don't want your TCP sequence numbers to be so guessable. Another problem is these reset attacks. So much like we were able to send a SYN packet, if you know someone's sequence number, you could also send a reset packet. We sort of briefly talked about it here as the legitimate client potentially sending a reset to reset the fake connection that the attacker is establishing.


But in a similar vein, the adversary could try to send reset packets for an existing connection if there's some way that the adversary knows what your sequence number is on that connection. So it's actually not clear if this is such a big problem or not.

At some level, maybe you should be assuming that all your TCP connections could be broken at any time anyway. It's not like the network is reliable. So maybe you should be expecting your connections to drop.


But one place where this turned out to be particularly not a good assumption to make is in the case of routers talking to one another. So if you have multiple routers that speak some routing protocol, then they're connected, of course, by some physical links. But over some physical links, they actually speak some network protocol. And that network protocol runs over TCP. So there's actually some TCP session running over each of these physical links that the routers use to exchange routing information.


So this is certainly the case for this protocol called BGP we'll talk about a bit more in a second. And BGP uses the fact that the TCP connection is alive to also infer that the link is alive. So if the TCP connection breaks, then the routers assume the link broke. And they recompute all their routing tables.


So if an adversary wants to mount some sort of a denial of service attack here, they could try to guess the sequence numbers of these routers and reset these sessions. So if the TCP session between two routers goes down, both routers are like, oh, this link is dead. We have to recompute all the routing tables, and the routes change. And then you might shoot down another link, and so on.


So this is a bit of a worrisome attack, not because it violates someone's secrecy, et cetera, or at least not directly, but more because it really causes a lot of availability problems for other users in the system. Yeah.


AUDIENCE: So if you're an attacker, and you wanted to target one particular user, could you just keep sending connection requests to a server on behalf of his IP and make him keep dropping his connections to the servers and so you just [INAUDIBLE]?

PROFESSOR: Well, so it requires you guessing. So you're saying, suppose I'm using Gmail, and you want to stop me from learning something in Gmail, so just send packets to my machine pretending to be from Gmail. Well, you have to guess the right source and destination port numbers.


The destination port number is probably 443, because I'm using HTTPS. But the source port number is going to be some random 16-bit thing. And it's also going to be the case that probably the sequence numbers are going to be different. So unless you guess a sequence number that's within my TCP window, which is on the order of probably tens of kilobytes, you're also going to be not successful in that regard.


So you have to guess a fair amount of stuff. There's no sort of oracle access. You can't just query the server and say, well, what is that guy's sequence number? So that's the reason why that doesn't work out as well.


So again, many of these issues were fixed, including this RST-based thing, especially for BGP routers. There were actually two sort of amusing fixes. One really shows you how you can carefully exploit existing things or take advantage of them to fix particular problems. Here, the insight is that these routers only want to talk to each other, not to someone else over the network. And as a result, if the packet is coming not from the immediate router next across the link, but from someone else, I want to drop this packet altogether.


And what the designers of these routing protocols realized is that there's this wonderful field in a packet called time to live. It's an 8-bit field that gets decremented by every router to make sure that packets don't go into an infinite loop. So the highest this TTL value could ever be is 255. And then it'll get decremented from there.


So what these routing protocols do-- it's sort of a clever hack-- is they reject any packet with a TTL value that's not 255. Because if a packet has a value of 255, it must have come from the router just on the other side of this link. And if an adversary tries to inject any packet to tamper with this existing BGP connection, it'll have a TTL value less than 255, because it'll be decremented by some other routers along the path, including this one.


And then it'll just get rejected by the recipient. So this is one example of a clever combination of techniques that's backwards compatible and solves this very specific problem. Yeah.
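As a rough illustration of the TTL check just described, here is a minimal sketch. The function names and the link labels are hypothetical, and a real router does this in its forwarding path rather than in Python. It assumes the neighbor sends with a TTL of 255 and that every router in between decrements the field by one.

```python
MAX_TTL = 255  # largest value the 8-bit time-to-live field can hold


def ttl_after_hops(initial_ttl: int, hops: int) -> int:
    """TTL seen by the receiver after `hops` routers have forwarded the packet."""
    return max(initial_ttl - hops, 0)


def accept_for_session(ttl: int, arrival_link: str, session_link: str) -> bool:
    """Accept a packet for a routing session only if it cannot have been
    forwarded by another router and it arrived on the link the session uses."""
    return ttl == MAX_TTL and arrival_link == session_link


# A directly connected neighbor: zero routers in between, TTL is still 255.
neighbor_ok = accept_for_session(ttl_after_hops(255, 0), "link-a", "link-a")

# A remote attacker can set TTL to 255, but at least one router decrements it.
remote_ok = accept_for_session(ttl_after_hops(255, 1), "link-a", "link-a")
```

The check on `arrival_link` reflects the answer to the question that follows: a TTL of 255 alone is not enough if the packet came in on a different physical link.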


AUDIENCE: Doesn't the bottom right router also send something with a TTL of 255?

PROFESSOR: Yeah, so these routers are actually-- this is a physical router. And it knows these are separate links. So it looks at the TTL and which link it came on. So if a packet came in on this link, it will not accept it for this TCP connection.

But you're right. For the most part, these routers trust their immediate neighbors. It need not necessarily be the case. But if you keep seeing this problem, and you know you've implemented this hack, then it must be one of your neighbors. You're going to look. tcpdump these interfaces. Why are you sending me these reset packets?

This problem is not as big. You can manage it by some out-of-band mechanism. Make sense? All right, there are other fixes for BGP where they implemented some form of header authentication, MD5 header authentication as well. But they're really targeting this particular application where this reset attack is particularly bad.


This is still a problem today. If there's some long-lived connection out there that I really want to shoot down, I just have to send some large number of RST packets, probably on the order of hundreds of thousands or so, but probably not exactly 4 billion. Because the servers are actually somewhat lax in terms of which sequence number they accept for a reset.

It can be any packet within a certain window. And in that case, I could probably, or any attacker, reset an existing connection with a modest but not a huge amount of effort. That's still a problem. And people haven't really found any great solution for that.
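The "hundreds of thousands" figure follows from simple arithmetic, sketched below. This assumes the attacker already knows both IP addresses and both port numbers, and that one RST per window-sized stride of the 32-bit sequence space is enough to land inside the accepted window. The window sizes used are illustrative, not measured values.

```python
SEQ_SPACE = 2 ** 32  # size of the 32-bit TCP sequence number space


def rst_packets_needed(window_bytes: int) -> int:
    """Number of RSTs needed to cover the whole sequence space when any
    sequence number inside the receive window is accepted."""
    return -(-SEQ_SPACE // window_bytes)  # ceiling division


packets_32k = rst_packets_needed(32 * 1024)  # 2**32 / 2**15 = 131072
packets_16k = rst_packets_needed(16 * 1024)  # 2**32 / 2**14 = 262144
```

If the source port also has to be guessed, the count grows by up to another factor of 2**16.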


All right, and I guess the sort of last bad thing that happens because these sequence numbers are somewhat predictable is just data injection into existing connections. So suppose there is some protocol like rlogin, but maybe rlogin doesn't-- suppose we have some hypothetical protocol that's kind of like rlogin, but actually it doesn't do IP-based authentication. You have to type in your password to log in, all this great stuff.


The problem is once you've typed your password, maybe your TCP connection is just established and can accept arbitrary data. So wait for one of you guys to log into a machine, type in your password. I don't know what that password is. But once you've established a TCP connection, I'll just try to guess your sequence number and inject some data into your existing connection. So if I can guess your sequence numbers correctly, then this allows me to make it pretend like you've typed some command after you authenticated correctly with your password.


So this all sort of suggests that you really don't want to rely on these 32-bit sequence numbers for providing security. But let's actually see what modern TCP stacks actually do to try to mitigate this problem. So as we were sort of discussing, I guess one approach that we'll look at in the next two lectures is how to implement some security at the application level. So we'll use cryptography to authenticate and encrypt and sign and verify messages at the application level without really involving TCP so much.


But there are some existing applications that would benefit from making this slightly better, at least not make it so easy to exploit these problems. And the way that I guess people do this in practice today-- for example Linux and Windows-- is they implement the suggestion that John gave earlier, that we maintain different initial sequence numbers for every source/destination pair.


So what most TCP SYN implementations do is they still compute this initial sequence number as we were computing before. So this is the old style ISN, let's say. And in order to actually generate the actual ISN for any particular connection, we're going to add a random 32-bit offset. So we're going to include some sort of a function. Think of it as something like a hash function like SHA-1 or something maybe better.


And this is going to be a function of the source IP, the source port number, the destination IP address, destination port, and some sort of a secret key that only the server knows in this case. So this has the nice property that within any particular connection, as identified by a source and destination IP port pair, it still preserves all the nice properties this old style sequence number algorithm had.


But if you have connections from different source/destination tuples, then there's nothing you can learn about the exact value of another connection tuple's sequence number. And in fact, you'll have to guess this key in order to infer that value. And hopefully the server, presumably the OS kernel, stores this key somewhere in its memory and doesn't give it out to anyone else.
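A minimal sketch of the scheme just described: the initial sequence number is the old style counter plus a keyed per-tuple offset. The lecture only says F is something like a hash function; using HMAC-SHA-256 here, truncating to 32 bits, and the names `isn_offset` and `initial_seq` are all assumptions made for illustration, not what any particular kernel does.

```python
import hashlib
import hmac
import os

SECRET_KEY = os.urandom(16)  # assumed: generated once at boot, never revealed


def isn_offset(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
               key: bytes) -> int:
    """Keyed 32-bit offset F(src IP, src port, dst IP, dst port, key)."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def initial_seq(old_style_isn: int, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int, key: bytes = SECRET_KEY) -> int:
    """ISN = old style ISN + F(...), modulo 2**32."""
    offset = isn_offset(src_ip, src_port, dst_ip, dst_port, key)
    return (old_style_isn + offset) % 2 ** 32


# Same tuple: the ISN still advances exactly as the old style counter does.
a1 = initial_seq(1000, "10.0.0.1", 4000, "10.0.0.2", 80)
a2 = initial_seq(1064, "10.0.0.1", 4000, "10.0.0.2", 80)
```

For a fixed tuple the offset is constant, so the defense against stale duplicate packets is unchanged. Across tuples, the offsets look unrelated to anyone who does not hold the key.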


So this is how pretty much most TCP stacks deal with this particular problem today to the extent allowed by the total 32-bit sequence number. It's not great, but sort of works. Yeah.

AUDIENCE: Could you repeat that again? Is the key unique to--

PROFESSOR: So when my machine boots up, or when any machine boots up, it generates a random key. Every time you reboot it, it generates a new key. And this means that for a particular source/destination pair, the sequence numbers advance at the same rate, as controlled by this.


So for a given source/destination pair, this thing is fixed. So you observe your sequence numbers evolving-- your initial sequence numbers for new connections evolving-- according to a particular algorithm. So that still provides all these defenses against old packets from previous connections being injected into new connections, just like packet reordering problems.


So that still works. And that's the only real thing for which we needed this sequence number choosing algorithm, to prevent these duplicate packets from causing problems. However, the thing that we were exploiting before, which is that if you get the sequence number for one connection from A to S, then from that you can infer the sequence number for a different connection.


That's now gone. Because every connection has a different offset in this 32-bit space as implemented by this F function. So this completely decouples the initial sequence numbers seen by every connection. Yeah.


AUDIENCE: What's the point in including the key?

PROFESSOR: Well, if you don't include the key, then I can connect to you. I'll compute the same function F. I'll subtract it out. I'll get this. I'll compute this function F for the connection I actually want to fake. And I'll guess what the initial sequence number for that one is going to be.


AUDIENCE: So can you-- because machines now restart infrequently, can you still [INAUDIBLE] by reversing--

PROFESSOR: I think typically this function F is something like a cryptographically secure hash function, which has a semi-proved property that it's very difficult. It's cryptographically hard to invert it. So even if you were given the literal inputs and outputs of this hash function except for this key part, it would be very hard for you to guess what this key is cryptographically, even in an isolated setting.


So hopefully this will be at least as hard in this setting as well. We'll talk a little bit more about what these functions F are a bit later on and how to use them correctly. Make sense? Other questions about this problem and solution?


All right, so in fact, this was mostly sort of an example of these TCP sequence number attacks that aren't as relevant anymore. Because every operating system basically implements this plan these days. So it's hard to infer what someone's sequence number is going to be.

On the other hand, people keep making the same mistakes. So even after this was implemented for TCP, there was this other protocol called DNS that is hugely vulnerable to similar attacks. And the reason is that DNS actually runs over UDP.


So UDP is a stateless protocol where you actually don't do any connection establishment where you exchange sequence numbers. In UDP, you simply send a request from your source address to the server. And the server figures out what the reply should be and sends it back to whatever source address appeared in the packet.


So it's a single round trip, so there's no time to exchange sequence numbers and to establish that, oh, yeah, you're actually talking to the right guy. So with DNS, as a result, for a while, it was quite easy to fake responses from a DNS server.

So what would a query look like in DNS, typically? Well, you send some queries-- so suppose a client, which knows the DNS server's IP address ahead of time, maybe preconfigured somewhere, sends a packet to that DNS server saying, well, here's my query. Maybe I'm looking for mit.edu. And that's basically it.


And the server's destination port number is always 53 for DNS. And the clients used to also run on the same port number for ease of use or something. So you send this packet from the client on this port to the server on this port. Here's the query. And the server eventually sends back a reply saying, mit.edu has a particular IP address, 18.9 dot something.


The problem is that some adversary could easily send a similar response packet pretending to be from the server. And there's not a whole lot of randomness here. So if I know that you're trying to connect to mit.edu, I'll just send a lot of packets like this to your machine.

I know exactly what DNS server you're going to query. I know exactly what your IP address is. I know the port numbers. I know what you're querying for. I can just supply my own IP address here.


And if my packet gets there after you send this but before you get the real response, your client machine is going to use my packet. So this is another example where insufficient randomness in this protocol makes it very easy to inject responses or inject packets in general.


And this is actually in some ways even worse than the previous attack. Because here you could convince a client to connect to another IP address altogether. And it'll probably cache this result, because DNS involves caching. Maybe you can supply a very long time to live in this response saying, this is valid for years. And then your client, again till it reboots, is going to keep using this IP address for mit.edu. Yeah.


AUDIENCE: Could you fix this by having the client include some random value in the query, and the server echo it back exactly?

PROFESSOR: That's right, yeah, so this is typically what people have done now. The problem, as we were sort of talking about earlier, is backward compatibility. It's very hard to change the DNS server software that everyone runs.

So you basically have to figure out, where can you inject randomness? And people have figured out two places. It's not great. But basically there's a source port number, which is 16 bits of randomness. So if you can choose the source port number randomly, then you get 16 bits. And there's also a query ID inside of the packet, which is also 16 bits. And the server does echo back the query ID.


So combining these two things together, most resolvers these days get 32 bits of randomness out of this protocol. And it, again, makes it noticeably harder, but still not cryptographically perfect, to fake this kind of response and have it be accepted by the client. But these problems keep coming up, unfortunately. So even though it was well understood for TCP, some people I guess suggested that this might be a problem. But it wasn't actually fixed until only a few years ago. Make sense?
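The two sources of randomness can be sketched as follows. This is only an illustration of the matching rule, not a real resolver: the `PendingQuery` type and the function names are hypothetical, and the sketch avoids ports below 1024, so it yields slightly under 16 bits from the port.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingQuery:
    name: str
    source_port: int  # random UDP source port the query was sent from
    query_id: int     # random 16-bit ID that the server echoes back


def new_query(name: str) -> PendingQuery:
    """Pick an unpredictable source port and query ID for an outgoing query."""
    port = 1024 + secrets.randbelow(2 ** 16 - 1024)
    return PendingQuery(name, port, secrets.randbelow(2 ** 16))


def accept_response(q: PendingQuery, dst_port: int, query_id: int,
                    name: str) -> bool:
    """Accept a reply only if it arrives on the port we sent from and
    echoes both our query ID and the name we asked about."""
    return (dst_port == q.source_port
            and query_id == q.query_id
            and name == q.name)


q = new_query("mit.edu")
genuine = accept_response(q, q.source_port, q.query_id, "mit.edu")
forged = accept_response(q, q.source_port, (q.query_id + 1) % 2 ** 16, "mit.edu")
```

An off-path attacker who cannot see the query has to hit both values at once, which is where the roughly 32 bits come from.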


All right, so I guess maybe as an aside, there are solutions to this DNS problem as well by enforcing security for DNS at the application level. So instead of relying on these randomness properties of small numbers of bits in the packet, you could try to use cryptography in the DNS protocols. So protocols like DNSSEC that the paper briefly talks about try to do this. So instead of relying on any network level security properties, they require that all DNS names have signatures attached to them.


That seems like a sensible plan. But it turns out that working out the details is actually quite difficult. So one example of a problem that showed up is name enumeration. Because in DNS, you want to get responses. Well, this name has that IP address.


Or you could get a response saying, no, so sorry, this name doesn't exist. So you want to sign the "doesn't exist" response as well. Because otherwise, that adversary could send back a "doesn't exist" response and pretend that a name doesn't exist, even though it does. So how do you sign responses that certain names don't exist ahead of time? I guess one possibility is you could give your DNS server the key that signs all your records.


That seems like a bad plan. Because then someone who compromises your DNS server could walk away with this key. So instead, the model that DNSSEC operates under is that you sign all your names in your domain ahead of time, and you give the signed blob to your DNS server. And the DNS server can then respond to any queries. But even if it's compromised, there's not much else that that attacker can do. All these things are signed, and the key is not to be found on the DNS server itself.


So the DNSSEC protocol had this clever mechanism called NSEC for signing nonexistent records. And the way you would do this is by signing gaps in the namespace. So an NSEC record might say, well, there's a name called foo.mit.edu, and the next name alphabetically is maybe goo.mit.edu.


And there's nothing alphabetically in between these two names. So if you query for a name between these two names alphabetically sorted, then the server could send back this signed message saying, oh, there's nothing between these two names. You can safely return, doesn't exist.


But then this allows some attacker to completely enumerate your domain name. You can just ask for some domain name and find this record and say, oh, yeah, great. So these two things exist. Let me query for gooa.mit.edu. That'll give me a response saying, what's the next name in your domain, et cetera.
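The gap-signing idea and the enumeration it enables can be sketched like this. Signatures are left out entirely, names are ordered by plain string comparison rather than DNSSEC's canonical ordering, and the zone contents beyond foo.mit.edu and goo.mit.edu are made up. The helper names `nsec_gap` and `walk_zone` are hypothetical, and the zone is assumed to be nonempty.

```python
import bisect

# Hypothetical zone contents, sorted once when the zone is signed offline.
ZONE = sorted(["foo.mit.edu", "goo.mit.edu", "www.mit.edu"])


def nsec_gap(zone, name):
    """For a nonexistent name, return the (previous, next) existing names
    that bracket it, wrapping around at the ends. Return None if it exists."""
    i = bisect.bisect_left(zone, name)
    if i < len(zone) and zone[i] == name:
        return None
    return zone[i - 1], zone[i % len(zone)]


def walk_zone(zone):
    """Enumerate every name by repeatedly asking for a name just past the
    last one learned, as an attacker querying the server could do."""
    found = []
    probe = ""  # sorts before every real name
    while True:
        _, nxt = nsec_gap(zone, probe)
        if found and nxt == found[0]:
            return found  # wrapped around: the whole zone has been seen
        found.append(nxt)
        probe = nxt + "\x00"  # smallest string that sorts after nxt


gap = nsec_gap(ZONE, "fop.mit.edu")
enumerated = walk_zone(ZONE)
```

Each denial answer leaks one real name, so the number of queries needed to list the zone is about the number of names in it.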


So it's actually a little bit hard to come up with the right protocol that both preserves all the nice properties of DNS and prevents name enumeration and other problems. There's actually a nice thing now called NSEC3 that tries to solve this problem partially-- sort of works, sort of not. We'll see, I guess, what gets it [INAUDIBLE]. Yeah.


AUDIENCE: Is there any kind of signing of nonexistent top level domains?

PROFESSOR: Yeah, I think actually yeah. The dot domain is just another domain. And they similarly have this mechanism implemented as well. So actually dot and dot com now implement DNSSEC, and there's all these records there that say, well, .in is a domain name that exists, and dot something else exists, and there's nothing in between. So there's all these things.


AUDIENCE: So other than denial of service, why do we care so much about enumerating domain names within mit.edu?

PROFESSOR: Well, probably we don't. Actually, there's a text file in AFS that lists all these domain names at MIT anyway. But I think in general, some companies feel a little uneasy about revealing this. They often have internal names that sit in DNS that should never be exposed to the outside. I think it's actually this fuzzy area where it was never really formalized what guarantees DNS was providing to you or was not. And people started assuming things like, well, if we stick some name, and it's not really publicized anywhere, then it's probably secure here.


I think this is another place where this system doesn't have a clear spec in terms of what it has and doesn't have to provide. And when you make some changes like this, then people say, oh, yeah, I was sort of relying on that. Yeah.


AUDIENCE: [INAUDIBLE] replay attack where you could send an old gap signature?

PROFESSOR: Yeah, there's actually timeouts on these things. So when you sign this, you actually sign and say, I'm signing that this set of names is valid for, I don't know, a week. And then the clients, if they have a synchronized clock, they can reject old signed messages. Make sense?


All right, so that's it on the TCP sequence number guessing attacks. Another interesting problem that also comes up in the TCP case is a denial of service attack that exploits the fact that the server has to store some state. So if you look at this handshake that we had on the board before, we'll see that when a client establishes a connection to the server, the server has to actually remember the sequence number SNC. So the server has to maintain some data structure on the side that says, for this connection, here's the sequence number.

And it's going to say, well, my connection from C to S has the sequence number SNC. And the reason the server has to store this table is because the server needs to figure out what SNC value to accept here later. Does this make sense?


AUDIENCE: [INAUDIBLE] SNS?

PROFESSOR: Yeah, the server also needs SNS I guess, yeah. But it turns out that-- well, yeah, you're right. And the problem is that-- actually, yeah, you're right. SNS is actually much more important. Sorry, yeah. [INAUDIBLE] SNS is actually much more important. Because SNS is how you know that you're talking to the right guy.


The problem is that there's no real bound on the size of this table. So you might get packets from some machine. You don't even know who sent it. You just get a packet that looks like this with a source address that claims to be C.


And in order to potentially accept a connection later from this IP address, you have to create this table entry. And these table entries are somewhat long lived. Because maybe someone is connecting to you from a really far away place. There's lots of packet loss. It might take maybe a minute until someone finishes this TCP handshake in the worst case.


So you have to store this state in your TCP stack for a relatively long time. And there's no way to guess whether this is a valid connection or not. So one denial of service attack that people discovered against most TCP stacks is to simply send lots of packets like this.


So if I'm an attacker, then I'll just send lots of SYN packets to a particular server and get it to fill up its table. And the problem is that in the best case, maybe the attacker just always uses the same source IP address.


In that case, you can just say, well, every client machine is allowed two entries in my table, or something like this. And then the attacker can use up two table entries but not much more. The problem, of course, is that the attacker can fake these client IP addresses, make them look random. And then for the server, it's going to be very difficult to distinguish whether this is an attacker trying to connect to me or some client I've never heard of before.
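To make the limitation concrete, here is a toy model of the half-open connection table with the per-source cap mentioned above. The class name, the cap of two, and the made-up addresses are all illustrative assumptions. Nothing here times entries out, which a real stack would do.

```python
from collections import Counter


class HalfOpenTable:
    """Toy table of connections that have sent a SYN but not finished
    the handshake, with a cap on entries per claimed source address."""

    def __init__(self, per_source_limit: int = 2):
        self.limit = per_source_limit
        self.entries = {}            # (src_ip, src_port) -> server seq number
        self.per_source = Counter()  # src_ip -> number of entries held

    def on_syn(self, src_ip: str, src_port: int, sns: int) -> bool:
        """Record state for a new SYN; refuse if this source is at its cap."""
        key = (src_ip, src_port)
        if key not in self.entries:
            if self.per_source[src_ip] >= self.limit:
                return False
            self.per_source[src_ip] += 1
        self.entries[key] = sns
        return True


table = HalfOpenTable()

# One honest-looking source sending 100 SYNs only ever holds two entries.
for port in range(100):
    table.on_syn("10.0.0.9", 1024 + port, sns=port)
after_single_source = len(table.entries)

# 100 SYNs that each claim a different source address all get an entry.
for i in range(100):
    table.on_syn(f"192.0.2.{i}", 5000, sns=i)
after_spoofed = len(table.entries)
```

The cap works only as long as the source address means something, which is exactly what spoofing takes away.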


So if you're some website that's supposed to accept connections from anywhere in the world, this is going to be a big problem. Because either you deny access to everyone, or you have to store state for all these mostly fake connection attempts. Does that make sense?

So this is a bit of a problem for TCP, and in fact for most protocols that allow some sort of connection initiation, and the server has to store state. So there's some fixes. We'll talk about in a second what workaround TCP implements to try to deal with this problem. This is called SYN flooding in TCP.


But in general, this is a problem that's worth knowing about and trying to avoid in any protocol you design on top as well. So you want to make sure that the server doesn't have to keep state until it can actually authenticate and identify, who is the client?


Because by that time, if you've identified who the client is, you've authenticated them somehow, then you can actually make a decision, well, every client is allowed to only connect once, or something. And then I'm not going to keep more state.

Here, the problem is you're guaranteeing that you're storing state before you have any idea who it is that is connecting to you. So let's look at how you can actually solve this SYN flooding attack where the server accumulates lots of state.


So of course, if you could change TCP again, you could fix this pretty easily by using cryptography or something or changing exactly who's responsible for storing what state. The problem is we have TCP as is. And could we fix this problem without changing the TCP wire protocol? So this is, again, an exercise in trying to figure out, well, exactly what tricks we could play or exactly what assumptions we could relax and still stick to the TCP header format and other things.


And the trick is to in fact figure out a clever way to make the server stateless without having to-- so the server isn't going to have to keep this table around in memory. And the way we're going to do this is by carefully choosing SNS. Instead of using this formula we were looking at before, where we were to add this function, we're instead going to choose this sequence number in a different way.


And I'll give you exactly the formula. And then we'll talk about why this is actually interesting and what nice properties it has. So if the server detects that it's under this kind of attack, it's going to switch into this mode where it chooses SNS using this formula of applying basically the same or similar kind of function F we saw before. And what it's going to apply it to is the source IP, destination IP, the same things as before, source port, destination port, and also timestamp, and also a key in here as well.


And we're going to concatenate it with a timestamp as well. So this timestamp is going to be fairly coarse grained. It's going to go on the order of minutes. So every minute, the timestamp ticks up by one. It's a very coarse grained time.


And there's probably some split between this part of the header and this part of the header. This timestamp doesn't need a whole lot of bits. So I forget exactly what this protocol does in real machines. But you could easily imagine maybe using 8 bits for the timestamp, and I'm going to be using 24 bits for this chunk of the sequence number.


All right, so why is this a good plan? What's going on here? Why this weird formula? So I think you have to remember what was the property that we were trying to achieve with the sequence number. So there's two things going on. One is there's this defense against duplicated packets that we were trying to achieve by-- maybe the formula is still here. Nope-- oh, yeah, yeah, here.

Right, so just to compare these guys-- so when we're not under attack, we were previously maintaining this old style sequence number scheme to prevent duplicate packets from previous connections, all this good stuff. It turns out people couldn't figure out a way to defend against these kinds of SYN flooding attacks without giving up on this property, so basically saying, well, here's one plan that works well in some situations. Here's a different plan where we'll give up on that ISN old style component.


And instead, we'll focus on just ensuring that if someone presents us this sequence number S in response to a packet, like here, then we know it must've been the right client. So remember that in order to prevent IP spoofing attacks, we sort of rely on this SNS value. So if the server sends this SNS value to some client, then hopefully only that client can send us back the correct SNS value, finish establishing the connection.


And this is why you had to store it in this table over here. Because otherwise, how do you know if this is a real response or a fake response? And the reason for using this function F here is that now we can maybe not store this table in memory. And instead, when a connection attempt arrives here, we're going to compute SNS according to this formula over here and just send it back to whatever client pretends to have connected to us.


And then we'll forget all about this connection. And then if this third packet eventually comes through, and its SNS value here matches what we would expect to see, then we'll say, oh yeah, this must've been someone who got our response from step two and finally sent it back to us.


And now we finally commit after step three to storing a real entry for this TCP connection in memory. So this is a way to sort of defer the storage of this state at the server by requiring the client to echo back this exact value. And by constructing it in this careful way, we can actually check whether the client just made up this value, or if it's the real thing we're expecting. Does that make sense?


AUDIENCE: [INAUDIBLE] SNC [INAUDIBLE]?

PROFESSOR: Yeah, so SNC now, we basically don't store it. It's maybe not great. But so it is. So in fact, I guess what really happens is in-- I didn't show it here. But there's probably going to be sort of a null data field here that says this packet has no data. But it still includes the sequence number SNC just because there's a field for it.


So this is how the server can reconstruct what this SNC value is. Because the client is going to include it in this packet anyway. It wasn't relevant before. But it sort of is relevant now. And we weren't going to check it against anything. But it turns out to be pretty much good enough.

It has some unfortunate consequences. Like if this is-- well, there's some complicated things you might abuse here. But it doesn't seem to be that bad. It seems certainly better than the server filling up its memory and stopping serving requests altogether.


And then we don't include in this computation. Because the only thing we care about here is offloading the storage of this table and making sure that the only connections that eventually do get established are legitimate clients. Because then, we can say, well, if this client is establishing a million connections to me, I'll stop accepting connections from him.

That's easy enough, finally. The problem is that all these source addresses, if they're spoofed, are hard to distinguish from legitimate clients. Make sense? Yeah.


AUDIENCE: Would you need to store the timestamp?

PROFESSOR: Ahh, so the clever thing, the reason this timestamp is sort of on the side here, is that when we receive this SNS value in step three, we need to figure out, how do you compute the input to this function F to check whether it's correct? So actually, we take the timestamp from the end of the packet, and we use that inside of this computation.

Everything else we can reconstruct. We know who just sent us the third-step packet. And we have all these fields. And we have our key, which is, again, still secret. And this timestamp just comes from the end of the sequence number, from the last 8 bits. And then it might be that we'll reject timestamps that are too old, just disallow old connections. Yeah.
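The check described above can be sketched roughly as follows. This is a minimal illustration, not the lecture's or any real TCP stack's implementation: the names `keyed_f`, `make_cookie`, `check_cookie`, `SECRET_KEY`, and `MAX_AGE_MINUTES` are all hypothetical, HMAC-SHA256 is only an assumed stand-in for the function F, and the sketch ignores the detail that the client acknowledges SNS plus one. The layout follows the discussion: 24 unguessable bits followed by an 8-bit, minute-granularity timestamp.

```python
import hashlib
import hmac

SECRET_KEY = b"example-server-secret"  # hypothetical; a real server would use a random key
MAX_AGE_MINUTES = 2                    # assumed cutoff for "too old" timestamps


def coarse_time(now_seconds: float) -> int:
    """Minute-granularity timestamp, truncated to 8 bits."""
    return int(now_seconds // 60) & 0xFF


def keyed_f(src_ip: str, src_port: int, dst_ip: str, dst_port: int, ts: int) -> int:
    """Stand-in for F: keyed hash over the connection fields, cut to 24 bits."""
    msg = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{ts}".encode()
    digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big")


def make_cookie(src_ip, src_port, dst_ip, dst_port, now_seconds) -> int:
    """Step two: build the 32-bit SNS without storing any per-connection state."""
    ts = coarse_time(now_seconds)
    return (keyed_f(src_ip, src_port, dst_ip, dst_port, ts) << 8) | ts


def check_cookie(cookie, src_ip, src_port, dst_ip, dst_port, now_seconds) -> bool:
    """Step three: recompute F using the timestamp carried in the low 8 bits."""
    cookie &= 0xFFFFFFFF
    ts = cookie & 0xFF
    age = (coarse_time(now_seconds) - ts) & 0xFF  # wraps modulo 256 minutes
    if age > MAX_AGE_MINUTES:
        return False  # reject timestamps that are too old
    expected = keyed_f(src_ip, src_port, dst_ip, dst_port, ts)
    return hmac.compare_digest(expected.to_bytes(3, "big"),
                               (cookie >> 8).to_bytes(3, "big"))
```

The server keeps only the key; everything else needed for the check arrives in the third packet.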

AUDIENCE: So I'm guessing the reason you only use this when you're under attack is because you lose 8 bits of security, or whatever?

PROFESSOR: Yes, it's not great. It has many bad properties. One is you sort of lose 8 bits of security in some sense. Because now the unguessable part is just 24 bits instead of 32 bits. Another problem is what happens if you lose certain packets? So if this packet is lost-- so typically, in TCP, there's someone responsible for retransmitting something if a particular packet is lost. And in TCP, if the third packet is lost, then the client might not be waiting for anything. Or sorry, maybe the protocol we're running on top of this TCP connection is one where the server is supposed to say something initially.


So I connect. I just listen. And in SMTP, for example, the server is supposed to send me some sort of an initial greeting in the protocol. So OK, suppose I'm connecting to an SMTP server. I send my third packet. I think I'm done. I'm just waiting for the server to tell me, greetings as an SMTP server. Please send mail.


This packet could get lost. And in real TCP, the way this gets handled is that the server from step two remembers that, hey, I sent this response. I never heard back, this third thing. So it's the server that's supposed to resend this packet to trigger the client to resend this third packet.

Of course, if the server isn't storing any state, it has no idea what to resend. So this actually makes connection establishment potentially problematic, where you could enter this weird state where both sides are waiting for each other. Well, the server doesn't even know that it's waiting for anything. And the client is waiting for the server. And the server basically dropped responsibility by not storing state. So this is another reason why you don't run this in production mode all the time. Yeah.


AUDIENCE: Presumably also you could have data collisions if you establish two very short-lived connections right after each other from the same host.

PROFESSOR: Absolutely, yeah, yeah. So another thing is, of course, because we gave up on using this ISN old style part, we now give up protection against these multiple connections in a short time period being independent from one another. So I think there's a number of trade-offs. We just talked about three. There's several more things you worry about.

But it's not great. If we could design a protocol from scratch to be better, we could just have a separate nice 64-bit header for this and a 64-bit value for this. And then we could enable this all the time without giving up the other stuff and all these nice things. Yeah.


AUDIENCE: I just had one quick question on the SNS. In step two, [INAUDIBLE], do they have to be the same?

PROFESSOR: This SNS and this SNS?

AUDIENCE: Mhm.

PROFESSOR: Yeah, because otherwise, the server has no way to conclude that this client got our packet. If the server didn't check that this SNS was the same value as before, then things actually would be even worse.

Because I could fake a connection from some arbitrary IP address, then get this response. Maybe I don't even get it, because it goes to a different IP. Then I establish a connection from some other IP address. And then the server is maintaining a whole live connection. Probably a server process on the other side is waiting for me to send data and so on.

AUDIENCE: But the timestamp is going to be different, right? So how can the server recalculate that with a new timestamp and know the one before if it doesn't store any state?

PROFESSOR: So the way this works is these timestamps, as I was saying, are coarse-grained. So they're on a scale of minutes. So if you connect within the same minute, then you're in good shape. And if you connect on the minute boundary, well, too bad.

Yet another problem with the scheme-- it's imperfect in many ways. But most operating systems, including Linux, actually have ways of detecting if there's too many entries building up in this table that aren't being completed. It switches to this other scheme instead to make sure it doesn't overflow this table. Yeah.


AUDIENCE: So if the attacker has control of a lot of IP addresses, and they do this, and even if you switch it the same--

PROFESSOR: Yeah, so then actually there's not much you can do. The reason that we were so worried about this scheme in the first place is because we wanted to filter out or somehow distinguish between the attacker and the good guys. And if the attacker has more IP addresses and just controls more machines than the good guys, then he can just connect to our server and request lots of web pages or maintain connections.

And it's very hard then for the server to distinguish whether these are legitimate clients or just the attacker tying up resources of the server. So you're absolutely right. This only addresses the case where the attacker has a small number of IP addresses and wants to amplify his effect.


But it is a worry. And in fact, today it might be that some attackers control a large number of compromised machines, like just desktop machines of someone that didn't patch their machine correctly. And then they can just mount denial of service attacks from this distributed set of machines all over the world. And that's pretty hard to defend against.


So another actually interesting thing I want to mention is denial of service attacks, but in the particular way that other protocols make them worse. I guess other protocols allow denial of service attacks in the first place. I'm sorry. But there are some protocols that are particularly susceptible to abuse. And probably a good example of that is, again, this DNS protocol that we were looking at before.


So the DNS protocol-- we still have it here-- involves the client sending a request to the server and the server sending a response back to the client. And in many cases, the response is larger than the request. The request could be just, tell me about mit.edu. And the response might be all the records the server has about mit.edu-- the email address, the mail server for mit.edu, the signed record if it's using DNSSEC, and so on.


So the query might be 100 bytes. The response could well be over 1,000 bytes. So suppose that you want to flood some guy with lots of packets or lots of bandwidth. Well, you might only be able to send a small amount of bandwidth.


But what you could do is you could fake queries to DNS servers on behalf of that guy. So you only have to send 100 bytes to some DNS server pretending to be a query from that poor guy. And the DNS server is going to send 1,000 bytes to him on your behalf.
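The arithmetic behind this can be sketched in a few lines. The 100-byte query and 1,000-byte response are the round numbers from the discussion above; the function names and the 1 Mbit/s attacker uplink are made up for illustration.

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """How many bytes reach the victim per byte the attacker sends."""
    return response_bytes / query_bytes


def reflected_bandwidth(attacker_bps: float, query_bytes: int, response_bytes: int) -> float:
    """Traffic arriving at the victim when every spoofed query is answered."""
    return attacker_bps * amplification_factor(query_bytes, response_bytes)


factor = amplification_factor(100, 1000)               # 10.0
at_victim = reflected_bandwidth(1_000_000, 100, 1000)  # 10 Mbit/s from a 1 Mbit/s uplink
```

Under these assumed numbers, each spoofed query costs the attacker a tenth of what it costs the victim.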


So this is a problematic feature of this protocol. Because it allows you to amplify bandwidth attacks. And partly for the same reason we were talking about with TCP's SYN flooding attacks, it's very hard for the server, for the DNS server, in this case, to know whether this request is valid or not. Because there's no authentication or no sort of sequence number exchanges going on to tell that this is the right guy connecting to you, et cetera.


So in fact this is still a problem in DNS today. And it gets used quite frequently to attack people with bandwidth attacks. So if you have a certain amount of bandwidth, you'll be that much more effective if you reflect your attack off of a DNS server.


And these DNS servers are very well provisioned. And they basically have to respond to every query out there. Because if they stop responding to requests, then probably some legitimate requests are going to get dropped. So this is a big problem in practice. Yeah.


AUDIENCE: So if you can still see it on the DNS server, [INAUDIBLE] requests and never reply to-

PROFESSOR: Right, yeah, so it's possible to maybe modify the DNS server to keep some sort of state like this.

AUDIENCE: That's the reason why this still works now, because they don't store state?

PROFESSOR: Yeah, well I think some people are starting to modify DNS servers to try to store state. A lot of times, there's so many DNS servers out there that it doesn't matter. Even if you only do 10 queries against every DNS server, still every packet gets amplified by some significant factor. And they have to respond. Because maybe that client really is trying to issue this query. So this is a problem. Yeah, so you're right, if this was one DNS server, then this would be maybe not as big of a deal.


The problem is also that the root servers for DNS, for example, aren't a single machine. It's actually racks and racks of servers. Because they're so heavily used. And trying to maintain state across all these machines is probably nontrivial. So as it gets abused more, probably it will be more worthwhile to maintain this state.


I guess a general principle you want to follow in any protocol-- well, might be a good principle-- is to make the client do at least as much work as the server is doing. So here, the problem is the client isn't doing as much work as the server. That's why the server can help the client amplify this effect.


If you were redesigning DNS from scratch, and this was really your big concern, then it'd probably be fairly straightforward to fix this. The client has to send a request that has extra padding bytes that are just there wasting bandwidth. And then the server is going to respond back with a response that's at most as big as that.


And if you want a response that's bigger, maybe the server will say, sorry, your padding wasn't big enough. Send me more padding. And this way, you guarantee that the DNS server cannot be used ever to amplify these kinds of bandwidth attacks.
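As a rough sketch of that padding rule (a hypothetical redesign, not anything DNS actually does): the server below never emits more bytes than it received. The one-byte status prefix, the `MIN_REQUEST` floor, and the `respond` name are all assumptions made for this example.

```python
import struct
from typing import Optional

MIN_REQUEST = 8  # assumed floor, so even a refusal (5 bytes) is no larger than the request


def respond(request: bytes, full_response: bytes) -> Optional[bytes]:
    """Reply with at most len(request) bytes, or ask the client for more padding."""
    if len(request) < MIN_REQUEST:
        return None  # too small to answer without amplifying; drop it
    needed = len(full_response) + 1  # one status byte plus the answer
    if needed <= len(request):
        return b"\x01" + full_response
    # "Your padding wasn't big enough": tell the client how large a request to send.
    return b"\x00" + struct.pack("!I", needed)


query = b"mit.edu?" + b"\x00" * 92      # 100-byte request
answer = b"r" * 1000                    # stand-in for a 1,000-byte answer
first = respond(query, answer)          # refusal carrying the required size
needed = struct.unpack("!I", first[1:])[0]
padded = query + b"\x00" * (needed - len(query))
second = respond(padded, answer)        # now the full answer fits
```

In every branch the reply is no longer than the request, so a spoofed source address gains the attacker nothing in bandwidth.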


Actually, these kinds of problems happen also at higher levels as well. So in web applications, you often have web services that do lots and lots of computation on behalf of a single request. And there's often denial of service attacks at that level where adversaries know that a certain operation is very expensive, and they'll just ask for that operation to be done over and over again. And unless you carefully design your protocol and application to allow the client to prove that, oh, I'm burning at least as much work as you, or something like this, then it's hard to defend against these things as well. Make sense?
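One way a client might prove it has burned some work is a hash puzzle. The lecture does not name a specific mechanism, so the sketch below is only an assumed illustration: the server hands out a challenge, the client searches for a nonce whose SHA-256 hash has a given number of leading zero bits, and the server verifies with a single hash. The names `solve`, `verify`, and `leading_zero_bits` are hypothetical.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the front of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def puzzle_hash(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force search, about 2**difficulty hashes on average."""
    nonce = 0
    while leading_zero_bits(puzzle_hash(challenge, nonce)) < difficulty:
        nonce += 1
    return nonce


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash, regardless of how long the client searched."""
    return leading_zero_bits(puzzle_hash(challenge, nonce)) >= difficulty


challenge = b"per-request-challenge"  # a real server would pick this at random
nonce = solve(challenge, 8)           # small difficulty so the example runs quickly
accepted = verify(challenge, nonce, 8)
```

The asymmetry is the point: the client's cost grows with the difficulty while the server's check stays constant, so the server can raise the difficulty for expensive operations.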


All right, so I guess the last thing I want to briefly touch on, which the paper talks about as well, is these routing attacks. And the reason these attacks are interesting is they're maybe popping up a level above these protocol transport level issues. And look at what goes wrong in an application.


And the routing protocol is a particularly interesting example. Because it's often the place where trust and sort of initial configuration gets bootstrapped in the first place. And it's easy to sort of get that wrong. And even today, there's not great authentication mechanisms for that.


Perhaps the clearest example is the DHCP protocol that all of you guys use when you open a computer or connect to some wireless or wired network. The computer just sends out a packet saying, I want an IP address and other stuff. And some DHCP server at MIT typically receives that packet and sends you back, here's an IP address that you should use. And also here's a DNS server you should use, and other interesting configuration data.


And the problem is that the DHCP request packet is just broadcast on the local network trying to reach the DHCP server. Because you actually don't know what the DHCP server is going to be ahead of time. You're just plugging into the network, the first time you've been here, let's say. And your client doesn't know what else to do or who to trust.


And consequently, any machine on the local network could intercept these DHCP requests and respond back with any IP address that the client could use, and also maybe tell the client, hey you should use my DNS server instead of the real one. And then you could intercept those future DNS requests from the client and so on. That make sense?


So I think these protocols are fairly tricky to get right. And on a global scale, the protocols like BGP allow any participant to announce a particular IP address prefix for the world to sort of know about and route packets toward the attacker. There's certainly been attacks where some router participating in BGP says, oh, I'm a very quick way to reach this particular IP address range. And then all the routers in the world say, OK, sure, we'll send those packets to you.


And probably the most frequent abuse of this is by spammers who want to send spam, but their old IP addresses are blacklisted everywhere, because they are sending spam. So they just pick some random IP address. They announce that, oh yeah, this IP address is now here. And then they sort of announce this IP address, send spam from it, and then disconnect. And it gets abused a fair amount this way.


It's sort of getting less now. But it's kind of hard to fix. Because in order to fix it, you have to know whether someone really owns that IP address or not. And it's hard to do without establishing some global database of, maybe, cryptographic keys for every ISP in the world. And it takes quite a bit of effort by someone to build this database.


The same actually applies to DNSSEC as well. In order to know which signature to look for in DNS, you have to have a cryptographic key associated with every entity in the world. And it's not there now. Maybe it'll get built up slowly. But it's certainly one big problem for adopting DNSSEC.


All right, so I guess the thing to take away from this is maybe just a bunch of lessons about what not to do in general in protocols. But also actually one thing I want to mention is that while probably secrecy and integrity are good properties to enforce at higher levels of abstraction, like in cryptographic protocols in the application-- and we'll look at that in the next lectures-- one thing that you really do want from the network is some sort of availability and DoS resistance. Because these properties are much harder to achieve at higher levels in the stack.


So you really want to avoid things like maybe these amplification attacks, maybe these SYN flooding attacks, maybe these RST attacks where you can shoot down an arbitrary person's connection. These are things that are really damaging at the low level and that are hard to fix higher up. But the integrity and confidentiality you can more or less solve with encryption. And we'll talk about how we do that in the next lecture on Kerberos. See you guys then.
