Many of us take the Internet as it exists today for granted, but to work with cloud platforms like Exoscale, a cursory knowledge of its inner workings is essential. As such, this handbook aims to explain the basics of how connections are made across the Internet.
Layers of the Internet
The Internet is built on the concept of layers. The most fundamental protocol is Ethernet: its purpose is to facilitate communication between devices on the same physical network. Protocols of this kind are often referred to as layer-2, where layer-1 would be the physical cabling. (See: the OSI model)
On top of the layer-2 protocols are the layer-3 protocols, which facilitate communication across the boundaries of networks. These protocols, such as IP (Internet Protocol), add the ability to route a data packet through the vast network of devices to its correct destination.
Layer-3 serves as the basis for layer-4 (transport) protocols, which tie the stream of individual data packets together into a continuous connection where data can transparently flow in both directions.
On top of layer-4, there are various application protocols in layer-7 (yes, layers 5 and 6 are skipped here), such as the popular HTTP protocol, which is the foundation of the World Wide Web. These connections can be encrypted as well.
Layer-2: Data Link Layer
Layer-2 is for communication on the local network. Each device on a network usually has a physical hardware address (MAC address). When one device wants to send a data packet to another device, it creates a frame containing the sender and destination hardware addresses and packages the data inside it. In other words, this frame acts as an envelope on the local network.
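The envelope idea can be sketched in a few lines of Python. This is only an illustration, assuming the common Ethernet II header layout (6-byte destination address, 6-byte source address, 2-byte EtherType); the MAC addresses and the helper names `mac_to_bytes` and `build_frame` are made up for this example.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # EtherType value announcing an IPv4 payload


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_frame(dst_mac: str, src_mac: str, payload: bytes) -> bytes:
    """Prefix the payload with a 14-byte Ethernet II header."""
    header = struct.pack(
        "!6s6sH",  # network byte order: 6 bytes, 6 bytes, unsigned 16-bit
        mac_to_bytes(dst_mac),
        mac_to_bytes(src_mac),
        ETHERTYPE_IPV4,
    )
    return header + payload


# Hypothetical addresses, chosen only for illustration.
frame = build_frame("02:00:00:00:00:02", "02:00:00:00:00:01", b"hello")
print(len(frame))          # 14 header bytes + 5 payload bytes
print(frame[:6].hex(":"))  # the destination address comes first
```

A real network card also appends a checksum and enforces a minimum frame size, which this sketch leaves out.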
Since layer-2 protocols usually don’t have a concept of routing, the frame is blasted out on the local network and can be received by all network devices. Modern smart switches can keep track of which device is on which connection and filter these frames, but from an application perspective, it is imperative to think about a layer-2 network as an unsecured medium.
Layer-3: Network Layer
As you might imagine, the layer-2 solution does not scale. If every data packet in the world were sent to every machine in the world, the Internet would collapse. That’s why layer-2 networks are separated from each other by routers. Routers, as the name suggests, route packets across networks.
To make routing possible, each device on a layer-3 (Internet Protocol) network receives an address different from the hardware (MAC) address on layer-2. This address is commonly referred to as an IP address and comes in two variants: IPv4 and IPv6.
IPv4 addresses are stored on 32 bits of space, or 4 bytes, and for reading convenience are usually written in dotted form, such as 126.96.36.199. Each of the four numbers can take a value from 0 to 255, which gives 2^32, or around 4.3 billion, IPv4 addresses.
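To make the 32-bit representation concrete, here is a small sketch using Python's standard `ipaddress` module. The variable names are chosen for this example only.

```python
import ipaddress

addr = ipaddress.IPv4Address("126.96.36.199")

# The dotted form is just a readable spelling of 4 raw bytes.
octets = list(addr.packed)
print(octets)  # [126, 96, 36, 199]

# The same address as a single integer that fits into 32 bits.
as_int = int(addr)
print(as_int.bit_length() <= 32)  # True

# Converting the integer back yields the original address.
print(ipaddress.IPv4Address(as_int))  # 126.96.36.199

# Size of the whole IPv4 address space.
print(2 ** 32)  # 4294967296
```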
Since 4 billion addresses are not enough for all humans, not to mention that most people have more than one device, IPv6 was born. IPv6, among other changes, stores addresses on 128 bits, making it possible to assign every grain of sand in the Sahara an IP address.
The addresses are usually written in the form 0db8:0:0:0:0:0:0:1. There is a maximum of 8 segments. Each segment can contain up to four hexadecimal digits (0-9 or a-f). For easier readability, the longest sequence of 0’s can be shortened to a double colon, for example 0db8::1.
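The shortening rule can be checked with Python's standard `ipaddress` module. Note that the module also drops leading zeros inside a segment, so its short form is slightly more compact than replacing the zero run alone.

```python
import ipaddress

addr = ipaddress.IPv6Address("0db8:0:0:0:0:0:0:1")

# Shortest form: the run of zero segments collapses into "::".
print(addr.compressed)  # db8::1

# Fully written out: always 8 segments of 4 hexadecimal digits.
print(addr.exploded)  # 0db8:0000:0000:0000:0000:0000:0000:0001

# Both spellings describe the same 128-bit address.
print(ipaddress.IPv6Address("db8::1") == addr)  # True
```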
Since IPv6 deployment has been slow over the decades, there are workarounds for IPv4 to make it last as long as possible. These workarounds include NAT (Network Address Translation), in which home routers often translate private IP addresses from home devices to a single public IP. Recently, network carriers have also started deploying NAT on a large scale to conserve IP addresses; this is called CGN (Carrier Grade NAT).
Network Ranges, IP Address Assignments
IP addresses are assigned to providers in large blocks to make routing across the global network more efficient. Organizations like RIPE and ARIN are responsible for that. The address block received by the provider is usually denoted like this: 1.1.1.1/20
To understand what this means, let’s write this IP address in its binary form:
00000001 00000001 00000001 00000001
This is 32 ones and zeroes. Let’s mark the first 20:
**00000001 00000001 0000***0001 00000001*
The /20 notation (mask) means that a provider gets all addresses that start with the given sequence of 20 bits. The provider gets to assign the remaining 12 bits as they see fit, resulting in a range of 2^12 = 4096 IP addresses.
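The arithmetic can be verified with Python's standard `ipaddress` module. This sketch assumes the block is 1.1.1.1/20, matching the binary form shown above; `strict=False` tells the module to accept an address whose host bits are set and to zero them out.

```python
import ipaddress

block = ipaddress.ip_network("1.1.1.1/20", strict=False)

print(block)                # 1.1.0.0/20
print(block.num_addresses)  # 4096
print(32 - block.prefixlen) # 12 bits left for the provider to assign

# First and last address of the range.
print(block[0], block[-1])  # 1.1.0.0 1.1.15.255

# Membership checks against the block.
print(ipaddress.ip_address("1.1.15.255") in block)  # True
print(ipaddress.ip_address("1.1.16.0") in block)    # False
```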
This way of writing IP address ranges is commonly known as CIDR notation or prefix notation. It matters in practice, too: Security Group rules on Exoscale are specified as IP address blocks in CIDR notation.
In other words, if you want to allow a single IP, you can write it with a /32 mask, for example 1.1.1.1/32.
In contrast, if you want to allow the whole Internet, you can write it as 0.0.0.0/0.
Be careful! The /0 notation means the whole Internet, regardless of what is in front of it!
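The warning about /0 is easy to demonstrate with Python's standard `ipaddress` module. The addresses below are illustrative only.

```python
import ipaddress

single = ipaddress.ip_network("1.1.1.1/32")
everything = ipaddress.ip_network("0.0.0.0/0")

# A /0 mask fixes zero bits, so the address in front of it is irrelevant.
misleading = ipaddress.ip_network("1.1.1.1/0", strict=False)

print(single.num_addresses)      # 1
print(everything.num_addresses)  # 4294967296
print(misleading == everything)  # True
print(ipaddress.ip_address("8.8.8.8") in misleading)  # True

# The IPv6 equivalent of "the whole Internet" uses the same idea.
print(ipaddress.ip_network("::/0").num_addresses == 2 ** 128)  # True
```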
IPv6 also has a CIDR notation, but the possible mask numbers range from 0 to 128 instead of 0 to 32.
ARP and NDP
When an IP packet is sent to a machine on the local network, the sender (a host or the router) needs a way to determine the target device’s MAC address. This is done using the Address Resolution Protocol (ARP) for IPv4 and the Neighbor Discovery Protocol (NDP) for IPv6. An ARP query is broadcast on the local network, asking all machines which MAC address corresponds to a specific IP; NDP uses multicast for the same purpose.
It is worth noting that ARP is vulnerable to attacks such as ARP poisoning, since ARP responses are accepted from any device. An attacker can inject themselves into a data stream by hijacking IP addresses. This is not a typical problem in the cloud but can be a problem in a shared hosting environment, underlining the need for encryption on all network connections, even if the source or destination IPs are trusted.
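The weakness can be illustrated with a toy model. This is not an ARP implementation, only a sketch of a cache that, like a naive ARP cache, trusts whichever reply arrives last; the addresses and the `handle_arp_reply` helper are hypothetical.

```python
# Toy ARP cache: maps an IP address to the MAC address last claimed for it.
arp_table: dict[str, str] = {}


def handle_arp_reply(ip: str, mac: str) -> None:
    """Record the claimed IP-to-MAC mapping without any verification."""
    arp_table[ip] = mac


# The legitimate owner of the IP answers first.
handle_arp_reply("192.0.2.10", "02:00:00:00:00:0a")

# An attacker on the same network then claims the same IP.
handle_arp_reply("192.0.2.10", "02:00:00:00:00:66")

# Frames for 192.0.2.10 would now be addressed to the attacker's hardware.
print(arp_table["192.0.2.10"])  # 02:00:00:00:00:66
```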
Layer-4: Transport Layer
TCP (stable connections)
The IP protocol only enables us to send single data packets between two remote machines. However, in reality, we often need more than that: we want a continuous stream of data, which means that the data needs to be chunked on one end and reassembled on the other. The TCP protocol is responsible for that.
When a TCP connection is established, a so-called handshake takes place. Here the two communicating devices agree on a few parameters. After creating the connection, both parties can send data packets with a sequence number. The sequence number helps with reassembling the data stream on the other side. TCP also enables recovering lost packets by sending back acknowledgments to the other side about the received packets. If a packet is not acknowledged in time, the sending party retransmits it, which lets the connection recover from lost packets.
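The role of sequence numbers and acknowledgments can be sketched as follows. This is a simplified model, not real TCP: it assumes sequence numbers are byte offsets starting at 0 (real connections start from a negotiated initial value), and the `reassemble` function is a name invented for this example.

```python
def reassemble(segments):
    """Rebuild the byte stream from (sequence_number, data) pairs.

    Segments may arrive out of order or more than once. Returns the
    contiguous data received so far and the next expected sequence
    number, which is what the receiver would acknowledge.
    """
    by_offset = {}
    for seq, data in segments:
        by_offset.setdefault(seq, data)  # ignore duplicate retransmissions

    stream = bytearray()
    next_seq = 0
    while next_seq in by_offset:
        chunk = by_offset[next_seq]
        stream += chunk
        next_seq += len(chunk)
    return bytes(stream), next_seq


# Out-of-order arrival with one duplicate: the stream is still rebuilt.
print(reassemble([(6, b"world"), (0, b"hello "), (0, b"hello ")]))
# (b'hello world', 11)

# A missing segment stops reassembly; the sender would retransmit offset 2.
print(reassemble([(0, b"he"), (4, b"o")]))
# (b'he', 2)
```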
It is also worth noting that TCP incorporates the concept of ports. Ports work similarly to how phone extensions work with phones: each service is listening on a well-known port number (e.g., 80 for HTTP), so hosting multiple services on the same machine is possible.
However, some applications do not require a continuous stream of data but still need certain features that TCP offers, e.g., ports for services. In this case, the User Datagram Protocol (UDP) comes in handy: it sends single packets, extended with the port meta-information, to a service. UDP is typically used for applications such as domain name resolution (translating domain names into IPs and back) and serves as the backbone of many VPN applications.
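A single datagram sent to a port can be tried out on the loopback interface with Python's standard `socket` module. This is a minimal sketch: port 0 asks the operating system for any free port, and the payload is arbitrary.

```python
import socket

# The receiver listens on a port chosen by the operating system.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # avoid waiting forever if the datagram is lost
port = receiver.getsockname()[1]

# The sender needs no handshake: it just addresses a datagram to IP and port.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", ("127.0.0.1", port))

# The receiver gets the payload together with the sender's IP and port.
data, (src_ip, src_port) = receiver.recvfrom(1024)

sender.close()
receiver.close()

print(data)    # b'ping'
print(src_ip)  # 127.0.0.1
```

Unlike TCP, nothing here acknowledges or retransmits the datagram; an application that needs reliability has to add it on top.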
Apart from TCP and UDP, there are, of course, other protocols that are used to manage the Internet. One crucial protocol is ICMP, which is known most commonly from the ping function for checking the machine’s availability. However, besides ping (echo request/response), it also has other essential network management functions.
Never block ICMP! Many sysadmins tend to block ICMP out of a misguided sense of security. ICMP is required for the Internet to function properly! If ICMP is blocked, the service may not be reachable if the end-user is sitting behind a VPN due to the loss of the fragmentation-needed ICMP packet. If you wish to block ping, only block types 0 and 8, but leave the other types untouched!
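The advice can be expressed as a tiny filter policy. This is a sketch of the decision logic only, not a firewall configuration; the type numbers are the standard ICMPv4 values, while `BLOCKED_TYPES` and `allow_icmp` are names invented for this example.

```python
# Standard ICMPv4 type numbers.
ECHO_REPLY = 0               # ping response
DESTINATION_UNREACHABLE = 3  # includes "fragmentation needed" (code 4)
ECHO_REQUEST = 8             # ping request
TIME_EXCEEDED = 11           # used by traceroute

# Block ping only; every other ICMP type stays allowed.
BLOCKED_TYPES = {ECHO_REPLY, ECHO_REQUEST}


def allow_icmp(icmp_type: int) -> bool:
    """Return True if a packet of this ICMP type should pass."""
    return icmp_type not in BLOCKED_TYPES


print(allow_icmp(ECHO_REQUEST))             # False
print(allow_icmp(DESTINATION_UNREACHABLE))  # True
```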
Layer-7: Application Layer
Several application-level protocols are used to exchange data. Out of these, we will only discuss the most important ones.
The Domain Name System is responsible for storing information about domain names such as example.com. These domains often have various services, such as a web service or a mail service.
The system works as follows: each internet provider offers a DNS resolver to its customers. When an end customer asks for the domain exoscale.com, these resolvers first query the root nameservers for information about the com TLD. Then the com TLD DNS servers are asked for details about exoscale.com. Finally, the DNS servers for exoscale.com are queried for information about the requested name.
Exoscale offers a service to operate the DNS servers for an individual domain.
If you wish to learn more about the DNS system, we recommend the excellent Wikipedia article.
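The order in which a resolver walks from the root down to the requested domain can be sketched as a small helper. This only lists the zones that are asked in turn; it performs no network queries, and it assumes a delegation at every label, which real domains do not always have. The function name `delegation_chain` is made up for this example.

```python
def delegation_chain(name: str) -> list[str]:
    """List the zones queried from the root down to the given name."""
    labels = name.rstrip(".").split(".")
    chain = ["."]  # the root zone is always asked first
    # Walk the labels from right to left: TLD first, then each subdomain.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(delegation_chain("exoscale.com"))
# ['.', 'com.', 'exoscale.com.']
```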
Many protocols on the Internet nowadays support encryption. This encryption is done using the TLS protocol or by its now-deprecated predecessor, SSL.
TLS provides not only encryption but also authentication: clients can verify that servers are who they claim to be. This is done using a network of Certificate Authorities (CAs) which browsers and other clients trust. These CAs issue cryptographic certificates that servers present to prove their identity and that are also used while establishing the secure connection. TLS connections can be used to wrap other protocols, such as HTTP.
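On the client side, these checks are typically switched on through a TLS context. The sketch below uses Python's standard `ssl` module and only inspects the default client settings; it does not open a connection. Raising the minimum version to TLS 1.2 is a choice made for this example, not something the text above prescribes.

```python
import ssl

# Default client context: trusts the system's CA certificates.
context = ssl.create_default_context()

# The server must present a certificate that chains to a trusted CA.
print(context.verify_mode == ssl.CERT_REQUIRED)  # True

# The certificate must also match the host name being contacted.
print(context.check_hostname)  # True

# Optionally refuse the deprecated SSL and early TLS versions.
context.minimum_version = ssl.TLSVersion.TLSv1_2
```

A socket wrapped with `context.wrap_socket(sock, server_hostname=...)` would then carry another protocol, such as HTTP, over the encrypted channel.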
HTTP (Hypertext Transfer Protocol) is the protocol that powers the modern-day World Wide Web. It was initially built to make it easier to transfer text documents with extra markup (hypertext documents) to client applications (browsers). Since HTTP/1.x is an entirely text-based protocol, it is easy to write program code for it. It became the standard for websites and is often also used in machine-to-machine communication.
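The text-based nature of HTTP/1.1 can be seen by building a request and reading a status line by hand. This is a minimal sketch with no networking; the helper names `build_get_request` and `parse_status_line` and the sample response are invented for illustration.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Split the first response line into version, status code, and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


print(build_get_request("example.com"))
# b'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n'

sample_response = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
print(parse_status_line(sample_response))
# ('HTTP/1.1', 200, 'OK')
```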