tdoylend.dev

2026-05-20

Introduction to Network Programming

Lately, someone brought up that they wanted to make a multiplayer game from scratch (no game engine) but felt writing the netcode might be too challenging. It took me about six years to become remotely (ha!) good at network programming, but hopefully I can summarize what I’ve learned here, and you can become better at it in less time.

I will use Python for the examples in this article, although most of my work nowadays is in C.

1. Basics

The purpose of the Internet is to be able to connect two programs running on different machines so they can communicate. On each side, the connection is represented by an object called a socket: one socket on machine A, the other on machine B. Data written into one socket can be read out of the other socket, and vice versa.

The two major kinds of socket are stream sockets and datagram sockets; we will only be talking about stream sockets in this article because they are simpler. In stream sockets, the “data” is a sequence of bytes. You can write bytes out to socket A, and they will come in in the same order – although not necessarily all at once – on socket B, and vice versa.

We can begin experimenting with sockets by exchanging some data with the example.com server.

1.1. Connecting to example.com

When you create a socket, it isn’t connected to anything yet. You must call the .connect(..) function to create the connection and link the socket. When using internet sockets, .connect(..) takes a 2-tuple of address and port number to connect to; think of this as a phone number identifying the machine and an extension identifying the program on it, respectively.

Here is some sample code showing how to connect to example.com. Note that while sending/receiving data from a file uses the .read(..) and .write(..) methods, the corresponding methods in sockets are .recv(..) and .send(..).

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('example.com', 80))
s.send(
    b'GET / HTTP/1.1\r\n'         \
    b'Host: example.com\r\n'      \
    b'User-Agent: curl/8.6.0\r\n' \
    b'Accept: */*\r\n\r\n'
)
print(s.recv(4096))
s.close()

This code does the following:

Create a new stream socket.
Connect to example.com on port 80. Unencrypted websites are usually served on port 80; the TLS-encrypted version of a website is usually served on port 443.
Send an HTTP request; this is a short message to the server requesting a particular webpage.
Read back up to 4096 bytes of the response using .recv(..) and print them (as a bytes object).
Close the connection.

The machine you are connected to is called the peer. In this case, the peer is example.com. Data you pass to .send(..) will appear when the peer calls .recv(..) on their side of the connection, and vice versa.

1.2. Testing with socket pairs

Sockets do not have to be between two different machines; you can create a connection between two programs on the same machine, or even within the same program. This is very useful for testing. The socketpair(..) function allows you to create two pre-connected sockets belonging to the same program:

import socket
# The defaults give a connected pair of stream sockets on every platform.
a, b = socket.socketpair()
a.send(b'Hello world!')
print(b.recv(4096))
b.send(b'Ahoy there!')
print(a.recv(4096))

In this case we send the text “Hello world!” into one end of the pipe and receive it on the other; then we send “Ahoy there!” back.

Sockets queue un-recv’d data. If we send multiple pieces of data, they will pile up and will be returned all at once to the next .recv(..) call:

a.send(b'Explicit is better than implicit.')
a.send(b'Sparse is better than dense.')
print(b.recv(4096))

Data can also flow in both directions at once:

a, b = socket.socketpair()
a.send(b'This is a gift from me to you.')
b.send(b'This is a gift from *me* to *you*!')
print(a.recv(4096))
print(b.recv(4096))

The value 4096 which we pass into .recv(..) decides the maximum amount of data we are willing to read from the socket. If the data waiting in the queue is longer than that, it will be cut off and the next part will be returned on the next call to .recv(..):

a.send(b'Life goes on!')
print(b.recv(4))
print(b.recv(4))
print(b.recv(4))
print(b.recv(4))

If there are fewer bytes available than you requested, you’ll get whatever’s available; that is, .recv(4096) isn’t guaranteed to return exactly 4096 bytes, just up to 4096 bytes.
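Because .recv(..) may return fewer bytes than requested, code that needs an exact number of bytes has to call it in a loop. Here is a minimal sketch of that idea, assuming a blocking socket; recv_exactly is a hypothetical helper name, not something the socket module provides.

```python
import socket

def recv_exactly(sock, count):
    # Keep calling .recv(..) until exactly `count` bytes have arrived.
    # Assumes a blocking socket. Raises ConnectionError if the peer
    # closes the connection before enough data has been received.
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if chunk == b'':
            raise ConnectionError('peer closed the connection early')
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

a, b = socket.socketpair()
a.send(b'Life ')
a.send(b'goes on!')
message = recv_exactly(b, 13)   # 5 + 8 bytes were sent in total
print(message)
a.close()
b.close()
```

The loop does not care whether the 13 bytes arrive in one piece or several; it simply keeps asking for whatever is still missing.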

1.3. Message Passing

A continuous stream of bytes is hard to work with. It is easier to think of the data coming from the socket as a sequence of messages, which can be parsed similarly to, e.g., user input returned from input(..). However, as we saw above, it’s possible for .recv(..) to return the content of multiple messages at once; or, it could return only part of a message (e.g. if the message is longer than 4KB and you call .recv(4096)). So we need a way of determining where one message ends and the next starts.

There are three ways to do this: (1) give all messages a fixed length, (2) give messages a prefix indicating how long they are, or (3) separate messages with a particular character (called a delimiter). Although (2) is often regarded as the best solution, (3) is simplest to use so we will start with that.

For this article, we will place the following constraints on messages:

They are always ASCII text.
They are separated by newlines.

This has several advantages. For one, because we are not using binary, we can print out each message on the console as it comes in, which is very useful for debugging.
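To make the delimiter approach concrete, here is a small sketch of splitting a receive buffer on newlines. The function name split_messages and the sample messages are made up for illustration; the key idea is that the incomplete tail of the buffer has to be kept until more data arrives.

```python
def split_messages(buffer):
    # Split a receive buffer into complete newline-terminated messages.
    # Returns (messages, leftover); leftover is the incomplete tail that
    # must be kept until more data arrives.
    *complete, leftover = buffer.split(b'\n')
    messages = [chunk.decode('ascii', 'replace') for chunk in complete]
    return messages, leftover

buffer = b''
received = []
# Simulate data arriving in arbitrary pieces, as .recv(..) may deliver it.
for piece in [b'HEL', b'LO\nMOVE 1', b' 2\nPI', b'NG\n']:
    messages, buffer = split_messages(buffer + piece)
    received.extend(messages)
print(received)
```

Note that the message boundaries do not line up with the pieces at all, yet each message still comes out whole.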

As we are writing games, the client and server both have loops that run once per frame. In single-player games, the loop is usually structured something like the following:

Read input.
Update world state.
Render output.

The exact same loop will work for both client and server. However, in addition to processing keyboard input on the client, we also need to process input from the socket; and vice versa for output. The server does not collect input from game controllers and does not render to the screen; its sole job is to collect all messages from all connected players during the input step, and then write out relevant messages during the output step.

In general, servers send more data than they receive: each client only sends its own updates, but servers have to update all clients on the position/velocity/etc of all other clients.

By default, if there is no data waiting in the queue, .recv(..) will block until at least one byte becomes available. For games trying to hit a 60FPS target, this is usually not what you want. To avert this, call .setblocking(0) on the socket, which puts it into non-blocking mode. In this mode, .recv(..) and .send(..) will raise BlockingIOError rather than blocking; you should intercept this with a try/except and treat it as “no data was received”.
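A short sketch of what non-blocking reads look like in practice, using a socket pair so it runs on one machine. The helper name try_recv is hypothetical, and returning None for “nothing yet” is just one possible convention.

```python
import socket

def try_recv(sock, max_bytes=4096):
    # Return whatever bytes are waiting on a non-blocking socket, or
    # None if nothing has arrived yet.
    try:
        return sock.recv(max_bytes)
    except BlockingIOError:
        return None

a, b = socket.socketpair()
b.setblocking(False)

first = try_recv(b)      # nothing has been sent yet
a.send(b'PING\n')
second = try_recv(b)     # the data sent above is now waiting
print(first, second)
a.close()
b.close()
```

Using None rather than b'' for “no data” keeps that case distinct from an empty read, which has a different meaning on sockets.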

1.4. Gotchas and glitches

There are a lot of these. The Internet can be notoriously unreliable.

Calling .recv(..) is not guaranteed to get you an entire message. This is true even if the message is shorter than 4096 bytes; messages might get sent piecewise rather than all at once, especially if you or the peer are on a slow connection.
Correspondingly, calling .send(..) is not guaranteed to actually send all the data you asked it to. It returns the number of bytes that actually went out; if this is less than what you asked to send, you need to queue the remaining bytes and try again later.
In non-blocking mode, both .send(..) and .recv(..) can raise BlockingIOError; treat this as “0 bytes sent” for .send(..), and “0 bytes available” for .recv(..).
In general, any call to a socket method can raise IOError (an alias of OSError in Python 3). If it is important to you that your program keep running even if one of its sockets has an error (as a server should), you must intercept this with a try/except. Note that BlockingIOError is a subclass of IOError, so its except clause must come first.
.recv(..) does not return an empty bytes object when there is no incoming data (see above). However, it does return an empty bytes object (b'') when the peer closes the connection; you should test for this case and close your side of the connection if it happens.
If the peer crashes/loses power/is doing something malicious, it may not close the socket correctly, in which case you will be left waiting indefinitely without receiving any data. To prevent this, use a timeout: if no data is received for N seconds, assume the other side is dead and close the connection. If you do this, you must make sure to send “keepalive” messages every so often so that, e.g., players are not kicked if they are AFK.
By default, Nagle’s algorithm is enabled, which delays sending data in case it can bundle multiple small messages together into one larger one to improve efficiency. This increases the average latency, which is bad for games; you should generally disable it using .setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1).
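The partial-send gotcha can be sketched as follows. This is an illustration under assumptions, not a definitive pattern: send_some is a hypothetical helper, and the one-megabyte payload is simply chosen to be larger than a typical OS socket buffer so that a single .send(..) is unlikely to take all of it.

```python
import socket

def send_some(sock, outbox):
    # Try to send the queued bytes; return whatever is still unsent.
    try:
        sent = sock.send(outbox)
    except BlockingIOError:
        sent = 0                 # treat as "0 bytes sent"
    return outbox[sent:]

a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

payload = b'x' * 1_000_000       # probably bigger than the socket buffer
outbox = payload
received = bytearray()
while len(received) < len(payload):
    if outbox:
        outbox = send_some(a, outbox)
    try:
        received += b.recv(65536)
    except BlockingIOError:
        pass                     # treat as "0 bytes available"
a.close()
b.close()
print(len(received))
```

The sender never assumes the whole outbox went out; it keeps the unsent remainder and retries on the next pass, which is the same approach a per-frame game loop would take.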

1.5. A basic connection object

It’s usually helpful to encapsulate raw socket objects in a class which deals with message parsing and error handling. Here is the code I use as a starting point:

from time import time
from datetime import datetime
from sys import stderr

INACTIVITY_TIMEOUT = 600 # The maximum time to allow to pass without
    # receiving a message. Assuming the other side has the same rule,
    # we send a blank message twice as often to make sure the
    # connection stays alive.
MAX_MESSAGE_LENGTH = 4096 # The maximum message to allow sending.
RECV_MAX = 65536 # The max. amount of data to try receiving each frame.

def log(message):
    # A simple logging function. You should probably replace this with
    # something better (e.g., logging to the database on server-side,
    # or the in-game console client-side).
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]
    print(f'[{timestamp}] {message}', file=stderr)

class Connection:
    id_counter = 1

    def __init__(self, sock):
        self.sock = sock
        self.sock.setblocking(0)
        self.inbox = b''
        self.outbox = b''
        self.last_sent = time()
        self.last_received = time()
        self.id = self.__class__.id_counter
        self.__class__.id_counter += 1
        log(f'New connection #{self.id} from '
                f'{self.sock.getpeername()[0]}')

    def close(self):
        if self.sock:
            try:
                self.sock.close()
            except IOError:
                pass
        self.sock = None

    def receive_messages(self):
        # Try to absorb any incoming data and return a list of all
        # available messages since the last time this was called.
        # Always returns a list (possibly empty), so callers can
        # iterate over the result unconditionally.

        # This is a no-op if the socket has already been closed.
        if not self.sock: return []

        try:
            data = self.sock.recv(RECV_MAX)
            if data == b'':
                # Other side closed connection.
                log(f'Connection #{self.id} closed by peer '
                        '(empty read).')
                self.close()
                return []
        except BlockingIOError:
            data = b''
        except IOError as e:
            log(f'Connection #{self.id} had an unexpected I/O error: '
                    f'{str(e)}')
            self.close()
            return []

        if data:
            self.last_received = time()
        elif (time() - self.last_received) >= INACTIVITY_TIMEOUT:
            log(f'Connection #{self.id} timed out '
                    f'({INACTIVITY_TIMEOUT}+ seconds of inactivity).')
            self.close()
            return []

        unprocessed = self.inbox + data
        chunks = unprocessed.split(b'\n')
        for i, chunk in enumerate(chunks):
            if len(chunk) > MAX_MESSAGE_LENGTH:
                log(f'Connection #{self.id} sent a message over the '
                        'size limit; closing it.')
                self.close()
                # Continue processing all messages up to the offending
                # one.
                chunks = chunks[:i] + [b'']
                break

        self.inbox = chunks[-1]
        res = []
        for chunk in chunks[:-1]:
            # Some OSes/protocols use '\r\n' as the end-of-line
            # delimiter, instead of just '\n'; we can be flexible about
            # this.
            chunk = chunk.removesuffix(b'\r')
            chunk = chunk.decode('ascii', 'replace')
            # Ignore zero-length messages, which we use to keep the
            # connection alive rather than to send actual data.
            if chunk != '':
                res.append(chunk)
        
        return res

    def enqueue(self, message):
        # Add a message to the send queue. It will not actually get
        # sent until the next call to send_queued_messages(..).

        assert isinstance(message, str)
        assert message != ''
        assert '\n' not in message
        assert message[-1] != '\r'
        # 'strict' raises UnicodeEncodeError on non-ASCII input.
        self.outbox += message.encode('ascii', 'strict') + b'\r\n'

    def send_queued_messages(self):
        # Attempt to send as many of the messages in queue as possible.

        if not self.sock: return

        # If nothing has gone out for a while, queue a blank keepalive
        # message so the peer's inactivity timeout doesn't fire. This
        # check has to happen before the empty-outbox early return,
        # or an idle connection would never send a keepalive.
        if (not self.outbox
                and (time() - self.last_sent) > (INACTIVITY_TIMEOUT / 2)):
            self.outbox += b'\r\n'

        if not self.outbox: return

        try:
            bytes_sent = self.sock.send(self.outbox)
            if bytes_sent == 0:
                # Other side closed connection.
                log(f'Connection #{self.id} closed by peer '
                        '(failed write).')
                self.close()
                return
        except BlockingIOError:
            bytes_sent = 0
        except IOError as e:
            log(f'Connection #{self.id} had an unexpected I/O error: '
                    f'{str(e)}')
            self.close()
            return

        if bytes_sent:
            self.outbox = self.outbox[bytes_sent:]
            self.last_sent = time()

    @property
    def alive(self):
        return self.sock is not None

Feel free to use this in your own program as well. In the client, you generally have one Connection object which represents the server; the server will have a list of Connection objects which it iterates through every frame.

1.6. Listener sockets

Up until now, we’ve been creating sockets using socketpair(..). This works when both sockets are owned by the same program, but now we need to actually support separate programs on separate machines (the client and server).

Whenever a new client connects, it needs to create a new socket on the server. A special “listener” socket is responsible for receiving requests to connect from prospective clients, and minting new sockets for them. You create “listener” sockets using the .bind(..) method, which takes a 2-tuple address/port, similar to .connect(..). The address/port you pass to .bind(..) is what your clients will pass to .connect(..) to reach that socket.

The rules for Internet addresses are moderately complicated; we will discuss them in full below. For now, use address “127.0.0.1” and pick a random port between 4096 and 32767.

import socket
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(('127.0.0.1', 5001))
listener.listen(10)
listener.setblocking(0)

The .bind(..) and .listen(..) methods must both be called, in that order; if you omit either, the listener will not work correctly. The parameter to .listen(..) controls the backlog, which is the number of connection requests that are allowed to pile up before new connections are turned away.

The .accept() method accepts a new connection. It returns a (sock, addr) 2-tuple: sock is the newly-minted socket connected to the client, and addr is the client’s address (itself a host/port 2-tuple).

Like .recv(..), .accept() will block until a new connection arrives unless you have called .setblocking(0) on the listener. Also like .recv(..), .accept() can raise spurious IOErrors; make sure to handle these appropriately.

The .setsockopt(..) call above is important on Linux. By default, after a listener socket stops using a particular port, that port stays “locked” for a short timeout (somewhere between 1 and 4 minutes on most systems). This is extremely unhelpful when debugging the server, since you’ll be crashing and restarting it often. Setting SO_REUSEADDR lets you bind to the port again without waiting out the timeout.
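To see a listener hand out a new socket, here is a sketch that plays both client and server inside one program. One assumption to flag: it binds to port 0 so the operating system picks a free port (handy for a throwaway demo), whereas a real server would bind to a fixed port that its clients know in advance. Everything is left in blocking mode to keep the example short.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(('127.0.0.1', 0))      # port 0: let the OS pick a free port
listener.listen(10)
port = listener.getsockname()[1]     # find out which port we were given

# The "client" side: an ordinary socket that connects to the listener.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('127.0.0.1', port))

# The "server" side: accept() mints a new socket for this client.
server_side, client_addr = listener.accept()
client.send(b'HELLO\n')
greeting = server_side.recv(4096)
print(greeting, client_addr)

for s in (client, server_side, listener):
    s.close()
```

Note that the listener itself never carries game data; it only produces server_side, which is the socket you would wrap in a Connection.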

1.7. Connection pool and event loop

On the server side, you need to maintain a pool of active connections, adding new connections as they are accepted from your listener socket, and closing old ones. In general, a first stab at your server-side event loop might look something like this:

# Assume "listener" has been created and bound previously
connections = []
while True:
    # Accept new connections
    while True:
        try:
            new_sock, new_addr = listener.accept()
            connections.append(Connection(new_sock))
            print('Accepted from', new_addr)
        except BlockingIOError as e:
            break
        except IOError as e:
            print('Error on .accept(): ', str(e))
            break

    messages = []

    # Receive all incoming data
    for conn in connections:
        for message in conn.receive_messages():
            messages.append((conn.id, message))

    # Update world state ("world" stands for your game-state object,
    # which is assumed to be defined elsewhere)
    world.update(messages)

    # Send out any generated messages
    for conn in connections:
        conn.send_queued_messages()

    # Delete all inactive connections from the list
    i = 0
    while i < len(connections):
        conn = connections[i]
        if conn.alive:
            i += 1
        elif i < (len(connections) - 1):
            connections[i] = connections.pop()
        else:
            connections.pop()

1.8. Improving performance with DefaultSelector

If, at any given time, most of your connections are idling (maybe your game is turn-based), you will waste a lot of time calling .receive_messages(..) and .send_queued_messages(..) on all your Connections even though most of them don’t have any data to receive or send. To avoid this, Python’s selectors module provides a class called DefaultSelector, which is an abstraction over whichever readiness-polling mechanism is most efficient on the current platform (for example epoll on Linux, kqueue on macOS and the BSDs, and select on Windows). We will not do this optimization now, as it is somewhat involved, but it’s worth researching if your server performance is being constrained by polling hundreds or thousands of inactive sockets each frame.
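For a taste of what that research leads to, here is a minimal sketch of DefaultSelector watching a single socket. It is only an illustration: the label 'connection-b' is an arbitrary value attached at registration, and a real server would register every client socket and probably attach its Connection object instead.

```python
import selectors
import socket

selector = selectors.DefaultSelector()
a, b = socket.socketpair()
b.setblocking(False)
# The `data` argument is arbitrary; here it is just a label.
selector.register(b, selectors.EVENT_READ, data='connection-b')

idle = selector.select(timeout=0)     # nothing sent yet: no sockets ready
a.send(b'PING\n')
ready = selector.select(timeout=1)    # b now has data waiting

ready_labels = [key.data for key, events in ready]
for key, events in ready:
    print(key.data, key.fileobj.recv(4096))

selector.unregister(b)
selector.close()
a.close()
b.close()
```

The point is that .select(..) reports only the sockets that have something to read, so the per-frame work scales with the number of active connections rather than the total.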

With the initial details out of the way, we can discuss the details of IP addresses and the structure of the Internet.

2. Internet Addresses and Hosting

Data is not sent continuously over the Internet. It is divided into chunks called “packets” which are passed from computer to computer. Packets are like postcards: they carry a destination address, a return address, and a small amount of data. They are not guaranteed to arrive, and even if they do they may show up out-of-order, so when you use stream sockets your operating system handles the task of reassembling packets into their correct order and re-requesting ones that were lost in transit. This happens invisibly to you.

Incidentally, this is the difference between stream and datagram sockets: datagram sockets more or less let you construct the packets directly, and you are responsible for dealing with it when they get lost or swapped (or even duplicated; the same packet can arrive twice!).

2.1. IP addresses

Machines on the internet are identified by IP address, and each computer has 65536 virtual “ports”. Packets are addressed to an IP and port, and they also have a “reply address” where responses go.

There are two kinds of IP addresses, IPv4 and IPv6, but we will stick with IPv4 for this article because it is simpler. IPv4 addresses are made of four numbers called octets separated by periods, e.g. 127.0.0.1. Each octet ranges from 0 to 255.

2.2. Port numbers

Not much to ports; they go from 1 to 65535. If you ask for something to happen on port 0, the operating system will pick a port for you. On Linux, ports below 1024 are restricted; you either need to be root or have special permission to bind to them (although you are allowed to connect to low-numbered ports). You should also generally not use ports above 32767, as the operating system likes to hand these out as temporary (“ephemeral”) ports for the local end of outgoing connections.

2.3. Networks

A “network” is a group of computers linked together so they can all reach one another. Within a network, all computers have different IP addresses. Usually they are assigned within a particular range: for example, the computers on my home network are 10.0.0.1, 10.0.0.2, and so on. Small networks, like your home network, business Wifi, or the network at a coffee shop, are generally called LANs. You can identify IP addresses within LANs because they usually follow one of the following three formats:

192.168.x.x
10.x.x.x
172.16-31.x.x

The computer I’m typing this on has an address of 10.0.0.10.

2.4. The WAN and localhost

The Internet is one particular very big network; in some contexts it is referred to as the WAN (although this can mean other things depending on the situation; there are other networks besides the Internet that are sometimes also called WANs). Your computer isn’t directly part of the Internet, but it is linked to it via your router.

A computer can be on multiple networks at once, in which case it will have multiple IP addresses; one for each network. Your router is on two networks: the LAN (where it talks to your computer), and the WAN (the Internet). Your computer is also on two networks: the LAN, and a special network called the “localhost network”, which is a special internal network of which your computer is the only member. On this network, the computer has an address of 127.0.0.1. So if you ever see the address 127.0.0.1, it means you are connecting to the machine you are currently on.

In fact, any address 127.x.x.x will route to the same machine on the localhost network.
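The address patterns from the last two sections can be checked in code with Python’s standard ipaddress module. The sketch below is illustrative: the classify function and its three labels are made up for this example, and 'other' lumps together public Internet addresses with anything else that is neither LAN nor localhost.

```python
import ipaddress

LAN_RANGES = [
    ipaddress.ip_network('192.168.0.0/16'),   # 192.168.x.x
    ipaddress.ip_network('10.0.0.0/8'),       # 10.x.x.x
    ipaddress.ip_network('172.16.0.0/12'),    # 172.16-31.x.x
]

def classify(address):
    # Label an IPv4 address string as 'localhost', 'LAN', or 'other'.
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return 'localhost'
    if any(ip in network for network in LAN_RANGES):
        return 'LAN'
    return 'other'

for address in ['127.0.0.1', '127.5.6.7', '10.0.0.10',
                '172.31.255.1', '172.32.0.1', '8.8.8.8']:
    print(address, classify(address))
```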

2.5. What do routers do?

Computers on the same network can all talk to one another directly: you can send packets directly to another machine on your home network, and it will receive them (it probably won’t respond though). The Internet is a giant network, and you are connected to the Internet – so you might think: can I send a packet to anyone else’s machine?

The answer is no, you can’t. Your computer has an IP address on your LAN, but it doesn’t actually have an IP address on the Internet. So how is your computer “connected” to the Internet?

The answer is: packets can jump between networks, if there is one machine that joins the networks together. This is called forwarding, and your router is the one that does it. When you ask to connect to an IP address that isn’t on your LAN, your computer sends the packet to your router and asks it to forward the packet onto the WAN for you. The router will oblige, but it will rewrite the return address on the packet to point to itself (since computers outside the LAN do not know about addresses on the LAN). In other words, when the destination server receives the packet, it will have your router’s WAN IP address as the return address; and so the response will get back to your router, which will in turn forward it to you. Congratulations – you are connecting to the Internet!

However, the router only allows response packets when you have initiated the conversation by sending an outgoing one. This means people on the outside cannot talk to your machine unless you have connected to them first. Have you ever heard people say that home Wifi is safer than coffee shop Wifi? This is why: anyone can log into the coffee shop Wifi and send packets to your machine, but on your home network only other machines in your house – presumably trusted – can do that.
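The forwarding behaviour described above can be modelled with a toy translation table. This is a deliberately simplified sketch, not how any real router is implemented: ToyRouter, its starting port number, and all the addresses (taken from ranges reserved for documentation) are invented for illustration, and a real router keeps one mapping per connection rather than one per packet.

```python
class ToyRouter:
    # A toy model of a home router: rewrites return addresses on the
    # way out, and only lets packets back in if they answer an earlier
    # outgoing packet.

    def __init__(self, wan_ip):
        self.wan_ip = wan_ip
        self.next_port = 40000
        self.table = {}   # WAN port -> (LAN address, remote address)

    def outgoing(self, lan_src, remote):
        # Forward a packet from the LAN. Returns the (source,
        # destination) pair that the outside world sees.
        wan_port = self.next_port
        self.next_port += 1
        self.table[wan_port] = (lan_src, remote)
        return (self.wan_ip, wan_port), remote

    def incoming(self, remote, wan_dst):
        # Returns the LAN address to forward to, or None to drop.
        entry = self.table.get(wan_dst[1])
        if entry is None or entry[1] != remote:
            return None   # unsolicited packet: drop it
        return entry[0]

router = ToyRouter('203.0.113.7')
laptop = ('10.0.0.10', 51000)
server = ('198.51.100.20', 80)

seen_src, seen_dst = router.outgoing(laptop, server)
reply_target = router.incoming(server, seen_src)
stranger_target = router.incoming(('192.0.2.99', 1234), seen_src)
print(seen_src, reply_target, stranger_target)
```

The server only ever sees the router’s address, the reply finds its way back to the laptop, and the stranger’s packet is dropped because nobody on the LAN contacted them first.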

One effect of this system is that many machines may appear to have a single IP address. For example, college dorms often share a router; meaning that if you ban by IP, you may inadvertently ban several innocent people besides the bad actor.

This system is called NAT (network address translation), and while it is good for security, it is something of an annoyance for game developers. In particular, it makes it tricky to design games that let users host their own servers (unless they are hosting for players on the same LAN with them).

2.6. Domains

People do not like storing long sequences of numbers in their head; IP addresses are long strings of numbers; this is a problem. To solve it, the DNS system was invented: DNS, like a phone book, translates names such as “example.com” into IP addresses (such as 104.20.23.154). These names are called domains. Maintaining the DNS system costs money, and so domains cost money: you buy them from a registrar (it depends on the exact domain, but most of them cost between $10 and $60 a year).
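You can perform the name-to-address lookup yourself from Python; .connect(..) does the same thing behind the scenes when you hand it a name such as 'example.com'. The sketch below resolves 'localhost', which is normally answered from the machine’s own configuration without touching the network. The helper name resolve_ipv4 is hypothetical, and if you substitute a real domain the address you get back can vary by time and place.

```python
import socket

def resolve_ipv4(hostname):
    # Ask the system resolver for one IPv4 address for this hostname.
    return socket.gethostbyname(hostname)

local_address = resolve_ipv4('localhost')
print(local_address)
```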

DNS domains are hierarchical: “thing.example.com” is a subdomain of “example.com”, and the person who owns “example.com” can also control what “thing.example.com” points to. You can use subdomains for any legal purpose, but some people resell them: for example, Wordpress blogs, by default, are created as myblog.wordpress.com or similar. Wordpress loans you the use of one of its subdomains. Just because something has an official-looking top domain (like “google.com”) doesn’t mean it’s legit! Someone might have rented the right to use the subdomain and be up to nefarious purposes. This happens often with scams.

You may have noticed that most sites end in “.com”, “.net”, “.org”, or something similar. If “thing.example.com” is a subdomain of “example.com”, is “example.com” a subdomain of “com”? The answer is Yes, it is! “com”, “net”, and the rest are called Top-Level Domains, or TLDs; they are owned by various groups, who delegate the actual job of managing them to registrars (the people you buy a domain name from).

When you buy a domain, you will get the opportunity to create “records”: the most important kind are A records, which tell what IP address the domain belongs to. This needs to be a WAN address, not a LAN one (because outside computers cannot connect to your LAN!) and it cannot be your router’s WAN address (because your router does not accept incoming connections from unknown machines).

So: if you are making a multiplayer game, you need to rent a server.

2.7. Acquiring server space

There are multiple companies who will rent you a server in a datacenter. Usually the server comes with a WAN IP and is a Linux machine that you can do what you like with (provided you violate no laws). There are dozens of such companies out there, but the two I’ve worked with are DigitalOcean and CloudFanatic; DigitalOcean is slightly more expensive but slightly more reliable. Both will give you the use of a serviceable machine for $4/month.

Since the machine isn’t physically in your house, you can’t plug a screen and keyboard into it; you’ll need to use SSH and control it over the command line. SSH and the Linux command line are unfortunately out of scope for this article; we may come back to them in a future part two. For now, if you’re used to working with a Linux machine, you’ll be reasonably conversant over SSH.

Infrastructure intermezzo over: let’s return to how to design a communications protocol.

3. Message Design

Now that you have the ability to send and receive messages, you must decide how to carry data in them. The simplest way is to have each message contain a command and some parameters separated by spaces; for example, the server might update an entity’s position in the client:

MOVE entity01 230 329

Perhaps this moves entity01 to (230, 329). Of course, you would need to parse and validate the string on each side. You could also use more structured schemes, like sending JSON objects in each message:

{"cmd": "move", "id": "entity01", "pos": [230, 329]}
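As a sketch of the “parse and validate” step, here is one way to turn the space-separated form into the same structure as the JSON form. The function name parse_move and the choice to return None for malformed input are assumptions made for this example; note also that json.loads only checks that the text is valid JSON, so the fields would still need validating.

```python
import json

def parse_move(message):
    # Parse 'MOVE <id> <x> <y>'; return None if the message is malformed.
    parts = message.split(' ')
    if len(parts) != 4 or parts[0] != 'MOVE':
        return None
    try:
        x, y = int(parts[2]), int(parts[3])
    except ValueError:
        return None
    return {'cmd': 'move', 'id': parts[1], 'pos': [x, y]}

from_text = parse_move('MOVE entity01 230 329')
from_json = json.loads('{"cmd": "move", "id": "entity01", "pos": [230, 329]}')
bad = parse_move('MOVE entity01 230')       # missing a coordinate
print(from_text == from_json, bad)
```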

If you decide to drop the newline-delimited message system and switch to a length-prefixed one, you can transmit binary data. While this is less convenient to debug, you can then use efficient binary formats like MessagePack to send/receive data.
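A length-prefixed scheme might look roughly like the sketch below. The 4-byte big-endian header is an assumption chosen for illustration (any fixed-size length field works), and frame / unframe are hypothetical names. A production version should also reject absurdly large lengths, in the same spirit as MAX_MESSAGE_LENGTH.

```python
import struct

HEADER = struct.Struct('!I')   # 4-byte big-endian unsigned length

def frame(payload):
    # Prefix a payload with its length so the receiver knows where it ends.
    return HEADER.pack(len(payload)) + payload

def unframe(buffer):
    # Extract every complete message from the buffer.
    # Returns (messages, leftover); leftover is the incomplete tail.
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break              # the rest of this message hasn't arrived
        messages.append(buffer[offset + HEADER.size:end])
        offset = end
    return messages, buffer[offset:]

# Two complete messages followed by the first 6 bytes of a third.
stream = (frame(b'hello\nworld') + frame(b'\x00\x01\x02')
          + frame(b'partial')[:6])
messages, leftover = unframe(stream)
print(messages, leftover)
```

Because the receiver relies on the length rather than on a delimiter, payloads are free to contain newlines or arbitrary binary bytes.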

Keep in mind that you may have many messages in flight at any given moment. In general, you should try not to send the same message twice in the same frame; for example, if an enemy takes damage from two players in the same frame, and you display an animation for this in the client (maybe it flashes red, or grunts), wait until the update loop is completed and then send the damage message once, rather than sending damage messages for every player that hits it on that frame. Likewise, when receiving messages, if you get an older and a newer message where the newer supersedes the older, you may be able to discard the older message; for example, if a client submits two “move” messages, you can usually just use the newest one.
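Discarding superseded messages on the receiving side can be sketched like this. The message format ('MOVE x y' carrying an absolute position) and the function name are assumptions for the example; if your move messages carry relative steps rather than absolute positions, dropping the older ones would lose information.

```python
def drop_superseded_moves(messages):
    # messages: list of (conn_id, text) pairs in arrival order.
    # Keeps only the newest MOVE from each connection; everything else
    # passes through in its original order.
    last_move = {}
    for index, (conn_id, text) in enumerate(messages):
        if text.startswith('MOVE '):
            last_move[conn_id] = index
    return [
        (conn_id, text)
        for index, (conn_id, text) in enumerate(messages)
        if not text.startswith('MOVE ') or last_move[conn_id] == index
    ]

frame_messages = [
    (1, 'MOVE 10 10'),
    (2, 'MOVE 5 5'),
    (1, 'CHAT hello'),
    (1, 'MOVE 12 10'),
]
kept = drop_superseded_moves(frame_messages)
print(kept)
```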

Unfortunately, because packets do not travel instantly, there will always be a small amount of inconsistency between the world model in the server and the world model in each client. Modern multiplayer clients are usually designed to be “optimistic”; if the player presses W to move forward, it would be maddening to wait for the server to accept the command before starting to move the player. So the client will make a best-guess simulation and wait for the server to confirm it. If you’ve ever been playing a multiplayer game and been suddenly “snapped” back to a location you were at a few seconds ago, that is this system breaking down: the client moved you, and the server received the movement packets later and rejected them, ordering the client to teleport you back to your previous position and try again.

Remember too that messages arrive in the order they are sent. If you send a very large message (for example, the entire contents of the map), all other messages will be blocked waiting for the big one to finish arriving. It’s better to keep messages small so the queue keeps moving.

3.1. Useful tools

There is an extremely helpful program called Netcat, which lets you connect to a socket and send and receive data directly to/from the terminal; each line you type is sent to the socket, and data returned from the socket is pasted directly to the terminal. Since our message protocol uses newlines as separators, each new message will show up as a separate line (see why we picked this method?).

Netcat is installed as nc by default on most Linux systems. On Windows, Nmap ships a version of Netcat called Ncat, or you can use PuTTY in Raw mode. If you choose the latter, make sure to force-enable local line editing in Terminal configuration; otherwise your messages will be sent character-by-character when you’re typing them rather than when you press Enter (and Backspace will consequently not work correctly).

3.2. Message flow graphs and pathological states

You should be able to draw a graph of which messages imply which other messages. If a player right-clicks on a chest, we presumably send an ACTION_OPEN_CHEST message or something like that. The server needs to send back the chest’s contents, maybe in an OPEN_CHEST_INVENTORY message. So the ACTION_OPEN_CHEST message has an arrow on the graph pointing to OPEN_CHEST_INVENTORY.

The correspondence is not one-to-one: some messages imply many responses (e.g. moving to a new zone on the map might send lots of messages), and some messages can happen in lots of situations (teleporting, opening doors, or logging in/out might trigger loading new locations, for example). However, your graph should have no loops in it: if message A triggers response B, and B in turn triggers A, the server and client will flood the connection with useless back-and-forth.
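If you write the message graph down as data, checking it for loops can be automated. The sketch below uses a standard depth-first search; the dictionary layout (message name mapped to the list of messages it can trigger) and the function name find_cycle are assumptions made for this example.

```python
def find_cycle(graph):
    # graph maps each message type to the message types it can trigger.
    # Returns one loop as a list of names, or None if there is no loop.
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for nxt in graph.get(node, []):
            if state.get(nxt, UNVISITED) == IN_PROGRESS:
                # nxt is already on the current path: we found a loop.
                return path[path.index(nxt):] + [nxt]
            if state.get(nxt, UNVISITED) == UNVISITED:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state.get(node, UNVISITED) == UNVISITED:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

ok_graph = {
    'ACTION_OPEN_CHEST': ['OPEN_CHEST_INVENTORY'],
    'OPEN_CHEST_INVENTORY': [],
}
bad_graph = {'A': ['B'], 'B': ['A']}
print(find_cycle(ok_graph), find_cycle(bad_graph))
```

Running a check like this in your test suite means a newly added message that accidentally closes a loop gets caught before it floods a live connection.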

Try to avoid situations where two messages need to both get sent in order for things to work correctly. It’s unavoidable sometimes, but they are a source of bookkeeping bugs and complexity. For example, if your authentication protocol requires a separate USERNAME and PASSWORD message, there are at least four possible states for the connection:

Neither has been sent
USERNAME was sent but not PASSWORD
PASSWORD was sent but not USERNAME
Both were sent

You can save a lot of headaches by reducing the number of possible states you have to think about. Why not combine the USERNAME and PASSWORD messages into a single AUTH <username> <password> message?
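With a single AUTH message, the connection is either authenticated or it isn't. Here is a minimal sketch of parsing such a message, assuming the newline-delimited text protocol and assuming usernames contain no spaces (everything after the username is treated as the password). The parse_auth name is hypothetical.

```python
def parse_auth(line):
    """Parse 'AUTH <username> <password>'; return (username, password) or None.

    Assumption for this sketch: the username contains no spaces, and
    everything after it is the password.
    """
    parts = line.split(" ", 2)
    if len(parts) != 3 or parts[0] != "AUTH":
        return None
    _, username, password = parts
    if not username or not password:
        return None
    return username, password
```

Anything that doesn't match the expected shape returns None, so the caller only has to handle two cases.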

3.3. Smoothing out cascading errors

When a player walks into a zone, the server will usually send over an initial copy of the zone map – the terrain, and the positions and attributes of all entities in the zone. Over time, those things will update, and you will need to send server-to-client messages to keep the client’s world model in sync with the server’s. If your code contains a bug, you may fail to notify the client on some world state changes, which will cause the client’s world model to drift out of sync.

For example, suppose the current zone contains a chest. If you send over the chest’s contents initially, but forget to write code that updates all other players when one player adds/removes items from the chest, other players will see incorrect contents when they open the chest, and will experience frustrating errors when they try to add or remove things.

Obviously this is due to a bug in the game which you should fix. But some bugs always escape notice during testing, and so a good rule is to implement a fallback so that they annoy players as little as possible. In the case of bugs related to de-synced world state, one way to do this is to periodically re-transmit the entire state of the world as if the player had just entered the zone for the first time again. (As mentioned above, try to do this bit-by-bit rather than all at once.)

Ideally, when you do this, have the client check whether the new copy of the world state matches what it had on file, and raise an alarm if it does not (as this indicates that there is a bug in your world update code).
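As a rough sketch of that check, suppose the client keeps its entities in a dictionary mapping entity id to an attribute dictionary (a hypothetical layout; yours will differ). When a fresh snapshot arrives, it can note which entities had drifted before overwriting them:

```python
import logging

log = logging.getLogger("resync")

def apply_resync(local_entities, snapshot):
    """Replace the client's entity table with the server's snapshot.

    Both arguments map entity id -> attribute dict (a hypothetical layout).
    Returns the sorted ids that differed, so the caller can raise an alarm.
    """
    drifted = sorted(
        eid for eid in local_entities.keys() | snapshot.keys()
        if local_entities.get(eid) != snapshot.get(eid)
    )
    if drifted:
        log.warning("world state drifted for entities: %s", drifted)
    # The server's copy always wins.
    local_entities.clear()
    local_entities.update(snapshot)
    return drifted
```

An empty result means the incremental updates kept the client in sync; anything else points at an update message you forgot to send.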

3.4. Let the client log unexpected behavior on the server

Detailed logging is your friend. While developing singleplayer games, print(..) suffices; when you ship to production, you can replace print(..) with log.debug(..), so that if a player reports a bug you can have them send you the log file. If you are writing a multiplayer game, you have a great advantage: the client can submit its log messages itself, over the socket, and the server can store them in its own log.

For example, on the server you might want to log every instance of a client appearing to move in invalid ways (too fast, or clipping through a wall). It might be cheating – in which case you want it on record for when you ban them – or it might be a bug in the movement code, in which case having detailed logs from the client will be extremely helpful for you.
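On the client side, one way to wire this up is a custom handler for Python's logging module. This is a sketch under assumptions: the LOG message format is invented here, and send_line stands in for whatever function your connection object uses to send one message.

```python
import logging

class SocketLogHandler(logging.Handler):
    """Forward client log records to the server as 'LOG <level> <text>' lines."""

    def __init__(self, send_line):
        super().__init__()
        self.send_line = send_line   # hypothetical: sends one message over the socket

    def emit(self, record):
        # Newlines would break a newline-delimited protocol, so flatten them.
        text = self.format(record).replace("\n", " | ")
        try:
            self.send_line(f"LOG {record.levelname} {text}")
        except OSError:
            self.handleError(record)   # never let logging crash the game
```

Remember that these lines come from the client, so the server should treat them like any other untrusted input: cap their length and rate before writing them to disk.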

4. Security Considerations

Some concrete principles for reducing the likelihood and severity of damage:

4.1. Observe the principle of least knowledge

A few players will always cheat, even if the game is collaborative. Since even a small percentage of cheaters significantly damages the experience for everyone else, it’s worth spending a disproportionate amount of time minimizing the ability of cheaters to cheat.

In general, you should always assume that the client program will be compromised. If your game is competitive, players will use rendering hacks to get the game to draw players who are not actually visible (e.g. making them “glow” through walls). You can cut this off by not sending any data to the client that it does not strictly need to know to render the scene.
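A very simplified sketch of this idea is to filter entities on the server before sending them. The version below only checks distance, which is an assumption made to keep the example short; a real game would also want a line-of-sight test against walls. The names VIEW_RADIUS and entities_to_send and the entity layout are hypothetical.

```python
import math

VIEW_RADIUS = 50.0   # hypothetical: how far a player can see, in world units

def entities_to_send(player_pos, entities):
    """Return only the entities close enough for this player to see.

    entities maps entity id -> dict with a "pos" tuple (a hypothetical layout).
    """
    return {
        eid: entity
        for eid, entity in entities.items()
        if math.dist(player_pos, entity["pos"]) <= VIEW_RADIUS
    }
```

A hacked client cannot draw what it was never told about.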

4.2. Assume all client input is untrustworthy

If it is possible for a badly-formatted message to crash your server or give a client powers it shouldn’t have, that will be exploited at some point in the future. Stress-test your message parsing code! Are there any unusual sequences of bytes/message types that can kill the server?

4.3. Cap message length and set timeouts

In a Denial of Service attack, an attacker attempts to use up all of a server’s resources so that legitimate users cannot play the game. One easy way to do this is to send a message that never ends; the server will exhaust its memory trying to hold the entire message while it waits for the delimiter. The solution to this is to impose a maximum size on messages.

Likewise, an attacker might hold open thousands of inactive connections to try and bog down the mainloop. Timeouts for inactivity will help catch this. A good design is to use a stringent timeout (say ~5s) until a user logs in, then switch to a more lenient timeout (e.g. 2min).
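Both defenses can live in the per-connection buffering code. The sketch below assumes newline-delimited messages; the 4096-byte cap and the class and attribute names are illustrative choices, not recommendations.

```python
import time

MAX_MESSAGE_BYTES = 4096   # hypothetical cap; pick one that fits your protocol
LOGIN_TIMEOUT = 5.0        # seconds of silence allowed before login
IDLE_TIMEOUT = 120.0       # seconds of silence allowed after login

class MessageTooLong(Exception):
    pass

class BoundedLineBuffer:
    """Accumulate received bytes and split them into newline-delimited messages."""

    def __init__(self):
        self.buffer = b""
        self.last_activity = time.monotonic()
        self.logged_in = False

    def feed(self, data):
        """Add bytes from .recv(..); return the list of complete messages."""
        self.last_activity = time.monotonic()
        self.buffer += data
        # Everything before the last newline is complete; the rest stays buffered.
        *messages, self.buffer = self.buffer.split(b"\n")
        if len(self.buffer) > MAX_MESSAGE_BYTES or any(
            len(m) > MAX_MESSAGE_BYTES for m in messages
        ):
            raise MessageTooLong()
        return messages

    def timed_out(self, now=None):
        """True if the connection has been silent for longer than its limit."""
        now = time.monotonic() if now is None else now
        limit = IDLE_TIMEOUT if self.logged_in else LOGIN_TIMEOUT
        return now - self.last_activity > limit
```

The mainloop would close the connection on MessageTooLong or when timed_out() returns True. Note that this sketch counts any received byte as activity, so an attacker trickling one byte at a time would not be caught; a stricter version might reset the timer only on complete messages.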

4.4. Do not store plaintext passwords on the server

The server has to let players log in, which means it needs to store login information in a database of some kind. This database will need to be backed up, which makes it more likely that hackers will be able to gain access to it; disgruntled former employees may also leak database information. For this reason, do not store passwords as plaintext in the database! Store only a hash of each password, produced by a modern hashing algorithm specifically intended for passwords, such as Bcrypt, Scrypt, or Argon2.

4.5. Do not store insecurely hashed passwords on the server

Python has a built-in hashing library, hashlib, which performs MD5, SHA-2, and SHA-3 hashing. These hashes are not safe for passwords! They are specifically designed to be fast to compute, which means that given the hash value, attackers can recover the password by brute force. (Consider that there are only about 6.6 quadrillion 8-character passwords made of printable ASCII characters; if an attacker can test 10 billion combinations per second, which is achievable with just a few machines, they can break a user’s password within 4 days on average, and much less by employing common heuristics.) Password-hashing algorithms such as those above are designed on purpose to be slow to compute, avoiding this problem.
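To make this concrete, the sketch below first reproduces the estimate (assuming 95 printable ASCII characters per position) and then hashes a password with scrypt. It relies on hashlib.scrypt, which is only available when Python is built against an OpenSSL version that supports it; the cost parameters are illustrative assumptions, and the function names are made up for this example.

```python
import hashlib
import hmac
import os

# The brute-force estimate, assuming 95 printable ASCII characters per position.
combinations = 95 ** 8                                      # about 6.6e15
average_days = combinations / 2 / 10_000_000_000 / 86_400   # about 3.8 days

def hash_password(password, salt=None):
    """Return (salt, digest) using scrypt; store both, never the password."""
    salt = os.urandom(16) if salt is None else salt
    # n, r, p are illustrative cost settings, not a vetted recommendation.
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def check_password(password, salt, digest):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)
```

The random per-user salt means two players with the same password get different digests, so an attacker cannot crack them all at once.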

4.6. Do not transmit plaintext passwords on unencrypted channels

If you are not using encryption, every byte you transmit or receive can be inspected and modified by any device along the path it takes to its destination (and generally by all devices on your LAN). Do not send passwords unencrypted! (It’s fine to do it in debug, of course, but remember to fix it before shipping your game.)

In my own game, I split the authentication off onto its own server. This happens via HTTPS, so it is covered by TLS; the server then sends back a one-time token that the client uses to authenticate itself with the gameplay server. This way the password is never sent unencrypted.
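The one-time token part of such a scheme might look roughly like the sketch below. This is not my production code; it assumes the authentication and gameplay servers can both reach the same token store (a plain dictionary here, a shared database in practice), and the function names and the 30-second lifetime are hypothetical.

```python
import secrets
import time

TOKEN_LIFETIME = 30.0   # hypothetical: seconds a token stays valid
_pending = {}           # token -> (username, expiry time); stands in for a shared store

def issue_token(username, now=None):
    """Called by the auth server after the password check succeeds."""
    now = time.monotonic() if now is None else now
    token = secrets.token_urlsafe(32)   # unguessable random string
    _pending[token] = (username, now + TOKEN_LIFETIME)
    return token

def redeem_token(token, now=None):
    """Called by the gameplay server; returns the username or None."""
    now = time.monotonic() if now is None else now
    entry = _pending.pop(token, None)   # pop makes the token single-use
    if entry is None or now > entry[1]:
        return None
    return entry[0]
```

Because the token works only once and expires quickly, an eavesdropper who sees it on the gameplay connection gains far less than they would from the password.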

If you cannot use TLS for some reason, you may be able to use SRP or another PAKE protocol to do a zero-knowledge password exchange. This method is considered less reliable than encryption, as it has received less scrutiny.

4.7. Use a three-strikes rule

If an attacker is repeatedly doing something detectably malicious, you can set up your server to automatically block them after a certain number of “strikes”. The block can be permanent or temporary. Usually, you block them by IP address; the most straightforward way to do this is to compare the IP address returned by .accept() to an IP blacklist, and immediately close the connection if it appears on that registry. However, rather than doing this yourself, you can delegate the work to a firewall like UFW, which will prevent the malicious connection from ever reaching .accept() in the first place, saving you cycles. Fail2Ban is another popular choice for enforcing N-strikes rules.
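If you do handle this in your own server, the bookkeeping can be quite small. Here is a minimal sketch of a temporary N-strikes block; the StrikeList class, the ten-minute ban, and the example addresses in the comments are all assumptions for illustration.

```python
import time

MAX_STRIKES = 3
BAN_SECONDS = 600.0   # hypothetical temporary ban length

class StrikeList:
    """Count strikes per IP address and ban an address after MAX_STRIKES."""

    def __init__(self):
        self.strikes = {}        # ip -> strike count
        self.banned_until = {}   # ip -> time at which the ban ends

    def add_strike(self, ip, now=None):
        now = time.monotonic() if now is None else now
        self.strikes[ip] = self.strikes.get(ip, 0) + 1
        if self.strikes[ip] >= MAX_STRIKES:
            self.banned_until[ip] = now + BAN_SECONDS
            self.strikes[ip] = 0   # start counting again once the ban ends

    def is_banned(self, ip, now=None):
        now = time.monotonic() if now is None else now
        return self.banned_until.get(ip, 0.0) > now

# In the accept loop (for IPv4, .accept() returns the address as (ip, port)):
#     conn, (ip, port) = listener.accept()
#     if strike_list.is_banned(ip):
#         conn.close()
```

Keep in mind that several players can share one IP address (a household or a university network, say), which is one reason temporary bans are often safer than permanent ones.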

5. Conclusion

I’ve left out a number of things in this article: selectors like DefaultSelector, password encryption mechanisms, UDP (“datagram”) sockets versus TCP (“stream”) sockets, and the other, lower-level forms of packet communication on the modern Internet. Those can wait for another article; for now, hopefully this provided enough to get you started on your journey. Good luck, and happy hacking!
