This is part of an ongoing series where we build a webserver from “scratch”, for a certain definition of scratch. The primitives we’re using are TCP libraries to handle network communication and threading libraries in Python to handle concurrent connections.
In the previous post, we practiced TCP communication, sending basic messages using netcat and the socket library in Python. In this post, we’ll upgrade from basic text messages to messages that conform to the protocol defined by HTTP.
HTTP Requests
To keep our webserver simple, we’re just going to focus on the most basic communication model: requests and responses. We’re writing the server, so we need to be able to understand requests and form responses. We’ll start with understanding requests.
The easiest way I could think of to visualize what an HTTP request looks like was to just start sending HTTP requests to our program from the last post and print out what we see. Here’s what our test server looks like.
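A minimal sketch of such a test server, assuming it only needs to accept one connection at a time, print whatever bytes arrive, and hang up without replying. The host, port, and function names below are our own placeholders:

```python
import socket

HOST = "127.0.0.1"  # assumption: listen on localhost only
PORT = 8080         # assumption: any free port will do


def print_one_connection(server_sock):
    """Accept one client, print whatever it sent, and return the raw bytes."""
    conn, addr = server_sock.accept()
    with conn:
        # Assumption: one recv() is enough for the small requests we test with.
        data = conn.recv(4096)
    # Leaving the with-block closes the connection without sending a reply.
    print(f"--- {len(data)} bytes from {addr} ---")
    print(data.decode("utf-8", errors="replace"))
    return data


def main():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        # Allow quick restarts without waiting for the old port to be released.
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind((HOST, PORT))
        server_sock.listen()
        print(f"Listening on {HOST}:{PORT}")
        while True:
            print_one_connection(server_sock)
```

Call main() and point curl at http://127.0.0.1:8080/ from another terminal; since the server hangs up without replying, curl will most likely report an empty reply.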
Now, instead of connecting via netcat and sending text messages, we’ll use curl, which sends HTTP requests. In the next gif, I use curl to send a couple of different types of requests. If any of the curl options don’t make sense, you can look them up on the curl man page.
Curl is one way to send HTTP requests, but most people will likely use a web browser to make requests to my server. I was curious what request Google Chrome would make:
It looks really similar; there’s just more going on. I’m going to take one of the incoming requests we got and try to parse what’s going on.
For example, a POST request to /hello arrives as plain text: a series of lines, each ending in CRLF (\r\n), followed by the body.
So first off, we’ve got three sections: the start-line (aka POST /hello HTTP/1.1), the header block, and the body. The formal specification for the format of an HTTP message can be found in RFC 2616 (since superseded by RFC 7230 and later RFC 9112, which keep the same basic message format):
Both [requests and responses] consist of a start-line, zero or more header fields (also known as “headers”), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and possibly a message-body.
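To make that structure concrete, here is a sketch that splits a raw message at the first empty line. The request bytes are a hand-written, hypothetical example with the same shape as the one above, not captured output:

```python
# A hand-written example request (hypothetical values, not captured output).
RAW = (
    b"POST /hello HTTP/1.1\r\n"
    b"Host: localhost:8080\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"name=Dave"
)


def split_message(raw):
    """Split an HTTP message into (start_line, header_lines, body)."""
    # The empty line (a CRLF right after the last header's CRLF) ends the head.
    # If it is missing, partition() leaves the body empty.
    head, _, body = raw.partition(b"\r\n\r\n")
    # ISO-8859-1 maps every byte to a character, so decoding never fails.
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    return start_line, header_lines, body


start_line, header_lines, body = split_message(RAW)
print(start_line)    # POST /hello HTTP/1.1
print(header_lines)  # ['Host: localhost:8080', 'Content-Type: application/x-www-form-urlencoded', 'Content-Length: 9']
print(body)          # b'name=Dave'
```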
And that’s exactly what we see above. The start-line, aka the Request-Line, can be further broken down into three parts: POST, /hello, and HTTP/1.1. Again from RFC 2616:
The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with CRLF. The elements are separated by SP characters.
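Splitting on SP is about all it takes. A sketch (parse_request_line is a hypothetical helper name) that also rejects lines that don’t have exactly three parts:

```python
def parse_request_line(line):
    """Split a Request-Line like 'POST /hello HTTP/1.1' into its three parts."""
    # RFC 2616: Method SP Request-URI SP HTTP-Version CRLF
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed Request-Line: {line!r}")
    method, uri, version = parts
    return method, uri, version


print(parse_request_line("POST /hello HTTP/1.1\r\n"))  # ('POST', '/hello', 'HTTP/1.1')
```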
Parsing an HTTP Request
Given what we know, we can create a data structure to store an HTTP request. Here’s a really scrappy class for it:
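A sketch of what such a class could look like, assuming we keep the body as raw bytes and store headers in a dict with lowercased names (header names are case-insensitive). The attribute and method names are our own choices:

```python
class HTTPRequest:
    """A scrappy container for one parsed HTTP request."""

    def __init__(self, method, uri, version, headers, body):
        self.method = method    # e.g. "POST"
        self.uri = uri          # e.g. "/hello"
        self.version = version  # e.g. "HTTP/1.1"
        self.headers = headers  # dict: lowercased header name -> value
        self.body = body        # raw bytes after the empty line

    @classmethod
    def parse(cls, raw):
        """Build an HTTPRequest from the raw bytes read off the socket."""
        head, _, body = raw.partition(b"\r\n\r\n")
        lines = head.decode("iso-8859-1").split("\r\n")
        method, uri, version = lines[0].split(" ", 2)
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            # Header names are case-insensitive, so normalize them to lowercase.
            headers[name.strip().lower()] = value.strip()
        return cls(method, uri, version, headers, body)

    def __repr__(self):
        return (f"HTTPRequest({self.method} {self.uri} {self.version}, "
                f"headers={self.headers!r}, body={self.body!r})")
```

Lowercasing means req.headers["host"] works no matter how the client capitalized it; duplicate headers simply overwrite each other, which is fine for a scrappy version.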
And if we update our code to parse the incoming message now, we get something like:
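One thing parsing quietly depends on is having the whole request in hand, and a single recv() call isn’t guaranteed to return all of it. A sketch of a helper (recv_request is a hypothetical name) that reads until the empty line ending the headers, then reads Content-Length more bytes of body. It assumes no chunked transfer encoding and no second request pipelined behind the first:

```python
def recv_request(conn, chunk_size=1024):
    """Read one complete HTTP request from a connected socket; return raw bytes.

    Assumes the body length comes from Content-Length (no chunked encoding)
    and that the client sends nothing after this one request.
    """
    data = b""
    # Keep reading until the empty line that ends the header block shows up.
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(chunk_size)
        if not chunk:  # client closed the connection before finishing
            return data
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    # Keep reading until the whole body has arrived (or the client hangs up).
    while len(body) < length:
        chunk = conn.recv(chunk_size)
        if not chunk:
            break
        body += chunk
    return head + b"\r\n\r\n" + body
```

In the server loop, raw = recv_request(conn) would take the place of a single conn.recv(...) call, before the bytes go to the parser.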
Trying it out:
That will serve us just fine.
HTTP Response
On the flip side, we’re going to need to form an HTTP response. Right now, the web browser just shows an error when we close the connection without sending anything back. Again, we’ll take the approach of looking at the raw response from an actual server. One way to do that is with netcat:
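The same experiment can be scripted in Python. A sketch of an echo-into-netcat equivalent (send_raw is a hypothetical name), assuming the request asks the server to close the connection with Connection: close; otherwise an HTTP/1.1 server may keep it open, and the read loop will only end when the timeout raises socket.timeout:

```python
import socket


def send_raw(host, port, raw_request, timeout=5.0):
    """Send hand-written request bytes and return everything the server sends back.

    Like piping echo into netcat: no parsing, just bytes in and bytes out.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(raw_request)
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the connection: response is done
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

For example, send_raw("example.com", 80, b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n") returns the raw response bytes, which you can print with .decode("iso-8859-1").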
Note: the echo line above manually constructs a raw HTTP request, written out by hand in the same format we saw our server receive.
The response that came back from running that command looks a lot like the HTTP request. It’s got a “Status-Line” (the HTTP version, a status code, and a reason phrase, e.g. HTTP/1.1 200 OK) that mirrors the “Request-Line”. It also has a header block and a body, which makes parsing it really similar to parsing an HTTP request.
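RFC 2616 defines the Status-Line as HTTP-Version SP Status-Code SP Reason-Phrase CRLF. Because the reason phrase may itself contain spaces (as in Not Found), a sketch of a parser should split at most twice; parse_status_line is a hypothetical name:

```python
def parse_status_line(line):
    """Split a Status-Line like 'HTTP/1.1 404 Not Found' into its parts."""
    # maxsplit=2 keeps a multi-word reason phrase in one piece.
    parts = line.rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # tolerate a missing reason phrase
    return version, code, reason


print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))  # ('HTTP/1.1', 404, 'Not Found')
```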
Refactoring our earlier code to support both requests and responses, as well as both parsing and constructing HTTP messages, we get something like this:
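A sketch of how that refactor might be organized, assuming a shared HTTPMessage base class that owns the headers, body, and serialization, with each subclass supplying only its own start-line and a from_bytes constructor. All class and method names here are our own, not necessarily the ones used in this series:

```python
class HTTPMessage:
    """Shared pieces of requests and responses: headers, body, serialization."""

    def __init__(self, headers=None, body=b""):
        self.headers = dict(headers or {})  # header name -> value, as received
        self.body = body                    # raw bytes

    def start_line(self):
        # Each subclass supplies its own Request-Line or Status-Line.
        raise NotImplementedError

    def to_bytes(self):
        """Serialize: start-line, one line per header, empty line, then body."""
        lines = [self.start_line()]
        lines += [f"{name}: {value}" for name, value in self.headers.items()]
        head = "\r\n".join(lines) + "\r\n\r\n"
        return head.encode("iso-8859-1") + self.body

    @staticmethod
    def split_raw(raw):
        """Parse raw bytes into (start-line parts, headers dict, body)."""
        head, _, body = raw.partition(b"\r\n\r\n")
        lines = head.decode("iso-8859-1").split("\r\n")
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
        # maxsplit=2 keeps a multi-word reason phrase ("Not Found") intact.
        return lines[0].split(" ", 2), headers, body


class HTTPRequest(HTTPMessage):
    def __init__(self, method, uri, version="HTTP/1.1", headers=None, body=b""):
        super().__init__(headers, body)
        self.method, self.uri, self.version = method, uri, version

    def start_line(self):
        return f"{self.method} {self.uri} {self.version}"

    @classmethod
    def from_bytes(cls, raw):
        (method, uri, version), headers, body = cls.split_raw(raw)
        return cls(method, uri, version, headers, body)


class HTTPResponse(HTTPMessage):
    def __init__(self, status, reason, version="HTTP/1.1", headers=None, body=b""):
        super().__init__(headers, body)
        self.status, self.reason, self.version = status, reason, version

    def start_line(self):
        return f"{self.version} {self.status} {self.reason}"

    @classmethod
    def from_bytes(cls, raw):
        parts, headers, body = cls.split_raw(raw)
        reason = parts[2] if len(parts) > 2 else ""  # tolerate a missing reason
        return cls(int(parts[1]), reason, parts[0], headers, body)
```

Header names are kept exactly as received here, so a parsed message serializes back to the same bytes.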
The only implementation difference between HTTPRequests and HTTPResponses is in the start_line, which matches what we saw above.
Putting it all together
We’ve got everything we need to build a really basic server. It will simply spit back the URL that was requested, with a 200 status code.
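A sketch of that server, assuming one client at a time, a single recv() per request, and a plain-text body. build_echo_response and run_echo_server are hypothetical names, and the 400 branch for malformed request lines is our own addition:

```python
import socket


def build_echo_response(raw_request):
    """Build a 200 response whose plain-text body is the requested URI."""
    request_line = raw_request.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = request_line.split(" ")
    if len(parts) == 3:
        status, body = "200 OK", parts[1].encode("utf-8")
    else:
        # Our own addition: reject anything that isn't METHOD SP URI SP VERSION.
        status, body = "400 Bad Request", b"malformed request line"
    head = (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("iso-8859-1") + body


def run_echo_server(host="127.0.0.1", port=8080):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind((host, port))
        server_sock.listen()
        while True:
            conn, _ = server_sock.accept()  # handles one client at a time
            with conn:
                raw = conn.recv(4096)  # assumption: the request fits in one recv()
                if raw:
                    conn.sendall(build_echo_response(raw))
```

Content-Length tells the browser where the body ends, and Connection: close is honest about the fact that we hang up after every response. Call run_echo_server() and visit http://127.0.0.1:8080/hello to try it.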
And indeed, Google Chrome knows how to understand the response it gets back:
Wrapping Up
In this post, we saw what raw HTTP requests and responses look like and threw together some code to serialize and deserialize these HTTP messages. While it doesn’t do any fancy routing, and any necessary response headers have to be set explicitly (for instance, if we were returning a JSON response and wanted to tell the client about the Content-Type), we could do all of those things by hand with the pieces that we have.
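For instance, returning JSON by hand only takes serializing the payload and setting the header ourselves. A sketch, with json_response as a hypothetical helper:

```python
import json


def json_response(payload, status="200 OK"):
    """Serialize payload as JSON and wrap it in a raw HTTP/1.1 response."""
    body = json.dumps(payload).encode("utf-8")
    head = (
        f"HTTP/1.1 {status}\r\n"
        # Nothing sets this for us: we have to tell the client it's JSON.
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("iso-8859-1") + body
```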
Before moving on, try messing with some of this stuff on your own. An easy way to jump in is to take one of the commands from above and modify it. For instance, try sending an Accept header with the value application/json, and see whether the server responds with an appropriate Content-Type header.
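If you’d rather script the exercise than edit the echo line by hand, here is a sketch of two small helpers (build_request and content_type are hypothetical names): one builds a GET request with extra headers such as Accept, and the other pulls the Content-Type out of a raw response:

```python
def build_request(host, path="/", extra_headers=None):
    """Build raw GET request bytes with optional extra headers."""
    headers = {"Host": host, "Connection": "close"}
    headers.update(extra_headers or {})
    lines = [f"GET {path} HTTP/1.1"] + [f"{k}: {v}" for k, v in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("iso-8859-1")


def content_type(raw_response):
    """Return the Content-Type header value from a raw response, or None."""
    head = raw_response.partition(b"\r\n\r\n")[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the Status-Line
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":  # names are case-insensitive
            return value.strip()
    return None
```

Whether a given server honors the Accept header is up to that server, which is exactly what the exercise lets you find out.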
One glaring problem with our web server so far is that it can only handle one request at a time. If a request comes in while this code is still busy inside the while loop above, that connection just sits in the socket’s listen queue until the next iteration of the loop picks it up. This is no good: we want to be able to support concurrent requests, otherwise our webserver would fall apart pretty quickly under any reasonable amount of load. In the next post, we’ll look at some basic strategies for supporting concurrent requests.