Reliable delivery in HTTP

Achieving reliable delivery in HTTP is not difficult. It takes a little bit of understanding of how HTTP works and what reliable delivery means.

Let's first define the scope of the problem: we are talking about the reliability of transmission, not reliability at one endpoint or the other. You can buy expensive systems that use disk-backed storage to guarantee reliable delivery of queued messages using proprietary protocols. There is no reason that a similar system could not be developed for HTTP. That is a software issue, not a protocol issue.

Next let's discuss what can go wrong in a transmission. No matter what protocol you are using, there is some chance that a network cable or intermediary will go down while a message is en route. The only way around this is to re-send the message after the network is back up. If the network problem occurs while the sender is in the middle of sending the message, then the sender knows it was not received and can re-send it. But if it occurs while the responder is replying (or before it starts replying), then the sender cannot tell whether the responder was about to report successful acceptance or failure, or whether the responder got the message properly at all.

One way to make sure that your data gets through is to try and try again until you get a proper acknowledgement. If the action you were completing is "idempotent" then by definition it is safe to try again and again until it succeeds. The HTTP GET, PUT and DELETE methods are idempotent. If you think about them as if they were file system commands it should be clear why. It never hurts to read a file over and over again. It also does not hurt to write a file over and over again as long as you are writing the same data. And deleting a file repeatedly will at most give you an error message. Bear in mind for PUT and DELETE that there is still the lost update problem that affects any multi-user system. It is not really specific to reliable messaging but should be considered nevertheless.
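
As a rough sketch of "try and try again", here is a small Python retry loop. The names retry_idempotent, TransientNetworkError and flaky_put are made up for illustration, and the network failure is simulated rather than real. The point is only that repeating an idempotent PUT leaves the same end state no matter how many attempts it takes.

```python
import time


class TransientNetworkError(Exception):
    """Hypothetical stand-in for a timeout or dropped connection."""


def retry_idempotent(send, max_attempts=5, delay_seconds=0.0):
    """Call send() until it returns an acknowledgement or attempts run out.

    Only safe when send() performs an idempotent action (GET, PUT, DELETE).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except TransientNetworkError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the last failure
            time.sleep(delay_seconds)


# Simulated PUT: the write lands every time, but the first two replies are lost.
store = {}
attempts = {"count": 0}


def flaky_put():
    attempts["count"] += 1
    store["/report.txt"] = "final draft"
    if attempts["count"] < 3:
        raise TransientNetworkError("reply lost in transit")
    return 200


status = retry_idempotent(flaky_put)
```

After three attempts the sender finally sees the acknowledgement, and the stored data is exactly what a single successful PUT would have left behind.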

On the other hand, the POST method is more like appending to a file. Appending the same data to a file over and over again is very different from merely overwriting the file with the same data over and over again: POSTed data often accumulates. We therefore need to set up our system so that multiple POSTs of the same data are not harmful.
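
To make the file system analogy concrete, here is a toy comparison in Python. The put and post functions and the two in-memory containers are invented for this illustration and are not HTTP handlers. They only mimic "overwrite" versus "append".

```python
files = {}    # PUT target: path -> contents
journal = []  # POST target: everything ever appended


def put(path, data):
    files[path] = data    # overwrite: repeating it changes nothing


def post(data):
    journal.append(data)  # append: every repeat adds another copy


# Send the same message three times with each style of operation.
for _ in range(3):
    put("/note.txt", "hello")
    post("hello")
```

The repeated PUT leaves a single "hello" in files, while the repeated POST leaves three copies in journal.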

The way to make repeated POSTs harmless is to put some kind of message ID in a header or in your message body. The server can keep track of these message IDs and ignore any message that carries a message ID it has already seen. This is not required by the HTTP specification because it takes more effort to keep a list of message IDs than the average CGI programmer would want to spend for non-critical applications. Plus, HTTP is supposed to be a stateless protocol. This strategy requires a little bit of mandatory server state, so it arguably does not belong in the protocol.
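
A minimal sketch of that bookkeeping might look like the following. The DedupingReceiver class and the "msg-001" style IDs are hypothetical, and a real server would keep the set of seen IDs somewhere more durable than process memory.

```python
class DedupingReceiver:
    """Toy POST handler that remembers which message IDs it has processed."""

    def __init__(self):
        self.seen_ids = set()
        self.journal = []

    def post(self, message_id, body):
        if message_id in self.seen_ids:
            return "duplicate ignored"  # a retry: do not append again
        self.seen_ids.add(message_id)
        self.journal.append(body)
        return "accepted"


receiver = DedupingReceiver()
first = receiver.post("msg-001", "credit account by 10")
second = receiver.post("msg-001", "credit account by 10")  # retry of the same message
third = receiver.post("msg-002", "credit account by 10")   # a genuinely new message
```

Note that the second call is dropped because of its ID, not because of its body: the third message has identical content but a new ID, so it is accepted.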

Who generates the message IDs? The client software can use some form of UUID generator. If you trust your clients this is a reasonable strategy but more scalable apps should probably move ID generation to the server as we will discuss next.
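
For the client-generated case, one possible approach in Python is the standard uuid module. The header name X-Message-ID and the new_message helper are assumptions made for this sketch, since HTTP defines no such header. The important detail is that the ID is generated once, before the first send, and reused on every retry.

```python
import uuid


def new_message(body):
    """Attach a fresh random ID to a message, once, before the first send."""
    # "X-Message-ID" is a made-up header name for illustration.
    return {"headers": {"X-Message-ID": str(uuid.uuid4())}, "body": body}


message = new_message(b"credit account by 10")
first_try = dict(message["headers"])
retry = dict(message["headers"])  # a retry resends the very same ID
other = new_message(b"credit account by 10")  # a different message gets its own ID
```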

The better way to do message ID generation is on the server. UUID generation algorithms are tricky and the client may not get them right. It might reuse message IDs or accidentally clash with another client.

Conversely, the client may not trust the server. If the server has not implemented the reliable messaging algorithm, then it would just ignore the message ID! The client has no way of knowing that the server did not realize it was supposed to handle the ID.

Having the server generate the message IDs solves both problems. One elegant way to do this is to have the client do a POST asking for a message ID. The server can return a "Location:" header which points to a newly generated URI where the client may POST the data. For instance:

Client request:

POST /reliableservice.cgi HTTP/1.1

Server response:

HTTP/1.1 201 Created
Location: http://mysite.com/reliableservice.cgi?messageid=32868937368

Second client request:

POST http://mysite.com/reliableservice.cgi?messageid=32868937368 HTTP/1.1

(message body here)

This works because we are using the original non-reliable POST only to generate a new message ID, and message IDs are cheap. We can retire them (whether they have been used or not) after a few hours. They are so cheap that we could hold on to them for weeks!
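
Retiring message IDs could be sketched roughly as follows. The MessageIdIssuer class, the three-hour default and the use of secrets.token_hex are all assumptions for the example. Timestamps are passed in explicitly as seconds so the behavior is easy to follow.

```python
import secrets


class MessageIdIssuer:
    """Hands out message IDs and forgets them after a fixed lifetime."""

    def __init__(self, ttl_seconds=3 * 60 * 60):
        self.ttl_seconds = ttl_seconds
        self.issued = {}  # message ID -> time it was issued, in seconds

    def issue(self, now):
        message_id = secrets.token_hex(16)
        self.issued[message_id] = now
        return message_id

    def retire_expired(self, now):
        """Drop every ID older than the lifetime; return how many were dropped."""
        expired = [mid for mid, issued_at in self.issued.items()
                   if now - issued_at > self.ttl_seconds]
        for mid in expired:
            del self.issued[mid]
        return len(expired)


issuer = MessageIdIssuer(ttl_seconds=3600)
old_id = issuer.issue(now=0)
new_id = issuer.issue(now=3000)
retired = issuer.retire_expired(now=4000)  # only old_id is past its lifetime
```

Here the lifetime is a plain constructor argument, so moving from hours to weeks is a one-number change.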

A client need not worry about accidentally creating two message IDs because the "wasted" one is irrelevant. The client also need not worry about accidentally posting to the generated message ID URI twice because the server can ensure that it only acts on the first message posted to that URI. The response to subsequent POSTs should be the same as if there had been only one POST, so that the client can get the correct response even if there is a network outage in the middle of the first response.
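
Here is one way the server side of that behavior might be sketched in Python, with no real networking. The ReliableService class and its method names are hypothetical. Status codes and the URI shape follow the example exchange above, and the 404 for an unknown ID is simply a choice made for this sketch.

```python
import secrets


class ReliableService:
    """Toy server: acts on the first POST per message ID, then replays its reply."""

    def __init__(self):
        self.responses = {}  # message ID -> stored response (None until first POST)
        self.journal = []

    def create_message_id(self):
        message_id = secrets.token_hex(8)
        self.responses[message_id] = None
        # Status and the value of the Location header
        return 201, "/reliableservice.cgi?messageid=" + message_id

    def post_message(self, message_id, body):
        if message_id not in self.responses:
            return 404, "unknown message ID"
        if self.responses[message_id] is None:
            # First POST to this URI: act on it and remember the response.
            self.journal.append(body)
            self.responses[message_id] = (200, "stored as entry %d" % len(self.journal))
        # Any later POST gets the stored response and causes no new action.
        return self.responses[message_id]


service = ReliableService()
_, wasted = service.create_message_id()  # reply lost, so the client asks again
_, location = service.create_message_id()
message_id = location.split("messageid=")[1]

first = service.post_message(message_id, "credit account by 10")
retry = service.post_message(message_id, "credit account by 10")
```

The wasted ID costs one dictionary entry, and the retry receives exactly the response the first POST produced while the journal still holds a single entry.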

It would be easy for us to turn this into a formal specification, but it is so easy to do with HTTP that there does not seem to be much point. If I were going to standardize it, I would standardize an RPOST (for reliable post) method which would either accept a message ID from the client or return a new URI with a message ID in it.