Asynchronous operations in REST

Posted on 02 Jun 2011

REST is hot! But doing REST right is more difficult than most people think. Idempotent methods, HATEOAS, RMM levels… all terms that a REST developer should know and master. From the perspective of a developer who is still learning (as I am too, by the way), it looks pretty simple: use HTTP methods like GET, POST, PUT and DELETE, map them onto resources, call the underlying database models and you’re done: a fully RESTful API in just 5 minutes. But of course, once you have actually built a RESTful API, you find out very quickly that it is a lot harder than that. One of the more common problems when dealing with REST is asynchronous operations. Let’s find out how to deal with those…

Creating resources

To sum up a REST call: you POST a representation of a resource to a certain URI, let’s say a blog article to a blog site, somewhere along the lines of this (skipping non-relevant headers):

POST /blogs HTTP/1.1
Content-type: application/vnd.myblog.article+xml ; version=1.1

<?xml version="1.0" encoding="UTF-8" ?>
<article>
<title>My blogpost</title>
<author>John Doe</author>
<content>This is the content for my blog article</content>
</article>

The API will respond (when everything goes correctly) with something like this (skipping non-relevant headers):

HTTP/1.1 201 Created
Location: /blog/20010101-myblogpost

If you want to edit your blog post, you can use this URI to PUT the new blog content.

As stated in RFC 2616, section 10.2.2: The origin server MUST create the resource before returning the 201 status code.

In other words: you can use the resource as soon as you get the 201 status back from the server. But sometimes the resource cannot be created immediately. Maybe the request is placed in a task/message queue that handles the actual creation of the resource, or something similar. Whatever the reason the resource cannot be created right away, we MUST NOT send back a 201, even if the resource will be available a few seconds from now. This is something that has changed between HTTP/1.0 and HTTP/1.1. Instead, we must send back a 202 Accepted message.

202 Accepted

This message tells a client that the server has accepted the request, but has not processed it yet (or is still busy processing it). Instead of the URI of the actual resource, the server sends the location of a status resource.

Request:

POST /blogs HTTP/1.1
<blogdata>

Response:

HTTP/1.1 202 Accepted
Location: /queue/621252
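To make the choice between 201 and 202 a bit more concrete, here is a minimal Python sketch of the decision a server has to make when answering the POST. The function name `creation_response` and its parameters are made up for illustration; a real framework would wrap this differently.

```python
from http import HTTPStatus
from typing import Dict, Tuple


def creation_response(
    created_immediately: bool, resource_uri: str, status_uri: str
) -> Tuple[HTTPStatus, Dict[str, str]]:
    """Return (status, headers) for a POST that is supposed to create a resource.

    Hypothetical helper: the names and the tuple shape are assumptions.
    """
    if created_immediately:
        # The resource already exists, so 201 with its final URI is allowed.
        return HTTPStatus.CREATED, {"Location": resource_uri}
    # Creation is deferred (for example, queued): answer 202 and point the
    # client at a status resource instead of the not-yet-existing article.
    return HTTPStatus.ACCEPTED, {"Location": status_uri}


status, headers = creation_response(False, "/blog/20010101-myblogpost", "/queue/621252")
print(int(status), status.phrase, headers["Location"])
```

The point is only that the Location header means something different in the two cases: the article itself after a 201, the status resource after a 202.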

The location URI points to a (created) resource that will display the status of the asynchronous processing:

Request:

GET /queue/621252 HTTP/1.1

Response:

HTTP/1.1 200 OK
<queue>
    <status>Pending</status>
    <eta>10 minutes</eta>
    <link rel="cancel" method="delete" href="/queue/621252"/>
</queue>
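A server could build a status representation like this one with a few lines of Python. This is only a sketch: the function name `render_queue_status` is hypothetical, and the rule that the cancel link is only offered while the job is still "Pending" is an assumption about how such a server might behave.

```python
import xml.etree.ElementTree as ET


def render_queue_status(job_id: str, status: str, eta: str) -> str:
    """Render the <queue> status document for one queued job."""
    root = ET.Element("queue")
    ET.SubElement(root, "status").text = status
    ET.SubElement(root, "eta").text = eta
    # Assumption: cancelling is only offered while the job has not started.
    if status == "Pending":
        ET.SubElement(
            root, "link", rel="cancel", method="delete", href=f"/queue/{job_id}"
        )
    return ET.tostring(root, encoding="unicode")


print(render_queue_status("621252", "Pending", "10 minutes"))
```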

Remember the HATEOAS bit of REST? The “cancel” link tells us that we can cancel the processing by deleting the queue resource. Once processing has actually started, it might no longer be possible to stop it, so we get a different response:

Request:

GET /queue/621252 HTTP/1.1

Response:

HTTP/1.1 200 OK
<queue>
    <status>In progress</status>
    <eta>3 minutes, 25 seconds</eta>
</queue>

Our cancel-link has disappeared, implying cancellation is not possible anymore.
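On the client side, this means we should not hardcode "DELETE /queue/621252" but look for the link in the representation we just received. A small sketch of that check, assuming the XML layout from the examples above (the function name `find_cancel_link` is made up):

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def find_cancel_link(queue_xml: str) -> Optional[Tuple[str, str]]:
    """Return (HTTP method, href) of the cancel link, or None if there is none."""
    root = ET.fromstring(queue_xml)
    for link in root.findall("link"):
        href = link.get("href")
        if link.get("rel") == "cancel" and href:
            # Assumption: fall back to DELETE when no method attribute is given.
            return link.get("method", "delete").upper(), href
    return None


pending = (
    "<queue><status>Pending</status><eta>10 minutes</eta>"
    '<link rel="cancel" method="delete" href="/queue/621252"/></queue>'
)
running = "<queue><status>In progress</status><eta>3 minutes, 25 seconds</eta></queue>"

print(find_cancel_link(pending))
print(find_cancel_link(running))
```

If the function returns None, the client simply does not offer a cancel option; it never has to know the server's rules about when cancelling is allowed.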

As soon as processing is done, the server can create the original resource and delete the queue-resource. When a client then fetches the status again, the server will return a 303 code:

Request:

GET /queue/621252 HTTP/1.1

Response:

HTTP/1.1 303 See Other
Location: /blog/20010101-myblogpost
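The whole lifecycle of the queue resource can be sketched as a tiny in-memory state machine. Everything here is an assumption made for illustration: the class names, the "Done" state, answering 409 when a running job is cancelled, and keeping the finished job around so the server still knows where to redirect to.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Reply = Tuple[int, Dict[str, str]]  # (status code, headers)


@dataclass
class Job:
    status: str  # "Pending", "In progress" or "Done"
    result_uri: Optional[str] = None  # set once the real resource exists


class StatusQueue:
    """In-memory stand-in for the /queue/{id} resources."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Job] = {}

    def get(self, job_id: str) -> Reply:
        job = self.jobs.get(job_id)
        if job is None:
            return 404, {}
        if job.status == "Done" and job.result_uri:
            # Finished: send the client to the real resource.
            return 303, {"Location": job.result_uri}
        return 200, {}  # body would be the <queue> status document

    def delete(self, job_id: str) -> Reply:
        job = self.jobs.get(job_id)
        if job is None:
            return 404, {}
        if job.status != "Pending":
            # Assumption: 409 Conflict once processing has started.
            return 409, {}
        del self.jobs[job_id]
        return 204, {}


queue = StatusQueue()
queue.jobs["621252"] = Job("Pending")
print(queue.get("621252"))
queue.jobs["621252"] = Job("Done", "/blog/20010101-myblogpost")
print(queue.get("621252"))
```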

The 303 code implies that there is another resource that must be fetched instead of this resource, in our case, the actual blog post resource.
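For a client, the whole flow boils down to a polling loop: keep asking the status resource until it answers with a 303, then continue with the URI from the Location header. Below is a rough sketch; the `fetch` callable stands in for whatever HTTP client you use (with automatic redirect following switched off), and the function name, poll limit and delay are arbitrary assumptions.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status code, headers, body)


def wait_for_resource(
    fetch: Callable[[str], Response],
    status_uri: str,
    max_polls: int = 10,
    delay: float = 1.0,
) -> str:
    """Poll the status resource until it redirects to the finished resource."""
    for _ in range(max_polls):
        status, headers, _body = fetch(status_uri)
        if status == 303:
            # Processing is done: Location points at the actual resource.
            return headers["Location"]
        if status != 200:
            raise RuntimeError(f"unexpected status {status} from {status_uri}")
        time.sleep(delay)  # still pending or in progress, try again later
    raise TimeoutError(f"{status_uri} did not finish after {max_polls} polls")


# Usage with canned responses instead of a real server:
canned = iter([
    (200, {}, "<queue><status>Pending</status></queue>"),
    (200, {}, "<queue><status>In progress</status></queue>"),
    (303, {"Location": "/blog/20010101-myblogpost"}, ""),
])
print(wait_for_resource(lambda uri: next(canned), "/queue/621252", delay=0))
```

A friendlier client could use the eta from the status document to decide how long to wait between polls instead of a fixed delay.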

Conclusion

Of course, this is just one way of doing things, but probably the most flexible one. I’ve seen other methods, like creating the original resource with temporary data and filling it in later, as soon as the data has actually been processed. But with that method we would not know whether the data has been processed or whether somebody else has modified the (temporary) resource, and we could not even tell the user whether we are able to create the resource in the first place.