Retry! in the wild

In a perfect world every action finishes with success. In the real world it doesn't. In a perfect world a network request always returns a result. In the real world it sometimes fails, for dozens of reasons, and since we live in the real world I have to deal with it.

Don't give up

Don't give up right away; maybe if you try again you'll succeed. This sounds like a life-coaching lesson, but no. This is the real world.

Background

If you can't, but you really want to, then you can.

This golden rule applies to network requests too, especially requests to cloud services. Network services are designed so that it is normal and expected to ask again for the result, again and again. An application that communicates with a network service should be sensitive to the transient faults that may occur along the way, mobile applications especially. Temporary network issues, timeouts that arise when the remote service is busy, and request limits are just some examples of transient faults that the application should handle.

Some services I know (mostly cloud ones) report temporary unavailability and kindly ask the client to retry after a given amount of time. CloudKit is one such service. When I reach the limit of requests per second, subsequent requests fail with information about how long I should wait before retrying. These are transient errors that can, and should, be handled by the client application.

When to retry

A failable task may fail in different ways. The possible scenarios are:

  1. The response indicates that the error is not transient, or that the request is unlikely to succeed if repeated.
  2. The fault is highly unexpected, and I should retry to rule out freak circumstances.
  3. The response says I should retry because "something" happened and the request can't be served right away.
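
The three scenarios can be sketched as a small decision function. This is only an illustration, written in current Swift rather than the Swift 2 used in the listing further down. RetryDecision and retryDecision(statusCode:retryAfter:) are hypothetical names, and the mapping of HTTP status codes (429 and 503 honouring a Retry-After header, other 5xx retried right away, everything else treated as final) is my assumption, not a rule every service follows.

```swift
import Foundation

/// What the client should do after a failed attempt (one case per scenario).
enum RetryDecision: Equatable {
    case giveUp                    // 1. not transient, repeating won't help
    case retryNow                  // 2. unexpected fault, try again right away
    case retryAfter(TimeInterval)  // 3. the service asked us to wait
}

/// Hypothetical classifier, meant to be called only for failed attempts.
/// `statusCode` is nil when no HTTP response arrived at all (timeout, dropped
/// connection); `retryAfter` is the raw Retry-After header value, if present.
func retryDecision(statusCode: Int?, retryAfter: String?) -> RetryDecision {
    guard let status = statusCode else {
        return .retryNow
    }
    switch status {
    case 429, 503:
        // Only the "delay in seconds" form of Retry-After is handled here.
        if let header = retryAfter, let seconds = TimeInterval(header), seconds >= 0 {
            return .retryAfter(seconds)
        }
        return .retryNow
    case 500...599:
        return .retryNow
    default:
        return .giveUp
    }
}
```

For example, retryDecision(statusCode: 429, retryAfter: "2") yields .retryAfter(2), while a 404 yields .giveUp.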

Prerequisites

  • A retry by definition requires the very same input parameters for every subsequent try (handled here, for example, by conformance to NSCoding).
  • The operation should be "atomic" in the sense that a single task with multiple retries is still treated as a single task.
  • The result should report success or failure.
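
To make the first and last points concrete, here is a minimal sketch in current Swift. It swaps NSCoding for Encodable and JSONEncoder, which is my substitution for brevity. SnapshotTask, Draft and the example.com URL are hypothetical, and the transport is passed in as a closure so the sketch runs without a network.

```swift
import Foundation

/// Hypothetical task: the input is encoded once, at creation time,
/// so every retry sends exactly the same bytes.
struct SnapshotTask {
    let url: URL
    let body: Data  // frozen snapshot of the parameters

    init<Parameters: Encodable>(url: URL, parameters: Parameters) throws {
        self.url = url
        self.body = try JSONEncoder().encode(parameters)
    }

    /// One attempt; the outcome is either success or failure, nothing in between.
    /// `send` stands in for the real transport.
    func attempt(send: (URL, Data) -> Result<Data, Error>) -> Result<Data, Error> {
        return send(url, body)
    }
}

/// A reference type on purpose: it can change after the task was created.
final class Draft: Encodable {
    var title: String
    init(title: String) { self.title = title }
}

let draft = Draft(title: "first")
let snapshot = try! SnapshotTask(url: URL(string: "https://example.com/upload")!,
                                 parameters: draft)
draft.title = "changed"  // mutating the input after the task exists...

// ...does not affect what an attempt sends. The fake transport echoes the body back.
let echoed = snapshot.attempt { _, body in .success(body) }
```

Because the parameters are captured as Data in the initializer, the echoed body still carries the title "first", no matter how many times attempt is called.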

Implementation

With all this in mind I've created a small wrapper for tasks that can be retried. Below is an example implementation of a repetitive task, FailableTransientURLTask: a struct that is initialized with the input parameters and defines what exactly the task performs. Here I send an HTTP request and process the response by reporting success or failure:

struct FailableTransientURLTask: RepetitiveTaskProtocol  {  
    private var session: NSURLSession?
    private var url: NSURL
    private var archivedParameters: NSData

    /// Input parameters for the Task. This should be adjusted for the actual Task
    /// For this example required input is in parameters
    init(session: NSURLSession, url: NSURL, parameters: NSCoding) {
        self.session = session
        self.url = url
        self.archivedParameters = NSKeyedArchiver.archivedDataWithRootObject(parameters)
    }

    /// Run the request
    func run(completion: RepetitiveTaskProtocolCompletion) {
        guard let parameters = NSKeyedUnarchiver.unarchiveObjectWithData(self.archivedParameters), httpBody = try? NSJSONSerialization.dataWithJSONObject(parameters, options: []) else
        {
            fatalError("Missing parameters")
        }

        let request = NSMutableURLRequest(URL: self.url)
        request.HTTPBody = httpBody

        let sessionURLTask = session?.dataTaskWithRequest(request) { (data, response, error) in
            // success and error handling
            if let data = data {
                completion(RepetitiveTaskResult(success: data))
            } else if let error = error {
                // check whether the error is transient or final and report the right error
                if let delay = error.userInfo["ErrorRetryDelayKey"] as? NSNumber {
                    // request failed and can be retried later
                    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, Int64(delay.doubleValue * Double(NSEC_PER_SEC))), dispatch_get_main_queue(), {
                        completion(RepetitiveTaskResult(error: RepetitiveTaskError.RetryDelay(delay.doubleValue)))
                    })
                } else {
                    // request failed for other reason
                    completion(RepetitiveTaskResult(error: RepetitiveTaskError.Failed(error)))
                }
            } else {
                // no data received scenario
                completion(RepetitiveTaskResult(error: RepetitiveTaskError.NoData))
            }
        }

        sessionURLTask?.resume()
    }
}

let task = FailableTransientURLTask(session: NSURLSession.sharedSession(),
                                    url: NSURL(string: "http://blog.krzyzanowskim.com/rss")!,
                                    parameters: ["foo": "bar"])

Once the task is built, it can finally be run. Here I say that I will retry up to 3 times before definite failure:

task.run(retryCount: 3,  
         failure: { (error) -> Void in
             print("failure \(error)")
         },
         success: { (data) -> Void in
             print("success \(data)")
         })

It may look like boilerplate at first, but in fact it is not much more than typical request code with NSURLSession. What's added is an initializer for the task, and the fact that the request is wrapped in a struct (of course).

The run(..) function of the task is called on every "retry" and is responsible for doing the actual network request. The abstraction layer is responsible for handling the result and performing the actual retry with the specified parameters.

The code

The code above is not complete though; it is just the crucial part, the one you work with. What's not shown here are the protocol RepetitiveTaskProtocol, the enum RepetitiveTaskResult and the enum RepetitiveTaskError. RepetitiveTaskProtocol implements the run(..) method that does all the work: it handles the request results and either retries, or reports failure or success to the completion closure.
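
To give a rough idea of how such pieces can fit together, here is a minimal sketch in current Swift. It is not the repository code: RetryableTask, AttemptError, AttemptResult, Scheduler and FlakyTask are hypothetical stand-ins. I also assume that retryCount counts retries after the first attempt, that the driver (not the task) waits out the delay, and that only the "retry later" and "no data" outcomes are retried.

```swift
import Foundation

enum AttemptError: Error {
    case retryDelay(TimeInterval)  // transient: the service asked us to wait
    case failed(Error)             // final: repeating is not expected to help
    case noData
}

typealias AttemptResult = Result<Data, AttemptError>
typealias Scheduler = (TimeInterval, @escaping () -> Void) -> Void

protocol RetryableTask {
    /// Performs exactly one attempt and reports its outcome.
    func run(completion: @escaping (AttemptResult) -> Void)
}

extension RetryableTask {
    /// Runs the task and retries up to `retryCount` more times on transient errors.
    /// `schedule` decides how a delay is waited out; by default, on the main queue.
    func run(retryCount: Int,
             schedule: @escaping Scheduler = { delay, work in
                 DispatchQueue.main.asyncAfter(deadline: .now() + delay, execute: work)
             },
             failure: @escaping (AttemptError) -> Void,
             success: @escaping (Data) -> Void) {
        run(completion: { result in
            switch result {
            case .success(let data):
                success(data)
            case .failure(let error):
                let delay: TimeInterval
                switch error {
                case .retryDelay(let seconds): delay = seconds
                case .noData: delay = 0
                case .failed: failure(error); return
                }
                // Out of retries: the last error becomes the definite failure.
                guard retryCount > 0 else { failure(error); return }
                schedule(delay) {
                    self.run(retryCount: retryCount - 1, schedule: schedule,
                             failure: failure, success: success)
                }
            }
        })
    }
}

/// Demo task that asks for a retry twice and succeeds on the third attempt.
final class FlakyTask: RetryableTask {
    var attempts = 0
    func run(completion: @escaping (AttemptResult) -> Void) {
        attempts += 1
        if attempts < 3 {
            completion(.failure(.retryDelay(0.5)))
        } else {
            completion(.success(Data("ok".utf8)))
        }
    }
}

let flaky = FlakyTask()
var received: Data?
// An immediate scheduler skips the real waiting so the demo finishes synchronously.
flaky.run(retryCount: 3,
          schedule: { _, work in work() },
          failure: { error in print("failure \(error)") },
          success: { data in received = data })
```

Passing the scheduler in as a parameter keeps the waiting policy out of the task itself, which also makes the retry logic easy to exercise without real delays.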

You'll find all the code, with a demo Playground, in my GitHub repository: RepetitiveTask.

Conclusion

The Retry pattern is commonly used, and is expected to be in place for cloud services and for mobile devices on mobile internet. It turns out it is fairly easy to implement in a handy way for the greater good. It's a simple thing that should increase the comfort level when working with remote services.