Failing safely in iOS Development

As developers we work hard to prevent bugs, and we use every tool that comes to mind to improve the confidence in our code and projects. Sometimes the inevitable happens anyway and a system crashes despite all our efforts, and sometimes developers end up discussing whether software should crash rather than fail silently. That last part is what this article is about: the silent failures which, some feel, obscure the problem itself.

Why not to Fail-fast on iOS

In systems design, a fail-fast system is one which immediately reports at its interface any condition that is likely to indicate a failure. Fail-fast systems are usually designed to stop normal operation rather than attempt to continue a possibly flawed process. Such designs often check the system’s state at several points in an operation, so any failures can be detected early. The responsibility of a fail-fast module is detecting errors, then letting the next-highest level of the system handle them.

Source: Wikipedia

The above is the short description from Wikipedia of the Fail-fast principle in system design. So is this applicable to iOS development? Well, it isn’t… and it is at the same time.

The purpose of Fail-fast within iOS development is to find issues during the development and test phases of an app; it is about gaining valuable feedback and resolving issues early, which improves the quality of the app. The “next-highest level” in an iOS app, however, is the Springboard. There is no system the app can hand a failure over to so that it can be handled gracefully. On top of that, the principle is meant for system design. Many of the articles that discuss the Fail-fast principle (Fail-fast principle in software development, Fail-Fast as examples) are written with “the app” as part of a self-controlled system, with a wide variety of logging available and the ability to ship quick fixes. With iOS development this is rarely the case: in the worst case it will take Apple a few days to review your new version of the app, or you will need to request an expedited review if the “crashing” issue is urgent.

Reading suggestion: “For the second time this year, the Facebook SDK is causing apps like TikTok and Spotify to crash on launch”

On iOS we should always Fail-fast during development and during test; by test I mean not only unit and UI tests, but also regression and smoke tests. For instance, at a previous client we had the hard rule that the unit test suite may never make actual HTTP calls… never-ever. As a result, the network manager had a failsafe: if it was called while tests were running, it would fail fast by crashing the app with a description stating that the app tried to make an external network call.
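A failsafe like that could look roughly like the sketch below. It is not the client's actual code: `NetworkGuard` and `verifyNetworkAccessAllowed(for:)` are names made up for illustration, and the check relies on the assumption that Xcode sets the `XCTestConfigurationFilePath` environment variable for XCTest runs.

```swift
import Foundation

/// Hypothetical guard that refuses real network traffic while tests run.
struct NetworkGuard {
    /// Environment to inspect; injectable so the check itself can be tested.
    var environment: [String: String] = ProcessInfo.processInfo.environment

    /// Assumption: Xcode sets XCTestConfigurationFilePath for XCTest runs.
    var isRunningTests: Bool {
        environment["XCTestConfigurationFilePath"] != nil
    }

    /// Call this right before a request is sent.
    func verifyNetworkAccessAllowed(for url: URL) {
        if isRunningTests {
            // Fail fast: a test reached the real network, so stop immediately.
            fatalError("Tests must not make external network calls: \(url)")
        }
    }
}
```

The network manager would call `NetworkGuard().verifyNetworkAccessAllowed(for: url)` before every request. Outside of a test run the call does nothing, so production traffic is unaffected.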

Let’s dive into some code

Example one

Consider the following code:

let adBanners = try! JSONDecoder().decode([AdBanner].self, from: bannerData)

If bannerData cannot be decoded to an array of AdBanner, the code above will crash the app. This is another reason not to have the app crash on iOS: since mobile apps are usually heavily dependent on backend systems exposed through REST services, any minor error the backend produces could crash the user’s app. In this case I would say that the error should be logged to an external service such as Firebase Crashlytics as a non-fatal, and the user should be presented with an informational dialogue, preferably even with the possibility to request support (e.g. from customer service).
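A minimal sketch of the non-crashing variant follows. The fields of `AdBanner` and the `report` closure are assumptions made for illustration; the closure stands in for whatever non-fatal logging your reporting service offers.

```swift
import Foundation

// Hypothetical model; the article does not show AdBanner's fields.
struct AdBanner: Decodable {
    let id: Int
    let imageURL: String
}

/// Decodes banners without crashing. On failure the error is handed to
/// `report` (a stand-in for a non-fatal logger) and an empty list is returned.
func decodeAdBanners(from bannerData: Data,
                     report: (Error) -> Void) -> [AdBanner] {
    do {
        return try JSONDecoder().decode([AdBanner].self, from: bannerData)
    } catch {
        report(error)
        return []
    }
}

// A payload the backend should never send, but one day might.
let malformed = Data(#"{"unexpected": true}"#.utf8)
let banners = decodeAdBanners(from: malformed) { error in
    // Forward to your reporting service as a non-fatal here.
    print("Non-fatal: \(error)")
}
// `banners` is empty, so the caller can show an informational dialogue.
```

In a Crashlytics setup the closure would most likely forward to `Crashlytics.crashlytics().record(error:)`; verify that against the SDK version you use.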

One could argue “But silent failures won’t inform us about issues!”, right? Well, if this is the case for your app, then I would like to counter with the question: Why not? Why do the silent failures not inform you? Crashlytics, mentioned above, lets you customize crash reports. Using this support you can create reports that log specific types of errors, and make sure you check those regularly… just as you would regularly check for crashes.
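One way to make silent failures easy to find in such reports is to give them a stable domain and code. The sketch below uses Foundation’s `CustomNSError` for that; the `SilentFailure` type, its cases and the domain string are hypothetical. As far as I know, Crashlytics groups logged non-fatals by error domain and code, but treat that as an assumption to verify for your own setup.

```swift
import Foundation

/// Hypothetical error type for silent failures we still want to hear about.
enum SilentFailure: CustomNSError {
    case decoding(endpoint: String)
    case missingCell(identifier: String)

    // One stable domain so all silent failures can be filtered together.
    static var errorDomain: String { "com.example.app.silent-failure" }

    // One code per kind of failure, so each kind can be tracked separately.
    var errorCode: Int {
        switch self {
        case .decoding: return 1
        case .missingCell: return 2
        }
    }

    // Extra context that travels with the report.
    var errorUserInfo: [String: Any] {
        switch self {
        case .decoding(let endpoint): return ["endpoint": endpoint]
        case .missingCell(let identifier): return ["identifier": identifier]
        }
    }
}
```

Reporting `SilentFailure.decoding(endpoint: "/banners")` instead of a raw decoding error keeps all occurrences of the same kind of failure under one recognizable heading.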

Example two

This is a well-known example within iOS development: subclassing UITableViewCell and force casting the dequeued cell to that specific subclass. It is widely accepted that the force cast in the following code is OK:

func tableView(_ tableView: UITableView, cellForRowAt indexPath: IndexPath) -> UITableViewCell {
    let postCell = tableView.dequeueReusableCell(withIdentifier: "postCell") as! PostCell
    postCell.postTitle.text = "Hello, world!"

    return postCell
}

If for some reason something goes wrong during development and the cell with identifier “postCell” does not exist, testing fails to catch it, and the build is pushed out to production, the app will crash and return the user to the Springboard. “Well, that should and would never happen! In this case, the app should crash!” I can hear some say, and to that I would say… why should we put that burden onto the user?

That being said, this article is not about whether to allow force unwrapping and force casting or not… that is a different discussion. The example above is here to make a point with regard to this article’s bottom line.

With a few lines of code we can still Fail-fast and yet be safe in production:

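import UIKit

extension UITableView {
    /// Dequeues a cell and casts it to the expected subclass, or returns nil.
    func dequeueReusableCell<T: UITableViewCell>(withIdentifier identifier: String) -> T? {
        guard let cell = dequeueReusableCell(withIdentifier: identifier) as? T else {
            // Stops debug builds; optimized release builds skip this and return nil.
            assertionFailure("The requested cell type could not be found")
            return nil
        }

        return cell
    }
}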

extension ViewController: UITableViewDataSource {
    func tableView(_ tableView: UITableView, cellForRowAt indexPath: IndexPath) -> UITableViewCell {
        guard let postCell: PostCell = tableView.dequeueReusableCell(withIdentifier: "postCell") else { return UITableViewCell() }
        postCell.postTitle.text = "Hello, world!"

        return postCell
    }
}

So what happens in the above? Well, the code in the view controller is virtually the same, except that the force cast is gone and an empty cell is returned instead. However, the assertion now crashes the app in DEBUG builds and during testing, which complies with Fail-fast, while our end users are protected because assertions are not evaluated in optimized release builds. Yes, the UI will look funky for the user… we could even return a special cell, but the app doesn’t crash. Let’s assume this was a cell for secondary functionality in a view, and that the same view holds functionality that is important to the user. A crash would have prevented the user from using the very content they came to your app for.

How to prevent having to Fail-fast in production

In one word: Tests. I can’t stress enough how important this is. Preferably you’ll apply TDD (Test Driven Development), where you write your tests beforehand and make them succeed one by one, or in the worst case you’ll apply PTDD (Parallel Test Driven Development)… [is that a thing? 🤔 let’s assume it is], which means that you write your tests and your code simultaneously. These practices increase the confidence in your code by increasing the amount of tested code (Code Coverage), and thus decrease the risk of something going wrong. However, this also entails writing tests for endpoints using mock data that should make your app fail.
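The mock data that should fail can be kept as a small table of broken payloads that a test walks through. The sketch below assumes a hypothetical `loadBanners(from:)` function and a reduced `AdBanner` model, and it uses only Foundation so it can run anywhere.

```swift
import Foundation

// Hypothetical, reduced model for the sake of the example.
struct AdBanner: Decodable {
    let id: Int
}

/// Hypothetical loader under test: never throws, falls back to an empty list.
func loadBanners(from payload: Data) -> [AdBanner] {
    (try? JSONDecoder().decode([AdBanner].self, from: payload)) ?? []
}

/// Mock responses a misbehaving backend could plausibly send.
let brokenPayloads: [String: Data] = [
    "empty body": Data(),
    "wrong root type": Data(#"{"id": 1}"#.utf8),
    "wrong field type": Data(#"[{"id": "one"}]"#.utf8),
    "truncated JSON": Data(#"[{"id": 1"#.utf8),
]

/// Returns the names of the payloads that were not handled gracefully.
func unhandledPayloads() -> [String] {
    brokenPayloads
        .filter { !loadBanners(from: $0.value).isEmpty }
        .map { $0.key }
        .sorted()
}
```

In an XCTest target the check would be a single `XCTAssertEqual(unhandledPayloads(), [])`. Every new way the backend manages to surprise you becomes one more entry in the table.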

On top of tests, you can prevent it with… more tests. If you work in a team, make sure you emphasize to your manager how important a dedicated QA resource is. If there really is no budget for it, which oddly enough often seems to be the case, make sure you set aside a day or more for the team to regression and smoke test the app together. Collaborate with your Android counterparts and have them regression and smoke test the iOS app while you do the same for Android. If you do this and also keep track of the cost of doing so, you might have a stronger argument later on for getting a QA resource.

In the best (or worst?) of worlds, have other members of staff test out your app… make a little bug contest to see who can find the most bugs. Either way, make sure your app gets tested.

Conclusion

If there is anything I’d like you to take with you from this article, it is that the end user always comes first. In order to increase quality you should not have to decrease it first. If you find that silent failures do not give you enough information, go back and rework them so that they do bring you the data you need.

Reading suggestion: “Enough with ‘Fail fast’ already”.

In the first code example above we’re talking about an “ad banner”… which might be an important source of revenue for the app. However, users who get annoyed by your app crashing, or users who remove the app altogether, won’t return… so you might be very happy that your users indicated that ad banners could not be decoded, but now you have fewer users to show your fix to and, even worse… your App Store reviews might have paid the price for it.

A lot of companies have it in their core values to put the customer first and in focus: they deliver their product for their end user, and the end user should be kept in mind every step of the way. So why should we stop with this mindset in our code? Put yourself in the end user’s shoes. Imagine you were the person investing your time and effort in downloading the app from the App Store, waiting for the download to finish, opening the app… possibly signing up or completing some task, only to come to a screen which requires some piece of data which it doesn’t get, and crashes. Wouldn’t you get annoyed that yet another app crashed for what might seem to you like “no reason at all”? The end user doesn’t care that the app crashed so that you as a developer could easily get the crash report, and they don’t care that the fault comes from the backend… they care about the experience… which the app just failed to deliver.