In the last post I tried to establish the problem: exception handling needs more than try/catch. It needs a proper exception management strategy, which should be part of the application architecture. And I paid special attention to the paradoxical demand that exception management should work when everything else fails.
Now, I cannot provide THE strategy, for there is no such thing. But I can present one feasible approach that has served me well quite a few times. I’ll simplify quite a bit, but it is nonetheless a complete approach that may serve as a starting point for more complex demands.
OK, let’s play…
To define the playing field: the following is about classic n-tier ASP.NET web applications, and I have decided to represent user-induced issues, like business validation errors, as exceptions.
Some Groundwork
Exceptions are used for different error conditions, but also for non-erroneous situations, which leads to a surprising variety:
- Exceptions being thrown as a result of infrastructure issues, e.g. IOException or SecurityException. They may be caused by invalid configurations, by unavailable infrastructure resources, or for other reasons.
- Exceptions representing errors reported by external services, like database constraint violations or SOAP exceptions.
- Exceptions being thrown because the caller did not meet the expectations of the called code, i.e. a contract violation (usually an ArgumentException).
- Exceptions being thrown because of a developer’s lapse, e.g. a NullReferenceException.
- Exceptions being thrown to sidestep the current flow of control, e.g. HttpResponse.Redirect() throws a ThreadAbortException to prevent an ASP.NET page from further unnecessary processing.
- Business validation errors. The business logic decides that – despite proper input validation – something is not quite as it should be. Not exactly an error in a technical sense, but the processing simply cannot go on.
- User concurrency issues, e.g. a user wants to update data that someone else has changed in the meantime, and “last one wins” may not be good enough.
Please note that while Basic Rule Number 1 for exception handling (no abuse of exceptions for non-exceptional conditions) is as valid as ever, there are certainly exceptions (no pun intended) to that rule. For example, aborting the request upon redirection certainly makes sense; (ab)using an exception to do so may violate said rule, yet it is far more efficient and convenient than any other alternative. The same may be true for business validation errors. Not exceptional, yet exceptions may be the most convenient way to accommodate them.
This is a (perhaps surprising) variety and may even be incomplete – actually that variety is a major contributor to the complexity of exception handling. And yet, it is only half of the equation. Exceptions have cause and effect, and I found that tackling exception handling from the opposite angle, the effect that is, has certain advantages.
But before we come to that, there’s a question that has to be answered…
What is the purpose, the intention of throwing exceptions? (Seriously!)
This is a question one should ask at least once, even though it may seem a bit superfluous, because the answer does not include some of the usual suspects. Preventing data inconsistencies? Let’s simplify a bit and state that this is the job of transactions. Shutting down gracefully? An error happened; why should I care how the application goes down?
Actually it’s far simpler: an error happened! Something the developer didn’t anticipate or couldn’t cope with. The primary purpose of exceptions is to communicate that situation. Exception handling, in turn, is about notifying everyone involved in the appropriate way and telling them what they are supposed to do now, either to compensate for the issue or to prevent it from happening again. In other words: it’s about the responsibilities that emerge from this error.
And this is what my strategy revolves around: Responsibilities.
An Exception Handling Strategy
Back to cause and effect, and let’s start with the latter. More to the point: the desired effect, driven by what is actually asked for in case of an error (rather than what could be done). This leads to a strategy that is simpler, easier to understand, and easier to implement.
Since I mentioned ‘responsibilities’, the first question is fairly obvious:
Question #1: Who should be responsible for what?
This is no esoteric question aiming at pieces of architecture or code. Rather it’s asking which person has to shoulder the work (or the blame, if you like):
- The Administrator: He has to solve problems caused by infrastructure issues (such as invalid configuration entries, unavailable databases, and so on).
- The User himself: Let him deal with errors caused by invalid input, other users working simultaneously with the same data, or in some other way inherent to the intended usage of the product. RTFM!
- The developer (YOU!): Anything else falls back on the developer’s desk.
Obviously the last point is the inconvenient part: Every exception defaults to that category, unless the developer explicitly “tells” someone else that he is responsible. Let’s stress that: An error caused by an invalid configuration entry is the developer’s problem – unless he explicitly tells the administrator that he should take care of this.
So, now we are back where we started right at the beginning, at the developer’s desk. The difference is that the developer now has some specific goal: Lay the blame on someone else’s doorstep!
This is actually a very important point. Making someone else responsible is in the developer’s own interest: it helps him avoid unnecessary and annoying future work. Quite motivating, if you ask me.
This makes the next question obvious:
Question #2: How does the developer tell the user or the administrator that he is to take responsibility?
The answer is simple: Provide the necessary feedback, e.g. a message box or an event log entry; in more detail:
- In case of a user error, you tell him with a message box, a popup, within some message area, or on a special “you did wrong!” page. You don’t tell the admin, because he doesn’t care about typos. Neither do you as the developer.
- In case of an infrastructure issue, you show a standard message to the user, telling him to call the admin and come back later. You don’t want to include details, because they might contain security-relevant information, such as the always-cited database connection string. Additionally, you have to give the admin all the information he needs, usually within the event log.
In both cases it’s important that you provide not only the information that an error has occurred (that much is obvious, isn’t it?), but also information on how to solve it. Otherwise it will eventually be up to you, the developer, to solve the issue again. That’s why it makes sense not to stop at “couldn’t save data” or “data source not available”. Rather, invest some work, in your own self-interest, in providing “Customer data could not be saved because it is locked by user ‘XYZ’” or “Database not available, connection string:=…”.
- Remember the last case: for technical issues the developer is responsible for, you still have to tell the user and the admin something. The user should again get some standard error page (not the default ASP.NET error page!), and the admin should get an event log entry that tells him to notify the developer. This entry should include as much information as possible for a post mortem diagnosis.
So, essentially this section revolved around presenting an exception at the UI level, where the information leaves the area of your application code. That is quite simple to implement, because you can rely on centralized features: ASP.NET provides a global error handler that can be used to write standard event log entries. Presenting user errors on a page should be boilerplate as well, the only distinction being whether the current page and data are still valid (just show a message box) or not (redirect to some business error page). Nice, easy, and no rocket science.
Of course, to do this we need a means of distinguishing those three cases. All you need for this are the respective exception classes, two of them actually: a BlameTheUserException and a BlameTheAdminException. You don’t need a BlameTheDeveloperException, because he is already to blame for every other exception; no need to stress that ;-).
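A minimal C# sketch of the two classes and of the central dispatch described above might look like this. The enum Responsibility, the class ErrorDispatcher, and the writeEventLog delegate are hypothetical names for illustration only. In a classic ASP.NET application you would presumably call something like Handle from Application_Error in Global.asax, passing Server.GetLastError() and a delegate that writes to the event log. The dispatcher walks the InnerException chain because the global handler may receive a wrapper (such as an HttpUnhandledException) instead of the original exception.

```csharp
using System;

// The two exception classes from the text; anything else is the developer's.
public class BlameTheUserException : Exception
{
    public BlameTheUserException(string message) : base(message) { }
    public BlameTheUserException(string message, Exception inner) : base(message, inner) { }
}

public class BlameTheAdminException : Exception
{
    public BlameTheAdminException(string message) : base(message) { }
    public BlameTheAdminException(string message, Exception inner) : base(message, inner) { }
}

// Hypothetical: who has to act on an error.
public enum Responsibility { User, Admin, Developer }

public static class ErrorDispatcher
{
    // Finds the first promoted exception in the InnerException chain, if any.
    static Exception FindPromoted(Exception ex)
    {
        for (var e = ex; e != null; e = e.InnerException)
            if (e is BlameTheUserException || e is BlameTheAdminException)
                return e;
        return null;
    }

    public static Responsibility Classify(Exception ex)
    {
        var promoted = FindPromoted(ex);
        if (promoted is BlameTheUserException) return Responsibility.User;
        if (promoted is BlameTheAdminException) return Responsibility.Admin;
        return Responsibility.Developer; // the default for everything not promoted
    }

    // Returns the text to show to the user; writes an event log entry
    // whenever someone other than the user has to act.
    public static string Handle(Exception ex, Action<string> writeEventLog)
    {
        switch (Classify(ex))
        {
            case Responsibility.User:
                // User errors: show the message, don't bother the admin.
                return FindPromoted(ex).Message;
            case Responsibility.Admin:
                // Infrastructure: details go to the admin only.
                writeEventLog("Administrator action required: " + ex);
                return "The application is currently unavailable. Please contact your administrator and try again later.";
            default:
                // Bugs: the admin is asked to pass everything on to the developer.
                writeEventLog("Unexpected error, please notify the developers: " + ex);
                return "An unexpected error occurred. Please contact your administrator.";
        }
    }
}
```

Only the user path shows the exception’s own message; the other two paths show a fixed text, so connection strings and stack traces end up in the event log rather than in the browser.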
Question #3: How to put exceptions to work…?
If the developer’s goal is to blame someone else, this goal can guide him in answering the more detailed questions: Where in his code should he catch exceptions (and where not)? When should he throw exceptions (and when not)? What about logging? And so on. Everything else follows as a consequence…
First, about throwing exceptions. Yes, not only the system throws exceptions; you may do the same, and there is no need to be shy about it. Sprinkle your methods with guard calls throwing argument exceptions; they signal coding errors on the caller’s side. If the code allows certain conditions that are logically impossible, guard yourself against that impossibility. Throwing exceptions in exceptional cases is no problem at all. Actually, reacting this harshly is by far preferable to obscuring erroneous conditions out of misplaced modesty. Fail early, fail often!
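To illustrate guard calls, here is a sketch with a hypothetical Order type and Ship method (not from the original post). Argument violations throw the standard ArgumentNullException or ArgumentException, and a state that should never reach this method throws an InvalidOperationException instead of being silently ignored.

```csharp
using System;

// Hypothetical domain types for the example.
public enum OrderState { New, Paid, Shipped }

public class Order
{
    public OrderState State { get; set; }
    public string TrackingNumber { get; set; }
}

public static class Shipping
{
    public static void Ship(Order order, string trackingNumber)
    {
        // Guard calls: a violation is a coding error on the caller's side.
        if (order == null)
            throw new ArgumentNullException(nameof(order));
        if (string.IsNullOrWhiteSpace(trackingNumber))
            throw new ArgumentException("A tracking number is required.", nameof(trackingNumber));

        switch (order.State)
        {
            case OrderState.Paid:
                order.TrackingNumber = trackingNumber;
                order.State = OrderState.Shipped;
                break;
            case OrderState.New:
            case OrderState.Shipped:
                // Assumption: the UI never offers "ship" here, so fail early.
                throw new InvalidOperationException($"Cannot ship an order in state {order.State}.");
            default:
                // "Impossible" today, but an enum value added later would end up here.
                throw new InvalidOperationException($"Unexpected order state {order.State}.");
        }
    }
}
```

None of these exceptions is promoted, so by default they land on the developer’s desk, which is exactly where a contract violation belongs.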
Handling exceptions, as in solving the problem once and for all, is of course an option. If it is possible at all, which in my experience is rarely the case. (It had to be mentioned, though.)
Enriching exception information is a more frequent task. Whenever it helps the post mortem diagnosis, you should add context information to the exception: the actual SQL statement and the connection string for a database error, the user name and the accessed resource for a security exception. This is valuable, if not necessary, for diagnosing the root cause of the problem. You don’t even need to wrap the exception to do that, since the Exception class provides the Data dictionary for this very purpose.
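A sketch of enrichment, assuming a hypothetical Db.Execute wrapper around data access code: it adds the SQL statement and the connection string to Exception.Data and rethrows with `throw;`, so the exception type and stack trace stay intact. Since Exception.ToString() does not include the Data entries, the global handler has to write them out itself; ErrorText.Describe (also hypothetical) shows one way to do that.

```csharp
using System;
using System.Collections;

public static class Db
{
    // Hypothetical wrapper around data access code: on failure it attaches
    // context for post mortem diagnosis and lets the same exception continue.
    public static T Execute<T>(string connectionString, string sql, Func<T> action)
    {
        try
        {
            return action();
        }
        catch (Exception ex)
        {
            ex.Data["Sql"] = sql;
            ex.Data["ConnectionString"] = connectionString;
            throw; // not "throw ex;", which would reset the stack trace
        }
    }
}

public static class ErrorText
{
    // Exception.ToString() ignores Data, so append this exception's entries
    // explicitly, e.g. before writing the event log entry in the global handler.
    public static string Describe(Exception ex)
    {
        var text = ex.ToString();
        foreach (DictionaryEntry entry in ex.Data)
            text += Environment.NewLine + entry.Key + " = " + entry.Value;
        return text;
    }
}
```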
Promoting exceptions: Once your code is able to decide that a certain error is an administration or user problem, you should wrap it in the respective class, thus “promoting” it.
This implies that not every location is suited to make that decision. The database layer should probably never decide whether a particular error is the consequence of some erroneous user action; this usually depends on the business context in which it is called, and should be decided there. Infrastructure issues, on the other hand, surface very low in the call chain and could not even be recognized further up.
The real work lies in actually putting the necessary handlers in place. However, they tend to cluster in the classes that deal with the outside world: the database, configuration, some service, or whatever. They also tend to be very much boilerplate.
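A sketch of promotion at the business layer. RecordLockedException stands in for whatever your data access layer throws on a locking conflict, and CustomerService.Rename is a hypothetical business method; both are assumptions for illustration. The data layer merely reports the conflict; the business method knows the context and promotes it to a BlameTheUserException whose message tells the user how to proceed. The exception class is repeated here so the block compiles on its own.

```csharp
using System;

public class BlameTheUserException : Exception
{
    public BlameTheUserException(string message, Exception inner) : base(message, inner) { }
}

// Hypothetical: what the data access layer throws when a record is locked.
// The data layer just reports the fact; it cannot know whose problem it is.
public class RecordLockedException : Exception
{
    public string LockedBy { get; }

    public RecordLockedException(string lockedBy)
        : base("Record is locked by " + lockedBy + ".")
    {
        LockedBy = lockedBy;
    }
}

// Hypothetical business class: here the context is known, so this is
// where the conflict is promoted to a user problem.
public class CustomerService
{
    private readonly Action<string> saveName; // stands in for the data access layer

    public CustomerService(Action<string> saveName) { this.saveName = saveName; }

    public void Rename(string newName)
    {
        try
        {
            saveName(newName);
        }
        catch (RecordLockedException ex)
        {
            // Say how to resolve it, not merely that saving failed;
            // keep the original exception for diagnosis.
            throw new BlameTheUserException(
                $"Customer data could not be saved because it is locked by user '{ex.LockedBy}'. Please try again later.", ex);
        }
    }
}
```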
That’s it. … Wait! … That’s not possible!
Question #4: Can that really be all?
Actually this is it. No “every method should…”. No “layer specific exceptions…”. No “log it just in case…”. Actually whenever I tell people how to implement an exception handling strategy in an existing application, the majority of the work is removing exception handlers. Sometimes life can be so easy.
In case it escaped you: if the developer doesn’t have to do anything to get proper exception management, he can’t do anything wrong. We don’t have to rely on him to properly handle his own bugs. We just solved a paradox!
Going further, I even consider doing too much exception handling an issue in itself!
For example: “Shouldn’t we at least log the error, just in case?” No, you shouldn’t. The global error handler is responsible for that, so there is no need to log it here. Worse, if the calling code decides to handle the situation and carry on, there is by definition no application issue at all. And yet the event log would show one, waking the admins for no reason, or flooding the event log and obscuring the one important entry.
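The following sketch makes the “log it just in case” problem concrete. A hypothetical price lookup tries an optional cache and falls back to the database on a TimeoutException. With the just-in-case logging in place, the recovered timeout still leaves an entry behind; without it, nothing is logged. The EventLog list merely stands in for the real event log, and all names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class PriceLookup
{
    // Stand-in for the real event log, so the side effect is observable.
    public static readonly List<string> EventLog = new List<string>();

    // Anti-pattern: "log it, just in case", then rethrow.
    static string FromCacheLoggingJustInCase(Func<string> cache)
    {
        try
        {
            return cache();
        }
        catch (Exception ex)
        {
            EventLog.Add("ERROR: " + ex.Message);
            throw;
        }
    }

    // Preferred: no handler; whatever stays unhandled reaches the global handler.
    static string FromCache(Func<string> cache) => cache();

    // Hypothetical caller: the cache is optional, so a timeout is handled
    // by falling back to the database. By definition there is no issue left.
    public static string GetPrice(bool logJustInCase, Func<string> cache, Func<string> database)
    {
        try
        {
            return logJustInCase ? FromCacheLoggingJustInCase(cache) : FromCache(cache);
        }
        catch (TimeoutException)
        {
            return database();
        }
    }
}
```

Both variants return the same price; only the first one leaves an error entry for a situation that was fully handled.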
So what? That’s a strategy?
Yes, it is. Check the demands and the initial questions if you like; they’re all accounted for. Take some file that doesn’t exist. Either you did nothing to anticipate that situation; then an IOException will ensue, neither user nor admin will take care of it, and it’s back on your desk. Or you did handle it and threw a BlameTheAdminException.
Of course I simplified quite a bit, but only in terms of features, not at the expense of the concept. For example, you may need different reactions for the user, some errors calling for a message box, others calling for a redirect. Similarly, some unavailable service may allow retrying rather than requiring the whole application to abort. Or you may need additional exception classes to handle certain errors in code. Anyway, none of this affects the strategy as such.
Once you’ve established such a strategy, you make every decision at the code level more easily and with more confidence. What’s more, in my experience it streamlines error handling, makes it more robust, and frequently simplifies the code by removing unnecessary exception handling. Because quite often, when someone asks for exception handling, the answer is “already taken care of, nothing to do”.
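The file example as a sketch: PriceListFile.Load and the ‘PriceListPath’ setting mentioned in its message are hypothetical. The method sits at the boundary to the file system, promotes the errors an administrator can fix, and keeps the original exception as InnerException for diagnosis; anything else still propagates unchanged to the developer’s desk. The exception class is repeated so the block compiles on its own.

```csharp
using System;
using System.IO;

public class BlameTheAdminException : Exception
{
    public BlameTheAdminException(string message, Exception inner) : base(message, inner) { }
}

// Hypothetical class at the boundary to the outside world (the file system).
public static class PriceListFile
{
    public static string[] Load(string path)
    {
        try
        {
            return File.ReadAllLines(path);
        }
        catch (Exception ex) when (ex is FileNotFoundException
                                || ex is DirectoryNotFoundException
                                || ex is UnauthorizedAccessException)
        {
            // Promote: the admin can fix this, so say what is wrong and what to do.
            // 'PriceListPath' is a hypothetical configuration key.
            throw new BlameTheAdminException(
                $"The price list file '{path}' could not be read ({ex.Message}). " +
                "Check the 'PriceListPath' setting and make sure the file exists " +
                "and is readable by the application's account.", ex);
        }
    }
}
```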
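For the retry variant, a minimal sketch; Retry.Run is a hypothetical helper, and treating TimeoutException as the transient error is an assumption. The exception filter lets the last failure propagate unchanged, so it is either promoted by the caller or ends up at the global error handler like any other exception.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Hypothetical helper: retries an operation that failed with a transient
    // error (assumed here to be TimeoutException) before giving up.
    public static T Run<T>(Func<T> operation, int maxAttempts, TimeSpan delay)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                // Not the last attempt: wait, then try again.
                Thread.Sleep(delay);
            }
        }
    }
}
```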
Getting back to baseball, every player on the field now knows exactly what he is supposed to do. Some do all the work, most do nothing most of the time…
PS: The ending anecdote: http://leedumond.com/blog/the-greatest-exception-handling-wtf-of-all-time/
That’s all for now folks,
AJ.NET