Hosting WF is easy. Real easy. You can host it anywhere you want: console applications, ASP.NET applications, Windows services, shopping bags, closets, and so on. Well. It turns out that most available samples are too simplistic for my taste. Hosting for the sake of it may be easy; hosting in real-world scenarios has its pitfalls.
Hosting in IIS
Given that our example application is a web application, the first attempt at hosting WF might be somewhere within the web application itself. Actually this is quite simple and covers all but one demand. Due to the IIS architecture it is even a very robust solution, one that can easily be adorned with an IIS-hosted WCF service interface, again backed by IIS, namely by its security features. Just great. Apart from the “but one demand”: IIS-hosted code relies on a request! Not a “current” request, just any request that keeps the appdomain running. Should the appdomain shut down and go to sleep for the rest of the night, there is no running WF engine. What if a workflow is supposed to time out sometime during the night? What if it is supposed to regularly poll a directory or database table? None of this happens until the next request comes in, wakes the appdomain, and eventually starts the WF engine. If you can live with the time lag, go for IIS. If not, IIS is out of the question.
And of course our example application cannot live with that time lag. It is quite normal for no vacation request to come in for days, sometimes weeks. No timeouts and reminder mails? No way!
Hosting in a Windows Service
I’ve seen various developers at this point thinking of alternatives: regularly triggering the web application to revive the appdomain, hosting WF in a console application, a WinForms application, and so on. Eventually I realized that quite a few developers just try to avoid the obvious solution: Windows Services (as in NT Service).
Why avoid them? Because Windows Services are weird. And demanding. They have to be installed and started via the service control manager (SCM). You have to deal with service accounts and permissions. Windows Services are supposed to be multithreaded. They have no decent UI other than some logs. They have to be robust, and if you think your current application is robust, a Windows Service has to be robust^3. For example:
- A Windows Service may start at boot time and shut down 3 years later.
- A Windows Service may have to talk to a database (or other resource) that occasionally goes offline for maintenance reasons.
- A Windows Service may become what I call a zombie if the worker thread terminates but the service keeps running.
Cheer up, with .NET everything gets better. The .NET Framework devotes a whole namespace to Windows Services: System.ServiceProcess. The most important class is ServiceBase, which acts as the base class for your own Windows Service implementation; actually it’s a thin layer over the Service Control Handler Function. There are also classes for installing services and for controlling other services. So there already is some very welcome support available.
Additionally there are quite a few examples available on how to use these classes to host WF, usually in conjunction with WCF (if WF is not hosted in your web application you need some means to talk to it; we’ll get to that, too). The most simplistic implementation would start and stop the WF engine in the respective SCM commands (i.e. ServiceBase.OnStart and ServiceBase.OnStop). It really doesn’t take more than that to implement a valid Windows Service hosting WF. Well, for the sake of it that may be true. But real-world demands? Are these examples actually production ready? Do they fulfill the demands regarding operations, robustness, etc.? Not in my opinion.
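To make the weakness of that simplistic approach concrete, here is a minimal, language-neutral sketch (written in Python for brevity, not actual .NET code). `Engine`, `NaiveService`, `on_start`, `on_stop` and `status` are hypothetical names: `Engine` stands in for the WF engine, and `on_start`/`on_stop` play the role of ServiceBase.OnStart and ServiceBase.OnStop.

```python
class Engine:
    """Hypothetical stand-in for the workflow engine (not a real WF API)."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def is_healthy(self):
        return self.running


class NaiveService:
    """Starts and stops the engine in the SCM commands and does nothing else."""

    def __init__(self, engine):
        self.engine = engine
        self.status = "stopped"  # what the SCM would report to the operator

    def on_start(self):
        self.engine.start()
        self.status = "running"

    def on_stop(self):
        self.engine.stop()
        self.status = "stopped"
```

Nothing in this sketch ever looks at the engine again after `on_start`. If the engine dies, `status` still says "running", which is exactly the zombie scenario described earlier.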
Better Hosting in a Windows Service
ServiceBase has no special support for worker threads, and no support for any runtime diagnosis (such as a ping or heartbeat). ServiceBase is actually prone to becoming a zombie, because an exception during any SCM command (such as pause) will be written to the event log but it will _not_ stop the service. Yet this cannot be called “robustness” (even if one wanted to define it that way), because an exception in another thread will take down the whole process, including all services contained in the same EXE and registered with the SCM.
Actually these are all fairly generic issues for any Windows Service and not dependent on the work the service actually does. Therefore it’s quite easy to come up with a reusable Windows Service framework on top of ServiceBase.
And here is the pattern to be implemented by such a framework (based on two real world projects):
- Restrict your EXE to one Windows Service. We want the service to go down if something bad happens; and dragging innocent services into death just because they happen to run in the same process won’t do.
- Don’t do your work in ServiceBase.OnStart. Use it to start a separate thread, acting as watchdog. Notify that thread about pause, continue, stop, and the other SCM commands. If an exception is raised during an SCM command, again, kill the service (you can always do that by starting another thread that raises an exception).
- The watchdog thread should start the engine and afterwards enter a loop. It should leave the loop if it receives the stop request, stopping the engine before it finishes.
- Within the loop the watchdog thread should regularly check whether the engine is still operating (hence the term “watchdog”). If the engine fails for some reason the watchdog thread may try to compensate (restart the WF engine). E.g. if the engine failed because the databases went offline, it may be an option to wait for a certain amount of time. If the database server was just rebooted, the databases may come online in a few minutes. Anyway, if recovery is not possible, kill the service.
- The watchdog should also maintain a heartbeat, say trigger a performance counter that tells the operator the Windows Service is healthy, even if the system isn’t doing anything worthwhile right now.
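The pattern above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the framework from those projects and not .NET code. All names (`Engine`, `WatchdogService`, `on_fatal`, the interval parameters) are assumptions made up for the example; `on_start`/`on_stop` stand in for the SCM commands, the `heartbeats` counter stands in for a performance counter, and `on_fatal` is the hook where a real service would take itself down.

```python
import threading


class Engine:
    """Hypothetical stand-in for the workflow engine (not a real WF API)."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def is_healthy(self):
        return self.running


class WatchdogService:
    """SCM commands only signal the watchdog thread; it owns the engine."""

    def __init__(self, engine, check_interval=1.0, retry_delay=5.0,
                 max_restarts=3, on_fatal=lambda: None):
        self.engine = engine
        self.check_interval = check_interval
        self.retry_delay = retry_delay
        self.max_restarts = max_restarts
        self.on_fatal = on_fatal  # assumption: a real service ends the process here
        self.heartbeats = 0       # stand-in for a performance counter
        self.restarts = 0
        self.failed = False
        self._stop_requested = threading.Event()
        self._thread = threading.Thread(target=self._watch, name="watchdog")

    def on_start(self):
        # Return quickly; the actual work happens on the watchdog thread.
        self._thread.start()

    def on_stop(self):
        self._stop_requested.set()
        self._thread.join()

    def _watch(self):
        try:
            self.engine.start()
            # wait() returns True once a stop was requested, which ends the loop.
            while not self._stop_requested.wait(self.check_interval):
                if self.engine.is_healthy():
                    self.heartbeats += 1
                elif not self._try_restart():
                    raise RuntimeError("engine could not be restarted")
        except Exception:
            self.failed = True
            self.on_fatal()
        finally:
            try:
                self.engine.stop()
            except Exception:
                pass  # already going down; nothing more to do in this sketch

    def _try_restart(self):
        for _ in range(self.max_restarts):
            # Give a failed resource (e.g. a database) time to come back.
            if self._stop_requested.wait(self.retry_delay):
                return True  # stop requested; the main loop exits on its next check
            self.restarts += 1
            try:
                self.engine.start()
            except Exception:
                continue
            if self.engine.is_healthy():
                return True
        return False
```

In this sketch a healthy engine only bumps the heartbeat, a failed engine gets up to `max_restarts` attempts with a pause before each, and anything beyond that ends in `on_fatal`. Pause and continue are left out; they would be further signals to the watchdog thread, handled the same way as stop.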
The reason for killing the service is simple: A Windows Service that stopped working is more obvious and will be noticed far earlier than one that just wrote an event log entry and kept lingering around. Also, a stopped service usually can be tracked by operations software such as NetView.
Now we have a service that fulfills basic demands of robustness. There is more to say about robustness, but before that we need a means to talk to the workflow, now that we cut it out of the web application. Next post.