AJ's blog

October 19, 2007

Talking to the Windows Service

Filed under: .NET, .NET Framework, SOA, Software Architecture, Software Development, WCF — ajdotnet @ 9:15 pm

Note: This is part of a series:
1. Talking WF (introduction)
2. Workflow Instance State Management
3. Workflow Communication and Workflow Communication Clarification
4. Hosting WF
more to come (error management).

The last post left us with the web application residing in IIS and WF hosted in a Windows Service. Separated like a divorced couple. Now we need to set up some communication between them. The means of choice? WCF of course. (There is a choice?)

I’m not going into too much detail about WCF here, as this series is devoted to WF and the documentation for WCF is considerably broader than the WF related information. For hosting WCF in a Windows Service, have a look at How to: Host a WCF Service in a Managed Windows Service. For a more elaborate example that includes WF, Integrating Windows Workflow Foundation and Windows Communication Foundation will show you how to host the WF engine as a service extension (gee, the service pattern again).

‘S’ervice or ‘s’ervice?

Let’s have a look at the architecture: What will become of our Windows Service if I attach a WCF interface to it? Will it become a Service in the SOA sense (capital ‘S’!)? Do we therefore need to set up HTTP bindings, WS-Security (my vacation is certainly important enough to be protected!)? Are we going to advertise the Service so that a customer can plug into it? Probably not. None of it. From a logical point of view the WF part is still part of the business logic of our application, as the reason to separate it was purely technical. It will therefore become an internal service (see WebService? What do you mean, „WebService“?), lowercase ‘s’!

WCF actually supports internal services quite well with the named pipe binding. It is a binary, efficient binding and it is restricted to one machine (the binding, not named pipes in general of course). This might rule out distributing the two parts on different machines, but seriously, we don’t have that many employees (yet).
Windows actually also „supports“ that choice by making it very hard to choose HTTP instead. (The “very” stems from the fact that it’s hard to diagnose the reason for the problem. There is a solution, though not a pretty one, all in the name of security.)

The fact that we factored the WF engine out of our web application was not intended to change the logical architecture more than necessary, i.e. the workflow is still part of our business logic:

[Figure: arch_logical]

However the physical architecture is somewhat different:

[Figure: arch_physical]

The danger here lies in misinterpreting the physical architecture as logical architecture. (It’s actually debatable whether the call to the Windows Service originates from the DAL or the BL. Don’t sweat it, it’s of no consequence.)

‘S’ervice?

The demand of our application is fulfilled, but let’s think a bit more about the Service notion. What if I actually wanted a Service in the SOA sense, one that is publicly available?

In this case I would still propose the very same approach for the Windows Service and an accompanying web application. The web application would consist of a publicly available Service (WCF hosted in IIS, e.g. a Web Service), part of the presentation layer, that handles the communication with the caller (validation, security, versioning, …). It would implement the service interface that was developed during the contracting phase. Internally it would talk, again using WCF, to the Windows Service, just like above. For a prototype SOA ecosystem based on this idea (and being a little more fleshed out) have a look at Security analysis on WCF for the German Government.

For a series about WF this post was kind of a diversion. Yet I think it is part of the architecture implied by the employment of WF, therefore I kept it in. The next post will look into robust processing, as promised in a previous post.

Quick! How many different notions of the word ‘Service’ did you count in this post? 😉

That’s all for now folks,
AJ.NET


October 14, 2007

Hosting WF

Filed under: .NET, .NET Framework, C#, Software Architecture, Software Development, WF — ajdotnet @ 2:37 pm

Note: This is part of a series:
1. Talking WF (introduction)
2. Workflow Instance State Management
3. Workflow Communication and Workflow Communication Clarification
more to come (error management).

Hosting WF is easy. Real easy. You can host it anywhere you want: console applications, ASP.NET applications, Windows Services, shopping bags, closets, … Well. It turns out that most available samples are too simplistic for my taste. Hosting for the sake of it may be easy; hosting in real world scenarios has its pitfalls.

Hosting in IIS

Given that our example application is a web application, the first attempt at hosting WF might be somewhere within the web application itself. Actually this is quite simple and covers all but one demand. Due to the IIS architecture it is even a very robust solution, one that can easily be adorned with an IIS hosted WCF service interface, again backed by IIS, namely the security features. Just great. Apart from the „but one demand“: IIS hosted code relies on a request! Not a “current” request, just any request that keeps the appdomain running. Should the appdomain shut down and go to sleep for the rest of the night, there is no running WF engine. What if a workflow is supposed to time out sometime during the night? What if it is supposed to regularly poll a directory or database table? None of this happens until the next request comes in, wakes the appdomain and eventually starts the WF engine. If you can live with the time lag, go for IIS. If not, IIS is out of the question.

And of course our example application cannot live with that time lag. There may well be no vacation request for days, sometimes weeks. No timeouts and reminder mails? No way!

Hosting in a Windows Service

I’ve seen various developers at this point thinking of alternatives: Regularly trigger the web application to revive the appdomain… Hosting the WF in a console application… A winforms application… Eventually I realized that quite a few developers just try to avoid the obvious solution: Windows Services (as in NT Service).

Why avoid them? Because Windows Services are weird. And demanding. They have to be installed and started via the service control manager (SCM). You have to deal with service accounts and permissions. Windows Services are supposed to be multithreaded. They have no decent UI other than some logs. They have to be robust, and if you think your current application is robust, a Windows Service has to be robust^3. For example:

  • A Windows Service may start at boot time and shut down 3 years later.
  • A Windows Service may have to talk to a database (or other resource) that occasionally goes off line for maintenance reasons.
  • A Windows Service may become what I call a zombie if the worker thread terminates but the service keeps running.

Cheer up, with .NET everything gets better. The .NET Framework devotes a whole namespace to Windows Services: System.ServiceProcess. The most important class is ServiceBase which acts as base class for your own Windows Service implementation; actually it’s a thin layer over the Service Control Handler Function. There are also classes for installation or controlling other services. So there already is some very welcome support available.

Additionally there are quite a few examples available on how to use these classes to host WF, usually in conjunction with WCF (if the WF is not hosted in your web application you need some means to talk to it, we’ll get to that, too). The most simplistic implementation would start and stop the WF engine in the respective SCM commands (i.e. ServiceBase.OnStart and ServiceBase.OnStop respectively). It really doesn’t take more to implement a valid Windows Service hosting WF. Well, for the sake of it that may be true. But real world demands? Are these examples actually production ready? Do they fulfill the demands regarding operations, robustness, etc.? Not in my opinion.

Better Hosting in a Windows Service

ServiceBase has no special support for worker threads. No support for any runtime diagnosis (such as a ping or heartbeat). ServiceBase is actually prone to become a zombie, because an exception during any SCM command (such as pause) will be written to the event log but it will _not_ stop the service. However this cannot be called „robustness“ (if it were defined that way), because an exception in another thread will take down the whole process, including all services contained in the same EXE and registered with the SCM.
Actually these are all fairly generic issues for any Windows Service and not dependent on the work the service actually does. Therefore it’s quite easy to come up with a reusable Windows Service framework on top of ServiceBase.

And here is the pattern to be implemented by such a framework (based on two real world projects):

  • Restrict your EXE to one Windows Service. We want the service to go down if something bad happens; and dragging innocent services into death just because they happen to run in the same process won’t do.
  • Don’t do your work in ServiceBase.OnStart. Use it to start a separate thread, acting as watchdog. Notify that thread about pause, continue, stop, and the other SCM commands. If an exception is raised during an SCM command, again, kill the service (you can always do that by starting another thread that raises an exception).
  • The watchdog thread should start the engine and afterwards enter a loop. It should leave the loop if it receives the stop request, stopping the engine before it finishes.
  • Within the loop the watchdog thread should regularly check whether the engine is still operating (hence the term “watchdog”). If the engine fails for some reason the watchdog thread may try to compensate (restart the WF engine). E.g. if the engine failed because the databases went off line, it may be an option to wait for a certain amount of time. If the database server was just rebooted the databases may come online in a few minutes. Anyway, if that is not possible, kill the service.
  • The watchdog should also maintain a heartbeat, say trigger a performance counter that tells the operator the Windows Service is healthy, even if the system doesn’t do anything worthwhile right now.

The reason for killing the service is simple: A Windows Service that stopped working is more obvious and will be noted far earlier than one that just wrote an event log entry and kept lingering around. Also a stopped service usually can be tracked by operations software such as NetView.
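
The watchdog part of the pattern above could be sketched roughly as follows. This is only a minimal illustration, not the framework from the two projects: IEngine, Watchdog, BrokenEngine and the heartbeat and kill callbacks are names made up for this sketch, and the wiring into ServiceBase.OnStart and ServiceBase.OnStop is left out.

```csharp
using System;
using System.Threading;

// Hypothetical abstraction over the hosted engine, e.g. a thin wrapper around the WF runtime.
public interface IEngine
{
    void Start();
    void Stop();
    bool IsHealthy { get; }
}

public sealed class Watchdog
{
    private readonly IEngine _engine;
    private readonly TimeSpan _checkInterval;
    private readonly int _maxRestarts;
    private readonly Action _heartbeat;      // e.g. trigger a performance counter
    private readonly Action<string> _kill;   // e.g. take the service process down
    private readonly ManualResetEvent _stopRequested = new ManualResetEvent(false);
    private Thread _thread;

    public Watchdog(IEngine engine, TimeSpan checkInterval, int maxRestarts,
                    Action heartbeat, Action<string> kill)
    {
        _engine = engine;
        _checkInterval = checkInterval;
        _maxRestarts = maxRestarts;
        _heartbeat = heartbeat;
        _kill = kill;
    }

    // Intended to be called from ServiceBase.OnStart; returns immediately.
    public void Start()
    {
        _thread = new Thread(Run);
        _thread.Name = "Watchdog";
        _thread.Start();
    }

    // Intended to be called from ServiceBase.OnStop; waits for the loop to finish.
    public void Stop()
    {
        _stopRequested.Set();
        if (_thread != null)
            _thread.Join();
    }

    private void Run()
    {
        int restarts = 0;
        try
        {
            _engine.Start();
            // WaitOne returns false on timeout (time for a health check)
            // and true once a stop has been requested.
            while (!_stopRequested.WaitOne(_checkInterval))
            {
                if (_engine.IsHealthy)
                {
                    restarts = 0;
                    _heartbeat();
                }
                else if (restarts < _maxRestarts)
                {
                    restarts++;          // try to compensate by restarting the engine
                    _engine.Stop();
                    _engine.Start();
                }
                else
                {
                    _kill("engine could not be revived");
                    return;
                }
            }
            _engine.Stop();
        }
        catch (Exception ex)
        {
            _kill(ex.Message);
        }
    }
}

// Demo stand-in: an engine that never becomes healthy.
public sealed class BrokenEngine : IEngine
{
    public int Starts;
    public void Start() { Interlocked.Increment(ref Starts); }
    public void Stop() { }
    public bool IsHealthy { get { return false; } }
}
```

In a real service the kill callback would have to bring the process down, for instance by raising an exception on another thread as described above; waiting a while before a restart (the database reboot case) is also omitted here.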

Now we have a service that fulfills basic demands of robustness. There is more to say about robustness, but before that we need a means to talk to the workflow, now that we cut it out of the web application. Next post.

That’s all for now folks,
AJ.NET


October 6, 2007

Workflow Communication Clarification

Filed under: .NET, .NET Framework, ASP.NET, C#, Software Architecture, Software Development, WF — ajdotnet @ 2:47 pm

My last post may need a clarification…

Tim raised the question whether it is such a good idea to synchronize the communication between web application and workflow instance. To make it short: Generally it isn’t! Period. It hurts scalability and it may block the web request being processed. Therefore from a technical perspective it usually is preferable to avoid it. Ways around this include:

  • Redirect the user to a different page telling him that his request has been placed and that it may take a while until it is processed.
  • Ignore the fact that the current page may show an outdated workflow state, hope for the best and take measures so that another superfluous request won’t cause any problems.
  • ….

However sometimes you cannot avoid having to support a use case like the one I presented. But even then, if the circumstances demand it, it might be possible to avoid blocking on the workflow communication. E.g. the web application could write a “processing requested…” state into the workflow instance state table (contrary to what I wrote in an earlier post, but that was just one pattern, not a rule either). The actual call could be made asynchronously, even be delayed somehow.

If you still think about following the pattern I laid out, make sure the processing within the workflow instance is fast enough to be feasibly done synchronously. E.g.:

[Figure: sequence_1]

As you can see, synchronization encapsulates the whole processing.

What if the processing takes longer? Well, the pattern still holds if you introduce an “I’m currently processing…” state. The callback will no longer tell you that the work has been done, but it will still tell you that the demand has been placed and accepted by the workflow instance.

[Figure: sequence_2]

In this case synchronization encapsulates only the demand being accepted, not the processing itself. However it still serves the original purpose, which was telling the user that his request has been placed.

What if you need to synchronize with the end of a longer running task? In that case this pattern is not the way to go. The user clicking on a button and the http post not returning for a minute? This definitely calls for another pattern.

I hope this has made things clearer.

That’s all for now folks,
AJ.NET


October 2, 2007

Workflow Communication

Filed under: .NET, .NET Framework, C#, Software Architecture, Software Development, WF — ajdotnet @ 8:36 pm

Note: This is part of a series:
1. Talking WF (introduction)
2. Workflow Instance State Management
more to come (hosting, error management).

The last post brought up the following topic:

“I clicked a button and somehow the workflow took over and did something. As you know, this happens asynchronously, nevertheless I saw the correct state after the processing instantly. To make that work, the web application has to have a means to know when the state has changed. Workflow communication.”

So, how do you talk to a workflow instance? How do you know it processed your request? How does it talk back?

Please note: Talking to a workflow is a topic that is actually well documented. Yet it still has its pitfalls and it takes a little getting used to. Therefore I’ll recap the pattern, trying to explain it along the way. And I will assume a certain use case (the synchronization issue) because it showed up regularly in our use cases.

If you are new to WF you’ll have to master two or three things to talk to a workflow: The service pattern, a silly interface pattern, and some threading issues.

The service pattern

If you read an article about WF, the service pattern is most likely not introduced properly; most articles simply say “do this, and you’ll accomplish that”. The service pattern has been widely used for Visual Studio design time support but that’s the only usage I am aware of. Until WinFX came around, that is. Firstly WF uses the service pattern for runtime services such as persistence. In this case the developer only has to announce existing service implementations. Secondly it also employs services for data exchange, in which case the developer has to deal with the service pattern in all its beauty.

For the record: The service pattern usually defines a service as an interface. It decouples the consumer of a service and the provider. A certain service may be optional (such as the tracking service) or mandatory (such as the scheduler service). If the consumer needs a particular service, it’ll ask the provider. Usually providers form a chain, so if the first provider does not know the service, it’ll ask the next one.

And no, it’s got nothing to do with web services…

You might want to read Create And Host Custom Designers With The .NET Framework 2.0 Chapter “Extensibility with Services” for a slightly more elaborate description.
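
To make the provider chain a bit more tangible, here is a small sketch that uses only the framework’s generic service pattern types (IServiceProvider and ServiceContainer from System.ComponentModel.Design), not the WF runtime itself. ITrackingService, ISchedulerService, DefaultScheduler and ServicePatternDemo are names invented for this illustration; the real WF services look different.

```csharp
using System;
using System.ComponentModel.Design;

// Hypothetical service contracts: a service is defined as an interface.
public interface ITrackingService { void Track(string message); }   // optional
public interface ISchedulerService { string Name { get; } }         // mandatory

public sealed class DefaultScheduler : ISchedulerService
{
    public string Name { get { return "default"; } }
}

public static class ServicePatternDemo
{
    // Builds a chain of two providers: the child knows no services itself
    // and asks its parent whenever it cannot answer a request.
    public static IServiceProvider BuildProviderChain()
    {
        ServiceContainer root = new ServiceContainer();
        root.AddService(typeof(ISchedulerService), new DefaultScheduler());
        return new ServiceContainer(root);
    }

    // The consumer only knows the provider and the interfaces it needs.
    public static string Describe(IServiceProvider provider)
    {
        ISchedulerService scheduler =
            (ISchedulerService)provider.GetService(typeof(ISchedulerService));
        if (scheduler == null)
            throw new InvalidOperationException("The scheduler service is mandatory.");

        // Optional service: GetService returns null if no provider in the chain knows it.
        ITrackingService tracking =
            (ITrackingService)provider.GetService(typeof(ITrackingService));
        if (tracking != null)
            tracking.Track("using scheduler " + scheduler.Name);

        return scheduler.Name + (tracking == null ? " (no tracking)" : " (tracking)");
    }
}
```

The consumer never learns which provider in the chain actually supplied the scheduler, which is the decoupling the pattern is after.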

For data exchange the developer has to define an interface (the service contract) and a class that implements the interface (the service). During startup he creates an instance of the service and registers it with the data exchange service (a service provider in itself). If you need it in an activity to do something worthwhile you ask the ActivityExecutionContext (the service provider at hand, see Activity.Execute) for the service. Then you work with it.

Some activities already provide design time support for services and the activities for data exchange are among them (HandleExternalEventActivity and CallExternalMethodActivity):

[Figures: activity2, prop2, activity1, prop1]

Please note that many of the predefined activities offer other anchors to attach your own code, events and the like, that do not provide easy access to the ActivityExecutionContext, thus no service provider is readily available.

A silly interface pattern

When I first wanted to talk to a workflow, I knew I needed an interface. To call into the workflow I expected a method; to get notified from the workflow I expected an event. Of course I expected to call and be called on my own thread. Guess what…

Well, my attitude actually was “I – the code of the host application – create the workflow, I’m in charge, I define the point of view.” The reality of course is “I – the workflow – do not care who the host application code thinks it is. I do the work, I’m in charge, I define the point of view.” This turns my initial expectations upside down. In other words: A call into the workflow instance notifies the instance of some external demand, thus it is made via an event. If the workflow instance has to say something it simply calls a method. And it does not care what thread is involved.

[ExternalDataExchange]
interface IVactionRequestWorkflow
{
    // call into workflow instance
    event EventHandler<ConfirmationEventArgs> ConfirmVacationRequest;
    
    // callback from workflow instance
    void ConfirmationProcessed(bool confirmed);
}

Silly, isn’t it?
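
The interface refers to a ConfirmationEventArgs class that is not shown here. A minimal sketch of what it might look like, assuming the usual WF 3.x requirement that event arguments for data exchange derive from ExternalDataEventArgs and are serializable; the Confirmed property name is a guess, not taken from the original code.

```csharp
using System;
using System.Workflow.Activities;   // needs a reference to System.Workflow.Activities.dll

[Serializable]
public class ConfirmationEventArgs : ExternalDataEventArgs
{
    private readonly bool _confirmed;

    // instanceId tells the data exchange service which workflow instance to signal
    public ConfirmationEventArgs(Guid instanceId, bool confirmed)
        : base(instanceId)
    {
        _confirmed = confirmed;
    }

    // hypothetical payload: whether the vacation request was confirmed
    public bool Confirmed
    {
        get { return _confirmed; }
    }
}
```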

Some threading issues

[Figure: communication1]

To elaborate more on the threading issue: We know that WF is inherently multithreaded. When you call into a workflow instance (pardon, when you signal it), the WF engine (more precisely the data exchange service) will catch the event on your thread, do some context switch magic, and signal the workflow instance on its thread. (It might have to load the workflow instance, which is just another reason for the decoupling.) If the workflow instance on the other hand has some information, it calls your code on its own thread; the WF engine can hardly hijack your thread for some automatic context switch. Consequence: You signal the workflow instance and the call (more or less) immediately returns. But you don’t know when the workflow will actually get signaled, much less when it will have done the respective processing. The workflow instance may in turn call you, but you will have to take care of the context switch and the proper reaction yourself.

This is pretty good for performance and scalability. But if you need synchronous behavior (as in our example) it hurts a bit.

To make a long story short, here’s the pattern we came up with:

[Figure: communication2]

  • The interface contains a CallIntoWorkflowInstance event and a respective CallIntoWorkflowInstanceAcknowledge callback method (ConfirmVacationRequest and ConfirmationProcessed in the interface above).
  • The workflow instance accepts the CallIntoWorkflowInstance event, does some work (supposed to be synchronous) and calls CallIntoWorkflowInstanceAcknowledge to tell you it finished its part.
  • The service implements the event (along with a SignalCallIntoWorkflowInstance helper method) and the method. It also has an AutoResetEvent for the synchronization.
  • The helper method (called from the web application thread) raises the event and afterwards waits for the AutoResetEvent.
  • The callback method (called from the workflow instance thread) simply signals the AutoResetEvent. This will cause the helper method on the other thread to be unblocked and to return.

The service implementing the above interface consequently looks like this:

using System;
using System.Threading;
using System.Workflow.Runtime;

class VacationRequestWorkflowService : IVactionRequestWorkflow
{
    // signaled by the callback, awaited by the web application thread
    AutoResetEvent _confirmationProcessed = new AutoResetEvent(false);

    public event EventHandler<ConfirmationEventArgs> ConfirmVacationRequest;

    // raises the event that signals the workflow instance
    public void OnConfirmVacationRequest(ConfirmationEventArgs ea)
    {
        if (ConfirmVacationRequest != null)
            ConfirmVacationRequest(null, ea);
    }

    // callback from the workflow instance (runs on the workflow thread)
    public void ConfirmationProcessed(bool confirmed)
    {
        _confirmationProcessed.Set();
    }

    // helper method, called from the web application thread
    public void SynchronousConfirmVacationRequest(WorkflowInstance instance, bool confirmed)
    {
        // send event
        OnConfirmVacationRequest(new ConfirmationEventArgs(instance.InstanceId, confirmed));
        // wait for callback
        _confirmationProcessed.WaitOne();
    }
}

Now we have synchronized the workflow instance with our calling application. Of course we rely on the callback or we would wait endlessly, but that’s another story.
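
As a hint at that other story: one way to avoid waiting endlessly is to wait with a timeout, and to keep one wait handle per workflow instance so that concurrent requests cannot acknowledge each other. The following is a sketch under those assumptions, deliberately free of WF types; AcknowledgeWaiter and its members are made-up names, and the callback would need to know the instance id (within WF I would expect WorkflowEnvironment.WorkflowInstanceId to provide it).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class AcknowledgeWaiter
{
    private readonly object _sync = new object();
    private readonly Dictionary<Guid, ManualResetEvent> _pending =
        new Dictionary<Guid, ManualResetEvent>();

    // Called on the web application thread. 'signal' raises the event for the
    // workflow instance; afterwards the caller blocks until the acknowledge
    // arrives or the timeout elapses. Returns false on timeout.
    public bool SignalAndWait(Guid instanceId, Action signal, TimeSpan timeout)
    {
        ManualResetEvent acknowledged = new ManualResetEvent(false);
        // Register before signaling, so a very fast acknowledge is not lost.
        lock (_sync) { _pending[instanceId] = acknowledged; }
        try
        {
            signal();
            return acknowledged.WaitOne(timeout);
        }
        finally
        {
            lock (_sync) { _pending.Remove(instanceId); }
            acknowledged.Close();
        }
    }

    // Called on the workflow instance thread (from the callback method).
    public void Acknowledge(Guid instanceId)
    {
        lock (_sync)
        {
            ManualResetEvent acknowledged;
            if (_pending.TryGetValue(instanceId, out acknowledged))
                acknowledged.Set();   // late acknowledges are silently ignored
        }
    }
}
```

What to do when the wait times out (tell the user, retry, raise an alert) still depends on the application.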

The next post will take a closer look at hosting.

That’s all for now folks,
AJ.NET
