This post concludes my little workflow series…
1. Talking WF (introduction)
2. Workflow Instance State Management
3. Workflow Communication and Workflow Communication Clarification
4. Hosting WF
5. Talking to the Windows Service
6. Robust Operations
7. Error handling is Error Management
Remember where we started? WF. And what did I mostly talk about? Asynchronous data updates. Threading issues. Robust Windows Service implementation. Asynchronous error handling. Patterns. Practices. Guidelines.
And what did I rarely talk about? Shapes. Activities. Workflow specifics.
Remember what I wrote in the intro post: “And also very similar in the demand for knowledge of things that are not WF specific but are far from common knowledge for the average developer…”
Why are these things “far from common knowledge”, especially if they are not WF specific?
In my opinion this is because WF did not just introduce workflows. It also introduced asynchronous behavior and reliability demands far more forcefully than any technology before it.
- We had threading before – but only the odd developer actually embraced it.
- We had Windows Services before – but only rarely were they employed.
- We had human workflow and state before – but mostly in the form of home-grown, synchronous state machines.
- We had demanding applications that could not live with these simplistic notions – but these called for specialized server software like BizTalk anyway.
With WF any developer might have to face all these new demands at once. It’s not WF in itself that is complex; in fact I can hardly imagine a workflow engine easier to use than WF. It is the architectural consequences, the need for concepts that were somewhat exotic until then, and the complicated asynchronous processing patterns. And the need to master all these demands at once.
Truth be told…?
I once worked on a project that had quite amazing characteristics: 6 million front-end transactions, processed to eventually enter the balance sheet, subject to GAAP and quite a set of other legal compliance demands. Of course the software was built using BizTalk, not WF. Back then we did much of what I have described here. We had no choice and we had the budget.
Vacations@SDX does not even handle one vacation request per day on average. Does Vacations@SDX adhere to all the guidelines? Of course it does not. No one would have paid for that amount of fault tolerance just for one single workflow. We had to stay on budget and meet a deadline, and making it foolproof simply was not feasible. (And the hosting part was a learning experience anyway.)
The reality is: What I presented here is in certain parts the 120% solution. (I am a friend of delivering 80% and waiting to see which of the missing 20% causes the most trouble. And 120% is simply 20% waste in any case.) But since this project was meant to serve as a reference, we designed for the 120%. And with changing demands, new versions, or other applications built on the same principles, we may evolve the framework and the patterns, gradually and where it hurts most. And in one respect we have accomplished more than with a simple “coding” experience: We have the architectural patterns (even if not fleshed out in toto) and we have the Windows Service Framework implementation.
The pragmatic point of view for you is: Decide for yourself which parts hurt you. If you leave out certain aspects, do it knowingly. And I hope I could present some patterns that will help you address the aspects you can’t leave out.
Anyway, this concludes that little series. It’s been a number of posts, but believe me, this is only where it begins. Still missing are testing of workflows, workflow design (including choosing between sequence- and state-driven workflows), and versioning, among others. I wanted to talk about the areas that I came to realize caused the most problems for the people involved in the projects. I hope to have provided some useful hints, even if I got carried away sometimes.
PS: I know, I promised another post about the replay pattern. But given my current workload and other topics in my blog queue, I decided I should close this series this year. I haven’t forgotten it, and if you want me to prioritize it, drop a comment.
I wish you a peaceful Christmas and a happy new year.
That’s all for now folks,
AJ.NET


To elaborate on the threading issue: We know that WF is inherently multithreaded. When you call into a workflow instance (pardon, when you signal it), the WF engine (more precisely the data exchange service) will catch the event on your thread, do some context switch magic, and signal the workflow instance on its thread. (It might have to load the workflow instance, which is just another reason for the decoupling.) If, on the other hand, the workflow instance has some information to pass on, it calls your code on its own thread; the WF engine can hardly hijack your thread for some automatic context switch. Consequence: You signal the workflow instance and the call (more or less) immediately returns. But you don’t know when the workflow will actually get signaled, much less when it will have done the respective processing. The workflow instance may call your code, but you will have to take care of the context switch and the proper reaction yourself.
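The hand-off described here can be sketched in a few lines. This is not WF code: it is a minimal Python illustration, and `WorkflowInstance`, `signal`, `acknowledge`, and the queue-based marshalling are all hypothetical names chosen for the sketch. It only shows the two directions: signaling returns immediately, and the answer arrives on the workflow's thread, so the host has to bring it back to its own thread itself.

```python
import queue
import threading


class WorkflowInstance:
    """Hypothetical stand-in for a workflow instance that runs on its own thread."""

    def __init__(self, on_processed):
        self._inbox = queue.Queue()
        self._on_processed = on_processed  # will be invoked on the workflow thread
        self._thread = threading.Thread(
            target=self._run, name="workflow-thread", daemon=True
        )
        self._thread.start()

    def signal(self, event):
        # Returns (more or less) immediately; processing happens later, elsewhere.
        self._inbox.put(event)

    def stop(self):
        self._inbox.put(None)  # sentinel that ends the loop
        self._thread.join()

    def _run(self):
        while True:
            event = self._inbox.get()
            if event is None:
                break
            # The callback runs here, on the workflow's thread, not the caller's.
            self._on_processed(event, threading.current_thread().name)


def demo():
    host_inbox = queue.Queue()

    def acknowledge(event, thread_name):
        # Called on the workflow thread: do nothing but hand the result over.
        host_inbox.put((event, thread_name))

    workflow = WorkflowInstance(acknowledge)
    workflow.signal("ConfirmVacationRequest")

    # The host does the context switch itself by waiting on its own queue.
    event, worker = host_inbox.get(timeout=5)
    workflow.stop()
    return event, worker, threading.current_thread().name
```

Calling `demo()` returns the event name together with the names of the workflow thread and the host thread, which makes the thread switch visible. In a real host the blocking `get` would typically be replaced by whatever dispatch mechanism the host environment offers.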
The interface contains a CallIntoWorkflowInstance event and a respective CallIntoWorkflowInstanceAcknowledge callback method (ConfirmVacationRequest and ConfirmationProcessed in the interface above).
All well and good. But I go for number 2, for two reasons: Number 1 would imply more communication and synchronization between workflow instance and web application (we’ll look into that). And the main reason: If my workflow is long running, it might change its state (say to “timeout because no one cared”) at a time when no user is online and the web application is fast asleep. With number 1 some other piece of code would have to do what the web application already does.