Some time ago I promised some additional thoughts about how LINQ 2 SQL affects (web) application architectures. Well, here we go…
There are two defining characteristics of LINQ 2 SQL that may affect the architecture of a web application:
- The DataContext class, being “the main entry point for the LINQ to SQL framework.” (as the documentation states.)
- The way code generation works.
While the first point affects what features of LINQ 2 SQL we are able to use in web applications, the second one affects the application architecture.
Note: if you need to catch up with LINQ 2 SQL have a look at the my earlier post.
In this post I’ll focus on the DataContext class generated by LINQ 2 SQL.
The DataContext class is not only an anchor to access the tables, views, and stored procedures. It also offers more interesting features like change tracking, maintaining object identities, optimistic concurrency, and deferred loading. Consequently this is what most people I talked to attributed to LINQ 2 SQL: The possibility of reading some data, manipulating it, and call SubmitChanges, while LINQ 2 SQL handles all the gory details of mapping the data, efficiently talking to the database (including load on demand), dealing with concurrent changes, and so on.
This read/manipulate/submit approach is what the documentation describes as being “designed to last for one ‘unit of work’.” And in case the consequence escaped you: This this implies a statefull programming model. The very DataContext instance that was used to read the data has to be available until every potential change or deferred access has been settled.
This does generally not agree with web applications, as read/manipulate/submit regularly spans several requests: The data is read, presented to the user, and then the request dies, and in a stateless scenario the DataContext as well. The next request comes in, say an update request, but there is no DataContext available, carrying information about the data that was read before. In order to work with the laid out programming model, one would have to create a new DataContext, reread the data, and then apply the changes and submit them. With optimistic locking being the first victim, because for that the information from the previous request is mandatory. And delayed loading degenerates to “optional loading”, because either you need the data (no benefit in delaying then) or you don’t.
How about maintaining the DataContext instance across requests? No way either; since it is not serializable, you can’t put it into the session. (And that’s just a technical reason, whether one should do that even if it were possible is a different matter altogether.)
To make a long story short, let me rephrase and emphasize the first sentence:
For web applications the generated DataContext class is only an anchor to access the tables, views, and stored procedures.
BTW: Services are frequently bound to be stateless, too. Thus the same reasoning applies to them as well. Additionally they face another problem: They hand out the entities and get fresh ones from the outside world via serialization. Thereby they are removed from the DataContext, which is kind of unforgiving.
Less is More
Now we have concluded that the statefull approach doesn’t work. Reverting to a stateless employment of LINQ 2 SQL however leaves out a lot and raises the question whether there is something left at all. Well, there certainly is. We still have code generation for entity classes (based on tables and views) and for stored procedures and functions.
May web applications are really quite simple in that they are mainly “browsers over a (one) data store”. The complexity of those kind of applications lies more in sophisticated presentation rather than coding the business logic. In this scenario, and to really benefit from the productiveness of LINQ 2 SQL, I have had good experience using LINQ 2 SQL with just two simple tweaks:
- Limited number of result set types (i.e. the set of columns):
Rather than having my SELECTs (e.g. in stored procedures) produce different projections (i.e. tailored to the particular stored procedures task), I let them produce the same set of columns if they work on the same table. This way I will get a Customer entity and respective instances rather than a CustomerWithAddress, CustomerWithRating, CustomerWithName, CustomerWithAllInfo, … . And if I get that customer, I can always be sure it’s not some half filled entity.
- Explicit result set types:
It’s quite common to work on JOINed data. LINQ 2 SQL handles this quite well and will figure out the entity type even for stored procedures as “auto-generated type”. In order to make this type explicit in the database I would define a view containing the JOIN and have the stored procedure work against this view. And of course I use the view to generate a respective entity with LINQ 2 SQL.
- I also refrain from having stored procedures return more than one result set or varying result set types depending on some condition. (I don’t consider this to be a tweak, because I would do that anyway.)
Of course the database may need some tuning, as some of these tweaks may hurt the performance. But optimization when optimization is due… . And while I like the stored procedure approach, those tweaks will also work with free statements in code.
These tweaks are not meant as rules but as guidelines. Feel free to break them at leisure ;-) , their intention is merrily to support the LINQ 2 SQL code generation. However… I experienced that the define-your-entities-and-the-operations-on-top approach caused more OO-like thinking than the traditional define-your-operations-and-let-them-return-the-result-at-hand approach.
So, after dragging and dropping the tables, views, and stored procedures LINQ 2 SQL will happily generated the respective classes and methods. Getting type safe data and DB access was never easier.
If you are in the “use CRUD statements to access the database” camp, the usual round trip for an update would be: read the data that is to be changed, change it, submit changes. Like this:
using (MyDataContext dc = new MyDataContext())
var customer = (from c in dc.Customer where c.ID == id select c).First();
customer.Name = “new name”;
Compared to ADO.NET this still eases db access a lot.
If you follow the “use stored procedures to access the database” approach it gets even simpler:
using (MyDataContext dc = new MyDataContext())
This is kind of “anti-sophisticated” but quite efficient. And it allows to do concurrency in stored procedures if you maintain the version field on the client and provide it during the update call. I found this to be much more concise and better suited to the way web applications work.
One word of warning (or rather a disclaimer if you like): This works for me. And it works for a certain kind of web applications. And it only works because I’m willing to trade some academic architectural demands for the KISS principle, productivity, and maintainability. So please don’t follow this advice blindly.
Oh, by the way: I used this approach in a project using an Oracle database as well. Writing a simple code generator that read the system tables and generated wrappers around stored procedures was no rocket science, the only caveat was dealing with REF CURSORs.
The next post will take a closer look at LINQ 2 SQL code generation and how it affects the architecture.