I Work On Software

Wednesday, June 23, 2010

Cascading Deletes in EF4

Something I was curious about in Entity Framework was how to delete a whole graph of entities by deleting its root. In the DB world, doing this automatically is known as "cascading deletes", and in SQL Server it's something you can specify on a foreign key relationship. When you specify the cascade delete action and the "parent" row is deleted from the database, dependent rows are silently deleted as well. This avoids running into foreign key constraints when you are deleting rows, but it can also result in deleting more data than you thought you were, so cascading deletes should be used carefully.

Anyways, I was curious as to how this idea was represented in EF, so I fired up a little demo app to try a few things out. After some experimentation, and consulting this excellent blog post, I think I've got it figured out:

Specifying cascading deletes on the database is entirely separate from specifying them on the client. By far the safest way to do things if you want cascading deletes is to specify them on the database. If you don't, some odd things can happen (keep reading).
To specify cascading deletes on the client, select the relationship in your model. In the properties window, for the end that represents the "1" multiplicity, set OnDelete to Cascade. This tells EF to issue delete requests for dependent entities that have been loaded into memory before deleting the parent.
Here's where things get interesting: Say I have a Student entity in memory, and the FK_StudentGrade_Student relationship is set up for cascading deletes on the client, but not on the DB. If I mark the entity for deletion and SaveChanges(), what happens? EF will issue a delete command for the Student after issuing delete requests for all StudentGrades that are loaded. EF does not retrieve all the dependent entities prior to deletion to learn about them so it can issue delete requests for them. If you don't have all the dependent entities loaded, you'll get an FK violation when EF tries to delete the parent.
What if you have cascade set up on both the DB and the client? EF will still issue delete requests for the dependents it knows about, which is redundant but harmless. If some dependents weren't loaded, the cascade rule on the DB will take care of them.
What if you don't have cascade set up on the client? It depends on whether the parent entity you are trying to delete has any dependents loaded. If it does, EF will stop you with an exception before it even issues the delete request to the DB, because it sees that you are violating a rule of the conceptual model. If the parent doesn't have any dependents loaded, EF will go ahead and issue the delete request. From there, the behavior is determined by whether or not any dependent entities exist in the DB and whether or not you have cascading deletes specified on the server. If the answers are "yes" and "no," respectively, you'll get an FK violation.
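One way to sidestep the FK violation when cascade is only set up on the client is to make sure the dependents are loaded before you delete the parent. Here's a minimal sketch of that idea. It assumes the relationship has OnDelete set to Cascade in the model, and the helper name DeleteWithDependents is mine, not part of EF:

```csharp
using System.Data.Objects;
using System.Data.Objects.DataClasses;

public static class CascadeDeleteSketch
{
    // Hypothetical helper: loads the dependents EF doesn't know about yet,
    // then deletes the parent so the client-side cascade covers all of them.
    public static void DeleteWithDependents<TChild>(
        ObjectContext context, object parent, EntityCollection<TChild> dependents)
        where TChild : class
    {
        // EF will not fetch dependents on its own before a delete,
        // so pull them into the context explicitly.
        if (!dependents.IsLoaded)
        {
            dependents.Load();
        }

        // With OnDelete = Cascade in the model, the loaded dependents
        // are deleted before the parent when SaveChanges runs.
        context.DeleteObject(parent);
        context.SaveChanges();
    }
}
```

With the entities from the example above, a call would look something like CascadeDeleteSketch.DeleteWithDependents(context, student, student.StudentGrades), where StudentGrades is the assumed name of the navigation property. Setting up the cascade on the database is still the safer option, since it doesn't cost you the extra round trip.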

Highlights of the Entity Framework's "Working With Objects" Documentation

I'm trying to get my head around Entity Framework 4 and the official MSDN documentation is pretty helpful in explaining the high level concepts. In particular, the "Querying a Conceptual Model" and "Working With Objects" sections are very useful when it comes to learning about how to actually get code that uses EF4 up and running.

I read through "Working With Objects" last night and this is a quick bullet-list of the highlights.

Defining and Managing Relationships

If you include foreign key columns in the model, you can create/change relationships by manipulating the foreign key, or by manipulating the navigation properties. This is called a foreign key association.
If you don't include foreign key columns, relationships are managed as independent objects in memory. This is called an independent association. With these, the only way to change/create relationships is by manipulating the navigation properties.
When a primary key of an entity is part of the primary key of a dependent entity, the relationship is an identifying relationship - the dependent entity cannot exist without the principal entity. In this case, deleting the principal entity also deletes the dependent entity, as does removing the relationship.
In a non-identifying relationship, if the model is based on foreign key associations, deleting the principal objects will set foreign keys of dependents to null if they are nullable. If they are not, you must manually delete the dependent objects or assign a new parent, or you can specify in the model to automatically delete the dependent entities.

Creating, Adding, Modifying, and Deleting Objects

A static Create<ObjectName> factory method (CreateStudent for a Student entity, say) is added to each entity type by the EDM tools when the types are generated. The parameters for this method include the columns that cannot be null, and nothing else.
If the primary key column is exposed and not nullable (and thus exposed in the signature to CreateObjectName), you can just assign the default value, since the ID used prior to SaveChanges is always temporary.
When using POCOs, use CreateObject instead of new if you want EF to hand you back a proxy instance.
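As a rough sketch of the CreateObject route, the helper below creates an entity through the context and adds it to its entity set. The AddNew name and the initialize callback are my own inventions for illustration; only CreateObject, CreateObjectSet and AddObject come from EF:

```csharp
using System;
using System.Data.Objects;

public static class EntityFactorySketch
{
    // Hypothetical helper: creates an entity via the context rather than
    // "new", so a POCO type can come back as a proxy instance if proxy
    // creation is enabled.
    public static T AddNew<T>(ObjectContext context, Action<T> initialize)
        where T : class
    {
        T entity = context.CreateObject<T>();

        // Let the caller fill in the non-nullable properties.
        initialize(entity);

        // Assumes T is the base type of its entity set; CreateObjectSet
        // is not meant to be called with a derived type.
        context.CreateObjectSet<T>().AddObject(entity);
        return entity;
    }
}
```

Nothing is written to the database until SaveChanges is called on the context.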

Attaching and Detaching Objects

Detaching objects can help keep the memory requirements of the object context in check. If you execute a query with MergeOption.NoTracking, the returned entities are never attached.
Consider creating a new ObjectContext if the scope of data has changed (like if you are displaying a new form) instead of detaching objects from ObjectContext.

Identity Resolution, State Management, and Change Tracking

If you are working with POCOs without change-tracking proxies, you must call DetectChanges so the EF can figure out what changes have been made.
To find out if the value of a property has changed between calls to SaveChanges, query the collection of changed property names returned by the GetModifiedProperties method.
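Both of those points fit in a few lines. This is a minimal sketch that takes any tracked entity; the GetChangedPropertyNames name is made up for the example:

```csharp
using System.Collections.Generic;
using System.Data.Objects;

public static class ChangeInspectorSketch
{
    // Hypothetical helper: returns the names of the properties that have
    // been modified on a tracked entity since it was loaded or last saved.
    public static IEnumerable<string> GetChangedPropertyNames(
        ObjectContext context, object entity)
    {
        // Needed for POCOs without change-tracking proxies; it should be
        // harmless for other entity types.
        context.DetectChanges();

        ObjectStateEntry entry =
            context.ObjectStateManager.GetObjectStateEntry(entity);
        return entry.GetModifiedProperties();
    }
}
```

GetObjectStateEntry throws if the entity isn't attached to the context, so this only makes sense for objects the context is already tracking.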

Saving Changes and Managing Concurrency

By default, EF uses "optimistic concurrency," meaning that locks are not held while your app has data from the data source, and object changes are saved to the database without checking for concurrent modifications. If an entity type has a high degree of concurrency, you can set ConcurrencyMode to Fixed, which will cause EF to check for changes in the DB before saving. Conflicting changes will throw an OptimisticConcurrencyException.
You can handle OptimisticConcurrencyExceptions by calling Refresh with your RefreshMode of choice.
In high concurrency scenarios, call Refresh frequently. The RefreshMode controls how changes are propagated: StoreWins causes EF to overwrite all data in the cache with DB values. ClientWins replaces original values only in the cache with DB values.
Call Refresh after calling SaveChanges if your updates may modify data that belongs to other objects. For example, if you have a trigger that fires when you update a row in a table, calling Refresh after saving changes to that table is a good idea.
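Here's a sketch of the catch-and-Refresh pattern. It assumes "client wins" is the policy you want, and the SaveClientWins name is mine:

```csharp
using System.Data;
using System.Data.Objects;

public static class ConcurrencySketch
{
    // Hypothetical helper: saves changes and, on a concurrency conflict,
    // refreshes the conflicting entities so the client's values win,
    // then saves again.
    public static int SaveClientWins(ObjectContext context)
    {
        try
        {
            return context.SaveChanges();
        }
        catch (OptimisticConcurrencyException ex)
        {
            foreach (ObjectStateEntry entry in ex.StateEntries)
            {
                // Relationship entries have no entity to refresh.
                if (entry.Entity != null)
                {
                    context.Refresh(RefreshMode.ClientWins, entry.Entity);
                }
            }

            // A second conflict here is not handled and will propagate.
            return context.SaveChanges();
        }
    }
}
```

Swapping in RefreshMode.StoreWins would throw away the local changes in favor of the database values instead.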

Binding Objects to Controls

(From http://social.msdn.microsoft.com/Forums/en/adonetefx/thread/c76d72c0-951c-4033-b75f-fc84f735826e): EntityCollection represents a collection of entities related to another entity (a many relationship). ObjectSet is not actually a collection type: it provides access to entities that belong to an entityset in the DB. LINQ on an EntityCollection is LINQ to Objects. LINQ on an ObjectSet uses LINQ to Entities (results in DB query).
To bind entities directly to a control, set the DataSource property of a control to an EntityCollection or to the ObjectResult you get when you call Execute on an ObjectSet or ObjectQuery. In WPF, you can set a DataContext to an EntityCollection or ObjectResult.
Don't bind directly to an ObjectQuery, bind to the result of Execute.
When working with LINQ, cast the result of your query to an ObjectQuery and you can call Execute on it.
To refresh the bound control, call Execute on the query again. This will bind the control to a new ObjectResult.
Calling OfType on an EntityCollection returns an IEnumerable, which cannot be bound to a control. Instead of OfType, use CreateSourceQuery to get the ObjectQuery that defines the base EntityCollection. You can then bind a control to the execution of the ObjectQuery that is returned by OfType on the ObjectQuery.
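The cast-and-Execute advice can be sketched like this for Windows Forms. The Bind helper and the choice of a DataGridView are assumptions for the example:

```csharp
using System.Data.Objects;
using System.Linq;
using System.Windows.Forms;

public static class BindingSketch
{
    // Hypothetical helper: binds a grid to the result of executing a
    // LINQ to Entities query, rather than to the query itself.
    public static void Bind<T>(DataGridView grid, IQueryable<T> query)
    {
        // A LINQ to Entities query can be cast back to ObjectQuery<T>;
        // this cast fails for LINQ to Objects queries.
        var objectQuery = (ObjectQuery<T>)query;

        // Execute hits the database once and returns an ObjectResult<T>.
        grid.DataSource = objectQuery.Execute(MergeOption.AppendOnly);
    }
}
```

Calling Bind again with the same query refreshes the control, since it binds the grid to a brand new ObjectResult.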

Serializing Objects

When serializing entities, if lazy loading is not disabled, it will be triggered, possibly causing your object graph to become very large.
Binary and WCF datacontract serialization serialize related objects. XML serialization does not.

Friday, May 28, 2010

Silverlight, WCF, Deserialization and Public Setters

I've been working recently on doing "WCF by hand," meaning putting ServiceContracts and DataContracts in separate assemblies, sharing those assemblies between server and client projects, and using ChannelFactory or manually writing ClientBase classes to access the service instead of dealing with all the cruft that Add Service Reference pushes on you (post forthcoming!).

I've been working on doing this in Silverlight by adding all of my DataContract class files as links into a Silverlight project, essentially recompiling the DataContracts in Silverlight so I can share them with a Silverlight app. The first thing I learned is that it may be worthwhile for you to do this with DataContracts, but don't bother with your service interface - you really need to use slsvcutil or Visual Studio's Add Service Reference to generate Silverlight proxy clients, because Silverlight mandates the use of specific asynchronous patterns implemented in a specific way. However, with the default settings, Visual Studio will reuse DataContracts in referenced assemblies, meaning that your business objects won't end up as generated code, and you can keep all your logic and calculated properties.

The next and most interesting thing I learned was regarding the presence of public setters on DataMembers. One of my data contracts has a private setter - this contract and its service are working fine with a standard console application client, but as soon as I access the service in Silverlight, things blow up. Check this out:

System.Security.SecurityException: The data contract type 'MyContract' cannot be deserialized because the property 'MyPrivateSetterProperty' does not have a public setter. Adding a public setter will fix this error. Alternatively, you can make it internal, and use the InternalsVisibleToAttribute attribute on your assembly in order to enable serialization of internal members - see documentation for more details. Be aware that doing so has certain security implications.

Whoa. What's the deal?

Quite simply, it's important to never forget that Silverlight code is only partially trusted. Without full trust, System.Runtime.Serialization can't instantiate a raw object and populate its properties; it needs to use the same method that developers do from user code: namely, a public setter.

As the exception text states, the alternative is to make System.Runtime.Serialization a friend assembly of your data contract assembly via InternalsVisibleTo, and make the setter internal instead of private. Apparently, if you are using the JSON serializer, you will also need to friend System.ServiceModel.Web and System.Runtime.Serialization.Json as well.
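Here's roughly what the friend-assembly approach looks like, reusing the type and property names from the exception text. I'm assuming an unsigned data contract assembly; if yours is strong-named, the attribute also needs the friend assembly's public key, which I've left out:

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.Serialization;

// Lets the serializer see internal members of this assembly.
// Assumes this assembly is not strong-named.
[assembly: InternalsVisibleTo("System.Runtime.Serialization")]

[DataContract]
public class MyContract
{
    // The setter is internal rather than private, so the partially
    // trusted serializer can reach it through the friend declaration.
    [DataMember]
    public string MyPrivateSetterProperty { get; internal set; }
}
```

Code outside the assembly still can't set the property, so you keep most of the encapsulation the private setter was giving you.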

As an aside, kudos to the teams that take the time to add exception text like this. Seriously, there's not much better than getting an error with text like, "Try approaches X, Y or Z to fix this, and check out the documentation too."

Sources:
MSDN Forums
Silverlight Serialization - avoiding having public setters in properties

Sunday, May 23, 2010

A Rant on Reuse

From a design and development perspective, code reuse is beautiful. Really beautiful. Code that's reusable is, by definition, highly decoupled, well-encapsulated and well organized, with a bevy of options that are available via simple configuration. It checks the "Don't Repeat Yourself" checkbox (and the "Three Tenets of Object-Oriented Programming" checkboxes, if you swing that way) so hard that the pen pierces the paper. Reusing that code is like the fulfillment of a prophecy. It feels good. The elegance of it all is so alluring that I think every developer sits down once in a while to try to solve their current problem with a glorious, reusable library. It's like writing the Great American Novel for geeks.

From a management point of view, reuse is obviously a no-brainer. When you view development as manufacturing (fast forward to 20:30), reuse looks a lot like replacing a human assembly workflow in the production of your widget with a mechanized one - you gain speed and reduce variability and error rate. The return on investment is clearly fantastic.

I have a vision in my head of a meeting taking place between IT professionals and upper-middle management at a technology-related company sometime in the 70's, maybe the 80's, as they are about to break ground on a new application.

Exec: "... Alright then, let's design and build it. Get to work."
IT Pro: "You know, if we are careful to keep these functions separate and generic, I think we could reuse them for other purposes later."
Exec: "You can... reuse them? Like, take code from one application and plug it into another? For free?"
IT Pro: "Absolutely. Like your golf clubs - you don't have a separate set of clubs for the 5 private courses you belong to, you take the same set of clubs to each one. A set of clubs is designed to be flexible and accommodating."
Exec: "Well this is a great idea. We need more of this. This is the norm from now on - I expect to see you start reusing everything."

Management was excited because their scorecard numbers were going to go up. The developers were excited because they got a mandate to build beautiful things. And thus was born the fallacy that internal code reuse is free, easy, and something we should be doing all the time.

As a learning exercise, I took one chunk of functionality from one of our applications I'm working on and set about making a crisp, clean, fully reusable WCF service out of it. I wanted to see the result from a purely technical standpoint - I didn't really care about the potential return on investment, I just wanted to see if I could work through the little fiddly things that make a service ugly and tightly coupled unless you clean them up.

From a technology point of view, I'm happy with the result and learned a lot about making a great service. It took a while, even though it only provides a very discrete bit of functionality, but it's shiny and beautiful. It's even got documentation. I can use what I learned in the future, even on services that I don't intend to be reusable, to make them clean and easy to use.

The real learning, though, took place after I finished, while I was reflecting on my sparkling work and started to ask myself what I now realize are the most important questions.

Who is ever going to need this functionality?
If they need it, how likely is it that they're going to find out about this code I wrote, even if they're in my same organization?
If they find out about it, how likely is it that they're going to want to spend the time reading the documentation to figure out how to use it? (From a more general perspective, it's a stretch to assume that there's documentation in the first place).
If they read the docs, how likely is it that they're still going to want to use it when they realize it's going to need additional features to support their needs?
If they still want to use it, how likely is it that they're actually going to jump through the organizational hoops to get access to the code, do development on my code to add features (or get me to do it) in a way that keeps it reusable (difficult and time-consuming), and ensure that the new version can be deployed without messing up the application that currently uses it?

All that for one small piece of functionality. That funnel is pretty narrow at the top and tapers to the width of a hair at the end. Reuse is supposed to be a best practice; it's supposed to make your arsenal of applications clean and organized, and reduce the amount of work you need to do. Why does reuse all of a sudden look so difficult and expensive in light of these questions?

The answer: Internal reuse trades one kind of technical work for another that's just as difficult if not more, and adds extra organizational work on top. The new work that you've bought yourself is the most nefarious kind: the kind that looks like it's free. The kind that ends up as estimate line-items with .5 hours because they have to be in the project plan, but they won't result in new deliverables, so they must require no effort. Even worse, this cost isn't only paid the first time a component is reused - it's paid every time it's reused. At least when it comes to reuse of internal code (as opposed to third party controls and frameworks), I believe that Not Invented Here syndrome is less of an unwillingness to adopt work from another culture and more of a subconscious rejection of this new work that we know we are generating, but can't quite put our fingers on.

The technical work being traded out is new development. Everyone knows that the best code is no code, or perhaps more appropriately in this case, code written by someone else who must be smarter than the herd of cats that is your current development team, so wiping new development off of the project plan looks great.

There's no such thing as a free lunch, though. First, let's look at the code we're thinking about reusing. To be worth reusing, code has to solve just about every aspect of a very distinct and common problem from just about every angle, be well encapsulated, be distributed in a way that supports reuse, and above all, it must be hard to create. Really hard. If you want to benefit from its reuse, solving the new technical and organizational problems that reuse introduces must be easier than initially creating it.

The only way code can hit all those criteria is if it's developed away from a project that's going to be using it. It has to be its own effort, and that effort has to involve looking at (and testing) lots of different scenarios and use cases, not just the one that your new app needs and that you think some other apps are going to need later. It doesn't have to start off that way, but it has to end up that way before it's truly reusable. Reusable components aren't parts of projects, they are projects. Unfortunately, developing a component in isolation that doesn't solve a full business problem, but might solve a part of multiple common problems, almost never looks like a good short-term investment, and thus almost never happens.

Now, without knowing yet if this old code you're looking at can really solve your problem, your team, Team ABC, now has the job of understanding and using someone else's software, arguably the second-most reviled task in development. To misquote jwz, "Some people, when confronted with a problem, think 'I know, I'll reuse some code we already have.' Now they have two problems." Unlike a lot of other industries, in software development, understanding and leveraging someone else's work is often actually harder than producing your own.

So, flush some design hours down the drain, and what's the result? The ABC designers come back and say, "This almost solves our problem. It needs features X, Y and Z." Now they've got to add features onto existing code someone else wrote, arguably the first-most reviled task in development, and certainly one of the hardest. This is where the organizational problems, which began well before your project was set in motion, start to clearly manifest themselves.

The code that ABC wants to add features to and reuse was written by Team 123. Team 123 isn't a permanent team; it was assembled for a project two years ago. Three of its members have moved on to other organizations. No one knows where the documentation is, how good it is, or if there was any to begin with. There are three code bases scattered throughout source control and no one's sure which one's the golden one. Furthermore, no one is sure which applications are using the component in question, and which quirks they rely on, so the potential for making breaking changes is high.

And this all assumes that Team ABC had heard of Team 123's code in the first place. In an organization of any size, unless you have a full-time "code reuse librarian" (you don't), the chances of this are virtually nil.

In the end, what it comes down to is that your problem probably isn't common enough or hard enough to warrant development of a reusable component, and you probably don't have the budget or time to put in enough work on something to make it reusable when it doesn't fully solve a business problem by itself. By all means, make your code beautiful. Design it in a loosely-coupled, encapsulated fashion, because it will be easier to maintain. Just remember that while reusable code is loosely coupled and well-encapsulated, loosely-coupled and well-encapsulated code isn't necessarily reusable. Reusability is a meta-feature, and if you want your code to be reusable, you have to design and build all of it around that. Don't try too hard to reuse code or create reusable code - focus on solving business problems.

-----

Side note on SOA: SOA aims to reduce the organizational problems caused by reuse. It does not address the technical ones, nor does it preclude the need to spend extra time developing truly reusable components. In fact, it mandates it - you can't have SOA without solid, reusable code.

One more side note: Interestingly, a Google search on "code reuse" without any adjectives or qualifiers pulls the following three articles on the first results page:

Obviously, most topics searched for anywhere on the internet will result in a mix of positive and negative opinions, but "code reuse" is often billed as a best practice, when clearly it causes a lot of problems. This is true for a lot of other practices billed as "best practices" as well, like unit tests and big-design-up-front, but the nature of code reuse can cause people to put it in a bucket with practices that are pretty tough to argue with, like loose coupling or encapsulation, when it really shouldn't be.

Thursday, May 20, 2010

Fiddler and WCF

If you've never heard of Fiddler, I strongly encourage you to give it a try. The interface is a bit overwhelming at first, but after you spend a couple minutes getting used to it, you'll realize how powerful it is. With a minimum of configuration (in most cases, just install it and run it), you can start capturing all the HTTP traffic that goes in and out of your machine so you can pull it open and look at it.

To get started, all you really need to know is that in the default view, individual requests fill up the list on the left. If you click on a request, the "statistics" and "inspectors" tabs on the right will fill up with interesting information. The most interesting tab tends to be "inspectors," which will show you the request on top and the response on the bottom. The different tabs, like TextView, HexView, Raw, XML, etc. will change the visualizer used to show you the HTTP stream.

If you're playing around with a WCF application, Fiddler is the fastest and easiest way to see exactly what's going across the wire. However, there are a few things you might need to know about configuring WCF so that Fiddler captures the traffic.

I was going to write a post that gathered up all the tips I had found, but fortunately, someone has already done that for me - unlike a lot of other articles I've read that mention WCF only in passing and stick to getting Fiddler up and running with browsers, Rick Strahl has an excellent post about all the finicky things you might run into when you're trying to get your WCF application traffic logged. Some of the information seems like maybe it's a bit out of date (for example, on Win7/VS2010/.NET4/Fiddler2, trying to use the "extra dot" trick with 127.0.0.1 to log traffic going to my Cassini debugging server wouldn't work, but "localhost" with an extra dot does, contrary to his post). In any case, the information in his post will at the very least point you in the right direction. Make sure to read the comments too, as there are some insights there as well.

My problem the other day that was driving me bonkers was that I had everything set up correctly, and by forcing use of a proxy I could essentially prove that my traffic was going through Fiddler (my client app would fail when Fiddler wasn't running but would succeed when it was), but the traffic wasn't showing up. Rick's post pointed out the easy to miss process filter at the bottom of the Fiddler window - mine had gotten flipped to Web Browsers and so it wasn't logging any traffic that wasn't from IE or Firefox.
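For reference, one way to force a client through a proxy from code is to set the default web proxy before making any calls. This is a sketch that assumes Fiddler is listening on its default address of 127.0.0.1:8888 and that your WCF binding still has UseDefaultWebProxy at its default of true:

```csharp
using System.Net;

public static class FiddlerProxySketch
{
    // Hypothetical helper: routes HTTP traffic from this process through
    // a local Fiddler instance. Call it before creating any channels.
    public static void RouteThroughFiddler()
    {
        // false = don't bypass the proxy for local addresses. Requests to
        // plain "localhost" or 127.0.0.1 may still skip the proxy, which
        // is what the "extra dot" trick works around.
        WebRequest.DefaultWebProxy = new WebProxy("http://127.0.0.1:8888", false);
    }
}
```

With this in place the client fails when Fiddler isn't running, which is a handy sanity check that traffic really is being routed through it.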

UPDATE: If you want to do the "localhost-dot" trick with Silverlight when debugging on Cassini, first change ServiceReferences.ClientConfig (or whatever code or configuration you need to) to point the client proxy at "localhost." instead of "localhost". Next, press F5 to load up the debugger. When your site loads, change "localhost" in the address to "localhost.". This will re-download and re-run the XAP from "localhost.", so you are free to contact the service and it will be logged through Fiddler. The other way to do it, without changing the browser address, is to add a clientaccesspolicy.xml to your Web project that grants the appropriate permission - Cassini will host this too, so when you run your XAP from "localhost" and it tries to contact services on "localhost.", Silverlight will find it.

Thursday, April 29, 2010

Fusion Logs

I wanted to test out a neat little library I whipped up over the past week on a fresh image of Windows 7: no Visual Studio and no applications that contain dependencies for my library (which I am able to provide individually and independently of the application, if necessary). I wanted to find out exactly which assembly references were needed and where they needed to be placed.

Fusion Logging to the rescue! Fusion is the .NET runtime's "assembly finder", and is responsible for finding the assemblies that your application needs to run, whether they are in the application executable's folder, an appropriately-named subfolder, the GAC, or wherever. Fusion is essentially "silent" by default but with a few tools and registry tweaks you can force it to be very vocal when it's looking for assemblies.

The first stop for most developers should be the Fusion Log Viewer, fuslogvw.exe. This is installed with the .NET SDK and you can find it easily on Win7 by typing "fusion" into the Start menu. All the log viewer does is provide a nice friendly interface over a few registry switches and folder locations where the logs are dumped to.

If you don't have the SDK installed, like I didn't on my fresh Win7 image, you can twiddle some bits in the registry to manually enable logging, redirect the logs to a location that's easier to find, and then manually investigate the logs yourself. Junfeng Zhang's blog post here has a great overview of the different registry values you can set to control logging.

One setting Junfeng does not mention is the HKLM/SOFTWARE/Microsoft/Fusion!EnableLog DWORD registry value. Junfeng says: "By default, the log is kept in memory, and is included in FileNotFoundException (If you read FileNotFoundException's document, you will find it has a member called “FusionLog“, surprise!)". However, if the EnableLog registry value isn't present and an assembly load fails, the FusionLog property will only contain a message that says you need to set EnableLog to 1 to see the load failure information. If you set EnableLog in the registry to 1, no log information will be written to disk, but the FusionLog property will show you what you want to see. A handy feature of FileNotFoundException is that if it is thrown due to an assembly loading failure, the message in the FusionLog property is included in the exception message.
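To see the FusionLog property in action, here's a small console sketch that deliberately tries to load an assembly that doesn't exist. The assembly name is made up for the example:

```csharp
using System;
using System.IO;
using System.Reflection;

public static class FusionLogSketch
{
    public static void Main()
    {
        try
        {
            // Hypothetical name; the load is expected to fail.
            Assembly.Load("Some.Assembly.That.Does.Not.Exist");
        }
        catch (FileNotFoundException ex)
        {
            Console.WriteLine("Could not load: " + ex.FileName);

            // Without the EnableLog registry value, this only tells you
            // how to turn logging on. With EnableLog set to 1, it lists
            // the locations Fusion probed.
            Console.WriteLine(ex.FusionLog);
        }
    }
}
```

Running it once before and once after setting EnableLog is a quick way to confirm that the registry change took effect.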

Flipping some or all of the aforementioned registry bits might be a good idea on a test or developer machine to help debug loading problems.

Thursday, April 22, 2010

BizTalk Assembly Reflection In a Child Appdomain

I'm just wrapping up some work I'm doing on a robust way to perform reflection on BizTalk assemblies (we need to be able to inspect BizTalk assemblies directly for a big efficiency-boosting project going on here) and wanted to share a few things I learned about reflection.

First, specific to BizTalk: investigating BizTalk assemblies with Reflector and then writing code to get information based on the types you can find with standard reflection will only get you so far. About as far as orchestrations and anything contained in them, as a matter of fact. While schemas, maps and pipelines get compiled to fairly simple objects that subclass from common BizTalk types, orchestration types and subtypes (port types, message types, etc.) have many more interesting properties and are generated by the BizTalk compiler in much more convoluted ways.

BizTalk ships with two assemblies, Microsoft.BizTalk.Reflection.dll and Microsoft.BizTalk.TypeSystem.dll, that can help you. They are largely undocumented and searches for them will only give two good hits, both on Jesus Rodriguez' weblog: here and here. According to Mr. Rodriguez, Reflection.dll is fairly robust and gives you a friendly interface through the Reflector class, and TypeSystem.dll blows the doors off and should give you access to pretty much every speck of metadata you can squeeze from a BizTalk assembly. I'm using Reflection.dll and I'm finding that it does everything I need, although it requires some experimentation: Mr. Rodriguez's posts are enough to get you started, but plan on spending some time playing in the debugger with some simple BizTalk assemblies figuring out how information is organized, particularly in the case of orchestrations. If I get a chance I'll make a post detailing a few things I found - I spent a good chunk of time discovering that Orchestrations are referred to as Services, and the direction of a port is indicated by the Polarity property on PortInfo (which is redundantly represented in a number of forms in the Implements, Uses, Polarity and PolarityValue properties).

The other thing I wanted to talk about is what happens when you load assemblies. Now, I'm not an expert regarding reflection, loading and handling assemblies, or Fusion, but the one basic thing you should know about loading assemblies is that you can't unload them. However, the logical container that you load an assembly into is an AppDomain, which can be unloaded. Your application runs inside an AppDomain and you can use the AppDomain class to spawn child AppDomains that you can use for reflection or whatever other nefarious purposes you like. If you load an assembly into your main application's AppDomain by simply doing something like Assembly.Load, that assembly will be loaded into memory until the application is terminated. This also locks the assembly, unless you use something called shadow copying, which I won't get into here, but Junfeng Zhang has a great blog post about it here. Even the ReflectionOnly load methods will lock the assembly and load some data into memory that you can't get rid of until the AppDomain gets trashed. For my purposes, this was bad news, because we are doing the reflection in an IIS-hosted service that can live for quite a long time, and the requirements for the application include the ability for users to modify their BizTalk assemblies at will.

The answer, of course, is to perform reflection inside a child AppDomain, pull the data out in a form that doesn't require the reflected assembly itself, and then trash the child AppDomain when you're done. Creating and unloading AppDomains and injecting code into them is fairly simple and is covered pretty well in a couple of blog posts by Steve Holstad and Jon Shemitz, here and here respectively. Mr. Holstad's post has sample code that you can download and use to get started. If you look at Microsoft.BizTalk.Reflection.Reflector inside of Reflector (that's a lot of reflecting), you'll see that it simply does a LoadFrom on the file path you give it, so as long as you get it running inside a child AppDomain, you'll be good to go.
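The general shape of the child AppDomain approach looks something like the sketch below. It uses plain reflection rather than the BizTalk Reflector class, and the TypeNameCollector type is my own stand-in for whatever code you want to run in the child domain:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Runs inside the child AppDomain. Deriving from MarshalByRefObject means
// the parent domain talks to it through a proxy instead of copying it.
public class TypeNameCollector : MarshalByRefObject
{
    public string[] GetTypeNames(string assemblyPath)
    {
        // The assembly is loaded (and locked) in the child domain only.
        Assembly assembly = Assembly.LoadFrom(assemblyPath);

        // Return plain strings so the parent domain never needs the
        // reflected assembly itself.
        return assembly.GetTypes().Select(t => t.FullName).ToArray();
    }
}

public static class ChildDomainSketch
{
    public static void Main(string[] args)
    {
        // Reuse the parent's setup so the child can find this assembly.
        AppDomain child = AppDomain.CreateDomain(
            "ReflectionDomain", null, AppDomain.CurrentDomain.SetupInformation);
        try
        {
            var collector = (TypeNameCollector)child.CreateInstanceAndUnwrap(
                typeof(TypeNameCollector).Assembly.FullName,
                typeof(TypeNameCollector).FullName);

            foreach (string name in collector.GetTypeNames(args[0]))
            {
                Console.WriteLine(name);
            }
        }
        finally
        {
            // Unloading the domain releases the assembly and its file lock.
            AppDomain.Unload(child);
        }
    }
}
```

The important design choice is what crosses the boundary: only serializable data (here, a string array) comes back to the parent. If you return a Type or Assembly object instead, the assembly gets loaded into the parent domain too and you're back where you started.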