Wednesday, August 8, 2007

Easing the pain of BizTalk Mapper

I hate the BizTalk Mapper.

I love BizTalk. I think it's an intriguing platform. Developing for it is satisfying - it's fun to see everything come together and work. I have enjoyed simultaneously learning about BizTalk and learning more about the paradigm of XML structures and how to manipulate them with XSLT and the .NET Framework. For the most part, the BizTalk tools work great. It takes some practice, but after a while one can become very good at identifying issues and fixing them quickly.

The Mapper brings the whole BizTalk Server platform down a few notches. It's terrible. Essentially what Microsoft has tried to do is implement a happy GUI over XSLT design, and include some nifty little reusable code snippets at the same time. Unfortunately, that happy GUI is horribly non-transparent (and I don't mean Aero transparent) and hard to use, and it generates crap XSLT that is just plain wrong. The worst part is that you can't even clearly see what it's doing unless you make a few backflips through flaming hoops and essentially revert to using XSLT 1.0, which is what the Mapper is based on and what you should be using in the first place.

Let me back off of that a bit - the interface is on the right track, it just needs to evolve about four generations before it's usable for enterprise-size projects, and it needs to provide a real-time view of what is happening with the XSLT underneath and let you fix what it occasionally, but so horribly, screws up. I envision something like an HTML editor that gives you WYSIWYG and code at the same time. If you poke around on the Internet, you'll notice that most examples you'll see of the Mapper in action are on bite-sized schemas, illustrating the use of one particular functoid. There are two reasons for that:
  1. If the sample schema was any larger you wouldn't be able to figure out what on Earth was going on. I've got enterprise maps that look like a tangled mess of Christmas lights, complete with colored functoids glowing brightly, saying "Just TRY to figure out what I'm linked to!"
  2. If the map was any more complex, the resulting XSLT would be fundamentally broken in at least one place.
My advice to you is to learn XSLT if you don't know it. The Mapper doesn't fully abstract it for you - odds are you'll have to dive into it to see what the map has goofed up.

/end rant

With that out of the way, I'd like to discuss just how to go about diving into the Mapper and discover more about what's going on.

As I stated above, the Mapper is based on XSLT 1.0 (I point out 1.0 for a reason: 2.0 has lots of useful functions, but they are not available to you. You might be able to change this but I haven't confirmed it yet). As in, when you compile a BizTalk map, what it spits out is an XSLT that accomplishes what you see on the map grid. It has some neat Microsoft-added trimmings as well - all of those functoids are written in C# and are represented as script in the XSLT. When the transform is running inside of the Microsoft engine, it can call into the C# and run it. Writing code in C# can be much easier than writing XSLT templates if you're not familiar with XSLT (and often even if you are), so I'll discuss how to take advantage of that as well.

Unfortunately, what you see on the map grid isn't always so obvious. Just exactly how does that stupid looping functoid work anyway? Why on Earth isn't that logical functoid returning the value you think it should? The map has no notion of a debugger, it's fire-and-forget - in this case, forget about trying to figure out what it's doing. There is a better way!

Once you've got a map wrapped up, right-click it in the Solution Explorer and click Validate Map. This will essentially compile the map and ensure that it has no errors. When it's finished, there will be a couple of lines near the bottom of the Output window - one has a link to the "extension object" XML (ignore this) and the other is a link to the output XSLT. Copy the XSLT link (don't open it, as it will open it in Internet Explorer view, i.e. "useless view") and do a File > Open on it to open it in an XSLT editor. The XSLT that will appear is your map.

Now for the great part - if you didn't already know, Visual Studio 2005 includes a real-time XSLT debugger. It's one of the greatest XSLT tools I have ever used, and just like .NET code, it can be very educational to plod along with the execution and see what's not working as expected. In fact, it's even more useful than it is for .NET code - even though the Intellisense for XSLT works great, it won't tip you off to certain kinds of typos, and watching carefully for nonsensical values will help you find those kinds of errors. To run the XSLT debugger, mark your breakpoints just as in a C#/VB.NET file, provide an XML document as input with the Input field in the Properties pane, and select "Debug XSLT" in the XML menu.
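If you'd like a sandbox to try the debugger in before pointing it at a full map, a minimal sketch like the following will do (the element names and the transform itself are made up for illustration). Save an input document such as <Root><Field1>hello</Field1></Root> to a file, set it as the Input property, put a breakpoint on the xsl:value-of line, and choose Debug XSLT:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Matches the document root and emits a single Output element -->
  <xsl:template match="/">
    <Output>
      <!-- Breakpoint here: inspect Root/Field1 in the Locals/Watch windows -->
      <xsl:value-of select="Root/Field1/text()"/>
    </Output>
  </xsl:template>
</xsl:stylesheet>
```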

There you have it - real-time map debugging. But what else can we do with this knowledge? For starters, if you are having difficulty expressing something in a map and you'd rather just do it in XSLT (especially if you see that the map is just soooo close for one particular piece, but if you could just change that one line...), no problem. Drop a script functoid in your map and set the script type to Inline XSLT. Stick whatever XSLT you want in there and it will get dropped into the resulting map at the insertion point corresponding to the target node you link it to. If you do this, I recommend compiling the map first, then writing your XSLT right there in the map's XSLT output - this way you get the advantage of Intellisense and a sane editor control, you can see just what it'll look like once you drop it in, and you can test it right then and there by doing a Debug XSLT. Once you're done, copy your code out and drop it into a script functoid. Is the logic of the whole map screwed up? Click on the map grid and use the Custom XSL Path field in the Properties pane to replace the map altogether with some custom XSLT.
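As a sketch of the Inline XSLT approach, suppose you want a target element that the Mapper just won't produce the way you want. The contents of the script functoid might look something like this (all of the element names here are hypothetical - the point is just that whatever you write is emitted verbatim at the linked target node):

```xml
<!-- Emits FullName only when the source Status field is "Active" -->
<xsl:if test="Customer/Status/text() = 'Active'">
  <FullName>
    <xsl:value-of select="concat(Customer/FirstName/text(), ' ', Customer/LastName/text())"/>
  </FullName>
</xsl:if>
```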

What else can you do with this? I mentioned earlier that the functoids are actually C# code represented as script inside the XSLT. This is actually pretty cool, as there are some things that are expressed much more easily and cleanly with C# than with XSLT. Create a map with some functoids and take a look at its XSLT. Down at the bottom, you'll see this:

<msxsl:script language="C#" prefix="userCSharp">

Everything within this tag is C# enclosed in a CDATA section. More precisely, it is C# script; the code is just a collection of methods not enclosed in a class, although you can create instances of classes within any code you put there (only a handful of common namespaces have been imported - to use types from any others, you'll have to fully qualify them, as you can't use using statements here). If you want to write your own C# (or JScript.NET or VB.NET - script tags exist for those languages, and you can have more than one language within the XSLT) code to stick in here, there are a couple of ways to do it:
  • If you are replacing the whole map with your own XSLT, just plop it in there, exactly how you see it done in the map.
  • Otherwise, you'll need to introduce it via a script functoid. Create a new script functoid with a .NET language type and put your method(s) in it. All methods within a given .NET language must have unique names - even if two methods with the same name have different signatures, only one will make it into the XSLT. If you happen to use the same name and parameter list as a method that the map compiler uses to achieve the functionality of a functoid, the build operation will fail with an error stating that the method is already used. Note that a script functoid doesn't necessarily have to have inputs or outputs - it can just sit there and contribute code to your XSLT.
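To make the script-functoid option concrete, here's roughly what a C# script functoid's contents end up looking like inside the compiled XSLT (the method and its names are made up; real maps will also contain the methods the compiler generates for any functoids you used):

```xml
<msxsl:script language="C#" prefix="userCSharp">
  <![CDATA[
  // Plain methods, not wrapped in a class. Fully qualify types from
  // namespaces that aren't imported, since "using" isn't available here.
  public string FormatName(string first, string last)
  {
      if (first == null || first.Length == 0)
          return last;
      return last + ", " + first;
  }
  ]]>
</msxsl:script>
```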
Once your code is in there, how do you call it? Again, take a look at a map's output XSLT to see. The following expression, used in any XSLT attribute that accepts expressions, calls the MyFunction method with two arguments, the empty string and the text value for the //Record/Field1 node:

userCSharp:MyFunction(&quot;&quot; , string(Record/Field1/text()))

Very, very nice! Plus, when you are debugging your XSLT, the debugger will step through this code as well!
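For context, such a call usually lives inside a select attribute on the element that produces the target node - something like this (node names hypothetical):

```xml
<NewField>
  <xsl:value-of select="userCSharp:MyFunction(&quot;&quot;, string(Record/Field1/text()))"/>
</NewField>
```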

As you can see, not all is bad about BizTalk's mapping system... as long as you stay away from the Mapper tool itself for all but the simplest transforms. Some might argue that XSLT is harder to maintain, but I personally think the maps are tougher. One small change in the map can set off a cascade of changes in the compiled transform, some of which are impossible to figure out without looking at the XSLT itself, so you might as well work in XSLT to begin with. Sometimes a map can be a great way to get started with an XSLT. I suggest playing around with the Mapper, making some simple connections using out-of-the-box functoids, and seeing how they behave in XSLT. Try fiddling around with your schema as well (change the Max Occurs of some of your nodes and marvel at your map that no longer works) to see how the map compiler reacts.

Thursday, August 2, 2007

BRE String comparisons

Another short interjection with another annoying lesson learned (after WAY too much time trying to figure out what was going wrong): when the BRE does a string comparison with "is equal to", the comparison is case-sensitive. This can be a hard thing to see when you're trying to pinpoint a problem among pipeline execution, orchestrations, adapters, etc., and the BRE rule execution tracking is a pain to read.

Pipeline components - You can GAC them, too

I'm interrupting my three-parter with a short post regarding something I learned just the other day. I was originally under the impression that in order to deploy a pipeline component, it must be placed in the Pipeline Components folder of the BizTalk installation directory. Apparently, this was the rule in BizTalk 2004, but it has since changed - pipeline components can now be GACed as well.

It's my understanding that the reason the Pipeline Components directory exists is because developers need a standard place to put their components after they develop them so they can be easily added to the Visual Studio toolbox and subsequently used in new pipelines. It's also handy because you can easily deploy a debug version of a component there and step through the code in the debugger once the pipeline runs by attaching the VS debugger to the BTSNTSvc service corresponding to the host instance running the pipeline.

As Stephen Thomas points out here, make sure you know and understand where the component is going during development, where it's going in deployment, and that those two strategies mesh. He points out a scenario he ran into in which he built a pipeline component and added it to a pipeline before strong-naming it and GACing it. When he tried to deploy, the operation failed because it couldn't find the pipeline component.

Wednesday, August 1, 2007

On BizTalk, Assemblies and Deployment, pt. 2

This is the second article in a multi-part series about assemblies and how they relate to BizTalk. This part discusses deploying assemblies to BizTalk and where things need to go in order to work correctly.

The nature of BizTalk necessitates the existence of a special type of assembly, called a BizTalk assembly. A BizTalk assembly is a class library (.dll) assembly just like many other assemblies, but it has some special metadata in it that BizTalk uses for its own purposes. A BizTalk assembly is what's created when you build a BizTalk project in Visual Studio (as opposed to a standard C# class library project). If you've ever worked on a BizTalk project in VS, you'll know that unlike a standard class library or executable project, there are only a few types of resources you can add - schemas, pipelines, maps and orchestrations. These constructs are unique to BizTalk and are not fully understood or recognized by the rest of the .NET runtime or the GAC.

To deploy a BizTalk assembly, you must do two things: deploy it to the GAC, and register it within BizTalk. GACing an assembly is simple, and can be done with gacutil (discussed in pt. 1). Note that since a BizTalk assembly must be GACed to be used, it must be strongly-named. To strongly name a BizTalk assembly, first create a .snk (strong-name key) file using the sn.exe utility that ships with the .NET SDK (the path is already present in a Visual Studio Tools command prompt; the path to it on my machine is C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin\sn.exe). The syntax is "sn.exe -k MyKeyFile.snk". Place the key in a location easily referenced from your project folder. In your BizTalk project in Visual Studio, open the project properties (right-click the project and select Properties). Click "Assembly" in the left pane, scroll down to "Assembly Key File", and enter the path to the .snk file or use the ellipsis button within the text field to browse to it. Click OK when finished - now the assembly that will result from building the project will be strongly-named.

Registering the assembly in BizTalk is required so BizTalk "knows" about it. An assembly is not registered directly from the GAC, it must be registered from a file on your machine. For this reason, you should treat your BizTalk assemblies like you treat program files for other applications - give them a folder somewhere where they aren't going to get moved or deleted. When you register a BizTalk assembly, BizTalk reads its metadata to discover what kinds of BizTalk artifacts (schemas, orchestrations, pipelines, maps) are present in it. Registration can be done via the command line, the admin console or Visual Studio. I will walk through the admin console and VS procedures - deploying from the command line is convenient for build/deploy scripts and the like, but if you're new to developing for BizTalk, odds are you'll be doing this manually from a GUI for a while.

To deploy a BizTalk assembly via the admin console, expand the application that you want to add the assembly to, right click the Resources folder and click "Add > BizTalk Assemblies..." Clicking the "Add" button in the next dialog will allow you to search for any file to add as a resource. Anything can be added as a resource to a BizTalk application, but what we are interested in right now is BizTalk assemblies, so dig up a .dll from a BizTalk VS project and add it. The File Type drop-down box should show System.BizTalk:BizTalkAssembly. Everything below this drop-down box is generally only relevant to the creation and usage of BizTalk MSI files, which I will cover in a later post. However, one of the checkbox options that will appear, "Add to the global assembly cache on add resource (gacutil)" will GAC the assembly for you once you click OK if you haven't GACed it yourself with gacutil.

To deploy a BizTalk assembly via Visual Studio, first configure the deployment options in the Deployment page of the project properties. Server and Configuration Database are required. Leaving Application blank will attempt to deploy to the default application; otherwise, you may enter the name of a BizTalk application to deploy to. Redeploy should generally be set to True, as should Install to Global Assembly Cache (this will GAC it for you). If you are working in a local development environment, go ahead and set Restart Host Instances to True as well - restarting the host instances is always a must when deploying or redeploying assemblies, it's just that you may want to wait for an opportune time to do it, and that may not be build/deploy time. Once you have filled out this page, selecting Deploy Solution or Deploy Project from the Build menu will attempt to deploy the assembly/assemblies to BizTalk for you, quick and easy.

If you are redeploying an assembly that already exists in BizTalk and the GAC, you generally need to follow the same procedure as when you deployed it the first time unless the version number has changed. Just like the GAC, multiple version numbers of the same assembly can co-exist within BizTalk. The components of those assemblies will exist as separate entities and are distinguished by the version number of the assembly in their full name. If the version number of your assembly has not changed, simply deploy it from Visual Studio with Redeploy set to True, or manually reGAC it and re-add it as a BizTalk Assembly resource from the admin console and make sure Overwrite is checked. Note that simply replacing the .dll on disk and in the GAC is not enough - it must be re-added to BizTalk's resources so it can read the assembly metadata again.

If the version number has changed as a part of development and you don't want the versions to co-exist (i.e. you want one definitive version in your environment to avoid confusion), you need to remove the old assembly by right-clicking it from the Resources screen of your application and clicking Remove. For this to succeed, none of the assembly's components can be used as configuration items within any of your binding objects (ports, orchestrations, etc.). The admin console will warn you and fail the operation if this is not true. Note that removing the resource from BizTalk does not unGAC the assembly: to keep your GAC clean, I suggest removing the assembly from it if you are no longer using it.

That wraps up part two of the deployment posts. In the third and last post I will discuss MSI files, binding files and assembly references, as well as pipeline components, which are the exceptions to all of the rules.

On BizTalk, Assemblies and Deployment, pt. 1

This is the first article in a multi-part series about assemblies and how they relate to BizTalk. This part covers what you need to know about assemblies in general in order to understand what they do and how to deploy them properly. I don't endeavor to cover every detail of .NET assemblies and how they are put together, but this hits most of the big topics. It's a bit long, so feel free to just skim.

The concept of assemblies and the GAC can be very confusing to a BizTalk developer who hasn't interacted with these ideas in .NET before. Fortunately, if you don't want to spend the time, there's no need to have a complete, detailed understanding of what assemblies are and how they work. By grasping a few simple ideas, working with assemblies in BizTalk becomes very easy, and gives you great insight into how BizTalk and .NET actually work with assemblies.

To get a good overall idea of what an assembly is, Wikipedia has a fairly descriptive and high-level article on .NET assemblies.

Here's the short-short version: an assembly contains your compiled code, resources, and some metadata about that code and those resources. That's about it. Typically, a "project" in Visual Studio (which can consist of multiple code files, resources, references to other assemblies, etc.) equates to an assembly - a single .dll or .exe file. When you build that project, it compiles your code, puts all of those other resources together in a big pile, and stuffs it all into a .dll or .exe. Additionally, it creates an "index" of machine-readable metadata that is incredibly useful - it contains all the information about what callable methods are in that code, it powers Visual Studio's IntelliSense functionality, and it generally provides enough information to let everything outside of the assembly know what it does - just not how exactly it does it.

Whenever you create any kind of application, all of the code and resources used to run it are in one or more .dll or .exe files. If you make a lot of little one-off "toolbelt" applications that are contained in a single Visual Studio project, the .exe that you get after you build has all of the code in it. If you were to spread the code over multiple projects and add the appropriate references within those projects, then when you ran the .exe and did something in your application that required code in one of those other assemblies, the assembly would have to be in the right place or the application would throw an exception. What is "the right place" for an assembly? I'll discuss that momentarily.

The next important bit about assemblies is how they are named. A name might seem trivial, but it contains a lot of information and is a guaranteed unique identifier for the assembly. The full name of an assembly has four parts: The short name, the version, the culture, and the public key token.
  • The short name is typically the name of the assembly without the file extension, e.g. NW.Applications.MyAssembly. This short name is also typically used as the namespace of all of the modules in the assembly - a namespace is simply something that helps to uniquely identify a module. There's lots of code in the universe, and it's highly likely that for every module, there's another module out there with the same name - let's take a hypothetical module called NumberCruncher. The namespace is like a surname that gets tacked onto it, usually containing a company name or a product name, that guarantees that those two modules with the same name are still unique - NumberCruncher in the NW.Applications.MyAssembly namespace is different from the NumberCruncher module in the ABC.Software.EnterpriseApps namespace. For this reason, many assembly names are like my sample one above: multiple tokens separated with dots, each token representing some kind of arbitrary hierarchy that's unique to my company or organization.
  • The version number of the assembly, presented as four numbers separated by dots, e.g. 1.0.0.0. The names for each of those values are "major version," "minor version," "build" and "revision." Major version is typically used to signify the product version: Is this SuperToolbox 2 or 3? Minor version represents things like service packs or patches. Build number is typically incremented by the developer every time the assembly is built - this number is often represented by four digits (filling in zeroes if necessary) because builds happen thousands of times. The last number can be used for things like hot fixes. Some developers may increment the last two numbers based on the date and time the build was completed. Here's the important bit about versions: Two versions of the same assembly can co-exist within the GAC (keep reading), and are unique! Note that an assembly has an "assembly version" and a "file version." The one that really counts here is the assembly version. These two version numbers don't always have to be the same: one way to manage version numbers is to only increment the file version while you are developing. This helps avoid confusion with version numbers when you keep redeploying your code for testing - the assembly version always stays the same, but the file version (which does not make an assembly unique, but can still be viewed) can be used to determine exactly what build you are using.
  • Culture provides information about the language that the assembly is presented in (human language, not programming language). This will typically be "neutral."
  • Public key token: a 16-character hexadecimal string, the public half of a public-key cryptography pair. I won't discuss the details of how public-key cryptography works here, but the short and long of it is that only people who have the private half of the key can generate assemblies that have that public half of the key. This token essentially ensures that the assembly has come from a certain author and is guaranteed authentic. An assembly does not have to be given a public key token. If it has one, it can be called a "strongly-named assembly." Assigning a strong name is done in Visual Studio using a .snk file, which can be generated by using a Visual Studio command-line tool. An assembly must be strongly-named to be placed in the GAC.
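For what it's worth, the assembly-version/file-version split described in the version bullet above is controlled by two attributes, typically found in a project's AssemblyInfo.cs (the values here are purely illustrative):

```csharp
using System.Reflection;

// Part of the assembly's identity - this is what the GAC and the
// .NET loader key off of.
[assembly: AssemblyVersion("1.0.0.0")]

// Informational only - safe to bump on every build during development;
// visible in the file's Properties dialog, but not part of the full name.
[assembly: AssemblyFileVersion("1.0.0.4217")]
```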
So... what's the GAC? The GAC is the Global Assembly Cache, a universal repository of strong-named assemblies on your machine. Any assembly placed here is globally accessible and can be referenced easily and shared by multiple applications. An assembly can co-exist here with other assemblies that have the same short name, so long as they have different versions. Try browsing to C:\Windows\Assembly (the Assembly folder in your Windows install directory, wherever that might be): what you will see isn't actually a folder on your disk, but a specially-crafted view of all the assemblies in the GAC. There's a folder structure under there, but it's generally irrelevant to human beings. Try right-clicking on a few assemblies and click Properties to see interesting info about them.

This is where I get to the part about where assemblies need to be placed so they can be used. The part of .NET that finds assemblies when it needs them is called Fusion. If you are running an application that has references to other assemblies and it needs one of those other assemblies, Fusion kicks in and looks for assemblies in the following places in this order (this is direct from the .NET Assembly article on Wikipedia):
  1. If the assembly is strongly named it will first look in the GAC (your app knows if the assembly is strongly named because it captures this information from the assembly when you add a reference to it in your project).
  2. Fusion will then look for redirection information in the application's configuration file. If the library is strongly named then this can specify that another version should be loaded, or it can specify an absolute address of a folder on the local hard disk, or the URL of a file on a web server. If the library is not strongly named, then the configuration file can specify a subfolder beneath the application folder to be used in the search path.
  3. Fusion will then look for the assembly in the application folder with either the extension .exe or .dll.
  4. Fusion will look for a subfolder with the same name as the short name (PE file name) of the assembly and then looks for the assembly in that folder with either the extension .exe or .dll.
So, as you can see, you can essentially put an assembly anywhere as long as you configure your application properly. However, for ease of use and universal understanding, most people will either GAC their assemblies or put them in the application folder. Ah, I almost forgot to mention how to GAC an assembly: use gacutil.exe (the path should already exist in a Visual Studio Tools command prompt; on my machine it's located at C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin\gacutil.exe) with the /i switch and the path to the assembly to GAC. You can use /if to "force" the installation, for example if the assembly already exists and you are reinstalling.
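Step 2 in the list above (redirection via the application's configuration file) looks roughly like this in an app.config - the assembly name, token and versions are illustrative only:

```xml
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="NW.Applications.MyAssembly"
                          publicKeyToken="0123456789abcdef"
                          culture="neutral" />
        <!-- Requests for any 1.x version get version 2.0.0.0 instead -->
        <bindingRedirect oldVersion="1.0.0.0-1.9.9.9" newVersion="2.0.0.0" />
      </dependentAssembly>
      <!-- For non-strongly-named assemblies: extra subfolders to probe -->
      <probing privatePath="libs" />
    </assemblyBinding>
  </runtime>
</configuration>
```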

That's it for the lecture on assemblies. Next post I'll talk about how assemblies and the GAC work with BizTalk - how to deploy and redeploy assemblies, what needs to be GACed and what doesn't, where to put things, etc.

Tuesday, July 31, 2007

Refreshing the BRE Facts cache

In my first post I talked a little bit about the FactRetriever object and how it works in concert with a BRE policy. Use of a FactRetriever can help you manage an in-memory cache of a table so that the BRE doesn't have to constantly hit a database for information.

After much debugging, stress, and some research on the part of a coworker, I discovered a crucial bit about how the BRE operates: If a fact (such as a DataTable within a DataSet, or an XML document) is asserted into memory and you wish to refresh that cached version, you must explicitly tell the BRE to retract the fact from memory before asserting it again. Failure to do this will lead to some really strange behavior that will have you debugging in circles.

Depending on your environment and the type of fact you have asserted, there are a couple of ways of doing this. You can call RuleEngine.Clear(), but this halts all BRE execution, cancels all rule firings, and removes all facts from the cache - it clears everything from the BRE's memory. I'm sure there's a situation where this is desirable, but the odds are that it's not yours.

What you're probably looking for is RuleEngine.Retract() or RuleEngine.RetractByType(). I believe that in the case of a DataTable or DataConnection, these two methods accomplish the same thing, since a DataTable or DataConnection stored in the BRE memory is unique and there can only be one instance of it. Since our goal was to retract a DataTable, we didn't do a lot of research into retracting other types of facts.

Using RetractByType() took a little while to figure out: there is no documentation out there anywhere that we could find, and no extended Intellisense documentation. It took a little experimentation and some creativity to figure out how to do what we wanted. RetractByType() takes a single argument, a FactType object. FactType is abstract, but has a couple of descendants, one of them being DataRowType. DataRowType's constructor takes two strings: the name of a table, and the name of a dataset. These values are what define a "table type" within the BRE: If you have a table with the TableName property set to "ABC" inside a DataSet with the DataSetName property set to "XYZ", that's the only "ABC"/"XYZ" table that can be in the BRE's memory at one time. The BRE keys off of those name values when you make references to the table within rules or vocabularies.

In order to create a type identifier so that the BRE can retract your fact for you, all you have to do is create a new DataRowType with the table name and DataSet name correctly specified, and hand that object to RetractByType. The following is an example - this is the first code that runs once the FactRetriever has determined that the cache does indeed need to be refreshed:

if (null != factsHandleIn)
{
    engine.RetractByType(new DataRowType("MyTableName", "MyDataSetName"));
}


The reason that I check to see if factsHandleIn is null is because my FactRetriever returns a value as factsHandleOut that gets handed in again as factsHandleIn on the next call. If factsHandleIn is null, that means this is the first time the FactRetriever has run since its host instance was restarted, so there's nothing in memory to retract anyway.

Mixed in with some logic that determines when the cache should be refreshed, you can use this control over retracting/asserting facts to keep an efficient cache in the BRE's memory. Our FactRetriever uses DateTimes and values in the registry to determine if the cache needs to be refreshed - it can be set to refresh automatically on an interval (run the policy and record the time; if the policy runs again and the span between the last refresh and now is more than X seconds, refresh again) or by a user override (if a user edited the table, he can set a registry value to DateTime.Now, and the next time the policy runs it will see a newer value than the last one stored in that registry node and will refresh).

Tuesday, July 24, 2007

How to examine your messages in BizTalk

This tip is more for the BizTalk newbies than the seasoned veterans (a group that I don't include myself in, by the way). If you need to get a good look at a message in BizTalk that's hit the MessageBox, including all of its context properties, simply create a Send Port that subscribes to the message and set it to the "Stopped" state (not "Unenlisted").

BizTalk 2006 send ports have message queues attached to them. The filters that subscribe the send port to messages in the MessageBox actually subscribe the queue to those messages, and the send port then processes messages off of the queue. If you unenlist a send port, its message queue gets shut down and any subscription information becomes invisible to BizTalk. However, if you only stop the send port, the queue and its subscription remain active. Any messages that reach the queue become suspended and will automatically be resubmitted and subsequently processed if the send port is turned back on.

So, subscribe to your message, stop (not unenlist) the send port, run a message through, and then you'll be able to find your message in the suspended instances view. From there, you can examine the message and its properties in detail.

Monday, July 23, 2007

Unicode and BizTalk

The BizTalk 2006 disassemblers will choke on UTF-16 files that don't have byte order marks. What makes this behavior really strange, especially for users of BizTalk 2002, is that BizTalk 2002 accepted those files with or without the marks. It's my understanding that the 2002 behavior was actually a bug.

If you're trying to figure out why your files aren't agreeing with BizTalk, take a look at the encoding and see whether that FF FE (or FE FF) is present at the beginning of your files. A custom component placed in your receive pipeline before the disassembler that slips those two bytes in can easily correct the problem.
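The fix itself is tiny. Here's a sketch of the byte-level operation such a component would perform; in BizTalk this logic would live in a custom pipeline component's Execute method working against the message stream, so the function name and the default of little-endian are just assumptions for illustration.

```python
# FF FE = UTF-16 little-endian BOM, FE FF = UTF-16 big-endian BOM.
BOM_LE = b"\xff\xfe"
BOM_BE = b"\xfe\xff"

def ensure_utf16_bom(data: bytes, little_endian: bool = True) -> bytes:
    """Prepend a UTF-16 BOM if the data doesn't already start with one."""
    if data.startswith(BOM_LE) or data.startswith(BOM_BE):
        return data                      # BOM already present; leave it alone
    return (BOM_LE if little_endian else BOM_BE) + data
```

You'd want to know (or detect) the endianness of your source system's files before choosing which mark to prepend; prepending FF FE to big-endian data would make things worse, not better.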

Debugging pipeline components in Visual Studio

I am not a Jedi of the Visual Studio Debugger, so I thought this was pretty magical when I found out how to do it.

To step through the code of a custom pipeline component in real time in the Visual Studio debugger, simply do the following:
  • Compile the component as Debug. Make sure you've got breakpoints appropriately set.
  • Place both the DLL and the PDB file in the Pipeline Components folder (as with all pipeline components, you don't need to GAC anything).
  • Configure a receive location/send port with a pipeline that uses the custom component.
  • Open the component solution in Visual Studio, and in the Debug menu, select Attach to Process. Attach to the process that represents the host instance your port is running on (BizTalk host instances run as BTSNTSvc.exe; if more than one is running, I don't know of a way to tell which process belongs to which host instance).
  • Once attached, trigger the component by running a file through BizTalk.
As soon as the component loads, control will transfer to the debugger. You are now controlling the execution of the component in real time. This is a godsend when trying to figure out exactly how BizTalk is interacting with your custom component.

Why is Load being called twice on my pipeline component?

If you write a custom pipeline component and follow its execution through the debugger, you may be surprised to find that sometimes (depending on the component's configuration), the Load method may be called twice. This is confusing behavior at first until you start looking at the values that are being pulled out of the PropertyBag that's handed in to Load.

If the design-time configuration of your pipeline (the configuration items you can set on the component within the pipeline from the Send Port/Receive Location configuration dialog) consists of only default or only non-default values, Load will only be called once. If you have more than one design-time property and some are set to default values while others have non-default values, Load will be called twice. Each call will have a distinct PropertyBag - the first will contain the default values, and the second will contain the non-default values.

Just to clarify, the "default value" is the one you provide in Visual Studio in the Properties pane when you place the component into a pipeline. When setting values at design-time, default values show up as normal text while non-default values will appear in bold.

Here's the part that can trip up your code if you don't plan for it: the PropertyBag that contains the default values contains nulls for all the non-default values, and vice versa. If your component has more than one design-time property, make sure it doesn't choke on nulls, because you are guaranteed to get some. A simple approach is to give the component a Dictionary member (or some similar container) to hold the design-time property values. In your Load method, when a property comes back null, keep whatever value is already stored in that Dictionary instead of overwriting it.
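To make the two-call behavior concrete, here's a minimal sketch of that Dictionary approach. The real component is a .NET class implementing the pipeline component interfaces and reading from an IPropertyBag; the class name, the property names, and the dict-based bag here are all hypothetical stand-ins that just model the rule "never clobber a stored value with a null."

```python
class PipelineComponent:
    # Illustrative design-time property names, not real BizTalk properties.
    PROPERTY_NAMES = ("Delimiter", "Encoding")

    def __init__(self):
        self.properties = {}

    def load(self, property_bag):
        """May be called twice by BizTalk; each bag can hold nulls."""
        for name in self.PROPERTY_NAMES:
            value = property_bag.get(name)
            if value is not None:
                self.properties[name] = value    # keep prior value on null
```

After both calls, the component holds the complete merged configuration regardless of which bag carried which value.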

Oh, and speaking of default values for pipeline components: you can reset all the values to their defaults by selecting a different pipeline for the Send Port/Receive Location and then selecting the original pipeline again. That's the only way I know of to reset the values.

What are FactsHandleIn and FactsHandleOut?

Execution within the BizTalk 2006 BRE follows an interesting paradigm. It took me a while to sort it all out as I'm not familiar at all with rules-based programming or anything of the sort. One of the things that tripped me up the longest was the notion of a FactRetriever.

A FactRetriever is a class that attaches to a policy and is in charge of making sure that when the policy runs, it has all of the facts that it needs in memory. Use of a FactRetriever allows fine control over when and how things are done - instead of just letting the BRE connect to a database, you can craft DataSets however you'd like (build it in memory, call a stored procedure, etc.) and assert those into memory as facts.

The heart of the FactRetriever is the UpdateFacts method, which is called every time the policy it is attached to runs. One of its parameters is object FactsHandleIn, whose name doesn't make its purpose very obvious, and UpdateFacts returns object FactsHandleOut. These two objects give you a lot of flexibility in how you control fact retrieval.

It's this simple: The first time the policy is called, when UpdateFacts is called, FactsHandleIn is null. Whatever you choose to return as FactsHandleOut is passed in as FactsHandleIn when the policy is called again. If the service/host instance/machine/etc. is ever restarted, a null will be passed in again on first calling.

This doesn't sound all that great when you first hear it, but it's a nice way of letting the BRE persist anything you want in memory to be used as a "hint" to the FactRetriever that you want to do something. For example, a great use of the BRE is as a local, in-memory cache of a DataSet. You can control the refreshing of this cache by passing a DateTime around as FactsHandleIn/Out, and using some custom logic in UpdateFacts to determine whether or not the data should be retrieved and re-asserted into memory.
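Here's a sketch of that DateTime-as-handle pattern. The real method is .NET code on the IFactRetriever interface and works against the rule engine's Assert/Retract calls; the list standing in for engine memory, the retrieve_dataset callable, and the 60-second window are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta

CACHE_WINDOW = timedelta(seconds=60)   # illustrative refresh interval

def update_facts(engine_facts, facts_handle_in, retrieve_dataset):
    """Refresh cached facts only when stale; return the new facts handle."""
    now = datetime.now()
    if facts_handle_in is None or now - facts_handle_in >= CACHE_WINDOW:
        engine_facts.clear()                     # stand-in for Retract
        engine_facts.extend(retrieve_dataset())  # stand-in for Assert
        return now                 # handed back as facts_handle_in next run
    return facts_handle_in         # cache still fresh; no database hit
```

The return value is the whole trick: whatever comes back here is exactly what the engine hands you on the next run, so the timestamp survives between policy executions without you persisting anything yourself.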

You can use FactsHandleIn/Out for anything; a DateTime is a common choice. I imagine you could think up a great use for just about any kind of object as a FactsHandle. This is one of those places where I'm sure I'll someday run across a really ingenious use of some strange object to perform a cool task.