I Work On Software: August 2007

Wednesday, August 8, 2007

Easing the pain of BizTalk Mapper

I hate the BizTalk Mapper.

I love BizTalk. I think it's an intriguing platform. Developing for it is satisfying - it's fun to see everything come together and work. I have enjoyed simultaneously learning about BizTalk and learning more about the paradigm of XML structures and how to manipulate them with XSLT and the .NET Framework. For the most part, the BizTalk tools work great. It takes some practice, but after a while one can become very good at identifying issues and fixing them quickly.

The Mapper brings the whole BizTalk Server platform down a few notches. It's terrible. Essentially what Microsoft has tried to do is implement a happy GUI over XSLT design, and include some nifty little reusable code snippets at the same time. Unfortunately, that happy GUI is horribly non-transparent (and I don't mean Aero transparent) and hard to use, and it generates crap XSLT that is just plain wrong. The worst part is that you can't even clearly see what it's doing unless you make a few backflips through flaming hoops and essentially revert to using XSLT 1.0, which is what the Mapper is based on and what you should be using in the first place.

Let me back off of that a bit - the interface is on the right track, it just needs to evolve about four generations before it's usable for enterprise-size projects, and it needs to provide a real-time view of what is happening with the XSLT underneath and let you fix what it occasionally, but so horribly, screws up. I envision something like an HTML editor that gives you WYSIWYG and code at the same time. If you poke around on the Internet, you'll notice that most examples you'll see of the Mapper in action are on bite-sized schemas, illustrating the use of one particular functoid. There are two reasons for that:

If the sample schema was any larger you wouldn't be able to figure out what on Earth was going on. I've got enterprise maps that look like a tangled mess of Christmas lights, complete with colored functoids glowing brightly, saying "Just TRY to figure out what I'm linked to!"
If the map was any more complex, the resulting XSLT would be fundamentally broken in at least one place.

My advice to you is to learn XSLT if you don't know it. The Mapper doesn't fully abstract it for you - odds are you'll have to dive into it to see what the map has goofed up.

/end rant

With that out of the way, I'd like to discuss just how to go about diving into the Mapper and discovering more about what's going on.

As I stated above, the Mapper is based on XSLT 1.0 (I point out 1.0 for a reason: 2.0 has lots of useful functions, but they are not available to you. You might be able to change this but I haven't confirmed it yet). As in, when you compile a BizTalk map, what it spits out is an XSLT that accomplishes what you see on the map grid. It has some neat Microsoft-added trimmings as well - all of those functoids are written in C# and are represented as script in the XSLT. When the transform is running inside of the Microsoft engine, it can call into the C# and run it. Writing code in C# can be much easier than writing XSLT templates if you're not familiar with XSLT (and often even if you are), so I'll discuss how to take advantage of that as well.

Unfortunately, what you see on the map grid isn't always so obvious. Just exactly how does that stupid looping functoid work anyway? Why on Earth isn't that logical functoid returning the value you think it should? The map has no notion of a debugger; it's fire-and-forget - in this case, forget about trying to figure out what it's doing. There is a better way!

Once you've got a map wrapped up, right-click it in the Solution Explorer and click Validate Map. This will essentially compile the map and ensure that it has no errors. When it's finished, there will be a couple of lines near the bottom of the Output window - one has a link to the "extension object" XML (ignore this) and the other is a link to the output XSLT. Copy the XSLT link (don't open it, as it will open it in Internet Explorer view, i.e. "useless view") and do a File > Open on it to open it in an XSLT editor. The XSLT that will appear is your map.

Now for the great part - if you didn't already know, Visual Studio 2005 includes a real-time XSLT debugger. It's one of the greatest XSLT tools I have ever used, and just like .NET code, it can be very educational to plod along with the execution and see what's not working as expected. In fact, it's even more useful than it is for .NET code - even though the Intellisense for XSLT works great, it won't tip you off to certain kinds of typos, and watching carefully for nonsensical values will help you find those kinds of errors. To run the XSLT debugger, mark your breakpoints just as in a C#/VB.NET file, provide an XML document as input with the Input field in the Properties pane, and select "Debug XSLT" in the XML menu.

There you have it - real-time map debugging. But what else can we do with this knowledge? For starters, if you are having difficulty expressing something in a map and you'd rather just do it in XSLT (especially if you see that the map is just soooo close for one particular piece, but if you could just change that one line...), no problem. Drop a script functoid in your map and set the script type to Inline XSLT. Stick whatever XSLT you want in there and it will get dropped into the resulting map at the insertion point corresponding with the target node you link it to. If you do this, I recommend compiling the map first, then writing your XSLT right there in the map's XSLT output - this way you get the advantage of Intellisense and a sane editor control, you can see just what it'll look like once you drop it in, and you can test it right then and there by doing a Debug XSLT. Once you're done, copy your code out and drop it into a script functoid. Is the logic of the whole map screwed up? Click on the map grid and use the Custom XSL Path field in the Properties pane to replace the map altogether with some custom XSLT.
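To make the Inline XSLT idea concrete, here is a minimal sketch of what the body of such a script functoid might look like. The source path Record/Field1 and the target element Field1 are placeholders I made up for illustration; with Inline XSLT you generally emit the target element yourself, so check the compiled output to confirm it lands where you expect.

```xml
<!-- Hypothetical Inline XSLT body for a script functoid linked to a target node named Field1.
     Record/Field1 is an assumed source path; replace both names with ones from your schemas. -->
<xsl:if test="normalize-space(Record/Field1) != ''">
  <Field1>
    <xsl:value-of select="normalize-space(Record/Field1)" />
  </Field1>
</xsl:if>
```

This is only a fragment: it relies on the xsl prefix already declared by the stylesheet the Mapper generates around it, which is one more reason to write and test it inside the compiled XSLT first.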

What else can you do with this? I mentioned earlier that the functoids are actually C# code represented as script inside the XSLT. This is actually pretty cool, as there are some things that are expressed much more easily and cleanly with C# than with XSLT. Create a map with some functoids and take a look at its XSLT. Down at the bottom, you'll see this:

<msxsl:script language="C#" implements-prefix="userCSharp">

Everything within this tag is C# enclosed in a CDATA section. More precisely, it is C# script: the code is just a collection of methods not enclosed in a class, although you can create instances of classes within any code you put there. Only a handful of common namespaces have been imported; to use types from any others you'll have to write out their fully qualified names, as you can't use using statements here. If you want to write your own C# code to stick in here (or JScript.NET or VB.NET - script tags exist for those languages and you can have more than one language within the XSLT), there are a couple of ways to do it:

If you are replacing the whole map with your own XSLT, just plop it in there, exactly how you see it done in the map.
Otherwise, you'll need to introduce it via a script functoid. Create a new script functoid with a .NET language type and put your method(s) in it. All methods within a given .NET language must have unique names - even if two methods with the same name have different signatures, only one will make it into the XSLT. If you happen to use the same name and parameter list as a method that the map compiler uses to achieve the functionality of a functoid, the build operation will fail with an error stating that the method is already used. Note that a script functoid doesn't necessarily have to have inputs or outputs - it can just sit there and contribute code to your XSLT.
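As a rough sketch of both points (fully qualified type names and unique method names), a script block you contribute might look something like the following. The method names are invented for the example, and I'm assuming System.Globalization is not among the namespaces imported by default, which is why the type is spelled out in full.

```xml
<msxsl:script language="C#" implements-prefix="userCSharp"><![CDATA[
// Hypothetical helpers. Each method gets its own name rather than relying on overloads,
// since only one method per name makes it into the XSLT.
public string ToUpperInvariantText(string value)
{
    // Fully qualified type name, because a using statement is not available here.
    return value.ToUpper(System.Globalization.CultureInfo.InvariantCulture);
}

public string ToUpperInvariantTextOrDefault(string value, string fallback)
{
    // Falls back when the input is missing or only whitespace.
    if (value == null || value.Trim().Length == 0)
    {
        return fallback;
    }
    return value.ToUpper(System.Globalization.CultureInfo.InvariantCulture);
}
]]></msxsl:script>
```

Had the second method simply been an overload named ToUpperInvariantText, I would expect only one of the two to survive compilation, per the rule above.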

Once your code is in there, how do you call it? Again, take a look at a map's output XSLT to see. The following expression, used in any XSLT attribute that accepts expressions, calls the MyFunction method with two arguments, the empty string and the text value of the Record/Field1 node (relative to the current context node):

userCSharp:MyFunction("" , string(Record/Field1/text()))

Very, very nice! Plus, when you are debugging your XSLT, the debugger will step through this code as well!
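Putting the pieces together, a complete custom stylesheet could look roughly like this. Treat it as a sketch: the /Root source element, the Result target element and the body of MyFunction are all made up, and the userCSharp namespace URI is what I believe the Mapper emits, so copy the declarations from your own map's output rather than trusting mine.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:userCSharp="http://schemas.microsoft.com/BizTalk/2003/userCSharp"
    exclude-result-prefixes="msxsl userCSharp">

  <xsl:output method="xml" omit-xml-declaration="yes" />

  <!-- /Root is a hypothetical source root; Record/Field1 is resolved relative to it. -->
  <xsl:template match="/Root">
    <Result>
      <xsl:value-of select="userCSharp:MyFunction('', string(Record/Field1/text()))" />
    </Result>
  </xsl:template>

  <!-- The script block must be a top-level child of xsl:stylesheet. -->
  <msxsl:script language="C#" implements-prefix="userCSharp"><![CDATA[
    // Hypothetical body: returns the fallback when the value is empty, otherwise the trimmed value.
    public string MyFunction(string fallback, string value)
    {
        if (value == null || value.Trim().Length == 0)
        {
            return fallback;
        }
        return value.Trim();
    }
  ]]></msxsl:script>
</xsl:stylesheet>
```

Feeding it a document such as `<Root><Record><Field1> abc </Field1></Record></Root>` through Debug XSLT should produce a Result element containing abc, and a breakpoint inside MyFunction should be hit along the way.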

As you can see, not all is bad about BizTalk's mapping system... as long as you stay away from the Mapping tool itself for all but the simplest transforms. Some might argue that XSLT is harder to maintain, but I personally think the maps are tougher. One small change in the map can set off a cascade of changes in the compiled transform, some of which are impossible to figure out without looking at the XSLT itself, so you might as well work in XSLT to begin with. Sometimes a map can be a great way to get started with an XSLT. I suggest playing around with the Mapper, making some simple connections using out-of-the-box functoids, and seeing how they behave in XSLT. Try fiddling around with your schema as well (change the Max Occurs of some of your nodes and marvel at your map that no longer works) to see how the map compiler reacts.

Thursday, August 2, 2007

BRE String comparisons

Another short interjection with another annoying lesson learned (after WAY too much time trying to figure out what was going wrong): when the BRE does a string comparison with "is equal to", the comparison is case-sensitive. This can be a hard thing to see when you're trying to pinpoint a problem among pipeline execution, orchestrations, adapters, etc. and the BRE rule execution tracking is a pain to read.

Pipeline components - You can GAC them, too

I'm interrupting my three-parter with a short post regarding something I learned just the other day. I was originally under the impression that in order to deploy a pipeline component, it must be placed in the Pipeline Components folder of the BizTalk installation directory. Apparently, this was the rule in BizTalk 2004, but it has since changed - pipeline components can now be GACed as well.

It's my understanding that the Pipeline Components directory exists because developers need a standard place to put their components after they develop them so they can be easily added to the Visual Studio toolbox and subsequently used in new pipelines. It's also handy because you can easily deploy a debug version of a component there and step through the code once the pipeline runs by attaching the VS debugger to the BTSNTSvc process corresponding to the host instance running the pipeline.

As Stephen Thomas points out, make sure you know and understand where the component is going during development, where it's going in deployment, and that those two strategies mesh. He describes a scenario he ran into in which he built a pipeline component and added it to a pipeline before strong-naming it and GACing it. When he tried to deploy, the operation failed because it couldn't find the pipeline component.

Wednesday, August 1, 2007

On BizTalk, Assemblies and Deployment, pt. 2

This is the second article in a multi-part series about assemblies and how they relate to BizTalk. This part discusses deploying assemblies to BizTalk and where things need to go in order to work correctly.

The nature of BizTalk necessitates the existence of a special type of assembly, called a BizTalk assembly. A BizTalk assembly is a class library (.dll) assembly just like many other assemblies, but it has some special metadata in it that BizTalk uses for its own purposes. A BizTalk assembly is what's created when you build a BizTalk project in Visual Studio (as opposed to a standard C# class library project). If you've ever worked on a BizTalk project in VS, you'll know that unlike a standard class library or executable project, there are only a few types of resources you can add - schemas, pipelines, maps and orchestrations. These constructs are unique to BizTalk and are not fully understood or recognized by the rest of the .NET runtime or the GAC.

To deploy a BizTalk assembly, you must do two things: deploy it to the GAC, and register it within BizTalk. GACing an assembly is simple, and can be done with gacutil (discussed in pt. 1). Note that since a BizTalk assembly must be GACed to be used, it must be strongly-named. To strongly name a BizTalk assembly, first create a .snk (strong-name key) file using Visual Studio's sn.exe utility (the path is already present in a Visual Studio Tools command prompt; the path to it on my machine is C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin\sn.exe). The syntax is "sn.exe -k MyKeyFile.snk". Place the key in a location easily referenced from your project folder. In your BizTalk project in Visual Studio, open the project properties (right-click the project and select Properties). Click "Assembly" in the left pane, scroll down to "Assembly Key File", and enter the path to the .snk file or use the ellipsis button within the text field to browse to it. Click OK when finished - now the assembly that will result from building the project will be strongly-named.

Registering the assembly in BizTalk is required so BizTalk "knows" about it. An assembly is not registered directly from the GAC; it must be registered from a file on your machine. For this reason, you should treat your BizTalk assemblies like you treat program files for other applications - give them a folder somewhere where they aren't going to get moved or deleted. When you register a BizTalk assembly, BizTalk reads its metadata to discover what kinds of BizTalk artifacts (schemas, orchestrations, pipelines, maps) are present in it. Registration can be done via the command line, the admin console or Visual Studio. I will walk through the admin console and VS procedures - deploying from the command line is convenient for build/deploy scripts and the like, but if you're new to developing for BizTalk, odds are you'll be doing this manually from a GUI for a while.

To deploy a BizTalk assembly via the admin console, expand the application that you want to add the assembly to, right click the Resources folder and click "Add > BizTalk Assemblies..." Clicking the "Add" button in the next dialog will allow you to search for any file to add as a resource. Anything can be added as a resource to a BizTalk application, but what we are interested in right now is BizTalk assemblies, so dig up a .dll from a BizTalk VS project and add it. The File Type drop-down box should show System.BizTalk:BizTalkAssembly. Everything below this drop-down box is generally only relevant to the creation and usage of BizTalk MSI files, which I will cover in a later post. However, one of the checkbox options that will appear, "Add to the global assembly cache on add resource (gacutil)" will GAC the assembly for you once you click OK if you haven't GACed it yourself with gacutil.

To deploy a BizTalk assembly via Visual Studio, first configure the deployment options in the Deployment page of the project properties. Server and Configuration Database are required. Leaving Application blank will attempt to deploy to the default application, otherwise you may enter the name of a BizTalk application to deploy to. Redeploy should generally always be set to True, as should Install to Global Assembly Cache (this will GAC it for you). If you are working in a local development environment, go ahead and set Restart Host Instances to True as well - restarting the host instances is always a must when deploying or redeploying assemblies, it's just that you may want to wait for an opportune time to do it, and that may not be build/deploy time. Once you have filled out this page, selecting Deploy Solution or Deploy Project from the Build menu will attempt to deploy the assembly/assemblies to BizTalk for you, quick and easy.

If you are redeploying an assembly that already exists in BizTalk and the GAC, you generally need to follow the same procedure as when you deployed it the first time unless the version number has changed. Just as in the GAC, multiple versions of the same assembly can co-exist within BizTalk. The components of those assemblies will exist as separate entities and are distinguished by the version number of the assembly in their full name. If the version number of your assembly has not changed, simply deploy it from Visual Studio with Redeploy set to True, or manually reGAC it and re-add it as a BizTalk Assembly resource from the admin console and make sure Overwrite is checked. Note that simply replacing the .dll on disk and in the GAC is not enough - it must be re-added to BizTalk's resources so it can read the assembly metadata again.

If the version number has changed as a part of development and you don't want the versions to co-exist (i.e. you want one definitive version in your environment to avoid confusion), you need to remove the old assembly by right-clicking it from the Resources screen of your application and clicking Remove. For this to succeed, none of the assembly's components can be used as configuration items within any of your binding objects (ports, orchestrations, etc.). The admin console will warn you and fail the operation if this is not true. Note that removing the resource from BizTalk does not unGAC the assembly: to keep your GAC clean, I suggest removing the assembly from it if you are no longer using it.

That wraps up part two of the deployment posts. In the third and last post I will discuss MSI files, binding files and assembly references, as well as pipeline components, which are the exceptions to all of the rules.

On BizTalk, Assemblies and Deployment, pt. 1

This is the first article in a multi-part series about assemblies and how they relate to BizTalk. This part covers what you need to know about assemblies in general in order to understand what they do and how to deploy them properly. I don't endeavor to cover every detail of .NET assemblies and how they are put together, but this hits most of the big topics. It's a bit long, so feel free to just skim.

The concept of assemblies and the GAC can be very confusing to a BizTalk developer who hasn't interacted with these ideas in .NET before. Fortunately, if you don't want to spend the time, there's no need to have a complete, detailed understanding of what assemblies are and how they work. By grasping a few simple ideas, working with assemblies in BizTalk becomes very easy, and gives you great insight into how BizTalk and .NET actually work with assemblies.

To get a good overall idea of what an assembly is, Wikipedia has a fairly descriptive and high-level article about it: http://en.wikipedia.org/wiki/.net_assembly.

Here's the short-short version: an assembly contains your compiled code, resources, and some metadata about that code and those resources. That's about it. Typically, a "project" in Visual Studio (which can consist of multiple code files, resources, references to other assemblies, etc.) equates to an assembly - a single .dll or .exe file. When you build that project it compiles your code and puts all of those other resources together in a big pile and stuffs it into a .dll or .exe. Additionally, it creates an "index" of machine-readable metadata that is incredibly useful - it contains all the information about what callable methods are in that code, it powers Visual Studio's IntelliSense functionality, and generally provides enough information to let everything outside of the assembly know what it does - just not how exactly it does it. Whenever you create any kind of application, all of the code and resources used to run it are in one or more .dll or .exe files. If you make a lot of little one-off "toolbelt" applications that are contained in a single Visual Studio project, that .exe that you get after you build has all of the code in it. If you were to spread the code over multiple projects and add the appropriate references within those projects, then when you ran the .exe and did something in your application that required code in one of those assemblies, the assembly it needs must be in the right place or the application will throw an exception. What is "the right place" for an assembly? I'll discuss that momentarily.

The next important bit about assemblies is how they are named. A name might seem trivial, but it contains a lot of information and is a guaranteed unique identifier for the assembly. The full name of an assembly has four parts: the short name, the version, the culture, and the public key token.

The short name is typically the name of the assembly without the file extension, e.g. NW.Applications.MyAssembly. This short name is also typically used as the namespace of all of the modules in the assembly - a namespace is simply something that helps to uniquely identify a module. There's lots of code in the universe, and it's highly likely that for every module, there's another module out there with the same name - let's take a hypothetical module called NumberCruncher. The namespace is like a surname that gets tacked onto it, usually containing a company name or a product name, that guarantees that those two modules with the same name are still unique - NumberCruncher in the NW.Applications.MyAssembly namespace is different from the NumberCruncher module in the ABC.Software.EnterpriseApps namespace. For this reason, many assembly names are like my sample one above: multiple tokens separated with dots, each token representing some kind of arbitrary hierarchy that's unique to my company or organization.
The version number of the assembly, presented like so: 1.0.4.0. The names for each of those values are "major version," "minor version", "build" and "revision." Major version is typically used to signify the product version: Is this SuperToolbox 2 or 3? Minor version represents things like service packs or patches. Build number is typically incremented by the developer every time the assembly is built - this number is often represented by four digits (filling in zeroes if necessary) because builds happen thousands of times. The last number can be used for things like hot fixes. Some developers may increment the last two numbers based on the date and time the build was completed at. Here's the important bit about versions: Two versions of the same assembly can co-exist within the GAC (keep reading), and are unique! Note that an assembly has an "assembly version" and a "file version." The one that really counts here is the assembly version. These two version numbers don't necessarily have to always be the same: one way to manage version numbers is to only increment the file version while you are developing. This helps avoid confusion with version numbers when you keep redeploying your code for testing - the assembly version always stays the same, but the file version (which does not make an assembly unique, but can still be viewed) can be used to determine exactly what build you are using.
Culture provides information about the language that the assembly is presented in (human language, not programming language). This will typically be "neutral."
Public key token: a 16-character hexadecimal string derived from the public half of a public-key cryptography pair (it is a short hash of the public key, not the key itself). I won't discuss the details of how public-key cryptography works here, but the long and short of it is that only people who have the private half of the key can generate assemblies that carry that token. This token essentially ensures that the assembly has come from a certain author and is guaranteed authentic. An assembly does not have to be given a public key token. If it has one, it can be called a "strongly-named assembly." Assigning a strong name is done in Visual Studio using a .snk file, which can be generated by using a Visual Studio command-line tool. An assembly must be strongly-named to be placed in the GAC.

So... what's the GAC? The GAC is the Global Assembly Cache, a universal repository of strong-named assemblies on your machine. Any assembly placed here is globally accessible and can be referenced easily and shared by multiple applications. An assembly can co-exist here with other assemblies that have the same short name, so long as they have different versions. Try browsing to C:\Windows\Assembly (the Assembly folder in your Windows install directory, wherever that might be): what you will see isn't actually a folder on your disk, but a specially-crafted view of all the assemblies in the GAC. There's a folder structure under there, but it's generally irrelevant to human beings. Try right-clicking on a few assemblies and click Properties to see interesting info about them.

This is where I get to the part about where assemblies need to be placed so they can be used. The part of .NET that finds assemblies when it needs them is called Fusion. If you are running an application that has references to other assemblies and it needs one of those other assemblies, Fusion kicks in and looks for assemblies in the following places in this order (this is direct from the .NET Assembly article on Wikipedia):

If the assembly is strongly named it will first look in the GAC (your app knows if the assembly is strongly named because it captures this information from the assembly when you add a reference to it in your project).
Fusion will then look for redirection information in the application's configuration file. If the library is strongly named then this can specify that another version should be loaded, or it can specify an absolute address of a folder on the local hard disk, or the URL of a file on a web server. If the library is not strongly named, then the configuration file can specify a subfolder beneath the application folder to be used in the search path.
Fusion will then look for the assembly in the application folder with either the extension .exe or .dll.
Fusion will look for a subfolder with the same name as the short name (PE file name) of the assembly and then looks for the assembly in that folder with either the extension .exe or .dll.
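To give an idea of what the "redirection information" in an application's configuration file can look like, here is a rough sketch of such a file. Everything specific in it is a placeholder: the assembly name is my sample from earlier, the public key token is fake, and the versions and paths are invented. Check the .NET configuration schema documentation before relying on any of it.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical MyApp.exe.config. The assembly name, token, versions and paths are placeholders.
     Full name of the referenced assembly in this sketch:
     NW.Applications.MyAssembly, Version=1.0.4.0, Culture=neutral, PublicKeyToken=0123456789abcdef -->
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="NW.Applications.MyAssembly"
                          publicKeyToken="0123456789abcdef"
                          culture="neutral" />
        <!-- Strongly named only: load 1.0.4.0 whenever 1.0.0.0 through 1.0.3.0 is requested. -->
        <bindingRedirect oldVersion="1.0.0.0-1.0.3.0" newVersion="1.0.4.0" />
        <!-- Strongly named only: an explicit location (local path or URL) for that version. -->
        <codeBase version="1.0.4.0"
                  href="file:///C:/SharedAssemblies/NW.Applications.MyAssembly.dll" />
      </dependentAssembly>
      <!-- Any assembly: extra subfolders of the application folder to search. -->
      <probing privatePath="bin;lib" />
    </assemblyBinding>
  </runtime>
</configuration>
```

The three child elements line up with the three options in the list above: a different version to load, an absolute location for a strongly named assembly, and additional subfolders beneath the application folder.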

So, as you can see, you can essentially put an assembly anywhere as long as you configure your application properly. However, for ease of use and universal understanding, most people will either GAC their assemblies or put them in the application folder. Ah, I almost forgot to mention how to GAC an assembly: use gacutil.exe (the path should already exist in a Visual Studio Tools command prompt; on my machine it's located at C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin\gacutil.exe) with the /i switch and the path to the assembly to GAC. You can use /if to "force" the installation, for example if the assembly already exists and you are reinstalling.

That's it for the lecture on assemblies. Next post I'll talk about how assemblies and the GAC work with BizTalk - how to deploy and redeploy assemblies, what needs to be GACed and what doesn't, where to put things, etc.
