Tuesday, March 3, 2009

Importing DTD to XSD

Update: Using the trick below to handle the "lang" attribute has been nothing but a headache. BizTalk apparently understands that http://www.w3.org/XML/1998/namespace is a special namespace and that the xml: prefix is reserved for it, but that doesn't stop a BizTalk map from generating something like xmlns:ns0="http://www.w3.org/XML/1998/namespace" ns0:lang="EN". What I ended up doing was creating a new attribute named "lang" in the local namespace and using a custom pipeline component on the send side to tack the "xml:" prefix onto it as it goes out the door. I'm told that using a text editor to add the "xml:" namespace declaration to the schema that uses the xml:lang attribute works, but the declaration gets erased the next time you open and save the schema in the BizTalk schema editor, so I don't consider that a workaround.
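To make the send-side fix a bit more concrete, here is a rough sketch of the transformation itself. It is written in Python purely for illustration: a real BizTalk pipeline component is a .NET class, and the function name promote_lang_attribute and the sample document are made up for this example. It also assumes the message elements are not in a namespace of their own, since ElementTree would otherwise rewrite their prefixes on output.

```python
import xml.etree.ElementTree as ET

# Namespace that the reserved "xml" prefix always resolves to.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def promote_lang_attribute(xml_text: str) -> str:
    """Rename every unqualified 'lang' attribute to xml:lang (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if "lang" in element.attrib:
            # Clark notation: {namespace-uri}local-name
            element.set(f"{{{XML_NS}}}lang", element.attrib.pop("lang"))
    # ElementTree serializes this namespace with the "xml" prefix
    # and does not emit a declaration for it.
    return ET.tostring(root, encoding="unicode")


# Made-up outgoing message that carries the local "lang" attribute.
outgoing = promote_lang_attribute('<Document lang="EN"><Body>text</Body></Document>')
print(outgoing)
```

A receive-side component would do the reverse, moving xml:lang back to a plain "lang" attribute before validation.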

I just spent a few hours fumbling around with the DTD -> XSD import tool included in the BizTalk Visual Studio tools, and I thought I'd share my experience. There are a number of small, helpful tips here regarding the tool itself, as well as manipulating schemas and playing around with namespaces (one important one in particular).

The tool is available if you right-click a BizTalk project in Solution Explorer, choose Add > Add Generated Items... and use the Generate Schemas wizard. There are three options shown: DTD Schema, XDR Schema, and Well-Formed XML. Two of these tools, DTD and Well-Formed XML, are not available right off the bat - you need to run a couple of script files to install them.

Before we even get there though, there's an important hotfix here that you must install. The library that contains the DTD converter is completely broken out of the box and needs to be replaced with this hotfix before running the script file to install it.

Once you've obtained and run the hotfix, navigate to %programfiles%\Microsoft BizTalk Server 2006\SDK\Utilities\Schema Generator and run the two .vbs scripts there, InstallDTD and InstallWFX. These scripts will copy the DLLs in that folder (one of which was just updated by the hotfix) to the appropriate location where they can be used by Visual Studio. You may need to restart Visual Studio after running the scripts.

Head back to Add > Add Generated Items... > Generate Schemas and feed the DTD -> XSD tool a DTD schema. What you'll get is a big jumbly mess of node definitions, all at the root level. See my post on root_reference and displayroot_reference for more information about this and how to "fix" it.

So now I've got my DTD imported, but I've got one more problem: The root node on my schema, "Document", has an attribute that uses the "xml" namespace prefix. This isn't represented in the DTD at all, so it's understandable that it isn't reflected in the XSD schema, but the sample documents I have all show that the "lang" attribute uses the "xml" namespace prefix. My schema validates just fine, but trying to validate my sample documents against it fails.

The most important thing to know here is that the "xml" namespace is a special case - XML parsers should universally understand that the "xml" prefix is implicitly reserved to resolve to the namespace "http://www.w3.org/XML/1998/namespace". Defined in this namespace are a couple of attributes, one of them being "lang". So if this namespace is supposed to be implicitly understood, why is schema validation getting hung up on it?
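You can see the implicit binding with any namespace-aware parser. The snippet below is a small illustration using Python's standard library rather than anything BizTalk-specific, and the sample document is made up. Note that there is no xmlns:xml declaration anywhere in it, yet the attribute still resolves to the reserved namespace.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Made-up sample: no xmlns:xml declaration anywhere in the document.
sample = '<Document xml:lang="EN"><Title>Hello</Title></Document>'

root = ET.fromstring(sample)

# Attribute names come back in Clark notation: {namespace-uri}local-name
resolved_names = list(root.attrib)
lang_value = root.get(f"{{{XML_NS}}}lang")
print(resolved_names, lang_value)
```

The parser knows which namespace the attribute belongs to, but that alone says nothing about which attributes the namespace defines or what values they allow. That part is the job of a schema.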

The reason is that BizTalk and its tools understand the namespace, but they don't inherently know what's contained in it. Our schema needs to reference another schema that defines the types in the http://www.w3.org/XML/1998/namespace namespace. If I were to take one of my sample documents and import it using the Well-Formed XML -> XSD generator (a great tool, but be careful - it can only define the nodes present in the particular sample document you use), I would get a second schema, referenced in the first, that defines the attributes available in http://www.w3.org/XML/1998/namespace. The Well-Formed XML -> XSD generator knows about this namespace, and knows it needs a schema that defines those types. Unfortunately, the way it resolves the problem really isn't the best way of going about it - if you deploy the project as-is, you'll get a warning that a schema is already deployed that defines types in http://www.w3.org/XML/1998/namespace.

The BizTalk product team has already defined a schema that contains the types in the http://www.w3.org/XML/1998/namespace namespace. The schema is called "BTS.xml", and it is located in the Microsoft.BizTalk.GlobalPropertySchemas assembly, which by default is referenced in every BizTalk project and deployed to the BizTalk.System application. To reference the schema, open your document schema, click the "Schema" root node, and in the Properties pane, click the Imports property. This will surface the hidden ellipsis button - click this to open the Imports dialog. Select "XSD Import" in the dropdown and click Add to open the BizTalk Type Picker. In the Type Picker, select References > Microsoft.BizTalk.GlobalPropertySchemas > Schemas > BTS.xml and click OK. BTS.xml will be added as an XSD import with a default prefix like "ns0". This is fine, but it is more appropriate to change the prefix to "xml", which is specifically reserved for this namespace (note: by using the prefix "xml", no new prefix/namespace declaration will appear in the root xs:schema node, since the "xml" prefix is implicitly understood. Using any other prefix will cause a new namespace declaration to appear).

The reference has been added, but there's still one more thing to fix: the schema still thinks that the "lang" attribute is part of its own namespace, not http://www.w3.org/XML/1998/namespace. To fix this, close the schema and re-open it using the XML editor. Scroll down to where the "lang" attribute is defined and replace the entire attribute definition with the following: <xs:attribute ref="xml:lang" use="required" />. You can change the value of the "use" attribute depending on your needs, but the key is that you are using "ref" instead of "name", and you have specified the "xml" prefix.
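For reference, here is roughly what the relevant parts of the schema might look like after the edit, wrapped in a small Python check. Treat this as a sketch under assumptions: the schema is a trimmed-down, hypothetical stand-in for the generated one, and I have left out whatever schemaLocation value BizTalk writes on the import, since that will depend on your project. The check only confirms that the import is present and that "lang" is now a reference rather than a locally named attribute. It does not perform XSD validation.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical, trimmed-down schema; the schemaLocation that BizTalk
# writes on the import is intentionally omitted.
edited_schema = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="http://www.w3.org/XML/1998/namespace" />
  <xs:element name="Document">
    <xs:complexType>
      <xs:attribute ref="xml:lang" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

root = ET.fromstring(edited_schema)
attributes = list(root.iter(f"{{{XSD_NS}}}attribute"))

# Namespaces pulled in through xs:import.
imported = [imp.get("namespace") for imp in root.iter(f"{{{XSD_NS}}}import")]
# Attributes declared by reference (ref=...) rather than locally (name=...).
attribute_refs = [a.get("ref") for a in attributes if a.get("ref")]
# Any leftover local declaration of "lang" would show up here.
named_lang = [a for a in attributes if a.get("name") == "lang"]

print(imported, attribute_refs, len(named_lang))
```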

In the XML editor, you will get a blue underline with the message that the attribute is not defined, but this is because the editor can't get to the schema since it's referenced in a remote assembly. Now, if I validate a sample document against the schema, it works perfectly. When I deploy this project to BizTalk, it will automatically reference the BTS.xml schema that has already been deployed.


Arun said...

Hi Author
It's a very good article; however, I am facing the same issue in BizTalk Server 2006.

Any solution for the same problem in BTS2006?

Kindly help


Nick Walker said...

Hi Arun,

The article was about my experience with this issue in BizTalk 2006 as well.

After trying various workarounds, I ended up using the solution I posted in the update at the top. Don't use the "xml" prefix in your schemas at all, and use a custom pipeline component to add or remove the xml prefix before/after any validation is performed.

Arun said...

Hi Nick Walker
I am so happy to see your reply.
Very kind of you. I am new to BizTalk development, and if you can provide your email ID, I will send a snapshot of my error code.

1. Is it possible to download BTS.xml? I can't add a reference to BTS.xml as you described in your article, because it's BizTalk Server 2006.

Thanks a ton