Two things drive us crazy: XML namespaces and Unicode. During our most recent programming task, a data migration from a Voyager library system to a VITAL/Fedora repository, we came across some tricky issues with namespaces. These put a spanner in the works several times, so we thought we would share what we learned and perhaps save others the frustration we endured.
Basic namespace theory: Namespaces were introduced to ensure that element names do not conflict when XML documents are brought together for any form of processing. They allow the element names in XML documents to be uniquely identified.
Namespace format: xmlns:namespace-prefix="namespaceURI", for example xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
Breaking down what it means:
The xmlns stands for XML namespace.
The :xsi is the namespace prefix, which is combined with the element names in the XML document. Remember that this prefix can theoretically be anything. Prefixes are given logical names by those who create them, for obvious reasons; what is important is that all child elements with the same prefix are associated with the namespace defined in the start tag of an element.
The URL "http://www.w3.org/2001/XMLSchema-instance" again can theoretically be anything. URLs are used because they are unique (a very important point when you are trying to uniquely identify something) and because they can point to a web site that may contain information about that namespace. A parser treats the URL purely as an identifier and does not fetch it.
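To make the breakdown concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The record element and the xsi:type value are made up for illustration. It shows that the parser swaps the prefix for the namespace URI, so two documents that use different prefixes for the same URI end up with identical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element documents; "record" and "example" are illustrative only.
doc = '<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="example"/>'
root = ET.fromstring(doc)

# ElementTree stores the attribute name as {namespaceURI}localname,
# so the prefix itself disappears after parsing.
attribute_names = list(root.attrib)
print(attribute_names)

# Same URI bound to a different prefix: the resulting name is identical.
doc_other_prefix = '<record xmlns:foo="http://www.w3.org/2001/XMLSchema-instance" foo:type="example"/>'
same = list(ET.fromstring(doc_other_prefix).attrib) == attribute_names
print(same)
```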
Here is an example of why you would want to uniquely identify your element names. This example is commonly used, but is easy to understand so here goes.
Suppose that you had the following two XML documents.
These documents cannot be combined or processed together as they stand, as this would render the data useless. One is describing an HTML table and the other is describing an item of furniture.
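A minimal sketch of the clash, assuming two small documents of the kind usually used for this example (the element contents are illustrative, not taken from our data). Once both are placed under one parent, a search for table can no longer tell them apart.

```python
import xml.etree.ElementTree as ET

# Illustrative documents: an HTML-style table and a furniture table.
html_table = ET.fromstring("<table><tr><td>Apples</td><td>Bananas</td></tr></table>")
furniture_table = ET.fromstring(
    "<table><name>Coffee Table</name><width>80</width><length>120</length></table>"
)

# Both root elements have exactly the same name.
same_name = html_table.tag == furniture_table.tag

# Combine them under a hypothetical parent element.
combined = ET.Element("root")
combined.append(html_table)
combined.append(furniture_table)

# The search returns both tables with nothing to distinguish them.
tables = combined.findall("table")
print(same_name, len(tables))
```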
Solution: enter namespaces
If we add an xmlns attribute with a prefix to the table tag, <f:table xmlns:f="http://furniture">, the element now has a qualified name and will no longer conflict with other table elements.
Default namespaces: To save you from using prefixes in all the child elements, you can define a default namespace for an element, <table xmlns="namespaceURI">, to create an XML document like this.
Our issue: We had a document that used a default namespace in its root node. This is the way that the repository needs the information to be structured, or records cannot be edited once inside (but that is another story).
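Here is a rough sketch of the same idea in Python, again with illustrative content. Only the furniture table carries the f prefix, and the ns dictionary is just a local mapping we pass to the search; it does not have to use the same prefix as the document.

```python
import xml.etree.ElementTree as ET

# Illustrative combined document: a plain table and a namespaced furniture table.
combined = ET.fromstring(
    "<root>"
    "<table><tr><td>Apples</td><td>Bananas</td></tr></table>"
    '<f:table xmlns:f="http://furniture"><f:name>Coffee Table</f:name></f:table>'
    "</root>"
)

# Prefix-to-URI mapping used only for searching.
ns = {"f": "http://furniture"}

html_tables = combined.findall("table")            # matches the plain table only
furniture_tables = combined.findall("f:table", ns)  # matches the namespaced table only
furniture_tag = furniture_tables[0].tag
print(len(html_tables), len(furniture_tables), furniture_tag)
```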
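As a small illustration (the furniture content is again made up), the sketch below parses a document with a default namespace. None of the elements has a prefix, yet the element carrying the declaration and all of its children are placed in the namespace.

```python
import xml.etree.ElementTree as ET

# Illustrative document with a default namespace and no prefixes.
doc = '<table xmlns="http://furniture"><name>Coffee Table</name><width>80</width></table>'
root = ET.fromstring(doc)

# The root and its unprefixed children all inherit the default namespace.
root_tag = root.tag
child_tags = [child.tag for child in root]
print(root_tag, child_tags)
```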
Having an xmlns=http://etc in the root of the document made it impossible to match elements using plain XPath expressions, because an unprefixed name in XPath only matches elements that are in no namespace.
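The failure is easy to reproduce. The sketch below assumes the MARCXML namespace URI http://www.loc.gov/MARC21/slim and a hypothetical record root element; your documents may use something different. ElementTree only supports a subset of XPath, but the behaviour is the same: the plain name finds nothing, while the namespace-qualified name matches.

```python
import xml.etree.ElementTree as ET

# Assumption: the default namespace is the MARCXML one; "record" is a hypothetical root.
MARC_NS = "http://www.loc.gov/MARC21/slim"
doc = f'<record xmlns="{MARC_NS}"><leader>03244ctm a2200313 a 4500</leader></record>'
root = ET.fromstring(doc)

# An unqualified name does not match, because leader is in the default namespace.
plain = root.find("leader")

# Qualifying the name with the namespace URI does match.
qualified = root.find(f"{{{MARC_NS}}}leader")
print(plain, qualified.text)
```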
The XPath code we used in the end to get a match is as follows.
See the XPath syntax at http://www.w3schools.com/xpath/xpath_syntax.asp for more info.
<?xml version="1.0" ?>
<leader>03244ctm a2200313 a 4500</leader>
<controlfield tag="008">060426s2005 xnaa bm 000 0 eng d</controlfield>
<datafield tag="260" ind1="" ind2="">
To match subfield c of the 260 field, you would use the following.
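One way to do this in Python is sketched below; it is an illustration under stated assumptions rather than the exact expression from our migration code. The namespace URI, the record root element and the subfield value "2005." are all assumed for the example. In a full XPath 1.0 engine, another option is to ignore the namespace with local-name(), for example //*[local-name()='datafield'][@tag='260']/*[local-name()='subfield'][@code='c'].

```python
import xml.etree.ElementTree as ET

# Assumptions: MARCXML namespace URI, a "record" root, and a made-up subfield value.
MARC_NS = "http://www.loc.gov/MARC21/slim"
doc = f"""<record xmlns="{MARC_NS}">
  <leader>03244ctm a2200313 a 4500</leader>
  <controlfield tag="008">060426s2005 xnaa bm 000 0 eng d</controlfield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2005.</subfield>
  </datafield>
</record>"""
root = ET.fromstring(doc)

# Bind a prefix of our own choosing to the document's default namespace.
ns = {"m": MARC_NS}

# Every step in the path needs the prefix, since every element is in the namespace.
match = root.find("m:datafield[@tag='260']/m:subfield[@code='c']", ns)
date = match.text
print(date)
```

The prefix m exists only in the search; the document itself keeps its default namespace untouched, which matters when the repository expects that structure.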