During the Digital Commons to DSpace data migration (documented as Harvesting OAI data to Dspace), a namespace issue stopped the harvest_digital_commons.py script from running.
The command “dcxml = xml_util.xml(dcString)” parses dcString so that the Python code can access and process the document as XML. The information extracted from the document includes page URLs as well as links to the PDF and ZIP files needed for the data migration.
Problem: dcString does not contain any namespace declarations at the head of the document, because it is a temporary metadata file split from a full Digital Commons document. As a result, Python could not access and process the XML.
Issue: When the above command was issued, Python was unable to match the dc and xsi elements and attributes, because “dcxml” had no valid DOM.
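The failure described above can be reproduced with the Python standard library alone. The sketch below is illustrative only: it uses xml.etree.ElementTree rather than the project's xml_util module (whose interface is not shown here), and the fragment, the helper names parses and declare_namespaces, the wrapper element, and the namespace URIs (the standard Dublin Core elements and XML Schema instance URIs) are assumptions that may differ from what the Digital Commons export actually uses.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the real export may declare different ones.
DC_NS = "http://purl.org/dc/elements/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical fragment: the dc prefix is used but never declared,
# as can happen when a record is split out of a larger document.
fragment = "<record><dc:title>Sample</dc:title></record>"


def parses(xml_text):
    """Return True if xml_text is well-formed, namespace-aware XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # An undeclared prefix is reported as a parse error.
        return False


def declare_namespaces(xml_text):
    """Wrap a fragment in an element that declares the dc and xsi prefixes.

    Assumes xml_text has no XML declaration of its own.
    """
    return '<wrapper xmlns:dc="%s" xmlns:xsi="%s">%s</wrapper>' % (
        DC_NS, XSI_NS, xml_text)


if __name__ == "__main__":
    print(parses(fragment))                      # fragment as split out
    print(parses(declare_namespaces(fragment)))  # with prefixes declared
    root = ET.fromstring(declare_namespaces(fragment))
    print(root.find(".//{%s}title" % DC_NS).text)
```

Wrapping the fragment is only one way to bind the prefixes; passing the namespaces to the parsing helper, as the script does, achieves the same result without changing the document text.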
Solution: The code needed to be updated to include both the dc and xsi namespaces in the argument list, as follows: