Monday, March 2, 2009

Using FOR XML and OPEN XML in MS SQL Server

Generating XML from Tables

When working with XML files, importing and exporting data from and into XML file, we 'FOR XML' clause made available by MS SQL server can come in very handy. This clause make it possible to completely eliminate the need of conversion code to represent data in XML or read data from an XML file into database. Since we use XML to represent our data and interact with our modules, it makes a lot of sense to get familiar with this clause.

Let's start with creating the wordlist table which has the most frequent search terms and its language-

ID

Words

Frequency

Language

1

hello

87

English

2

bonjour

59

French

3

Bye

56

English

4

hola

44

Spanish

5

gibberish

5

NULL

Generate flat file from database tables

Often we need to create a delimited list of data and when the data is in the database-

SELECT words +','+convert(varchar, frequency)+','+ISNULL(language,'')+'|'
FROM wordList
ORDERBY frequency DESC
FOR XML PATH('')

Which will generate a list of pipe separated words which can further be used in a SQL query or be saved in a table to be used by another process.

hello,87,English|bonjour,59,French|Bye,56,English|hola,44,Spanish|gibberish,5,|

Generating XML from table records.

Now let's generate the following XML file from the above data.

<Term id="1">
<Word xml:language="English">hello</Word>
<frequency>87</frequency>
</Term >
<Term id="2">
<Word xml:language="French">bonjour</Word>
<frequency>59</frequency>
</Term>

Notice how the elements and attributes are specified

SELECT TOP 2 ID AS'@id', language 'Word/@xml:language', words 'Word', frequency 'frequency'
FROM wordList
ORDER BY frequency DESC
FOR XML PATH('Term')

Adding namespace and root element

WITH XMLNAMESPACES allows you to provide namespace in the XML generated by the FOR XML clause and ROOT keyword will allow us to add a root tag to the generated XML as below.

WITH XMLNAMESPACES( 'uri1'as ns1, DEFAULT'uri')
SELECT TOP 2 ID as'@id', language 'ns1:Word/@xml:language', words 'ns1:Word', frequency 'frequency'
FROM wordList
ORDER BY frequency DESC
FOR XML PATH('Term'), ROOT('WordFrequencyList')

<WordFrequencyList xmlns="uri" xmlns:ns1="uri1">
<Term id="1">
<ns1:Word xml:language="English">hello</ns1:Word>
<Frequency>87</Frequency>
</Term>
<Term id="2">
<ns1:Word xml:language="French">bonjour</ns1:Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>

Using XSINIL directive for null values

With XSINIL directive we can add all the null values to the elements with xsi:nil directive set to true.

SELECT ID as '@id', words 'Word', frequency 'Frequency',language
FROM wordList
WHERE ID ='5' --IGNORE: Added to get only one result in the output.
ORDER BY frequency DESC
FOR XML PATH('Term'), ROOT('WordFrequencyList'), ELEMENTS XSINIL

<WordFrequencyList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Term id="5">
<Word>gibberish</Word>
<Frequency>5</Frequency>
<language xsi:nil="true" />
</Term>
</WordFrequencyList>

Now that I have your attention on the power FOR XML clause provides while working with XML documents, let us understand the FOR XML clause in further detail

FOR XML clause has 4 modes-


RAW – With RAW mode each not NULL column value in the rowset is mapped to an XML attribute value of <row> element.

SELECT ID as
'@id', words 'Word', frequency 'Frequency',
language

FROM wordList
ORDER BY frequency DESC
FOR XML RAW

<row_x0040_id="1"Word="hello"Frequency="87"language="English" />
<row_x0040_id="2"Word="bonjour"Frequency="59"language="French" />
<row_x0040_id="3"Word="Bye"Frequency="56"language="English" />
<row_x0040_id="4"Word="hola"Frequency="44"language="Spanish" />
<row_x0040_id="5"Word="gibberish"Frequency="5" />


AUTO– With AUTO mode the query result forms simple nested XML hierarchy. It does not allow you to control the hierarchy. It uses heuristics in shaping the XML which leads to nested XML for nested queries.


SELECT ID as
'@id', words 'Word', frequency 'Frequency',language

FROM wordList

ORDER BY frequency DESC
FOR XML RAW

<wordList_x0040_id="1"Word="hello"Frequency="87"language="English" />
<wordList_x0040_id="2"Word="bonjour"Frequency="59"language="French" />
<wordList_x0040_id="3"Word="Bye"Frequency="56"language="English" />
<wordList_x0040_id="4"Word="hola"Frequency="44"language="Spanish" />
<wordList_x0040_id="5"Word="gibberish"Frequency="5" />

PATH – With PATH mode we can construct nested XML and can control the mix of elements and attributes. This mode considers column names (or aliases) as XPATH expressions. Examples for PATH mode is discussed in the first section of this document above.

EXPLICIT – With EXPLICIT mode the table must be in a specific format which defines the XML hierarchy. It expects the TAG and PARENT column to appear first. The EXPLICIT mode queries can get a bit cumbersome so why not stay with PATH mode which gives us enough flexibility to generate XML with nested hierarchy.

Generating Tables from XML documents

The OPENXML, a SQL keyword, generates a dynamic in-memory table representation of the XML document where the OPENXML can be used in SELECT and SELECT INTO statements. Stored procedure, sp_xml_prepareddocument, should be called before OPENXML, as this stored procedure parses the XML document, generates in-memory (DOM) representation of the document and provides OPENXML with the handle to generate the tabular representation from the XML Dom structure.

Generating Table from XML Document Process Flow


Do not forget to call sp_xml_removedocument stored procedure in the end to free the memory held by in-memory DOM structure.

The Process of converting XML document into table

To understand the generation of table structure from an XML document let us take the following XML document as our sample.
<WordFrequencyList>
<Term id="1">
<Word xml:language="English">hello</Word>
<Frequency>87</Frequency>
</Term>
<Term id="2">
<Word xml:language="French">bonjour</Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>
We can pass this XML document as an argument to our stored procedure or read it directly as a file. For the sake of understanding the concept we are discussing I will define the XML in our stored procedure itself.


DECLARE @hxmldoc int
DECLARE @xmldoc varchar(500)
SET @xmldoc ='
<WordFrequencyList>
<Term id="1">
<Word xml:language="English">hello</Word>
<Frequency>87</Frequency>
</Term>
<Term id="2">
<Word xml:language="French">bonjour</Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>'
--CALL to sp_xml_preparedocument for generating the in-memory DOM structure
EXEC sp_xml_preparedocument @hxmldoc OUTPUT, @xmldoc
--CALL to OPENXML in the SELECT INTO SQL query
SELECT*
INTO WordFreq
FROM OPENXML(@hxmldoc, '/WordFrequencyList/Term', 1)
WITH (
ID int '@id',
words nvarchar(50) 'Word',
frequency int 'Frequency',
language nvarchar(50) 'Word/@xml:language'
)
EXECsp_xml_removedocument @hxmldoc


Result-
IDWordsFrequencyLanguage
1hello87English
2bonjour59French

sp_xml_preparedocument takes two arguments, first is the handle to the XML in-memory (DOM) structure which is returned by the procedure once it is executed and second is the XML document itself. We pass this XML document handle to the OPEN XML call to generate table records.

XML Doc with Namespaces

To read XML docs with namespaces there is a third parameter in the sp_xml_preparedocument stored procedure which can be defined. Let us understand this in detail.


<WordFrequencyList xmlns="uri" xmlns:ns1="uri1">
<Term id="1">
<ns1:Word xml:language="English">hello</ns1:Word>
<Frequency>87</Frequency>
</Term>
<Termid="2">
<ns1:Word xml:language="French">bonjour</ns1:Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>

In the XML document above, the WordFrequencyList, Term,and Frequency elements belongs to a default namespace defined by uri and element Word belongs to namespace defined by uri1.


If we feed this XML to our OPENXML call above we would get an empty table. To address this issue due to namespace, we need to pass the namespace declaration to the sp_xml_preparedocument call.



DECLARE @hxmldoc int
DECLARE @xmldoc varchar(500)
DECLARE @xmlns varchar(500)
SET @xmldoc ='
<WordFrequencyList xmlns="uri" xmlns:ns1="uri1">
<Term id="1">
<ns1:Word xml:language="English">hello</ns1:Word>
<Frequency>87</Frequency>
</Term>
<Term id="2">
<ns1:Word xml:language="French">bonjour</ns1:Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>'
SET @xmlns ='<root xmlns:wf="uri" xmlns:wf1="uri1" />'
--CALL to sp_xml_preparedocument for generating the in-memory DOM structure
EXEC sp_xml_preparedocument @hxmldoc OUTPUT, @xmldoc, @xmlns
--CALL to OPENXML in the SELECT INTO SQL query
SELECT*
INTO WordFreq
FROM OPENXML (@hxmldoc,'/wf:WordFrequencyList/wf:Term',1)
WITH (
ID int '@id',
words nvarchar(50) 'wf1:Word',
frequency int 'wf:Frequency',
language nvarchar(50) 'wf1:Word/@xml:language'
)
EXEC sp_xml_removedocument @hxmldoc


Notice the definition of @xmlns variable. We have used prefix ws and ws1 as the prefix as the prefix you use does not have to match the one in the XML document. We then use these prefixes in the OPENXML call to refer to elements in the XML document.

The examples discussed above, pretty much covers most of the requirements of working with XML document conversion. Please refer to the MSDN in case you have specific requirement. Also you can always email me at bhawnablog@gmail.com with your specific questions and I will be more than happy to help you.

No comments:

Post a Comment