- The size of memory made available to the SQL server defines the lock granularity that the server will pick while processing a transaction.
- The lowest granularity level is row.
- SQL server gets shared locks on data being queried which means all queries can see data, but queries will block writes and writes will block queries, unlike Oracle which uses snapshots for executing queries, so queries will not block writes and writes does not block queries (although writes blocks other writes).
- For updating a single row, SQL server acquires a single lock but if you are updating a huge set of rows, viz. 1000 rows, SQL server might decide on acquiring a page, extent or whole table lock depending on how the data is stored physically. One can control this by specifying ROWLOCK HINT in the update statement. Although tuning the query using HINTs should be done only under expert supervision or by experts.
- SQL server acquires/ chooses Bulk Update lock for Bulk copy operations which improves performance at the cost of concurrency.
- TRANSACTION ISOLATION LEVEL defined can affect the SQL server's choice of deciding on one level of lock over the other.
- SERIALIZABLE is the most restrictive of all the transaction isolation levels (READ COMMITED, READ UNCOMMITED, REPEATABLE READ, SERIALIZABLE). It ensures that each transaction is completely isolated from others.
- By default, SQL Server transactions do not time out, unless LOCK_TIMEOUT is specified.
- SQL Server has deadlock detection and resolution mechanism which picks one of the transaction thread involved in deadlock to roll back. One can control which transaction gets rolled back using SET DEADLOCK_PRIORITY (LOW, NORMAL, HIGH or integer range from -10 to 10, default is NORMAL) statement. The transaction session with lower priority is picked to roll back in deadlock situations. For transaction sessions with same deadlock priority level, the one which is least expensive to roll back is picked and if nothing can be decided for the pick, the transaction to roll back is picked randomly.
Tuesday, March 31, 2009
SQL Server Locking Mechanism Quick Facts
Tuesday, March 24, 2009
Useful /Handy SQL Queries: MS SQL server
Q. Find duplicates in a table
SELECT zip ,
COUNT (zip) AS NumOccurrences
FROM zipcode GROUP BY zip
HAVING (COUNT(zip)> 1 )
Q. Select a row or column value at random
SELECT TOP 1 city
FROM cityAddress ORDER BY NEWID()
Q. List items in one table that are not in the other
(LEFT JOIN)
SELECT customers.*
FROM customers LEFTJOIN orders ON customers.customer_id = orders.customer_id
WHERE orders.customer_id IS NULL
Alternatively,
SELECT customers.*
FROM customers
WHERE customers.customer_id NOT IN(SELECT customer_id FROM orders)
Note: In clauses are slower in execution
Q. List items in one table that are also in another table
(INNER JOIN)
SELECT
DISTINCT customers.*
FROM customers INNER JOIN orders
ON customers.customer_id = orders.customer_id
Alternatively,
SELECT customers.*
FROM customers
WHERE customers.customer_id IN(SELECT customer_id FROM orders)
Note: In clauses are slower in execution
Q. Get Total count of distinct column value
Notice the DISTINCT keyword placement
SELECT COUNT(DISTINCT customer_state) AS total
FROM customers
Q. Copy data from one table into another
INSERT INTO customers(customer_id, customer_name)
SELECT customer_id, customer_name
FROM partnerCustList
Q. Bulk Insert data from one table into another.
The new table will have same structure as the one where the data is copied from with Bulk Insert
SELECT *
INTO customers
FROM partnerCustList
Sunday, March 22, 2009
Java Buffer
A buffer is an object, used to write some primitive type data into or read from. A buffer provides structured access to the data while keeping track for the reading and writing processes. Buffers allow I/O operations on blocks of data instead of working with them byte by byte (stream-oriented) which speeds up the I/O operations.
To understand buffers in depth we need to take a tour to the buffer internals.
Buffer Internals
State Variables
Buffer state variables help in keeping the "internal accounting" for them. With each read/ write operation, buffer's state variable is updated to help buffers manage its resources and help us perform I/O operations in blocks. Buffers has 3 state variables to track its state and the data it holds-
Position – keeps track of how much data was written or read from the buffer i.e, where should the next set of data block we added to the buffer or read from.
Limit – keeps track of how much data is left in the buffer to read from or how much space is left in the buffer to write data into
Capacity – specifies the max amount of data that the buffer can hold.
This brings us to the equation,
position ≤ limit ≤ capacity where none of the state variables can be negative.
Now let us try to visualize these variables. Assuming the capacity of our buffer is 16 bytes shown by dashes below,
State: Empty
position =0 limit, capacity = 16
____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____
State: First write of 8 bytes
position = 8 limit, capacity = 16
__1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ ____ ____ ____ ____ ____ ____ ____ ____
State: Second write of 4 bytes
position = 12 limit, capacity = 16
__1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ ____ ____ ____ ____
Now let us flip the buffer to read the data from, flip(), this sets the limit to the current position and resets position to 0.
State: flip()
position = 0 limit = 12 capacity = 16
__1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ ____ ____ ____ ____
The buffer is now ready to be read the data from,
State: Read 8 bytes
position = 8 limit = 12 capacity = 16
__1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ ____ ____ ____ ____
The next read statement can read maximum 4 more bytes from our buffer due to the limit set to 12.
State: Read 4 bytes
position, limit = 12 capacity = 16
__1_ __1_ __1_ __1_ __1_ __1_ __1_ __1___1_ __1_ __1_ __1_ ___ ___ ___ ___
And finally we clear up our buffer before using it further, clear(), this sets the position to 0 and the limit equal to the buffer capacity.
State: clear()
position =0 limit, capacity = 16
____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____
Accessor Methods
Get (ByteBuffer)
- byte get(); - returns single byte.
- ByteBuffer get( byte dest[] ); - reads a group of bytes into the array dest
- ByteBuffer get( byte dest[], int offset, int length ); - reads a group of bytes into the array dest
- byte get( int index ); - returns a byte of data from the position specified by index
The methods from 1-3 respect the buffer state variables whereas, method 4 does not. So the 4th method ignores the position and limit state variable of the buffer and does not change their values either. Method 4 is referred as an absolute method while other methods are relative. Methods 2 and 3 just return this object on which they were called which allows chaining of the methods when needed.
buffer.get(data).flip();
Put (ByteBuffer)
1. ByteBuffer put( byte b ); - puts one byte in the buffer
2. ByteBuffer put( byte src[] ); - puts an array of bytes in the buffer
3. ByteBuffer put( byte src[], int offset, int length ); - puts an array of bytes in the buffer
4. ByteBuffer put( ByteBuffer src ); - copies data from source buffer into this buffer
5. ByteBuffer put( int index, byte b ); - puts data byte into the position specified by index
Here the method 5 is absolute and all others are relative.
The methods discussed above are all related to ByteBuffer class. Other buffer types have equivalent get() and put() methods dealing with the corresponding primitive type they handle.
ByteBuffer class also has methods to get or put data of specific primitive type both in absolute and relative form.
ByteBuffer Quick Facts
- Buffer allocation automatically empties the ByteBuffer and resets the state variables.
- duplicate and slice methods perform shallow copy of the original ByteBuffer. So anything you do on the returned buffer will affect the original.
Other handy methods
Creating buffers: allocate() and wrap()
Buffers can be created by allocating space for it using method allocate() or by wrapping existing array into a buffer using method wrap().
ByteBuffer buffer = ByteBuffer.allocate(1024);
Allocates 1024 bytes of space for the object buffer.
You can also wrap an array of primitive type into a corresponding buffer.
Byte arr[] = new byte[1024];
Bytebuffer buffer = ByteBuffer.wrap(arr);
Both buffer and arr share the same memory space now.
Direct vs. in-direct ByteBuffer Allocations
Direct ByteBuffer space is allocated in the native OS memory, although java does not guarantee the success. Allocation of direct ByteBuffer in memory is costly but it provides faster I/O.
ByteBuffer byte_buff = ByteBuffer.allocateDirect (2000);
There is no allocateDirect method for other primitive buffer types but we can use ByteBuffer view buffers to read the data in other primitive type while still making use of ByteBuffer's allocateDirect underneath.
ByteBuffer byte_buff = ByteBuffer.allocateDirect (2000);
CharBuffer cbuf = buffer.asCharBuffer();
Slicing buffers: slice()
Creates a sub-buffer out of the original buffer it is called upon and both share the same memory space. Slicing a buffer creates a shallow copy.
ByteBuffer origBuffer = ByteBuffer.allocate(16);
origBuffer.position(4);
origBuffer.limit(12);
ByteBuffer slicedBuffer = origBuffer.slice();
Now if we add 4 to each value in the buffer the above buffer can be represented as
position = 0 position(slicedBuffer) = 4 limit(slicedBuffer) = 12 capacity = 16
__1_ __1_ __1_ __1_ __5_ __5_ __5_ __5_ __5_ __5_ __5_ __5_ ____ ____ ____ ____
This feature allows data abstraction by helping you write functions to work with whole or a slice of buffer data.
Marking the buffer position: mark()
Marks the current position in the buffer such that any subsequent buffer reset() will bring the buffer position to the current mark position instead of setting it to 0.
Rewind Buffer: rewind()
Sets the buffer position to 0 and discards any mark settings
Creating read-only buffers: asReadOnlyBuffer()
ByteBuffer buffer = ByteBuffer.allocate(1024);
ByteBuffer readoonlyBuffer = buffer.asReadOnlyBuffer();
Buffer in Action
Copying data from input stream into buffer and writing the data from the buffer into output stream.
import java.io.*; import java.nio.*; import java.nio.channels.*; public class BufferCopy { public static void main(String[] args) throws IOException { FileInputStream inFile = new FileInputStream(args[0]); FileOutputStream outFile = new FileOutputStream(args[1]); FileChannel inChannel = inFile.getChannel(); FileChannel outChannel = outFile.getChannel(); ByteBuffer buffer = ByteBuffer.allocate(1024*1024); for (; inChannel.read(buffer) != -1; buffer.clear()) { buffer.flip(); while (buffer.hasRemaining()) outChannel.write(buffer); } inChannel.close(); outChannel.close(); } } |
Converting ByteBuffer to CharBuffer
char[] data = "ByteToCharBuffer".toCharArray(); ByteBuffer bb = ByteBuffer.allocate(data.length * 2); CharBuffer cb = bb.asCharBuffer(); cb.put(data); while ((c = cb.getChar()) != 0) System.out.print(c + " "); |
Wrap a char array into a charBuffer
CharBuffer buffer = CharBuffer.allocate(8); |
Converting between string and bytes
// Create the encoder and decoder Charset charset = Charset.forName("ISO-8859-1"); CharsetDecoder decoder = charset.newDecoder(); CharsetEncoder encoder = charset.newEncoder(); try { // Convert string to bytes (ISO-LATIN-1) in ByteBuffer ByteBuffer bbuf = encoder.encode(CharBuffer.wrap("string")); // Convert bytes from ByteBuffer into CharBuffer and then to a string. CharBuffer cbuf = decoder.decode(bbuf); String s = cbuf.toString(); } catch (CharacterCodingException e) { } |
String and byte conversion using the direct allocation for ByteBuffer
// Create a direct ByteBuffer for channeling the data ByteBuffer bytebuf = ByteBuffer.allocateDirect(1024); // Create a non-direct character ByteBuffer CharBuffer charbuf = CharBuffer.allocate(1024); // Convert characters in charbuf to bytebuf encoder.encode(charbuf, bytebuf, false); // flip bytebuf before reading from it bytebuf.flip(); // Convert bytes in bytebuf to charbuf decoder.decode(bytebuf, charbuf, false); // flip charbuf before reading from it charbuf.flip(); |
Wednesday, March 11, 2009
Keeping search robots away!
At times, you would want few of your web pages to not be visible in the search engine result page (SERP). The reason being it is under construction; or it's a semi-private page, something you would like to share with smaller community; or any other. Here I have discussed ways to keep the search engine robots or crawlers from visiting the page or a link on a page supported by Robots Exclusion Protocol (REP).
To block the search engine robot from indexing a particular page, use the mate tag robots with the content value noindex.
<meta name="robots" content="noindex" />
Alternatively, if you'd like the web page to be indexed but suggest the search robot to not follow any of the links on the page, use the nofollow content value.
<meta name="robots" content="nofollow" />
More Content values for robots Meta tag -
Content Value | Description | Supported By |
noindex | Do not index the web page | Google, Yahoo, Ask, MSN Live |
index | Index the web page | |
nofollow | Do not follow/visit any link on the web page | Google, Yahoo, Ask, MSN Live |
follow | Follow all the links on the web page | |
noarchive | Do not cache the web page | Google, Yahoo, Ask, MSN Live |
nosnippet | Do not auto generate the description based on page content | |
noodp | Do not overwrite the description or title tag content from Open Directory project [home page only] | Google, Yahoo, MSN Live |
noydir | Do no overwrite the description or title tag content | Yahoo |
You can also have combinations of content values (of course the combinations should make sense).
<meta name="robots" content="noindex, follow" />
Now if you are dealing with keeping search robots from visiting multiple web pages of your website, you can make use of robots.txt file which is placed in the top-level directory hierarchy of your web site.
Here is the syntax of the robots.txt file-
User-Agent: *
Disallow: /
In the above syntax, User-Agent identifies the search robot and * refers to all search robots. You can also specify the search robot name here to address a particular search engine robot. Refer to the User-Agent of major search engines.
To restrict certain directory of your website -
User-Agent: *
Disallow: /Songs
To restrict particular robot from visiting your web directory
User-Agent: Googlebot/2.1
Disallow: /Songs
If you are addressing multiple search robots in your robots.txt, make sure that the directive for specific User-agent is specified before.
#Disallow Google bot from visiting any webpage/ content under /Songs/private
User-Agent: Googlebot/2.1
Disallow: /Songs/private
#Disallow all other bots from visiting any web page/ content under /Songs
User-Agent: *
Disallow: /Songs
More Robots.txt directives –
Content Value | Description | Supported By |
Disallow | Do not visit specified web page | ALL search robots |
Allow | Allow visiting the particular web page | Google, Yahoo, Ask, MSN Live |
Sitemap | Location of your sitemap Location of sitemap index file can also be included here. | Google, Yahoo, Ask, MSN Live |
Wild card (*/$) | Wildcard * - matches sequence of characters | Google, Yahoo, MSN Live |
Crawl-Delay | Specifies minimum delay between two successive requests made by search robot | Ask |
robots.txt quick Tips
Q. To allow all search engine spiders to index all the files of your website
Your robots.txt file should like below
User-agent: *
Disallow:
Q. To disallow all spiders to index any file
Your robots.txt file should like below
User-agent: *
Disallow: /
Note: Slash '/' here is your root directory and by adding that in your Disallow statement you are restricting spiders from indexing all the files of your website.
For specific questions related to writing robots.txt for your website, please reach me at bhawnablog@gmail.com
Monday, March 9, 2009
Creating XML Site Map
Apart from the HTML site map on your web site, which can help the web site visitors and search engine robots in navigating through your web site, you can create a XML Sitemap. The XML Sitemaps are specifically for search engine robots and can be submitted to the particular search engine. A Sitemap lists all the links of your website that you would like to be visited by the search engine robots, specifically helps with dynamic pages, which search engine robots, will have no knowledge of otherwise.
Sitemap
<?xml version="1.0" encoding="UTF-8"?> | ||
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> | Required | |
<url> | Required | |
<loc>http://www.example.com/index.html</loc> | Required | |
<lastmod>2005-01-01</lastmod> | Optional | |
<changefreq>monthly</changefreq> | Optional | [alwayshourlydailyweeklymonthlyyearlynever] |
<priority>0.8</priority> | Optional | Default = 0.5 |
</url> | ||
</urlset> |
All the data in Sitemap must be entity-escaped, UTF-8 encoded. Sitemaps have an upper limit of 50,000URLs and 10MB size per Sitemap. Sample Sitemap.xml.
Location
Sitemap.xml file is usually located under the high-level directory of your website (http://www.yourwebsite.com/Sitemap.xml
). This is not a requirement but highly recommended. The location of a Sitemap.xml decides the URLs it can contain in it. So if the Sitemap.xml is located under www.youwebsite.com/product/Sitemap.xml, the Sitemap.xml can only contain URLs for pages under http://www.yourwebsite.com/product/
which also means all the URLs in a Sitemap.xml must be for the same host. You also need to specify path to your Sitemap.xml in robots.txt.
Sitemapindex
Sitemapindex groups multiple Sitemap files together with a Sitemap element entry for each Sitemap file location on your website. There is an upper limit of 1,000 Sitemap per website. A Sitemapindex can only group Sitemap of the same website and as with Sitemap, all the data in Sitemapindex should entity escaped and UTF-8 encoded.
<?xml version="1.0" encoding="UTF-8"?> | ||
| Required | |
| Required | |
| Required | |
| Optional | |
</sitemap> | ||
</sitemapindex> |
Submitting Sitemap
1. Through robots.txt
Specify the location of your Sitemap.xml in robots.txt
Sitemap: http://yourwebsite.com/sitemap.xml
2. Thru Search Engine Submission Interface
Most search engine provide interface to submit Sitemap, some also provide tools to generate one for your website.
Google Sitemap Submission interface
Yahoo Sitemap Submission interface
3. Via PING URL
<SearchEngineURL>/ping?sitemap=http%3A%2F%2Fwww.yourwebsite.com%2Fsitemap.xml
Google ping URL -www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.yourwebsite.com%2Fsitemap.xml
Ask ping URL - http://submissions.ask.com/ping?sitemap=http%3A%2F%2Fwww.yourwebsite.com%2Fsitemap.xmlYahoo ping URL - http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=YahooDemo&url=http%3A%2F%2Fwww.yourwebsite.com%2Fsitemap.xml
Here the SearchEnginerURL is the URL of the search engine you would like to submit the Sitemap to. Once you receive the HTTP 200 response, you know that the search engine received your Sitemap (although it does not guarantee that your site is valid). The ping request can be issued from wget, curl or any other mechanism.
Other Formats of Sitemap
Although the other formats carry limited information about your website, sometimes they can come in handy for the Sitemap submission.
RSS /ATOM Feed – RSS feeds can also be submitted as
Sitemaps. The <link> in the feed is interpreted as the URL to the page and <pubDate> or <modified> field is interpreted as last
modified info by search engine robots.
Text File – A simple text file containing URL to your web pages per line can be submitted as Sitemap.
The text file must be UTF-8 encoded and must not have any comment lines. A text file can have 50,000 URLs and should be no larger than 10MB. The text file can be separated at Sitemap into several text files with list of URLs (less than 50,000) and each file can be submitted separately. The text file must be in the highest level directory of your website.
There are more formats of Sitemap which are accepted by search engines to satisfy different data formats, such as, video sitemap,
mobile sitemap, news sitemap, code search sitemap etc. Also not all search
engines support them. If interested in these sitemap
content, please refer Google Webmaster Help.
Compressing Sitemap
You can compress the Sitemap xml or text file and provide the link to the compressed file in your links or submissions and is accepted per Sitemap standard.
|
Sitemap Validation
Schema for validating sitemaps can be downloaded from:
Sitemap Schema:
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
XML header for referring the xsd will change to-
|
Sitemap index Schema:http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd
XML header for referring the xsd will change to
|
For specific questions related to generating or writing Sitemap for your website, please reach me at bhawnablog@gmail.com
Sunday, March 8, 2009
Organic SEO Best Practices Checklist
The DOs | Description | TIPs |
Title Tag | <title>keyword in the Title</Title> |
|
Image Tag | <img src="" alt="keyword in the alternate text"/> |
|
Anchors | <a href="link to a related website">keyword</a> |
|
Meta Tags | <meta name="keyword" content="related keyword list"/> <meta name="description" content="short description of your website with few keywords"/> |
|
Header Tags | <h1></h1>, <h2></h2>, <h3></h3>, <h4></h4> |
|
URL | Parameter: http://www.yoursite.com/products.jsp?id=12356&category=7&type=42&size=6&batch=65 Depth: http://www.yoursite.com/products/category/batch/season/item |
|
Inbound Links |
| |
Visible Body Text | <body>body text</body> |
|
Sitemap | XML based document at the root of your website. http://www.yourwebsite.com/sitemap.xml |
|
Navigation |
| |
Domain Name |
| |
Bread-crumb trail | On your web page, provides navigation depth info. Home > kids > Toys > 4T – 5T |
|
FAQs | FAQs of popular searches related to your product |
|
Popular Search List | Maintain a list of popular /most frequently searched items/ keywords related to your product /website on very web page. |
|
Robots.txt | Learn more about robots.txt |
|
Company Address | Company Name, Street Address, City, State, Zip, Country, Phone# |
|
Contact Statement | If you need more info, please contact …… |
|
The DON'Ts | The Fix | |
Duplicate URLs |
| |
Broken Links |
| |
Cookies |
| |
Session Ids |
| |
Frames |
| |
302 redirect |
|
Monday, March 2, 2009
Using FOR XML and OPEN XML in MS SQL Server
Generating XML from Tables
When working with XML files, importing and exporting data from and into XML file, we 'FOR XML' clause made available by MS SQL server can come in very handy. This clause make it possible to completely eliminate the need of conversion code to represent data in XML or read data from an XML file into database. Since we use XML to represent our data and interact with our modules, it makes a lot of sense to get familiar with this clause.
Let's start with creating the wordlist table which has the most frequent search terms and its language-
ID | Words | Frequency | Language |
1 | hello | 87 | English |
2 | bonjour | 59 | French |
3 | Bye | 56 | English |
4 | hola | 44 | Spanish |
5 | gibberish | 5 | NULL |
Generate flat file from database tables
Often we need to create a delimited list of data and when the data is in the database-
SELECT words +','+convert(varchar, frequency)+','+ISNULL(language,'')+'|'
FROM wordList
ORDERBY frequency DESC
FOR XML PATH('')
Which will generate a list of pipe separated words which can further be used in a SQL query or be saved in a table to be used by another process.
hello,87,English|bonjour,59,French|Bye,56,English|hola,44,Spanish|gibberish,5,|
Generating XML from table records.
Now let's generate the following XML file from the above data.
<Term id="1">
<Word xml:language="English">hello</Word>
<frequency>87</frequency>
</Term >
<Term id="2">
<Word xml:language="French">bonjour</Word>
<frequency>59</frequency>
</Term>
Notice how the elements and attributes are specified
SELECT TOP 2 ID AS'@id', language 'Word/@xml:language', words 'Word', frequency 'frequency'
FROM wordList
ORDER BY frequency DESC
FOR XML PATH('Term')
Adding namespace and root element
WITH XMLNAMESPACES allows you to provide namespace in the XML generated by the FOR XML clause and ROOT keyword will allow us to add a root tag to the generated XML as below.
WITH XMLNAMESPACES( 'uri1'as ns1, DEFAULT'uri')
SELECT TOP 2 ID as'@id', language 'ns1:Word/@xml:language', words 'ns1:Word', frequency 'frequency'
FROM wordList
ORDER BY frequency DESC
FOR XML PATH('Term'), ROOT('WordFrequencyList')
<WordFrequencyList xmlns="uri" xmlns:ns1="uri1">
<Term id="1">
<ns1:Word xml:language="English">hello</ns1:Word>
<Frequency>87</Frequency>
</Term>
<Term id="2">
<ns1:Word xml:language="French">bonjour</ns1:Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>
Using XSINIL directive for null values
With XSINIL directive we can add all the null values to the elements with xsi:nil directive set to true.
SELECT ID as '@id', words 'Word', frequency 'Frequency',language
FROM wordList
WHERE ID ='5' --IGNORE: Added to get only one result in the output.
ORDER BY frequency DESC
FOR XML PATH('Term'), ROOT('WordFrequencyList'), ELEMENTS XSINIL
<WordFrequencyList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Term id="5">
<Word>gibberish</Word>
<Frequency>5</Frequency>
<language xsi:nil="true" />
</Term>
</WordFrequencyList>
Now that I have your attention on the power FOR XML clause provides while working with XML documents, let us understand the FOR XML clause in further detail
FOR XML clause has 4 modes-
RAW – With RAW mode each not NULL column value in the rowset is mapped to an XML attribute value of <row> element. |
SELECT ID as | <row_x0040_id="1"Word="hello"Frequency="87"language="English" /> |
AUTO– With AUTO mode the query result forms simple nested XML hierarchy. It does not allow you to control the hierarchy. It uses heuristics in shaping the XML which leads to nested XML for nested queries. |
SELECT ID as | <wordList_x0040_id="1"Word="hello"Frequency="87"language="English" /> <wordList_x0040_id="2"Word="bonjour"Frequency="59"language="French" /> <wordList_x0040_id="3"Word="Bye"Frequency="56"language="English" /> <wordList_x0040_id="4"Word="hola"Frequency="44"language="Spanish" /> <wordList_x0040_id="5"Word="gibberish"Frequency="5" /> |
PATH – With PATH mode we can construct nested XML and can control the mix of elements and attributes. This mode considers column names (or aliases) as XPATH expressions. Examples for PATH mode is discussed in the first section of this document above. |
EXPLICIT – With EXPLICIT mode the table must be in a specific format which defines the XML hierarchy. It expects the TAG and PARENT column to appear first. The EXPLICIT mode queries can get a bit cumbersome so why not stay with PATH mode which gives us enough flexibility to generate XML with nested hierarchy. |
Generating Tables from XML documents
The OPENXML, a SQL keyword, generates a dynamic in-memory table representation of the XML document where the OPENXML can be used in SELECT and SELECT INTO statements. Stored procedure, sp_xml_prepareddocument, should be called before OPENXML, as this stored procedure parses the XML document, generates in-memory (DOM) representation of the document and provides OPENXML with the handle to generate the tabular representation from the XML Dom structure.Do not forget to call sp_xml_removedocument stored procedure in the end to free the memory held by in-memory DOM structure.
The Process of converting XML document into table
<WordFrequencyList>
<Term id="1">
<Word xml:language="English">hello</Word>
<Frequency>87</Frequency>
</Term>
<Term id="2">
<Word xml:language="French">bonjour</Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>
We can pass this XML document as an argument to our stored procedure or read it directly as a file. For the sake of understanding the concept we are discussing I will define the XML in our stored procedure itself.
DECLARE @hxmldoc int DECLARE @xmldoc varchar(500) SET @xmldoc =' <WordFrequencyList> <Term id="1"> <Word xml:language="English">hello</Word> <Frequency>87</Frequency> </Term> <Term id="2"> <Word xml:language="French">bonjour</Word> <Frequency>59</Frequency> </Term> </WordFrequencyList>' --CALL to sp_xml_preparedocument for generating the in-memory DOM structure EXEC sp_xml_preparedocument @hxmldoc OUTPUT, @xmldoc --CALL to OPENXML in the SELECT INTO SQL query SELECT* INTO WordFreq FROM OPENXML(@hxmldoc, '/WordFrequencyList/Term', 1) WITH ( ID int '@id', words nvarchar(50) 'Word', frequency int 'Frequency', language nvarchar(50) 'Word/@xml:language' ) EXECsp_xml_removedocument @hxmldoc |
Result-
ID | Words | Frequency | Language |
1 | hello | 87 | English |
2 | bonjour | 59 | French |
sp_xml_preparedocument takes two arguments, first is the handle to the XML in-memory (DOM) structure which is returned by the procedure once it is executed and second is the XML document itself. We pass this XML document handle to the OPEN XML call to generate table records.
XML Doc with Namespaces
To read XML docs with namespaces there is a third parameter in the sp_xml_preparedocument stored procedure which can be defined. Let us understand this in detail.
<WordFrequencyList xmlns="uri" xmlns:ns1="uri1">
<Term id="1">
<ns1:Word xml:language="English">hello</ns1:Word>
<Frequency>87</Frequency>
</Term>
<Termid="2">
<ns1:Word xml:language="French">bonjour</ns1:Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>
In the XML document above, the WordFrequencyList, Term,and Frequency elements belongs to a default namespace defined by uri and element Word belongs to namespace defined by uri1.
If we feed this XML to our OPENXML call above we would get an empty table. To address this issue due to namespace, we need to pass the namespace declaration to the sp_xml_preparedocument call.
DECLARE @hxmldoc int DECLARE @xmldoc varchar(500) DECLARE @xmlns varchar(500) SET @xmldoc =' <WordFrequencyList xmlns="uri" xmlns:ns1="uri1"> <Term id="1"> <ns1:Word xml:language="English">hello</ns1:Word> <Frequency>87</Frequency> </Term> <Term id="2"> <ns1:Word xml:language="French">bonjour</ns1:Word> <Frequency>59</Frequency> </Term> </WordFrequencyList>' SET @xmlns ='<root xmlns:wf="uri" xmlns:wf1="uri1" />' --CALL to sp_xml_preparedocument for generating the in-memory DOM structure EXEC sp_xml_preparedocument @hxmldoc OUTPUT, @xmldoc, @xmlns --CALL to OPENXML in the SELECT INTO SQL query SELECT* INTO WordFreq FROM OPENXML (@hxmldoc,'/wf:WordFrequencyList/wf:Term',1) WITH ( ID int '@id', words nvarchar(50) 'wf1:Word', frequency int 'wf:Frequency', language nvarchar(50) 'wf1:Word/@xml:language' ) EXEC sp_xml_removedocument @hxmldoc |
Notice the definition of @xmlns variable. We have used prefix ws and ws1 as the prefix as the prefix you use does not have to match the one in the XML document. We then use these prefixes in the OPENXML call to refer to elements in the XML document.
The examples discussed above, pretty much covers most of the requirements of working with XML document conversion. Please refer to the MSDN in case you have specific requirement. Also you can always email me at bhawnablog@gmail.com with your specific questions and I will be more than happy to help you.