Tuesday, March 31, 2009

SQL Server Locking Mechanism Quick Facts

  1. The size of memory made available to the SQL server defines the lock granularity that the server will pick while processing a transaction.
  2. The lowest granularity level is row.
  3. SQL server gets shared locks on data being queried which means all queries can see data, but queries will block writes and writes will block queries, unlike Oracle which uses snapshots for executing queries, so queries will not block writes and writes does not block queries (although writes blocks other writes).
  4. For updating a single row, SQL server acquires a single lock but if you are updating a huge set of rows, viz. 1000 rows, SQL server might decide on acquiring a page, extent or whole table lock depending on how the data is stored physically. One can control this by specifying ROWLOCK HINT in the update statement. Although tuning the query using HINTs should be done only under expert supervision or by experts.
  5. SQL server acquires/ chooses Bulk Update lock for Bulk copy operations which improves performance at the cost of concurrency.
  6. TRANSACTION ISOLATION LEVEL defined can affect the SQL server's choice of deciding on one level of lock over the other.
  7. SERIALIZABLE is the most restrictive of all the transaction isolation levels (READ COMMITED, READ UNCOMMITED, REPEATABLE READ, SERIALIZABLE). It ensures that each transaction is completely isolated from others.
  8. By default, SQL Server transactions do not time out, unless LOCK_TIMEOUT is specified.
  9. SQL Server has deadlock detection and resolution mechanism which picks one of the transaction thread involved in deadlock to roll back. One can control which transaction gets rolled back using SET DEADLOCK_PRIORITY (LOW, NORMAL, HIGH or integer range from -10 to 10, default is NORMAL) statement. The transaction session with lower priority is picked to roll back in deadlock situations. For transaction sessions with same deadlock priority level, the one which is least expensive to roll back is picked and if nothing can be decided for the pick, the transaction to roll back is picked randomly.

Tuesday, March 24, 2009

Useful /Handy SQL Queries: MS SQL server

Q. Find duplicates in a table

SELECT zip ,
COUNT (zip) AS NumOccurrences
FROM zipcode GROUP BY zip
HAVING
(COUNT(zip)> 1 )

Q. Select a row or column value at random

SELECT TOP 1 city
FROM cityAddress ORDER BY NEWID()

Q. List items in one table that are not in the other

(LEFT JOIN)
SELECT customers.*
FROM customers LEFTJOIN orders ON customers.customer_id = orders.customer_id
WHERE orders.customer_id IS NULL

Alternatively,

SELECT customers.*
FROM customers
WHERE customers.customer_id NOT IN(SELECT customer_id FROM orders)

Note: In clauses are slower in execution

Q. List items in one table that are also in another table

(INNER JOIN)

SELECT
DISTINCT customers.*
FROM customers INNER JOIN orders
ON customers.customer_id = orders.customer_id

Alternatively,

SELECT customers.*
FROM customers
WHERE customers.customer_id IN(SELECT customer_id FROM orders)

Note: In clauses are slower in execution

Q. Get Total count of distinct column value

Notice the DISTINCT keyword placement

SELECT COUNT(DISTINCT customer_state) AS total
FROM customers

Q. Copy data from one table into another

INSERT INTO customers(customer_id, customer_name)
SELECT customer_id, customer_name
FROM partnerCustList

Q. Bulk Insert data from one table into another.

The new table will have same structure as the one where the data is copied from with Bulk Insert

SELECT *
INTO customers
FROM partnerCustList

Sunday, March 22, 2009

Java Buffer

A buffer is an object, used to write some primitive type data into or read from. A buffer provides structured access to the data while keeping track for the reading and writing processes. Buffers allow I/O operations on blocks of data instead of working with them byte by byte (stream-oriented) which speeds up the I/O operations.

To understand buffers in depth we need to take a tour to the buffer internals.

Buffer Internals

State Variables

Buffer state variables help in keeping the "internal accounting" for them. With each read/ write operation, buffer's state variable is updated to help buffers manage its resources and help us perform I/O operations in blocks. Buffers has 3 state variables to track its state and the data it holds-

Position – keeps track of how much data was written or read from the buffer i.e, where should the next set of data block we added to the buffer or read from.

Limit – keeps track of how much data is left in the buffer to read from or how much space is left in the buffer to write data into

Capacity – specifies the max amount of data that the buffer can hold.

This brings us to the equation,

position ≤ limit ≤ capacity where none of the state variables can be negative.

Now let us try to visualize these variables. Assuming the capacity of our buffer is 16 bytes shown by dashes below,

State: Empty

position =0                                                                                                                                                       limit, capacity = 16
down arrow____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____down arrow

State: First write of 8 bytes

                                                                                    position = 8                                                                             limit, capacity = 16
__1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ down arrow____ ____ ____ ____ ____ ____ ____ ____down arrow

State: Second write of 4 bytes

                                                                                                                              position = 12        limit, capacity = 16
__1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ down arrow ____ ____ ____ ____down arrow

Now let us flip the buffer to read the data from, flip(), this sets the limit to the current position and resets position to 0.

State: flip()

position = 0                                                                                                           limit = 12                            capacity = 16
down arrow__1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ down arrow____ ____ ____ ____down arrow

The buffer is now ready to be read the data from,

State: Read 8 bytes

                                                                           position = 8                            limit = 12                                  capacity = 16
__1_ __1_ __1_ __1_ __1_ __1_ __1_ __1_ down arrow__1_ __1_ __1_ __1_ down arrow ____ ____ ____ ____down arrow

The next read statement can read maximum 4 more bytes from our buffer due to the limit set to 12.

State: Read 4 bytes

                                                                                                              position, limit = 12                       capacity = 16
__1_ __1_ __1_ __1_ __1_ __1_ __1_ __1___1_ __1_ __1_ __1_ down arrow___ ___ ___ ___down arrow

And finally we clear up our buffer before using it further, clear(), this sets the position to 0 and the limit equal to the buffer capacity.

State: clear()

position =0                                                                                                                             limit, capacity = 16
down arrow____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____ ____down arrow

Accessor Methods

Get (ByteBuffer)
  1. byte get(); - returns single byte.
  2. ByteBuffer get( byte dest[] ); - reads a group of bytes into the array dest
  3. ByteBuffer get( byte dest[], int offset, int length ); - reads a group of bytes into the array dest
  4. byte get( int index ); - returns a byte of data from the position specified by index

The methods from 1-3 respect the buffer state variables whereas, method 4 does not. So the 4th method ignores the position and limit state variable of the buffer and does not change their values either. Method 4 is referred as an absolute method while other methods are relative. Methods 2 and 3 just return this object on which they were called which allows chaining of the methods when needed.

buffer.get(data).flip();

Put (ByteBuffer)

1. ByteBuffer put( byte b ); - puts one byte in the buffer

2. ByteBuffer put( byte src[] ); - puts an array of bytes in the buffer

3. ByteBuffer put( byte src[], int offset, int length ); - puts an array of bytes in the buffer

4. ByteBuffer put( ByteBuffer src ); - copies data from source buffer into this buffer

5. ByteBuffer put( int index, byte b ); - puts data byte into the position specified by index

Here the method 5 is absolute and all others are relative.

The methods discussed above are all related to ByteBuffer class. Other buffer types have equivalent get() and put() methods dealing with the corresponding primitive type they handle.

ByteBuffer class also has methods to get or put data of specific primitive type both in absolute and relative form.

ByteBuffer Quick Facts

  1. Buffer allocation automatically empties the ByteBuffer and resets the state variables.
  2. duplicate and slice methods perform shallow copy of the original ByteBuffer. So anything you do on the returned buffer will affect the original.

Other handy methods

Creating buffers: allocate() and wrap()

Buffers can be created by allocating space for it using method allocate() or by wrapping existing array into a buffer using method wrap().

ByteBuffer buffer = ByteBuffer.allocate(1024);

Allocates 1024 bytes of space for the object buffer.

You can also wrap an array of primitive type into a corresponding buffer.

Byte arr[] = new byte[1024];

Bytebuffer buffer = ByteBuffer.wrap(arr);

Both buffer and arr share the same memory space now.
Direct vs. in-direct ByteBuffer Allocations

Direct ByteBuffer space is allocated in the native OS memory, although java does not guarantee the success. Allocation of direct ByteBuffer in memory is costly but it provides faster I/O.

ByteBuffer byte_buff = ByteBuffer.allocateDirect (2000);

There is no allocateDirect method for other primitive buffer types but we can use ByteBuffer view buffers to read the data in other primitive type while still making use of ByteBuffer's allocateDirect underneath.

ByteBuffer byte_buff = ByteBuffer.allocateDirect (2000);
CharBuffer cbuf = buffer.asCharBuffer();

Slicing buffers: slice()

Creates a sub-buffer out of the original buffer it is called upon and both share the same memory space. Slicing a buffer creates a shallow copy.

ByteBuffer origBuffer = ByteBuffer.allocate(16);

origBuffer.position(4);

origBuffer.limit(12);

ByteBuffer slicedBuffer = origBuffer.slice();

Now if we add 4 to each value in the buffer the above buffer can be represented as

position = 0     position(slicedBuffer) = 4                                  limit(slicedBuffer) = 12                      capacity = 16
down arrow__1_ __1_ __1_ __1_ down arrow __5_ __5_ __5_ __5_ __5_ __5_ __5_ __5_ down arrow ____ ____ ____ ____down arrow

This feature allows data abstraction by helping you write functions to work with whole or a slice of buffer data.

Marking the buffer position: mark()

Marks the current position in the buffer such that any subsequent buffer reset() will bring the buffer position to the current mark position instead of setting it to 0.

Rewind Buffer: rewind()

Sets the buffer position to 0 and discards any mark settings

Creating read-only buffers: asReadOnlyBuffer()

ByteBuffer buffer = ByteBuffer.allocate(1024);
ByteBuffer readoonlyBuffer = buffer.asReadOnlyBuffer();

Buffer in Action

Copying data from input stream into buffer and writing the data from the buffer into output stream.


import java.io.*;
import java.nio.*;
import java.nio.channels.*;
public class BufferCopy
{
  public static void main(String[] args) throws IOException
  {
    FileInputStream inFile = new FileInputStream(args[0]);
    FileOutputStream outFile = new FileOutputStream(args[1]);
    FileChannel inChannel = inFile.getChannel();
    FileChannel outChannel = outFile.getChannel();

    ByteBuffer buffer = ByteBuffer.allocate(1024*1024);

    for (; inChannel.read(buffer) != -1; buffer.clear())
    {
      buffer.flip();
      while (buffer.hasRemaining())
       outChannel.write(buffer);
    }
    inChannel.close();
    outChannel.close();
  }
}

Converting ByteBuffer to CharBuffer

char[] data = "ByteToCharBuffer".toCharArray();
ByteBuffer bb = ByteBuffer.allocate(data.length * 2);
CharBuffer cb = bb.asCharBuffer();
cb.put(data);
while ((c = cb.getChar()) != 0)
System.out.print(c + " ");


Wrap a char array into a charBuffer

CharBuffer buffer = CharBuffer.allocate(8);
char[] myBuffer = new char[100];
CharBuffer cb = CharBuffer.wrap(myBuffer);


Converting between string and bytes

// Create the encoder and decoder
Charset charset = Charset.forName("ISO-8859-1");
CharsetDecoder decoder = charset.newDecoder();
CharsetEncoder encoder = charset.newEncoder();
try
{
// Convert string to bytes (ISO-LATIN-1) in ByteBuffer
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap("string"));

// Convert bytes from ByteBuffer into CharBuffer and then to a string.
CharBuffer cbuf = decoder.decode(bbuf);
String s = cbuf.toString();
}
catch (CharacterCodingException e) {
}

String and byte conversion using the direct allocation for ByteBuffer

// Create a direct ByteBuffer for channeling the data
ByteBuffer bytebuf = ByteBuffer.allocateDirect(1024);
// Create a non-direct character ByteBuffer
CharBuffer charbuf = CharBuffer.allocate(1024);
// Convert characters in charbuf to bytebuf
encoder.encode(charbuf, bytebuf, false);
// flip bytebuf before reading from it
bytebuf.flip();
// Convert bytes in bytebuf to charbuf
decoder.decode(bytebuf, charbuf, false);
// flip charbuf before reading from it
charbuf.flip();

Wednesday, March 11, 2009

Keeping search robots away!

At times, you would want few of your web pages to not be visible in the search engine result page (SERP). The reason being it is under construction; or it's a semi-private page, something you would like to share with smaller community; or any other. Here I have discussed ways to keep the search engine robots or crawlers from visiting the page or a link on a page supported by Robots Exclusion Protocol (REP).

To block the search engine robot from indexing a particular page, use the mate tag robots with the content value noindex.

<meta name="robots" content="noindex" />

Alternatively, if you'd like the web page to be indexed but suggest the search robot to not follow any of the links on the page, use the nofollow content value.

<meta name="robots" content="nofollow" />

More Content values for robots Meta tag -

Content Value

Description

Supported By

noindex

Do not index the web page

Google, Yahoo, Ask, MSN Live

index

Index the web page

 

nofollow

Do not follow/visit any link on the web page

Google, Yahoo, Ask, MSN Live

follow

Follow all the links on the web page

 

noarchive

Do not cache the web page

Google, Yahoo, Ask, MSN Live

nosnippet

Do not auto generate the description based on page content

Google

noodp

Do not overwrite the description or title tag content from Open Directory project [home page only]

Google, Yahoo, MSN Live

noydir

Do no overwrite the description or title tag content

Yahoo

You can also have combinations of content values (of course the combinations should make sense).

<meta name="robots" content="noindex, follow" />

Now if you are dealing with keeping search robots from visiting multiple web pages of your website, you can make use of robots.txt file which is placed in the top-level directory hierarchy of your web site.

Here is the syntax of the robots.txt file-

User-Agent: *
Disallow: /

In the above syntax, User-Agent identifies the search robot and * refers to all search robots. You can also specify the search robot name here to address a particular search engine robot. Refer to the User-Agent of major search engines.

To restrict certain directory of your website -

User-Agent: *
Disallow: /Songs

To restrict particular robot from visiting your web directory

User-Agent: Googlebot/2.1
Disallow: /Songs

If you are addressing multiple search robots in your robots.txt, make sure that the directive for specific User-agent is specified before.

#Disallow Google bot from visiting any webpage/ content under /Songs/private
User-Agent: Googlebot/2.1
Disallow: /Songs/private

#Disallow all other bots from visiting any web page/ content under /Songs
User-Agent: *
Disallow: /Songs

More Robots.txt directives –

Content Value

Description

Supported By

Disallow

Do not visit specified web page

ALL search robots

Allow

Allow visiting the particular web page
Eg, Disallow: /Songs
Allow: /Songs/Favs

Above statement will restrict the search robot from visiting all the directories under Songs other than Favs subfolder

Google, Yahoo, Ask, MSN Live

Sitemap

Location of your sitemap


Sitemap: http://yourwebsite.com/sitemap_location.xml

Location of sitemap index file can also be included here.

Google, Yahoo, Ask, MSN Live

Wild card (*/$)

Wildcard * - matches sequence of characters
Eg, Disallow: /Songs/*personal*
Wildcard $ - matches everything from the end of the URL
Eg, Disallow: /Songs/*.mp3$
Learn more on pattern matching

Google, Yahoo, MSN Live

Crawl-Delay

Specifies minimum delay between two successive requests made by search robot

Ask


robots.txt quick Tips


Q. To allow all search engine spiders to index all the files of your website

Your robots.txt file should like below
User-agent: *
Disallow:

Q. To disallow all spiders to index any file

Your robots.txt file should like below
User-agent: *
Disallow: /

Note: Slash '/' here is your root directory and by adding that in your Disallow statement you are restricting spiders from indexing all the files of your website.

For specific questions related to writing robots.txt for your website, please reach me at bhawnablog@gmail.com

Monday, March 9, 2009

Creating XML Site Map

Apart from the HTML site map on your web site, which can help the web site visitors and search engine robots in navigating through your web site, you can create a XML Sitemap. The XML Sitemaps are specifically for search engine robots and can be submitted to the particular search engine. A Sitemap lists all the links of your website that you would like to be visited by the search engine robots, specifically helps with dynamic pages, which search engine robots, will have no knowledge of otherwise.

Sitemap

<?xml version="1.0" encoding="UTF-8"?>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

Required

<url>

Required

<loc>http://www.example.com/index.html</loc>

Required

<lastmod>2005-01-01</lastmod>

Optional

W3C Datetime standard

<changefreq>monthly</changefreq>

Optional

[alwayshourlydailyweeklymonthlyyearlynever]

<priority>0.8</priority>

Optional

Default = 0.5

</url>

</urlset>


All the data in Sitemap must be entity-escaped, UTF-8 encoded. Sitemaps have an upper limit of 50,000URLs and 10MB size per Sitemap. Sample Sitemap.xml.

Location

Sitemap.xml file is usually located under the high-level directory of your website (http://www.yourwebsite.com/Sitemap.xml). This is not a requirement but highly recommended. The location of a Sitemap.xml decides the URLs it can contain in it. So if the Sitemap.xml is located under www.youwebsite.com/product/Sitemap.xml, the Sitemap.xml can only contain URLs for pages under http://www.yourwebsite.com/product/ which also means all the URLs in a Sitemap.xml must be for the same host. You also need to specify path to your Sitemap.xml in robots.txt.

Sitemapindex

Sitemapindex groups multiple Sitemap files together with a Sitemap element entry for each Sitemap file location on your website. There is an upper limit of 1,000 Sitemap per website. A Sitemapindex can only group Sitemap of the same website and as with Sitemap, all the data in Sitemapindex should entity escaped and UTF-8 encoded.

<?xml version="1.0" encoding="UTF-8"?>

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

Required

     <sitemap>

Required

    <loc>http://www.yourwebsite.com/sitemap1.xml</loc>

Required

       <lastmod>2009-01-31</lastmod>

Optional

W3C Datetime standard

</sitemap>

</sitemapindex>


Submitting Sitemap

1. Through robots.txt

Specify the location of your Sitemap.xml in robots.txt

Sitemap: http://yourwebsite.com/sitemap.xml

2. Thru Search Engine Submission Interface

Most search engine provide interface to submit Sitemap, some also provide tools to generate one for your website.

Google Sitemap Submission interface

Google Sitemap Generator

Yahoo Sitemap Submission interface

3. Via PING URL

<SearchEngineURL>/ping?sitemap=http%3A%2F%2Fwww.yourwebsite.com%2Fsitemap.xml

Google ping URL -www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.yourwebsite.com%2Fsitemap.xml

Ask ping URL - http://submissions.ask.com/ping?sitemap=http%3A%2F%2Fwww.yourwebsite.com%2Fsitemap.xml


Yahoo ping URL - http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=YahooDemo&url=http%3A%2F%2Fwww.yourwebsite.com%2Fsitemap.xml

Here the SearchEnginerURL is the URL of the search engine you would like to submit the Sitemap to. Once you receive the HTTP 200 response, you know that the search engine received your Sitemap (although it does not guarantee that your site is valid). The ping request can be issued from wget, curl or any other mechanism.


Other Formats of Sitemap

Although the other formats carry limited information about your website, sometimes they can come in handy for the Sitemap submission.

RSS /ATOM Feed – RSS feeds can also be submitted as
Sitemaps. The <link> in the feed is interpreted as the URL to the page and <pubDate> or <modified> field is interpreted as last
modified info by search engine robots.

Text File – A simple text file containing URL to your web pages per line can be submitted as Sitemap.

The text file must be UTF-8 encoded and must not have any comment lines. A text file can have 50,000 URLs and should be no larger than 10MB. The text file can be separated at Sitemap into several text files with list of URLs (less than 50,000) and each file can be submitted separately. The text file must be in the highest level directory of your website.

There are more formats of Sitemap which are accepted by search engines to satisfy different data formats, such as, video sitemap,
mobile sitemap, news sitemap, code search sitemap etc. Also not all search
engines support them. If interested in these sitemap
content, please refer Google Webmaster Help.


Compressing Sitemap

You can compress the Sitemap xml or text file and provide the link to the compressed file in your links or submissions and is accepted per Sitemap standard.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
          <loc>http://www.yourwebsite.com/sitemap1.xml.gz</loc>
          <lastmod>2009-01-31</lastmod>
      </sitemap>
</sitemapindex>

Sitemap Validation

Schema for validating sitemaps can be downloaded from:

Sitemap Schema:
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd

XML header for referring the xsd will change to-

<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9"
url="http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
...
</url>
</urlset>

Sitemap index Schema:http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd

XML header for referring the xsd will change to

<?xml version='1.0' encoding='UTF-8'?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9"
url="http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
...
</sitemap>
</sitemapindex>

For specific questions related to generating or writing Sitemap for your website, please reach me at bhawnablog@gmail.com

Sunday, March 8, 2009

Organic SEO Best Practices Checklist

The DOs

Description

TIPs

Title Tag

<title>keyword in the Title</Title>

  • < 60-65 characters including spaces.
  • First 3 words in any combination will lead to keyword phrase.

Image Tag

<img src="" alt="keyword in the alternate text"/>

  • Alt is another opportunity to add keywords
  • Add only image related text in Alt
  • Create short and meaningful alt text

Anchors

<a href="link to a related website">keyword</a>

  • Text based links
  • As long as it is a important keyword text and the link is relevant, anchor it
  • Avoid broken links
  • Use anchor text for linking to relevant dynamic content (crawlers will favor you)

Meta Tags

<meta name="keyword" content="related keyword list"/>

<meta name="description" content="short description of your website with few keywords"/>

  • <200 characters (description)
  • Do not repeat exact title in description
  • Avoid keyword repetition

Header Tags

<h1></h1>,

<h2></h2>,

<h3></h3>,

<h4></h4>

  • Most important <h1>------> less important <h4>
  • <h1> occurrence – 2 at most

URL

Parameter: http://www.yoursite.com/products.jsp?id=12356&category=7&type=42&size=6&batch=65

Depth: http://www.yoursite.com/products/category/batch/season/item

  • Max parameters: 2


  • Max depth: 4

Inbound Links

  • As many strong inbound links, the better
  • Request links, write articles with link to your site, PR, social network community, paid links

Visible Body Text

<body>body text</body>

  • Use <strong></strong> for relevant keywords
  • Use <em></em> for relevant keywords

Sitemap

XML based document at the root of your website.

http://www.yourwebsite.com/sitemap.xml

Learn more about writing Sitemap

  • < 50,000 URLs and 10MB size per site map
  • <1,000 site maps, per website, if multiple site maps used
  • Submit Site to the search engines

Navigation

  • Text based navigation on the left

Domain Name

  • If new website, try using keyword in domain name to specifically define the content of your website

Bread-crumb trail

On your web page, provides navigation depth info.

Home > kids > Toys > 4T – 5T

  • Use your website name instead of HOME

FAQs

FAQs of popular searches related to your product

  • 4-15 questions
  • 200-800 words for each FAQ.

Popular Search List

Maintain a list of popular /most frequently searched items/ keywords related to your product /website on very web page.

  • The search list should be relevant and should link to pages in your website.

Robots.txt

Learn more about robots.txt

  • Suggest crawlers on pages to crawl and pages to avoid
  • Helps in logging search engine visits

Company Address

Company Name, Street Address, City, State, Zip, Country, Phone#

  • Provides visibility in location search

Contact Statement

If you need more info, please contact ……

  • Instead,

    If you need more info about <yourproductname>, please contact …..


The DON'Ts

The Fix

Duplicate URLs

  • Use canonical link tag in the <head> section of all the duplicate pages to point to the original web page content:

    <link rel="canonical" href=http://yourwebsite.com/product.html"/>

Broken Links

  • Fix them!
  • Add nofollow keyword to the link suggesting the search crawler to not visit the link

Cookies

  • Un restrict cookies
  • Set some default content when cookie is unavailable.

Session Ids

  • Generate a guest user and allow to view the un restricted content

Frames

  • Provide alternative to framed web site using the <noframes> tag.
  • The <noframes> tag content should be exactly the same as frames site content
  • Add link to HOME, with attribute TARGET="_top"

302 redirect

  • Avoid, if possible
  • Use robot.txt to avoid crawling this link, if possible
  • Add > 15-sec delay before redirecting

Monday, March 2, 2009

Using FOR XML and OPEN XML in MS SQL Server

Generating XML from Tables

When working with XML files, importing and exporting data from and into XML file, we 'FOR XML' clause made available by MS SQL server can come in very handy. This clause make it possible to completely eliminate the need of conversion code to represent data in XML or read data from an XML file into database. Since we use XML to represent our data and interact with our modules, it makes a lot of sense to get familiar with this clause.

Let's start with creating the wordlist table which has the most frequent search terms and its language-

ID

Words

Frequency

Language

1

hello

87

English

2

bonjour

59

French

3

Bye

56

English

4

hola

44

Spanish

5

gibberish

5

NULL

Generate flat file from database tables

Often we need to create a delimited list of data and when the data is in the database-

SELECT words +','+convert(varchar, frequency)+','+ISNULL(language,'')+'|'
FROM wordList
ORDERBY frequency DESC
FOR XML PATH('')

Which will generate a list of pipe separated words which can further be used in a SQL query or be saved in a table to be used by another process.

hello,87,English|bonjour,59,French|Bye,56,English|hola,44,Spanish|gibberish,5,|

Generating XML from table records.

Now let's generate the following XML file from the above data.

<Term id="1">
<Word xml:language="English">hello</Word>
<frequency>87</frequency>
</Term >
<Term id="2">
<Word xml:language="French">bonjour</Word>
<frequency>59</frequency>
</Term>

Notice how the elements and attributes are specified

SELECT TOP 2 ID AS'@id', language 'Word/@xml:language', words 'Word', frequency 'frequency'
FROM wordList
ORDER BY frequency DESC
FOR XML PATH('Term')

Adding namespace and root element

WITH XMLNAMESPACES allows you to provide namespace in the XML generated by the FOR XML clause and ROOT keyword will allow us to add a root tag to the generated XML as below.

WITH XMLNAMESPACES( 'uri1'as ns1, DEFAULT'uri')
SELECT TOP 2 ID as'@id', language 'ns1:Word/@xml:language', words 'ns1:Word', frequency 'frequency'
FROM wordList
ORDER BY frequency DESC
FOR XML PATH('Term'), ROOT('WordFrequencyList')

<WordFrequencyList xmlns="uri" xmlns:ns1="uri1">
<Term id="1">
<ns1:Word xml:language="English">hello</ns1:Word>
<Frequency>87</Frequency>
</Term>
<Term id="2">
<ns1:Word xml:language="French">bonjour</ns1:Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>

Using XSINIL directive for null values

With XSINIL directive we can add all the null values to the elements with xsi:nil directive set to true.

SELECT ID as '@id', words 'Word', frequency 'Frequency',language
FROM wordList
WHERE ID ='5' --IGNORE: Added to get only one result in the output.
ORDER BY frequency DESC
FOR XML PATH('Term'), ROOT('WordFrequencyList'), ELEMENTS XSINIL

<WordFrequencyList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Term id="5">
<Word>gibberish</Word>
<Frequency>5</Frequency>
<language xsi:nil="true" />
</Term>
</WordFrequencyList>

Now that I have your attention on the power FOR XML clause provides while working with XML documents, let us understand the FOR XML clause in further detail

FOR XML clause has 4 modes-


RAW – With RAW mode each not NULL column value in the rowset is mapped to an XML attribute value of <row> element.

SELECT ID as
'@id', words 'Word', frequency 'Frequency',
language

FROM wordList
ORDER BY frequency DESC
FOR XML RAW

<row_x0040_id="1"Word="hello"Frequency="87"language="English" />
<row_x0040_id="2"Word="bonjour"Frequency="59"language="French" />
<row_x0040_id="3"Word="Bye"Frequency="56"language="English" />
<row_x0040_id="4"Word="hola"Frequency="44"language="Spanish" />
<row_x0040_id="5"Word="gibberish"Frequency="5" />


AUTO– With AUTO mode the query result forms simple nested XML hierarchy. It does not allow you to control the hierarchy. It uses heuristics in shaping the XML which leads to nested XML for nested queries.


SELECT ID as
'@id', words 'Word', frequency 'Frequency',language

FROM wordList

ORDER BY frequency DESC
FOR XML RAW

<wordList_x0040_id="1"Word="hello"Frequency="87"language="English" />
<wordList_x0040_id="2"Word="bonjour"Frequency="59"language="French" />
<wordList_x0040_id="3"Word="Bye"Frequency="56"language="English" />
<wordList_x0040_id="4"Word="hola"Frequency="44"language="Spanish" />
<wordList_x0040_id="5"Word="gibberish"Frequency="5" />

PATH – With PATH mode we can construct nested XML and can control the mix of elements and attributes. This mode considers column names (or aliases) as XPATH expressions. Examples for PATH mode is discussed in the first section of this document above.

EXPLICIT – With EXPLICIT mode the table must be in a specific format which defines the XML hierarchy. It expects the TAG and PARENT column to appear first. The EXPLICIT mode queries can get a bit cumbersome so why not stay with PATH mode which gives us enough flexibility to generate XML with nested hierarchy.

Generating Tables from XML documents

The OPENXML, a SQL keyword, generates a dynamic in-memory table representation of the XML document where the OPENXML can be used in SELECT and SELECT INTO statements. Stored procedure, sp_xml_prepareddocument, should be called before OPENXML, as this stored procedure parses the XML document, generates in-memory (DOM) representation of the document and provides OPENXML with the handle to generate the tabular representation from the XML Dom structure.

Generating Table from XML Document Process Flow


Do not forget to call sp_xml_removedocument stored procedure in the end to free the memory held by in-memory DOM structure.

The Process of converting XML document into table

To understand the generation of table structure from an XML document let us take the following XML document as our sample.
<WordFrequencyList>
<Term id="1">
<Word xml:language="English">hello</Word>
<Frequency>87</Frequency>
</Term>
<Term id="2">
<Word xml:language="French">bonjour</Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>
We can pass this XML document as an argument to our stored procedure or read it directly as a file. For the sake of understanding the concept we are discussing I will define the XML in our stored procedure itself.


DECLARE @hxmldoc int
DECLARE @xmldoc varchar(500)
SET @xmldoc ='
<WordFrequencyList>
<Term id="1">
<Word xml:language="English">hello</Word>
<Frequency>87</Frequency>
</Term>
<Term id="2">
<Word xml:language="French">bonjour</Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>'
--CALL to sp_xml_preparedocument for generating the in-memory DOM structure
EXEC sp_xml_preparedocument @hxmldoc OUTPUT, @xmldoc
--CALL to OPENXML in the SELECT INTO SQL query
SELECT*
INTO WordFreq
FROM OPENXML(@hxmldoc, '/WordFrequencyList/Term', 1)
WITH (
ID int '@id',
words nvarchar(50) 'Word',
frequency int 'Frequency',
language nvarchar(50) 'Word/@xml:language'
)
EXECsp_xml_removedocument @hxmldoc


Result-
IDWordsFrequencyLanguage
1hello87English
2bonjour59French

sp_xml_preparedocument takes two arguments, first is the handle to the XML in-memory (DOM) structure which is returned by the procedure once it is executed and second is the XML document itself. We pass this XML document handle to the OPEN XML call to generate table records.

XML Doc with Namespaces

To read XML docs with namespaces there is a third parameter in the sp_xml_preparedocument stored procedure which can be defined. Let us understand this in detail.


<WordFrequencyList xmlns="uri" xmlns:ns1="uri1">
<Term id="1">
<ns1:Word xml:language="English">hello</ns1:Word>
<Frequency>87</Frequency>
</Term>
<Termid="2">
<ns1:Word xml:language="French">bonjour</ns1:Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>

In the XML document above, the WordFrequencyList, Term,and Frequency elements belongs to a default namespace defined by uri and element Word belongs to namespace defined by uri1.


If we feed this XML to our OPENXML call above we would get an empty table. To address this issue due to namespace, we need to pass the namespace declaration to the sp_xml_preparedocument call.



DECLARE @hxmldoc int
DECLARE @xmldoc varchar(500)
DECLARE @xmlns varchar(500)
SET @xmldoc ='
<WordFrequencyList xmlns="uri" xmlns:ns1="uri1">
<Term id="1">
<ns1:Word xml:language="English">hello</ns1:Word>
<Frequency>87</Frequency>
</Term>
<Term id="2">
<ns1:Word xml:language="French">bonjour</ns1:Word>
<Frequency>59</Frequency>
</Term>
</WordFrequencyList>'
SET @xmlns ='<root xmlns:wf="uri" xmlns:wf1="uri1" />'
--CALL to sp_xml_preparedocument for generating the in-memory DOM structure
EXEC sp_xml_preparedocument @hxmldoc OUTPUT, @xmldoc, @xmlns
--CALL to OPENXML in the SELECT INTO SQL query
SELECT*
INTO WordFreq
FROM OPENXML (@hxmldoc,'/wf:WordFrequencyList/wf:Term',1)
WITH (
ID int '@id',
words nvarchar(50) 'wf1:Word',
frequency int 'wf:Frequency',
language nvarchar(50) 'wf1:Word/@xml:language'
)
EXEC sp_xml_removedocument @hxmldoc


Notice the definition of @xmlns variable. We have used prefix ws and ws1 as the prefix as the prefix you use does not have to match the one in the XML document. We then use these prefixes in the OPENXML call to refer to elements in the XML document.

The examples discussed above, pretty much covers most of the requirements of working with XML document conversion. Please refer to the MSDN in case you have specific requirement. Also you can always email me at bhawnablog@gmail.com with your specific questions and I will be more than happy to help you.