Monday, March 9, 2009

Creating XML Site Map

Apart from the HTML site map on your web site, which helps visitors and search engine robots navigate your site, you can create an XML Sitemap. XML Sitemaps are intended specifically for search engine robots and can be submitted to a particular search engine. A Sitemap lists all the links on your website that you would like search engine robots to visit. It is especially helpful for dynamic pages, which search engine robots would otherwise have no knowledge of.

Sitemap

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">      <!-- Required -->
    <url>                                                         <!-- Required -->
        <loc>http://www.example.com/index.html</loc>              <!-- Required -->
        <lastmod>2005-01-01</lastmod>                             <!-- Optional; W3C Datetime standard -->
        <changefreq>monthly</changefreq>                          <!-- Optional; one of always, hourly, daily, weekly, monthly, yearly, never -->
        <priority>0.8</priority>                                  <!-- Optional; default = 0.5 -->
    </url>
</urlset>


All the data in a Sitemap must be entity-escaped and UTF-8 encoded. Each Sitemap has an upper limit of 50,000 URLs and 10MB in size.
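As a rough illustration of the format and limits described above, here is a minimal Python sketch that builds a Sitemap from a list of entries. The function name build_sitemap and the dictionary layout of each entry are assumptions made for this example, not part of the Sitemap standard. It relies on the standard library's ElementTree to do the entity-escaping, and needs Python 3.8 or later for the xml_declaration argument.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000               # per-Sitemap URL limit quoted above
MAX_BYTES = 10 * 1024 * 1024    # per-Sitemap size limit quoted above


def build_sitemap(entries):
    """Return UTF-8 encoded Sitemap bytes for a list of entry dicts.

    Each entry needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional, mirroring the elements described above.
    (Hypothetical helper, written for illustration only.)
    """
    if len(entries) > MAX_URLS:
        raise ValueError("too many URLs for a single Sitemap")

    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                # ElementTree entity-escapes &, < and > in text on output.
                ET.SubElement(url, tag).text = str(entry[tag])

    data = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    if len(data) > MAX_BYTES:
        raise ValueError("Sitemap exceeds the size limit")
    return data
```

A dynamic URL such as http://www.example.com/page?a=1&b=2 is a typical case where escaping matters: the & is written out as &amp; in the generated file.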

Location

The Sitemap.xml file is usually located in the top-level directory of your website (http://www.yourwebsite.com/Sitemap.xml). This is not a requirement, but it is highly recommended, because the location of a Sitemap.xml determines which URLs it can contain. If the Sitemap.xml is located at http://www.yourwebsite.com/product/Sitemap.xml, it can only contain URLs for pages under http://www.yourwebsite.com/product/. It also follows that all the URLs in a Sitemap.xml must be for the same host. You should also specify the path to your Sitemap.xml in robots.txt.
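The location rule can be sketched as a small Python check. The function name url_allowed_in_sitemap is hypothetical, and comparing the scheme (http vs. https) as well as the host is an extra assumption of this sketch.

```python
from urllib.parse import urlsplit
import posixpath


def url_allowed_in_sitemap(sitemap_url, page_url):
    """Return True if page_url is on the same host and under the Sitemap's directory.

    Hypothetical helper; the scheme comparison is an assumption of this sketch.
    """
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # Directory that holds the Sitemap, always ending in "/".
    base_dir = posixpath.dirname(sitemap.path).rstrip("/") + "/"
    return page.path.startswith(base_dir)
```

With a Sitemap at http://www.yourwebsite.com/product/Sitemap.xml, a page under /product/ passes the check, while http://www.yourwebsite.com/index.html does not.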

Sitemapindex

A Sitemapindex groups multiple Sitemap files together, with one sitemap element entry for each Sitemap file location on your website. There is an upper limit of 1,000 Sitemaps per Sitemapindex file. A Sitemapindex can only group Sitemaps of the same website and, as with a Sitemap, all the data in a Sitemapindex must be entity-escaped and UTF-8 encoded.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">    <!-- Required -->
    <sitemap>                                                         <!-- Required -->
        <loc>http://www.yourwebsite.com/sitemap1.xml</loc>            <!-- Required -->
        <lastmod>2009-01-31</lastmod>                                 <!-- Optional; W3C Datetime standard -->
    </sitemap>
</sitemapindex>
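A Sitemapindex like the one above could be generated along these lines. This is a minimal sketch: the function name build_sitemap_index and the (loc, lastmod) pair layout are assumptions for illustration, and the xml_declaration argument needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps):
    """Return UTF-8 encoded Sitemapindex bytes.

    `sitemaps` is a list of (loc, lastmod) pairs; lastmod may be None.
    (Hypothetical helper, written for illustration only.)
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod is not None:
            # lastmod is optional, so it is only written when supplied.
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)
```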


Submitting Sitemap

1. Through robots.txt

Specify the location of your Sitemap.xml in robots.txt

Sitemap: http://yourwebsite.com/sitemap.xml

2. Through Search Engine Submission Interface

Most search engines provide an interface to submit a Sitemap; some also provide tools to generate one for your website.

Google Sitemap Submission interface

Google Sitemap Generator

Yahoo Sitemap Submission interface

3. Via PING URL

<SearchEngineURL>/ping?sitemap=http%3A%2F%2Fwww.yourwebsite.com%2Fsitemap.xml

Google ping URL - www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.yourwebsite.com%2Fsitemap.xml

Ask ping URL - http://submissions.ask.com/ping?sitemap=http%3A%2F%2Fwww.yourwebsite.com%2Fsitemap.xml


Yahoo ping URL - http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=YahooDemo&url=http%3A%2F%2Fwww.yourwebsite.com%2Fsitemap.xml

Here the SearchEngineURL is the URL of the search engine you would like to submit the Sitemap to. Once you receive an HTTP 200 response, you know that the search engine received your Sitemap (although it does not guarantee that your Sitemap is valid). The ping request can be issued with wget, curl or any other HTTP client.
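The percent-encoded form of the Sitemap location used in the ping URLs can be produced with Python's standard library. The sketch below only builds the URL; the function name build_ping_url is hypothetical, and sending the request is left out on purpose.

```python
from urllib.parse import quote


def build_ping_url(ping_endpoint, sitemap_url):
    """Return a ping URL with the Sitemap location percent-encoded.

    Hypothetical helper; ping_endpoint is the search engine's ping address.
    """
    # safe="" also encodes ":" and "/", giving http%3A%2F%2F... as shown above.
    return f"{ping_endpoint}?sitemap={quote(sitemap_url, safe='')}"
```

For example, build_ping_url("http://submissions.ask.com/ping", "http://www.yourwebsite.com/sitemap.xml") reproduces the Ask ping URL listed above.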


Other Formats of Sitemap

Although the other formats carry limited information about your website, sometimes they can come in handy for the Sitemap submission.

RSS/Atom Feed – RSS and Atom feeds can also be submitted as Sitemaps. The <link> in the feed is interpreted as the URL of the page, and the <pubDate> or <modified> field is interpreted as the last-modified information by search engine robots.

Text File – A simple text file containing one URL per line can be submitted as a Sitemap.

The text file must be UTF-8 encoded and must not contain any comment lines. A text file can hold up to 50,000 URLs and should be no larger than 10MB. A longer list can be split into several text files (each with no more than 50,000 URLs), and each file can be submitted separately. The text file must be in the highest-level directory of your website.
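Splitting a long URL list into several text Sitemaps might look like the following Python sketch. The function name split_url_list is hypothetical, and the sketch only enforces the URL count, not the 10MB size limit.

```python
MAX_URLS_PER_FILE = 50_000  # limit quoted above


def split_url_list(urls, chunk_size=MAX_URLS_PER_FILE):
    """Return text-Sitemap bodies as UTF-8 bytes, one URL per line.

    Each body holds at most chunk_size URLs. (Hypothetical helper.)
    """
    bodies = []
    for start in range(0, len(urls), chunk_size):
        chunk = urls[start:start + chunk_size]
        # One URL per line, no comment lines, UTF-8 encoded.
        bodies.append(("\n".join(chunk) + "\n").encode("utf-8"))
    return bodies
```

Each returned body can then be written to its own file and submitted separately.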

Search engines also accept other Sitemap formats for specific kinds of content, such as video, mobile, news and code search Sitemaps. Not all search engines support them. If you are interested in these Sitemap types, please refer to Google Webmaster Help.


Compressing Sitemap

You can compress the Sitemap XML or text file with gzip and reference the compressed file in your Sitemapindex or submissions. Compressed files are accepted per the Sitemap standard.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
          <loc>http://www.yourwebsite.com/sitemap1.xml.gz</loc>
          <lastmod>2009-01-31</lastmod>
      </sitemap>
</sitemapindex>
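Producing the .gz file referenced above takes only a few lines with Python's gzip module. This is a minimal in-memory sketch; the function name compress_sitemap is hypothetical.

```python
import gzip


def compress_sitemap(data):
    """Return gzip-compressed bytes for an already UTF-8 encoded Sitemap.

    Hypothetical helper; `data` is the raw Sitemap XML or text file content.
    """
    return gzip.compress(data)
```

The result can be written to a file such as sitemap1.xml.gz and listed in the Sitemapindex in place of the uncompressed file.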

Sitemap Validation

Schema for validating sitemaps can be downloaded from:

Sitemap Schema:
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd

To reference the XSD, the XML header changes to:

<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
...
</url>
</urlset>

Sitemap index Schema:
http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd

To reference the XSD, the XML header changes to:

<?xml version='1.0' encoding='UTF-8'?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
...
</sitemap>
</sitemapindex>
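Full validation against the XSD needs a schema-aware XML library, which Python's standard library does not include. As a lighter first pass, the sketch below checks only that a file is well-formed, uses the Sitemap namespace, and has a non-empty <loc> in every entry. It is not a substitute for XSD validation, and the function name basic_sitemap_problems is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def basic_sitemap_problems(data):
    """Return a list of problems found by a shallow structural check.

    Hypothetical helper; an empty list means no problems were found by
    this check, not that the file is valid against the XSD.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    # Expected child element for each allowed root element.
    children = {"urlset": "url", "sitemapindex": "sitemap"}
    prefix = "{" + SITEMAP_NS + "}"
    if not root.tag.startswith(prefix) or root.tag[len(prefix):] not in children:
        return [f"unexpected root element: {root.tag}"]

    problems = []
    child_tag = prefix + children[root.tag[len(prefix):]]
    for number, child in enumerate(root, start=1):
        if child.tag != child_tag:
            problems.append(f"entry {number}: unexpected element {child.tag}")
        elif not (child.findtext(prefix + "loc") or "").strip():
            problems.append(f"entry {number}: missing <loc>")
    return problems
```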

For specific questions related to generating or writing Sitemap for your website, please reach me at bhawnablog@gmail.com
