Introduction to Robots.txt

Robots.txt is a plain text file that tells search engine crawlers and other robots which parts of your site they may crawl.

You are probably aware of search engine crawlers, such as those run by Google, Yahoo and MSN, which try to index the pages of your website. These crawlers (or robots) are useful: they get your pages listed in search results. There are also hundreds of other robots, such as spam bots that collect email addresses from your site, and these unwanted robots eat up your site's bandwidth. As a site owner, you need to decide which robots are allowed to crawl your site and what they are allowed to do.
"robots.txt" is a regular text file in which you add a set of rules instructing robots not to crawl certain files and directories within your site. Keep in mind that following these rules is voluntary: reputable search engine crawlers honor them, but spam bots usually ignore robots.txt altogether.
 
Creating your "robots.txt" file

 

Create a regular text file named exactly "robots.txt" (all lowercase). Upload it to the root directory of your site, so that it is reachable at http://www.yoursite.com/robots.txt and NOT at http://www.yoursite.com/inner-directory/robots.txt. Crawlers only look for the file at the root.
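The root-directory rule is what lets crawlers find the file: whatever page they are visiting, they keep only the scheme and host and request /robots.txt from there. A minimal Python sketch of that lookup, using the standard library's urllib.parse.urljoin; the function name robots_url is hypothetical, and yoursite.com is the placeholder domain from the text.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the URL where a crawler looks for robots.txt for page_url.

    robots_url is a hypothetical helper name. Because "/robots.txt"
    starts with "/", urljoin discards the page's path, so the result
    always points at the site root, never at an inner directory.
    """
    return urljoin(page_url, "/robots.txt")


# A page inside an inner directory still maps to the root robots.txt.
print(robots_url("http://www.yoursite.com/inner-directory/page.html"))
```

This is also why a robots.txt uploaded to /inner-directory/ has no effect: no crawler ever requests it.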

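Now that you know how to create the robots.txt file and where to upload it, let's learn its format, i.e. the exact text that goes inside it. Each group of rules begins with a User-agent line naming the robot it applies to (* means every robot), followed by one or more Disallow (or Allow) lines, each giving a path; "Disallow: /" covers the whole site.

Examples of robots.txt: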

Example: 1

Block all robots from your entire website. You will rarely want this, but it illustrates the basic format.

User-agent: *
Disallow: /
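One way to see how a parser reads these two lines is Python's standard urllib.robotparser module. The sketch below assumes Python 3; the robot name "SomeOtherBot" and the page URLs are hypothetical. Note that urllib.robotparser applies the classic rules (prefix matching, first matching rule wins), which may differ in details from how a particular search engine evaluates a file.

```python
from urllib.robotparser import RobotFileParser

# The Example 1 rules, exactly as they would appear in robots.txt.
RULES = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse() accepts an iterable of lines

# Every robot is refused every path, including the home page,
# so each printed line ends in False.
for agent in ("Googlebot", "SomeOtherBot"):  # "SomeOtherBot" is hypothetical
    for url in ("http://www.yoursite.com/", "http://www.yoursite.com/about.html"):
        print(agent, url, parser.can_fetch(agent, url))
```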

Example: 2

You may not want Google's image crawler (Googlebot-Image) crawling your site's images and making them searchable online. To block it, add the following declaration to your robots.txt file:

User-agent: Googlebot-Image
Disallow: /
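To confirm that this group blocks only the image crawler, the rules can be fed to Python's standard urllib.robotparser. This is a sketch assuming Python 3; the image URL is hypothetical, and the parser's matching follows the classic robots.txt rules rather than any search engine's exact implementation.

```python
from urllib.robotparser import RobotFileParser

# Example 2: block only Google's image crawler.
RULES = """\
User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

image_url = "http://www.yoursite.com/images/photo.jpg"  # hypothetical URL

# The named robot is blocked everywhere.
print(parser.can_fetch("Googlebot-Image", image_url))  # False
# A robot with no matching group (and no "*" group exists) may crawl.
print(parser.can_fetch("Googlebot", image_url))  # True
```

Because the file has no "User-agent: *" group, every robot other than Googlebot-Image is unaffected.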

Example: 3

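To stop all search engines and robots from crawling selected directories or pages, use the following declaration. Each Disallow value is matched as a path prefix, so "/cgi-bin/" covers everything inside that directory.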

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /admin/test.html
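The prefix matching can be checked with Python's standard urllib.robotparser. A minimal sketch, assuming Python 3; the robot name "AnyBot" and the test URLs are hypothetical examples on the placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# Example 3: keep every robot out of selected directories and one page.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /admin/test.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical URLs paired with the result we expect from the rules above.
checks = [
    ("http://www.yoursite.com/cgi-bin/form.cgi", False),  # inside /cgi-bin/
    ("http://www.yoursite.com/images/logo.png", False),   # inside /images/
    ("http://www.yoursite.com/admin/test.html", False),   # the listed page
    ("http://www.yoursite.com/admin/index.html", True),   # same folder, not listed
    ("http://www.yoursite.com/index.html", True),         # everything else
]
for url, expected in checks:
    print(url, parser.can_fetch("AnyBot", url), "expected:", expected)
```

Only /admin/test.html is blocked in the /admin/ folder; to block the whole folder you would list "/admin/" instead.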

Example: 4

Targeting multiple robots

User-agent: *
Disallow: /
User-agent: Googlebot
Disallow: /cgi-bin/

In the example above, we declare that robots in general should not crawl any part of our site. We then allow Googlebot to crawl the entire site apart from the /cgi-bin/ directory: a robot follows the group that names it specifically and ignores the catch-all * group.
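Python's standard urllib.robotparser resolves groups the same way here: it checks groups that name the robot before falling back to the "*" group. A sketch, assuming Python 3; "OtherBot" and the URLs are hypothetical, and the parser's simple prefix matching may differ in details from Google's own evaluation.

```python
from urllib.robotparser import RobotFileParser

# Example 4: block everyone, then give Googlebot its own, looser group.
RULES = """\
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

page = "http://www.yoursite.com/index.html"        # hypothetical URL
script = "http://www.yoursite.com/cgi-bin/form.cgi"  # hypothetical URL

print(parser.can_fetch("Googlebot", page))    # True: only /cgi-bin/ is off limits
print(parser.can_fetch("Googlebot", script))  # False
print(parser.can_fetch("OtherBot", page))     # False: falls back to the * group
```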

Example: 5

According to Google's FAQs for webmasters, the following is the preferred way to block all crawlers from your site EXCEPT Google:

User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
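The explicit "Allow: /" line can be checked with Python's standard urllib.robotparser, which supports Allow lines. A minimal sketch, assuming Python 3; "OtherBot" and the page URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example 5: block every robot except Googlebot, using an explicit Allow line.
RULES = """\
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "http://www.yoursite.com/products/list.html"  # hypothetical URL
print(parser.can_fetch("Googlebot", url))  # True
print(parser.can_fetch("OtherBot", url))   # False
```

urllib.robotparser also treats an empty "Disallow:" line as "allow everything", which is how the original robots exclusion standard defines it, so that older spelling gives the same result in this parser.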

 
© LinkExchange4seo.com. All rights reserved.