How to create and use the robots.txt file & robots meta tags -
Robots Text Configuration -
The robots.txt file is a means of keeping search engine crawlers out of a site, or out of specific sections of a site. This is useful if there are sections of the site one would rather not have indexed (though secure password protection is much more reliable). Often, as a site is being redesigned, developers will put a robots.txt file in place to keep engines out during development. The problem is that this is sometimes forgotten and not removed when the site goes live. If a site is not getting indexed, check for a robots.txt file and make sure the engines are being allowed in. A site without a robots.txt file will, by default, be crawled and indexed by engines once they discover it.
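The check described above can be scripted. The sketch below is a minimal example, assuming Python 3 and its standard urllib.robotparser module; the function name blocked_pages and the sample page list are hypothetical. It parses the text of a robots.txt file and reports which of your key pages a crawler would be kept out of, with an empty string standing in for a site that has no robots.txt at all.

```python
from urllib.robotparser import RobotFileParser


def blocked_pages(robots_txt, pages, agent="*"):
    """Return the pages that `agent` is not allowed to crawl.

    `robots_txt` is the text of the file; pass "" to model a site with
    no robots.txt, which leaves every page crawlable.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [page for page in pages if not parser.can_fetch(agent, page)]


# Hypothetical key pages to test against.
key_pages = ["/", "/products.html", "/contact.html"]

print(blocked_pages("", key_pages))  # []

# A development-time file that was never removed at launch.
leftover = "User-agent: *\nDisallow: /\n"
print(blocked_pages(leftover, key_pages))  # ['/', '/products.html', '/contact.html']
```

To check a live site instead, create RobotFileParser("https://www.example.com/robots.txt") (example.com is a placeholder) and call its read() method to fetch the real file. The standard library treats a 404 response there as "allow everything", which matches the default described above.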
Using robots tags on individual pages -
A variation on the robots.txt file is the meta robots tag, whose parameters are set uniquely on each page.
A basic meta robots tag is placed within the <head> section of the page's source code and reads:
<meta name="Robots" content="NOINDEX, NOFOLLOW" />
In this instance it tells all search engines not to index the content of this page, nor even to follow the links on it.
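To confirm which directives a page actually carries, you can read its meta robots tags programmatically. This is a minimal sketch assuming Python 3's built-in html.parser module; the class RobotsMetaParser and the function robots_directives are hypothetical names. It collects the comma-separated values from any <meta name="robots"> tag, matching the name case-insensitively so that "Robots" is found too.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser lower-cases attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html):
    """Return the set of lower-cased robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="Robots" content="NOINDEX, NOFOLLOW" /></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```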
Should there be particular pages one would NOT want indexed by the engines, it is safest to put the meta robots tag on the page itself as well. However, where the robots.txt file on the server and the meta robots tag clash, the robots.txt file wins: a crawler that obeys a Disallow rule never fetches the page, so it never reads the tag on it.
Using sitewide robots.txt files -
As with .htaccess files, a plain-text editor such as Notepad is usually the best bet for creating the file.
For help writing a specific robots.txt file, you can also consult Microsoft's guidelines at http://support.microsoft.com/default.aspx?scid=KB;en-us;q217103
Once the file is created, upload it to the root of your server (usually the same folder as your homepage).
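Crawlers only look for the file at the root of each host, so a robots.txt placed in a subfolder is simply ignored, and a subdomain such as shop.example.com needs its own. The small sketch below, assuming Python 3's urllib.parse module (the function name robots_txt_url and the example.com URLs are hypothetical), works out where the file governing any given page has to live.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs `page_url`.

    Only the scheme and host (including any port) are kept; the page's
    own path, query and fragment are replaced by /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/post.html"))
# https://www.example.com/robots.txt
print(robots_txt_url("http://shop.example.com:8080/cart?id=7"))
# http://shop.example.com:8080/robots.txt
```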
For other site examples -
Many sites currently publish theirs at nameofsite.com/robots.txt; even Google does, at google.com/robots.txt. Its file lists a lot of folders that Google does not want its own (or anyone else's) engine indexing. The first line reads User-agent: *, which tells all search engine spiders (using * as the "user agent" means "all") to obey the allow and disallow rules for the pages in the listed folders, regardless of what the pages themselves may contain in their code.
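The effect of the User-agent line can be tried out with Python's standard urllib.robotparser module. The sketch below is illustrative only: the folder names are hypothetical, and it assumes a file with one group for all spiders plus a second group naming Googlebot. Under the robots.txt convention, a spider that finds a group naming it follows that group instead of the * group, not in addition to it, which is how individual engines can be singled out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for all spiders, one just for Googlebot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /privatefolder/

User-agent: Googlebot
Disallow: /nogoogle/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "Bingbot"):
    for path in ("/privatefolder/a.html", "/nogoogle/b.html"):
        print(agent, path, parser.can_fetch(agent, path))

# Googlebot /privatefolder/a.html True
# Googlebot /nogoogle/b.html False
# Bingbot /privatefolder/a.html False
# Bingbot /nogoogle/b.html True
```

Note the first result: because Googlebot has a group of its own, the /privatefolder/ rule in the * group no longer applies to it. Repeat any shared rules inside a named group if you want them to hold there too.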
The safest way to block content is to set up the robots.txt file before the page goes live, because once a page has been indexed it may linger in data centres even after you block it. Ideally, include a line for each page you want a rule to apply to.
Example:
User-agent: *
Disallow: /iwanttoblockthispage.html
Disallow: /privatefolder/
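Writing one line per page can be automated. The following is a minimal sketch, assuming Python 3; build_robots_txt is a hypothetical helper, and the paths are the ones from the example above. It writes the file's text and then reads it back with the standard urllib.robotparser module to confirm the rules block what was intended.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_paths, agent="*"):
    """Write one Disallow line per page or folder under a single User-agent."""
    lines = [f"User-agent: {agent}"]
    lines += [f"Disallow: {path}" for path in blocked_paths]
    return "\n".join(lines) + "\n"


text = build_robots_txt(["/iwanttoblockthispage.html", "/privatefolder/"])
print(text, end="")

# Read the generated rules back to confirm what they block.
parser = RobotFileParser()
parser.parse(text.splitlines())
print(parser.can_fetch("*", "/privatefolder/report.html"))  # False
print(parser.can_fetch("*", "/index.html"))                 # True
```

Rules match as prefixes of the URL path, so /privatefolder/ blocks everything inside that folder, while a rule written without the trailing slash (/privatefolder) would also catch a page such as /privatefolder-old.html.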
However, be careful not to put -
User-agent: *
Disallow: /
This snippet tells all robots not to crawl any part of the site.
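Because this mistake usually arrives with a development-time file that was never removed, it is worth checking for it before launch. The sketch below is one hypothetical way to do that in Python 3 (blocks_whole_site is an illustrative name, not part of any tool): it scans the file's text line by line and flags a bare Disallow: / inside a group that applies to all user agents, while still permitting Disallow: / for a single named robot.

```python
def blocks_whole_site(robots_txt):
    """Return True if a group for User-agent: * contains a bare Disallow: /."""
    in_star_group = False
    previous_was_agent = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share a single group of rules.
            in_star_group = (in_star_group and previous_was_agent) or value == "*"
            previous_was_agent = True
        else:
            previous_was_agent = False
            if field == "disallow" and value == "/" and in_star_group:
                return True
    return False


print(blocks_whole_site("User-agent: *\nDisallow: /\n"))          # True
print(blocks_whole_site("User-agent: *\nDisallow: /private/\n"))  # False
print(blocks_whole_site("User-agent: BadBot\nDisallow: /\n"))     # False
```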
Lastly, you can set the robots.txt file to block certain engines, or even certain file types. For instance, you may not want engines to index any of your site's PowerPoint presentations:
User-agent: *
Disallow: /*.ppt$ # disallow access to PowerPoint presentations
The same pattern works for other common file formats too.
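The * and $ characters in that rule are pattern-matching extensions: * stands for any run of characters, and a trailing $ marks the end of the URL. They are supported by major engines such as Google and Bing and are described in RFC 9309, but Python's standard urllib.robotparser appears to treat them as literal characters, so the sketch below hand-rolls the matching. It is a simplified, hypothetical helper (is_blocked is not a library function) that ignores percent-encoding and the rule-precedence details a real crawler applies.

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path rule using * and $ into a regex.

    `*` matches any run of characters and a trailing `$` anchors the end
    of the URL path; everything else is matched literally.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    pattern = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(rule_path, url_path):
    # re.match anchors at the start only, giving robots.txt prefix semantics.
    return rule_to_regex(rule_path).match(url_path) is not None


print(is_blocked("/*.ppt$", "/docs/launch.ppt"))   # True
print(is_blocked("/*.ppt$", "/docs/launch.pptx"))  # False: "$" stops at .ppt
print(is_blocked("/*.ppt$", "/docs/launch.html"))  # False
```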
Find out about this and other ways to make your site index well on search engines through the services we provide at Evolution Search Marketing. Contact us with any questions you may have.