Posted on Wednesday, 1st September 2010 by FTS
Search engine robots and how they actually work
Records in the /robots.txt file
Extended notes on the format
Each record begins with a User-Agent line, which names the search robot or robots the record is intended for. The next line is Disallow, which lists the paths and files that must not be indexed. EACH record SHOULD have at least these two lines. All other lines are optional. A record can contain any number of comment lines. Each comment line must begin with the # character. Comments can also be placed at the end of User-Agent and Disallow lines. A # character at the end of these lines is sometimes added to show the search robot that a long agent_id or path_root line has ended. If several agent_id values are given in a User-Agent line, the path_root condition in the Disallow line applies to all of them equally. There are no restrictions on the length of User-Agent and Disallow lines. If a search robot does not find its agent_id in the /robots.txt file, it ignores /robots.txt.
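The record layout described above can be sketched as a small parser. This is only an illustration, not the code of any real robot: the function name parse_records is made up here, and it assumes that several agent_id or path_root values on one line are separated by whitespace, that field names are matched case-insensitively, and that everything from a # character to the end of a line is a comment.

```python
def parse_records(text):
    """Split robots.txt text into records of (agent_ids, path_roots).

    Assumptions (not confirmed by the article): values on a line are
    whitespace-separated, and field names are case-insensitive.
    """
    records = []
    agents, paths = None, None
    for raw in text.splitlines():
        # Everything from '#' to the end of the line is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        values = value.split()
        if field == "user-agent":
            # A User-Agent line starts a new record.
            if agents is not None:
                records.append((agents, paths))
            agents, paths = values, []
        elif field == "disallow" and agents is not None:
            # An empty Disallow line adds no path_root at all.
            paths.extend(values)
    if agents is not None:
        records.append((agents, paths))
    return records


sample = (
    "User-Agent: *\n"
    "Disallow: /\n"
    "User-Agent: Lycos\n"
    "Disallow: /cgi-bin/ /tmp/ # scripts and temporary files\n"
)
records = parse_records(sample)
```

Here records holds two entries: one for * with the single path_root /, and one for Lycos with /cgi-bin/ and /tmp/.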
If the particular behavior of each search robot does not need to be taken into account, exclusions can be specified for all robots at once. This is done with the line User-Agent: *
If a search robot finds several records in the /robots.txt file whose agent_id value matches it, the robot is free to choose any of them.
Each search robot builds the absolute URL to read from the server using the records in /robots.txt. Upper and lower case characters in path_root MATTER.
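One way a robot might apply these rules is sketched below. The names select_record and is_allowed and the sample data are hypothetical. Since a robot is free to choose among matching records, this sketch makes one possible choice: it prefers a record that names the robot explicitly and falls back to the * record. Comparing agent_id values exactly is also an assumption; only path_root is stated to be case-sensitive.

```python
def select_record(records, agent_id):
    """Pick one matching record: a named one first, then the '*' record."""
    named = [r for r in records if agent_id in r[0]]
    if named:
        return named[0]
    wildcard = [r for r in records if "*" in r[0]]
    return wildcard[0] if wildcard else None


def is_allowed(records, agent_id, path):
    """Return True if the robot may index the given path."""
    record = select_record(records, agent_id)
    if record is None:
        # The robot did not find its agent_id: it ignores /robots.txt.
        return True
    # path_root is compared as a case-sensitive prefix of the path.
    return not any(path.startswith(root) for root in record[1])


# Records as (agent_ids, path_roots) pairs, written by hand for illustration.
records = [
    (["*"], ["/"]),
    (["Lycos"], ["/cgi-bin/", "/tmp/"]),
]

lycos_home = is_allowed(records, "Lycos", "/index.html")
lycos_cgi = is_allowed(records, "Lycos", "/cgi-bin/search")
lycos_upper = is_allowed(records, "Lycos", "/CGI-BIN/search")
other_home = is_allowed(records, "OtherBot", "/index.html")
```

With this data, Lycos may read /index.html but not /cgi-bin/search, while /CGI-BIN/search is not blocked because case matters. A robot with no record of its own falls under the * record and is blocked everywhere.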
Examples:
Example 1:
User-Agent: *
Disallow: /
User-Agent: Lycos
Disallow: /cgi-bin/ /tmp/
In Example 1, the /robots.txt file contains two records. The first applies to all search robots and forbids indexing of all files. The second applies to the Lycos search robot and, when it indexes the server, forbids the directories /cgi-bin/ and /tmp/ and allows everything else. As a result, the server will be indexed only by Lycos.
Example 2:
User-Agent: Copernicus Fred
Disallow:
User-Agent: * Rex
Disallow: /t
In Example 2, the /robots.txt file contains two records. The first allows the search robots Copernicus and Fred to index the entire server. The second forbids all robots, and Rex in particular, from indexing directories and files such as /tmp/, /tea-time/, /top-cat.txt, /traverse.this and so on. This is exactly a case of specifying a mask for directories and files.
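The mask behavior in Example 2 can be illustrated with a plain prefix check. This is a minimal sketch: the helper name blocked_by and the two extra paths /index.html and /Tmp/ are added here for contrast and do not come from the example.

```python
def blocked_by(path_root, paths):
    """Return the paths that start with path_root (case-sensitive)."""
    return [p for p in paths if p.startswith(path_root)]


candidates = [
    "/tmp/",
    "/tea-time/",
    "/top-cat.txt",
    "/traverse.this",
    "/index.html",  # does not start with /t
    "/Tmp/",        # upper-case T, so it does not match /t
]
blocked = blocked_by("/t", candidates)
```

Only the first four paths end up in blocked, because each of them begins with the two characters /t.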
Example 3:
# This is for every spider!
User-Agent: *
# stay away from this
Disallow: /spiders/not/here/ #and everything in it
Disallow: # a little nothing
Disallow: #This could be habit forming!
# Don't comments make code much more readable!!!
Example 3 contains one record. Here all robots are forbidden from indexing the directory /spiders/not/here/, including paths and files such as /spiders/not/here/really/ and /spiders/not/here/yes/even/me.html. However, this does not include /spiders/not/ or /spiders/not/her (in the directory /spiders/not/).
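How the comments in Example 3 drop out can be sketched as follows. The helper names strip_comment and disallowed_roots are hypothetical, and the sketch assumes that a # character always starts a comment, even when it directly follows a path.

```python
def strip_comment(line):
    """Remove everything from the first '#' to the end of the line."""
    return line.split("#", 1)[0].strip()


def disallowed_roots(lines):
    """Collect path_root values from Disallow lines, ignoring comments."""
    roots = []
    for raw in lines:
        line = strip_comment(raw)
        if line.lower().startswith("disallow:"):
            # An empty Disallow line contributes nothing.
            roots.extend(line.split(":", 1)[1].split())
    return roots


example_3 = [
    "# This is for every spider!",
    "User-Agent: *",
    "# stay away from this",
    "Disallow: /spiders/not/here/ #and everything in it",
    "Disallow: # a little nothing",
    "Disallow: #This could be habit forming!",
]
roots = disallowed_roots(example_3)

inside = "/spiders/not/here/yes/even/me.html".startswith(roots[0])
outside = "/spiders/not/her".startswith(roots[0])
```

After the comments are removed, the only path_root left is /spiders/not/here/, so inside is True and outside is False.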



