CurvedSpace Forums: Googlebot Learnred - CurvedSpace Forums

Googlebot Learnred

#1   Superbuu91

  • Advanced Member
  • Group: Member
  • Posts: 496
  • Joined: 30-November 03

Posted 19 January 2004 - 02:22 AM

QUOTE
Googlebot: Google's Web Crawler
Googlebot is Google's web-crawling robot. It collects documents from the web to build a searchable index for the Google search engine. On this page, you'll find answers to the most commonly asked questions about how our web crawler works.

As one of the premier search sites on the web, Google focuses on providing the highest-quality search results for our own users and for corporate partners such as Yahoo!, Netscape, and the Washington Post. You can learn more about Google via our website.

   

Frequently Asked Questions
How often will Googlebot access my web pages?
How do I request that Google not crawl parts or all of my site?
Why is Googlebot asking for a file called robots.txt which isn't on my server?
Why is Googlebot trying to download incorrect links from my server? Or from a server that doesn't exist?
Why is Googlebot downloading information from our "secret" web server?
Why isn't Googlebot obeying my robots.txt file?
How do I register my site with Googlebot so it will be indexed?
Why are there hits from multiple machines at Google.com all with user-agent Googlebot?
How can I prevent Googlebot from following links from a particular page or archiving a copy of a page?
Why is Googlebot downloading the same page on my site multiple times?
Why don't the pages that Googlebot crawled on my site show up in your index?
What kinds of links does Googlebot follow?
My Googlebot question is not answered here. Where do I send my question?
Answers
How often will Googlebot access my web pages?

For most sites, Googlebot should not access your site more than once every few seconds on average. Since network delays are involved, it is possible that over short periods the rate will appear to be slightly higher. If you find that we are placing too high a load on your site, please let us know by sending us e-mail at googlebot@google.com.

How do I request that Google not crawl parts or all of my site?

robots.txt is a standard document that can tell Googlebot not to download some or all information from your web server. The format of the robots.txt file is specified in the Robot Exclusion Standard. When deciding which pages to crawl on a particular host, Googlebot will obey the first record in the robots.txt file with a User-Agent starting with "googlebot". If no such entry exists, it will obey the first entry with a User-Agent of "*".

There is a standard for robot exclusion at http://www.robotstxt....html#robotstxt. You can put a file on your server called robots.txt that can exclude Googlebot or other "web crawlers." Googlebot has a user-agent of "Googlebot". There is another standard for telling robots not to index a web page or follow links on it, which may be more helpful in some cases, since it can be used more conveniently on a page-by-page basis. It involves placing a "META" element into a page of HTML, and is described here; you can also read what the HTML standard has to say about these tags. Remember, changing your server's robots.txt file or changing the "META" elements on its pages will not cause an immediate change in what results Google returns. It is likely that it will take a while for any changes you make to propagate to Google's next index of the web.
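As a rough illustration of the robots.txt rules described above, here is a minimal sketch using Python's standard urllib.robotparser module. The host www.example.com, the /private/ path, and the crawler name OtherBot are made up for the example. Python's record-matching rules are its own and may not be identical to what Googlebot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record for Googlebot, one for everyone else.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)

# Googlebot matches its own record, so only /private/ is off limits to it.
googlebot_home = rules.can_fetch("Googlebot", "http://www.example.com/index.html")
googlebot_private = rules.can_fetch(
    "Googlebot", "http://www.example.com/private/notes.html"
)

# Any other crawler falls through to the "*" record, which blocks everything.
other_home = rules.can_fetch("OtherBot", "http://www.example.com/index.html")

print(googlebot_home, googlebot_private, other_home)
```

A crawler-specific record takes precedence over the "*" record here, which mirrors the lookup order the FAQ describes.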

Why is Googlebot asking for a file called robots.txt which isn't on my server?

robots.txt is a standard document that can tell Googlebot not to download some or all information from your web server. For information on how to create a robots.txt file, see The Robot Exclusion Standard. If you just want to prevent the "file not found" error messages in your webserver log, create an empty file named robots.txt.

Why is Googlebot trying to download incorrect links from my server? Or from a server that doesn't exist?

It is a property of the web that many links will be broken or outdated at any given time. Whenever anyone mistypes a link that points to your site, or fails to update their pages to reflect changes in your server, Googlebot will try to download an incorrect link from your site. This is also why you may get hits on a machine that is not even a web server.

Why is Googlebot downloading information from our "secret" web server?

It is almost impossible to keep a web server secret by not publishing any links to it. As soon as someone follows a link from your "secret" server to another web server, it is likely that your "secret" URL is in the Referer header, and it can be stored and possibly published by the other web server in its referer log. So, if there is a link to your "secret" web server or page on the web anywhere, it is likely that Googlebot and other "web crawlers" will find it.

Why isn't Googlebot obeying my robots.txt file?

In order to save bandwidth, Googlebot only downloads the robots.txt file once a day or whenever we have fetched many pages from the server. So, it may take a while for Googlebot to learn of any changes that might have been made to your robots.txt file. Also, check that your syntax is correct against the standard at http://www.robotstxt....html#robotstxt. A common source of problems is that the robots.txt file must be placed in the top directory of the server (e.g., www.myhost.com/robots.txt); placing the file in any subdirectory will not have any effect. If there still seems to be a problem, please let us know, and we will correct it. For more info, see the Robots FAQ.
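To make the "top directory" rule concrete, here is a small Python sketch that works out where robots.txt has to live for any given page. The function name robots_txt_url and the sample page path are only illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the only robots.txt location that applies to page_url's host."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, and replace the whole path with /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page deep in a subdirectory still maps to the file at the server root.
location = robots_txt_url("http://www.myhost.com/docs/sub/page.html")
print(location)
```

Whatever the depth of the page, the result points at the root of the host, so a robots.txt stored under /docs/ would never be consulted.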

How do I register my site with Googlebot so it will be indexed?

See the Add URL form.

Why are there hits from multiple machines at Google.com all with user-agent Googlebot?

Googlebot was designed to be distributed on several machines to improve performance and scale as the web grows. Also, to cut down on bandwidth usage we would like to run many crawlers which run on machines close to the sites they are indexing in the network.

How can I prevent Googlebot from following links from a particular page or archiving a copy of a page?

Googlebot obeys the noindex, nofollow, and noarchive meta-tags. If you place these tags in the head of your HTML document, you can cause Google to not index, not follow, and/or not archive particular documents on your site. The tags to include and their effects are:

<META NAME="robots" CONTENT="noindex">
Googlebot will retrieve the document, but it will not index the document.

<META NAME="robots" CONTENT="nofollow">
Googlebot will not follow any links that are present on the page to other documents.

<META NAME="robots" CONTENT="noarchive">
Google maintains a cache of all the documents that we fetch, to permit our users to access the content that we indexed (in the event that the original host of the content is inaccessible, or the content has changed). If you do not wish us to archive a document from your site, you can place this tag in the head of the document, and Google will not provide an archive copy for the document.


The "robots" tag is obeyed by many different web robots. If you'd like to specify some of these restrictions just for googlebot, you may use "googlebot" in place of "robots". You can also combine any or all of these tags into a single meta tag. For example:

        <META NAME="robots" CONTENT="noarchive,nofollow">  -- or --
        <META NAME="googlebot" CONTENT="noarchive,nofollow">
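If you want to check which of these directives a page actually carries, something like the following Python sketch can read them back out of the HTML. The class and function names are invented for the example, and it only approximates how a crawler might read the tags. It is not Google's own logic.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from META tags named "robots" or a given crawler."""

    def __init__(self, agent="googlebot"):
        super().__init__()
        self.agent = agent.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", self.agent):
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html, agent="googlebot"):
    """Return the set of robots directives that apply to the named crawler."""
    parser = RobotsMetaParser(agent)
    parser.feed(html)
    return parser.directives


page = '<html><head><META NAME="robots" CONTENT="noarchive,nofollow"></head></html>'
found = robots_directives(page)
print(sorted(found))
```

Tags addressed to "robots" and tags addressed to the named crawler are merged into one set, since the FAQ says either name can carry the same directives.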

Why is Googlebot downloading the same page on my site multiple times?

In general, Googlebot should only download one copy of each file from your site during a given crawl. Occasionally the crawler is stopped and restarted, and it may recrawl pages that it has recently retrieved. These recrawls should happen infrequently.

Why don't the pages that Googlebot crawled on my site show up in your index?

Don't be alarmed if you can't find documents that Googlebot has crawled from your site in the Google search engine immediately. The documents will be indexed and entered into the search database soon after being crawled. Occasionally, documents fetched by Googlebot will end up not being included in the index, for a variety of reasons (e.g., they appear to be duplicates of other pages on the web).

What kinds of links does Googlebot follow?

Googlebot follows HREF links and SRC links.

My Googlebot question is not answered here. Where do I send my question?

Please send questions regarding our Googlebot technology to googlebot@google.com.


I was bored, so I searched for Googlebot on Google. Well, I hope this answers some questions.

#2   cheetahx6

  • Advanced Member
  • Group: New Member
  • Posts: 380
  • Joined: 17-December 03
  • Location:United States,NJ

Posted 19 January 2004 - 03:13 AM

????? What is this about????? Explain.
Here is my Darkstorm stuff
Name: Ian
Class: warrior
Race: Ogre
Sex: male
Stats: 350 str 20 points 15 for being ogre
420 hp 0 points and 14 from weapons
Hometown: the streets. Left as an orphan on a road, and that is all of his childhood he can remember.
Appearance: physically strong-looking. Always carries 2 swords; one is called Mithila and the other is called Tithila. He also has a bow and some arrows, which are on his back.
Abilities
- Block (0): EC=0; DP=0: Will prevent the warrior from taking damage from the next physical attack used against him/her. Not effective against magic, and will remain in effect as long as the warrior does not move and does not use any abilities (or attack).

- Ambidexterity (0): EC=0; DP=0: Can hold two weapons at the same time (as long as the weapon does not require both hands).

- Fire Bow (150): EC=0; DP=5: Fires an arrow which can do damage to anyone in your region (must post in targeted location if applicable). Must have a bow in your possession.

#3   Superbuu91

  • Advanced Member
  • Group: Member
  • Posts: 496
  • Joined: 30-November 03

Posted 19 January 2004 - 02:26 PM

I searched for the Googlebot on google.com and that's what it said. You know, the guy at the bottom of the forums.

#4   Kowboy

  • 05.Banshee.SE
  • Group: New Member
  • Posts: 2,330
  • Joined: 26-July 03
  • Location:United States,NJ

Posted 19 January 2004 - 02:56 PM

Heh, that's cool. Now we know how to throw the Googlebot into an endless loop.