[MLton] GNU configure and autoconf.
Wesley W. Terpstra
terpstra@gkec.tu-darmstadt.de
Tue, 11 Jan 2005 15:02:14 +0100
On Tue, Jan 11, 2005 at 07:38:21AM -0600, Henry Cejtin wrote:
> Have any of the URLs moved around? I know that some pages Google doesn't
> crawl more than once every month or even a bit longer. Could that account
> for the uncrawled mail? Also, although I doubt that this is the problem,
> is your robots.txt ok?
I don't claim to understand the mystery that is Google's crawler, but I
think it might not crawl pages arbitrarily deep, especially if their URLs
are similar. I suspect this because I know that when Google is pointed at a
lurker archive, it only 'discovers' the first few thousand emails.
Though I am only theorizing, this behaviour makes sense if you think about
it. Google's crawler has to avoid traversing infinitely deep dynamic
content. Also, Google only indexes about 8 billion pages. Some sites have
more content than this all by themselves; Google shouldn't spend all its
resources indexing just those sites.
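
To make the theory concrete, here is a minimal sketch of a crawler with a
depth limit and a per-site page budget. It is only an illustration: the
names crawl, archive_links, max_depth and max_pages are made up for this
example, and nothing here reflects how Google's crawler actually works.

```python
from collections import deque


def crawl(start, links, max_depth=3, max_pages=5):
    """Breadth-first crawl that stops at max_depth hops or max_pages pages.

    `links` is a hypothetical stand-in for fetching a page: it maps a URL
    to the list of URLs that page links to.
    """
    seen = {start}
    queue = deque([(start, 0)])
    discovered = []
    while queue and len(discovered) < max_pages:
        url, depth = queue.popleft()
        discovered.append(url)
        if depth == max_depth:
            continue  # depth limit reached: do not follow links any deeper
        for nxt in links(url):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return discovered


def archive_links(url):
    # Toy archive: message N links only to message N+1, without end,
    # much like an endless chain of dynamic pages.
    n = int(url.rsplit("/", 1)[1])
    return [f"/message/{n + 1}"]
```

Calling crawl("/message/0", archive_links) on this endless chain still
terminates, because either limit cuts the traversal off. Under these
assumptions a large archive would only ever have its first few pages
discovered, which is at least consistent with what I see.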
--
Wesley W. Terpstra