Inmagic Forums > Gatherer > Spider Log File
<Amanda>
Posted
Each time the spider runs on our intranet, it reports errors for pages that haven't existed for several years. I manually removed them from the textbase.

The consistent message is Spider error - HTTP error 404 File not found, URL:

If I untick "re-spider documents already catalogued", it doesn't appear to pick up any new pages that are added.

Ticking the box means a massive log file.

If these files aren't in the textbase, aren't on the server, and have no links pointing to them (I know this because we moved from .htm to .shtm extensions on everything), why is the spider still reporting these broken links?
 
Posted
Because they're still in the NAVSDB.* files, which is where the pages are "already catalogued". The Spider doesn't read your textbase to find out which "already catalogued" pages to spider; it reads these NAVSDB.* files.

You can start your spider from scratch by deleting these files, but you'll need to make sure your Initial URL List is right, or the spider won't crawl anything.
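
If you'd rather not delete the files outright, here is a minimal Python sketch that moves them into a backup folder instead, so you can put them back if the fresh crawl goes wrong. The folder paths and the function name archive_navsdb are assumptions for illustration only; use whatever folder your NAVSDB.* files actually live in, and make sure the spider is not running when you do this.

```python
import shutil
from pathlib import Path


def archive_navsdb(spider_dir, backup_dir):
    """Move every NAVSDB.* file from spider_dir into backup_dir.

    Both folder arguments are hypothetical; point them at your own setup.
    Returns the sorted list of file names that were moved.
    """
    src = Path(spider_dir)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the backup folder if needed

    moved = []
    for f in sorted(src.glob("NAVSDB.*")):
        if f.is_file():
            shutil.move(str(f), str(dst / f.name))
            moved.append(f.name)
    return moved
```

Note that the NAVSDB.* pattern is matched case-sensitively on some operating systems, so check the returned list to confirm that the files you expected were actually moved.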
 
Posts: 1920 | Location: Woburn, MA, USA | Registered: Thu July 13 2000
Posted
What do you mean by making sure the initial URL list is right? How can I test this?
 
Posts: 3 | Location: England | Registered: Wed September 17 2003
Posted
I mean that the Initial URL List should contain the URLs from which you want the spider to start crawling.
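
One rough way to test the list outside the spider is to request each starting URL yourself and see which ones fail. The sketch below is only an illustration in Python, not anything the Gatherer provides: the function names are made up, and it assumes your web server answers HEAD requests (some do not, in which case a working page can show up as an error).

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response was obtained."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a page that no longer exists
    except (urllib.error.URLError, OSError, ValueError):
        return None  # unreachable server or malformed URL


def find_bad_urls(urls, status_of=http_status):
    """Return (url, status) pairs for every URL that did not answer 200."""
    bad = []
    for url in urls:
        status = status_of(url)
        if status != 200:
            bad.append((url, status))
    return bad
```

Copy the entries from your Initial URL List into a Python list and pass it to find_bad_urls. An empty result means every starting URL answered, so any leftover entries still ending in .htm after your move to .shtm should show up here with a 404.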
 
Posts: 1920 | Location: Woburn, MA, USA | Registered: Thu July 13 2000